1. Exploratory Data Analysis
This chapter presents the assumptions, principles, and techniques necessary to gain insight into
data via EDA--exploratory data analysis.
1. EDA Introduction [1.1.]
1. What is EDA? [1.1.1.]
2. How Does Exploratory Data Analysis differ from Classical Data Analysis? [1.1.2.]
1. Model [1.1.2.1.]
2. Focus [1.1.2.2.]
3. Techniques [1.1.2.3.]
4. Rigor [1.1.2.4.]
5. Data Treatment [1.1.2.5.]
6. Assumptions [1.1.2.6.]
3. How Does Exploratory Data Analysis Differ from Summary Analysis? [1.1.3.]
4. What are the EDA Goals? [1.1.4.]
5. The Role of Graphics [1.1.5.]
6. An EDA/Graphics Example [1.1.6.]
7. General Problem Categories [1.1.7.]
Philosophy EDA is not identical to statistical graphics although the two terms are
used almost interchangeably. Statistical graphics is a collection of
techniques--all graphically based and all focusing on one data
characterization aspect. EDA encompasses a larger venue; EDA is an
approach to data analysis that defers the usual assumptions about
what kind of model the data follow, taking instead the more direct approach of
allowing the data itself to reveal its underlying structure and model.
EDA is not a mere collection of techniques; EDA is a philosophy as to
how we dissect a data set; what we look for; how we look; and how we
interpret. It is true that EDA heavily uses the collection of techniques
that we call "statistical graphics", but it is not identical to statistical
graphics per se.
History The seminal work in EDA is Exploratory Data Analysis, Tukey, (1977).
Over the years it has benefitted from other noteworthy publications such
as Data Analysis and Regression, Mosteller and Tukey (1977),
Interactive Data Analysis, Hoaglin (1977), The ABC's of EDA,
Velleman and Hoaglin (1981) and has gained a large following as "the"
way to analyze a data set.
Techniques Most EDA techniques are graphical in nature with a few quantitative
techniques. The reason for the heavy reliance on graphics is that by its
very nature the main role of EDA is to open-mindedly explore. Graphics,
combined with the natural pattern-recognition capabilities that we all
possess, gives the analyst unparalleled power to do so: enticing the
data to reveal its structural secrets and yielding new, often
unsuspected, insight into the data.
The particular graphical techniques employed in EDA are often quite
simple, consisting of various techniques of:
1. Plotting the raw data (such as data traces, histograms,
bihistograms, probability plots, lag plots, block plots, and Youden
plots).
2. Plotting simple statistics such as mean plots, standard deviation
plots, box plots, and main effects plots of the raw data.
3. Positioning such plots so as to maximize our natural
pattern-recognition abilities, such as using multiple plots per
page.
Paradigms for Analysis Techniques: These three approaches are similar in
that they all start with a general science/engineering problem and all
yield science/engineering conclusions. The difference is the sequence
and focus of the intermediate steps.
For classical analysis, the sequence is
Problem => Data => Model => Analysis => Conclusions
For EDA, the sequence is
Problem => Data => Analysis => Model => Conclusions
For Bayesian, the sequence is
Problem => Data => Model => Prior Distribution => Analysis =>
Conclusions
Method of dealing with underlying model for the data distinguishes the
3 approaches: Thus for classical analysis, the data collection is followed
by the imposition of a model (normality, linearity, etc.) and the analysis,
estimation, and testing that follows are focused on the parameters of
that model. For EDA, the data collection is not followed by a model
imposition; rather it is followed immediately by analysis with a goal of
inferring what model would be appropriate. Finally, for a Bayesian
analysis, the analyst attempts to incorporate scientific/engineering
knowledge/expertise into the analysis by imposing a data-independent
distribution on the parameters of the selected model; the analysis thus
consists of formally combining both the prior distribution on the
parameters and the collected data to jointly make inferences and/or test
assumptions about the model parameters.
In the real world, data analysts freely mix elements of all of the above
three approaches (and other approaches). The above distinctions were
made to emphasize the major differences among the three approaches.
1.1.2.1. Model
Classical The classical approach imposes models (both deterministic and
probabilistic) on the data. Deterministic models include, for example,
regression models and analysis of variance (ANOVA) models. The most
common probabilistic model assumes that the errors about the
deterministic model are normally distributed--this assumption affects the
validity of the ANOVA F tests.
Exploratory The Exploratory Data Analysis approach does not impose deterministic
or probabilistic models on the data. On the contrary, the EDA approach
allows the data to suggest admissible models that best fit the data.
1.1.2.2. Focus
Classical The two approaches differ substantially in focus. For classical analysis,
the focus is on the model--estimating parameters of the model and
generating predicted values from the model.
Exploratory For exploratory data analysis, the focus is on the data--its structure,
outliers, and models suggested by the data.
1.1.2.3. Techniques
Classical Classical techniques are generally quantitative in nature. They include
ANOVA, t tests, chi-squared tests, and F tests.
Exploratory EDA techniques are generally graphical. They include scatter plots,
character plots, box plots, histograms, bihistograms, probability plots,
residual plots, and mean plots.
1.1.2.4. Rigor
Classical Classical techniques serve as the probabilistic foundation of science and
engineering; the most important characteristic of classical techniques is
that they are rigorous, formal, and "objective".
Exploratory EDA techniques do not share in that rigor or formality. EDA techniques
make up for that lack of rigor by being very suggestive, indicative, and
insightful about what the appropriate model should be.
EDA techniques are subjective and depend on interpretation which may
differ from analyst to analyst, although experienced analysts commonly
arrive at identical conclusions.
1.1.2.5. Data Treatment
Classical: Classical estimation techniques map all of the data into a few
numbers ("estimates"). These few numbers focus on important
characteristics of the data (location, variation, etc.) but, in doing so,
necessarily filter out other characteristics; in this sense there is a loss of
information.
Exploratory: The EDA approach, on the other hand, often makes use of (and shows)
all of the available data. In this sense there is no corresponding loss of
information.
1.1.2.6. Assumptions
Classical The "good news" of the classical approach is that tests based on
classical techniques are usually very sensitive--that is, if a true shift in
location, say, has occurred, such tests frequently have the power to
detect such a shift and to conclude that such a shift is "statistically
significant". The "bad news" is that classical tests depend on underlying
assumptions (e.g., normality), and hence the validity of the test
conclusions becomes dependent on the validity of the underlying
assumptions. Worse yet, the exact underlying assumptions may be
unknown to the analyst, or if known, untested. Thus the validity of the
scientific conclusions becomes intrinsically linked to the validity of the
underlying assumptions. In practice, if such assumptions are unknown
or untested, the validity of the scientific conclusions becomes suspect.
1.1.3. How Does Exploratory Data Analysis Differ from Summary Analysis?
Exploratory In contrast, EDA has as its broadest goal the desire to gain insight into
the engineering/scientific process behind the data. Whereas summary
statistics are passive and historical, EDA is active and futuristic. In an
attempt to "understand" the process and improve it in the future, EDA
uses the data as a "window" to peer into the heart of the process that
generated the data. There is an archival role in the research and
manufacturing world for summary statistics, but there is an enormously
larger role for the EDA approach.
Insight into the Data: Insight implies detecting and uncovering underlying
structure in the data. Such underlying structure may not be encapsulated
in the list of items above; such items serve as the specific targets of an
analysis, but the real insight and "feel" for a data set comes as the analyst
judiciously probes and explores the various subtleties of the data. The
"feel" for the data comes almost exclusively from the application of various
graphical techniques, the collection of which serves as the window into the
essence of the data. Graphics are irreplaceable--there are no quantitative
analogues that will give the same insight as well-chosen graphics.
To get a "feel" for the data, it is not enough for the analyst to know what
is in the data; the analyst also must know what is not in the data, and the
only way to do that is to draw on our own human pattern-recognition
and comparative abilities in the context of a series of judicious graphical
techniques applied to the data.
1.1.5. The Role of Graphics
Statistics and data analysis procedures can broadly be split into two parts:
● quantitative
● graphical
Quantitative Quantitative techniques are the set of statistical procedures that yield
numeric or tabular output. Examples of quantitative techniques include:
● hypothesis testing
● analysis of variance
● point estimates and confidence intervals
● least squares regression
These and similar techniques are all valuable and are mainstream in
terms of classical analysis.
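As a minimal illustration of the quantitative class, the sketch below computes a point estimate and a t-based confidence interval for the mean. Python with numpy/scipy is an assumption of available tooling (the Handbook itself uses Dataplot), and the data vector y is a hypothetical placeholder.

```python
import numpy as np
from scipy import stats

# Hypothetical single column of measurements.
y = np.array([28.0, 26.5, 27.2, 29.1, 28.4, 27.8, 26.9, 28.7])

n = len(y)
mean = y.mean()                    # point estimate of location
sem = y.std(ddof=1) / np.sqrt(n)   # standard error of the mean

# 95% two-sided confidence interval from the t distribution.
lo, hi = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)
print(f"mean = {mean:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```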
Graphical On the other hand, there is a large collection of statistical tools that we
generally refer to as graphical techniques. These include:
● scatter plots
● histograms
● probability plots
● residual plots
● box plots
● block plots
EDA Approach Relies Heavily on Graphical Techniques: The EDA approach
relies heavily on these and similar graphical techniques. Graphical
procedures are not just tools that we could use in an EDA context, they
are tools that we must use. Such graphical tools are the shortest path to
gaining insight into a data set in terms of
● testing assumptions
● model selection
● model validation
● estimator selection
● relationship identification
● outlier detection
If one is not using statistical graphics, then one is forfeiting insight into
one or more aspects of the underlying structure of the data.
1.1.6. An EDA/Graphics Example
Data
X Y
10.00 8.04
8.00 6.95
13.00 7.58
9.00 8.81
11.00 8.33
14.00 9.96
6.00 7.24
4.00 4.26
12.00 10.84
7.00 4.82
5.00 5.68
Scatter Plot: In contrast, a simple scatter plot of the data suggests a clear
characterization: the data behave like a linear curve with some scatter,
there is no justification for a more complicated model, and there are no
outliers.
Three Additional Data Sets: This kind of characterization for the data
serves as the core for getting insight/feel for the data. Such insight/feel
does not come from the quantitative statistics; on the contrary,
calculations of quantitative statistics such as intercept and slope should
be subsequent to the characterization and will make sense only if the
characterization is true. To illustrate the loss of information that results
when the graphics insight step is skipped, consider the following three
data sets [Anscombe data sets 2, 3, and 4]:
X2 Y2 X3 Y3 X4 Y4
10.00 9.14 10.00 7.46 8.00 6.58
8.00 8.14 8.00 6.77 8.00 5.76
13.00 8.74 13.00 12.74 8.00 7.71
Scatter Plots
Importance of Exploratory Analysis: These points are exactly the
substance that provide and define "insight" and "feel" for a data set.
They are the goals and the fruits of an open exploratory data analysis
(EDA) approach to the data. Quantitative statistics are not wrong per se,
but they are incomplete. They are incomplete because they are numeric
summaries which in the summarization operation do a good job of
focusing on a particular aspect of the data (e.g., location, intercept,
slope, degree of relatedness, etc.) by judiciously reducing the data to a
few numbers. Doing so also filters the data, necessarily omitting and
screening out other sometimes crucial information in the focusing
operation. Quantitative statistics focus but also filter; and filtering is
exactly what makes the quantitative approach incomplete at best and
misleading at worst.
The estimated intercepts (= 3) and slopes (= 0.5) for data sets 2, 3, and
4 are misleading because the estimation is done in the context of an
assumed linear model and that linearity assumption is the fatal flaw in
this analysis.
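The filtering effect is easy to reproduce. The sketch below (Python with numpy, an assumption of available tooling) fits a straight line to data set 1 from the table above; all four Anscombe sets return essentially the same intercept (about 3) and slope (about 0.5), even though their scatter plots look completely different.

```python
import numpy as np

# Anscombe data set 1, transcribed from the table above.
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
              7.24, 4.26, 10.84, 4.82, 5.68])

slope, intercept = np.polyfit(x, y, 1)   # least squares straight line
r = np.corrcoef(x, y)[0, 1]              # degree of relatedness
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}, r = {r:.2f}")
```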
1.1.7. General Problem Categories

Univariate and Control:

UNIVARIATE
Data: A single column of numbers, Y.
Model: y = constant + error
Output:
1. A number (the estimated constant in the model).
2. An estimate of uncertainty for the constant.
3. An estimate of the distribution for the error.
Techniques:
● 4-Plot
● Probability Plot
● PPCC Plot

CONTROL
Data: A single column of numbers, Y.
Model: y = constant + error
Output: A "yes" or "no" to the question "Is the system out of control?".
Techniques:
● Control Charts
Comparative and Screening:

COMPARATIVE
Data: A single response variable and k independent variables (Y, X1, X2,
..., Xk); primary focus is on one (the primary factor) of these independent
variables.
Model: y = f(x1, x2, ..., xk) + error
Output: A "yes" or "no" to the question "Is the primary factor significant?".
Techniques:
● Block Plot
● Scatter Plot
● Box Plot

SCREENING
Data: A single response variable and k independent variables (Y, X1, X2,
..., Xk).
Model: y = f(x1, x2, ..., xk) + error
Output:
1. A ranked list (from most important to least important) of factors.
2. Best settings for the factors.
3. A good model/prediction equation relating Y to the factors.
Techniques:
● Block Plot
● Probability Plot
● Bihistogram
Optimization and Regression:

OPTIMIZATION
Data: A single response variable and k independent variables (Y, X1, X2,
..., Xk).
Model: y = f(x1, x2, ..., xk) + error
Output: Best settings for the factor variables.
Techniques:
● Block Plot

REGRESSION
Data: A single response variable and k independent variables (Y, X1, X2,
..., Xk). The independent variables can be continuous.
Model: y = f(x1, x2, ..., xk) + error
Output: A good model/prediction equation relating Y to the factors.
Techniques:
● Scatter Plot
● 6-Plot
Time Series and Multivariate:

TIME SERIES
Data: A column of time dependent numbers, Y. In addition, time is an
independent variable. The time variable can be either explicit or implied.
If the data are not equi-spaced, the time variable should be explicitly
provided.
Model: y_t = f(t) + error. The model can be either time domain based or
frequency domain based.
Output: A good model/prediction equation relating Y to previous values
of Y.
Techniques:
● Autocorrelation Plot
● Spectrum
● Complex Demodulation Amplitude Plot
● Complex Demodulation Phase Plot
● ARIMA Models

MULTIVARIATE
Data: k factor variables (X1, X2, ..., Xk).
Model: The model is not explicit.
Output: Identify underlying correlation structure in the data.
Techniques:
● Star Plot
● Scatter Plot Matrix
● Conditioning Plot
● Profile Plot
● Principal Components
● Clustering
● Discrimination/Classification

Note that multivariate analysis is only covered lightly in this Handbook.
Univariate or Single Response Variable: The "fixed location" referred to
in item 3 above differs for different problem types. The simplest problem
type is univariate; that is, a single variable. For the univariate problem,
the general model
response = deterministic component + random component
becomes
response = constant + error
Assumptions for Univariate Model: For this case, the "fixed location" is
simply the unknown constant. We can thus imagine the process at hand
to be operating under constant conditions that produce a single column
of data with the properties that
● the data are uncorrelated with one another;
Extrapolation to a Function of Many Variables: The universal power and
importance of the univariate model is that it can easily be extended to
the more general case where the deterministic component is not just a
constant, but is in fact a function of many variables, and the engineering
objective is to characterize and model the function.
Residuals Will Behave According to Univariate Assumptions: The key
point is that regardless of how many factors there are, and regardless of
how complicated the function is, if the engineer succeeds in choosing a
good model, then the differences (residuals) between the raw response
data and the predicted values from the fitted model should themselves
behave like a univariate process. Furthermore, the residuals from this
univariate process fit will behave like:
● random drawings;
Validation of Model: Thus if the residuals from the fitted model do in
fact behave like the ideal, then testing of underlying assumptions
becomes a tool for the validation and quality of fit of the chosen model.
On the other hand, if the residuals from the chosen fitted model violate
one or more of the above univariate assumptions, then the chosen fitted
model is inadequate and an opportunity exists for arriving at an
improved model.
1.2.2. Importance
Predictability and Statistical Control: Predictability is an all-important
goal in science and engineering. If the four underlying assumptions hold,
then we have achieved probabilistic predictability--the ability to make
probability statements not only about the process in the past, but also
about the process in the future. In short, such processes are said to be
"in statistical control".
Validity of Engineering Conclusions: Moreover, if the four assumptions
are valid, then the process is amenable to the generation of valid
scientific and engineering conclusions. If the four assumptions are not
valid, then the process is drifting (with respect to location, variation, or
distribution), unpredictable, and out of control. A simple characterization
of such processes by a location estimate, a variation estimate, or a
distribution "estimate" inevitably leads to engineering conclusions that
are not valid, are not supportable (scientifically or legally), and which
are not repeatable in the laboratory.
Four Techniques to Test Underlying Assumptions: The following EDA
techniques are simple, efficient, and powerful for the routine testing of
underlying assumptions:
1. run sequence plot (Yi versus i)
2. lag plot (Yi versus Yi-1)
3. histogram (counts versus subgroups of Y)
4. normal probability plot (ordered Y versus theoretical ordered Y)
Plot on a Single Page for a Quick Characterization of the Data: The four
EDA plots can be juxtaposed for a quick look at the characteristics of
the data. The plots below are ordered as follows:
1. Run sequence plot - upper left
2. Lag plot - upper right
3. Histogram - lower left
4. Normal probability plot - lower right
A minimal sketch of such a 4-plot layout is given below.
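This sketch assumes Python with matplotlib and scipy (the Handbook itself uses Dataplot's 4-PLOT), and the series y is a hypothetical placeholder for real process data.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

y = np.random.default_rng(0).normal(size=200)  # placeholder process data

fig, ax = plt.subplots(2, 2, figsize=(8, 6))
ax[0, 0].plot(y)                               # run sequence plot
ax[0, 0].set_title("Run Sequence Plot")
ax[0, 1].plot(y[:-1], y[1:], "o", ms=3)        # lag plot (lag 1)
ax[0, 1].set_title("Lag Plot")
ax[1, 0].hist(y, bins="auto")                  # histogram
ax[1, 0].set_title("Histogram")
stats.probplot(y, dist="norm", plot=ax[1, 1])  # normal probability plot
ax[1, 1].set_title("Normal Probability Plot")
fig.tight_layout()
plt.show()
```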
Sample Plot: Assumptions Hold: This 4-plot reveals a process that has
fixed location, fixed variation, is random, apparently has a fixed
approximately normal distribution, and has no outliers.
Sample Plot: Assumptions Do Not Hold: If one or more of the four
underlying assumptions do not hold, then it will show up in the various
plots as demonstrated in the following example. This 4-plot reveals a
process that has fixed location, fixed variation, is non-random
(oscillatory), has a non-normal, U-shaped distribution, and has several
outliers.
Plots Utilized to Test the Assumptions: Conversely, the underlying
assumptions are tested using the EDA plots:
● Run Sequence Plot:
If the run sequence plot is flat and non-drifting, the
fixed-location assumption holds. If the run sequence plot has a
vertical spread that is about the same over the entire plot, then
the fixed-variation assumption holds.
● Lag Plot:
If the lag plot is structureless, then the randomness assumption
holds.
● Histogram:
If the histogram is bell-shaped, the underlying distribution is
symmetric and perhaps approximately normal.
● Normal Probability Plot:
If the normal probability plot is approximately linear, the
underlying distribution is approximately normal.
1.2.5. Consequences
What If Assumptions Do Not Hold? If some of the underlying
assumptions do not hold, what can be done about it? What corrective
actions can be taken? The positive way of approaching this is to view
the testing of underlying assumptions as a framework for learning about
the process. Assumption-testing promotes insight into important aspects
of the process that may not have surfaced otherwise.
Primary Goal is Correct and Valid Scientific Conclusions: The primary
goal is to have correct, validated, and complete scientific/engineering
conclusions flowing from the analysis. This usually includes
intermediate goals such as the derivation of a good-fitting model and the
computation of realistic parameter estimates. It should always include
the ultimate goal of an understanding and a "feel" for "what makes the
process tick". There is no more powerful catalyst for discovery than the
bringing together of an experienced/expert scientist/engineer and a data
set ripe with intriguing "anomalies" and characteristics.
Consequences of Non-Fixed Location: If the run sequence plot does not
support the assumption of fixed location, then
1. The location may be drifting.
2. The single location estimate may be meaningless (if the process
is drifting).
3. The choice of location estimator (e.g., the sample mean) may be
sub-optimal.
4. The usual formula for the uncertainty of the mean,
s(Ybar) = s/sqrt(N), may be invalid and the numerical value
optimistically small.
Consequences of Non-Fixed Variation: If the run sequence plot does not
support the assumption of fixed variation, then
1. The variation may be drifting.
2. The single variation estimate may be meaningless (if the process
variation is drifting).
3. The variation estimate may be poor.
4. The variation estimate may be biased.
Case Studies The airplane glass failure case study gives an example of determining
an appropriate distribution and estimating the parameters of that
distribution. The uniform random numbers case study gives an
example of determining a more appropriate centrality parameter for a
non-normal distribution.
Table of 1. Introduction
Contents for 2. Analysis Questions
Section 3
3. Graphical Techniques: Alphabetical
4. Graphical Techniques: By Problem Category
5. Quantitative Techniques: Alphabetical
6. Probability Distributions
1.3.1. Introduction
Graphical This section describes many techniques that are commonly used in
and exploratory and classical data analysis. This list is by no means meant
Quantitative to be exhaustive. Additional techniques (both graphical and
Techniques quantitative) are discussed in the other chapters. Specifically, the
product comparisons chapter has a much more detailed description of
many classical statistical techniques.
EDA emphasizes graphical techniques while classical techniques
emphasize quantitative techniques. In practice, an analyst typically
uses a mixture of graphical and quantitative techniques. In this section,
we have divided the descriptions into graphical and quantitative
techniques. This is for organizational clarity and is not meant to
discourage the use of both graphical and quantitative techniques when
analyzing data.
Use of Techniques Shown in Case Studies: This section emphasizes the
techniques themselves: how the graph or test is defined, published
references, and sample output. The use of the techniques to answer
engineering questions is demonstrated in the case studies section. The
case studies do not demonstrate all of the techniques.
Availability in Software: The sample plots and output in this section
were generated with the Dataplot software program. Other general
purpose statistical data analysis programs can generate most of the
plots, intervals, and tests discussed here, or macros can be written to
achieve the same result.
Analyst Should Identify Relevant Questions for his Engineering
Problem: A critical early step in any analysis is to identify (for the
engineering problem at hand) which of the above questions are relevant.
That is, we need to identify which questions we want answered and
which questions have no bearing on the problem at hand. After
collecting such a set of questions, an equally important step, which is
invaluable for maintaining focus, is to prioritize those questions in
decreasing order of importance. EDA techniques are tied in with each of
the questions. There are some EDA techniques (e.g., the scatter plot)
that are broad-brushed and apply almost universally. On the other hand,
there are a large number of EDA techniques that are specific and whose
specificity is tied in with one of the above questions. Clearly if one
chooses not to explicitly identify relevant questions, then one cannot
take advantage of these question-specific EDA techniques.
Linear Intercept Plot: 1.3.3.17
Linear Slope Plot: 1.3.3.18
Linear Residual Standard Deviation Plot: 1.3.3.19
Mean Plot: 1.3.3.20
6-Plot: 1.3.3.33
Sample Plot: Autocorrelations should be near-zero for randomness; such
is not the case in this example, and thus the randomness assumption fails.
This sample autocorrelation plot shows that the time series is not
random, but rather has a high degree of autocorrelation between
adjacent and near-adjacent observations.
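The sample autocorrelation behind such a plot is straightforward to compute. A minimal sketch in Python (numpy and matplotlib assumed; the cumulative-sum series is a hypothetical stand-in for an autocorrelated process):

```python
import numpy as np
import matplotlib.pyplot as plt

def sample_autocorrelation(y, max_lag):
    """Sample autocorrelation r(k) for lags 1..max_lag."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.sum(y * y)
    return [np.sum(y[:-k] * y[k:]) / denom for k in range(1, max_lag + 1)]

rng = np.random.default_rng(1)
y = np.cumsum(rng.normal(size=200))    # placeholder autocorrelated series

lags = np.arange(1, 41)
plt.vlines(lags, 0, sample_autocorrelation(y, 40))
plt.axhline(0, color="k")
plt.xlabel("Lag")
plt.ylabel("Autocorrelation")
plt.show()
```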
Importance: Ensure Validity of Engineering Conclusions: Randomness
(along with fixed model, fixed variation, and fixed distribution) is one
of the four assumptions that typically underlie all measurement
processes. The randomness assumption is critically important for the
following three reasons:
1. Most standard statistical tests depend on randomness. The
validity of the test conclusions is directly linked to the
validity of the randomness assumption.
2. Many commonly-used statistical formulae depend on the
randomness assumption, the most common formula being the
formula for determining the standard deviation of the sample
mean: s(Ybar) = s/sqrt(N), where s is the standard deviation of
the data and N is the number of observations.
Case Study The autocorrelation plot is demonstrated in the beam deflection data
case study.
Recommended Next Step: The next step would be to estimate the
parameters for the autoregressive model: Yi = A0 + A1*Yi-1 + Ei.
Conclusions We can make the following conclusions from the above plot.
1. The data come from an underlying autoregressive model with
strong positive autocorrelation.
Discussion The plot starts with a high autocorrelation at lag 1 (only slightly less
than 1) that slowly declines. It continues decreasing until it becomes
negative and starts showing an increasing negative autocorrelation.
The decreasing autocorrelation is generally linear with little noise.
Such a pattern is the autocorrelation plot signature of "strong
autocorrelation", which in turn provides high predictability if
modeled properly.
Recommended Next Step: The next step would be to estimate the
parameters for the autoregressive model: Yi = A0 + A1*Yi-1 + Ei.
Conclusions We can make the following conclusions from the above plot.
1. The data come from an underlying sinusoidal model.
1.3.3.2. Bihistogram
Purpose: Check for a Change in Location, Variation, or Distribution: The
bihistogram is an EDA tool for assessing whether a before-versus-after
engineering modification has caused a change in
● location;
● variation; or
● distribution.
Sample Plot: This bihistogram reveals that there is a significant
difference in ceramic breaking strength between batch 1 (above) and
batch 2 (below).
From the above bihistogram, we can see that the batch factor has a
significant effect on the location (typical value) for strength and hence
batch is said to be "significant" or to "have an effect". We thus see
graphically and convincingly what a t-test or analysis of variance would
indicate quantitatively.
Case Study The bihistogram is demonstrated in the ceramic strength data case
study.
Sample Plot: Weld method 2 is lower (better) than weld method 1 in 10
of 12 cases.
This block plot reveals that in 10 of the 12 cases (bars), weld method 2
is lower (better) than weld method 1. From a binomial point of view,
weld method is statistically significant.
Discussion: Primary factor is denoted by within-bar plot character:
Average number of defective lead wires per hour from a study with four
factors,
1. weld strength (2 levels)
2. plant (2 levels)
3. speed (2 levels)
4. shift (3 levels)
are shown in the plot above. Weld strength is the primary factor and the
other three factors are nuisance factors. The 12 distinct positions along
the horizontal axis correspond to all possible combinations of the three
nuisance factors, i.e., 12 = 2 plants x 2 speeds x 3 shifts. These 12
conditions provide the framework for assessing whether any conclusions
about the 2 levels of the primary factor (weld method) can truly be
called "general conclusions". If we find that one weld method setting
does better (smaller average defects per hour) than the other weld
method setting for all or most of these 12 nuisance factor combinations,
then the conclusion is in fact general and robust.
Ordering In the above chart, the ordering along the horizontal axis is as follows:
along the ● The left 6 bars are from plant 1 and the right 6 bars are from plant
horizontal 2.
axis
● The first 3 bars are from speed 1, the next 3 bars are from speed
2, the next 3 bars are from speed 1, and the last 3 bars are from
speed 2.
● Bars 1, 4, 7, and 10 are from the first shift, bars 2, 5, 8, and 11 are
from the second shift, and bars 3, 6, 9, and 12 are from the third
shift.
Setting 2 is better than setting 1 in 10 out of 12 cases: In the block plot
for the first bar (plant 1, speed 1, shift 1), weld method 1 yields about 28
defects per hour while weld method 2 yields about 22 defects per
hour--hence the difference for this combination is about 6 defects per
hour and weld method 2 is seen to be better (smaller number of defects
per hour).
Is "weld method 2 is better than weld method 1" a general conclusion?
For the second bar (plant 1, speed 1, shift 2), weld method 1 is about 37
while weld method 2 is only about 18. Thus weld method 2 is again seen
to be better than weld method 1. Similarly for bar 3 (plant 1, speed 1,
shift 3), we see weld method 2 is smaller than weld method 1. Scanning
over all of the 12 bars, we see that weld method 2 is smaller than weld
method 1 in 10 of the 12 cases, which is highly suggestive of a robust
weld method effect.
Questions The block plot can provide answers to the following questions:
1. Is the factor of interest significant?
2. Does the factor of interest have an effect?
3. Does the location change between levels of the primary factor?
4. Has the process improved?
5. What is the best setting (= level) of the primary factor?
6. How much of an average improvement can we expect with this
best setting of the primary factor?
7. Is there an interaction between the primary factor and one or more
nuisance factors?
8. Does the effect of the primary factor change depending on the
setting of some nuisance factor?
Case Study The block plot is demonstrated in the ceramic strength data case study.
Software Block plots can be generated with the Dataplot software program. They
are not currently available in other statistical software programs.
Sample
Plot:
This bootstrap plot was generated from 500 uniform random numbers.
Bootstrap plots and corresponding histograms were generated for the
mean, median, and mid-range. The histograms for the corresponding
statistics clearly show that for uniform random numbers the mid-range
has the smallest variance and is, therefore, a superior location estimator
to the mean or the median.
The bootstrap plot is simply the computed value of the statistic versus
the subsample number. That is, the bootstrap plot generates the values
for the desired statistic. This is usually immediately followed by a
histogram or some other distributional plot to show the location and
variation of the sampling distribution of the statistic.
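Under the assumption of a general-purpose environment rather than Dataplot, the following Python sketch illustrates the procedure: draw subsamples with replacement, record each statistic, and plot statistic versus subsample number.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
y = rng.uniform(size=500)          # 500 uniform random numbers

def midrange(a):
    return 0.5 * (a.min() + a.max())

results = {"mean": [], "median": [], "midrange": []}
for _ in range(500):               # one bootstrap subsample per iteration
    sub = rng.choice(y, size=y.size, replace=True)
    results["mean"].append(sub.mean())
    results["median"].append(np.median(sub))
    results["midrange"].append(midrange(sub))

# Bootstrap plot: computed statistic versus subsample number.
for name, values in results.items():
    plt.plot(values, label=name)
plt.xlabel("Subsample number")
plt.ylabel("Statistic")
plt.legend()
plt.show()
```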
Caution on Use of the Bootstrap: The bootstrap is not appropriate for all
distributions and statistics (Efron and Tibshirani). For example, because
of the shape of the uniform distribution, the bootstrap is not appropriate
for estimating the distribution of statistics that are heavily dependent on
the tails, such as the range.
Related Techniques: Histogram, Jackknife
The jackknife is a technique that is closely related to the bootstrap. The
jackknife is beyond the scope of this handbook. See the Efron and Gong
article for a discussion of the jackknife.
Case Study The bootstrap plot is demonstrated in the uniform random numbers case
study.
Sample Plot
The plot of the original data with the predicted values from a linear fit
indicates that a quadratic fit might be preferable. The Box-Cox
linearity plot shows a value of lambda = 2.0. The plot of the transformed
data with the predicted values from a linear fit with the transformed
data shows a better fit (verified by the significant reduction in the
residual standard deviation).
Questions The Box-Cox linearity plot can provide answers to the following
questions:
1. Would a suitable transformation improve my fit?
2. What is the optimal value of the transformation parameter?
Case Study The Box-Cox linearity plot is demonstrated in the Alaska pipeline
data case study.
Software Box-Cox linearity plots are not a standard part of most general
purpose statistical software programs. However, the underlying
technique is based on a transformation and computing a correlation
coefficient. So if a statistical program supports these capabilities,
writing a macro for a Box-Cox linearity plot should be feasible.
Dataplot supports a Box-Cox linearity plot directly.
Sample Plot
The histogram in the upper left-hand corner shows a data set that has
significant right skewness (and so does not follow a normal
distribution). The Box-Cox normality plot shows that the maximum
value of the correlation coefficient is at lambda = -0.3. The histogram of
the data after applying the Box-Cox transformation with lambda = -0.3 shows a
data set for which the normality assumption is reasonable. This is
verified with a normal probability plot of the transformed data.
Questions The Box-Cox normality plot can provide answers to the following
questions:
1. Is there a transformation that will normalize my data?
2. What is the optimal value of the transformation parameter?
Importance: Normalization Improves Validity of Tests: Normality
assumptions are critical for many univariate intervals and hypothesis
tests. It is important to test the normality assumption. If the data are in
fact clearly not normal, the Box-Cox normality plot can often be used to
find a transformation that will approximately normalize the data.
Software Box-Cox normality plots are not a standard part of most general
purpose statistical software programs. However, the underlying
technique is based on a normal probability plot and computing a
correlation coefficient. So if a statistical program supports these
capabilities, writing a macro for a Box-Cox normality plot should be
feasible. Dataplot supports a Box-Cox normality plot directly.
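A sketch of such a macro in Python (scipy assumed; the lognormal sample is a hypothetical stand-in for skewed data): for each candidate lambda, Box-Cox transform the data and record the normal probability plot correlation coefficient; the lambda with the largest coefficient is the plot's optimum. scipy.stats also provides boxcox_normplot, which packages the same idea.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.lognormal(size=100)            # placeholder right-skewed data

lambdas = np.linspace(-2.0, 2.0, 81)
ppcc = []
for lam in lambdas:
    # Box-Cox transform: log(x) at lambda = 0, else (x**lambda - 1)/lambda.
    xt = np.log(x) if abs(lam) < 1e-12 else (x**lam - 1.0) / lam
    (_osm, _osr), (_slope, _icept, r) = stats.probplot(xt, dist="norm")
    ppcc.append(r)

best = lambdas[np.argmax(ppcc)]
print(f"lambda maximizing the probability plot correlation: {best:.2f}")
```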
Sample Plot: This box plot reveals that machine has a significant effect
on energy with respect to location and possibly variation.
This box plot, comparing four machines for energy output, shows that
machine has a significant effect on energy with respect to both location
and variation. Machine 3 has the highest energy response (about 72.5);
machine 4 has the least variable energy response with about 50% of its
readings being within 1 energy unit.
Single or multiple box plots can be drawn: A single box plot can be
drawn for one batch of data with no distinct groups. Alternatively,
multiple box plots can be drawn together to compare multiple data sets
or to compare groups in a single data set. For a single box plot, the
width of the box is arbitrary. For multiple box plots, the width of the
box plot can be set proportional to the number of points in the given
group or sample (some software implementations of the box plot simply
set all the boxes to the same width).
Box plots with fences: There is a useful variation of the box plot that
more specifically identifies outliers. To create this variation:
1. Calculate the median and the lower and upper quartiles.
2. Plot a symbol at the median and draw a box between the lower
and upper quartiles.
3. Calculate the interquartile range (the difference between the upper
and lower quartile) and call it IQ.
4. Calculate the following points:
L1 = lower quartile - 1.5*IQ
L2 = lower quartile - 3.0*IQ
U1 = upper quartile + 1.5*IQ
U2 = upper quartile + 3.0*IQ
5. The line from the lower quartile to the minimum is now drawn
from the lower quartile to the smallest point that is greater than
L1. Likewise, the line from the upper quartile to the maximum is
now drawn to the largest point smaller than U1. (A computational
sketch of these fences follows.)
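A minimal sketch of steps 1-5 in Python (numpy assumed; note that software packages differ slightly in how they define quartiles):

```python
import numpy as np

def box_plot_fences(y):
    """Median, quartiles, fences, and whisker ends for the fenced box plot."""
    y = np.asarray(y, dtype=float)
    lower_q, median, upper_q = np.percentile(y, [25, 50, 75])
    iq = upper_q - lower_q                       # interquartile range
    fences = {"L1": lower_q - 1.5 * iq,          # inner lower fence
              "L2": lower_q - 3.0 * iq,          # outer lower fence
              "U1": upper_q + 1.5 * iq,          # inner upper fence
              "U2": upper_q + 3.0 * iq}          # outer upper fence
    # Whiskers stop at the most extreme points inside the inner fences;
    # points beyond the fences are flagged as potential outliers.
    whiskers = (y[y > fences["L1"]].min(), y[y < fences["U1"]].max())
    outliers = y[(y <= fences["L1"]) | (y >= fences["U1"])]
    return median, (lower_q, upper_q), fences, whiskers, outliers
```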
Questions The box plot can provide answers to the following questions:
1. Is a factor significant?
2. Does the location differ between subgroups?
3. Does the variation differ between subgroups?
4. Are there any outliers?
Importance: Check the significance of a factor: The box plot is an
important EDA tool for determining if a factor has a significant effect on
the response with respect to either location or variation.
The box plot is also an effective tool for summarizing large quantities of
information.
Case Study The box plot is demonstrated in the ceramic strength data case study.
Software Box plots are available in most general purpose statistical software
programs, including Dataplot.
Sample
Plot:
Case Study The complex demodulation amplitude plot is demonstrated in the beam
deflection data case study.
Software Complex demodulation amplitude plots are available in some, but not
most, general purpose statistical software programs. Dataplot supports
complex demodulation amplitude plots.
Sample
Plot:
The mathematical computations for the phase plot are beyond the scope
of the Handbook. Consult Granger (Granger, 1964) for details.
Questions The complex demodulation phase plot answers the following question:
Is the specified demodulation frequency correct?
Case Study The complex demodulation amplitude plot is demonstrated in the beam
deflection data case study.
Software Complex demodulation phase plots are available in some, but not most,
general purpose statistical software programs. Dataplot supports
complex demodulation phase plots.
Sample Plot:
This contour plot shows that the surface is symmetric and peaks in the
center.
Importance: Visualizing 3-dimensional data: For univariate data, a run
sequence plot and a histogram are considered necessary first steps in
understanding the data. For 2-dimensional data, a scatter plot is a
necessary first step in understanding the data.
In a similar manner, 3-dimensional data should be plotted. Small data
sets, such as result from designed experiments, can typically be
represented by block plots, dex mean plots, and the like (here, "DEX"
stands for "Design of Experiments"). For large data sets, a contour plot
or a 3-D surface plot should be considered a necessary first step in
understanding the data.
DEX Contour Plot: The dex contour plot is a specialized contour plot
used in the design of experiments. In particular, it is useful for full and
fractional designs.
Software Contour plots are available in most general purpose statistical software
programs. They are also available in many general purpose graphics
and mathematics programs. These programs vary widely in the
capabilities for the contour plots they generate. Many provide just a
basic contour plot over a rectangular grid while others permit color
filled or shaded contours. Dataplot supports a fairly basic contour plot.
Construction of DEX Contour Plot: The following are the primary steps
in the construction of the dex contour plot.
1. The x and y axes of the plot represent the values of the first and
second factor (independent) variables.
2. The four vertex points are drawn. The vertex points are (-1,-1),
(-1,1), (1,1), (1,-1). At each vertex point, the average of all the
response values at that vertex point is printed.
3. Similarly, if there are center points, a point is drawn at (0,0) and the
average of the response values at the center points is printed.
4. The linear dex contour plot assumes the model:
The user specifies the target values for which contour lines will be
generated.
The above algorithm assumes a linear model for the design. Dex contour
plots can also be generated for the case in which we assume a quadratic
model for the design. The algebra for solving for U2 in terms of U1
becomes more complicated, but the fundamental idea is the same.
Quadratic models are needed for the case when the average for the center
points does not fall in the range defined by the vertex point (i.e., there is
curvature).
Sample DEX Contour Plot: The following is a dex contour plot for the
data used in the Eddy current case study. The analysis in that case study
demonstrated that X1 and X2 were the most important factors.
Interpretation of the Sample DEX Contour Plot: From the above dex
contour plot we can derive the following information.
1. Interaction significance;
2. Best (data) setting for these 2 dominant factors;
Interaction Significance: Note the appearance of the contour plot. If the
contour curves are linear, then that implies that the interaction term is
not significant; if the contour curves have considerable curvature, then
that implies that the interaction term is large and important. In our case,
the contour curves do not have considerable curvature, and so we
conclude that the X1*X2 term is not significant.
Best Settings: To determine the best factor settings for the already-run
experiment, we first must define what "best" means. For the Eddy
current data set used to generate this dex contour plot, "best" means to
maximize (rather than minimize or hit a target) the response. Hence from
the contour plot we determine the best settings for the two dominant
factors by simply scanning the four vertices and choosing the vertex
with the largest value (= average response). In this case, it is
(X1 = +1, X2 = +1).
As for factor X3, the contour plot provides no best setting information,
and so we would resort to other tools: the main effects plot, the
interaction effects matrix, or the ordered data to determine optimal X3
settings.
Case Study The Eddy current case study demonstrates the use of the dex contour plot
in the context of the analysis of a full factorial design.
Software DEX contour plots are available in many statistical software programs that
analyze data from designed experiments. Dataplot supports a linear dex
contour plot and it provides a macro for generating a quadratic dex contour
plot.
Sample Plot: Factors 4, 2, 3, and 7 are the important factors.
Description of the Plot: For this sample plot, there are seven factors and
each factor has two levels. For each factor, we define a distinct x
coordinate for each level of the factor. For example, for factor 1, level 1
is coded as 0.8 and level 2 is coded as 1.2. The y coordinate is simply
the value of the response variable. The solid horizontal line is drawn at
the overall mean of the response variable. The vertical dotted lines are
added for clarity. Although the plot can be drawn with an arbitrary
number of levels for a factor, it is really only useful when there are two
or three levels for a factor.
Questions The dex scatter plot can be used to answer the following questions:
1. Which factors are important with respect to location and scale?
2. Are there outliers?
Extension for Interaction Effects: Using the concept of the scatterplot
matrix, the dex scatter plot can be extended to display first order
interaction effects.
Specifically, if there are k factors, we create a matrix of plots with k
rows and k columns. On the diagonal, the plot is simply a dex scatter
plot with a single factor. For the off-diagonal plots, we multiply the
values of Xi and Xj. For the common 2-level designs (i.e., each factor
has two levels) the values are typically coded as -1 and 1, so the
multiplied values are also -1 and 1. We then generate a dex scatter plot
for this interaction variable. This plot is called a dex interaction effects
plot and an example is shown below.
Interpretation of the Dex Interaction Effects Plot: We can first examine
the diagonal elements for the main effects. These diagonal plots show a
great deal of overlap between the levels for all three factors. This
indicates that location and scale effects will be relatively small.
We can then examine the off-diagonal plots for the first order
interaction effects. For example, the plot in the first row and second
column is the interaction between factors X1 and X2. As with the main
effect plots, no clear patterns are evident.
Case Study The dex scatter plot is demonstrated in the ceramic strength data case
study.
Software Dex scatter plots are available in some general purpose statistical
software programs, although the format may vary somewhat between
these programs. They are essentially just scatter plots with the X
variable defined in a particular way, so it should be feasible to write
macros for dex scatter plots in most statistical software programs.
Dataplot supports a dex scatter plot.
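As a sketch of such a macro (Python with matplotlib assumed; the design matrix and responses below are entirely hypothetical), each factor gets its own x position, its two coded levels are offset around that position, and a horizontal line marks the overall response mean:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical 2**3 design: rows are runs, columns are factors coded -1/+1.
X = np.array([[-1, -1, -1], [1, -1, -1], [-1, 1, -1], [1, 1, -1],
              [-1, -1, 1], [1, -1, 1], [-1, 1, 1], [1, 1, 1]])
y = np.array([28.0, 36.0, 22.0, 31.0, 25.0, 33.0, 19.0, 30.0])

k = X.shape[1]
for j in range(k):
    # Distinct x coordinate per level: level -1 at j+0.8, level +1 at j+1.2.
    plt.plot(j + 1 + 0.2 * X[:, j], y, "o")
plt.axhline(y.mean(), color="k")         # solid line at the overall mean
plt.xticks(range(1, k + 1), [f"X{j + 1}" for j in range(k)])
plt.xlabel("Factor")
plt.ylabel("Response")
plt.show()
```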
Sample Plot: Factors 4, 2, and 1 are the most important factors.
Questions The dex mean plot can be used to answer the following questions:
1. Which factors are important? The dex mean plot does not provide
a definitive answer to this question, but it does help categorize
factors as "clearly important", "clearly not important", and
"borderline importance".
2. What is the ranking list of the important factors?
Extension for Interaction Effects: Using the concept of the scatter plot
matrix, the dex mean plot can be extended to display first-order
interaction effects.
Specifically, if there are k factors, we create a matrix of plots with k
rows and k columns. On the diagonal, the plot is simply a dex mean plot
with a single factor. For the off-diagonal plots, measurements at each
level of the interaction are plotted versus level, where level is Xi times
Xj and Xi is the code for the ith main effect level and Xj is the code for
the jth main effect. For the common 2-level designs (i.e., each factor has
two levels) the values are typically coded as -1 and 1, so the multiplied
values are also -1 and 1. We then generate a dex mean plot for this
interaction variable. This plot is called a dex interaction effects plot and
an example is shown below.
DEX Interaction Effects Plot:
This plot shows that the most significant factor is X1 and the most
significant interaction is between X1 and X3.
Case Study The dex mean plot and the dex interaction effects plot are demonstrated
in the ceramic strength data case study.
Software Dex mean plots are available in some general purpose statistical
software programs, although the format may vary somewhat between
these programs. It may be feasible to write macros for dex mean plots in
some statistical software programs that do not support this plot directly.
Dataplot supports both a dex mean plot and a dex interaction effects
plot.
Sample Plot
Questions The dex standard deviation plot can be used to answer the following
questions:
1. How do the standard deviations vary across factors?
2. How do the standard deviations vary within a factor?
3. Which are the most important factors with respect to scale?
4. What is the ranked list of the important factors with respect to
scale?
Importance: Assess Variability: The goal with many designed
experiments is to determine which factors are significant. This is usually
determined from the means of the factor levels (which can be
conveniently shown with a dex mean plot). A secondary goal is to assess
the variability of the responses both within a factor and between factors.
The dex standard deviation plot is a convenient way to do this.
Case Study The dex standard deviation plot is demonstrated in the ceramic strength
data case study.
Software Dex standard deviation plots are not available in most general purpose
statistical software programs. It may be feasible to write macros for dex
standard deviation plots in some statistical software programs that do
not support them directly. Dataplot supports a dex standard deviation
plot.
1.3.3.14. Histogram
Purpose: Summarize a Univariate Data Set: The purpose of a histogram
(Chambers) is to graphically summarize the distribution of a univariate
data set.
The histogram graphically shows the following:
1. center (i.e., the location) of the data;
2. spread (i.e., the scale) of the data;
3. skewness of the data;
4. presence of outliers; and
5. presence of multiple modes in the data.
These features provide strong indications of the proper distributional
model for the data. The probability plot or a goodness-of-fit test can be
used to verify the distributional model.
The examples section shows the appearance of a number of common
features revealed by histograms.
Sample Plot
Definition: The most common form of the histogram is obtained by
splitting the range of the data into equal-sized bins (called classes).
Then for each bin, the number of points from the data set that fall into
the bin is counted. That is
● Vertical axis: Frequency (i.e., counts for each bin)
● Horizontal axis: Response variable
The classes can either be defined arbitrarily by the user or via some
systematic rule. A number of theoretically derived rules have been
proposed by Scott (Scott 1992); a sketch of one such rule follows.
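For instance, Scott's rule chooses the bin width from the sample standard deviation s and sample size n as h = 3.49 * s * n^(-1/3). A short sketch (Python with numpy/matplotlib assumed; the data are a placeholder):

```python
import numpy as np
import matplotlib.pyplot as plt

y = np.random.default_rng(4).normal(size=500)   # placeholder data

# Scott's rule: bin width h = 3.49 * s * n**(-1/3).
h = 3.49 * y.std(ddof=1) * y.size ** (-1.0 / 3.0)
edges = np.arange(y.min(), y.max() + h, h)      # equal-sized classes

plt.hist(y, bins=edges)       # vertical axis: frequency (counts per bin)
plt.xlabel("Response")
plt.ylabel("Frequency")
plt.show()
```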
Examples 1. Normal
2. Symmetric, Non-Normal, Short-Tailed
3. Symmetric, Non-Normal, Long-Tailed
4. Symmetric and Bimodal
5. Bimodal Mixture of 2 Normals
6. Skewed (Non-Symmetric) Right
7. Skewed (Non-Symmetric) Left
8. Symmetric with Outlier
Related Techniques: Frequency Plot, Stem and Leaf Plot, Density Trace
Case Study The histogram is demonstrated in the heat flow meter data case study.
Discussion of Unimodal and Bimodal: The histogram shown above
illustrates data from a bimodal (2 peak) distribution.
In contrast to the previous example, this example illustrates bimodality
due not to an underlying deterministic model, but bimodality due to a
mixture of probability models. In this case, each of the modes appears
to have a rough bell-shaped component. One could easily imagine the
above histogram being generated by a process consisting of two
normal distributions with the same standard deviation but with two
different locations (one centered at approximately 9.17 and the other
centered at approximately 9.26). If this is the case, then the research
challenge is to determine physically why there are two similar but
separate sub-processes.
Recommended Next Steps: If the histogram indicates that the data might
be appropriately fit with a mixture of two normal distributions, the
recommended next step is:
Fit the normal mixture model using either least squares or maximum
likelihood. The general normal mixing model is
f(x) = p*N(mu1, sigma1) + (1-p)*N(mu2, sigma2)
where p is the mixing proportion (between 0 and 1) and N(mu, sigma)
denotes a normal density with mean mu and standard deviation sigma.
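As a sketch of the maximum likelihood route, the snippet below uses scikit-learn's GaussianMixture (an assumption on available tooling); the two-component sample is synthetic, with locations borrowed from the 9.17/9.26 example above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
# Synthetic bimodal sample: equal spreads, two different locations.
y = np.concatenate([rng.normal(9.17, 0.02, 300),
                    rng.normal(9.26, 0.02, 200)])

gm = GaussianMixture(n_components=2).fit(y.reshape(-1, 1))
for p, mu, var in zip(gm.weights_, gm.means_.ravel(),
                      gm.covariances_.ravel()):
    print(f"p = {p:.2f}, mu = {mu:.3f}, sigma = {np.sqrt(var):.3f}")
```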
Some Causes for Skewed Data: Skewed data often occur due to lower or
upper bounds on the data. That is, data that have a lower bound are often
skewed right while data that have an upper bound are often skewed left.
Skewness can also result from start-up effects. For example, in
reliability applications some processes may have a large number of
initial failures that could cause left skewness. On the other hand, a
reliability process could have a long start-up period where failures are
rare resulting in right-skewed data.
Data collected in scientific and engineering applications often have a
lower bound of zero. For example, failure data must be non-negative.
Many measurement processes generate only positive data. Time to
occurrence and size are common measurements that cannot be less than
zero.
❍ Gamma family
❍ Chi-square family
❍ Lognormal family
❍ Power lognormal family
3. Consider a normalizing transformation such as the Box-Cox
transformation.
The issues for skewed left data are similar to those for skewed right
data.
6. warm-up effects
to more subtle causes such as
1. A change in settings of factors that (knowingly or unknowingly)
affect the response.
2. Nature is trying to tell us something.
Recommended Next Steps: If the histogram shows the presence of
outliers, the recommended next steps are:
1. Graphically check for outliers (in the commonly encountered
normal case) by generating a box plot. In general, box plots are
a much better graphical tool for detecting outliers than are
histograms.
2. Quantitatively check for outliers (in the commonly encountered
normal case) by carrying out Grubbs test which indicates how
many sample standard deviations away from the sample mean
are the data in question. Large values indicate outliers.
Sample Plot
This sample lag plot exhibits a linear pattern. This shows that the data
are strongly non-random and further suggests that an autoregressive
model might be appropriate.
Definition A lag is a fixed time displacement. For example, given a data set Y1, Y2
..., Yn, Y2 and Y7 have lag 5 since 7 - 2 = 5. Lag plots can be generated
for any arbitrary lag, although the most commonly used lag is 1.
A plot of lag 1 is a plot of the values of Yi versus Yi-1 (see the sketch below):
● Vertical axis: Yi for all i
● Horizontal axis: Yi-1 for all i
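A minimal sketch (Python with matplotlib assumed; the sinusoidal series is a placeholder):

```python
import numpy as np
import matplotlib.pyplot as plt

y = np.sin(np.linspace(0, 8 * np.pi, 200))   # placeholder series

lag = 1
plt.plot(y[:-lag], y[lag:], "o", ms=3)   # horizontal: Yi-1, vertical: Yi
plt.xlabel("Y(i-1)")
plt.ylabel("Y(i)")
plt.show()
```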
Case Study The lag plot is demonstrated in the beam deflection data case study.
Software Lag plots are not directly available in most general purpose statistical
software programs. Since the lag plot is essentially a scatter plot with
the 2 variables properly lagged, it should be feasible to write a macro for
the lag plot in most statistical programs. Dataplot supports a lag plot.
Conclusions We can make the following conclusions based on the above plot.
1. The data are random.
2. The data exhibit no autocorrelation.
3. The data contain no outliers.
Discussion: The lag plot shown above is for lag = 1. Note the absence of
structure. One cannot infer, from a current value Yi-1, the next value Yi.
Thus for a known value Yi-1 on the horizontal axis (say, Yi-1 = +0.5),
the corresponding Yi value could be virtually anything (from Yi = -2.5
to Yi = +1.5). Such non-association is the essence of randomness.
Discussion In the plot above for lag = 1, note how the points tend to cluster (albeit
noisily) along the diagonal. Such clustering is the lag plot signature of
moderate autocorrelation.
If the process were completely random, knowledge of a current
observation (say Yi-1 = 0) would yield virtually no knowledge about
the next observation Yi. If the process has moderate autocorrelation, as
above, and if Yi-1 = 0, then the range of possible values for Yi is seen
to be restricted to a smaller range (-.01 to +.01). This suggests
prediction is possible using an autoregressive model.
Conclusions We can make the following conclusions based on the above plot.
1. The data come from an underlying autoregressive model with
strong positive autocorrelation
2. The data contain no outliers.
Discussion Note the tight clustering of points along the diagonal. This is the lag
plot signature of a process with strong positive autocorrelation. Such
processes are highly non-random--there is strong association between
an observation and a succeeding observation. In short, if you know
Yi-1 you can make a strong guess as to what Yi will be.
If the above process were completely random, the plot would have a
shotgun pattern, and knowledge of a current observation (say Yi-1 = 3)
would yield virtually no knowledge about the next observation Yi (it
could here be anywhere from -2 to +8). On the other hand, if the
process had strong autocorrelation, as seen above, and if Yi-1 = 3, then
the range of possible values for Yi is seen to be restricted to a smaller
range (2 to 4)--still wide, but an improvement nonetheless (relative to
-2 to +8) in predictive power.
Recommended Next Step: When the lag plot shows a strongly
autoregressive pattern and only successive observations appear to be
correlated, the next steps are to:
1. Estimate the parameters for the autoregressive model:
Yi = A0 + A1*Yi-1 + Ei.
Since Yi and Yi-1 are precisely the axes of the lag plot, such
estimation is a linear regression straight from the lag plot, as
sketched below.
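A sketch of that regression (Python with numpy assumed; the simulated AR(1) series is a placeholder for real data):

```python
import numpy as np

rng = np.random.default_rng(6)
y = np.zeros(300)
for i in range(1, y.size):            # simulate a placeholder AR(1) process
    y[i] = 0.9 * y[i - 1] + rng.normal()

# Regress Yi on Yi-1: exactly the axes of the lag-1 plot.
a1, a0 = np.polyfit(y[:-1], y[1:], 1)
print(f"fitted model: Yi = {a0:.3f} + {a1:.3f}*Yi-1 + Ei")
```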
Conclusions We can make the following conclusions based on the above plot.
1. The data come from an underlying single-cycle sinusoidal
model.
2. The data contain three outliers.
Discussion In the plot above for lag = 1, note the tight elliptical clustering of
points. Processes with a single-cycle sinusoidal model will have such
elliptical lag plots.
Consequences of Ignoring Cyclical Pattern: If one were to naively
assume that the above process came from the null model
y = constant + error
and then estimate the constant by the sample mean, then the analysis
would suffer because
1. the sample mean would be biased and meaningless;
2. the confidence limits would be meaningless and optimistically
small.
The proper model is a single-cycle sinusoid,
y = C + alpha*sin(2*pi*omega*t + phi) + error,
which can be fit by non-linear least squares.
Unexpected Value of EDA: Frequently a technique (e.g., the lag plot) is
constructed to check one aspect (e.g., randomness) which it does well.
Along the way, the technique also highlights some other anomaly of the
data (namely, that there are 3 outliers). Such outlier identification and
removal is extremely important for detecting irregularities in the data
collection system, and also for arriving at a "purified" data set for
modeling. The lag plot plays an important role in such outlier
identification.
Recommended Next Step: When the lag plot indicates a sinusoidal model
with possible outliers, the recommended next steps are:
1. Do a spectral plot to obtain an initial estimate of the frequency
of the underlying cycle. This will be helpful as a starting value
for the subsequent non-linear fitting.
2. Omit the outliers.
3. Carry out a non-linear fit of the model to the 197 points, as
sketched below.
Sample Plot
This linear correlation plot shows that the correlations are high for all
groups. This implies that linear fits could provide a good model for
each of these groups.
Questions The linear correlation plot can be used to answer the following
questions.
1. Are there linear relationships across groups?
2. Is the strength of the linear relationships relatively constant
across the groups?
Importance: For grouped data, it may be important to know whether the different
Checking groups are homogeneous (i.e., similar) or heterogeneous (i.e., different).
Group Linear correlation plots help answer this question in the context of
Homogeneity linear fitting.
Case Study The linear correlation plot is demonstrated in the Alaska pipeline data
case study.
In some cases you might not have groups. Instead, you have different
data sets and you want to know if the same fit can be adequately applied
to each of the data sets. In this case, simply think of each distinct data
set as a group and apply the linear intercept plot as for groups.
Sample Plot
This linear intercept plot shows that the intercepts differ somewhat
across groups, with the first few groups slightly lower than
the other groups. Note that these are small differences in the intercepts.
Questions The linear intercept plot can be used to answer the following questions.
1. Is the intercept from linear fits relatively constant across groups?
2. If the intercepts vary across groups, is there a discernible pattern?
Importance: For grouped data, it may be important to know whether the different
Checking groups are homogeneous (i.e., similar) or heterogeneous (i.e., different).
Group Linear intercept plots help answer this question in the context of linear
Homogeneity fitting.
Case Study The linear intercept plot is demonstrated in the Alaska pipeline data
case study.
In some cases you might not have groups. Instead, you have different
data sets and you want to know if the same fit can be adequately applied
to each of the data sets. In this case, simply think of each distinct data
set as a group and apply the linear slope plot as for groups.
Sample Plot
This linear slope plot shows that the slopes are about 0.174 (plus or
minus 0.002) for all groups. There does not appear to be a pattern in the
variation of the slopes. This implies that a single fit may be adequate.
Questions The linear slope plot can be used to answer the following questions.
1. Do you get the same slope across groups for linear fits?
2. If the slopes differ, is there a discernible pattern in the slopes?
Importance: For grouped data, it may be important to know whether the different
Checking groups are homogeneous (i.e., similar) or heterogeneous (i.e., different).
Group Linear slope plots help answer this question in the context of linear
Homogeneity fitting.
Case Study The linear slope plot is demonstrated in the Alaska pipeline data case
study.
Sample Plot
This linear RESSD plot shows that the residual standard deviations
from a linear fit are about 0.0025 for all the groups.
Questions The linear RESSD plot can be used to answer the following questions.
1. Is the residual standard deviation from a linear fit constant across
groups?
2. If the residual standard deviations vary, is there a discernible
pattern across the groups?
Importance: For grouped data, it may be important to know whether the different
Checking groups are homogeneous (i.e., similar) or heterogeneous (i.e., different).
Group Linear RESSD plots help answer this question in the context of linear
Homogeneity fitting.
Case Study The linear residual standard deviation plot is demonstrated in the
Alaska pipeline data case study.
Sample Plot
This sample mean plot shows a shift of location after the 6th month.
Questions The mean plot can be used to answer the following questions.
1. Are there any shifts in location?
2. What is the magnitude of the shifts in location?
3. Is there a distinct pattern in the shifts in location?
Sample Plot
The points on this plot form a nearly linear pattern, which indicates
that the normal distribution is a good model for this data set.
Questions The normal probability plot is used to answer the following questions.
1. Are the data normally distributed?
2. What is the nature of the departure from normality (data
skewed, shorter than expected tails, longer than expected tails)?
Importance: The underlying assumptions for a measurement process are that the
Check data should behave like:
Normality 1. random drawings;
Assumption
2. from a fixed distribution;
3. with fixed location;
4. with fixed scale.
Probability plots are used to assess the assumption of a fixed
distribution. In particular, most statistical models are of the form:
response = deterministic + random
where the deterministic part is the fit and the random part is error. This
error component in most common statistical models is specifically
assumed to be normally distributed with fixed location and scale. This
is the most frequent application of normal probability plots. That is, a
model is fit and a normal probability plot is generated for the residuals
from the fitted model. If the residuals from the fitted model are not
normally distributed, then one of the major assumptions of the model
has been violated.
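As a sketch of this residual check, assuming SciPy and Matplotlib are available (the linear fit and the data below are hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)  # hypothetical data

# Fit the model, then probability-plot the residuals.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# probplot orders the residuals against normal quantiles; a straight
# line supports the normality assumption for the error term.
stats.probplot(residuals, dist="norm", plot=plt)
plt.title("Normal probability plot of residuals")
plt.show()
```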
Related Histogram
Techniques Probability plots for other distributions (e.g., Weibull)
Probability plot correlation coefficient plot (PPCC plot)
Anderson-Darling Goodness-of-Fit Test
Chi-Square Goodness-of-Fit Test
Kolmogorov-Smirnov Goodness-of-Fit Test
Case Study The normal probability plot is demonstrated in the heat flow meter
data case study.
Conclusions We can make the following conclusions from the above plot.
1. The normal probability plot shows a strongly linear pattern. There
are only minor deviations from the line fit to the points on the
probability plot.
2. The normal distribution appears to be a good model for these
data.
Discussion Visually, the probability plot shows a strongly linear pattern. This is
verified by the correlation coefficient of 0.9989 of the line fit to the
probability plot. The fact that the points in the lower and upper extremes
of the plot do not deviate significantly from the straight-line pattern
indicates that there are not any significant outliers (relative to a normal
distribution).
In this case, we can quite reasonably conclude that the normal
distribution provides an excellent model for the data. The intercept and
slope of the fitted line give estimates of 9.26 and 0.023 for the location
and scale parameters of the fitted normal distribution.
Conclusions We can make the following conclusions from the above plot.
1. The normal probability plot shows a non-linear pattern.
2. The normal distribution is not a good model for these data.
Discussion For data with short tails relative to the normal distribution, the
non-linearity of the normal probability plot shows up in two ways. First,
the middle of the data shows an S-like pattern. This is common for both
short and long tails. Second, the first few and the last few points show a
marked departure from the reference fitted line. In comparing this plot
to the long tail example in the next section, the important difference is
the direction of the departure from the fitted line for the first few and
last few points. For short tails, the first few points show increasing
departure from the fitted line above the line and last few points show
increasing departure from the fitted line below the line. For long tails,
this pattern is reversed.
In this case, we can reasonably conclude that the normal distribution
does not provide an adequate fit for this data set. For probability plots
that indicate short-tailed distributions, the next step might be to generate
a Tukey Lambda PPCC plot. The Tukey Lambda PPCC plot can often
be helpful in identifying an appropriate distributional family.
Conclusions We can make the following conclusions from the above plot.
1. The normal probability plot shows a reasonably linear pattern in
the center of the data. However, the tails, particularly the lower
tail, show departures from the fitted line.
2. A distribution other than the normal distribution would be a good
model for these data.
Discussion For data with long tails relative to the normal distribution, the
non-linearity of the normal probability plot can show up in two ways.
First, the middle of the data may show an S-like pattern. This is
common for both short and long tails. In this particular case, the S
pattern in the middle is fairly mild. Second, the first few and the last few
points show marked departure from the reference fitted line. In the plot
above, this is most noticeable for the first few data points. In comparing
this plot to the short-tail example in the previous section, the important
difference is the direction of the departure from the fitted line for the
first few and the last few points. For long tails, the first few points show
increasing departure from the fitted line below the line and last few
points show increasing departure from the fitted line above the line. For
short tails, this pattern is reversed.
In this case we can reasonably conclude that the normal distribution can
be improved upon as a model for these data. For probability plots that
indicate long-tailed distributions, the next step might be to generate a
Tukey Lambda PPCC plot. The Tukey Lambda PPCC plot can often be
helpful in identifying an appropriate distributional family.
Conclusions We can make the following conclusions from the above plot.
1. The normal probability plot shows a strongly non-linear pattern.
Specifically, it shows a quadratic pattern in which all the points
are below a reference line drawn between the first and last points.
2. The normal distribution is not a good model for these data.
Discussion This quadratic pattern in the normal probability plot is the signature of a
significantly right-skewed data set. Similarly, if all the points on the
normal probability plot fell above the reference line connecting the first
and last points, that would be the signature pattern for a significantly
left-skewed data set.
In this case we can quite reasonably conclude that we need to model
these data with a right skewed distribution such as the Weibull or
lognormal.
Sample Plot
● What are good estimates for the location and scale parameters of
the chosen distribution?
Importance: The discussion for the normal probability plot covers the use of
Check probability plots for checking the fixed distribution assumption.
distributional
assumption Some statistical models assume data have come from a population with
a specific type of distribution. For example, in reliability applications,
the Weibull, lognormal, and exponential are commonly used
distributional models. Probability plots can be useful for checking this
distributional assumption.
Related Histogram
Techniques Probability Plot Correlation Coefficient (PPCC) Plot
Hazard Plot
Quantile-Quantile Plot
Anderson-Darling Goodness of Fit
Chi-Square Goodness of Fit
Kolmogorov-Smirnov Goodness of Fit
Case Study The probability plot is demonstrated in the airplane glass failure time
data case study.
Use When comparing distributional models, do not simply choose the one
Judgement with the maximum PPCC value. In many cases, several distributional
When fits provide comparable PPCC values. For example, a lognormal and
Selecting An Weibull may both fit a given set of reliability data quite well.
Appropriate Typically, we would consider the complexity of the distribution. That
Distributional is, a simpler distribution with a marginally smaller PPCC value may
Family be preferred over a more complex distribution. Likewise, there may be
theoretical justification in terms of the underlying scientific model for
preferring a distribution with a marginally smaller PPCC value in
some cases. In other cases, we may not need to know if the
distributional model is optimal, only that it is adequate for our
purposes. That is, we may be able to use techniques designed for
normally distributed data even if other distributions fit the data
somewhat better.
Sample Plot The following is a PPCC plot of 100 normal random numbers. The
maximum value of the correlation coefficient = 0.997 at λ = 0.099.
Case Study The PPCC plot is demonstrated in the airplane glass failure data case
study.
Software PPCC plots are currently not available in most common general
purpose statistical software programs. However, the underlying
technique is based on probability plots and correlation coefficients, so
it should be possible to write macros for PPCC plots in statistical
programs that support these capabilities. Dataplot supports PPCC
plots.
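As a sketch of such a macro, the following Python code computes a PPCC curve by hand for the Tukey-Lambda family: for each trial shape value it forms the probability plot and records its correlation coefficient. (SciPy's scipy.stats.ppcc_plot packages the same idea.) The sample y is hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y = rng.normal(size=100)          # hypothetical sample under study

lambdas = np.linspace(-2, 2, 81)  # trial Tukey-Lambda shape values
ppcc = []
probs = (np.arange(1, y.size + 1) - 0.5) / y.size  # plotting positions
for lam in lambdas:
    # Theoretical quantiles for this shape value.
    q_theory = stats.tukeylambda.ppf(probs, lam)
    # PPCC = correlation of the probability plot for this shape value.
    ppcc.append(np.corrcoef(q_theory, np.sort(y))[0, 1])

best = lambdas[int(np.argmax(ppcc))]
print(f"max PPCC = {max(ppcc):.4f} at lambda = {best:.3f}")
```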
Sample Plot
Importance: When there are two data samples, it is often desirable to know if the
Check for assumption of a common distribution is justified. If so, then location and
Common scale estimators can pool both data sets to obtain estimates of the
Distribution common location and scale. If two samples do differ, it is also useful to
gain some understanding of the differences. The q-q plot can provide
more insight into the nature of the difference than analytical methods
such as the chi-square and Kolmogorov-Smirnov 2-sample tests.
Related Bihistogram
Techniques T Test
F Test
2-Sample Chi-Square Test
2-Sample Kolmogorov-Smirnov Test
Case Study The quantile-quantile plot is demonstrated in the ceramic strength data
case study.
Software Q-Q plots are available in some general purpose statistical software
programs, including Dataplot. If the number of data points in the two
samples is equal, it should be relatively easy to write a macro in
statistical programs that do not support the q-q plot. If the numbers of
points are not equal, writing a macro for a q-q plot may be difficult.
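A sketch of such a macro for the unequal-size case: interpolate both empirical quantile functions on a common probability grid and plot them against each other. The two samples below are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
sample1 = rng.normal(0, 1, size=249)   # hypothetical samples of
sample2 = rng.normal(1, 1, size=79)    # unequal size

# Evaluate both empirical quantile functions on one probability grid,
# sized to the smaller sample.
probs = (np.arange(1, 80) - 0.5) / 79
q1 = np.quantile(sample1, probs)
q2 = np.quantile(sample2, probs)

plt.plot(q1, q2, "o")
plt.axline((0, 0), slope=1)            # 45-degree reference line
plt.xlabel("sample 1 quantiles")
plt.ylabel("sample 2 quantiles")
plt.show()
```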
Sample
Plot:
Last Third
of Data
Shows a
Shift of
Location
This sample run sequence plot shows that the location shifts up for the
last third of the data.
Questions The run sequence plot can be used to answer the following questions
1. Are there any shifts in location?
2. Are there any shifts in variation?
3. Are there any outliers?
The run sequence plot can also give the analyst an excellent feel for the
data.
Case Study The run sequence plot is demonstrated in the Filter transmittance data
case study.
Software Run sequence plots are available in most general purpose statistical
software programs, including Dataplot.
Sample
Plot:
Linear
Relationship
Between
Variables Y
and X
This sample plot reveals a linear relationship between the two variables
indicating that a linear regression model might be appropriate.
Examples 1. No relationship
2. Strong linear (positive correlation)
3. Strong linear (negative correlation)
4. Exact linear (positive correlation)
5. Quadratic relationship
6. Exponential relationship
7. Sinusoidal relationship (damped)
8. Variation of Y doesn't depend on X (homoscedastic)
9. Variation of Y does depend on X (heteroscedastic)
10. Outlier
Combining Scatter plots can also be combined in multiple plots per page to help
Scatter Plots understand higher-level structure in data sets with more than two
variables.
The scatterplot matrix generates all pairwise scatter plots on a single
page. The conditioning plot, also called a co-plot or subset plot,
generates scatter plots of Y versus X dependent on the value of a third
variable.
Case Study The scatter plot is demonstrated in the load cell calibration data case
study.
Software Scatter plots are a fundamental technique that should be available in any
general purpose statistical software program, including Dataplot. Scatter
plots are also available in most graphics and spreadsheet programs as
well.
Discussion Note in the plot above how for a given value of X (say X = 0.5), the
corresponding values of Y range all over the place from Y = -2 to Y = +2.
The same is true for other values of X. This lack of predictability in
determining Y from a given value of X, and the associated amorphous,
non-structured appearance of the scatter plot, leads to the summary
conclusion: no relationship.
Discussion Note in the plot above how a straight line comfortably fits through the
data; hence a linear relationship exists. The scatter about the line is quite
small, so there is a strong linear relationship. The slope of the line is
positive (small values of X correspond to small values of Y; large values
of X correspond to large values of Y), so there is a positive co-relation
(that is, a positive correlation) between X and Y.
Discussion Note in the plot above how a straight line comfortably fits through the
data; hence there is a linear relationship. The scatter about the line is
quite small, so there is a strong linear relationship. The slope of the line
is negative (small values of X correspond to large values of Y; large
values of X correspond to small values of Y), so there is a negative
co-relation (that is, a negative correlation) between X and Y.
Discussion Note in the plot above how a straight line comfortably fits through the
data; hence there is a linear relationship. The scatter about the line is
zero (there is perfect predictability between X and Y), so there is an
exact linear relationship. The slope of the line is positive (small values
of X correspond to small values of Y; large values of X correspond to
large values of Y), so there is a positive co-relation (that is, a positive
correlation) between X and Y.
Discussion Note in the plot above how no imaginable simple straight line could
ever adequately describe the relationship between X and Y--a curved (or
curvilinear, or non-linear) function is needed. The simplest such
curvilinear function is a quadratic model
y = A + B*x + C*x^2 + e
for some A, B, and C. Many other curvilinear functions are possible, but
the data analysis principle of parsimony suggests that we try fitting a
quadratic function first.
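For concreteness, a minimal sketch of fitting that quadratic model by least squares, with hypothetical data standing in for the plotted values:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(-3, 3, 60)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.4, size=x.size)

# Least-squares fit of y = A + B*x + C*x^2 + e.
c, b, a = np.polyfit(x, y, deg=2)   # polyfit returns highest power first
print(f"A = {a:.3f}, B = {b:.3f}, C = {c:.3f}")
```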
Discussion Note that a simple straight line is grossly inadequate in describing the
relationship between X and Y. A quadratic model would prove lacking,
especially for large values of X. In this example, the large values of X
correspond to nearly constant values of Y, and so a non-linear function
beyond the quadratic is needed. Among the many other non-linear
functions available, one of the simpler ones is the exponential model
y = A + B*exp(C*x) + e
for some A, B, and C.
Discussion The relationship in the damped sinusoidal example appears to be
basically oscillatory, which suggests a trigonometric sinusoidal model,
y = c + a*sin(2*pi*f*x + phi) + e. Closer inspection of the scatter plot
reveals that the amount of swing (the amplitude a in the model) does not
appear to be constant but rather is decreasing (damping) as X gets large.
We thus would be led to the conclusion: damped sinusoidal relationship,
with the simplest corresponding model being one in which the constant
amplitude a is replaced by a function that decays as X increases.
Discussion This scatter plot reveals a linear relationship between X and Y: for a
given value of X, the predicted value of Y will fall on a line. The plot
further reveals that the variation in Y about the predicted value is
about the same (±10 units), regardless of the value of X.
Statistically, this is referred to as homoscedasticity. Such
homoscedasticity is very important as it is an underlying assumption
for regression, and its violation leads to parameter estimates with
inflated variances. If the data are homoscedastic, then the usual
regression estimates can be used. If the data are not homoscedastic,
then the estimates can be improved using weighting procedures as
shown in the next example.
5. Some analysts prefer to connect the scatter plots. Others prefer to
leave a little gap between each plot.
6. Although this plot type is most commonly used for scatter plots,
the basic concept is both simple and powerful and extends easily
to other plot formats that involve pairwise plots such as the
quantile-quantile plot and the bihistogram.
Sample Plot
This sample plot was generated from pollution data collected by NIST
chemist Lloyd Currie.
There are a number of ways to view this plot. If we are primarily
interested in a particular variable, we can scan the row and column for
that variable. If we are interested in finding the strongest relationship,
we can scan all the plots and then determine which variables are
related.
Definition Given k variables, scatter plot matrices are formed by creating k rows
and k columns. Each row and column defines a single scatter plot.
The individual plot for row i and column j is defined as
● Vertical axis: Variable Xi
● Horizontal axis: Variable Xj
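A minimal sketch of generating a scatterplot matrix, assuming pandas and Matplotlib are available; the three variables below are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
x1 = rng.normal(size=200)
x2 = 0.7 * x1 + rng.normal(scale=0.5, size=200)   # correlated with x1
x3 = rng.normal(size=200)                          # unrelated

df = pd.DataFrame({"X1": x1, "X2": x2, "X3": x3})

# k x k grid of pairwise scatter plots (histograms on the diagonal).
pd.plotting.scatter_matrix(df, diagonal="hist")
plt.show()
```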
Questions The scatterplot matrix can provide answers to the following questions:
1. Are there pairwise relationships between the variables?
2. If there are relationships, what is the nature of these
relationships?
3. Are there outliers in the data?
4. Is there clustering by groups in the data?
Linking and The scatterplot matrix serves as the foundation for the concepts of
Brushing linking and brushing.
By linking, we mean showing how a point, or set of points, behaves in
each of the plots. This is accomplished by highlighting these points in
some fashion. For example, the highlighted points could be drawn as a
filled circle while the remaining points could be drawn as unfilled
circles. A typical application of this would be to show how an outlier
shows up in each of the individual pairwise plots. Brushing extends this
concept a bit further. In brushing, the points to be highlighted are
interactively selected by a mouse and the scatterplot matrix is
dynamically updated (ideally in real time). That is, we can select a
rectangular region of points in one plot and see how those points are
reflected in the other plots. Brushing is discussed in detail by Becker,
Cleveland, and Wilks in the paper "Dynamic Graphics for Data
Analysis" (Cleveland and McGill, 1988).
4. Although this plot type is most commonly used for scatter plots,
the basic concept is both simple and powerful and extends easily
to other plot formats.
Sample Plot
In this case, temperature has six distinct values. We plot torque versus
time for each of these temperatures. This example is discussed in more
detail in the process modeling chapter.
Each panel of the conditioning plot uses only the points in the group
corresponding to the ith row and jth column.
Questions The conditioning plot can provide answers to the following questions:
1. Is there a relationship between two variables?
2. If there is a relationship, does the nature of the relationship
depend on the value of a third variable?
3. Are groups in the data similar?
4. Are there outliers in the data?
Sample Plot
Questions The spectral plot can be used to answer the following questions:
1. How many cyclic components are there?
2. Is there a dominant cyclic frequency?
3. If there is a dominant cyclic frequency, what is it?
Importance The spectral plot is the primary technique for assessing the cyclic nature
Check of univariate time series in the frequency domain. It is almost always the
Cyclic second plot (after a run sequence plot) generated in a frequency domain
Behavior of analysis of a time series.
Time Series
Case Study The spectral plot is demonstrated in the beam deflection data case study.
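Outside of Dataplot, a spectral plot can be approximated by a periodogram. A sketch assuming SciPy is available, using a hypothetical sinusoid-plus-noise series:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

rng = np.random.default_rng(6)
t = np.arange(500)
y = np.sin(2 * np.pi * 0.3 * t) + rng.normal(scale=0.5, size=t.size)

# Periodogram: power versus frequency (cycles per observation, 0 to 0.5).
freq, power = signal.periodogram(y)
plt.plot(freq, power)
plt.xlabel("frequency (cycles/observation)")
plt.ylabel("power")
plt.show()
```

For this hypothetical series, the plot shows a single dominant peak near 0.3 cycles per observation, the signature discussed in the single-cycle sinusoidal example below.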
Conclusions We can make the following conclusions from the above plot.
1. There are no dominant peaks.
2. There is no identifiable pattern in the spectrum.
3. The data are random.
Discussion For random data, the spectral plot should show no dominant peaks or
distinct pattern in the spectrum. For the sample plot above, there are no
clearly dominant peaks and the peaks seem to fluctuate at random. This
type of appearance of the spectral plot indicates that there are no
significant cyclic patterns in the data.
Conclusions We can make the following conclusions from the above plot.
1. Strong dominant peak near zero.
2. Peak decays rapidly towards zero.
3. An autoregressive model is an appropriate model.
Discussion This spectral plot starts with a dominant peak near zero and rapidly
decays to zero. This is the spectral plot signature of a process with
strong positive autocorrelation. Such processes are highly non-random
in that there is high association between an observation and a
succeeding observation. In short, if you know Yi you can make a
strong guess as to what Yi+1 will be.
Recommended The next step would be to determine the parameters for the
Next Step autoregressive model, Yi+1 = A0 + A1*Yi + E, whose coefficients can
be estimated by a linear regression of Yi+1 on Yi.
Conclusions We can make the following conclusions from the above plot.
1. There is a single dominant peak at approximately 0.3.
2. There is an underlying single-cycle sinusoidal model.
Discussion This spectral plot shows a single dominant frequency. This indicates
that a single-cycle sinusoidal model might be appropriate.
If one were to naively assume that the data represented by the graph
could be fit by the model
y = c + e
and then estimate the constant by the sample mean, the analysis would
be incorrect because
● the sample mean is biased;
● the confidence interval for the mean, which is valid only for
random data, is meaningless and too small.
On the other hand, the choice of the proper model, the single-cycle
sinusoid y = c + a*sin(2*pi*f*t + phi) + e, accounts for the cyclic
structure and yields valid estimates and inferences.
Sample Plot
Questions The standard deviation plot can be used to answer the following
questions.
1. Are there any shifts in variation?
2. What is the magnitude of the shifts in variation?
3. Is there a distinct pattern in the shifts in variation?
Sample Plot The plot below contains the star plots of 16 cars. The data file actually
contains 74 cars, but we restrict the plot to what can reasonably be
shown on one page. The variable list for the sample star plot is
1 Price
2 Mileage (MPG)
3 1978 Repair Record (1 = Worst, 5 = Best)
4 1977 Repair Record (1 = Worst, 5 = Best)
5 Headroom
6 Rear Seat Room
7 Trunk Space
8 Weight
9 Length
Definition The star plot consists of a sequence of equi-angular spokes, called radii,
with each spoke representing one of the variables. The data length of a
spoke is proportional to the magnitude of the variable for the data point
relative to the maximum magnitude of the variable across all data
points. A line is drawn connecting the data values for each spoke. This
gives the plot a star-like appearance, hence the name.
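A sketch of drawing a single star under this definition, assuming Matplotlib; the variable names, the observation, and the column maxima below are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

labels = ["Price", "MPG", "Headroom", "Trunk", "Weight", "Length"]
values = np.array([4099, 22, 2.5, 11, 2930, 186])      # one observation
maxima = np.array([15906, 41, 5.0, 23, 4840, 233])     # column maxima

# Spoke lengths are magnitudes relative to each variable's maximum.
r = values / maxima
theta = np.linspace(0, 2 * np.pi, len(labels), endpoint=False)

ax = plt.subplot(projection="polar")
ax.plot(np.append(theta, theta[0]), np.append(r, r[0]))  # close the star
ax.set_xticks(theta)
ax.set_xticklabels(labels)
plt.show()
```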
Questions The star plot can be used to answer the following questions:
1. What variables are dominant for a given observation?
2. Which observations are most similar, i.e., are there clusters of
observations?
3. Are there outliers?
Weakness in Star plots are helpful for small-to-moderate-sized multivariate data sets.
Technique Their primary weakness is that their effectiveness is limited to data sets
with less than a few hundred points. After that, they tend to be
overwhelming.
Graphical techniques suited for large data sets are discussed by Scott.
Software Star plots are available in some general purpose statistical software
programs, including Dataplot.
Sample Plot
Questions The Weibull plot can be used to answer the following questions:
1. Do the data follow a 2-parameter Weibull distribution?
2. What is the best estimate of the shape parameter for the
2-parameter Weibull distribution?
3. What is the best estimate of the scale (= variation) parameter for
the 2-parameter Weibull distribution?
The Weibull probability plot (in conjunction with the Weibull PPCC
plot), the Weibull hazard plot, and the Weibull plot are all similar
techniques that can be used for assessing the adequacy of the Weibull
distribution as a model for the data, and additionally providing
estimation for the shape, scale, or location parameters.
The Weibull hazard plot and Weibull plot are designed to handle
censored data (which the Weibull probability plot does not).
Case Study The Weibull plot is demonstrated in the airplane glass failure data case
study.
Sample Plot
Questions The Youden plot can be used to answer the following questions:
1. Are all labs equivalent?
2. What labs have between-lab problems (reproducibility)?
3. What labs have within-lab problems (repeatability)?
4. What labs are outliers?
Importance In interlaboratory studies or in comparing two runs from the same lab, it
is useful to know if consistent results are generated. Youden plots
should be a routine plot for analyzing this type of data.
DEX Youden The dex Youden plot is a specialized Youden plot used in the design of
Plot experiments. In particular, it is useful for full and fractional designs.
Construction The following are the primary steps in the construction of the dex
of DEX Youden plot.
Youden Plot
1. For a given factor or interaction term, compute the mean of the
response variable for the low level of the factor and for the high
level of the factor. Any center points are omitted from the
computation.
2. Plot the point where the y-coordinate is the mean for the high
level of the factor and the x-coordinate is the mean for the low
level of the factor. The character used for the plot point should
identify the factor or interaction term (e.g., "1" for factor 1, "13"
for the interaction between factors 1 and 3).
3. Repeat steps 1 and 2 for each factor and interaction term of the
data.
The high and low values of the interaction terms are obtained by
multiplying the corresponding values of the main level factors. For
example, the interaction term X13 is obtained by multiplying the values
for X1 with the corresponding values of X3. Since the values for X1 and
X3 are either "-1" or "+1", the resulting values for X13 are also either
"-1" or "+1".
In summary, the dex Youden plot is a plot of the mean of the response
variable for the high level of a factor or interaction term against the
mean of the response variable for the low level of that factor or
interaction term.
For unimportant factors and interaction terms, these mean values
should be nearly the same. For important factors and interaction terms,
these mean values should be quite different. So the interpretation of the
plot is that unimportant factors should be clustered together near the
grand mean. Points that stand apart from this cluster identify important
factors that should be included in the model.
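A sketch of steps 1-3 and this interpretation for a hypothetical 2^3 full factorial; the design matrix and responses below are invented for illustration, with factor 1 made dominant.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical 2^3 full factorial: columns are factors X1, X2, X3 at -1/+1.
X = np.array([[-1, -1, -1], [1, -1, -1], [-1, 1, -1], [1, 1, -1],
              [-1, -1, 1], [1, -1, 1], [-1, 1, 1], [1, 1, 1]])
y = np.array([2.1, 5.9, 2.4, 6.2, 2.0, 5.8, 2.3, 6.1])  # X1 dominates

terms = {"1": X[:, 0], "2": X[:, 1], "3": X[:, 2],
         "13": X[:, 0] * X[:, 2]}   # interaction = product of columns

for name, levels in terms.items():
    lo = y[levels == -1].mean()    # mean response at the low level
    hi = y[levels == +1].mean()    # mean response at the high level
    plt.text(lo, hi, name)         # plot point labeled by the term

plt.axline((y.mean(), y.mean()), slope=1)  # 45-degree reference line
plt.xlabel("mean response, low level")
plt.ylabel("mean response, high level")
plt.xlim(y.min(), y.max()); plt.ylim(y.min(), y.max())
plt.show()
```

Unimportant terms land near the diagonal at the grand mean; the dominant factor plots far from it.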
Sample DEX The following is a dex Youden plot for the data used in the Eddy
Youden Plot current case study. The analysis in that case study demonstrated that
X1 and X2 were the most important factors.
Interpretation From the above dex Youden plot, we see that factors 1 and 2 stand out
of the Sample from the others. That is, the mean response values for the low and high
DEX Youden levels of factor 1 and factor 2 are quite different. For factor 3 and the
Plot two-term and three-term interactions, the mean response values for the
low and high levels are similar.
We would conclude from this plot that factors 1 and 2 are important
and should be included in our final model while the remaining factors
and interactions should be omitted from the final model.
Case Study The Eddy current case study demonstrates the use of the dex Youden
plot in the context of the analysis of a full factorial design.
Software DEX Youden plots are not typically available as built-in plots in
statistical software programs. However, it should be relatively
straightforward to write a macro to generate this plot in most general
purpose statistical software programs.
1.3.3.32. 4-Plot
Purpose: The 4-plot is a collection of 4 specific EDA graphical techniques
Check whose purpose is to test the assumptions that underlie most
Underlying measurement processes. A 4-plot consists of a
Statistical 1. run sequence plot;
Assumptions
2. lag plot;
3. histogram;
4. normal probability plot.
If the 4 underlying assumptions of a typical measurement process
hold, then the above 4 plots will have a characteristic appearance (see
the normal random numbers case study below); if any of the
underlying assumptions fail to hold, then it will be revealed by an
anomalous appearance in one or more of the plots. Several commonly
encountered situations are demonstrated in the case studies below.
Although the 4-plot has an obvious use for univariate and time series
data, its usefulness extends far beyond that. Many statistical models of
the form
response = deterministic + random
have the same underlying assumptions for the error term. That is, no
matter how complicated the functional fit, the assumptions on the
underlying error term are still the same. The 4-plot can and should be
routinely applied to the residuals when fitting models regardless of
whether the model is simple or complicated.
Sample Plot:
Process Has
Fixed
Location,
Fixed
Variation,
Non-Random
(Oscillatory),
Non-Normal
U-Shaped
Distribution,
and Has 3
Outliers.
Related Autocorrelation Plot
Techniques Spectral Plot
PPCC Plot
Case Studies The 4-plot is used in most of the case studies in this chapter:
1. Normal random numbers (the ideal)
2. Uniform random numbers
3. Random walk
4. Josephson junction cryothermometry
5. Beam deflections
6. Filter transmittance
7. Standard resistor
8. Heat flow meter 1
Software It should be feasible to write a macro for the 4-plot in any general
purpose statistical software program that supports the capability for
multiple plots per page and supports the underlying plot techniques.
Dataplot supports the 4-plot.
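A sketch of such a macro in Python (using Matplotlib and SciPy rather than Dataplot); y is a hypothetical measurement series.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(7)
y = rng.normal(size=200)              # hypothetical measurements

fig, ax = plt.subplots(2, 2)
ax[0, 0].plot(y)                       # 1. run sequence plot
ax[0, 0].set_title("run sequence")
ax[0, 1].plot(y[:-1], y[1:], ".")      # 2. lag plot (Y[i-1] vs Y[i])
ax[0, 1].set_title("lag plot")
ax[1, 0].hist(y, bins=20)              # 3. histogram
ax[1, 0].set_title("histogram")
stats.probplot(y, plot=ax[1, 1])       # 4. normal probability plot
ax[1, 1].set_title("normal probability")
fig.tight_layout()
plt.show()
```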
1.3.3.33. 6-Plot
Purpose: The 6-plot is a collection of 6 specific graphical techniques whose
Graphical purpose is to assess the validity of a Y versus X fit. The fit can be a
Model linear fit, a non-linear fit, a LOWESS (locally weighted least squares)
Validation fit, a spline fit, or any other fit utilizing a single independent variable.
The 6 plots are:
1. Scatter plot of the response and predicted values versus the
independent variable;
2. Scatter plot of the residuals versus the independent variable;
3. Scatter plot of the residuals versus the predicted values;
4. Lag plot of the residuals;
5. Histogram of the residuals;
6. Normal probability plot of the residuals.
Sample Plot
This 6-plot, which followed a linear fit, shows that the linear model is
not adequate. It suggests that a quadratic model would be a better
model.
Case Study The 6-plot is used in the Alaska pipeline data case study.
Software It should be feasible to write a macro for the 6-plot in any general
purpose statistical software program that supports the capability for
multiple plots per page and supports the underlying plot techniques.
Dataplot supports the 6-plot.
1.3.4. Graphical Techniques: By Problem Category
Univariate: y = c + e
Time Series: y = f(t) + e
Complex Demodulation Amplitude Plot: 1.3.3.8
Complex Demodulation Phase Plot: 1.3.3.9
1 Factor: y = f(x) + e
Multi-Factor/Comparative: y = f(xp, x1, x2, ..., xk) + e
Block Plot: 1.3.3.3
Multi-Factor/Screening: y = f(x1, x2, x3, ..., xk) + e
Contour Plot: 1.3.3.10
Regression: y = f(x1, x2, x3, ..., xk) + e
Interlab: (y1, y2) = f(x) + e
Youden Plot: 1.3.3.31
Multivariate: (y1, y2, ..., yp)
Star Plot: 1.3.3.29
Hypothesis Hypothesis tests also address the uncertainty of the sample estimate.
Tests However, instead of providing an interval, a hypothesis test attempts to
refute a specific claim about a population parameter based on the
sample data. For example, the hypothesis might be that the population
mean is equal to 10.
Table of Some of the more common classical quantitative techniques are listed
Contents below. This list of quantitative techniques is by no means meant to be
exhaustive. Additional discussions of classical statistical techniques are
contained in the product comparisons chapter.
● Location
1. Measures of Location
2. Confidence Limits for the Mean and One Sample t-Test
3. Two Sample t-Test for Equal Means
4. One Factor Analysis of Variance
5. Multi-Factor Analysis of Variance
● Scale (or variability or spread)
1. Measures of Scale
2. Bartlett's Test
3. Chi-Square Test
4. F-Test
5. Levene Test
● Skewness and Kurtosis
1. Measures of Skewness and Kurtosis
● Randomness
1. Autocorrelation
2. Runs Test
● Distributional Measures
1. Anderson-Darling Test
2. Chi-Square Goodness-of-Fit Test
3. Kolmogorov-Smirnov Test
● Outliers
1. Grubbs Test
● 2-Level Factorial Designs
1. Yates Analysis
Definition of The first step is to define what we mean by a typical value. For
Location univariate data, there are three common definitions:
1. mean - the mean is the sum of the data points divided by the
number of data points. That is, Ȳ = ΣYi/N.
2. median - the median is the value of the point that has half of the
data smaller than it and half of the data larger than it.
3. mode - the mode is the value of the random sample that occurs
with the greatest frequency. It is not necessarily unique. The
mode is typically used in a qualitative fashion. For example, there
may be a single dominant hump in the data, or perhaps two or more
smaller humps. This is usually evident from a
histogram of the data.
When taking samples from continuous populations, we need to be
somewhat careful in how we define the mode. That is, any
specific value may not occur more than once if the data are
continuous. What may be a more meaningful, if less exact
measure, is the midpoint of the class interval of the histogram
with the highest peak.
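A sketch computing all three measures of location, including the histogram-midpoint version of the mode for continuous data:

```python
import numpy as np

rng = np.random.default_rng(8)
y = rng.normal(size=10000)             # hypothetical continuous sample

mean = y.sum() / y.size
median = np.median(y)

# Mode for continuous data: midpoint of the histogram bin with the
# highest peak.
counts, edges = np.histogram(y, bins=50)
peak = np.argmax(counts)
mode = (edges[peak] + edges[peak + 1]) / 2

print(f"mean = {mean:.3f}, median = {median:.3f}, mode = {mode:.3f}")
```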
Why A natural question is why we have more than one measure of the typical
Different value. The following example helps to explain why these alternative
Measures definitions are useful and necessary.
This plot shows histograms for 10,000 random numbers generated from
a normal, an exponential, a Cauchy, and a lognormal distribution.
Normal The first histogram is a sample from a normal distribution. The mean is
Distribution 0.005, the median is -0.010, and the mode is -0.144 (the mode is
computed as the midpoint of the histogram interval with the highest
peak).
The normal distribution is a symmetric distribution with well-behaved
tails and a single peak at the center of the distribution. By symmetric,
we mean that the distribution can be folded about an axis so that the 2
sides coincide. That is, it behaves the same to the left and right of some
center point. For a normal distribution, the mean, median, and mode are
actually equivalent. The histogram above generates similar estimates for
the mean, median, and mode. Therefore, if a histogram or normal
probability plot indicates that your data are approximated well by a
normal distribution, then it is reasonable to use the mean as the location
estimator.
Cauchy The third histogram is a sample from a Cauchy distribution. The mean is
Distribution 3.70, the median is -0.016, and the mode is -0.362 (the mode is
computed as the midpoint of the histogram interval with the highest
peak).
For better visual comparison with the other data sets, we restricted the
histogram of the Cauchy distribution to values between -10 and 10. The
full Cauchy data set in fact has a minimum of approximately -29,000
and a maximum of approximately 89,000.
The Cauchy distribution is a symmetric distribution with heavy tails and
a single peak at the center of the distribution. The Cauchy distribution
has the interesting property that collecting more data does not provide a
more accurate estimate of the mean. That is, the sampling distribution of
the mean is equivalent to the sampling distribution of the original data.
This means that for the Cauchy distribution the mean is useless as a
measure of the typical value. For this histogram, the mean of 3.7 is well
above the vast majority of the data. This is caused by a few very
extreme values in the tail. However, the median does provide a useful
measure for the typical value.
Although the Cauchy distribution is an extreme case, it does illustrate
the importance of heavy tails in measuring the mean. Extreme values in
the tails distort the mean. However, these extreme values do not distort
the median since the median is based on ranks. In general, for data with
extreme values in the tails, the median provides a better estimate of
location than does the mean.
Robustness There are various alternatives to the mean and median for measuring
location. These alternatives were developed to address non-normal data
since the mean is an optimal estimator if in fact your data are normal.
Tukey and Mosteller defined two types of robustness where robustness
is a lack of susceptibility to the effects of nonnormality.
1. Robustness of validity means that the confidence intervals for the
population location have a 95% chance of covering the population
location regardless of what the underlying distribution is.
2. Robustness of efficiency refers to high effectiveness in the face of
non-normal tails. That is, confidence intervals for the population
location tend to be almost as narrow as the best that could be done
if we knew the true shape of the distribution.
The mean is an example of an estimator that is the best we can do if the
underlying distribution is normal. However, it lacks robustness of
validity. That is, confidence intervals based on the mean tend not to be
precise if the underlying distribution is in fact not normal.
The median is an example of an estimator that tends to have
robustness of validity but not robustness of efficiency.
The alternative measures of location try to balance these two concepts of
robustness. That is, the confidence intervals for the case when the data
are normal should be almost as narrow as the confidence intervals based
on the mean. However, they should maintain their validity even if the
underlying data are not normal. In particular, these alternatives address
the problem of heavy-tailed distributions.
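As an illustration of this trade-off, a sketch comparing the mean, the median, and a 10% trimmed mean (one common compromise estimator, used here for illustration) on hypothetical heavy-tailed data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
y = rng.standard_t(df=2, size=10000)   # hypothetical heavy-tailed sample

print("mean        :", np.mean(y))
print("median      :", np.median(y))
# Trimmed mean: drop 10% of the data from each tail before averaging,
# trading a little normal-case efficiency for robustness.
print("trimmed mean:", stats.trim_mean(y, proportiontocut=0.10))
```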
Case Study The uniform random numbers case study compares the performance of
several different location estimators for a particular non-normal
distribution.
This simply means that noisy data, i.e., data with a large standard deviation, are
going to generate wider intervals than data with a smaller standard deviation.
Definition: To test whether the population mean has a specific value, μ0, against
Hypothesis the two-sided alternative that it does not have the value μ0, the
Test confidence interval is converted to hypothesis-test form. The test is a
one-sample t-test, and it is defined as:
H0: μ = μ0
Ha: μ ≠ μ0
Test Statistic: T = (Ȳ - μ0)/(s/SQRT(N))
where Ȳ, s, and N are defined as above.
Significance Level: α. The most commonly used value for α is 0.05.
Critical Region: Reject the null hypothesis that the mean is a specified value, μ0,
if
T < t(α/2, N-1)
or
T > t(1 - α/2, N-1)
where t(p, N-1) is the p-th quantile of the t distribution with N-1
degrees of freedom.
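A sketch of both the confidence interval and the test, assuming SciPy; the data below are hypothetical stand-ins for a series like ZARR13.DAT.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
y = rng.normal(loc=9.26, scale=0.023, size=195)   # hypothetical data

n, ybar, s = y.size, y.mean(), y.std(ddof=1)
se = s / np.sqrt(n)

# 95% two-sided confidence interval for the mean.
tcrit = stats.t.ppf(0.975, df=n - 1)
print("CI:", (ybar - tcrit * se, ybar + tcrit * se))

# One-sample t-test of H0: mu = 5.
tstat, pvalue = stats.ttest_1samp(y, popmean=5.0)
print(f"T = {tstat:.2f}, p = {pvalue:.3g}")
```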
Sample Dataplot generated the following output for a confidence interval from the
Output for ZARR13.DAT data set:
Confidence
Interval
CONFIDENCE LIMITS FOR MEAN
(2-SIDED)
Interpretation The first few lines print the sample statistics used in calculating the confidence
of the Sample interval. The table shows the confidence interval for several different significance
Output levels. The first column lists the confidence level (which is 1 - α expressed as a
percent), the second column lists the t-value (i.e., t(1 - α/2, N-1)), the third column
lists the t-value times the standard error (the standard error is s/SQRT(N)), the fourth
column lists the lower confidence limit, and the fifth column lists the upper confidence limit.
For example, for a 95% confidence interval, we go to the row identified by 95.000 in
the first column and extract an interval of (9.25824, 9.26468) from the last two
columns.
Output from other statistical software may look somewhat different from the above
output.
Sample Dataplot generated the following output for a one-sample t-test from the
Output for t ZARR13.DAT data set:
Test
T TEST
(1-SAMPLE)
MU0 = 5.000000
NULL HYPOTHESIS UNDER TEST--MEAN MU = 5.000000
SAMPLE:
NUMBER OF OBSERVATIONS = 195
MEAN = 9.261460
STANDARD DEVIATION = 0.2278881E-01
STANDARD DEVIATION OF MEAN = 0.1631940E-02
TEST:
MEAN-MU0 = 4.261460
T TEST STATISTIC VALUE = 2611.284
DEGREES OF FREEDOM = 194.0000
T TEST STATISTIC CDF VALUE = 1.000000
ALTERNATIVE- ALTERNATIVE-
ALTERNATIVE- HYPOTHESIS HYPOTHESIS
HYPOTHESIS ACCEPTANCE INTERVAL CONCLUSION
MU <> 5.000000 (0,0.025) (0.975,1) ACCEPT
MU < 5.000000 (0,0.05) REJECT
MU > 5.000000 (0.95,1) ACCEPT
Interpretation We are testing the hypothesis that the population mean is 5. The output is divided into
of Sample three sections.
Output 1. The first section prints the sample statistics used in the computation of the t-test.
2. The second section prints the t-test statistic value, the degrees of freedom, and
the cumulative distribution function (cdf) value of the t-test statistic. The t-test
statistic cdf value is an alternative way of expressing the critical value. This cdf
value is compared to the acceptance intervals printed in section three. For an
upper one-tailed test, the alternative hypothesis acceptance interval is (1 - α, 1),
the alternative hypothesis acceptance interval for a lower one-tailed test is (0, α),
and the alternative hypothesis acceptance interval for a two-tailed test is (1 -
α/2, 1) or (0, α/2). Note that accepting the alternative hypothesis is equivalent to
rejecting the null hypothesis.
3. The third section prints the conclusions for a 95% test since this is the most
common case. Results are given in terms of the alternative hypothesis for the
two-tailed test and for the one-tailed test in both directions. The alternative
hypothesis acceptance interval column is stated in terms of the cdf value printed
in section two. The last column specifies whether the alternative hypothesis is
accepted or rejected. For a different significance level, the appropriate
conclusion can be drawn from the t-test statistic cdf value printed in section
two. For example, for a significance level of 0.10, the corresponding alternative
hypothesis acceptance intervals are (0,0.05) and (0.95,1), (0, 0.10), and (0.90,1).
Output from other statistical software may look somewhat different from the above
output.
Questions Confidence limits for the mean can be used to answer the following questions:
1. What is a reasonable estimate for the mean?
2. How much variability is there in the estimate of the mean?
3. Does a given target value fall within the confidence limits?
Software Confidence limits for the mean and one-sample t-tests are available in just about all
general purpose statistical software programs, including Dataplot.
Definition The two-sample t-test for unpaired data is defined as:
H0: μ1 = μ2
Ha: μ1 ≠ μ2
Test
Statistic: T = (Ȳ1 - Ȳ2)/SQRT(s1²/N1 + s2²/N2)
where N1 and N2 are the sample sizes, Ȳ1 and Ȳ2 are the sample
means, and s1² and s2² are the sample variances.
If equal variances are assumed, then the formula reduces to
T = (Ȳ1 - Ȳ2)/(sp*SQRT(1/N1 + 1/N2))
where sp² = ((N1 - 1)s1² + (N2 - 1)s2²)/(N1 + N2 - 2).
Significance
Level: α.
Critical Reject the null hypothesis that the two means are equal if
Region: |T| > t(1 - α/2, ν)
where ν is the degrees of freedom (ν = N1 + N2 - 2 when equal
variances are assumed).
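A sketch of this test, assuming SciPy; the two samples below are hypothetical stand-ins for the two AUTO83B.DAT columns.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
sample1 = rng.normal(20.1, 6.4, size=249)   # hypothetical stand-ins for
sample2 = rng.normal(30.5, 6.1, size=79)    # the two data columns

# equal_var=False gives the unpooled statistic defined above;
# equal_var=True uses the pooled variance instead.
tstat, pvalue = stats.ttest_ind(sample1, sample2, equal_var=False)
print(f"T = {tstat:.2f}, p = {pvalue:.3g}")
```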
Sample Dataplot generated the following output for the t test from the AUTO83B.DAT
Output data set:
T TEST
(2-SAMPLE)
NULL HYPOTHESIS UNDER TEST--POPULATION MEANS MU1 = MU2
SAMPLE 1:
NUMBER OF OBSERVATIONS = 249
MEAN = 20.14458
STANDARD DEVIATION = 6.414700
STANDARD DEVIATION OF MEAN = 0.4065151
SAMPLE 2:
NUMBER OF OBSERVATIONS = 79
MEAN = 30.48101
STANDARD DEVIATION = 6.107710
STANDARD DEVIATION OF MEAN = 0.6871710
ALTERNATIVE- ALTERNATIVE-
ALTERNATIVE- HYPOTHESIS HYPOTHESIS
HYPOTHESIS ACCEPTANCE INTERVAL CONCLUSION
MU1 <> MU2 (0,0.025) (0.975,1) ACCEPT
MU1 < MU2 (0,0.05) ACCEPT
MU1 > MU2 (0.95,1) REJECT
Interpretation We are testing the hypothesis that the population means of the two
of Sample samples are equal. The output is divided into five sections.
Output 1. The first section prints the sample statistics for sample one used in the
computation of the t-test.
2. The second section prints the sample statistics for sample two used in the
computation of the t-test.
3. The third section prints the pooled standard deviation, the difference in the
means, the t-test statistic value, the degrees of freedom, and the cumulative
distribution function (cdf) value of the t-test statistic under the assumption
that the standard deviations are equal. The t-test statistic cdf value is an
alternative way of expressing the critical value. This cdf value is compared
to the acceptance intervals printed in section five. For an upper one-tailed
test, the acceptance interval is (0, 1 - α), the acceptance interval for a
two-tailed test is (α/2, 1 - α/2), and the acceptance interval for a lower
one-tailed test is (α, 1).
Software Two-sample t-tests are available in just about all general purpose statistical
software programs, including Dataplot.
18 19
14 32
14 34
14 26
14 30
12 22
13 22
13 33
18 39
22 36
19 28
18 27
23 21
26 24
25 30
20 34
21 32
13 38
14 37
15 30
14 31
17 37
11 32
13 47
12 41
13 45
15 34
13 33
13 24
14 32
22 39
28 35
13 32
14 37
13 38
14 34
15 34
12 32
13 33
13 32
14 25
13 24
12 37
13 31
18 36
16 36
18 34
18 38
23 32
11 38
12 32
13 -999
12 -999
18 -999
21 -999
19 -999
21 -999
15 -999
16 -999
15 -999
11 -999
20 -999
21 -999
19 -999
15 -999
26 -999
25 -999
16 -999
16 -999
18 -999
16 -999
13 -999
14 -999
14 -999
14 -999
28 -999
19 -999
18 -999
15 -999
15 -999
16 -999
15 -999
16 -999
14 -999
17 -999
16 -999
15 -999
18 -999
21 -999
20 -999
13 -999
23 -999
20 -999
23 -999
18 -999
19 -999
25 -999
26 -999
18 -999
16 -999
16 -999
15 -999
22 -999
22 -999
24 -999
23 -999
29 -999
25 -999
20 -999
18 -999
19 -999
18 -999
27 -999
13 -999
17 -999
13 -999
13 -999
13 -999
30 -999
26 -999
18 -999
17 -999
16 -999
15 -999
18 -999
21 -999
19 -999
19 -999
16 -999
16 -999
16 -999
16 -999
25 -999
26 -999
31 -999
34 -999
36 -999
20 -999
19 -999
20 -999
19 -999
21 -999
20 -999
25 -999
21 -999
19 -999
21 -999
21 -999
19 -999
18 -999
19 -999
18 -999
18 -999
18 -999
30 -999
31 -999
23 -999
24 -999
22 -999
20 -999
22 -999
20 -999
21 -999
17 -999
18 -999
17 -999
18 -999
17 -999
16 -999
19 -999
19 -999
36 -999
27 -999
23 -999
24 -999
34 -999
35 -999
28 -999
29 -999
27 -999
34 -999
32 -999
28 -999
26 -999
24 -999
19 -999
28 -999
24 -999
27 -999
27 -999
26 -999
24 -999
30 -999
39 -999
35 -999
34 -999
30 -999
22 -999
27 -999
20 -999
18 -999
28 -999
27 -999
34 -999
31 -999
29 -999
27 -999
24 -999
23 -999
38 -999
36 -999
25 -999
38 -999
26 -999
22 -999
36 -999
27 -999
27 -999
32 -999
28 -999
31 -999
This model decomposes the response into a mean for each cell and an
error term. The analysis of variance provides estimates for each cell
mean. These estimated cell means are the predicted values of the model
and the differences between the response variable and the estimated cell
means are the residuals. That is, the predicted value for each observation
is its estimated cell mean, and the residual is the observation minus that
estimated cell mean.
The second model decomposes the response into an overall (grand) mean,
the effect of the ith factor level, and an error term. The analysis of
variance provides estimates of the grand mean and the effect of the ith
factor level. The predicted value for each observation is the estimated
grand mean plus the estimated factor effect, and the residual is the
observation minus this predicted value.
The distinction between these models is that the second model divides
the cell mean into an overall mean and the effect of the ith factor level.
This second model makes the factor effect more explicit, so we will
emphasize this approach.
Model Note that the ANOVA model assumes that the error term, Eij, should
Validation follow the assumptions for a univariate measurement process. That is,
after performing an analysis of variance, the model should be validated
by analyzing the residuals.
Sample Dataplot generated the following output for the one-way analysis of variance from the
Output GEAR.DAT data set.
*****************
* ANOVA TABLE *
*****************
****************
* ESTIMATION *
****************
GRAND MEAN =
0.99764001369E+00
GRAND STANDARD DEVIATION =
0.62789078802E-02
For these data, including the factor effect reduces the residual
standard deviation from 0.00623 to 0.0059. That is, although the
factor is statistically significant, it provides only a minimal
improvement over a simple constant model. This is because the factor
is just barely significant.
Output from other statistical software may look somewhat different
from the above output.
In addition to the quantitative ANOVA output, it is recommended that
any analysis of variance be complemented with model validation. At a
minimum, this should include
1. A run sequence plot of the residuals.
2. A normal probability plot of the residuals.
3. A scatter plot of the predicted values against the residuals.
Question The analysis of variance can be used to answer the following question:
● Are the means the same across groups in the data?
This model decomposes the response into a mean for each cell and an
error term. The analysis of variance provides estimates for each cell
mean. These cell means are the predicted values of the model and the
differences between the response variable and the estimated cell means
are the residuals. That is, the residual is the observation minus its
estimated cell mean.
The second model decomposes the response into an overall (grand) mean,
factor effects (αi and βj represent the effects of the ith level of the
first factor and the jth level of the second factor, respectively), and an
error term. The analysis of variance provides estimates of the grand mean
and the factor effects. The predicted value for each observation is the
estimated grand mean plus the estimated factor effects, and the residual
is the observation minus this predicted value.
The distinction between these models is that the second model divides
the cell mean into an overall mean and factor effects. This second model
makes the factor effect more explicit, so we will emphasize this
approach.
Model Note that the ANOVA model assumes that the error term, Eijk, should
Validation follow the assumptions for a univariate measurement process. That is,
after performing an analysis of variance, the model should be validated
by analyzing the residuals.
Sample Dataplot generated the following ANOVA output for the JAHANMI2.DAT data set:
Output
**********************************
**********************************
** 4-WAY ANALYSIS OF VARIANCE **
**********************************
**********************************
*****************
* ANOVA TABLE *
*****************
****************
* ESTIMATION *
****************
GRAND MEAN =
0.65007739258E+03
GRAND STANDARD DEVIATION =
0.74638252258E+02
The estimation section compares the residual standard deviation for a
constant-only model, a model with each factor individually, and the model
with all four factors included.
For these data, we see that including factor 4 has a significant
impact on the residual standard deviation (63.73 when only the
factor 4 effect is included compared to 63.058 when all four
factors are included).
Output from other statistical software may look somewhat different from
the above output.
Case Study The quantitative ANOVA approach can be contrasted with the more
graphical EDA approach in the ceramic strength case study.
Definitions of For univariate data, there are several common numerical measures of
Variability the spread:
1. variance - the variance is defined as
s² = Σ(Yi - Ȳ)²/(N - 1)
where Ȳ is the mean of the data. The variance is roughly the
arithmetic average of the squared distance from the mean.
2. standard deviation - the standard deviation is the square root of
the variance.
3. range - the range is the largest value minus the smallest value in
a data set.
4. average absolute deviation - the average absolute deviation
(AAD) is defined as
AAD = Σ|Yi - Ȳ|/N
where Ȳ is the mean of the data and |Y| is the absolute value of
Y. This measure does not square the distance from the mean, so
it is less affected by extreme observations than are the variance
and standard deviation.
5. median absolute deviation - the median absolute deviation
(MAD) is defined as
MAD = median(|Yi - median(Y)|)
where median(Y) is the median of the data and |Y| is the absolute
value of Y. This is a variation of the average absolute deviation that
is even less affected by extremes in the tail because the data in the
tails have less influence on the calculation of the median than
they do on the mean.
6. interquartile range - this is the value of the 75th percentile
minus the value of the 25th percentile. This measure of scale
attempts to measure the variability of points near the center.
In summary, the variance, standard deviation, average absolute
deviation, and median absolute deviation measure both aspects of the
variability; that is, the variability near the center and the variability in
the tails. They differ in that the average absolute deviation and median
absolute deviation do not give undue weight to the tail behavior. On
the other hand, the range only uses the two most extreme points and
the interquartile range only uses the middle portion of the data.
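A sketch computing all six measures of spread on a hypothetical sample:

```python
import numpy as np

rng = np.random.default_rng(14)
y = rng.normal(size=10000)             # hypothetical sample

variance = np.var(y, ddof=1)
stddev = np.std(y, ddof=1)
data_range = y.max() - y.min()
aad = np.mean(np.abs(y - y.mean()))                 # average absolute deviation
mad = np.median(np.abs(y - np.median(y)))           # median absolute deviation
iqr = np.quantile(y, 0.75) - np.quantile(y, 0.25)   # interquartile range

for name, v in [("variance", variance), ("std dev", stddev),
                ("range", data_range), ("AAD", aad),
                ("MAD", mad), ("IQR", iqr)]:
    print(f"{name:8s} {v:.4f}")
```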
Why Different The following example helps to clarify why these alternative
Measures? defintions of spread are useful and necessary.
This plot shows histograms for 10,000 random numbers generated
from a normal, a double exponential, a Cauchy, and a Tukey-Lambda
distribution.
In the above, si² is the variance of the ith group, N is the total sample size,
Ni is the sample size of the ith group, k is the number of groups, and sp² is
the pooled variance. The pooled variance is a weighted average of the
group variances and is defined as:
sp² = Σ(Ni - 1)si² / (N - k)
Significance
Level: α
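A sketch of Bartlett's test, assuming SciPy; the groups below are hypothetical stand-ins for the GEAR.DAT batches.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)
# Hypothetical groups standing in for the ten batches.
groups = [rng.normal(1.0, 0.006, size=10) for _ in range(10)]

# Bartlett's test of H0: all group variances are equal.
statistic, pvalue = stats.bartlett(*groups)
print(f"T = {statistic:.3f}, p = {pvalue:.3f}")
```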
Sample Dataplot generated the following output for Bartlett's test using the GEAR.DAT
Output data set:
BARTLETT TEST
(STANDARD DEFINITION)
NULL HYPOTHESIS UNDER TEST--ALL SIGMA(I) ARE EQUAL
TEST:
DEGREES OF FREEDOM = 9.000000
Interpretation We are testing the hypothesis that the group variances are all equal.
of Sample The output is divided into two sections.
Output 1. The first section prints the value of the Bartlett test statistic, the
degrees of freedom (k - 1), and the upper critical values of the
chi-square distribution corresponding to significance levels of
0.05 (the 95% point) and 0.01 (the 99% point).
We reject the null hypothesis at that significance level if the
value of the Bartlett test statistic is greater than the
corresponding critical value.
2. The second section prints the conclusion for a 95% test.
Output from other statistical software may look somewhat different
from the above output.
Critical Region: Reject the null hypothesis that the standard deviation is a
specified value, σ0, if
T > χ²(1 - α, N - 1) for an upper one-tailed alternative
T < χ²(α, N - 1) for a lower one-tailed alternative
T < χ²(α/2, N - 1) or T > χ²(1 - α/2, N - 1) for a two-sided test
where T = (N - 1)(s/σ0)² is the test statistic and χ²(p, N - 1) is the
p-th quantile of the chi-square distribution with N - 1 degrees of freedom.
The formula for the hypothesis test can easily be converted to form an interval
estimate for the standard deviation:
SQRT((N - 1)s²/χ²(1 - α/2, N - 1)) ≤ σ ≤ SQRT((N - 1)s²/χ²(α/2, N - 1))
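A sketch of the test statistic and the corresponding interval estimate, assuming SciPy; the data and the hypothesized σ0 below are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)
y = rng.normal(1.0, 0.006, size=100)   # hypothetical data
sigma0 = 0.1                           # hypothesized standard deviation

n, s = y.size, y.std(ddof=1)
T = (n - 1) * (s / sigma0) ** 2        # test statistic from above

# Two-sided 95% critical values and confidence interval for sigma.
lo, hi = stats.chi2.ppf([0.025, 0.975], df=n - 1)
print("reject H0:", T < lo or T > hi)
print("CI for sigma:", (np.sqrt((n - 1) * s**2 / hi),
                        np.sqrt((n - 1) * s**2 / lo)))
```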
Sample Dataplot generated the following output for a chi-square test from the
Output GEAR.DAT data set:
CHI-SQUARED TEST
SIGMA0 = 0.1000000
NULL HYPOTHESIS UNDER TEST--STANDARD DEVIATION SIGMA =
.1000000
SAMPLE:
NUMBER OF OBSERVATIONS = 100
MEAN = 0.9976400
STANDARD DEVIATION S = 0.6278908E-02
TEST:
S/SIGMA0 = 0.6278908E-01
CHI-SQUARED STATISTIC = 0.3903044
ALTERNATIVE- ALTERNATIVE-
ALTERNATIVE- HYPOTHESIS HYPOTHESIS
HYPOTHESIS ACCEPTANCE INTERVAL CONCLUSION
SIGMA <> .1000000 (0,0.025), (0.975,1) ACCEPT
SIGMA < .1000000 (0,0.05) ACCEPT
SIGMA > .1000000 (0.95,1) REJECT
Interpretation We are testing the hypothesis that the population standard deviation is 0.1. The
of Sample output is divided into three sections.
Output 1. The first section prints the sample statistics used in the computation of the
chi-square test.
2. The second section prints the chi-square test statistic value, the degrees of
freedom, and the cumulative distribution function (cdf) value of the
chi-square test statistic. The chi-square test statistic cdf value is an
alternative way of expressing the critical value. This cdf value is compared
to the acceptance intervals printed in section three. For an upper one-tailed
test, the alternative hypothesis acceptance interval is (1 - α, 1), the
alternative hypothesis acceptance interval for a lower one-tailed test is
(0, α), and the alternative hypothesis acceptance interval for a two-tailed test
is (1 - α/2, 1) or (0, α/2). Note that accepting the alternative hypothesis is
equivalent to rejecting the null hypothesis.
3. The third section prints the conclusions for a 95% test since this is the most
common case. Results are given in terms of the alternative hypothesis for
the two-tailed test and for the one-tailed test in both directions. The
alternative hypothesis acceptance interval column is stated in terms of the
cdf value printed in section two. The last column specifies whether the
alternative hypothesis is accepted or rejected. For a different significance
level, the appropriate conclusion can be drawn from the chi-square test
statistic cdf value printed in section two. For example, for a significance
level of 0.10, the corresponding alternative hypothesis acceptance intervals
are (0,0.05) and (0.95,1), (0, 0.10), and (0.90,1).
Output from other statistical software may look somewhat different from the
above output.
Questions The chi-square test can be used to answer the following questions:
1. Is the standard deviation equal to some pre-determined threshold value?
2. Is the standard deviation greater than some pre-determined threshold value?
3. Is the standard deviation less than some pre-determined threshold value?
Related F Test
Techniques Bartlett Test
Levene Test
Software The chi-square test for the standard deviation is available in many general purpose
statistical software programs, including Dataplot.
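Python Sketch SciPy does not provide this test directly, but the statistic T = (N-1)(s/σ0)² is easy to compute. A minimal sketch; the data are hypothetical stand-ins for the GEAR.DAT measurements:

import numpy as np
from scipy import stats

def chisq_sd_test(y, sigma0, alpha=0.05):
    # Two-sided chi-square test that the population standard deviation is sigma0.
    n = len(y)
    s = np.std(y, ddof=1)
    t = (n - 1) * (s / sigma0) ** 2      # test statistic
    cdf = stats.chi2.cdf(t, df=n - 1)    # cdf value, as in the output above
    reject = cdf < alpha / 2 or cdf > 1 - alpha / 2
    return t, cdf, reject

rng = np.random.default_rng(2)
y = rng.normal(loc=1.0, scale=0.006, size=100)
t, cdf, reject = chisq_sd_test(y, sigma0=0.1)
print(f"T = {t:.4f}, cdf = {cdf:.4f}, reject H0: {reject}")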
(Listing of the GEAR.DAT data set omitted here: 100 gear diameter
measurements recorded in two columns, the measurement and its batch
number, for batches 1 through 10.)
Critical The hypothesis that the two standard deviations are equal is rejected if
Region:

    F > F(1-α; N1-1, N2-1)

for an upper one-tailed test, or

    F < F(α/2; N1-1, N2-1)  or  F > F(1-α/2; N1-1, N2-1)

for a two-tailed test, where F(p; ν1, ν2) is the p quantile of the F
distribution with ν1 and ν2 degrees of freedom.
Sample Dataplot generated the following output for an F-test from the JAHANMI2.DAT data
Output set:
F TEST
NULL HYPOTHESIS UNDER TEST--SIGMA1 = SIGMA2
ALTERNATIVE HYPOTHESIS UNDER TEST--SIGMA1 NOT EQUAL SIGMA2
SAMPLE 1:
NUMBER OF OBSERVATIONS = 240
MEAN = 688.9987
STANDARD DEVIATION = 65.54909
SAMPLE 2:
NUMBER OF OBSERVATIONS = 240
MEAN = 611.1559
STANDARD DEVIATION = 61.85425
TEST:
STANDARD DEV. (NUMERATOR) = 65.54909
STANDARD DEV. (DENOMINATOR) = 61.85425
F TEST STATISTIC VALUE = 1.123037
DEG. OF FREEDOM (NUMER.) = 239.0000
DEG. OF FREEDOM (DENOM.) = 239.0000
F TEST STATISTIC CDF VALUE = 0.814808
Interpretation We are testing the hypothesis that the standard deviations for sample one and sample
of Sample two are equal. The output is divided into four sections.
Output 1. The first section prints the sample statistics for sample one used in the
computation of the F-test.
2. The second section prints the sample statistics for sample two used in the
computation of the F-test.
3. The third section prints the numerator and denominator standard deviations, the
F-test statistic value, the degrees of freedom, and the cumulative distribution
function (cdf) value of the F-test statistic. The F-test statistic cdf value is an
alternative way of expressing the critical value. This cdf value is compared to the
acceptance interval printed in section four. The acceptance interval for a
two-tailed test is (0, 1 - α).
4. The fourth section prints the conclusions for a 95% test since this is the most
common case. Results are printed for an upper one-tailed test. The acceptance
interval column is stated in terms of the cdf value printed in section three. The
last column specifies whether the null hypothesis is accepted or rejected. For a
different significance level, the appropriate conclusion can be drawn from the
F-test statistic cdf value printed in section three. For example, for a significance
level of 0.10, the corresponding acceptance interval becomes (0.000, 0.900).
Output from other statistical software may look somewhat different from the above
output.
Software The F-test for equality of two standard deviations is available in many general purpose
statistical software programs, including Dataplot.
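Python Sketch The two-sample F-test is likewise a few lines. A minimal sketch; the two samples are hypothetical stand-ins for the JAHANMI2.DAT groups:

import numpy as np
from scipy import stats

def f_test_two_sd(y1, y2, alpha=0.05):
    # Two-sided F-test that the two population standard deviations are equal.
    f = np.var(y1, ddof=1) / np.var(y2, ddof=1)   # ratio of sample variances
    df1, df2 = len(y1) - 1, len(y2) - 1
    cdf = stats.f.cdf(f, df1, df2)                # F-test statistic cdf value
    reject = cdf < alpha / 2 or cdf > 1 - alpha / 2
    return f, cdf, reject

rng = np.random.default_rng(3)
y1 = rng.normal(689, 65, size=240)
y2 = rng.normal(611, 62, size=240)
f, cdf, reject = f_test_two_sd(y1, y2)
print(f"F = {f:.4f}, cdf = {cdf:.4f}, reject H0: {reject}")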
Critical The Levene test rejects the hypothesis that the variances are
Region: equal if the test statistic W is greater than the upper critical
value F(α; k-1, N-k) of the F distribution, where k is the number
of groups and N is the total sample size.
Sample Dataplot generated the following output for Levene's test using the
Output GEAR.DAT data set (by default, Dataplot performs the form of the test
based on the median):
1. STATISTICS
NUMBER OF OBSERVATIONS = 100
NUMBER OF GROUPS = 10
LEVENE F TEST STATISTIC = 1.705910
Interpretation We are testing the hypothesis that the group variances are equal. The
of Sample output is divided into three sections.
Output 1. The first section prints the number of observations (N), the number
of groups (k), and the value of the Levene test statistic.
2. The second section prints the upper critical value of the F
distribution corresponding to various significance levels. The value
in the first column, the confidence level of the test, is equivalent to
100(1-α). We reject the null hypothesis at that significance level if
the value of the Levene F test statistic printed in section one is
greater than the critical value printed in the last column.
3. The third section prints the conclusion for a 95% test. For a
different significance level, the appropriate conclusion can be drawn
from the table printed in section two. For example, for α = 0.10, we
look at the row for 90% confidence and compare the critical value
1.702 to the Levene test statistic 1.7059. Since the test statistic is
greater than the critical value, we reject the null hypothesis at the
α = 0.10 level.
Output from other statistical software may look somewhat different from
the above output.
Software The Levene test is available in some general purpose statistical software
programs, including Dataplot.
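Python Sketch A minimal sketch using SciPy's stats.levene; center='median' selects the median-based form of the test described above. The groups are hypothetical stand-ins for the GEAR.DAT batches:

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
groups = [rng.normal(loc=1.0, scale=0.005, size=10) for _ in range(10)]

stat, p = stats.levene(*groups, center='median')
print(f"Levene W = {stat:.4f}, p-value = {p:.4f}")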
Definition of For univariate data Y1, Y2, ..., YN, the formula for skewness is:
Skewness
    skewness = Σ(Yi - Ȳ)³ / (N s³)
where Ȳ is the mean, s is the standard deviation (computed with N in the
denominator), and N is the number of data points.
Definition of For univariate data Y1, Y2, ..., YN, the formula for kurtosis is:
Kurtosis
    kurtosis = Σ(Yi - Ȳ)⁴ / (N s⁴)
With this definition, the kurtosis of the normal distribution is 3.
Examples The following example shows histograms for 10,000 random numbers
generated from a normal, a double exponential, a Cauchy, and a Weibull
distribution.
Normal The first histogram is a sample from a normal distribution. The normal
Distribution distribution is a symmetric distribution with well-behaved tails. This is
indicated by the skewness of 0.03. The kurtosis of 2.96 is near the
expected value of 3. The histogram verifies the symmetry.
Weibull The fourth histogram is a sample from a Weibull distribution with shape
Distribution parameter 1.5. The Weibull distribution is a skewed distribution with the
amount of skewness depending on the value of the shape parameter. The
degree of decay as we move away from the center also depends on the
value of the shape parameter. For this data set, the skewness is 1.08 and
the kurtosis is 4.46, which indicates moderate skewness and kurtosis.
The probability plot correlation coefficient plot and the probability plot are
useful tools for determining a good distributional model for the data.
Software The skewness and kurtosis coefficients are available in most general
purpose statistical software programs, including Dataplot.
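Python Sketch A minimal sketch computing both coefficients with SciPy; fisher=False requests the definition for which the normal distribution has kurtosis 3, matching the discussion above:

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
samples = {
    "normal": rng.normal(size=10000),
    "double exponential": rng.laplace(size=10000),
    "Weibull (shape 1.5)": rng.weibull(1.5, size=10000),
}
for name, y in samples.items():
    g1 = stats.skew(y)
    g2 = stats.kurtosis(y, fisher=False)   # normal distribution -> about 3
    print(f"{name:22s} skewness = {g1:6.2f}  kurtosis = {g2:6.2f}")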
1.3.5.12. Autocorrelation
Purpose: The autocorrelation ( Box and Jenkins, 1976) function can be used for
Detect the following two purposes:
Non-Randomness, 1. To detect non-randomness in data.
Time Series
Modeling 2. To identify an appropriate time series model if the data are not
random.
Definition Given measurements, Y1, Y2, ..., YN at time X1, X2, ..., XN, the lag k
autocorrelation function is defined as

    r(k) = Σ[i=1 to N-k] (Yi - Ȳ)(Yi+k - Ȳ) / Σ[i=1 to N] (Yi - Ȳ)²
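Python Sketch A minimal sketch implementing the definition above directly (statsmodels and other packages provide equivalent functions):

import numpy as np

def autocorrelation(y, k):
    # Lag-k sample autocorrelation r(k), as defined above.
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    den = np.sum((y - ybar) ** 2)
    if k == 0:
        return 1.0
    return np.sum((y[:-k] - ybar) * (y[k:] - ybar)) / den

rng = np.random.default_rng(6)
y = rng.normal(size=200)   # white noise: r(k) should be near zero for k > 0
print([round(autocorrelation(y, k), 2) for k in range(6)])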
Sample Output Dataplot generated the following autocorrelation output using the
LEW.DAT data set:
lag autocorrelation
0. 1.00
1. -0.31
2. -0.74
3. 0.77
4. 0.21
5. -0.90
6. 0.38
7. 0.63
8. -0.77
9. -0.12
10. 0.82
11. -0.40
12. -0.55
13. 0.73
14. 0.07
15. -0.76
16. 0.40
17. 0.48
18. -0.70
19. -0.03
20. 0.70
21. -0.41
22. -0.43
23. 0.67
24. 0.00
25. -0.66
26. 0.42
27. 0.39
28. -0.65
29. 0.03
30. 0.63
31. -0.42
32. -0.36
33. 0.64
34. -0.05
35. -0.60
36. 0.43
37. 0.32
38. -0.64
39. 0.08
40. 0.58
41. -0.45
42. -0.28
43. 0.62
44. -0.10
45. -0.55
46. 0.45
47. 0.25
48. -0.61
49. 0.14
Case Study The heat flow meter data demonstrate the use of autocorrelation in
determining if the data are from a random process.
The beam deflection data demonstrate the use of autocorrelation in
developing a non-linear sinusoidal model.
Typical Analysis The first step in the runs test is to compute the sequential differences (Yi -
and Test Yi-1). Positive values indicate an increasing value and negative values
Statistics indicate a decreasing value. A runs test should include information such as
the output shown below from Dataplot for the LEW.DAT data set. The
output shows a table of:
1. runs of length exactly I for I = 1, 2, ..., 10
2. number of runs of length I
3. expected number of runs of length I
4. standard deviation of the number of runs of length I
5. a z-score, where the z-score is defined to be
   Z = (observed - expected) / (standard deviation)
   using the quantities in columns 2, 3, and 4
Sample Output Dataplot generated the following runs test output using the LEW.DAT data
set:
RUNS UP
RUNS DOWN
Interpretation of Scanning the last column labeled "Z", we note that most of the z-scores for
Sample Output run lengths 1, 2, and 3 have an absolute value greater than 1.96. This is strong
evidence that these data are in fact not random.
Output from other statistical software may look somewhat different from the
above output.
Question The runs test can be used to answer the following question:
● Were these sample data generated from a random process?
Related Autocorrelation
Techniques Run Sequence Plot
Lag Plot
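Python Sketch The run counting in steps 1 and 2 of the typical analysis above can be done directly; the expected counts and z-scores require formulas not reproduced here, so this minimal sketch only tabulates the observed runs:

import numpy as np

def run_length_counts(y, max_len=10):
    # Count runs of each length in the sequential differences Yi - Yi-1.
    signs = np.sign(np.diff(y))
    signs = signs[signs != 0]          # ignore ties, as a simplification
    lengths, count = [], 1
    for a, b in zip(signs[:-1], signs[1:]):
        if a == b:
            count += 1
        else:
            lengths.append(count)
            count = 1
    lengths.append(count)
    return {i: lengths.count(i) for i in range(1, max_len + 1)}

rng = np.random.default_rng(7)
print(run_length_counts(rng.normal(size=200)))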
Significance Level: α
Critical The critical values for the Anderson-Darling test are dependent
Region: on the specific distribution that is being tested. Tabulated values
and formulas have been published (Stephens, 1974, 1976, 1977,
1979) for a few specific distributions (normal, lognormal,
exponential, Weibull, logistic, extreme value type 1). The test is
a one-sided test and the hypothesis that the distribution is of a
specific form is rejected if the test statistic, A, is greater than the
critical value.
Note that for a given distribution, the Anderson-Darling statistic
may be multiplied by a constant (which usually depends on the
sample size, n). These constants are given in the various papers
by Stephens. In the sample output below, this is the "adjusted
Anderson-Darling" statistic. This is what should be compared
against the critical values. Also, be aware that different constants
(and therefore critical values) have been published. You just
need to be aware of what constant was used for a given set of
critical values (the needed constant is typically given with the
critical values).
Sample Dataplot generated the following output for the Anderson-Darling test. 1,000
Output random numbers were generated for a normal, double exponential, Cauchy,
and lognormal distribution. In all four cases, the Anderson-Darling test was
applied to test for a normal distribution. When the data were generated using a
normal distribution, the test statistic was small and the hypothesis was
accepted. When the data were generated using the double exponential, Cauchy,
and lognormal distributions, the statistics were significant, and the hypothesis
of an underlying normal distribution was rejected at significance levels of 0.10,
0.05, and 0.01.
The normal random numbers were stored in the variable Y1, the double
exponential random numbers were stored in the variable Y2, the Cauchy
random numbers were stored in the variable Y3, and the lognormal random
numbers were stored in the variable Y4.
***************************************
** anderson darling normal test y1 **
***************************************
1. STATISTICS:
NUMBER OF OBSERVATIONS = 1000
MEAN = 0.4359940E-02
2. CRITICAL VALUES:
90 % POINT = 0.6560000
95 % POINT = 0.7870000
97.5 % POINT = 0.9180000
99 % POINT = 1.092000
***************************************
** anderson darling normal test y2 **
***************************************
1. STATISTICS:
NUMBER OF OBSERVATIONS = 1000
MEAN = 0.2034888E-01
STANDARD DEVIATION = 1.321627
2. CRITICAL VALUES:
90 % POINT = 0.6560000
95 % POINT = 0.7870000
97.5 % POINT = 0.9180000
99 % POINT = 1.092000
***************************************
** anderson darling normal test y3 **
***************************************
1. STATISTICS:
NUMBER OF OBSERVATIONS = 1000
MEAN = 1.503854
STANDARD DEVIATION = 35.13059
2. CRITICAL VALUES:
90 % POINT = 0.6560000
95 % POINT = 0.7870000
97.5 % POINT = 0.9180000
99 % POINT = 1.092000
***************************************
** anderson darling normal test y4 **
***************************************
1. STATISTICS:
NUMBER OF OBSERVATIONS = 1000
MEAN = 1.518372
STANDARD DEVIATION = 1.719969
2. CRITICAL VALUES:
90 % POINT = 0.6560000
95 % POINT = 0.7870000
97.5 % POINT = 0.9180000
99 % POINT = 1.092000
The Anderson-Darling test statistic for the normal data is 0.256. Since the test
statistic is less than the critical value, we do not reject the null
hypothesis at the α = 0.10 level.
As we would hope, the Anderson-Darling test accepts the hypothesis of
normality for the normal random numbers and rejects it for the 3 non-normal
cases.
The output from other statistical software programs may differ somewhat from
the output above.
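Python Sketch A minimal sketch using SciPy's stats.anderson, which tests normality with parameters estimated from the data and applies the appropriate sample-size adjustment internally:

import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
for name in ("normal", "lognormal"):
    y = rng.normal(size=1000) if name == "normal" else rng.lognormal(size=1000)
    res = stats.anderson(y, dist='norm')
    print(name, "A^2 =", round(res.statistic, 3))
    # critical values are paired with significance levels (15, 10, 5, 2.5, 1 percent)
    for cv, sl in zip(res.critical_values, res.significance_level):
        print(f"  {sl:4.1f}% critical value = {cv:.3f}")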
Questions The Anderson-Darling test can be used to answer the following questions:
● Are the data from a normal distribution?
Importance Many statistical tests and procedures are based on specific distributional
assumptions. The assumption of normality is particularly common in classical
statistical tests. Much reliability modeling is based on the assumption that the
data follow a Weibull distribution.
There are many non-parametric and robust techniques that do not make strong
distributional assumptions. However, techniques based on specific
distributional assumptions are in general more powerful than non-parametric
and robust techniques. Therefore, if the distributional assumptions can be
validated, they are generally preferred.
Test Statistic: For the chi-square goodness-of-fit computation, the data are divided
into k bins and the test statistic is defined as

    χ² = Σ[i=1 to k] (Oi - Ei)² / Ei

where Oi is the observed frequency for bin i and Ei is the expected
frequency for bin i.
Sample Dataplot generated the following output for the chi-square test where 1,000 random
Output numbers were generated for the normal, double exponential, t with 3 degrees of freedom,
and lognormal distributions. In all cases, the chi-square test was applied to test for a
normal distribution. The test statistics show the characteristics of the test; when the data
are from a normal distribution, the test statistic is small and the hypothesis is accepted;
when the data are from the double exponential, t, and lognormal distributions, the
statistics are significant and the hypothesis of an underlying normal distribution is
rejected at significance levels of 0.10, 0.05, and 0.01.
The normal random numbers were stored in the variable Y1, the double exponential
random numbers were stored in the variable Y2, the t random numbers were stored in the
variable Y3, and the lognormal random numbers were stored in the variable Y4.
*************************************************
** normal chi-square goodness of fit test y1 **
*************************************************
SAMPLE:
NUMBER OF OBSERVATIONS = 1000
NUMBER OF NON-EMPTY CELLS = 24
NUMBER OF PARAMETERS USED = 0
TEST:
CHI-SQUARED TEST STATISTIC = 17.52155
DEGREES OF FREEDOM = 23
CHI-SQUARED CDF VALUE = 0.217101
*************************************************
** normal chi-square goodness of fit test y2 **
*************************************************
SAMPLE:
NUMBER OF OBSERVATIONS = 1000
NUMBER OF NON-EMPTY CELLS = 26
NUMBER OF PARAMETERS USED = 0
TEST:
CHI-SQUARED TEST STATISTIC = 2030.784
DEGREES OF FREEDOM = 25
CHI-SQUARED CDF VALUE = 1.000000
*************************************************
** normal chi-square goodness of fit test y3 **
*************************************************
SAMPLE:
NUMBER OF OBSERVATIONS = 1000
NUMBER OF NON-EMPTY CELLS = 25
NUMBER OF PARAMETERS USED = 0
TEST:
CHI-SQUARED TEST STATISTIC = 103165.4
DEGREES OF FREEDOM = 24
CHI-SQUARED CDF VALUE = 1.000000
*************************************************
** normal chi-square goodness of fit test y4 **
*************************************************
SAMPLE:
NUMBER OF OBSERVATIONS = 1000
NUMBER OF NON-EMPTY CELLS = 10
NUMBER OF PARAMETERS USED = 0
TEST:
CHI-SQUARED TEST STATISTIC = 1162098.
DEGREES OF FREEDOM = 9
CHI-SQUARED CDF VALUE = 1.000000
As we would hope, the chi-square test does not reject the normality hypothesis for the
normal distribution data set and rejects it for the three non-normal cases.
Questions The chi-square test can be used to answer the following types of questions:
● Are the data from a normal distribution?
Importance Many statistical tests and procedures are based on specific distributional assumptions.
The assumption of normality is particularly common in classical statistical tests. Much
reliability modeling is based on the assumption that the distribution of the data follows a
Weibull distribution.
There are many non-parametric and robust techniques that are not based on strong
distributional assumptions. By non-parametric, we mean a technique, such as the sign
test, that is not based on a specific distributional assumption. By robust, we mean a
statistical technique that performs well under a wide range of distributional assumptions.
However, techniques based on specific distributional assumptions are in general more
powerful than these non-parametric and robust techniques. By power, we mean the ability
to detect a difference when that difference actually exists. Therefore, if the distributional
assumption can be confirmed, the parametric techniques are generally preferred.
If you are using a technique that makes a normality (or some other type of distributional)
assumption, it is important to confirm that this assumption is in fact justified. If it is, the
more powerful parametric techniques can be used. If the distributional assumption is not
justified, a non-parametric or robust technique may be required.
Software Some general purpose statistical software programs, including Dataplot, provide a
chi-square goodness-of-fit test for at least some of the common distributions.
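Python Sketch A minimal sketch of the binning and the statistic. Note that the Dataplot output above reports zero estimated parameters, while this sketch subtracts two degrees of freedom for the estimated mean and standard deviation, one common convention; bins with very small expected counts are usually pooled in practice:

import numpy as np
from scipy import stats

def chisq_normality(y, k=20):
    # Chi-square goodness-of-fit test for normality with k equal-width bins.
    y = np.asarray(y)
    edges = np.linspace(y.min(), y.max(), k + 1)
    observed, _ = np.histogram(y, bins=edges)
    cdf = stats.norm.cdf(edges, loc=y.mean(), scale=y.std(ddof=1))
    expected = np.diff(cdf)
    expected *= observed.sum() / expected.sum()   # scale to the same total count
    t = np.sum((observed - expected) ** 2 / expected)
    df = k - 1 - 2                                # two estimated parameters
    return t, stats.chi2.cdf(t, df)

rng = np.random.default_rng(9)
t, cdfval = chisq_normality(rng.normal(size=1000))
print(f"chi-square = {t:.2f}, cdf value = {cdfval:.4f}")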
The empirical distribution function (ECDF) is defined as EN = n(i)/N,
where n(i) is the number of points less than Yi and the Yi are ordered from
smallest to largest value. This is a step function that increases by 1/N at the value
of each ordered data point.
The graph below is a plot of the empirical distribution function with a normal
cumulative distribution function for 100 normal random numbers. The K-S test is
based on the maximum distance between these two curves.
Characteristics An attractive feature of this test is that the distribution of the K-S test statistic
and itself does not depend on the underlying cumulative distribution function being
Limitations of tested. Another advantage is that it is an exact test (the chi-square goodness-of-fit
the K-S Test test depends on an adequate sample size for the approximations to be valid).
Despite these advantages, the K-S test has several important limitations:
1. It only applies to continuous distributions.
2. It tends to be more sensitive near the center of the distribution than at the
tails.
3. Perhaps the most serious limitation is that the distribution must be fully
specified. That is, if location, scale, and shape parameters are estimated
from the data, the critical region of the K-S test is no longer valid. It
typically must be determined by simulation.
Due to limitations 2 and 3 above, many analysts prefer to use the
Anderson-Darling goodness-of-fit test. However, the Anderson-Darling test is
only available for a few specific distributions.
Technical Note Previous editions of e-Handbook gave the following formula for the computation
of the Kolmogorov-Smirnov goodness of fit statistic:
This formula is in fact not correct. Note that this formula can be rewritten as:
This form makes it clear that an upper bound on the difference between these two
formulas is 1/N. For actual data, the difference is likely to be less than the upper
bound.
For example, for N = 20, the upper bound on the difference between these two
formulas is 0.05 (for comparison, the 5% critical value is 0.294). For N = 100, the
upper bound is 0.001. In practice, if you have moderate to large sample sizes (say
N ≥ 50), these formulas are essentially equivalent.
Sample Output Dataplot generated the following output for the Kolmogorov-Smirnov test where
1,000 random numbers were generated for a normal, double exponential, t with 3
degrees of freedom, and lognormal distributions. In all cases, the
Kolmogorov-Smirnov test was applied to test for a normal distribution. The
Kolmogorov-Smirnov test accepts the normality hypothesis for the case of normal
data and rejects it for the double exponential, t, and lognormal data, with the
exception that the double exponential data are not rejected at the 0.01
significance level.
The normal random numbers were stored in the variable Y1, the double
exponential random numbers were stored in the variable Y2, the t random
numbers were stored in the variable Y3, and the lognormal random numbers were
stored in the variable Y4.
*********************************************************
** normal Kolmogorov-Smirnov goodness of fit test y1 **
*********************************************************
TEST:
KOLMOGOROV-SMIRNOV TEST STATISTIC = 0.2414924E-01
*********************************************************
** normal Kolmogorov-Smirnov goodness of fit test y2 **
*********************************************************
TEST:
KOLMOGOROV-SMIRNOV TEST STATISTIC = 0.5140864E-01
*********************************************************
** normal Kolmogorov-Smirnov goodness of fit test y3 **
*********************************************************
TEST:
KOLMOGOROV-SMIRNOV TEST STATISTIC = 0.6119353E-01
*********************************************************
** normal Kolmogorov-Smirnov goodness of fit test y4 **
*********************************************************
TEST:
KOLMOGOROV-SMIRNOV TEST STATISTIC = 0.5354889
Questions The Kolmogorov-Smirnov test can be used to answer the following types of
questions:
● Are the data from a normal distribution?
Importance Many statistical tests and procedures are based on specific distributional
assumptions. The assumption of normality is particularly common in classical
statistical tests. Much reliability modeling is based on the assumption that the
data follow a Weibull distribution.
There are many non-parametric and robust techniques that are not based on strong
distributional assumptions. By non-parametric, we mean a technique, such as the
sign test, that is not based on a specific distributional assumption. By robust, we
mean a statistical technique that performs well under a wide range of
distributional assumptions. However, techniques based on specific distributional
assumptions are in general more powerful than these non-parametric and robust
techniques. By power, we mean the ability to detect a difference when that
difference actually exists. Therefore, if the distributional assumptions can be
confirmed, the parametric techniques are generally preferred.
If you are using a technique that makes a normality (or some other type of
distributional) assumption, it is important to confirm that this assumption is in
fact justified. If it is, the more powerful parametric techniques can be used. If the
distributional assumption is not justified, using a non-parametric or robust
technique may be required.
Software Some general purpose statistical software programs, including Dataplot, support
the Kolmogorov-Smirnov goodness-of-fit test, at least for some of the more
common distributions.
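Python Sketch A minimal sketch using SciPy's stats.kstest. The second call illustrates limitation 3 above: when the mean and standard deviation are estimated from the data, the reported p-value is no longer strictly valid:

import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
y = rng.normal(size=1000)

# Fully specified null distribution (valid K-S critical region):
d, p = stats.kstest(y, 'norm', args=(0, 1))
print(f"D = {d:.4f}, p = {p:.4f}")

# Parameters estimated from the data (critical region not strictly valid):
d2, p2 = stats.kstest(y, 'norm', args=(y.mean(), y.std(ddof=1)))
print(f"D = {d2:.4f}  (p-value should not be taken at face value)")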
Sample Dataplot generated the following output for the ZARR13.DAT data set
Output showing that Grubbs' test finds no outliers in the data set:
*********************
** grubbs test y **
*********************
1. STATISTICS:
NUMBER OF OBSERVATIONS = 195
MINIMUM = 9.196848
MEAN = 9.261460
MAXIMUM = 9.327973
STANDARD DEVIATION = 0.2278881E-01
Importance Many statistical techniques are sensitive to the presence of outliers. For
example, simple calculations of the mean and standard deviation may
be distorted by a single grossly inaccurate data point.
Checking for outliers should be a routine part of any data analysis.
Potential outliers should be examined to see if they are possibly
erroneous. If the data point is in error, it should be corrected if possible
and deleted if it is not possible. If there is no reason to believe that the
outlying point is in error, it should not be deleted without careful
consideration. However, the use of more robust techniques may be
warranted. Robust techniques will often downweight the effect of
outlying points without deleting them.
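Python Sketch A minimal sketch of the two-sided Grubbs test, using the standard t-based critical value; the data are a hypothetical stand-in for ZARR13.DAT:

import numpy as np
from scipy import stats

def grubbs_test(y, alpha=0.05):
    # Two-sided Grubbs test for a single outlier.
    y = np.asarray(y)
    n = len(y)
    g = np.max(np.abs(y - y.mean())) / y.std(ddof=1)
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    return g, g_crit, g > g_crit

rng = np.random.default_rng(11)
y = rng.normal(9.26, 0.023, size=195)
g, g_crit, outlier = grubbs_test(y)
print(f"G = {g:.3f}, critical value = {g_crit:.3f}, outlier detected: {outlier}")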
Yates Before performing a Yates analysis, the data should be arranged in "Yates order". That
Order is, given k factors, the kth column consists of 2^(k-1) minus signs (i.e., the low level of the
factor) followed by 2^(k-1) plus signs (i.e., the high level of the factor). For example, for
a full factorial design with three factors, the design matrix is
- - -
+ - -
- + -
+ + -
- - +
+ - +
- + +
+ + +
Determining the Yates order for fractional factorial designs requires knowledge of the
confounding structure of the fractional factorial design.
Yates
A Yates analysis generates the following output.
Output
1. A factor identifier (from Yates order). The specific identifier will vary
depending on the program used to generate the Yates analysis. Dataplot, for
example, uses the following for a 3-factor model.
1 = factor 1
2 = factor 2
3 = factor 3
12 = interaction of factor 1 and factor 2
13 = interaction of factor 1 and factor 3
23 = interaction of factor 2 and factor 3
123 = interaction of factors 1, 2, and 3
2. Least squares estimated factor effects ordered from largest in magnitude (most
significant) to smallest in magnitude (least significant).
That is, we obtain a ranked list of important factors.
3. A t-value for the individual factor effect estimates. The t-value is computed as

    t = e / s(e)

where e is the estimated factor effect and s(e) is the standard deviation of the
estimated factor effect. (A sketch of the effect computation itself follows this list.)
4. The residual standard deviation that results from the model with the single term
only. That is, the residual standard deviation from the model

    response = constant + 0.5 (effect estimate) Xi
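Python Sketch A minimal sketch of the standard Yates algorithm for a full 2^k design; the response values below are consistent with the effect estimates quoted later in this section (grand mean 2.65875, X1 effect 3.1025, X2 effect -0.8675):

import numpy as np

def yates(y):
    # Standard Yates algorithm: k passes of pairwise sums and differences,
    # then scaling. Input must be in Yates order.
    y = np.asarray(y, dtype=float)
    n = len(y)
    k = int(np.log2(n))
    col = y.copy()
    for _ in range(k):
        sums = col[0::2] + col[1::2]
        diffs = col[1::2] - col[0::2]
        col = np.concatenate([sums, diffs])
    effects = col / (n / 2)        # effect estimates
    effects[0] = col[0] / n        # first entry is the grand mean
    return effects

y = [1.70, 4.57, 0.55, 3.39, 1.51, 4.59, 0.67, 4.29]
labels = ["mean", "1", "2", "12", "3", "13", "23", "123"]
for lab, e in zip(labels, yates(y)):
    print(f"{lab:5s} {e:9.5f}")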
Sample Dataplot generated the following Yates analysis output for the Eddy current data set:
Output
Interpretation In summary, the Yates analysis provides us with the following ranked
of Sample list of important factors along with the estimated effect estimate.
Output 1. X1: effect estimate = 3.1025 ohms
2. X2: effect estimate = -0.8675 ohms
3. X2*X3: effect estimate = 0.2975 ohms
4. X1*X3: effect estimate = 0.2475 ohms
5. X3: effect estimate = 0.2125 ohms
6. X1*X2*X3: effect estimate = 0.1425 ohms
7. X1*X2: effect estimate = 0.1275 ohms
Model From the above Yates output, we can define the potential models from
Selection and the Yates analysis. An important component of a Yates analysis is
Validation selecting the best model from the available potential models.
Once a tentative model has been selected, the error term should follow
the assumptions for a univariate measurement process. That is, the
model should be validated by analyzing the residuals.
Graphical Some analysts may prefer a more graphical presentation of the Yates
Presentation results. In particular, the following plots may be useful:
1. Ordered data plot
2. Ordered absolute effects plot
3. Cumulative residual standard deviation plot
Questions The Yates analysis can be used to answer the following questions:
1. What is the ranked list of factors?
2. What is the goodness-of-fit (as measured by the residual
standard deviation) for the various models?
Case Study The Yates analysis is demonstrated in the Eddy current case study.
Yates For convenience, we list the sample Yates output for the Eddy current data set here.
Table
The last column of the Yates table gives the residual standard deviation for 8 possible
models, each with one more term than the previous model.
Potential For this example, we can summarize the possible prediction equations using the second
Models and last columns of the Yates table:
● Ŷ = 2.65875 (the overall mean alone)
has a residual standard deviation of 1.74106 ohms. Note that this is the default
model. That is, if no factors are important, the model is simply the overall mean.
● Ŷ = 2.65875 + 0.5 (3.10250 X1 - 0.86750 X2 + 0.29750 X2*X3 + 0.24750 X1*X3
  + 0.21250 X3 + 0.14250 X1*X2*X3 + 0.12750 X1*X2)
has a residual standard deviation of 0.0 ohms. Note that the model with all
possible terms included will have a zero residual standard deviation. This will
always occur with an unreplicated two-level factorial design.
Model The above step lists all the potential models. From this list, we want to select the most
Selection appropriate model. This requires balancing the following two goals.
1. We want the model to include all important factors.
2. We want the model to be parsimonious. That is, the model should be as simple as
possible.
Note that the residual standard deviation alone is insufficient for determining the most
appropriate model as it will always be decreased by adding additional factors. The next
section describes a number of approaches for determining which factors (and
interactions) to include in the model.
Criteria for The seven criteria that we can use in determining whether to keep a factor in the model can be
Including summarized as follows.
Terms in the 1. Effects: Engineering Significance
Model
2. Effects: Order of Magnitude
3. Effects: Statistical Significance
4. Effects: Probability Plots
5. Averages: Youden Plot
6. Residual Standard Deviation: Engineering Significance
7. Residual Standard Deviation: Statistical Significance
The first four criteria focus on effect estimates: three numeric criteria and one graphical
criterion. The fifth criterion focuses on averages. The last two criteria focus on the residual standard
deviation of the model. We discuss each of these seven criteria in detail in the following sections.
The last section summarizes the conclusions based on all of the criteria.
That is, declare a factor as important if its effect is more than 2 standard deviations away from 0
(0, by definition, meaning "no effect").
The "2" comes from normal theory (more specifically, a value of 1.96 yields a 95% confidence
interval). More precise values would come from t-distribution theory.
The difficulty with this is that in order to invoke this criterion we need the standard deviation, σ,
of an observation. This is problematic because
1. the engineer may not know σ;
2. the experiment might not have replication, and so a model-free estimate of σ is not
obtainable;
3. obtaining an estimate of σ by invoking the sometimes-employed assumption that 3-term
interactions and higher are negligible may be incorrect from an engineering point of view.
For the Eddy current example:
1. the engineer did not know σ;
2. the design (a 2³ full factorial) did not have replication;
This results in keeping three terms: X1 (3.10250), X2 (-.86750), and X2*X3 (.29750).
Normal The following plots show the normal probability plot of the effect estimates and the
Probability half-normal probability plot of the absolute value of the estimates for the Eddy current data.
Plot of
Effects and
Half-Normal
Probability
Plot of
Effects
For the example at hand, both probability plots clearly show two factors displaced off the line,
and from the third plot (with factor tags included), we see that those two factors are factor 1 and
factor 2. All of the remaining five effects are behaving like random drawings from a normal
distribution centered at zero, and so are deemed to be statistically non-significant. In conclusion,
this rule keeps two factors: X1 (3.10250) and X2 (-.86750).
Effects: A Youden plot can be used in the following way. Keep a factor as "important" if it is displaced
Youden Plot away from the central-tendency "bunch" in a Youden plot of high and low averages. By
definition, a factor is important when its average response for the low (-1) setting is significantly
different from its average response for the high (+1) setting. Conversely, if the low and high
averages are about the same, then the choice of setting makes little practical difference, and
such a factor would not be considered important. This fact in combination with the intrinsic benefits
of the Youden plot for comparing pairs of items leads to the technique of generating a Youden
plot of the low and high averages.
Youden Plot The following is the Youden plot of the effect estimates for the Eddy current data.
of Effect
Estimates
For the example at hand, the Youden plot clearly shows a cluster of points near the grand average
(2.65875) with two displaced points above (factor 1) and below (factor 2). Based on the Youden
plot, we conclude to keep two factors: X1 (3.10250) and X2 (-.86750).
In other words, how good does the engineer want the prediction equation to be? Unfortunately,
this engineering specification has not always been formulated and so this criterion can become
moot.
In the absence of a prior specified cutoff, a good rough rule for the minimum engineering residual
standard deviation is to keep adding terms until the residual standard deviation just dips below,
say, 5% of the current production average. For the Eddy current data, let's say that the average
detector has a sensitivity of 2.5 ohms. Then this would suggest that we would keep adding terms
to the model until the residual standard deviation falls below 5% of 2.5 ohms = 0.125 ohms.
Based on the minimum residual standard deviation criterion, and by scanning the far right column
of the Yates table, we would conclude to keep the following terms:
1. X1 (with a cumulative residual standard deviation = 0.57272)
2. X2 (with a cumulative residual standard deviation = 0.30429)
3. X2*X3 (with a cumulative residual standard deviation = 0.26737)
4. X1*X3 (with a cumulative residual standard deviation = 0.23341)
5. X3 (with a cumulative residual standard deviation = 0.19121)
6. X1*X2*X3 (with a cumulative residual standard deviation = 0.18031)
7. X1*X2 (with a cumulative residual standard deviation = 0.00000)
Note that we must include all terms in order to drive the residual standard deviation below 0.125.
Again, the 5% rule is a rough-and-ready rule that has no basis in engineering or statistics; it is
simply a convenient numerical benchmark. Ideally, the engineer has a better cutoff for the
residual standard deviation that is based on how well he/she wants the equation to perform in
practice. If such a number were
available, then for this criterion and data set we would select something less than the entire
collection of terms.
Conclusions In summary, the seven criteria for specifying "important" factors yielded the following for the
Eddy current data:
1. Effects, Engineering Significance: X1, X2
2. Effects, Numerically Significant: X1, X2
3. Effects, Statistically Significant: X1, X2, X2*X3
4. Effects, Probability Plots: X1, X2
5. Averages, Youden Plot: X1, X2
6. Residual SD, Engineering Significance: all 7 terms
7. Residual SD, Statistical Significance: not applicable
Such conflicting results are common. Arguably, the three most important criteria (listed in order
of most important) are:
4. Effects, Probability Plots: X1, X2
1. Effects, Engineering Significance: X1, X2
6. Residual SD, Engineering Significance: all 7 terms
Scanning all of the above, we thus declare the following consensus for the Eddy current data:
1. Important Factors: X1 and X2
2. Parsimonious Prediction Equation:

    Ŷ = 2.65875 + (3.10250/2) X1 - (0.86750/2) X2
where j represents all possible values that x can have and pj is the
probability at xj.
Probability For a continuous distribution, the probability density function (pdf)
Density describes the relative likelihood that the variate takes a value near x.
Function Since for continuous distributions the probability at a single point is
zero, probabilities are expressed in terms of an integral of the pdf between
two points.
For a discrete distribution, the pdf is the probability that the variate takes
the value x.
Cumulative The cumulative distribution function (cdf) is the probability that the
Distribution variable takes a value less than or equal to x. That is
Function
The horizontal axis is the allowable domain for the given probability
function. Since the vertical axis is a probability, it must fall between
zero and one. It increases from zero to one as we go from left to right on
the horizontal axis.
Percent The percent point function (ppf) is the inverse of the cumulative
Point distribution function. For this reason, the percent point function is also
Function commonly referred to as the inverse distribution function. That is, for a
distribution function we calculate the probability that the variable is less
than or equal to x for a given x. For the percent point function, we start
with the probability and compute the corresponding x for the cumulative
distribution. Mathematically, this can be expressed as

    Pr[X ≤ G(α)] = α   for 0 ≤ α ≤ 1

or alternatively

    G(α) = F⁻¹(α)

where F is the cumulative distribution function.
Since the horizontal axis is a probability, it goes from zero to one. The
vertical axis goes from the smallest to the largest value of the
cumulative distribution function.
Hazard The hazard function is the ratio of the probability density function to the
Function survival function, S(x).
Cumulative The cumulative hazard function is the integral of the hazard function. It
Hazard can be interpreted as the accumulated risk of failure up to time x;
Function formally, H(x) = -ln S(x), where S(x) is the survival function.
Survival Survival functions are most often used in reliability and related fields.
Function The survival function is the probability that the variate takes a value
greater than x.
Inverse Just as the percent point function is the inverse of the cumulative
Survival distribution function, the survival function also has an inverse function.
Function The inverse survival function can be defined in terms of the percent
point function.
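Python Sketch These probability functions are all available for most distributions in statistical libraries. A minimal sketch with SciPy's normal distribution showing how they relate:

from scipy import stats

d = stats.norm(loc=0, scale=1)
x, p = 1.5, 0.975

print(d.cdf(x))             # cumulative distribution function F(x)
print(d.ppf(p))             # percent point function, the inverse of F
print(d.sf(x))              # survival function S(x) = 1 - F(x)
print(d.isf(p))             # inverse survival function
print(d.pdf(x) / d.sf(x))   # hazard function h(x) = f(x)/S(x)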
PPCC Plots The PPCC plot is an effective graphical tool for selecting the member of
a distributional family with a single shape parameter that best fits a
given set of data.
Location The next plot shows the probability density function for a normal
Parameter distribution with a location parameter of 10 and a scale parameter of 1.
Scale
Parameter
In contrast, the next graph has a scale parameter of 1/3 (=0.333). The
effect of this scale parameter is to squeeze the pdf. That is, the
maximum y value is approximately 1.2 as opposed to 0.4 and the y
value is near zero at (+/-) 1 as opposed to (+/-) 3.
The effect of a scale parameter greater than one is to stretch the pdf. The
greater the magnitude, the greater the stretching. The effect of a scale
parameter less than one is to compress the pdf. The compressing
approaches a spike as the scale parameter goes to zero. A scale parameter
of one leaves the pdf unchanged.
Location The following graph shows the effect of both a location and a scale
and Scale parameter. The plot has been shifted right 10 units and stretched by a
Together factor of 3.
Standard The standard form of any distribution is the form that has location
Form parameter zero and scale parameter one.
It is common in statistical software packages to only compute the
standard form of the distribution. There are formulas for converting
from the standard form to the form with other location and scale
parameters. These formulas are independent of the particular probability
distribution.
Formulas The following are the formulas for computing various probability
for Location functions based on the standard form of the distribution. The parameter
and Scale a refers to the location parameter and the parameter b refers to the scale
Based on parameter. Shape parameters are not included.
the Standard
Cumulative Distribution Function F(x;a,b) = F((x-a)/b;0,1)
Form
Probability Density Function f(x;a,b) = (1/b)f((x-a)/b;0,1)
Percent Point Function G(α;a,b) = a + bG(α;0,1)
Hazard Function h(x;a,b) = (1/b)h((x-a)/b;0,1)
Cumulative Hazard Function H(x;a,b) = H((x-a)/b;0,1)
Survival Function S(x;a,b) = S((x-a)/b;0,1)
Inverse Survival Function Z(α;a,b) = a + bZ(α;0,1)
Random Numbers Y(a,b) = a + bY(0,1)
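Python Sketch A minimal sketch verifying three of these formulas numerically for the normal distribution (SciPy's loc and scale parameters play the roles of a and b):

import numpy as np
from scipy import stats

a, b = 10.0, 3.0
x = np.array([7.0, 10.0, 13.0])

std = stats.norm(0, 1)                # standard form
general = stats.norm(loc=a, scale=b)  # form with location a and scale b

print(np.allclose(general.cdf(x), std.cdf((x - a) / b)))      # F(x;a,b)
print(np.allclose(general.pdf(x), std.pdf((x - a) / b) / b))  # f(x;a,b)
print(np.allclose(general.ppf(0.9), a + b * std.ppf(0.9)))    # G(alpha;a,b)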
Relationship For the normal distribution, the location and scale parameters
to Mean and correspond to the mean and standard deviation, respectively. However,
Standard this is not necessarily true for other distributions. In fact, it is not true
Deviation for most distributions.
Various There are various methods, both numerical and graphical, for estimating
Methods the parameters of a probability distribution.
1. Method of moments
2. Maximum likelihood
3. Least squares
4. PPCC and probability plots
Software Most general purpose statistical software does not include explicit
method of moments parameter estimation commands. However, when
utilized, the method of moment formulas tend to be straightforward and
can be easily implemented in most statistical software programs.
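Python Sketch A minimal sketch of the method of moments for a gamma distribution: the sample mean and variance are equated to their theoretical values (mean = k·θ, variance = k·θ²) and solved for the shape k and scale θ:

import numpy as np

rng = np.random.default_rng(12)
y = rng.gamma(shape=2.0, scale=1.5, size=5000)

mean, var = y.mean(), y.var(ddof=1)
shape_hat = mean**2 / var    # estimate of k
scale_hat = var / mean       # estimate of theta
print(f"shape = {shape_hat:.3f}, scale = {scale_hat:.3f}")   # near 2.0 and 1.5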
Case Study The airplane glass failure time case study demonstrates the use of the
PPCC and probability plots in finding the best distributional model
and the parameter estimation of the distributional model.
Other For reliability applications, the hazard plot and the Weibull plot are
Graphical alternative graphical methods that are commonly used to estimate
Methods parameters.
Continuous
Distributions
Discrete
Distributions
where μ is the location parameter and σ is the scale parameter. The case
where μ = 0 and σ = 1 is called the standard normal distribution. The
equation for the standard normal distribution is

    f(x) = e^(-x²/2) / √(2π)
Cumulative The formula for the cumulative distribution function of the normal
Distribution distribution does not exist in a simple closed formula. It is computed
Function numerically.
The following is the plot of the normal cumulative distribution function.
Percent The formula for the percent point function of the normal distribution
Point does not exist in a simple closed formula. It is computed numerically.
Function
The following is the plot of the normal percent point function.
Hazard The formula for the hazard function of the normal distribution is
Function

    h(x) = φ(x) / (1 - Φ(x))

where φ is the probability density function and Φ is the cumulative
distribution function of the standard normal distribution.
Cumulative The normal cumulative hazard function can be computed from the
Hazard normal cumulative distribution function.
Function
The following is the plot of the normal cumulative hazard function.
Survival The normal survival function can be computed from the normal
Function cumulative distribution function.
The following is the plot of the normal survival function.
Inverse The normal inverse survival function can be computed from the normal
Survival percent point function.
Function
The following is the plot of the normal inverse survival function.
Parameter The location and scale parameters of the normal distribution can be
Estimation estimated with the sample mean and sample standard deviation,
respectively.
Comments For both theoretical and practical reasons, the normal distribution is
probably the most important distribution in statistics. For example,
● Many classical statistical tests are based on the assumption that
the data follow a normal distribution. This assumption should be
tested before applying these tests.
● In modeling applications, such as linear and non-linear regression,
the error term is often assumed to follow a normal distribution
with fixed location and scale.
● The normal distribution is used to find significance levels in many
hypothesis tests and confidence intervals.
Theoretical The normal distribution is widely used. Part of the appeal is that it is
Justification well behaved and mathematically tractable. However, the central limit
- Central theorem provides a theoretical basis for why it has wide applicability.
Limit
Theorem The central limit theorem basically states that as the sample size (N)
becomes large, the following occur:
1. The sampling distribution of the mean becomes approximately
normal regardless of the distribution of the original variable.
2. The sampling distribution of the mean is centered at the
population mean, μ, of the original variable. In addition, the
standard deviation of the sampling distribution of the mean
approaches σ/√N, where σ is the standard deviation of the
original variable.
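Python Sketch A minimal simulation of the central limit theorem using a skewed variable (the exponential distribution with scale 1, for which σ = 1): the means of repeated samples are centered at the population mean with standard deviation close to σ/√N:

import numpy as np

rng = np.random.default_rng(13)
for n in (2, 10, 50):
    means = rng.exponential(scale=1.0, size=(20000, n)).mean(axis=1)
    print(f"N={n:3d}  mean of means={means.mean():.3f}  "
          f"sd={means.std():.3f}  sigma/sqrt(N)={1/np.sqrt(n):.3f}")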
where A is the location parameter and (B - A) is the scale parameter. The case
where A = 0 and B = 1 is called the standard uniform distribution. The
equation for the standard uniform distribution is

    f(x) = 1   for 0 ≤ x ≤ 1
Cumulative The formula for the cumulative distribution function of the uniform
Distribution distribution is
Function

    F(x) = x   for 0 ≤ x ≤ 1
Percent The formula for the percent point function of the uniform distribution is
Point
Function

    G(p) = p   for 0 ≤ p ≤ 1
The following is the plot of the uniform percent point function.
Hazard The formula for the hazard function of the uniform distribution is
Function

    h(x) = 1/(1 - x)   for 0 ≤ x < 1

Cumulative The formula for the cumulative hazard function of the uniform
Hazard distribution is
Function

    H(x) = -ln(1 - x)   for 0 ≤ x < 1
The following is the plot of the uniform cumulative hazard function.
Survival The uniform survival function can be computed from the uniform cumulative
Function distribution function.
The following is the plot of the uniform survival function.
Inverse The uniform inverse survival function can be computed from the uniform
Survival percent point function.
Function
The following is the plot of the uniform inverse survival function.
Coefficient of (B - A) / (√3 (A + B))
Variation
Skewness 0
Kurtosis 9/5
Comments The uniform distribution defines equal probability over a given range for a
continuous distribution. For this reason, it is important as a reference
distribution.
One of the most important applications of the uniform distribution is in the
generation of random numbers. That is, almost all random number generators
generate random numbers on the (0,1) interval. For other distributions, some
transformation is applied to the uniform random numbers.
where t is the location parameter and s is the scale parameter. The case
where t = 0 and s = 1 is called the standard Cauchy distribution. The
equation for the standard Cauchy distribution reduces to

    f(x) = 1 / (π (1 + x²))
Cumulative The formula for the cumulative distribution function for the Cauchy
Distribution distribution is
Function

    F(x) = 0.5 + arctan((x - t)/s) / π

Percent The formula for the percent point function of the Cauchy distribution is
Point
Function

    G(p) = t + s tan(π (p - 0.5))
The following is the plot of the Cauchy percent point function.
Hazard The Cauchy hazard function can be computed from the Cauchy
Function probability density and cumulative distribution functions.
The following is the plot of the Cauchy hazard function.
Cumulative The Cauchy cumulative hazard function can be computed from the
Hazard Cauchy cumulative distribution function.
Function
The following is the plot of the Cauchy cumulative hazard function.
Survival The Cauchy survival function can be computed from the Cauchy
Function cumulative distribution function.
The following is the plot of the Cauchy survival function.
Inverse The Cauchy inverse survival function can be computed from the Cauchy
Survival percent point function.
Function
The following is the plot of the Cauchy inverse survival function.
Parameter The likelihood functions for the Cauchy maximum likelihood estimates
Estimation are given in chapter 16 of Johnson, Kotz, and Balakrishnan. These
equations typically must be solved numerically on a computer.
1.3.6.6.4. t Distribution
Probability The formula for the probability density function of the t distribution is
Density
Function
These plots all have a similar shape. The difference is in the heaviness
of the tails. In fact, the t distribution with ν equal to 1 is a Cauchy
distribution. The t distribution approaches a normal distribution as ν
becomes large. The approximation is quite good for values of ν > 30.
Cumulative The formula for the cumulative distribution function of the t distribution
Distribution is complicated and is not included here. It is given in the Evans,
Function Hastings, and Peacock book.
The following are the plots of the t cumulative distribution function with
the same values of ν as the pdf plots above.
Percent The formula for the percent point function of the t distribution does not
Point exist in a simple closed form. It is computed numerically.
Function
The following are the plots of the t percent point function with the same
values of ν as the pdf plots above.
Other Since the t distribution is typically used to develop hypothesis tests and
Probability confidence intervals and rarely for modeling applications, we omit the
Functions formulas and plots for the hazard, cumulative hazard, survival, and
inverse survival probability functions.
Parameter Since the t distribution is typically used to develop hypothesis tests and
Estimation confidence intervals and rarely for modeling applications, we omit any
discussion of parameter estimation.
Comments The t distribution is used in many cases for the critical regions for
hypothesis tests and in determining confidence intervals. The most
common example is testing if data are consistent with the assumed
process mean.
1.3.6.6.5. F Distribution
Probability The F distribution is the ratio of two chi-square distributions with
Density degrees of freedom ν1 and ν2, respectively, where each chi-square has
Function first been divided by its degrees of freedom. The formula for the
probability density function of the F distribution is

    f(x) = [Γ((ν1 + ν2)/2) (ν1/ν2)^(ν1/2) x^(ν1/2 - 1)] /
           [Γ(ν1/2) Γ(ν2/2) (1 + ν1 x/ν2)^((ν1 + ν2)/2)]

where ν1 and ν2 are the shape parameters and Γ is the gamma function.
The formula for the gamma function is

    Γ(a) = ∫ t^(a-1) e^(-t) dt, integrated from 0 to ∞
Percent The formula for the percent point function of the F distribution does not
Point exist in a simple closed form. It is computed numerically.
Function
The following is the plot of the F percent point function with the same
values of ν1 and ν2 as the pdf plots above.
Other Since the F distribution is typically used to develop hypothesis tests and
Probability confidence intervals and rarely for modeling applications, we omit the
Functions formulas and plots for the hazard, cumulative hazard, survival, and
inverse survival probability functions.
Common The formulas below are for the case where the location parameter is
Statistics zero and the scale parameter is one.
Mean
Mode
Coefficient of
Variation
Skewness
Parameter Since the F distribution is typically used to develop hypothesis tests and
Estimation confidence intervals and rarely for modeling applications, we omit any
discussion of parameter estimation.
Comments The F distribution is used in many cases for the critical regions for
hypothesis tests and in determining confidence intervals. Two common
examples are the analysis of variance and the F test to determine if the
variances of two populations are equal.
Cumulative The formula for the cumulative distribution function of the chi-square
Distribution distribution is
Function

    F(x) = γ(ν/2, x/2) / Γ(ν/2)   for x ≥ 0

where Γ is the gamma function and γ(s, t) is the lower incomplete gamma
function.
Percent The formula for the percent point function of the chi-square distribution
Point does not exist in a simple closed form. It is computed numerically.
Function
The following is the plot of the chi-square percent point function with
the same values of ν as the pdf plots above.
Common Mean ν
Statistics Median approximately ν - 2/3 for large ν
Mode ν - 2 for ν > 2
Range 0 to positive infinity
Standard Deviation √(2ν)
Coefficient of √(2/ν)
Variation
Skewness √(8/ν)
Kurtosis 3 + 12/ν
Comments The chi-square distribution is used in many cases for the critical regions
for hypothesis tests and in determining confidence intervals. Two
common examples are the chi-square test for independence in an RxC
contingency table and the chi-square test to determine if the standard
deviation of a population is equal to a pre-specified value.
Cumulative The formula for the cumulative distribution function of the exponential
Distribution distribution is
Function

    F(x) = 1 - e^(-x/β)   for x ≥ 0

where β is the scale parameter (location parameter zero).
Percent The formula for the percent point function of the exponential
Point distribution is
Function

    G(p) = -β ln(1 - p)
Hazard The formula for the hazard function of the exponential distribution is
Function

    h(x) = 1/β

Cumulative The formula for the cumulative hazard function of the exponential
Hazard distribution is
Function

    H(x) = x/β

Survival The formula for the survival function of the exponential distribution is
Function

    S(x) = e^(-x/β)

Inverse The formula for the inverse survival function of the exponential
Survival distribution is
Function

    Z(p) = -β ln(p)

Common Mean β
Statistics Median β ln 2
Mode Zero
Range Zero to plus infinity
Standard Deviation β
Coefficient of 1
Variation
Skewness 2
Kurtosis 9
Parameter For the full sample case, the maximum likelihood estimator of the scale
Estimation parameter is the sample mean. Maximum likelihood estimation for the
exponential distribution is discussed in the chapter on reliability
(Chapter 8). It is also discussed in chapter 19 of Johnson, Kotz, and
Balakrishnan.
where γ is the shape parameter, μ is the location parameter and α is the scale
parameter. The case where μ = 0 and α = 1 is called the standard Weibull
distribution. The case where μ = 0 is called the 2-parameter Weibull distribution.
The equation for the standard Weibull distribution reduces to

    f(x) = γ x^(γ-1) e^(-x^γ)   for x ≥ 0
Since the general form of probability functions can be expressed in terms of the
standard distribution, all subsequent formulas in this section are given for the
standard form of the function.
The following is the plot of the Weibull probability density function.
Cumulative The formula for the cumulative distribution function of the Weibull
Distribution distribution is
Function

    F(x) = 1 - e^(-x^γ)   for x ≥ 0

The following is the plot of the Weibull cumulative distribution function with the
same values of γ as the pdf plots above.
Percent The formula for the percent point function of the Weibull distribution is
Point
Function

    G(p) = (-ln(1 - p))^(1/γ)

The following is the plot of the Weibull percent point function with the same
values of γ as the pdf plots above.
Hazard The formula for the hazard function of the Weibull distribution is
Function

    h(x) = γ x^(γ-1)   for x ≥ 0

The following is the plot of the Weibull hazard function with the same values of
γ as the pdf plots above.
Cumulative The formula for the cumulative hazard function of the Weibull distribution is
Hazard
Function

    H(x) = x^γ   for x ≥ 0

The following is the plot of the Weibull cumulative hazard function with the same
values of γ as the pdf plots above.
Survival The formula for the survival function of the Weibull distribution is
Function

    S(x) = e^(-x^γ)   for x ≥ 0

The following is the plot of the Weibull survival function with the same values of
γ as the pdf plots above.
Inverse The formula for the inverse survival function of the Weibull distribution is
Survival
Function

    Z(p) = (-ln(p))^(1/γ)

The following is the plot of the Weibull inverse survival function with the same
values of γ as the pdf plots above.
Common The formulas below are with the location parameter equal to zero and the scale
Statistics parameter equal to one.
Mean Γ(1 + 1/γ)
Median (ln 2)^(1/γ)
Mode ((γ - 1)/γ)^(1/γ) for γ > 1; 0 for γ ≤ 1
Coefficient of Variation √(Γ(1 + 2/γ) - Γ²(1 + 1/γ)) / Γ(1 + 1/γ)
Parameter Maximum likelihood estimation for the Weibull distribution is discussed in the
Estimation Reliability chapter (Chapter 8). It is also discussed in Chapter 21 of Johnson, Kotz,
and Balakrishnan.
Software Most general purpose statistical software programs, including Dataplot, support at
least some of the probability functions for the Weibull distribution.
Cumulative The formula for the cumulative distribution function of the lognormal
Distribution distribution is
Function

    F(x) = Φ(ln(x)/σ)   for x ≥ 0

where σ is the shape parameter and Φ is the cumulative distribution
function of the standard normal distribution.
Percent The formula for the percent point function of the lognormal distribution
Point is
Function

    G(p) = exp(σ Φ^-1(p))
Hazard Function
The formula for the hazard function of the lognormal distribution is
    h(x) = φ(ln(x)/σ) / (σ·x·Φ(-ln(x)/σ))      for x > 0; σ > 0
where φ is the probability density function of the standard normal distribution.
Cumulative Hazard Function
The formula for the cumulative hazard function of the lognormal distribution is
    H(x) = -ln(1 - Φ(ln(x)/σ))      for x ≥ 0; σ > 0
Survival Function
The formula for the survival function of the lognormal distribution is
    S(x) = 1 - Φ(ln(x)/σ)      for x ≥ 0; σ > 0
The following is the plot of the lognormal survival function with the
same values of σ as the pdf plots above.
Inverse Survival Function
The formula for the inverse survival function of the lognormal distribution is
    Z(p) = exp(σ·Φ^(-1)(1 - p))      for 0 < p ≤ 1; σ > 0
Common Statistics
The formulas below are with the location parameter equal to zero and
the scale parameter m equal to one.
Mean                      exp(σ²/2)
Median                    Scale parameter m (= 1 if scale parameter not specified)
Mode                      exp(-σ²)
Skewness                  (exp(σ²) + 2)·sqrt(exp(σ²) - 1)
Kurtosis                  exp(4σ²) + 2·exp(3σ²) + 3·exp(2σ²) - 3
Coefficient of Variation  sqrt(exp(σ²) - 1)
Parameter Estimation
The maximum likelihood estimates for the scale parameter, m, and the
shape parameter, σ, are
    m̂ = exp((1/N)·Σ ln(Xi))
and
    σ̂ = sqrt((1/N)·Σ (ln(Xi) - ln(m̂))²)
where N is the sample size and the sums run over i = 1, ..., N.
Since the general form of probability functions can be expressed in terms of the
standard distribution, all subsequent formulas in this section are given for the
standard form of the function.
The following is the plot of the fatigue life probability density function.
Cumulative Distribution Function
The formula for the cumulative distribution function of the fatigue life distribution is
    F(x) = Φ((sqrt(x) - sqrt(1/x))/γ)      for x > 0; γ > 0
where Φ is the cumulative distribution function of the standard normal distribution.
Percent Point Function
The formula for the percent point function of the fatigue life distribution is
    G(p) = (1/4)·(γ·Φ^(-1)(p) + sqrt(4 + (γ·Φ^(-1)(p))²))²      for 0 < p < 1; γ > 0
where Φ^(-1) is the percent point function of the standard normal distribution. The
following is the plot of the fatigue life percent point function with the same
values of γ as the pdf plots above.
Hazard Function
The fatigue life hazard function can be computed from the fatigue life probability
density and cumulative distribution functions.
The following is the plot of the fatigue life hazard function with the same values
of γ as the pdf plots above.
Cumulative Hazard Function
The fatigue life cumulative hazard function can be computed from the fatigue life
cumulative distribution function.
The following is the plot of the fatigue life cumulative hazard function with the same
values of γ as the pdf plots above.
Survival Function
The fatigue life survival function can be computed from the fatigue life
cumulative distribution function.
The following is the plot of the fatigue life survival function with the same values of
γ as the pdf plots above.
Inverse Survival Function
The fatigue life inverse survival function can be computed from the fatigue life
percent point function.
The following is the plot of the fatigue life inverse survival function with the same
values of γ as the pdf plots above.
Common Statistics
The formulas below are with the location parameter equal to zero and the scale
parameter equal to one.
Mean                      1 + γ²/2
Coefficient of Variation  γ·sqrt(1 + 5γ²/4) / (1 + γ²/2)
Parameter Estimation
Maximum likelihood estimation for the fatigue life distribution is discussed in the
Reliability chapter.
Comments The fatigue life distribution is used extensively in reliability applications to model
failure times.
Software Some general purpose statistical software programs, including Dataplot, support
at least some of the probability functions for the fatigue life distribution. Support
for this distribution is likely to be available for statistical programs that
emphasize reliability applications.
Cumulative Distribution Function
The formula for the cumulative distribution function of the gamma distribution is
    F(x) = Γx(γ) / Γ(γ)      for x ≥ 0; γ > 0
where Γ is the gamma function and Γx(γ) is the incomplete gamma function, the
integral from 0 to x of t^(γ-1)·e^(-t) dt.
Percent Point Function
The formula for the percent point function of the gamma distribution
does not exist in a simple closed form. It is computed numerically.
The following is the plot of the gamma percent point function with the
same values of γ as the pdf plots above.
Hazard Function
The formula for the hazard function of the gamma distribution is
    h(x) = x^(γ-1)·e^(-x) / (Γ(γ) - Γx(γ))      for x ≥ 0; γ > 0
The following is the plot of the gamma hazard function with the same
values of γ as the pdf plots above.
Cumulative Hazard Function
The formula for the cumulative hazard function of the gamma distribution is
    H(x) = -ln(1 - Γx(γ)/Γ(γ))      for x ≥ 0; γ > 0
Survival Function
The formula for the survival function of the gamma distribution is
    S(x) = 1 - Γx(γ)/Γ(γ)      for x ≥ 0; γ > 0
Inverse Survival Function
The gamma inverse survival function does not exist in a simple closed
form. It is computed numerically.
The following is the plot of the gamma inverse survival function with
the same values of γ as the pdf plots above.
Common Statistics
The formulas below are with the location parameter equal to zero and
the scale parameter equal to one.
Mean                      γ
Mode                      γ - 1 (for γ ≥ 1)
Range                     Zero to positive infinity
Standard Deviation        sqrt(γ)
Skewness                  2/sqrt(γ)
Kurtosis                  3 + 6/γ
Coefficient of Variation  1/sqrt(γ)
The method of moments estimators of the gamma shape and scale parameters are
    γ̂ = (x̄/s)²
    β̂ = s²/x̄
where x̄ and s are the sample mean and standard deviation, respectively.
The equations for the maximum likelihood estimation of the shape and
scale parameters are given in Chapter 18 of Evans, Hastings, and
Peacock and Chapter 17 of Johnson, Kotz, and Balakrishnan. These
equations need to be solved numerically; this is typically accomplished
by using statistical software packages.
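A sketch of this in Python: the moment-based values serve as starting points and scipy performs the numerical maximum likelihood fit (the data are simulated for illustration):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    x = rng.gamma(shape=3.0, scale=1.5, size=500)        # simulated data

    xbar, s = x.mean(), x.std(ddof=1)
    gamma_mom, beta_mom = (xbar / s) ** 2, s**2 / xbar   # method of moments
    gamma_ml, loc, beta_ml = stats.gamma.fit(x, floc=0)  # numerical MLE
    print(gamma_mom, beta_mom, gamma_ml, beta_ml)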
Cumulative Distribution Function
The formula for the cumulative distribution function of the double
exponential distribution is
    F(x) = (1/2)·e^x              for x < 0
    F(x) = 1 - (1/2)·e^(-x)       for x ≥ 0
Percent Point Function
The formula for the percent point function of the double exponential
distribution is
    G(p) = ln(2p)                 for p ≤ 0.5
    G(p) = -ln(2·(1 - p))         for p > 0.5
Hazard Function
The formula for the hazard function of the double exponential distribution is
    h(x) = e^x / (2 - e^x)        for x < 0
    h(x) = 1                      for x ≥ 0
Cumulative Hazard Function
The formula for the cumulative hazard function of the double
exponential distribution is
    H(x) = -ln(1 - (1/2)·e^x)     for x < 0
    H(x) = x + ln(2)              for x ≥ 0
Survival The double exponential survival function can be computed from the
Function cumulative distribution function of the double exponential distribution.
The following is the plot of the double exponential survival function.
Inverse Survival Function
The formula for the inverse survival function of the double exponential
distribution is
    Z(p) = -ln(2p)                for p ≤ 0.5
    Z(p) = ln(2·(1 - p))          for p > 0.5
Common Statistics
Mean                      μ (the location parameter)
Median                    μ
Mode                      μ
Range                     Negative infinity to positive infinity
Standard Deviation        sqrt(2)·β
Skewness                  0
Kurtosis                  6
Coefficient of Variation  sqrt(2)·β/μ
Parameter Estimation
The maximum likelihood estimators of the location and scale parameters
of the double exponential distribution are
    μ̂ = the sample median
    β̂ = (1/N)·Σ |Xi - μ̂|
Cumulative Distribution Function
The formula for the cumulative distribution function of the power
normal distribution is
    F(x) = 1 - (Φ(-x))^p      for p > 0
where Φ is the cumulative distribution function of the standard normal
distribution and p is the power (shape) parameter.
Percent Point Function
The formula for the percent point function of the power normal distribution is
    G(q) = -Φ^(-1)((1 - q)^(1/p))      for 0 < q < 1
Hazard Function
The formula for the hazard function of the power normal distribution is
    h(x) = p·φ(x) / Φ(-x)
where φ is the probability density function of the standard normal
distribution. The following is the plot of the power normal hazard function with the
same values of p as the pdf plots above.
Cumulative Hazard Function
The formula for the cumulative hazard function of the power normal distribution is
    H(x) = -p·ln(Φ(-x))
Survival Function
The formula for the survival function of the power normal distribution is
    S(x) = (Φ(-x))^p
The following is the plot of the power normal survival function with the
same values of p as the pdf plots above.
Inverse Survival Function
The formula for the inverse survival function of the power normal distribution is
    Z(q) = -Φ^(-1)(q^(1/p))      for 0 < q < 1
The following is the plot of the power normal inverse survival function
with the same values of p as the pdf plots above.
Common The statistics for the power normal distribution are complicated and
Statistics require tables. Nelson discusses the mean, median, mode, and standard
deviation of the power normal distribution and provides references to
the appropriate tables.
Software Most general purpose statistical software programs do not support the
probability functions for the power normal distribution. Dataplot does
support them.
where p (also referred to as the power parameter) and σ are the shape parameters,
Φ is the cumulative distribution function of the standard normal distribution, and
φ is the probability density function of the standard normal distribution.
Cumulative Distribution Function
The formula for the cumulative distribution function of the power lognormal
distribution is
    F(x) = 1 - (Φ(-ln(x)/σ))^p      for x > 0; p, σ > 0
Percent Point Function
The formula for the percent point function of the power lognormal distribution is
    G(q) = exp(-σ·Φ^(-1)((1 - q)^(1/p)))      for 0 < q < 1
Hazard Function
The formula for the hazard function of the power lognormal distribution is
    h(x) = p·φ(ln(x)/σ) / (σ·x·Φ(-ln(x)/σ))
The following is the plot of the power lognormal hazard function with the same
values of p as the pdf plots above.
Cumulative Hazard Function
The formula for the cumulative hazard function of the power lognormal
distribution is
    H(x) = -p·ln(Φ(-ln(x)/σ))
The following is the plot of the power lognormal cumulative hazard function with
the same values of p as the pdf plots above.
Survival Function
The formula for the survival function of the power lognormal distribution is
    S(x) = (Φ(-ln(x)/σ))^p
The following is the plot of the power lognormal survival function with the same
values of p as the pdf plots above.
Inverse Survival Function
The formula for the inverse survival function of the power lognormal distribution is
    Z(q) = exp(-σ·Φ^(-1)(q^(1/p)))      for 0 < q < 1
The following is the plot of the power lognormal inverse survival function with the
same values of p as the pdf plots above.
Common The statistics for the power lognormal distribution are complicated and require
Statistics tables. Nelson discusses the mean, median, mode, and standard deviation of the
power lognormal distribution and provides references to the appropriate tables.
Parameter Nelson discusses maximum likelihood estimation for the power lognormal
Estimation distribution. These estimates need to be performed with computer software.
Software for maximum likelihood estimation of the parameters of the power
lognormal distribution is not as readily available as for other reliability
distributions such as the exponential, Weibull, and lognormal.
Software Most general purpose statistical software programs do not support the probability
functions for the power lognormal distribution. Dataplot does support them.
Cumulative Distribution Function
The Tukey-Lambda cumulative distribution function does not have a simple,
closed form. It is computed numerically.
The following is the plot of the Tukey-Lambda cumulative distribution
function with the same values of λ as the pdf plots above.
Percent Point Function
The formula for the percent point function of the standard form of the
Tukey-Lambda distribution is
    G(p) = (p^λ - (1 - p)^λ) / λ      for 0 ≤ p ≤ 1; λ ≠ 0
(in the limit λ = 0 this reduces to the logistic case G(p) = ln(p/(1 - p))).
Software Most general purpose statistical software programs do not support the
probability functions for the Tukey-Lambda distribution. Dataplot does
support them.
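Because the distribution is defined through its percent point function, random values can be generated by the inverse transform method. A sketch in Python (λ = 0.14 gives a distribution close to the normal):

    import numpy as np

    def tukey_lambda_ppf(p, lam):
        # Percent point function of the standard Tukey-Lambda distribution.
        if lam == 0:
            return np.log(p / (1 - p))      # limiting logistic case
        return (p**lam - (1 - p)**lam) / lam

    rng = np.random.default_rng(3)
    x = tukey_lambda_ppf(rng.uniform(size=1000), lam=0.14)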
The following is the plot of the Gumbel probability density function for
the minimum case.
The general formula for the probability density function of the Gumbel
(maximum) distribution is
    f(x) = (1/β)·e^(-(x-μ)/β)·exp(-e^(-(x-μ)/β))
where μ is the location parameter and β is the scale parameter.
The following is the plot of the Gumbel probability density function for
the maximum case.
Cumulative Distribution Function
The formula for the cumulative distribution function of the Gumbel
distribution (minimum) is, in the standard (μ = 0, β = 1) form,
    F(x) = 1 - exp(-e^x)
The formula for the cumulative distribution function of the Gumbel
distribution (maximum) is
    F(x) = exp(-e^(-x))
Percent Point Function
The formula for the percent point function of the Gumbel distribution
(minimum) is
    G(p) = ln(ln(1/(1 - p)))      for 0 < p < 1
The following is the plot of the Gumbel percent point function for the
minimum case.
The formula for the percent point function of the Gumbel distribution
(maximum) is
    G(p) = -ln(ln(1/p))      for 0 < p < 1
The following is the plot of the Gumbel percent point function for the
maximum case.
Hazard Function
The formula for the hazard function of the Gumbel distribution (minimum) is
    h(x) = e^x
The following is the plot of the Gumbel hazard function for the minimum case.
The formula for the hazard function of the Gumbel distribution (maximum) is
    h(x) = e^(-x)·exp(-e^(-x)) / (1 - exp(-e^(-x)))
The following is the plot of the Gumbel hazard function for the maximum case.
Cumulative Hazard Function
The formula for the cumulative hazard function of the Gumbel
distribution (minimum) is
    H(x) = e^x
The following is the plot of the Gumbel cumulative hazard function for
the minimum case.
The formula for the cumulative hazard function of the Gumbel
distribution (maximum) is
    H(x) = -ln(1 - exp(-e^(-x)))
The following is the plot of the Gumbel cumulative hazard function for
the maximum case.
Survival Function
The formula for the survival function of the Gumbel distribution (minimum) is
    S(x) = exp(-e^x)
The following is the plot of the Gumbel survival function for the minimum case.
The formula for the survival function of the Gumbel distribution (maximum) is
    S(x) = 1 - exp(-e^(-x))
The following is the plot of the Gumbel survival function for the maximum case.
Inverse Survival Function
The formula for the inverse survival function of the Gumbel distribution
(minimum) is
    Z(p) = ln(ln(1/p))      for 0 < p < 1
The following is the plot of the Gumbel inverse survival function for the
minimum case.
The formula for the inverse survival function of the Gumbel distribution
(maximum) is
    Z(p) = -ln(ln(1/(1 - p)))      for 0 < p < 1
The following is the plot of the Gumbel inverse survival function for the
maximum case.
Common Statistics
The formulas below are for the maximum order statistic case, in the
standard (μ = 0, β = 1) form.
Mean                      0.5772 (Euler's constant)
Standard Deviation        π/sqrt(6), approximately 1.2825
Skewness                  1.13955
Kurtosis                  5.4
Coefficient of Variation  approximately 2.222 (standard case)
where p and q are the shape parameters, a and b are the lower and upper bounds,
respectively, of the distribution, and B(p,q) is the beta function. The beta function has
the formula
    B(p, q) = the integral from 0 to 1 of t^(p-1)·(1 - t)^(q-1) dt
The case where a = 0 and b = 1 is called the standard beta distribution. The equation
for the standard beta distribution is
    f(x) = x^(p-1)·(1 - x)^(q-1) / B(p, q)      for 0 ≤ x ≤ 1; p, q > 0
Typically we define the general form of a distribution in terms of location and scale
parameters. The beta is different in that we define the general distribution in terms of
the lower and upper bounds. However, the location and scale parameters can be
defined in terms of the lower and upper limits as follows:
location = a
scale = b - a
Since the general form of probability functions can be expressed in terms of the
standard distribution, all subsequent formulas in this section are given for the standard
form of the function.
The following is the plot of the beta probability density function for four different
values of the shape parameters.
Cumulative Distribution Function
The formula for the cumulative distribution function of the beta distribution is also
called the incomplete beta function ratio (commonly denoted by Ix) and is defined as
    F(x) = Ix(p, q) = Bx(p, q) / B(p, q)      for 0 ≤ x ≤ 1; p, q > 0
where Bx(p, q) is the incomplete beta function, the integral from 0 to x of
t^(p-1)·(1 - t)^(q-1) dt.
Percent The formula for the percent point function of the beta distribution does not exist in a
Point simple closed form. It is computed numerically.
Function
The following is the plot of the beta percent point function with the same values of the
shape parameters as the pdf plots above.
Other Since the beta distribution is not typically used for reliability applications, we omit the
Probability formulas and plots for the hazard, cumulative hazard, survival, and inverse survival
Functions probability functions.
Common Statistics
The formulas below are for the case where the lower limit is zero and the upper limit is one.
Mean                      p/(p + q)
Mode                      (p - 1)/(p + q - 2)      for p, q > 1
Range                     0 to 1
Standard Deviation        sqrt(p·q / ((p + q)²·(p + q + 1)))
Coefficient of Variation  sqrt(q / (p·(p + q + 1)))
Skewness                  2·(q - p)·sqrt(p + q + 1) / ((p + q + 2)·sqrt(p·q))
Parameter Estimation
First consider the case where a and b are assumed to be known. For this case, the
method of moments estimates are
    p̂ = x̄·(x̄·(1 - x̄)/s² - 1)
    q̂ = (1 - x̄)·(x̄·(1 - x̄)/s² - 1)
where x̄ is the sample mean and s² is the sample variance. If a and b are not 0 and 1,
respectively, then replace x̄ with (x̄ - a)/(b - a) and s² with s²/(b - a)² in the above
equations.
For the case when a and b are known, the maximum likelihood estimates can be
obtained by solving the following set of equations
    ψ(p̂) - ψ(p̂ + q̂) = (1/n)·Σ ln(Xi)
    ψ(q̂) - ψ(p̂ + q̂) = (1/n)·Σ ln(1 - Xi)
where ψ is the digamma function. The maximum likelihood equations for the case
when a and b are not known are given in pages 221-235 of Volume II of Johnson,
Kotz, and Balakrishnan.
Software Most general purpose statistical software programs, including Dataplot, support at
least some of the probability functions for the beta distribution.
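A sketch of the method of moments estimates above in Python, for the standard beta case (the data are simulated for illustration):

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.beta(a=2.0, b=5.0, size=500)    # simulated data with p = 2, q = 5

    xbar, s2 = x.mean(), x.var(ddof=1)
    common = xbar * (1 - xbar) / s2 - 1
    p_hat, q_hat = xbar * common, (1 - xbar) * common
    print(p_hat, q_hat)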
where the probability of exactly x successes in n trials is
    P(x; p, n) = C(n, x)·p^x·(1 - p)^(n - x)
with binomial coefficient C(n, x) = n!/(x!·(n - x)!).
The following is the plot of the binomial probability density function for four
values of p and n = 100.
The following is the plot of the binomial cumulative distribution function with
the same values of p as the pdf plots above.
Percent The binomial percent point function does not exist in simple closed form. It is
Point computed numerically. Note that because this is a discrete distribution that is
Function only defined for integer values of x, the percent point function is not smooth in
the way the percent point function typically is for a continuous distribution.
The following is the plot of the binomial percent point function with the same
values of p as the pdf plots above.
Common Statistics
Mean                      n·p
Mode                      the largest integer less than or equal to p·(n + 1)
Range                     0 to n
Standard Deviation        sqrt(n·p·(1 - p))
Coefficient of Variation  sqrt((1 - p)/(n·p))
Skewness                  (1 - 2p) / sqrt(n·p·(1 - p))
Kurtosis                  3 - 6/n + 1/(n·p·(1 - p))
Comments The binomial distribution is probably the most commonly used discrete
distribution.
Software Most general purpose statistical software programs, including Dataplot, support
at least some of the probability functions for the binomial distribution.
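A sketch of such support in Python via scipy, for n = 100 trials and success probability p = 0.1:

    from scipy import stats

    n, p = 100, 0.1
    b = stats.binom(n, p)
    print(b.pmf(10))          # P(X = 10)
    print(b.cdf(15))          # P(X <= 15)
    print(b.ppf(0.95))        # percent point function (a step function)
    print(b.mean(), b.std())  # n*p and sqrt(n*p*(1-p))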
Percent The Poisson percent point function does not exist in simple closed form.
Point It is computed numerically. Note that because this is a discrete
Function distribution that is only defined for integer values of x, the percent point
function is not smooth in the way the percent point function typically is
for a continuous distribution.
The following is the plot of the Poisson percent point function with the
same values of λ as the pdf plots above.
Common Statistics
Mean                      λ
Mode                      For non-integer λ, it is the largest integer less
                          than λ. For integer λ, x = λ and x = λ - 1 are
                          both modes.
Range                     0 to positive infinity
Standard Deviation        sqrt(λ)
Coefficient of Variation  1/sqrt(λ)
Skewness                  1/sqrt(λ)
Kurtosis                  3 + 1/λ
[Table: upper critical values of the chi-square distribution, ν = 11 to 100 degrees of freedom, at probability levels 0.90, 0.95, 0.975, 0.99, and 0.999; the tabular layout was lost in extraction.]
[Table: lower critical values of the chi-square distribution, ν = 10 to 100 degrees of freedom, at probability levels 0.10, 0.05, 0.025, 0.01, and 0.001; the tabular layout was lost in extraction.]
Critical values of the t* distribution for testing the output of a linear
calibration line at 3 points (columns are pairs of ν and the critical value):
1 37.544 61 2.455
2 7.582 62 2.454
3 4.826 63 2.453
4 3.941 64 2.452
5 3.518 65 2.451
6 3.274 66 2.450
7 3.115 67 2.449
8 3.004 68 2.448
9 2.923 69 2.447
10 2.860 70 2.446
11 2.811 71 2.445
12 2.770 72 2.445
13 2.737 73 2.444
14 2.709 74 2.443
15 2.685 75 2.442
16 2.665 76 2.441
17 2.647 77 2.441
18 2.631 78 2.440
19 2.617 79 2.439
20 2.605 80 2.439
21 2.594 81 2.438
22 2.584 82 2.437
23 2.574 83 2.437
24 2.566 84 2.436
25 2.558 85 2.436
26 2.551 86 2.435
27 2.545 87 2.435
28 2.539 88 2.434
29 2.534 89 2.434
30 2.528 90 2.433
31 2.524 91 2.432
32 2.519 92 2.432
33 2.515 93 2.431
34 2.511 94 2.431
35 2.507 95 2.431
36 2.504 96 2.430
37 2.501 97 2.430
38 2.498 98 2.429
39 2.495 99 2.429
40 2.492 100 2.428
41 2.489 101 2.428
42 2.487 102 2.428
43 2.484 103 2.427
44 2.482 104 2.427
45 2.480 105 2.426
46 2.478 106 2.426
47 2.476 107 2.426
48 2.474 108 2.425
49 2.472 109 2.425
50 2.470 110 2.425
51 2.469 111 2.424
52 2.467 112 2.424
53 2.466 113 2.424
54 2.464 114 2.423
55 2.463 115 2.423
56 2.461 116 2.423
57 2.460 117 2.422
58 2.459 118 2.422
59 2.457 119 2.422
60 2.456 120 2.422
Critical values of the normal PPCC for testing if data come from
a normal distribution
N 0.01 0.05
3 0.8687 0.8790
4 0.8234 0.8666
5 0.8240 0.8786
6 0.8351 0.8880
7 0.8474 0.8970
8 0.8590 0.9043
9 0.8689 0.9115
10 0.8765 0.9173
11 0.8838 0.9223
12 0.8918 0.9267
13 0.8974 0.9310
14 0.9029 0.9343
15 0.9080 0.9376
16 0.9121 0.9405
17 0.9160 0.9433
18 0.9196 0.9452
19 0.9230 0.9479
20 0.9256 0.9498
21 0.9285 0.9515
22 0.9308 0.9535
23 0.9334 0.9548
24 0.9356 0.9564
25 0.9370 0.9575
26 0.9393 0.9590
27 0.9413 0.9600
28 0.9428 0.9615
29 0.9441 0.9622
30 0.9462 0.9634
31 0.9476 0.9644
32 0.9490 0.9652
33 0.9505 0.9661
34 0.9521 0.9671
35 0.9530 0.9678
36 0.9540 0.9686
37 0.9551 0.9693
38 0.9555 0.9700
39 0.9568 0.9704
40 0.9576 0.9712
41 0.9589 0.9719
42 0.9593 0.9723
43 0.9609 0.9730
44 0.9611 0.9734
45 0.9620 0.9739
46 0.9629 0.9744
47 0.9637 0.9748
48 0.9640 0.9753
49 0.9643 0.9758
50 0.9654 0.9761
55 0.9683 0.9781
60 0.9706 0.9797
65 0.9723 0.9809
70 0.9742 0.9822
75 0.9758 0.9831
80 0.9771 0.9841
85 0.9784 0.9850
90 0.9797 0.9857
95 0.9804 0.9864
100 0.9814 0.9869
110 0.9830 0.9881
120 0.9841 0.9889
130 0.9854 0.9897
140 0.9865 0.9904
150 0.9871 0.9909
160 0.9879 0.9915
170 0.9887 0.9919
180 0.9891 0.9923
190 0.9897 0.9927
200 0.9903 0.9930
210 0.9907 0.9933
220 0.9910 0.9936
230 0.9914 0.9939
240 0.9917 0.9941
250 0.9921 0.9943
260 0.9924 0.9945
270 0.9926 0.9947
280 0.9929 0.9949
290 0.9931 0.9951
300 0.9933 0.9952
310 0.9936 0.9954
320 0.9937 0.9955
330 0.9939 0.9956
340 0.9941 0.9957
350 0.9942 0.9958
Table of Contents for Section 4
1. Introduction
2. By Problem Category
If the above assumptions are satisfied, the process is said to be
statistically "in control" with the core characteristic of having
"predictability". That is, probability statements can be made about the
process, not only in the past, but also in the future.
An appropriate model for an "in control" process is
Yi = C + Ei
where C is a constant (the "deterministic" or "structural" component),
and where Ei is the error term (or "random" component).
3. The model Yi = C + Ei is valid.
4. Distributional tests assist in determining a better estimator, if
needed.
5. Simulator tools (namely bootstrapping) provide values for the
uncertainty of alternative estimators.
Assumptions If one or more of the above assumptions is not satisfied, then we use
not satisfied EDA techniques, or some mix of EDA and classical techniques, to
find a more appropriate model for the data. That is,
Yi = D + Ei
where D is the deterministic part and E is an error component.
If the data are not random, then we may investigate fitting some
simple time series models to the data. If the constant location and
scale assumptions are violated, we may need to investigate the
measurement process to see if there is an explanation.
The assumptions on the error term are still quite relevant in the sense
that for an appropriate model the error component should follow the
assumptions. The criterion for validating the model, or comparing
competing models, is framed in terms of these assumptions.
Multivariable Although the case studies in this chapter utilize univariate data, the
data assumptions above are relevant for multivariable data as well.
If the data are not univariate, then we are trying to find a model
Yi = F(X1, ..., Xk) + Ei
where F is some function based on one or more variables. The error
component, which is a univariate data set, of a good model should
satisfy the assumptions given above. The criterion for validating and
comparing models is based on how well the error component follows
these assumptions.
The load cell calibration case study in the process modeling chapter
shows an example of this in the regression context.
First three case studies utilize data with known characteristics
The first three case studies utilize data that are randomly generated
from the following distributions:
● normal distribution with mean 0 and standard deviation 1
● uniform distribution with mean 0.5 and standard deviation 0.2887
  (uniform over the interval (0,1))
● random walk
The other univariate case studies utilize data from scientific processes.
The goal is to determine if
Yi = C + Ei
is a reasonable model. This is done by testing the underlying
assumptions. If the assumptions are satisfied, then an estimate of C
and an estimate of the uncertainty of C are computed. If the
assumptions are not satisfied, we attempt to find a model where the
error component does satisfy the underlying assumptions.
Graphical To test the underlying assumptions, each data set is analyzed using
methods that four graphical methods that are particularly suited for this purpose:
are applied to 1. run sequence plot which is useful for detecting shifts of location
the data or scale
2. lag plot which is useful for detecting non-randomness in the
data
3. histogram which is useful for trying to determine the underlying
distribution
4. normal probability plot for deciding whether the data follow the
normal distribution
There are a number of other techniques for addressing the underlying
assumptions as well; these four plots are commonly combined on a single
page as a 4-plot.
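A sketch of such a 4-plot in Python (matplotlib and scipy rather than Dataplot; the data are simulated):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    rng = np.random.default_rng(5)
    y = rng.normal(size=500)                        # simulated data

    fig, ax = plt.subplots(2, 2, figsize=(8, 6))
    ax[0, 0].plot(y); ax[0, 0].set_title("Run Sequence Plot")
    ax[0, 1].plot(y[:-1], y[1:], "."); ax[0, 1].set_title("Lag Plot")
    ax[1, 0].hist(y, bins=25); ax[1, 0].set_title("Histogram")
    stats.probplot(y, dist="norm", plot=ax[1, 1])   # normal probability plot
    plt.tight_layout(); plt.show()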
Quantitative The normal and uniform random number data sets are also analyzed
methods that with the following quantitative techniques, which are explained in
are applied to more detail in an earlier section:
the data 1. Summary statistics which include:
❍ mean
❍ standard deviation
❍ autocorrelation coefficient to test for randomness
❍ normal and uniform probability plot correlation
coefficients (ppcc) to test for a normal or uniform
distribution, respectively
❍ Wilk-Shapiro test for a normal distribution
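A sketch of these quantitative checks in Python (scipy's probability-plot routine returns the PPCC as the plot's correlation coefficient; the data are simulated):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    y = rng.normal(size=500)                        # simulated data

    print(y.mean(), y.std(ddof=1))                  # summary statistics
    r1 = np.corrcoef(y[:-1], y[1:])[0, 1]           # lag-1 autocorrelation
    (osm, osr), (slope, icept, ppcc) = stats.probplot(y, dist="norm")
    w, pval = stats.shapiro(y)                      # Wilk-Shapiro test
    print(r1, ppcc, w, pval)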
Reliability: Airplane Glass Failure Time
Multi-Factor: Ceramic Strength
Resulting Data
The following is the set of normal random numbers used for this case study.
4-Plot of Data
Run Sequence Plot
Lag Plot
Histogram (with overlaid Normal PDF)
Normal Probability Plot
SUMMARY
LOCATION MEASURES                      DISPERSION MEASURES
  MIDRANGE    =  0.3945000E+00           RANGE        =  0.6083000E+01
  MEAN        = -0.2935997E-02           STAND. DEV.  =  0.1021041E+01
  MIDMEAN     =  0.1623600E-01           AV. AB. DEV. =  0.8174360E+00
  MEDIAN      = -0.9300000E-01           MINIMUM      = -0.2647000E+01
                                         LOWER QUART. = -0.7204999E+00
                                         LOWER HINGE  = -0.7210000E+00
                                         UPPER HINGE  =  0.6455001E+00
                                         UPPER QUART. =  0.6447501E+00
                                         MAXIMUM      =  0.3436000E+01
RANDOMNESS MEASURES                    DISTRIBUTIONAL MEASURES
  AUTOCO COEF =  0.4505888E-01           ST. 3RD MOM. =  0.3072273E+00
                                         ST. 4TH MOM. =  0.2990314E+01
                                         ST. WILK-SHA =  0.7515639E+01
                                         UNIFORM PPCC =  0.9756625E+00
                                         NORMAL PPCC  =  0.9961721E+00
                                         TUK -.5 PPCC =  0.8366451E+00
                                         CAUCHY PPCC  =  0.4922674E+00
Location One way to quantify a change in location over time is to fit a straight line to the data set,
using the index variable X = 1, 2, ..., N, with N denoting the number of observations. If
there is no significant drift in the location, the slope parameter should be zero. For this
data set, Dataplot generated the following output:
The slope parameter, A1, has a t value of -0.13 which is statistically not significant. This
indicates that the slope can in fact be considered zero.
Variation One simple way to detect a change in variation is with a Bartlett test, after dividing the
data set into several equal-sized intervals. The choice of the number of intervals is
somewhat arbitrary, although values of 4 or 8 are reasonable. Dataplot generated the
following output for the Bartlett test.
BARTLETT TEST
(STANDARD DEFINITION)
NULL HYPOTHESIS UNDER TEST--ALL SIGMA(I) ARE EQUAL
TEST:
DEGREES OF FREEDOM = 3.000000
In this case, the Bartlett test indicates that the standard deviations are not significantly
different in the 4 intervals.
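A sketch of the same check in Python (scipy's bartlett routine; the data here are simulated stand-ins for the 500 normal random numbers):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    y = rng.normal(size=500)

    quarters = np.array_split(y, 4)       # four equal-sized intervals
    stat, pval = stats.bartlett(*quarters)
    print(stat, pval)                     # a large p-value: no change in scale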
Randomness There are many ways in which data can be non-random. However, most common forms
of non-randomness can be detected with a few simple tests. The lag plot in the 4-plot
above is a simple graphical technique.
Another check is an autocorrelation plot that shows the autocorrelations for various lags.
Confidence bands can be plotted at the 95% and 99% confidence levels. Points outside
this band indicate statistically significant values (lag 0 is always 1). Dataplot generated
the following autocorrelation plot.
The lag 1 autocorrelation, which is generally the one of most interest, is 0.045. The
critical values at the 5% significance level are -0.087 and 0.087. Thus, since 0.045 is in
the interval, the lag 1 autocorrelation is not statistically significant, so there is no
evidence of non-randomness.
A common test for randomness is the runs test.
RUNS UP
STATISTIC = NUMBER OF RUNS UP
OF LENGTH EXACTLY I
I STAT EXP(STAT) SD(STAT) Z
Values in the column labeled "Z" greater than 1.96 or less than -1.96 are statistically
significant at the 5% level. The runs test does not indicate any significant
non-randomness.
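A self-contained sketch of a simpler variant, the runs up-and-down test: for a random sequence of length N the total number of runs R has mean (2N - 1)/3 and variance (16N - 29)/90, so Z outside +/-1.96 flags non-randomness at the 5% level. (This is a simplified substitute for the run-length breakdown shown above.)

    import numpy as np

    def runs_up_down_z(y):
        signs = np.sign(np.diff(y))
        signs = signs[signs != 0]                       # drop ties
        runs = 1 + np.count_nonzero(signs[1:] != signs[:-1])
        n = len(y)
        return (runs - (2 * n - 1) / 3) / np.sqrt((16 * n - 29) / 90)

    rng = np.random.default_rng(8)
    print(runs_up_down_z(rng.normal(size=500)))         # within +/-1.96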
Distributional Probability plots are a graphical test for assessing if a particular distribution provides an
Analysis adequate fit to a data set.
A quantitative enhancement to the probability plot is the correlation coefficient of the
points on the probability plot. For this data set the correlation coefficient is 0.996. Since
this is greater than the critical value of 0.987 (this is a tabulated value), the normality
assumption is not rejected.
Chi-square and Kolmogorov-Smirnov goodness-of-fit tests are alternative methods for
assessing distributional adequacy. The Wilk-Shapiro and Anderson-Darling tests can be
used to test for normality. Dataplot generates the following output for the
Anderson-Darling normality test.
1. STATISTICS:
NUMBER OF OBSERVATIONS = 500
MEAN = -0.2935997E-02
STANDARD DEVIATION = 1.021041
2. CRITICAL VALUES:
90 % POINT = 0.6560000
95 % POINT = 0.7870000
97.5 % POINT = 0.9180000
99 % POINT = 1.092000
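A sketch of the same test in Python; scipy's anderson routine returns the statistic together with its own table of critical values:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(9)
    y = rng.normal(size=500)

    result = stats.anderson(y, dist="norm")
    print(result.statistic)
    print(result.critical_values)      # at the 15, 10, 5, 2.5, 1 percent levels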
Outlier A test for outliers is the Grubbs test. Dataplot generated the following output for Grubbs'
Analysis test.
1. STATISTICS:
NUMBER OF OBSERVATIONS = 500
MINIMUM = -2.647000
MEAN = -0.2935997E-02
MAXIMUM = 3.436000
STANDARD DEVIATION = 1.021041
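A self-contained sketch of Grubbs' test: the statistic is the largest absolute deviation from the mean in standard deviation units, compared against a t-based critical value:

    import numpy as np
    from scipy import stats

    def grubbs(y, alpha=0.05):
        n = len(y)
        g = np.max(np.abs(y - y.mean())) / y.std(ddof=1)
        t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
        g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
        return g, g_crit                   # an outlier is flagged when g > g_crit

    rng = np.random.default_rng(10)
    print(grubbs(rng.normal(size=500)))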
Model Since the underlying assumptions were validated both graphically and analytically, we
conclude that a reasonable model for the data is:
Yi = -0.00294 + Ei
We can express the uncertainty for C as the 95% confidence interval
(-0.09266,0.086779).
Univariate It is sometimes useful and convenient to summarize the above results in a report. The
Report report for the 500 normal random numbers follows.
2: Location
Mean = -0.00294
Standard Deviation of Mean = 0.045663
95% Confidence Interval for Mean = (-0.09266,0.086779)
Drift with respect to location? = NO
3: Variation
Mean = -0.00294
Standard Deviation = 1.021042
95% Confidence Interval for SD = (0.961437,1.088585)
Drift with respect to variation?
(based on Bartlett's test on quarters
of the data) = NO
4: Distribution
Normal PPCC = 0.996173
Data are Normal?
(as measured by Normal PPCC) = YES
5: Randomness
Autocorrelation = 0.045059
Data are Random?
(as measured by autocorrelation) = YES
6: Statistical Control
(i.e., no drift in location or scale,
data are random, distribution is
fixed, here we are testing only for
fixed normal)
Data Set is in Statistical Control? = YES
7: Outliers?
(as determined by Grubbs' test) = NO
Click on the links below to start Dataplot and run this case study
yourself. Each step may use results from previous steps, so please be
patient. Wait until the software verifies that the current step is
complete before clicking on the next step.
The links in this column will connect you with more detailed
information about each analysis step from the case study description.
1. Generate a run sequence plot. 1. The run sequence plot indicates that
there are no shifts of location or
scale.
2. Generate a lag plot. 2. The lag plot does not indicate any
significant patterns (which would
show the data were not random).
5. Check for normality by computing the 5. The normal probability plot correlation
normal probability plot correlation coefficient is 0.996. At the 5% level,
coefficient. we cannot reject the normality assumption.
6. Check for outliers using Grubbs' test. 6. Grubbs' test detects no outliers at the
5% level.
Resulting Data
The following is the set of uniform random numbers used for this case study.
4-Plot of Data
Run Sequence Plot
Lag Plot
Histogram (with overlaid Normal PDF)
This plot shows that a normal distribution is a poor fit. The flatness of
the histogram suggests that a uniform distribution might be a better fit.
Histogram (with overlaid Uniform PDF)
Since the histogram from the 4-plot suggested that the uniform
distribution might be a good fit, we overlay a uniform distribution on
top of the histogram. This indicates a much better fit than a normal
distribution.
Normal Probability Plot
As with the histogram, the normal probability plot shows that the
normal distribution does not fit these data well.
Uniform Probability Plot
Better Model Since the data follow the underlying assumptions, but with a uniform
distribution rather than a normal distribution, we would still like to
characterize C by a typical value plus or minus a confidence interval.
In this case, we would like to find a location estimator with the
smallest variability.
The bootstrap plot is an ideal tool for this purpose. The following plots
show the bootstrap plot, with the corresponding histogram, for the
mean, median, mid-range, and median absolute deviation.
Bootstrap
Plots
Mid-Range is From the above histograms, it is obvious that for these data, the
Best mid-range is far superior to the mean or median as an estimate for
location.
Using the mean, the location estimate is 0.507 and a 95% confidence
interval for the mean is (0.482,0.534). Using the mid-range, the
location estimate is 0.499 and the 95% confidence interval for the
mid-range is (0.497,0.503).
Although the values for the location are similar, the difference in the
uncertainty intervals is quite large.
Note that in the case of a uniform distribution it is known theoretically
that the mid-range is the best linear unbiased estimator for location.
However, in many applications, the most appropriate estimator will not
be known, or it will be mathematically intractable to determine a valid
confidence interval. The bootstrap provides a method for determining
(and comparing) confidence intervals in these cases.
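A sketch of such a bootstrap comparison in Python (simulated uniform data; 1000 resamples):

    import numpy as np

    def midrange(v):
        return (v.min() + v.max()) / 2

    rng = np.random.default_rng(11)
    y = rng.uniform(size=500)                       # simulated data

    for est in (np.mean, np.median, midrange):
        vals = [est(rng.choice(y, size=len(y), replace=True))
                for _ in range(1000)]
        lo, hi = np.percentile(vals, [2.5, 97.5])   # 95% bootstrap interval
        print(est.__name__, (lo, hi))               # mid-range: narrowest interval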
SUMMARY
LOCATION MEASURES                      DISPERSION MEASURES
  MIDRANGE    =  0.4997850E+00           RANGE        =  0.9945900E+00
  MEAN        =  0.5078304E+00           STAND. DEV.  =  0.2943252E+00
  MIDMEAN     =  0.5045621E+00           AV. AB. DEV. =  0.2526468E+00
  MEDIAN      =  0.5183650E+00           MINIMUM      =  0.2490000E-02
                                         LOWER QUART. =  0.2508093E+00
                                         LOWER HINGE  =  0.2505935E+00
                                         UPPER HINGE  =  0.7594775E+00
                                         UPPER QUART. =  0.7591152E+00
                                         MAXIMUM      =  0.9970800E+00
RANDOMNESS MEASURES                    DISTRIBUTIONAL MEASURES
  AUTOCO COEF = -0.3098569E-01           ST. 3RD MOM. = -0.3443941E-01
                                         ST. 4TH MOM. =  0.1796969E+01
                                         ST. WILK-SHA = -0.2004886E+02
                                         UNIFORM PPCC =  0.9995682E+00
                                         NORMAL PPCC  =  0.9771602E+00
Note that under the distributional measures the uniform probability plot correlation
coefficient (PPCC) value is significantly larger than the normal PPCC value. This is
evidence that the uniform distribution fits these data better than does a normal
distribution.
Location One way to quantify a change in location over time is to fit a straight line to the data set
using the index variable X = 1, 2, ..., N, with N denoting the number of observations. If
there is no significant drift in the location, the slope parameter should be zero. For this
data set, Dataplot generated the following output:
The slope parameter, A1, has a t value of -0.66 which is statistically not significant. This
indicates that the slope can in fact be considered zero.
Variation
One simple way to detect a change in variation is with a Bartlett test after dividing the
data set into several equal-sized intervals. However, the Bartlett test is not robust for
non-normality. Since we know this data set is not approximated well by the normal
distribution, we use the alternative Levene test. In particular, we use the Levene test
based on the median rather than the mean. The choice of the number of intervals is somewhat
arbitrary, although values of 4 or 8 are reasonable. Dataplot generated the following
output for the Levene test.
1. STATISTICS
NUMBER OF OBSERVATIONS = 500
NUMBER OF GROUPS = 4
LEVENE F TEST STATISTIC = 0.7983007E-01
90 % POINT = 2.094885
95 % POINT = 2.622929
99 % POINT = 3.821479
99.9 % POINT = 5.506884
In this case, the Levene test indicates that the standard deviations are not significantly
different in the 4 intervals.
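A sketch of the median-based Levene test in Python (scipy calls this center="median"; the data are simulated):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(12)
    y = rng.uniform(size=500)

    quarters = np.array_split(y, 4)
    stat, pval = stats.levene(*quarters, center="median")
    print(stat, pval)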
Randomness There are many ways in which data can be non-random. However, most common forms
of non-randomness can be detected with a few simple tests. The lag plot in the 4-plot in
the previous section is a simple graphical technique.
Another check is an autocorrelation plot that shows the autocorrelations for various lags.
Confidence bands can be plotted using 95% and 99% confidence levels. Points outside
this band indicate statistically significant values (lag 0 is always 1). Dataplot generated
the following autocorrelation plot.
The lag 1 autocorrelation, which is generally the one of most interest, is 0.03. The critical
values at the 5% significance level are -0.087 and 0.087. This indicates that the lag 1
autocorrelation is not statistically significant, so there is no evidence of non-randomness.
A common test for randomness is the runs test.
RUNS UP
STATISTIC = NUMBER OF RUNS UP
OF LENGTH EXACTLY I
I STAT EXP(STAT) SD(STAT) Z
Values in the column labeled "Z" greater than 1.96 or less than -1.96 are statistically
significant at the 5% level. There is a statistically significant value for runs of
length 7; however, further examination of the table shows that there is in fact a
single run of length 7 when the expected number of such runs is near zero. This is
not sufficient evidence to conclude that the data are non-random.
Distributional Analysis
Probability plots are a graphical test for assessing whether a particular distribution
provides an adequate fit to a data set.
A quantitative enhancement to the probability plot is the correlation coefficient of the
points on the probability plot. For this data set the correlation coefficient, from the
summary table above, is 0.977. Since this is less than the critical value of 0.987 (this is a
tabulated value), the normality assumption is rejected.
1. STATISTICS:
NUMBER OF OBSERVATIONS = 500
MEAN = 0.5078304
2. CRITICAL VALUES:
90 % POINT = 0.6560000
95 % POINT = 0.7870000
97.5 % POINT = 0.9180000
99 % POINT = 1.092000
Model Based on the graphical and quantitative analysis, we use the model
Yi = C + Ei
where C is estimated by the mid-range and the uncertainty interval for C is based on a
bootstrap analysis. Specifically,
C = 0.499
95% confidence limit for C = (0.497,0.503)
Univariate It is sometimes useful and convenient to summarize the above results in a report. The
Report report for the 500 uniform random numbers follows.
2: Location
Mean = 0.50783
Standard Deviation of Mean = 0.013163
95% Confidence Interval for Mean = (0.48197,0.533692)
Drift with respect to location? = NO
3: Variation
Standard Deviation = 0.294326
95% Confidence Interval for SD = (0.277144,0.313796)
Drift with respect to variation?
(based on Levene's test on quarters
of the data) = NO
4: Distribution
Normal PPCC = 0.977160
Data are Normal?
(as measured by Normal PPCC) = NO
Uniform PPCC = 0.999568
Data are Uniform?
(as measured by Uniform PPCC) = YES
5: Randomness
Autocorrelation = -0.03099
Data are Random?
(as measured by autocorrelation) = YES
6: Statistical Control
(i.e., no drift in location or scale,
data are random, distribution is
fixed, here we are testing only for
fixed uniform)
Data Set is in Statistical Control? = YES
Click on the links below to start Dataplot and run this case study
yourself. Each step may use results from previous steps, so please be
patient. Wait until the software verifies that the current step is
complete before clicking on the next step.
The links in this column will connect you with more detailed
information about each analysis step from the case study description.
1. Generate a run sequence plot. 1. The run sequence plot indicates that
there are no shifts of location or
scale.
2. Generate a lag plot. 2. The lag plot does not indicate any
significant patterns (which would
show the data were not random).
5. Check for uniformity by computing the 5. The uniform probability plot correlation
uniform probability plot correlation coefficient is 0.9995. This indicates that
coefficient. the uniform distribution is a good fit.
Resulting The following is the set of random walk numbers used for this case
Data study.
-0.399027
-0.645651
-0.625516
-0.262049
-0.407173
-0.097583
0.314156
0.106905
-0.017675
-0.037111
0.357631
0.820111
0.844148
0.550509
0.090709
0.413625
-0.002149
0.393170
0.538263
0.070583
0.473143
0.132676
0.109111
-0.310553
0.179637
-0.067454
-0.190747
-0.536916
-0.905751
-0.518984
-0.579280
-0.643004
-1.014925
-0.517845
-0.860484
-0.884081
-1.147428
-0.657917
-0.470205
-0.798437
-0.637780
-0.666046
-1.093278
-1.089609
-0.853439
-0.695306
-0.206795
-0.507504
-0.696903
-1.116358
-1.044534
-1.481004
-1.638390
-1.270400
-1.026477
-1.123380
-0.770683
-0.510481
-0.958825
-0.531959
-0.457141
-0.226603
-0.201885
-0.078000
0.057733
-0.228762
-0.403292
-0.414237
-0.556689
-0.772007
-0.401024
-0.409768
-0.171804
-0.096501
-0.066854
0.216726
0.551008
0.660360
0.194795
-0.031321
0.453880
0.730594
1.136280
0.708490
1.149048
1.258757
1.102107
1.102846
0.720896
0.764035
1.072312
0.897384
0.965632
0.759684
0.679836
0.955514
1.290043
1.753449
1.542429
1.873803
2.043881
1.728635
1.289703
1.501481
1.888335
1.408421
1.416005
0.929681
1.097632
1.501279
1.650608
1.759718
2.255664
2.490551
2.508200
2.707382
2.816310
3.254166
2.890989
2.869330
3.024141
3.291558
3.260067
3.265871
3.542845
3.773240
3.991880
3.710045
4.011288
4.074805
4.301885
3.956416
4.278790
3.989947
4.315261
4.200798
4.444307
4.926084
4.828856
4.473179
4.573389
4.528605
4.452401
4.238427
4.437589
4.617955
4.370246
4.353939
4.541142
4.807353
4.706447
4.607011
4.205943
3.756457
3.482142
3.126784
3.383572
3.846550
4.228803
4.110948
4.525939
4.478307
4.457582
4.822199
4.605752
5.053262
5.545598
5.134798
5.438168
5.397993
5.838361
5.925389
6.159525
6.190928
6.024970
5.575793
5.516840
5.211826
4.869306
4.912601
5.339177
5.415182
5.003303
4.725367
4.350873
4.225085
3.825104
3.726391
3.301088
3.767535
4.211463
4.418722
4.554786
4.987701
4.993045
5.337067
5.789629
5.726147
5.934353
5.641670
5.753639
5.298265
5.255743
5.500935
5.434664
5.588610
6.047952
6.130557
5.785299
5.811995
5.582793
5.618730
5.902576
6.226537
5.738371
5.449965
5.895537
6.252904
6.650447
7.025909
6.770340
7.182244
6.941536
7.368996
7.293807
7.415205
7.259291
6.970976
7.319743
6.850454
6.556378
6.757845
6.493083
6.824855
6.533753
6.410646
6.502063
6.264585
6.730889
6.753715
6.298649
6.048126
5.794463
5.539049
5.290072
5.409699
5.843266
5.680389
5.185889
5.451353
5.003233
5.102844
5.566741
5.613668
5.352791
5.140087
4.999718
5.030444
5.428537
5.471872
5.107334
5.387078
4.889569
4.492962
4.591042
4.930187
4.857455
4.785815
5.235515
4.865727
4.855005
4.920206
4.880794
4.904395
4.795317
5.163044
4.807122
5.246230
5.111000
5.228429
5.050220
4.610006
4.489258
4.399814
4.606821
4.974252
5.190037
5.084155
5.276501
4.917121
4.534573
4.076168
4.236168
3.923607
3.666004
3.284967
2.980621
2.623622
2.882375
3.176416
3.598001
3.764744
3.945428
4.408280
4.359831
4.353650
4.329722
4.294088
4.588631
4.679111
4.182430
4.509125
4.957768
4.657204
4.325313
4.338800
4.720353
4.235756
4.281361
3.795872
4.276734
4.259379
3.999663
3.544163
3.953058
3.844006
3.684740
3.626058
3.457909
3.581150
4.022659
4.021602
4.070183
4.457137
4.156574
4.205304
4.514814
4.055510
3.938217
4.180232
3.803619
3.553781
3.583675
3.708286
4.005810
4.419880
4.881163
5.348149
4.950740
5.199262
4.753162
4.640757
4.327090
4.080888
3.725953
3.939054
3.463728
3.018284
2.661061
3.099980
3.340274
3.230551
3.287873
3.497652
3.014771
3.040046
3.342226
3.656743
3.698527
3.759707
4.253078
4.183611
4.196580
4.257851
4.683387
4.224290
3.840934
4.329286
3.909134
3.685072
3.356611
2.956344
2.800432
2.761665
2.744913
3.037743
2.787390
2.387619
2.424489
2.247564
2.502179
2.022278
2.213027
2.126914
2.264833
2.528391
2.432792
2.037974
1.699475
2.048244
1.640126
1.149858
1.475253
1.245675
0.831979
1.165877
1.403341
1.181921
1.582379
1.632130
2.113636
2.163129
2.545126
2.963833
3.078901
3.055547
3.287442
2.808189
2.985451
3.181679
2.746144
2.517390
2.719231
2.581058
2.838745
2.987765
3.459642
3.458684
3.870956
4.324706
4.411899
4.735330
4.775494
4.681160
4.462470
3.992538
3.719936
3.427081
3.256588
3.462766
3.046353
3.537430
3.579857
3.931223
3.590096
3.136285
3.391616
3.114700
2.897760
2.724241
2.557346
2.971397
2.479290
2.305336
1.852930
1.471948
1.510356
1.633737
1.727873
1.512994
1.603284
1.387950
1.767527
2.029734
2.447309
2.321470
2.435092
2.630118
2.520330
2.578147
2.729630
2.713100
3.107260
2.876659
2.774242
3.185503
3.403148
3.392646
3.123339
3.164713
3.439843
3.321929
3.686229
3.203069
3.185843
3.204924
3.102996
3.496552
3.191575
3.409044
3.888246
4.273767
3.803540
4.046417
4.071581
3.916256
3.634441
4.065834
3.844651
3.915219
is appropriate and valid, with s denoting the standard deviation of the original data.
4-Plot of Data
When the randomness assumption is seriously violated, a time series model may be
appropriate. The lag plot often suggests a reasonable model. For example, in this case the
strongly linear appearance of the lag plot suggests a model fitting Yi versus Yi-1 might be
appropriate. When the data are non-random, it is helpful to supplement the lag plot with
an autocorrelation plot and a spectral plot. Although in this case the lag plot is enough to
suggest an appropriate model, we provide the autocorrelation and spectral plots for
comparison.
Autocorrelation Plot
When the lag plot indicates significant non-randomness, it can be helpful to follow up
with an autocorrelation plot.
Spectral Plot Another useful plot for non-random data is the spectral plot.
Quantitative Although the 4-plot above clearly shows the violation of the assumptions, we supplement
Output the graphical output with some quantitative measures.
Summary As a first step in the analysis, a table of summary statistics is computed from the data.
Statistics The following table, generated by Dataplot, shows a typical set of statistics.
SUMMARY
LOCATION MEASURES                      DISPERSION MEASURES
  MIDRANGE    =  0.2888407E+01           RANGE        =  0.9053595E+01
  MEAN        =  0.3216681E+01           STAND. DEV.  =  0.2078675E+01
  MIDMEAN     =  0.4791331E+01           AV. AB. DEV. =  0.1660585E+01
  MEDIAN      =  0.3612030E+01           MINIMUM      = -0.1638390E+01
                                         LOWER QUART. =  0.1747245E+01
                                         LOWER HINGE  =  0.1741042E+01
                                         UPPER HINGE  =  0.4682273E+01
                                         UPPER QUART. =  0.4681717E+01
Location One way to quantify a change in location over time is to fit a straight line to the data set
using the index variable X = 1, 2, ..., N, with N denoting the number of observations. If
there is no significant drift in the location, the slope parameter should be zero. For this
data set, Dataplot generates the following output:
Variation
One simple way to detect a change in variation is with a Bartlett test after dividing the
data set into several equal-sized intervals. However, the Bartlett test is not robust for
non-normality. Since we know this data set is not approximated well by the normal
distribution, we use the alternative Levene test. In particular, we use the Levene test
based on the median rather than the mean. The choice of the number of intervals is somewhat
arbitrary, although values of 4 or 8 are reasonable. Dataplot generated the following
output for the Levene test.
1. STATISTICS
NUMBER OF OBSERVATIONS = 500
NUMBER OF GROUPS = 4
LEVENE F TEST STATISTIC = 10.45940
In this case, the Levene test indicates that the standard deviations are significantly
different in the 4 intervals since the test statistic of 10.46 is greater than the 95% critical
value of 2.62. Therefore we conclude that the scale is not constant.
Randomness Although the lag 1 autocorrelation coefficient above clearly shows the non-randomness,
we show the output from a runs test as well.
RUNS UP
RUNS DOWN
Values in the column labeled "Z" greater than 1.96 or less than -1.96 are statistically
significant at the 5% level. Numerous values in this column are much larger than +/-1.96,
so we conclude that the data are not random.
Distributional Since the quantitative tests show that the assumptions of randomness and constant
Assumptions location and scale are not met, the distributional measures will not be meaningful.
Therefore these quantitative tests are omitted.
The slope parameter, A1, has a t value of 156.4 which is statistically significant. Also,
the residual standard deviation is 0.29. This can be compared to the standard deviation
shown in the summary table, which is 2.08. That is, the fit to the autoregressive model
has reduced the variability by a factor of 7.
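A sketch of this fit in Python: ordinary least squares of Yi on Yi-1 via numpy. The random walk here is simulated as a cumulative sum of uniform steps over (-0.5, 0.5), an assumption made for illustration:

    import numpy as np

    rng = np.random.default_rng(13)
    y = np.cumsum(rng.uniform(-0.5, 0.5, size=500))   # simulated random walk

    a1, a0 = np.polyfit(y[:-1], y[1:], deg=1)         # regress Yi on Yi-1
    resid = y[1:] - (a0 + a1 * y[:-1])
    print(a0, a1, resid.std(ddof=2))    # residual sd far below sd of raw data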
Test In addition to the plot of the predicted values, the residual standard
Underlying deviation from the fit also indicates a significant improvement for the
Assumptions model. The next step is to validate the underlying assumptions for the
on the error component, or residuals, from this model.
Residuals
4-Plot of
Residuals
Uniform
Probability
Plot of
Residuals
Since the uniform probability plot is nearly linear, this verifies that a
uniform distribution is a good model for the error component.
Conclusions
Since the residuals from our model satisfy the underlying assumptions,
we conclude that
    Yi = A0 + A1·Yi-1 + Ei
where the Ei follow a uniform distribution is a good model for this data
set. We could simplify this model to
    Yi = Yi-1 + Ei
This has the advantage of simplicity (the current point is simply the
previous point plus a uniformly distributed error term).
Using In this case, the above model makes sense based on our definition of
Scientific and the random walk. That is, a random walk is the cumulative sum of
Engineering uniformly distributed data points. It makes sense that modeling the
Knowledge current point as the previous point plus a uniformly distributed error
term is about as good as we can do. Although this case is a bit artificial
in that we knew how the data were constructed, it is common and
desirable to use scientific and engineering knowledge of the process
that generated the data in formulating and testing models for the data.
Quite often, several competing models will produce nearly equivalent
mathematical results. In this case, selecting the model that best
approximates the scientific understanding of the process is a reasonable
choice.
Time Series This model is an example of a time series model. More extensive
Model discussion of time series is given in the Process Monitoring chapter.
Click on the links below to start Dataplot and run this case study
yourself. Each step may use results from previous steps, so please be
patient. Wait until the software verifies that the current step is
complete before clicking on the next step.
The links in this column will connect you with more detailed
information about each analysis step from the case study description.
2. Validate assumptions.
4. Fit Yi = A0 + A1*Yi-1 + Ei
and validate.
2. Plot fitted line with original data. 2. The plot of the predicted values with
the original data indicates a good fit.
3. Generate a 4-plot of the residuals 3. The 4-plot indicates that the assumptions
from the fit. of constant location and scale are valid.
The lag plot indicates that the data are
random. However, the histogram and normal
probability plot indicate that the uniform
distribution might be a better model for
the residuals than the normal
distribution.
Motivation The motivation for studying this data set is to illustrate the case where
there is discreteness in the measurements, but the underlying
assumptions hold. In this case, the discreteness is due to the data being
integers.
Resulting Data  The following are the data used for this case study.
2899 2898 2898 2900 2898
2901 2899 2901 2900 2898
2898 2898 2898 2900 2898
2897 2899 2897 2899 2899
2900 2897 2900 2900 2899
2898 2898 2899 2899 2899
2899 2899 2898 2899 2899
2899 2902 2899 2900 2898
2899 2899 2899 2899 2899
2899 2900 2899 2900 2898
2901 2900 2899 2899 2899
2899 2899 2900 2899 2898
2898 2898 2900 2896 2897
4-Plot of Data
Run Sequence Plot
Lag Plot
Histogram (with overlaid Normal PDF)
Normal Probability Plot
SUMMARY
***********************************************************************
*         LOCATION MEASURES          *        DISPERSION MEASURES     *
***********************************************************************
* MIDRANGE    = 0.2898500E+04        * RANGE        = 0.7000000E+01   *
* MEAN        = 0.2898562E+04        * STAND. DEV.  = 0.1304969E+01   *
* MIDMEAN     = 0.2898363E+04        * AV. AB. DEV. = 0.1058571E+01   *
* MEDIAN      = 0.2899000E+04        * MINIMUM      = 0.2895000E+04   *
*             =                      * LOWER QUART. = 0.2898000E+04   *
*             =                      * LOWER HINGE  = 0.2898000E+04   *
*             =                      * UPPER HINGE  = 0.2899000E+04   *
*             =                      * UPPER QUART. = 0.2899000E+04   *
*             =                      * MAXIMUM      = 0.2902000E+04   *
***********************************************************************
*       RANDOMNESS MEASURES          *     DISTRIBUTIONAL MEASURES    *
***********************************************************************
* AUTOCO COEF = 0.3148023E+00        * ST. 3RD MOM. = 0.6580265E-02   *
*             = 0.0000000E+00        * ST. 4TH MOM. = 0.2825334E+01   *
*             = 0.0000000E+00        * ST. WILK-SHA = -0.2272378E+02  *
*             =                      * UNIFORM PPCC = 0.9554127E+00   *
*             =                      * NORMAL PPCC  = 0.9748405E+00   *
***********************************************************************
Location One way to quantify a change in location over time is to fit a straight line to the data set
using the index variable X = 1, 2, ..., N, with N denoting the number of observations. If
there is no significant drift in the location, the slope parameter should be zero. For this
data set, Dataplot generates the following output:
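Equivalently, outside Dataplot, the drift check can be sketched with scipy.stats.linregress. This is an
illustrative sketch, assuming y is a NumPy array holding the measurements:

    import numpy as np
    from scipy import stats

    x = np.arange(1, len(y) + 1)     # index variable X = 1, 2, ..., N
    fit = stats.linregress(x, y)     # least squares line: location vs. index
    # A slope p-value below 0.05 signals statistically significant
    # drift in location.
    print(fit.slope, fit.pvalue)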
Variation  One simple way to detect a change in variation is with a Bartlett test after dividing the
data set into several equal-sized intervals. However, the Bartlett test is not robust for
non-normality. Since the nature of the data (a few distinct points repeated many times)
makes the normality assumption questionable, we use the alternative Levene test. In
particular, we use the Levene test based on the median rather than the mean. The choice of the
number of intervals is somewhat arbitrary, although values of 4 or 8 are reasonable.
Dataplot generated the following output for the Levene test.
1. STATISTICS
NUMBER OF OBSERVATIONS = 700
NUMBER OF GROUPS = 4
LEVENE F TEST STATISTIC = 1.432365
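For reference, a median-based Levene test on four equal-sized intervals can be sketched with scipy
(our illustration, assuming y holds the data):

    import numpy as np
    from scipy import stats

    groups = np.array_split(y, 4)                     # four equal-sized intervals
    stat, p = stats.levene(*groups, center='median')  # median-based Levene test
    print(stat, p)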
Randomness
There are many ways in which data can be non-random. However, most common forms
of non-randomness can be detected with a few simple tests. The lag plot in the previous
section is a simple graphical technique.
Another check is an autocorrelation plot that shows the autocorrelations for various lags.
Confidence bands can be plotted at the 95% and 99% confidence levels. Points outside
this band indicate statistically significant values (lag 0 is always 1). Dataplot generated
the following autocorrelation plot.
The lag 1 autocorrelation, which is generally the one of most interest, is 0.31. The critical
values at the 5% level of significance are -0.087 and 0.087. This indicates that the lag 1
autocorrelation is statistically significant, so there is some evidence for non-randomness.
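A lag 1 autocorrelation check can be sketched directly. This is an illustration; the plus/minus
1.96/sqrt(N) band is the usual large-sample approximation and may differ slightly from the critical
values quoted above:

    import numpy as np

    d = y - y.mean()
    r1 = np.dot(d[:-1], d[1:]) / np.dot(d, d)  # lag 1 autocorrelation
    band = 1.96 / np.sqrt(len(y))              # approximate 95% band under randomness
    print(r1, abs(r1) > band)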
A common test for randomness is the runs test.
RUNS UP
RUNS DOWN
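Dataplot's runs output tabulates runs up and runs down by length. A simpler large-sample version of the
same idea tests the total number of runs up and down, using the standard normal approximation with
E[R] = (2N-1)/3 and Var[R] = (16N-29)/90. A sketch under those assumptions:

    import numpy as np

    def runs_up_down_z(y):
        s = np.sign(np.diff(y))                  # +1 for a rise, -1 for a fall
        s = s[s != 0]                            # drop ties
        n = len(s) + 1
        runs = 1 + np.count_nonzero(s[1:] != s[:-1])
        mean = (2.0 * n - 1.0) / 3.0             # expected number of runs under randomness
        var = (16.0 * n - 29.0) / 90.0           # variance under randomness
        return (runs - mean) / np.sqrt(var)

    print(runs_up_down_z(y))                     # |Z| > 1.96 suggests non-randomness at the 5% level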
Distributional Analysis  Probability plots are a graphical test for assessing if a particular
distribution provides an adequate fit to a data set.
A quantitative enhancement to the probability plot is the correlation coefficient of the
points on the probability plot. For this data set the correlation coefficient is 0.975. Since
this is less than the critical value of 0.987 (this is a tabulated value), the normality
assumption is rejected.
Chi-square and Kolmogorov-Smirnov goodness-of-fit tests are alternative methods for
assessing distributional adequacy. The Wilk-Shapiro and Anderson-Darling tests can be
used to test for normality. Dataplot generates the following output for the
Anderson-Darling normality test.
1. STATISTICS:
NUMBER OF OBSERVATIONS = 700
MEAN = 2898.562
STANDARD DEVIATION = 1.304969
2. CRITICAL VALUES:
90 % POINT = 0.6560000
95 % POINT = 0.7870000
97.5 % POINT = 0.9180000
99 % POINT = 1.092000
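The Anderson-Darling statistic and comparable critical values are also available in scipy
(an illustrative sketch):

    from scipy import stats

    res = stats.anderson(y, dist='norm')   # Anderson-Darling normality test
    print(res.statistic)
    print(res.critical_values)             # at the 15%, 10%, 5%, 2.5%, and 1% levels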
Outlier Analysis  A test for outliers is the Grubbs test. Dataplot generated the following output for
Grubbs' test.
1. STATISTICS:
NUMBER OF OBSERVATIONS = 700
MINIMUM = 2895.000
MEAN = 2898.562
MAXIMUM = 2902.000
STANDARD DEVIATION = 1.304969
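Grubbs' test is not built into scipy, but the statistic and its critical value are short to code.
A sketch of the two-sided test:

    import numpy as np
    from scipy import stats

    def grubbs(y, alpha=0.05):
        n = len(y)
        g = np.max(np.abs(y - y.mean())) / y.std(ddof=1)   # Grubbs' statistic
        t = stats.t.ppf(1 - alpha / (2 * n), n - 2)        # t percent point
        crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
        return g, crit, g > crit                           # True means an outlier is detected

    print(grubbs(y))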
Model  Although the randomness and normality assumptions were mildly violated, we conclude
that a reasonable model for the data is Yi = C + Ei, with C estimated by the sample mean, 2898.562.
Univariate Report  It is sometimes useful and convenient to summarize the above results in a report.
Analysis for Josephson Junction Cryothermometry Data
2: Location
Mean = 2898.562
Standard Deviation of Mean = 0.049323
95% Confidence Interval for Mean = (2898.465,2898.658)
Drift with respect to location? = YES
(Further analysis indicates that
the drift, while statistically
significant, is not practically
significant)
3: Variation
Standard Deviation = 1.30497
95% Confidence Interval for SD = (1.240007,1.377169)
Drift with respect to variation?
(based on Levene's test on quarters
of the data) = NO
4: Distribution
Normal PPCC = 0.97484
Data are Normal?
(as measured by Normal PPCC) = NO
5: Randomness
Autocorrelation = 0.314802
Data are Random?
(as measured by autocorrelation) = NO
6: Statistical Control
(i.e., no drift in location or scale,
data are random, distribution is
7: Outliers?
(as determined by Grubbs test) = NO
Click on the links below to start Dataplot and run this case study yourself. Each step may use results
from previous steps, so please be patient. Wait until the software verifies that the current step is
complete before clicking on the next step. The links in this column will connect you with more detailed
information about each analysis step from the case study description.
1. Generate a run sequence plot: the run sequence plot indicates that there are no shifts of location
   or scale.
2. Generate a lag plot: the lag plot does not indicate any significant patterns (which would show the
   data were not random).
5. Check for normality by computing the normal probability plot correlation coefficient: the
   coefficient is 0.975. At the 5% level, we reject the normality assumption.
6. Check for outliers using Grubbs' test: Grubbs' test detects no outliers at the 5% level.
Resulting Data  The following are the data used for this case study.
-213
-564
-35
-15
141
115
-420
-360
203
-338
-431
194
-220
-513
154
-125
-559
92
-21
-579
-52
99
-543
-175
162
-457
-346
204
-300
-474
164
-107
-572
-8
83
-541
-224
180
-420
-374
201
-236
-531
83
27
-564
-112
131
-507
-254
199
-311
-495
143
-46
-579
-90
136
-472
-338
202
-287
-477
169
-124
-568
17
48
-568
-135
162
-430
-422
172
-74
-577
-13
92
-534
-243
194
-355
-465
156
-81
-578
-64
139
-449
-384
193
-198
-538
110
-44
-577
-6
66
-552
-164
161
-460
-344
205
-281
-504
134
-28
-576
-118
156
-437
-381
200
-220
-540
83
11
-568
-160
172
-414
-408
188
-125
-572
-32
139
-492
-321
205
-262
-504
142
-83
-574
0
48
-571
-106
137
-501
-266
190
-391
-406
194
-186
-553
83
-13
-577
-49
103
-515
-280
201
300
-506
131
-45
-578
-80
138
-462
-361
201
-211
-554
32
74
-533
-235
187
-372
-442
182
-147
-566
25
68
-535
-244
194
-351
-463
174
-125
-570
15
72
-550
-190
172
-424
-385
198
-218
-536
96
is appropriate and valid where s is the standard deviation of the original data.
4-Plot of Data
is not appropriate.
We need to develop a better model. Non-random data can frequently be modeled using
time series methodology. Specifically, the circular pattern in the lag plot indicates that a
sinusoidal model might be appropriate. The sinusoidal model will be developed in the
next section.
Individual Plots  The plots can be generated individually for more detail. In this case, only the run
sequence plot and the lag plot are drawn since the distributional plots are not meaningful.
Run Sequence Plot
Lag Plot
We have drawn some lines and boxes on the plot to better isolate the outliers. The
following output helps identify the points that are generating the outliers on the lag plot.
****************************************************
** print y index xplot yplot subset yplot > 250 **
****************************************************
****************************************************
** print y index xplot yplot subset xplot > 250 **
****************************************************
********************************************************
** print y index xplot yplot subset yplot -100 to 0
subset xplot -100 to 0 **
********************************************************
*********************************************************
** print y index xplot yplot subset yplot 100 to 200
subset xplot 100 to 200 **
*********************************************************
That is, the third, fifth, and 158th points appear to be outliers.
Autocorrelation Plot  When the lag plot indicates significant non-randomness, it can be helpful to
follow up with an autocorrelation plot.
This autocorrelation plot shows a distinct cyclic pattern. As with the lag plot, this
suggests a sinusoidal model.
Spectral Plot Another useful plot for non-random data is the spectral plot.
This spectral plot shows a single dominant peak at a frequency of 0.3. This frequency of
0.3 will be used in fitting the sinusoidal model in the next section.
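The dominant frequency can also be read off a periodogram computed with scipy.signal (an illustrative
sketch; frequencies are in cycles per observation, and y is assumed to hold the series):

    import numpy as np
    from scipy import signal

    freqs, power = signal.periodogram(y)
    dominant = freqs[np.argmax(power[1:]) + 1]   # skip the zero-frequency term
    print(dominant)                              # the text reports roughly 0.3 for this series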
Quantitative Output  Although the lag plot, autocorrelation plot, and spectral plot clearly show the
violation of the randomness assumption, we supplement the graphical output with some quantitative
measures.
Summary Statistics  As a first step in the analysis, a table of summary statistics is computed from the
data. The following table, generated by Dataplot, shows a typical set of statistics.
SUMMARY
***********************************************************************
*         LOCATION MEASURES          *        DISPERSION MEASURES     *
***********************************************************************
* MIDRANGE    = -0.1395000E+03       * RANGE        = 0.8790000E+03   *
* MEAN        = -0.1774350E+03       * STAND. DEV.  = 0.2773322E+03   *
* MIDMEAN     = -0.1797600E+03       * AV. AB. DEV. = 0.2492250E+03   *
* MEDIAN      = -0.1620000E+03       * MINIMUM      = -0.5790000E+03  *
*             =                      * LOWER QUART. = -0.4510000E+03  *
*             =                      * LOWER HINGE  = -0.4530000E+03  *
*             =                      * UPPER HINGE  = 0.9400000E+02   *
*             =                      * UPPER QUART. = 0.9300000E+02   *
*             =                      * MAXIMUM      = 0.3000000E+03   *
***********************************************************************
*       RANDOMNESS MEASURES          *     DISTRIBUTIONAL MEASURES    *
***********************************************************************
* AUTOCO COEF = -0.3073048E+00       * ST. 3RD MOM. = -0.5010057E-01  *
*             = 0.0000000E+00        * ST. 4TH MOM. = 0.1503684E+01   *
*             = 0.0000000E+00        * ST. WILK-SHA = -0.1883372E+02  *
*             =                      * UNIFORM PPCC = 0.9925535E+00   *
*             =                      * NORMAL PPCC  = 0.9540811E+00   *
*             =                      * TUK -.5 PPCC = 0.7313794E+00   *
*             =                      * CAUCHY PPCC  = 0.4408355E+00   *
***********************************************************************
Location One way to quantify a change in location over time is to fit a straight line to the data set
using the index variable X = 1, 2, ..., N, with N denoting the number of observations. If
there is no significant drift in the location, the slope parameter should be zero. For this
data set, Dataplot generates the following output:
Variation  One simple way to detect a change in variation is with a Bartlett test after dividing the
data set into several equal-sized intervals. However, the Bartlett test is not robust for
non-normality. Since the non-randomness of these data makes the normality assumption questionable,
we use the alternative Levene test. In particular, we use the Levene test based on the median rather
than the mean. The choice of the number of intervals is somewhat arbitrary, although values of 4 or 8
are reasonable. Dataplot generated the following output for the Levene test.
1. STATISTICS
NUMBER OF OBSERVATIONS = 200
NUMBER OF GROUPS = 4
LEVENE F TEST STATISTIC = 0.9378599E-01
Since the Levene test statistic of 0.094 is less than the 95% critical value of approximately 2.6, we
conclude that the standard deviations are not significantly different in the 4 intervals. That is, the
assumption of constant scale is valid.
RUNS UP
RUNS DOWN
Values in the column labeled "Z" greater than 1.96 or less than -1.96 are statistically
significant at the 5% level. Numerous values in this column are much larger than +/-1.96,
so we conclude that the data are not random.
Distributional Assumptions  Since the quantitative tests show that the assumption of randomness is not
met, the distributional measures will not be meaningful. Therefore these quantitative tests are omitted.
Good Starting Value for C  A good starting value for C can be obtained by calculating the mean of the
data. If the data show a trend, i.e., the assumption of constant location is violated, we can replace C
with a linear or quadratic least squares fit. That is, the model becomes
   Yi = B0 + B1*ti + A*SIN(2*PI*omega*ti + phi) + Ei
or
   Yi = B0 + B1*ti + B2*ti**2 + A*SIN(2*PI*omega*ti + phi) + Ei
Since our data did not have any meaningful change of location, we can fit the simpler model with C
equal to the mean. From the summary output in the previous page, the mean is -177.44.
Good Starting Value for Frequency  The starting value for the frequency can be obtained from the
spectral plot, which shows the dominant frequency is about 0.3.
Complex Demodulation Phase Plot  The complex demodulation phase plot can be used to refine this initial
estimate for the frequency.
For the complex demodulation plot, if the lines slope from left to right, the frequency should be
increased. If the lines slope from right to left, it should be decreased. A relatively flat (i.e.,
horizontal) slope indicates a good frequency.
We could generate the demodulation phase plot for 0.3 and then use trial and error to obtain a better
estimate for the frequency. To simplify this, we generate 16 of these plots on a single page starting
with a frequency of 0.28, increasing in increments of 0.0025, and stopping at 0.3175.
Interpretation  The plots start with lines sloping from left to right but gradually change to a right
to left slope. The relatively flat slope occurs for frequency 0.3025 (third row, second column). The
complex demodulation phase plot restricts the range from -pi to pi. This is why the plot appears to
show some breaks.
Good Starting Values for Amplitude  The complex demodulation amplitude plot is used to find a good
starting value for the amplitude. In addition, this plot indicates whether or not the amplitude is
constant over the entire range of the data or if it varies. If the plot is essentially flat, i.e., zero
slope, then it is reasonable to assume a constant amplitude in the non-linear model. However, if the
slope varies over the range of the plot, we may need to adjust the model to be
   Yi = C + (B0 + B1*ti)*SIN(2*PI*omega*ti + phi) + Ei
That is, we replace the constant amplitude with a function of time. A linear fit is specified in the
model above, but this can be replaced with a more elaborate function if needed.
Complex Demodulation Amplitude Plot
The complex demodulation amplitude plot for this data shows that:
1. The amplitude is fixed at approximately 390.
2. There is a short start-up effect.
3. There is a change in amplitude at around x=160 that should be
investigated for an outlier.
In terms of a non-linear model, the plot indicates that fitting a single constant for the amplitude
should be adequate for this data set.
Fit Output Using starting estimates of 0.3025 for the frequency, 390 for the amplitude, and
-177.44 for C, Dataplot generated the following output for the fit.
ITERATION  CONVERGENCE  RESIDUAL STANDARD  *  PARAMETER
NUMBER     MEASURE      DEVIATION          *  ESTIMATES
-------------------------------------------*---------------------------------------------------
1          0.10000E-01  0.52903E+03        * -0.17743E+03  0.39000E+03  0.30250E+00  0.10000E+01
2          0.50000E-02  0.22218E+03        * -0.17876E+03 -0.33137E+03  0.30238E+00  0.71471E+00
3          0.25000E-02  0.15634E+03        * -0.17886E+03 -0.24523E+03  0.30233E+00  0.14022E+01
4          0.96108E-01  0.15585E+03        * -0.17879E+03 -0.36177E+03  0.30260E+00  0.14654E+01
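The same non-linear fit can be sketched with scipy.optimize.curve_fit, using the starting values quoted
above (our illustration, not Dataplot's algorithm; y is assumed to hold the series):

    import numpy as np
    from scipy.optimize import curve_fit

    def model(t, c, a, omega, phi):
        return c + a * np.sin(2 * np.pi * omega * t + phi)

    t = np.arange(1, len(y) + 1)
    p0 = [-177.44, 390.0, 0.3025, 1.0]             # starting values from the text
    params, _ = curve_fit(model, t, y, p0=p0)
    resid_sd = np.std(y - model(t, *params), ddof=4)
    print(params, resid_sd)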
Fit Output with Outliers Removed  Dataplot generated the following fit output after removing 3 outliers.

ITERATION  CONVERGENCE  RESIDUAL STANDARD  *  PARAMETER
NUMBER     MEASURE      DEVIATION          *  ESTIMATES
-------------------------------------------*---------------------------------------------------
1          0.10000E-01  0.14834E+03        * -0.17879E+03 -0.36177E+03  0.30260E+00  0.14654E+01
New Fit to Edited Data  The original fit, with a residual standard deviation of 155.84, used the
parameter estimates shown in the first fit output above. The new fit, with a residual standard
deviation of 148.34, uses the parameter estimates from the fit with the outliers removed.
4-Plot for New Fit
This plot shows that the underlying assumptions are satisfied and therefore the
new fit is a good descriptor of the data.
In this case, it is a judgment call whether to use the fit with or without the
outliers removed.
Click on the links below to start Dataplot and run this case study yourself. Each step may use results
from previous steps, so please be patient. Wait until the software verifies that the current step is
complete before clicking on the next step. The links in this column will connect you with more detailed
information about each analysis step from the case study description.
2. Validate assumptions.
3. Fit Yi = C + A*SIN(2*PI*omega*ti+phi).
   1. Generate a complex demodulation phase plot: the complex demodulation phase plot indicates a
      starting frequency of 0.3025.
4. Validate fit.
   1. Generate a 4-plot of the residuals from the fit: the 4-plot indicates that the assumptions of
      constant location and scale are valid. The lag plot indicates that the data are random. The
      histogram and normal probability plot indicate that the normality assumption for the residuals
      is not seriously violated, although there is a bend on the probability plot that warrants
      attention.
Resulting Data  The following are the data used for this case study.
2.00180
2.00170
2.00180
2.00190
2.00180
2.00170
2.00150
2.00140
2.00150
2.00150
2.00170
2.00180
2.00180
2.00190
2.00190
2.00210
2.00200
2.00160
2.00140
2.00130
2.00130
2.00150
2.00150
2.00160
2.00150
2.00140
2.00130
2.00140
2.00150
2.00140
2.00150
2.00160
2.00150
2.00160
2.00190
2.00200
2.00200
2.00210
2.00220
2.00230
2.00240
2.00250
2.00270
2.00260
2.00260
2.00260
2.00270
2.00260
2.00250
2.00240
4-Plot of Data
From the 4-plot, the assumption of randomness does not hold, so the standard univariate model
Yi = C + Ei is not valid. Given the linear appearance of the lag plot, the first step might be to
consider a model of the type Yi = A0 + A1*Yi-1 + Ei.
Run Sequence Plot
Lag Plot
SUMMARY
NUMBER OF OBSERVATIONS = 50
***********************************************************************
*         LOCATION MEASURES          *        DISPERSION MEASURES     *
***********************************************************************
* MIDRANGE    = 0.2002000E+01        * RANGE        = 0.1399994E-02   *
* MEAN        = 0.2001856E+01        * STAND. DEV.  = 0.4291329E-03   *
* MIDMEAN     = 0.2001638E+01        * AV. AB. DEV. = 0.3480196E-03   *
* MEDIAN      = 0.2001800E+01        * MINIMUM      = 0.2001300E+01   *
*             =                      * LOWER QUART. = 0.2001500E+01   *
*             =                      * LOWER HINGE  = 0.2001500E+01   *
*             =                      * UPPER HINGE  = 0.2002100E+01   *
*             =                      * UPPER QUART. = 0.2002175E+01   *
*             =                      * MAXIMUM      = 0.2002700E+01   *
***********************************************************************
*       RANDOMNESS MEASURES          *     DISTRIBUTIONAL MEASURES    *
***********************************************************************
* AUTOCO COEF = 0.9379919E+00        * ST. 3RD MOM. = 0.6191616E+00   *
*             = 0.0000000E+00        * ST. 4TH MOM. = 0.2098746E+01   *
*             = 0.0000000E+00        * ST. WILK-SHA = -0.4995516E+01  *
*             =                      * UNIFORM PPCC = 0.9666610E+00   *
*             =                      * NORMAL PPCC  = 0.9558001E+00   *
*             =                      * TUK -.5 PPCC = 0.8462552E+00   *
*             =                      * CAUCHY PPCC  = 0.6822084E+00   *
***********************************************************************
Location One way to quantify a change in location over time is to fit a straight line to the data set
using the index variable X = 1, 2, ..., N, with N denoting the number of observations. If
there is no significant drift in the location, the slope parameter should be zero. For this
data set, Dataplot generates the following output:
Variation  One simple way to detect a change in variation is with a Bartlett test after dividing the
data set into several equal-sized intervals. However, the Bartlett test is not robust for
non-normality. Since the normality assumption is questionable for these data, we use the
alternative Levene test. In particular, we use the Levene test based on the median rather than
the mean. The choice of the number of intervals is somewhat arbitrary, although values of
4 or 8 are reasonable. Dataplot generated the following output for the Levene test.
1. STATISTICS
NUMBER OF OBSERVATIONS = 50
NUMBER OF GROUPS = 4
LEVENE F TEST STATISTIC = 0.9714893
99 % POINT = 4.238307
99.9 % POINT = 6.424733
Randomness  There are many ways in which data can be non-random. However, most common forms
of non-randomness can be detected with a few simple tests. The lag plot in the 4-plot in
the previous section is a simple graphical technique.
One check is an autocorrelation plot that shows the autocorrelations for various lags.
Confidence bands can be plotted at the 95% and 99% confidence levels. Points outside
this band indicate statistically significant values (lag 0 is always 1). Dataplot generated
the following autocorrelation plot.
The lag 1 autocorrelation, which is generally the one of most interest, is 0.93. The critical
values at the 5% level are -0.277 and 0.277. This indicates that the lag 1 autocorrelation
is statistically significant, so there is strong evidence of non-randomness.
A common test for randomness is the runs test.
RUNS UP
RUNS DOWN
Values in the column labeled "Z" greater than 1.96 or less than -1.96 are statistically
significant at the 5% level. Due to the number of values that are much larger than the
1.96 cut-off, we conclude that the data are not random.
Distributional Analysis  Since we rejected the randomness assumption, the distributional tests are not
meaningful. Therefore, these quantitative tests are omitted. We also omit Grubbs' outlier test since it
also assumes the data are approximately normally distributed.
Univariate Report  It is sometimes useful and convenient to summarize the above results in a report.
1: Sample Size = 50
2: Location
Mean = 2.001857
Standard Deviation of Mean = 0.00006
95% Confidence Interval for Mean = (2.001735,2.001979)
Drift with respect to location? = NO
3: Variation
Standard Deviation = 0.00043
95% Confidence Interval for SD = (0.000359,0.000535)
Change in variation?
(based on Levene's test on quarters
of the data) = NO
4: Distribution
Distributional tests omitted due to
non-randomness of the data
5: Randomness
Lag One Autocorrelation = 0.937998
Data are Random?
(as measured by autocorrelation) = NO
6: Statistical Control
(i.e., no drift in location or scale,
data are random, distribution is
fixed, here we are testing only for
normal)
Data Set is in Statistical Control? = NO
7: Outliers?
(Grubbs' test omitted) = NO
Click on the links below to start Dataplot and run this case study yourself. Each step may use results
from previous steps, so please be patient. Wait until the software verifies that the current step is
complete before clicking on the next step. The links in this column will connect you with more detailed
information about each analysis step from the case study description.
1. Generate a run sequence plot: the run sequence plot indicates that there is a shift in location.
2. Compute a linear fit based on quarters of the data to detect drift in location: the linear fit
   indicates a slight drift in location since the slope parameter is statistically significant, but
   small.
Resulting Data  The following are the data used for this case study.
27.8680
27.8929
27.8773
27.8530
27.8876
27.8725
27.8743
27.8879
27.8728
27.8746
27.8863
27.8716
27.8818
27.8872
27.8885
27.8945
27.8797
27.8627
27.8870
27.8895
27.9138
27.8931
27.8852
27.8788
27.8827
27.8939
27.8558
27.8814
27.8479
27.8479
27.8848
27.8809
27.8479
27.8611
27.8630
27.8679
27.8637
27.8985
27.8900
27.8577
27.8848
27.8869
27.8976
27.8610
27.8567
27.8417
27.8280
27.8555
27.8639
27.8702
27.8582
27.8605
27.8900
27.8758
27.8774
27.9008
27.8988
27.8897
27.8990
27.8958
27.8830
27.8967
27.9105
27.9028
27.8977
27.8953
27.8970
27.9190
27.9180
27.8997
27.9204
27.9234
27.9072
27.9152
27.9091
27.8882
27.9035
27.9267
27.9138
27.8955
27.9203
27.9239
27.9199
27.9646
27.9411
27.9345
27.8712
27.9145
27.9259
27.9317
27.9239
27.9247
27.9150
27.9444
27.9457
27.9166
27.9066
27.9088
27.9255
27.9312
27.9439
27.9210
27.9102
27.9083
27.9121
27.9113
27.9091
27.9235
27.9291
27.9253
27.9092
27.9117
27.9194
27.9039
27.9515
27.9143
27.9124
27.9128
27.9260
27.9339
27.9500
27.9530
27.9430
27.9400
27.8850
27.9350
27.9120
27.9260
27.9660
27.9280
27.9450
27.9390
27.9429
27.9207
27.9205
27.9204
27.9198
27.9246
27.9366
27.9234
27.9125
27.9032
27.9285
27.9561
27.9616
27.9530
27.9280
27.9060
27.9380
27.9310
27.9347
27.9339
27.9410
27.9397
27.9472
27.9235
27.9315
27.9368
27.9403
27.9529
27.9263
27.9347
27.9371
27.9129
27.9549
27.9422
27.9423
27.9750
27.9339
27.9629
27.9587
27.9503
27.9573
27.9518
27.9527
27.9589
27.9300
27.9629
27.9630
27.9660
27.9730
27.9660
27.9630
27.9570
27.9650
27.9520
27.9820
27.9560
27.9670
27.9520
27.9470
27.9720
27.9610
27.9437
27.9660
27.9580
27.9660
27.9700
27.9600
27.9660
27.9770
27.9110
27.9690
27.9698
27.9616
27.9371
27.9700
27.9265
27.9964
27.9842
27.9667
27.9610
27.9943
27.9616
27.9397
27.9799
28.0086
27.9709
27.9741
27.9675
27.9826
27.9676
27.9703
27.9789
27.9786
27.9722
27.9831
28.0043
27.9548
27.9875
27.9495
27.9549
27.9469
27.9744
27.9744
27.9449
27.9837
27.9585
28.0096
27.9762
27.9641
27.9854
27.9877
27.9839
27.9817
27.9845
27.9877
27.9880
27.9822
27.9836
28.0030
27.9678
28.0146
27.9945
27.9805
27.9785
27.9791
27.9817
27.9805
27.9782
27.9753
27.9792
27.9704
27.9794
27.9814
27.9794
27.9795
27.9881
27.9772
27.9796
27.9736
27.9772
27.9960
27.9795
27.9779
27.9829
27.9829
27.9815
27.9811
27.9773
27.9778
27.9724
27.9756
27.9699
27.9724
27.9666
27.9666
27.9739
27.9684
27.9861
27.9901
27.9879
27.9865
27.9876
27.9814
27.9842
27.9868
27.9834
27.9892
27.9864
27.9843
27.9838
27.9847
27.9860
27.9872
27.9869
27.9602
27.9852
27.9860
27.9836
27.9813
27.9623
27.9843
27.9802
27.9863
27.9813
27.9881
27.9850
27.9850
27.9830
27.9866
27.9888
27.9841
27.9863
27.9903
27.9961
27.9905
27.9945
27.9878
27.9929
27.9914
27.9914
27.9997
28.0006
27.9999
28.0004
28.0020
28.0029
28.0008
28.0040
28.0078
28.0065
27.9959
28.0073
28.0017
28.0042
28.0036
28.0055
28.0007
28.0066
28.0011
27.9960
28.0083
27.9978
28.0108
28.0088
28.0088
28.0139
28.0092
28.0092
28.0049
28.0111
28.0120
28.0093
28.0116
28.0102
28.0139
28.0113
28.0158
28.0156
28.0137
28.0236
28.0171
28.0224
28.0184
28.0199
28.0190
28.0204
28.0170
28.0183
28.0201
28.0182
28.0183
28.0175
28.0127
28.0211
28.0057
28.0180
28.0183
28.0149
28.0185
28.0182
28.0192
28.0213
28.0216
28.0169
28.0162
28.0167
28.0167
28.0169
28.0169
28.0161
28.0152
28.0179
28.0215
28.0194
28.0115
28.0174
28.0178
28.0202
28.0240
28.0198
28.0194
28.0171
28.0134
28.0121
28.0121
28.0141
28.0101
28.0114
28.0122
28.0124
28.0171
28.0165
28.0166
28.0159
28.0181
28.0200
28.0116
28.0144
28.0141
28.0116
28.0107
28.0169
28.0105
28.0136
28.0138
28.0114
28.0122
28.0122
28.0116
28.0025
28.0097
28.0066
28.0072
28.0066
28.0068
28.0067
28.0130
28.0091
28.0088
28.0091
28.0091
28.0115
28.0087
28.0128
28.0139
28.0095
28.0115
28.0101
28.0121
28.0114
28.0121
28.0122
28.0121
28.0168
28.0212
28.0219
28.0221
28.0204
28.0169
28.0141
28.0142
28.0147
28.0159
28.0165
28.0144
28.0182
28.0155
28.0155
28.0192
28.0204
28.0185
28.0248
28.0185
28.0226
28.0271
28.0290
28.0240
28.0302
28.0243
28.0288
28.0287
28.0301
28.0273
28.0313
28.0293
28.0300
28.0344
28.0308
28.0291
28.0287
28.0358
28.0309
28.0286
28.0308
28.0291
28.0380
28.0411
28.0420
28.0359
28.0368
28.0327
28.0361
28.0334
28.0300
28.0347
28.0359
28.0344
28.0370
28.0355
28.0371
28.0318
28.0390
28.0390
28.0390
28.0376
28.0376
28.0377
28.0345
28.0333
28.0429
28.0379
28.0401
28.0401
28.0423
28.0393
28.0382
28.0424
28.0386
28.0386
28.0373
28.0397
28.0412
28.0565
28.0419
28.0456
28.0426
28.0423
28.0391
28.0403
28.0388
28.0408
28.0457
28.0455
28.0460
28.0456
28.0464
28.0442
28.0416
28.0451
28.0432
28.0434
28.0448
28.0448
28.0373
28.0429
28.0392
28.0469
28.0443
28.0356
28.0474
28.0446
28.0348
28.0368
28.0418
28.0445
28.0533
28.0439
28.0474
28.0435
28.0419
28.0538
28.0538
28.0463
28.0491
28.0441
28.0411
28.0507
28.0459
28.0519
28.0554
28.0512
28.0507
28.0582
28.0471
28.0539
28.0530
28.0502
28.0422
28.0431
28.0395
28.0177
28.0425
28.0484
28.0693
28.0490
28.0453
28.0494
28.0522
28.0393
28.0443
28.0465
28.0450
28.0539
28.0566
28.0585
28.0486
28.0427
28.0548
28.0616
28.0298
28.0726
28.0695
28.0629
28.0503
28.0493
28.0537
28.0613
28.0643
28.0678
28.0564
28.0703
28.0647
28.0579
28.0630
28.0716
28.0586
28.0607
28.0601
28.0611
28.0606
28.0611
28.0066
28.0412
28.0558
28.0590
28.0750
28.0483
28.0599
28.0490
28.0499
28.0565
28.0612
28.0634
28.0627
28.0519
28.0551
28.0696
28.0581
28.0568
28.0572
28.0529
28.0421
28.0432
28.0211
28.0363
28.0436
28.0619
28.0573
28.0499
28.0340
28.0474
28.0534
28.0589
28.0466
28.0448
28.0576
28.0558
28.0522
28.0480
28.0444
28.0429
28.0624
28.0610
28.0461
28.0564
28.0734
28.0565
28.0503
28.0581
28.0519
28.0625
28.0583
28.0645
28.0642
28.0535
28.0510
28.0542
28.0677
28.0416
28.0676
28.0596
28.0635
28.0558
28.0623
28.0718
28.0585
28.0552
28.0684
28.0646
28.0590
28.0465
28.0594
28.0303
28.0533
28.0561
28.0585
28.0497
28.0582
28.0507
28.0562
28.0715
28.0468
28.0411
28.0587
28.0456
28.0705
28.0534
28.0558
28.0536
28.0552
28.0461
28.0598
28.0598
28.0650
28.0423
28.0442
28.0449
28.0660
28.0506
28.0655
28.0512
28.0407
28.0475
28.0411
28.0512
28.1036
28.0641
28.0572
28.0700
28.0577
28.0637
28.0534
28.0461
28.0701
28.0631
28.0575
28.0444
28.0592
28.0684
28.0593
28.0677
28.0512
28.0644
28.0660
28.0542
28.0768
28.0515
28.0579
28.0538
28.0526
28.0833
28.0637
28.0529
28.0535
28.0561
28.0736
28.0635
28.0600
28.0520
28.0695
28.0608
28.0608
28.0590
28.0290
28.0939
28.0618
28.0551
28.0757
28.0698
28.0717
28.0529
28.0644
28.0613
28.0759
28.0745
28.0736
28.0611
28.0732
28.0782
28.0682
28.0756
28.0857
28.0739
28.0840
28.0862
28.0724
28.0727
28.0752
28.0732
28.0703
28.0849
28.0795
28.0902
28.0874
28.0971
28.0638
28.0877
28.0751
28.0904
28.0971
28.0661
28.0711
28.0754
28.0516
28.0961
28.0689
28.1110
28.1062
28.0726
28.1141
28.0913
28.0982
28.0703
28.0654
28.0760
28.0727
28.0850
28.0877
28.0967
28.1185
28.0945
28.0834
28.0764
28.1129
28.0797
28.0707
28.1008
28.0971
28.0826
28.0857
28.0984
28.0869
28.0795
28.0875
28.1184
28.0746
28.0816
28.0879
28.0888
28.0924
28.0979
28.0702
28.0847
28.0917
28.0834
28.0823
28.0917
28.0779
28.0852
28.0863
28.0942
28.0801
28.0817
28.0922
28.0914
28.0868
28.0832
28.0881
28.0910
28.0886
28.0961
28.0857
28.0859
28.1086
28.0838
28.0921
28.0945
28.0839
28.0877
28.0803
28.0928
28.0885
28.0940
28.0856
28.0849
28.0955
28.0955
28.0846
28.0871
28.0872
28.0917
28.0931
28.0865
28.0900
28.0915
28.0963
28.0917
28.0950
28.0898
28.0902
28.0867
28.0843
28.0939
28.0902
28.0911
28.0909
28.0949
28.0867
28.0932
28.0891
28.0932
28.0887
28.0925
28.0928
28.0883
28.0946
28.0977
28.0914
28.0959
28.0926
28.0923
28.0950
28.1006
28.0924
28.0963
28.0893
28.0956
28.0980
28.0928
28.0951
28.0958
28.0912
28.0990
28.0915
28.0957
28.0976
28.0888
28.0928
28.0910
28.0902
28.0950
28.0995
28.0965
28.0972
28.0963
28.0946
28.0942
28.0998
28.0911
28.1043
28.1002
28.0991
28.0959
28.0996
28.0926
28.1002
28.0961
28.0983
28.0997
28.0959
28.0988
28.1029
28.0989
28.1000
28.0944
28.0979
28.1005
28.1012
28.1013
28.0999
28.0991
28.1059
28.0961
28.0981
28.1045
28.1047
28.1042
28.1146
28.1113
28.1051
28.1065
28.1065
28.0985
28.1000
28.1066
28.1041
28.0954
28.1090
4-Plot of Data
From the 4-plot, the assumption of randomness does not hold, so the standard univariate model
Yi = C + Ei is not valid. Given the linear appearance of the lag plot, the first step might be to
consider a model of the type Yi = A0 + A1*Yi-1 + Ei.
In this case, it was discovered that the data in the first and last thirds was collected in winter
while the more stable middle third was collected in the summer. The seasonal effect was determined to
be caused by the amount of humidity affecting the measurement equipment. In this case, the solution was
to modify the test equipment to be less sensitive to environmental factors.
Simple graphical techniques can be quite effective in revealing
unexpected results in the data. When this occurs, it is important to
investigate whether the unexpected result is due to problems in the
experiment and data collection, or is it in fact indicative of an
unexpected underlying structure in the data. This determination cannot
be made on the basis of statistics alone. The role of the graphical and
statistical analysis is to detect problems or unexpected results in the
data. Resolving the issues requires the knowledge of the scientist or
engineer.
Run Sequence Plot
Lag Plot
SUMMARY
***********************************************************************
*         LOCATION MEASURES          *        DISPERSION MEASURES     *
***********************************************************************
* MIDRANGE    = 0.2797325E+02        * RANGE        = 0.2905006E+00   *
* MEAN        = 0.2801634E+02        * STAND. DEV.  = 0.6349404E-01   *
* MIDMEAN     = 0.2802659E+02        * AV. AB. DEV. = 0.5101655E-01   *
* MEDIAN      = 0.2802910E+02        * MINIMUM      = 0.2782800E+02   *
*             =                      * LOWER QUART. = 0.2797905E+02   *
*             =                      * LOWER HINGE  = 0.2797900E+02   *
*             =                      * UPPER HINGE  = 0.2806295E+02   *
*             =                      * UPPER QUART. = 0.2806293E+02   *
*             =                      * MAXIMUM      = 0.2811850E+02   *
***********************************************************************
*       RANDOMNESS MEASURES          *     DISTRIBUTIONAL MEASURES    *
***********************************************************************
* AUTOCO COEF = 0.9721591E+00        * ST. 3RD MOM. = -0.6936395E+00  *
*             = 0.0000000E+00        * ST. 4TH MOM. = 0.2689681E+01   *
*             = 0.0000000E+00        * ST. WILK-SHA = -0.4216419E+02  *
*             =                      * UNIFORM PPCC = 0.9689648E+00   *
*             =                      * NORMAL PPCC  = 0.9718416E+00   *
*             =                      * TUK -.5 PPCC = 0.7334843E+00   *
*             =                      * CAUCHY PPCC  = 0.3347875E+00   *
***********************************************************************
Location One way to quantify a change in location over time is to fit a straight line to the data set
using the index variable X = 1, 2, ..., N, with N denoting the number of observations. If
there is no significant drift in the location, the slope parameter estimate should be zero.
For this data set, Dataplot generates the following output:
Variation  One simple way to detect a change in variation is with a Bartlett test after dividing the
data set into several equal-sized intervals. However, the Bartlett test is not robust for
non-normality. Since the normality assumption is questionable for these data, we use the
alternative Levene test. In particular, we use the Levene test based on the median rather than
the mean. The choice of the number of intervals is somewhat arbitrary, although values of
4 or 8 are reasonable. Dataplot generated the following output for the Levene test.
1. STATISTICS
NUMBER OF OBSERVATIONS = 1000
NUMBER OF GROUPS = 4
LEVENE F TEST STATISTIC = 140.8509
Randomness
There are many ways in which data can be non-random. However, most common forms
of non-randomness can be detected with a few simple tests. The lag plot in the 4-plot in
the previous section is a simple graphical technique.
One check is an autocorrelation plot that shows the autocorrelations for various lags.
Confidence bands can be plotted at the 95% and 99% confidence levels. Points outside
this band indicate statistically significant values (lag 0 is always 1). Dataplot generated
the following autocorrelation plot.
The lag 1 autocorrelation, which is generally the one of greatest interest, is 0.97. The
critical values at the 5% significance level are -0.062 and 0.062. This indicates that the
lag 1 autocorrelation is statistically significant, so there is strong evidence of
non-randomness.
A common test for randomness is the runs test.
RUNS UP
RUNS DOWN
Values in the column labeled "Z" greater than 1.96 or less than -1.96 are statistically
significant at the 5% level. Due to the number of values that are larger than the 1.96
cut-off, we conclude that the data are not random. However, in this case the evidence
from the runs test is not nearly as strong as it is from the autocorrelation plot.
Distributional Analysis  Since we rejected the randomness assumption, the distributional tests are not
meaningful. Therefore, these quantitative tests are omitted. Since Grubbs' test for outliers also
assumes the approximate normality of the data, we omit Grubbs' test as well.
Univariate Report  It is sometimes useful and convenient to summarize the above results in a report.
2: Location
Mean = 28.01635
Standard Deviation of Mean = 0.002008
95% Confidence Interval for Mean = (28.0124,28.02029)
Drift with respect to location? = NO
3: Variation
Standard Deviation = 0.063495
95% Confidence Interval for SD = (0.060829,0.066407)
Change in variation?
(based on Levene's test on quarters
of the data) = YES
4: Randomness
Autocorrelation = 0.972158
5: Distribution
Distributional test omitted due to
non-randomness of the data
6: Statistical Control
(i.e., no drift in location or scale,
data are random, distribution is
fixed)
Data Set is in Statistical Control? = NO
7: Outliers?
   (Grubbs' test omitted due to
   non-randomness of the data)
Click on the links below to start Dataplot and run this case study yourself. Each step may use results
from previous steps, so please be patient. Wait until the software verifies that the current step is
complete before clicking on the next step. The links in this column will connect you with more detailed
information about each analysis step from the case study description.
NOTE: This case study has 1,000 points. For better performance, it is highly recommended that you check
the "No Update" box on the Spreadsheet window for this case study. This will suppress subsequent
updating of the Spreadsheet window as the data are created or modified.
1. Generate a run sequence plot: the run sequence plot indicates that there are shifts of location and
   variation.
2. Generate the sample mean, a confidence interval for the population mean, and compute a linear fit to
   detect drift in location: the mean is 28.0163 and a 95% confidence interval is (28.0124, 28.02029).
   The linear fit indicates drift in location.
3. Generate the sample standard deviation, a confidence interval for the population standard deviation,
   and detect drift in variation by dividing the data into quarters and computing Levene's test for
   equal standard deviations: the standard deviation is 0.0635 with a 95% confidence interval of
   (0.060829, 0.066407). Levene's test indicates a significant change in variation.
Resulting Data  The following are the data used for this case study.
9.206343
9.299992
9.277895
9.305795
9.275351
9.288729
9.287239
9.260973
9.303111
9.275674
9.272561
9.288454
9.255672
9.252141
9.297670
9.266534
9.256689
9.277542
9.248205
9.252107
9.276345
9.278694
9.267144
9.246132
9.238479
9.269058
9.248239
9.257439
9.268481
9.288454
9.258452
9.286130
9.251479
9.257405
9.268343
9.291302
9.219460
9.270386
9.218808
9.241185
9.269989
9.226585
9.258556
9.286184
9.320067
9.327973
9.262963
9.248181
9.238644
9.225073
9.220878
9.271318
9.252072
9.281186
9.270624
9.294771
9.301821
9.278849
9.236680
9.233988
9.244687
9.221601
9.207325
9.258776
9.275708
9.268955
9.257269
9.264979
9.295500
9.292883
9.264188
9.280731
9.267336
9.300566
9.253089
9.261376
9.238409
9.225073
9.235526
9.239510
9.264487
9.244242
9.277542
9.310506
9.261594
9.259791
9.253089
9.245735
9.284058
9.251122
9.275385
9.254619
9.279526
9.275065
9.261952
9.275351
9.252433
9.230263
9.255150
9.268780
9.290389
9.274161
9.255707
9.261663
9.250455
9.261952
9.264041
9.264509
9.242114
9.239674
9.221553
9.241935
9.215265
9.285930
9.271559
9.266046
9.285299
9.268989
9.267987
9.246166
9.231304
9.240768
9.260506
9.274355
9.292376
9.271170
9.267018
9.308838
9.264153
9.278822
9.255244
9.229221
9.253158
9.256292
9.262602
9.219793
9.258452
9.267987
9.267987
9.248903
9.235153
9.242933
9.253453
9.262671
9.242536
9.260803
9.259825
9.253123
9.240803
9.238712
9.263676
9.243002
9.246826
9.252107
9.261663
9.247311
9.306055
9.237646
9.248937
9.256689
9.265777
9.299047
9.244814
9.287205
9.300566
9.256621
9.271318
9.275154
9.281834
9.253158
9.269024
9.282077
9.277507
9.284910
9.239840
9.268344
9.247778
9.225039
9.230750
9.270024
9.265095
9.284308
9.280697
9.263032
9.291851
9.252072
9.244031
9.283269
9.196848
9.231372
9.232963
9.234956
9.216746
9.274107
9.273776
4-Plot of Data
Run Sequence Plot
Lag Plot
Histogram (with overlaid Normal PDF)
Normal Probability Plot
SUMMARY
***********************************************************************
*         LOCATION MEASURES          *        DISPERSION MEASURES     *
***********************************************************************
* MIDRANGE    = 0.9262411E+01        * RANGE        = 0.1311255E+00   *
* MEAN        = 0.9261460E+01        * STAND. DEV.  = 0.2278881E-01   *
* MIDMEAN     = 0.9259412E+01        * AV. AB. DEV. = 0.1788945E-01   *
* MEDIAN      = 0.9261952E+01        * MINIMUM      = 0.9196848E+01   *
*             =                      * LOWER QUART. = 0.9246826E+01   *
*             =                      * LOWER HINGE  = 0.9246496E+01   *
*             =                      * UPPER HINGE  = 0.9275530E+01   *
*             =                      * UPPER QUART. = 0.9275708E+01   *
*             =                      * MAXIMUM      = 0.9327973E+01   *
***********************************************************************
*       RANDOMNESS MEASURES          *     DISTRIBUTIONAL MEASURES    *
***********************************************************************
* AUTOCO COEF = 0.2805789E+00        * ST. 3RD MOM. = -0.8537455E-02  *
*             = 0.0000000E+00        * ST. 4TH MOM. = 0.3049067E+01   *
*             = 0.0000000E+00        * ST. WILK-SHA = 0.9458605E+01   *
*             =                      * UNIFORM PPCC = 0.9735289E+00   *
*             =                      * NORMAL PPCC  = 0.9989640E+00   *
*             =                      * TUK -.5 PPCC = 0.8927904E+00   *
*             =                      * CAUCHY PPCC  = 0.6360204E+00   *
***********************************************************************
Location One way to quantify a change in location over time is to fit a straight line to the data set
using the index variable X = 1, 2, ..., N, with N denoting the number of observations. If
there is no significant drift in the location, the slope parameter should be zero. For this
data set, Dataplot generates the following output:
Variation One simple way to detect a change in variation is with a Bartlett test after dividing the
data set into several equal-sized intervals. The choice of the number of intervals is
somewhat arbitrary, although values of 4 or 8 are reasonable. Dataplot generated the
following output for the Bartlett test.
BARTLETT TEST
(STANDARD DEFINITION)
NULL HYPOTHESIS UNDER TEST--ALL SIGMA(I) ARE EQUAL
TEST:
DEGREES OF FREEDOM = 3.000000
In this case, since the Bartlett test statistic of 3.14 is less than the critical value at the 5%
significance level of 7.81, we conclude that the standard deviations are not significantly
different in the 4 intervals. That is, the assumption of constant scale is valid.
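For reference, the Bartlett test on four equal-sized intervals can be sketched with scipy (an
illustration, assuming y holds the measurements):

    import numpy as np
    from scipy import stats

    groups = np.array_split(y, 4)       # four equal-sized intervals
    stat, p = stats.bartlett(*groups)   # chi-square statistic, k-1 = 3 degrees of freedom
    print(stat, p)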
Randomness There are many ways in which data can be non-random. However, most common forms
of non-randomness can be detected with a few simple tests. The lag plot in the previous
section is a simple graphical technique.
Another check is an autocorrelation plot that shows the autocorrelations for various lags.
Confidence bands can be plotted at the 95% and 99% confidence levels. Points outside
this band indicate statistically significant values (lag 0 is always 1). Dataplot generated
the following autocorrelation plot.
The lag 1 autocorrelation, which is generally the one of greatest interest, is 0.281. The
critical values at the 5% significance level are -0.087 and 0.087. This indicates that the
lag 1 autocorrelation is statistically significant, so there is evidence of non-randomness.
A common test for randomness is the runs test.
RUNS UP
RUNS DOWN
OF LENGTH EXACTLY I
Values in the column labeled "Z" greater than 1.96 or less than -1.96 are statistically
significant at the 5% level. The runs test does indicate some non-randomness.
Although the autocorrelation plot and the runs test indicate some mild non-randomness,
the violation of the randomness assumption is not serious enough to warrant developing a
more sophisticated model. It is common in practice that some of the assumptions are
mildly violated and it is a judgement call as to whether or not the violations are serious
enough to warrant developing a more sophisticated model for the data.
Distributional Analysis  Probability plots are a graphical test for assessing if a particular
distribution provides an adequate fit to a data set.
A quantitative enhancement to the probability plot is the correlation coefficient of the
points on the probability plot. For this data set the correlation coefficient is 0.996. Since
this is greater than the critical value of 0.987 (this is a tabulated value), the normality
assumption is not rejected.
Chi-square and Kolmogorov-Smirnov goodness-of-fit tests are alternative methods for
assessing distributional adequacy. The Wilk-Shapiro and Anderson-Darling tests can be
used to test for normality. Dataplot generates the following output for the
Anderson-Darling normality test.
1. STATISTICS:
NUMBER OF OBSERVATIONS = 195
MEAN = 9.261460
STANDARD DEVIATION = 0.2278881E-01
2. CRITICAL VALUES:
90 % POINT = 0.6560000
95 % POINT = 0.7870000
97.5 % POINT = 0.9180000
99 % POINT = 1.092000
The Anderson-Darling test also does not reject the normality assumption because the test
statistic, 0.129, is less than the critical value at the 5% significance level of 0.918.
Outlier Analysis  A test for outliers is Grubbs' test. Dataplot generated the following output for
Grubbs' test.
1. STATISTICS:
NUMBER OF OBSERVATIONS = 195
MINIMUM = 9.196848
MEAN = 9.261460
MAXIMUM = 9.327973
STANDARD DEVIATION = 0.2278881E-01
For this data set, Grubbs' test does not detect any outliers at the 25%, 10%, 5%, and 1%
significance levels.
Model  Since the underlying assumptions were validated both graphically and analytically, with only a
mild violation of the randomness assumption, we conclude that a reasonable model for the data is
Yi = C + Ei. We can express the uncertainty for C, here estimated by 9.26146, as the 95% confidence
interval (9.258242, 9.264679).
Univariate Report  It is sometimes useful and convenient to summarize the above results in a report.
The report for the heat flow meter data follows.
2: Location
Mean = 9.26146
Standard Deviation of Mean = 0.001632
95% Confidence Interval for Mean = (9.258242,9.264679)
Drift with respect to location? = NO
3: Variation
Standard Deviation = 0.022789
95% Confidence Interval for SD = (0.02073,0.025307)
Drift with respect to variation?
(based on Bartlett's test on quarters
of the data) = NO
4: Randomness
Autocorrelation = 0.280579
Data are Random?
(as measured by autocorrelation) = NO
5: Distribution
Normal PPCC = 0.998965
Data are Normal?
(as measured by Normal PPCC) = YES
6: Statistical Control
(i.e., no drift in location or scale,
7: Outliers?
(as determined by Grubbs' test) = NO
Click on the links below to start Dataplot and run this case study yourself. Each step may use results
from previous steps, so please be patient. Wait until the software verifies that the current step is
complete before clicking on the next step. The links in this column will connect you with more detailed
information about each analysis step from the case study description.
1. Generate a run sequence plot: the run sequence plot indicates that there are no shifts of location
   or scale.
2. Generate a lag plot: the lag plot does not indicate any significant patterns (which would show the
   data were not random).
5. Check for normality by computing the normal probability plot correlation coefficient: the
   coefficient is 0.999. At the 5% level, we cannot reject the normality assumption.
6. Check for outliers using Grubbs' test: Grubbs' test detects no outliers at the 5% level.
Purpose of Analysis  The goal of this case study is to find a good distributional model for the
polished window strength data. Once a good distributional model has been determined, various percent
points for the polished window strength will be computed.
Since the data were used in a study to predict failure times, this case study is a form of reliability
analysis. The assessing product reliability chapter contains a more complete discussion of reliability
methods. This case study is meant to complement that chapter by showing the use of graphical techniques
in one aspect of reliability modeling.
Data in reliability analysis do not typically follow a normal distribution; non-parametric methods
(techniques that do not rely on a specific distribution) are frequently recommended for developing
confidence intervals for failure data. One problem with this approach is that sample sizes are often
small due to the expense involved in collecting the data, and non-parametric methods do not work well
for small sample sizes. For this reason, a parametric method based on a specific distributional model
of the data is preferred if the data can be shown to follow a specific distribution. Parametric models
typically have greater efficiency at the cost of more specific assumptions about the data, but it is
important to verify that the distributional assumption is indeed valid. If the distributional
assumption is not justified, then the conclusions drawn from the model may not be valid.
Resulting Data  The following are the data used for this case study. The data are in ksi (= 1,000 psi).
18.830
20.800
21.657
23.030
23.230
24.050
24.321
25.500
25.520
25.800
26.690
26.770
26.780
27.050
27.670
29.900
31.110
33.200
33.730
33.760
33.890
34.760
35.750
35.910
36.980
37.080
37.090
39.580
44.045
45.290
45.381
● The data are somewhat symmetric, but with a gap in the middle.
The normal probability plot has a correlation coefficient of 0.980. We can use
this number as a reference baseline when comparing the performance of other
distributional fits.
Other Potential Distributions  There are a large number of distributions that would be candidate
distributional models for the data. However, we will restrict ourselves to the following distributional
models because these have proven to be useful in reliability studies.
1. Normal distribution
2. Exponential distribution
3. Weibull distribution
4. Lognormal distribution
5. Gamma distribution
6. Power normal distribution
7. Fatigue life distribution
Analyses for Specific Distributions  We analyzed the data using the approach described above for the
following distributional models:
1. Normal distribution - from the 4-plot above, the PPCC value was 0.980.
2. Exponential distribution - the exponential distribution is a special case of
the Weibull with shape parameter equal to 1. If the Weibull analysis
yields a shape parameter close to 1, then we would consider using the
simpler exponential model.
3. Weibull distribution
4. Lognormal distribution
5. Gamma distribution
6. Power normal distribution
7. Power lognormal distribution
Percent Point Estimates  The final step in this analysis is to compute percent point estimates for the
1%, 2.5%, 5%, 95%, 97.5%, and 99% percent points. A percent point estimate is an estimate of the
strength at which a given percentage of units will be weaker. For example, the 5% point is the strength
at which we estimate that 5% of the units will be weaker.
To calculate these values, we use the Weibull percent point function with the
appropriate estimates of the shape, location, and scale parameters. The Weibull
percent point function can be computed in many general purpose statistical
software programs, including Dataplot.
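A sketch of such a computation with scipy, using the shape and scale estimates from the Weibull output
below; the location (lower bound) value here is only an assumed placeholder, since the fitted location
is not reproduced in this text:

    from scipy import stats

    shape, scale = 2.237495, 16.87868   # from the Weibull fit output below
    loc = 15.9                          # assumed location; replace with the fitted value
    for q in [0.01, 0.025, 0.05, 0.95, 0.975, 0.99]:
        print(q, stats.weibull_min.ppf(q, shape, loc=loc, scale=scale))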
Normal Anderson-Darling Output
ANDERSON-DARLING 1-SAMPLE TEST THAT THE DATA CAME FROM A NORMAL DISTRIBUTION
1. STATISTICS:
NUMBER OF OBSERVATIONS = 31
MEAN = 30.81142
STANDARD DEVIATION = 7.253381
2. CRITICAL VALUES:
90 % POINT = 0.6160000
95 % POINT = 0.7350000
97.5 % POINT = 0.8610000
99 % POINT = 1.021000
Lognormal Anderson-Darling Output
ANDERSON-DARLING 1-SAMPLE TEST THAT THE DATA CAME FROM A LOGNORMAL DISTRIBUTION
1. STATISTICS:
NUMBER OF OBSERVATIONS = 31
MEAN OF LOG OF DATA = 3.401242
STANDARD DEVIATION OF LOG OF DATA = 0.2349026
2. CRITICAL VALUES:
90 % POINT = 0.6160000
95 % POINT = 0.7350000
97.5 % POINT = 0.8610000
99 % POINT = 1.021000
Weibull Anderson-Darling Output
ANDERSON-DARLING 1-SAMPLE TEST THAT THE DATA CAME FROM A WEIBULL DISTRIBUTION
1. STATISTICS:
NUMBER OF OBSERVATIONS = 31
MEAN = 14.91142
STANDARD DEVIATION = 7.253381
SHAPE PARAMETER = 2.237495
SCALE PARAMETER = 16.87868
2. CRITICAL VALUES:
90 % POINT = 0.6370000
95 % POINT = 0.7570000
97.5 % POINT = 0.8770000
99 % POINT = 1.038000
Conclusions  None of these three distributions is rejected by the Anderson-Darling test. Note that the
value of the Anderson-Darling test statistic is the smallest for the Weibull distribution, with the
value for the lognormal distribution just slightly larger. The test statistic for the normal
distribution is noticeably higher than for the Weibull or lognormal.
This provides additional confirmation that either the Weibull or lognormal
distribution fits this data better than the normal distribution with the Weibull
providing a slightly better fit than the lognormal.
Alternative Plots  The Weibull plot and the Weibull hazard plot are alternative graphical analysis
procedures to the PPCC plots and probability plots.
These two procedures, especially the Weibull plot, are very commonly employed. Notwithstanding that,
the disadvantage of these two procedures is that they both assume that the location parameter (i.e.,
the lower bound) is zero and that we are fitting a 2-parameter Weibull instead of a 3-parameter
Weibull. The advantage is that there is an extensive literature on these methods and they have been
designed to work with either censored or uncensored data.
Weibull Plot
Weibull Hazard Plot
Click on the links below to start Dataplot and run this case study yourself. Each step may use results
from previous steps, so please be patient. Wait until the software verifies that the current step is
complete before clicking on the next step. The links in this column will connect you with more detailed
information about each analysis step from the case study description.
4. Direction (X4)
For this case study, we are using only half the data. Specifically, we are using the data with the
direction longitudinal. Therefore, we have only three primary factors.
In addition, we are interested in the nuisance factors
1. Lab
2. Batch
The complete file can be read into Dataplot with the following commands:
DIMENSION 20 VARIABLES
SKIP 50
READ JAHANMI2.DAT RUN RUN LAB BAR SET Y X1 TO X8 BATCH
Resulting Data  The following are the data used for this case study.
Run Lab Batch Y X1 X2 X3
1 1 1 608.781 -1 -1 -1
2 1 2 569.670 -1 -1 -1
3 1 1 689.556 -1 -1 -1
4 1 2 747.541 -1 -1 -1
5 1 1 618.134 -1 -1 -1
6 1 2 612.182 -1 -1 -1
7 1 1 680.203 -1 -1 -1
8 1 2 607.766 -1 -1 -1
9 1 1 726.232 -1 -1 -1
10 1 2 605.380 -1 -1 -1
11 1 1 518.655 -1 -1 -1
12 1 2 589.226 -1 -1 -1
13 1 1 740.447 -1 -1 -1
14 1 2 588.375 -1 -1 -1
15 1 1 666.830 -1 -1 -1
16 1 2 531.384 -1 -1 -1
17 1 1 710.272 -1 -1 -1
18 1 2 633.417 -1 -1 -1
19 1 1 751.669 -1 -1 -1
20 1 2 619.060 -1 -1 -1
21 1 1 697.979 -1 -1 -1
22 1 2 632.447 -1 -1 -1
23 1 1 708.583 -1 -1 -1
24 1 2 624.256 -1 -1 -1
25 1 1 624.972 -1 -1 -1
26 1 2 575.143 -1 -1 -1
27 1 1 695.070 -1 -1 -1
28 1 2 549.278 -1 -1 -1
29 1 1 769.391 -1 -1 -1
30 1 2 624.972 -1 -1 -1
61 1 1 720.186 -1 1 1
62 1 2 587.695 -1 1 1
63 1 1 723.657 -1 1 1
64 1 2 569.207 -1 1 1
65 1 1 703.700 -1 1 1
66 1 2 613.257 -1 1 1
67 1 1 697.626 -1 1 1
68 1 2 565.737 -1 1 1
69 1 1 714.980 -1 1 1
70 1 2 662.131 -1 1 1
71 1 1 657.712 -1 1 1
72 1 2 543.177 -1 1 1
73 1 1 609.989 -1 1 1
74 1 2 512.394 -1 1 1
75 1 1 650.771 -1 1 1
76 1 2 611.190 -1 1 1
77 1 1 707.977 -1 1 1
78 1 2 659.982 -1 1 1
79 1 1 712.199 -1 1 1
80 1 2 569.245 -1 1 1
81 1 1 709.631 -1 1 1
82 1 2 725.792 -1 1 1
83 1 1 703.160 -1 1 1
84 1 2 608.960 -1 1 1
85 1 1 744.822 -1 1 1
86 1 2 586.060 -1 1 1
87 1 1 719.217 -1 1 1
88 1 2 617.441 -1 1 1
89 1 1 619.137 -1 1 1
90 1 2 592.845 -1 1 1
151 2 1 753.333 1 1 1
152 2 2 631.754 1 1 1
153 2 1 677.933 1 1 1
154 2 2 588.113 1 1 1
155 2 1 735.919 1 1 1
156 2 2 555.724 1 1 1
157 2 1 695.274 1 1 1
158 2 2 702.411 1 1 1
159 2 1 504.167 1 1 1
160 2 2 631.754 1 1 1
161 2 1 693.333 1 1 1
162 2 2 698.254 1 1 1
163 2 1 625.000 1 1 1
164 2 2 616.791 1 1 1
165 2 1 596.667 1 1 1
166 2 2 551.953 1 1 1
167 2 1 640.898 1 1 1
168 2 2 636.738 1 1 1
169 2 1 720.506 1 1 1
170 2 2 571.551 1 1 1
171 2 1 700.748 1 1 1
172 2 2 521.667 1 1 1
173 2 1 691.604 1 1 1
174 2 2 587.451 1 1 1
175 2 1 636.738 1 1 1
176 2 2 700.422 1 1 1
177 2 1 731.667 1 1 1
178 2 2 595.819 1 1 1
179 2 1 635.079 1 1 1
180 2 2 534.236 1 1 1
181 2 1 716.926 1 -1 -1
182 2 2 606.188 1 -1 -1
183 2 1 759.581 1 -1 -1
184 2 2 575.303 1 -1 -1
185 2 1 673.903 1 -1 -1
186 2 2 590.628 1 -1 -1
187 2 1 736.648 1 -1 -1
188 2 2 729.314 1 -1 -1
189 2 1 675.957 1 -1 -1
190 2 2 619.313 1 -1 -1
191 2 1 729.230 1 -1 -1
192 2 2 624.234 1 -1 -1
193 2 1 697.239 1 -1 -1
194 2 2 651.304 1 -1 -1
195 2 1 728.499 1 -1 -1
196 2 2 724.175 1 -1 -1
197 2 1 797.662 1 -1 -1
198 2 2 583.034 1 -1 -1
199 2 1 668.530 1 -1 -1
200 2 2 620.227 1 -1 -1
201 2 1 815.754 1 -1 -1
202 2 2 584.861 1 -1 -1
203 2 1 777.392 1 -1 -1
204 2 2 565.391 1 -1 -1
205 2 1 712.140 1 -1 -1
206 2 2 622.506 1 -1 -1
207 2 1 663.622 1 -1 -1
208 2 2 628.336 1 -1 -1
209 2 1 684.181 1 -1 -1
210 2 2 587.145 1 -1 -1
271 3 1 629.012 1 -1 1
272 3 2 584.319 1 -1 1
273 3 1 640.193 1 -1 1
274 3 2 538.239 1 -1 1
275 3 1 644.156 1 -1 1
276 3 2 538.097 1 -1 1
277 3 1 642.469 1 -1 1
278 3 2 595.686 1 -1 1
279 3 1 639.090 1 -1 1
280 3 2 648.935 1 -1 1
281 3 1 439.418 1 -1 1
282 3 2 583.827 1 -1 1
283 3 1 614.664 1 -1 1
284 3 2 534.905 1 -1 1
285 3 1 537.161 1 -1 1
286 3 2 569.858 1 -1 1
287 3 1 656.773 1 -1 1
288 3 2 617.246 1 -1 1
289 3 1 659.534 1 -1 1
290 3 2 610.337 1 -1 1
291 3 1 695.278 1 -1 1
292 3 2 584.192 1 -1 1
293 3 1 734.040 1 -1 1
294 3 2 598.853 1 -1 1
295 3 1 687.665 1 -1 1
296 3 2 554.774 1 -1 1
297 3 1 710.858 1 -1 1
298 3 2 605.694 1 -1 1
299 3 1 701.716 1 -1 1
300 3 2 627.516 1 -1 1
301 3 1 382.133 1 1 -1
302 3 2 574.522 1 1 -1
303 3 1 719.744 1 1 -1
304 3 2 582.682 1 1 -1
305 3 1 756.820 1 1 -1
306 3 2 563.872 1 1 -1
307 3 1 690.978 1 1 -1
308 3 2 715.962 1 1 -1
309 3 1 670.864 1 1 -1
310 3 2 616.430 1 1 -1
311 3 1 670.308 1 1 -1
312 3 2 778.011 1 1 -1
313 3 1 660.062 1 1 -1
314 3 2 604.255 1 1 -1
315 3 1 790.382 1 1 -1
316 3 2 571.906 1 1 -1
317 3 1 714.750 1 1 -1
318 3 2 625.925 1 1 -1
319 3 1 716.959 1 1 -1
320 3 2 682.426 1 1 -1
321 3 1 603.363 1 1 -1
322 3 2 707.604 1 1 -1
323 3 1 713.796 1 1 -1
324 3 2 617.400 1 1 -1
325 3 1 444.963 1 1 -1
326 3 2 689.576 1 1 -1
327 3 1 723.276 1 1 -1
328 3 2 676.678 1 1 -1
329 3 1 745.527 1 1 -1
330 3 2 563.290 1 1 -1
361 4 1 778.333 -1 -1 1
362 4 2 581.879 -1 -1 1
363 4 1 723.349 -1 -1 1
364 4 2 447.701 -1 -1 1
365 4 1 708.229 -1 -1 1
366 4 2 557.772 -1 -1 1
367 4 1 681.667 -1 -1 1
368 4 2 593.537 -1 -1 1
369 4 1 566.085 -1 -1 1
370 4 2 632.585 -1 -1 1
371 4 1 687.448 -1 -1 1
372 4 2 671.350 -1 -1 1
373 4 1 597.500 -1 -1 1
374 4 2 569.530 -1 -1 1
375 4 1 637.410 -1 -1 1
376 4 2 581.667 -1 -1 1
377 4 1 755.864 -1 -1 1
378 4 2 643.449 -1 -1 1
379 4 1 692.945 -1 -1 1
380 4 2 581.593 -1 -1 1
381 4 1 766.532 -1 -1 1
382 4 2 494.122 -1 -1 1
383 4 1 725.663 -1 -1 1
384 4 2 620.948 -1 -1 1
385 4 1 698.818 -1 -1 1
386 4 2 615.903 -1 -1 1
387 4 1 760.000 -1 -1 1
388 4 2 606.667 -1 -1 1
389 4 1 775.272 -1 -1 1
390 4 2 579.167 -1 -1 1
421 4 1 708.885 -1 1 -1
422 4 2 662.510 -1 1 -1
423 4 1 727.201 -1 1 -1
424 4 2 436.237 -1 1 -1
425 4 1 642.560 -1 1 -1
426 4 2 644.223 -1 1 -1
427 4 1 690.773 -1 1 -1
428 4 2 586.035 -1 1 -1
429 4 1 688.333 -1 1 -1
430 4 2 620.833 -1 1 -1
431 4 1 743.973 -1 1 -1
432 4 2 652.535 -1 1 -1
433 4 1 682.461 -1 1 -1
434 4 2 593.516 -1 1 -1
435 4 1 761.430 -1 1 -1
436 4 2 587.451 -1 1 -1
437 4 1 691.542 -1 1 -1
438 4 2 570.964 -1 1 -1
439 4 1 643.392 -1 1 -1
440 4 2 645.192 -1 1 -1
441 4 1 697.075 -1 1 -1
442 4 2 540.079 -1 1 -1
443 4 1 708.229 -1 1 -1
444 4 2 707.117 -1 1 -1
445 4 1 746.467 -1 1 -1
446 4 2 621.779 -1 1 -1
447 4 1 744.819 -1 1 -1
448 4 2 585.777 -1 1 -1
449 4 1 655.029 -1 1 -1
450 4 2 703.980 -1 1 -1
541 5 1 715.224 -1 -1 -1
542 5 2 698.237 -1 -1 -1
543 5 1 614.417 -1 -1 -1
544 5 2 757.120 -1 -1 -1
545 5 1 761.363 -1 -1 -1
546 5 2 621.751 -1 -1 -1
547 5 1 716.106 -1 -1 -1
548 5 2 472.125 -1 -1 -1
549 5 1 659.502 -1 -1 -1
550 5 2 612.700 -1 -1 -1
551 5 1 730.781 -1 -1 -1
552 5 2 583.170 -1 -1 -1
553 5 1 546.928 -1 -1 -1
554 5 2 599.771 -1 -1 -1
555 5 1 734.203 -1 -1 -1
556 5 2 549.227 -1 -1 -1
557 5 1 682.051 -1 -1 -1
558 5 2 605.453 -1 -1 -1
559 5 1 701.341 -1 -1 -1
560 5 2 569.599 -1 -1 -1
561 5 1 759.729 -1 -1 -1
562 5 2 637.233 -1 -1 -1
563 5 1 689.942 -1 -1 -1
564 5 2 621.774 -1 -1 -1
565 5 1 769.424 -1 -1 -1
566 5 2 558.041 -1 -1 -1
567 5 1 715.286 -1 -1 -1
568 5 2 583.170 -1 -1 -1
569 5 1 776.197 -1 -1 -1
570 5 2 345.294 -1 -1 -1
571 5 1 547.099 1 -1 1
572 5 2 570.999 1 -1 1
573 5 1 619.942 1 -1 1
574 5 2 603.232 1 -1 1
575 5 1 696.046 1 -1 1
576 5 2 595.335 1 -1 1
577 5 1 573.109 1 -1 1
578 5 2 581.047 1 -1 1
579 5 1 638.794 1 -1 1
580 5 2 455.878 1 -1 1
581 5 1 708.193 1 -1 1
582 5 2 627.880 1 -1 1
583 5 1 502.825 1 -1 1
584 5 2 464.085 1 -1 1
585 5 1 632.633 1 -1 1
586 5 2 596.129 1 -1 1
587 5 1 683.382 1 -1 1
588 5 2 640.371 1 -1 1
589 5 1 684.812 1 -1 1
590 5 2 621.471 1 -1 1
591 5 1 738.161 1 -1 1
592 5 2 612.727 1 -1 1
593 5 1 671.492 1 -1 1
594 5 2 606.460 1 -1 1
595 5 1 709.771 1 -1 1
596 5 2 571.760 1 -1 1
597 5 1 685.199 1 -1 1
598 5 2 599.304 1 -1 1
599 5 1 624.973 1 -1 1
600 5 2 579.459 1 -1 1
601 6 1 757.363 1 1 1
602 6 2 761.511 1 1 1
603 6 1 633.417 1 1 1
604 6 2 566.969 1 1 1
605 6 1 658.754 1 1 1
606 6 2 654.397 1 1 1
607 6 1 664.666 1 1 1
608 6 2 611.719 1 1 1
609 6 1 663.009 1 1 1
610 6 2 577.409 1 1 1
611 6 1 773.226 1 1 1
612 6 2 576.731 1 1 1
613 6 1 708.261 1 1 1
614 6 2 617.441 1 1 1
615 6 1 739.086 1 1 1
616 6 2 577.409 1 1 1
617 6 1 667.786 1 1 1
618 6 2 548.957 1 1 1
619 6 1 674.481 1 1 1
620 6 2 623.315 1 1 1
621 6 1 695.688 1 1 1
622 6 2 621.761 1 1 1
623 6 1 588.288 1 1 1
624 6 2 553.978 1 1 1
625 6 1 545.610 1 1 1
626 6 2 657.157 1 1 1
627 6 1 752.305 1 1 1
628 6 2 610.882 1 1 1
629 6 1 684.523 1 1 1
630 6 2 552.304 1 1 1
631 6 1 717.159 -1 1 -1
632 6 2 545.303 -1 1 -1
633 6 1 721.343 -1 1 -1
634 6 2 651.934 -1 1 -1
635 6 1 750.623 -1 1 -1
636 6 2 635.240 -1 1 -1
637 6 1 776.488 -1 1 -1
638 6 2 641.083 -1 1 -1
639 6 1 750.623 -1 1 -1
640 6 2 645.321 -1 1 -1
641 6 1 600.840 -1 1 -1
642 6 2 566.127 -1 1 -1
643 6 1 686.196 -1 1 -1
644 6 2 647.844 -1 1 -1
645 6 1 687.870 -1 1 -1
646 6 2 554.815 -1 1 -1
647 6 1 725.527 -1 1 -1
648 6 2 620.087 -1 1 -1
649 6 1 658.796 -1 1 -1
650 6 2 711.301 -1 1 -1
651 6 1 690.380 -1 1 -1
652 6 2 644.355 -1 1 -1
653 6 1 737.144 -1 1 -1
654 6 2 713.812 -1 1 -1
655 6 1 663.851 -1 1 -1
656 6 2 696.707 -1 1 -1
657 6 1 766.630 -1 1 -1
658 6 2 589.453 -1 1 -1
659 6 1 625.922 -1 1 -1
660 6 2 634.468 -1 1 -1
721 7 1 694.430 1 1 -1
722 7 2 599.751 1 1 -1
723 7 1 730.217 1 1 -1
724 7 2 624.542 1 1 -1
725 7 1 700.770 1 1 -1
726 7 2 723.505 1 1 -1
727 7 1 722.242 1 1 -1
728 7 2 674.717 1 1 -1
729 7 1 763.828 1 1 -1
730 7 2 608.539 1 1 -1
731 7 1 695.668 1 1 -1
732 7 2 612.135 1 1 -1
733 7 1 688.887 1 1 -1
734 7 2 591.935 1 1 -1
735 7 1 531.021 1 1 -1
736 7 2 676.656 1 1 -1
737 7 1 698.915 1 1 -1
738 7 2 647.323 1 1 -1
739 7 1 735.905 1 1 -1
740 7 2 811.970 1 1 -1
741 7 1 732.039 1 1 -1
742 7 2 603.883 1 1 -1
743 7 1 751.832 1 1 -1
744 7 2 608.643 1 1 -1
745 7 1 618.663 1 1 -1
746 7 2 630.778 1 1 -1
747 7 1 744.845 1 1 -1
748 7 2 623.063 1 1 -1
749 7 1 690.826 1 1 -1
750 7 2 472.463 1 1 -1
811 7 1 666.893 -1 1 1
812 7 2 645.932 -1 1 1
813 7 1 759.860 -1 1 1
814 7 2 577.176 -1 1 1
815 7 1 683.752 -1 1 1
816 7 2 567.530 -1 1 1
817 7 1 729.591 -1 1 1
818 7 2 821.654 -1 1 1
819 7 1 730.706 -1 1 1
820 7 2 684.490 -1 1 1
821 7 1 763.124 -1 1 1
822 7 2 600.427 -1 1 1
823 7 1 724.193 -1 1 1
824 7 2 686.023 -1 1 1
825 7 1 630.352 -1 1 1
826 7 2 628.109 -1 1 1
827 7 1 750.338 -1 1 1
828 7 2 605.214 -1 1 1
829 7 1 752.417 -1 1 1
830 7 2 640.260 -1 1 1
831 7 1 707.899 -1 1 1
832 7 2 700.767 -1 1 1
833 7 1 715.582 -1 1 1
834 7 2 665.924 -1 1 1
835 7 1 728.746 -1 1 1
836 7 2 555.926 -1 1 1
837 7 1 591.193 -1 1 1
838 7 2 543.299 -1 1 1
839 7 1 592.252 -1 1 1
840 7 2 511.030 -1 1 1
901 8 1 740.833 -1 -1 1
902 8 2 583.994 -1 -1 1
903 8 1 786.367 -1 -1 1
904 8 2 611.048 -1 -1 1
905 8 1 712.386 -1 -1 1
906 8 2 623.338 -1 -1 1
907 8 1 738.333 -1 -1 1
908 8 2 679.585 -1 -1 1
909 8 1 741.480 -1 -1 1
910 8 2 665.004 -1 -1 1
911 8 1 729.167 -1 -1 1
912 8 2 655.860 -1 -1 1
913 8 1 795.833 -1 -1 1
914 8 2 715.711 -1 -1 1
915 8 1 723.502 -1 -1 1
916 8 2 611.999 -1 -1 1
917 8 1 718.333 -1 -1 1
918 8 2 577.722 -1 -1 1
919 8 1 768.080 -1 -1 1
920 8 2 615.129 -1 -1 1
921 8 1 747.500 -1 -1 1
922 8 2 540.316 -1 -1 1
923 8 1 775.000 -1 -1 1
924 8 2 711.667 -1 -1 1
925 8 1 760.599 -1 -1 1
926 8 2 639.167 -1 -1 1
927 8 1 758.333 -1 -1 1
928 8 2 549.491 -1 -1 1
929 8 1 682.500 -1 -1 1
930 8 2 684.167 -1 -1 1
931 8 1 658.116 1 -1 -1
932 8 2 672.153 1 -1 -1
933 8 1 738.213 1 -1 -1
934 8 2 594.534 1 -1 -1
935 8 1 681.236 1 -1 -1
936 8 2 627.650 1 -1 -1
937 8 1 704.904 1 -1 -1
938 8 2 551.870 1 -1 -1
939 8 1 693.623 1 -1 -1
940 8 2 594.534 1 -1 -1
941 8 1 624.993 1 -1 -1
942 8 2 602.660 1 -1 -1
943 8 1 700.228 1 -1 -1
944 8 2 585.450 1 -1 -1
945 8 1 611.874 1 -1 -1
946 8 2 555.724 1 -1 -1
947 8 1 579.167 1 -1 -1
948 8 2 574.934 1 -1 -1
949 8 1 720.872 1 -1 -1
950 8 2 584.625 1 -1 -1
951 8 1 690.320 1 -1 -1
952 8 2 555.724 1 -1 -1
953 8 1 677.933 1 -1 -1
954 8 2 611.874 1 -1 -1
955 8 1 674.600 1 -1 -1
956 8 2 698.254 1 -1 -1
957 8 1 611.999 1 -1 -1
958 8 2 748.130 1 -1 -1
959 8 1 530.680 1 -1 -1
960 8 2 689.942 1 -1 -1
SUMMARY
***********************************************************************
*        LOCATION MEASURES         *       DISPERSION MEASURES        *
***********************************************************************
*  MIDRANGE    =  0.5834740E+03    *  RANGE        =  0.4763600E+03   *
*  MEAN        =  0.6500773E+03    *  STAND. DEV.  =  0.7463826E+02   *
*  MIDMEAN     =  0.6426155E+03    *  AV. AB. DEV. =  0.6184948E+02   *
*  MEDIAN      =  0.6466275E+03    *  MINIMUM      =  0.3452940E+03   *
*              =                   *  LOWER QUART. =  0.5960515E+03   *
*              =                   *  LOWER HINGE  =  0.5959740E+03   *
*              =                   *  UPPER HINGE  =  0.7084220E+03   *
*              =                   *  UPPER QUART. =  0.7083415E+03   *
*              =                   *  MAXIMUM      =  0.8216540E+03   *
***********************************************************************
*       RANDOMNESS MEASURES        *     DISTRIBUTIONAL MEASURES      *
***********************************************************************
*  AUTOCO COEF = -0.2290508E+00    *  ST. 3RD MOM. = -0.3682922E+00   *
*              =  0.0000000E+00    *  ST. 4TH MOM. =  0.3220554E+01   *
*              =  0.0000000E+00    *  ST. WILK-SHA =  0.3877698E+01   *
*              =                   *  UNIFORM PPCC =  0.9756916E+00   *
***********************************************************************
From the above output, the mean strength is 650.08 and the standard deviation of the
strength is 74.64.
Bihistogram [figure]
Quantile-Quantile Plot [figure]
Box Plot [figure]
2. The spread is reasonably similar for both batches, maybe slightly larger for batch
1.
3. Both batches have a number of outliers on the low side. Batch 2 also has a few
outliers on the high side. Box plots are a particularly effective method for
identifying the presence of outliers.
Block Plots A block plot is generated for each of the eight labs, with "1" and "2" denoting the batch
numbers. In the first plot, we do not include any of the primary factors. The next 3
block plots include one of the primary factors. Note that each of the 3 primary factors
(table speed = X1, down feed rate = X2, wheel grit size = X3) has 2 levels. With 8 labs
and 2 levels for the primary factor, we would expect 16 separate blocks on these plots.
The fact that some of these blocks are missing indicates that some of the combinations
of lab and primary factor are empty.
Quantitative Techniques: We can confirm some of the conclusions drawn from the above graphics by using
quantitative techniques. The two-sample t-test can be used to test whether or not the
means from the two batches are equal, and the F-test can be used to test whether or not
the standard deviations from the two batches are equal.
Two Sample T-Test: The following is the Dataplot output from the two-sample t-test.
T-TEST
(2-SAMPLE)
NULL HYPOTHESIS UNDER TEST--POPULATION MEANS MU1 = MU2
SAMPLE 1:
NUMBER OF OBSERVATIONS = 240
MEAN = 688.9987
STANDARD DEVIATION = 65.54909
STANDARD DEVIATION OF MEAN = 4.231175
SAMPLE 2:
NUMBER OF OBSERVATIONS = 240
MEAN = 611.1559
STANDARD DEVIATION = 61.85425
STANDARD DEVIATION OF MEAN = 3.992675
ALTERNATIVE-        ALTERNATIVE-HYPOTHESIS       ALTERNATIVE-HYPOTHESIS
HYPOTHESIS          ACCEPTANCE INTERVAL          CONCLUSION
MU1 <> MU2          (0,0.025), (0.975,1)         ACCEPT
MU1 < MU2           (0,0.05)                     REJECT
MU1 > MU2           (0.95,1)                     ACCEPT
The t-test indicates that the mean for batch 1 is larger than the mean for batch 2 (at the
5% significance level).
F-TEST
NULL HYPOTHESIS UNDER TEST--SIGMA1 = SIGMA2
ALTERNATIVE HYPOTHESIS UNDER TEST--SIGMA1 NOT EQUAL SIGMA2
SAMPLE 1:
NUMBER OF OBSERVATIONS = 240
MEAN = 688.9987
STANDARD DEVIATION = 65.54909
SAMPLE 2:
NUMBER OF OBSERVATIONS = 240
MEAN = 611.1559
STANDARD DEVIATION = 61.85425
TEST:
STANDARD DEV. (NUMERATOR) = 65.54909
STANDARD DEV. (DENOMINATOR) = 61.85425
F-TEST STATISTIC VALUE = 1.123037
DEG. OF FREEDOM (NUMER.) = 239.0000
DEG. OF FREEDOM (DENOM.) = 239.0000
F-TEST STATISTIC CDF VALUE = 0.814808
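As a hedged illustration for readers without Dataplot, both calculations can be reproduced in Python/SciPy directly from the summary statistics printed above; the variable names are illustrative only.

from scipy import stats

n1, m1, s1 = 240, 688.9987, 65.54909   # batch 1: size, mean, std. dev.
n2, m2, s2 = 240, 611.1559, 61.85425   # batch 2: size, mean, std. dev.

# Two-sample t-test (Welch form) computed from the summary statistics
t, p = stats.ttest_ind_from_stats(m1, s1, n1, m2, s2, n2, equal_var=False)
print(t, p)   # large positive t: mean of batch 1 exceeds mean of batch 2

# F-test for equality of the two standard deviations
F = (s1 / s2) ** 2                      # 1.123037, as in the output above
cdf = stats.f.cdf(F, n1 - 1, n2 - 1)    # 0.8148, well inside (0.025, 0.975)
print(F, cdf)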
Conclusions We can draw the following conclusions from the above analysis.
1. There is in fact a significant batch effect. This batch effect is consistent across
labs and primary factors.
2. The magnitude of the difference is on the order of 75 to 100 (with batch 2 being
smaller than batch 1). The standard deviations do not appear to be significantly
different.
3. There is some skewness in the batches.
This batch effect was completely unexpected by the scientific investigators in this
study.
Note that although the quantitative techniques support the conclusions of unequal
means and equal standard deviations, they do not show the more subtle features of the
data such as the presence of outliers and the skewness of the batch 2 data.
Box Plot for Batch 1: Given that the previous section showed a distinct batch effect, the next
step is to generate the box plots for the two batches separately.
Conclusions We can draw the following conclusions about a possible lab effect from
the above box plots.
1. The batch effect (of approximately 75 to 100 units) on location
dominates any lab effects.
2. It is reasonable to treat the labs as homogeneous.
Dex Scatter Plot for Batch 1 [figure]
Dex Mean Plot for Batch 1 [figure]
Dex SD Plot for Batch 1 [figure]
This dex standard deviation plot shows the following for batch 1.
1. The table speed factor (X1) has a significant difference in
variability between the levels of the factor. The difference is
approximately 20 units.
2. The wheel grit factor (X3) and the feed rate factor (X2) have
minimal differences in variability.
Dex Scatter Plot for Batch 2 [figure]
Dex Mean Plot for Batch 2 [figure]
Dex SD Plot for Batch 2 [figure]
This dex standard deviation plot shows the following for batch 2.
1. The difference in the standard deviations is roughly comparable
for the three factors (slightly less for the feed rate factor).
Interaction Effects: The above plots graphically show the main effects. An additional
concern is whether or not there are any significant interaction effects.
Main effects and 2-term interaction effects are discussed in the chapter
on Process Improvement.
In the following dex interaction plots, the labels on the plot give the
variables and the estimated effect. For example, factor 1 is TABLE
SPEED and it has an estimated effect of 30.77 (it is actually -30.77 if
the direction is taken into account).
DEX Interaction Plot for Batch 1 [figure]
DEX Interaction Plot for Batch 2 [figure]
Conclusions From the above plots, we can draw the following overall conclusions.
1. The batch effect (of approximately 75 units) is the dominant
primary factor.
2. The most important factors differ from batch to batch. See the
above text for the ranked list of factors with the estimated effects.
1. Generate a box plot for the labs with the 2 batches combined. The box plot does not
show a significant lab effect.
2. Generate a box plot for the labs for batch 1 only. The box plot does not show a
significant lab effect for batch 1.
3. Generate a box plot for the labs for batch 2 only. The box plot does not show a
significant lab effect for batch 2.

1. Generate a dex scatter plot for batch 1. The dex scatter plot shows the range of the
points and the presence of outliers.
2. Generate a dex mean plot for batch 1. The dex mean plot shows that table speed is
the most significant factor for batch 1.
4. Generate a dex scatter plot for batch 2. The dex scatter plot shows the range of the
points and the presence of outliers.
1. Characterization [2.1.]
1. What are the issues for characterization? [2.1.1.]
1. Purpose [2.1.1.1.]
2. Reference base [2.1.1.2.]
3. Bias and Accuracy [2.1.1.3.]
4. Variability [2.1.1.4.]
2. What is a check standard? [2.1.2.]
1. Assumptions [2.1.2.1.]
2. Data collection [2.1.2.2.]
3. Analysis [2.1.2.3.]
3. Calibration [2.3.]
1. Issues in calibration [2.3.1.]
1. Reference base [2.3.1.1.]
2. Reference standards [2.3.1.2.]
2. What is artifact (single-point) calibration? [2.3.2.]
3. What are calibration designs? [2.3.3.]
1. Elimination of special types of bias [2.3.3.1.]
1. Left-right (constant instrument) bias [2.3.3.1.1.]
2. Bias caused by instrument drift [2.3.3.1.2.]
2. Solutions to calibration designs [2.3.3.2.]
1. General matrix solutions to calibration designs [2.3.3.2.1.]
3. Uncertainties of calibrated values [2.3.3.3.]
1. Type A evaluations for calibration designs [2.3.3.3.1.]
2. Repeatability and level-2 standard deviations [2.3.3.3.2.]
3. Combination of repeatability and level-2 standard
deviations [2.3.3.3.3.]
4. Calculation of standard deviations for 1,1,1,1 design [2.3.3.3.4.]
5. Type B uncertainty [2.3.3.3.5.]
6. Expanded uncertainties [2.3.3.3.6.]
4. Catalog of calibration designs [2.3.4.]
1. Mass weights [2.3.4.1.]
1. Design for 1,1,1 [2.3.4.1.1.]
2. Design for 1,1,1,1 [2.3.4.1.2.]
3. Design for 1,1,1,1,1 [2.3.4.1.3.]
4. Design for 1,1,1,1,1,1 [2.3.4.1.4.]
5. Design for 2,1,1,1 [2.3.4.1.5.]
6. Design for 2,2,1,1,1 [2.3.4.1.6.]
7. Design for 2,2,2,1,1 [2.3.4.1.7.]
8. Design for 5,2,2,1,1,1 [2.3.4.1.8.]
9. Design for 5,2,2,1,1,1,1 [2.3.4.1.9.]
10. Design for 5,3,2,1,1,1 [2.3.4.1.10.]
11. Design for 5,3,2,1,1,1,1 [2.3.4.1.11.]
7. References [2.7.]
2.1. Characterization
The primary goal of this section is to lay the groundwork for
understanding the measurement process in terms of the errors that affect
the process.
What are the issues for characterization?
1. Purpose
2. Reference base
3. Bias and Accuracy
4. Variability
Scope is limited to ongoing processes: The techniques in this chapter are intended primarily for ongoing
processes. One-time tests and special tests or destructive tests are
difficult to characterize. Examples of ongoing processes are:
● Calibration where similar test items are measured on a regular
basis
● Certification where materials are characterized on a regular
basis
● Production where the metrology (tool) errors may be
significant
● Special studies where data can be collected over the life of the
study
2.1.1.1. Purpose
Purpose is to understand and quantify the effect of error on reported values: The purpose of
characterization is to develop an understanding of the sources of error in the measurement
process and how they affect specific measurement results. This section provides the background for:
● identifying sources of error in the measurement process
● understanding and quantifying errors in the measurement process
● codifying the effects of these errors on a specific reported value in a statement of uncertainty
It also introduces the following concepts:
● bias
● variability
● check standard
Reported value is a generic term that identifies the result: The reported value is the
measurement result for a particular test item. It can be:
● a single measurement
● an average of several measurements
● a least-squares prediction from a model
Depiction of bias and unbiased measurements [figure: unbiased measurements relative to the target]
Caution Errors that contribute to bias can be present even where all equipment
and standards are properly calibrated and under control. Temperature
probably has the most potential for introducing this type of bias into
the measurements. For example, a constant heat source will introduce
serious errors in dimensional measurements of metal objects.
Temperature affects chemical and electrical measurements as well.
Generally speaking, errors of this type can be identified only by those
who are thoroughly familiar with the measurement technology. The
reader is advised to consult the technical literature and experts in the
field for guidance.
2.1.1.4. Variability
Sources of time-dependent variability: Variability is the tendency of the measurement process to
produce slightly different measurements on the same test item, where conditions of measurement
are either stable or vary over time, temperature, operators, etc. In this chapter we consider two
sources of time-dependent variability:
● Short-term variability ascribed to the precision of the instrument
● Long-term variability ascribed to changes in the process over days, weeks, or months
Depiction of two measurement processes with the same short-term variability over six days,
where process 1 has large between-day variability and process 2 has negligible between-day
variability [figure; panel titles: Process 1, large between-day variability; Process 2, small
between-day variability]
Short-term variability: Short-term errors affect the precision of the instrument. Even very precise
instruments exhibit small changes caused by random errors. It is useful to think in terms of
measurements performed with a single instrument over minutes or hours; this is to be
understood, normally, as the time that it takes to complete a measurement sequence.
Terminology Four terms are in common usage to describe short-term phenomena. They are
interchangeable.
1. precision
2. repeatability
3. within-time variability
4. short-term variability
Precision is quantified by a standard deviation: The measure of precision is a standard deviation.
Good precision implies a small standard deviation. This standard deviation is called the
short-term standard deviation of the process or the repeatability standard deviation.
Caution -- long-term variability may be dominant: With very precise instrumentation, it is not
unusual to find that the variability exhibited by the measurement process from day to day often
exceeds the precision of the instrument because of small changes in environmental conditions
and handling techniques that cannot be controlled or corrected in the measurement process. The
measurement process is not completely characterized until this source of variability is
quantified.
Terminology Three terms are in common usage to describe long-term phenomena. They are
interchangeable.
1. day-to-day variability
2. long-term variability
3. reproducibility
Caution -- regarding term 'reproducibility': The term 'reproducibility' is given very specific
definitions in some national and international standards. However, the definitions are not always
in agreement. Therefore, it is used here only in a generic sense to indicate variability across days.
Definitions in this Handbook: We adopt precise definitions and provide data collection and
analysis techniques in the sections on check standards and measurement control for estimating:
● Level-1 standard deviation for short-term variability
● Level-2 standard deviation for day-to-day variability
In the section on gauge studies, the concept of variability is extended to include very
long-term measurement variability:
● Level-1 standard deviation for short-term variability
● Level-2 standard deviation for day-to-day variability
● Level-3 standard deviation for very long-term variability
Long-term variability is quantified by a standard deviation: The measure of long-term variability
is the standard deviation of measurements taken over several days, weeks or months.
The simplest method for doing this assessment is by analysis of a check standard
database. The measurements on the check standards are structured to cover a long time
interval and to capture all sources of variation in the measurement process.
Solves the difficulty of sampling the process: Measurement processes are similar to production
processes in that they are continual and are expected to produce identical results (within
acceptable limits) over time, instruments, operators, and environmental
conditions. However, it is difficult to sample the output of the
measurement process because, normally, test items change with each
measurement sequence.
Surrogate for unseen measurements: Measurements on the check standard, spaced over time at
regular intervals, act as surrogates for measurements that could be made on test items if
sufficient time and resources were available.
2.1.2.1. Assumptions
Case study: Resistivity check standard. Before applying the quality control procedures
recommended in this chapter to check standard data, basic assumptions should be
examined. The basic assumptions underlying the quality control procedures are:
1. The data come from a single statistical distribution.
2. The distribution is a normal distribution.
3. The errors are uncorrelated over time.
Exception One exception to this rule is that there should be at least J = 2 repetitions per day.
Without this redundancy, there is no way to check on the short-term precision of the
measurement system.
Depiction of schedule for making check standard measurements with four repetitions per day
over K days on the surface of a silicon wafer, with the repetitions randomized at various
positions on the wafer [figure: K days, 4 repetitions; 2-level design for measurement process]
Case study: Resistivity check standard for measurements on silicon wafers. The values for the
check standard should be recorded along with pertinent environmental readings and
identifications for all other significant factors. The best way to record this information is in one
file with one line or row (on a spreadsheet) of information in fixed fields for each check
standard measurement. A list of typical entries follows.
1. Identification for check standard
2. Date
3. Identification for the measurement design (if applicable)
4. Identification for the instrument
5. Check standard value
6. Short-term standard deviation from J repetitions
7. Degrees of freedom
8. Operator identification
9. Environmental readings (if pertinent)
2.1.2.3. Analysis
Short-term or level-1 standard deviations from J repetitions: An analysis of the check standard
data is the basis for quantifying random errors in the measurement process -- particularly
time-dependent errors.

Given that we have a database of check standard measurements as described in data collection,
where $Y_{kj}$ represents the jth repetition on the kth day, the mean for the kth day is

$$\bar{Y}_{k\cdot} = \frac{1}{J}\sum_{j=1}^{J} Y_{kj}$$

and the short-term (level-1) standard deviation for the kth day, with $v = J - 1$ degrees of
freedom, is

$$s_k = \sqrt{\frac{1}{J-1}\sum_{j=1}^{J}\left(Y_{kj} - \bar{Y}_{k\cdot}\right)^2}\;.$$

This standard deviation can be interpreted as quantifying the basic
precision of the instrumentation used in the measurement process.
Process (level-2) standard deviation: The level-2 standard deviation of the check standard is
appropriate for representing the process variability. It is computed with $v = K - 1$ degrees of
freedom as

$$s_2 = \sqrt{\frac{1}{K-1}\sum_{k=1}^{K}\left(\bar{Y}_{k\cdot} - \bar{Y}_{\cdot\cdot}\right)^2}$$

where $\bar{Y}_{\cdot\cdot}$ is the grand mean, i.e., the average of the K daily means.
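As a hedged illustration of these two formulas (not part of the handbook's own software), here is a short Python/NumPy sketch; data is a hypothetical K x J array of check standard measurements.

import numpy as np

data = np.array([[10.1, 10.3, 10.2],      # day 1 (illustrative numbers)
                 [10.4, 10.2, 10.5],      # day 2
                 [10.0, 10.1, 10.2]])     # day 3
K, J = data.shape

daily_means = data.mean(axis=1)      # one mean per day
s_k = data.std(axis=1, ddof=1)       # level-1 sd for each day, J-1 df each
s2 = daily_means.std(ddof=1)         # level-2 sd, K-1 degrees of freedom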
Use in quality control: The check standard data and standard deviations that are described in
this section are used for controlling two aspects of a measurement process:
1. Control of short-term variability
2. Control of bias and long-term variability

Case study: Resistivity check standard. For an example, see the case study for resistivity where
several check standards were measured J = 6 times per day over several days.
Shewhart chart is easy to implement: The Shewhart control chart has the advantage of being
intuitive and easy to implement. It is characterized by a center line and symmetric upper and
lower control limits. The chart is good for detecting large changes but not
for quickly detecting small changes (of the order of one-half to one standard
deviation) in the process.

Depiction of Shewhart control chart: In the simplistic illustration of a Shewhart control chart
shown below, the measurements are within the control limits with the exception of one
measurement which exceeds the upper control limit. [figure]
EWMA chart is better for detecting small changes: The EWMA control chart (exponentially
weighted moving average) is more difficult to implement but should be considered if the goal is
quick detection of small changes. The decision process for the EWMA chart is based on an
exponentially decreasing (over time) function of prior measurements on the
check standard, while the decision process for the Shewhart chart is based on
the current measurement only.

Example of EWMA chart: In the EWMA control chart below, the red dots represent the
measurements. Control is exercised via the exponentially weighted moving average (shown
as the curved line) which, in this case, is approaching its upper control limit. [figure]
Artifacts for process control must be stable and available: The check standard artifacts for
controlling the bias or long-term variability of the process must be of the same type and
geometry as items that are measured in the workload. The artifacts must be stable and available
to the measurement process on a continuing basis. Usually, one artifact is sufficient. It can be:
1. An individual item drawn at random from the workload
2. A specific item reserved by the laboratory for the purpose
Case study: Resistivity.
Baseline is the average from historical data: The baseline for the control chart is the accepted
value, an average of the historical check standard values. A minimum of 100 check
standard values is required to establish an accepted value.
Caution -- control limits are computed from the process standard deviation, not from rational
subsets: The upper (UCL) and lower (LCL) control limits are:

UCL = Accepted value + k * (process standard deviation)
LCL = Accepted value - k * (process standard deviation)

where the process standard deviation is the standard deviation
computed from the check standard database.
Control mechanism for EWMA: The EWMA control chart can be made sensitive to small
changes or a gradual drift in the process by the choice of the weighting factor, $\lambda$. A
weighting factor of 0.2 - 0.3 is usually suggested for this purpose
(Hunter), and 0.15 is also a popular choice.
Limits for the control chart: The target or center line for the control chart is the average of
historical data. The upper (UCL) and lower (LCL) limits are

$$UCL = \text{target} + k\,s\,\sqrt{\frac{\lambda}{2-\lambda}}, \qquad
  LCL = \text{target} - k\,s\,\sqrt{\frac{\lambda}{2-\lambda}}$$

where s is the standard deviation of the historical data.
Procedure for implementing the EWMA control chart: The implementation of the EWMA
control chart is the same as for any other type of control procedure. The procedure is built on
the assumption that the "good" historical data are representative of the
in-control process, with future data from the same process tested for
agreement with the historical data. To start the procedure, a target
(average) and process standard deviation are computed from historical
check standard data. Then the procedure enters the monitoring stage
with the EWMA statistics computed and tested against the control
limits. The EWMA statistics are weighted averages, and thus their
standard deviations are smaller than the standard deviations of the raw
data, and the corresponding control limits are narrower than the control
limits for the Shewhart individual observations chart.
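A minimal Python sketch of this monitoring stage, under the stated assumptions: target and s are taken from historical check standard data, and lam and k are illustrative choices (the limits use the asymptotic EWMA form given above).

import numpy as np

def ewma_chart(new_data, target, s, lam=0.2, k=3.0):
    """Return EWMA statistics and the asymptotic control limits."""
    ewma, out = target, []
    for y in new_data:
        ewma = lam * y + (1.0 - lam) * ewma   # exponentially weighted average
        out.append(ewma)
    half_width = k * s * np.sqrt(lam / (2.0 - lam))   # narrower than k*s
    return np.array(out), target - half_width, target + half_width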
Depiction of check standard measurements with J = 4 repetitions per day on the surface of a
silicon wafer over K days, where the repetitions are randomized over position on the wafer
[figure: K days, 4 repetitions; 2-level design for measurements on a check standard]
The check standard value is defined as an average of short-term repetitions: The check standard
value for the kth day is

$$C_k = \bar{Y}_{k\cdot} = \frac{1}{J}\sum_{j=1}^{J} Y_{kj}\;.$$
Caution Check standard measurements should be structured in the same way as values
reported on the test items. For example, if the reported values are averages of two
measurements made within 5 minutes of each other, the check standard values
should be averages of the two measurements made in the same manner.
Database (Case study: Resistivity): Averages and short-term standard deviations computed from
J repetitions should be recorded in a file along with identifications for all significant factors.
The best way to record this information is to use one file with one line (row in a spreadsheet) of
information in fixed fields for each group. A list of typical entries follows:
1. Month
2. Day
3. Year
4. Check standard identification
5. Identification for the measurement design (if applicable)
6. Instrument identification
7. Check standard value
8. Repeatability (short-term) standard deviation from J repetitions
9. Degrees of freedom
10. Operator identification
11. Environmental readings (if pertinent)
Shewhart control chart of measurements of kilogram check standard showing outliers and a
shift in the process that occurred after 1985 [figure]
EWMA chart for measurements on kilogram check standard showing multiple violations of the
control limits for the EWMA statistics: In the EWMA control chart below, the control data after
1985 are shown in green, and the EWMA statistics are shown as black dots superimposed on
the raw data. The EWMA statistics, and not the raw data, are of interest in looking for
out-of-control signals. Because the EWMA statistic is a weighted average, it has a smaller
standard deviation than a single control measurement, and, therefore, the EWMA control limits
are narrower than the limits for the Shewhart control chart shown above. [figure]
Measurements that exceed the control limits require action: The control strategy is based on the
predictability of future measurements from historical data. Each new check standard
measurement is plotted on the control chart in real time. These values are expected to fall
within the control limits if the process has not changed. Measurements that exceed the control
limits are probably out of control and require remedial action. Possible causes of
out-of-control signals need to be understood when developing strategies for dealing with outliers.
Signs of significant trends or shifts: The control chart should be viewed in its entirety on a
regular basis to identify drift or shift in the process. In the Shewhart control chart shown above,
only a few points exceed the control limits. The small, but significant, shift in the process that
occurred after 1985 can only be identified by examining the plot of control measurements over
time. A re-analysis of the kilogram check standard data shows that the control limits for the
Shewhart control chart should be updated based on the data after 1985. In the EWMA control
chart, multiple violations of the control limits occur after 1986. In the calibration environment,
the incidence of several violations should alert the control engineer that a shift in the process
has occurred, possibly because of damage or change in the value of a reference standard, and
the process requires review.
Check for drift: 3. Examine the patterns of recent data. If the process is gradually
drifting out of control because of degradation in instrumentation
or artifacts, then:
❍ Instruments may need to be repaired
Reevaluate: 4. Reestablish the process value and control limits from more
recent data if the measurement process cannot be brought back
into control.
Artifacts (Case study: Resistivity): The artifacts must be of the same type and geometry as items
that are measured in the workload, such as:
1. Items from the workload
2. A single check standard chosen for this purpose
3. A collection of artifacts set aside for this specific purpose
Example of control chart for a mass balance: Only the upper control limit, UCL, is of interest
for detecting degradation in the instrument. As long as the short-term standard
deviations fall within the upper control limit established from historical
data, there is reason for confidence that the precision of the instrument
has not degraded (i.e., common cause variations).
where $\sigma_1$ is the population value for the $s_1$ defined above and $\sigma_2$ is the
population value for the standard deviation of the current values being
tested. Generally, $s_1$ is based on sufficient historical data that it is
reasonable to make the assumption that $\sigma_1$ is a "known" value.
The upper control limit above is then derived based on the standard
F-test for equal standard deviations. Justification and details of this
derivation are given in Cameron and Hailes (1974).
Run software macro for computing the F factor: Dataplot can compute the value of the
F-statistic. For the case where alpha = 0.05, J = 6, and K = 6, the commands
let alpha = 0.05
let alphau = 1 - alpha
let j = 6
let k = 6
let v1 = j-1
let v2 = k*(v1)
let F = fppf(alphau, v1, v2)
return the following value:
THE COMPUTED VALUE OF THE CONSTANT F =
0.2533555E+01
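For readers without Dataplot, the same constant can be computed with Python/SciPy; the commented UCL line assumes the s1 * sqrt(F) form of the limit implied by the F-test derivation referenced above, and is an assumption rather than a quotation of the handbook.

from scipy import stats

alpha, J, K = 0.05, 6, 6
v1, v2 = J - 1, K * (J - 1)
F = stats.f.ppf(1 - alpha, v1, v2)   # 2.5336, matching the Dataplot value
# ucl = s1 * F ** 0.5                # candidate UCL for new short-term sds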
The short-term (repeatability) standard deviation for the kth occasion is

$$s_k = \sqrt{\frac{1}{J-1}\sum_{j=1}^{J}\left(Y_{kj} - \bar{Y}_{k\cdot}\right)^2}\;.$$

Pooled standard deviation: The repeatability standard deviations are pooled over the K
occasions to obtain an estimate with K(J - 1) degrees of freedom of the level-1
standard deviation

$$s_1 = \sqrt{\frac{1}{K}\sum_{k=1}^{K} s_k^2}\;.$$

Note: The same notation is used for the repeatability standard deviation
whether it is based on one set of measurements or pooled over several
sets.
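A hedged Python/NumPy sketch of the pooling step; the six values of s_k are hypothetical short-term standard deviations, one per occasion.

import numpy as np

s_k = np.array([0.0031, 0.0026, 0.0029, 0.0027, 0.0033, 0.0028])  # illustrative
J, K = 6, len(s_k)
s1 = np.sqrt(np.mean(s_k ** 2))   # level-1 sd with K*(J-1) degrees of freedom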
Each new standard deviation is monitored on the control chart: Each new short-term standard
deviation based on J measurements is plotted on the control chart; points that exceed the control
limits probably indicate lack of statistical control. Drift over time indicates degradation of the
instrument. Points out of control require remedial action, and possible causes of out-of-control
signals need to be understood when developing strategies for dealing with outliers.
[control chart figure; x-axis: TIME IN YEARS]
Notice that the numerator degrees of freedom, $v_1 = J' - 1$, changes but the denominator
degrees of freedom, $v_2 = K(J - 1)$, remains the same.
Assign new value to test item: With high precision processes, for which the uncertainty must be
guaranteed, new values should be assigned to the test items based on new measurement data.

Check for degradation: Examine the patterns of recent standard deviations. If the process is
gradually drifting out of control because of degradation in instrumentation or artifacts,
instruments may need to be repaired or replaced.
2.3. Calibration
The purpose of this section is to outline the procedures for calibrating
artifacts and instruments while guaranteeing the 'goodness' of the
calibration results. Calibration is a measurement process that assigns
values to the property of an artifact or to the response of an instrument
relative to reference standards or to a designated measurement process.
The purpose of calibration is to eliminate or reduce bias in the user's
measurement system relative to the reference base. The calibration
procedure compares an "unknown" (test item or instrument) with
reference standards according to a specific algorithm.
What are the issues for calibration?
1. Artifact or instrument calibration
2. Reference base
3. Reference standard(s)
Base and derived units of measurement: The base units of measurement in Le Système
International d'Unités (SI) are (Taylor):
● kilogram - mass
● meter - length
● second - time
● ampere - electric current
● kelvin - thermodynamic temperature
● mole - amount of substance
● candela - luminous intensity
These units are maintained by the Bureau International des Poids et
Mesures in Paris. Local reference bases for these units and SI derived
units such as:
● pascal - pressure
● newton - force
● hertz - frequency
● ohm - resistance
the difference between the test item and the reference is estimated by $D = X - R$,
and the value of the test item is reported as $X^* = D + R^*$.
● length standards
● angle blocks
● indexing tables
Assumptions for calibration designs include demands on the quality of the artifacts: The
assumptions that are necessary for working with calibration designs are that:
● Random errors associated with the measurements are independent.
● All measurements come from a distribution with the same standard deviation.
● Reference standards and test items respond to the measuring environment in the same manner.
● Handling procedures are consistent from item to item.
● Reference standards and test items are stable during the time of measurement.
● Bias is canceled by taking the difference between measurements on the test item and the
reference standard.
Important concept -- Restraint: The restraint is the known value of the reference standard or, for
designs with two or more reference standards, the summation of the values of the reference
standards.

Check standard in a design: Designs listed in this Handbook have provision for a check standard
in each series of measurements. The check standard is usually an artifact of the same nominal
size, type, and quality as the items to be calibrated. Check standards are used for:
● Controlling the calibration process
Elimination of left-right bias requires two measurements in reverse direction: The difference
between the test and the reference can be estimated without bias only by taking the difference
between the two measurements shown above, where the left-right bias, P, cancels in the
differencing so that

$$D = \frac{1}{2}\left(Y_1 - Y_2\right) = X - R\;.$$

The value of the test item depends on the known value of the reference standard, R*: The test
item, X, can then be estimated without bias by

$$X^* = \frac{1}{2}\left(Y_1 - Y_2\right) + R^*$$

and P can be estimated by

$$\widehat{P} = \frac{1}{2}\left(Y_1 + Y_2\right)\;.$$
Calibration designs that are left-right balanced: This type of scheme is called left-right
balanced, and the principle is extended to create a catalog of left-right balanced designs for
intercomparing reference standards among themselves. These designs are appropriate ONLY for
comparing reference standards in the same environment, or enclosure, and are not appropriate
for comparing, say, standard voltage cells in two separate boxes.
1. Left-right balanced design for a group of 3 artifacts
2. Left-right balanced design for a group of 4 artifacts
3. Left-right balanced design for a group of 5 artifacts
4. Left-right balanced design for a group of 6 artifacts
Estimates of drift-free difference and size of drift: For three measurements $Y_1 = X - R$,
$Y_2 = R - X$, and $Y_3 = X - R$ made at equally spaced times under linear drift, the drift-free
difference between the test and the reference is estimated by

$$\widehat{D} = \frac{1}{4}\left(Y_1 - 2Y_2 + Y_3\right)$$

and the size of the drift per time interval is estimated by

$$\widehat{\Delta} = \frac{1}{2}\left(Y_3 - Y_1\right)\;.$$
Measurements for the 1,1,1 design: The use of the tables shown in the catalog is illustrated for
three artifacts; namely, a reference standard with known value R* and a check standard and a
test item with unknown values. All artifacts are of the same nominal size. The design is referred
to as a 1,1,1 design for
● n = 3 difference measurements
● m = 3 artifacts

Convention for showing the measurement sequence and identifying the reference and check
standards: The convention for showing the measurement sequence is shown below. Nominal
values are underlined in the first line, showing that this design is appropriate for comparing
three items of the same nominal size, such as three one-kilogram weights. The reference
standard is the first artifact, the check standard is the second, and the test item is the third.

                      1      1      1
Y(1)       =          +      -
Y(2)       =          +             -
Y(3)       =                 +      -

Restraint             +
Check standard               +
SOLUTION MATRIX
DIVISOR = 3

OBSERVATIONS          1      1      1
Y(1)                  0     -2     -1
Y(2)                  0     -1     -2
Y(3)                  0      1     -1
R*                    3      3      3
Solutions for individual items from the table above: For example, the solution for the reference
standard is shown under the first column; for the check standard, under the second column; and
for the test item, under the third column. Notice that the estimate for the reference standard is
guaranteed to be R*, regardless of the measurement results, because of the restraint that is
imposed on the design. The estimates are as follows:

$$\widehat{R} = R^*$$
$$\widehat{C} = \frac{-2Y(1) - Y(2) + Y(3)}{3} + R^*$$
$$\widehat{T} = \frac{-Y(1) - 2Y(2) - Y(3)}{3} + R^*$$
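As an illustration (with hypothetical measurement values), the cross-product rule can be carried out mechanically; this Python/NumPy sketch reproduces the three estimates from the solution matrix and divisor above.

import numpy as np

divisor = 3.0
solution = np.array([[ 0, -2, -1],   # coefficients of Y(1)
                     [ 0, -1, -2],   # coefficients of Y(2)
                     [ 0,  1, -1],   # coefficients of Y(3)
                     [ 3,  3,  3]])  # coefficients of R*
Y = np.array([-0.0010, -0.0015, -0.0005])   # hypothetical differences
Rstar = 1.0                                  # known value of the reference
estimates = (np.append(Y, Rstar) @ solution) / divisor
# estimates[0] is exactly R*; estimates[1] and estimates[2] are the
# check standard and test item values.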
Convention for showing standard deviations for individual items and combinations of items:
The standard deviations are computed from two tables of factors as shown below. The standard
deviations for combinations of items include appropriate covariance terms.

FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT   FACTOR
     K1        1      1      1
1    0.0000    +
1    0.8165           +
1    0.8165                  +
2    1.4142           +      +
1    0.8165           +

FACTORS FOR BETWEEN-DAY STANDARD DEVIATIONS
WT   FACTOR
     K2        1      1      1
1    0.0000    +
1    1.4142           +
1    1.4142                  +
2    2.4495           +      +
1    1.4142           +
Unifying equation: The standard deviation for each item is computed using the unifying
equation:

$$s_{item} = \sqrt{K_1^2\, s_1^2 + K_2^2\, s_{days}^2}$$
Process standard deviations must be known from historical data: In order to apply these
equations, we need an estimate of the standard deviation, sdays, that describes day-to-day
changes in the measurement process. This standard deviation is in turn derived from the level-2
standard deviation, s2, for the check standard. This standard deviation is estimated from
historical data on the check standard; it can be negligible, in which case the calculations are
simplified.

The repeatability standard deviation, s1, is estimated from historical data, usually from data of
several designs.
Steps in computing standard deviations: The steps in computing the standard deviation for a test
item are as follows (a computational sketch is given after this list):
● Compute the repeatability standard deviation from the design or from historical data.
● Compute the standard deviation of the check standard from historical data.
● Locate the factors, K1 and K2, for the check standard; for the 1,1,1 design the factors are
0.8165 and 1.4142, respectively, where the check standard entries are last in the tables.
● Apply the unifying equation to the check standard to estimate the standard deviation for
days. Notice that the standard deviation of the check standard is the same as the level-2
standard deviation, s2, that is referred to on some pages. The equation for the between-days
standard deviation from the unifying equation is

$$s_{days} = \frac{1}{K_2}\sqrt{s_2^2 - K_1^2\, s_1^2}\;.$$

Thus, for the example above,

$$s_{days} = \frac{1}{1.4142}\sqrt{s_2^2 - (0.8165)^2\, s_1^2}
           = \sqrt{\frac{1}{2}\left(s_2^2 - \frac{2}{3}\, s_1^2\right)}\;.$$

● This is the number that is entered into the NIST mass calibration software as the
between-time standard deviation. If you are using this software, this is the only computation
that you need to make, because the standard deviations for the test items are computed
automatically by the software.
● If the computation under the radical sign gives a negative number, set sdays = 0. (This is
possible and indicates that there is no contribution to uncertainty from day-to-day effects.)
● For completeness, the computations of the standard deviations for the test item and for the
sum of the test and the check standard use the appropriate factors with the unifying equation,
as in the sketch below.
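A hedged Python sketch of these steps for the 1,1,1 example; the values of s1 and s2 are illustrative, and the factors are taken from the tables above.

import math

s1, s2 = 0.002887, 0.004000          # repeatability and level-2 sds (assumed)
K1_chk, K2_chk = 0.8165, 1.4142      # check standard factors, 1,1,1 design
K1_item, K2_item = 0.8165, 1.4142    # test item factors, 1,1,1 design

var_days = (s2**2 - (K1_chk * s1)**2) / K2_chk**2
s_days = math.sqrt(max(var_days, 0.0))   # set sdays = 0 if radicand is negative

# unifying equation applied to the test item
s_test = math.sqrt((K1_item * s1)**2 + (K2_item * s_days)**2)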
The matrix manipulations required for the general solution are:
● multiplication
● inversion
The design follows the same plus-and-minus convention; for the example treated below,

              1      1      1
Y(1)   =      +      -
Y(2)   =      +             -
Y(3)   =             +      -
Matrix algebra for solving a design: The (n x m) design matrix X is constructed by replacing the
pluses (+), minuses (-), and blanks with the entries 1, -1, and 0, respectively.

The (m x m) matrix of normal equations, X'X, is formed and augmented by the restraint vector,
r, to form an (m+1) x (m+1) matrix, A:

$$A = \begin{bmatrix} X'X & r' \\ r & 0 \end{bmatrix}$$

The upper left (m x m) block of $A^{-1}$ is a matrix Q that, when multiplied by $s^2$, yields
the usual variance-covariance matrix.

Estimates of values of individual artifacts: The least-squares estimates for the values of the
individual artifacts are contained in the (m x 1) matrix, B, obtained by solving

$$\begin{bmatrix} B \\ \lambda \end{bmatrix} = A^{-1}\begin{bmatrix} X'Y \\ R^* \end{bmatrix}$$

where Q is the upper left element of the $A^{-1}$ matrix shown above. The structure of the
individual estimates is contained in the QX' matrix; i.e., the estimate for the ith item can be
computed from QX' and Y, together with the restraint contribution.
The first two columns represent the two NIST kilograms, while the third column represents the
customer's kilogram (i.e., the kilogram being calibrated).

The measurements obtained form the (n x 1) matrix Y. Each entry of Y is the difference
between two weighings, as specified by the design matrix, measured in grams. That is, Y(1) is
the difference in measurement between NIST kilogram one and NIST kilogram two, Y(2) is the
difference in measurement between NIST kilogram one and the customer kilogram, and Y(3) is
the difference in measurement between NIST kilogram two and the customer kilogram.
If there are three weights, with known values for weights one and two, then the restraint vector
is

r = [1 1 0]

and R* is the summation of the two known values. Forming the augmented matrix A as
described above and inverting it yields Q and, from it, the least-squares estimates B.

The process standard deviation, which is a measure of the overall precision of the (NIST) mass
calibration process, combines two components: the residual standard deviation from the design,
and sdays, the standard deviation for days, which can only be estimated from check standard
measurements.
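A hedged Python/NumPy sketch of the bordered-matrix solution described above, worked for this kilogram example; the Y values and R* are illustrative, and the code solves the least-squares problem subject to the restraint r.B = R*.

import numpy as np

X = np.array([[ 1, -1,  0],    # Y(1): NIST kg 1 minus NIST kg 2
              [ 1,  0, -1],    # Y(2): NIST kg 1 minus customer kg
              [ 0,  1, -1]])   # Y(3): NIST kg 2 minus customer kg
r = np.array([1.0, 1.0, 0.0])  # restraint on the two reference kilograms
Rstar = 2.0                    # summation of the two known reference values
Y = np.array([-0.0010, -0.0015, -0.0005])   # observed differences

m = X.shape[1]
A = np.zeros((m + 1, m + 1))
A[:m, :m] = X.T @ X            # normal equations, bordered by the restraint
A[:m, m] = r
A[m, :m] = r
sol = np.linalg.solve(A, np.append(X.T @ Y, Rstar))
B = sol[:m]                    # estimates for the three kilograms
Q = np.linalg.inv(A)[:m, :m]   # Q * s**2 is the variance-covariance matrix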
Example: We continue the example started above. Since n = 3 and m = 3, the residual standard
deviation has n - m + 1 = 1 degree of freedom, and for these data

s1 = 0.002887.

The next step is to compute the standard deviation of item 3 (the customer's kilogram), that is,
sitem3. We start by substituting the values for X and Q and computing D.
The latter includes, but is not limited to, uncertainties from sources that are not replicated in the
calibration design, such as uncertainties of values assigned to reference standards.

Uncertainties for test items: Uncertainties associated with calibrated values for test items from
designs require calculations that are specific to the individual designs. The steps involved are
outlined below.
Example of the two models for a design for calibrating a test item using 1 reference standard:
To contrast the simple model with the more complicated model, a measurement of the
difference between X, the test item, with unknown and yet to be determined value, X*, and a
reference standard, R, with known value, R*, and the reverse measurement are shown below.

Model (1) takes into account only instrument imprecision, so that:

$$Y_1 = X - R + \epsilon_1, \qquad Y_2 = R - X + \epsilon_2 \qquad (1)$$

with the error terms random errors that come from the imprecision of the measuring instrument.

Model (2) allows for both instrument imprecision and level-2 effects such that:

$$Y_1 = X - R + (\delta_X - \delta_R) + \epsilon_1, \qquad
  Y_2 = R - X + (\delta_R - \delta_X) + \epsilon_2 \qquad (2)$$

where the delta terms explain small changes in the values of the artifacts that occur over time.
For both models, the value of the test item is estimated as

$$X^* = \frac{1}{2}\left(Y_1 - Y_2\right) + R^*\;.$$
Standard deviations from both models: For model (1), the standard deviation of the test item is

$$s_{X^*} = \sqrt{\frac{s_1^2}{2}}$$

with

$$v = n - m + 1$$

degrees of freedom for n difference measurements and m items. Typically the degrees of
freedom are very small. For two difference measurements on a reference standard and test item,
the degrees of freedom is v = 1. For model (2), the standard deviation of the test item also picks
up a contribution from the between-day standard deviation, sdays.
Level-2 standard deviation is estimated from check standard measurements: The level-2
standard deviation cannot be estimated from the data of the calibration design. It cannot
generally be estimated from repeated designs involving the test items. The best mechanism for
capturing the day-to-day effects is a check standard, which is treated as a test item and included
in each calibration design. Values of the check standard, estimated over time from the
calibration design, are used to estimate the standard deviation.
Assumptions The check standard value must be stable over time, and the
measurements must be in statistical control for this procedure to be
valid. For this purpose, it is necessary to keep a historical record of
values for a given check standard, and these values should be kept by
instrument and by design.
Derivations require matrix algebra: Tables for estimating standard deviations for all test items
are reported along with the solutions for all designs in the catalog. The use of the tables for
estimating the standard deviations for test items is illustrated for the 1,1,1,1 design. Matrix
equations can be used for deriving estimates for designs that are not in the catalog.
The check standard for each design is either an additional test item in
the design, other than the test items that are submitted for calibration,
or it is a construction, such as the difference between two reference
standards as estimated by the design.
Check standard is the difference between the 2 reference standards:

OBSERVATIONS          1      1      1      1

Y(1)          =       +      -
Y(2)          =       +             -
Y(3)          =       +                    -
Y(4)          =              +      -
Y(5)          =              +             -
Y(6)          =                     +      -

RESTRAINT             +      +
CHECK STANDARD        +      -

DEGREES OF FREEDOM = 3
SOLUTION MATRIX
DIVISOR = 8
OBSERVATIONS 1 1 1 1
Y(1) 2 -2 0 0
Y(2) 1 -1 -3 -1
Y(3) 1 -1 -1 -3
Y(4) -1 1 -3 -1
Y(5) -1 1 -1 -3
Y(6) 0 0 2 -2
R* 4 4 4 4
Explanation of solution matrix: The solution matrix gives values for the test items of

$$\widehat{X}_3 = \frac{-3Y(2) - Y(3) - 3Y(4) - Y(5) + 2Y(6) + 4R^*}{8}$$
$$\widehat{X}_4 = \frac{-Y(2) - 3Y(3) - Y(4) - 3Y(5) - 2Y(6) + 4R^*}{8}$$
Factors for computing contributions of repeatability and level-2 standard deviations to
uncertainty:

FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT   FACTOR
     K1        1      1      1      1
1    0.3536    +
1    0.3536           +
1    0.6124                  +
1    0.6124                         +
0    0.7071    +      -

FACTORS FOR BETWEEN-DAY STANDARD DEVIATIONS
WT   FACTOR
     K2        1      1      1      1
1    1.2247                  +
0    1.4141    +      -

The first table shows factors for computing the contribution of the repeatability standard
deviation to the total uncertainty. The second table shows factors for computing the contribution
of the between-day standard deviation to the uncertainty. Notice that the check standard is the
last entry in each table.
Standard deviations are computed using the factors from the tables with the unifying equation:
The steps in computing the standard deviation for a test item are:
● Compute the repeatability standard deviation from historical data.
● Compute the standard deviation of the check standard from historical data.
● Locate the factors, K1 and K2, for the check standard.
● Compute the between-day variance using the unifying equation for the check standard. For
this example,

$$s_{days}^2 = \frac{1}{(1.4141)^2}\left(s_2^2 - (0.7071)^2\, s_1^2\right)\;.$$

● If this variance estimate is negative, set $s_{days}^2 = 0$. (This is possible and indicates that
there is no contribution to uncertainty from day-to-day effects.)
● Locate the factors, K1 and K2, for the test items, and compute the standard deviations using
the unifying equation. For this example,

$$s_{item3} = s_{item4} = \sqrt{(0.6124)^2\, s_1^2 + (1.2247)^2\, s_{days}^2}\;.$$
Situation where the test is different in size from the reference: Usually, a reference standard and
test item are of the same nominal size, and the calibration relies on measuring the small
difference between the two; for example, the intercomparison of a reference kilogram with a
test kilogram. The calibration may also consist of an intercomparison of the reference with a
summation of artifacts where the summation is of the same nominal size as the reference; for
example, a reference kilogram compared with 500 g + 300 g + 200 g test weights.
Type B uncertainty for the test artifact: The type B uncertainty that accrues to the test artifact
from the uncertainty of the reference standard is proportional to their nominal sizes; i.e.,

$$s_{typeB} = \frac{\text{nominal test}}{\text{nominal restraint}} \cdot \frac{U}{k}$$

where U is the expanded uncertainty of the reference standard (restraint) and k is either the
critical value from the t table for degrees of freedom v, or k is set equal to 2.
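A one-line Python/SciPy illustration of the coverage factor k described above; the value of v is hypothetical.

from scipy import stats

v = 5
k = stats.t.ppf(1 - 0.05 / 2, v)   # 2.571 for v = 5; approaches 1.96 as v grows
# expanded uncertainty: U = k * u, with u the standard uncertainty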
Problem of the degrees of freedom: The calculation of degrees of freedom, v, can be a problem.
Sometimes it can be computed using the Welch-Satterthwaite approximation and the structure of
the uncertainty of the test item. Degrees of freedom for the standard deviation of the restraint is
assumed to be infinite. The coefficients in the Welch-Satterthwaite formula must all be positive
for the approximation to be reliable.
Standard deviation for test item from the 1,1,1,1 design

For the 1,1,1,1 design, the standard deviation of the test items can be rewritten by substituting the between-day variance from the check standard,

stest = sqrt( (3/8) s1^2 + (3/2)( (1/2) sC^2 - (1/4) s1^2 ) ) = sqrt( (3/4) sC^2 )

so that the degrees of freedom depend only on the degrees of freedom in the standard deviation of the check standard. This device may not work satisfactorily for all designs.
Standard uncertainty from the 1,1,1,1 design

To complete the calculation shown in the equation at the top of the page, the nominal value of the test item (which is equal to 1) is divided by the nominal value of the restraint (which is also equal to 1), and the result is squared. Thus, the standard uncertainty is

u = sqrt( sR*^2 + (3/4) sC^2 )

where n - 1 is the degrees of freedom associated with the check standard uncertainty. Notice that the standard deviation of the restraint drops out of the degrees of freedom calculation because it has infinite degrees of freedom.
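A minimal Dataplot sketch of this combination (illustrative values only; sr stands for the standard deviation of the restraint and sc for the check standard standard deviation):

. illustration only
let sr = 0.020
let sc = 0.030
let u = sqrt(sr*sr + 0.75*sc*sc)
. the degrees of freedom for u are taken as n - 1 from the check standard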
Designs listed in this catalog

Designs are listed by traditional subject area although many of the designs are appropriate generally for intercomparisons of artifact standards.
● Designs for mass weights
Information: Design, Solution, Factors for computing standard deviations

Given
● n = number of difference measurements
● m = number of artifacts (reference standards + test items) to be calibrated
the following information is shown for each design:
● Design matrix -- (n x m)
● Vector that identifies standards in the restraint -- (1 x m)
Convention for showing the measurement sequence

Nominal sizes of standards and test items are shown at the top of the design. Pluses (+) indicate items that are measured together, and minuses (-) indicate items that are not measured together. The difference measurements are constructed from the design of pluses and minuses. For example, a 1,1,1 design for one reference standard and two test items of the same nominal size with three measurements is shown below:
1 1 1
Y(1) = + -
Y(2) = + -
Y(3) = + -
Solution matrix -- example and interpretation

The cross-product of the column of difference measurements and R* with a column from the solution matrix, divided by the named divisor, gives the value for an individual item. For example,
Solution matrix
Divisor = 3
1 1 1
Y(1) 0 -2 -1
Y(2) 0 -1 -2
Y(3) 0 +1 -1
R* +3 +3 +3
implies that the estimates for the restraint and the two test items are:

Item 1 = ( 3 R* )/3 = R*
Item 2 = ( -2 Y(1) - Y(2) + Y(3) + 3 R* )/3
Item 3 = ( -Y(1) - 2 Y(2) - Y(3) + 3 R* )/3
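As a worked example with illustrative numbers, suppose Y(1) = -0.1, Y(2) = 0.2, Y(3) = 0.3 and R* = 100.0. Then Item 1 = R* = 100.0, Item 2 = (0.2 - 0.2 + 0.3 + 300.0)/3 = 100.1, and Item 3 = (0.1 - 0.4 - 0.3 + 300.0)/3 = 99.8.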
Interpretation of table of factors

The factors in this table provide information on precision. The repeatability standard deviation, s1, is multiplied by the appropriate factor to obtain the standard deviation for an individual item or combination of items.

Sum Factor 1 1 1
1 0.0000 +
1 0.8166 +
1 0.8166 +
2 1.4142 + +

For example, the standard deviation for each test item is 0.8166 s1, and the standard deviation for the sum of the two test items is 1.4142 s1. The factor for the reference standard is 0.0000 because its value is fixed by the restraint.
Depiction of a design with three series for calibrating a 5,3,2,1 weight set with weights between 1 kg and 10 g
First series using 1,1,1,1 design

The calibrations start with a comparison of the one kilogram test weight with the reference kilograms (see the graphic above). The 1,1,1,1 design requires two kilogram reference standards with known values, R1* and R2*. The fourth kilogram in this design is actually a summation of the 500, 300, 200 g weights which becomes the restraint in the next series. The restraint for the first series is the known average mass of the reference kilograms,

R* = ( R1* + R2* )/2
2nd series using 5,3,2,1,1,1 design

The second series is a 5,3,2,1,1,1 design where the restraint over the 500 g, 300 g and 200 g weights comes from the value assigned to the summation in the first series; i.e., the restraint for the second series is the value of the 500 g + 300 g + 200 g summation as estimated from the 1,1,1,1 design.
Other starting points

The calibration sequence can also start with a 1,1,1 design. This design has the disadvantage that it does not have provision for a check standard.

Better choice of design

A better choice is a 1,1,1,1,1 design which allows for two reference kilograms and a kilogram check standard which occupies the 4th position among the weights. This is preferable to the 1,1,1,1 design but has the disadvantage of requiring the laboratory to maintain three kilogram standards.
Important detail

The solutions are only applicable for the restraints as shown.

Design 1,1,1
OBSERVATIONS 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 1
SOLUTION MATRIX
DIVISOR = 3
OBSERVATIONS 1 1 1
Y(1) 0 -2 -1
Y(2) 0 -1 -2
Y(3) 0 1 -1
R* 3 3 3
Design 1,1,1,1
OBSERVATIONS 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
RESTRAINT + +
CHECK STANDARD + -
DEGREES OF FREEDOM = 3
SOLUTION MATRIX
DIVISOR = 8
OBSERVATIONS 1 1 1 1
Y(1) 2 -2 0 0
Y(2) 1 -1 -3 -1
Y(3) 1 -1 -1 -3
Y(4) -1 1 -3 -1
Y(5) -1 1 -1 -3
Y(6) 0 0 2 -2
R* 4 4 4 4
Design 1,1,1,1,1

OBSERVATIONS 1 1 1 1 1

Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) + -
Y(8) + -
Y(9) + -
Y(10) + -

RESTRAINT + +

CHECK STANDARD + -

DEGREES OF FREEDOM = 6

SOLUTION MATRIX
DIVISOR = 10

OBSERVATIONS 1 1 1 1 1

Y(1) 2 -2 0 0 0
Y(2) 1 -1 -3 -1 -1
Y(3) 1 -1 -1 -3 -1
Y(4) 1 -1 -1 -1 -3
Y(5) -1 1 -3 -1 -1
Y(6) -1 1 -1 -3 -1
Y(7) -1 1 -1 -1 -3
Y(8) 0 0 2 -2 0
Y(9) 0 0 2 0 -2
Y(10) 0 0 0 2 -2
R* 5 5 5 5 5

R* = sum of two reference standards
Design 1,1,1,1,1,1
OBSERVATIONS 1 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) + -
Y(8) + -
Y(9) + -
Y(10) + -
Y(11) + -
Y(12) + -
Y(13) + -
Y(14) + -
Y(15) + -
RESTRAINT + +
CHECK STANDARD +
DEGREES OF FREEDOM = 10
SOLUTION MATRIX
DIVISOR = 8
OBSERVATIONS 1 1 1 1 1 1
Y(1) 1 -1 0 0 0 0
Y(2) 1 0 -1 0 0 0
Y(3) 1 0 0 -1 0 0
Y(4) 1 0 0 0 -1 0
Y(5) 2 1 1 1 1 0
Y(6) 0 1 -1 0 0 0
Y(7) 0 1 0 -1 0 0
Y(8) 0 1 0 0 -1 0
Y(9) 1 2 1 1 1 0
Y(10) 0 0 1 -1 0 0
Y(11) 0 0 1 0 -1 0
Y(12) 1 1 2 1 1 0
Y(13) 0 0 0 1 -1 0
Y(14) 1 1 1 2 1 0
Y(15) 1 1 1 1 2 0
R* 6 6 6 6 6 6
1 1.2247 +
1 1.2247 +
1 1.2247 +
2 2.0000 + +
3 2.7386 + + +
4 3.4641 + + + +
1 1.2247 +
Design 2,1,1,1
OBSERVATIONS 2 1 1 1
Y(1) + - -
Y(2) + - -
Y(3) + - -
Y(4) + -
Y(5) + -
Y(6) + -
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 3
SOLUTION MATRIX
DIVISOR = 4
OBSERVATIONS 2 1 1 1
Y(1) 0 -1 0 -1
Y(2) 0 0 -1 -1
Y(3) 0 -1 -1 0
Y(4) 0 1 0 -1
Y(5) 0 1 -1 0
Y(6) 0 0 1 -1
R* 4 2 2 2
Design 2,2,1,1,1
OBSERVATIONS 2 2 1 1 1
Y(1) + - - +
Y(2) + - - +
Y(3) + - + -
Y(4) + -
Y(5) + - -
Y(6) + - -
Y(7) + - -
Y(8) + - -
Y(9) + - -
Y(10) + - -
RESTRAINT + + +
CHECK STANDARD +
DEGREES OF FREEDOM = 6
SOLUTION MATRIX
DIVISOR = 275
OBSERVATIONS 2 2 1 1 1
Y(1) 47 -3 -44 66 11
Y(2) 25 -25 0 -55 55
Y(3) 3 -47 44 -11 -66
Y(4) 25 -25 0 0 0
Y(5) 29 4 -33 -33 22
Y(6) 29 4 -33 22 -33
Y(7) 7 -18 11 -44 -44
Y(8) 4 29 -33 -33 22
Y(9) 4 29 -33 22 -33
Y(10) -18 7 11 -44 -44
R* 110 110 55 55 55
Design 2,2,2,1,1
OBSERVATIONS 2 2 2 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + - -
Y(5) + - -
Y(6) + - -
Y(7) + -
RESTRAINT + +
CHECK STANDARD +
DEGREES OF FREEDOM = 3
SOLUTION MATRIX
DIVISOR = 16
OBSERVATIONS 2 2 2 1 1
Y(1) 4 -4 0 0 0
Y(2) 2 -2 -6 -1 -1
Y(3) -2 2 -6 -1 -1
Y(4) 2 -2 -2 -3 -3
Y(5) -2 2 -2 -3 -3
Y(6) 0 0 4 -2 -2
Y(7) 0 0 0 8 -8
R* 8 8 8 4 4
Design 5,2,2,1,1,1
OBSERVATIONS 5 2 2 1 1 1
Y(1) + - - - - +
Y(2) + - - - + -
Y(3) + - - + - -
Y(4) + - - - -
Y(5) + - - - -
Y(6) + - + -
Y(7) + - - +
Y(8) + - + -
RESTRAINT + + + +
CHECK STANDARD +
DEGREES OF FREEDOM = 3
SOLUTION MATRIX
DIVISOR = 70
OBSERVATIONS 5 2 2 1 1 1
Y(1) 15 -8 -8 1 1 21
Y(2) 15 -8 -8 1 21 1
Y(3) 5 -12 -12 19 -1 -1
Y(4) 0 2 12 -14 -14 -14
Y(5) 0 12 2 -14 -14 -14
Y(6) -5 8 -12 9 -11 -1
Y(7) 5 12 -8 -9 1 11
Y(8) 0 10 -10 0 10 -10
R* 35 14 14 7 7 7
Design 5,2,2,1,1,1,1
OBSERVATIONS 5 2 2 1 1 1 1
Y(1) + - - -
Y(2) + - - -
Y(3) + - - -
Y(4) + - - -
Y(5) + + - - -
Y(6) + + - - -
Y(7) + + - - - -
Y(8) + -
Y(9) + -
Y(10) + -
RESTRAINT + + + +
CHECK STANDARD +
DEGREES OF FREEDOM = 4
SOLUTION MATRIX
DIVISOR = 60
OBSERVATIONS 5 2 2 1 1 1 1
Y(1) 12 0 0 -12 0 0 0
Y(2) 6 -4 -4 2 -12 3 3
Y(3) 6 -4 -4 2 3 -12 3
Y(4) 6 -4 -4 2 3 3 -12
Y(5) -6 28 -32 10 -6 -6 -6
Y(6) -6 -32 28 10 -6 -6 -6
Y(7) 6 8 8 -22 -6 -6 -6
Y(8) 0 0 0 0 15 -15 0
Y(9) 0 0 0 0 15 0 -15
Y(10) 0 0 0 0 0 15 -15
R* 30 12 12 6 6 6 6
WT FACTOR
5 2 2 1 1 1 1
5 1.0000 +
2 0.8718 +
2 0.8718 +
1 0.9165 +
1 1.0198 +
1 1.0198 +
1 1.0198 +
2 1.4697 + +
3 1.8330 + + +
1 1.0198 +
Design 5,3,2,1,1,1
OBSERVATIONS 5 3 2 1 1 1
Y(1) + - - + -
Y(2) + - - + -
Y(3) + - - - +
Y(4) + - -
Y(5) + - - - -
Y(6) + - + - -
Y(7) + - - + -
Y(8) + - - - +
Y(9) + - -
Y(10) + - -
Y(11) + - -
RESTRAINT + + +
CHECK STANDARD +
DEGREES OF FREEDOM = 6
SOLUTION MATRIX
DIVISOR = 920
OBSERVATIONS 5 3 2 1 1 1
Design 5,3,2,1,1,1,1
OBSERVATIONS 5 3 2 1 1 1 1
Y(1) + - -
Y(2) + - - -
Y(3) + - - -
Y(4) + - - - -
Y(5) + - - - -
Y(6) + - - - -
Y(7) + - - - -
Y(8) + - -
Y(9) + - -
Y(10) + - -
Y(11) + - -
RESTRAINT + + +
CHECK STANDARD +
DEGREES OF FREEDOM = 5
SOLUTION MATRIX
DIVISOR = 40
OBSERVATIONS 5 3 2 1 1 1 1
Y(1) 20 -4 -16 12 12 12 12
Y(2) 0 -4 4 -8 -8 2 2
Y(3) 0 -4 4 2 2 -8 -8
Y(4) 0 0 0 -5 -5 -10 10
Y(5) 0 0 0 -5 -5 10 -10
Y(6) 0 0 0 -10 10 -5 -5
Y(7) 0 0 0 10 -10 -5 -5
Y(8) 0 4 -4 -12 8 3 3
Y(9) 0 4 -4 8 -12 3 3
Y(10) 0 4 -4 3 3 -12 8
Y(11) 0 4 -4 3 3 8 -12
R* 20 12 8 4 4 4 4
1 0.6557 +
Design 5,3,2,2,1,1,1
OBSERVATIONS 5 3 2 2 1 1 1
Y(1) + - -
Y(2) + - -
Y(3) + - - -
Y(4) + - - -
Y(5) + - - -
Y(6) + - -
Y(7) + - -
Y(8) + - -
Y(9) + - - -
Y(10) + -
Y(11) + -
Y(12) - +
RESTRAINT + + +
CHECK STANDARD +
DEGREES OF FREEDOM = 6
SOLUTION MATRIX
DIVISOR = 10
OBSERVATIONS 5 3 2 2 1 1 1
Y(1) 2 0 -2 2 0 0 0
Y(2) 0 -6 6 -4 -2 -2 -2
Y(3) 1 1 -2 0 -1 1 1
Y(4) 1 1 -2 0 1 -1 1
Y(5) 1 1 -2 0 1 1 -1
Y(6) -1 1 0 -2 -1 1 1
Y(7) -1 1 0 -2 1 -1 1
Y(8) -1 1 0 -2 1 1 -1
Y(9) 0 -2 2 2 -4 -4 -4
Y(10) 0 0 0 0 2 -2 0
Y(11) 0 0 0 0 0 2 -2
Y(12) 0 0 0 0 -2 0 2
R* 5 3 2 2 1 1 1
Design 5,4,4,3,2,2,1,1
OBSERVATIONS 5 4 4 3 2 2 1 1
Y(1) + + - - - - -
Y(2) + + - - - - -
Y(3) + - -
Y(4) + - -
Y(5) + - -
Y(6) + - -
Y(7) + - - -
Y(8) + - - -
Y(9) + - -
Y(10) + - -
Y(11) + - -
Y(12) + - -
RESTRAINT + +
CHECK STANDARD + -
DEGREES OF FREEDOM = 5
SOLUTION MATRIX
DIVISOR = 916
OBSERVATIONS 5 4 4 3 2 2 1 1
11 2.5981 + + + +
15 3.3153 + + + + +
20 4.4809 + + + + + +
0 1.1950 + -
Design 5,5,2,2,1,1,1,1
OBSERVATIONS 5 5 2 2 1 1 1 1
Y(1) + - - -
Y(2) + - - -
Y(3) + - - -
Y(4) + - - -
Y(5) + + - - - -
Y(6) + - -
Y(7) + - -
Y(8) + - -
Y(9) + - -
Y(10) + -
Y(11) + -
RESTRAINT + +
CHECK STANDARD +
DEGREES OF FREEDOM = 4
SOLUTION MATRIX
DIVISOR = 120
OBSERVATIONS 5 5 2 2 1 1 1 1
1 0.5370 +
Design 5,5,3,2,1,1,1
OBSERVATIONS 5 5 3 2 1 1 1
Y(1) + - -
Y(2) + - -
Y(3) + - - - -
Y(4) + - - - -
Y(5) + - - -
Y(6) + - - -
Y(7) + - - -
Y(8) + - - -
Y(9) + - - -
Y(10) + - - -
RESTRAINT + +
CHECK STANDARD +
DEGREES OF FREEDOM = 4
SOLUTION MATRIX
DIVISOR = 10
OBSERVATIONS 5 5 3 2 1 1 1
Y(1) 1 -1 -2 -3 1 1 1
Y(2) -1 1 -2 -3 1 1 1
Y(3) 1 -1 2 -2 -1 -1 -1
Y(4) -1 1 2 -2 -1 -1 -1
Y(5) 1 -1 -1 1 -2 -2 3
Y(6) 1 -1 -1 1 -2 3 -2
Y(7) 1 -1 -1 1 3 -2 -2
Y(8) -1 1 -1 1 -2 -2 3
Y(9) -1 1 -1 1 -2 3 -2
Y(10) -1 1 -1 1 3 -2 -2
R* 5 5 3 2 1 1 1
5 0.7071 +
5 0.7071 +
3 1.0863 +
2 1.0392 +
1 1.0100 +
1 1.0100 +
1 1.0100 +
3 1.4765 + +
6 1.9287 + + +
11 2.0543 + + + +
16 1.9287 + + + + +
1 1.0100 +
Design 1,1,1,1,1,1,1,1
OBSERVATIONS 1 1 1 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) + -
Y(8) + -
Y(9) + -
Y(10) + -
Y(11) + -
Y(12) + -
RESTRAINT + +
CHECK STANDARD +
DEGREES OF FREEDOM = 5
SOLUTION MATRIX
DIVISOR = 12
OBSERVATIONS 1 1 1 1 1 1 1 1
Y(1) 1 -1 -6 0 0 0 0 0
Y(2) 1 -1 0 -6 0 0 0 0
Y(3) 1 -1 0 0 -6 0 0 0
Y(4) 1 -1 0 0 0 -6 0 0
Y(5) 1 -1 0 0 0 0 -6 0
Y(6) 1 -1 0 0 0 0 0 -6
Y(7) -1 1 -6 0 0 0 0 0
Y(8) -1 1 0 -6 0 0 0 0
Y(9) -1 1 0 0 -6 0 0 0
Y(10) -1 1 0 0 0 -6 0 0
Y(11) -1 1 0 0 0 0 -6 0
Y(12) -1 1 0 0 0 0 0 -6
R* 6 6 6 6 6 6 6 6
WT K1 1 1 1 1 1 1 1 1
1 0.2887 +
1 0.2887 +
1 0.7071 +
1 0.7071 +
1 0.7071 +
1 0.7071 +
1 0.7071 +
1 0.7071 +
2 1.0000 + +
3 1.2247 + + +
4 1.4142 + + + +
5 1.5811 + + + + +
6 1.7321 + + + + + +
1 0.7071 +
WT K2 1 1 1 1 1 1 1 1
1 0.7071 +
1 0.7071 +
1 1.2247 +
1 1.2247 +
1 1.2247 +
1 1.2247 +
1 1.2247 +
1 1.2247 +
2 2.0000 + +
3 2.7386 + + +
4 3.4641 + + + +
5 4.1833 + + + + +
6 4.8990 + + + + + +
1 1.2247 +
Design 3,2,1,1,1
OBSERVATIONS 3 2 1 1 1
Y(1) + - -
Y(2) + - -
Y(3) + - -
Y(4) + - - -
Y(5) + - -
Y(6) + - -
Y(7) + - -
Y(8) + -
Y(9) + -
Y(10) + -
RESTRAINT + +
CHECK STANDARD +
DEGREES OF FREEDOM = 6
SOLUTION MATRIX
DIVISOR = 25
OBSERVATIONS 3 2 1 1 1
Y(1) 3 -3 -4 1 1
Y(2) 3 -3 1 -4 1
Y(3) 3 -3 1 1 -4
Y(4) 1 -1 -3 -3 -3
Y(5) -2 2 -4 -4 1
Y(6) -2 2 -4 1 -4
Y(7) -2 2 1 -4 -4
Y(8) 0 0 5 -5 0
Y(9) 0 0 5 0 -5
Y(10) 0 0 0 5 -5
R* 15 10 5 5 5
WT K1 3 2 1 1 1
3 0.2530 +
2 0.2530 +
1 0.4195 +
1 0.4195 +
1 0.4195 +
2 0.5514 + +
3 0.6197 + + +
1 0.4195 +
WT K2 3 2 1 1 1
3 0.7211 +
2 0.7211 +
1 1.0392 +
1 1.0392 +
1 1.0392 +
2 1.5232 + +
3 1.9287 + + +
1 1.0392 +
OBSERVATIONS 1 2 2 1 1
Y(1) + -
Y(2) + -
Y(3) + - +
Y(4) + - +
Y(5) + - +
Y(6) + - +
Y(7) + -
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 3
SOLUTION MATRIX
DIVISOR = 24
OBSERVATIONS 1 2 2 1 1
Y(3) 0 -9 -3 -4 4
Y(4) 0 -3 -9 4 -4
Y(5) 0 -9 -3 4 -4
Y(6) 0 -3 -9 -4 4
Y(7) 0 6 -6 0 0
R* 24 48 48 24 24
WT K1 1 2 2 1 1
2 0.9354 +
2 0.9354 +
1 0.8165 +
1 0.8165 +
4 1.7321 + +
5 2.3805 + + +
6 3.0000 + + + +
1 0.8165 +
WT K2 1 2 2 1 1
2 2.2361 +
2 2.2361 +
1 1.4142 +
1 1.4142 +
4 4.2426 + +
5 5.2915 + + +
6 6.3246 + + + +
1 1.4142 +
Elimination of linear drift

The designs in this catalog are constructed so that the solutions are immune to linear drift if the measurements are equally spaced over time. The size of the drift is the average of the n difference measurements. Keeping track of drift from design to design is useful because a marked change from its usual range of values may indicate a problem with the measurement system.
Caution Because of the large number of gauge blocks that are being
intercompared and the need to eliminate drift, the Doiron designs
are not completely balanced with respect to the test blocks.
Therefore, the standard deviations are not equal for all blocks. If all
the blocks are being calibrated for use in one facility, it is easiest to
quote the largest of the standard deviations for all blocks rather
than try to maintain a separate record on each block.
Properties of designs that use 2 master blocks

Historical designs for gauge blocks (Cameron and Hailes) work on the assumption that the difference measurements are contaminated by linear drift. This assumption is more restrictive and covers the case of drift in successive measurements but produces fewer designs. The Cameron/Hailes designs meeting this criterion allow for:
● two reference (master) blocks, R1 and R2
Important concept - check standard

The check standards for the designs in this section are not artifact standards but constructions from the design. The value of one master block or the average of two master blocks is the restraint for the design, and values for the masters, R1 and R2, are estimated from a set of measurements taken according to the design. The check standard value is the difference between the estimates, R1 and R2. Measurement control is exercised by comparing the current value of the check standard with its historical average.
OBSERVATIONS 1 1 1
Y(1) + -
Y(2) - +
Y(3) + -
Y(4) - +
Y(5) - +
Y(6) + -
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 4
SOLUTION MATRIX
DIVISOR = 6
OBSERVATIONS 1 1 1
Y(1) 0 -2 -1
Y(2) 0 1 2
Y(3) 0 1 -1
Y(4) 0 2 1
Y(5) 0 -1 1
Y(6) 0 -1 -2
R* 6 6 6
OBSERVATIONS 1 1 1
Y(1) + -
Y(2) - +
Y(3) + -
Y(4) - +
Y(5) - +
Y(6) + -
Y(7) - +
Y(8) - +
Y(9) + -
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 7
SOLUTION MATRIX
DIVISOR = 9
OBSERVATIONS 1 1 1
Y(1) 0 -2 -1
Y(2) 0 -1 1
Y(3) 0 -1 -2
Y(4) 0 2 1
Y(5) 0 1 2
Y(6) 0 1 -1
Y(7) 0 2 1
Y(8) 0 -1 1
Y(9) 0 -1 -2
R(1) 9 9 9
OBSERVATIONS 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) - +
Y(4) + -
Y(5) - +
Y(6) - +
Y(7) + -
Y(8) - +
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 5
SOLUTION MATRIX
DIVISOR = 8
OBSERVATIONS 1 1 1 1
Y(1) 0 -3 -2 -1
Y(2) 0 1 2 -1
Y(3) 0 1 2 3
Y(4) 0 1 -2 -1
Y(5) 0 3 2 1
Y(6) 0 -1 -2 1
Y(7) 0 -1 -2 -3
Y(8) 0 -1 2 1
R* 8 8 8 8
OBSERVATIONS 1 1 1 1
Y(1) + -
Y(2) + +
Y(3) + -
Y(4) - +
Y(5) + -
Y(6) - +
Y(7) + -
Y(8) + -
Y(9) + -
Y(10) - +
Y(11) - +
Y(12) - +
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 9
SOLUTION MATRIX
DIVISOR = 8
OBSERVATIONS 1 1 1 1
Y(1) 0 -2 -1 -1
Y(2) 0 1 1 2
Y(3) 0 0 1 -1
Y(4) 0 2 1 1
Y(5) 0 1 -1 0
Y(6) 0 -1 0 1
Y(7) 0 -1 -2 -1
Y(8) 0 1 0 -1
Y(9) 0 -1 -1 -2
Y(10) 0 -1 1 0
Y(11) 0 1 2 1
Y(12) 0 0 -1 1
R* 6 6 6 4
OBSERVATIONS 1 1 1 1 1
Y(1) + -
Y(2) - +
Y(3) + -
Y(4) - +
Y(5) - +
Y(6) + -
Y(7) - +
Y(8) + -
Y(9) - +
Y(10) + -
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 6
SOLUTION MATRIX
DIVISOR = 90
OBSERVATIONS 1 1 1 1 1
Y(4) 0 -20 5 5 15
Y(5) 0 0 -18 18 0
Y(6) 0 -10 -11 -29 -15
Y(7) 0 10 29 11 15
Y(8) 0 -20 14 -4 -30
Y(9) 0 10 11 29 15
Y(10) 0 20 -5 -5 -15
R* 90 90 90 90 90
OBSERVATIONS 1 1 1 1 1 1
Y(1) + -
Y(2) - +
Y(3) - +
Y(4) - +
Y(5) - +
Y(6) + -
Y(7) + -
Y(8) + -
Y(9) + -
Y(10) - +
Y(11) + -
Y(12) - +
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 7
SOLUTION MATRIX
DIVISOR = 360
OBSERVATIONS 1 1 1 1 1 1
OBSERVATIONS 1 1 1 1 1 1 1
Y(1) + -
Y(2) - +
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) - +
Y(7) + -
Y(8) + -
Y(9) + -
Y(10) - +
Y(11) - +
Y(12) - +
Y(13) - +
Y(14) - +
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 8
SOLUTION MATRIX
DIVISOR = 1015
OBSERVATIONS 1 1 1 1 1 1
1 0.6325 +
OBSERVATIONS 1 1 1 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) - +
Y(4) - +
Y(5) + -
Y(6) - +
Y(7) - +
Y(8) - +
Y(9) - +
Y(10) - +
Y(11) + -
Y(12) - +
Y(13) - +
Y(14) - +
Y(15) + -
Y(16) + -
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 9
SOLUTION MATRIX
DIVISOR = 2852
OBSERVATIONS 1 1 1 1 1 1 1 1
1 0.0000 +
1 0.6986 +
1 0.7518 +
1 0.5787 +
1 0.6996 +
1 0.8313 +
1 0.7262 +
1 0.7534 +
1 0.6986 +
OBSERVATIONS 1 1 1 1 1 1 1 1 1
Y(1) + -
Y(2) - +
Y(3) + -
Y(4) - +
Y(5) + -
Y(6) - +
Y(7) + -
Y(8) + -
Y(9) - +
Y(10) + -
Y(11) - +
Y(12) - +
Y(13) - +
Y(14) + -
Y(15) - +
Y(16) + -
Y(17) - +
Y(18) + -
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 10
SOLUTION MATRIX
DIVISOR = 8247
OBSERVATIONS 1 1 1 1 1 1 1 1 1
OBSERVATIONS 1 1 1 1 1 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) - +
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) + -
Y(8) - +
Y(9) + -
Y(10) + -
Y(11) + -
Y(12) + -
Y(13) + -
Y(14) - +
Y(15) + -
Y(16) + -
Y(17) - +
Y(18) + -
Y(19) - +
Y(20) - +
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 11
SOLUTION MATRIX
DIVISOR = 33360
OBSERVATIONS 1 1 1 1 1 1 1 1 1 1
OBSERVATIONS 1 1 1 1 1 1 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) - +
Y(8) - +
Y(9) + -
Y(10) + -
Y(11) + -
Y(12) - +
Y(13) + -
Y(14) - +
Y(15) + -
Y(16) + -
Y(17) + -
Y(18) - +
Y(19) + -
Y(20) - +
Y(21) - +
Y(22) + -
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 12
SOLUTION MATRIX
DIVISOR = 55858
OBSERVATIONS 1 1 1 1 1 1 1 1 1 1 1
Designs for eliminating bias

A calibration design for comparing standard cells can be constructed to be left-right balanced so that:
● A constant bias, P, does not contaminate the estimates for the individual cells.
Test cells

Calibration designs for assigning values to test cells in a common environment on the basis of comparisons with reference cells with known values are shown below. The designs in this catalog are left-right balanced.
1. Design for 4 test cells and 4 reference cells
2. Design for 8 test cells and 8 reference cells
Zeners

Increasingly, zeners are replacing saturated standard cells as artifacts for maintaining and disseminating the volt. Values are assigned to test zeners, based on a group of reference zeners, using calibration designs.
1. Design for 4 reference zeners and 2 test zeners
2. Design for 4 reference zeners and 3 test zeners
Standard resistors

Designs for comparing standard resistors that are used for maintaining and disseminating the ohm are listed in this section.
1. Design for 3 reference resistors and 1 test resistor
2. Design for 4 reference resistors and 1 test resistor
CELLS
OBSERVATIONS 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) - +
Y(5) - +
Y(6) - +
RESTRAINT + + +
DEGREES OF FREEDOM = 3
SOLUTION MATRIX
DIVISOR = 6
OBSERVATIONS 1 1 1 P
Y(1) 1 -1 0 1
Y(2) 1 0 -1 1
Y(3) 0 1 -1 1
Y(4) -1 1 0 1
Y(5) -1 0 1 1
Y(6) 0 -1 1 1
R* 2 2 2 0
P = LEFT-RIGHT BIAS
OBSERVATIONS 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) - +
Y(7) - +
Y(8) - +
Y(9) - +
Y(10) - +
Y(11) - +
Y(12) + -
RESTRAINT + + + +
DEGREES OF FREEDOM = 8
SOLUTION MATRIX
DIVISOR = 8
OBSERVATIONS 1 1 1 1 P
Y(1) 1 -1 0 0 1
Y(2) 1 0 -1 0 1
Y(3) 0 1 -1 0 1
Y(4) 0 1 0 -1 1
Y(5) 0 0 1 -1 1
Y(6) -1 0 1 0 1
Y(7) 0 -1 1 0 1
Y(8) 0 -1 0 1 1
Y(9) -1 0 0 1 1
Y(10) 0 0 -1 1 1
Y(11) -1 1 0 0 1
Y(12) 1 0 0 -1 1
R* 2 2 2 2 0
P = LEFT-RIGHT BIAS
OBSERVATIONS 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) + -
Y(8) - +
Y(9) - +
Y(10) - +
RESTRAINT + + + + +
DEGREES OF FREEDOM = 5
SOLUTION MATRIX
DIVISOR = 5
OBSERVATIONS 1 1 1 1 1 P
Y(1) 1 -1 0 0 0 1
Y(2) 1 0 -1 0 0 1
Y(3) 0 1 -1 0 0 1
Y(4) 0 1 0 -1 0 1
Y(5) 0 0 1 -1 0 1
Y(6) 0 0 1 0 -1 1
Y(7) 0 0 0 1 -1 1
Y(8) -1 0 0 1 0 1
Y(9) -1 0 0 0 1 1
Y(10) 0 -1 0 0 1 1
R* 1 1 1 1 1 0
P = LEFT-RIGHT BIAS
CELLS
OBSERVATIONS 1 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) + -
Y(8) + -
Y(9) + -
Y(10) - +
Y(11) - +
Y(12) - +
Y(13) + -
Y(14) + -
Y(15) + -
RESTRAINT + + + + + +
DEGREES OF FREEDOM = 9
SOLUTION MATRIX
DIVISOR = 6
OBSERVATIONS 1 1 1 1 1 1 P
Y(1) 1 -1 0 0 0 0 1
Y(2) 1 0 -1 0 0 0 1
Y(3) 0 1 -1 0 0 0 1
Y(4) 0 1 0 -1 0 0 1
Y(5) 0 0 1 -1 0 0 1
Y(6) 0 0 1 0 -1 0 1
Y(7) 0 0 0 1 -1 0 1
Y(8) 0 0 0 1 0 -1 1
Y(9) 0 0 0 0 1 -1 1
Y(10) -1 0 0 0 1 0 1
Y(11) -1 0 0 0 0 1 1
Y(12) 0 -1 0 0 0 1 1
Y(13) 1 0 0 -1 0 0 1
Y(14) 0 1 0 0 -1 0 1
Y(15) 0 0 1 0 0 -1 1
R* 1 1 1 1 1 1 0
P = LEFT-RIGHT BIAS
1 0.3727 +
1 0.3727 +
1 0.3727 +
1 0.3727 +
OBSERVATIONS 1 1 1 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) + -
Y(8) + -
Y(9) - +
Y(10) - +
Y(11) - +
Y(12) - +
Y(13) - +
Y(14) - +
Y(15) - +
Y(16) - +
RESTRAINT + + + +
DEGREES OF FREEDOM = 8
SOLUTION MATRIX
DIVISOR = 16
OBSERVATIONS 1 1 1 1 1 1 1 1 P
Y(1) 3 -1 -1 -1 -4 0 0 0 1
Y(2) 3 -1 -1 -1 0 0 -4 0 1
Y(3) -1 -1 3 -1 0 0 -4 0 1
Y(4) -1 -1 3 -1 -4 0 0 0 1
Y(5) -1 3 -1 -1 0 -4 0 0 1
Y(6) -1 3 -1 -1 0 0 0 -4 1
Y(7) -1 -1 -1 3 0 0 0 -4 1
Y(8) -1 -1 -1 3 0 -4 0 0 1
Y(9) -3 1 1 1 0 4 0 0 1
Y(10) -3 1 1 1 0 0 0 4 1
Y(11) 1 1 -3 1 0 0 0 4 1
Y(12) 1 1 -3 1 0 4 0 0 1
Y(13) 1 -3 1 1 4 0 0 0 1
Y(14) 1 -3 1 1 0 0 4 0 1
Y(15) 1 1 1 -3 0 0 4 0 1
Y(16) 1 1 1 -3 4 0 0 0 1
R* 4 4 4 4 4 4 4 4 0
1 1 1 1 1 1 1 1
1 0.4330 +
1 0.4330 +
1 0.4330 +
1 0.4330 +
1 0.5000 +
1 0.5000 +
1 0.5000 +
1 0.5000 +
Y(1) + -
Y(2) - +
Y(3) - +
Y(4) + -
Y(5) + -
Y(6) - +
Y(7) - +
Y(8) + -
Y(9) + -
Y(10) + -
Y(11) - +
Y(12) - +
Y(13) + -
Y(14) + -
Y(15) - +
Y(16) - +
RESTRAINT + + + + + + + +
DEGREES OF FREEDOM = 0
Y(1) 8 4 0 -4 -6 6 2 -2
Y(2) -8 4 0 -4 -6 6 2 -2
Y(3) 4 -8 -4 0 2 6 -6 -2
Y(4) 4 8 -4 0 2 6 -6 -2
Y(5) 0 -4 8 4 2 -2 -6 6
Y(6) 0 -4 -8 4 2 -2 -6 6
Y(7) -4 0 4 -8 -6 -2 2 6
Y(8) -4 0 4 8 -6 -2 2 6
Y(9) -6 -2 2 6 8 -4 0 4
Y(10) -6 6 2 -2 -4 8 4 0
Y(11) -6 6 2 -2 -4 -8 4 0
Y(12) 2 6 -6 -2 0 4 -8 -4
Y(13) 2 6 -6 -2 0 4 8 -4
Y(14) 2 -2 -6 6 4 0 -4 8
Y(15) 2 -2 -6 6 4 0 -4 -8
Y(16) -6 -2 2 6 -8 -4 0 4
R 2 2 2 2 2 2 2 2
Y(1) -7 7 5 3 1 -1 -3 -5 1
Y(2) -7 7 5 3 1 -1 -3 -5 1
Y(3) 3 5 7 -7 -5 -3 -1 1 1
Y(4) 3 5 7 -7 -5 -3 -1 1 1
Y(5) 1 -1 -3 -5 -7 7 5 3 1
Y(6) 1 -1 -3 -5 -7 7 5 3 1
Y(7) -5 -3 -1 1 3 5 7 -7 1
Y(8) -5 -3 -1 1 3 5 7 -7 1
Y(9) -7 -5 -3 -1 1 3 5 7 1
Y(10) -5 -7 7 5 3 1 -1 -3 1
Y(11) -5 -7 7 5 3 1 -1 -3 1
Y(12) 1 3 5 7 -7 -5 -3 -1 1
Y(13) 1 3 5 7 -7 -5 -3 -1 1
Y(14) 3 1 -1 -3 -5 -7 7 5 1
Y(15) 3 1 -1 -3 -5 -7 7 5 1
Y(16) -7 -5 -3 -1 1 3 5 7 1
R* 2 2 2 2 2 2 2 2 0
ZENERS
OBSERVATIONS 1 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) + -
Y(8) + -
Y(9) - +
Y(10) - +
Y(11) - +
Y(12) - +
Y(13) - +
Y(14) - +
Y(15) - +
Y(16) - +
RESTRAINT + + + +
CHECK STANDARD + -
DEGREES OF FREEDOM = 10
SOLUTION MATRIX
DIVISOR = 16
OBSERVATIONS 1 1 1 1 1 1 P
Y(1) 3 -1 -1 -1 -2 0 1
Y(2) 3 -1 -1 -1 0 -2 1
Y(3) -1 3 -1 -1 -2 0 1
Y(4) -1 3 -1 -1 0 -2 1
Y(5) -1 -1 3 -1 -2 0 1
Y(6) -1 -1 3 -1 0 -2 1
Y(7) -1 -1 -1 3 -2 0 1
Y(8) -1 -1 -1 3 0 -2 1
Y(9) 1 1 1 -3 2 0 1
Y(10) 1 1 1 -3 0 2 1
Y(11) 1 1 -3 1 2 0 1
Y(12) 1 1 -3 1 0 2 1
Y(13) 1 -3 1 1 2 0 1
Y(14) 1 -3 1 1 0 2 1
Y(15) -3 1 1 1 2 0 1
Y(16) -3 1 1 1 0 2 1
R* 4 4 4 4 4 4 0
P = LEFT-RIGHT EFFECT
ZENERS
OBSERVATIONS 1 1 1 1 1 1 1
Y(1) - +
Y(2) - +
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) - +
Y(8) - +
Y(9) - +
Y(10) - +
Y(11) - +
Y(12) - +
Y(13) + -
Y(14) + -
Y(15) + -
Y(16) + -
Y(17) + -
Y(18) - +
RESTRAINT + + + +
CHECK STANDARD + -
DEGREES OF FREEDOM = 11
SOLUTION MATRIX
DIVISOR = 1260
OBSERVATIONS 1 1 1 1 1 1 1 P
R* 315 315 315 315 315 315 315 0
P = LEFT-RIGHT EFFECT
V K1 1 1 1 1 1 1 1
1 0.5000 +
1 0.5000 +
1 0.5000 +
2 0.7071 + +
3 0.8660 + + +
0 0.5578 + -
OBSERVATIONS 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) - +
Y(5) - +
Y(6) - +
RESTRAINT + + +
DEGREES OF FREEDOM = 3
SOLUTION MATRIX
DIVISOR = 6
OBSERVATIONS 1 1 1 1
Y(1) 1 -2 1 1
Y(2) 1 1 -2 1
Y(3) 0 0 0 -3
Y(4) 0 0 0 3
Y(5) -1 -1 2 -1
Y(6) -1 2 -1 -1
R 2 2 2 2
OBSERVATIONS 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) - +
Y(6) - +
Y(7) - +
Y(8) - +
RESTRAINT + + + +
DEGREES OF FREEDOM = 4
SOLUTION MATRIX
DIVISOR = 8
OBSERVATIONS 1 1 1 1 1
Y(1) 3 -1 -1 -1 -1
Y(2) -1 3 -1 -1 -1
Y(3) -1 -1 3 -1 -1
Y(4) -1 -1 -1 3 -1
Y(5) 1 1 1 -3 1
Y(6) 1 1 -3 1 1
Y(7) 1 -3 1 1 1
Y(8) -3 1 1 1 1
R 2 2 2 2 2
The diagram shows the trace and Y, the distance from the spindle center to the trace at the angle.

A least-squares circle fit to data at equally spaced angles gives estimates of P - R, the noncircularity, where R = radius of the circle and P = distance from the center of the circle to the trace.
Low precision measurements

Some measurements of roundness do not require a high level of precision, such as measurements on cylinders, spheres, and ring gages where roundness is not of primary importance. For this purpose, a single trace is made of the workpiece.

Weakness of single trace method

The weakness of this method is that the deviations contain both the spindle error and the workpiece error, and these two errors cannot be separated with the single trace. Because the spindle error is usually small and within known limits, its effect can be ignored except when the most precise measurements are needed.
High precision measurements

High precision measurements of roundness are appropriate where an object, such as a hemisphere, is intended to be used primarily as a roundness standard.

Measurement method

The measurement sequence involves making multiple traces of the roundness standard where the standard is rotated between traces. Least-squares analysis of the resulting measurements enables the noncircularity of the spindle to be separated from the profile of the standard.

Choice of measurement method

A synopsis of the measurement method and the estimation technique are given in this chapter for:
● Single-trace method
● Multiple-trace method
The reader is encouraged to obtain a copy of the publication on roundness (Reeve) for a more complete description of the measurement method and analysis.
Single trace method

For this purpose, a single trace covering exactly 360° is made of the workpiece, and the distances between the center of the spindle and the trace are measured at equally spaced angles. A least-squares circle fit to the data gives the following estimators of the parameters of the circle.
Noncircularity of workpiece

The deviation of the trace from the circle at a given angle, which defines the noncircularity of the workpiece, is estimated as the measured distance minus the fitted circle at that angle; in the usual least-squares circle notation,

delta(i) = Y(i) - R - a cos(theta(i)) - b sin(theta(i))

with R, a, and b the fitted parameters of the circle.
Weakness of single trace method

The weakness of this method is that the deviations contain both the spindle error and the workpiece error, and these two errors cannot be separated with the single trace. Because the spindle error is usually small and within known limits, its effect can be ignored except when the most precise measurements are needed.
Method of n traces

The number of traces that are made on the workpiece is arbitrary but should not be less than four. The workpiece is centered as well as possible under the spindle. The mark on the workpiece which denotes the zero angular position is aligned with the zero position of the spindle as shown in the graph. A trace is made with the workpiece in this position. The workpiece is then rotated clockwise by 360/n degrees and another trace is made. This process is continued until n traces have been recorded.
Definition of terms relating to distances to the least squares circle

The deviation from the least squares circle (LSC) of the workpiece at a given position, and the deviation of the spindle from its LSC at that position, are the two quantities that the multiple-trace analysis separates.
Terms relating to parameters of least squares circle

For the jth graph, let the three parameters that define the LSC be the radius R and the center offsets a and b, as shown in the graph. In an idealized measurement system these parameters would be constant for all j. In reality, each rotation of the workpiece causes it to shift a small amount vertically and horizontally. To account for this shift, separate parameters are needed for each trace.
Correction for obstruction to stylus

Let the observed distance (in polar graph units) from the center of the jth graph to the point on the curve that corresponds to a given position of the spindle be the raw reading. If K is the magnification factor of the instrument in microinches/polar graph unit and delta is the angle between the lever arm of the stylus and the tangent to the workpiece at the point of contact (which normally can be set to zero if there is no obstruction), the transformed observations to be used in the estimation equations are:
Finally, the standard deviations of the profile estimators are given by:
Computation The computation of the residual standard deviation of the fit requires,
of standard first, the computation of the predicted values,
deviation
Restraint

The solution to the calibration design depends on the known value of a reference block, which is compared with the test blocks. The reference block is designated as block 1 for the purpose of this discussion.

Check standard

It is suggested that block 2 be reserved for a check standard that is maintained in the laboratory for quality control purposes.

Series of measurements for calibrating 4, 5, and 6 angle blocks simultaneously

For all of the designs, the measurements are made in groups of seven starting with the measurements of blocks in the following order: 2-3-2-1-2-4-2. Schematically, the calibration design is completed by counter-clockwise rotation of the test blocks about the reference block, one-at-a-time, with 7 readings for each series reduced to 3 difference measurements. For n angle blocks (including the reference block), this amounts to n - 1 series of 7 readings. The series for 4, 5, and 6 angle blocks are shown below.
Measurements for 4 angle blocks

Series 1: 2-3-2-1-2-4-2
Series 2: 4-2-4-1-4-3-4
Series 3: 3-4-3-1-3-2-3
Measurements for 5 angle blocks (see diagram)

Series 1: 2-3-2-1-2-4-2
Series 2: 5-2-5-1-5-3-5
Series 3: 4-5-4-1-4-2-4
Series 4: 3-4-3-1-3-5-3
Measurements for 6 angle blocks

Series 1: 2-3-2-1-2-4-2
Series 2: 6-2-6-1-6-3-6
Series 3: 5-6-5-1-5-2-5
Series 4: 4-5-4-1-4-6-4
Series 5: 3-4-3-1-3-5-3
Equations for the measurements in the first series showing error sources

The equations explaining the seven measurements for the first series in terms of the errors in the measurement system, with the blocks measured in the order 2-3-2-1-2-4-2, are:

Z11 = B + X2 + error11
Z12 = B + X3 + d + error12
Z13 = B + X2 + 2d + error13
Z14 = B + X1 + 3d + error14
Z15 = B + X2 + 4d + error15
Z16 = B + X4 + 5d + error16
Z17 = B + X2 + 6d + error17

where B is a bias associated with the instrument, d is a linear drift factor, Xi is the value of the ith angle block to be determined, and the error terms relate to random errors of measurement.
Calibration procedure depends on difference measurements

The check block, C, is measured before and after each test block, and the difference measurements (which are not the same as the difference measurements for calibrations of mass weights, gage blocks, etc.) are constructed to take advantage of this situation. Thus, the 7 readings are reduced to 3 difference measurements for the first series; consistent with the design matrix shown below, these take the form

Y(11) = ( Z11 + Z13 )/2 - Z12
Y(12) = ( Z13 + Z15 )/2 - Z14
Y(13) = ( Z15 + Z17 )/2 - Z16

For all series, there are 3(n - 1) difference measurements, with the first subscript in the equations above referring to the series number. The difference measurements are free of drift and instrument bias.
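To verify that drift cancels in these constructions (under the form given above), substitute the model equations for the first series:

Y(11) = ( Z11 + Z13 )/2 - Z12 = ( 2B + 2X2 + 2d )/2 - ( B + X3 + d ) = X2 - X3

so both the bias B and the drift d drop out; Y(12) and Y(13) reduce to X2 - X1 and X2 - X4 in the same way.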
Design matrix

As an example, the design matrix for n = 4 angle blocks is shown below.
1 1 1 1
0 1 -1 0
-1 1 0 0
0 1 0 -1
0 -1 0 1
-1 0 0 1
0 0 -1 1
0 0 1 -1
-1 0 1 0
0 -1 1 0
The design matrix is shown with the solution matrix for identification
purposes only because the least-squares solution is weighted (Reeve) to
account for the fact that test blocks are measured twice as many times as the
reference block. The weight matrix is not shown.
Solutions to the calibration designs

Solutions to the angle block designs are shown on the following pages. The solution matrix and factors for the repeatability standard deviation are to be interpreted as explained in solutions to calibration designs. As an example, the solution for the design for n = 4 angle blocks is as follows:

The solution for the reference standard is shown under the first column of the solution matrix; for the check standard under the second column; for the first test block under the third column; and for the second test block under the fourth column. Notice that the estimate for the reference block is guaranteed to be R*, regardless of the measurement results, because of the restraint that is imposed on the design.
Calibrations can be run for top and bottom faces of blocks

The calibration series is run with the blocks all face "up" and is then repeated with the blocks all face "down", and the results averaged. The difference between the two series can be large compared to the repeatability standard deviation, in which case a between-series component of variability must be included in the calculation of the standard deviation of the reported average.
Calculation of standard deviations when the blocks are measured in two orientations

For n blocks, the differences between the values for the blocks measured in the top (denoted by "t") and bottom (denoted by "b") positions are denoted by:

The standard deviation of the average (for each block) is calculated from these differences to be:
Standard deviations when the blocks are measured in only one orientation

If the blocks are measured in only one orientation, there is no way to estimate the between-series component of variability and the standard deviation for the value of each block is computed as

stest = K1 s1

where K1 is shown under "Factors for computing repeatability standard deviations" for each design and s1 is the repeatability standard deviation as estimated from the design. Because this standard deviation may seriously underestimate the uncertainty, a better approach is to estimate the standard deviation from the data on the check standard over time. An expanded uncertainty is computed according to the ISO guidelines.
DESIGN MATRIX
1 1 1 1
Y(1) 0 1 -1 0
Y(2) -1 1 0 0
Y(3) 0 1 0 -1
Y(4) 0 -1 0 1
Y(5) -1 0 0 1
Y(6) 0 0 -1 1
Y(7) 0 0 1 -1
Y(8) -1 0 1 0
Y(9) 0 -1 1 0
REFERENCE +
CHECK STANDARD +
DEGREES OF FREEDOM = 6
SOLUTION MATRIX
DIVISOR = 24
OBSERVATIONS 1 1 1 1
DESIGN MATRIX
1 1 1 1 1
0 1 -1 0 0
-1 1 0 0 0
0 1 0 -1 0
0 -1 0 0 1
-1 0 0 0 1
0 0 -1 0 1
0 0 0 1 -1
-1 0 0 1 0
0 -1 0 1 0
0 0 1 -1 0
-1 0 1 0 0
0 0 1 0 -1
REFERENCE +
CHECK STANDARD +
DEGREES OF FREEDOM = 8
SOLUTION MATRIX
DIVISOR = 24
OBSERVATIONS 1 1 1 1 1
DESIGN MATRIX
1 1 1 1 1 1
0 1 -1 0 0 0
-1 1 0 0 0 0
0 1 0 -1 0 0
0 -1 0 0 0 1
-1 0 0 0 0 1
0 0 -1 0 0 1
0 0 0 0 1 -1
-1 0 0 0 1 0
0 -1 0 0 1 0
0 0 0 1 -1 0
-1 0 0 1 0 0
0 0 0 1 0 -1
0 0 1 -1 0 0
-1 0 1 0 0 0
0 0 1 0 -1 0
REFERENCE +
CHECK STANDARD +
DEGREES OF FREEDOM = 10
SOLUTION MATRIX
DIVISOR = 24
OBSERVATIONS 1 1 1 1 1
1 0.7111 +
1 0.7111 +
Indications for test thermometers

It can be shown (Cameron and Hailes) that the average reading for a test thermometer is its indication at the temperature implied by the average of the three resistance readings. The standard deviation associated with this indication is calculated from difference readings.
Estimates of drift

The estimates of the shift due to the resistance thermometer and temperature drift are given by:

The standard deviation of the indication assigned to the ith test thermometer is:

and the standard deviations for the estimates of shift and drift are, respectively:
List of designs
OBSERVATIONS 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) - +
Y(6) - +
Y(7) + -
Y(8) + -
Y(9) - +
Y(10) + -
RESTRAINT + +
CHECK STANDARD + -
DEGREES OF FREEDOM = 6
SOLUTION MATRIX
DIVISOR = 10
OBSERVATIONS 1 1 1 1 1
Y(1) 2 -2 0 0 0
Y(2) 0 0 0 2 -2
Y(3) 0 0 2 -2 0
Y(4) -1 1 -3 -1 -1
Y(5) -1 1 1 1 3
Y(6) -1 1 1 3 1
Y(7) 0 0 2 0 -2
Y(8) -1 1 -1 -3 -1
Y(9) 1 -1 1 1 3
Y(10) 1 -1 -3 -1 -1
R* 5 5 5 5 5
WT K1 1 1 1 1 1
1 0.5477 +
1 0.5477 +
1 0.5477 +
2 0.8944 + +
3 1.2247 + + +
0 0.6325 + -
Short-term standard deviation

The short-term standard deviation from each design is the basis for controlling instrument precision. Because the measurements for a single design are completed in a short time span, this standard deviation estimates the basic precision of the instrument. Designs should be chosen to have enough measurements so that the standard deviation from the design has at least 3 degrees of freedom where the degrees of freedom are (n - m + 1) with
● n = number of difference measurements
● m = number of artifacts.
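For example, the 1,1,1,1 design has n = 6 difference measurements and m = 4 artifacts, so the repeatability standard deviation from a single run has 6 - 4 + 1 = 3 degrees of freedom.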
Database of check standard values

The creation and maintenance of the database of check standard values is an important aspect of the control process. The results from each calibration run are recorded in the database. The best way to record this information is in one file with one line (row in a spreadsheet) of information in fixed fields for each calibration run. A list of typical entries follows:
1. Date
2. Identification for check standard
3. Identification for the calibration design
4. Identification for the instrument
5. Check standard value
6. Repeatability standard deviation from design
7. Degrees of freedom
8. Operator identification
9. Flag for out-of-control signal
10. Environmental readings (if pertinent)
The acceptance test for a new repeatability standard deviation, snew, against the process standard deviation, sprocess, takes the form

snew < sprocess * sqrt( F(alpha; vnew, vprocess) )

The quantity under the radical is the upper alpha percentage point from the F table, where alpha is chosen to be small, say, 0.05. The other two terms refer to the degrees of freedom in the new standard deviation and the degrees of freedom in the process standard deviation.
Monitoring technique for standard deviations

The standard deviations over time and many calibrations are tracked and monitored using a control chart for standard deviations. The database and control limits are updated on a yearly or bi-yearly basis and standard deviations for each calibration run in the next cycle are compared with the control limits. In this case, the standard deviations from 117 calibrations between 1975 and 1985 were pooled to obtain a repeatability standard deviation with v = 3*117 = 351 degrees of freedom, and the control limits were computed at the 1% significance level.
Run the software macro for creating the control chart for balance #12

Dataplot commands for creating the control chart are as follows:

dimension 30 columns
skip 4
read mass.dat t id y bal s ds
let n = size s
y1label MICROGRAMS
x1label TIME IN YEARS
xlimits 75 90
x2label STANDARD DEVIATIONS ON BALANCE 12
characters * blank blank blank
lines blank solid dotted dotted
let ss=s*s
let sp=mean ss
let sp=sqrt(sp)
let scc=sp for i = 1 1 n
let f = fppf(.99,3,351)
let f=sqrt(f)
let sul=f*scc
plot s scc sul vs t
Control chart for precision of balance #12 (standard deviations in micrograms plotted against time in years)
Interpretation of the control chart

The control chart shows that the precision of the balance remained in control through 1990 with only two violations of the control limits. For those occasions, the calibrations were discarded and repeated. Clearly, for the second violation, something significant occurred that invalidated the calibration results.
Further interpretation of the control chart

However, it is also clear from the pattern of standard deviations over time that the precision of the balance was gradually degrading and more and more points were approaching the control limits. This finding led to a decision to replace this balance for high accuracy calibrations.
The control limits depend on the t-distribution and the degrees of freedom in the process standard deviation

If the baseline average and the process standard deviation, s, have been computed from historical data, the upper and lower control limits are

UCL = Average + t(1-alpha/2; v) * s
LCL = Average - t(1-alpha/2; v) * s

with t(1-alpha/2; v) denoting the upper critical value from the t-table with v = (K - 1) degrees of freedom.
Run software macro for computing the t-factor

Dataplot can compute the value of the t-statistic. For a conservative case with alpha = 0.05 and K = 6, the commands
let alphau = 1 - 0.05/2
let k = 6
let v1 = k-1
let t = tppf(alphau, v1)
return the following value:
THE COMPUTED VALUE OF THE CONSTANT T =
0.2570583E+01
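A minimal continuation (not part of the original macro) that turns this factor into control limits, assuming a baseline average ybar and standard deviation sd of the check standard values have already been computed:

. control limits from the t-factor computed above
let ucl = ybar + t*sd
let lcl = ybar - t*sd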
The control procedure is invoked in real-time and a failure implies that the current calibration should be rejected

The control procedure compares the check standard value, C, from each calibration run with the upper and lower control limits. This procedure should be implemented in real time and does not necessarily require a graphical presentation. The check standard value can be compared algebraically with the control limits. The calibration run is judged to be out-of-control if either:

C > UCL

or

C < LCL
Actions to be taken

If the check standard value exceeds one of the control limits, the process is judged to be out of control and the current calibration run is rejected. The best strategy in this situation is to repeat the calibration to see if the failure was a chance occurrence. Check standard values that remain in control, especially over a period of time, provide confidence that no new biases have been introduced into the measurement process and that the long-term variability of the process has not changed.

For more guidance, see Remedies and strategies for dealing with out-of-control signals.
Caution - be sure to plot the data

If the tests for control are carried out algebraically, it is recommended that, at regular intervals, the check standard values be plotted against time to check for drift or anomalies in the measurement process.
Check standard

There is no slot in the 1,1,1,1 design for an artifact check standard when the first two kilograms are reference standards; the third kilogram is a test weight; and the fourth is a summation of smaller weights that act as the restraint in the next series. Therefore, the check standard is a computed difference between the values of the two reference standards as estimated from the design. The convention with mass calibrations is to report the correction to nominal, in this case the correction to 1000 g, as shown in the control charts below.
Monitoring technique for check standard values

Check standard values over time and many calibrations are tracked and monitored using a Shewhart control chart. The database and control limits are updated when needed and check standard values for each calibration run in the next cycle are compared with the control limits. In this case, the values from 117 calibrations between 1975 and 1985 were averaged to obtain a baseline and process standard deviation with v = 116 degrees of freedom. Control limits are computed with a factor of k = 3 to identify truly anomalous data points.
Run the software macro for creating the Shewhart control chart

Dataplot commands for creating the control chart are as follows:

dimension 500 30
skip 4
read mass.dat t id y bal s ds
let n = size y
title mass check standard 41
y1label micrograms
x1label time in years
xlimits 75 90
let ybar=mean y subset t < 85
let sd=standard deviation y subset t < 85
let cc=ybar for i = 1 1 n
let ul=cc+3*sd
let ll=cc-3*sd
characters * blank blank blank * blank blank blank
lines blank solid dotted dotted blank solid dotted dotted
plot y cc ul ll vs t
.end of calculations
Control chart of measurements of kilogram check standard showing a change in the process after 1985
Interpretation of the control chart

The control chart shows only two violations of the control limits. For those occasions, the calibrations were discarded and repeated. The configuration of points is unacceptable if many points are close to a control limit and there is an unequal distribution of data points on the two sides of the control chart -- indicating a change in either:
● process average which may be related to a change in the reference standards
or
● variability which may be caused by a change in the instrument precision or may be the result of
other factors on the measurement process.
Small changes only become obvious over time

Unfortunately, it takes time for the patterns in the data to emerge because individual violations of the control limits do not necessarily point to a permanent shift in the process. The Shewhart control chart is not powerful for detecting small changes, say of the order of at most one standard deviation, which appears to be approximately the case in this application. This level of change might seem insignificant, but the calculation of uncertainties for the calibration process depends on the control limits.
Re-establishing the limits based on recent data and EWMA option

If the limits for the control chart are re-calculated based on the data after 1985, the extent of the change is obvious. Because the exponentially weighted moving average (EWMA) control chart is capable of detecting small changes, it may be a better choice for a high precision process that is producing many control values.
Run continuation of software macro for updating the Shewhart control chart

Dataplot commands for updating the control chart are as follows:

let ybar2=mean y subset t > 85
let sd2=standard deviation y subset t > 85
let n = size y
let cc2=ybar2 for i = 1 1 n
control chart let ul2=cc2+3*sd2
let ll2=cc2-3*sd2
plot y cc ul ll vs t subset t < 85 and
plot y cc2 ul2 ll2 vs t subset t > 85
Revised control chart based on check standard measurements after 1985
Explanation of EWMA statistic at the kilogram level

The exponentially weighted moving average (EWMA) is a statistic for monitoring the process that averages the data in a way that gives less and less weight to data as they are further removed in time from the current measurement. The EWMA statistic at time t is computed recursively from individual data points, which are ordered in time, by

EWMA(t) = lambda * Y(t) + (1 - lambda) * EWMA(t-1)

where lambda is a weighting factor and the starting value is usually taken to be the target value.
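As a small worked example (illustrative numbers), with lambda = 0.2, a previous value EWMA(t-1) = -0.12, and a new measurement Y(t) = -0.20:

EWMA(t) = 0.2(-0.20) + 0.8(-0.12) = -0.136

so a single new point moves the statistic only part of the way toward the new measurement.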
Control mechanism for EWMA

The EWMA control chart can be made sensitive to small changes or a gradual drift in the process by the choice of the weighting factor, lambda. A weighting factor between 0.2 and 0.3 has been suggested for this purpose (Hunter), and 0.15 is another popular choice.
Limits for the control chart

The target or center line for the control chart is the average of historical data. The upper (UCL) and lower (LCL) limits are

UCL = target + k * s * sqrt( lambda/(2 - lambda) )
LCL = target - k * s * sqrt( lambda/(2 - lambda) )

where s is the standard deviation of the historical data; the function under the radical is a good approximation to the component of the standard deviation of the EWMA statistic that is a function of time; and k is the multiplicative factor, defined in the same manner as for the Shewhart control chart, which is usually taken to be 3.
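With the values used in the example below (s = 0.03065 mg, k = 3, lambda = 0.2), the term k s sqrt( lambda/(2 - lambda) ) = 3 (0.03065) sqrt( 0.2/1.8 ) = 3 (0.03065)(1/3) = 0.03065 mg, so the EWMA limits sit 0.03065 mg above and below the target.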
Example of EWMA chart for check standard data for kilogram calibrations showing multiple violations of the control limits for the EWMA statistics

The target (average) and process standard deviation are computed from the check standard data taken prior to 1985. The computation of the EWMA statistic begins with the data taken at the start of 1985. In the control chart below, the control data after 1985 are shown in green, and the EWMA statistics are shown as black dots superimposed on the raw data. The control limits are calculated according to the equation above where the process standard deviation, s = 0.03065 mg and k = 3. The EWMA statistics, and not the raw data, are of interest in looking for out-of-control signals. Because the EWMA statistic is a weighted average, it has a smaller standard deviation than a single control measurement, and, therefore, the EWMA control limits are narrower than the limits for a Shewhart control chart.
Run the software macro for creating the EWMA control chart

Dataplot commands for creating the control chart are as follows:

dimension 500 30
skip 4
read mass.dat x id y bal s ds
let n = number y
let cutoff = 85.0
let tag = 2 for i = 1 1 n
let tag = 1 subset x < cutoff
xlimits 75 90
let m = mean y subset tag 1
let s = sd y subset tag 1
let lambda = .2
let fudge = sqrt(lambda/(2-lambda))
let mean = m for i = 1 1 n
Interpretation of the control chart

The EWMA control chart shows many violations of the control limits starting at approximately the mid-point of 1986. This pattern emerges because the process average has actually shifted about one standard deviation, and the EWMA control chart is sensitive to small changes.
● Data collection
● Assumptions
● Conditions that can invalidate the calibration procedure
● Data analysis and model validation
● Calibration of future measurements
● Uncertainties of calibrated values
Basic steps for correcting the instrument for bias

The calibration method is the same for both situations and requires the following basic steps:
● Selection of reference standards with known values to cover the range of interest.
● Measurements on the reference standards with the instrument to
be calibrated.
● Functional relationship between the measured and known values
of the reference standards (usually a least-squares fit to the data)
called a calibration curve.
● Correction of all measurements by the inverse of the calibration
curve.
Schematic example of a calibration curve and resulting value

A schematic explanation is provided by the figure below for load cell calibration. The load cell measurements (shown as *) are plotted on the y-axis against the corresponding values of known load shown on the x-axis.

A quadratic fit to the load cell data produces the calibration curve that is shown as the solid line. For a future measurement with the load cell, Y' = 1.344 on the y-axis, a dotted line is drawn through Y' parallel to the x-axis. At the point where it intersects the calibration curve, another dotted line is drawn parallel to the y-axis. Its point of intersection with the x-axis at X' = 13.417 is the calibrated value.
● Quadratic: Y = a + bX + cX^2 + e
● Power: Y = a X^b (with errors proportional to the response)
● Non-linear: Y = g(X; parameters) + e
Warning

A plot of the data, although always recommended, is not sufficient for identifying the correct model for the calibration curve. Instrument responses may not appear non-linear over a large interval. If the response and the known values are in the same units, differences from the known values should be plotted versus the known values.
Power model treated as a linear model

The power model is appropriate when the measurement error is proportional to the response rather than being additive. It is frequently used for calibrating instruments that measure dosage levels of irradiated materials.

The power model is a special case of a non-linear model that can be linearized by a natural logarithm transformation to

ln(Y) = ln(a) + b ln(X) + ln(e)

so that the model to be fit to the data is of the familiar linear form

Y' = a' + b X' + e'

where the primed quantities denote the transformed variables and coefficients.
Non-linear models and their limitations

Instruments whose responses are not linear in the coefficients can sometimes be described by non-linear models. In some cases, there are theoretical foundations for the models; in other cases, the models are developed by trial and error. Two classes of non-linear functions that have been shown to have practical value as calibration functions are:
1. Exponential
2. Rational

Non-linear models are an important class of calibration models, but they have several significant limitations.
● The model itself may be difficult to ascertain and verify.
Exception to the rule above - bracketing

If the instrument is not to be calibrated over its entire range, but only over a very short range for a specific application, then it may not be necessary to develop a complete calibration curve, and a bracketing technique (ISO 11095) will provide satisfactory results. The bracketing technique assumes that the instrument is linear over the interval of interest, and, in this case, only two reference standards are required -- one at each end of the interval.
Rule of thumb

It has been shown by Bruce Hoadley, in an internal NIST publication, that the best way to mitigate the effect of random fluctuations in the reference values is to plan for a large spread of values on the x-axis relative to the precision of the instrument.
Outliers in the calibration data

Outliers in the calibration data can seriously distort the calibration curve, particularly if they lie near one of the endpoints of the calibration interval.
● Isolated outliers (single points) should be deleted from the calibration data.
● An entire day's results which are inconsistent with the other data should be examined and rectified before proceeding with the analysis.
Example of differences among repetitions in the calibration data

An important point, but one that is rarely considered, is that there can be differences in responses from repetition to repetition that will invalidate the analysis. A plot of the aggregate of the calibration data may not identify changes in the instrument response from day-to-day. What is needed is a plot of the fine structure of the data that exposes any day-to-day differences in the calibration data.
Warning - calibration can fail because of day-to-day changes

A straight-line fit to the aggregate data will produce a 'calibration curve'. However, if straight lines fit separately to each day's measurements show very disparate responses, the instrument, at best, will require calibration on a daily basis and, at worst, may be sufficiently lacking in control as to be unusable.
This plot shows measurements made on 10 reference materials repeated on four days, with the 4 points for each day overlapping.

This plot shows the differences between each measurement and the corresponding reference value. Because days are not identified, the plot gives no indication of problems in the control of the imaging system from day to day.
Interpretation of calibration findings

Given the lack of control for this measurement process, any calibration procedure built on the average of the calibration data will fail to properly correct the system on some days and invalidate resulting measurements. There is no good solution to this problem except daily calibration.
Warning - regarding statistical software

Once an initial model has been chosen, the coefficients in the model are estimated from the data using a statistical software package. It is impossible to over-emphasize the importance of using reliable and documented software for this analysis.
Output required from a software package

With the exception of non-linear models, the software package will use the method of least squares for estimating the coefficients. The software package should also be capable of performing a 'weighted' fit for situations where errors of measurement are non-constant over the calibration interval. The choice of weights is usually the responsibility of the user. The software package should, at the minimum, provide the following information:
● Coefficients of the calibration curve
● F-ratio for goodness of fit (if there are repetitions on the y-axis at each
reference value)
Typical analysis of a quadratic fit

The following output is from the statistical software package Dataplot, where load cell measurements are modeled as a quadratic function of known loads. There are 3 repetitions at each load level for a total of 33 measurements. The commands

read loadcell.dat x y
quadratic fit y x

return the following output:
F-ratio for judging the adequacy of the model; coefficients with their standard deviations and associated t values:

LACK OF FIT F-RATIO = 0.3482 = THE 6.3445% POINT OF THE
F DISTRIBUTION WITH 8 AND 22 DEGREES OF FREEDOM

COEFFICIENT ESTIMATES ST. DEV. T VALUE
1 a -0.183980E-04 (0.2450E-04) -0.75
2 b 0.100102 (0.4838E-05) 0.21E+05
The F-ratio is used to test the goodness of the fit to the data

The F-ratio provides information on the model as a good descriptor of the data. The F-ratio is compared with a critical value from the F-table. An F-ratio smaller than the critical value indicates that all significant structure has been captured by the model.
F-ratio < 1 For the load cell analysis, a plot of the data suggests a linear fit. However, the
always linear fit gives a very large F-ratio. For the quadratic fit, the F-ratio = 0.3482 with
indicates a v1 = 8 and v2 = 22 degrees of freedom. The critical value of F(0.05, 8, 22) = 2.40
good fit indicates that the quadratic function is sufficient for describing the data. A fact to
keep in mind is that an F-ratio < 1 does not need to be checked against a critical
value; it always indicates a good fit to the data.
Note: Dataplot reports a probability associated with the F-ratio (here, the 6.3445%
point), where a probability > 95% indicates an F-ratio that is significant at the 5%
level. Other software may report in other ways; therefore, it is necessary to check the
interpretation for each package.
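For readers without Dataplot, the quoted percentile and critical value can be checked
with any statistical library; a sketch in Python/scipy (an assumption, not the
Handbook's tooling):

from scipy.stats import f

# Lack-of-fit F-ratio of 0.3482 with 8 and 22 degrees of freedom.
print(100 * f.cdf(0.3482, 8, 22))   # percentile of the F distribution, approx. 6.34
print(f.ppf(0.95, 8, 22))           # critical value F(0.05, 8, 22), approx. 2.40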
The t-values The t-values can be compared with critical values from a t-table. However, for a
are used to test at the 5% significance level, a t-value < 2 is a good indicator of
test the non-significance. The t-value for the intercept term, a, is < 2 indicating that the
significance of intercept term is not significantly different from zero. The t-values for the linear
individual and quadratic terms are significant indicating that these coefficients are needed in
coefficients the model. If the intercept is dropped from the model, the analysis is repeated to
obtain new estimates for the coefficients, b and c.
Residual The residual standard deviation estimates the standard deviation of a single
standard measurement with the load cell.
deviation
Further The residuals (differences between the measurements and their fitted values) from
considerations the fit should also be examined for outliers and structure that might invalidate the
and tests of calibration curve. They are also a good indicator of whether basic assumptions of
assumptions normality and equal precision for all measurements are valid.
If the initial model proves inappropriate for the data, a strategy for improving the
model is followed.
Linear calibration line: The inverse of the calibration line for the linear model

Y = a + bX + epsilon

gives the calibrated value

X' = (Y' - a)/b

Tests for the intercept and slope of the calibration curve: Before correcting for the
calibration line by the equation above, the intercept and slope should be tested for
a = 0 and b = 1. If both conditions hold, no calibration is needed. If, on the other
hand, only the test for a = 0 fails, the error is constant; if only the test for b = 1
fails, the errors are related to the size of the reference standards.
Quadratic calibration curve: The inverse of the calibration curve for the quadratic
model

Y = a + bX + cX^2 + epsilon

requires a root:

X' = (-b + sqrt(b^2 - 4c(a - Y')))/(2c)
Power curve: The inverse of the calibration curve for the power model Y = aX^b gives
the calibrated value

X' = exp((ln(Y') - ln(a))/b)

where b and the natural logarithm of a are estimated from the power model transformed
to a linear function, ln(Y) = ln(a) + b ln(X).
Non-linear For more complicated models, the inverse for the calibration curve is
and other obtained by interpolation from a graph of the function or from predicted
calibration values of the function.
curves
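The closed-form inverses above translate directly into code. A minimal sketch in
Python (function and variable names are illustrative only):

import math

def linear_inverse(yp, a, b):
    # X' = (Y' - a)/b
    return (yp - a) / b

def quadratic_inverse(yp, a, b, c):
    # X' = (-b + sqrt(b^2 - 4c(a - Y')))/(2c)
    return (-b + math.sqrt(b**2 - 4.0 * c * (a - yp))) / (2.0 * c)

def power_inverse(yp, ln_a, b):
    # X' = exp((ln Y' - ln a)/b)
    return math.exp((math.log(yp) - ln_a) / b)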
Difficulties The uncertainty of a calibrated value is not easy to evaluate because of
the intersection of two uncertainties associated with
1. the calibration curve itself because of limited data
2. the 'future' measurement
If the calibration experiment were to be repeated, a slightly different
calibration curve would result even for a system in statistical control.
An exposition of the intersection of the two uncertainties is given for
the calibration of proving rings (Hockersmith and Ku).
Uncertainty of the calibrated value: The uncertainty to be evaluated is the uncertainty
of the calibrated value, X', computed for any future measurement, Y', made with the
calibrated instrument. X' can be evaluated using software capable of algebraic
representation.
Propagation The analysis of uncertainty is demonstrated with the software package, Mathematica
of error using (Wolfram). The format for inputting the solution to the quadratic calibration curve in
Mathematica Mathematica is as follows:
In[10]:=
f = (-b + (b^2 - 4 c (a - Y))^(1/2))/(2 c)
2
-b + Sqrt[b - 4 c (a - Y)]
---------------------------
2 c
Partial The partial derivatives are computed using the D function. For example, the partial
derivatives derivative of f with respect to Y is given by:
In[11]:=
dfdY=D[f, {Y,1}]
The Mathematica representation is:
Out[11]=
1
----------------------
2
Sqrt[b - 4 c (a - Y)]
In[12]:=
dfda=D[f, {a,1}]

Out[12]=
1
-(----------------------)
2
Sqrt[b - 4 c (a - Y)]
In[13]:=
dfdb=D[f,{b,1}]
Out[13]=
b
-1 + ----------------------
2
Sqrt[b - 4 c (a - Y)]
---------------------------
2 c
In[14]:=dfdc=D[f, {c,1}]
Out[14]=
2
-(-b + Sqrt[b - 4 c (a - Y)]) a - Y
------------------------------ - ------------------------
2 2
2 c c Sqrt[b - 4 c (a - Y)]
The variance The variance of X' is defined from propagation of error as follows:
of the
calibrated In[15]:=
value from u2 =(dfdY)^2 (sy)^2 + (dfda)^2 (sa)^2 + (dfdb)^2 (sb)^2
propagation of + (dfdc)^2 (sc)^2
error
The values of the coefficients and their respective standard deviations from the
quadratic fit to the calibration curve are substituted in the equation. The standard
deviation of the measurement, Y, may not be the same as the standard deviation from
the fit to the calibration data if the measurements to be corrected are taken with a
different system; here we assume that the instrument to be calibrated has a standard
deviation that is essentially the same as the instrument used for collecting the
calibration data and the residual standard deviation from the quadratic fit is the
appropriate estimate.
In[16]:=
% /. a -> -0.183980 10^-4
% /. sa -> 0.2450 10^-4
% /. b -> 0.100102
% /. sb -> 0.4838 10^-5
% /. c -> 0.703186 10^-5
% /. sc -> 0.2013 10^-6
% /. sy -> 0.0000376353
Simplification Intermediate outputs from Mathematica, which are not shown, are simplified. (Note that
of output the % sign means an operation on the last output.) Then the standard deviation is
computed as the square root of the variance.
In[17]:=
u2 = Simplify[%]
u=u2^.5
Out[24]=
0.100102 2
Power[0.11834 (-1 + --------------------------------) +
Sqrt[0.0100204 + 0.0000281274 Y]
-9
2.01667 10
-------------------------- +
0.0100204 + 0.0000281274 Y
-14 9
4.05217 10 Power[1.01221 10 -
10
1.01118 10 Sqrt[0.0100204 + 0.0000281274 Y] +
142210. (0.000018398 + Y)
--------------------------------, 2], 0.5]
Sqrt[0.0100204 + 0.0000281274 Y]
Input for The standard deviation expressed above is not easily interpreted but it is easily graphed.
displaying A graph showing standard deviations of calibrated values, X', as a function of
standard instrument response, Y', is displayed in Mathematica given the following input:
deviations of
calibrated In[31]:= Plot[u,{Y,0,2.}]
values as a
function of Y'
[Graph: standard deviations of calibrated values X' for given instrument responses Y',
ignoring covariance terms in the propagation of error.]
Problem with The propagation of error shown above is not correct because it ignores the covariances
propagation of among the coefficients, a, b, c. Unfortunately, some statistical software packages do
error not display these covariance terms with the other output from the analysis.
Covariance The variance-covariance terms for the loadcell data set are shown below.
terms for
loadcell data
             a               b               c
    a    6.0049021E-10
    b   -1.0759599E-10   2.3408589E-11
    c    4.0191106E-12  -9.5051441E-13   4.0538705E-14

The diagonal elements are the variances of the coefficients, a, b, c, respectively, and
the off-diagonal elements are the covariance terms.
Recomputation To account for the covariance terms, the variance of X' is redefined by adding the
of the covariance terms. Appropriate substitutions are made; the standard deviations are
standard recomputed and graphed as a function of instrument response.
deviation of X'
In[25]:=
u2 = u2 + 2 dfda dfdb sab2 + 2 dfda dfdc sac2 + 2 dfdb dfdc
sbc2
% /. sab2 -> -1.0759599 10^-10
% /. sac2 -> 4.0191106 10^-12
% /. sbc2 -> -9.5051441 10^-13
u2 = Simplify[%]
u = u2^.5
Plot[u,{Y,0,2.}]
The graph below shows the correct estimates for the standard deviation of X' and gives
a means for assessing the loss of accuracy that can be incurred by ignoring covariance
terms. In this case, the uncertainty is reduced by including covariance terms, some of
which are negative.
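The same derivation can be reproduced with a free computer-algebra system. Below is a
sketch in Python/sympy, mirroring the Mathematica session above, with the quadratic-fit
estimates and the loadcell covariance terms substituted; sab, sac, sbc denote the
covariances.

import sympy as sp

a, b, c, Y, sy, sa, sb, sc, sab, sac, sbc = sp.symbols(
    'a b c Y sy sa sb sc sab sac sbc')

# Inverse of the quadratic calibration curve.
f = (-b + sp.sqrt(b**2 - 4*c*(a - Y))) / (2*c)
dfdY, dfda, dfdb, dfdc = [sp.diff(f, v) for v in (Y, a, b, c)]

# Propagation of error, including the covariance terms.
u2 = (dfdY**2*sy**2 + dfda**2*sa**2 + dfdb**2*sb**2 + dfdc**2*sc**2
      + 2*dfda*dfdb*sab + 2*dfda*dfdc*sac + 2*dfdb*dfdc*sbc)

u = sp.sqrt(u2.subs({a: -0.183980e-4, sa: 0.2450e-4,
                     b: 0.100102, sb: 0.4838e-5,
                     c: 0.703186e-5, sc: 0.2013e-6,
                     sy: 0.0000376353,
                     sab: -1.0759599e-10, sac: 4.0191106e-12,
                     sbc: -9.5051441e-13}))
print(sp.lambdify(Y, u)(1.0))   # standard deviation of X' at Y = 1.0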
[Graph: standard deviations of calibrated values, X', for given instrument responses,
Y', with covariance terms included in the propagation of error.]
Calculation of The check standard values are the raw measurements on the artifacts corrected by the
check calibration curve. The standard deviation of these values should estimate the uncertainty
standard associated with calibrated values. The success of this method of estimating the
values uncertainties depends on adequate sampling of the measurement process.
Run software Dataplot commands for computing the standard deviation from the control data are:
macro for
computing the
standard
deviation
read linewid2.dat day position x y
let b0 = 0.2817
let b1 = 0.9767
let w = ((y - b0)/b1) - x
let sdcal = standard deviation w
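Equivalently, in Python/numpy (a sketch, assuming the columns of linewid2.dat have
already been loaded into arrays x and y):

import numpy as np

def check_standard_sd(x, y, b0=0.2817, b1=0.9767):
    # Calibrated control values minus the known reference values.
    w = (y - b0) / b1 - x
    return np.std(w, ddof=1)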
Comparison The standard deviation, 0.062 µm, can be compared with a propagation of error analysis.
with
propagation
of error
Other sources In addition to the type A uncertainty, there may be other contributors to the uncertainty
of uncertainty such as the uncertainties of the values of the reference materials from which the
calibration curve was derived.
Estimates The calibration data consist of 40 measurements with an optical imaging system on 10
from line width artifacts. A linear fit to the data using the software package Omnitab
calibration (Omnitab 80) gives a calibration curve with the following estimates for the
data intercept, a, and the slope, b:

a .23723513
b .98839599
-------------------------------------------------------
RESIDUAL STANDARD DEVIATION = .038654864
BASED ON DEGREES OF FREEDOM 40 - 2 = 38

The variance-covariance matrix for the coefficients is

         a               b
a    2.2929900E-04
b   -2.9703502E-05   4.5966426E-06
Propagation The propagation of error is accomplished with the following instructions using the
of error software package Mathematica (Wolfram):
using
Mathematica f=(y -a)/b
dfdy=D[f, {y,1}]
dfda=D[f, {a,1}]
dfdb=D[f,{b,1}]
u2 =dfdy^2 sy^2 + dfda^2 sa2 + dfdb^2 sb2 + 2 dfda dfdb sab2
% /. a-> .23723513
% /. b-> .98839599
% /. sa2 -> 2.2929900 10^-04
% /. sb2 -> 4.5966426 10^-06
% /. sab2 -> -2.9703502 10^-05
% /. sy -> .038654864
u2 = Simplify[%]
u = u2^.5
Plot[u, {y, 0, 12}]
Standard The output from Mathematica gives the standard deviation of a calibrated value, X', as a
deviation of function of instrument response:
calibrated
value X'
    (0.00177907 - 0.0000638092 y + 4.81634 10^-6 y^2)^0.5
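Evaluating this expression over the calibration interval locates its maximum, which is
the source of the 0.042 micrometer figure quoted in the comparison below; a short check
in Python/numpy:

import numpy as np

y = np.linspace(0.0, 12.0, 1201)
u = np.sqrt(0.00177907 - 0.0000638092*y + 4.81634e-6*y**2)
print(u.max())   # approximately 0.042, attained at y = 0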
[Graph: standard deviation of calibrated value X' plotted as a function of instrument
response Y' for a linear calibration.]
Comparison Comparison of the analysis of check standard data, which gives a standard deviation of
of check 0.062 µm, and propagation of error, which gives a maximum standard deviation of 0.042
standard µm, suggests that the propagation of error may underestimate the type A uncertainty. The
analysis and check standard measurements are undoubtedly sampling some sources of variability that
propagation do not appear in the formal propagation of error formula.
of error
Check For linear calibration, it is sufficient to control the end-points and the
standards middle of the calibration interval to ensure that the instrument does not
needed for drift out of calibration. Therefore, check standards are required at three
the control points; namely,
program ● at the lower-end of the regime
● at the mid-range of the regime
● at the upper-end of the regime
Data One measurement is needed on each check standard for each checking
collection period. It is advisable to start by making control measurements at the
start of each day or as often as experience dictates. The time between
checks can be lengthened if the instrument continues to stay in control.
Calculation The upper and lower control limits (Croarkin and Varner) are,
of control respectively,
limits
UCL = + s*(t*)/b
LCL = - s*(t*)/b

where s is the residual standard deviation of the fit from the calibration
experiment, and b is the slope of the linear calibration curve.
Values t* The critical value, t*, can be found in the t* table for m = 3; v is the
degrees of freedom for the residual standard deviation; and alpha is equal
to 0.05.
Run Dataplot will compute the critical value of the t* statistic. For the case
software where alpha = 0.05, m = 3 and v = 38, say, the commands
macro for t*
let alpha = 0.05
let m = 3
let v = 38
let zeta = .5*(1 - exp(ln(1-alpha)/m))
let TSTAR = tppf(zeta, v)
return the following value:
THE COMPUTED VALUE OF THE CONSTANT TSTAR =
0.2497574E+01
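The same constant can be checked with Python/scipy (a sketch; note that the percent
point function at the lower-tail probability zeta returns a negative value, and it is
the magnitude that is used in the control limits):

from scipy.stats import t

alpha, m, v = 0.05, 3, 38
zeta = 0.5 * (1.0 - (1.0 - alpha)**(1.0 / m))
print(abs(t.ppf(zeta, v)))   # approximately 2.4976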
Sensitivity to If the control values remain within the control limits,
departure
from
linearity
the instrument is in statistical control. Statistical control in this context
implies not only that measurements are repeatable within certain limits
but also that instrument response remains linear. The test is sensitive to
departures from linearity.
● Intercept = 0.2817
● Slope = 0.9767
● Residual standard deviation = 0.06826
● Degrees of freedom = 38
Line width The control measurements, Y, and known values, X, for the three
measurements artifacts at the upper, mid-range, and lower end (U, M, L) of the
made with an calibration line are shown in the following table:
optical
imaging DAY POSITION X Y
system
1 L 0.76 1.12
1 M 3.29 3.49
1 U 8.89 9.11
2 L 0.76 0.99
2 M 3.29 3.53
2 U 8.89 8.89
3 L 0.76 1.05
3 M 3.29 3.46
3 U 8.89 9.02
4 L 0.76 0.76
4 M 3.29 3.75
4 U 8.89 9.30
5 L 0.76 0.96
5 M 3.29 3.53
5 U 8.89 9.05
6 L 0.76 1.03
6 M 3.29 3.52
6 U 8.89 9.02
Run software Dataplot commands for computing the control limits and producing the
macro for control chart are:
control chart
read linewid.dat day position x y
let b0 = 0.2817
let b1 = 0.9767
let s = 0.06826
let df = 38
let alpha = 0.05
let m = 3
let zeta = .5*(1 - exp(ln(1-alpha)/m))
let TSTAR = tppf(zeta, df)
let W = ((y - b0)/b1) - x
let n = size w
let center = 0 for i = 1 1 n
let UCL = CENTER + s*TSTAR/b1
let LCL = CENTER - s*TSTAR/b1
characters * blank blank blank
lines blank dashed solid solid
y1label control values
xlabel TIME IN DAYS
plot W CENTER UCL LCL vs day
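A sketch of the same arithmetic in Python/numpy, using the line width data tabulated
above, flags the out-of-control days directly:

import numpy as np
from scipy.stats import t as t_dist

day = np.repeat(np.arange(1, 7), 3)
x = np.tile([0.76, 3.29, 8.89], 6)
y = np.array([1.12, 3.49, 9.11, 0.99, 3.53, 8.89, 1.05, 3.46, 9.02,
              0.76, 3.75, 9.30, 0.96, 3.53, 9.05, 1.03, 3.52, 9.02])

b0, b1, s, df, alpha, m = 0.2817, 0.9767, 0.06826, 38, 0.05, 3
zeta = 0.5 * (1.0 - (1.0 - alpha)**(1.0 / m))
tstar = abs(t_dist.ppf(zeta, df))

w = (y - b0) / b1 - x               # control values
limit = s * tstar / b1              # half-width of the control band
print(np.unique(day[np.abs(w) > limit]))   # [4]: day 4 is flagged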
Interpretation The control measurements show no evidence of drift and are within the
of control control limits except on the fourth day when all three control values
chart are outside the limits. The cause of the problem on that day cannot be
diagnosed from the data at hand, but all measurements made on that
day, including workload items, should be rejected and remeasured.
● Reproducibility
● Stability
● Bias
Strategy The strategy is to conduct and analyze a study that examines the
behavior of similar gauges to see if:
● They exhibit different levels of precision;
● There are significant differences among:
● Operators
● Gauges
● Parameter levels
● Configurations, etc.
Selection of The artifacts for the study are check standards or test items of a type
artifacts or that are typically measured with the gauges under study. It may be
check necessary to include check standards for different parameter levels if
standards the gauge is a multi-response instrument. The discussion of check
standards should be reviewed to determine the suitability of available
artifacts.
Number of The number of artifacts for the study should be Q (Q > 2). Check
artifacts standards for a gauge study are needed only for the limited time
period (two or three months) of the study.
Selection of Only those operators who are trained and experienced with the
operators gauges should be enlisted in the study, with the following constraints:
● If there is a small number of operators who are familiar with
the gauges, they should all be included in the study.
● If the study is intended to be representative of a large pool of
operators, then a random sample of L (L > 2) operators should
be chosen from the pool.
● If there is only one operator for the gauge type, that operator
should make measurements on K (K > 2) days.
Selection of If there is only a small number of gauges in the facility, then all
gauges gauges should be included in the study.
If the study is intended to represent a larger pool of gauges, then a
random sample of I (I > 3) gauges should be chosen for the study.
Limit the initial If the gauges operate at several parameter levels (for example,
study frequencies), an initial study should be carried out at 1 or 2 levels
before a larger study is undertaken.
If there are differences in the way that the gauge can be operated, an
initial study should be carried out for one or two configurations
before a larger study is undertaken.
Relationship The disadvantage of this design is that there is minimal data for
to 2-level estimating variability over time. A 2-level nested design and a 3-level
and 3-level nested design, both of which require measurements over time, are
nested discussed on other pages.
designs
The measurements are denoted by Y_ij, with the first index identifying the month of
measurement and the second index identifying the repetition number.
Analysis of The level-1 standard deviation, which describes the basic precision of
data the gauge, is
Relationship The standard deviation that defines the uncertainty for a single
to measurement on a test item, often referred to as the reproducibility
uncertainty standard deviation (ASTM), is given by
for a test
item
Time intervals The following levels are based on the characteristics of many measurement systems
in a nested and should be adapted to a specific measurement situation as needed.
design ● Level-1 Measurements taken over a short term to estimate gauge precision
● Level-2 Measurements taken over days (or other appropriate time increment)
Schedule for A schedule for making check standard measurements over time (once a day, twice a
making week, or whatever is appropriate for sampling all conditions of measurement) should
measurements be set up and adhered to. The check standard measurements should be structured in
the same way as values reported on the test items. For example, if the reported values
are averages of two repetitions made within 5 minutes of each other, the check
standard values should be averages of the two measurements made in the same
manner.
Exception One exception to this rule is that there should be at least J = 2 repetitions per day,
etc. Without this redundancy, there is no way to check on the short-term precision of
the measurement system.
[Diagram: schedule for making check standard measurements with 4 repetitions per day
over K days on the surface of a silicon wafer; a 2-level design for check standard
measurements.]
Operator The measurements should be taken with ONE operator. Operator is not usually a
considerations consideration with automated systems. However, systems that require decisions
regarding line edge or other feature delineations may be operator dependent.
Case Study: Results should be recorded along with pertinent environmental readings and
Resistivity identifications for significant factors. The best way to record this information is in
check standard one file with one line or row (on a spreadsheet) of information in fixed fields for
each check standard measurement.
for the jth repetition on the kth day. The mean for the kth day is

Ybar_k = (1/J) sum_j Y_kj

and the (level-1) standard deviation for gauge precision with v = J - 1 degrees of
freedom is

s1_k = sqrt( (1/(J-1)) sum_j (Y_kj - Ybar_k)^2 )
Pooling The pooled level-1 standard deviation with v = K(J - 1) degrees of freedom is
increases the
reliability of
the estimate of
the standard
deviation
s1 = sqrt( (1/K) sum_k s1_k^2 )
Data analysis The level-2 standard deviation of the check standard represents the process
of process variability. It is computed with v = K - 1 degrees of freedom as:
(level-2)
standard
deviation
s2 = sqrt( (1/(K-1)) sum_k (Ybar_k - Ybar)^2 )

where Ybar = (1/K) sum_k Ybar_k is the grand mean of the daily averages.
Relationship to The standard deviation that defines the uncertainty for a single measurement on a test
uncertainty for item, often referred to as the reproducibility standard deviation (ASTM), is given by
a test item
sR = sqrt( s2^2 + ((J-1)/J) s1^2 )

There may be other sources of uncertainty in the measurement process that must be
accounted for in a formal analysis of uncertainty.
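The computations just described reduce to a few lines of array arithmetic. A sketch in
Python/numpy for a complete K x J array of check standard measurements (rows = days,
columns = repetitions), using the decomposition above:

import numpy as np

def nested_sds(ykj):
    K, J = ykj.shape
    daily_means = ykj.mean(axis=1)
    # Pooled level-1 (repeatability) sd, K(J-1) degrees of freedom.
    s1 = np.sqrt(((ykj - daily_means[:, None])**2).sum() / (K * (J - 1)))
    # Level-2 sd of the daily means, K-1 degrees of freedom.
    s2 = np.std(daily_means, ddof=1)
    # Reproducibility sd for a single measurement.
    sR = np.sqrt(s2**2 + (J - 1) / J * s1**2)
    return s1, s2, sR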
2-level nested The design can be truncated at two levels to estimate repeatability and
design day-to-day variability if there is no reason to estimate longer-term
effects. The analysis remains the same through the first two levels.
Advantages This design has advantages in ease of use and computation. The
number of repetitions at each level need not be large because
information is being gathered on several check standards.
Caution Be sure that the design is truly nested; i.e., that each operator reports
results for the same set of circumstances, particularly with regard to
day of measurement so that each operator measures every day, or
every other day, and so forth.
Randomize on Randomize with respect to gauges for each check standard; i.e.,
gauges choose the first check standard and randomize the gauges; choose the
second check standard and randomize gauges; and so forth.
Record results Record the average and standard deviation from each group of J
in a file repetitions by:
● check standard
● gauge
● Level 2 - reproducibility/day-to-day
● Level 3 - stability/run-to-run
The graph below depicts possible scenarios for a 2-level design
(short-term repetitions and days) to illustrate the concepts.
Level-2: L The level-2 standard deviation, s2_l, is pooled over the L runs. Individual
reproducibility standard deviations with (K - 1) degrees of freedom each are computed
standard from K daily averages as
deviations can
be computed
from the data
s2_l = sqrt( (1/(K-1)) sum_k (Ybar_lk - Ybar_l)^2 )

where Ybar_l is the mean of the K daily averages for run l.
Relationship The standard deviation that defines the uncertainty for a single
to uncertainty measurement on a test item is given by
for a test item
sR = sqrt( s_runs^2 + s_days^2 + s1^2 )

and, in terms of the computed standard deviations,

sR = sqrt( s3^2 + ((K-1)/K) s2^2 + ((J-1)/J) s1^2 )
● Plot of repeatability standard deviations versus check standard with gauge coded
Typically, we expect the standard deviation to be gauge dependent -- in which case there
should be a separate standard deviation for each gauge. If the gauges are all at the same
level of precision, the values can be combined over all gauges.
Repeatability A repeatability standard deviation from J repetitions is not a reliable estimate of the
standard precision of the gauge. Fortunately, these standard deviations can be pooled over days;
deviations runs; and check standards, if appropriate, to produce a more reliable precision measure.
can be The table below shows a mechanism for pooling. The pooled repeatability standard
pooled over deviation, s1, has LK(J - 1) degrees of freedom for measurements taken over:
operators,
runs, and ● J repetitions
check ● K days
standards ● L runs
Basic The table below gives the mechanism for pooling repeatability standard deviations over
pooling rules days and runs. The pooled value is an average of weighted variances and is shown as the
last entry in the right-hand column of the table. The pooling can also cover check
standards, if appropriate.
View of To illustrate the calculations, a subset of data collected in a nested design for one check
entire standard (#140) and one probe (#2362) are shown below. The measurements are
dataset from resistivity (ohm.cm) readings with six repetitions per day. The individual level-1 standard
the nested deviations from the six repetitions and degrees of freedom are recorded in the last two
columns of the database.
design
Probe 2362: the total degrees of freedom for s1 is v = 60 and the total sum of
squares is SS = 0.37176, so the pooled value of s1 is given by

s1 = sqrt(SS/v) = sqrt(0.37176/60) = 0.07871
Run software The Dataplot commands (corresponding to the calculations in the table above)
macro for
pooling
standard dimension 500 30
deviations read mpc411.dat run wafer probe month day op temp avg s1i vi
let ssi=vi*s1i*s1i
let ss=sum ssi
let v = sum vi
let s1 = (ss/v)**0.5
print s1 v
return the following pooled values for the repeatability standard deviation and degrees of
freedom.
S1 -- 0.7871435E-01
V -- 0.6000000E+02
Pooling The level-2 standard deviations with (K - 1) degrees of freedom each are computed from the check
results in standard values for days and pooled over runs as shown in the table below. The pooled level-2
more standard deviation has degrees of freedom L(K - 1) for measurements made over:
reliable ● K days
estimates
● L runs
Mechanism The table below gives the mechanism for pooling level-2 standard deviations over runs. The pooled
for pooling value is an average of weighted variances and is the last entry in the right-hand column of the table.
The pooling can be extended in the same manner to cover check standards, if appropriate.
The total degrees of freedom are v = 10 with total sum of squares SS = 0.007519,
giving the pooled value

s2 = sqrt(0.007519/10) = 0.02742
Run A subset of data (shown on previous page) collected in a nested design on one check standard
software (#140) with probe (#2362) on six days are analyzed for between-day effects. Dataplot commands to
macro for compute the level-2 standard deviations and pool over runs 1 and 2 are:
computing
level-2
standard dimension 500 30
deviations read mpc441.dat run wafer probe mo day op temp
and pooling y s df
over runs let n1 = count y subset run 1
let df1 = n1 - 1
let n2 = count y subset run 2
let df2 = n2 - 1
let v2 = df1 + df2
let s2run1 = standard deviation y subset run 1
let s2run2 = standard deviation y subset run 2
let s2 = df1*(s2run1)**2 + df2*(s2run2)**2
let s2 = (s2/v2)**.5
print s2run1 df1
print s2run2 df2
print s2 v2
Dataplot Dataplot returns the following level-2 standard deviations and degrees of freedom:
output
S2RUN1 -- 0.2728125E-01
DF1 -- 0.5000000E+01
S2RUN2 -- 0.2756367E-01
DF2 -- 0.5000000E+01
S2 -- 0.2742282E-01
v2 -- 0.1000000E+02
Relationship The level-2 standard deviation is related to the standard deviation for between-day precision and
to day effect gauge precision by

s2^2 = s_days^2 + (1/J) s1^2

The size of the day effect can be calculated by subtraction using the formula above once the other
two standard deviations have been estimated reliably.
Example of Level-3 standard deviations for a single gauge pooled over check
pooling standards
Source of Standard Degrees of freedom Sum of squares
variability deviation (DF) (SS)
Level-3
Run A subset of data collected in a nested design on one check standard (#140) with
software probe (#2362) for six days and two runs is analyzed for between-run effects.
macro for Dataplot commands to compute the level-3 standard deviation from the
computing averages of 2 runs are:
level-3
standard
deviation dimension 30 columns
read mpc441.dat run wafer probe mo ...
day op temp y s df
let y1 = average y subset run 1
let y2 = average y subset run 2
let ybar = (y1 + y2)/2
let ss = (y1-ybar)**2 + (y2-ybar)**2
let v3 = 1
let s3 = (ss/v3)**.5
print s3 v3
Dataplot Dataplot returns the level-3 standard deviation and degrees of freedom:
output
S3 -- 0.2885137E-01
V3 -- 0.1000000E+01
Relationship The size of the between-run effect can be calculated by subtraction using the
to long-term standard deviations for days and gauge precision as
changes,
days and
gauge
precision
s_runs = sqrt( s3^2 - (1/K) s2^2 )
● L = 2 runs

Repeatability sums of squares pooled over wafers:

Wafer      SS          DF
#138       0.48115     60
#139       0.69209     60
#140       (entry lost; 0.48483 by subtraction from the SUM)   60
#141       1.21752     60
#142       0.30076     60
SUM        3.17635     300

Pooled repeatability standard deviation: s1 = sqrt(3.17635/300) = 0.10290
● Lack of linearity
● Drift
● Hysteresis
● Differences among gauges
● Differences among geometries
● Differences among operators
● Remedial actions and strategies
2.4.5.1. Resolution
Resolution Resolution (MSA) is the ability of the measurement system to detect
and faithfully indicate small changes in the characteristic of the
measurement result.
Good versus A small resolution value implies good resolution -- the measurement system can
poor discriminate between artifacts that are close together in value.
Warning The number of digits displayed does not indicate the resolution of
the instrument.
Plot of the A test of linearity starts with a plot of the measured values versus
data corresponding values of the reference standards to obtain an indication
of whether or not the points fall on a straight line with slope equal to 1
-- indicating linearity.
Output from The intercept and bias are estimated using a statistical software
software package that should provide the following information:
package ● Estimates of the intercept and slope,
● Standard deviations of the intercept and slope
● Residual standard deviation of the fit
● F-test for goodness of fit
Test for Tests for the slope and bias are described in the section on instrument
linearity calibration. If the slope is different from one, the gauge is non-linear
and requires calibration or repair. If the intercept is different from zero,
the gauge has a bias.
2.4.5.3. Drift
Definition Drift can be defined (VIM) as a slow change in the response of a gauge.
Data For each gauge in the study, the analysis requires measurements on
collection ● Q (Q > 2) check standards
● K (K > 2) days
Data from a The data in the table below come from resistivity (ohm.cm) measurements on Q = 5
gauge study artifacts on K = 6 days. Two runs were made which were separated by about a
month's time. The artifacts are silicon wafers and the gauges are four-point probes
specifically designed for measuring resistivity of silicon wafers. Differences from the
wafer means are shown in the table.
Biases for 5
probes from a Table of biases for probes and silicon wafers (ohm.cm)
gauge study Wafers
with 5
artifacts on 6 Probe 138 139 140 141 142
days ---------------------------------------------------------
1 0.02476 -0.00356 0.04002 0.03938 0.00620
Plot of A graphical analysis can be more effective for detecting differences among gauges
differences than a table of differences. The differences are plotted versus artifact identification
among with each gauge identified by a separate plotting symbol. For ease of interpretation,
probes the symbols for any one gauge can be connected by dotted lines.
Interpretation Because the plots show differences from the average by artifact, the center line is the
zero-line, and the differences are estimates of bias. Gauges that are consistently
above or below the other gauges are biased high or low, respectively, relative to the
average. The best estimate of bias for a particular gauge is its average bias over the Q
artifacts. For this data set, notice that probe #2362 is consistently biased low relative
to the other probes.
Strategies for Given that the gauges are a random sample of like-kind gauges, the best estimate in
dealing with any situation is an average over all gauges. In the usual production or metrology
differences setting, however, it may only be feasible to make the measurements on a particular
among piece with one gauge. Then, there are two methods of dealing with the differences
gauges among gauges.
1. Correct each measurement made with a particular gauge for the bias of that
gauge and report the standard deviation of the correction as a type A
uncertainty.
2. Report each measurement as it occurs and assess a type A uncertainty for the
differences among the gauges.
Example of An example is given of a study of configuration differences for a single gauge. The
differences gauge, a 4-point probe for measuring resistivity of silicon wafers, can be wired in
among wiring several ways. Because it was not possible to test all wiring configurations during the
configurations gauge study, measurements were made in only two configurations as a way of
identifying possible problems.
Data on Measurements were made on six wafers over six days (except for 5 measurements on
wiring wafer 39) with probe #2062 wired in two configurations. This sequence of
configurations measurements was repeated after about a month resulting in two runs. Differences
and a plot of between measurements in the two configurations on the same day are shown in the
differences following table.
between the 2
wiring
configurations Differences between wiring configurations
● Temperature excursions
● Operator technique
Resolution There is no remedy for a gauge with insufficient resolution. The gauge
will need to be replaced with a better gauge.
Lack of Lack of linearity can be dealt with by correcting the output of the
linearity gauge to account for bias that is dependent on the level of the stimulus.
Lack of linearity can be tolerated (left uncorrected) if it does not
increase the uncertainty of the measurement result beyond its
requirement.
Drift It would be very difficult to correct a gauge for drift unless there is
sufficient history to document the direction and size of the drift. Drift
can be tolerated if it does not increase the uncertainty of the
measurement result beyond its requirement.
Potential The potential problem with this approach is that the calculation of
problem with uncertainty depends totally on the gauge study. If the measurement
this process changes its characteristics over time, the standard deviation
approach from the gauge study will not be the correct standard deviation for the
uncertainty analysis. One way to try to avoid such a problem is to carry
out a gauge study both before and after the measurements that are being
characterized for uncertainty. The 'before' and 'after' results should
indicate whether or not the measurement process changed in the
interim.
Biases - Rule The approach for biases is to estimate the maximum bias from a gauge
of thumb study and compute a standard uncertainty from the maximum bias
assuming a suitable distribution. The formulas shown below assume a
uniform distribution for each bias.
Determining If the maximum departure from linearity for the gauge has been
non-linearity determined from a gauge study, and it is reasonable to assume that the
gauge is equally likely to be engaged at any point within the range
tested, the standard uncertainty for linearity is
Determining Drift in direct reading instruments is defined for a specific time interval
drift of interest. The standard uncertainty for drift is
Case study: A case study on type A uncertainty analysis from a gauge study is
Type A recommended as a guide for bringing together the principles and
uncertainties elements discussed in this section. The study in question characterizes
from a the uncertainty of resistivity measurements made on silicon wafers.
gauge study
2.5.1. Issues
Issues for Evaluation of uncertainty is an ongoing process that can consume
uncertainty time and resources. It can also require the services of someone who
analysis is familiar with data analysis techniques, particularly statistical
analysis. Therefore, it is important for laboratory personnel who are
approaching uncertainty analysis for the first time to be aware of the
resources required and to carefully lay out a plan for data collection
and analysis.
Problem areas Some laboratories, such as test laboratories, may not have the
resources to undertake detailed uncertainty analyses even though,
increasingly, quality management standards such as the ISO 9000
series are requiring that all measurement results be accompanied by
statements of uncertainty.
Other situations where uncertainty analyses are problematical are:
● One-of-a-kind measurements
Directions being What can be done in these situations? There is no definitive answer
pursued at this time. Several organizations, such as the National Conference
of Standards Laboratories (NCSL) and the International Standards
Organization (ISO) are investigating methods for dealing with this
problem, and there is a document in draft that will recommend a
simplified approach to uncertainty analysis based on results of
interlaboratory tests.
2.5.2. Approach
Procedures The procedures in this chapter are intended for test laboratories,
in this calibration laboratories, and scientific laboratories that report results of
chapter measurements from ongoing or well-documented processes.
Pertinent The following pages outline methods for estimating the individual
sections uncertainty components, which are consistent with materials presented
in other sections of this Handbook, and rules and equations for
combining them into a final expanded uncertainty. The general
framework is:
1. ISO Approach
2. Outline of steps to uncertainty analysis
3. Methods for type A evaluations
4. Methods for type B evaluations
5. Propagation of error considerations
6. Uncertainty budgets and sensitivity coefficients
7. Standard and expanded uncertainties
8. Treatment of uncorrected bias
Specific Methods for calculating uncertainties for specific results are explained
situations are in the following sections:
outlined in ● Calibrated values of artifacts
other places
in this ● Calibrated values from calibration curves
chapter ❍ From propagation of error
❍ From check standard measurements
❍ Comparison of check standards and propagation of error
● Gauge R & R studies
● Type A components for resistivity measurements
● Type B components for resistivity measurements
Accounts for The estimation of a possible discrepancy takes into account both
both random random error and bias in the measurement process. The distinction to
error and keep in mind with regard to random error and bias is that random
bias errors cannot be corrected, and biases can, theoretically at least, be
corrected or eliminated from the measurement result.
Handbook This Handbook follows the ISO approach (GUM) to stating and
follows the combining components of uncertainty. To this basic structure, it adds a
ISO statistical framework for estimating individual components,
approach particularly those that are classified as type A uncertainties.
ISO Components are grouped into two major categories, depending on the
approach to source of the data and not on the type of error, and each component is
classifying quantified by a standard deviation. The categories are:
sources of ● Type A - components evaluated by statistical methods
error
● Type B - components evaluated by other means (or in other
laboratories)
Type B Type B evaluations apply to random errors and biases for which there
evaluations is little or no data from the local process, and to random errors and
biases from other measurement processes.
2.5.2.1. Steps
Steps in The first step in the uncertainty evaluation is the definition of the result
uncertainty to be reported for the test item for which an uncertainty is required. The
analysis - computation of the standard deviation depends on the number of
define the repetitions on the test item and the range of environmental and
result to be operational conditions over which the repetitions were made, in addition
reported to other sources of error, such as calibration uncertainties for reference
standards, which influence the final result. If the value for the test item
cannot be measured directly, but must be calculated from measurements
on secondary quantities, the equation for combining the various
quantities must be defined. The steps to be followed in an uncertainty
analysis are outlined for two situations:
❍ day-to-day variation
❍ long-term variation
❍ operator differences.
Caveat for The ISO guidelines are based on the assumption that all biases are
biases corrected and that the only uncertainty from this source is the
uncertainty of the correction. The section on type A evaluations of bias
gives guidance on how to assess, correct and calculate uncertainties
related to bias.
Random How the source of error affects the reported value and the context for
error and the uncertainty determines whether an analysis of random error or bias
bias require is appropriate.
different
types of Consider a laboratory with several instruments that can reasonably be
analyses assumed to be representative of all similar instruments. Then the
differences among these instruments can be considered to be a random
effect if the uncertainty statement is intended to apply to the result of
any instrument, selected at random, from this batch.
If, on the other hand, the uncertainty statement is intended to apply to
one specific instrument, then the bias of this instrument relative to the
group is the component of interest.
Time-dependent One of the most important indicators of random error is time, with
changes are a the root cause perhaps being environmental changes over time.
primary source Three levels of time-dependent effects are discussed in this section.
of random
errors
Evaluation How these differences are treated depends primarily on the context
depends on the for the uncertainty statement. The differences, depending on the
context for the context, will be treated either as random differences, or as bias
uncertainty differences.
Two levels may Two levels of time-dependent errors are probably sufficient for
be sufficient describing the majority of measurement processes. Three levels
may be needed for new measurement processes or processes whose
characteristics are not well understood.
Approaches The computation of the uncertainty of the reported value for a test
given in this item is outlined for situations where temporal sources of
chapter uncertainty are estimated from:
1. measurements on the test item itself
2. measurements on a check standard
3. measurements from a 2-level nested design (gauge study)
4. measurements from a 3-level nested design (gauge study)
● operator
● temperature
● humidity
First, decide The approach depends primarily on the context for the uncertainty statement. For
on context for example, if instrument effect is the question, one approach is to regard, say, the
uncertainty instruments in the laboratory as a random sample of instruments of the same type
and to compute an uncertainty that applies to all results regardless of the particular
instrument on which the measurements are made. The other approach is to compute
an uncertainty that applies to results using a specific instrument.
Plan for To evaluate the measurement process for instruments, select a random sample of I (I
collecting > 4) instruments from those available. Make measurements on Q (Q > 2) artifacts
data with each instrument.
Graph For a graphical analysis, differences from the average for each artifact can be plotted
showing versus artifact, with instruments individually identified by a special plotting symbol.
differences The plot is examined to determine if some instruments always read high or low
among relative to the other instruments and if this behavior is consistent across artifacts. If
there are systematic and significant differences among instruments, a type A
instruments
uncertainty for instruments is computed. Notice that in the graph for resistivity
probes, there are differences among the probes with probes #4 and #5, for example,
consistently reading low relative to the other probes. A standard deviation that
describes the differences among the probes is included as a component of the
uncertainty.
For each of Q artifacts and I instruments, the pooled standard deviation that describes
the differences among instruments is

s_inst = sqrt( (1/(Q(I-1))) sum_q sum_i (Y_qi - Ybar_q)^2 )

where Ybar_q is the average measurement on artifact q over the I instruments.
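In code, this pooling is a one-liner over the per-artifact variances; a sketch in
Python/numpy for a Q x I array (rows = artifacts, columns = instruments):

import numpy as np

def instrument_sd(y_qi):
    # Variance across instruments for each artifact, I-1 df each,
    # pooled over the Q artifacts: Q(I-1) total degrees of freedom.
    return np.sqrt(np.var(y_qi, axis=1, ddof=1).mean())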
Example Silicon wafers are doped with boron to produce desired levels of
resistivity (ohm.cm). Manufacturing processes for semiconductors
are not yet capable (at least at the time this was originally written) of
producing 2" diameter wafers with constant resistivity over the
surfaces. However, because measurements made at the center of a
wafer by a certification laboratory can be reproduced in the
industrial setting, the inhomogeneity is not a factor in the uncertainty
analysis -- as long as only the center-point of the wafer is used for
future measurements.
Best strategy The best strategy is to draw a sample of bottles from the lot for the
purpose of identifying and quantifying between-bottle variability.
These measurements can be made with a method that lacks the
accuracy required to certify isotopic ratios, but is precise enough to
allow between-bottle comparisons. A second sample is drawn from
the lot and measured with an accurate method for determining
isotopic ratios, and the reported value for the lot is taken to be the
average of these determinations. There are therefore two components
of uncertainty assessed:
1. component that quantifies the imprecision of the average
2. component that quantifies how much an individual bottle can
deviate from the average.
Best strategy In this situation, the best strategy is to compute the reported value as
the average of measurements made over the surface of the piece and
assess an uncertainty for departures from the average. The
component of uncertainty can be assessed by one of several methods
for evaluating bias -- depending on the type of inhomogeneity.
Balanced The simplest scheme for identifying and quantifying the effect of inhomogeneity
measurements of a measurement result is a balanced (equal number of measurements per cell)
at 2-levels 2-level nested design. For example, K bottles of a chemical compound are drawn
at random from a lot and J (J > 1) measurements are made per bottle. The
measurements are denoted by

Y_kj ,  k = 1, ..., K;  j = 1, ..., J

where the k index runs over bottles and the j index runs over repetitions within a
bottle. The between-bottle variance is estimated by

s_bottle^2 = s2^2 - (1/J) s1^2

where s2 is the standard deviation of the K bottle means and s1 is the pooled
within-bottle (repeatability) standard deviation.
Between If this variance is negative, there is no contribution to uncertainty, and the bottles
bottle are equivalent with regard to their chemical compositions. Even if the variance is
variance may positive, inhomogeneity still may not be statistically significant, in which case it is
be negative not required to be included as a component of the uncertainty.
If the between-bottle variance is statistically significant (i.e., judged to be
greater than zero), then inhomogeneity contributes to the uncertainty of the
reported value.
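A sketch of the between-bottle computation in Python/numpy for a balanced K x J design
(rows = bottles, columns = repetitions), truncating a negative estimate to zero as
discussed above:

import numpy as np

def between_bottle_sd(ykj):
    K, J = ykj.shape
    means = ykj.mean(axis=1)
    s1_sq = ((ykj - means[:, None])**2).sum() / (K * (J - 1))  # within-bottle
    s2_sq = np.var(means, ddof=1)                              # between bottle means
    return np.sqrt(max(s2_sq - s1_sq / J, 0.0))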
If sreported value
leads to an interval that covers the difference between the reported value
and the average for a bottle selected at random from the batch.
2. The standard deviation
Relationship When the standard deviation for inhomogeneity is included in the calculation, as
to prediction in the last two cases above, the uncertainty interval becomes a prediction interval
intervals (Hahn & Meeker) and is interpreted as characterizing a future measurement on a
bottle drawn at random from the lot.
The best This problem was treated on the foregoing page as an analysis of
strategy is to random error for the case where the uncertainty was intended to apply
correct for to all measurements for all configurations. If measurements for only
bias and one configuration are of interest, such as measurements made with a
compute the specific instrument, or if a smaller uncertainty is required, the
uncertainty differences among, say, instruments are treated as biases. The best
of the strategy in this situation is to correct all measurements made with a
correction specific instrument to the average for the instruments in the laboratory
and compute a type A uncertainty for the correction. This strategy, of
course, relies on the assumption that the instruments in the laboratory
represent a random sample of all instruments of a specific type.
Only limited However, suppose that it is possible to make comparisons among, say,
comparisons only two instruments and neither is known to be 'unbiased'. This
can be made scenario requires a different strategy because the average will not
among necessarily be an unbiased result. The best strategy if there is a
sources of significant difference between the instruments, and this should be
possible bias tested, is to apply a 'zero' correction and assess a type A uncertainty of
the correction.
Guidelines The discussion above is intended to point out that there are many
for treatment possible scenarios for biases and that they should be treated on a
of biases case-by-case basis. A plan is needed for:
● gathering data
● estimating biases caused by:
● instruments
● operators
● inhomogeneities
Plan for Measurements needed for assessing biases among instruments, say,
testing for require a random sample of I (I > 1) instruments from those available
assessing and measurements on Q (Q > 2) artifacts with each instrument. The
bias same can be said for the other sources of possible bias. General
strategies for dealing with significant biases are given in the table
below.
Data collection and analysis for assessing biases related to:
● lack of resolution of instrument
● non-linearity of instrument
● drift
Strategy for If there is no significant bias over time, there is no correction and no
no contribution to uncertainty.
significant
bias
Computations The equation for estimating the standard deviation of the correction
based on assumes that biases are uniformly distributed between {-max |bias|, +
uniform or max |bias|}. This assumption is quite conservative. It gives a larger
normal uncertainty than the assumption that the biases are normally distributed.
distribution If normality is a more reasonable assumption, substitute the number '3'
for the 'square root of 3' in the equation above.
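In code, the two assumptions differ only in the divisor; a sketch in Python with a
hypothetical maximum bias:

import math

max_bias = 0.05                       # hypothetical maximum bias from a gauge study
u_uniform = max_bias / math.sqrt(3)   # biases uniform on [-max, +max] (conservative)
u_normal = max_bias / 3               # if normality is a more reasonable assumption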
Example of The results of resistivity measurements with five probes on five silicon
change in wafers are shown below for probe #283, which is the probe of interest
bias over at this level with the artifacts being 1 ohm.cm wafers. The bias for
time probe #283 is negative for run 1 and positive for run 2 with the runs
separated by a two-month time period. The correction is taken to be
zero.
Graph An analysis of bias for five instruments based on measurements on five artifacts
showing shows differences from the average for each artifact plotted versus artifact with
consistent instruments individually identified by a special plotting symbol. The plot is
bias for examined to determine if some instruments always read high or low relative to the
probe #5 other instruments, and if this behavior is consistent across artifacts. Notice that on
the graph for resistivity probes, probe #2362 (#5 on the graph), which is the
instrument of interest for this measurement process, consistently reads low
relative to the other probes. This behavior is consistent over 2 runs that are
separated by a two-month time period.
Strategy - Because there is significant and consistent bias for the instrument of interest, the
correct for measurements made with that instrument should be corrected for its average bias
bias relative to the other instruments.
For measurements on Q artifacts with I instruments, the average bias for instrument
I', say, is

bias_I' = (1/Q) sum_q d_I'q

where d_I'q is the difference from the artifact average for instrument I' on artifact q.

Computation The correction that should be made to measurements made with instrument I' is
of correction
correction = - bias_I'

Type A The type A uncertainty of the correction is the standard deviation of the average
uncertainty bias, or
of the
correction
u = s_d / sqrt(Q)

where s_d is the standard deviation of the Q differences for instrument I'.
Example of The table below comes from the table of resistivity measurements from a type A
consistent analysis of random effects with the average for each wafer subtracted from each
bias for measurement. The differences, as shown, represent the biases for each probe with
probe #2362 respect to the other probes. Probe #2362 has an average bias, over the five wafers,
used to of -0.02724 ohm.cm. If measurements made with this probe are corrected for this
measure bias, the standard deviation of the correction is a type A uncertainty.
resistivity of
silicon
wafers Table of biases for probes and silicon wafers (ohm.cm)
Wafers
Probe 138 139 140 141 142
-------------------------------------------------------
1 0.02476 -0.00356 0.04002 0.03938 0.00620
181 0.01076 0.03944 0.01871 -0.01072 0.03761
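A sketch of the computation in Python/numpy, with hypothetical biases standing in for
the full probe #2362 row of the table (the true average is the -0.02724 ohm.cm quoted
above):

import numpy as np

# Hypothetical biases for one probe over Q = 5 wafers.
biases = np.array([-0.026, -0.030, -0.028, -0.025, -0.027])
avg_bias = biases.mean()                       # average bias over the wafers
correction = -avg_bias                         # correction applied to measurements
u = biases.std(ddof=1) / np.sqrt(biases.size)  # type A uncertainty of the correction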
Note on The analysis on this page considers the case where only one instrument is used to
different make the certification measurements; namely probe #2362, and the certified
approaches values are corrected for bias due to this probe. The analysis in the section on type
to A analysis of random effects considers the case where any one of the probes could
instrument be used to make the certification measurements.
bias
Example of An example is given of a study of wiring settings for a single gauge. The
differences gauge, a 4-point probe for measuring resistivity of silicon wafers, can be
among wiring wired in several ways. Because it was not possible to test all wiring
settings configurations during the gauge study, measurements were made in only
two configurations as a way of identifying possible problems.
Data on Measurements were made on six wafers over six days (except for 5
wiring measurements on wafer 39) with probe #2062 wired in two
configurations configurations. This sequence of measurements was repeated after about
a month resulting in two runs. A database of differences between
measurements in the two configurations on the same day is analyzed
for significance.
Run software A plot of the differences between the 2 configurations shows that the
macro for differences for run 1 are, for the most part, < zero, and the differences
making for run 2 are > zero. The following Dataplot commands produce the plot:
plotting
differences
between the 2 dimension 500 30
wiring read mpc536.dat wafer day probe d1 d2
configurations let n = count probe
let t = sequence 1 1 n
let zero = 0 for i = 1 1 n
lines dotted blank blank
characters blank 1 2
x1label = DIFFERENCES BETWEEN 2 WIRING
CONFIGURATIONS
x2label SEQUENCE BY WAFER AND DAY
plot zero d1 d2 vs t
Statistical test A t-statistic is used as an approximate test where we are assuming the differences are
for difference approximately normal. The average difference and standard deviation of the
between 2 difference are required for this test. If the absolute value of t = avg/(sd/sqrt(n))
configurations exceeds the critical value from the t-table, the difference is judged significant.
The computed values for the two runs are:
AVGRUN1 -- -0.3834483E-02
SDRUN1 -- 0.5145197E-02
T1 -- -0.4013319E+01
AVGRUN2 -- 0.4886207E-02
SDRUN2 -- 0.4004259E-02
T2 -- 0.6571260E+01
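These t-values can be reproduced from the reported averages and standard deviations; a
sketch in Python, taking n = 29 differences per run (inferred from the output above):

import math

def tstat(avg, sd, n):
    return avg / (sd / math.sqrt(n))

print(tstat(-0.3834483e-2, 0.5145197e-2, 29))   # approximately -4.013
print(tstat(0.4886207e-2, 0.4004259e-2, 29))    # approximately 6.571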
Case of The data reveal a significant wiring bias for both runs that changes direction between
inconsistent runs. Because of this inconsistency, a 'zero' correction is applied to the results, and
bias the type A uncertainty is taken to be
Case of Even if the bias is consistent over time, a 'zero' correction is applied to the results,
consistent and for a single run, the estimated standard deviation of the correction is
bias
For two runs (1 and 2), the estimated standard deviation of the correction is
Sources of Sources of uncertainty that are local to the measurement process but
uncertainty which cannot be adequately sampled to allow a statistical analysis
that are require type B evaluations. One technique, which is widely used, is to
local to the estimate the worst-case effect, a, for the source of interest, from
measurement
process
● experience
● scientific judgment
● scant data
● Triangular
● Normal (Gaussian)
Degrees of In the context of using the Welch-Satterthwaite formula with the above
freedom distributions, the degrees of freedom is assumed to be infinite.
Exact formula Goodman (1960) derived an exact formula for the variance of the product of two
random variables. Given two random variables, x and y (corresponding to width and
length in the above approximate formula), the exact formula for the variance is:

V(xy) = X^2 V(y) + Y^2 V(x) + 2XY E11 + 2X E12 + 2Y E21 + E22 - E11^2

with
● X = E(x) and Y = E(y) (corresponding to width and length, respectively, in the
approximate formula)
● V(x) = variance of x and V(y) = variance of y (corresponding to s^2 for width and
length, respectively, in the approximate formula)
● Eij = E[ (dx)^i (dy)^j ] where dx = x - X and dy = y - Y

To obtain the standard deviation, simply take the square root of the above formula.
Also, an estimate of the statistic is obtained by substituting sample estimates for the
corresponding population values on the right hand side of the equation.
Approximate The approximate formula assumes that length and width are independent. The exact
formula formula assumes that length and width are not independent.
assumes
independence
Disadvantages In the ideal case, the propagation of error estimate above will not differ from the
of estimate made directly from the area measurements. However, in complicated scenarios,
propagation they may differ because of:
of error ● unsuspected covariances
approach
● disturbances that affect the reported value and not the elementary measurements
(usually a result of mis-specification of the model)
● mistakes in propagating the error through the defining formulas
Propagation Sometimes the measurement of interest cannot be replicated directly and it is necessary
of error to estimate its uncertainty via propagation of error formulas (Ku). The propagation of
formula error formula for
Y = f(X, Z, ... )
a function of one or more variables with measurements, X, Z, ..., gives the following
estimate for the standard deviation of Y:

s_Y = sqrt( (∂Y/∂X)^2 s_X^2 + (∂Y/∂Z)^2 s_Z^2 + ... + 2 (∂Y/∂X)(∂Y/∂Z) s_XZ + ... )

where s_X and s_Z are the standard deviations of X and Z, respectively; s_XZ is the
covariance between X and Z; and the partial derivatives are evaluated at the mean
values.
Treatment of Covariance terms can be difficult to estimate if measurements are not made in pairs.
covariance Sometimes, these terms are omitted from the formula. Guidance on when this is
terms acceptable practice is given below:
1. If the measurements of X, Z are independent, the associated covariance term is
zero.
2. Generally, reported values of test items from calibration designs have non-zero
covariances that must be taken into account if Y is a summation such as the mass
of two weights, or the length of two gage blocks end-to-end, etc.
3. Practically speaking, covariance terms should be included in the computation
only if they have been estimated from sufficient data. See Ku (1966) for guidance
on what constitutes sufficient data.
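As a small illustration of how a covariance term enters the computation, the sketch below (hypothetical numbers, not from the Handbook) propagates error through an area Y = X*Z with and without the covariance term:

import numpy as np

# Hypothetical summary statistics for width X and length Z
X, Z = 2.0, 1.5            # means
s_X, s_Z = 0.010, 0.008    # standard deviations
s_XZ = 0.00004             # covariance (estimated from paired measurements)

# Sensitivity coefficients: partial derivatives of Y = X*Z
dY_dX, dY_dZ = Z, X

var_indep = (dY_dX * s_X)**2 + (dY_dZ * s_Z)**2
var_full = var_indep + 2 * dY_dX * dY_dZ * s_XZ

print(np.sqrt(var_indep), np.sqrt(var_full))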
Sensitivity coefficients: The partial derivatives are the sensitivity coefficients for the associated components.
Examples of propagation of error analyses: Examples of propagation of error that are shown in this chapter are:
● Case study of propagation of error for resistivity measurements
● Comparison of check standard analysis and propagation of error for linear calibration
● Propagation of error for quadratic calibration showing effect of covariance terms
Specific formulas: Formulas for specific functions can be found in the following sections:
● functions of a single variable
● functions of two variables
● functions of several variables
Notes on the tabulated formulas: the approximations could be seriously in error if n is small; they are not directly derived from the defining formulas, and we need to assume that the original data follow an approximately normal distribution. In the tables, $s_X$ denotes the standard deviation of X and $s_Z$ the standard deviation of Z, and the formulas apply if all covariances are negligible. These formulas are easily extended to more than three variables.
Software can simplify propagation of error: Propagation of error for more complicated functions can be done reliably with software capable of algebraic representations such as Mathematica (Wolfram).
Example from fluid flow of a non-linear function: For example, discharge coefficients for fluid flow are computed from the following equation (Whetstone et al.), where d is the orifice diameter and p is the pressure (see below). Entered in Mathematica, the discharge coefficient is
Out[1]=
(Sqrt[1 - d^4/D^4] m) / (d^2 F K Sqrt[delp] Sqrt[p])
Partial derivatives - first partial derivative with respect to orifice diameter: Partial derivatives are derived via the function D where, for example,
D[Cd, {d,1}]
indicates the first partial derivative of the discharge coefficient with respect to orifice diameter, and the result returned by Mathematica is
Out[2]=
-(2 Sqrt[1 - d^4/D^4] m) / (d^3 F K Sqrt[delp] Sqrt[p]) - (2 d m) / (Sqrt[1 - d^4/D^4] D^4 F K Sqrt[delp] Sqrt[p])
First partial derivative with respect to pressure: Similarly, the first partial derivative of the discharge coefficient with respect to pressure is represented by
D[Cd, {p,1}]
with the result
Out[3]=
-(Sqrt[1 - d^4/D^4] m) / (2 d^2 F K Sqrt[delp] p^(3/2))
Combining the partial derivatives with standard deviations: The software can also be used to combine the partial derivatives with the appropriate standard deviations, and then the standard deviation for the discharge coefficient can be evaluated and plotted for specific values of the secondary variables.
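For readers working in Python rather than Mathematica, the same derivation can be sketched with SymPy; the symbols mirror the session above, and the snippet is an illustration, not part of the original text:

import sympy as sp

d, D, m, F, K, delp, p = sp.symbols('d D m F K delp p', positive=True)

# Discharge coefficient from the fluid-flow example above
Cd = sp.sqrt(1 - d**4/D**4) * m / (d**2 * F * K * sp.sqrt(delp) * sp.sqrt(p))

# First partial derivatives with respect to orifice diameter and pressure
dCd_dd = sp.diff(Cd, d)   # corresponds to Out[2]
dCd_dp = sp.diff(Cd, p)   # corresponds to Out[3]

# Propagation of error: combine partials with standard deviations s_d, s_p,
# assuming the remaining variables are measured without appreciable error
s_d, s_p = sp.symbols('s_d s_p', positive=True)
s_Cd = sp.sqrt((dCd_dd * s_d)**2 + (dCd_dp * s_p)**2)
print(sp.simplify(s_Cd))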
Sensitivity coefficients for type A components of uncertainty: This section defines sensitivity coefficients that are appropriate for type A components estimated from repeated measurements. The pages on type A evaluations, particularly the pages related to estimation of repeatability and reproducibility components, should be reviewed before continuing on this page. The convention for the notation for sensitivity coefficients for this section is that:
1. a1 refers to the sensitivity coefficient for the repeatability standard deviation,
2. a2 refers to the sensitivity coefficient for the reproducibility standard deviation,
3. a3 refers to the sensitivity coefficient for the stability standard deviation,
with some of the coefficients possibly equal to zero.
Sensitivity coefficients for type A components for bias: This Handbook follows the ISO guidelines in that biases are corrected (the correction may be zero), and the uncertainty component is the standard deviation of the correction. Procedures for dealing with biases show how to estimate the standard deviation of the correction so that the sensitivity coefficients are equal to one.
If the reported value is the average of replicate measurements, the standard deviation of the reported value is
To improve the reliability of the uncertainty calculation: If possible, the measurements on the test item should be repeated over M days and averaged to estimate the reported value. The standard deviation for the reported value is computed from the daily averages, and the standard deviation for the temporal component is:
Standard deviation from check standard measurements: The computation of the standard deviation from the check standard values and its relationship to components of instrument precision and day-to-day variability of the process are explained in the section on two-level nested designs using check standards. See the relationships in the section on 2-level nested design for definitions of the standard deviations and their respective degrees of freedom.
Problem with estimating degrees of freedom: If degrees of freedom are required for the uncertainty of the reported value, the formula above cannot be used directly and must be rewritten in terms of the standard deviations s1 and s2.

Number          Number       Short-term                Day-to-day
short-term (N)  day-to-day (M)  sensitivity coefficient   sensitivity coefficient
1               1            1                         1
N               1            1/sqrt(N)                 1
N               M            1/sqrt(NM)                1/sqrt(M)
Problem with estimating degrees of freedom: If degrees of freedom are required for the uncertainty, the formula above cannot be used directly and must be rewritten in terms of the standard deviations s1, s2, and s3.

Number          Number          Number          Short-term     Day-to-day     Run-to-run
short-term (N)  day-to-day (M)  run-to-run (P)  sensitivity    sensitivity    sensitivity
1               1               1               1              1              1
N               1               1               1/sqrt(N)      1              1
N               M               1               1/sqrt(NM)     1/sqrt(M)      1
N               M               P               1/sqrt(NMP)    1/sqrt(MP)     1/sqrt(P)
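The table can be turned into a small helper (a sketch under the assumption that s1, s2, and s3 denote the repeatability, day-effect, and run-effect standard deviations; the function name and the numbers in the example call are hypothetical):

import math

def reported_value_sd(s1, s2, s3, N, M, P):
    # Sensitivity coefficients from the 3-level table above
    a1 = 1.0 / math.sqrt(N * M * P)   # short-term
    a2 = 1.0 / math.sqrt(M * P)       # day-to-day
    a3 = 1.0 / math.sqrt(P)           # run-to-run
    return math.sqrt((a1 * s1)**2 + (a2 * s2)**2 + (a3 * s3)**2)

# Hypothetical standard deviations; N = 6 repetitions, M = 1 day, P = 1 run
print(reported_value_sd(0.05, 0.03, 0.02, N=6, M=1, P=1))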
Example of instrument bias: This example also illustrates the case where the measuring instrument is biased relative to the other instruments in the laboratory, with a bias correction applied accordingly. The sensitivity coefficient, given that the bias correction is based on measurements on Q artifacts, is defined as a4 = 1, and the standard deviation, s4, is the standard deviation of the correction.
Case study: Uncertainty and degrees of freedom: A case study of type A uncertainty analysis shows the computations of temporal components of uncertainty; instrument bias; geometrical bias; standard uncertainty; degrees of freedom; and expanded uncertainty.
The question is how to adjust the uncertainty: A method needs to be developed which assures that the resulting uncertainty has the following properties (Phillips and Eberhardt):
1. The final uncertainty must be greater than or equal to the uncertainty that would be quoted if the bias were corrected.
2. The final uncertainty must reduce to the same uncertainty given that the bias correction is applied.
3. The level of coverage that is achieved by the final uncertainty statement should be at least the level obtained for the case of corrected bias.
4. The method should be transferable so that both the uncertainty and the bias can be used as components of uncertainty in another uncertainty statement.
5. The method should be easy to implement.
Conditions on the relationship between the bias and U: The definition above can lead to a negative uncertainty limit; e.g., if the bias is positive and greater than U, the upper endpoint becomes negative. The requirement that the uncertainty limits be greater than or equal to zero for all values of the bias guarantees non-negative uncertainty limits and is accepted at the cost of somewhat wider uncertainty intervals. This leads to the following set of restrictions on the uncertainty limits:
Situation where bias is not known exactly but must be estimated: If the bias is not known exactly, its magnitude is estimated from repeated measurements, from sparse data or from theoretical considerations, and the standard deviation is estimated from repeated measurements or from an assumed distribution. The standard deviation of the bias becomes a component in the uncertainty analysis with the standard uncertainty restructured to be:
Interpretation: The uncertainty intervals described above have the desirable properties outlined on a previous page. For more information on theory and industrial examples, the reader should consult the paper by the authors of this technique (Phillips and Eberhardt).
Gauges: The gauges for the study were five probes used to measure resistivity of silicon wafers. The five gauges are assumed to represent a random sample of typical 4-point gauges for making resistivity measurements. There is a question of whether or not the gauges are essentially equivalent or whether biases among them are possible.
Check standards: The check standards for the study were five wafers selected at random from the batch of 100 ohm.cm wafers.
Operators: The effect of operator was not considered to be significant for this study.
Measurements on the check standards are used to estimate repeatability, day effect, and run effect: The effect of operator was not considered to be significant for this study; therefore, 'day' replaces 'operator' as a factor in the nested design. Averages and standard deviations from J = 6 measurements at the center of each wafer are shown in the table.
● J = 6 measurements at the center of the wafer per day
● K = 6 days
● L = 2 runs (complete)
The results of pooling are shown below. Intermediate steps are not shown, but the section on repeatability standard deviations shows an example of pooling over wafers.
Level-2 standard deviations pooled over wafers and runs: As shown in the table below, the level-2 standard deviations are consistent over wafers and runs.
(Table of level-2 standard deviations: Wafer, Probe, Average, Stddev, DF for Run 1 and Run 2; individual entries not reproduced. Pooled over 2 runs: 0.0362 with 50 degrees of freedom.)
Level-3 (stability) standard deviations computed from run averages and pooled over wafers: Level-3 standard deviations are computed from the averages of the two runs. Then the level-3 standard deviations are pooled over the five wafers to obtain a standard deviation with 5 degrees of freedom as shown in the table below.
(Table of level-3 standard deviations: Wafer, Probe, Run 1 Average, Run 2 Average, Diff, Stddev, DF; individual entries not reproduced. Probe #2362 pooled over wafers: 0.0197 with 5 degrees of freedom.)
Graphs of probe biases: A graphical analysis shows the relative biases among the 5 probes. For each wafer, differences from the wafer average by probe are plotted versus wafer number. The graphs verify that probe #2362 (coded as 5) is biased low relative to the other probes. The bias shows up more strongly after the probes have been in use (run 2).
How to deal with bias due to the probe: Probe #2362 was chosen for the certification process because of its superior precision, but its bias relative to the other probes creates a problem. There are two possibilities for handling this problem:
1. Correct all measurements made with probe #2362 to the average of the probes.
2. Include the standard deviation for the difference among probes in the uncertainty budget.
The better choice is (1) if we can assume that the probes in the study represent a random sample of probes of this type. This is particularly true when the unit (resistivity) is defined by a test method.
Run 1 - Graph of repeatability standard deviations for probe #2362 (6 days and 5 wafers) showing that repeatability is constant across wafers and days.
Run 2 - Graph of repeatability standard deviations for probe #2362 (6 days and 5 wafers) showing that repeatability is constant across wafers and days.
Run 1 - Graph showing repeatability standard deviations for five probes as a function of wafers and probes.
Run 2 - Graph showing repeatability standard deviations for 5 probes as a function of wafers and probes.
Wafers 138, 139, 140, 141, 142.
Run 1 - Graph of differences from wafer averages for each of 5 probes showing that probes #2062 and #2362 are biased low relative to the other probes.
Run 2 - Graph of differences from wafer averages for each of 5 probes showing that probe #2362 continues to be biased low relative to the other probes.
Plot of wafer and day effect on repeatability standard deviations for run 2:

reset data
reset plot control
reset i/o
dimension 500 30
label size 3
read mpc61.dat run wafer probe mo day op hum y sw
y1label ohm.cm
title GAUGE STUDY
lines blank all
let z = pattern 1 2 3 4 5 6 for I = 1 1 300
let z2 = wafer + z/10 -0.25
characters a b c d e f
X1LABEL WAFERS
X2LABEL REPEATABILITY STANDARD DEVIATIONS BY WAFER AND DAY
X3LABEL CODE FOR DAYS: A, B, C, D, E, F
TITLE RUN 2
plot sw z2 day subset run 2
Plot of repeatability standard deviations for 5 probes - run 1:

reset data
reset plot control
reset i/o
dimension 500 30
label size 3
read mpc61.dat run wafer probe mo day op hum y sw
y1label ohm.cm
title GAUGE STUDY
lines blank all
let z = pattern 1 2 3 4 5 6 for I = 1 1 300
let z2 = wafer + z/10 -0.25
characters 1 2 3 4 5
X1LABEL WAFERS
X2LABEL REPEATABILITY STANDARD DEVIATIONS BY WAFER AND PROBE
X3LABEL CODE FOR PROBES: 1= SRM1; 2= 281; 3=283; 4=2062; 5=2362
TITLE RUN 1
plot sw z2 probe subset run 1
Plot of repeatability standard deviations for 5 probes - run 2:

reset data
reset plot control
reset i/o
dimension 500 30
label size 3
read mpc61.dat run wafer probe mo day op hum y sw
y1label ohm.cm
title GAUGE STUDY
lines blank all
let z = pattern 1 2 3 4 5 6 for I = 1 1 300
let z2 = wafer + z/10 -0.25
characters 1 2 3 4 5
X1LABEL WAFERS
X2LABEL REPEATABILITY STANDARD DEVIATIONS BY WAFER AND PROBE
X3LABEL CODE FOR PROBES: 1= SRM1; 2= 281; 3=283; 4=2062; 5=2362
TITLE RUN 2
plot sw z2 probe subset run 2
Plot of differences from the wafer mean for 5 probes - run 1:

reset data
reset plot control
reset i/o
dimension 500 30
read mpc61a.dat wafer probe d1 d2
let biasrun1 = mean d1 subset probe 2362
print biasrun1
title GAUGE STUDY FOR 5 PROBES
Y1LABEL OHM.CM
lines dotted dotted dotted dotted dotted solid
characters 1 2 3 4 5 blank
xlimits 137 143
let zero = pattern 0 for I = 1 1 30
x1label DIFFERENCES AMONG PROBES VS WAFER (RUN 1)
plot d1 wafer probe and
plot zero wafer
Plot of differences from the wafer mean for 5 probes - run 2:

reset data
reset plot control
reset i/o
dimension 500 30
read mpc61a.dat wafer probe d1 d2
let biasrun2 = mean d2 subset probe 2362
print biasrun2
title GAUGE STUDY FOR 5 PROBES
Y1LABEL OHM.CM
lines dotted dotted dotted dotted dotted solid
characters 1 2 3 4 5 blank
xlimits 137 143
let zero = pattern 0 for I = 1 1 30
x1label DIFFERENCES AMONG PROBES VS WAFER (RUN 2)
plot d2 wafer probe and
plot zero wafer
Plot of averages by day showing reproducibility and stability for measurements made with probe #2362 on 5 wafers:

reset data
reset plot control
reset i/o
dimension 300 50
label size 3
read mpc61b.dat wafer probe mo1 day1 y1 mo2 day2 y2 diff
let t = mo1+(day1-1)/31.
let t2= mo2+(day2-1)/31.
x3label WAFER 138
multiplot 3 2
plot y1 t subset wafer 138 and
Design of data collection and database: The measurements were carried out according to an ASTM Test Method (F84) with NIST probe #2362. The measurements on the check standard duplicate certification measurements that were being made, during the same time period, on individual wafers from crystal #51939. For the check standard there were:
● J = 6 repetitions at the center of the wafer on each day
● K = 25 days
The K = 25 days cover the time during which the individual wafers were being certified at the National Institute of Standards and Technology.
Control chart for probe #2362: The control chart for monitoring the precision of probe #2362 is constructed as discussed in the section on control charts for standard deviations. The upper control limit (UCL) for testing for degradation of the probe is computed using the critical value from the F table with numerator degrees of freedom J - 1 = 5 and denominator degrees of freedom K(J - 1) = 125. For a 0.05 significance level,
F0.05(5,125) = 2.29
Interpretation of control chart for probe #2362: The control chart shows two points exceeding the upper control limit. We expect 5% of the standard deviations to exceed the UCL for a measurement process that is in-control. Two outliers are not indicative of significant problems with the repeatability for the probe, but the probe should be monitored closely in the future.
Control chart for bias and variability: The control limits for monitoring the bias and long-term variability of resistivity with a Shewhart control chart are given by
UCL = Average + 2*s2 = 97.1234 ohm.cm
Centerline = Average = 97.0698 ohm.cm
LCL = Average - 2*s2 = 97.0162 ohm.cm
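A quick arithmetic check of these limits (a sketch; s2 is inferred from the quoted limits as (97.1234 - 97.0698)/2 = 0.0268):

avg, s2 = 97.0698, 0.0268
print(avg + 2 * s2, avg - 2 * s2)   # 97.1234 and 97.0162 ohm.cm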
Interpretation of control chart for bias: The control chart shows that the points scatter randomly about the center line with no serious problems, although one point exceeds the upper control limit and one point exceeds the lower control limit by a small amount. The conclusion is that there is:
● No evidence of bias, change or drift in the measurement process.
● No evidence of long-term lack of control.
Control chart for probe #2362 showing violations of the control limits -- all standard deviations are based on 6 repetitions and the control limits are 95% limits.
Shewhart control chart for measurements on a resistivity check standard showing that the process is in-control -- all measurements are averages of 6 repetitions.
Control chart for precision for probe #2362: The precision control chart shows two points exceeding the upper control limit. We expect 5% of the standard deviations to exceed the UCL even when the measurement process is in-control.
Computations:
1. Pooled repeatability standard deviation
2. Control limit
Normal probability plot for check standard #137 to test assumption of normality:

reset data
reset plot control
reset i/o
dimension 500 30
skip 14
read mpc62.dat crystal wafer mo day hour min op hum probe temp y sw df
normal probability plot y
Control chart for precision of probe #2362 and computation of control parameter estimates:

reset data
reset plot control
reset i/o
dimension 500 30
skip 14
read mpc62.dat crystal wafer mo day hour min op hum probe temp y sw df
let time = mo +(day-1)/31.
let s = sw*sw
let spool = mean s
let spool = spool**.5
print spool
let f = fppf(.95, 5, 125)
let ucl = spool*(f)**.5
print ucl
title Control chart for precision
characters blank blank O
lines solid dashed blank
y1label ohm.cm
x1label Time in days
x2label Standard deviations with probe #2362
x3label 5% upper control limit
let center = sw - sw + spool
let cl = sw - sw + ucl
plot center cl sw vs time
Shewhart control chart for check standard #137 with computation of control chart parameters:

reset data
reset plot control
reset i/o
dimension 500 30
skip 14
read mpc62.dat crystal wafer mo day hour min op hum probe temp y sw df
let time = mo +(day-1)/31.
let avg = mean y
let sprocess = standard deviation y
let ucl = avg + 2*sprocess
let lcl = avg - 2*sprocess
print avg
print sprocess
print ucl lcl
title Shewhart control chart
characters O blank blank blank
lines blank dashed solid dashed
y1label ohm.cm
x1label Time in days
x2label Check standard 137 with probe 2362
x3label 2-sigma control limits
let ybar = y - y + avg
let lc1 = y - y + lcl
let lc2 = y - y + ucl
plot y lc1 ybar lc2 vs time
Sources of uncertainty in NIST measurements: The uncertainty analysis takes into account the following sources of variability:
● Repeatability of measurements at the center of the wafer
● Day-to-day effects
● Run-to-run effects
● Bias due to probe #2362
● Bias due to wiring configuration
Measurements on the check standards are used to estimate repeatability, day effect, run effect: The effect of operator was not considered to be significant for this study. Averages and standard deviations from J = 6 measurements at the center of each wafer are shown in the table.
● J = 6 measurements at the center of the wafer per day
● K = 6 days (one operator) per repetition
● L = 2 runs (complete)
(Table columns: Run, Wafer, Probe, Month, Day, Operator, Temp, Average, Standard Deviation; entries not reproduced.)
Summary of standard deviations at three levels: The level-1, level-2, and level-3 standard deviations for the uncertainty analysis are summarized in the table below from the gauge case study.
Standard deviations for probe #2362
Calculation of the standard deviation of the certified value showing sensitivity coefficients: The certified value for each wafer is the average of N = 6 repeatability measurements at the center of the wafer on M = 1 days and over P = 1 runs. Notice that N, M and P are not necessarily the same as the number of measurements in the gauge study per wafer; namely, J, K and L. The standard deviation of a certified value (for time-dependent sources of error), is
Standard deviations for days and runs are included in this calculation, even though there were no replications over days or runs for the certification measurements. These factors contribute to the overall uncertainty of the measurement process even though they are not sampled for the particular measurements of interest.
The equation must be rewritten to calculate degrees of freedom: Degrees of freedom cannot be calculated from the equation above because the calculations for the individual components involve differences among variances. The table of sensitivity coefficients for a 3-level design shows that for
N = J, M = 1, P = 1
the equation above can be rewritten in the form
Probe bias - Graphs of probe biases: A graphical analysis shows the relative biases among the 5 probes. For each wafer, differences from the wafer average by probe are plotted versus wafer number. The graphs verify that probe #2362 (coded as 5) is biased low relative to the other probes. The bias shows up more strongly after the probes have been in use (run 2).
How to deal with bias due to the probe: Probe #2362 was chosen for the certification process because of its superior precision, but its bias relative to the other probes creates a problem. There are two possibilities for handling this problem:
1. Correct all measurements made with probe #2362 to the average of the probes.
2. Include the standard deviation for the difference among probes in the uncertainty budget.
The best strategy, as followed in the certification process, is to correct all measurements for the average bias of probe #2362 and take the standard deviation of the correction as a type A component of uncertainty.
Correction for bias for probe #2362 and uncertainty: Biases by probe and wafer are shown in the gauge case study. Biases for probe #2362 are summarized in the table below for the two runs. The correction is taken to be the negative of the average bias. The standard deviation of the correction is the standard deviation of the average of the ten biases.
Test for difference between configurations: This finding is consistent over runs 1 and 2 and is confirmed by the t-statistics in the table below where the average differences and standard deviations are computed from 6 days of measurements on 5 wafers. A t-statistic < 2 indicates no significant difference. The conclusion is that there is no bias due to wiring configuration and no contribution to uncertainty from this source.
Run-to-run              A    a3 = 1    0.0197    5
Probe #2362             A    a4 =      0.0162    5
Wiring configuration A  A    a5 = 1    0         --
where the ν_i are the degrees of freedom given in the rightmost column of the table.
The critical value at the 0.05 significance level with 42 degrees of freedom,
from the t-table, is 2.018 so the expanded uncertainty is
Artifacts for the study: The five wafers; namely, #138, #139, #140, #141, and #142 are coded 1, 2, 3, 4, 5, respectively, in the graphs. These wafers were chosen at random from a batch of approximately 100 wafers that were being certified for resistivity.
run 2
plot sr subset run 2
let var = sr*sr
let df11 = sum df subset run 1
let df12 = sum df subset run 2
let s11 = sum var subset run 1
let s12 = sum var subset run 2
let s11 = (5.*s11/df11)**(1/2)
let s12 = (5.*s12/df12)**(1/2)
print s11 df11
print s12 df12
let s1 = ((s11**2 + s12**2)/2.)**(1/2)
let df1=df11+df12
. repeatability standard deviation and df pooled over runs 1 and 2
print s1 df1
. end of calculations
Computes level-2 standard deviations from daily averages and pools over wafers -- run 1:

reset data
reset plot control
reset i/o
dimension 500 rows
label size 3
set read format f1.0,f6.0,f8.0,32x,f10.4,f10.4
read mpc633a.dat run wafer probe y sr
retain run wafer probe y sr subset probe 2362
sd plot y wafer subset run 1
let s21 = yplot
let wafer1 = xplot
retain s21 wafer1 subset tagplot = 1
let nwaf = size s21
let df21 = 5 for i = 1 1 nwaf
. level-2 standard deviations and df for 5 wafers - run 1
print wafer1 s21 df21
. end of calculations
Computes level-2 standard deviations from daily averages and pools over wafers -- run 2:

reset data
reset plot control
reset i/o
dimension 500 rows
label size 3
set read format f1.0,f6.0,f8.0,32x,f10.4,f10.4
read mpc633a.dat run wafer probe y sr
retain run wafer probe y sr subset probe 2362
sd plot y wafer subset run 2
let s22 = yplot
let wafer1 = xplot
retain s22 wafer1 subset tagplot = 1
let nwaf = size s22
Pools level-2 standard deviations over wafers and runs:

reset data
reset plot control
reset i/o
dimension 500 30
label size 3
set read format f1.0,f6.0,f8.0,32x,f10.4,f10.4
read mpc633a.dat run wafer probe y sr
retain run wafer probe y sr subset probe 2362
sd plot y wafer subset run 1
let s21 = yplot
let wafer1 = xplot
sd plot y wafer subset run 2
let s22 = yplot
retain s21 s22 wafer1 subset tagplot = 1
let nwaf = size wafer1
let df21 = 5 for i = 1 1 nwaf
let df22 = 5 for i = 1 1 nwaf
let s2a = (s21**2)/5 + (s22**2)/5
let s2 = sum s2a
let s2 = sqrt(s2/2)
let df2a = df21 + df22
let df2 = sum df2a
. pooled level-2 standard deviation and df across wafers and runs
print s2 df2
. end of calculations
Computes level-3 standard deviations from run averages and pools over wafers:

reset data
reset plot control
reset i/o
dimension 500 rows
label size 3
set read format f1.0,f6.0,f8.0,32x,f10.4,f10.4
read mpc633a.dat run wafer probe y sr
retain run wafer probe y sr subset probe 2362
.
mean plot y wafer subset run 1
let m31 = yplot
let wafer1 = xplot
mean plot y wafer subset run 2
let m32 = yplot
retain m31 m32 wafer1 subset tagplot = 1
let nwaf = size m31
Plot differences from the average wafer value for each probe showing bias for probe #2362:

reset data
reset plot control
reset i/o
dimension 500 30
read mpc61a.dat wafer probe d1 d2
let biasrun1 = mean d1 subset probe 2362
let biasrun2 = mean d2 subset probe 2362
print biasrun1 biasrun2
title GAUGE STUDY FOR 5 PROBES
Y1LABEL OHM.CM
lines dotted dotted dotted dotted dotted solid
characters 1 2 3 4 5 blank
xlimits 137 143
let zero = pattern 0 for I = 1 1 30
x1label DIFFERENCES AMONG PROBES VS WAFER (RUN 1)
plot d1 wafer probe and
plot zero wafer
let biasrun2 = mean d2 subset probe 2362
print biasrun2
title GAUGE STUDY FOR 5 PROBES
Y1LABEL OHM.CM
lines dotted dotted dotted dotted dotted solid
characters 1 2 3 4 5 blank
xlimits 137 143
let zero = pattern 0 for I = 1 1 30
x1label DIFFERENCES AMONG PROBES VS WAFER (RUN 2)
plot d2 wafer probe and
plot zero wafer
. end of calculations
Compute bias for probe #2362 by wafer:

reset data
reset plot control
reset i/o
dimension 500 30
label size 3
set read format f1.0,f6.0,f8.0,32x,f10.4,f10.4
read mpc633a.dat run wafer probe y sr
set read format
.
cross tabulate mean y run wafer
retain run wafer probe y sr subset probe 2362
skip 1
read dpst1f.dat runid wafid ybar
print runid wafid ybar
let ngroups = size ybar
skip 0
.
let m3 = y - y
feedback off
loop for k = 1 1 ngroups
let runa = runid(k)
let wafera = wafid(k)
let ytemp = ybar(k)
let m3 = ytemp subset run = runa subset wafer = wafera
end of loop
feedback on
.
let d = y - m3
let bias1 = average d subset run 1
let bias2 = average d subset run 2
.
mean plot d wafer subset run 1
let b1 = yplot
let wafer1 = xplot
mean plot d wafer subset run 2
let b2 = yplot
retain b1 b2 wafer1 subset tagplot = 1
let nwaf = size b1
. biases for run 1 and run 2 by wafers
print wafer1 b1 b2
. average biases over wafers for run 1 and run 2
print bias1 bias2
. end of calculations
Compute correction for bias for measurements with probe #2362 and the standard deviation of the correction:

reset data
reset plot control
reset i/o
dimension 500 30
label size 3
set read format f1.0,f6.0,f8.0,32x,f10.4,f10.4
read mpc633a.dat run wafer probe y sr
set read format
.
cross tabulate mean y run wafer
retain run wafer probe y sr subset probe 2362
skip 1
read dpst1f.dat runid wafid ybar
let ngroups = size ybar
skip 0
.
let m3 = y - y
feedback off
loop for k = 1 1 ngroups
let runa = runid(k)
let wafera = wafid(k)
let ytemp = ybar(k)
let m3 = ytemp subset run = runa subset wafer = wafera
end of loop
feedback on
.
let d = y - m3
let bias1 = average d subset run 1
let bias2 = average d subset run 2
.
mean plot d wafer subset run 1
let b1 = yplot
let wafer1 = xplot
mean plot d wafer subset run 2
let b2 = yplot
retain b1 b2 wafer1 subset tagplot = 1
.
extend b1 b2
let sd = standard deviation b1
let sdcorr = sd/(10**(1/2))
let correct = -(bias1+bias2)/2.
. correction for probe #2362, standard dev, and standard dev of corr
print correct sd sdcorr
. end of calculations
Plot differences between wiring configurations A and B:

reset data
reset plot control
reset i/o
dimension 500 30
label size 3
read mpc633k.dat wafer probe a1 s1 b1 s2 a2 s3 b2 s4
let diff1 = a1 - b1
let diff2 = a2 - b2
let t = sequence 1 1 30
lines blank all
characters 1 2 3 4 5
y1label ohm.cm
x1label Config A - Config B -- Run 1
x2label over 6 days and 5 wafers
x3label legend for wafers 138, 139, 140, 141, 142: 1, 2, 3, 4, 5
plot diff1 t wafer
x1label Config A - Config B -- Run 2
plot diff2 t wafer
. end of calculations
Compute average differences between configuration A and B; standard deviations and t-statistics for testing significance:

reset data
reset plot control
reset i/o
separator character @
dimension 500 rows
label size 3
read mpc633k.dat wafer probe a1 s1 b1 s2 a2 s3 b2 s4
let diff1 = a1 - b1
let diff2 = a2 - b2
let d1 = average diff1
let d2 = average diff2
let s1 = standard deviation diff1
let s2 = standard deviation diff2
let t1 = (30.)**(1/2)*(d1/s1)
let t2 = (30.)**(1/2)*(d2/s2)
. Average config A-config B; std dev difference; t-statistic for run 1
print d1 s1 t1
. Average config A-config B; std dev difference; t-statistic for run 2
print d2 s2 t2
separator character ;
. end of calculations
Compute standard uncertainty, effective degrees of freedom, t value and expanded uncertainty:

reset data
reset plot control
reset i/o
dimension 500 rows
label size 3
read mpc633m.dat sz a df
let c = a*sz*sz
let d = c*c
let e = d/(df)
let sume = sum e
let u = sum c
let u = u**(1/2)
let effdf=(u**4)/sume
let tvalue=tppf(.975,effdf)
let expu=tvalue*u
.
. uncertainty, effective degrees of freedom, tvalue and
. expanded uncertainty
print u effdf tvalue expu
. end of calculations
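For readers who prefer Python to Dataplot, the following sketch performs the same computation; the arrays are hypothetical stand-ins for the columns of mpc633m.dat:

import numpy as np
from scipy import stats

# Hypothetical stand-ins for the columns of mpc633m.dat:
# sz = component standard deviations, a = sensitivity factors as stored in
# the file (the Dataplot code forms c = a*sz*sz), df = degrees of freedom
sz = np.array([0.05, 0.03, 0.02])
a = np.array([1.0, 1.0, 1.0])
df = np.array([50.0, 5.0, 5.0])

c = a * sz * sz                   # mirrors: let c = a*sz*sz
sume = np.sum((c * c) / df)       # mirrors: let e = d/df; let sume = sum e
u = np.sqrt(np.sum(c))            # standard uncertainty
effdf = u**4 / sume               # Welch-Satterthwaite effective df
tvalue = stats.t.ppf(0.975, effdf)
expu = tvalue * u                 # expanded uncertainty
print(u, effdf, tvalue, expu)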
Need for propagation of error: Not all factors could be replicated during the gauge experiment. Wafer thickness and measurements required for the scale corrections were measured off-line. Thus, the type B evaluation of uncertainty is computed using propagation of error. The propagation of error formula in units of resistivity is as follows:
Standard deviations for type B evaluations: Standard deviations for the type B components are summarized here. For a complete explanation, see the publication (Ehrstein and Croarkin).
Electrical measurements: There are two basic sources of uncertainty for the electrical measurements. The first is the least-count of the digital volt meter in the measurement of X with a maximum bound of
a = 0.0000534 ohm
which is assumed to be the half-width of a uniform distribution. The second is the uncertainty of the electrical scale factor. This has two sources of uncertainty:
1. error in the solution of the transcendental equation for determining the factor
2. errors in measured voltages
The maximum bounds to these errors are assumed to be half-widths of
Thickness: The standard deviation for thickness, t, accounts for two sources of uncertainty:
1. calibration of the thickness measuring tool with precision gauge blocks
2. variation in thicknesses of the silicon wafers
The maximum bounds to these errors are assumed to be half-widths of
Temperature correction: The standard deviation for the temperature correction is calculated from its defining equation as shown below. Thus, the standard deviation for the correction is the standard deviation associated with the measurement of temperature multiplied by the temperature coefficient, C(t) = 0.0083. The maximum bound to the error of the temperature measurement is assumed to be the half-width
a = 0.13 °C
of a triangular distribution. Thus the standard deviation of the correction is
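For a triangular distribution the standard deviation is the half-width divided by sqrt(6), so the computation implied above can be checked in a couple of lines (a sketch using the values quoted in the text):

import math

a = 0.13                    # half-width of the triangular distribution, deg C
Ct = 0.0083                 # temperature coefficient C(t)
s_temp = a / math.sqrt(6)   # standard deviation of the temperature error
print(Ct * s_temp)          # standard deviation of the correction, ~0.000441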
Thickness scale factor: The standard deviation for the thickness scale factor is negligible.
Associated sensitivity coefficients: Sensitivity coefficients for translating the standard deviations for the type B components into units of resistivity (ohm.cm) from the propagation of error equation are listed below and in the error budget. The sensitivity coefficient for a source is the multiplicative factor associated with the standard deviation in the formula above; i.e., the partial derivative with respect to that variable from the propagation of error equation.
Sensitivity coefficients and degrees of freedom: Sensitivity coefficients for the type A components are shown in the case study of type A uncertainty analysis and repeated below. Degrees of freedom for type B uncertainties based on assumed distributions, according to the convention, are assumed to be infinite.
Error budget showing sensitivity coefficients, standard deviations and degrees of freedom: The error budget showing sensitivity coefficients for computing the relative standard uncertainty of volume resistivity (ohm.cm) with degrees of freedom is outlined below.

Error budget for volume resistivity (ohm.cm)
Source                  Type   Sensitivity     Standard Deviation   DF
Repeatability           A      a1 = 0          0.0710               300
Reproducibility         A      a2 =            0.0362               50
Run-to-run              A      a3 = 1          0.0197               5
Probe #2362             A      a4 =            0.0162               5
Wiring configuration A  A      a5 = 1          0                    --
Resistance ratio        B      a6 = 900.901    0.0000308
Electrical scale        B      a7 = 22.222     0.000227
Thickness               B      a8 = 159.20     0.00000868
Temperature correction  B      a9 = 100        0.000441
2.7. References
Degrees of freedom: K. A. Brownlee (1960). Statistical Theory and Methodology in Science and Engineering, John Wiley & Sons, Inc., New York, p. 236.
ASTM F84 for resistivity: ASTM Method F84-93, Standard Test Method for Measuring Resistivity of Silicon Wafers With an In-line Four-Point Probe. Annual Book of ASTM Standards, 10.05, West Conshohocken, PA 19428.
ISO 5725 for interlaboratory testing: ISO 5725: 1997. Accuracy (trueness and precision) of measurement results, Part 2: Basic method for repeatability and reproducibility of a standard measurement method, ISO, Case postale 56, CH-1211, Genève 20, Switzerland.
ISO 11095 on linear calibration: ISO 11095: 1997. Linear Calibration using Reference Materials, ISO, Case postale 56, CH-1211, Genève 20, Switzerland.
MSA gauge studies manual: Measurement Systems Analysis Reference Manual, 2nd ed., (1995). Chrysler Corp., Ford Motor Corp., General Motors Corp., 120 pages.
Exact variance for length and width: Leo Goodman (1960). "On the Exact Variance of Products" in Journal of the American Statistical Association, December, 1960, pp. 708-713.
1. Introduction
   1. Definition
   2. Uses
   3. Terminology/Concepts
   4. PPC Steps
   5. Case Studies
      1. Furnace Case Study
      2. Machine Case Study
   6. References [3.6.]
2. Assumptions
   1. General Assumptions
   2. Specific PPC Models
Not all of the steps need to be performed: The first two steps are only needed for new processes or when the process has undergone some significant engineering change. There are, however, many times throughout the life of a process when the third step is needed. Examples might be: initial process qualification, control chart development, after minor process adjustments, after scheduled equipment maintenance, etc.
When process characterization is required:
● when we are bringing a new process or tool into use.
● when we are bringing a tool or process back up after scheduled/unscheduled maintenance.
● when we want to compare tools or processes.
● when we want to check the health of our process during the monitoring phase.
● when we are troubleshooting a bad process.
Process characterization techniques are applicable in other areas:
● calibration
● process monitoring
● process improvement
● process/product comparison
● reliability
3.1.3. Terminology/Concepts
There are just a few fundamental concepts needed for PPC.
This section will review these ideas briefly and provide
links to other sections in the Handbook where they are
covered in more detail.
Location
The location is the expected value of the output being measured.
For a stable process, this is the value around which the process
has stabilized.
Spread
The spread is the expected amount of variation associated with
the output. This tells us the range of possible values that we
would expect to see.
Shape
The shape shows how the variation is distributed about the
location. This tells us if our variation is symmetric about the
mean or if it is skewed or possibly multimodal.
A primary goal of PPC is to estimate the distributions of the process outputs: One of the primary goals of a PPC study is to characterize our process outputs in terms of these three measurements. If we can demonstrate that our process is stabilized about a constant location, with a constant variance and a known stable shape, then we have a process that is both predictable and controllable. This is required before we can set up control charts or conduct experiments.
The table below shows the most common numerical and graphical measures of location, spread and shape.

Parameter   Numerical                               Graphical
Location    mean, median                            scatter plot, boxplot, histogram
Spread      variance, range, inter-quartile range   boxplot, histogram
Shape       skewness, kurtosis                      boxplot, histogram, probability plot
How does the standard deviation describe the spread of the data? The standard deviation (square root of the variance) gives insight into the spread of the data through the use of what is known as the Empirical Rule. This rule (shown in the graph below) is:
Approximately 60-78% of the data are within a distance of one standard deviation from the average (x̄ - s, x̄ + s).
Approximately 90-98% of the data are within a distance of two standard deviations from the average (x̄ - 2s, x̄ + 2s).
More than 99% of the data are within a distance of three standard deviations from the average (x̄ - 3s, x̄ + 3s).
Variability accumulates from many sources: This observed variability is an accumulation of many different sources of variation that have occurred throughout the manufacturing process. One of the more important activities of process characterization is to identify and quantify these various sources of variation so that they may be minimized.
There are also different types: There are not only different sources of variation, but there are also different types of variation. Two important classifications of variation for the purposes of PPC are controlled variation and uncontrolled variation. In the course of process characterization we should endeavor to eliminate all sources of uncontrolled variation.
Examples of "in control" and "out of control" processes: The first process is an example of a process that is "in control" with random fluctuation about a process location of approximately 990. The second process is an example of a process that is "out of control" with a process location trending upward after observation 20.
This process exhibits controlled variation. Note the random fluctuation about a constant mean.
This process exhibits uncontrolled variation. Note the structure in the variation in the form of a linear trend.
How do I partition the error? Usually we can model the contribution of the various sources of error to the total error through a simple linear relationship. If we have a simple linear relationship between two variables, say y = x1 + x2, then the variance of y is
$$ \sigma_y^2 = \sigma_1^2 + \sigma_2^2 + 2\,\mathrm{Cov}(x_1, x_2). $$
If the variables are not correlated, then there is no covariance and the last term in the above equation drops off. A good example of this is the case in which we have both process error and measurement error. Since these are usually independent of each other, the total observed variance is just the sum of the variances for process and measurement. Remember: never add standard deviations; always add variances.
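A quick simulation (a sketch with hypothetical variances) shows why variances, not standard deviations, add:

import numpy as np

rng = np.random.default_rng(1)
process = rng.normal(0.0, 2.0, 1_000_000)   # process error, variance 4
measure = rng.normal(0.0, 1.0, 1_000_000)   # measurement error, variance 1
total = process + measure

# Observed total variance ~ 5 = 4 + 1; note sqrt(5) != 2 + 1
print(total.var(), process.var() + measure.var())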
Facts about a sample are not necessarily facts about a population: If it weren't for process variation we could just take one sample and everything would be known about the target population. Unfortunately this is never the case. We cannot take facts about the sample to be facts about the population. Our job is to reach appropriate conclusions about the population despite this variation. The more observations we take from a population, the more our sample data resembles the population. When we have reached the point at which facts about the sample are reasonable approximations of facts about the population, then we say the sample is adequate.
We use the black-box model to describe our processes: We can use the simple black-box model, shown below, to describe most of the tools and processes we will encounter in PPC. The process will be stimulated by inputs. These inputs can either be controlled (such as recipe or machine settings) or uncontrolled (such as humidity, operators, power fluctuations, etc.). These inputs interact with our process and produce outputs. These outputs are usually some characteristic of our process that we can measure. The measurable inputs and outputs can be sampled in order to observe and understand how they behave and relate to each other.
Diagram of the black box model.
These inputs and outputs are also known as Factors and Responses, respectively.
Factors
Observed inputs used to explain response behavior (also called explanatory
variables). Factors may be fixed-level controlled inputs or sampled uncontrolled
inputs.
Responses
Sampled process outputs. Responses may also be functions of sampled outputs
such as average thickness or uniformity.
Factors and Responses are further classified by variable type: We further categorize factors and responses according to their Variable Type, which indicates the amount of information they contain. As the name implies, this classification is useful for data modeling activities and is critical for selecting the proper analysis technique. The table below summarizes this categorization. The types are listed in order of the amount of information they contain, with Measurement containing the most information and Nominal containing the least.

Type         Description                                    Example
Measurement  discrete/continuous, order is important,       particle count, oxide thickness,
             infinite range                                 pressure, temperature
Ordinal      discrete, order is important, finite range     run #, wafer #, site, bin
Nominal      discrete, no order, very few possible values   good/bad, bin, high/medium/low,
                                                            shift, operator
Fishbone diagrams help to decompose complexity: We can use the fishbone diagram to further refine the modeling process. Fishbone diagrams are very useful for decomposing the complexity of our manufacturing processes. Typically, we choose a process characteristic (either Factors or Responses) and list out the general categories that may influence the characteristic (such as material, machine, method, environment, etc.), and then provide more specific detail within each category. Examples of how to do this are given in the section on Case Studies.
Sample fishbone diagram.
We look for correlations and causal relationships: There are generally two types of relationships that we are interested in for purposes of PPC. They are:
Correlation
Two variables are said to be correlated if an observed change in the level of one variable is accompanied by a change in the level of another variable. The change may be in the same direction (positive correlation) or in the opposite direction (negative correlation).
Causality
There is a causal relationship between two variables if a change in the level of one variable causes a change in the other variable.
Note that correlation does not imply causality. It is possible for two variables to be associated with each other without one of them causing the observed behavior in the other. When this is the case it is usually because there is a third (possibly unknown) causal factor.
Our goal is to find causal relationships: Generally, our ultimate goal in PPC is to find and quantify causal relationships. Once this is done, we can then take advantage of these relationships to improve and control our processes.
Find correlations and then try to establish causal relationships: Generally, we first need to find and explore correlations and then try to establish causal relationships. It is much easier to find correlations as these are just properties of the data. It is much more difficult to prove causality as this additionally requires sound engineering judgment. There is a systematic procedure we can use to accomplish this in an efficient manner. We do this through the use of designed experiments.
First we screen, then we build models: When we have many potential factors and we want to see which ones are correlated and have the potential to be involved in causal relationships with the responses, we use screening designs to reduce the number of candidates. Once we have a reduced set of influential factors, we can use response surface designs to model the causal relationships with the responses across the operating range of the process factors.
Step 1: Plan: The most important step by far is the planning step. By faithfully executing this step, we will ensure that we only collect data in the most efficient manner possible and still support the goals of the PPC. Planning should generate the following:
● a statement of the goals
Step 2: Collect: Data collection is essentially just the execution of the sampling plan part of the previous step. If a good job were done in the planning step, then this step should be pretty straightforward. It is important to execute to the plan as closely as possible and to note any exceptions.
Further information: The planning and data collection steps are described in detail in the data collection section. The analysis and interpretation steps are covered in detail in the analysis section. Examples of the reporting step can be seen in the Case Studies.
Assumption: data used to fit these models are representative of the process being modeled: Finally, we assume that the data used to fit these models are representative of the process being modeled. As a result, we must additionally assume that the measurement system used to collect the data has been studied and proven to be capable of making measurements to the desired precision and accuracy. If this is not the case, refer to the Measurement Capability Section of this Handbook.
This equation just says that if we have p explanatory variables then the response is modeled by a constant term plus a sum of functions of those explanatory variables, plus some random error term; schematically, y = a0 + f1(x1) + f2(x2) + ... + fp(xp) + error. This will become clear as we look at some examples below.
Estimation The coefficients for the parameters in the CLM are estimated by the method
of least squares. This is a method that gives estimates which minimize the
sum of the squared distances from the observations to the fitted line or
plane. See the chapter on Process Modeling for a more complete discussion
on estimating the coefficients for these models.
Testing The tests for the CLM involve testing that the model as a whole is a good
representation of the process and whether any of the coefficients in the
model are zero or have no effect on the overall fit. Again, the details for
testing are given in the chapter on Process Modeling.
Assumptions For estimation purposes, there are no additional assumptions necessary for
the CLM beyond those stated in the assumptions section. For testing
purposes, however, it is necessary to assume that the error term is
adequately modeled by a Gaussian distribution.
Uses The CLM has many uses such as building predictive process models over a
range of process settings that exhibit linear behavior, control charts, process
capability, building models from the data produced by designed
experiments, and building response surface models for automated process
control applications.
Examples: Shewhart Control Chart - The simplest example of a very common usage of the CLM is the underlying model used for Shewhart control charts. This model assumes that the process parameter being measured is a constant with additive Gaussian noise and is given by:
y = c + e, with e ~ N(0, sigma^2)
Diffusion Furnace (cont.) - Usually, the fitted line for the average wafer sheet resistance is not straight but has some curvature to it. This can be accommodated by adding a quadratic term for the time parameter as follows:
y = a0 + a1*t + a2*t^2 + e
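A sketch of fitting such a quadratic model by least squares (the time and sheet-resistance values below are hypothetical):

import numpy as np

t = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])        # hypothetical times
y = np.array([92.1, 94.0, 95.4, 96.3, 96.8, 96.9])  # hypothetical responses

# numpy returns coefficients highest degree first: y = a0 + a1*t + a2*t**2
a2, a1, a0 = np.polyfit(t, y, deg=2)
print(a0, a1, a2)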
Example: Turned Pins: The simplest example of value splitting is when we just have one level of one factor. Suppose we have a turning operation in a machine shop where we are turning pins to a diameter of .125 +/- .005 inches. Throughout the course of a day we take five samples of pins and obtain the following measurements: .125, .127, .124, .126, .128.
We can split these data values into a common value (mean) and residuals (what's left over) as follows:
.125 .127 .124 .126 .128
=
.126 .126 .126 .126 .126
+
-.001 .001 -.002 .000 .002
From these tables, also called overlays, we can easily calculate the
location and spread of the data as follows:
mean = .126
std. deviation = .0016.
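The same split can be checked in a few lines of Python (a sketch using the five measurements above):

import numpy as np

pins = np.array([0.125, 0.127, 0.124, 0.126, 0.128])
common = pins.mean()            # common value: 0.126
residuals = pins - common       # what's left over
print(common, residuals)
print(pins.std(ddof=1))         # std. deviation: ~0.0016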
Other layouts: While the above example is a trivial structural layout, it illustrates how we can split data values into their components. In the next sections, we will look at more complicated structural layouts for the data. In particular we will look at multiple levels of one factor (One-Way ANOVA) and multiple levels of two factors (Two-Way ANOVA) where the factors are crossed and nested.
This equation indicates that the jth data value, from level i, is the sum of
three components: the common value (grand mean), the level effect (the
deviation of each level mean from the grand mean), and the residual
(what's left over).
Estimation (click here to see details of one-way value splitting): Estimation for the one-way layout can be performed one of two ways. First, we can calculate the total variation, within-level variation and across-level variation. These can be summarized in a table as shown below and tests can be made to determine if the factor levels are significant. The value splitting example illustrates the calculations involved.
ANOVA table for one-way case: In general, the ANOVA table for the one-way case is given by:

Source         Sum of Squares                     Degrees of Freedom   Mean Square
Factor levels  J * sum_i (ybar_i - ybar)^2        I-1                  SS/(I-1)
residuals      sum_ij (y_ij - ybar_i)^2           I(J-1)               SS/I(J-1)
Level effects must sum to zero: The other way is through the use of CLM techniques. If you look at the model above you will notice that it is in the form of a CLM. The only problem is that the model is saturated and no unique solution exists. We overcome this problem by applying a constraint to the model. Since the level effects are just deviations from the grand mean, they must sum to zero. By applying the constraint that the level effects must sum to zero, we can now obtain a unique solution to the CLM equations. Most analysis programs will handle this for you automatically. See the chapter on Process Modeling for a more complete discussion on estimating the coefficients for these models.
Testing: The testing we want to do in this case is to see if the observed data support the hypothesis that the levels of the factor are significantly different from each other. The way we do this is by comparing the within-level variances to the between-level variance.
If we assume that the observations within each level have the same
variance, we can calculate the variance within each level and pool these
together to obtain an estimate of the overall population variance. This
works out to be the mean square of the residuals.
Similarly, if there really were no level effect, the mean square across
levels would be an estimate of the overall variance. Therefore, if there
really were no level effect, these two estimates would be just two
different ways to estimate the same parameter and should be close
numerically. However, if there is a level effect, the level mean square
will be higher than the residual mean square.
It can be shown that given the assumptions about the data stated below, the ratio of the level mean square and the residual mean square follows an F distribution with degrees of freedom as shown in the ANOVA table. If the F-value is significant at a given level of confidence (greater than the cut-off value in an F-table), then there is a level effect present in the data.
Assumptions For estimation purposes, we assume the data can adequately be modeled
as the sum of a deterministic component and a random component. We
further assume that the fixed (deterministic) component can be modeled
as the sum of an overall mean and some contribution from the factor
level. Finally, it is assumed that the random component can be modeled
with a Gaussian distribution with fixed location and spread.
Uses The one-way ANOVA is useful when we want to compare the effect of
multiple levels of one factor and we have multiple observations at each
level. The factor can be either discrete (different machine, different
plants, different shifts, etc.) or continuous (different gas flows,
temperatures, etc.).
Example Let's extend the machining example by assuming that we have five
different machines making the same part and we take five random
samples from each machine to obtain the following diameter data:
Machine
1 2 3 4 5
.125 .118 .123 .126 .118
.127 .122 .125 .128 .129
.125 .120 .125 .126 .127
.126 .124 .124 .127 .120
.128 .119 .126 .129 .121
Test: By dividing the factor-level mean square by the residual mean square, we obtain an F-value of 4.86 which is greater than the cut-off value of 2.87 for the F-distribution at 4 and 20 degrees of freedom and 95% confidence. Therefore, there is sufficient evidence to reject the hypothesis that the levels are all the same.
Conclusion From the analysis of these data we can conclude that the factor
"machine" has an effect. There is a statistically significant difference in
the pin diameters across the machines on which they were
manufactured.
Machine
1 2 3 4 5
.125 .118 .123 .126 .118
.127 .122 .125 .128 .129
.125 .120 .125 .126 .127
.126 .124 .124 .127 .120
.128 .119 .126 .129 .121
Machine level means
1      2      3      4      5
.1262  .1206  .1246  .1272  .123
Sweep level means: We can then sweep (subtract the level mean from each associated data value) the means through the original data table to get the residuals:
Machine
1 2 3 4 5
-.0012 -.0026 -.0016 -.0012 -.005
.0008 .0014 .0004 .0008 .006
-.0012 -.0006 .0004 -.0012 .004
-.0002 .0034 -.0006 -.0002 -.003
.0018 -.0016 .0014 .0018 -.002
Calculate the grand mean: The next step is to calculate the grand mean from the individual machine means as:
Grand Mean = .12432
Sweep the grand mean through the level means: Finally, we can sweep the grand mean through the individual level means to obtain the level effects:
Machine
1 2 3 4 5
.00188 -.00372 .00028 .00288 -.00132
Calculate ANOVA values: Now that we have the data values split and the overlays created, the next step is to calculate the various values in the One-Way ANOVA table. We have three values to calculate for each overlay. They are the sums of squares, the degrees of freedom, and the mean squares.
Total sum of squares: The total sum of squares is calculated by summing the squares of all the data values and subtracting from this number the square of the grand mean times the total number of data values. We usually don't calculate the mean square for the total sum of squares because we don't use this value in any statistical test.
Residual sum of squares, degrees of freedom and mean square: The residual sum of squares is calculated by summing the squares of the residual values. This is equal to .000132. The degrees of freedom is the number of unconstrained values. Since the residuals for each level of the factor must sum to zero, once we know four of them, the last one is determined. This means we have four unconstrained values for each level, or 20 degrees of freedom. This gives a mean square of .000007.
Level sum of squares, degrees of freedom and mean square: Finally, to obtain the sum of squares for the levels, we sum the squares of each value in the level-effect overlay and multiply the sum by the number of observations for each level (in this case 5) to obtain a value of .000137. Since the deviations from the level means must sum to zero, we have only four unconstrained values so the degrees of freedom for level effects is 4. This produces a mean square of .000034.
Calculate The last step is to calculate the F-value and perform the test of equal
F-value level means. The F-value is just the level mean square divided by the
residual mean square. In this case the F-value = 4.86. If we look in an
F-table for 4 and 20 degrees of freedom at 95% confidence, we see that
the critical value is 2.87, which means that we have a significant result
and that there is thus evidence of a strong machine effect. By looking at
the level-effect overlay we see that this is driven by machines 2 and 4.
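To make the value-splitting arithmetic concrete, here is a minimal Python sketch (assuming
numpy and scipy are available; neither is part of the handbook) that reproduces the
computations above. Note that 4.86 is the ratio of the rounded mean squares
(.000034/.000007); carrying full precision through the calculation gives an F-value of
about 5.2, leading to the same conclusion.

```python
import numpy as np
from scipy import stats

# Diameter data: columns are machines 1-5, rows are the five samples.
data = np.array([
    [.125, .118, .123, .126, .118],
    [.127, .122, .125, .128, .129],
    [.125, .120, .125, .126, .127],
    [.126, .124, .124, .127, .120],
    [.128, .119, .126, .129, .121],
])

level_means = data.mean(axis=0)        # machine (level) means
grand_mean = level_means.mean()        # grand mean (balanced design)
residuals = data - level_means         # sweep level means from the data
effects = level_means - grand_mean     # sweep grand mean from level means

n_per_level = data.shape[0]            # 5 observations per machine
k = data.shape[1]                      # 5 machines

ss_level = n_per_level * np.sum(effects ** 2)   # ~ .000137, df = k - 1
ss_resid = np.sum(residuals ** 2)               # ~ .000132, df = k*(n-1)
df_level, df_resid = k - 1, k * (n_per_level - 1)

f_value = (ss_level / df_level) / (ss_resid / df_resid)
f_crit = stats.f.ppf(0.95, df_level, df_resid)  # ~ 2.87

print(f"F = {f_value:.2f}, 95% critical value = {f_crit:.2f}")
```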
Model For the two-way crossed case, the model is

y(ijk) = m + a(i) + b(j) + ab(ij) + e(ijk)

This equation just says that the kth data value for the jth level of Factor
B and the ith level of Factor A is the sum of five components: the
common value (grand mean), the level effect for Factor A, the level
effect for Factor B, the interaction effect, and the residual. Note that (ab)
does not mean multiplication; rather it indicates that there is interaction
between the two factors.
Estimation Like the one-way case, the estimation for the two-way layout can be
done either by calculating the variance components or by using CLM
techniques.
Click here For the variance components methods we display the data in a two
for the value dimensional table with the levels of Factor A in columns and the levels
splitting of Factor B in rows. The replicate observations fill each cell. We can
example sweep out the common value, the row effects, the column effects, the
interaction effects and the residuals using value-splitting techniques.
Sums of squares can be calculated and summarized in an ANOVA table
as shown below.
Source            Sum of Squares     Degrees of Freedom   Mean Square
rows              SS(rows)           I-1                  SS(rows)/(I-1)
columns           SS(columns)        J-1                  SS(columns)/(J-1)
interaction       SS(interaction)    (I-1)(J-1)           SS(interaction)/[(I-1)(J-1)]
residuals         SS(residuals)      IJ(K-1)              SS(residuals)/[IJ(K-1)]
corrected total   SS(total)          IJK-1
Testing Like testing in the one-way case, we are testing that two main effects
and the interaction are zero. Again we just form a ratio of each main
effect mean square and the interaction mean square to the residual mean
square. If the assumptions stated below are true then those ratios follow
an F-distribution and the test is performed by comparing the F-ratios to
values in an F-table with the appropriate degrees of freedom and
confidence level.
Assumptions For estimation purposes, we assume the data can be adequately modeled
as described in the model above. It is assumed that the random
component can be modeled with a Gaussian distribution with fixed
location and spread.
Uses The two-way crossed ANOVA is useful when we want to compare the
effect of multiple levels of two factors and we can combine every level
of one factor with every level of the other factor. If we have multiple
observations at each level, then we can also estimate the effects of
interaction between the two factors.
Example Let's extend the one-way machining example by assuming that we want
to test if there are any differences in pin diameters due to different types
of coolant. We still have five different machines making the same part
and we take five samples from each machine for each coolant type to
obtain the following data:
Machine
              1     2     3     4     5
Coolant A   .125  .118  .123  .126  .118
            .127  .122  .125  .128  .129
            .125  .120  .125  .126  .127
            .126  .124  .124  .127  .120
            .128  .119  .126  .129  .121
Coolant B   .124  .116  .122  .126  .125
            .128  .125  .121  .129  .123
            .127  .119  .124  .125  .114
            .126  .125  .126  .130  .124
            .129  .120  .125  .124  .117
Analyze For analysis details see the crossed two-way value splitting example.
We can summarize the analysis results in an ANOVA table as follows:
Source            Sum of Squares   Degrees of Freedom   Mean Square   F-value
machine           .000303          4                    .000076       8.8 > 2.61
coolant           .00000392        1                    .00000392     .45 < 4.08
interaction       .00001468        4                    .00000367     .42 < 2.61
residuals         .000346          40                   .0000087
corrected total   .000668          49
Test By dividing the mean square for machine by the mean square for
residuals we obtain an F-value of 8.8 which is greater than the cut-off
value of 2.61 for 4 and 40 degrees of freedom and a confidence of
95%. Likewise the F-values for Coolant and Interaction, obtained by
dividing their mean squares by the residual mean square, are less than
their respective cut-off values.
Conclusion From the ANOVA table we can conclude that machine is the most
important factor and is statistically significant. Coolant is not significant
and neither is the interaction. These results would lead us to believe that
some tool-matching efforts would be useful for improving this process.
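As a sketch of how this analysis might be reproduced in software (assuming the pandas and
statsmodels libraries, which the handbook does not use), the following builds the crossed
two-way layout from the raw diameters and prints an ANOVA table like the one above:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Raw diameters: 5 samples per machine (1-5) per coolant (A, B).
coolant_a = [[.125,.118,.123,.126,.118],[.127,.122,.125,.128,.129],
             [.125,.120,.125,.126,.127],[.126,.124,.124,.127,.120],
             [.128,.119,.126,.129,.121]]
coolant_b = [[.124,.116,.122,.126,.125],[.128,.125,.121,.129,.123],
             [.127,.119,.124,.125,.114],[.126,.125,.126,.130,.124],
             [.129,.120,.125,.124,.117]]

rows = []
for coolant, table in (("A", coolant_a), ("B", coolant_b)):
    for sample in table:
        for machine, diameter in enumerate(sample, start=1):
            rows.append({"machine": machine, "coolant": coolant,
                         "diameter": diameter})
df = pd.DataFrame(rows)

# Crossed two-way ANOVA with interaction; the design is balanced,
# so the sums of squares match the hand computations above.
fit = ols("diameter ~ C(machine) * C(coolant)", data=df).fit()
print(sm.stats.anova_lm(fit))
```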
For the crossed two-way case, the first thing we need to do is to sweep
the cell means from the data table to obtain the residual values. This is
shown in the tables below.
Sweep the The next step is to sweep out the row means. This gives the table below.
row means
Machine
1 2 3 4 5
A .1243 .0019 -.0037 .0003 .0029 -.0013
B .1238 .003 -.0028 -.0002 .003 -.0032
Sweep the Finally, we sweep the column means to obtain the grand mean, row
column (coolant) effects, column (machine) effects and the interaction effects.
means
Machine
1 2 3 4 5
.1241 .0025 -.0033 .00005 .003 -.0023
A .0003 -.0006 -.0005 .00025 .0000 .001
B -.0003 .0006 .0005 -.00025 .0000 -.001
What do By looking at the table of residuals, we see that the residuals for coolant
these tables B tend to be a little higher than for coolant A. This implies that there
tell us? may be more variability in diameter when we use coolant B. From the
effects table above, we see that machines 2 and 5 produce smaller pin
diameters than the other machines. There is also a very slight coolant
effect, but the machine effect is larger. Finally, there also appear to be
slight interaction effects. For instance, machines 1 and 2 had smaller
diameters with coolant A but the opposite was true for machines 3, 4,
and 5.
Calculate We can calculate the values for the ANOVA table according to the
sums of formulae in the table on the crossed two-way page. This gives the table
squares and below. From the F-values we see that the machine effect is significant
mean but the coolant and the interaction are not.
squares
Model If Factor B is nested within Factor A, then a level of Factor B can only
occur within one level of Factor A and there can be no interaction. This
gives the following model:

y(ijk) = m + a(i) + b(j(i)) + e(ijk)

This equation indicates that each data value is the sum of a common value
(grand mean), the level effect for Factor A, the level effect of Factor B
nested within Factor A, and the residual.
Click here It is important to note that with this type of layout, since each level of one
for nested factor is only present with one level of the other factor, we can't estimate
value- interaction between the two.
splitting
example
ANOVA
table for
nested case

Source            Sum of Squares     Degrees of Freedom   Mean Square
rows              SS(rows)           I-1                  SS(rows)/(I-1)
columns           SS(columns)        I(J-1)               SS(columns)/[I(J-1)]
residuals         SS(residuals)      IJ(K-1)              SS(residuals)/[IJ(K-1)]
corrected total   SS(total)          IJK-1
As with the crossed layout, we can also use CLM techniques. We still have
the problem that the model is saturated and no unique solution exists. We
overcome this problem by applying to the model the constraints that the
two main effects sum to zero.
Testing We are testing that two main effects are zero. Again we just form a ratio of
each main effect mean square to the residual mean square. If the
assumptions stated below are true then those ratios follow an F-distribution
and the test is performed by comparing the F-ratios to values in an F-table
with the appropriate degrees of freedom and confidence level.
Assumptions For estimation purposes, we assume the data can be adequately modeled as
described in the model above. It is assumed that the random component can
be modeled with a Gaussian distribution with fixed location and spread.
Uses The two-way nested ANOVA is useful when we are constrained from
combining all the levels of one factor with all of the levels of the other
factor. These designs are most useful when we have what is called a
random effects situation. When the levels of a factor are chosen at random
rather than selected intentionally, we say we have a random effects model.
An example of this is when we select lots from a production run, then
select units from the lot. Here the units are nested within lots and the effect
of each factor is random.
Example Let's change the two-way machining example slightly by assuming that we
have five different machines making the same part and each machine has
two operators, one for the day shift and one for the night shift. We take five
samples from each machine for each operator to obtain the following data:
Machine
                 1     2     3     4     5
Operator Day   .125  .118  .123  .126  .118
               .127  .122  .125  .128  .129
               .125  .120  .125  .126  .127
               .126  .124  .124  .127  .120
               .128  .119  .126  .129  .121
Operator Night .124  .116  .122  .126  .125
               .128  .125  .121  .129  .123
               .127  .119  .124  .125  .114
               .126  .125  .126  .130  .124
               .129  .120  .125  .124  .117
Analyze For analysis details see the nested two-way value splitting example. We
can summarize the analysis results in an ANOVA table as follows:
Source              Sum of Squares   Degrees of Freedom   Mean Square   F-value
Machine             .000303          4                    .0000758      8.77 > 2.61
Operator(Machine)   .0000186         5                    .00000372     .428 < 2.45
Residuals           .000346          40                   .0000087
Corrected Total     .000668          49
Test By dividing the mean square for machine by the mean square for residuals
we obtain an F-value of 8.77, which is greater than the cut-off value of 2.61
for 4 and 40 degrees of freedom and 95% confidence. Likewise the
F-value for Operator(Machine), obtained by dividing its mean square by
the residual mean square, is less than the cut-off value of 2.45 for 5 and 40
degrees of freedom and 95% confidence.
Conclusion From the ANOVA table we can conclude that the Machine is the most
important factor and is statistically significant. The effect of Operator
nested within Machine is not statistically significant. Again, any
improvement activities should be focused on the tools.
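A similar sketch for the nested layout (again assuming pandas and statsmodels; the nested
structure is expressed with the / operator in the model formula):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

day   = [[.125,.118,.123,.126,.118],[.127,.122,.125,.128,.129],
         [.125,.120,.125,.126,.127],[.126,.124,.124,.127,.120],
         [.128,.119,.126,.129,.121]]
night = [[.124,.116,.122,.126,.125],[.128,.125,.121,.129,.123],
         [.127,.119,.124,.125,.114],[.126,.125,.126,.130,.124],
         [.129,.120,.125,.124,.117]]

rows = [{"machine": m + 1, "operator": shift, "diameter": d}
        for shift, block in (("day", day), ("night", night))
        for sample in block
        for m, d in enumerate(sample)]
df = pd.DataFrame(rows)

# C(machine)/C(operator) expands to the machine main effect plus
# operator-within-machine terms, i.e. the nested layout above.
fit = ols("diameter ~ C(machine) / C(operator)", data=df).fit()
print(sm.stats.anova_lm(fit))
```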
For the nested two-way case, just as in the crossed case, the first thing we need to do is to
sweep the cell means from the data table to obtain the residual values. We then sweep the
nested factor (Operator) and the top level factor (Machine) to obtain the table below.
What By looking at the residuals we see that machines 2 and 5 have the greatest variability.
does this There does not appear to be much of an operator effect but there is clearly a strong machine
table tell effect.
us?
Calculate We can calculate the values for the ANOVA table according to the formulae in the table on
sums of the nested two-way page. This produces the table below. From the F-values we see that the
squares machine effect is significant but the operator effect is not. (Here it is assumed that both
and factors are fixed).
mean
squares
Contingency There are two primary methods available for the analysis of discrete
table response data. The first one applies to situations in which we have
analysis and discrete explanatory variables and discrete responses and is known as
log-linear Contingency Table Analysis. The model for this is covered in detail in
model this section. The second model applies when we have both discrete and
continuous explanatory variables and is referred to as a Log-Linear
Model. That model is beyond the scope of this Handbook, but interested
readers should refer to the reference section of this chapter for a list of
useful books on the topic.
Now, each cell of this table will have a count of the individuals who fall
into its particular combination of classification levels. Let's call this
count Nij. The sum of all of these counts will be equal to the total
number of individuals, N. Also, each row of the table will sum to Ni.
and each column will sum to N.j.
Estimation The estimation is very simple. All we do is make a table of the observed
counts and then calculate the expected counts as described above.
Assumptions The estimation and testing results above hold regardless of whether the
sampling model is Poisson, multinomial, or product-multinomial. The
chi-square results start to break down if the counts in any cell are small,
say < 5.
Uses The contingency table method is really just a test of interaction between
discrete explanatory variables for discrete responses. The example given
below is for two factors. The methods are equally applicable to more
factors, but as with any interaction, as you add more factors the
interpretation of the results becomes more difficult.
Example Suppose we are comparing the yield from two manufacturing processes.
We want to know if one process has a higher yield.
We obtain the expected values by the formula given above. This gives
the table below.
Calculate The chi-square statistic is 1.276. This is below the chi-square value for 1
chi-square degree of freedom and 90% confidence of 2.71 . Therefore, we conclude
statistic and that there is not a (significant) difference in process yield.
compare to
table value
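The handbook's yield counts are not reproduced here, so the sketch below uses hypothetical
pass/fail counts for the two processes; it shows how the expected counts and the chi-square
statistic described above can be computed with scipy:

```python
from scipy.stats import chi2, chi2_contingency

# Hypothetical 2x2 contingency table: rows are processes,
# columns are pass/fail counts (not the handbook's actual data).
observed = [[480, 20],
            [470, 30]]

stat, p_value, dof, expected = chi2_contingency(observed, correction=False)
critical = chi2.ppf(0.90, dof)   # 90% cut-off, ~2.71 for 1 df

print(f"chi-square = {stat:.3f}, critical value = {critical:.2f}")
print("expected counts:", expected)
```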
Many things This activity of course ends with the actual collection of the data,
can go which is usually not as straightforward as it might appear. Many things
wrong in the can go wrong in the execution of the sampling plan. The problems can
data be mitigated with the use of checklists and by carefully documenting all
collection exceptions to the original sampling plan.
Example Click on each of the links below to see Goal Statements for each of the
goal case studies.
statements 1. Furnace Case Study (Goal)
2. Machine Case Study (Goal)
Document The final step is to document all known information about the
relationships relationships and sensitivities between the inputs and outputs. Some of
and the inputs may be correlated with each other as well as with the outputs.
sensitivities There may be detailed mathematical models available from other
studies, or the available information may be vague: for a
machining process, for example, we may know only that as the feed rate
increases, the quality of the finish decreases.
Examples Click on each of the links below to see the process models for each of
the case studies.
1. Case Study 1 (Process Model)
2. Case Study 2 (Process Model)
Verify and Once the sampling plan has been developed, it can be verified and then
execute passed on to the responsible parties for execution.
Goals will tell The first step is to carefully examine the goals. This will tell you
us what to which response variables need to be sampled and how. For instance, if
measure and our goal states that we want to determine if an oxide film can be
how grown on a wafer to within 10 Angstroms of the target value with a
uniformity of <2%, then we know we have to measure the film
thickness on the wafers to an accuracy of at least +/- 3 Angstroms and
we must measure at multiple sites on the wafer in order to calculate
uniformity.
The goals and the models we build will also indicate which
explanatory variables need to be sampled and how. Since the fishbone
diagrams define the known important relationships, these will be our
best guide as to which explanatory variables are candidates for
measurement.
Ranges help Defining the expected ranges of values is useful for screening outliers.
screen outliers In the machining example , we would not expect to see many values
that vary more than +/- .005" from nominal. Therefore we know that
any values that are much beyond this interval are highly suspect and
should be remeasured.
Examples Click on each of the links below to see the parameter descriptions for
each of the case studies.
1. Case Study 1 (Sampling Plan)
2. Case Study 2 (Sampling Plan)
Passive data The second situation is when we are conducting a passive data
collection collection (PDC) study to learn about the inherent properties of a
process. These types of studies are usually for comparison purposes
when we wish to compare properties of processes against each other
or against some hypothesis. This is the situation that we will focus on
here.
There are two Once we have selected our response parameters, it would seem to be a
principles that rather straightforward exercise to take some measurements, calculate
guide our some statistics and draw conclusions. There are, however, many
choice of things which can go wrong along the way that can be avoided with
sampling careful planning and knowing what to watch for. There are two
scheme overriding principles that will guide the design of our sampling
scheme.
The first is The first principle is that of precision. If the sampling scheme is
precision properly laid out, the difference between our estimate of some
parameter of interest and its true value will be due only to random
variation. The size of this random variation is measured by a quantity
called the standard error, and precision refers to the magnitude of the
standard error. The smaller the standard error, the more precise are our
estimates.
The second is The second principle is the avoidance of systematic errors. Systematic
systematic sampling error occurs when the levels of one explanatory variable are
sampling error the same as some other unaccounted for explanatory variable. This is
(or also referred to as confounded effects. Systematic sampling error is
confounded best seen by example.
effects)
Stratification The way to combat systematic sampling errors (and at the same time
helps to increase precision) is through stratification and randomization.
overcome Stratification is the process of segmenting our population across
systematic levels of some factor so as to minimize variability within those
error segments or strata. For instance, if we want to try several different
process recipes to see which one is best, we may want to be sure to
apply each of the recipes to each of the three work shifts. This will
ensure that we eliminate any systematic errors caused by a shift effect.
This is where the ANOVA designs are particularly useful.
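As a small illustration of stratification combined with randomization (a hypothetical plan;
the recipe and shift names are ours, not the handbook's), each stratum receives every
recipe, in random order:

```python
import random

recipes = ["recipe_1", "recipe_2", "recipe_3"]   # hypothetical recipes
shifts = ["day", "swing", "night"]               # strata

random.seed(42)  # fixed seed so the plan is reproducible and auditable
plan = {}
for shift in shifts:
    order = recipes[:]       # every recipe appears in every stratum ...
    random.shuffle(order)    # ... but in a random run order within it
    plan[shift] = order

for shift, order in plan.items():
    print(shift, order)
```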
Examples The issues here are many and complicated. Click on each of the links
below to see the sampling schemes for each of the case studies.
1. Case Study 1 (Sampling Plan)
2. Case Study 2 (Sampling Plan)
Cost of The cost of sampling issue helps us determine how precise our
taking estimates should be. As we will see below, when choosing sample
samples sizes we need to select risk values. If the decisions we will make from
the sampling activity are very valuable, then we will want low risk
values and hence larger sample sizes.
Prior If our process has been studied before, we can use that prior
information information to reduce sample sizes. This can be done by using prior
mean and variance estimates and by stratifying the population to
reduce variation within groups.
Practicality Of course the sample size you select must make sense. This is where
the trade-offs usually occur. We want to take enough observations to
obtain reasonably precise estimates of the parameters of interest but we
also want to do this within a practical resource budget. The important
thing is to quantify the risks associated with the chosen sample size.
Sample size In summary, the steps involved in estimating a sample size are:
determination 1. There must be a statement about what is expected of the sample.
We must determine what it is we are trying to estimate, how
precise we want the estimate to be, and what we are going to do
with the estimate once we have it. This should easily be derived
from the goals.
2. We must find some equation that connects the desired precision
of the estimate with the sample size. This is a probability
statement. A couple are given below; see your statistician if
these are not appropriate for your situation.
3. This equation may contain unknown properties of the population
such as the mean or variance. This is where prior information
can help.
4. If you are stratifying the population in order to reduce variation,
sample size determination must be performed for each stratum.
5. The final sample size should be scrutinized for practicality. If it
is unacceptable, the only way to reduce it is to accept less
precision in the sample estimate.
Sampling When the parameter of interest is a proportion (such as process yield),
proportions the sample size needed to estimate the proportion to within an absolute
error d at a given confidence level is

n = z^2 p q / d^2

where z is the standard normal quantile corresponding to the chosen
confidence level, p is the prior estimate of the proportion, q = 1 - p,
and d is the desired accuracy of the estimate.
Example Let's say we have a new process we want to try. We plan to run the
new process and sample the output for yield (good/bad). Our current
process has been yielding 65% (p=.65, q=.35). We decide that we want
the estimate of the new process yield to be accurate to within d = .10
at 95% confidence (alpha = .05, z = 2). Using the formula above we get a
sample size estimate of n = 91. Thus, if we draw 91 random parts from
the output of the new process and estimate the yield, then we are 95%
sure the yield estimate is within .10 of the true process yield.
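A minimal sketch of this sample size computation (plain Python; the helper function name is
ours, not the handbook's), which reproduces n = 91:

```python
import math

def sample_size_for_proportion(p, d, z):
    """Sample size needed to estimate a proportion p to within
    absolute error d at the confidence level implied by z."""
    q = 1.0 - p
    return math.ceil(z ** 2 * p * q / d ** 2)

# Current yield 65%, desired accuracy +/- .10, z = 2 (~95% confidence)
print(sample_size_for_proportion(p=0.65, d=0.10, z=2))   # -> 91
```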
Estimating If instead of relative error we wish to use absolute error, the equation
location: for sample size looks a lot like the one for the case of proportions:
absolute
error

n = (z s / d)^2

where s is an estimate of the standard deviation of the population and d is
the desired absolute accuracy of the estimate of the mean.
Know the In the planning phase of the PPC, be sure to understand the entire data
process collection process. Things to watch out for include:
involved ● automatic measurement machines rejecting outliers
Table showing A partial list of these individuals along with their roles and potential
roles and responsibilities is given in the table below. There may be multiple
potential occurrences of each of these individuals across shifts or process steps,
responsibilities so be sure to include everyone.

Individual      Role                         Potential Responsibilities
Tool Owner      Controls Tool Operations     Schedules tool time; ensures tool
                                             state; advises on experimental design
Process Owner   Controls Process Recipe      Advises on experimental design;
                                             controls recipe settings
Tool Operator   Executes Experimental Plan   Executes experimental runs; may
                                             take measurements
Metrology       Owns Measurement Tools       Maintains metrology equipment;
                                             conducts gauge studies; may take
                                             measurements
Perform a The next step is to perform a quality check on the data. Here we are
quality typically looking for data entry problems, unusual data values, missing
check on the data, etc. The two most useful tools for this step are the scatter plot and
data using the histogram. By constructing scatter plots of all of the response
graphical variables, any data entry problems will be easily identified. Histograms
and of response variables are also quite useful for identifying data entry
numerical problems. Histograms of explanatory variables help identify problems
techniques with the execution of the sampling plan. If the counts for each level of
the explanatory variables are not the same as called for in the sampling
plan, you know you may have an execution problem. Running
numerical summary statistics on all of the variables (both response and
explanatory) also helps to identify data problems.
Summarize Once the data quality problems are identified and fixed, we should
data by estimate the location, spread and shape for all of the response variables.
estimating This is easily done with a combination of histograms and numerical
location, summary statistics.
spread and
shape
Graph In this exploratory phase, the key is to graph everything that makes
everything sense to graph. These pictures will not only reveal any additional
that makes quality problems with the data but will also reveal influential data
sense points and will guide the subsequent modeling activities.
Graph The order that generally proves most effective for data analysis is to
responses, first graph all of the responses against each other in a pairwise fashion.
then Then we graph responses against the explanatory variables. This will
explanatory give an indication of the main factors that have an effect on response
versus variables. Finally, we graph response variables, conditioned on the
response, levels of explanatory factors. This is what reveals interactions between
then explanatory variables. We will use nested boxplots and block plots to
conditional visualize interactions.
plots
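A sketch of this graphing sequence (assuming pandas and matplotlib, with synthetic data and
hypothetical column names) might look like the following:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical PPC data: two responses and two categorical factors.
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "machine": rng.integers(1, 4, n),           # factor with 3 levels
    "shift": rng.choice(["day", "night"], n),   # factor with 2 levels
    "yield_pct": rng.normal(90, 3, n),
    "thickness": rng.normal(560, 25, n),
})

# 1. Responses against each other, pairwise.
pd.plotting.scatter_matrix(df[["yield_pct", "thickness"]])

# 2. Each response against an explanatory factor.
df.boxplot(column="yield_pct", by="machine")

# 3. A response conditioned on two factors at once, a simple
#    stand-in for the nested box plots described above.
df.boxplot(column="yield_pct", by=["machine", "shift"])

plt.show()
```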
Check the The scatterplot matrix shows how the response variables are related to each other. If there is a linear
slope of trend with a positive slope, this indicates that the responses are positively correlated. If there is a
the data linear trend with a negative slope, then the variables are negatively correlated. If the data appear
on the random with no slope, the variables are probably not correlated. This will be important information
scatter for subsequent model building steps.
plots
This An example of a scatterplot matrix is given below. In this semiconductor manufacturing example,
scatterplot three responses, yield (Bin1), N-channel Id effective (NIDEFF), and P-channel Id effective
matrix (PIDEFF) are plotted against each other in a scatterplot matrix. We can see that Bin1 is positively
shows correlated with NIDEFF and negatively correlated with PIDEFF. Also, as expected, NIDEFF is
examples negatively correlated with PIDEFF. This kind of information will prove to be useful when we build
of both models for yield improvement.
negatively
and
positively
correlated
variables
Watch out This step is relatively self-explanatory. However, there are two points of caution. First, be cognizant
for varying of not only the trends in these graphs but also the amount of data represented in those trends. This is
sample especially true for categorical explanatory variables. There may be many more observations in some
sizes across levels of the categorical variable than in others. In any event, take unequal sample sizes into account
levels when making inferences.
Graph The second point is to be sure to graph the responses against implicit explanatory variables (such as
implicit as observation order) as well as the explicit explanatory variables. There may be interesting insights in
well as these hidden explanatory variables.
explicit
explanatory
variables
Example: In the example below, we have collected data on the particles added to a wafer during a particular
wafer processing step. We ran a number of cassettes through the process and sampled wafers from certain
processing slots in the cassette. We also kept track of which load lock the wafers passed through. This was done
for two different process temperatures. We measured both small particles (< 2 microns) and large
particles (> 2 microns). We plot the responses (particle counts) against each of the explanatory
variables.
Cassette This first graph is a box plot of the number of small particles added for each cassette type. The "X"'s
does not in the plot represent the maximum, median, and minimum number of particles.
appear to
be an
important
factor for
small or
large
particles
The second graph is a box plot of the number of large particles added for each cassette type.
We conclude from these two box plots that cassette does not appear to be an important factor for
small or large particles.
There is a We next generate box plots of small and large particles for the slot variable. First, the box plot for
difference small particles.
between
slots for
small
particles,
one slot is
different for
large
particles
We conclude that there is a difference between slots for small particles. We also conclude that one
slot appears to be different for large particles.
Load lock We next generate box plots of small and large particles for the load lock variable. First, the box plot
may have a for small particles.
slight effect
for small
and large
particles
We conclude that there may be a slight effect for load lock for small and large particles.
For small We next generate box plots of small and large particles for the temperature variable. First, the box
particles, plot for small particles.
temperature
has a
strong
effect on
both
location
and spread.
For large
particles,
there may
be a slight
temperature
effect but
this may
just be due
to the
outliers
We conclude that temperature has a strong effect on both location and spread for small particles. We
conclude that there might be a small temperature effect for large particles, but this may just be due to
outliers.
The eyes Interactions can be seen visually by using nested box plots. However, caution should be exercised
can be when identifying interactions through graphical means alone. Any graphically identified interactions
deceiving - should be verified by numerical methods as well.
be careful
Previous To continue the previous example, given below are nested box plots of the small and large particles.
example The load lock is nested within the two temperature values. There is some evidence of possible
continued interaction between these two factors. The effect of load lock is stronger at the lower temperature
than at the higher one. This effect is stronger for the smaller particles than for the larger ones. As
this example illustrates, when you have significant interactions the main effects must be interpreted
conditionally. That is, the main effects do not tell the whole story by themselves.
For small The following is the box plot of small particles for load lock nested within temperature.
particles,
the load
lock effect
is not as
strong for
high
temperature
as it is for
low
temperature
We conclude from this plot that for small particles, the load lock effect is not as strong for high
temperature as it is for low temperature.
The same The following is the box plot of large particles for load lock nested within temperature.
may be true
for large
particles
but not as
strongly
We conclude from this plot that for large particles, the load lock effect may not be as strong for high
temperature as it is for low temperature. However, this effect is not as strong as it is for small
particles.
In our data
collection plan
we drew
process model
pictures
Polynomial There are two cases that we will cover for building mathematical models. If our
models are goal is to develop an empirical prediction equation or to identify statistically
generic significant explanatory variables and quantify their influence on output
descriptors of responses, we typically build polynomial models. As the name implies, these are
our output polynomial functions (typically linear or quadratic functions) that describe the
surface relationships between the explanatory variables and the response variable.
Physical On the other hand, if our goal is to fit an existing theoretical equation, then we
models want to build physical models. Again, as the name implies, this pertains to the
describe the case when we already have equations representing the physics involved in the
underlying process and we want to estimate specific parameter values.
physics of our
processes
Use multiple When the number of factors is small (less than 5), the complete
regression to polynomial equation can be fitted using the technique known as
fit multiple regression. When the number of factors is large, we should use
polynomial a technique known as stepwise regression. Most statistical analysis
models programs have a stepwise regression capability. We just enter all of the
terms of the polynomial models and let the software choose which
terms best describe the data. For a more thorough discussion of this
topic and some examples, refer to the process improvement chapter.
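As a sketch of fitting a complete polynomial model by multiple regression (numpy only; the
two factors and the data are hypothetical), we can build the design matrix for a full
quadratic model in two factors and solve it by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two explanatory factors and one response.
x1 = rng.uniform(0, 1, 30)
x2 = rng.uniform(0, 1, 30)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + 0.5 * x1 * x2 + rng.normal(0, 0.1, 30)

# Full quadratic polynomial model:
# y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + b4*x1^2 + b5*x2^2 + error
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)
```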
We will We will illustrate this concept with an example. We have collected data on a
use a chemical/mechanical planarization process (CMP) at a particular semiconductor
CMP processing step. In this process, wafers are polished using a combination of
process to chemicals in a polishing slurry using polishing pads. We polished a number of
illustrate wafers for differing periods of time in order to calculate material removal rates.
CMP From first principles we know that removal rate changes with time. Early on,
removal removal rate is high and as the wafer becomes more planar the removal rate
rate can declines. This is easily modeled with an exponential function of the form:
be removal rate = p1 + p2 * exp(p3 * time)
modeled
with a where p1, p2, and p3 are the parameters we want to estimate.
non-linear
equation
A The equation was fit to the data using a non-linear regression routine. A plot of
non-linear the original data and the fitted line are given in the image below. The fit is quite
regression good. This fitted equation was subsequently used in process optimization work.
routine
was used
to fit the
data to
the
equation
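A minimal non-linear regression sketch using scipy's curve_fit (the time and removal-rate
values below are synthetic stand-ins, not the handbook's CMP data):

```python
import numpy as np
from scipy.optimize import curve_fit

def removal_rate(time, p1, p2, p3):
    # removal rate = p1 + p2 * exp(p3 * time); p3 < 0 gives the
    # declining removal rate described above.
    return p1 + p2 * np.exp(p3 * time)

# Synthetic polish times (minutes) and removal rates (arbitrary units).
t = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0])
rate = np.array([9.1, 7.8, 6.9, 6.2, 5.4, 5.1, 5.0])

params, cov = curve_fit(removal_rate, t, rate, p0=(5.0, 5.0, -1.0))
print("p1, p2, p3 =", params)
```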
The key is The key to performing an analysis of variance is identifying the structure represented by the
to know the data. In the ANOVA models section we discussed one-way layouts and two-way layouts where
structure the factors are either crossed or nested. Review these sections if you want to learn more about
ANOVA structural layouts.
To perform the analysis, we just identify the structure, enter the data for each of the factors and
levels into a statistical analysis program and then interpret the ANOVA table and other output.
This is all illustrated in the example below.
Example: The example is a furnace operation in semiconductor manufacture where we are growing an
furnace oxide layer on a wafer. Each lot of wafers is placed on quartz containers (boats) and then placed
oxide in a long tube-furnace. They are then raised to a certain temperature and held for a period of
thickness time in a gas flow. We want to understand the important factors in this operation. The furnace is
with a broken down into four sections (zones) and two wafers from each lot in each zone are measured
1-way for the thickness of the oxide layer.
layout
Look at The first thing to look at is the effect of zone location on the oxide thickness. This is a classic
effect of one-way layout. The factor is furnace zone and we have four levels. A plot of the data and an
zone ANOVA table are given below.
location on
oxide
thickness
The zone
effect is
masked by
the
lot-to-lot
variation
Let's From the graph there does not appear to be much of a zone effect; in fact, the ANOVA table
account for indicates that it is not significant. The problem is that variation due to lots is so large that it is
lot with a masking the zone effect. We can fix this by adding a factor for lot. By treating this as a nested
nested two-way layout, we obtain the ANOVA table below.
layout
Conclusions Since the "Prob > F" is less than .05, for both lot and zone, we know that these factors are
statistically significant at the 95% level of confidence.
The graphical The graphical tool we use to assess process stability is the scatter
tool we use to plot. We collect a sufficient number of independent samples (greater
assess stability than 100) from our process over a sufficiently long period of time
is the scatter (this can be specified in days, hours of processing time or number of
plot or the parts processed) and plot them on a scatter plot with sample order on
control chart the x-axis and the sample value on the y-axis. The plot should look
like constant random variation about a constant mean. Sometimes it
is helpful to calculate control limits and plot them on the scatter plot
along with the data. The two plots in the controlled variation
example are good illustrations of stable and unstable processes.
Graphical Typically, graphical methods are good enough for evaluating process
methods stability. The numerical methods are generally only used for
usually good modeling purposes.
enough
Use a Graphically, we assess process capability by plotting the process specification limits on a histogram
capability of the observations. If the histogram falls within the specification limits, then the process is capable.
chart This is illustrated in the graph below. Note how the process is shifted below target and the process
variation is too large. This is an example of an incapable process.
Notice how
the process is
off target and
has too much
variation
Numerically, Numerically, we measure capability with a capability index. The general
we use the Cp equation for the capability index, Cp, is:
index
Cp = (USL - LSL) / (6s)
where USL and LSL are the upper and lower specification limits and s is the
estimated standard deviation of the process.
Interpretation This equation just says that the measure of our process capability is how much of our observed
of the Cp process variation is covered by the process specifications. In this case the process variation is
index measured by 6 standard deviations (+/- 3 on each side of the mean). Clearly, if Cp > 1.0, then the
process specification covers almost all of our process observations.
Cp does not The only problem with the Cp index is that it does not account for a process that is off-center.
account for a We can modify this equation slightly to account for off-center processes to obtain the Cpk
process that index as follows:
is off center
Or the Cpk
index
Cpk = min(USL - xbar, xbar - LSL) / (3s)
where xbar is the process mean.
Cpk accounts This equation just says to take the minimum distance between our specification limits and the
for a process process mean and divide it by 3 standard deviations to arrive at the measure of process capability.
being off This is all covered in more detail in the process capability section of the process monitoring chapter.
center For the example above, note how the Cpk value is less than the Cp value. This is because the process
distribution is not centered between the specification limits.
Some causes There are several things that could cause the data to appear non-normal, such as:
of non- ● The data come from two or more different sources. This type of data will often have a
normality multi-modal distribution. This can be solved by identifying the reason for the multiple sets of
data and analyzing the data separately.
● The data come from an unstable process. This type of data is nearly impossible to analyze
because the results of the analysis will have no credibility due to the changing nature of the
process.
● The data were generated by a stable, yet fundamentally non-normal mechanism. For example,
particle counts are non-normal by the very nature of the particle generation process. Data of
this type can be handled using transformations.
We can For the last case, we could try transforming the data using what is known as a power
sometimes transformation. The power transformation is given by the equation:
transform the
data to make it
look normal
Y' = Y^lambda (for lambda not equal to 0);  Y' = ln(Y) (for lambda = 0)
where Y represents the data and lambda is the transformation value. Lambda is typically any value
between -2 and 2. Some of the more common values for lambda are 0, 1/2, and -1, which give the
following transformations:
lambda = 0: ln(Y)     lambda = 1/2: sqrt(Y)     lambda = -1: 1/Y
General The general algorithm for trying to make non-normal data appear to be approximately normal is to:
algorithm for 1. Determine if the data are non-normal. (Use normal probability plot and histogram).
trying to make
non-normal 2. Find a transformation that makes the data look approximately normal, if possible. Some data
data sets may include zeros (i.e., particle data). If the data set does include zeros, you must first
approximately add a constant value to the data and then transform the results.
normal
Example: As an example, let's look at some particle count data from a semiconductor processing step. Count
particle count data are inherently non-normal. Below are histograms and normal probability plots for the original
data data and the ln, sqrt and inverse of the data. You can see that the log transform does the best job of
making the data appear as if it is normal. All analyses can be performed on the log-transformed data
and the assumptions will be approximately satisfied.
The original
data is
non-normal,
the log
transform
looks fairly
normal
Neither the
square root
nor the inverse
transformation
looks normal
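A sketch of this transformation check (assuming numpy and scipy, with synthetic counts
standing in for the handbook's particle data): apply the candidate transformations and
compare normality with a formal test such as Shapiro-Wilk, where higher p-values indicate
closer agreement with normality.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic particle-count-like data: skewed, strictly positive.
# (If real counts include zeros, add a constant before transforming.)
counts = rng.lognormal(mean=2.0, sigma=0.8, size=200)

candidates = {
    "raw":     counts,
    "log":     np.log(counts),     # lambda = 0
    "sqrt":    np.sqrt(counts),    # lambda = 1/2
    "inverse": 1.0 / counts,       # lambda = -1
}

for name, values in candidates.items():
    w, p = stats.shapiro(values)
    print(f"{name:8s} W = {w:.3f}, p = {p:.4f}")
```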
Table of The case study is broken down into the following steps.
Contents 1. Background and Data
2. Initial Analysis of Response Variable
3. Identify Sources of Variation
4. Analysis of Variance
5. Final Conclusions
6. Work This Example Yourself
Process In the picture below we are modeling this process with one output
Model (film thickness) that is influenced by four controlled factors (gas flow,
pressure, temperature and time) and two uncontrolled factors (run and
zone). The four controlled factors are part of our recipe and will
remain constant throughout this study. We know that there is
run-to-run variation that is due to many different factors (input
material variation, variation in consumables, etc.). We also know that
the different zones in the furnace have an effect. A zone is a region of
the furnace tube that holds one boat. There are four zones in these
tubes. The zones in the middle of the tube grow oxide a little bit
differently from the ones on the ends. In fact, there are temperature
offsets in the recipe to help minimize this problem.
Sensitivity The sensitivity model for this process is fairly straightforward and is
Model given in the figure below. The effects of the machine are mostly related
to the preventative maintenance (PM) cycle. We want to make sure the
quartz tube has been cleaned recently, the mass flow controllers are in
good shape and the temperature controller has been calibrated recently.
The same is true of the measurement equipment where the thickness
readings will be taken. We want to make sure a gauge study has been
performed. For material, the incoming wafers will certainly have an
effect on the outgoing thickness as well as the quality of the gases used.
Finally, the recipe will have an effect including gas flow, temperature
offset for the different zones, and temperature profile (how quickly we
raise the temperature, how long we hold it and how quickly we cool it
off).
Sampling Given our goal statement and process modeling, we can now define a
Plan sampling plan. The primary goal is to determine if the process is
capable. This just means that we need to monitor the process over some
period of time and compare the estimates of process location and spread
to the specifications. An additional goal is to identify sources of
variation to aid in setting up a process control strategy. Some obvious
sources of variation are incoming wafers, run-to-run variability,
variation due to operators or shift, and variation due to zones within a
furnace tube. One additional constraint that we must work under is that
this study should not have a significant impact on normal production
operations.
Given these constraints, the following sampling plan was selected. It
was decided to monitor the process for one day (three shifts). Because
this process is operator independent, we will not keep shift or operator
information but just record run number. For each run, we will randomly
assign cassettes of wafers to a zone. We will select two wafers from
each zone after processing and measure two sites on each wafer. This
plan should give reasonable estimates of run-to-run variation and within
zone variability as well as good overall estimates of process location and
spread.
We are expecting readings around 560 Angstroms. We would not expect
many readings above 700 or below 400. The measurement equipment is
accurate to within 0.5 Angstroms which is well within the accuracy
needed for this study.
Data The following are the data that were collected for this study. The four
columns are run number, furnace zone, wafer, and film thickness (in
Angstroms).
7 1 1 593
7 1 2 626
7 2 1 584
7 2 2 559
7 3 1 634
7 3 2 598
7 4 1 569
7 4 2 592
8 1 1 522
8 1 2 535
8 2 1 535
8 2 2 581
8 3 1 527
8 3 2 520
8 4 1 532
8 4 2 539
9 1 1 562
9 1 2 568
9 2 1 548
9 2 2 548
9 3 1 533
9 3 2 553
9 4 1 533
9 4 2 521
10 1 1 555
10 1 2 545
10 2 1 584
10 2 2 572
10 3 1 546
10 3 2 552
10 4 1 586
10 4 2 584
11 1 1 565
11 1 2 557
11 2 1 583
11 2 2 585
11 3 1 582
11 3 2 567
11 4 1 549
11 4 2 533
12 1 1 548
12 1 2 528
12 2 1 563
12 2 2 588
12 3 1 543
12 3 2 540
12 4 1 585
12 4 2 586
13 1 1 580
13 1 2 570
13 2 1 556
13 2 2 569
13 3 1 609
13 3 2 625
13 4 1 570
13 4 2 595
14 1 1 564
14 1 2 555
14 2 1 585
14 2 2 588
14 3 1 564
14 3 2 583
14 4 1 563
14 4 2 558
15 1 1 550
15 1 2 557
15 2 1 538
15 2 2 525
15 3 1 556
15 3 2 547
15 4 1 534
15 4 2 542
16 1 1 552
16 1 2 547
16 2 1 563
16 2 2 578
16 3 1 571
16 3 2 572
16 4 1 575
16 4 2 584
17 1 1 549
17 1 2 546
17 2 1 584
17 2 2 593
17 3 1 567
17 3 2 548
17 4 1 606
17 4 2 607
18 1 1 539
18 1 2 554
18 2 1 533
18 2 2 535
18 3 1 522
18 3 2 521
18 4 1 547
18 4 2 550
19 1 1 610
19 1 2 592
19 2 1 587
19 2 2 587
19 3 1 572
19 3 2 612
19 4 1 566
19 4 2 563
20 1 1 569
20 1 2 609
20 2 1 558
20 2 2 555
20 3 1 577
20 3 2 579
20 4 1 552
20 4 2 558
21 1 1 595
21 1 2 583
21 2 1 599
21 2 2 602
21 3 1 598
21 3 2 616
21 4 1 580
21 4 2 575
Conclusions We can make the following conclusions based on these initial plots.
From the ● The box plot indicates one outlier. However, this outlier is only slightly smaller than the
Plots other numbers.
● The normal probability plot and the histogram (with an overlaid normal density) indicate
that this data set is reasonably approximated by a normal distribution.
Parameter Parameter estimates for the film thickness are summarized in the
Estimates following table.
Parameter Estimates

Type         Parameter            Estimate   Lower (95%)        Upper (95%)
                                             Confidence Bound   Confidence Bound
Location     Mean                 563.0357   559.1692           566.9023
Dispersion   Standard Deviation   25.3847    22.9297            28.4331
Quantiles Quantiles for the film thickness are summarized in the following table.
Quantiles for Film Thickness
100.0% Maximum 634.00
99.5% 634.00
97.5% 615.10
90.0% 595.00
75.0% Upper Quartile 582.75
50.0% Median 562.50
25.0% Lower Quartile 546.25
10.0% 532.90
2.5% 514.23
0.5% 487.00
0.0% Minimum 487.00
Capability From the above preliminary analysis, it looks reasonable to proceed with the capability
Analysis analysis.
****************************************************
* CAPABILITY ANALYSIS *
* NUMBER OF OBSERVATIONS = 168 *
* MEAN = 563.03571 *
* STANDARD DEVIATION = 25.38468 *
****************************************************
* LOWER SPEC LIMIT (LSL) = 460.00000 *
Summary of From the above capability analysis output, we can summarize the percent defective (i.e.,
Percent the number of items outside the specification limits) in the following table.
Defective
Percentage Outside Specification Limits

Specification               Value      Percent Actual   Theoretical (% Based On Normal)
Lower Specification Limit   460        0.0000           Percent Below LSL = 100*Phi((LSL - xbar)/s) = 0.0025%
Upper Specification Limit   660        0.0000           Percent Above USL = 100*(1 - Phi((USL - xbar)/s)) = 0.0067%
Specification Target        560        0.0000           Combined Percent Below LSL and Above USL = 0.0091%
Standard Deviation          25.38468

with Phi denoting the normal cumulative distribution function, xbar the sample mean, and s
the sample standard deviation.
Summary of From the above capability analysis output, we can summarize various capability index
Capability statistics in the following table.
Index
Statistics Capability Index Statistics
Capability Statistic Index Lower CI Upper CI
CP 1.313 1.172 1.454
CPK 1.273 1.128 1.419
CPM 1.304 1.165 1.442
CPL 1.353 1.218 1.488
CPU 1.273 1.142 1.404
Conclusions The above capability analysis indicates that the process is capable and we can proceed
with the analysis.
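A minimal sketch of the capability index computations (plain Python; the function name is
ours, not the handbook's) that reproduces the CP and CPK values above from the summary
statistics:

```python
def capability_indices(mean, std, lsl, usl):
    """Cp compares the spec width to the 6-sigma process spread;
    Cpk additionally penalizes an off-center process."""
    cp = (usl - lsl) / (6 * std)
    cpk = min(usl - mean, mean - lsl) / (3 * std)
    return cp, cpk

# Summary statistics from the capability analysis above.
cp, cpk = capability_indices(mean=563.03571, std=25.38468,
                             lsl=460.0, usl=660.0)
print(f"Cp = {cp:.3f}, Cpk = {cpk:.3f}")   # Cp = 1.313, Cpk = 1.273
```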
Box Plot by The following is a box plot of the thickness by run number.
Run
Conclusions We can make the following conclusions from this box plot.
From Box 1. There is significant run-to-run variation.
Plot
2. Although the means of the runs are different, there is no discernible trend due to run.
3. In addition to the run-to-run variation, there is significant within-run variation as well. This
suggests that a box plot by furnace location may be useful as well.
Box Plot by The following is a box plot of the thickness by furnace location.
Furnace
Location
Conclusions We can make the following conclusions from this box plot.
From Box 1. There is considerable variation within a given furnace location.
Plot
2. The variation between furnace locations is small. That is, the locations and scales of each
of the four furnace locations are fairly comparable (although furnace location 3 seems to
have a few mild outliers).
Conclusion From this box plot, we conclude that wafer does not seem to be a significant factor.
From Box
Plot
Block Plot In order to show the combined effects of run, furnace location, and wafer, we draw a block plot of
the thickness. Note that for aesthetic reasons, we have used connecting lines rather than enclosing
boxes.
Conclusions We can draw the following conclusions from this block plot.
From Block 1. There is significant variation both between runs and between furnace locations. The
Plot between-run variation appears to be greater.
2. Run 3 seems to be an outlier.
Components From the above analysis of variance table, we can compute the
of Variance components of variance. Recall that for this data set we have 2 wafers
measured at 4 furnace locations for 21 runs. This leads to the following
set of equations.
3072.11 = (4*2)*Var(Run) + 2*Var(Furnace Location) +
Var(Within)
571.659 = 2*Var(Furnace Location) + Var(Within)
120.893 = Var(Within)
Solving these equations yields the following components of variance
table.
Components of Variance

Component               Variance Component   Percent of Total   Sqrt(Variance Component)
Run                     312.55694            47.44              17.679
Furnace Location[Run]   225.38294            34.21              15.013
Within                  120.89286            18.35              10.995
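The solve above is just back-substitution on a triangular system; a minimal numpy sketch
that reproduces the components of variance table:

```python
import numpy as np

# Expected mean squares for the nested design (2 wafers, 4 furnace
# locations, 21 runs):
#   MS(run)      = 8*Var(Run) + 2*Var(Loc) + Var(Within) = 3072.11
#   MS(location) =              2*Var(Loc) + Var(Within) =  571.659
#   MS(within)   =                           Var(Within) =  120.893
A = np.array([[8.0, 2.0, 1.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0]])
ms = np.array([3072.11, 571.659, 120.893])

var_run, var_loc, var_within = np.linalg.solve(A, ms)
total = var_run + var_loc + var_within
for name, v in [("Run", var_run), ("Furnace Location[Run]", var_loc),
                ("Within", var_within)]:
    print(f"{name:22s} {v:10.5f}  {100*v/total:6.2f}%  {np.sqrt(v):7.3f}")
```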
Table of The case study is broken down into the following steps.
Contents 1. Background and Data
2. Box Plots by Factor
3. Analysis of Variance
4. Throughput
5. Final Conclusions
6. Work This Example Yourself
Goal The goal of this study is to determine which machine is least stable in
manufacturing a steel pin with a diameter of .125 +/- .003 inches.
Stability will be measured in terms of a constant variance about a
constant mean. If all machines are stable, the decision will be based on
process variability and throughput. Namely, the machine with the
highest variability and lowest throughput will be selected for
replacement.
Process The process model for this operation is trivial and need not be
Model addressed.
Sensitivity The sensitivity model, however, is important and is given in the figure
Model below. The material is not very important. All machines will receive
barstock from the same source and the coolant will be the same. The
method is important. Each machine is slightly different and the
operator must make adjustments to the speed (how fast the part
rotates), feed (how quickly the cut is made) and stops (where cuts are
finished) for each machine. The same operator will be running all three
machines simultaneously. Measurement is not too important. An
experienced QC engineer will be collecting the samples and making
the measurements. Finally, the machine condition is really what this
study is all about. The wear on the ways and the lead screws will
largely determine the stability of the machining process. Also, tool
wear is important. The same type of tool inserts will be used on all
three machines. The tool insert wear will be monitored by the operator.
Sampling Given our goal statement and process modeling, we can now define a sampling
Plan plan. The primary goal is to determine if the process is stable and to compare the
variances of the three machines. We also need to monitor throughput so that we
can compare the productivity of the three machines.
There is an upcoming three-day run of the particular part of interest, so this
study will be conducted on that run. There is a suspected time-of-day effect that
we must account for. It is sometimes the case that the machines do not perform
as well in the morning, when they are first started up, as they do later in the day.
To account for this we will sample parts in the morning and in the afternoon. So
as not to impact other QC operations too severely, it was decided to sample 10
parts, twice a day, for three days from each of the three machines. Daily
throughput will be recorded as well.
We are expecting readings around .125 +/- .003 inches. The parts will be
measured using a standard micrometer with readings recorded to 0.0001 of an
inch. Throughput will be measured by reading the part counters on the machines
at the end of each day.
Data The following are the data that were collected for this study.
MACHINE   DAY     TIME     SAMPLE   DIAMETER
(1-3)     (1-3)   1 = AM   (1-10)   (inches)
                  2 = PM
------------------------------------------------------
1 1 1 1 0.1247
1 1 1 2 0.1264
1 1 1 3 0.1252
1 1 1 4 0.1253
1 1 1 5 0.1263
1 1 1 6 0.1251
1 1 1 7 0.1254
1 1 1 8 0.1239
1 1 1 9 0.1235
1 1 1 10 0.1257
1 1 2 1 0.1271
1 1 2 2 0.1253
1 1 2 3 0.1265
1 1 2 4 0.1254
1 1 2 5 0.1243
1 1 2 6 0.124
1 1 2 7 0.1246
1 1 2 8 0.1244
1 1 2 9 0.1271
1 1 2 10 0.1241
1 2 1 1 0.1251
1 2 1 2 0.1238
1 2 1 3 0.1255
1 2 1 4 0.1234
1 2 1 5 0.1235
1 2 1 6 0.1266
1 2 1 7 0.125
1 2 1 8 0.1246
1 2 1 9 0.1243
1 2 1 10 0.1248
1 2 2 1 0.1248
1 2 2 2 0.1235
1 2 2 3 0.1243
1 2 2 4 0.1265
1 2 2 5 0.127
1 2 2 6 0.1229
1 2 2 7 0.125
1 2 2 8 0.1248
1 2 2 9 0.1252
1 2 2 10 0.1243
1 3 1 1 0.1255
1 3 1 2 0.1237
1 3 1 3 0.1235
1 3 1 4 0.1264
1 3 1 5 0.1239
1 3 1 6 0.1266
1 3 1 7 0.1242
1 3 1 8 0.1231
1 3 1 9 0.1232
1 3 1 10 0.1244
1 3 2 1 0.1233
1 3 2 2 0.1237
1 3 2 3 0.1244
1 3 2 4 0.1254
1 3 2 5 0.1247
1 3 2 6 0.1254
1 3 2 7 0.1258
1 3 2 8 0.126
1 3 2 9 0.1235
1 3 2 10 0.1273
2 1 1 1 0.1239
2 1 1 2 0.1239
2 1 1 3 0.1239
2 1 1 4 0.1231
2 1 1 5 0.1221
2 1 1 6 0.1216
2 1 1 7 0.1233
2 1 1 8 0.1228
2 1 1 9 0.1227
2 1 1 10 0.1229
2 1 2 1 0.122
2 1 2 2 0.1239
2 1 2 3 0.1237
2 1 2 4 0.1216
2 1 2 5 0.1235
2 1 2 6 0.124
2 1 2 7 0.1224
2 1 2 8 0.1236
2 1 2 9 0.1236
2 1 2 10 0.1217
2 2 1 1 0.1247
2 2 1 2 0.122
2 2 1 3 0.1218
2 2 1 4 0.1237
2 2 1 5 0.1234
2 2 1 6 0.1229
2 2 1 7 0.1235
2 2 1 8 0.1237
2 2 1 9 0.1224
2 2 1 10 0.1224
2 2 2 1 0.1239
2 2 2 2 0.1226
2 2 2 3 0.1224
2 2 2 4 0.1239
2 2 2 5 0.1237
2 2 2 6 0.1227
2 2 2 7 0.1218
2 2 2 8 0.122
2 2 2 9 0.1231
2 2 2 10 0.1244
2 3 1 1 0.1219
2 3 1 2 0.1243
2 3 1 3 0.1231
2 3 1 4 0.1223
2 3 1 5 0.1218
2 3 1 6 0.1218
2 3 1 7 0.1225
2 3 1 8 0.1238
2 3 1 9 0.1244
2 3 1 10 0.1236
2 3 2 1 0.1231
2 3 2 2 0.1223
2 3 2 3 0.1241
2 3 2 4 0.1215
2 3 2 5 0.1221
2 3 2 6 0.1236
2 3 2 7 0.1229
2 3 2 8 0.1205
2 3 2 9 0.1241
2 3 2 10 0.1232
3 1 1 1 0.1255
3 1 1 2 0.1215
3 1 1 3 0.1219
3 1 1 4 0.1253
3 1 1 5 0.1232
3 1 1 6 0.1266
3 1 1 7 0.1271
3 1 1 8 0.1209
3 1 1 9 0.1212
3 1 1 10 0.1249
3 1 2 1 0.1228
3 1 2 2 0.126
3 1 2 3 0.1242
3 1 2 4 0.1236
3 1 2 5 0.1248
3 1 2 6 0.1243
3 1 2 7 0.126
3 1 2 8 0.1231
3 1 2 9 0.1234
3 1 2 10 0.1246
3 2 1 1 0.1207
3 2 1 2 0.1279
3 2 1 3 0.1268
3 2 1 4 0.1222
3 2 1 5 0.1244
3 2 1 6 0.1225
3 2 1 7 0.1234
3 2 1 8 0.1244
3 2 1 9 0.1207
3 2 1 10 0.1264
3 2 2 1 0.1224
3 2 2 2 0.1254
3 2 2 3 0.1237
3 2 2 4 0.1254
3 2 2 5 0.1269
3 2 2 6 0.1236
3 2 2 7 0.1248
3 2 2 8 0.1253
3 2 2 9 0.1252
3 2 2 10 0.1237
3 3 1 1 0.1217
3 3 1 2 0.122
3 3 1 3 0.1227
3 3 1 4 0.1202
3 3 1 5 0.127
3 3 1 6 0.1224
3 3 1 7 0.1219
3 3 1 8 0.1266
3 3 1 9 0.1254
3 3 1 10 0.1258
3 3 2 1 0.1236
3 3 2 2 0.1247
3 3 2 3 0.124
3 3 2 4 0.1235
3 3 2 5 0.124
3 3 2 6 0.1217
3 3 2 7 0.1235
3 3 2 8 0.1242
3 3 2 9 0.1247
3 3 2 10 0.125
Conclusions We can make the following conclusions from this box plot.
From Box 1. The location appears to be significantly different for the three machines, with machine 2
Plot having the smallest median diameter and machine 1 having the largest median diameter.
2. Machines 1 and 2 have comparable variability while machine 3 has somewhat larger
variability.
Conclusions We can draw the following conclusion from this box plot. Neither the location nor the spread
From Box seem to differ significantly by day.
Plot
Conclusion We can draw the following conclusion from this box plot. Neither the location nor the spread
From Box seem to differ significantly by time of day.
Plot
Conclusion We can draw the following conclusion from this box plot. Although there are some minor
From Box differences in location and spread between the samples, these differences do not show a
Plot noticeable pattern and do not seem significant.
**********************************
**********************************
** 4-WAY ANALYSIS OF VARIANCE **
**********************************
**********************************
*****************
* ANOVA TABLE *
*****************
FACTOR 4        9    0.000009    0.000001      0.5205   14.172%
-------------------------------------------------------------------------------
RESIDUAL      165    0.000312    0.000002
****************
* ESTIMATION *
****************
GRAND MEAN =
0.12395893037E+00
GRAND STANDARD DEVIATION =
0.15631503193E-02
--   9.00000     18.     0.12376    -0.00020     0.00031
--  10.00000     18.     0.12440     0.00044     0.00031
Interpretation The first thing to note is that Dataplot fits an overall mean when
of ANOVA performing the ANOVA. That is, it fits a model consisting of a grand
Output mean plus an effect term for each level of each factor.
Analysis of The previous analysis of variance indicated that only the machine
Variance factor was statistically significant. The following shows the ANOVA
Using Only output using only the machine factor.
Machine
**********************************
**********************************
** 1-WAY ANALYSIS OF VARIANCE **
**********************************
**********************************
*****************
* ANOVA TABLE *
*****************
****************
* ESTIMATION *
****************
GRAND MEAN =
0.12395893037E+00
GRAND STANDARD DEVIATION =
0.15631503193E-02
Interpretation At this stage, we are interested in the effect estimates for the machine variable. These can be
of ANOVA summarized in the following table.
Output
Means for Oneway Anova
Level Number Mean Standard Error Lower 95% CI Upper 95% CI
1 60 0.124887 0.00018 0.12454 0.12523
2 60 0.122968 0.00018 0.12262 0.12331
3 60 0.124022 0.00018 0.12368 0.12437
The Dataplot macro file shows the computations required to go from the Dataplot ANOVA
output to the numbers in the above table.
Model As a final step, we validate the model by generating a 4-plot of the residuals.
Validation
The 4-plot does not indicate any significant problems with the ANOVA model.
3.5.2.4. Throughput
Summary of Throughput
The throughput is summarized in the following table (this was part of the original data collection, not the result of analysis).
Machine Day 1 Day 2 Day 3
1 576 604 583
2 657 604 586
3 510 546 571
This table shows that machine 3 had significantly lower throughput.
Analysis of Variance for Throughput
We can confirm the statistical significance of the lower throughput of machine 3 by running an analysis of variance.
**********************************
**********************************
** 1-WAY ANALYSIS OF VARIANCE **
**********************************
**********************************
NUMBER OF OBSERVATIONS = 9
NUMBER OF FACTORS = 1
NUMBER OF LEVELS FOR FACTOR 1 = 3
BALANCED CASE
RESIDUAL STANDARD DEVIATION =
0.28953985214E+02
RESIDUAL DEGREES OF FREEDOM = 6
REPLICATION CASE
REPLICATION STANDARD DEVIATION =
0.28953985214E+02
REPLICATION DEGREES OF FREEDOM = 6
NUMBER OF DISTINCT CELLS = 3
*****************
* ANOVA TABLE *
*****************
****************
* ESTIMATION *
****************
GRAND MEAN =
0.58188891602E+03
GRAND STANDARD DEVIATION =
0.40692272186E+02
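A quick cross-check of this one-way ANOVA is possible with scipy, using the nine throughput values from the summary table; a sketch:

    from scipy import stats

    machine1 = [576, 604, 583]
    machine2 = [657, 604, 586]
    machine3 = [510, 546, 571]

    # Prints the F statistic and p-value for comparing the three machines.
    f, p = stats.f_oneway(machine1, machine2, machine3)
    print(f, p)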
4. Process Modeling
The goal for this chapter is to present the background and specific analysis techniques
needed to construct a statistical model that describes a particular scientific or
engineering process. The types of models discussed in this chapter are limited to those
based on an explicit mathematical function. These types of models can be used for
prediction of process outputs, for calibration, or for process optimization.
1. Introduction
   1. Definition
   2. Terminology
   3. Uses
   4. Methods
2. Assumptions
   1. Assumptions
3. Design
   1. Definition
   2. Importance
   3. Design Principles
   4. Optimal Designs
   5. Assessment
4. Analysis
   1. Modeling Steps
   2. Model Selection
   3. Model Fitting
   4. Model Validation
   5. Model Improvement
1. Introduction to Process Modeling [4.1.]
1. What is process modeling? [4.1.1.]
2. What terminology do statisticians use to describe process models? [4.1.2.]
3. What are process models used for? [4.1.3.]
1. Estimation [4.1.3.1.]
2. Prediction [4.1.3.2.]
3. Calibration [4.1.3.3.]
4. Optimization [4.1.3.4.]
4. What are some of the different statistical methods for model building? [4.1.4.]
1. Linear Least Squares Regression [4.1.4.1.]
2. Nonlinear Least Squares Regression [4.1.4.2.]
3. Weighted Least Squares Regression [4.1.4.3.]
4. LOESS (aka LOWESS) [4.1.4.4.]
4.1. Introduction to Process Modeling
Example For example, the total variation of the measured pressure of a fixed amount of a gas in a tank can
be described by partitioning the variability into its deterministic part, which is a function of the
temperature of the gas, plus some left-over random error. Charles' Law states that the pressure of
a gas is proportional to its temperature under the conditions described here, and in this case most
of the variation will be deterministic. However, due to measurement error in the pressure gauge,
the relationship will not be purely deterministic. The random errors cannot be characterized
individually, but will follow some probability distribution that will describe the relative
frequencies of occurrence of different-sized errors.
Graphical Interpretation
Using the example above, the definition of process modeling can be graphically depicted like this:
The top left plot in the figure shows pressure data that vary deterministically with temperature
except for a small amount of random error. The relationship between pressure and temperature is
a straight line, but not a perfect straight line. The top row plots on the right-hand side of the
equals sign show a partitioning of the data into a perfect straight line and the remaining
"unexplained" random variation in the data (note the different vertical scales of these plots). The
plots in the middle row of the figure show the deterministic structure in the data again and a
histogram of the random variation. The histogram shows the relative frequencies of observing
different-sized random errors. The bottom row of the figure shows how the relative frequencies of
the random errors can be summarized by a (normal) probability distribution.
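A small simulation makes the partition concrete; the line's coefficients and the noise level below are illustrative choices, not the handbook's fitted constants:

    import numpy as np

    rng = np.random.default_rng(1)
    temperature = np.linspace(20, 60, 40)
    deterministic = 8.0 + 3.9 * temperature          # a perfect straight line
    random_error = rng.normal(0.0, 4.0, temperature.size)
    pressure = deterministic + random_error          # what we actually observe

    # The observed data decompose exactly into the two parts:
    assert np.allclose(pressure - deterministic, random_error)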
An Example from a More Complex Process
Of course, the straight-line example is one of the simplest functions used for process modeling. Another example is shown below. The concept is identical to the straight-line example, but the structure in the data is more complex. The variation in y is partitioned into a deterministic part, which is a function of another variable, x, plus some left-over random variation. (Again note the difference in the vertical axis scales of the two plots in the top right of the figure.) A probability distribution describes the left-over random variation.
An Example with Multiple Explanatory Variables
The examples of process modeling shown above have only one explanatory variable, but the concept easily extends to cases with more than one explanatory variable. The three-dimensional perspective plots below show an example with two explanatory variables. Examples with three or more explanatory variables are exactly analogous, but are difficult to show graphically.
4.1.2. What terminology do statisticians use to describe process models?
All process models discussed in this chapter have the general form

    y = f(x;β) + ε.

As alluded to earlier, the random errors that are included in the model make
the relationship between the response variable and the predictor
variables a "statistical" one, rather than a perfect deterministic one. This
is because the functional relationship between the response and
predictors holds only on average, not for each data point.
Some of the details about the different parts of the model are discussed
below, along with alternate terminology for the different components of
the model.
Response Variable
The response variable, y, is a quantity that varies in a way that we hope to be able to summarize and exploit via the modeling process. Generally it is known that the variation of the response variable is systematically related to the values of one or more other variables before the modeling process is begun, although testing the existence and nature of this dependence is part of the modeling process itself.
Mathematical Function
The mathematical function consists of two parts. These parts are the predictor variables, x1, x2, ..., and the parameters, β0, β1, .... The predictor variables are observed along with the response variable. They are the quantities described on the previous page as inputs to the mathematical function, f(x;β). The collection of all of the predictor variables is denoted by x for short.

The parameters are the quantities that will be estimated during the modeling process. Their true values are unknown and unknowable, except in simulation experiments. As for the predictor variables, the collection of all of the parameters is denoted by β for short.
For a straight line with a known slope of one, but an unknown intercept, there would be only one parameter. For a quadratic surface with two predictor variables, there are six parameters for the full model. Both forms are written out below.
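In symbols (reconstructed here to match the descriptions above, since the displayed equations did not survive extraction; plausible forms rather than verbatim copies):

    % one parameter: a straight line with known slope of one
    f(x;\beta) = \beta_0 + x

    % six parameters: the full quadratic surface in two predictors
    f(x;\beta) = \beta_0 + \beta_1 x_1 + \beta_2 x_2
               + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2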
Random Error
Like the parameters in the mathematical function, the random errors are unknown. They are simply the difference between the data and the mathematical function. They are assumed to follow a particular probability distribution, however, which is used to describe their aggregate behavior. The probability distribution that describes the errors has a mean of zero and an unknown standard deviation, denoted by σ, that is another parameter in the model, like the β's.
Scope of "Model"
In its correct usage, the term "model" refers to the equation above and also includes the underlying assumptions made about the probability distribution used to describe the variation of the random errors. Often, however, people will also use the term "model" when referring specifically to the mathematical function describing the deterministic variation in the data. Since the function is part of the model, the more limited usage is not wrong, but it is important to remember that the term "model" might refer to more than just the mathematical function.
4.1.3. What are process models used for?
Further Details
1. Estimation
2. Prediction
3. Calibration
4. Optimization
4.1.3.1. Estimation
More on Estimation
As mentioned on the preceding page, the primary goal of estimation is to determine the value of the regression function that is associated with a specific combination of predictor variable values. The estimated values are computed by plugging the value(s) of the predictor variable(s) into the regression equation, after estimating the unknown parameters from the data. This process is illustrated below using the Pressure/Temperature example from a few pages earlier.
Example Suppose in this case the predictor variable value of interest is a temperature of 47 degrees.
Computing the estimated value of the regression function by plugging this temperature into the fitted regression equation gives the estimated average pressure at 47 degrees.
Of course, if the pressure/temperature experiment were repeated, the estimates of the parameters
of the regression function obtained from the data would differ slightly each time because of the
randomness in the data and the need to sample a limited amount of data. Different parameter
estimates would, in turn, yield different estimated values. The plot below illustrates the type of
slight variation that could occur in a repeated experiment.
Plot: Estimated value from a repeated experiment.
Uncertainty of the Estimated Value
A critical part of estimation is an assessment of how much an estimated value will fluctuate due to the noise in the data. Without that information there is no basis for comparing an estimated value to a target value or to another estimate. Any method used for estimation should include an assessment of the uncertainty in the estimated value(s). Fortunately it is often the case that the data used to fit the model to a process can also be used to compute the uncertainty of estimated values obtained from the model. In the pressure/temperature example a confidence interval for the value of the regression function at 47 degrees can be computed from the data used to fit the model. The plot below shows a 99% confidence interval produced using the original data. This interval gives the range of plausible values for the average pressure for a temperature of 47 degrees based on the parameter estimates and the noise in the data.
Plot: 99% confidence interval for pressure at T = 47.
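A sketch of this computation in Python with statsmodels, using simulated stand-in data rather than the handbook's pressure/temperature measurements:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    temperature = np.linspace(20, 60, 40)
    pressure = 8.0 + 3.9 * temperature + rng.normal(0, 4, temperature.size)

    fit = sm.OLS(pressure, sm.add_constant(temperature)).fit()

    new_point = np.array([[1.0, 47.0]])      # [intercept, temperature]
    pred = fit.get_prediction(new_point)
    print(pred.predicted_mean)               # estimated average pressure at T = 47
    print(pred.conf_int(alpha=0.01))         # 99% confidence interval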
Length of Confidence Intervals
Because the confidence interval is an interval for the value of the regression function, the uncertainty only includes the noise that is inherent in the estimates of the regression parameters. The uncertainty in the estimated value can be less than the uncertainty of a single measurement from the process because the data used to estimate the unknown parameters are essentially averaged (in a way that depends on the statistical method being used) to determine each parameter estimate. This "averaging" of the data tends to cancel out errors inherent in each individual observed data point. The noise in this type of result is generally less than the noise in the prediction of one or more future measurements, which must account for both the uncertainty in the estimated parameters and the uncertainty of the new measurement.
More Info For more information on the interpretation and computation of confidence intervals, see Section 5.1.
4.1.3.2. Prediction
More on Prediction
As mentioned earlier, the goal of prediction is to determine future value(s) of the response variable that are associated with a specific combination of predictor variable values. As in estimation, the predicted values are computed by plugging the value(s) of the predictor variable(s) into the regression equation, after estimating the unknown parameters from the data. The difference between estimation and prediction arises only in the computation of the uncertainties. These differences are illustrated below using the Pressure/Temperature example in parallel with the example illustrating estimation.
Example Suppose in this case the predictor variable value of interest is a temperature of 47 degrees.
Computing the predicted value by plugging this temperature into the fitted regression equation gives the predicted pressure for a single future measurement at 47 degrees.
Of course, if the pressure/temperature experiment were repeated, the estimates of the parameters
of the regression function obtained from the data would differ slightly each time because of the
randomness in the data and the need to sample a limited amount of data. Different parameter
estimates would, in turn, yield different predicted values. The plot below illustrates the type of
slight variation that could occur in a repeated experiment.
Plot: Predicted value from a repeated experiment.
Prediction Uncertainty
A critical part of prediction is an assessment of how much a predicted value will fluctuate due to the noise in the data. Without that information there is no basis for comparing a predicted value to a target value or to another prediction. As a result, any method used for prediction should include an assessment of the uncertainty in the predicted value(s). Fortunately it is often the case that the data used to fit the model to a process can also be used to compute the uncertainty of predictions from the model. In the pressure/temperature example a prediction interval for the value of the regression function at 47 degrees can be computed from the data used to fit the model. The plot below shows a 99% prediction interval produced using the original data. This interval gives the range of plausible values for a single future pressure measurement observed at a temperature of 47 degrees based on the parameter estimates and the noise in the data.
Plot: 99% prediction interval for pressure at T = 47.
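A sketch of the same computation, again on simulated stand-in data; the only difference from the estimation sketch is that obs=True asks the interval to cover a single new measurement rather than the average response:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    temperature = np.linspace(20, 60, 40)
    pressure = 8.0 + 3.9 * temperature + rng.normal(0, 4, temperature.size)
    fit = sm.OLS(pressure, sm.add_constant(temperature)).fit()

    pred = fit.get_prediction(np.array([[1.0, 47.0]]))
    print(pred.conf_int(alpha=0.01))             # 99% confidence interval (mean)
    print(pred.conf_int(obs=True, alpha=0.01))   # 99% prediction interval (wider)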
Length of Prediction Intervals
Because the prediction interval is an interval for the value of a single new measurement from the process, the uncertainty includes the noise that is inherent in the estimates of the regression parameters and the uncertainty of the new measurement. This means that the interval for a new measurement will be wider than the confidence interval for the value of the regression function. These intervals are called prediction intervals rather than confidence intervals because the latter are for parameters, and a new measurement is a random variable, not a parameter.
Tolerance Intervals
Like a prediction interval, a tolerance interval brackets the plausible values of new measurements from the process being modeled. However, instead of bracketing the value of a single measurement or a fixed number of measurements, a tolerance interval brackets a specified percentage of all future measurements for a given set of predictor variable values. For example, to monitor future pressure measurements at 47 degrees for extreme values, either low or high, a tolerance interval that brackets 98% of all future measurements with high confidence could be used. If a future value then fell outside of the interval, the system would then be checked to ensure that everything was working correctly. A 99% tolerance interval that captures 98% of all future pressure measurements at a temperature of 47 degrees is 192.4655 +/- 14.5810. This interval is wider than the prediction interval for a single measurement because it is designed to capture a larger proportion of all future measurements. The explanation of tolerance intervals is potentially confusing because there are two percentages used in the description of the interval. One, in this case 99%, describes how confident we are that the interval will capture the quantity that we want it to capture. The other, 98%, describes the target quantity, which in this case is 98% of all future measurements at T=47 degrees.
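A sketch of the underlying computation for the simple single-sample case, using Howe's approximation to the two-sided normal tolerance factor; the sample size, mean, and standard deviation below are placeholders, and the regression version behind the 192.4655 +/- 14.5810 interval is more involved:

    import numpy as np
    from scipy import stats

    def tolerance_factor(n, coverage=0.98, confidence=0.99):
        """Approximate two-sided normal tolerance factor k2 (Howe's method)."""
        nu = n - 1
        z = stats.norm.ppf((1 + coverage) / 2)
        chi2 = stats.chi2.ppf(1 - confidence, nu)   # lower chi-square quantile
        return np.sqrt(nu * (1 + 1 / n) * z**2 / chi2)

    n, mean, sd = 40, 192.4655, 5.0                 # placeholder summary stats
    k2 = tolerance_factor(n)
    print(mean - k2 * sd, mean + k2 * sd)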
More Info For more information on the interpretation and computation of prediction and tolerance intervals,
see Section 5.1.
4.1.3.3. Calibration
More on Calibration
As mentioned in the page introducing the different uses of process models, the goal of calibration is to quantitatively convert measurements made on one of two measurement scales to the other measurement scale. The two scales are generally not of equal importance, so the conversion occurs in only one direction. The primary measurement scale is usually the scientifically relevant scale, and measurements made directly on this scale are often more precise (relatively) than measurements made on the secondary scale. A process model describing the relationship between the two measurement scales provides the means for conversion. A process model that is constructed primarily for the purpose of calibration is often referred to as a "calibration curve". A graphical depiction of the calibration process is shown in the plot below, using the example described next.
Example Thermocouples are a common type of temperature measurement device that is often more
practical than a thermometer for temperature assessment. Thermocouples measure temperature in
terms of voltage, however, rather than directly on a temperature scale. In addition, the response of
a particular thermocouple depends on the exact formulation of the metals used to construct it,
meaning two thermocouples will respond somewhat differently under identical measurement
conditions. As a result, thermocouples need to be calibrated to produce interpretable measurement
information. The calibration curve for a thermocouple is often constructed by comparing
thermocouple output to relatively precise thermometer data. Then, when a new temperature is
measured with the thermocouple, the voltage is converted to temperature terms by plugging the
observed voltage into the regression equation and solving for temperature.
The plot below shows a calibration curve for a thermocouple fit with a locally quadratic model
using a method called LOESS. Traditionally, complicated, high-degree polynomial models have
been used for thermocouple calibration, but locally linear or quadratic models offer better
computational stability and more flexibility. With the locally quadratic model the solution of the
regression equation for temperature is done numerically rather than analytically, but the concept
of calibration is identical regardless of which type of model is used. It is important to note that the
thermocouple measurements, made on the secondary measurement scale, are treated as the
response variable and the more precise thermometer results, on the primary scale, are treated as
the predictor variable because this best satisfies the underlying assumptions of the analysis.
Plot: Thermocouple calibration.
Just as in estimation or prediction, if the calibration experiment were repeated, the results would
vary slightly due to the randomness in the data and the need to sample a limited amount of data
from the process. This means that an uncertainty statement that quantifies how much the results
of a particular calibration could vary due to randomness is necessary. The plot below shows what
would happen if the thermocouple calibration were repeated under conditions identical to the first
experiment.
Plot: Calibration result from a repeated experiment.
Calibration Uncertainty
Again, as with prediction, the data used to fit the process model can also be used to determine the uncertainty in the calibration. Both the variation in the estimated model parameters and in the new voltage observation need to be accounted for. This is similar to the uncertainty for the prediction of a new measurement. In fact, calibration intervals are computed by solving for the predictor variable value in the formulas for the end points of a prediction interval. The plot below shows a 99% calibration interval for the original calibration data used in the first plot on this page. The area of interest in the plot has been magnified so the endpoints of the interval can be visually differentiated. The calibration interval is 387.3748 +/- 0.307 degrees Celsius.
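A sketch of the inversion step in Python; the fitted calibration curve below is a made-up straight line standing in for the thermocouple's locally quadratic fit:

    from scipy.optimize import brentq

    def fitted_voltage(temperature):
        # Placeholder fitted calibration curve: voltage as a function of temp.
        return 0.04 * temperature + 1.2

    observed_voltage = 16.7
    # Solve the fitted regression equation for temperature numerically.
    estimated_temp = brentq(lambda t: fitted_voltage(t) - observed_voltage,
                            0.0, 1000.0)        # bracketing interval
    print(estimated_temp)                       # about 387.5 degrees here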
In almost all calibration applications the ultimate quantity of interest is the true value of the
primary-scale measurement method associated with a measurement made on the secondary scale.
As a result, there are no analogs of the prediction interval or tolerance interval in calibration.
More Info More information on the construction and interpretation of calibration intervals can be found in
Section 5.2 of this chapter. There is also more information on calibration, especially "one-point"
calibrations and other special cases, in Section 3 of Chapter 2: Measurement Process
Characterization.
4.1.3.4. Optimization
More on Optimization
As mentioned earlier, the goal of optimization is to determine the necessary process input values to obtain a desired output. Like calibration, optimization involves substitution of an output value for the response variable and solving for the associated predictor variable values. The process model is again the link that ties the inputs and output together. Unlike calibration and prediction, however, successful optimization requires a cause-and-effect relationship between the predictors and the response variable. Designed experiments, run in a randomized order, must be used to ensure that the process model represents a cause-and-effect relationship between the variables. Quadratic models are typically used, along with standard calculus techniques for finding minimums and maximums, to carry out an optimization. Other techniques can also be used, however. The example discussed below includes a graphical depiction of the optimization process.
Example In a manufacturing process that requires a chemical reaction to take place, the temperature and
pressure under which the process is carried out can affect reaction time. To maximize the
throughput of this process, an optimization experiment was carried out in the neighborhood of the
conditions felt to be best, using a central composite design with 13 runs. Calculus was used to
determine the input values associated with local extremes in the regression function. The plot
below shows the quadratic surface that was fit to the data and conceptually how the input values
associated with the maximum throughput are found.
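A sketch of the calculus step for a fitted quadratic surface, with placeholder coefficients rather than the case study's fit; the stationary point of f(x) = b0 + b'x + x'Bx is found by solving grad f = b + 2Bx = 0:

    import numpy as np

    b = np.array([4.0, 2.5])                    # linear coefficients
    B = np.array([[-1.5, 0.3],                  # quadratic coefficients
                  [ 0.3, -0.9]])                # negative definite: a maximum

    x_opt = np.linalg.solve(2 * B, -b)          # stationary point of the surface
    print(x_opt)                                # input settings at the optimum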
As with prediction and calibration, randomness in the data and the need to sample data from the
process affect the results. If the optimization experiment were carried out again under identical
conditions, the optimal input values computed using the model would be slightly different. Thus,
it is important to understand how much random variability there is in the results in order to
interpret the results correctly.
Plot: Optimization result from a repeated experiment.
Optimization Uncertainty
As with prediction and calibration, the uncertainty in the input values estimated to maximize throughput can also be computed from the data used to fit the model. Unlike prediction or calibration, however, optimization almost always involves simultaneous estimation of several quantities, the values of the process inputs. As a result, we will compute a joint confidence region for all of the input values, rather than separate uncertainty intervals for each input. This confidence region will contain the complete set of true process inputs that will maximize throughput with high probability. The plot below shows the contours of equal throughput on a map of various possible input value combinations. The solid contours show throughput while the dashed contour in the center encloses the plausible combinations of input values that yield optimum results. The "+" marks the estimated optimum value. The dashed region is a 95% joint confidence region for the two process inputs. In this region the throughput of the process will be approximately 217 units/hour.
Plot: Contour plot, estimated optimum, and confidence region.
More Info Computational details for optimization are primarily presented in Chapter 5: Process
Improvement along with material on appropriate experimental designs for optimization. Section
5.5.3. specifically focuses on optimization methods and their associated uncertainties.
4.1.4. What are some of the different statistical methods for model building?
Selecting an Appropriate Stat Method: Modeling
Model building, however, is different from most other areas of statistics with regard to method selection. There are more general approaches and more competing techniques available for model building than for most other types of problems. There is often more than one statistical tool that can be effectively applied to a given modeling application. The large menu of methods applicable to modeling problems means that there is both more opportunity for effective and efficient solutions and more potential to spend time doing different analyses, comparing different solutions and mastering the use of different tools. The remainder of this section will introduce and briefly discuss some of the most popular and well-established statistical techniques that are useful for different model building situations.
4.1.4.1. Linear Least Squares Regression
Definition of a Linear Least Squares Model
Used directly, with an appropriate data set, linear least squares regression can be used to fit the data with any function of the form

    f(x;β) = β0 + β1 x1 + β2 x2 + ...

in which
1. each explanatory variable in the function is multiplied by an
unknown parameter,
2. there is at most one unknown parameter with no corresponding
explanatory variable, and
3. all of the individual terms are summed to produce the final
function value.
In statistical terms, any function that meets these criteria would be called a "linear function". The term "linear" is used, even though the function may not be a straight line, because if the unknown parameters are considered to be variables and the explanatory variables are considered to be known coefficients corresponding to those "variables", then the problem becomes a system (usually overdetermined) of linear equations that can be solved for the values of the unknown parameters. To differentiate the various meanings of the word "linear", the linear models being discussed here are often said to be "linear in the parameters" or "statistically linear".
Why "Least Squares"?
Linear least squares regression also gets its name from the way the estimates of the unknown parameters are computed. The "method of least squares" that is used to obtain parameter estimates was independently developed in the late 1700's and the early 1800's by the mathematicians Karl Friedrich Gauss, Adrien Marie Legendre and (possibly) Robert Adrain [Stigler (1978)] [Harter (1983)] [Stigler (1986)] working in Germany, France and America, respectively. In the least squares method the unknown parameters are estimated by minimizing the sum of the squared deviations between the data and the model. The minimization process reduces the overdetermined system of equations formed by the data to a sensible system of p equations in p unknowns, where p is the number of parameters in the functional part of the model. This new system of equations is then solved to obtain the parameter estimates. To learn more about how the method of least squares is used to estimate the parameters, see Section 4.4.3.1.
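A sketch of the computation in Python; np.linalg.lstsq solves the overdetermined system directly, and the normal equations give the same answer in closed form (the data are illustrative):

    import numpy as np

    x = np.arange(10.0)
    y = 2.0 + 0.5 * x + np.random.default_rng(3).normal(0, 0.2, 10)

    X = np.column_stack([np.ones_like(x), x])       # design matrix
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta)                                     # [intercept, slope]

    # Equivalent closed form via the p normal equations X'X b = X'y:
    print(np.linalg.solve(X.T @ X, X.T @ y))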
Examples of Linear Functions
As just mentioned above, linear models are not limited to being straight lines or planes, but include a fairly wide range of shapes. For example, a simple quadratic curve,

    f(x;β) = β0 + β1 x + β11 x²,

or a higher-degree polynomial in x, is also linear in the statistical sense because it is linear in the parameters, though not with respect to the observed explanatory variable, x.
Nonlinear Model Example
Just as models that are linear in the statistical sense do not have to be linear with respect to the explanatory variables, nonlinear models can be linear with respect to the explanatory variables, but not with respect to the parameters. For example, a model such as

    f(x;β) = β0 + β0 β1 x

is a straight line with respect to x, but nonlinear in the parameters because of the product β0 β1.
Advantages of Linear Least Squares
Linear least squares regression has earned its place as the primary tool for process modeling because of its effectiveness and completeness.
Though there are types of data that are better described by functions
that are nonlinear in the parameters, many processes in science and
engineering are well-described by linear models. This is because
either the processes are inherently linear or because, over short ranges,
any process can be well-approximated by a linear model.
Disadvantages of Linear Least Squares
The main disadvantages of linear least squares are limitations in the shapes that linear models can assume over long ranges, possibly poor extrapolation properties, and sensitivity to outliers.
4.1.4.2. Nonlinear Least Squares Regression
Definition of a Nonlinear Regression Model
As the name suggests, a nonlinear model is any model of the basic form

    y = f(x;β) + ε

in which
1. the functional part of the model is not linear with respect to the unknown parameters, β, and
2. the method of least squares is used to estimate the values of the
unknown parameters.
Due to the way in which the unknown parameters of the function are
usually estimated, however, it is often much easier to work with
models that meet two additional criteria:
3. the function is smooth with respect to the unknown parameters,
and
4. the least squares criterion that is used to obtain the parameter
estimates has a unique solution.
These last two criteria are not essential parts of the definition of a
nonlinear least squares model, but are of practical importance.
Advantages of Nonlinear Least Squares
The biggest advantage of nonlinear least squares regression over many other techniques is the broad range of functions that can be fit. Although many scientific and engineering processes can be described
well using linear models, or other relatively simple types of models,
there are many other processes that are inherently nonlinear. For
example, the strengthening of concrete as it cures is a nonlinear
process. Research on concrete strength shows that the strength
increases quickly at first and then levels off, or approaches an
asymptote in mathematical terms, over time. Linear models do not
describe processes that asymptote very well because for all linear
functions the function value can't increase or decrease at a declining
rate as the explanatory variables go to the extremes. There are many
types of nonlinear models, on the other hand, that describe the
asymptotic behavior of a process well. Like the asymptotic behavior
of some processes, other features of physical processes can often be
expressed more easily using nonlinear models than with simpler
model types.
Disadvantages of Nonlinear Least Squares
The major cost of moving to nonlinear least squares regression from simpler modeling techniques like linear least squares is the need to use iterative optimization procedures to compute the parameter estimates.
With functions that are linear in the parameters, the least squares
estimates of the parameters can always be obtained analytically, while
that is generally not the case with nonlinear models. The use of
iterative procedures requires the user to provide starting values for the
unknown parameters before the software can begin the optimization.
The starting values must be reasonably close to the as yet unknown
parameter estimates or the optimization procedure may not converge.
Bad starting values can also cause the software to converge to a local
minimum rather than the global minimum that defines the least
squares estimates.
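A sketch of an iterative nonlinear fit with scipy's curve_fit, using an illustrative asymptotic model of the concrete-strength type; note the user-supplied starting values p0:

    import numpy as np
    from scipy.optimize import curve_fit

    def asymptotic(t, b0, b1):
        return b0 * (1.0 - np.exp(-b1 * t))      # rises fast, then levels off

    t = np.linspace(0.5, 28, 25)
    rng = np.random.default_rng(4)
    y = asymptotic(t, 35.0, 0.3) + rng.normal(0, 0.8, t.size)

    # Poor starting values can stall the iterations or land in a local
    # minimum, so p0 should be a rough but sensible guess.
    params, cov = curve_fit(asymptotic, t, y, p0=[30.0, 0.1])
    print(params)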
4.1.4.3. Weighted Least Squares Regression
Plot: Linespacing measurement error data.
Model Types and Weighted Least Squares
Unlike linear and nonlinear least squares regression, weighted least squares regression is not associated with a particular type of function used to describe the relationship between the process variables. Instead, weighted least squares reflects the behavior of the random errors in the model; and it can be used with functions that are either linear or nonlinear in the parameters. It works by
incorporating extra nonnegative constants, or weights, associated with each data point, into the
fitting criterion. The size of the weight indicates the precision of the information contained in the
associated observation. Optimizing the weighted fitting criterion to find the parameter estimates
allows the weights to determine the contribution of each observation to the final parameter
estimates. It is important to note that the weight for each observation is given relative to the
weights of the other observations; so different sets of absolute weights can have identical effects.
Advantages of Weighted Least Squares
Like all of the least squares methods discussed so far, weighted least squares is an efficient method that makes good use of small data sets. It also shares the ability to provide different types of easily interpretable statistical intervals for estimation, prediction, calibration and optimization.
In addition, as discussed above, the main advantage that weighted least squares enjoys over other
methods is the ability to handle regression situations in which the data points are of varying
quality. If the standard deviation of the random errors in the data is not constant across all levels
of the explanatory variables, using weighted least squares with weights that are inversely
proportional to the variance at each level of the explanatory variables yields the most precise
parameter estimates possible.
Disadvantages of Weighted Least Squares
The biggest disadvantage of weighted least squares, which many people are not aware of, is probably the fact that the theory behind this method is based on the assumption that the weights are known exactly. This is almost never the case in real applications, of course, so estimated weights must be used instead. The effect of using estimated weights is difficult to assess, but experience indicates that small variations in the weights due to estimation do not often affect a
regression analysis or its interpretation. However, when the weights are estimated from small
numbers of replicated observations, the results of an analysis can be very badly and unpredictably
affected. This is especially likely to be the case when the weights for extreme values of the
predictor or explanatory variables are estimated using only a few observations. It is important to
remain aware of this potential problem, and to only use weighted least squares when the weights
can be estimated precisely relative to one another [Carroll and Ruppert (1988), Ryan (1997)].
Weighted least squares regression, like the other least squares methods, is also sensitive to the
effects of outliers. If potential outliers are not investigated and dealt with appropriately, they will
likely have a negative impact on the parameter estimation and other aspects of a weighted least
squares analysis. If a weighted least squares regression actually increases the influence of an
outlier, the results of the analysis may be far inferior to an unweighted least squares analysis.
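A sketch of a weighted fit in Python with statsmodels, with weights taken inversely proportional to an assumed (known) error variance at each point; the data and the variance pattern are illustrative:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    x = np.linspace(1, 10, 30)
    sigma = 0.2 * x                                  # noise grows with x
    y = 1.0 + 2.0 * x + rng.normal(0, sigma)

    X = sm.add_constant(x)
    wls = sm.WLS(y, X, weights=1.0 / sigma**2).fit() # weight = 1 / variance
    print(wls.params)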
Further Information
Further information on the weighted least squares fitting criterion can be found in Section 4.3. Discussion of methods for weight estimation can be found in Section 4.5.
4.1.4.4. LOESS (aka LOWESS)
Localized Subsets of Data
The subsets of data used for each weighted least squares fit in LOESS are determined by a nearest neighbors algorithm. A user-specified input to the procedure called the "bandwidth" or "smoothing parameter" determines how much of the data is used to fit each local polynomial. The smoothing parameter, q, is a number between (d+1)/n and 1, with d denoting the degree of the local polynomial. The value of q is the proportion of data used in each fit. The subset of data used in each weighted least squares fit is comprised of the nq (rounded to the next largest integer) points whose explanatory variable values are closest to the point at which the response is being estimated.
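A sketch using the lowess smoother in statsmodels, whose frac argument plays the role of the smoothing parameter q described above (the data are illustrative):

    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    rng = np.random.default_rng(6)
    x = np.linspace(0, 10, 200)
    y = np.sin(x) + rng.normal(0, 0.3, x.size)

    smoothed = lowess(y, x, frac=0.25)   # each local fit uses 25% of the data
    print(smoothed[:5])                  # columns: sorted x, fitted values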
Degree of Local Polynomials
The local polynomials fit to each subset of the data are almost always of first or second degree; that is, either locally linear (in the straight line sense) or locally quadratic. Using a zero degree polynomial turns
LOESS into a weighted moving average. Such a simple local model
might work well for some situations, but may not always approximate
the underlying function well enough. Higher-degree polynomials
would work in theory, but yield models that are not really in the spirit
of LOESS. LOESS is based on the ideas that any function can be well
approximated in a small neighborhood by a low-order polynomial and
that simple models can be fit to data easily. High-degree polynomials
would tend to overfit the data in each subset and are numerically
unstable, making accurate computations difficult.
Weight Function
As mentioned above, the weight function gives the most weight to the data points nearest the point of estimation and the least weight to the data points that are furthest away. The use of the weights is based on
the idea that points near each other in the explanatory variable space
are more likely to be related to each other in a simple way than points
that are further apart. Following this logic, points that are likely to
follow the local model best influence the local model parameter
estimates the most. Points that are less likely to actually conform to
the local model have less influence on the local model parameter
estimates.
The traditional weight function used for LOESS is the tri-cube weight function,

    w(u) = (1 - |u|³)³  for |u| < 1,  and  w(u) = 0  otherwise.

However, any other weight function that satisfies the properties listed in Cleveland (1979) could also be used. The weight for a specific point in any localized subset of data is obtained by evaluating the weight function at the distance between that point and the point of estimation, after scaling the distance so that the maximum absolute distance over all of the points in the subset of data is exactly one.
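A sketch of computing tri-cube weights for one local subset, including the distance scaling just described (the subset and the point of estimation are illustrative):

    import numpy as np

    def tricube(u):
        u = np.abs(u)
        return np.where(u < 1, (1 - u**3)**3, 0.0)

    x_subset = np.array([2.0, 2.3, 2.9, 3.4, 4.1])
    x0 = 3.0                                         # point of estimation
    d = np.abs(x_subset - x0)
    weights = tricube(d / d.max())                   # farthest point gets weight 0
    print(weights)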
Advantages of LOESS
As discussed above, the biggest advantage LOESS has over many other methods is the fact that it does not require the specification of a function to fit a model to all of the data in the sample. Instead the
analyst only has to provide a smoothing parameter value and the
degree of the local polynomial. In addition, LOESS is very flexible,
making it ideal for modeling complex processes for which no
theoretical models exist. These two advantages, combined with the
simplicity of the method, make LOESS one of the most attractive of
the modern regression methods for applications that fit the general
framework of least squares regression but which have a complex
deterministic structure.
Although it is less obvious than for some of the other methods related
to linear least squares regression, LOESS also accrues most of the
benefits typically shared by those procedures. The most important of
those is the theory for computing uncertainties for prediction and
calibration. Many other tests and procedures used for validation of
least squares models can also be extended to LOESS models.
Disadvantages of LOESS
Although LOESS does share many of the best features of other least squares methods, efficient use of data is one advantage that LOESS doesn't share. LOESS requires fairly large, densely sampled data sets
in order to produce good models. This is not really surprising,
however, since LOESS needs good empirical information on the local
structure of the process in order to perform the local fitting. In fact, given
the results it provides, LOESS could arguably be more efficient
overall than other methods like nonlinear least squares. It may simply
frontload the costs of an experiment in data collection but then reduce
analysis costs.
4.2. Underlying Assumptions for Process Modeling
Checking Assumptions Provides Feedback on Actions
If the implicit assumptions that underlie a particular action are not true, then that action is not likely to meet expectations either. Sometimes it is abundantly clear when a goal has been met, but unfortunately that is not always the case. In particular, it is usually not possible to obtain immediate feedback on the attainment of goals in most process modeling applications. The goals of process modeling, such as answering a scientific or engineering question, depend on the correctness of a process model, which can often only be directly and absolutely determined over time. In lieu of immediate, direct feedback, however, indirect information on the effectiveness of a process modeling analysis can be obtained by checking the validity of the underlying assumptions. Confirming that the underlying assumptions are valid helps ensure that the methods of analysis were appropriate and that the results will be consistent with the goals.
4.2.1. What are the typical underlying assumptions in process modeling?
Role of Random Variation
The overall goal of all statistical procedures, including those designed for process modeling, is to enable valid conclusions to be drawn from noisy data. As a result, statistical procedures are designed to compare
apparent effects found in a data set to the noise in the data in order to
determine whether the effects are more likely to be caused by a
repeatable underlying phenomenon of some sort or by fluctuations in
the data that happened by chance. Thus the random variation in the
process serves as a baseline for drawing conclusions about the nature
of the deterministic part of the process. If there were no random noise
in the process, then conclusions based on statistical methods would no
longer make sense or be appropriate.
4.2.1. What are the typical underlying assumptions in process modeling?
Validity of Assumption Improved by Experimental Design
The validity of this assumption is determined by both the nature of the process and, to some extent, by the data collection methods used. The process may be one in which the data are easily measured and it will be clear that the data have a direct relationship to the regression function. When this is the case, use of optimal methods of data collection is not
critical to the success of the modeling effort. Of course, it is rarely
known that this will be the case for sure, so it is usually worth the effort
to collect the data in the best way possible.
In the most difficult processes even good experimental design may not
be able to salvage a set of data that includes a high level of systematic
error. In these situations the best that can be hoped for is recognition of
the fact that the true regression function has not been identified by the
analysis. Then effort can be put into finding a better way to solve the
problem by correcting for the systematic error using additional
information, redesigning the measurement system to eliminate the
systematic errors, or reformulating the problem to obtain the needed
information another way.
Assumption Violated by Errors in Observation of x
Another more subtle violation of this assumption occurs when the explanatory variables are observed with random error. Although it intuitively seems like random errors in the explanatory variables should cancel out on average, just as random errors in the observation of the response variable do, that is unfortunately not the case. The direct linkage between the unknown parameters and the explanatory variables in the functional part of the model makes this situation much more complicated than it is for the random errors in the response variable. More information on why this occurs can be found in Section 4.2.1.6.
4.2.1. What are the typical underlying assumptions in process modeling?
Assumption Not Needed for Weighted Least Squares
The assumption that the random errors have constant standard deviation is not implicit to weighted least squares regression. Instead, it is assumed that the weights provided in the analysis correctly indicate the differing levels of variability present in the response variables. The weights are then used to adjust the amount of influence each data point
has on the estimates of the model parameters to an appropriate level.
They are also used to adjust prediction and calibration uncertainties to
the correct levels for different regions of the data set.
Assumption Does Apply to LOESS
Even though it uses weighted least squares to estimate the model parameters, LOESS still relies on the assumption of a constant standard deviation. The weights used in LOESS actually reflect the relative level
of similarity between mean response values at neighboring points in the
explanatory variable space rather than the level of response precision at
each set of explanatory variable values. Actually, because LOESS uses
separate parameter estimates in each localized subset of data, it does not
require the assumption of a constant standard deviation of the data for
parameter estimation. The subsets of data used in LOESS are usually
small enough that the precision of the data is roughly constant within
each subset. LOESS normally makes no provisions for adjusting
uncertainty computations for differing levels of precision across a data
set, however.
4.2.1. What are the typical underlying assumptions in process modeling?
Non-Normal Random Errors May Result in Incorrect Inferences
Of course, if it turns out that the random errors in the process are not normally distributed, then any inferences made about the process may be incorrect. If the true distribution of the random errors is such that the scatter in the data is less than it would be under a normal distribution, it is possible that the intervals used to capture the values of the process parameters will simply be a little longer than necessary.
The intervals will then contain the true process parameters more often
than expected. It is more likely, however, that the intervals will be too
short or will be shifted away from the true mean value of the process
parameter being estimated. This will result in intervals that contain the
true process parameters less often than expected. When this is the case,
the intervals produced under the normal distribution assumption will
likely lead to incorrect conclusions being drawn about the process.
Parameter Estimation Methods Can Require Gaussian Errors
The methods used for parameter estimation can also imply the assumption of normally distributed random errors. Some methods, like maximum likelihood, use the distribution of the random errors directly to obtain parameter estimates. Even methods that do not use distributional methods for parameter estimation directly, like least squares, often work best for data that are free from extreme random
fluctuations. The normal distribution is one of the probability
distributions in which extreme random errors are rare. If some other
distribution actually describes the random errors better than the normal
distribution does, then different parameter estimation methods might
need to be used in order to obtain good estimates of the values of the
unknown parameters in the model.
4.2.1. What are the typical underlying assumptions in process modeling?
Data Best Reflects the Process Via Unbiased Sampling
Given that we can never determine what the actual random errors in a particular data set are, representative samples of data are best obtained by randomly sampling data from the process. In a simple random sample, every response from the population(s) being sampled has an equal chance of being observed. As a result, while it cannot guarantee
that each sample will be representative of the process, random sampling
does ensure that the act of data collection does not leave behind any
biases in the data, on average. This means that most of the time, over
repeated samples, the data will be representative of the process. In
addition, under random sampling, probability theory can be used to
quantify how often particular modeling procedures will be affected by
relatively extreme variations in the data, allowing us to control the error
rates experienced when answering questions about the process.
4.2.1. What are the typical underlying assumptions in process modeling?
with δ denoting the random error associated with observing the explanatory variable, the basic form of the model, under all of the usual assumptions (denoted here more carefully than is usually necessary), becomes

    y = f(x;β) + δ·(∂f/∂x)(x*;β) + ε,

where x* is a value between the true and observed values of the explanatory variable. The extra term in the expression of the random error, δ·(∂f/∂x)(x*;β), complicates matters because (∂f/∂x)(x*;β) is typically not a constant. For most functions, (∂f/∂x)(x*;β) will depend on the explanatory variable values and, more importantly, on β. This is the source of the problem with observing the explanatory variable values with random error.
Biases Can Affect Parameter Estimates When Means of δ's are 0
Even if the δ's have zero means, observation of the explanatory variables with random error can still bias the parameter estimates. Depending on the method used to estimate the parameters, the explanatory variables can be used in the computation of the parameter estimates in ways that keep the δ's from canceling out. One unfortunate example of this phenomenon is the use of least squares to estimate the parameters of a straight line. In this case, because of the simplicity of the model, the effect can be written down explicitly: the estimated slope is biased toward zero, an effect often called attenuation.
Berkson Model Does Not Depend on this Assumption
There is one type of model in which errors in the measurement of the explanatory variables do not bias the parameter estimates. The Berkson model [Berkson (1950)] is a model in which the observed values of the explanatory variables are directly controlled by the experimenter while their true values vary for each observation. The differences between
the observed and true values for each explanatory variable are assumed
to be independent random variables from a normal distribution with a
mean of zero. In addition, the errors associated with each explanatory
variable must be independent of the errors associated with all of the
other explanatory variables, and also independent of the observed
values of each explanatory variable. Finally, the Berkson model
requires the functional part of the model to be a straight line, a plane,
or a higher-dimension first-order model in the explanatory variables.
When these conditions are all met, the errors in the explanatory
variables can be ignored.
Applications for which the Berkson model correctly describes the data
are most often situations in which the experimenter can adjust
equipment settings so that the observed values of the explanatory
variables will be known ahead of time. For example, in a study of the
relationship between the temperature used to dry a sample for chemical
analysis and the resulting concentration of a volatile constituent, an
oven might be used to prepare samples at temperatures of 300 to 500
degrees in 50 degree increments. In reality, however, the true
temperature inside the oven will probably not exactly equal 450
degrees each time that setting is used (or 300 when that setting is used,
etc). The Berkson model would apply, though, as long as the errors in
measuring the temperature randomly differed from one another each
time an observed value of 450 degrees was used and the mean of the
true temperatures over many repeated runs at an oven setting of 450
degrees really was 450 degrees. Then, as long as the model was also a
straight line relating the concentration to the observed values of
temperature, the errors in the measurement of temperature would not
bias the estimates of the parameters.
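A simulation sketch of the contrast just described, with all numbers illustrative: under Berkson error the fitted slope is essentially unbiased, while under classical error in the explanatory variable it is attenuated toward zero:

    import numpy as np

    rng = np.random.default_rng(7)
    n, slope = 5000, 2.0
    settings = np.tile([300, 350, 400, 450, 500], n // 5).astype(float)

    # Berkson: the observed value IS the controlled setting; the true value wobbles.
    true_x = settings + rng.normal(0, 50, n)
    y = slope * true_x + rng.normal(0, 5, n)
    print(np.polyfit(settings, y, 1)[0])    # close to 2.0 (unbiased)

    # Classical: the true value is fixed; the observation wobbles.
    observed_x = settings + rng.normal(0, 50, n)
    y2 = slope * settings + rng.normal(0, 5, n)
    print(np.polyfit(observed_x, y2, 1)[0]) # noticeably below 2.0 (attenuated)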
4.3. Data Collection for Process Modeling
DEX Problem Areas
There are four general engineering problem areas in which DEX may be applied:
1. Comparative
2. Screening/Characterizing
3. Modeling
4. Optimizing
4.3. Data Collection for Process Modeling
What are Good Coefficient Values?
Poor values of the coefficients are those for which the resulting predicted values are considerably different from the observed raw data y. Good values of the coefficients are those for which the resulting predicted values are close to the observed raw data y. The best values of the coefficients are those for which the resulting predicted values are close to the observed raw data y, and the statistical uncertainty connected with each coefficient is small.
There are two considerations that are useful for the generation of "best"
coefficients:
1. Least squares criterion
2. Design of experiment principles
Least Squares Criterion
For a given data set (e.g., 10 (x, y) pairs), the most common procedure for obtaining the coefficients of the model is the least squares estimation criterion. This criterion yields coefficients with predicted values that are closest to the raw data y in the sense that the sum of the squared differences between the raw data and the predicted values is as small as possible.
The overwhelming majority of regression programs today use the least
squares criterion for estimating the model coefficients. Least squares
estimates are popular because
1. the estimators are statistically optimal (BLUEs: Best Linear
Unbiased Estimators);
2. the estimation algorithm is mathematically tractable, in closed
form, and therefore easily programmable.
How then can this be improved? For a given set of x values it cannot be; but frequently the choice of the x values is under our control. If we can select the x values, the coefficients will have less variability than if the x values are not controlled.
Design of Experiment Principles
As to what values should be used for the x's, we look to established experimental design principles for guidance.
Five Designs
To see the effect of experimental design on process modeling, consider the following simplest case of fitting a line. (The five candidate designs, shown as a figure in the original, allocate ten x values across the design interval in different ways, e.g., equi-spaced, concentrated at the extremes, or concentrated at the mid-range.)
Are the Fitted Lines Better for Some Designs?
But are the fitted lines, i.e., the fitted process models, better for some designs than for others? Are the coefficient estimator variances smaller for some designs than for others? For given estimates, are the resulting predicted values better (that is, closer to the observed values) than for other designs? The answer to all of the above is YES. It DOES make a difference.
The most popular answer to the above question about which design to use for linear modeling is design #1 with ten equi-spaced points. It can be shown, however, that the variance of the estimated slope parameter depends on the design according to the relationship

    Var(β̂1) ∝ 1 / Σ(xi - x̄)².
What is the Worst Design?
What is the worst design in the above case? Of the five designs, the worst design is the one that gives the estimated slope its maximum variation. In the mathematical expression above, it is the one that minimizes the denominator, and so this is design #4 above, for which almost all of the data are located at the mid-range. Clearly the estimated line in this case is going to chase the solitary point at each end and so the resulting linear fit is intuitively inferior.
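A sketch that compares stylized versions of these designs through the denominator of the slope-variance relationship above (the layouts are illustrative stand-ins for the five designs in the figure):

    import numpy as np

    designs = {
        "equi-spaced":       np.linspace(0, 1, 10),
        "five at each end":  np.array([0]*5 + [1]*5, dtype=float),
        "mass at mid-range": np.array([0] + [0.5]*8 + [1], dtype=float),
    }
    for name, x in designs.items():
        print(name, np.sum((x - x.mean())**2))
    # The larger the sum, the smaller the slope variance; the mid-range
    # design gives the smallest sum and hence the worst slope estimate.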
Designs 1, 2, and 5
What about the other three designs? Designs 1, 2, and 5 are useful only for the case when we think the model may be linear, but we are not sure, and so we allow additional points that permit fitting a line if appropriate, but build into the design the "capacity" to fit beyond a line (e.g., quadratic, cubic, etc.) if necessary. In this regard, the ordering of the designs would be
● design 5 (if our worst-case model is quadratic),
4. Process Modeling
4.3. Data Collection for Process Modeling
Capacity for Primary Model For your best-guess model, make sure that the design has the
capacity for estimating the coefficients of that model. For a simple example of this, if
you are fitting a quadratic model, then make sure you have at least three distinct
horizontal axis points.
Capacity for If your best-guess model happens to be inadequate, make sure that the
Alternative design has the capacity to estimate the coefficients of your best-guess
Model back-up alternative model (which means implicitly that you should
have already identified such a model). For a simple example, if you
suspect (but are not positive) that a linear model is appropriate, then it
is best to employ a globally robust design (say, four points at each
extreme and two points in the middle, for a ten-point design) as
opposed to the locally optimal design (such as five points at each
extreme). The locally optimal design will provide a best fit to the line,
but have no capacity to fit a quadratic. The globally robust design will
provide a good (though not optimal) fit to the line and additionally
provide a good (though not optimal) fit to the quadratic.
Minimum Variance of Coefficient Estimators For a given model, make sure the design has the
property of minimizing the variation of the least squares estimated coefficients. This is
a general principle that is always in effect but which in practice is hard to implement
for models beyond the simpler 1-factor models. For more complicated 1-factor models, and
for most multi-factor models, the expressions for the variance of the least squares
estimators, although available, are complicated and assume more than the analyst typically
knows. The net result is that this principle, though important, is harder to apply beyond
the simple cases.
Sample Where the Variation Is (Non-Constant Variance Case) Regardless of the simplicity or
complexity of the model, there are situations in which certain regions of the curve are
noisier than others. A simple case is when there is a linear relationship between x and y
but the recording device is proportional rather than absolute, and so larger values of y
are intrinsically noisier than smaller values of y. In such cases, sampling where the
variation is means having more replicated x points in those regions that are noisier. The
practical answer to how many such replicated points there should be is
Sample Where the Variation Is (Steep Curve Case) A common occurrence for non-linear models
is for some regions of the curve to be steeper than others. For example, in fitting an
exponential model (small x corresponding to large y, and large x corresponding to small y)
it is often the case that the data in the steep region are intrinsically noisier than the
data in the relatively flat regions. The reason for this is that commonly the x values
themselves have a bit of noise and this x-noise gets translated into larger y-noise in the
steep sections than in the shallow sections. In such cases, when we know the shape of the
response curve well enough to identify steep-versus-shallow regions, it is often a good
idea to sample more heavily in the steep regions than in the shallow regions. A practical
rule-of-thumb for where to position the x values in such situations is to
1. sketch out your best guess for what the resulting curve will be;
Randomization Just because the x's have some natural ordering does not mean that the data
should be collected in the same order as the x's. Some aspect of randomization should
enter into every experiment, and experiments for process modeling are no exception. Thus
if you are sampling ten points on a curve, the ten y values should not be collected by
sequentially stepping through the x values from the smallest to the largest. If you do so,
and if some extraneous drifting or wear occurs in the machine, the operator, the
environment, the measuring device, etc., then that drift will unwittingly contaminate the
y values and in turn contaminate the final fit. To minimize the effect of such potential
drift, it is best to randomize (use random number tables) the sequence of the x values.
This will not make the drift go away, but it will spread the drift effect evenly over the
entire curve, realistically inflating the variation of the fitted values, and providing
some mechanism after the fact (at the residual analysis model validation stage) for
uncovering or discovering such a drift. If you do not randomize the run sequence, you give
up your ability to detect such a drift if it occurs.
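A minimal sketch of randomizing the run sequence with numpy; the ten-point design is illustrative:

import numpy as np

rng = np.random.default_rng(seed=42)   # seeded only for repeatability

x_design = np.linspace(0.0, 1.0, 10)   # planned x values, smallest to largest
run_order = rng.permutation(len(x_design))

print("run  x value")
for run, idx in enumerate(run_order, start=1):
    print(f"{run:3d}  {x_design[idx]:.3f}")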
4. Process Modeling
4.3. Data Collection for Process Modeling
Reasons Classical Designs May Not Work Cases do arise, however, for which the tabulated
classical designs do not cover a particular practical situation; that is, user constraints
preclude the use of tabulated classical designs. Such constraints include:
1. Limited maximum number of runs:
User constraints in budget and time may dictate a maximum
allowable number of runs that is too small or too "irregular"
(e.g., "13") to be accommodated by classical designs--even
fractional factorial designs.
2. Impossible factor combinations:
The user may have some factor combinations that are
impossible to run. Such combinations may at times be
specified (to maintain balance and orthogonality) as part of a
recommended classical design. If the user simply omits this
impossible run from the design, the net effect may be a
reduction in the quality and optimality of the classical design.
3. Too many levels:
The number of factors and/or the number of levels of some
factors intended for use may not be included in tabulations of
classical designs.
What to Do If If user constraints are such that classical designs do not exist to
Classical accommodate such constraints, then what is the user to do?
Designs Do Not
Exist? The previous section's list of design criteria (capability for the
primary model, capability for the alternate model, minimum
variation of estimated coefficients, etc.) is a good passive target to
aim for in terms of desirable design properties, but provides little
help in terms of an active formal construction methodology for
generating a design.
Need 1: a Model The motivation for optimal designs is the practical constraints that
the user has. The advantage of optimal designs is that they do
provide a reasonable design-generating methodology when no other
mechanism exists. The disadvantage of optimal designs is that they
require a model from the user. The user may not have this model.
All optimal designs are model-dependent, and so the quality of the
final engineering conclusions that result from the ensuing design,
data, and analysis is dependent on the correctness of the analyst's
assumed model. For example, if the responses from a particular
process are actually being drawn from a cubic model and the analyst
assumes a linear model and uses the corresponding optimal design
to generate data and perform the data analysis, then the final
engineering conclusions may well be invalid.
4. Process Modeling
4.3. Data Collection for Process Modeling
Graphically Check for Univariate Balance If you have a design that claims to be globally
good in k factors, then generally that design should be locally good in each of the
individual k factors. Checking high-dimensional global goodness is difficult, but checking
low-dimensional local goodness is easy. Generate k counts plots, with the levels of factor
i plotted on the horizontal axis of the ith plot and the number of design points at each
level of factor i on the vertical axis, as sketched below. For most good designs, these
counts should be about the same (= balance) for all levels of a factor. Exceptions exist,
but such balance is a low-level characteristic of most good designs.
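A minimal sketch of this check in Python, printing counts rather than plotting them; the 2-factor design matrix is a made-up example:

from collections import Counter

# One row per run; columns are the levels of the two factors.
design = [
    (-1, -1), (-1, 0), (-1, 1),
    ( 0, -1), ( 0, 0), ( 0, 1),
    ( 1, -1), ( 1, 0), ( 1, 1),
]

k = len(design[0])
for factor in range(k):
    counts = Counter(row[factor] for row in design)
    print(f"factor {factor + 1}:", dict(sorted(counts.items())))

# A balanced design shows roughly equal counts at every level of each factor.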
Check for For optimal designs, metrics exist (D-efficiency, A-efficiency, etc.) that
Minimal can be computed and that reflect the quality of the design. Further,
Variation relative ratios of standard deviations of the coefficient estimators and
relative ratios of predicted values can be computed and compared for
such designs. Such calculations are commonly performed in computer
packages which specialize in the generation of optimal designs.
4. Process Modeling
Contents: 1. What are the basic steps for developing an effective process
Section 4 model?
2. How do I select a function to describe my process?
1. Incorporating Scientific Knowledge into Function Selection
2. Using the Data to Select an Appropriate Function
3. Using Methods that Do Not Require Function Specification
3. How are estimates of the unknown parameters obtained?
1. Least Squares
2. Weighted Least Squares
4. How can I tell if a model fits my data?
1. How can I assess the sufficiency of the functional part of
the model?
2. How can I detect non-constant variation across the data?
3. How can I tell if there was drift in the measurement
process?
4. How can I assess whether the random errors are
independent from one to the next?
5. How can I test whether or not the random errors are
normally distributed?
6. How can I test whether any significant terms are missing or
misspecified in the functional part of the model?
7. How can I test whether all of the terms in the functional
part of the model are necessary?
5. If my current model does not fit the data well, how can I improve
it?
1. Updating the Function Based on Residual Plots
2. Accounting for Non-Constant Variation Across the Data
3. Accounting for Errors with a Non-Normal Distribution
4. Process Modeling
4.4. Data Analysis for Process Modeling
A The three basic steps of process modeling described in the paragraph above assume that the
Variation data have already been collected and that the same data set can be used to fit all of the
on the candidate models. Although this is often the case in model-building situations, one variation
Basic Steps on the basic model-building sequence comes up when additional data are needed to fit a
newly hypothesized model based on a model fit to the initial data. In this case two
additional steps, experimental design and data collection, can be added to the basic
sequence between model selection and model-fitting. The flow chart below shows the basic
model-fitting sequence with the integration of the related data collection steps into the
model-building process.
Model Building Sequence
Examples illustrating the model-building sequence in real applications can be found in the
case studies in Section 4.6. The specific tools and techniques used in the basic
model-building steps are described in the remainder of this section.
Design of Of course, considering the model selection and fitting before collecting the initial data is
Initial also a good idea. Without data in hand, a hypothesis about what the data will look like is
Experiment needed in order to guess what the initial model should be. Hypothesizing the outcome of an
experiment is not always possible, of course, but efforts made in the earliest stages of a
project often maximize the efficiency of the whole model-building process and result in the
best possible models for the process. More details about experimental design can be found
in Section 4.3 and in Chapter 5: Process Improvement.
4. Process Modeling
4.4. Data Analysis for Process Modeling
4.4.2. How do I select a function to describe my process?
so this would probably be the best model with which to start. If the
linear-over-linear model is not adequate, then the initial fit can be
followed up using a higher-degree rational function, or some other
type of model that also has a horizontal asymptote.
Focus on the Although the concrete strength example makes a good case for
Region of incorporating scientific knowledge into the model, it is not
Interest necessarily a good idea to force a process model to follow all of the
physical properties that the process must follow. At first glance it
seems like incorporating physical properties into a process model
could only improve it; however, incorporating properties that occur
outside the region of interest for a particular application can actually
sacrifice the accuracy of the model "where it counts" for increased
accuracy where it isn't important. As a result, physical properties
should only be incorporated into process models when they directly
affect the process in the range of the data used to fit the model or in
the region in which the model will be used.
4. Process Modeling
4.4. Data Analysis for Process Modeling
4.4.2. How do I select a function to describe my process?
Example The data from the Pressure/Temperature example are plotted below. From the plot it
looks like a straight-line model will fit the data well. This is as expected based on
Charles' Law. In this case there are no signs of any problems with the process or data
collection.
Straight-Line Model Looks Appropriate
Start with Least A key point when selecting a model is to start with the simplest function that looks as though it
Complex will describe the structure in the data. Complex models are fine if required, but they should not be
Functions First used unnecessarily. Fitting models that are more complex than necessary means that random
noise in the data will be modeled as deterministic structure. This will unnecessarily reduce the
amount of data available for estimation of the residual standard deviation, potentially increasing
the uncertainties of the results obtained when the model is used to answer engineering or
scientific questions. Fortunately, many physical systems can be modeled well with straight-line,
polynomial, or simple nonlinear functions.
Quadratic Polynomial a Good Starting Point
Developing When the function describing the deterministic variability in the response variable depends on
Models in several predictor (input) variables, it can be difficult to see how the different variables relate to
Higher one another. One way to tackle this problem that often proves useful is to plot cross-sections of
Dimensions the data and build up a function one dimension at a time. This approach will often shed more light
on the relationships between the different predictor variables and the response than plots that
lump different levels of one or more predictor variables together on plots of the response variable
versus another predictor variable.
Polymer For example, materials scientists are interested in how cylindrical polymer samples that have
Relaxation been twisted by a fixed amount relax over time. They are also interested in finding out how
Example temperature may affect this process. As a result, both time and temperature are thought to be
important factors for describing the systematic variation in the relaxation data plotted below.
When the torque is plotted against time, however, the nature of the relationship is not clearly
shown. Similarly, when torque is plotted versus the temperature the effect of temperature is also
unclear. The difficulty in interpreting these plots arises because the plot of torque versus time
includes data for several different temperatures and the plot of torque versus temperature includes
data observed at different times. If both temperature and time are necessary parts of the function
that describes the data, these plots are collapsing what really should be displayed as a
three-dimensional surface onto a two-dimensional plot, muddying the picture of the data.
Polymer Relaxation Data
Multiplots If cross-sections of the data are plotted in multiple plots instead of lumping different explanatory
Reveal variable values together, the relationships between the variables can become much clearer. Each
Structure cross-sectional plot below shows the relationship between torque and time for a particular
temperature. Now the relationship between torque and time for each temperature is clear. It is
also easy to see that the relationship differs for different temperatures. At a temperature of 25
degrees there is a sharp drop in torque between 0 and 20 minutes and then the relaxation slows.
At a temperature of 75 degrees, however, the relaxation drops at a rate that is nearly constant over
the whole experimental time period. The fact that the profiles of torque versus time vary with
temperature confirms that any functional description of the polymer relaxation process will need
to include temperature.
Cross-Sections of the Data
Cross-Sectional Models Provide Further Insight Further insight into the appropriate
function to use can be obtained by separately modeling each cross-section of the data and
then relating the individual models to one another. Fitting the accepted stretched
exponential relationship between torque (y) and time (x) to each cross-section of the
polymer data and then examining plots of the estimated parameters versus temperature
roughly indicates how temperature should be incorporated into a model of the polymer
relaxation data. The individual stretched exponentials fit to each cross-section of the
data are shown in the plot above as solid curves through the data. Plots of the estimated
values of each of the four parameters in the stretched exponential versus temperature are
shown below.
Cross-Section Parameters vs. Temperature
The solid line near the center of each plot of the cross-sectional parameters from the
stretched exponential is the mean of the estimated parameter values across all six levels
of temperature. The dashed lines above and below the solid reference line provide
approximate bounds on how much the parameter estimates could vary due to random variation
in the data. These bounds are based on the typical value of the standard deviations of the
estimates from each individual stretched exponential fit. From these plots it is clear
that the estimates of only one of the four parameters differ significantly from one
another across the temperature range, and that there is a clear increasing trend in those
parameter estimates. For each of the other parameters, the estimate at each temperature
falls within the uncertainty bounds and no clear structure is visible.
Based on the plot of estimated values above, augmenting the corresponding term in the
standard stretched exponential so that the new denominator is quadratic in temperature
should provide a good starting model for the polymer relaxation process. The choice of a
quadratic in temperature is suggested by the slight curvature in the plot of the
individually estimated parameter values. The resulting model is the stretched exponential
with this quadratic-in-temperature denominator substituted in.
4. Process Modeling
4.4. Data Analysis for Process Modeling
4.4.2. How do I select a function to describe my process?
Input Parameters Control Function Shape The smoothing parameter controls how flexible the
functional part of the model will be. This, in turn, controls how closely the function
will fit the data, just as the choice of a straight line or a polynomial of higher degree
determines how closely a traditional regression model will track the deterministic
structure in a set of data. The exact information that must be specified in order to fit
the regression function to the data will vary from method to method. Some methods may
require other user-specified parameters, in addition to a smoothing parameter, to fit the
regression function. However, the purpose of the user-supplied information is similar for
all methods.
Starting Simple Still Best As for more traditional methods of regression, simple
regression functions are better than complicated ones in local regression. The complexity
of a regression function can be gauged by its potential to track the data. With
traditional modeling methods, in which a global function that describes the data is given
explicitly, it is relatively easy to differentiate between simple and complicated models.
With local regression methods, on the other hand, it can sometimes be difficult to tell
how simple a particular regression function actually is based on the inputs to the
procedure. This is because of the different ways of specifying local functions, the
effects of changes in the smoothing parameter, and the relationships between the different
inputs. Generally, however, any local functions should be as simple as possible and the
smoothing parameter should be set so that each local function is fit to a large subset of
the data. For example, if the method offers a choice of local functions, a straight line
would typically be a better starting point than a higher-order polynomial or a
statistically nonlinear function.
Function To use LOESS, the user must specify the degree, d, of the local polynomial to be fit to the data,
Specification and the fraction of the data, q, to be used in each fit. In this case, the simplest possible initial
for LOESS function specification is d=1 and q=1. While it is relatively easy to understand how the degree of
the local polynomial affects the simplicity of the initial model, it is not as easy to determine how
the smoothing parameter affects the function. However, plots of the data from the computational
example of LOESS in Section 1 with four potential choices of the initial regression function show
that the simplest LOESS function, with d=1 and q=1, is too simple to capture much of the
structure in the data.
LOESS Regression Functions with Different Initial Parameter Specifications
Experience Although the simplest possible LOESS function is not flexible enough to describe the data well,
Suggests any of the other functions shown in the figure would be reasonable choices. All of the latter
Good Values functions track the data well enough to allow assessment of the different assumptions that need to
to Use be checked before deciding that the model really describes the data well. None of these functions
is probably exactly right, but they all provide a good enough fit to serve as a starting point for
model refinement. The fact that there are several LOESS functions that are similar indicates that
additional information is needed to determine the best of these functions. Although it is debatable,
experience indicates that it is probably best to keep the initial function simple and set the
smoothing parameter so each local function is fit to a relatively small subset of the data.
Accepting this principle, the best of these initial models is the one in the upper right corner of the
figure with d=1 and q=0.5.
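The effect of the smoothing parameter can be sketched with the lowess function from statsmodels, which fits local straight lines only (d = 1), so only q (the frac argument) is varied here; the data are synthetic stand-ins for the computational example:

import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 100)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)   # curved, noisy response

for q in (1.0, 0.5, 0.3, 0.1):
    fit = lowess(y, x, frac=q)                 # columns: sorted x, smoothed y
    resid = y - np.interp(x, fit[:, 0], fit[:, 1])
    print(f"q = {q:4.2f}: residual SD = {resid.std(ddof=1):.3f}")

# q = 1.0 (a single, nearly global local-linear fit) is too stiff for
# curved data, mirroring the over-simple d=1, q=1 case discussed above.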
4. Process Modeling
4.4. Data Analysis for Process Modeling
Overview of Although robust techniques are valuable, they are not as well developed
Section 4.4.3 as the more traditional methods and often require specialized software
that is not readily available. Maximum likelihood also requires
specialized algorithms in general, although there are important special
cases that do not have such a requirement. For example, for data with
normally distributed random errors, the least squares and maximum
likelihood parameter estimators are identical. As a result of these
software and developmental issues, and the coincidence of maximum
likelihood and least squares in many applications, this section currently
focuses on parameter estimation only by least squares methods. The
remainder of this section offers some intuition into how least squares
works and illustrates the effectiveness of this method.
4. Process Modeling
4.4. Data Analysis for Process Modeling
4.4.3. How are estimates of the unknown parameters obtained?
As previously noted, the parameters \(\beta_0, \beta_1, \ldots\) are treated as the
variables in the optimization and the predictor variable values \(x_1, x_2, \ldots\) are
treated as coefficients. To emphasize the fact that the estimates of the parameter values
are not the same as the true values of the parameters, the estimates are denoted by
\(\hat{\beta}_0, \hat{\beta}_1, \ldots\). For linear models, the least squares
minimization is usually done analytically using calculus. For nonlinear models, on the
other hand, the minimization must almost always be done using iterative numerical
algorithms.
For the straight-line model the least squares estimates of the parameters would be
computed by minimizing

\[ Q = \sum_{i=1}^{n}\left[y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)\right]^2 . \]

Doing this by
1. taking partial derivatives of \(Q\) with respect to \(\hat{\beta}_0\) and \(\hat{\beta}_1\),
2. setting each partial derivative equal to zero, and
3. solving the resulting system of two equations with two unknowns
yields the following estimators for the parameters:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} . \]
These formulas are instructive because they show that the parameter estimators are
functions of both the predictor and response variables and that the estimators are not
independent of each other unless \(\bar{x} = 0\). This is clear because the formula for
the estimator of the intercept depends directly on the value of the estimator of the
slope, except when the second term in the formula for \(\hat{\beta}_0\) drops out due to
multiplication by zero. This means that if the estimate of the slope deviates a lot from
the true slope, then the estimate of the intercept will tend to deviate a lot from its
true value too. This lack of independence of the parameter estimators, or more
specifically the correlation of the parameter estimators, becomes important when computing
the uncertainties of predicted values from the model. Although the formulas discussed in
this paragraph only apply to the straight-line model, the relationship between the
parameter estimators is analogous for more complicated models, including both
statistically linear and statistically nonlinear models.
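A minimal sketch of these closed-form estimators, checked against numpy's polyfit; the data are made up:

import numpy as np

x = np.array([20.0, 30.0, 40.0, 50.0, 60.0])      # made-up temperatures
y = np.array([83.0, 121.0, 166.0, 208.0, 245.0])  # made-up pressures

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()   # depends on b1 unless x.mean() == 0

print("closed form:", b0, b1)
print("polyfit:    ", np.polyfit(x, y, 1)[::-1])  # reordered to (intercept, slope)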
Quality of From the preceding discussion, which focused on how the least squares estimates of the model
Least parameters are computed and on the relationship between the parameter estimates, it is difficult to
Squares picture exactly how good the parameter estimates are. They are, in fact, often quite good. The plot
Estimates below shows the data from the Pressure/Temperature example with the fitted regression line and
the true regression line, which is known in this case because the data were simulated. It is clear
from the plot that the two lines, the solid one estimated by least squares and the dashed being the
true line obtained from the inputs to the simulation, are almost identical over the range of the
data. Because the least squares line approximates the true line so well in this case, the least
squares line will serve as a useful description of the deterministic portion of the variation in the
data, even though it is not a perfect description. While this plot is just one example, the
relationship between the estimated and true regression functions shown here is fairly typical.
Comparison of LS Line and True Line
Quantifying the Quality of the Fit for Real Data From the plot above it is easy to see
that the line based on the least squares estimates of \(\beta_0\) and \(\beta_1\) is a
good estimate of the true line for these simulated data. For real data, of course, this
type of direct comparison is not possible. Plots comparing the model to the data can,
however, provide valuable information on the adequacy and usefulness of the model. In
addition, another measure of the average quality of the fit of a regression function to a
set of data by least squares can be quantified using the remaining parameter in the model,
\(\sigma\), the standard deviation of the error term in the model.
Like the parameters in the functional part of the model, \(\sigma\) is generally not
known, but it can also be estimated from the least squares equations. The formula for the
estimate is

\[ \hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n - p}} , \]

with \(n\) denoting the number of observations in the sample and \(p\) the number of
parameters in the functional part of the model. \(\hat{\sigma}\) is often referred to as
the "residual standard deviation" of the process.
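A minimal sketch of computing \(\hat{\sigma}\) for a straight-line fit (so p = 2), using the same made-up data as above:

import numpy as np

x = np.array([20.0, 30.0, 40.0, 50.0, 60.0])
y = np.array([83.0, 121.0, 166.0, 208.0, 245.0])

slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)

n, p = len(y), 2
sigma_hat = np.sqrt(np.sum(resid ** 2) / (n - p))
print("residual standard deviation:", sigma_hat)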
Because \(\sigma\) measures how the individual values of the response variable vary with
respect to their true values under the model, it also contains information about how far
from the truth quantities derived from the data, such as the estimated values of the
parameters, could be. Knowledge of the approximate value of \(\sigma\) plus the values of
the predictor variables can be combined to provide estimates of the average deviation
between the different aspects of the model and the corresponding true values, quantities
that can be related to properties of the process generating the data that we would like to
know.
More information on the correlation of the parameter estimators and computing uncertainties for
different functions of the estimated regression parameters can be found in Section 5.
4. Process Modeling
4.4. Data Analysis for Process Modeling
4.4.3. How are estimates of the unknown parameters obtained?
Main There are many statistical tools for model validation, but the primary tool for most
Tool: process modeling applications is graphical residual analysis. Different types of plots of
Graphical the residuals (see definition below) from a fitted model provide information on the
Residual adequacy of different aspects of the model. Numerical methods for model validation,
Analysis such as the \(R^2\) statistic, are also useful, but usually to a lesser degree than graphical
methods. Graphical methods have an advantage over numerical methods for model
validation because they readily illustrate a broad range of complex aspects of the
relationship between the model and the data. Numerical methods for model validation
tend to be narrowly focused on a particular aspect of the relationship between the model
and the data and often try to compress that information into a single descriptive number
or test result.
Numerical Numerical methods do play an important role as confirmatory methods for graphical
Methods' techniques, however. For example, the lack-of-fit test for assessing the correctness of the
Forte functional part of the model can aid in interpreting a borderline residual plot. There are
also a few modeling situations in which graphical methods cannot easily be used. In these
cases, numerical methods provide a fallback position for model validation. One common
situation when numerical validation methods take precedence over graphical methods is
when the number of parameters being estimated is relatively close to the size of the data
set. In this situation residual plots are often difficult to interpret due to constraints on the
residuals imposed by the estimation of the unknown parameters. One area in which this
typically happens is in optimization applications using designed experiments. Logistic
regression with binary data is another area in which graphical residual analysis can be
difficult.
Residuals The residuals from a fitted model are the differences between the responses
observed at each combination of values of the explanatory variables and the corresponding
prediction of the response computed using the regression function. Mathematically, the
definition of the residual for the ith observation in the data set is written

\[ e_i = y_i - f(\vec{x}_i;\hat{\vec{\beta}}) , \]

with \(y_i\) denoting the ith response in the data set and \(\vec{x}_i\) the list of
explanatory variables, each set at the corresponding values found in the ith observation
in the data set.
Example The data listed below are from the Pressure/Temperature example introduced in Section
4.1.1. The first column shows the order in which the observations were made, the second
column indicates the day on which each observation was made, and the third column
gives the ambient temperature recorded when each measurement was made. The fourth
column lists the temperature of the gas itself (the explanatory variable) and the fifth
column contains the observed pressure of the gas (the response variable). Finally, the
sixth column gives the corresponding values from the fitted straight-line regression
function, and the last column lists the residuals, the difference between columns five and
six.
Data, Fitted Values & Residuals

Run Order   Day   Ambient Temperature   Temperature   Pressure   Fitted Value   Residual
    1        1         23.820             54.749       225.066      222.920       2.146
    2        1         24.120             23.323       100.331       99.411       0.920
    3        1         23.434             58.775       230.863      238.744      -7.881
    4        1         23.993             25.854       106.160      109.359      -3.199
    5        1         23.375             68.297       277.502      276.165       1.336
    6        1         23.233             37.481       148.314      155.056      -6.741
    7        1         24.162             49.542       197.562      202.456      -4.895
    8        1         23.667             34.101       138.537      141.770      -3.232
Why Use If the model fit to the data were correct, the residuals would approximate the random
Residuals? errors that make the relationship between the explanatory variables and the response
variable a statistical relationship. Therefore, if the residuals appear to behave randomly, it
suggests that the model fits the data well. On the other hand, if non-random structure is
evident in the residuals, it is a clear sign that the model fits the data poorly. The
subsections listed below detail the types of plots to use to test different aspects of a model
and give guidance on the correct interpretations of different results that could be observed
for each type of plot.
Model 1. How can I assess the sufficiency of the functional part of the model?
Validation 2. How can I detect non-constant variation across the data?
Specifics
3. How can I tell if there was drift in the process?
4. How can I assess whether the random errors are independent from one to the next?
5. How can I test whether or not the random errors are distributed normally?
6. How can I test whether any significant terms are missing or misspecified in the
functional part of the model?
7. How can I test whether all of the terms in the functional part of the model are
necessary?
4. Process Modeling
4.4. Data Analysis for Process Modeling
4.4.4. How can I tell if a model fits my data?
Pressure / The residual scatter plot below, of the residuals from a straight line fit to the
Temperature Pressure/Temperature data introduced in Section 4.1.1. and also discussed in the previous section,
Example does not indicate any problems with the model. The reference line at 0 emphasizes that the
residuals are split about 50-50 between positive and negative. There are no systematic patterns
apparent in this plot. Of course, just as the \(R^2\) statistic cannot justify a particular model on its
own, no single residual plot can completely justify the adoption of a particular model either. If a
plot of these residuals versus another variable did show systematic structure, the form of model
with respect to that variable would need to be changed or that variable, if not in the model, would
need to be added to the model. It is important to plot the residuals versus every available variable
to ensure that a candidate model is the best model possible.
Importance of One important class of potential predictor variables that is often overlooked is environmental
Environmental variables. Environmental variables include things like ambient temperature in the area where
Variables measurements are being made and ambient humidity. In most cases environmental variables are
not expected to have any noticeable effect on the process, but it is always good practice to check
for unanticipated problems caused by environmental conditions. Sometimes the catch-all
environmental variables can also be used to assess the validity of a model. For example, if an
experiment is run over several days, a plot of the residuals versus day can be used to check for
differences in the experimental conditions at different times. Any differences observed will not
necessarily be attributable to a specific cause, but could justify further experiments to try to
identify factors missing from the model, or other model misspecifications. The two residual plots
below show the pressure/temperature residuals versus ambient lab temperature and day. In both
cases the plots provide further evidence that the straight line model gives an adequate description
of the data. The plot of the residuals versus day does look a little suspicious with a slight cyclic
pattern between days, but doesn't indicate any overwhelming problems. It is likely that this
apparent difference between days is just due to the random variation in the data.
Pressure / Temperature Residuals vs Environmental Variables
Residual The examples of residual plots given above are for the simplest possible case, straight line
Scatter Plots regression via least squares, but the residual plots are used in exactly the same way for almost all
Work Well for of the other statistical methods used for model building. For example, the residual plot below is
All Methods for the LOESS model fit to the thermocouple calibration data introduced in Section 4.1.3.2. Like
the plots above, this plot does not signal any problems with the fit of the LOESS model to the
data. The residuals are scattered both above and below the reference line at all temperatures.
Residuals adjacent to one another in the plot do not tend to have similar signs. There are no
obvious systematic patterns of any type in this plot.
Validation of LOESS Model for Thermocouple Calibration
An Alternative Based on the plot of voltage (response) versus the temperature (predictor) for the thermocouple
to the LOESS calibration data, a quadratic model would have been a reasonable initial model for these data. The
Model quadratic model is the simplest possible model that could account for the curvature in the data.
The scatter plot of the residuals versus temperature for a quadratic model fit to the data clearly
indicates that it is a poor fit, however. This residual plot shows strong cyclic structure in the
residuals. If the quadratic model did fit the data, then this structure would not be left behind in the
residuals. One thing to note in comparing the residual plots for the quadratic and LOESS models,
besides the amount of structure remaining in the data in each case, is the difference in the scales
of the two plots. The residuals from the quadratic model have a range that is approximately fifty
times the range of the LOESS residuals.
Validation of the Quadratic Model
4. Process Modeling
4.4. Data Analysis for Process Modeling
4.4.4. How can I tell if a model fits my data?
Residuals from Pressure / Temperature Example
Modification To illustrate how the residuals from the Pressure/Temperature data would look if the standard
of Example deviation was not constant across the different temperature levels, a modified version of the data
was simulated. In the modified version, the standard deviation increases with increasing values of
pressure. Situations like this, in which the standard deviation increases with increasing values of
the response, are among the most common ways that non-constant random variation occurs in
physical science and engineering applications. A plot of the data is shown below. Comparison of
these two versions of the data is interesting because in the original units of the data they don't
look strikingly different.
Pressure Data with Non-Constant Residual Standard Deviation
Residuals The residual plot from a straight-line fit to the modified data, however, highlights the
Indicate non-constant standard deviation in the data. The horn-shaped residual plot, starting with residuals
Non-Constant close together around 20 degrees and spreading out more widely as the temperature (and the
Standard pressure) increases, is a typical plot indicating that the assumptions of the analysis are not
Deviation satisfied with this model. Other residual plot shapes besides the horn shape could indicate
non-constant standard deviation as well. For example, if the response variable for a data set
peaked in the middle of the range of the predictors and was small for extreme values of the
predictors, the residuals plotted versus the predictors would look like two horns with the bells
facing one another. In a case like this, a plot of the residuals versus the predicted values would
exhibit the single horn shape, however.
Residuals from Modified Pressure Data
Residual The use of residual plots to check the assumption of constant standard deviation works in the
Plots same way for most modeling methods. It is not limited to least squares regression even though
Comparing that is almost always the context in which it is explained. The plot below shows the residuals
Variability from a LOESS fit to the data from the Thermocouple Calibration example. The even spread of the
Apply to Most residuals across the range of the data does not indicate any changes in the standard deviation,
Methods leading us to the conclusion that this assumption is not unreasonable for these data.
Residuals from LOESS Fit to Thermocouple Calibration Data
Correct One potential pitfall in using residual plots to check for constant standard deviation across the
Function data is that the functional part of the model must adequately describe the systematic variation in
Needed to the data. If that is not the case, then the typical horn shape observed in the residuals could be due
Check for to an artifact of the function fit to the data rather than to non-constant variation. For example, in
Constant the Polymer Relaxation example it was hypothesized that both time and temperature are related to
Standard the response variable, torque. However, if a single stretched exponential model in time was the
Deviation initial model used for the process, the residual plots could be misinterpreted fairly easily, leading
to the false conclusion that the standard deviation is not constant across the data. When the
functional part of the model does not fit the data well, the residuals do not reflect purely random
variations in the process. Instead, they reflect the remaining structure in the data not accounted
for by the function. Because the residuals are not random, they cannot be used to answer
questions about the random part of the model. This also emphasizes the importance of plotting the
data before fitting the initial model, even if a theoretical model for the data is available. Looking
at the data before fitting the initial model, at least in this case, would likely forestall this potential
problem.
Polymer Relaxation Data Modeled as a Single Stretched Exponential
Residuals from Single Stretched Exponential Model
Getting Back on Course After a Bad Start Fortunately, even if the initial model were
incorrect and the residual plot above were made, there are clues in this plot that
indicate that the horn shape (pointing left this time) is not caused by non-constant
standard deviation. The cluster of residuals at time zero that have a residual torque near
one indicates that the functional part of the model does not fit the data. In addition,
even when the residuals occur with equal frequency above and below zero, the spacing of
the residuals at each time does not really look random. The spacing is too regular to
represent random measurement errors. At measurement times near the low end of the scale,
the spacing of the points increases as the residuals decrease and at the upper end of the
scale the spacing decreases as the residuals decrease. The patterns in the spacing of the
residuals also point to the fact that the functional form of the model is not correct and
needs to be corrected before drawing conclusions about the distribution of the residuals.
4. Process Modeling
4.4. Data Analysis for Process Modeling
4.4.4. How can I tell if a model fits my data?
Pressure / To show in a more concrete way how run order plots work, the plot below shows the residuals
Temperature from a straight-line fit to the Pressure/Temperature data plotted in run order. Comparing the run
Example order plot to a listing of the data with the residuals shows how the residual for the first data point
collected is plotted versus the run order index value 1, the second residual is plotted versus an
index value of 2, and so forth.
Run Sequence Plot for the Pressure / Temperature Data
No Drift Taken as a whole, this plot essentially shows that there is only random scatter in the relationship
Indicated between the observed pressures and order in which the data were collected, rather than any
systematic relationship. Although there appears to be a slight trend in the residuals when plotted
in run order, the trend is small when measured against short-term random variation in the data,
indicating that it is probably not a real effect. The presence of this apparent trend does emphasize,
however, that practice and judgment are needed to correctly interpret these plots. Although
residual plots are a very useful tool, if critical judgment is not used in their interpretation, you can
see things that aren't there or miss things that are. One hint that the slight slope visible in the data
is not worrisome in this case is the fact that the residuals overlap zero across all runs. If the
process was drifting significantly, it is likely that there would be some parts of the run sequence
in which the residuals would not overlap zero. If there is still some doubt about the slight trend
visible in the data after using this graphical procedure, a term describing the drift can be added to
the model and tested numerically to see if it has a significant impact on the results.
Modification To illustrate how the residuals from the Pressure/Temperature data would look if there were drift
of Example in the process, a modified version of the data was simulated. A small drift of 0.3
units/measurement was added to the process. A plot of the data is shown below. In this run
sequence plot a clear, strong trend is visible and there are portions of the run order where the
residuals do not overlap zero. Because the structure is so evident in this case, it is easy to
conclude that some sort of drift is present. Then, of course, its cause needs to be determined so
that appropriate steps can be taken to eliminate the drift from the process or to account for it in
the model.
Run Sequence Plot for Pressure / Temperature Data with Drift
As in the case when the standard deviation was not constant across the data set, comparison of
these two versions of the data is interesting because the drift is not apparent in either data set
when viewed in the scale of the data. This highlights the need for graphical residual analysis
when developing process models.
Applicable to Most Regression Methods The run sequence plot, like most types of residual
plots, can be used to check for drift in many regression methods. It is not limited to
least squares fitting or one particular type of model. The run sequence plot below shows
the residuals from the fit of the nonlinear model to the data from the Polymer Relaxation
example. The even spread of the residuals across the range of the data indicates that
there is no apparent drift in this process.
Run Sequence Plot for Polymer Relaxation Data
4. Process Modeling
4.4. Data Analysis for Process Modeling
4.4.4. How can I tell if a model fits my data?
Interpretation If the errors are independent, there should be no pattern or structure in the lag plot. In this case
the points will appear to be randomly scattered across the plot in a scattershot fashion. If there is
significant dependence between errors, however, some sort of deterministic pattern will likely be
evident.
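A minimal sketch of constructing a lag plot with matplotlib; the residuals here are synthetic stand-ins for residuals from a fitted model:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
resid = rng.normal(size=50)      # stand-in residuals

# Plot each residual against the previous one.
plt.scatter(resid[:-1], resid[1:])
plt.axhline(0, linewidth=0.5)
plt.axvline(0, linewidth=0.5)
plt.xlabel("residual i")
plt.ylabel("residual i+1")
plt.title("Lag plot of residuals")
plt.show()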
Examples Lag plots for the Pressure/Temperature example, the Thermocouple Calibration example, and the
Polymer Relaxation example are shown below. The lag plots for these three examples suggest
that the errors from each fit are independent. In each case, the residuals are randomly scattered
about the origin with no apparent structure. The last plot, for the Polymer Relaxation data, shows
an apparent slight correlation between the residuals and the lagged residuals, but experience
suggests that this could easily be due to random error and is not likely to be a real issue. In fact,
the lag plot can also emphasize outlying observations and a few of the larger residuals (in
absolute terms) may be pulling our eyes unduly. The normal probability plot, which is also good
at identifying outliers, will be discussed next, and will shed further light on any unusual points in
the data set.
Lag Plot: Temperature / Pressure Example
Lag Plot: Thermocouple Calibration Example
Lag Plot: Polymer Relaxation Example
Next Steps Some of the different patterns that might be found in the residuals when the errors are not
independent are illustrated in the general discussion of the lag plot. If the residuals are not
random, then time series methods might be required to fully model the data. Some time series
basics are given in Section 4 of the chapter on Process Monitoring. Before jumping to
conclusions about the need for time series methods, however, be sure that a run order plot does
not show any trends, or other structure, in the data. If there is a trend in the run order plot,
whether caused by drift or by the use of the wrong functional form, the source of the structure
shown in the run order plot will also induce structure in the lag plot. Structure induced in the lag
plot in this way does not necessarily indicate dependence in successive random errors. The lag
plot can only be interpreted clearly after accounting for any structure in the run order plot.
4. Process Modeling
4.4. Data Analysis for Process Modeling
4.4.4. How can I tell if a model fits my data?
Normal Probability Plot The normal probability plot is constructed by plotting the sorted
values of the residuals versus the associated theoretical values from the standard normal
distribution. Unlike most residual scatter plots, however, a random scatter of points does
not indicate that the assumption being checked is met in this case. Instead, if the random
errors are normally distributed, the plotted points will lie close to a straight line.
Distinct curvature or other significant deviations from a straight line indicate that the
random errors are probably not normally distributed. A few points that are far off the
line suggest that the data has some outliers in it.
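A minimal sketch of constructing a normal probability plot with scipy's probplot, which sorts the residuals and plots them against standard normal quantiles; the residuals are synthetic stand-ins:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(2)
resid = rng.normal(size=50)      # stand-in residuals

stats.probplot(resid, dist="norm", plot=plt)
plt.title("Normal probability plot of residuals")
plt.show()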
Examples Normal probability plots for the Pressure/Temperature example, the Thermocouple Calibration
example, and the Polymer Relaxation example are shown below. The normal probability plots for
these three examples indicate that it is reasonable to assume that the random errors for these
processes are drawn from approximately normal distributions. In each case there is a strong linear
relationship between the residuals and the theoretical values from the standard normal
distribution. Of course the plots do show that the relationship is not perfectly deterministic (and it
never will be), but the linear relationship is still clear. Since none of the points in these plots
deviate much from the linear relationship defined by the residuals, it is also reasonable to
conclude that there are no outliers in any of these data sets.
Normal Probability Plot: Temperature / Pressure Example
Normal Probability Plot: Thermocouple Calibration Example
Normal Probability Plot: Polymer Relaxation Example
Further If the random errors from one of these processes were not normally distributed, then significant
Discussion curvature may have been visible in the relationship between the residuals and the quantiles from
and Examples the standard normal distribution, or there would be residuals at the upper and/or lower ends of the
line that clearly did not fit the linear relationship followed by the bulk of the data. Examples of
some typical cases obtained with non-normal random errors are illustrated in the general
discussion of the normal probability plot.
Histogram The normal probability plot helps us determine whether or not it is reasonable
to assume that the random errors in a statistical process are drawn from a normal
distribution. An advantage of the normal probability plot is that the human eye is very
sensitive to deviations from a straight line that might indicate that the errors come from
a non-normal distribution. However, when the normal probability plot suggests that the
normality assumption may not be reasonable, it does not give us a very good idea what the
distribution does look like. A histogram of the residuals from the fit, on the other hand,
can provide a clearer picture of the shape of the distribution. Because the histogram
provides more general distributional information than the normal probability plot does, it
will be harder to discern deviations from normality with the histogram than with the more
specifically-oriented normal probability plot.
Examples Histograms for the three examples used to illustrate the normal probability plot are shown below.
The histograms are all more-or-less bell-shaped, confirming the conclusions from the normal
probability plots. Additional examples can be found in the gallery of graphical techniques.
Histogram: Temperature / Pressure Example
Histogram: Thermocouple Calibration Example
Histogram: Polymer Relaxation Example
Important One important detail to note about the normal probability plot and the histogram is that they
Note provide information on the distribution of the random errors from the process only if
1. the functional part of the model is correctly specified,
2. the standard deviation is constant across the data,
3. there is no drift in the process, and
4. the random errors are independent from one run to the next.
If the other residual plots indicate problems with the model, the normal probability plot and
histogram will not be easily interpretable.
4. Process Modeling
4.4. Data Analysis for Process Modeling
4.4.4. How can I tell if a model fits my data?
General The most common strategy used to test for model adequacy is to
Strategy compare the amount of random variation in the residuals from the data
used to fit the model with an estimate of the random variation in the
process using data that are independent of the model. If these two
estimates of the random variation are similar, that indicates that no
significant terms are likely to be missing from the model. If the
model-dependent estimate of the random variation is larger than the
model-independent estimate, then significant terms probably are
missing or misspecified in the functional part of the model.
Testing Model The need for a model-independent estimate of the random variation
Adequacy means that replicate measurements made under identical experimental
Requires conditions are required to carry out a lack-of-fit test. If no replicate
Replicate measurements are available, then there will not be any baseline
Measurements estimate of the random process variation to compare with the results
from the model. This is the main reason that the use of replication is
emphasized in experimental design.
Data Used to Fit Model Can Be Partitioned to Compute Lack-of-Fit Statistic Although it
might seem like two sets of data would be needed to carry out the lack-of-fit test using
the strategy described above, one set of data to fit the model and compute the residual
standard deviation and the other to compute the model-independent estimate of the random
variation, that is usually not necessary. In most regression applications, the same data
used to fit the model can also be used to carry out the lack-of-fit test, as long as the
necessary replicate measurements are available. In these cases, the lack-of-fit statistic
is computed by partitioning the residual standard deviation into two independent
estimators of the random variation in the process. One estimator depends on the model and
the sample means of the replicated sets of data (\(\hat{\sigma}_m\)), while the other
estimator is a pooled standard deviation based on the variation observed in each set of
replicated measurements (\(\hat{\sigma}_r\)). The squares of these two estimators of the
random variation are often called the "mean square for lack-of-fit" and the "mean square
for pure error," respectively, in statistics texts. The notation \(\hat{\sigma}_m\) and
\(\hat{\sigma}_r\) is used here instead to emphasize the fact that, if the model fits the
data, these quantities should both be good estimators of \(\sigma\).
The two estimators are

\[ \hat{\sigma}_m = \sqrt{\frac{\sum_{i=1}^{n_u}\sum_{j=1}^{n_i}\left(\bar{y}_i - f(\vec{x}_i;\hat{\vec{\beta}})\right)^2}{n_u - p}} \qquad \text{and} \qquad \hat{\sigma}_r = \sqrt{\frac{\sum_{i=1}^{n_u}\sum_{j=1}^{n_i}\left(y_{ij} - \bar{y}_i\right)^2}{n - n_u}} , \]

with \(n\) denoting the sample size of the data set used to fit the model, \(n_u\) the
number of unique combinations of predictor variable levels, \(n_i\) the number of
replicated observations at the ith combination of predictor variable levels, \(y_{ij}\)
the regression responses indexed by their predictor variable levels and number of
replicate measurements, and \(\bar{y}_i\) the mean of the responses at the ith combination
of predictor variable levels. Notice that the formula for \(\hat{\sigma}_r\) depends only
on the data and not on the functional part of the model. This shows that
\(\hat{\sigma}_r\) will be a good estimator of \(\sigma\), regardless of whether the model
is a complete description of the process or not.
Carrying Out the Test for Lack-of-Fit Combining the ideas presented in the previous two
paragraphs, following the general strategy outlined above, the adequacy of the functional
part of the model can be assessed by comparing the values of \(\hat{\sigma}_m\) and
\(\hat{\sigma}_r\). If \(\hat{\sigma}_m > \hat{\sigma}_r\), then one or more important
terms may be missing or misspecified in the functional part of the model. Because of the
random error in the data, however, we know that \(\hat{\sigma}_m\) will sometimes be
larger than \(\hat{\sigma}_r\) even when the model is adequate. To make sure that the
hypothesis that the model is adequate is not rejected by chance, it is necessary to
understand how much greater than \(\hat{\sigma}_r\) the value of \(\hat{\sigma}_m\) might
typically be when the model does fit the data. Then the hypothesis can be rejected only
when \(\hat{\sigma}_m\) is significantly greater than \(\hat{\sigma}_r\).
When the model does fit the data, it turns out that the ratio
\(\hat{\sigma}_m^2/\hat{\sigma}_r^2\) follows an F distribution with \(n_u - p\) and
\(n - n_u\) degrees of freedom.
Alternative Formula for $\hat{\sigma}_m^2$
Although the formula given above more clearly shows the nature of $\hat{\sigma}_m^2$, the numerically equivalent formula below is easier to use in computations,

$$\hat{\sigma}_m^2 = \frac{(n-p)\,\hat{\sigma}^2 - (n-n_u)\,\hat{\sigma}_r^2}{n_u - p}$$

with $\hat{\sigma}$ denoting the residual standard deviation from the fit, so that the lack-of-fit sum of squares is obtained by subtracting the pure-error sum of squares from the total residual sum of squares.
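To make the mechanics concrete, the sketch below illustrates the lack-of-fit computation for a straight-line fit to a small set of invented data with replicate measurements; it is a minimal illustration under assumed data, not code from the handbook.

    import numpy as np
    from scipy import stats

    # Invented data with replicate measurements at each predictor level.
    x = np.array([1, 1, 2, 2, 3, 3, 4, 4], dtype=float)
    y = np.array([2.1, 2.3, 3.9, 4.2, 6.2, 5.8, 8.1, 7.7])

    # Fit the straight-line model y = b0 + b1*x by least squares.
    b1, b0 = np.polyfit(x, y, 1)
    n, p = len(y), 2                     # sample size, number of parameters

    levels = np.unique(x)                # unique predictor combinations
    n_u = len(levels)

    # Partition the residual sum of squares into lack-of-fit and pure error.
    ss_lof = ss_pe = 0.0
    for xi in levels:
        group = y[x == xi]
        ybar = group.mean()
        ss_lof += len(group) * (ybar - (b0 + b1 * xi)) ** 2
        ss_pe += ((group - ybar) ** 2).sum()

    sm2 = ss_lof / (n_u - p)             # mean square for lack-of-fit
    sr2 = ss_pe / (n - n_u)              # mean square for pure error

    L = sm2 / sr2
    p_value = stats.f.sf(L, n_u - p, n - n_u)
    print(f"lack-of-fit F = {L:.3f}, p = {p_value:.3f}")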
Empirical Over-fitting is especially likely to occur when developing purely empirical models for
and Local processes when there is no external understanding of how much of the total variation in
Models the data might be systematic and how much is random. It also happens more frequently
Most Prone when using regression methods that fit the data locally instead of using an explicitly
to specified function to describe the structure in the data. Explicit functions are usually
Over-fitting relatively simple and have few terms. It is usually difficult to know how to specify an
the Data explicit function that fits the noise in the data, since noise will not typically display
much structure. This is why over-fitting is not usually a problem with these types of
models. Local models, on the other hand, can easily be made to fit very complex
patterns, allowing them to find apparent structure in process noise if care is not
exercised.
Statistical Tests for Over-fitting
Just as statistical tests can be used to check for significant missing or misspecified terms in the functional part of a model, they can also be used to determine if any unnecessary terms have been included. In fact, checking for over-fitting of the data is one area in which statistical tests are more effective than residual plots. To test for over-fitting, however, individual tests of the importance of each parameter in the model are used rather than a single overall test, as is done when testing for terms that are missing or misspecified in the model.
Tests of Most output from regression software also includes individual statistical tests that
Individual compare the hypothesis that each parameter is equal to zero with the alternative that it is
Parameters not zero. These tests are convenient because they are automatically included in most
computer output, do not require replicate measurements, and give specific information
about each parameter in the model. However, if the different predictor variables
included in the model have values that are correlated, these tests can also be quite
difficult to interpret. This is because these tests are actually testing whether or not each
parameter is zero given that all of the other predictors are included in the model.
Test The test statistics for testing whether or not each parameter is zero are typically based
Statistics on Student's t distribution. Each parameter estimate in the model is measured in terms
Based on of how many standard deviations it is from its hypothesized value of zero. If the
Student's t parameter's estimated value is close enough to the hypothesized value that any deviation
Distribution can be attributed to random error, the hypothesis that the parameter's true value is zero
is not rejected. If, on the other hand, the parameter's estimated value is so far away from
the hypothesized value that the deviation cannot be plausibly explained by random
error, the hypothesis that the true value of the parameter is zero is rejected.
Because the hypothesized value of each parameter is zero, the test statistic for each of these tests is simply the estimated parameter value divided by its estimated standard deviation,

$$T = \frac{\hat{\beta}}{\hat{\sigma}_{\hat{\beta}}}$$

which provides a measure of the distance between the estimated and hypothesized values of the parameter in standard deviations. Based on the assumptions that the random errors are normally distributed and the true value of the parameter is zero (as we have hypothesized), the test statistic has a Student's t distribution with $n-p$ degrees of freedom. Therefore, cut-off values for the t distribution can be used to determine how extreme the test statistic must be in order for each parameter estimate to be too far away from its hypothesized value for the deviation to be attributed to random error. Because these tests are generally used to simultaneously test whether or not a parameter value is greater than or less than zero, the tests should each be used with cut-off values with a significance level of $\alpha/2$. This will guarantee that the hypothesis that each parameter equals zero will be rejected by chance with probability $\alpha$. Because of the symmetry of the t distribution, only one cut-off value, the upper or the lower one, needs to be determined, and the other will be its negative. Equivalently, many people simply compare the absolute value of the test statistic to the upper cut-off value.
Parameter To illustrate the use of the individual tests of the significance of each parameter in a
Tests for the model, the Dataplot output for the Pressure/Temperature example is shown below. In
Pressure / this case a straight-line model was fit to the data, so the output includes tests of the
Temperature significance of the intercept and slope. The estimates of the intercept and the slope are
Example 7.75 and 3.93, respectively. Their estimated standard deviations are listed in the next
column followed by the test statistics to determine whether or not each parameter is
zero. At the bottom of the output the estimate of the residual standard deviation, $\hat{\sigma}$, and its degrees of freedom are also listed.
Dataplot Output: Pressure / Temperature Example

    LEAST SQUARES POLYNOMIAL FIT
    SAMPLE SIZE N = 40
    DEGREE = 1
    NO REPLICATION CASE
    [parameter estimates, standard deviations, and t statistics not reproduced]
Looking up the cut-off value from the tables of the t distribution using a significance level of 0.05 and 38 degrees of freedom yields a cut-off value of 2.024 (the cut-off is obtained from the column labeled "0.025" since this is a two-sided test and 0.05/2 = 0.025). Since both of the test statistics are larger in absolute value than the cut-off value of 2.024, the appropriate conclusion is that both the slope and intercept are significantly different from zero at the 95% confidence level.
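The cut-off calculation is easy to reproduce. The short sketch below (an illustration, not the handbook's Dataplot code) computes the two-sided cut-off and applies the decision rule to any parameter estimate and its standard deviation.

    from scipy import stats

    # Two-sided test at significance level alpha with n - p degrees of freedom.
    alpha, df = 0.05, 38
    cutoff = stats.t.ppf(1 - alpha / 2, df)     # upper alpha/2 cut-off, ~2.024
    print(f"cut-off = {cutoff:.3f}")

    def parameter_is_significant(estimate, std_dev):
        """Reject the hypothesis that the parameter is zero when
        |estimate / std_dev| exceeds the t cut-off."""
        return abs(estimate / std_dev) > cutoff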
4. Process Modeling
4.4. Data Analysis for Process Modeling
4. Process Modeling
4.4. Data Analysis for Process Modeling
4.4.5. If my current model does not fit the data well, how can I improve it?
Residuals vs
Temperature:
Quadratic
Model
The shape of the residual plot, which looks like a cubic polynomial, suggests that adding another
term to the polynomial might account for the structure left in the data by the quadratic model.
After fitting the cubic polynomial, the magnitude of the residuals is reduced by a factor of about
30, indicating a big improvement in the model.
Residuals vs
Temperature:
Cubic Model
Increasing Residual Complexity Suggests LOESS Model
Although the model is improved, there is still structure in the residuals. Based on this structure, a higher-degree polynomial looks like it would fit the data. Polynomial models become numerically unstable as their degree increases, however. Therefore, after a few iterations like this, leading to polynomials of ever-increasing degree, the structure in the residuals indicates that a polynomial does not actually describe the data very well. As a result, a different type of model, such as a nonlinear model or a LOESS model, is probably more appropriate for these data. The type of model needed to describe the data, however, can be arrived at systematically using the structure in the residuals at each step.
Using The basic steps for using transformations to handle data with unequal subpopulation standard
Transformations deviations are:
1. Transform the response variable to equalize the variation across the levels of the predictor
variables.
2. Transform the predictor variables, if necessary, to attain or restore a simple functional form
for the regression function.
3. Fit and validate the model in the transformed variables.
4. Transform the predicted values back into the original units using the inverse of the
transformation applied to the response variable.
Typical Transformations for Stabilization of Variation
Appropriate transformations to stabilize the variability may be suggested by scientific knowledge or selected using the data. Three transformations that are often effective for equalizing the standard deviations across the values of the predictor variables are:
1. $\sqrt{y}$,
2. $\ln(y)$ (note: the base of the logarithm does not really matter), and
3. $\dfrac{1}{y}$.
Other transformations can be considered, of course, but in a surprisingly wide range of problems
one of these three transformations will work well. As a result, these are good transformations to
start with, before moving on to more specialized transformations.
Modified Pressure / To illustrate how to use transformations to stabilize the variation in the data, we will return to the
Temperature Example modified version of the Pressure/Temperature example. The residuals from a straight-line fit to
that data clearly showed that the standard deviation of the measurements was not constant across
the range of temperatures.
Residuals from
Modified Pressure
Data
Stabilizing the The first step in the process is to compare different transformations of the response variable,
Variation pressure, to see which one, if any, stabilizes the variation across the range of temperatures. The
straight-line relationship will not hold for all of the transformations, but at this stage of the
process that is not a concern. The functional relationship can usually be corrected after stabilizing
the variation. The key for this step is to find a transformation that makes the uncertainty in the
data approximately the same at the lowest and highest temperatures (and in between). The plot
below shows the modified Pressure/Temperature data in its original units, and with the response
variable transformed using each of the three typical transformations.
Transformations of
the Pressure
Inverse Pressure Has After comparing the effects of the different transformations, it looks like using the inverse of the
Constant Variation pressure will make the standard deviation approximately constant across all temperatures.
However, it is somewhat difficult to tell how the standard deviations really compare on a plot of
this size and scale. To better see the variation, a full-sized plot of temperature versus the inverse
of the pressure is shown below. In that plot it is easier to compare the variation across
temperatures. For example, comparing the variation in the pressure values at a temperature of
about 25 with the variation in the pressure values at temperatures near 45 and 70, this plot shows
about the same level of variation at all three temperatures. It will still be critical to look at
residual plots after fitting the model to the transformed variables, however, to really see whether
or not the transformation we've chosen is effective. The residual scale is really the only scale that
can reveal that level of detail.
Enlarged View of
Temperature Versus
1/Pressure
Transforming Having found a transformation that appears to stabilize the standard deviations of the
Temperature to measurements, the next step in the process is to find a transformation of the temperature that will
Linearity restore the straight-line relationship, or some other simple relationship, between the temperature
and pressure. The same three basic transformations that can often be used to stabilize the
variation are also usually able to transform the predictor to restore the original relationship
between the variables. Plots of the temperature and the three transformations of the temperature
versus the inverse of the pressure are shown below.
Transformations of
the Temperature
Comparing the plots of the various transformations of the temperature versus the inverse of the
pressure, it appears that the straight-line relationship between the variables is restored when the
inverse of the temperature is used. This makes intuitive sense because if the temperature and
pressure are related by a straight line, then the same transformation applied to both variables
should change them both similarly, retaining their original relationship. Now, after fitting a
straight line to the transformed data, the residuals plotted versus both the transformed and original
values of temperature indicate that the straight-line model fits the data and that the random
variation no longer increases with increasing temperature. Additional diagnostic plots of the
residuals confirm that the model fits the data well.
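The four transformation steps can be sketched in a few lines of code. The fragment below is a minimal illustration using synthetic stand-in data (the actual modified Pressure/Temperature values are not reproduced here), with the inverse transformation applied to both variables as in the example above.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-in data: pressure roughly linear in temperature,
    # with random variation that grows with temperature.
    temp = rng.uniform(20, 70, 40)
    pressure = 7.75 + 3.93 * temp + rng.normal(0, 0.05, 40) * temp**1.5

    # Steps 1-2: transform the response and, if necessary, the predictor.
    u, v = 1.0 / temp, 1.0 / pressure

    # Step 3: fit and validate the model in the transformed variables.
    b1, b0 = np.polyfit(u, v, 1)
    residuals = v - (b0 + b1 * u)        # examine residual plots as usual

    # Step 4: transform predictions back into the original units.
    def predict_pressure(t):
        return 1.0 / (b0 + b1 / t)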
Using Weighted Least Squares
As discussed in the overview of different methods for building process models, the goal when using weighted least squares regression is to ensure that each data point has an appropriate level of influence on the final parameter estimates. Using the weighted least squares fitting criterion, the parameter estimates are obtained by minimizing

$$\sum_{i=1}^{n} w_i \left[y_i - f(\vec{x}_i;\vec{\beta})\right]^2 .$$

Optimal results, which minimize the uncertainty in the parameter estimators, are obtained when the weights, $w_i$, used to estimate the values of the unknown parameters are inversely proportional to the variances at each combination of predictor variable values:

$$w_i \propto \frac{1}{\sigma_i^2} .$$
Unfortunately, however, these optimal weights, which are based on the true variances of each
data point, are never known. Estimated weights have to be used instead. When estimated weights
are used, the optimality properties associated with known weights no longer strictly apply.
However, if the weights can be estimated with high enough precision, their use can significantly
improve the parameter estimates compared to the results that would be obtained if all of the data
points were equally weighted.
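For a straight-line model the weighted least squares criterion can be minimized in closed form via the weighted normal equations. The sketch below is a minimal illustration with assumed data and assumed known variances.

    import numpy as np

    def wls_line_fit(x, y, w):
        """Fit y = b0 + b1*x by minimizing sum(w * (y - b0 - b1*x)**2)
        using the weighted normal equations."""
        X = np.column_stack([np.ones_like(x), x])          # design matrix
        W = np.diag(w)
        return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # [b0, b1]

    # Points with larger (assumed known) variances receive smaller weights.
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.2, 3.9, 6.4, 7.6])
    variances = np.array([0.1, 0.1, 0.5, 0.5])
    b0, b1 = wls_line_fit(x, y, 1.0 / variances)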
Direct Estimation of Weights
If there are replicates in the data, the most obvious way to estimate the weights is to set the weight for each data point equal to the reciprocal of the sample variance obtained from the set of replicate measurements to which the data point belongs. Mathematically, this would be

$$w_{ij} = \frac{1}{\hat{\sigma}_i^2} = \frac{n_i - 1}{\sum_{j=1}^{n_i}\left(y_{ij} - \bar{y}_i\right)^2}$$

where
● $w_{ij}$ are the weights indexed by their predictor variable levels and replicate measurements,
● $i$ indexes the unique combinations of predictor variable values,
● $j$ indexes the replicates within each combination of predictor variable values,
● $\hat{\sigma}_i$ is the sample standard deviation of the response variable at the ith combination of predictor variable values,
● $n_i$ is the number of replicate observations at the ith combination of predictor variable values,
● $y_{ij}$ are the individual data points indexed by their predictor variable levels and replicate measurements,
● $\bar{y}_i$ is the mean of the responses at the ith combination of predictor variable levels.
Unfortunately, although this method is attractive, it rarely works well. This is because when the
weights are estimated this way, they are usually extremely variable. As a result, the estimated
weights do not correctly control how much each data point should influence the parameter
estimates. This method can work, but it requires a very large number of replicates at each
combination of predictor variables. In fact, if this method is used with too few replicate
measurements, the parameter estimates can actually be more variable than they would have been
if the unequal variation were ignored.
A Better Strategy for Estimating the Weights
A better strategy for estimating the weights is to find a function that relates the standard deviation of the response at each combination of predictor variable values to the predictor variables themselves. This means that if

$$\sigma_i = g(\vec{x}_i;\vec{\gamma})$$

(denoting the unknown parameters in the function by $\vec{\gamma}$), then the weights can be set to

$$w_{ij} = \frac{1}{g(\vec{x}_i;\hat{\vec{\gamma}})^2} .$$

This approach to estimating the weights usually provides more precise estimates than direct estimation because fewer quantities have to be estimated and there is more data to estimate each one.
Estimating Weights If there are only very few or no replicate measurements for each combination of predictor
Without Replicates variable values, then approximate replicate groups can be formed so that weights can be
estimated. There are several possible approaches to forming the replicate groups.
1. One method is to manually form the groups based on plots of the response against the predictor variables. Although this allows a lot of flexibility to account for the features of a specific data set, it is often impractical. However, this approach may be useful for relatively small data sets in which the spacing of the predictor variable values is very uneven.
2. Another approach is to divide the data into equal-sized groups of observations after sorting by the values of the response variable. It is important when using this approach not to make the size of the replicate groups too large. If the groups are too large, the standard deviations of the response in each group will be inflated because the approximate replicates will differ too much from one another due to the deterministic variation in the data. Again, plots of the response variable versus the predictor variables can be used as a check to confirm that the approximate sets of replicate measurements look reasonable.
3. A third approach is to choose the replicate groups based on ranges of predictor variable
values. That is, instead of picking groups of a fixed size, the ranges of the predictor
variables are divided into equal size increments or bins and the responses in each bin are
treated as replicates. Because the sizes of the groups may vary, there is a tradeoff in this
case between defining the intervals for approximate replicates to be too narrow or too wide.
As always, plots of the response variable against the predictor variables can serve as a
guide.
Although the exact estimates of the weights will be somewhat dependent on the approach used to
define the replicate groups, the resulting weighted fit is typically not particularly sensitive to
small changes in the definition of the weights when the weights are based on a simple, smooth
function.
Power Function Model for the Weights
One particular function that often works well for modeling the variances is a power of the mean at each combination of predictor variable values,

$$\sigma_i^2 = \gamma_1\,\mu_i^{\gamma_2}$$

with $\mu_i$ denoting the mean response at the ith combination of predictor variable values. Iterative procedures for simultaneously fitting a weighted least squares model to the original data and a power function model for the weights are discussed in Carroll and Ruppert (1988), and Ryan (1997).
Fitting the Model for Estimation of the Weights
When fitting the model for the estimation of the weights, i.e., when fitting the replicate variances to the function $g(\vec{x}_i;\vec{\gamma})$, it is important to note that the usual regression assumptions do not hold. In particular, the variation of the random errors is not constant across the different sets of replicates and their distribution is not normal. However, this can often be accounted for by using transformations (the ln transformation often stabilizes the variation), as described above.
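As an illustration of this fit, the sketch below estimates the exponent of the power function from the replicate standard deviations listed in the example later in this section, using an ordinary straight-line fit to the ln-ln transformed values; treating the handbook's replicate groups this way is an assumption of the sketch, not its published code.

    import numpy as np

    # Replicate-group temperatures and standard deviations from the table of
    # approximate replicates below (the group near 45 has no variance).
    temp = np.array([21, 24, 27, 30, 33, 36, 39, 42,
                     48, 51, 54, 57, 60, 69], dtype=float)
    sd = np.array([0.192333, 1.102380, 0.852080, 11.046422, 0.454670,
                   0.031820, 2.884289, 4.845772, 5.985219, 9.074554,
                   2.040637, 10.098899, 23.120270, 6.721043])

    # Fit ln(s_i^2) = ln(g1) + g2*ln(T_i); the ln-ln transformation
    # linearizes the power function and roughly stabilizes the errors.
    g2, ln_g1 = np.polyfit(np.log(temp), np.log(sd**2), 1)
    print(f"estimated exponent: {g2:.2f}")

    # Weights need only be proportional to 1/variance, so g1 is ignored
    # and the exponent is rounded for convenience.
    exponent = int(round(g2))
    weights = 1.0 / temp**exponent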
Validating the Model Of course, it is always a good idea to check the assumptions of the analysis, as in any
for Estimation of the model-building effort, to make sure the model of the weights seems to fit the weight data
Weights reasonably well. The fit of the weights model often does not need to meet all of the usual
standards to be effective, however.
Using Weighted Residuals to Validate WLS Models
Once the weights have been estimated and the model has been fit to the original data using weighted least squares, the validation of the model follows as usual, with one exception. In a weighted analysis, the distribution of the residuals can vary substantially with the different values of the predictor variables. This necessitates the use of weighted residuals [Graybill and Iyer (1994)] when carrying out a graphical residual analysis so that the plots can be interpreted as usual. The weighted residuals are given by the formula

$$r_{ij}^{\,w} = \sqrt{w_{ij}}\left[y_{ij} - f(\vec{x}_i;\hat{\vec{\beta}})\right] .$$
It is important to note that most statistical software packages do not compute and return weighted residuals when a weighted fit is done, so the residuals will usually have to be weighted manually in an additional step. If, after computing a weighted least squares fit using carefully estimated weights, the residual plots still show the same funnel-shaped pattern as they did for the initial equally-weighted fit, it is likely that the weighted residuals were not actually computed or plotted.
Example of WLS Using the Power Function Model
The power function model for the weights, mentioned above, is often especially convenient when there is only one predictor variable. In this situation the general model given above can usually be simplified to the power function

$$\sigma^2 = \gamma_1\,x^{\gamma_2}$$

which does not require the use of iterative fitting methods. This model will be used with the modified version of the Pressure/Temperature data, plotted below, to illustrate the steps needed to carry out a weighted least squares fit.
Modified
Pressure/Temperature
Data
Defining Sets of Approximate Replicate Measurements
From the data, plotted above, it is clear that there are not many true replicates in this data set. As a result, sets of approximate replicate measurements need to be defined in order to use the power function model to estimate the weights. In this case, this was done by dividing the temperature by a factor of three, rounding to the nearest degree, and then converting the rounded values back to the original scale, which amounts to rounding each temperature to the nearest multiple of three degrees. This is an easy way to identify sets of measurements that have temperatures that are relatively close together. If this process had produced too few sets of replicates, a smaller factor than three could have been used to spread the data out further before rounding. If fewer replicate sets were needed, then a larger factor could have been used. The appropriate value to use is a matter of judgment. An ideal value is one that doesn't combine values that are too different and that yields sets of replicates that aren't too different in size. A table showing the original data, the rounded temperatures that define the approximate replicates, and the replicate standard deviations is listed below.
Data with Approximate Replicates

                 Rounded                Standard
  Temperature  Temperature  Pressure   Deviation
  ---------------------------------------------
21.602 21 91.423 0.192333
21.448 21 91.695 0.192333
23.323 24 98.883 1.102380
22.971 24 97.324 1.102380
25.854 27 107.620 0.852080
25.609 27 108.112 0.852080
25.838 27 109.279 0.852080
29.242 30 119.933 11.046422
31.489 30 135.555 11.046422
34.101 33 139.684 0.454670
33.901 33 139.041 0.454670
37.481 36 150.165 0.031820
35.451 36 150.210 0.031820
39.506 39 164.155 2.884289
40.285 39 168.234 2.884289
43.004 42 180.802 4.845772
41.449 42 172.646 4.845772
42.989 42 169.884 4.845772
41.976 42 171.617 4.845772
44.692 45 180.564 NA
48.599 48 191.243 5.985219
47.901 48 199.386 5.985219
49.127 48 202.913 5.985219
49.542 51 196.225 9.074554
51.144 51 207.458 9.074554
50.995 51 205.375 9.074554
50.917 51 218.322 9.074554
54.749 54 225.607 2.040637
53.226 54 223.994 2.040637
54.467 54 229.040 2.040637
55.350 54 227.416 2.040637
54.673 54 223.958 2.040637
54.936 54 224.790 2.040637
57.549 57 230.715 10.098899
56.982 57 216.433 10.098899
58.775 60 224.124 23.120270
61.204 60 256.821 23.120270
68.297 69 276.594 6.721043
68.476 69 267.296 6.721043
68.774 69 280.352 6.721043
Transformation of the Weight Data
With the replicate groups defined, a plot of the ln of the replicate variances versus the ln of the temperature shows that the transformed data for estimating the weights do appear to follow the power function model. This is because the ln-ln transformation linearizes the power function, as well as stabilizing the variation of the random errors and making their distribution approximately normal.
Specification of Weight Function
The Splus output from the fit of the weight estimation model is shown below. Based on the output and the associated residual plots, the model of the weights seems reasonable, and the estimated power function should be an appropriate weight function for the modified Pressure/Temperature data. The weight function is based only on the slope from the fit to the transformed weight data because the weights only need to be proportional to the replicate variances. As a result, we can ignore the estimate of $\gamma_1$ in the power function since it is only a proportionality constant (in original units of the model). The exponent on the temperature in the weight function is usually rounded to the nearest digit or single decimal place for convenience, since that small change in the weight function will not affect the final results appreciably.

    [Splus output not reproduced; the weight fit was based on the
     N = 14 replicate groups with a computable standard deviation]
Fit of the WLS Model With the weight function estimated, the fit of the model with weighted least squares produces the
to the Pressure / residual plot below. This plot, which shows the weighted residuals from the fit versus
Temperature Data temperature, indicates that use of the estimated weight function has stabilized the increasing
variation in pressure observed with increasing temperature. The plot of the data with the
estimated regression function and additional residual plots using the weighted residuals confirm
that the model fits the data well.
Weighted Residuals
from WLS Fit of
Pressure /
Temperature Data
Comparison of Transformed and Weighted Results
Having modeled the data using both transformed variables and weighted least squares to account for the non-constant standard deviations observed in pressure, it is interesting to compare the two resulting models. Logically, at least one of these two models cannot be correct (actually, probably neither one is exactly correct). The fact that the two models lie right on top of one another over almost the entire range of the data tells us that, with the random error inherent in the data, there is no way to tell which of the two models actually describes the relationship between pressure and temperature better. Even at the highest temperatures, where the models diverge slightly, both models match the small amount of data that is available reasonably well. The only way to differentiate between these models is to use additional scientific knowledge or collect a lot more data. The good news, though, is that the models should work equally well for predictions or calibrations based on these data, or for basic understanding of the relationship between temperature and pressure.
Using The basic steps for using transformations to handle data with non-normally distributed random
Transformations errors are essentially the same as those used to handle non-constant variation of the random
errors.
1. Transform the response variable to make the distribution of the random errors
approximately normal.
2. Transform the predictor variables, if necessary, to attain or restore a simple functional form
for the regression function.
3. Fit and validate the model in the transformed variables.
4. Transform the predicted values back into the original units using the inverse of the
transformation applied to the response variable.
The main difference between using transformations to account for non-constant variation and
non-normality of the random errors is that it is harder to directly see the effect of a transformation
on the distribution of the random errors. It is very often the case, however, that non-normality and
non-constant standard deviation of the random errors go together, and that the same
transformation will correct both problems at once. In practice, therefore, if you choose a
transformation to fix any non-constant variation in the data, you will often also improve the
normality of the random errors. If the data appear to have non-normally distributed random
errors, but do have a constant standard deviation, you can always fit models to several sets of
transformed data and then check to see which transformation appears to produce the most
normally distributed residuals.
Typical Transformations for Meeting Distributional Assumptions
Not surprisingly, three transformations that are often effective for making the distribution of the random errors approximately normal are:
1. $\sqrt{y}$,
2. $\ln(y)$ (note: the base of the logarithm does not really matter), and
3. $\dfrac{1}{y}$.
These are the same transformations often used for stabilizing the variation in the data. Other
appropriate transformations to improve the distributional properties of the random errors may be
suggested by scientific knowledge or selected using the data. However, these three
transformations are good ones to start with since they work well in so many situations.
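One way to carry out the comparison described above is to fit the model under each candidate transformation and measure how straight each normal probability plot of the residuals is. The sketch below does this with the probability-plot correlation returned by scipy's probplot, using synthetic stand-in data; it is an illustration, not the handbook's procedure, and in practice the predictor would also be transformed to preserve a simple functional form.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    # Synthetic stand-in data: straight-line process with non-normal errors.
    x = np.linspace(20, 70, 40)
    y = 7.75 + 3.93 * x + rng.uniform(-20, 20, x.size)

    transforms = {"none": lambda v: v, "sqrt": np.sqrt,
                  "ln": np.log, "inverse": lambda v: 1.0 / v}

    for name, f in transforms.items():
        b1, b0 = np.polyfit(x, f(y), 1)
        resid = f(y) - (b0 + b1 * x)
        # r near 1 means the normal probability plot is nearly straight.
        (_, _), (_, _, r) = stats.probplot(resid, dist="norm")
        print(f"{name:8s} probability-plot correlation = {r:.4f}")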
Example To illustrate how to use transformations to change the distribution of the random errors, we will
look at a modified version of the Pressure/Temperature example in which the errors are uniformly
distributed. Comparing the results obtained from fitting the data in their original units and under
different transformations will directly illustrate the effects of the transformations on the
distribution of the random errors.
Modified
Pressure/Temperature
Data with Uniform
Random Errors
Fit of Model to the Untransformed Data
A four-plot of the residuals obtained after fitting a straight-line model to the Pressure/Temperature data with uniformly distributed random errors is shown below. The histogram and normal probability plot on the bottom row of the four-plot are the most useful plots for assessing the distribution of the residuals. In this case the histogram suggests that the distribution is more rectangular than bell-shaped, indicating the random errors are not likely to be normally distributed. The curvature in the normal probability plot also suggests that the random errors are not normally distributed. If the random errors were normally distributed, the normal probability plot should be a fairly straight line. Of course it wouldn't be perfectly straight, but smooth curvature or several points lying far from the line are fairly strong indicators of non-normality.
Residuals from
Straight-Line Model
of Untransformed
Data with Uniform
Random Errors
Selection of Appropriate Transformations
Going through a set of steps similar to those used to find transformations to stabilize the random variation, different pairs of transformations of the response and predictor which have a simple functional form and will potentially have more normally distributed residuals are chosen. In the multiplots below, all of the possible combinations of basic transformations are applied to the temperature and pressure to find the pairs which have simple functional forms. In this case, which is typical, the data with square root-square root, ln-ln, and inverse-inverse transformations all appear to follow a straight-line model. The next step will be to fit lines to each of these sets of data and then to compare the residual plots to see whether any have random errors which appear to be normally distributed.
sqrt(Pressure) vs
Different
Transformations of
Temperature
log(Pressure) vs
Different
Transformations of
Temperature
1/Pressure vs
Different
Transformations of
Temperature
Fit of Model to The normal probability plots and histograms below show the results of fitting straight-line models
Transformed to the three sets of transformed data. The results from the fit of the model to the data in its
Variables original units are also shown for comparison. From the four normal probability plots it looks like
the model fit using the ln-ln transformations produces the most normally distributed random
errors. Because the normal probability plot for the ln-ln data is so straight, it seems safe to
conclude that taking the ln of the pressure makes the distribution of the random errors
approximately normal. The histograms seem to confirm this since the histogram of the ln-ln data
looks reasonably bell-shaped while the other histograms are not particularly bell-shaped.
Therefore, assuming the other residual plots also indicated that a straight-line model fit this transformed data, the use of ln-ln transformations appears to be appropriate for analysis of these data.
4.5. Use and Interpretation of Process Models
4.5.1. What types of predictions can I make using the model?
Pressure / Temperature Example
For example, in the Pressure/Temperature process, which is well described by a straight-line model relating pressure ($P$) to temperature ($T$), the estimated regression function is found to be

$$\hat{P} = 7.75 + 3.93\,T$$

by substituting the estimated parameter values into the functional part of the model. Then to estimate the average pressure at a temperature of 65, the predictor value of interest is substituted in the estimated regression function, yielding an estimated pressure of 263.21.

This estimation process works analogously for nonlinear models, LOESS models, and all other types of functional process models.
Polymer Relaxation Example
Based on the output from fitting the stretched exponential model in time ($t$) and temperature ($T$), the estimated regression function for the polymer relaxation data is obtained in the same way, by substituting the estimated parameter values into the functional part of the model.
Uncertainty Knowing that the estimated average pressure is 263.21 at a temperature of 65, or that the
Needed estimated average torque on a polymer sample under particular conditions is 5.26, however, is not
enough information to make scientific or engineering decisions about the process. This is because
the pressure value of 263.21 is only an estimate of the average pressure at a temperature of 65.
Because of the random error in the data, there is also random error in the estimated regression
parameters, and in the values predicted using the model. To use the model correctly, therefore, the
uncertainty in the prediction must also be quantified. For example, if the safe operational pressure of a particular type of gas tank that will be used at a temperature of 65 is 300, different engineering conclusions would be drawn from knowing the average actual pressure in the tank is likely to lie somewhere in a narrow range about the estimate of 263.21 versus lying in a range wide enough to approach the safe operational limit.
Confidence In order to provide the necessary information with which to make engineering or scientific
Intervals decisions, predictions from process models are usually given as intervals of plausible values that
have a probabilistic interpretation. In particular, intervals that specify a range of values that will
contain the value of the regression function with a pre-specified probability are often used. These
intervals are called confidence intervals. The probability with which the interval will capture the
true value of the regression function is called the confidence level, and is most often set by the
user to be 0.95, or 95% in percentage terms. Any value between 0% and 100% could be specified,
though it would almost never make sense to consider values outside a range of about 80% to 99%.
The higher the confidence level is set, the more likely the true value of the regression function is
to be contained in the interval. The trade-off for high confidence, however, is wide intervals. As
the sample size is increased, however, the average width of the intervals typically decreases for
any fixed confidence level. The confidence level of an interval is usually denoted symbolically using the notation $(1-\alpha)\cdot 100\%$, with $\alpha$ denoting a user-specified probability, called the significance level, that the interval will not capture the true value of the regression function. The significance level is most often set to be 5% so that the associated confidence level will be 95%.
Computing Confidence intervals are computed using the estimated standard deviations of the estimated
Confidence regression function values and a coverage factor that controls the confidence level of the interval
Intervals and accounts for the variation in the estimate of the residual standard deviation.
The standard deviations of the predicted values of the estimated regression function depend on the
standard deviation of the random errors in the data, the experimental design used to collect the
data and fit the model, and the values of the predictor variables used to obtain the predicted
values. These standard deviations are not simple quantities that can be read off of the output
summarizing the fit of the model, but they can often be obtained from the software used to fit the
model. This is the best option, if available, because there are a variety of numerical issues that can
arise when the standard deviations are calculated directly using typical theoretical formulas.
Carefully written software should minimize the numerical problems encountered. If necessary,
however, matrix formulas that can be used to directly compute these values are given in texts such
as Neter, Wasserman, and Kutner.
The coverage factor used to control the confidence level of the intervals depends on the
distributional assumption about the errors and the amount of information available to estimate the
residual standard deviation of the fit. For procedures that depend on the assumption that the
random errors have a normal distribution, the coverage factor is typically a cut-off value from the
Student's t distribution at the user's pre-specified confidence level and with the same number of
degrees of freedom as used to estimate the residual standard deviation in the fit of the model.
Tables of the t distribution (or functions in software) may be indexed by the confidence level ($1-\alpha$) or the significance level ($\alpha$). It is also important to note that since these are two-sided intervals, half of the probability denoted by the significance level is usually assigned to each side of the interval, so the proper entry in a t table or in a software function may also be labeled with the value of $\alpha/2$, or $1-\alpha/2$, if the table or software is not exclusively designed for use with two-sided tests.
The estimated values of the regression function, their standard deviations, and the coverage factor are combined using the formula

$$\hat{f}(\vec{x}) \pm t_{1-\alpha/2,\,n-p}\cdot\hat{\sigma}_{\hat{f}}$$

with $\hat{f}(\vec{x})$ denoting the estimated value of the regression function, $t_{1-\alpha/2,\,n-p}$ the coverage factor, indexed by a function of the significance level and by its degrees of freedom, and $\hat{\sigma}_{\hat{f}}$ the standard deviation of $\hat{f}(\vec{x})$. Some software may provide the total uncertainty for the confidence
interval given by the equation above, or may provide the lower and upper confidence bounds by
adding and subtracting the total uncertainty from the estimate of the average response. This can
save some computational effort when making predictions, if available. Since there are many types
of predictions that might be offered in a software package, however, it is a good idea to test the
software on an example for which confidence limits are already available to make sure that the
software is computing the expected type of intervals.
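For the straight-line case the standard deviation of the estimated regression function has a simple closed form, $\hat{\sigma}_{\hat{f}} = \hat{\sigma}\sqrt{1/n + (x_0-\bar{x})^2/S_{xx}}$, so the whole interval can be sketched in a few lines. This is an illustration under that textbook formula, not output from any particular package.

    import numpy as np
    from scipy import stats

    def regression_ci(x, y, x0, alpha=0.05):
        """Confidence interval for the value of a straight-line
        regression function at x0."""
        n = len(x)
        b1, b0 = np.polyfit(x, y, 1)
        resid = y - (b0 + b1 * x)
        sigma = np.sqrt(np.sum(resid**2) / (n - 2))   # residual std. dev.
        sxx = np.sum((x - x.mean())**2)
        se_fit = sigma * np.sqrt(1.0/n + (x0 - x.mean())**2 / sxx)
        t = stats.t.ppf(1 - alpha/2, n - 2)           # coverage factor
        fhat = b0 + b1 * x0
        return fhat - t*se_fit, fhat + t*se_fit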
Confidence Computing confidence intervals for the average pressure in the Pressure/Temperature example,
Intervals for for temperatures of 25, 45, and 65, and for the average torque on specimens from the polymer
the Example relaxation example at different times and temperatures gives the results listed in the tables below.
Applications
Note: the number of significant digits shown in the tables below is larger than would normally be reported. However, as many significant digits as possible should be carried throughout all calculations and results should only be rounded for final reporting. If reported numbers may be used in further calculations, they should not be rounded even when finally reported. A useful rule for rounding final results that will not be used for further computation is to round all of the reported values to one or two significant digits in the total uncertainty, $t_{1-\alpha/2,\,n-p}\cdot\hat{\sigma}_{\hat{f}}$. This is the convention for rounding that has been used in the tables below.
Pressure / Temperature Example
[Table of lower and upper 95% confidence bounds for the average pressure at temperatures of 25, 45, and 65; numerical values not reproduced.]

Polymer Relaxation Example
[Table of lower and upper 95% confidence bounds for the average torque at selected times and temperatures; numerical values not reproduced.]
Interpretation As mentioned above, confidence intervals capture the true value of the regression function with a
of Confidence user-specified probability, the confidence level, using the estimated regression function and the
Intervals associated estimate of the error. Simulation of many sets of data from a process model provides a
good way to obtain a detailed understanding of the probabilistic nature of these intervals. The
advantage of using simulation is that the true model parameters are known, which is never the
case for a real process. This allows direct comparison of how confidence intervals constructed
from a limited amount of data relate to the true values that are being estimated.
The plot below shows 95% confidence intervals computed using 50 independently generated data
sets that follow the same model as the data in the Pressure/Temperature example. Random errors
from a normal distribution with a mean of zero and a known standard deviation are added to each
set of true temperatures and true pressures that lie on a perfect straight line to obtain the simulated
data. Then each data set is used to compute a confidence interval for the average pressure at a
temperature of 65. The dashed reference line marks the true value of the average pressure at a
temperature of 65.
Confidence
Intervals
Computed
from 50 Sets
of Simulated
Data
Confidence From the plot it is easy to see that not all of the intervals contain the true value of the average
Level pressure. Data sets 16, 26, and 39 all produced intervals that did not cover the true value of the
Specifies average pressure at a temperature of 65. Sometimes the interval may fail to cover the true value
Long-Run because the estimated pressure is unusually high or low because of the random errors in the data
Interval set. In other cases, the variability in the data may be underestimated, leading to an interval that is
Coverage too short to cover the true value. However, for 47 out of 50, or approximately 95% of the data
sets, the confidence intervals did cover the true average pressure. When the number of data sets
was increased to 5000, confidence intervals computed for 4723, or 94.46%, of the data sets
covered the true average pressure. Finally, when the number of data sets was increased to 10000,
95.12% of the confidence intervals computed covered the true average pressure. Thus, the
simulation shows that although any particular confidence interval might not cover its associated
true value, in repeated experiments this method of constructing intervals produces intervals that
cover the true value at the rate specified by the user as the confidence level. Unfortunately, when
dealing with real processes with unknown parameters, it is impossible to know whether or not a
particular confidence interval does contain the true value. It is nice to know that the error rate can
be controlled, however, and can be set so that it is far more likely than not that each interval
produced does contain the true value.
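A simulation along the lines described above is easy to replicate. The sketch below reuses the regression_ci function from the earlier sketch; the true parameter values and error standard deviation are assumptions chosen only for illustration.

    import numpy as np

    rng = np.random.default_rng(42)

    # Assumed "true" process: straight line with known error std. deviation.
    beta0, beta1, sigma = 7.75, 3.93, 5.0
    temps = np.linspace(20, 70, 40)
    true_at_65 = beta0 + beta1 * 65

    n_sets, covered = 5000, 0
    for _ in range(n_sets):
        y = beta0 + beta1 * temps + rng.normal(0, sigma, temps.size)
        lo, hi = regression_ci(temps, y, 65.0)   # from the sketch above
        covered += lo <= true_at_65 <= hi

    print(f"coverage: {covered / n_sets:.4f}")   # should be near 0.95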
Interpretation Summary
To summarize the interpretation of the probabilistic nature of confidence intervals in words: in independent, repeated experiments, $(1-\alpha)\cdot 100\%$ of the intervals will cover the true values, given that the assumptions needed for the construction of the intervals hold.
4.5.1.2. How can I predict the value and estimate the uncertainty of a single response?
A Different Type of Prediction
In addition to estimating the average value of the response variable for a given combination of predictor values, as discussed on the previous page, it is also possible to make predictions of the values of new measurements or observations from a process. Unlike the true average response, a new measurement is often actually observable in the future. However, there are a variety of different situations in which a prediction of a measurement value may be more desirable than actually making an observation from the process.
Example For example, suppose that a concrete supplier needs to supply concrete of a specified measured
strength for a particular contract, but knows that strength varies systematically with the ambient
temperature when the concrete is poured. In order to be sure that the concrete will meet the
specification, prior to pouring, samples from the batch of raw materials can be mixed, poured, and
measured in advance, and the relationship between temperature and strength can be modeled. Then
predictions of the strength across the range of possible field temperatures can be used to ensure the
product is likely to meet the specification. Later, after the concrete is poured (and the temperature is
recorded), the accuracy of the prediction can be verified.
The mechanics of predicting a new measurement value associated with a combination of predictor
variable values are similar to the steps used in the estimation of the average response value. In fact, the
actual estimate of the new measured value is obtained by evaluating the estimated regression function
at the relevant predictor variable values, exactly as is done for the average response. The estimates are
the same for these two quantities because, assuming the model fits the data, the only difference
between the average response and a particular measured response is a random error. Because the error
is random, and has a mean of zero, there is no additional information in the model that can be used to
predict the particular response beyond the information that is available when predicting the average
response.
Uncertainties As when estimating the average response, a probabilistic interval is used when predicting a new
Do Differ measurement to provide the information needed to make engineering or scientific conclusions.
However, even though the estimates of the average response and particular response values are the
same, the uncertainties of the two estimates do differ. This is because the uncertainty of the measured
response must include both the uncertainty of the estimated average response and the uncertainty of
the new measurement that could conceptually be observed. This uncertainty must be included if the
interval that will be used to summarize the prediction result is to contain the new measurement with
the specified confidence. To help distinguish the two types of predictions, the probabilistic intervals
for estimation of a new measurement value are called prediction intervals rather than confidence
intervals.
Standard Deviation of Prediction
The estimate of the standard deviation of the predicted value, $\hat{\sigma}_{\hat{f}}$, is obtained as described earlier. Because the residual standard deviation describes the random variation in each individual measurement or observation from the process, $\hat{\sigma}$, the estimate of the residual standard deviation obtained when fitting the model to the data, is used to account for the extra uncertainty needed to predict a measurement value. Since the new observation is independent of the data used to fit the model, the estimates of the two standard deviations are then combined by "root-sum-of-squares" or "in quadrature", according to standard formulas for computing variances, to obtain the standard deviation of the prediction of the new measurement, $\hat{\sigma}_p$. The formula for $\hat{\sigma}_p$ is

$$\hat{\sigma}_p = \sqrt{\hat{\sigma}^2 + \hat{\sigma}_{\hat{f}}^2} .$$
Coverage Factor and Prediction Interval Formula
Because both $\hat{\sigma}_{\hat{f}}$ and $\hat{\sigma}_p$ are mathematically nothing more than different scalings of $\hat{\sigma}$, and coverage factors from the t distribution only depend on the amount of data available for estimating $\hat{\sigma}$, the coverage factors are the same for confidence and prediction intervals. Combining the coverage factor and the standard deviation of the prediction, the formula for constructing prediction intervals is given by

$$\hat{f}(\vec{x}) \pm t_{1-\alpha/2,\,n-p}\cdot\hat{\sigma}_p .$$
As with the computation of confidence intervals, some software may provide the total uncertainty for the prediction interval given by the equation above, or may provide the lower and upper prediction bounds. As suggested before, however, it is a good idea to test the software on an example for which prediction limits are already available to make sure that the software is computing the expected type of intervals.
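Extending the earlier confidence-interval sketch, the only change needed for a prediction interval under the straight-line model is the extra $\hat{\sigma}^2$ term under the square root. Again this is an illustrative sketch, not any package's API.

    import numpy as np
    from scipy import stats

    def prediction_interval(x, y, x0, alpha=0.05):
        """Prediction interval for a new measurement at x0 under a
        straight-line model: fhat +/- t * sqrt(sigma^2 + se_fit^2)."""
        n = len(x)
        b1, b0 = np.polyfit(x, y, 1)
        sigma2 = np.sum((y - (b0 + b1 * x))**2) / (n - 2)
        sxx = np.sum((x - x.mean())**2)
        var_fit = sigma2 * (1.0/n + (x0 - x.mean())**2 / sxx)
        sp = np.sqrt(sigma2 + var_fit)    # std. deviation of the prediction
        t = stats.t.ppf(1 - alpha/2, n - 2)
        fhat = b0 + b1 * x0
        return fhat - t*sp, fhat + t*sp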
Prediction Intervals for the Example Applications
Computing prediction intervals for the measured pressure in the Pressure/Temperature example, at temperatures of 25, 45, and 65, and for the measured torque on specimens from the polymer relaxation example at different times and temperatures, gives the results listed in the tables below. Note: the number of significant digits shown is larger than would normally be reported. However, as many significant digits as possible should be carried throughout all calculations and results should only be rounded for final reporting. If reported numbers may be used in further calculations, then they should not be rounded even when finally reported. A useful rule for rounding final results that will not be used for further computation is to round all of the reported values to one or two significant digits in the total uncertainty, $t_{1-\alpha/2,\,n-p}\cdot\hat{\sigma}_p$. This is the convention for rounding that has been used in the tables below.
Pressure / Temperature Example
[Table of lower and upper 95% prediction bounds for the measured pressure at temperatures of 25, 45, and 65; numerical values not reproduced.]

Polymer Relaxation Example
[Table of lower and upper 95% prediction bounds for the measured torque at selected times and temperatures; numerical values not reproduced.]
Interpretation Simulation of many sets of data from a process model provides a good way to obtain a detailed
of Prediction understanding of the probabilistic nature of the prediction intervals. The main advantage of using
Intervals simulation is that it allows direct comparison of how prediction intervals constructed from a limited
amount of data relate to the measured values that are being estimated.
The plot below shows 95% prediction intervals computed from 50 independently generated data sets
that follow the same model as the data in the Pressure/Temperature example. Random errors from the
normal distribution with a mean of zero and a known standard deviation are added to each set of true
temperatures and true pressures that lie on a perfect straight line to produce the simulated data. Then
each data set is used to compute a prediction interval for a newly observed pressure at a temperature of
65. The newly observed measurements, observed after making the prediction, are noted with an "X"
for each data set.
Prediction
Intervals
Computed
from 50 Sets
of Simulated
Data
Confidence Level Specifies Long-Run Interval Coverage
From the plot it is easy to see that not all of the intervals contain the pressure values observed after the prediction was made. Data set 4 produced an interval that did not capture the newly observed pressure measurement at a temperature of 65. However, for 49 out of 50, or 98%, of the data sets, the prediction intervals did capture the measured pressure. When the number of data sets was increased to 5000, prediction intervals computed for 4734, or 94.68%, of the data sets covered the new measured values. Finally, when the number of data sets was increased to 10000, 94.92% of the prediction intervals computed covered the new measured values. Thus, the simulation shows that although any particular prediction interval might not cover its associated new measurement, in repeated experiments this method produces intervals that contain the new measurements at the rate specified by the user as the confidence level.
Comparison It is also interesting to compare these results to the analogous results for confidence intervals. Clearly
with the most striking difference between the two plots is in the sizes of the uncertainties. The uncertainties
Confidence for the prediction intervals are much larger because they must include the standard deviation of a
Intervals single new measurement, as well as the standard deviation of the estimated average response value.
The standard deviation of the estimated average response value is lower because a lot of the random
error that is in each measurement cancels out when the data are used to estimate the unknown
parameters in the model. In fact, as the sample size increases, the limit on the width of a confidence interval approaches zero, while the limit on the width of the prediction interval as the sample size increases approaches $2\,z_{1-\alpha/2}\,\sigma$. Understanding the different types of intervals and the bounds on
interval width can be important when planning an experiment that requires a result to have no more
than a specified level of uncertainty to have engineering value.
Interpretation Summary
To summarize the interpretation of the probabilistic nature of prediction intervals in words: in independent, repeated experiments, $(1-\alpha)\cdot 100\%$ of the intervals will be expected to cover their true values, given that the assumptions needed for the construction of the intervals hold.
4.5.2. How can I use my process model for calibration?
Calibration Estimates
Like prediction estimates, calibration estimates can be computed relatively easily using the regression equation. They are computed by setting a newly observed value of the response variable, $y_0$, which does not have an accompanying value of the predictor variable, equal to the estimated regression function and solving for the unknown value of the predictor variable. Depending on the complexity of the regression function, this may be done analytically, but sometimes numerical methods are required. Fortunately, the numerical methods needed are not complicated, and once implemented are often easier to use than analytical methods, even for simple regression functions.
Pressure / Temperature Example
In the Pressure/Temperature example, pressure measurements could be used to measure the temperature of the system by observing a new pressure value, setting it equal to the estimated regression function,

$$178 = 7.75 + 3.93\,T ,$$

and solving for the temperature. If a pressure of 178 were measured, the associated temperature would be estimated to be about 43.

Although this is a simple process for the straight-line model, note that even for this simple regression function the estimate of the temperature is not linear in the parameters of the model.
Numerical Approach
To set this up to be solved numerically, the equation simply has to be set up in the form

$$7.75 + 3.93\,T - 178 = 0$$

and then the function of temperature ($T$) defined by the left-hand side of the equation can be used as the argument in an arbitrary root-finding function. It is typically necessary to provide the root-finding software with endpoints on opposite sides of the root. These can be obtained from a plot of the calibration data and usually do not need to be very precise. In fact, it is often adequate to simply set the endpoints equal to the range of the calibration data, since calibration functions tend to be increasing or decreasing functions without local minima or maxima in the range of the data. For the pressure/temperature data, the endpoints used in the root-finding software could even be set to values like -5 and 100, broader than the range of the data. This choice of endpoints would even allow for extrapolation if new pressure values outside the range of the original calibration data were observed.
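Any standard root-finder will do; the sketch below uses scipy's brentq with the straight-line estimates quoted above. The function names are illustrative, not from the handbook's Dataplot code.

    from scipy.optimize import brentq

    # Estimated regression function from the straight-line fit above.
    def f_hat(T):
        return 7.75 + 3.93 * T

    y0 = 178.0   # newly observed pressure, no accompanying temperature

    # Solve f_hat(T) - y0 = 0; the endpoints only need to bracket the root.
    T_est = brentq(lambda T: f_hat(T) - y0, -5.0, 100.0)
    print(f"estimated temperature: {T_est:.1f}")   # about 43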
Thermocouple Calibration Example
For the more realistic thermocouple calibration example, which is well fit by a LOESS model that does not require an explicit functional form, the numerical approach must be used to obtain calibration estimates. The LOESS model is set up identically to the straight-line model for the numerical solution, using the estimated regression function from the software used to fit the model.

Again the function of temperature ($T$) on the left-hand side of the equation would be used as the main argument in an arbitrary root-finding function. If for some reason the estimated regression function were not available in the software used to fit the model, it could always be created manually since LOESS can ultimately be reduced to a series of weighted least squares fits. Based on the plot of the thermocouple data, endpoints of 100 and 600 would probably work well for all calibration estimates. Wider values for the endpoints are not useful here since extrapolations do not make much sense for this type of local model.
Dataplot Since the verbal descriptions of these numerical techniques can be hard to follow, these ideas may
Code become clearer by looking at the actual Dataplot computer code for a quadratic calibration, which
can be found in the Load Cell Calibration case study. If you have downloaded Dataplot and
installed it, you can run the computations yourself.
Calibration Uncertainties
As in prediction, the data used to fit the process model can also be used to determine the uncertainty of the calibration. Both the variation in the average response and in the new observation of the response value need to be accounted for. This is similar to the uncertainty for the prediction of a new measurement. In fact, approximate calibration confidence intervals are actually computed by solving for the predictor variable value in the formulas for prediction interval end points [Graybill (1976)]. Because $\hat{\sigma}_p$, the standard deviation of the prediction of a measured response, is a function of the predictor variable, like the regression function itself, the inversion of the prediction interval endpoints is usually messy. However, like the inversion of the regression function to obtain estimates of the predictor variable, it can be easily solved numerically.
The equations to be solved to obtain approximate lower and upper calibration confidence limits
are, respectively,

    $\hat{f}(x) + t_{1-\alpha/2,\nu} \cdot \hat{\sigma}_p(x) - y_{new} = 0$

and

    $\hat{f}(x) - t_{1-\alpha/2,\nu} \cdot \hat{\sigma}_p(x) - y_{new} = 0$,

with $\hat{\sigma}_p(x)$ denoting the estimated standard deviation of the prediction of a new measurement.
$\hat{f}(x)$ and $\hat{\sigma}_p(x)$ are both denoted as functions of the predictor variable, $x$, here to make it clear
that those terms must be written as functions of the unknown value of the predictor variable. The
left-hand sides of the two equations above are used as arguments in the root-finding software, just
as the expression $y_{new} - \hat{f}(x)$ is used when computing the estimate of the predictor variable.
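Python Sketch of Numerical Calibration Limits
The sketch below illustrates this inversion in Python under simple assumptions: a straight-line
calibration fit to synthetic data, and the standard formula for the prediction standard deviation of
a straight-line fit. For an increasing calibration function, the root of the equation with the plus
sign gives the lower limit. All names and data here are illustrative, not from the handbook.

import numpy as np
from scipy.optimize import brentq
from scipy.stats import t

rng = np.random.default_rng(0)
x = np.linspace(20, 60, 20)                       # hypothetical temperatures
y = 7.75 + 3.93 * x + rng.normal(0, 1, x.size)    # hypothetical pressures

b1, b0 = np.polyfit(x, y, 1)                      # straight-line fit
n = x.size
resid = y - (b0 + b1 * x)
s = np.sqrt(resid @ resid / (n - 2))              # residual standard deviation
sxx = ((x - x.mean()) ** 2).sum()

def f(xv):                                        # estimated regression function
    return b0 + b1 * xv

def sigma_p(xv):                                  # sd of a predicted new response
    return s * np.sqrt(1 + 1 / n + (xv - x.mean()) ** 2 / sxx)

y0 = 178.0                                        # newly observed response
tcut = t.ppf(0.975, n - 2)

lower = brentq(lambda xv: f(xv) + tcut * sigma_p(xv) - y0, -5, 100)
upper = brentq(lambda xv: f(xv) - tcut * sigma_p(xv) - y0, -5, 100)
print(lower, upper)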
Confidence Intervals for the Example Applications
Confidence intervals for the true predictor variable values associated with the observed values of
pressure (178) and voltage (1522) are given in the table below for the Pressure/Temperature
example and the Thermocouple Calibration example, respectively. The approximate confidence
limits and estimated values of the predictor variables were obtained numerically in both cases.

Example    Lower 95% Confidence Bound    Estimated Predictor Variable Value    Upper 95% Confidence Bound
Interpretation of Calibration Intervals
Although calibration confidence intervals have some unique features, viewed as confidence
intervals, their interpretation is essentially analogous to that of confidence intervals for the true
average response. Namely, in repeated calibration experiments, when one calibration is made for
each set of data used to fit a calibration function and each single new observation of the response,
then approximately $100(1-\alpha)\%$ of the intervals computed as described above will capture
the true value of the predictor variable, which is a measurement on the primary measurement
scale.
The plot below shows 95% confidence intervals computed using 50 independently generated data
sets that follow the same model as the data in the Thermocouple calibration example. Random
errors from a normal distribution with a mean of zero and a known standard deviation are added
to each set of true temperatures and true voltages that follow a model that can be
well-approximated using LOESS to produce the simulated data. Then each data set and a newly
observed voltage measurement are used to compute a confidence interval for the true temperature
that produced the observed voltage. The dashed reference line marks the true temperature under
which the thermocouple measurements were made. It is easy to see that most of the intervals do
contain the true value. In 47 out of 50 data sets, or approximately 95%, the confidence intervals
covered the true temperature. When the number of data sets was increased to 5000, the
confidence intervals computed for 4657, or 93.14%, of the data sets covered the true temperature.
Finally, when the number of data sets was increased to 10000, 93.53% of the confidence intervals
computed covered the true temperature. While these intervals do not exactly attain their stated
coverage, as the confidence intervals for the average response do, the coverage is reasonably
close to the specified level and is probably adequate from a practical point of view.
Confidence Intervals Computed from 50 Sets of Simulated Data
4. Process Modeling
4.6. Case Studies in Process Modeling
4.6.1. Load Cell Calibration
Resulting Data
Deflection       Load
-------------------------
0.11019 150000
0.21956 300000
0.32949 450000
0.43899 600000
0.54803 750000
0.65694 900000
0.76562 1050000
0.87487 1200000
0.98292 1350000
1.09146 1500000
1.20001 1650000
1.30822 1800000
1.41599 1950000
1.52399 2100000
1.63194 2250000
1.73947 2400000
1.84646 2550000
1.95392 2700000
2.06128 2850000
2.16844 3000000
0.11052 150000
0.22018 300000
0.32939 450000
0.43886 600000
0.54798 750000
0.65739 900000
0.76596 1050000
0.87474 1200000
0.98300 1350000
1.09150 1500000
1.20004 1650000
1.30818 1800000
1.41613 1950000
1.52408 2100000
1.63159 2250000
1.73965 2400000
1.84696 2550000
1.95445 2700000
2.06177 2850000
2.16829 3000000
Plot the Data
Plotting the data indicates that the hypothesized, simple relationship between load and deflection
is reasonable. The plot below shows the data. It indicates that a straight-line model is likely to fit
the data. It does not indicate any other problems, such as presence of outliers or nonconstant
standard deviation of the response.
Initial Model: Straight Line
Based on the plot, the initial model is the straight line
$D = \beta_0 + \beta_1 L + \varepsilon$,
with $D$ denoting deflection and $L$ denoting load. This model is easily fit to the data. The
computer output from this process is shown below.
Before trying to interpret all of the numerical output, however, it is critical to check
that the assumptions underlying the parameter estimation are met reasonably well.
The next two sections show how the underlying assumptions about the data and
model are checked using graphical and numerical methods.
Dataplot Output

LEAST SQUARES POLYNOMIAL FIT
SAMPLE SIZE N = 40
DEGREE = 1
REPLICATION CASE
REPLICATION STANDARD DEVIATION = 0.2147264895D-03
REPLICATION DEGREES OF FREEDOM = 20
NUMBER OF DISTINCT SUBSETS = 20
Avoiding the Trap
This type of overlaid plot is useful for showing the relationship between the data and the
predicted values from the regression function; however, it can obscure important detail about the
model. Plots of the residuals, on the other hand, show this detail well, and should be used to
check the quality of the fit. Graphical analysis of the residuals is the single most important
technique for determining the need for model refinement or for verifying that the underlying
assumptions of the analysis are met.
Hidden Structure Revealed
Scale of Plot Key
The structure in the relationship between the residuals and the load clearly indicates that the
functional part of the model is misspecified. The ability of the residual plot to clearly show this
problem, while the plot of the data did not show it, is due to the difference in scale between the
plots. The curvature in the response is much smaller than the linear trend. Therefore the curvature
is hidden when the plot is viewed in the scale of the data. When the linear trend is subtracted,
however, as it is in the residual plot, the curvature stands out.
Similar Residual Structure
The plot of the residuals versus the predicted deflection values shows essentially the same
structure as the last plot of the residuals versus load. For more complicated models, however, this
plot can reveal problems that are not clear from plots of the residuals versus the predictor
variables.
Additional Diagnostic Plots
Further residual diagnostic plots are shown below. The plots include a run order plot, a lag plot, a
histogram, and a normal probability plot. Shown in a two-by-two array like this, these plots
comprise a 4-plot of the data that is very useful for checking the assumptions underlying the
model.
Dataplot 4plot
Interpretation of Plots
The structure evident in these residual plots also indicates potential problems with different
aspects of the model. Under ideal circumstances, the plots in the top row would not show any
systematic structure in the residuals. The histogram would have a symmetric, bell shape, and the
normal probability plot would be a straight line. Taken at face value, the structure seen here
indicates a time trend in the data, autocorrelation of the measurements, and a non-normal
distribution of the residuals.
It is likely, however, that these plots will look fine once the function describing the systematic
relationship between load and deflection has been corrected. Problems with one aspect of a
regression model often show up in more than one type of residual plot. Thus there is currently no
clear evidence from the 4-plot that the distribution of the residuals from an appropriate model
would be non-normal, or that there would be autocorrelation in the process, etc. If the 4-plot still
indicates these problems after the functional part of the model has been fixed, however, the
possibility that the problems are real would need to be addressed.
Dataplot Output

LEAST SQUARES POLYNOMIAL FIT
SAMPLE SIZE N = 40
DEGREE = 1
REPLICATION CASE
REPLICATION STANDARD DEVIATION = 0.2147264895D-03
REPLICATION DEGREES OF FREEDOM = 20
NUMBER OF DISTINCT SUBSETS = 20
Function Incorrect
The lack-of-fit test statistic is 214.7534, which also clearly indicates that the
functional part of the model is not right. The 95% cut-off point for the test is 2.15.
Any value greater than that indicates that the hypothesis of a straight-line model for
this data should be rejected.
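Checking the Cut-Off Value
As a quick check on the quoted cut-off, the 95th percentile of the appropriate F distribution can
be computed directly. The lack-of-fit degrees of freedom here are the 20 distinct load levels
minus the 2 straight-line parameters, with 20 degrees of freedom for pure replication error. This
is a Python sketch, not the handbook's Dataplot code.

from scipy.stats import f

cutoff = f.ppf(0.95, dfn=18, dfd=20)
print(round(cutoff, 2))    # approximately 2.15, matching the text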
Reviewing the plots of the residuals versus all potential predictor variables can offer insight into
selection of a new model, just as a plot of the data can aid in selection of an initial model.
Iterating through a series of models selected in this way will often lead to a function that
describes the data well.
Residual Structure Indicates Quadratic
The horseshoe-shaped structure in the plot of the residuals versus load suggests that a quadratic
polynomial might fit the data well. Since that is also the simplest polynomial model, after a
straight line, it is the next function to consider.
The computer output from this process is shown below. As for the straight-line
model, however, it is important to check that the assumptions underlying the
parameter estimation are met before trying to interpret the numerical output. The
steps used to complete the graphical residual analysis are essentially identical to
those used for the previous model.
Dataplot Output for Quadratic Fit

LEAST SQUARES POLYNOMIAL FIT
SAMPLE SIZE N = 40
DEGREE = 2
REPLICATION CASE
REPLICATION STANDARD DEVIATION = 0.2147264895D-03
REPLICATION DEGREES OF FREEDOM = 20
NUMBER OF DISTINCT SUBSETS = 20
Compare to Initial Model
This plot is almost identical to the analogous plot for the straight-line model, again illustrating the
lack of detail in the plot due to the scale. In this case, however, the residual plots will show that
the model does fit well.
Plot Indicates Model Fits Well
The residuals, randomly scattered around zero, indicate that the quadratic is a good function to
describe these data. There is also no indication of non-constant variability over the range of loads.
Plot Also Indicates Model OK
This plot also looks good. There is no evidence of changes in variability across the range of
deflection.
No Problems Indicated
All of these residual plots have become satisfactory simply by changing the functional form of
the model. There is no evidence in the run order plot of any time dependence in the measurement
process, and the lag plot suggests that the errors are independent. The histogram and normal
probability plot suggest that the random errors affecting the measurement process are normally
distributed.
Dataplot Output

LEAST SQUARES POLYNOMIAL FIT
SAMPLE SIZE N = 40
DEGREE = 2
REPLICATION CASE
REPLICATION STANDARD DEVIATION = 0.2147264895D-03
REPLICATION DEGREES OF FREEDOM = 20
NUMBER OF DISTINCT SUBSETS = 20
Regression Function
From the numerical output, we can also find the regression function that will be used
for the calibration. The function is the fitted quadratic
$\hat{D} = \hat{\beta}_0 + \hat{\beta}_1 L + \hat{\beta}_{11} L^2$,
with the estimated parameter values given in the output above.
All of the parameters are significantly different from zero, as indicated by the
associated t statistics. The 97.5% cut-off for the t distribution with 37 degrees of
freedom is 2.026. Since all of the t values are well above this cut-off, we can safely
conclude that none of the estimated parameters is equal to zero.
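Checking the t Cut-Off
The quoted 97.5% cut-off can be verified the same way; this is a Python sketch, not the
handbook's code.

from scipy.stats import t

cutoff = t.ppf(0.975, df=37)    # 40 observations minus 3 quadratic parameters
print(round(cutoff, 3))         # approximately 2.026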
Calibration
Finding Bounds on the Load
From the plot, it is clear that the load that produced the deflection of 1.239722 should be about
1,750,000, and would certainly lie between 1,500,000 and 2,000,000. This rough estimate of the
possible load range will be used to compute the load estimate numerically.
Obtaining a Numerical Calibration Value
To solve for the numerical estimate of the load associated with the observed deflection, the
observed value is substituted into the regression function and the equation is solved for load.
Typically this will be done using a root-finding procedure in a statistical or mathematical
package. That is one reason why rough bounds on the value of the load to be estimated are
needed.
Solving the Regression Equation
Which Solution?
Even though a rough estimate of the load is not strictly necessary to solve the equation, the other
reason bounds on the load are needed is to determine which solution to the equation is correct, if
there are multiple solutions. The quadratic calibration equation, in fact, has two solutions. As we
saw from the plot on the previous page, however, there is really no confusion over which root of
the quadratic function is the correct load. Essentially, the load value must be between 150,000
and 3,000,000 for this problem. The other root of the regression equation and the new deflection
value correspond to a load of over 229,899,600. Looking at the data at hand, it is safe to assume
that a load of 229,899,600 would yield a deflection much greater than 1.24.
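Sketch of Root Selection
The sketch below shows one way to carry out this root selection in Python. The quadratic
coefficients are hypothetical placeholders (the actual estimates come from the fit output above);
the point is that only one root of the calibration equation falls inside the rough bounds read off
the plot.

import numpy as np

b0, b1, b2 = 0.6e-3, 0.722e-6, -0.3e-15    # hypothetical quadratic estimates
d_new = 1.239722                           # newly observed deflection

# Roots of b2*L**2 + b1*L + (b0 - d_new) = 0.
roots = np.roots([b2, b1, b0 - d_new])

# Keep only the root inside the rough bounds from the plot of the data.
load = [r.real for r in roots if 1.5e6 <= r.real <= 2.0e6]
print(load)                                # one physically meaningful root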
+/- What?
The final step in the calibration process, after determining the estimated load associated with the
observed deflection, is to compute an uncertainty or confidence interval for the load. A single-use
95% confidence interval for the load is obtained by inverting the formulas for the upper and
lower bounds of a 95% prediction interval for a new deflection value. These inequalities,

    $D_{new} \ge \hat{f}(L) - t_{0.975,\nu} \cdot \hat{\sigma}_p(L)$
    $D_{new} \le \hat{f}(L) + t_{0.975,\nu} \cdot \hat{\sigma}_p(L)$,

are usually solved numerically, just as the calibration equation was, to find the end points
of the confidence interval. For some models, including this one, the solution could actually be
obtained algebraically, but it is easier to let the computer do the work using a generic algorithm.
The three terms on the right-hand side of each inequality are the regression function ($\hat{f}$), a
t-distribution multiplier, and the standard deviation of a new measurement from the process
($\hat{\sigma}_p$). Regression software often provides convenient methods for computing these
quantities for arbitrary values of the predictor variables, which can make computation of the
confidence interval end points easier. Although this interval is not symmetric mathematically, the
asymmetry is very small, so for all practical purposes, the interval can be written as the estimated
load plus or minus a single half-width, if desired.
4. Plot the predicted values from the model and the data on the same plot.
   Result: The plot of the predicted and observed values suggests the model is ok.
5. Plot the residuals vs. the predicted values.
   Result: This plot echoes the information in the previous plot.
6. Do a 4-plot of the residuals.
   Result: None of these plots indicates a problem with the model.
4. Process Modeling
4.6. Case Studies in Process Modeling
4.6.2. Alaska Pipeline
Resulting Data
Field Defect    Lab Defect
Size            Size         Batch
-----------------------
18 20.2 1
38 56.0 1
15 12.5 1
20 21.2 1
18 15.5 1
36 39.0 1
20 21.0 1
43 38.2 1
45 55.6 1
65 81.9 1
43 39.5 1
38 56.4 1
33 40.5 1
10 14.3 1
50 81.5 1
10 13.7 1
50 81.5 1
15 20.5 1
53 56.0 1
60 80.7 2
18 20.0 2
38 56.5 2
15 12.1 2
20 19.6 2
18 15.5 2
36 38.8 2
20 19.5 2
43 38.0 2
45 55.0 2
65 80.0 2
43 38.5 2
38 55.8 2
33 38.8 2
10 12.5 2
50 80.4 2
10 12.7 2
50 80.9 2
15 20.5 2
53 55.0 2
15 19.0 3
37 55.5 3
15 12.3 3
18 18.4 3
11 11.5 3
35 38.0 3
20 18.5 3
40 38.0 3
50 55.3 3
36 38.7 3
50 54.5 3
38 38.0 3
10 12.0 3
75 81.7 3
10 11.5 3
85 80.0 3
13 18.3 3
50 55.3 3
58 80.2 3
58 80.7 3
48 55.8 4
12 15.0 4
63 81.0 4
10 12.0 4
63 81.4 4
13 12.5 4
28 38.2 4
35 54.2 4
63 79.3 4
13 18.2 4
45 55.5 4
9 11.4 4
20 19.5 4
18 15.5 4
35 37.5 4
20 19.5 4
38 37.5 4
50 55.5 4
70 80.0 4
40 37.5 4
21 15.5 5
19 23.7 5
10 9.8 5
33 40.8 5
16 17.5 5
5 4.3 5
32 36.5 5
23 26.3 5
30 30.4 5
45 50.2 5
33 30.1 5
25 25.5 5
12 13.8 5
53 58.9 5
36 40.0 5
5 6.0 5
63 72.5 5
43 38.8 5
25 19.4 5
73 81.5 5
45 77.4 5
52 54.6 6
9 6.8 6
30 32.6 6
22 19.8 6
56 58.8 6
15 12.9 6
45 49.0 6
This scatter plot shows that a straight line fit is a good initial candidate model for these data.
Plot by Batch
These data were collected in six distinct batches. The first step in the analysis is to determine if
there is a batch effect.
In this case, the scientist was not inherently interested in the batch. That is, batch is a nuisance
factor and, if reasonable, we would like to analyze the data as if it came from a single batch.
However, we need to know that this is, in fact, a reasonable assumption to make.
This conditional plot shows a scatter plot for each of the six batches on a single page. Each of
these plots shows a similar pattern.
Linear Correlation and Related Plots
We can follow up the conditional plot with a linear correlation plot, a linear intercept plot, a
linear slope plot, and a linear residual standard deviation plot. These four plots show the
correlation, the intercept and slope from a linear fit, and the residual standard deviation for linear
fits applied to each batch. These plots show how a linear fit performs across the six batches.
The linear correlation plot (upper left), which shows the correlation between field and lab defect
sizes versus the batch, indicates that batch six has a somewhat stronger linear relationship
between the measurements than the other batches do. This is also reflected in the significantly
lower residual standard deviation for batch six shown in the residual standard deviation plot
(lower right), which shows the residual standard deviation versus batch. The slopes all lie within
a range of 0.6 to 0.9 in the linear slope plot (lower left) and the intercepts all lie between 2 and 8
in the linear intercept plot (upper right).
Treat BATCH as Homogeneous
These summary plots, in conjunction with the conditional plot above, show that treating the data
as a single batch is a reasonable assumption to make. None of the batches behaves badly
compared to the others and none of the batches requires a significantly different fit from the
others.
These two plots provide a good pair. The plot of the fit statistics allows quick and convenient
comparisons of the overall fits. However, the conditional plot can reveal details that may be
hidden in the summary plots. For example, we can more readily determine the existence of
clusters of points and outliers, curvature in the data, and other similar features.
Based on these plots we will ignore the BATCH variable for the remaining analysis.
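Sketch of the Per-Batch Fit Statistics
The four summary plots can be reproduced from per-batch straight-line fits. The sketch below
computes the underlying statistics in Python; the array names are illustrative, with field as the
response and lab as the predictor, as in this case study.

import numpy as np

def batch_stats(field, lab, batch):
    """Return {batch: (corr, intercept, slope, resid_sd)} from linear fits."""
    out = {}
    for b in np.unique(batch):
        x, y = lab[batch == b], field[batch == b]
        slope, intercept = np.polyfit(x, y, 1)
        resid = y - (intercept + slope * x)
        resid_sd = np.sqrt(resid @ resid / (x.size - 2))
        out[b] = (np.corrcoef(x, y)[0, 1], intercept, slope, resid_sd)
    return out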
The intercept parameter is estimated to be 4.99 and the slope parameter is estimated to be 0.73.
Both parameters are statistically significant.
6-Plot for Model Validation
When there is a single independent variable, the 6-plot provides a convenient method for initial
model validation.
The basic assumptions for regression models are that the errors are random observations from a
normal distribution with mean of zero and constant standard deviation (or variance).
The plots on the first row show that the residuals have increasing variance as the value of the
independent variable (lab) increases in value. This indicates that the assumption of constant
standard deviation, or homogeneity of variances, is violated.
In order to see this more clearly, we will generate full-size plots of the predicted values with the
data and the residuals against the independent variable.
Plot of Predicted Values with Original Data
This plot shows more clearly that the assumption of homogeneous variances for the errors may be
violated.
Plot of Residual Values Against Independent Variable
This plot also shows more clearly that the assumption of homogeneous variances is violated. This
assumption, along with the assumption of constant location, is typically easiest to see on this
plot.
Non-Homogeneous Variances
Because the last plot shows that the variances may differ more than slightly, we will address this
issue by transforming the data or using weighted least squares.
Plot of Common Transformations to Obtain Homogeneous Variances
The first step is to try transforming the response variable to find a transformation that will equalize
the variances. In practice, the square root, ln, and reciprocal transformations often work well for
this purpose. We will try these first.
In examining these plots, we are looking for the plot that shows the most constant variability
across the horizontal range of the plot.
This plot indicates that the ln transformation is a good candidate model for achieving the most
homogeneous variances.
Plot of Common Transformations to Linearize the Fit
One problem with applying the above transformation is that the plot indicates that a straight-line
fit will no longer be an adequate model for the data. We address this problem by attempting to
find a transformation of the predictor variable that will result in the most linear fit. In practice, the
square root, ln, and reciprocal transformations often work well for this purpose. We will try these
first.
This plot shows that the ln transformation of the predictor variable is a good candidate model.
Box-Cox Linearity Plot
The previous step can be approached more formally by the use of the Box-Cox linearity plot. The
value on the x axis corresponding to the maximum correlation value on the y axis indicates the
power transformation that yields the most linear fit.
This plot indicates that a value of -0.1 achieves the most linear fit.
In practice, for ease of interpretation, we often prefer to use a common transformation, such as
the ln or square root, rather than the value that yields the mathematical maximum. However, the
Box-Cox linearity plot still indicates whether our choice is a reasonable one. That is, we might
sacrifice a small amount of linearity in the fit to have a simpler model.
In this case, a value of 0.0 would indicate a ln transformation. Although the optimal value from
the plot is -0.1, the plot indicates that any value between -0.2 and 0.2 will yield fairly similar
results. For that reason, we choose to stick with the common ln transformation.
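Sketch of the Box-Cox Linearity Search
The computation behind the Box-Cox linearity plot is a simple grid search: transform the
predictor with each power in a grid and record the correlation with the response. The sketch
below is in Python, with y the ln of the field measurements and x the lab measurements, as in
the analysis above; it assumes a positive-valued predictor.

import numpy as np

def boxcox_linearity(x, y, lambdas=np.arange(-2.0, 2.01, 0.1)):
    """Return (lambda, correlation) pairs for the power-transformed predictor."""
    results = []
    for lam in lambdas:
        # lambda = 0 corresponds to the ln transformation.
        xt = np.log(x) if abs(lam) < 1e-12 else (x ** lam - 1) / lam
        results.append((lam, np.corrcoef(xt, y)[0, 1]))
    return results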
ln-ln Fit
Based on the above plots, we choose to fit a ln-ln model. Dataplot generated the following output
for this model (it is edited slightly for display).

  1  A0          ...                          3.5
  2  A1  XTEMP   0.885175     (0.2302E-01)    38.
Note that although the residual standard deviation is significantly lower than it was for the
original fit, we cannot compare them directly since the fits were performed on different scales.
Plot of Predicted Values
The plot of the predicted values with the transformed data indicates a good fit. In addition, the
variability of the data across the horizontal range of the plot seems relatively constant.
6-Plot of Fit
Since we transformed the data, we need to check that all of the regression assumptions are now
valid.
The 6-plot of the residuals indicates that all of the regression assumptions are now satisfied.
Plot of Residuals
In order to see more detail, we generate a full-size plot of the residuals versus the predictor
variable, as shown above. This plot suggests that the assumption of homogeneous variances is
now met.
Fit for Estimating Weights
For the pipeline data, we chose approximate replicate groups so that each group has four
observations (the last group only has three). This was done by first sorting the data by the
predictor variable and then taking four points in succession to form each replicate group.
Using the power function model with the data for estimating the weights, Dataplot generated the
following output for the fit of ln(variances) against ln(means) for the replicate groups. The output
has been edited slightly for display.
The fit output and plot from the replicate variances against the replicate means shows that a
linear fit provides a reasonable fit with an estimated slope of 1.69. Note that this data set has a
small number of replicates, so you may get a slightly different estimate for the slope. For
example, S-PLUS generated a slope estimate of 1.52. This is caused by the sorting of the
predictor variable (i.e., where we have actual replicates in the data, different sorting algorithms
may put some observations in different replicate groups). In practice, any value for the slope,
which will be used as the exponent in the weight function, in the range 1.5 to 2.0 is probably
reasonable and should produce comparable results for the weighted fit.
We used an estimate of 1.5 for the exponent in the weighting function.
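Sketch of Estimating the Weight Exponent
The replicate-group computation can be sketched in Python as follows: sort by the predictor,
form groups of four, and regress the ln of the group variances on the ln of the group means.
Names are illustrative, and groups whose variance is zero are dropped before taking logs.

import numpy as np

def weight_exponent(x, y, group_size=4):
    """Estimate the power-law exponent from approximate replicate groups."""
    order = np.argsort(x)
    ys = np.asarray(y, float)[order]
    means, variances = [], []
    for i in range(0, ys.size, group_size):
        g = ys[i:i + group_size]
        if g.size >= 2:                    # need at least 2 points for a variance
            means.append(g.mean())
            variances.append(g.var(ddof=1))
    m, v = np.array(means), np.array(variances)
    keep = v > 0                           # drop zero-variance groups before logs
    slope, _ = np.polyfit(np.log(m[keep]), np.log(v[keep]), 1)
    return slope                           # the exponent used in the weights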
Residual Plot for Weight Function
The residual plot from the fit to determine an appropriate weighting function reveals no obvious
problems.
Numerical Output from Weighted Fit
Dataplot generated the following output for the weighted fit of the model that relates the field
measurements to the lab measurements (edited slightly for display).

LEAST SQUARES MULTILINEAR FIT
SAMPLE SIZE N = 107
NUMBER OF VARIABLES = 1
REPLICATION CASE
REPLICATION STANDARD DEVIATION = 0.6112687111D+01
REPLICATION DEGREES OF FREEDOM = 29
NUMBER OF DISTINCT SUBSETS = 78
This output shows a slope of 0.81 and an intercept term of 2.35. This is compared to a slope of
0.73 and an intercept of 4.99 in the original model.
Plot of Predicted Values
The plot of the predicted values with the data indicates a good fit.
Diagnostic Plots of Weighted Residuals
We need to verify that the weighting did not result in the other regression assumptions being
violated. A 6-plot, after weighting the residuals, indicates that the regression assumptions are
satisfied.
Plot of Weighted Residuals vs Lab Defect Size
In order to check the assumption of homogeneous variances for the errors in more detail, we
generate a full-sized plot of the weighted residuals versus the predictor variable. This plot
suggests that the errors now have homogeneous variances.
Plot of Fits with Data
This plot shows that, compared to the original fit, the transformed and weighted fits generate
smaller predicted values for low values of lab defect size and larger predicted values for high
values of lab defect size. The three fits match fairly closely for intermediate values of lab defect
size. The transformed and weighted fits tend to agree for the low values of lab defect size.
However, for large values of lab defect size, the weighted fit tends to generate higher values for
the predicted values than does the transformed fit.
Conclusion Although the original fit was not bad, it violated the assumption of homogeneous variances for
the error term. Both the fit of the transformed data and the weighted fit successfully address this
problem without violating the other regression assumptions.
1. Linear fit of field versus lab; plot predicted values with the data.
   Result: The linear fit was carried out. Although the initial fit looks good, the plot
   indicates that the residuals do not have homogeneous variances.
2. Generate a 6-plot for model validation.
   Result: The 6-plot does not indicate any other problems with the model, beyond the
   evidence of non-constant error variance.
2. Examine residuals from the weight fit to check the adequacy of the weight function.
   Result: The residuals from this fit indicate no major problems.
3. Weighted linear fit of field versus lab; plot predicted values with the data.
   Result: The weighted fit was carried out. The plot of the predicted values with the data
   indicates that the fit of the model is improved.
4. Generate a 6-plot after weighting the residuals for model validation.
   Result: The 6-plot shows that the model assumptions are satisfied.
1. Plot predicted values from each of the three models with the data.
   Result: The transformed and weighted fits generate lower predicted values for low values
   of defect size and larger predicted values for high values of defect size.
4. Process Modeling
4.6. Case Studies in Process Modeling
4.6.3. Ultrasonic Reference Block Study
Resulting Data
Ultrasonic      Metal
Response        Distance
-----------------------
92.9000 0.5000
78.7000 0.6250
64.2000 0.7500
64.9000 0.8750
57.1000 1.0000
43.3000 1.2500
31.1000 1.7500
23.6000 2.2500
31.0500 1.7500
23.7750 2.2500
17.7375 2.7500
13.8000 3.2500
11.5875 3.7500
9.4125 4.2500
7.7250 4.7500
7.3500 5.2500
8.0250 5.7500
90.6000 0.5000
76.9000 0.6250
71.6000 0.7500
63.6000 0.8750
54.0000 1.0000
39.2000 1.2500
29.3000 1.7500
21.4000 2.2500
29.1750 1.7500
22.1250 2.2500
17.5125 2.7500
14.2500 3.2500
9.4500 3.7500
9.1500 4.2500
7.9125 4.7500
8.4750 5.2500
6.1125 5.7500
80.0000 0.5000
79.0000 0.6250
63.8000 0.7500
57.2000 0.8750
53.2000 1.0000
42.5000 1.2500
26.8000 1.7500
20.4000 2.2500
26.8500 1.7500
21.0000 2.2500
16.4625 2.7500
12.5250 3.2500
10.5375 3.7500
8.5875 4.2500
7.1250 4.7500
6.1125 5.2500
5.9625 5.7500
74.1000 0.5000
67.3000 0.6250
60.8000 0.7500
55.5000 0.8750
50.3000 1.0000
41.0000 1.2500
29.4000 1.7500
20.4000 2.2500
29.3625 1.7500
21.1500 2.2500
16.7625 2.7500
13.2000 3.2500
10.8750 3.7500
8.1750 4.2500
7.3500 4.7500
5.9625 5.2500
5.6250 5.7500
81.5000 0.5000
62.4000 0.7500
32.5000 1.5000
12.4100 3.0000
13.1200 3.0000
15.5600 3.0000
5.6300 6.0000
78.0000 0.5000
59.9000 0.7500
33.2000 1.5000
13.8400 3.0000
12.7500 3.0000
14.6200 3.0000
3.9400 6.0000
76.8000 0.5000
61.0000 0.7500
32.9000 1.5000
13.8700 3.0000
11.8100 3.0000
13.3100 3.0000
5.4400 6.0000
78.0000 0.5000
63.5000 0.7500
33.8000 1.5000
12.5600 3.0000
5.6300 6.0000
12.7500 3.0000
13.1200 3.0000
5.4400 6.0000
76.8000 0.5000
60.0000 0.7500
47.8000 1.0000
32.0000 1.5000
22.2000 2.0000
22.5700 2.0000
18.8200 2.5000
13.9500 3.0000
11.2500 4.0000
9.0000 5.0000
6.6700 6.0000
75.8000 0.5000
62.0000 0.7500
48.8000 1.0000
35.2000 1.5000
20.0000 2.0000
20.3200 2.0000
19.3100 2.5000
12.7500 3.0000
10.4200 4.0000
7.3100 5.0000
7.4200 6.0000
70.5000 0.5000
59.5000 0.7500
48.5000 1.0000
35.8000 1.5000
21.0000 2.0000
21.6700 2.0000
21.0000 2.5000
15.6400 3.0000
8.1700 4.0000
8.5500 5.0000
10.1200 6.0000
78.0000 0.5000
66.0000 0.6250
62.0000 0.7500
58.0000 0.8750
47.7000 1.0000
37.8000 1.2500
20.2000 2.2500
21.0700 2.2500
13.8700 2.7500
9.6700 3.2500
7.7600 3.7500
5.4400 4.2500
4.8700 4.7500
4.0100 5.2500
3.7500 5.7500
24.1900 3.0000
25.7600 3.0000
18.0700 3.0000
11.8100 3.0000
12.0700 3.0000
16.1200 3.0000
70.8000 0.5000
54.7000 0.7500
48.0000 1.0000
39.8000 1.5000
29.8000 2.0000
23.7000 2.5000
29.6200 2.0000
23.8100 2.5000
17.7000 3.0000
11.5500 4.0000
12.0700 5.0000
8.7400 6.0000
80.7000 0.5000
61.3000 0.7500
47.5000 1.0000
29.0000 1.5000
24.0000 2.0000
17.7000 2.5000
24.5600 2.0000
18.6700 2.5000
16.2400 3.0000
8.7400 4.0000
7.8700 5.0000
8.5100 6.0000
66.7000 0.5000
59.2000 0.7500
40.8000 1.0000
30.7000 1.5000
25.7000 2.0000
16.3000 2.5000
25.9900 2.0000
16.9500 2.5000
13.3500 3.0000
8.6200 4.0000
7.2000 5.0000
6.6400 6.0000
13.6900 3.0000
81.0000 0.5000
64.5000 0.7500
35.5000 1.5000
13.3100 3.0000
4.8700 6.0000
12.9400 3.0000
5.0600 6.0000
15.1900 3.0000
14.6200 3.0000
15.6400 3.0000
25.5000 1.7500
25.9500 1.7500
81.7000 0.5000
61.6000 0.7500
29.8000 1.7500
29.8100 1.7500
17.1700 2.7500
10.3900 3.7500
28.4000 1.7500
28.6900 1.7500
81.3000 0.5000
60.9000 0.7500
16.6500 2.7500
10.0500 3.7500
28.9000 1.7500
28.9500 1.7500
This plot shows an exponentially decaying pattern in the data. This suggests that some type of
exponential function might be an appropriate model for the data.
Initial Model Selection
There are two issues that need to be addressed in the initial model selection when fitting a
nonlinear model.
1. We need to determine an appropriate functional form for the model.
2. We need to determine appropriate starting values for the estimation of the model
parameters.
Determining an Appropriate Functional Form for the Model
Due to the large number of potential functions that can be used for a nonlinear model, the
determination of an appropriate model is not always obvious. Some guidelines for selecting an
appropriate model were given in the analysis chapter.
The plot of the data will often suggest a well-known function. In addition, we often use scientific
and engineering knowledge in determining an appropriate model. In scientific studies, we are
frequently interested in fitting a theoretical model to the data. We also often have historical
knowledge from previous studies (either our own data or from published studies) of functions that
have fit similar data well in the past. In the absence of a theoretical model or experience with
prior data sets, selecting an appropriate function will often require a certain amount of trial and
error.
Regardless of whether or not we are using scientific knowledge in selecting the model, model
validation is still critical in determining if our selected model is adequate.
Determining Appropriate Starting Values
Nonlinear models are fit with iterative methods that require starting values. In some cases,
inappropriate starting values can result in parameter estimates for the fit that converge to a local
minimum or maximum rather than the global minimum or maximum. Some models are relatively
insensitive to the choice of starting values while others are extremely sensitive.
If you have prior data sets that fit similar models, these can often be used as a guide for
determining good starting values. We can also sometimes make educated guesses from the
functional form of the model. For some models, there may be specific methods for determining
starting values. For example, sinusoidal models that are commonly used in time series are quite
sensitive to good starting values. The beam deflection case study shows an example of obtaining
starting values for a sinusoidal model.
In the case where you do not know what good starting values would be, one approach is to create
a grid of values for each of the parameters of the model and compute some measure of goodness
of fit, such as the residual standard deviation, at each point on the grid. The idea is to create a
broad grid that encloses reasonable values for the parameter. However, we typically want to keep
the number of grid points for each parameter relatively small to keep the computational burden
down (particularly as the number of parameters in the model increases). The idea is to get in the
right neighborhood, not to find the optimal fit. We would pick the grid point that corresponds to
the smallest residual standard deviation as the starting values.
Fitting Data to a Theoretical Model
For this particular data set, the scientist was trying to fit the following theoretical model, with
ultrasonic response $y$ and metal distance $x$:
$y = \frac{\exp(-\beta_1 x)}{\beta_2 + \beta_3 x} + \varepsilon$
Prefit to Obtain Starting Values
We used the Dataplot PREFIT command to determine starting values based on a grid of the
parameter values. Here, our grid was 0.1 to 1.0 in increments of 0.1. The output has been edited
slightly for display.
The best starting values based on this grid are obtained by setting all three parameters to 0.1.
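Sketch of the Grid Search
A grid search equivalent in spirit to the PREFIT step can be sketched in Python: evaluate the
residual standard deviation at every point of the coarse grid and keep the best point. The model
is the theoretical form given above.

import itertools
import numpy as np

def model(x, b1, b2, b3):
    return np.exp(-b1 * x) / (b2 + b3 * x)

def grid_starting_values(x, y, grid=np.arange(0.1, 1.01, 0.1)):
    """Return the grid point with the smallest residual standard deviation."""
    best, best_rsd = None, np.inf
    for b1, b2, b3 in itertools.product(grid, repeat=3):
        resid = y - model(x, b1, b2, b3)
        rsd = np.sqrt(resid @ resid / (x.size - 3))
        if rsd < best_rsd:
            best, best_rsd = (b1, b2, b3), rsd
    return best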
Nonlinear Fit Output
The following fit output was generated by Dataplot (it has been edited for display).
Plot of Predicted Values with Original Data
This plot shows a reasonably good fit. It is difficult to detect any violations of the fit assumptions
from this plot. The estimated model has the theoretical form given earlier, with the parameter
estimates shown in the fit output above.
6-Plot for Model Validation
When there is a single independent variable, the 6-plot provides a convenient method for initial
model validation.
The basic assumptions for regression models are that the errors are random observations from a
normal distribution with zero mean and constant standard deviation (or variance).
These plots suggest that the variance of the errors is not constant.
In order to see this more clearly, we will generate a full-sized plot of the predicted values from
the model overlaid with the data, and plot the residuals against the independent variable, Metal
Distance.
Plot of Residual Values Against Independent Variable
This plot suggests that the errors have greater variance for the values of metal distance less than
one than elsewhere. That is, the assumption of homogeneous variances seems to be violated.
Non-Homogeneous Variances
Except when the Metal Distance is less than or equal to one, there is not strong evidence that the
error variances differ. Nevertheless, we will use transformations or weighted fits to see if we can
eliminate this problem.
Plot of Common Transformations to Obtain Homogeneous Variances
The first step is to try transformations of the response variable that will result in homogeneous
variances. In practice, the square root, ln, and reciprocal transformations often work well for this
purpose. We will try these first.
In examining these four plots, we are looking for the plot that shows the most constant variability
of the ultrasonic response across values of metal distance. Although the scales of these plots
differ widely, which would seem to make comparisons difficult, we are not comparing the
absolute levels of variability between plots here. Instead we are comparing only how constant the
variation within each plot is for these four plots. The plot with the most constant variation will
indicate which transformation is best.
Based on constancy of the variation in the residuals, the square root transformation is probably
the best transformation to use for this data.
Plot of Common Transformations to Predictor Variable
After transforming the response variable, it is often helpful to transform the predictor variable as
well. In practice, the square root, ln, and reciprocal transformations often work well for this
purpose. We will try these first.
This plot shows that none of the proposed transformations offers an improvement over using the
raw predictor variable.
Square Root Fit
Based on the above plots, we choose to fit a model with a square root transformation for the
response variable and no transformation for the predictor variable. Dataplot generated the
following output for this model (it is edited slightly for display).

  3  B3    0.638590E-01    (0.2900E-02)    22.
Although the residual standard deviation is lower than it was for the original fit, we cannot
compare them directly since the fits were performed on different scales.
Plot of Predicted Values
The plot of the predicted values with the transformed data indicates a good fit. The fitted model
has the same functional form as before, with the square root of the ultrasonic response as the
response variable.
6-Plot of Fit
Since we transformed the data, we need to check that all of the regression assumptions are now
valid.
The 6-plot of the data using this model indicates no obvious violations of the assumptions.
Plot of Residuals
In order to see more detail, we generate a full-size version of the residuals versus predictor
variable plot. This plot suggests that the errors now satisfy the assumption of homogeneous
variances.
Finding An Appropriate Weight Function
Techniques for determining an appropriate weight function were discussed in detail in Section
4.4.5.2.
In this case, we have replication in the data, so we can fit a power model relating the replicate
variances to the replicate means, and use the reciprocal of the fitted power function for the
weights.
Fit for Estimating Weights
Dataplot generated the following output for the fit of ln(variances) against ln(means) for the
replicate groups. The output has been edited slightly for display.
The fit output and plot from the replicate variances against the replicate means shows that the
linear fit provides a reasonable fit, with an estimated slope of -1.03.
Based on this fit, we used an estimate of -1.0 for the exponent in the weighting function.
Residual Plot for Weight Function
The residual plot from the fit to determine an appropriate weighting function reveals no obvious
problems.
Numerical Output from Weighted Fit
Dataplot generated the following output for the weighted fit (edited slightly for display).

LEAST SQUARES NON-LINEAR FIT
SAMPLE SIZE N = 214
MODEL--ULTRASON =EXP(-B1*METAL)/(B2+B3*METAL)
REPLICATION CASE
REPLICATION STANDARD DEVIATION = 0.3281762600D+01
REPLICATION DEGREES OF FREEDOM = 192
NUMBER OF DISTINCT SUBSETS = 22
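Sketch of the Weighted Nonlinear Fit
A comparable weighted nonlinear fit can be sketched in Python with scipy.optimize.curve_fit,
whose sigma argument takes standard deviations proportional to the reciprocal square roots of
the weights. The variance model assumed here, variance proportional to metal distance raised to
the exponent of -1.0 chosen earlier, is one plausible reading of the weight function; the
handbook's exact weighting follows Section 4.4.5.2.

import numpy as np
from scipy.optimize import curve_fit

def model(x, b1, b2, b3):
    # ULTRASON = EXP(-B1*METAL)/(B2+B3*METAL), as in the output above
    return np.exp(-b1 * x) / (b2 + b3 * x)

def weighted_fit(metal, ultrason, gamma=-1.0, p0=(0.1, 0.1, 0.1)):
    # Assumed power variance model: var ~ x**gamma, so sd ~ x**(gamma/2),
    # which makes the fit effectively use weights w = x**(-gamma).
    sigma = metal ** (gamma / 2.0)
    popt, pcov = curve_fit(model, metal, ultrason, p0=p0, sigma=sigma)
    return popt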
Plot of Predicted Values
To assess the quality of the weighted fit, we first generate a plot of the predicted line with the
original data.
The plot of the predicted values with the data indicates a good fit. The model for the weighted fit
has the same functional form, with the parameter estimates given in the output above.
6-Plot of Fit
We need to verify that the weighted fit does not violate the regression assumptions. The 6-plot
indicates that the regression assumptions are satisfied.
Plot of Residuals
In order to check the assumption of equal error variances in more detail, we generate a full-sized
plot of the residuals versus the predictor variable. This plot suggests that the residuals now
have approximately equal variability.
Plot of Fits with Data
The first step in comparing the fits is to plot all three sets of predicted values (in the original
units) on the same plot with the raw data.
This plot shows that all three fits generate comparable predicted values. We can also compare the
residual standard deviations (RESSD) from the fits. The RESSD for the transformed data is
calculated after translating the predicted values back to the original scale.
In this case, the RESSD is quite close for all three fits (which is to be expected based on the plot).
Conclusion Given that transformed and weighted fits generate predicted values that are quite close to the
original fit, why would we want to make the extra effort to generate a transformed or weighted
fit? We do so to develop a model that satisfies the model assumptions for fitting a nonlinear
model. This gives us more confidence that conclusions and analyses based on the model are
justified and appropriate.
3. Nonlinear fit of the ultrasonic response.
   Result: The nonlinear fit was carried out.
4. Generate a 6-plot for model validation.
   Result: The 6-plot shows that the model assumptions are satisfied except for the
   non-homogeneous variances.
3. Nonlinear fit of transformed data; plot predicted values with the data.
   Result: The fit was carried out on the transformed data. The plot of the predicted values
   overlaid with the data indicates a good fit.
2. Plot residuals from fit to determine appropriate weight function.
   Result: The residuals from this fit indicate no major problems.
4. Generate a 6-plot for model validation.
   Result: The 6-plot shows that the model assumptions are satisfied.
1. Plot predicted values from each of the three models with the data.
   Result: The transformed and weighted fits generate only slightly different predicted
   values, but the model assumptions are not violated.
4. Process Modeling
4.6. Case Studies in Process Modeling
4.6.4. Thermal Expansion of Copper Case Study
Resulting Data
Coefficient of Thermal    Temperature
Expansion of Copper       (Degrees Kelvin)
---------------------------
0.591 24.41
1.547 34.82
2.902 44.09
2.894 45.07
4.703 54.98
6.307 65.51
7.030 70.53
7.898 75.70
9.470 89.57
9.484 91.14
10.072 96.40
10.163 97.19
11.615 114.26
12.005 120.25
12.478 127.08
12.982 133.55
12.970 133.61
13.926 158.67
14.452 172.74
14.404 171.31
15.190 202.14
15.550 220.55
15.528 221.05
15.499 221.39
16.131 250.99
16.438 268.99
16.387 271.80
16.549 271.97
16.872 321.31
16.830 321.69
16.926 330.14
16.907 333.03
16.966 333.47
17.060 340.77
17.122 345.65
17.311 373.11
17.355 373.79
17.668 411.82
17.767 419.51
17.803 421.59
17.765 422.02
17.768 422.47
17.736 422.61
17.858 441.75
17.877 447.41
17.912 448.70
18.046 472.89
18.085 476.69
18.291 522.47
18.357 522.62
18.426 524.43
18.584 546.75
18.610 549.53
18.870 575.29
18.795 576.00
19.111 625.55
0.367 20.15
0.796 28.78
0.892 29.57
1.903 37.41
2.150 39.12
3.697 50.24
5.870 61.38
6.421 66.25
7.422 73.42
9.944 95.52
11.023 107.32
11.870 122.04
12.786 134.03
14.067 163.19
13.974 163.48
14.462 175.70
14.464 179.86
15.381 211.27
15.483 217.78
15.590 219.14
16.075 262.52
16.347 268.01
16.181 268.62
16.915 336.25
17.003 337.23
16.978 339.33
17.756 427.38
17.808 428.58
17.868 432.68
18.481 528.99
18.486 531.08
19.090 628.34
16.062 253.24
16.337 273.13
16.345 273.66
16.388 282.10
17.159 346.62
17.116 347.19
17.164 348.78
17.123 351.18
17.979 450.10
17.974 450.35
18.007 451.92
17.993 455.56
18.523 552.22
18.669 553.56
18.617 555.74
19.371 652.59
19.330 656.20
0.080 14.13
0.248 20.41
1.089 31.30
1.418 33.84
2.278 39.70
3.624 48.83
4.574 54.50
5.556 60.41
7.267 72.77
7.695 75.25
9.136 86.84
9.959 94.88
9.957 96.40
11.600 117.37
13.138 139.08
13.564 147.73
13.871 158.63
13.994 161.84
14.947 192.11
15.473 206.76
15.379 209.07
15.455 213.32
15.908 226.44
16.114 237.12
17.071 330.90
17.135 358.72
17.282 370.77
17.368 372.72
17.483 396.24
17.764 416.59
18.185 484.02
18.271 495.47
18.236 514.78
18.237 515.65
18.523 519.47
18.627 544.47
18.665 560.11
19.086 620.77
0.214 18.97
0.943 28.93
1.429 33.91
2.241 40.03
2.951 44.66
3.782 49.87
4.757 55.16
5.602 60.90
7.169 72.08
8.920 85.15
10.055 97.06
12.035 119.63
12.861 133.27
13.436 143.84
14.167 161.91
14.755 180.67
15.168 198.44
15.651 226.86
15.746 229.65
16.216 258.27
16.445 273.77
16.965 339.15
17.121 350.13
17.206 362.75
17.250 371.03
17.339 393.32
17.793 448.53
18.123 473.78
18.49 511.12
18.566 524.70
18.645 548.75
18.706 551.64
18.924 574.02
19.100 623.86
0.375 21.46
0.471 24.33
1.504 33.43
2.204 39.22
2.813 44.18
4.765 55.02
9.835 94.33
10.040 96.44
11.946 118.82
12.596 128.48
13.303 141.94
13.922 156.92
14.440 171.65
14.951 190.00
15.627 223.26
15.639 223.88
15.814 231.50
16.315 265.05
16.334 269.44
16.430 271.78
16.423 273.46
17.024 334.61
17.009 339.79
17.165 349.52
17.134 358.18
17.349 377.98
17.576 394.77
17.848 429.66
18.090 468.22
18.276 487.27
18.404 519.54
18.519 523.03
19.133 612.99
19.074 638.59
19.239 641.36
19.280 622.05
19.101 631.50
19.398 663.97
19.252 646.90
19.890 748.29
20.007 749.21
19.929 750.14
19.268 647.04
19.324 646.89
20.049 746.90
20.107 748.43
20.062 747.35
20.065 749.27
19.286 647.61
19.972 747.78
20.088 750.51
20.743 851.37
20.830 845.97
20.935 847.54
21.035 849.93
20.930 851.61
21.074 849.75
21.085 850.98
20.935 848.23
Polynomial Models
Historically, polynomial models are among the most frequently used
empirical models for fitting functions. These models are popular for the
following reasons.
1. Polynomial models have a simple form.
2. Polynomial models have well known and understood properties.
3. Polynomial models have moderate flexibility of shapes.
4. Polynomial models are a closed family. Changes of location and scale
in the raw data result in a polynomial model being mapped to a
polynomial model. That is, polynomial models are not dependent on
the underlying metric.
5. Polynomial models are computationally easy to use.
However, polynomial models also have the following limitations.
1. Polynomial models have poor interpolatory properties. High-degree
polynomials are notorious for oscillations between exact-fit values.
2. Polynomial models have poor extrapolatory properties. Polynomials
may provide good fits within the range of data, but they will
frequently deteriorate rapidly outside the range of the data.
3. Polynomial models have poor asymptotic properties. By their nature,
polynomials have a finite response for finite predictor values and have an
infinite response if and only if the predictor value is infinite. Thus
polynomials may not model asymptotic phenomena very well.
4. Polynomial models have a shape/degree tradeoff. In order to model
data with a complicated structure, the degree of the model must be
high, and the associated number of parameters to be
estimated will also be high. This can result in highly unstable models.
This plot initially shows a fairly steep slope that levels off to a more gradual slope. This type of
curve can often be modeled with a rational function model.
The plot also indicates that there do not appear to be any outliers in this data.
TEMP THERMEXP
---- --------
10 0
50 5
120 12
200 15
800 20
Exact Rational Fit Output
Dataplot generated the following output from the EXACT RATIONAL FIT command. The
output has been edited for display.
EXACT RATIONAL FUNCTION FIT
NUMBER OF POINTS IN FIRST SET = 5
DEGREE OF NUMERATOR = 2
DEGREE OF DENOMINATOR = 2
The important information in this output are the estimates for A0, A1, A2, B1, and B2 (B0 is
always set to 1). These values are used as the starting values for the fit in the next section.
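Sketch of the Exact Rational Fit
With B0 fixed at 1, requiring the quadratic/quadratic rational function to pass exactly through
five points gives five linear equations in A0, A1, A2, B1, and B2, which can be solved directly.
The sketch below is in Python rather than Dataplot and uses the five points listed above.

import numpy as np

def exact_rational_qq(x, y):
    """Quadratic/quadratic rational function through five points."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    # A0 + A1*x + A2*x**2 - y*B1*x - y*B2*x**2 = y at each point.
    A = np.column_stack([np.ones_like(x), x, x ** 2, -y * x, -y * x ** 2])
    return np.linalg.solve(A, y)            # [A0, A1, A2, B1, B2]

coeffs = exact_rational_qq([10, 50, 120, 200, 800], [0, 5, 12, 15, 20])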
Nonlinear Fit Output
Dataplot generated the following output for the nonlinear fit. The output has been edited for
display.
Plot of Q/Q Rational Function Fit
We generate a plot of the fitted rational function model with the raw data.
The fitted function plotted with the raw data appears to show a reasonable fit.
6-Plot for Model Validation
Although the plot of the fitted function with the raw data appears to show a reasonable fit, we
need to validate the model assumptions. The 6-plot is an effective tool for this purpose.
The plot of the residuals versus the predictor variable temperature (row 1, column 2) and of the
residuals versus the predicted values (row 1, column 3) indicate a distinct pattern in the residuals.
This suggests that the assumption of random errors is badly violated.
Residual Plot
The full-sized residual plot clearly shows the distinct pattern in the residuals. When residuals
exhibit a clear pattern, the corresponding errors are probably not random.
TEMP THERMEXP
---- --------
10 0
30 2
40 3
50 5
120 12
200 15
800 20
Exact Rational Fit Output
Dataplot generated the following output from the exact rational fit command. The output has been
edited for display.
EXACT RATIONAL FUNCTION FIT
NUMBER OF POINTS IN FIRST SET = 7
DEGREE OF NUMERATOR = 3
DEGREE OF DENOMINATOR = 3
NUMERATOR --A0 A1 A2 A3 =
-0.2322993E+01 0.3528976E+00 -0.1382551E-01
0.1765684E-03
DENOMINATOR--B0 B1 B2 B3 =
0.1000000E+01 -0.3394208E-01 0.1099545E-03
0.7905308E-05
The important information in this output are the estimates for A0, A1, A2, A3, B1, B2, and B3
(B0 is always set to 1). These values are used as the starting values for the fit in the next section.
Nonlinear Fit Output
Dataplot generated the following output for the nonlinear fit. The output has been edited for
display.
Plot of C/C Rational Function Fit
We generate a plot of the fitted rational function model with the raw data.
The fitted function with the raw data appears to show a reasonable fit.
6-Plot for Model Validation
Although the plot of the fitted function with the raw data appears to show a reasonable fit, we
need to validate the model assumptions. The 6-plot is an effective tool for this purpose.
The 6-plot indicates no significant violation of the model assumptions. That is, the errors appear
to have constant location and scale (from the residual plot in row 1, column 2), seem to be
random (from the lag plot in row 2, column 1), and are approximated well by a normal distribution
(from the histogram and normal probability plots in row 2, columns 2 and 3).
Residual Plot
The full-sized residual plot suggests that the assumptions of constant location and scale for the
errors are valid. No distinguishing pattern is evident in the residuals.
Conclusion We conclude that the cubic/cubic rational function model does in fact provide a satisfactory
model for this data set.
1. Perform the Q/Q fit and plot the predicted values with the raw data.
   Result: The model parameters are estimated. The plot of the predicted values with the
   raw data seems to indicate a reasonable fit.
1. Perform the C/C fit and plot the predicted values with the raw data.
   Result: The model parameters are estimated. The plot of the predicted values with the
   raw data seems to indicate a reasonable fit.
4. Process Modeling
Stigler, S.M. (1978) "Mathematical Statistics in the Early States," The Annals of
Statistics, Vol. 6, pp. 239-265.
Stigler, S.M. (1986) The History of Statistics: The Measurement of Uncertainty Before
1900, The Belknap Press of Harvard University Press, Cambridge, Massachusetts.
4. Process Modeling
4.8. Some Useful Functions for Process Modeling
Contents of Section 4.8.1
1. Polynomials
2. Rational Functions
4. Process Modeling
4.8. Some Useful Functions for Process Modeling
4.8.1. Univariate Functions
Polynomial Models: Advantages
Historically, polynomial models are among the most frequently used
empirical models for fitting functions. These models are popular for the
following reasons.
1. Polynomial models have a simple form.
2. Polynomial models have well known and understood properties.
3. Polynomial models have moderate flexibility of shapes.
4. Polynomial models are a closed family. Changes of location and scale
in the raw data result in a polynomial model being mapped to a
polynomial model. That is, polynomial models are not dependent on
the underlying metric.
5. Polynomial models are computationally easy to use.
Example
The load cell calibration case study contains an example of fitting a
quadratic polynomial model.
4. Process Modeling
4.8. Some Useful Functions for Process Modeling
4.8.1. Univariate Functions
4.8.1.1. Polynomial Functions
Function:
Function Family: Polynomial
Statistical Type: Linear
Domain:
Range:
Special Features: None
Additional Examples: None
Function:
Function Family: Polynomial
Statistical Type: Linear
Domain:
Range:
Special Features: None
Additional Examples:
Function:
Function
Family: Polynomial
Statistical
Type: Linear
Domain:
Range:
Special
Features: None
Additional
Examples:
4. Process Modeling
4.8. Some Useful Functions for Process Modeling
4.8.1. Univariate Functions
Such asymptotes are easy to detect by a simple plot of the fitted function over the range of the
data. They should not discourage you from considering rational function models as a choice for
empirical modeling. These nuisance asymptotes occur occasionally and unpredictably, but the
gain in flexibility of shapes is well worth the chance that they may occur.
The subset of points should be selected over the range of the data. It is not
critical which points are selected, although you should avoid points that are
obvious outliers.
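Such a subset of points is typically used to obtain starting values for the nonlinear fit: with
exactly as many points as parameters, the linearized equations y*Q(x) = P(x) can be solved
exactly. The following is a minimal sketch of this idea in Python (the handbook's own worked
examples use Dataplot; all function and variable names here are illustrative) for a
quadratic/quadratic rational function, refined afterwards by nonlinear least squares.

    # Sketch: starting values for a quadratic/quadratic (Q/Q) rational function
    # obtained by solving the linearized system on a subset of points, then
    # refined with nonlinear least squares.  Names are illustrative.
    import numpy as np
    from scipy.optimize import curve_fit

    def rational_qq(x, a0, a1, a2, b1, b2):
        # Q/Q rational function with the constant denominator term fixed at 1
        return (a0 + a1 * x + a2 * x**2) / (1.0 + b1 * x + b2 * x**2)

    def starting_values(x_sub, y_sub):
        # y = a0 + a1*x + a2*x^2 - b1*(x*y) - b2*(x^2*y) is linear in the
        # five coefficients; with exactly five points the solution is exact.
        A = np.column_stack([np.ones_like(x_sub), x_sub, x_sub**2,
                             -x_sub * y_sub, -x_sub**2 * y_sub])
        coef, *_ = np.linalg.lstsq(A, y_sub, rcond=None)
        return coef

    # Given observed data arrays x and y (floats), pick five points spread
    # over the range of x, avoiding obvious outliers:
    #   idx = np.round(np.linspace(0, len(x) - 1, 5)).astype(int)
    #   p0 = starting_values(x[idx], y[idx])
    #   popt, pcov = curve_fit(rational_qq, x, y, p0=p0)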
Example The thermal expansion of copper case study contains an example of fitting a
rational function model.
4. Process Modeling
4.8. Some Useful Functions for Process Modeling
4.8.1. Univariate Functions
4.8.1.2. Rational Functions
[Function profile pages. Each page lists a rational function (equations not recovered), with
Family: Rational, Statistical Type: Nonlinear, its Domain and Range (for one form the range is
complicated and depends on the specific values of β1, ..., β5), Special Features such as a
horizontal or vertical asymptote, and links to additional examples.]
4. Process Modeling
4.8. Some Useful Functions for Process Modeling
4.8.1. Univariate Functions
4.8.1.2. Rational Functions
Four Questions
To answer the above broad question, the following four specific questions need to be answered.
1. What value should the function have at x = ±∞? Specifically, is the value zero, a constant,
   or plus or minus infinity?
2. What slope should the function have at x = ±∞? Specifically, is the derivative of the
   function zero, a constant, or plus or minus infinity?
3. How many times should the function equal zero (i.e., f(x) = 0) for finite x?
4. How many times should the slope equal zero (i.e., f'(x) = 0) for finite x?
These questions are answered by the analyst by inspection of the data and by theoretical
considerations of the phenomenon under study. Each of these questions is addressed separately
below.
Question 1: What Value Should the Function Have at x = ±∞?
Asymptotically, a rational function behaves like the ratio of the leading terms of its numerator
and denominator polynomials, so the value at x = ±∞ is zero if n < m, a constant if n = m, and
±∞ if n > m.
Question 2: What Slope Should the Function Have at x = ±∞?
The slope is determined by the derivative of the function. The derivative of a rational function
R(x) = Pn(x)/Qm(x) is itself a rational function,
    R'(x) = [Pn'(x)*Qm(x) - Pn(x)*Qm'(x)] / [Qm(x)]^2
with a numerator polynomial of degree n+m-1 when n ≠ m (and degree n+m-2 when n = m).
Asymptotically, the slope at x = ±∞ is zero if n < m + 1, a constant if n = m + 1, and ±∞ if
n > m + 1.
Question 3: How Many Times Should the Function Equal Zero for Finite x?
For finite x, R(x) = 0 only when the numerator polynomial, Pn, equals zero. The numerator
polynomial, and thus R(x) as well, can have between zero and n real roots. Thus, for a given n,
the number of real roots of R(x) is less than or equal to n. Conversely, if the fitted function f(x)
is such that, for finite x, the number of times f(x) = 0 is k3, then n is greater than or equal to k3.
Question 4: How Many Times Should the Slope Equal Zero for Finite x?
The derivative function, R'(x), of the rational function will equal zero when its numerator
polynomial equals zero. The number of real roots of a polynomial is between zero and the degree
of the polynomial.
For n not equal to m, the numerator polynomial of R'(x) has order n+m-1. For n equal to m, the
numerator polynomial of R'(x) has order n+m-2. From this it follows that
● if n ≠ m, the number of real roots of R'(x), k4, is less than or equal to n+m-1;
● if n = m, the number of real roots of R'(x), k4, is less than or equal to n+m-2.
Conversely, if the fitted function f(x) is such that, for finite x and n ≠ m, the number of times
f'(x) = 0 is k4, then n+m-1 is greater than or equal to k4. Similarly, if the fitted function f(x) is
such that, for finite x and n = m, the number of times f'(x) = 0 is k4, then n+m-2 is greater than
or equal to k4.
Tables for Determining Admissible Combinations of m and n
In summary, we can determine the admissible combinations of n and m by using the following
four tables to generate an n versus m graph. Choose the simplest (n,m) combination for the
degrees of the initial rational function model.

1. Desired value of f(±∞)    Relation of n to m
   0                          n < m
   constant                   n = m
   ±∞                         n > m

2. Desired slope of f(±∞)    Relation of n to m
   0                          n < m + 1
   constant                   n = m + 1
   ±∞                         n > m + 1

3. Desired number of zeros of f(x) for finite x    Relation of n to k3
   k3                                               n ≥ k3

4. Desired number of zeros of f'(x) for finite x   Relation of n and m to k4
   k4 (n ≠ m)                                       n ≥ (1 + k4) - m
   k4 (n = m)                                       n ≥ (2 + k4) - m
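As an illustration only, the four rules can be encoded directly to enumerate the admissible (n, m)
region. The sketch below (Python; it assumes the desired value and slope at x = ±∞ have
already been read off a plot of the data) prints the admissible pairs, simplest first.

    # Sketch: enumerate admissible (n, m) combinations from the four tables.
    # value/slope are the desired behaviors at x = +/-infinity; k3 and k4 are
    # the desired numbers of zeros of f(x) and f'(x) for finite x.
    def admissible(value, slope, k3, k4, max_deg=6):
        pairs = []
        for n in range(max_deg + 1):
            for m in range(max_deg + 1):
                ok_value = {"zero": n < m, "constant": n == m,
                            "infinity": n > m}[value]
                ok_slope = {"zero": n < m + 1, "constant": n == m + 1,
                            "infinity": n > m + 1}[slope]
                ok_k4 = n >= (1 + k4) - m if n != m else n >= (2 + k4) - m
                if ok_value and ok_slope and n >= k3 and ok_k4:
                    pairs.append((n, m))
        return pairs

    # Example: f approaches a constant at +/-infinity (so the slope goes to
    # zero), with one zero and two slope zeros; the simplest pair comes first.
    print(sorted(admissible("constant", "zero", k3=1, k4=2),
                 key=lambda p: sum(p)))     # [(2, 2), ...]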
Examples for Determining m and n
The goal is to go from a sample data set to a specific rational function. The graphs below
summarize some common shapes that rational functions can have and show the admissible
values and the simplest case for n and m. We typically start with the simplest case. If the model
validation indicates an inadequate model, we then try other rational functions in the admissible
region.
Shapes 1 through 10 (graphs not shown)
5. Process Improvement
1. Introduction [5.1.]
1. What is experimental design? [5.1.1.]
2. What are the uses of DOE? [5.1.2.]
3. What are the steps of DOE? [5.1.3.]
2. Assumptions [5.2.]
1. Is the measurement system capable? [5.2.1.]
2. Is the process stable? [5.2.2.]
3. Is there a simple model? [5.2.3.]
4. Are the model residuals well-behaved? [5.2.4.]
8. References [5.8.]
5. Process Improvement
5.1. Introduction
This section describes the basic concepts of the Design of Experiments (DOE or DEX)
This section introduces the basic concepts, terminology, goals and procedures underlying the
proper statistical design of experiments. Design of experiments is abbreviated as DOE
throughout this chapter (an alternate abbreviation, DEX, is used in DATAPLOT).
Topics covered are:
● What is experimental design or DOE?
● What are the goals or uses of DOE?
● What are the steps in DOE?
5. Process Improvement
5.1. Introduction
Black box process model
It is common to begin with a process model of the `black box' type, with several discrete or
continuous input factors that can be controlled--that is, varied at will by the experimenter--and
one or more measured output responses. The output responses are assumed continuous.
Experimental data are used to derive an empirical (approximation) model linking the outputs
and inputs. These empirical models generally contain first- and second-order terms.
Often the experiment has to account for a number of uncontrolled factors that may be discrete,
such as different machines or operators, and/or continuous such as ambient temperature or
humidity. Figure 1.1 illustrates this situation.
Schematic for a typical process with controlled inputs, outputs, discrete uncontrolled factors and
continuous uncontrolled factors (Figure 1.1)
Models for DOE's
The most common empirical models fit to the experimental data take either a linear form or
quadratic form.
Linear model
A linear model with two factors, X1 and X2, can be written as

    Y = β0 + β1*X1 + β2*X2 + β12*X1*X2 + experimental error

Here, Y is the response for given levels of the main effects X1 and X2 and the X1X2 term is
included to account for a possible interaction effect between X1 and X2. The constant β0 is the
response of Y when both main effects are 0.
For a more complicated example, a linear model with three factors X1, X2, X3 and one
response, Y, would look like (if all possible terms were included in the model)

    Y = β0 + β1*X1 + β2*X2 + β3*X3 + β12*X1*X2 + β13*X1*X3 + β23*X2*X3
        + β123*X1*X2*X3 + experimental error

The three terms with single "X's" are the main effects terms. There are k(k-1)/2 = 3*2/2 = 3
two-way interaction terms and 1 three-way interaction term (which is often omitted, for
simplicity). When the experimental data are analyzed, all the unknown "β" parameters are
estimated and the coefficients of the "X" terms are tested to see which ones are significantly
different from 0.
Note: Clearly, a full model could include many cross-product (or interaction)
terms involving squared X's. However, in general these terms are not needed and
most DOE software defaults to leaving them out of the model.
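To make the estimation step concrete, here is a small sketch in Python (the four response values
are invented purely for illustration) that fits the two-factor linear model with interaction by
ordinary least squares on the coded +1/-1 settings.

    # Sketch: least-squares fit of Y = b0 + b1*X1 + b2*X2 + b12*X1*X2 + error
    # using coded factor settings.  The responses are hypothetical.
    import numpy as np

    X1 = np.array([-1, +1, -1, +1])
    X2 = np.array([-1, -1, +1, +1])
    Y = np.array([12.0, 20.0, 14.0, 26.0])   # invented responses

    # Model (analysis) matrix: columns I, X1, X2, X1*X2
    M = np.column_stack([np.ones(4), X1, X2, X1 * X2])
    b, *_ = np.linalg.lstsq(M, Y, rcond=None)
    print(dict(zip(["b0", "b1", "b2", "b12"], b.round(3))))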
5. Process Improvement
5.1. Introduction
A common use is planning an experiment to gather data to make a decision between two or
more alternatives
Supplier A vs. supplier B? Which new additive is the most effective? Is catalyst `x' an
improvement over the existing catalyst? These and countless other choices between alternatives
can be presented to us in a never-ending parade. Often we have the choice made for us by
outside factors over which we have no control. But in many cases we are also asked to make the
choice. It helps if one has valid data to back up one's decision.
The preferred solution is to agree on a measurement by which competing choices can be
compared, generate a sample of data from each alternative, and compare average results. The
'best' average outcome will be our preference. We have performed a comparative experiment!
Types of comparative studies
Sometimes this comparison is performed under one common set of conditions. This is a
comparative study with a narrow scope, which is suitable for some initial comparisons of
possible alternatives. Other comparison studies, intended to validate that one alternative is
preferred over a wide range of conditions, will purposely and systematically vary the
background conditions under which the primary comparison is made in order to reach a
conclusion that will be proven valid over a broad scope. We discuss experimental designs for
each of these types of comparisons in Sections 5.3.3.1 and 5.3.3.2.
Selecting the few that matter from the many possible factors
Often there are many possible factors, some of which may be critical and others which may
have little or no effect on a response. It may be desirable, as a goal by itself, to reduce the
number of factors to a relatively small set (2-5) so that attention can be focussed on controlling
those factors with appropriate specifications, control charts, etc.
Screening experiments are an efficient way, with a minimal number of runs, of determining the
important factors. They may also be used as a first step when the ultimate goal is to model a
response with a response surface. We will discuss experimental designs for screening a large
number of factors in Sections 5.3.3.3, 5.3.3.4 and 5.3.3.5.
Some reasons to model a process
Once one knows the primary variables (factors) that affect the responses of interest, a number of
additional objectives may be pursued. These include:
● Hitting a Target
● Maximizing or Minimizing a Response
● Reducing Variation
● Making a Process Robust
● Seeking Multiple Goals
What each of these purposes has in common is that experimentation is used to fit a model that
may permit a rough, local approximation to the actual surface. Given that the particular
objective can be met with such an approximate model, the experimental effort is kept to a
minimum while still achieving the immediate goal.
These response surface modeling objectives will now be briefly expanded upon.
Hitting a Target
Optimizing a process output is a common goal
Many processes are being run at sub-optimal settings, some of them for years, even though each
factor has been optimized individually over time. Finding settings that increase yield or decrease
the amount of scrap and rework represents opportunities for substantial financial gain. Often,
however, one must experiment with multiple inputs to achieve a better output. Section 5.3.3.6
on second-order designs plus material in Section 5.5.3 will be useful for these applications.
Reducing Variation
Processes that are on target, on the average, may still have too much variability
A process may be performing with unacceptable consistency, meaning its internal variation is
too high.
Excessive variation can result from many causes. Sometimes it is due to the lack of having or
following standard operating procedures. At other times, excessive variation is due to certain
hard-to-control inputs that affect the critical output characteristics of the process. When this
latter situation is the case, one may experiment with these hard-to-control factors, looking for a
region where the surface is flatter and the process is easier to manage. To take advantage of
such flatness in the surface, one must use designs - such as the second-order designs of Section
5.3.3.6 - that permit identification of these features. Contour or surface plots are useful for
elucidating the key features of these fitted models. See also 5.5.3.1.4.
Graph of data before variation reduced (image not shown)
It might be possible to reduce the variation by altering the setpoints (recipe) of the process, so that
it runs in a more `stable' region.
Graph of data after process variation reduced (image not shown)
Finding this new recipe could be the subject of an experiment, especially if there are many input
factors that could conceivably affect the output.
The less a process or product is affected by external conditions, the better it is - this is called
"Robustness"
An item designed and made under controlled conditions will be later `field tested' in the hands
of the customer and may prove susceptible to failure modes not seen in the lab or thought of by
design. An example would be the starter motor of an automobile that is required to operate
under extremes of external temperature. A starter that performs under such a wide range is
termed `robust' to temperature.
Designing an item so that it is robust calls for a special experimental effort. It is possible to
stress the item in the design lab and so determine the critical components affecting its
performance. A different gauge of armature wire might be a solution to the starter motor
problem, but so might be many other alternatives. The correct combination of factors can be
found only by experimentation.
Sometimes we have multiple outputs and we have to compromise to achieve desirable outcomes
- DOE can help here
A product or process seldom has just one desirable output characteristic. There are usually
several, and they are often interrelated so that improving one will cause a deterioration of
another. For example: rate vs. consistency; strength vs. expense; etc.
Any product is a trade-off between these various desirable final characteristics. Understanding
the boundaries of the trade-off allows one to make the correct choices. This is done by either
constructing some weighted objective function (`desirability function') and optimizing it, or
examining contour plots of responses generated by a computer program, as given below.
Sample contour plot of deposition rate and capability:
FIGURE 1.4 Overlaid contour plot of Deposition Rate and Capability (Cp)
Regression Modeling
Regression models (Chapter 4) are used to fit more precise models
Sometimes we require more than a rough approximating model over a local region. In such
cases, the standard designs presented in this chapter for estimating first- or second-order
polynomial models may not suffice. Chapter 4 covers the topic of experimental design and
analysis for fitting general models for a single explanatory factor. If one has multiple factors,
and either a nonlinear model or some other special model, the computer-aided designs of
Section 5.5.2 may be useful.
5. Process Improvement
5.1. Introduction
● Watch out for process drifts and shifts during the run.
Planning to do a sequence of small experiments is often better than relying on one big
experiment to give you all the answers
It is often a mistake to believe that `one big experiment will give the answer.'
A more useful approach to experimental design is to recognize that while one experiment might
provide a useful result, it is more common to perform two or three, or maybe more, experiments
before a complete answer is attained. In other words, an iterative approach is best and, in the
end, most economical. Putting all one's eggs in one basket is not advisable.
Each stage provides insight for the next stage
The reason an iterative approach frequently works best is that it is logical to move through
stages of experimentation, each stage providing insight as to how the next experiment should be
run.
5. Process Improvement
5.2. Assumptions
We should check the engineering and model-building assumptions that are made in most DOE's
In all model building we make assumptions, and we also require certain conditions to be
approximately met for purposes of estimation. This section looks at some of the engineering and
mathematical assumptions we typically make. These are:
● Are the measurement systems capable for all of your responses?
● Is your process stable?
● Are your responses likely to be approximated well by simple polynomial models?
● Are the residuals (the difference between the model predictions and the actual
  observations) well behaved?
5. Process Improvement
5.2. Assumptions
Examples of piecewise smooth and jagged responses (images not shown)
5. Process Improvement
5.2. Assumptions
Residuals are elements of variation unexplained by fitted model
Residuals can be thought of as elements of variation unexplained by the fitted model. Since this
is a form of error, the same general assumptions apply to the group of residuals that we typically
use for errors in general: one expects them to be (roughly) normal and (approximately)
independently distributed with a mean of 0 and some constant variance.
Assumptions for residuals
These are the assumptions behind ANOVA and classical regression analysis. This means that an
analyst should expect a regression model to err in predicting a response in a random fashion; the
model should predict values higher than actual and lower than actual with equal probability. In
addition, the level of the error should be independent of when the observation occurred in the
study, or the size of the observation being predicted, or even the factor settings involved in
making the prediction. The overall pattern of the residuals should be similar to the bell-shaped
pattern observed when plotting a histogram of normally distributed data.
We emphasize the use of graphical methods to examine residuals.
Departures indicate inadequate model
Departures from these assumptions usually mean that the residuals contain structure that is not
accounted for in the model. Identifying that structure and adding term(s) representing it to the
original model leads to a better model.
Plots for examining residuals
Any graph suitable for displaying the distribution of a set of data is suitable for judging the
normality of the distribution of a group of residuals. The three most common types are:
1. histograms,
2. normal probability plots, and
3. dot plots.
Histogram
The histogram is a frequency plot obtained by placing the data in regularly spaced cells and
plotting each cell frequency versus the center of the cell. Figure 2.2 illustrates an approximately
normal distribution of residuals produced by a model for a calibration process. We have
superimposed a normal density function on the histogram.
Small sample sizes
Sample sizes of residuals are generally small (fewer than 50) because experiments have limited
treatment combinations, so a histogram is not the best choice for judging the distribution of
residuals. A more sensitive graph is the normal probability plot.
The normal probability plot should produce an approximately straight line if the points come
from a normal distribution.
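For example, a histogram and a normal probability plot for a group of residuals can be produced
as in the sketch below (Python with matplotlib and scipy; the residuals are simulated stand-ins
for real model residuals).

    # Sketch: graphical normality checks for a small group of residuals.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    rng = np.random.default_rng(0)
    residuals = rng.normal(loc=0.0, scale=1.0, size=30)  # stand-in values

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
    ax1.hist(residuals, bins=8)                       # histogram of residuals
    ax1.set_title("Histogram of residuals")
    stats.probplot(residuals, dist="norm", plot=ax2)  # ~straight if normal
    ax2.set_title("Normal probability plot")
    plt.tight_layout()
    plt.show()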
Sample normal probability plot with overlaid dot plot
Figure 2.3 below illustrates the normal probability graph created from the same group of
residuals used for Figure 2.2.
This graph includes the addition of a dot plot. The dot plot is the collection of points along the
left y-axis. These are the values of the residuals. The purpose of the dot plot is to provide an
indication of the distribution of the residuals.
"S" shaped curves indicate bimodal distribution
Small departures from the straight line in the normal probability plot are common, but a clearly
"S" shaped curve on this graph suggests a bimodal distribution of residuals. Breaks near the
middle of this graph are also indications of abnormalities in the residual distribution.
NOTE: Studentized residuals are residuals converted to a scale approximately representing the
standard deviation of an individual residual from the center of the residual distribution. The
technique used to convert residuals to this form produces a Student's t distribution of values.
Run sequence If the order of the observations in a data table represents the order of execution of each treatment
plot combination, then a plot of the residuals of those observations versus the case order or time order
of the observations will test for any time dependency. These are referred to as run sequence plots.
Sample run sequence plot that exhibits a time trend (Figure 2.4)
Sample run sequence plot that does not exhibit a time trend (Figure 2.5)
Interpretation of the sample run sequence plots
The residuals in Figure 2.4 suggest a time trend, while those in Figure 2.5 do not. Figure 2.4
suggests that the system was drifting slowly to lower values as the investigation continued. In
extreme cases a drift of the equipment will produce models with very poor ability to account for
the variability in the data (low R^2).
If the investigation includes centerpoints, then plotting them in time order may produce a
clearer indication of a time trend if one exists. Plotting the raw responses in time sequence can
also sometimes detect trend changes in a process that residual plots might not detect.
Check for increasing residuals as size of fitted value increases
Plotting residuals versus the value of a fitted response should produce a distribution of points
scattered randomly about 0, regardless of the size of the fitted value. Quite commonly, however,
residual values may increase as the size of the fitted value increases. When this happens, the
residual cloud becomes "funnel shaped" with the larger end toward larger fitted values; that is,
the residuals have larger and larger scatter as the value of the response increases. Plotting the
absolute values of the residuals instead of the signed values will produce a "wedge-shaped"
distribution; a smoothing function is added to each graph which helps to show the trend.
Sample residuals versus fitted values plot showing increasing residuals (Figure 2.6)
Sample residuals versus fitted values plot that does not show increasing residuals (Figure 2.7)
Interpretation of the residuals versus fitted values plots
A residual distribution such as that in Figure 2.6 showing a trend to higher absolute residuals as
the value of the response increases suggests that one should transform the response, perhaps by
modeling its logarithm or square root, etc., (contractive transformations). Transforming a
response in this fashion often simplifies its relationship with a predictor variable and leads to
simpler models. Later sections discuss transformation in more detail. Figure 2.7 plots the
residuals after a transformation on the response variable was used to reduce the scatter. Notice
the difference in scales on the vertical axes.
Sample residuals versus factor setting plot (Figure 2.8)
Sample residuals versus factor setting plot after adding a quadratic term (Figure 2.9)
Interpretation of residuals versus factor setting plots
Figure 2.8 shows that the size of the residuals changed as a function of a predictor's settings. A
graph like this suggests that the model needs a higher-order term in that predictor or that one
should transform the predictor using a logarithm or square root, for example. Figure 2.9 shows
the residuals for the same response after adding a quadratic term. Notice the single point widely
separated from the other residuals in Figure 2.9. This point is an "outlier": its position is well
within the range of values used for this predictor in the investigation, but its result was
somewhat lower than the model predicted. A signal that curvature is present is a trace
resembling a "frown" or a "smile" in these graphs.
Sample residuals versus factor setting plot lacking one or more higher-order terms (Figure 2.10)
Interpretation of plot
The example given in Figures 2.8 and 2.9 obviously involves five levels of the predictor. The
experiment utilized a response surface design. For the simple factorial design that includes
center points, if the response model being considered lacked one or more higher-order terms,
the plot of residuals versus factor settings might appear as in Figure 2.10.
Graph indicates presence of curvature
While the graph gives a definite signal that curvature is present, identifying the source of that
curvature is not possible due to the structure of the design. Graphs generated using the other
predictors in that situation would have very similar appearances.
Additional discussion of residual analysis
Note: Residuals are an important subject discussed repeatedly in this Handbook. For example,
graphical residual plots using Dataplot are discussed in Chapter 1 and the general examination
of residuals as a part of model building is discussed in Chapter 4.
5. Process Improvement
2. Box-Behnken designs
3. Response surface design comparisons
4. Blocking a response surface design
7. Adding center points
8. Improving fractional design resolution
1. Mirror-image foldover designs
2. Alternative foldover designs
9. Three-level full factorial designs
10. Three-level, mixed level and fractional factorial designs
5. Process Improvement
5.3. Choosing an experimental design
Note that some authors prefer to restrict the term screening design
to the case where you are trying to extract the most important
factors from a large (say > 5) list of initial factors (usually a
fractional factorial design). We include the case with a smaller
Based on objective, where to go next
After identifying the objective listed above that corresponds most closely to your specific goal,
you can
● proceed to the next section in which we discuss selecting experimental factors
and then
● select the appropriate design named in section 5.3.3 that suits your objective (and follow
  the related links).
5. Process Improvement
5.3. Choosing an experimental design
Be careful when choosing the allowable range for each factor
We have to choose the range of the settings for input factors, and it is wise to give this some
thought beforehand rather than just try extreme values. In some cases, extreme values will give
runs that are not feasible; in other cases, extreme ranges might move one out of a smooth area of
the response surface into some jagged region, or close to an asymptote.
Two-level designs have just a "high" and a "low" setting for each factor
The most popular experimental designs are two-level designs. Why only two levels? There are a
number of good reasons why two is the most common choice amongst engineers: one reason is
that it is ideal for screening designs, simple and economical; it also gives most of the
information required to go to a multilevel response surface experiment if one is needed.
Matrix notation for describing an experiment
The standard layout for a 2-level design uses +1 and -1 notation to denote the "high level" and
the "low level" respectively, for each factor. For example, the matrix below

          Factor 1 (X1)   Factor 2 (X2)
Trial 1       -1              -1
Trial 2       +1              -1
Trial 3       -1              +1
Trial 4       +1              +1

describes an experiment in which 4 trials (or runs) were conducted with each factor set to high
or low during a run according to whether the matrix had a +1 or -1 set for the factor during that
trial. If the experiment had more than 2 factors, there would be an additional column in the
matrix for each additional factor.
Note: Some authors shorten the matrix notation for a two-level design by just recording the plus
and minus signs, leaving out the "1's".
Coding the data
The use of +1 and -1 for the factor settings is called coding the data. This aids in the
interpretation of the coefficients fit to any experimental model. After factor settings are coded,
center points have the value "0". Coding is described in more detail in the DOE glossary.
Design matrices
If we add an "I" column and an "X1*X2" column to the matrix of 4 trials for a two-factor
experiment described earlier, we obtain what is known as the model or analysis matrix for this
simple experiment, which is shown below. The model matrix for a three-factor experiment is
shown later in this section.

 I   X1   X2   X1*X2
+1   -1   -1    +1
+1   +1   -1    -1
+1   -1   +1    -1
+1   +1   +1    +1
Coding produces orthogonal columns
Coding is sometimes called "orthogonal coding" since all the columns of a coded 2-factor design
matrix (except the "I" column) are typically orthogonal. That is, the dot product for any pair of
columns is zero. For example, for X1 and X2:
(-1)(-1) + (+1)(-1) + (-1)(+1) + (+1)(+1) = 0.
5. Process Improvement
5.3. Choosing an experimental design
Types of designs are listed here according to the experimental objective they meet
● Comparative objective: If you have one or several factors under investigation, but the
  primary goal of your experiment is to make a conclusion about one a-priori important
  factor (in the presence of, and/or in spite of the existence of, the other factors), and the
  question of interest is whether or not that factor is "significant" (i.e., whether or not there
  is a significant change in the response for different levels of that factor), then you have a
  comparative problem and you need a comparative design solution.
● Screening objective: The primary purpose of the experiment is to select or screen out the
  few important main effects from the many less important ones. These screening designs
  are also termed main effects designs.
● Response Surface (method) objective: The experiment is designed to allow us to estimate
  interaction and even quadratic effects, and therefore give us an idea of the (local) shape of
  the response surface we are investigating. For this reason, they are termed response
  surface method (RSM) designs. RSM designs are used to:
  ❍ Find improved or optimal process settings
Mixture and regression designs
Mixture designs are discussed briefly in section 5 (Advanced Topics) and regression designs for
a single factor are discussed in chapter 4. Selection of designs for the remaining 3 objectives is
summarized in the following table.
Resources and degree of control over wrong decisions
Choice of a design from within these various types depends on the amount of resources
available and the degree of control over making wrong decisions (Type I and Type II errors for
testing hypotheses) that the experimenter desires.

Save some runs for center points and "redos" that might be needed
It is a good idea to choose a design that requires somewhat fewer runs than the budget permits,
so that center point runs can be added to check for curvature in a 2-level screening design and
backup resources are available to redo runs that have processing mishaps.
5. Process Improvement
5.3. Choosing an experimental design
5.3.3. How do you select an experimental design?
Three key numbers
All completely randomized designs with one primary factor are defined by 3 numbers:
k = number of factors (= 1 for these designs)
L = number of levels
n = number of replications
and the total sample size (number of runs) is N = k x L x n.

Balance
Balance dictates that the number of replications be the same at each level of the factor (this will
maximize the sensitivity of subsequent statistical t (or F) tests).
5. Process Improvement
5.3. Choosing an experimental design
5.3.3. How do you select an experimental design?
Blocking used for nuisance factors that can be controlled
When we can control nuisance factors, an important technique known as blocking can be used
to reduce or eliminate the contribution to experimental error contributed by nuisance factors.
The basic concept is to create homogeneous blocks in which the nuisance factors are held
constant and the factor of interest is allowed to vary. Within blocks, it is possible to assess the
effect of different levels of the factor of interest without having to worry about variations due to
changes of the block factors, which are accounted for in the analysis.
with
L1 = number of levels (settings) of factor 1
L2 = number of levels (settings) of factor 2
L3 = number of levels (settings) of factor 3
L4 = number of levels (settings) of factor 4
...
Furnace run is a nuisance factor
The nuisance factor they are concerned with is "furnace run" since it is known that each furnace
run differs from the last and impacts many process parameters.

Ideal would be to eliminate nuisance furnace factor
An ideal way to run this experiment would be to run all the 4x3=12 wafers in the same furnace
run. That would eliminate the nuisance furnace factor completely. However, regular production
wafers have furnace priority, and only a few experimental wafers are allowed into any furnace
run at the same time.
Non-blocked method
A non-blocked way to run this experiment would be to run each of the twelve experimental
wafers, in random order, one per furnace run. That would increase the experimental error of
each resistivity measurement by the run-to-run furnace variability and make it more difficult to
study the effects of the different dosages. The blocked way to run this experiment, assuming
you can convince manufacturing to let you put four experimental wafers in a furnace run, would
be to put four wafers with different dosages in each of three furnace runs. The only
randomization would be choosing which of the three wafers with dosage 1 would go into
furnace run 1, and similarly for the wafers with dosages 2, 3 and 4.
Description of the experiment
Let X1 be dosage "level" and X2 be the blocking factor furnace run. Then the experiment can be
described as follows:
k = 2 factors (1 primary factor X1 and 1 blocking factor X2)
L1 = 4 levels of factor X1
L2 = 3 levels of factor X2
n = 1 replication per cell
N = L1 * L2 = 4 * 3 = 12 runs
Model for a randomized block design
The model for a randomized block design with one nuisance variable is
Yi,j = μ + Ti + Bj + random error
where
Yi,j is any observation for which X1 = i and X2 = j
X1 is the primary factor
X2 is the blocking factor
μ is the general location parameter (i.e., the mean)
Ti is the effect for being in treatment i (of factor X1)
Bj is the effect for being in block j (of factor X2)
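As an illustration of this model, the sketch below (Python; the resistivity numbers are invented)
estimates the location parameter and the treatment and block effects for the 4 x 3 furnace-run
example from row and column means.

    # Sketch: effect estimates for the randomized block model
    # Y(i,j) = mu + T(i) + B(j) + error, 4 dosages x 3 furnace runs.
    import numpy as np

    # rows = treatment (dosage) i, columns = block (furnace run) j
    Y = np.array([[10.2, 10.8, 10.5],
                  [11.1, 11.6, 11.3],
                  [12.0, 12.3, 12.1],
                  [12.9, 13.4, 13.0]])   # invented resistivity values

    mu_hat = Y.mean()                    # general location parameter
    T_hat = Y.mean(axis=1) - mu_hat      # treatment effects (sum to 0)
    B_hat = Y.mean(axis=0) - mu_hat      # block effects (sum to 0)
    print(mu_hat, T_hat.round(3), B_hat.round(3))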
5. Process Improvement
5.3. Choosing an experimental design
5.3.3. How do you select an experimental design?
5.3.3.2. Randomized block designs
Latin square design model
The model for a response for a Latin square design is
Yijk = μ + Ri + Cj + Tk + random error
with Yijk denoting any observation for which
X1 = i, X2 = j, X3 = k
X1 and X2 are blocking factors
X3 is the primary factor
where μ is the general location parameter, Ri and Cj are the effects for blocks i and j of the two
blocking factors, and Tk is the effect for treatment k of the primary factor.
Randomize as much as design allows
Designs for Latin squares with 3-, 4-, and 5-level factors are given next. These designs show
what the treatment combinations should be for each run. When using any of these designs, be
sure to randomize the treatment units and trial order, as much as the design allows.
For example, one recommendation is that a Latin square design be randomly selected from
those available, then randomize the run order.
X1 (row block)  X2 (column block)  X3 (treatment)
1 1 1
1 2 2
1 3 4
1 4 3
2 1 4
2 2 3
2 3 1
2 4 2
3 1 2
3 2 4
3 3 3
3 4 1
4 1 3
4 2 1
4 3 2
4 4 4
with
k = 3 factors (2 blocking factors and 1 primary factor)
L1 = 4 levels of factor X1 (block)
L2 = 4 levels of factor X2 (block)
L3 = 4 levels of factor X3 (primary)
N = L1 * L2 = 16 runs
This can alternatively be represented as
A B D C
D C A B
B D C A
C A B D
X1 (row block)  X2 (column block)  X3 (treatment)
1 1 1
1 2 2
1 3 3
1 4 4
1 5 5
2 1 3
2 2 4
2 3 5
2 4 1
2 5 2
3 1 5
3 2 1
3 3 2
3 4 3
3 5 4
4 1 2
4 2 3
4 3 4
4 4 5
4 5 1
5 1 4
5 2 5
5 3 1
5 4 2
5 5 3
with
k = 3 factors (2 blocking factors and 1 primary factor)
L1 = 5 levels of factor X1 (block)
L2 = 5 levels of factor X2 (block)
L3 = 5 levels of factor X3 (primary)
N = L1 * L2 = 25 runs
This can alternatively be represented as
A B C D E
C D E A B
E A B C D
B C D E A
D E A B C
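One standard way to construct a Latin square of any side is the cyclic construction sketched
below (Python, illustrative; the 5x5 square above uses a different row shift, and in practice the
rows, columns, and treatment labels should then be randomly permuted).

    # Sketch: a cyclic Latin square of side L; entry (i, j) gets treatment
    # (i + j) mod L, so each treatment appears once per row and per column.
    import numpy as np

    def cyclic_latin_square(L):
        return np.add.outer(np.arange(L), np.arange(L)) % L + 1

    print(cyclic_latin_square(5))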
Further information
More details on Latin square designs can be found in Box, Hunter, and Hunter (1978).
5. Process Improvement
5.3. Choosing an experimental design
5.3.3. How do you select an experimental design?
5.3.3.2. Randomized block designs
Randomize as much as design allows
Designs for 3-, 4-, and 5-level factors are given on this page. These designs show what the
treatment combinations would be for each run. When using any of these designs, be sure to
randomize the treatment units and trial order, as much as the design allows.
For example, one recommendation is that a Graeco-Latin square design be randomly selected
from those available, then randomize the run order.
with
k = 4 factors (3 blocking factors and 1 primary factor)
L1 = 3 levels of factor X1 (block)
L2 = 3 levels of factor X2 (block)
L3 = 3 levels of factor X3 (block)
L4 = 3 levels of factor X4 (primary)
N = L1 * L2 = 9 runs
This can alternatively be represented as (A, B, and C represent the
treatment factor and 1, 2, and 3 represent the blocking factor):
A1 B2 C3
C2 A3 B1
B3 C1 A2
N = L1 * L2 = 16 runs
This can alternatively be represented as (A, B, C, and D represent the
treatment factor and 1, 2, 3, and 4 represent the blocking factor):
A1 B2 C3 D4
D2 C1 B4 A3
B3 A4 D1 C2
C4 D3 A2 B1
Further information
More designs are given in Box, Hunter, and Hunter (1978).
5. Process Improvement
5.3. Choosing an experimental design
5.3.3. How do you select an experimental design?
5.3.3.2. Randomized block designs
Randomize as much as design allows
Designs for 4- and 5-level factors are given on this page. These designs show what the treatment
combinations should be for each run. When using any of these designs, be sure to randomize the
treatment units and trial order, as much as the design allows.
For example, one recommendation is that a hyper-Graeco-Latin square design be randomly
selected from those available, then randomize the run order.
X1 (row block)  X2 (column block)  X3 (block)  X4 (block)  X5 (treatment)
1 1 1 1 1
1 2 2 2 2
1 3 3 3 3
1 4 4 4 4
2 1 4 2 3
2 2 3 1 4
2 3 2 4 1
2 4 1 3 2
3 1 2 3 4
3 2 1 4 3
3 3 4 1 2
3 4 3 2 1
4 1 3 4 2
4 2 4 3 1
4 3 1 2 4
4 4 2 1 3
with
k = 5 factors (4 blocking factors and 1 primary factor)
L1 = 4 levels of factor X1 (block)
L2 = 4 levels of factor X2 (block)
L3 = 4 levels of factor X3 (block)
L4 = 4 levels of factor X4 (block)
L5 = 4 levels of factor X5 (primary)
N = L1 * L2 = 16 runs
This can alternatively be represented as (A, B, C, and D represent the
treatment factor and 1, 2, 3, and 4 represent the blocking factors):
A11 B22 C33 D44
C42 D31 A24 B13
D23 C14 B41 A32
B34 A43 D12 C21
X1 (row block)  X2 (column block)  X3 (block)  X4 (block)  X5 (treatment)
1 1 1 1 1
1 2 2 2 2
1 3 3 3 3
1 4 4 4 4
1 5 5 5 5
2 1 2 3 4
2 2 3 4 5
2 3 4 5 1
2 4 5 1 2
2 5 1 2 3
3 1 3 5 2
3 2 4 1 3
3 3 5 2 4
3 4 1 3 5
3 5 2 4 1
4 1 4 2 5
4 2 5 3 1
4 3 1 4 2
4 4 2 5 3
4 5 3 1 4
5 1 5 4 3
5 2 1 5 4
5 3 2 1 5
5 4 3 2 1
5 5 4 3 2
with
k = 5 factors (4 blocking factors and 1 primary factor)
L1 = 5 levels of factor X1 (block)
L2 = 5 levels of factor X2 (block)
L3 = 5 levels of factor X3 (block)
L4 = 5 levels of factor X4 (block)
L5 = 5 levels of factor X5 (primary)
N = L1 * L2 = 25 runs
This can alternatively be represented as (A, B, C, D, and E represent
the treatment factor and 1, 2, 3, 4, and 5 represent the blocking
factors):
A11 B22 C33 D44 E55
D23 E34 A45 B51 C12
B35 C41 D52 E13 A24
E42 A53 B14 C25 D31
C54 D15 E21 A32 B43
Further information
More designs are given in Box, Hunter, and Hunter (1978).
5. Process Improvement
5.3. Choosing an experimental design
5.3.3. How do you select an experimental design?
A design in which every setting of every factor appears with every setting of every other factor
is a full factorial design
A common experimental design is one with all input factors set at two levels each. These levels
are called `high' and `low' or `+1' and `-1', respectively. A design with all possible high/low
combinations of all the input factors is called a full factorial design in two levels.
If there are k factors, each at 2 levels, a full factorial design has 2^k runs.

TABLE 3.2 Number of Runs for a 2^k Full Factorial
Number of Factors   Number of Runs
2                   4
3                   8
4                   16
5                   32
6                   64
7                   128
Full factorial designs not recommended for 5 or more factors
As shown by the above table, when the number of factors is 5 or greater, a full factorial design
requires a large number of runs and is not very efficient. As recommended in the Design
Guideline Table, a fractional factorial design or a Plackett-Burman design is a better choice for
5 or more factors.
5. Process Improvement
5.3. Choosing an experimental design
5.3.3. How do you select an experimental design?
5.3.3.3. Full factorial designs
Graphical representation of a two-level design with 3 factors
Consider the two-level, full factorial design for three factors, namely the 2^3 design. This
implies eight runs (not counting replications or center point runs). Graphically, we can represent
the 2^3 design by the cube shown in Figure 3.1. The arrows show the direction of increase of
the factors. The numbers `1' through `8' at the corners of the design box reference the `Standard
Order' of runs (see Figure 3.1).
FIGURE 3.1 A 2^3 two-level, full factorial design; factors X1, X2, X3
TABLE 3.3 A 2^3 Two-level, Full Factorial Design Table Showing Runs in `Standard Order'
Run   X1   X2   X3
1     -1   -1   -1
2      1   -1   -1
3     -1    1   -1
4      1    1   -1
5     -1   -1    1
6      1   -1    1
7     -1    1    1
8      1    1    1
The left-most column of Table 3.3, numbers 1 through 8, specifies a
(non-randomized) run order called the `Standard Order.' These
numbers are also shown in Figure 3.1. For example, run 1 is made at
the `low' setting of all three factors.
Rule for writing a 2^k full factorial in "standard order"
We can readily generalize the 2^3 standard order matrix to a 2-level full factorial with k factors.
The first (X1) column starts with -1 and alternates in sign for all 2^k runs. The second (X2)
column starts with -1 repeated twice, then alternates with 2 in a row of the opposite sign until all
2^k places are filled. The third (X3) column starts with -1 repeated 4 times, then 4 repeats of
+1's and so on. In general, the i-th column (Xi) starts with 2^(i-1) repeats of -1 followed by
2^(i-1) repeats of +1.
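The repetition rule above translates directly into code; the sketch below (Python, illustrative
names) generates the 2^k design matrix in standard order.

    # Sketch: generate a 2-level full factorial design in standard (Yates)
    # order: column Xi is 2^(i-1) repeats of -1, then 2^(i-1) repeats of +1.
    import numpy as np

    def standard_order(k):
        runs = 2 ** k
        cols = []
        for i in range(1, k + 1):
            block = np.repeat([-1, +1], 2 ** (i - 1))
            cols.append(np.tile(block, runs // (2 ** i)))
        return np.column_stack(cols)

    print(standard_order(3))   # reproduces the 8-run table above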
Example of a 2^3 Experiment
(the two right-most columns are response values; their column labels were not recovered)

 I   X1   X2  X1*X2  X3  X1*X3  X2*X3  X1*X2*X3
+1   -1   -1   +1    -1   +1     +1      -1       -3   -1
+1   +1   -1   -1    -1   -1     +1      +1        0   -1
+1   -1   +1   -1    -1   +1     -1      +1       -1    0
+1   +1   +1   +1    -1   -1     -1      -1       +2   +3
+1   -1   -1   +1    +1   -1     -1      +1       -1    0
+1   +1   -1   -1    +1   +1     -1      -1       +2   +1
+1   -1   +1   -1    +1   -1     +1      -1       +1   +1
+1   +1   +1   +1    +1   +1     +1      +1       +6   +5

The block with the 1's and -1's is called the Model Matrix or the Analysis Matrix. The table
formed by the columns X1, X2 and X3 is called the Design Table or Design Matrix.
Eliminate correlation between estimates of main effects and interactions
When all factors have been coded so that the high value is "1" and the low value is "-1", the
design matrix for any full (or suitably chosen fractional) factorial experiment has columns that
are all pairwise orthogonal and all the columns (except the "I" column) sum to 0.
The orthogonality property is important because it eliminates correlation between the estimates
of the main effects and interactions.
5. Process Improvement
5.3. Choosing an experimental design
5.3.3. How do you select an experimental design?
5.3.3.3. Full factorial designs
An example of a full factorial design with 3 factors
The following is an example of a full factorial design with 3 factors that also illustrates
replication, randomization, and added center points.
Suppose that we wish to improve the yield of a polishing operation. The three inputs (factors)
that are considered important to the operation are Speed (X1), Feed (X2), and Depth (X3). We
want to ascertain the relative importance of each of these factors on Yield (Y).
Speed, Feed and Depth can all be varied continuously along their respective scales, from a low
to a high setting. Yield is observed to vary smoothly when progressive changes are made to the
inputs. This leads us to believe that the ultimate response surface for Y will be smooth.
Table of factor level settings
TABLE 3.5 High (+1), Low (-1), and Standard (0) Settings for a Polishing Operation
         Low (-1)   Standard (0)   High (+1)   Units
Speed    16         20             24          rpm
Feed     0.001      0.003          0.005       cm/sec
Depth    0.01       0.015          0.02        cm
Factor Combinations
2^3 implies 8 runs
Note that if we have k factors, each run at two levels, there will be 2^k different combinations
of the levels. In the present case, k = 3 and 2^3 = 8.

Full Model
Running the full complement of all possible factor combinations means that we can estimate all
the main and interaction effects. There are three main effects, three two-factor interactions, and
a three-factor interaction, all of which appear in the full model as follows:

    Y = β0 + β1*X1 + β2*X2 + β3*X3 + β12*X1*X2 + β13*X1*X3 + β23*X2*X3
        + β123*X1*X2*X3 + experimental error
Standard order
Coded variables in standard order
The numbering of the corners of the box in the last figure refers to a standard way of writing
down the settings of an experiment called `standard order'. We see standard order displayed in
the following tabular representation of the eight-cornered box. Note that the factor settings have
been coded, replacing the low setting by -1 and the high setting by 1.
Replication
Replication provides information on variability
Running the entire design more than once makes for easier data analysis because, for each run
(i.e., `corner of the design box') we obtain an average value of the response as well as some idea
about the dispersion (variability, consistency) of the response at that setting.
Homogeneity of variance
One of the usual analysis assumptions is that the response dispersion is uniform across the
experimental space. The technical term is `homogeneity of variance'. Replication allows us to
check this assumption and possibly find the setting combinations that give inconsistent yields,
allowing us to avoid that area of the factor space.
Factor settings in standard order with replication
We now have constructed a design table for a two-level full factorial in three factors, replicated
twice.

TABLE 3.7 The 2^3 Full Factorial Replicated Twice and Presented in Standard Order
Run   Speed, X1   Feed, X2   Depth, X3
1 16, -1 .001, -1 .01, -1
2 24, +1 .001, -1 .01, -1
3 16, -1 .005, +1 .01, -1
4 24, +1 .005, +1 .01, -1
5 16, -1 .001, -1 .02, +1
6 24, +1 .001, -1 .02, +1
7 16, -1 .005, +1 .02, +1
8 24, +1 .005, +1 .02, +1
9 16, -1 .001, -1 .01, -1
10 24, +1 .001, -1 .01, -1
11 16, -1 .005, +1 .01, -1
12 24, +1 .005, +1 .01, -1
13 16, -1 .001, -1 .02, +1
14 24, +1 .001, -1 .02, +1
15 16, -1 .005, +1 .02, +1
16 24, +1 .005, +1 .02, +1
Randomization
No randomization and no center points
If we now ran the design as is, in the order shown, we would have two deficiencies, namely:
1. no randomization, and
2. no center points.
Randomization provides protection against extraneous factors affecting the results
The more freely one can randomize experimental runs, the more insurance one has against
extraneous factors possibly affecting the results, and hence perhaps wasting our experimental
time and effort. For example, consider the `Depth' column: the settings of Depth, in standard
order, follow a `four low, four high, four low, four high' pattern.
Suppose now that four settings are run in the day and four at night, and that (unknown to the
experimenter) ambient temperature in the polishing shop affects Yield. We would run the
experiment over two days and two nights and conclude that Depth influenced Yield, when in
fact ambient temperature was the significant influence. So the moral is: Randomize
experimental runs as much as possible.
Table of factor settings in randomized order
Here's the design matrix again with the rows randomized (using the RAND function of
EXCEL). The old standard order column is also shown for comparison and for re-sorting, if
desired, after the runs are in.
TABLE 3.8 The 23 Full Factorial Replicated
Twice with Random Run Order Indicated
Random Standard
Order Order X1 X2 X3
1 5 -1 -1 +1
2 15 -1 +1 +1
3 9 -1 -1 -1
4 7 -1 +1 +1
5 3 -1 +1 -1
6 12 +1 +1 -1
7 6 +1 -1 +1
8 4 +1 +1 -1
9 2 +1 -1 -1
10 13 -1 -1 +1
11 8 +1 +1 +1
12 16 +1 +1 +1
13 1 -1 -1 -1
14 14 +1 -1 +1
15 11 -1 +1 -1
16 10 +1 -1 -1
Table showing design matrix with randomization and center points
This design would be improved by adding at least 3 centerpoint runs placed at the beginning,
middle and end of the experiment. The final design matrix is shown below:

TABLE 3.9 The 2^3 Full Factorial Replicated Twice with Random Run Order Indicated and
Center Point Runs Added
Random    Standard
Order     Order       X1   X2   X3
Random Standard
Order Order X1 X2 X3
1 0 0 0
2 5 -1 -1 +1
3 15 -1 +1 +1
4 9 -1 -1 -1
5 7 -1 +1 +1
6 3 -1 +1 -1
7 12 +1 +1 -1
8 6 +1 -1 +1
9 0 0 0
10 4 +1 +1 -1
11 2 +1 -1 -1
12 13 -1 -1 +1
13 8 +1 +1 +1
14 16 +1 +1 +1
15 1 -1 -1 -1
16 14 +1 -1 +1
17 11 -1 +1 -1
18 10 +1 -1 -1
19 0 0 0
5. Process Improvement
5.3. Choosing an experimental design
5.3.3. How do you select an experimental design?
5.3.3.3. Full factorial designs
Blocking in a 2^3 factorial design
In this case, we need to divide our experiment into two halves (2 blocks), one with the first raw
material batch and the other with the new batch. The division has to balance out the effect of the
materials change in such a way as to eliminate its influence on the analysis, and we do this by
blocking.
Three-factor interaction confounded with the block effect
This works because we are in fact assigning the `estimation' of the (unwanted) blocking effect
to the three-factor interaction, and because of the special property of two-level designs called
orthogonality. That is, the three-factor interaction is "confounded" with the block effect, as will
be seen shortly.
Orthogonality
Orthogonality guarantees that we can always estimate the effect of one factor or interaction
clear of any influence due to any other factor or interaction. Orthogonality is a very desirable
property in DOE and this is a major reason why two-level factorials are so popular and
successful.
Table showing blocking scheme
Formally, consider the 2^3 design table with the three-factor interaction column added.

TABLE 3.10 Two Blocks for a 2^3 Design
SPEED FEED DEPTH BLOCK
X1 X2 X3 X1*X2*X3
-1 -1 -1 -1 I
+1 -1 -1 +1 II
-1 +1 -1 +1 II
+1 +1 -1 -1 I
-1 -1 +1 +1 II
+1 -1 +1 -1 I
-1 +1 +1 -1 I
+1 +1 +1 +1 II
Block by assigning the "Block effect" to a high-order interaction
Rows that have a `-1' in the three-factor interaction column are assigned to `Block I' (rows 1, 4,
6, 7), while the other rows are assigned to `Block II' (rows 2, 3, 5, 8). Note that the Block I rows
are the open circle corners of the design `box' above; Block II are the dark-shaded corners.
Most DOE software will do blocking for you
The general rule for blocking is: use one or a combination of high-order interaction columns to
construct blocks. This gives us a formal way of blocking complex designs. Apart from simple
cases in which you can design your own blocks, your statistical/DOE software will do the
blocking if asked, but you do need to understand the principle behind it.
Block effects are confounded with higher-order interactions
The price you pay for blocking by using high-order interaction columns is that you can no
longer distinguish the high-order interaction(s) from the blocking effect - they have been
`confounded,' or `aliased.' In fact, the estimated block effect is now the sum of the blocking
effect and the high-order interaction effect. This is fine as long as our assumption about
negligible high-order interactions holds true, which it usually does.
Center points within a block
Within a block, center point runs are assigned as if the block were a separate experiment -
which in a sense it is. Randomization takes place within a block as it would for any non-blocked
DOE.
5. Process Improvement
5.3. Choosing an experimental design
5.3.3. How do you select an experimental design?
Later sections will show how to choose the "right" fraction for 2-level designs - these are both
balanced and orthogonal
The solution to this problem is to use only a fraction of the runs specified by the full factorial
design. Which runs to make and which to leave out is the subject of interest here. In general, we
pick a fraction such as ½, ¼, etc. of the runs called for by the full factorial. We use various
strategies that ensure an appropriate choice of runs. The following sections will show you how
to choose an appropriate fraction of a full factorial design to suit your purpose at hand. Properly
chosen fractional factorial designs for 2-level experiments have the desirable properties of being
both balanced and orthogonal.
5. Process Improvement
5.3. Choosing an experimental design
5.3.3. How do you select an experimental design?
5.3.3.4. Fractional factorial designs
Tabular representation of the design
In tabular form, this design (also showing the eight observations `yj' (j = 1, ..., 8)) is given by

TABLE 3.11 A 2^3 Two-level, Full Factorial Design Table Showing Runs in `Standard Order,'
Plus Observations (yj)
Run   X1   X2   X3   Y
1     -1   -1   -1   y1 = 33
2     +1   -1   -1   y2 = 63
3     -1   +1   -1   y3 = 41
4     +1   +1   -1   y4 = 57
5     -1   -1   +1   y5 = 57
6     +1   -1   +1   y6 = 51
7     -1   +1   +1   y7 = 59
8     +1   +1   +1   y8 = 53
Responses in standard order
The right-most column of the table lists `y1' through `y8' to indicate the responses measured for
the experimental runs when listed in standard order. For example, `y1' is the response (i.e.,
output) observed when the three factors were all run at their `low' setting. The numbers entered
in the "y" column will be used to illustrate calculations of effects.
Computing X1 main effect
From the entries in the table we are able to compute all `effects' such as main effects, first-order
`interaction' effects, etc. For example, to compute the main effect estimate `c1' of factor X1, we
compute the average response at all runs with X1 at the `high' setting, namely
(1/4)(y2 + y4 + y6 + y8), minus the average response of all runs with X1 set at `low,' namely
(1/4)(y1 + y3 + y5 + y7). That is,
c1 = (1/4)(y2 + y4 + y6 + y8) - (1/4)(y1 + y3 + y5 + y7)
c1 = (1/4)(63 + 57 + 51 + 53) - (1/4)(33 + 41 + 57 + 59) = 8.5
Example of computing the main effects using only four runs
For example, suppose we select only the four light (unshaded) corners of the design cube. Using
these four runs (1, 4, 6 and 7), we can still compute c1 as follows:
c1 = (1/2)(y4 + y6) - (1/2)(y1 + y7)
c1 = (1/2)(57 + 51) - (1/2)(33 + 59) = 8
Similarly, we would compute c2, the effect due to X2, as
c2 = (1/2)(y4 + y7) - (1/2)(y1 + y6)
c2 = (1/2)(57 + 59) - (1/2)(33 + 51) = 16
Finally, the computation of c3 for the effect due to X3 would be
c3 = (1/2)(y6 + y7) - (1/2)(y1 + y4)
c3 = (1/2)(51 + 59) - (1/2)(33 + 57) = 10
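These effect computations can be reproduced mechanically; the sketch below (Python, using the
Table 3.11 responses) recovers c1 = 8.5 from the full factorial and c1 = 8, c2 = 16, c3 = 10 from
the four-run half fraction.

    # Sketch: main effects from the Table 3.11 responses, for the full
    # factorial and for the half fraction of runs 1, 4, 6 and 7.
    import numpy as np

    y = np.array([33, 63, 41, 57, 57, 51, 59, 53], dtype=float)  # std. order
    X1 = np.array([-1, +1, -1, +1, -1, +1, -1, +1])
    X2 = np.array([-1, -1, +1, +1, -1, -1, +1, +1])
    X3 = np.array([-1, -1, -1, -1, +1, +1, +1, +1])

    print(y[X1 == +1].mean() - y[X1 == -1].mean())   # c1 = 8.5 (full design)

    half = np.array([0, 3, 5, 6])     # runs 1, 4, 6, 7 (light corners)
    for name, X in [("c1", X1), ("c2", X2), ("c3", X3)]:
        yh, Xh = y[half], X[half]
        print(name, yh[Xh == +1].mean() - yh[Xh == -1].mean())  # 8, 16, 10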
Alternative runs for computing main effects
We could also have used the four dark (shaded) corners of the design cube for our runs and
obtained similar, but slightly different, estimates for the main effects. In either case, we would
have used half the number of runs that the full factorial requires. The half fraction we used is a
new design written as 2^(3-1). Note that 2^(3-1) = 2^3/2 = 2^2 = 4, which is the number of runs
in this half-fraction design. In the next section, a general method for choosing fractions that
"work" will be discussed.
On the other hand, if interactions are potentially large (and if the replication required could be
set aside), then the usual 2^3 full factorial design (with center points) would serve as a good
design.
5. Process Improvement
5.3. Choosing an experimental design
5.3.3. How do you select an experimental design?
5.3.3.4. Fractional factorial designs
Assign the third factor to the interaction column of a 2^2 design
This design has four runs, the right number for a half-fraction of a 2^3, but there is no column
for factor X3. We need to add a third column to take care of this, and we do it by adding the
X1*X2 interaction column. This column is, as you will recall from full factorial designs,
constructed by multiplying the row entry for X1 with that of X2 to obtain the row entry for
X1*X2.
TABLE 3.13 A 22 Design Table
Augmented with the X1*X2
Interaction Column `X1*X2'
X1 X2 X1*X2
1 -1 -1 +1
2 +1 -1 -1
3 -1 +1 -1
4 +1 +1 +1
Design table with X3 set to X1*X2
We may now substitute X3 in place of X1*X2 in this table.

TABLE 3.14 A 2^(3-1) Design Table with Column X3 set to X1*X2
Run X1 X2 X3
1 -1 -1 +1
2 +1 -1 -1
3 -1 +1 -1
4 +1 +1 +1
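Programmatically, the half fraction follows from the base 2^2 factorial in one line; a minimal Python/numpy sketch (an illustration, not the handbook's code) builds Table 3.14 and notes the sign choice that yields Table 3.15:

    import numpy as np

    # 2^2 full factorial in standard order (X1 alternates fastest)
    X1 = np.array([-1, +1, -1, +1])
    X2 = np.array([-1, -1, +1, +1])

    # Generator 3 = 12: the third column is the product of the first two
    X3 = X1 * X2          # use -X1 * X2 instead for the alternate fraction (Table 3.15)
    design = np.column_stack([X1, X2, X3])
    print(design)
    # [[-1 -1  1]
    #  [ 1 -1 -1]
    #  [-1  1 -1]
    #  [ 1  1  1]]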
Design table with X3 set to -X1*X2
Note that the rows of Table 3.14 give the dark-shaded corners of the design in Figure 3.4. If we had set X3 = -X1*X2 as the rule for generating the third column of our 2^(3-1) design, we would have obtained:

TABLE 3.15 A 2^(3-1) Design Table with Column X3 set to -X1*X2
Run X1 X2 X3
1 -1 -1 -1
2 +1 -1 +1
3 -1 +1 +1
4 +1 +1 -1
Main effect estimates from the fractional factorial are not as good as from the full factorial
This design gives the light-shaded corners of the box of Figure 3.4. Both 2^(3-1) designs that we have generated are equally good, and both save half the number of runs over the original 2^3 full factorial design. If c1, c2, and c3 are our estimates of the main effects for the factors X1, X2, X3 (i.e., the difference in the response due to going from "low" to "high" for an effect), then the precision of the estimates c1, c2, and c3 is not quite as good as for the full 8-run factorial because we only have four observations to construct the averages instead of eight; this is one price we have to pay for using fewer runs.
Example
For the `Pressure (P), Table speed (T), and Down force (D)' design situation of the previous example, here is a replicated 2^(3-1) in randomized run order, with five centerpoint runs (`000') interspersed among the runs. This design table was constructed using the technique discussed above, with D = P*T.
Sparsity of effects assumption
In using the 2^(3-1) design, we also assume that c12 is small compared to c3; this is called a `sparsity of effects' assumption. Our computation of c3 is in fact a computation of c3 + c12. If the desired effects are only confounded with non-significant interactions, then we are OK.
A short way of writing factor column multiplication
A short way of writing `X3 = X1*X2' (understanding that we are talking about multiplying columns of the design table together) is `3 = 12' (similarly, 3 = -12 refers to X3 = -X1*X2). Note that `12' refers to column multiplication of the kind we are using to construct the fractional design, and any column multiplied by itself gives the identity column of all 1's.
Principal fraction
Note: We can replace any design generator by its negative counterpart and have an equivalent, but different, fractional design. The fraction generated by positive design generators is sometimes called the principal fraction.
All main effects of the 2^(3-1) design are confounded with two-factor interactions
The confounding pattern described by 1=23, 2=13, and 3=12 tells us that all the main effects of the 2^(3-1) design are confounded with two-factor interactions. That is the price we pay for using this fractional design. Other fractional designs have different confounding patterns; for example, in the typical quarter-fraction of a 2^6 design, i.e., in a 2^(6-2) design, main effects are confounded with three-factor interactions (e.g., 5=123) and so on. In the case of 5=123, we can also readily see that 15=23 (etc.), which alerts us to the fact that certain two-factor interactions of a 2^(6-2) are confounded with other two-factor interactions.
The next section will add one more item to the above box, and then we will
be able to select the right two-level fractional factorial design for a wide
range of experimental tasks.
5. Process Improvement
5.3. Choosing an experimental design
5.3.3. How do you select an experimental design?
5.3.3.4. Fractional factorial designs
A 2^(8-3) design has 32 runs
Figure 3.6 tells us that a 2^(8-3) design has 32 runs, not including centerpoint runs, and eight factors. There are three generators since this is a 1/8 = 2^-3 fraction (in general, a 2^(k-p) fractional factorial needs p generators which define the settings for p additional factor columns to be added to the 2^(k-p) full factorial design columns; see the following detailed description for the 2^(8-3) design).

Design generators
We note further that the design generators, written in `I = ...' form, for the principal 2^(8-3) fractional factorial design are:

{ I = +3456; I = +12457; I = +12358 }.

These design generators result from multiplying the "6 = 345" generator by "6" to obtain "I = 3456", and so on for the other two generators.
"Defining The total collection of design generators for a factorial design, including
relation" for all new generators that can be formed as products of these generators,
a fractional is called a defining relation. There are seven "words", or strings of
factorial numbers, in the defining relation for the 28-3 design, starting with the
design original three generators and adding all the new "words" that can be
formed by multiplying together any two or three of these original three
words. These seven turn out to be I = 3456 = 12457 = 12358 = 12367 =
12468 = 3478 = 5678. In general, there will be (2p -1) words in the
defining relation for a 2k-p fractional factorial.
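Since squared factors cancel, multiplying two words amounts to taking the symmetric difference of their factor sets, so the defining relation can be enumerated mechanically. A small Python sketch (illustrative only, not handbook code) reproduces the seven words above, along with the design resolution defined next:

    from itertools import combinations

    # Generators of the principal 2^(8-3) fraction: 6=345, 7=1245, 8=1235,
    # stored as sets of factor labels taken from their "I = ..." words.
    generators = [{'3','4','5','6'}, {'1','2','4','5','7'}, {'1','2','3','5','8'}]

    # Column multiplication: a factor appearing twice gives a column of +1's
    # and cancels, so a product of words is the symmetric difference of sets.
    words = []
    for r in (1, 2, 3):
        for combo in combinations(generators, r):
            w = set()
            for g in combo:
                w ^= g
            words.append(''.join(sorted(w)))

    print(sorted(words, key=lambda w: (len(w), w)))
    # ['3456', '3478', '5678', '12358', '12367', '12457', '12468']
    print('resolution =', min(len(w) for w in words))  # 4: shortest word has length 4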
Definition of "resolution"
The length of the shortest word in the defining relation is called the resolution of the design. Resolution describes the degree to which estimated main effects are aliased (or confounded) with estimated two-factor interactions, three-factor interactions, etc.
Notation for resolution (Roman numerals)
The length of the shortest word in the defining relation for the 2^(8-3) design is four. This is written with a Roman numeral subscript as 2^(8-3)_IV. Note that the 2^(3-1) design has only one word, "I = 123" (or "I = -123"), in its defining relation since there is only one design generator, and so this fractional factorial design has resolution three; that is, we may write 2^(3-1)_III.
Resolution and confounding
The design resolution tells us how badly the design is confounded. Previously, in the 2^(3-1) design, we saw that the main effects were confounded with two-factor interactions. However, main effects were not confounded with other main effects. So, at worst, we have 3=12, or 2=13, etc., but we do not have 1=2, etc. In fact, a resolution II design would be pretty useless for any purpose whatsoever!

Similarly, in a resolution IV design, main effects are confounded with at worst three-factor interactions. We can see, in Figure 3.7, that 6=345. We also see that 36=45, 34=56, etc. (i.e., some two-factor interactions are confounded with certain other two-factor interactions), but we never see anything like 2=13, or 5=34 (i.e., main effects confounded with two-factor interactions).
One or two factors suspected of having significant first-order interactions can be assigned so as to avoid aliasing them
For this fractional factorial design, 15 of the 28 possible two-factor interactions (eight factors give 8*7/2 = 28 pairs) are aliased (confounded) in pairs or in a group of three. The remaining 28 - 15 = 13 two-factor interactions are only aliased with higher-order interactions (which are generally assumed to be negligible). This is verified by noting that factors "1" and "2" never appear in a length-4 word in the defining relation. So, all 13 interactions involving "1" and "2" are clear of aliasing with any other two-factor interaction.

If one or two factors are suspected of possibly having significant first-order interactions, they can be assigned in such a way as to avoid having them aliased.
There are many choices of fractional factorial designs; some may have the same number of runs and resolution, but different aliasing patterns
Note: There are other fractional designs that can be derived starting with different choices of design generators for the "6", "7" and "8" factor columns. However, they are either equivalent (in terms of the number of words of length four) to the fraction with generators 6 = 345, 7 = 1245, 8 = 1235 (obtained by relabeling the factors), or they are inferior to the fraction given because their defining relation contains more words of length four (and therefore more confounded two-factor interactions). For example, the design with generators 6 = 12345, 7 = 135, and 8 = 245 has five length-four words in the defining relation (the defining relation is I = 123456 = 1357 = 2458 = 2467 = 1368 = 123478 = 5678). As a result, this design would confound more two-factor interactions (23 out of 28 possible two-factor interactions are confounded, leaving only "12", "14", "23", "27" and "34" as estimable two-factor interactions).
Minimum aberration
A table given later in this chapter gives a collection of useful fractional factorial designs that, for a given k and p, maximize the possible resolution and minimize the number of short words in the defining relation (which minimizes two-factor aliasing). The term for this is "minimum aberration".
Use screening designs when you have many factors to consider
Even when the experimental goal is to eventually fit a response surface model (an RSM analysis), the first experiment should be a screening design when there are many factors to consider.

Screening designs are usually resolution III or IV
Screening designs are typically of resolution III. The reason is that resolution III designs permit one to explore the effects of many factors with an efficient number of runs.

Sometimes designs of resolution IV are also used for screening designs. In these designs, main effects are confounded with, at worst, three-factor interactions. This is better from the confounding viewpoint, but the designs require more runs than a resolution III design.
Generator column notation can use either numbers or letters for the factor columns
They differ in the notation for the design generators. Box, Hunter, and Hunter use numbers (as we did in our earlier discussion) and Montgomery uses capital letters according to the scheme 1 = A, 2 = B, 3 = C, and so on.

Notice the absence of the letter I in this scheme. It is usually reserved for the intercept column that is identically 1. As an example of the letter notation, note that the design generator "6 = 12345" is equivalent to "F = ABCDE".
Details of the design generators, the defining relation, the confounding structure, and the design matrix
TABLE 3.17 catalogs these useful fractional factorial designs using the notation previously described in FIGURE 3.7.

Clicking on the specification for a given design provides details (courtesy of Dataplot files) of the design generators, the defining relation, the confounding structure (as far as main effects and two-factor interactions are concerned), and the design matrix. The notation used follows our previous labeling of factors with numbers, not letters.
Number of Factors k    Design Specification    Number of Runs N
9     2^(9-5)_III     16
10    2^(10-3)_V      128
10    2^(10-4)_IV     64
10    2^(10-5)_IV     32
10    2^(10-6)_III    16
11    2^(11-4)_V      128
11    2^(11-5)_IV     64
11    2^(11-6)_IV     32
11    2^(11-7)_III    16
15    2^(15-11)_III   16
31    2^(31-26)_III   32
These designs have run numbers that are a multiple of 4
Plackett-Burman (PB) designs are economical screening designs; the price paid is that, in a PB design, main effects are in general heavily confounded with two-factor interactions. The PB design in 12 runs, for example, may be used for an experiment containing up to 11 factors.
Saturated main effect designs
PB designs also exist for 20-run, 24-run, and 28-run (and higher) designs. With a 20-run design you can run a screening experiment for up to 19 factors, up to 23 factors in a 24-run design, and up to 27 factors in a 28-run design. These resolution III designs are known as saturated main effect designs because all degrees of freedom are utilized to estimate main effects. The designs for 20 and 24 runs are shown below.
14 -1 1 1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 1 1 1 -1 1 -1 1 1 -1
15 -1 -1 1 1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 1 1 1 -1 1 -1 1 1
16 1 -1 -1 1 1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 1 1 1 -1 1 -1 1
17 1 1 -1 -1 1 1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 1 1 1 -1 1 -1
18 -1 1 1 -1 -1 1 1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 1 1 1 -1 1
19 1 -1 1 1 -1 -1 1 1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 1 1 1 -1
20 -1 1 -1 1 1 -1 -1 1 1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 1 1 1
21 1 -1 1 -1 1 1 -1 -1 1 1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 1 1
22 1 1 -1 1 -1 1 1 -1 -1 1 1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 1
23 1 1 1 -1 1 -1 1 1 -1 -1 1 1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1
24 1 1 1 1 -1 1 -1 1 1 -1 -1 1 1 -1 -1 1 -1 1 -1 -1 -1 -1 -1
No defining relation
These designs do not have a defining relation, since interactions are not identically equal to main effects. With these designs, a main effect column Xi is either orthogonal to XiXj or identical to plus or minus XiXj. For Plackett-Burman designs, the two-factor interaction column XiXj is correlated with every Xk (for k not equal to i or j).

Economical for detecting large main effects
However, these designs are very useful for economically detecting large main effects, assuming all interactions are negligible when compared with the few important main effects.
Equations for quadratic and cubic models
In other circumstances, a complete description of the process behavior might require a quadratic or cubic model:

Quadratic: y = a0 + SUM(i) ai*xi + SUM(i) aii*xi^2 + SUM(i<j) aij*xi*xj + error

Cubic: the quadratic model plus the third-order terms SUM(i) aiii*xi^3 + SUM(i not equal to j) aiij*xi^2*xj + SUM(i<j<k) aijk*xi*xj*xk

These are the full models, with all possible terms; rarely would all of the terms be needed in an application.
Quadratic models almost always sufficient for industrial applications
If the experimenter has defined factor limits appropriately and/or taken advantage of all the tools available in multiple regression analysis (transformations of responses and factors, for example), then finding an industrial process that requires a third-order model is highly unusual. Therefore, we will only focus on designs that are useful for fitting quadratic models. As we will see, these designs often provide lack of fit detection that will help determine when a higher-order model is needed.
General quadratic surface types
Figures 3.9 to 3.12 identify the general quadratic surface types that an investigator might encounter.
A two-level experiment with center points can detect, but not fit, quadratic effects
If a response behaves as in Figure 3.13, the design matrix to quantify that behavior need only contain factors with two levels, low and high. This model is a basic assumption of simple two-level factorial and fractional factorial designs. If a response behaves as in Figure 3.14, the minimum number of levels required for a factor to quantify that behavior is three. One might logically assume that adding center points to a two-level design would satisfy that requirement, but the arrangement of the treatments in such a matrix confounds all quadratic effects with each other. While a two-level design with center points cannot estimate individual pure quadratic effects, it can detect them effectively.
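To sketch how that detection works, the curvature check compares the average of the factorial (corner) runs with the average of the center runs. Below is a minimal Python illustration; the center-point responses are invented for the example and are not measured data:

    import numpy as np

    # Pure-quadratic curvature check for a two-level design with center points
    y_factorial = np.array([33, 63, 41, 57, 57, 51, 59, 53])  # corner runs (Table 3.11)
    y_center = np.array([48, 51, 50])                          # hypothetical center runs

    curvature = y_factorial.mean() - y_center.mean()
    # A curvature estimate that is large relative to its standard error signals
    # that quadratic terms are needed. Note it estimates the SUM of all pure
    # quadratic effects; the individual quadratic terms remain confounded.
    print(curvature)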
Three-level factorial design
A solution to creating a design matrix that permits the estimation of simple curvature as shown in Figure 3.14 would be to use a three-level factorial design. Table 3.21 explores that possibility.

Four-level factorial design
Finally, in more complex cases such as illustrated in Figure 3.15, the design matrix must contain at least four levels of each factor to characterize the behavior of the response adequately.
Fractional factorial designs created to avoid such a large number of runs
Two-level factorial designs quickly become too large for practical application as the number of factors investigated increases. This problem was the motivation for creating `fractional factorial' designs. Table 3.21 shows that the number of runs required for a 3^k factorial becomes unacceptable even more quickly than for 2^k designs. The last column in Table 3.21 shows the number of terms present in a quadratic model for each case.
Number of runs large even for a modest number of factors
With only a modest number of factors, the number of runs is very large, even an order of magnitude greater than the number of parameters to be estimated when k isn't small. For example, the absolute minimum number of runs required to estimate all the terms present in a four-factor quadratic model is 15: the intercept term, 4 main effects, 6 two-factor interactions, and 4 quadratic terms. The corresponding 3^k design for k = 4 requires 81 runs.
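The gap widens quickly with k, as this bit of Python arithmetic (an illustration, not handbook code) shows:

    # Terms in a full quadratic model vs. runs in a 3^k factorial:
    # p = 1 (intercept) + k (main effects) + k (pure quadratic) + k(k-1)/2 (interactions)
    for k in range(2, 7):
        p = 1 + 2 * k + k * (k - 1) // 2
        print(f"k={k}: quadratic model terms = {p}, 3^k runs = {3 ** k}")
    # k=4 gives 15 terms vs. 81 runs, as noted above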
Complex alias structure and lack of rotatability for 3-level fractional factorial designs
Considering a fractional factorial at three levels is a logical step, given the success of fractional designs when applied to two-level designs. Unfortunately, the alias structure for the three-level fractional factorial designs is considerably more complex and harder to define than in the two-level case.

Additionally, the three-level factorial designs suffer a major flaw in their lack of `rotatability.'
Rotatability of Designs
Contours of the variance of predicted values are concentric circles
In a rotatable design, the contours associated with the variance of the predicted values are concentric circles. Figures 3.16 and 3.17 (adapted from Box and Draper, `Empirical Model Building and Response Surfaces,' page 485) illustrate a three-dimensional plot and contour plot, respectively, of the `information function' associated with a 3^2 design.

Graphs of the information function for a rotatable quadratic design
Figures 3.18 and 3.19 are the corresponding graphs of the information function for a rotatable quadratic design. In each of these figures, the value of the information function depends only on the distance of a point from the center of the space.
FIGURE 3.16 Three-Dimensional Illustration for the Information Function of a 3^2 Design
FIGURE 3.17 Contour Map of the Information Function for a 3^2 Design
FIGURE 3.18 Three-Dimensional Illustration of the Information Function for a Rotatable Quadratic Design for Two Factors
FIGURE 3.19 Contour Map of the Information Function for a Rotatable Quadratic Design for Two Factors
Central composite and Box-Behnken designs
Introduced during the 1950's, classical quadratic designs fall into two broad categories: Box-Wilson central composite designs and Box-Behnken designs. The next sections describe these design classes and their properties.
FIGURE: Diagram of central composite design generation for two factors
A CCD with k factors has 2k star points
A central composite design always contains twice as many star points as there are factors in the design. The star points represent new extreme values (low and high) for each factor in the design. Table 3.22 summarizes the properties of the three varieties of central composite designs. Figure 3.21 illustrates the relationships among these varieties.
FIGURE 3.21: Pictorial representation of where the star points are placed for the three types of CCD designs
Comparison of the three central composite designs
The diagrams in Figure 3.21 illustrate the three types of central composite designs for two factors. Note that the CCC explores the largest process space and the CCI explores the smallest process space. Both the CCC and CCI are rotatable designs, but the CCF is not. In the CCC design, the design points describe a circle circumscribed about the factorial square. For three factors, the CCC design points describe a sphere around the factorial cube.
Geometry of the design
The geometry of this design suggests a sphere within the process space such that the surface of the sphere protrudes through each face, with the surface of the sphere tangential to the midpoint of each edge of the space.

Examples of Box-Behnken designs are given on the next page.
Various CCD designs and Box-Behnken designs are compared and their properties discussed
Table 3.24 contrasts the structures of four common quadratic designs one might use when investigating three factors. The table combines CCC and CCI designs because they are structurally identical.

For three factors, the Box-Behnken design offers some advantage in requiring fewer runs. For 4 or more factors, this advantage disappears.
(Final rows of Table 3.24) CCC/CCI: one run at (0, 0, -1.682), one run at (0, 0, +1.682), and six center runs (0, 0, 0); Total Runs = 20. CCF: one run at (0, 0, -1), one run at (0, 0, +1), and six center runs; Total Runs = 20. Box-Behnken: three center runs; Total Runs = 15.
Factor settings for CCC and CCI three-factor designs
Table 3.25 illustrates the factor settings required for a central composite circumscribed (CCC) design and for a central composite inscribed (CCI) design (standard order), assuming three factors, each with low and high settings of 10 and 20, respectively. Because the CCC design generates new extremes for all factors, the investigator must inspect any worksheet generated for such a design to make certain that the factor settings called for are reasonable.

In Table 3.25, treatments 1 to 8 in each case are the factorial points in the design; treatments 9 to 14 are the star points; and 15 to 20 are the system-recommended center points. Notice in the CCC design how the low and high values of each factor have been extended to create the star points. In the CCI design, the specified low and high values become the star points, and the system computes appropriate settings for the factorial part of the design inside those boundaries.
TABLE 3.25 Factor Settings for CCC and CCI Designs for Three
Factors
Central Composite Central Composite
Circumscribed CCC Inscribed CCI
Sequence Sequence
Number X1 X2 X3 Number X1 X2 X3
1 10 10 10 1 12 12 12
2 20 10 10 2 18 12 12
3 10 20 10 3 12 18 12
4 20 20 10 4 18 18 12
5 10 10 20 5 12 12 18
6 20 10 20 6 18 12 18
7 10 20 20 7 12 18 18
8 20 20 20 8 18 18 18
9 6.6 15 15 * 9 10 15 15
10 23.4 15 15 * 10 20 15 15
11 15 6.6 15 * 11 15 10 15
12 15 23.4 15 * 12 15 20 15
13 15 15 6.6 * 13 15 15 10
14 15 15 23.4 * 14 15 15 20
15 15 15 15 15 15 15 15
16 15 15 15 16 15 15 15
17 15 15 15 17 15 15 15
18 15 15 15 18 15 15 15
19 15 15 15 19 15 15 15
20 15 15 15 20 15 15 15
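The settings in Table 3.25 follow directly from the rotatable alpha for three factors. A short Python sketch (illustrative, not the handbook's code) reproduces them:

    # CCC and CCI factor settings for k = 3 factors with specified low/high = 10/20
    low, high = 10.0, 20.0
    center = (low + high) / 2            # 15
    half_range = (high - low) / 2        # 5
    alpha = (2 ** 3) ** 0.25             # rotatable alpha for 3 factors: 1.682

    # CCC: factorial points at low/high; star points pushed out to +/- alpha
    ccc_star = (center - alpha * half_range, center + alpha * half_range)       # (6.6, 23.4)
    # CCI: star points at low/high; factorial points pulled in by a factor 1/alpha
    cci_factorial = (center - half_range / alpha, center + half_range / alpha)  # (12, 18)
    print(ccc_star, cci_factorial)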
Factor settings for CCF and Box-Behnken three-factor designs
Table 3.26 illustrates the factor settings for the corresponding central composite face-centered (CCF) and Box-Behnken designs. Note that each of these designs provides three levels for each factor and that the Box-Behnken design requires fewer runs in the three-factor case.

TABLE 3.26 Factor Settings for CCF and Box-Behnken Designs for Three Factors
Central Composite Box-Behnken
Face-Centered CCF
Sequence Sequence
Number X1 X2 X3 Number X1 X2 X3
1 10 10 10 1 10 10 15
2 20 10 10 2 20 10 15
3 10 20 10 3 10 20 15
4 20 20 10 4 20 20 15
5 10 10 20 5 10 15 10
6 20 10 20 6 20 15 10
7 10 20 20 7 10 15 20
8 20 20 20 8 20 15 20
9 10 15 15 * 9 15 10 10
10 20 15 15 * 10 15 20 10
11 15 10 15 * 11 15 10 20
12 15 20 15 * 12 15 20 20
13 15 15 10 * 13 15 15 15
14 15 15 20 * 14 15 15 15
15 15 15 15 15 15 15 15
16 15 15 15
17 15 15 15
18 15 15 15
19 15 15 15
20 15 15 15
Properties of classical response surface designs
Table 3.27 summarizes properties of the classical quadratic designs. Use this table for broad guidelines when attempting to choose from among available designs.

TABLE 3.27 Summary of Properties of Classical Response Surface Designs
Design Type: Comment

CCC: CCC designs provide high quality predictions over the entire design space, but require factor settings outside the range of the factors in the factorial part. Note: when the possibility of running a CCC design is recognized before starting a factorial experiment, factor spacings can be reduced to ensure that ±α for each coded factor corresponds to feasible (reasonable) levels. Requires 5 levels for each factor.

CCI: CCI designs use only points within the factor ranges originally specified, but do not provide the same high quality prediction over the entire space compared to the CCC. Requires 5 levels of each factor.

CCF: CCF designs provide relatively high quality predictions over the entire design space and do not require using points outside the original factor range. However, they give poor precision for estimating pure quadratic coefficients. Requires 3 levels for each factor.

Box-Behnken: These designs require fewer treatment combinations than a central composite design in cases involving 3 or 4 factors. The Box-Behnken design is rotatable (or nearly so) but it contains regions of poor prediction quality like the CCI. Its "missing corners" may be useful when the experimenter should avoid combined factor extremes; this property prevents a potential loss of data in those cases. Requires 3 levels for each factor.
Number of runs required by central composite and Box-Behnken designs
Table 3.28 compares the number of runs required for a given number of factors for various central composite and Box-Behnken designs.

TABLE 3.28 Number of Runs Required by Central Composite and Box-Behnken Designs
Number of Factors   Central Composite   Box-Behnken
2   13 (5 center points)   -
3   20 (6 centerpoint runs)   15
4   30 (6 centerpoint runs)   27
5   33 (fractional factorial) or 52 (full factorial)   46
6   54 (fractional factorial) or 91 (full factorial)   54
When augmenting a resolution V design to a CCC design by adding star points, it may be desirable to block the design
If an investigator has run either a 2^k full factorial or a 2^(k-p) fractional factorial design of at least resolution V, augmentation of that design to a central composite design (either CCC or CCF) is easily accomplished by adding an additional set (block) of star and centerpoint runs. If the factorial experiment indicated (via the t test) curvature, this composite augmentation is the best follow-up option (follow-up options for other situations will be discussed later).
CCF designs cannot be orthogonally blocked
The CCF design does not allow orthogonal blocking, and the Box-Behnken designs offer blocking only in limited circumstances, whereas the CCC does permit orthogonal blocking.
Axial and factorial blocks
In general, when two blocks are required there should be an axial block and a factorial block. For three blocks, the factorial block is divided into two blocks and the axial block is not split. The blocking of the factorial design points should result in orthogonality between blocks and individual factors and between blocks and the two-factor interactions.

The following central composite design in two factors is broken into two blocks.
Table of CCD design with 3 factors and 3 blocks
The following three examples show blocking structure for various designs.

TABLE 3.30 CCD: 3 Factors, 3 Blocks, Sorted by Block
Pattern  Block  X1  X2  X3  Comment
---      1      -1  -1  -1  Full Factorial
-++      1      -1  +1  +1  Full Factorial
+-+      1      +1  -1  +1  Full Factorial
++-      1      +1  +1  -1  Full Factorial
000      1       0   0   0  Center-Full Factorial
000      1       0   0   0  Center-Full Factorial
--+      2      -1  -1  +1  Full Factorial

(Fragment of the corresponding four-factor example table)
Pattern  Block  X1  X2  X3  X4  Comment
00+0     3       0   0  +2   0  Axial
000-     3       0   0   0  -2  Axial
000+     3       0   0   0  +2  Axial
0000     3       0   0   0   0  Center-Axial

(Fragment of the corresponding five-factor example table)
Pattern  Block  X1  X2  X3  X4  X5  Comment
000+0    2       0   0   0  +2   0  Axial
0000-    2       0   0   0   0  -2  Axial
0000+    2       0   0   0   0  +2  Axial
00000    2       0   0   0   0   0  Center-Axial
Centerpoint runs are not randomized
Centerpoint runs should begin and end the experiment, and should be dispersed as evenly as possible throughout the design matrix. The centerpoint runs are not randomized! There would be no reason to randomize them, as they are there as guardians against process instability, and the best way to find instability is to sample the process on a regular basis.

Rough rule of thumb is to add 3 to 5 centerpoint runs to your design
With this in mind, we have to decide on how many centerpoint runs to do. This is a tradeoff between the resources we have, the need for enough runs to see if there is process instability, and the desire to get the experiment over with as quickly as possible. As a rough guide, you should generally add approximately 3 to 5 centerpoint runs to a full or fractional factorial design.
Table of randomized, replicated 2^3 full factorial design with centerpoints
In the following table we have added three centerpoint runs to the otherwise randomized design matrix, making a total of nineteen runs.

TABLE 3.32 Randomized, Replicated 2^3 Full Factorial Design Matrix with Centerpoint Control Runs Added
Run  Random Order  Standard Order  SPEED  FEED  DEPTH
1 not applicable not applicable 0 0 0
2 1 5 -1 -1 1
3 2 15 -1 1 1
4 3 9 -1 -1 -1
5 4 7 -1 1 1
6 5 3 -1 1 -1
7 6 12 1 1 -1
8 7 6 1 -1 1
9 8 4 1 1 -1
10 not applicable not applicable 0 0 0
11 9 2 1 -1 -1
12 10 13 -1 -1 1
13 11 8 1 1 1
14 12 16 1 1 1
15 13 1 -1 -1 -1
16 14 14 1 -1 1
17 15 11 -1 1 -1
18 16 10 1 -1 -1
19 not applicable not applicable 0 0 0
Note that the control (centerpoint) runs appear at rows 1, 10, and 19.
This worksheet can be given to the person who is going to do the
runs/measurements and asked to proceed through it from first row to last
in that order, filling in the Yield values as they are obtained.
Center points for discrete factors
One often runs experiments in which some factors are nominal. For example, catalyst "A" might be the (-1) setting and catalyst "B" might be coded (+1). The choice of which is "high" and which is "low" is arbitrary, but one must have some way of deciding which catalyst setting is the "standard" one.

These standard settings for the discrete input factors, together with center points for the continuous input factors, will be regarded as the "center points" for purposes of design.
Variance of prediction
Uniform precision ensures that the variance of prediction is the same at the center of the experimental space as it is at a unit distance away from the center.
Partial foldover designs break up specific alias patterns
Two methods will be described for selecting these additional treatment combinations:
● Mirror-image foldover designs (to build a resolution IV design from a resolution III design)
● Alternative foldover designs (to break up specific alias patterns)
A resolution III design, combined with its mirror-image foldover, becomes resolution IV
In general, a design type that uses a specified fraction of the runs from a full factorial and is balanced and orthogonal is called a fractional factorial.

A 2-level fractional factorial is constructed as follows: Let the number of runs be 2^(k-p). Start by constructing the full factorial for the k-p variables. Next associate the extra factors with higher-order interaction columns. The table shown previously details how to do this to achieve a minimal amount of confounding.

For example, consider the 2^(5-2) design (a resolution III design). The full factorial for k = 5 requires 2^5 = 32 runs. The fractional factorial can be achieved in 2^(5-2) = 8 runs, called a quarter (1/4) fractional design, by setting X4 = X1*X2 and X5 = X1*X3.
Design matrix for a 2^(5-2) fractional factorial
The design matrix for a 2^(5-2) fractional factorial looks like:

TABLE 3.34 Design Matrix for a 2^(5-2) Fractional Factorial
run X1 X2 X3 X4 = X1X2 X5 = X1X3
1 -1 -1 -1 +1 +1
2 +1 -1 -1 -1 -1
3 -1 +1 -1 -1 +1
4 +1 +1 -1 +1 -1
5 -1 -1 +1 +1 -1
6 +1 -1 +1 -1 +1
7 -1 +1 +1 -1 -1
8 +1 +1 +1 +1 +1
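A minimal Python/numpy sketch (assumed tooling, not the handbook's code) that constructs this design matrix:

    import numpy as np
    from itertools import product

    # Full factorial in X1..X3 (standard order, X1 varies fastest), then
    # generators X4 = X1*X2 and X5 = X1*X3 (i.e., 4 = 12 and 5 = 13).
    base = np.array([row[::-1] for row in product([-1, +1], repeat=3)])
    X4 = base[:, 0] * base[:, 1]
    X5 = base[:, 0] * base[:, 2]
    design = np.column_stack([base, X4, X5])
    print(design)  # reproduces Table 3.34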
Increase to a resolution IV design by augmenting the design matrix
In this design the X1X2 column was used to generate the X4 main effect and the X1X3 column was used to generate the X5 main effect. The design generators are 4 = 12 and 5 = 13, and the defining relation is I = 124 = 135 = 2345. Every main effect is confounded (aliased) with at least one first-order interaction (see the confounding structure for this design).

We can increase the resolution of this design to IV if we augment the 8 original runs, adding on the 8 runs from the mirror-image fold-over design. These runs make up another 1/4 fraction design with design generators 4 = -12 and 5 = -13 and defining relation I = -124 = -135 = 2345. The augmented runs form the mirror-image foldover design (the original matrix with all signs reversed), described next.
Mirror-image foldover design reverses all signs in the original design matrix
A mirror-image foldover design is the original design with all signs reversed. It breaks the alias chains between every main factor and two-factor interaction of a resolution III design. That is, we can estimate all the main effects clear of any two-factor interaction.
Design generators and defining relation for this example
The design generators for this 1/16 fractional factorial design are

4 = 12, 5 = 13, 6 = 23 and 7 = 123.

From these we obtain, by multiplication, the defining relation:

I = 124 = 135 = 236 = 347 = 257 = 167 = 456 = 1237 = 2345 = 1346 = 1256 = 1457 = 2467 = 3567 = 1234567.
Computing the alias structure for the complete design
Using this defining relation, we can easily compute the alias structure for the complete design, as shown previously in the link to the fractional design table given earlier. For example, to figure out which effects are aliased (confounded) with factor X1 we multiply the defining relation by 1 to obtain:

1 = 24 = 35 = 1236 = 1347 = 1257 = 67 = 1456 = 237 = 12345 = 346 = 256 = 457 = 12467 = 13567 = 234567

In order to simplify matters, let us ignore all interactions with 3 or more factors; we then have the following 2-factor alias pattern for X1:

1 = 24 = 35 = 67 or, using the full notation, X1 = X2*X4 = X3*X5 = X6*X7.
The same procedure can be used to obtain all the other aliases for each
of the main effects, generating the following list:
1 = 24 = 35 = 67
2 = 14 = 36 = 57
3 = 15 = 26 = 47
4 = 12 = 37 = 56
5 = 13 = 27 = 46
6 = 17 = 23 = 45
7 = 16 = 25 = 34
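This bookkeeping is mechanical and easy to script. The following Python sketch (an illustration, not the handbook's code) regenerates the list above from the four generators:

    from itertools import combinations

    # Generators for the 2^(7-4) design: 4=12, 5=13, 6=23, 7=123
    generators = [frozenset('124'), frozenset('135'),
                  frozenset('236'), frozenset('1237')]

    # Build all 2^p - 1 words of the defining relation via symmetric differences
    words = set()
    for r in range(1, len(generators) + 1):
        for combo in combinations(generators, r):
            w = frozenset()
            for g in combo:
                w = w ^ g
            words.add(w)

    # Two-factor aliases of each main effect: multiply the defining relation by
    # the factor and keep only the products of length 2
    for factor in '1234567':
        aliases = sorted(''.join(sorted(w ^ {factor})) for w in words
                         if len(w ^ {factor}) == 2)
        print(f"{factor} = {' = '.join(aliases)}")
    # prints 1 = 24 = 35 = 67, 2 = 14 = 36 = 57, ..., matching the list above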
Signs in every column of the original design matrix reversed for the mirror-image foldover design
The chosen design used a set of generators with all positive signs. The mirror-image foldover design uses generators with negative signs for words with an even number of factors: 4 = -12, 5 = -13, 6 = -23, and 7 = 123 (a word with an odd number of factors keeps its sign). This generates a design matrix that is equal to the original design matrix with every sign in every column reversed.

If we augment the initial 8 runs with the 8 mirror-image foldover design runs (with all column signs reversed), we can de-alias all the main effect estimates from the 2-way interactions. The additional runs are the original runs with every sign reversed, as in the sketch below.
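A minimal Python/numpy sketch (assumed tooling, not the handbook's code) that constructs the principal 2^(7-4) fraction, its mirror-image foldover, and the combined 16-run design:

    import numpy as np
    from itertools import product

    # Full factorial in X1..X3, then generators 4=12, 5=13, 6=23, 7=123
    base = np.array([row[::-1] for row in product([-1, +1], repeat=3)])
    d = np.column_stack([
        base,
        base[:, 0] * base[:, 1],               # X4 = X1*X2
        base[:, 0] * base[:, 2],               # X5 = X1*X3
        base[:, 1] * base[:, 2],               # X6 = X2*X3
        base[:, 0] * base[:, 1] * base[:, 2],  # X7 = X1*X2*X3
    ])
    foldover = -d                     # mirror image: every sign in every column reversed
    combined = np.vstack([d, foldover])  # 16 runs; main effects clear of 2-factor aliases
    print(foldover)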
Alias structure for the augmented runs
Following the same steps as before and making the same assumptions about the omission of higher-order interactions in the alias structure, we arrive at:

1 = -24 = -35 = -67
2 = -14 = -36 = -57
3 = -15 = -26 = -47
4 = -12 = -37 = -56
5 = -13 = -27 = -46
6 = -17 = -23 = -45
7 = -16 = -25 = -34

With both sets of runs, we can now estimate all the main effects free from two-factor interactions.
Semifolding
Additional information on John's 3/4 designs
We can repeat only the points that were set at the high levels of the factor of choice and then run them at their low settings in the next experiment. For the given example, this means an additional 4 runs instead of 8. We mention this technique only in passing; more details may be found in the references (or see John's 3/4 designs).
The 3^2 design

The simplest 3-level design, with only 2 factors
This is the simplest three-level design. It has two factors, each at three levels. The 9 treatment combinations for this type of design can be shown pictorially as follows:

FIGURE 3.23 A 3^2 Design Schematic

A notation such as "20" means that factor A is at its high level (2) and factor B is at its low level (0).
The 3^3 design

The model and treatment runs for a 3-factor, 3-level design
This is a design that consists of three factors, each at three levels. It can be expressed as a 3 x 3 x 3 = 3^3 design. The model for such an experiment is

y = mu + A(i) + B(j) + C(k) + AB(ij) + AC(ik) + BC(jk) + ABC(ijk) + error,

with each factor treated as a nominal factor. The table below lists the 27 treatment combinations.

                         Factor A
Factor B  Factor C    0    1    2
0         0          000  100  200
0         1          001  101  201
0         2          002  102  202
1         0          010  110  210
1         1          011  111  211
1         2          012  112  212
2         0          020  120  220
2         1          021  121  221
2         2          022  122  222
Replacing two two-level columns with one three-level factor:
Two-Level      Three-Level
B    C         X
-1   -1        x1
+1   -1        x2
-1   +1        x2
+1   +1        x3
3-level factors from 2^4 and 2^5 designs
We have seen that in order to create one three-level factor, the starting design can be a 2^3 factorial. Without proof we state that a 2^4 can split off 1, 2 or 3 three-level factors; a 2^5 is able to generate 3 three-level factors and still maintain a full factorial structure. For more on this, see Montgomery (1991).
Constructing a design with one 4-level factor and two 2-level factors
We may use the same principles as for the three-level factor example in creating a four-level factor. We will assume that the goal is to construct a design with one four-level and two two-level factors.

Initially we wish to estimate all main effects and interactions. It has been shown (see Montgomery, 1991) that this can be accomplished via a 2^4 (16 runs) design, with columns A and B used to create the four-level factor X.
(Rows 9-16 of the 16-run design)
Run   A    B    X    C    D
9    -1   -1   x1   -1   +1
10   +1   -1   x2   -1   +1
11   -1   +1   x3   -1   +1
12   +1   +1   x4   -1   +1
13   -1   -1   x1   +1   +1
14   +1   -1   x2   +1   +1
15   -1   +1   x3   +1   +1
16   +1   +1   x4   +1   +1
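A Python sketch of the construction (illustrative; C and D are assumed names for the two remaining 2-level factors):

    from itertools import product

    # Map the two 2-level columns A and B of a 2^4 full factorial onto one
    # 4-level factor X; the remaining two columns stay as 2-level factors.
    levels = {(-1, -1): 'x1', (+1, -1): 'x2', (-1, +1): 'x3', (+1, +1): 'x4'}

    for run, (d, c, b, a) in enumerate(product([-1, +1], repeat=4), start=1):
        # a varies fastest, so runs 1..16 come out in standard order
        print(run, a, b, levels[(a, b)], c, d)
    # Runs 9-16 reproduce the rows shown above.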
L36 design
L36: A Fractional Factorial (Mixed-Level) Design with Eleven Factors at Two Levels and Twelve Factors at Three Levels (36 Runs)
Run X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21 X22 X23
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2
3 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 3 3
4 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 2 2 2 2 3 3 3 3
5 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 1 1 1 1
6 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 1 1 1 1 2 2 2 2
7 1 1 2 2 2 1 1 1 2 2 2 1 1 2 3 1 2 3 3 1 2 2 3
8 1 1 2 2 2 1 1 1 2 2 2 2 2 3 1 2 3 1 1 2 3 3 1
9 1 1 2 2 2 1 1 1 2 2 2 3 3 1 2 3 1 2 2 3 1 1 2
10 1 2 1 2 2 1 2 2 1 1 2 1 1 3 2 1 3 2 3 2 1 3 2
11 1 2 1 2 2 1 2 2 1 1 2 2 2 1 3 2 1 3 1 3 2 1 3
12 1 2 1 2 2 1 2 2 1 1 2 3 3 2 1 3 2 1 2 1 3 2 1
13 1 2 2 1 2 2 1 2 1 2 1 1 2 3 1 3 2 1 3 3 2 1 2
14 1 2 2 1 2 2 1 2 1 2 1 2 3 1 2 1 3 2 1 1 3 2 3
15 1 2 2 1 2 2 1 2 1 2 1 3 1 2 3 2 1 3 2 2 1 3 1
16 1 2 2 2 1 2 2 1 2 1 1 1 2 3 2 1 1 3 2 3 3 2 1
17 1 2 2 2 1 2 2 1 2 1 1 2 3 1 3 2 2 1 3 1 1 3 2
18 1 2 2 2 1 2 2 1 2 1 1 3 1 2 1 3 3 2 1 2 2 1 3
19 2 1 2 2 1 1 2 2 1 2 1 1 2 1 3 3 3 1 2 2 1 2 3
20 2 1 2 2 1 1 2 2 1 2 1 2 3 2 1 1 1 2 3 3 2 3 1
21 2 1 2 2 1 1 2 2 1 2 1 3 1 3 2 2 2 3 1 1 3 1 2
22 2 1 2 1 2 2 2 1 1 1 2 1 2 2 3 3 1 2 1 1 3 3 2
23 2 1 2 1 2 2 2 1 1 1 2 2 3 3 1 1 2 3 2 2 1 1 3
24 2 1 2 1 2 2 2 1 1 1 2 3 1 1 2 2 3 1 3 3 2 2 1
25 2 1 1 2 2 2 1 2 2 1 1 1 3 2 1 2 3 3 1 3 1 2 2
26 2 1 1 2 2 2 1 2 2 1 1 2 1 3 2 3 1 1 2 1 2 3 3
27 2 1 1 2 2 2 1 2 2 1 1 3 2 1 3 1 2 2 3 2 3 1 1
28 2 2 2 1 1 1 1 2 2 1 2 1 3 2 2 2 1 1 3 2 3 1 3
29 2 2 2 1 1 1 1 2 2 1 2 2 1 3 3 3 2 2 1 3 1 2 1
30 2 2 2 1 1 1 1 2 2 1 2 3 2 1 1 1 3 3 2 1 2 3 2
31 2 2 1 2 1 2 1 1 1 2 2 1 3 3 3 2 3 2 2 1 2 1 1
32 2 2 1 2 1 2 1 1 1 2 2 2 1 1 1 3 1 3 3 2 3 2 2
33 2 2 1 2 1 2 1 1 1 2 2 3 2 2 1 2 1 1 3 1 1 3 3
34 2 2 1 1 2 1 2 1 2 2 1 1 3 1 2 3 2 3 1 2 2 3 1
35 2 2 1 1 2 1 2 1 2 2 1 2 1 2 3 1 3 1 2 3 3 1 2
36 2 2 1 1 2 1 2 1 2 2 1 3 2 3 1 2 1 2 3 1 1 2 3
Prerequisite statistical tools and concepts needed for DOE analyses
The examples in this section assume the reader is familiar with the concepts of
● ANOVA tables (see Chapter 3 or Chapter 7)
● p-values
● Residual analysis
● Model Lack of Fit tests
● Data transformations for normality and linearity
■ Block plots
■ Normal or half-normal plots of the effects
■ Interaction plots
❍ Sometimes the right graphs and plots of the data lead to obvious answers to the questions posed by your experimental objective, and you can skip to step 5. In most cases, however, you will want to continue by fitting and validating a model that can be used to answer your questions.
2. Create the theoretical model (the experiment should have been designed with this model in
mind!).
3. Create a model from the data. Simplify the model, if possible, using stepwise regression
methods and/or parameter p-value significance information.
4. Test the model assumptions using residual graphs.
❍ If none of the model assumptions were violated, examine the ANOVA.
Flowchart is a guideline, not a hard-and-fast rule
Note: The above flowchart and sequence of steps should not be regarded as a "hard-and-fast rule" for analyzing all DOE's. Different analysts may prefer a different sequence of steps and not all types of experiments can be analyzed with one set procedure. There still remains some art in both the design and the analysis of experiments, which can only be learned from experience. In addition, the role of engineering judgment should not be underestimated.
Full factorial designs
For full factorial designs with k factors (2^k runs, not counting any center points or replication runs), the full model contains all the main effects and all orders of interaction terms. Usually, higher-order (three or more factors) interaction terms are included initially to construct the normal (or half-normal) plot of effects, but later dropped when a simpler, adequate model is fit. Depending on the software available or the analyst's preferences, various techniques such as normal or half-normal plots, Youden plots, p-value comparisons and stepwise regression routines are used to reduce the model to the minimum number of needed terms. A JMP example of model selection is shown later in this section and a Dataplot example is given as a case study.
Response surface designs
Response surface initial models include quadratic terms and may occasionally also include cubic terms. These models were described in section 3.

Model validation
Of course, as in all cases of model fitting, residual analysis and other tests of model fit are used to confirm or adjust models, as needed.
Checks for when confirmation runs give surprises
What do you do if you don't obtain the results you expected? If the confirmation runs don't produce the results you expected:
1. check to see that nothing has changed since the original data collection
2. verify that you have the correct settings for the confirmation runs
3. revisit the model to verify the "best" settings from the analysis
4. verify that you had the correct predicted value for the confirmation runs.
If you don't find the answer after checking the above 4 items, the model may not predict very well in the region you decided was "best". You still learned from the experiment, and you should use the information gained from this experiment to design another follow-up experiment.
Three detailed DOE analyses will be given using JMP software
Perhaps one of the best ways to learn how to use DOE analysis software to analyze the results of an experiment is to go through several detailed examples, explaining each step in the analysis. This section will illustrate the use of JMP 3.2.6 software to analyze three real experiments. Analysis using other software packages would generally proceed along similar paths.

The examples cover three basic types of DOE's:
1. A full factorial experiment
2. A fractional factorial experiment
3. A response surface experiment
This example uses data from a NIST high performance ceramics experiment
This data set was taken from an experiment that was performed a few years ago at NIST (by Said Jahanmir of the Ceramics Division in the Material Science and Engineering Laboratory). The original analysis was performed primarily by Lisa Gill of the Statistical Engineering Division. The example shown here is an independent analysis of a modified portion of the original data set.

The original data set was part of a high performance ceramics experiment with the goal of characterizing the effect of grinding parameters on sintered reaction-bonded silicon nitride, reaction-bonded silicon nitride, and sintered silicon nitride.

Only modified data from the first of the 3 ceramic types (sintered reaction-bonded silicon nitride) will be discussed in this illustrative example of a full factorial data analysis.

The reader may want to download the data as a text file and try using other software packages to analyze the data.
Response and factor variables used in the experiment
Purpose: To determine the effect of machining factors on ceramic strength
Response variable = mean (over 15 repetitions) of the ceramic strength
Number of observations = 32 (a complete 2^5 factorial design)

Response Variable Y = Mean (over 15 reps) of Ceramic Strength
Factor 1 = Table Speed (2 levels: slow (.025 m/s) and fast (.125 m/s))
Factor 2 = Down Feed Rate (2 levels: slow (.05 mm) and fast (.125 mm))
Factor 3 = Wheel Grit (2 levels: 140/170 and 80/100)
Factor 4 = Direction (2 levels: longitudinal and transverse)
Factor 5 = Batch (2 levels: 1 and 2)

Since two factors were qualitative (direction and batch) and it was reasonable to expect monotone effects from the quantitative factors, no centerpoint runs were included.
JMP spreadsheet of the data
The design matrix, with measured ceramic strength responses, appears below. The actual randomized run order is given in the last column. (The interested reader may download the data as a text file or as a JMP file.)

Analysis follows 5 basic steps
The experimental data will be analyzed following the previously described 5 basic steps using SAS JMP 3.2.6 software.
Plot the response variable
We start by plotting the response data several ways to see if any trends or anomalies appear that would not be accounted for by the standard linear response models.

First we look at the distribution of all the responses irrespective of factor levels.

Plot of response versus run order
Next we look at the responses plotted versus run order to check whether there might be a time sequence component affecting the response levels.

Plot of Response Vs. Run Order

As hoped for, this plot does not indicate that time order had much to do with the response levels.

Box plots of response by factor variables
Next, we look at plots of the responses sorted by factor columns.

Several factors, most notably "Direction" followed by "Batch" and possibly "Wheel Grit", appear to change the average response level.
Theoretical model: assume all 4-factor and higher interaction terms are not significant
With a 2^5 full factorial experiment we can fit a model containing a mean term, all 5 main effect terms, all 10 2-factor interaction terms, all 10 3-factor interaction terms, all 5 4-factor interaction terms and the 5-factor interaction term (32 parameters). However, we start by assuming all four-factor and higher interaction terms are non-existent (it's very rare for such high-order interactions to be significant, and they are very difficult to interpret from an engineering viewpoint). That allows us to accumulate the sums of squares for these terms and use them to estimate an error term. So we start out with a theoretical model with 26 unknown constants, hoping the data will clarify which of these are the significant main effects and interactions we need for a final model.
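As a hedged sketch of this step in Python with pandas and statsmodels (assumed tooling; the original analysis used JMP, and the placeholder responses below are random numbers, not the real strength data):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from itertools import product

    # Coded 2^5 design in standard order; Y should be the 32 measured strength
    # means from the downloadable data file (random placeholders are used here).
    X = np.array(list(product([-1, 1], repeat=5)))
    df = pd.DataFrame(X, columns=["X1", "X2", "X3", "X4", "X5"])
    df["Y"] = np.random.default_rng(0).normal(550, 30, size=32)  # placeholder response

    # (X1 + ... + X5)**3 expands to all main effects plus all two- and
    # three-factor interactions: 26 parameters including the intercept.
    fit = smf.ols("Y ~ (X1 + X2 + X3 + X4 + X5)**3", data=df).fit()
    print(fit.summary())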
Output from fitting up to third-order interaction terms
After fitting the 26-parameter model, the following analysis table is displayed:

Output after Fitting Third Order Model to Response Data
Response: Y: Strength
Summary of Fit
RSquare 0.995127
RSquare Adj 0.974821
Root Mean Square Error 17.81632
Mean of Response 546.8959
Observations 32
Effect Tests
Source  DF  Sum of Squares  F Ratio  Prob>F
X1: Table Speed 1 894.33 2.8175 0.1442
X2: Feed Rate 1 3497.20 11.0175 0.0160
X1: Table Speed* 1 4872.57 15.3505 0.0078
X2: Feed Rate
X3: Wheel Grit 1 12663.96 39.8964 0.0007
X1: Table Speed* 1 1838.76 5.7928 0.0528
X3: Wheel Grit
This fit has a high R^2 and adjusted R^2, but the large number of high (>0.10) p-values (in the "Prob>F" column) makes it clear that the model has many unnecessary terms.
JMP stepwise regression
Starting with these 26 terms, we next use the JMP Stepwise Regression option to eliminate unnecessary terms. By a combination of stepwise regression and the removal of remaining terms with a p-value higher than 0.05, we quickly arrive at a model with an intercept and 12 significant effect terms.
Output from fitting the 12-term model
Output after Fitting the 12-Term Model to Response Data
Response: Y: Strength
Summary of Fit
RSquare 0.989114
RSquare Adj 0.982239
Root Mean Square Error 14.96346
Mean of Response 546.8959
Observations (or Sum Wgts) 32
Effect Tests
Source  DF  Sum of Squares  F Ratio  Prob>F
X1: Table Speed 1 894.33 3.9942 0.0602
X2: Feed Rate 1 3497.20 15.6191 0.0009
X1: Table Speed* 1 4872.57 21.7618 0.0002
X2: Feed Rate
X3: Wheel Grit 1 12663.96 56.5595 <.0001
X1: Table Speed* 1 1838.76 8.2122 0.0099
X3: Wheel Grit
X4: Direction 1 315132.65 1407.4390 <.0001
X1: Table Speed* 1 1637.21 7.3121 0.0141
X4: Direction
X2: Feed Rate* 1 1972.71 8.8105 0.0079
X4: Direction
X1: Table Speed* 1 5895.62 26.3309 <.0001
X2: Feed Rate*
X4:Direction
X3: Wheel Grit* 1 3158.34 14.1057 0.0013
X4: Direction
X5: Batch 1 33653.91 150.3044 <.0001
X4: Direction* 1 1328.83 5.9348 0.0249
X5: Batch
Normal plot of the effects
Non-significant effects should effectively follow an approximately normal distribution with the same location and scale. Significant effects will vary from this normal distribution. Therefore, another method of determining significant effects is to generate a normal plot of all 31 effects. Those effects that are substantially away from the straight line fitted to the normal plot are considered significant. Although this is a somewhat subjective criterion, it tends to work well in practice. It is helpful to use both the numerical output from the fit and graphical techniques such as the normal plot in deciding which terms to keep in the model.

The normal plot of the effects is shown below. We have labeled those effects that we consider to be significant. In this case, we have arrived at the exact same 12 terms by looking at the normal plot as we did from the stepwise regression.

Most of the effects cluster close to the center (zero) line and follow the fitted normal model straight line. The effects that appear to be above or below the line by more than a small amount are the same effects identified using the stepwise routine, with the exception of X1. Some analysts prefer to include a main effect term when it has several significant interactions even if the main effect term itself does not appear to be significant.
Model appears to account for most of the variability
At this stage, this model appears to account for most of the variability in the response, achieving an adjusted R^2 of 0.982. All the main effects are significant, as are 6 two-factor interactions and 1 three-factor interaction. The only interaction that makes little physical sense is the "X4: Direction*X5: Batch" interaction: why would the response using one batch of material react differently when the batch is cut in a different direction as compared to another batch of the same formulation?

However, before accepting any model, residuals need to be examined.
Step 4: Test the model assumptions using residual graphs (adjust and simplify as needed)
Plot of residuals versus predicted responses
First we look at the residuals plotted versus the predicted responses.

The residuals appear to spread out more with larger values of predicted strength, which should not happen when there is a common variance.

Next we examine the normality of the residuals with a normal quantile plot, a box plot and a histogram.

None of these plots appear to show typical normal residuals, and 4 of the 32 data points appear as outliers in the box plot.
Step 4 continued: Transform the data and fit the model again
Box-Cox transformation
We next look at whether we can model a transformation of the response variable and obtain residuals with the assumed properties. JMP calculates an optimum Box-Cox transformation by finding the value of lambda that minimizes the model SSE. Note: the Box-Cox transformation used in JMP is different from the transformation used in Dataplot, but roughly equivalent.

Box-Cox Transformation Graph

The optimum is found at lambda = 0.2. A new column, Y: Strength X, is calculated and added to the JMP data spreadsheet. The properties of this column, showing the transformation equation, are shown below.
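For reference, the transformation itself in a short Python sketch (the sample values are placeholders, not the experiment's data):

    import numpy as np

    y = np.array([546.9, 612.3, 480.1])  # placeholder strength values, not real data
    lam = 0.2
    y_boxcox = (y**lam - 1) / lam        # the usual Box-Cox form for lambda != 0
    y_log = np.log(y)                    # the lambda = 0 limit: the natural log (Comment 3)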
Fit model to transformed data
When the 12-effect model is fit to the transformed data, the "X4: Direction*X5: Batch" interaction term is no longer significant. The 11-effect model fit is shown below, with parameter estimates and p-values.
Summary of Fit
RSquare 0.99041
RSquare Adj 0.985135
Root Mean Square Error 13.81065
Mean of Response 1917.115
Observations (or Sum Wgts) 32
Effect  Parameter Estimate  p-value
Intercept 1917.115 <.0001
X1: Table Speed 5.777 0.0282
X2: Feed Rate 11.691 0.0001
X1: Table Speed* -14.467 <.0001
X2: Feed Rate
X3: Wheel Grit -21.649 <.0001
X1: Table Speed* 7.339 0.007
X3: Wheel Grit
X4: Direction -99.272 <.0001
X1: Table Speed* -7.188 0.0080
X4: Direction
X2: Feed Rate* -9.160 0.0013
X4: Direction
Model has high R^2
This model has a very high R^2 and adjusted R^2. The residual plots (shown below) are quite a bit better behaved than before, and pass the Shapiro-Wilk test for normality.

Residual plots from model with transformed response
The run sequence plot of the residuals does not indicate any time dependent patterns.
The normal probability plot, box plot, and the histogram of the residuals do not indicate any
serious violations of the model assumptions.
Important main effects and interaction effects
The magnitudes of the effect estimates show that "Direction" is by far the most important factor. "Batch" plays the next most critical role, followed by "Wheel Grit". Then, there are several important interactions followed by "Feed Rate". "Table Speed" plays a role in almost every significant interaction term, but is the least important main effect on its own. Note that large interactions can obscure main effects.

Plots of the main effects and the significant 2-way interactions are shown below.
Prediction profile
To determine the best setting to use for maximum ceramic strength, JMP has the "Prediction Profile" option shown below.

Y: Strength X Prediction Profile

The vertical lines indicate the optimal factor settings to maximize the (transformed) strength response. Translating from -1 and +1 back to the actual factor settings, we have: Table Speed at "1" or .125 m/s; Down Feed Rate at "1" or .125 mm; Wheel Grit at "-1" or 140/170; and Direction at "-1" or longitudinal.
Unfortunately, "Batch" is also a very significant factor, with the first batch giving higher
strengths than the second. Unless it is possible to learn what worked well with this batch, and
how to repeat it, not much can be done about this factor.
Comments
Analyses with 1. One might ask what an analysis of just the 24 factorial with "Direction" kept at -1 (i.e.,
value of longitudinal) would yield. This analysis turns out to have a very simple model; only
Direction fixed "Wheel Grit" and "Batch" are significant main effects and no interactions are significant.
indicates
complex model If, on the other hand, we do an analysis of the 2^4 factorial with "Direction" kept at +1 (i.e.,
is needed only transverse), then we obtain a 7-parameter model with all the main effects and interactions
for transverse we saw in the 2^5 analysis, except, of course, any terms involving "Direction".
cut
So it appears that the complex model of the full analysis came from the physical properties
of a transverse cut, and these complexities are not present for longitudinal cuts.
Half fraction 2. If we had assumed that three-factor and higher interactions were negligible before
design experimenting, a half fraction design might have been chosen. In hindsight, we would
have obtained valid estimates for all main effects and two-factor interactions except for X3
and X5, which would have been aliased with X1*X2*X4 in that half fraction.
Natural log 3. Finally, we note that many analysts might prefer to adopt a natural logarithm
transformation transformation (i.e., use ln Y) as the response instead of using a Box-Cox transformation
with an exponent of 0.2. The natural logarithm transformation corresponds to an exponent
of λ = 0 in the Box-Cox graph.
5. Process Improvement
5.4. Analysis of DOE data
5.4.7. Examples of DOE's
A step-by-step This experiment was conducted by a team of students on a catapult – a table-top wooden device
analysis of a used to teach design of experiments and statistical process control. The catapult has several
fractional controllable factors and a response easily measured in a classroom setting. It has been used for
factorial over 10 years in hundreds of classes. A picture of the catapult is shown below.
"catapult"
experiment
Catapult
The experiment Purpose: To determine the significant factors that affect the distance the ball is thrown by the
has five factors catapult, and to determine the settings required to reach 3 different distances (30, 60 and 90
that might inches).
affect the
distance the Response Variable: The distance in inches from the front of the catapult to the spot where the ball
golf ball lands. The ball is a plastic golf ball.
travels Number of observations: 20 (a 2^(5-1) resolution V design with 4 center points).
Variables:
1. Response Variable Y = distance
2. Factor 1 = band height (height of the pivot point for the rubber bands – levels were 2.25
and 4.75 inches with a centerpoint level of 3.5)
3. Factor 2 = start angle (location of the arm when the operator releases – starts the forward
motion of the arm – levels were 0 and 20 degrees with a centerpoint level of 10 degrees)
4. Factor 3 = rubber bands (number of rubber bands used on the catapult– levels were 1 and 2
bands)
5. Factor 4 = arm length (distance the arm is extended – levels were 0 and 4 inches with a
centerpoint level of 2 inches)
6. Factor 5 = stop angle (location of the arm where the forward motion of the arm is stopped
and the ball starts flying – levels were 45 and 80 degrees with a centerpoint level of 62
degrees)
Design matrix The design matrix appears below in (randomized) run order.
and responses
(in run order)
You can Readers who want to analyze this experiment may download an Excel spreadsheet catapult.xls or
download the a JMP spreadsheet catapult.jmp.
data in a
spreadsheet
One discrete Note that 4 of the factors are continuous, and one – number of rubber bands – is discrete. Due to
factor the presence of this discrete factor, we actually have two different centerpoints, each with two
runs. Runs 7 and 19 are with one rubber band, and the center of the other factors, while runs 2
and 13 are with two rubber bands and the center of the other factors.
5 confirmatory After analyzing the 20 runs and determining factor settings needed to achieve predicted distances
runs of 30, 60 and 90 inches, the team was asked to conduct 5 confirmatory runs at each of the derived
settings.
Analyze with The experimental data will be analyzed using SAS JMP 3.2.6 software.
JMP software
Histogram, box We start by plotting the data several ways to see if any trends or anomalies appear that would not
plot, and be accounted for by the models.
normal The distribution of the response is given below:
probability
plot of the
response
We can see the large spread of the data and a pattern to the data that should be explained by the
analysis.
Plot of Next we look at the responses versus the run order to see if there might be a time sequence
response component. The four highlighted points are the center points in the design. Recall that runs 2 and
versus run 13 had 2 rubber bands and runs 7 and 19 had 1 rubber band. There may be a slight aging of the
order rubber bands in that the second center point resulted in a distance that was a little shorter than the
first for each pair.
Several factors appear to change the average response level and most have a large spread at each
of the levels.
The resolution With a resolution V design we are able to estimate all the main effects and all two-factor
V design can interactions cleanly – without worrying about confounding. Therefore, the initial model will have
estimate main 16 terms – the intercept term, the 5 main effects, and the 10 two-factor interactions.
effects and all
2-factor
interactions
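As a minimal sketch of that counting in Python (the half-fraction generator X5 = X1*X2*X3*X4 is one standard resolution V choice, assumed here for illustration; the team's actual runs are in the design matrix shown earlier):

import numpy as np
from itertools import combinations, product

def model_matrix(D):
    # Intercept, 5 main effects, and all 10 two-factor interactions.
    cols = [np.ones(len(D))]
    cols += [D[:, j] for j in range(D.shape[1])]
    cols += [D[:, i] * D[:, j] for i, j in combinations(range(D.shape[1]), 2)]
    return np.column_stack(cols)

# A resolution V half fraction: full factorial in 4 factors, fifth column
# set to their product (generator X5 = X1*X2*X3*X4), plus 4 center points,
# for 20 runs in all, matching the size of this experiment.
base = np.array(list(product([-1.0, 1.0], repeat=4)))
D = np.vstack([np.column_stack([base, base.prod(axis=1)]), np.zeros((4, 5))])
print(model_matrix(D).shape)   # (20, 16): 20 runs, 16 model terms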
Variable Note we have used the orthogonally coded columns for the analysis, and have abbreviated the
coding factor names as follows:
Bheight = band height
Start = start angle
Bands = number of rubber bands
Stop = stop angle
Arm = arm length.
JMP output The following is the JMP output after fitting the trial model (all main factors and 2-factor
after fitting the interactions).
trial model (all
main factors
and 2-factor
interactions)
Use p-values to The model has a good R2 value, but the fact that R2 adjusted is considerably smaller indicates that
help select we undoubtedly have some terms in our model that are not significant. Scanning the column of
significant p-values (labeled Prob>|t| in the JMP output) for small values shows 5 significant effects at the
effects, and 0.05 level and another one at the 0.10 level.
also use a
normal plot The normal plot of effects is a useful graphical tool to determine significant effects. The graph
below shows that there are 9 terms in the model that can be assumed to be noise. That would
leave 6 terms to be included in the model. Whereas the output above shows a p-value of 0.0836
for the interaction of bands and arm, the normal plot suggests we treat this interaction as
significant.
A refit using Remove the non-significant terms from the model and refit to produce the following output:
just the effects
that appear to
matter
R2 is OK and The R2 and R2 adjusted values are acceptable. The ANOVA table shows us that the model is
there is no significant, and the Lack of Fit table shows that there is no significant lack of fit.
significant
model "lack of The Parameter estimates table is below.
fit"
Step 4: Test the model assumptions using residual graphs (adjust and simplify as needed)
Histogram of We should test that the residuals are approximately normally distributed, are independent, and
the residuals to have equal variances. First we create a histogram of the residual values.
test the model
assumptions
There does not appear to be a pattern to the residuals. One observation, based on a single point,
is that the model performs poorly in predicting short distances. In fact, run number 10 had a
measured distance of 8 inches, but the model predicts -11 inches, giving a residual of 19.
The fact that the model predicts an impossible negative distance is an obvious shortcoming of the
model. We may not be successful at predicting the catapult settings required to hit a distance less
than 25 inches. This is not surprising since there is only one data value less than 28 inches. Recall
that the objective is for distances of 30, 60, and 90 inches.
Plot of Next we plot the residual values versus the run order of the design. The highlighted points are the
residuals centerpoint values. Recall that run numbers 2 and 13 had two rubber bands while run numbers 7
versus run and 19 had only one rubber band.
order
Plots of Next we look at the residual values versus each of the factors.
residuals
versus the
factor
variables
The residual Most of the residual graphs versus the factors appear to have a slight "frown" on the graph (higher
graphs are not residuals in the center). This may indicate a lack of fit or a sign of curvature at the centerpoint
ideal, although values. The Lack of Fit table, however, indicates that the lack of fit is not significant.
the model
passes "lack of
fit"
quantitative
tests
Consider a At this point, since there are several unsatisfactory features of the model we have fit and the
transformation resultant residuals, we should consider whether a simple transformation of the response variable
of the response (Y = "Distance") might improve the situation.
variable to see
if we can There are at least two good reasons to suspect that using the logarithm of distance as the response
obtain a better might lead to a better model.
model 1. A linear model fit to LN Y will always predict a positive distance when converted back to
the original scale for any possible combination of X factor values.
2. Physical considerations suggest that a realistic model for distance might require quadratic
terms since gravity plays a key role - taking logarithms often reduces the impact of
non-linear terms.
To see whether using LN Y as the response leads to a more satisfactory model, we return to step
3.
First a main Proceeding as before, using the coded columns of the matrix for the factor levels and Y = the
effects and natural logarithm of distance as the response, we initially obtain:
2-factor
interaction
model is fit to
the log
distance
responses
A simpler Examining the p-values of the 16 model coefficients, only the intercept and the 5 main effect
model with just terms appear significant. Refitting the model with just these terms yields the following results.
main effects
has a
satisfactory fit
This is a simpler model than previously obtained in Step 3 (no interaction term). All the terms are
highly significant and there is no quantitative indication of "lack of fit".
We next look at the residuals for this new model fit.
Step 4a: Test the (new) model assumptions using residual graphs (adjust and simplify as
needed)
Normal The following normal plot, box plot, and histogram of the residuals shows no problems.
probability
plot, box plot,
and histogram
of the residuals
Plot of A plot of the residuals versus the predicted LN Y values looks reasonable, although there might
residuals be a tendency for the model to overestimate slightly for high predicted values.
versus
predicted LN Y
values
Plot of Residuals plotted versus run order again show a possible slight decreasing trend (rubber band
residuals fatigue?).
versus run
order
Plot of Next we look at the residual values versus each of the factors.
residuals
versus the
factor
variables
The residuals These plots still appear to have a slight "frown" on the graph (higher residuals in the center).
for the main However, the model is generally an improvement over the previous model and will be accepted as
effects model possibly the best that can be done without conducting a new experiment designed to fit a
(fit to natural quadratic model.
log distance)
are reasonably
well behaved
Step 5: Use the results to answer the questions in your experimental objectives
Final step: The software used for this analysis (JMP 3.2.6) has an option called the "Prediction Profiler" that
quantify the can be used to derive settings that will yield a desired predicted natural log distance value. The
influence of all top graph in the figure below shows the direction and strength of each of the main effects in the
the significant model. Using natural log 30 = 3.401 as the target value, the Profiler allows us to set up a
effects and "Desirability" function that gives 3.401 a maximum desirability value of 1 and values above or
predict what below 3.401 have desirabilities that rapidly decrease to 0. This is shown by the desirability graph
settings should on the right (see the figure below).
be used to
obtain desired The next step is to set "bands" to either -1 or +1 (this is a discrete factor) and move the values of
distances the other factors interactively until a desirability as close as possible to 1 is obtained. In the figure
below, a desirability of .989218 was obtained, yielding a predicted natural log Y of 3.399351 (or a
distance of 29.94). The corresponding (coded) factor settings are: bheight = 0.17, start = -1, bands
= -1, arm = -1 and stop = 0.
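The profiler search can be imitated numerically. The Python sketch below uses hypothetical model coefficients (the handbook does not list the fitted equation in closed form) and a simple triangular desirability that peaks at LN 30, with the discrete bands factor restricted to ±1:

import numpy as np
from itertools import product

b0 = 3.9                                       # hypothetical intercept
b = np.array([0.28, -0.15, 0.35, 0.43, 0.21])  # hypothetical main effects
target = np.log(30.0)                          # 3.401

def desirability(yhat, tol=0.35):
    # 1 at the target, decreasing rapidly to 0 away from it.
    return max(0.0, 1.0 - abs(yhat - target) / tol)

grid = np.linspace(-1, 1, 9)
best = (-1.0, None)
for bands in (-1.0, 1.0):                      # discrete factor
    for x in product(grid, grid, grid, grid):  # bheight, start, arm, stop
        xs = np.array([x[0], x[1], bands, x[2], x[3]])
        d = desirability(b0 + b @ xs)
        if d > best[0]:
            best = (d, xs)
print("desirability %.3f at settings %s" % best)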
Prediction
profile plots
for Y = 30
Prediction Repeating the profiler search for a Y value of 60 (or LN Y = 4.094) yielded the figure below for
profile plots which a natural log distance value of 4.094121 is predicted (a distance of 59.99) for coded factor
for Y = 60 settings of bheight = 1, start = 0, bands = -1, arm = .5 and stop = .5.
Prediction Finally, we set LN Y = LN 90 = 4.4998 and obtain (see the figure below) a predicted distance of
profile plots 90.20 inches when bheight = -0.87, start = -0.52, bands = 1, arm = 1, and stop = 0.
for Y = 90
"Confirmation" In the confirmatory runs that followed the experiment, the team was successful at hitting all 3
runs were targets, but did not hit them all 5 times.
successful
NOTE: The model discovery and fitting process, as illustrated in this analysis, is often an
iterative process.
5. Process Improvement
5.4. Analysis of DOE data
5.4.7. Examples of DOE's
A CCD DOE This example uses experimental data published in Czitrom and Spagon (1997), Statistical Case
with two Studies for Industrial Process Improvement. This material is copyrighted by the American
responses Statistical Association and the Society for Industrial and Applied Mathematics, and used with
their permission. Specifically, Chapter 15, titled "Elimination of TiN Peeling During Exposure to
CVD Tungsten Deposition Process Using Designed Experiments", describes a semiconductor
wafer processing experiment (labeled Experiment 2).
Goal, The goal of this experiment was to fit response surface models to the two responses, deposition
response layer Uniformity and deposition layer Stress, as a function of two particular controllable factors
variables, of the chemical vapor deposition (CVD) reactor process. These factors were Pressure (measured
and factor in torr) and the ratio of the gaseous reactants H2 and WF6 (called H2/WF6). The experiment also
variables included an important third (categorical) response - the presence or absence of titanium nitride
(TiN) peeling. That part of the experiment has been omitted in this example, in order to focus on
the response surface model aspects.
To summarize, the goal is to obtain a response surface model for each response where the
responses are: "Uniformity" and "Stress". The factors are: "Pressure" and "H2/WF6".
Experiment Description
The design is The maximum and minimum values chosen for pressure were 4 torr and 80 torr. The lower and
an 11-run CCI upper H2/WF6 ratios were chosen to be 2 and 10. Since response curvature, especially for
design with 3 Uniformity, was a distinct possibility, an experimental design that allowed estimating a second
centerpoint order (quadratic) model was needed. The experimenters decided to use a central composite
runs inscribed (CCI) design. For two factors, this design is typically recommended to have 13 runs
with 5 centerpoint runs. However, the experimenters, perhaps to conserve a limited supply of
wafer resources, chose to include only 3 centerpoint runs. The design is still rotatable, but the
uniform precision property has been sacrificed.
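For reference, the coded CCI geometry is easy to generate: start from the usual two-factor central composite layout, shrink the factorial points by 1/√2 so the axial points fall on the ±1 box, then rescale to natural units. A small Python sketch (in standard order, not the randomized run order of the table below):

import numpy as np

a = 1.0 / np.sqrt(2.0)      # factorial points pulled inside the +/-1 box
factorial = [(-a, -a), (a, -a), (-a, a), (a, a)]
axial = [(-1.0, 0.0), (1.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
centers = [(0.0, 0.0)] * 3
coded = np.array(factorial + axial + centers)   # 11 coded runs

def decode(x, lo, hi):
    # Map a coded value in [-1, 1] back to natural units.
    return (lo + hi) / 2.0 + x * (hi - lo) / 2.0

pressure = decode(coded[:, 0], 4.0, 80.0)     # torr
ratio = decode(coded[:, 1], 2.0, 10.0)        # H2/WF6
print(np.column_stack([pressure, ratio]).round(2))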
Table The table below shows the CCI design and experimental responses, in the order in which they
containing were run (presumably randomized). The last two columns show coded values of the factors.
the CCI
design and                                          Coded     Coded
experimental Run  Pressure  H2/WF6  Uniformity  Stress  Pressure  H2/WF6
responses      1    80       6         4.6       8.04     1         0
               2    42       6         6.2       7.78     0         0
               3    68.87    3.17      3.4       7.58     0.71     -0.71
               4    15.13    8.83      6.9       7.27    -0.71      0.71
               5     4       6         7.3       6.49    -1         0
               6    42       6         6.4       7.69     0         0
               7    15.13    3.17      8.6       6.66    -0.71     -0.71
               8    42       2         6.3       7.16     0        -1
               9    68.87    8.83      5.1       8.33     0.71      0.71
              10    42      10         5.4       8.19     0         1
              11    42       6         5.0       7.90     0         0
Low values Note: "Uniformity" is calculated from four-point probe sheet resistance measurements made at 49
of both different locations across a wafer. The value used in the table is the standard deviation of the 49
responses measurements divided by their mean, expressed as a percentage. So a smaller value of
are better "Uniformity" indicates a more uniform layer - hence, lower values are desirable. The "Stress"
than high calculation is based on an optical measurement of wafer bow, and again lower values are more
desirable.
Steps for The steps for fitting a response surface (second-order or quadratic) model using the JMP 4.02
fitting a software for this example are as follows:
response 1. Specify the model in the "Fit Model" screen by inputting a response variable and the model
surface effects (factors) and using the macro labeled "Response Surface".
model using
2. Choose the "Stepwise" analysis option and select "Run Model".
JMP 4.02
(other 3. The stepwise regression procedure allows you to select probabilities (p-values) for adding
software or deleting model terms. You can also choose to build up from the simplest models by
packages adding and testing higher-order terms (the "forward" direction), or starting with the full
generally second-order model and eliminating terms until the most parsimonious, adequate model is
have similar obtained (the "backward" direction). In combining the two approaches, JMP tests for both
procedures) addition and deletion, stopping when no further changes to the model can be made. A
choice of p-values set at 0.10 generally works well, although sometimes the user has to
experiment here. Start the stepwise selection process by selecting "go".
4. "Stepwise" will generate a screen with recommended model terms checked and p-values
shown (these are called "Prob>F" in the output). Sometimes, based on p-values, you might
choose to drop, or uncheck, some of these terms. However, follow the hierarchy principle
and keep all main effects that are part of significant higher-order terms or interactions, even
if the main effect p-value is higher than you would like (note that not all analysts agree
with this principle).
5. Choose "make model" and "run model" to obtain the full range of JMP graphic and
analytical outputs for the selected model.
6. Examine the fitted model plot, normal plot of effects, interaction plots, residual plots, and
ANOVA statistics (R2, R2 adjusted, lack of fit test, etc.). By saving the residuals onto your
JMP worksheet you can generate residual distribution plots (histograms, box plots, normal
plots, etc.). Use all these plots and statistics to determine whether the model fit is
satisfactory.
7. Use the JMP contour profiler to generate response surface contours and explore the effect
of changing factor levels on the response.
8. Repeat all the above steps for the second response variable.
9. Save prediction equations for each response onto your JMP worksheet (there is an option
that does this for you). After satisfactory models have been fit to both responses, you can
use "Graph" and "Profiler" to obtain overlaid surface contours for both responses.
10. "Profiler" also allows you to (graphically) input a desirability function and let JMP find
optimal factor settings.
The displays below are copies of JMP output screens based on following the above 10 steps for
the "Uniformity" and "Stress" responses. Brief margin comments accompany the screen shots.
Fitting a Model to the "Uniformity" Response, Simplifying the Model and Checking
Residuals
Model We start with the model specification screen in which we input factors and responses and choose
specification the model we want to fit. We start with a full second-order model and select a "Stepwise Fit". We
screen and set "prob" to 0.10 and direction to "Mixed" and then "Go".
stepwise
regression
(starting
from a full
second-order
model)
output
The stepwise routine finds the intercept and three other terms (the main effects and the interaction
term) to be significant.
JMP output The following is the JMP analysis using the model selected by the stepwise regression in the
for analyzing previous step. The model is fit using coded factors, since the factor columns were given the
the model property "coded".
selected by
the stepwise
regression
for the
Uniformity
response
Plot of the We next perform a residuals analysis to validate the model. We first generate a plot of the
residuals residuals versus run order.
versus run
order
Normal plot, Next we generate a normal plot, a box plot, and a histogram of the residuals.
box plot, and
histogram of
the residuals
Viewing the above plots of the residuals does not show any reason to question the model.
Fitting a Model to the "Stress" Response, Simplifying the Model and Checking Residuals
Model We start with the model specification screen in which we input factors and responses and choose
specification the model we want to fit. This time the "Stress" response will be modeled. We start with a full
screen and second-order model and select a "Stepwise Fit". We set "prob" to 0.10 and direction to "Mixed"
stepwise and then "Go".
regression
(starting
from a full
second-order
model)
output
The stepwise routine finds the intercept, the main effects, and Pressure squared to be significant
terms.
JMP output The following is the JMP analysis using the four-term model selected by the stepwise regression
for analyzing in the previous step. The model is fit using coded factors, since the factor columns were given
the model the property "coded".
selected by
the stepwise
regression
for the Stress
response
Plot of the We next perform a residuals analysis to validate the model. We first generate a plot of the
residuals residuals versus run order.
versus run
order
Normal plot, Next we generate a normal plot, a box plot, and a histogram of the residuals.
box plot, and
histogram of
the residuals
Viewing the above plots of the residuals does not show any reason to question the model.
"Contour JMP has a "Contour Profiler" and "Prediction Profiler" that visually and interactively show how
Profiler" and the responses vary as a function of the input factors. These plots are shown here for both the
"Prediction Uniformity and the Stress response.
Profiler"
Desirability You can graphically construct a desirability function and let JMP find the factor settings that
function: maximize it - here it suggests that Pressure should be as high as possible and H2/WF6 as low as
Pressure possible.
should be as
high as
possible and
H2/WF6 as
low as
possible
Summary
Final The response surface models fit to (coded) "Uniformity" and "Stress" were:
response
surface Uniformity = 5.93 - 1.91*Pressure - 0.22*H2/WF6 + 1.70*Pressure*H2/WF6
models
Stress = 7.73 + 0.74*Pressure + 0.50*H2/WF6 - 0.49*Pressure^2
Trade-offs These models and the corresponding profiler plots show that trade-offs have to be made when
are often trying to achieve low values for both "Uniformity" and "Stress" since a high value of "Pressure"
needed for is good for "Uniformity" while a low value of "Pressure" is good for "Stress". While low values
multiple of H2/WF6 are good for both responses, the situation is further complicated by the fact that the
responses "Peeling" response (not considered in this analysis) was unacceptable for values of H2/WF6
below approximately 5.
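Because both fitted equations are given explicitly above, the trade-off is easy to tabulate. A quick Python check over the coded corners (using exactly the coefficients printed above) shows Uniformity improving while Stress worsens as Pressure increases:

# Evaluate the two fitted response surfaces (coded units) on a small grid.
uniformity = lambda p, r: 5.93 - 1.91*p - 0.22*r + 1.70*p*r
stress = lambda p, r: 7.73 + 0.74*p + 0.50*r - 0.49*p**2

print("Pressure  H2/WF6  Uniformity  Stress")
for p in (-1.0, 0.0, 1.0):
    for r in (-1.0, 1.0):
        print("%8.1f %7.1f %11.2f %7.2f" % (p, r, uniformity(p, r), stress(p, r)))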
"Uniformity" In this case, the experimenters chose to focus on optimizing "Uniformity" while keeping H2/WF6
was chosen at 5. That meant setting "Pressure" at 80 torr.
as more
important
Confirmation A set of 16 verification runs at the chosen conditions confirmed that all goals, except those for the
runs "Stress" response, were met by this set of process settings.
validated the
model
projections
5. Process Improvement
6. Taguchi designs
7. John's 3/4 fractional factorial designs
8. Small composite designs
9. An EDA approach to experimental design
1. Ordered data plot
2. DEX scatter plot
3. DEX mean plot
4. Interaction effects matrix plot
5. Block plot
6. DEX Youden plot
7. |Effects| plot
8. Half-normal probability plot
9. Cumulative residual standard deviation plot
10. DEX contour plot
5. Process Improvement
5.5. Advanced topics
Computer- When situations such as the above exist, computer-aided designs are a
aided useful option. In some situations, computer-aided designs are the only
designs option an experimenter has.
5. Process Improvement
5.5. Advanced topics
Optimality There are various forms of optimality criteria that are used to select
criteria the points for a design.
5. Process Improvement
5.5. Advanced topics
5.5.2. What is a computer-aided design?
These designs These types of designs are always an option regardless of the type of
are always model the experimenter wishes to fit (for example, first order, first
an option order plus some interactions, full quadratic, cubic, etc.) or the objective
regardless of specified for the experiment (for example, screening, response surface,
model or etc.). D-optimal designs are straight optimizations based on a chosen
resolution optimality criterion and the model that will be fit. The optimality
desired criterion used in generating D-optimal designs is one of maximizing
|X'X|, the determinant of the information matrix X'X.
You start This optimality criterion results in minimizing the generalized variance
with a of the parameter estimates for a pre-specified model. As a result, the
candidate set 'optimality' of a given D-optimal design is model dependent. That is,
of runs and the experimenter must specify a model for the design before a
the algorithm computer can generate the specific treatment combinations. Given the
chooses a total number of treatment runs for an experiment and a specified
D-optimal set model, the computer algorithm chooses the optimal set of design runs
of design from a candidate set of possible design treatment runs. This candidate
runs set of treatment runs usually consists of all possible combinations of
various factor levels that one wishes to use in the experiment.
In other words, the candidate set is a collection of treatment
combinations from which the D-optimal algorithm chooses the
treatment combinations to include in the design. The computer
algorithm generally uses a stepping and exchanging process to select the treatment combinations that make up the design.
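A toy version of such an exchange search, sketched in Python (an illustration only; JMP's actual algorithm may differ in its details): start from random subsets of the model-expanded candidate rows and keep swapping candidate points in while the determinant |X'X| improves.

import numpy as np

def d_optimal_search(F, n, trips=5, seed=0):
    # F: candidate rows already expanded to model terms; n: runs wanted.
    rng = np.random.default_rng(seed)
    det = lambda idx: np.linalg.det(F[idx].T @ F[idx])
    best_idx, best_det = None, -np.inf
    for _ in range(trips):                       # several random starts
        idx = list(rng.choice(len(F), size=n, replace=False))
        improved = True
        while improved:
            improved = False
            for i in range(n):                   # try to exchange run i
                for j in range(len(F)):          # ...for candidate j
                    trial = idx[:i] + [j] + idx[i+1:]
                    if det(trial) > det(idx) * (1 + 1e-9):
                        idx, improved = trial, True
        if det(idx) > best_det:
            best_idx, best_det = idx, det(idx)
    return best_idx, best_det

Replicated candidate rows are allowed to enter the design, since an optimal design may legitimately repeat runs.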
No guarantee Note: There is no guarantee that the design the computer generates is
actually D-optimal.
D-optimal The reasons for using D-optimal designs instead of standard classical
designs are designs generally fall into two categories:
particularly 1. standard factorial or fractional factorial designs require too many
useful when runs for the amount of resources or time allowed for the
resources are experiment
limited or
2. the design space is constrained (the process space contains factor
there are
settings that are not feasible or are impossible to run).
constraints
on factor
settings
Industrial Industrial examples of these two situations are given below and the
example process flow of how to generate and analyze these types of designs is
demonstrated also given. The software package used to demonstrate this is JMP
with JMP version 3.2. The flow presented below in generating the design is the
software flow that is specified in the JMP Help screens under its D-optimal
platform.
Create the Given the above experimental specifications, the first thing to do
candidate set toward generating the design is to create the candidate set. The
candidate set is a data table with a row for each point (run) you want
considered for your design. This is often a full factorial. You can create
a candidate set in JMP by using the Full Factorial design given by the
Design Experiment command in the Tables menu. The candidate set
for this example is shown below. Since the candidate set is a full
factorial in all factors, the candidate set contains (5)*(2)*(2) = 20
possible design runs.
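The same bookkeeping in a Python sketch (factor names and levels are hypothetical, matching only the 5 x 2 x 2 structure described here):

from itertools import product

levels = {
    "X1": [-1.0, -0.5, 0.0, 0.5, 1.0],   # a 5-level factor (hypothetical)
    "X2": [-1.0, 1.0],
    "X3": [-1.0, 1.0],
}
candidates = [dict(zip(levels, combo)) for combo in product(*levels.values())]
print(len(candidates))   # (5)(2)(2) = 20 possible design runs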
Specify (and Once the candidate set has been created, specify the model you want in
run) the the Fit Model dialog. Do not give a response term for the model! Select
model in the D-Optimal as the fitting personality in the pop-up menu at the bottom
Fit Model of the dialog. Click Run Model and use the control panel that appears.
dialog Enter the number of runs you want in your design (N=12 in this
example). You can also edit other options available in the control
panel. This control panel and the editable options are shown in the
table below. These other options refer to the number of points chosen
at random at the start of an excursion or trip (N Random), the number
of worst points at each K-exchange step or iteration (K-value), and the
number of times to repeat the search (Trips). Click Go.
For this example, the table below shows how these options were set
and the reported efficiency values are relative to the best design found.
Best Design
D-efficiency 68.2558
A-efficiency 45.4545
G-efficiency 100
AvgPredSE 0.6233
N 12.0000
The The four line efficiency report given after each search shows the best
algorithm design over all the excursions (trips). D-efficiency is the objective,
computes which is a volume criterion on the generalized variance of the
efficiency estimates. The efficiency of the standard fractional factorial is 100%,
numbers to but this is not possible when pure quadratic terms such as (X1)^2 are
zero in on a included in the model.
D-optimal
design The efficiency values are a function of the number of points in the
design, the number of independent variables in the model, and the
maximum standard error for prediction over the design points. The best
design is the one with the highest D-efficiency. The A-efficiencies and
G-efficiencies help choose an optimal design when multiple excursions
produce alternatives with similar D-efficiency.
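These quantities can be computed directly from the model matrix. The Python sketch below uses standard textbook definitions of the three efficiencies, which may differ in detail from JMP's implementation; X is the n x p model-expanded design and F the model-expanded candidate set:

import numpy as np

def efficiencies(X, F):
    # Common definitions of D-, A-, and G-efficiency for design X, with
    # the maximum prediction variance taken over the candidate rows F.
    n, p = X.shape
    M = X.T @ X
    Minv = np.linalg.inv(M)
    d_eff = 100.0 * np.linalg.det(M) ** (1.0 / p) / n
    a_eff = 100.0 * p / np.trace(n * Minv)
    dmax = max(f @ Minv @ f for f in F)     # max scaled prediction variance
    g_eff = 100.0 * np.sqrt(p / (n * dmax))
    return d_eff, a_eff, g_eff

# For an orthogonal two-level factorial (intercept + main effects),
# all three efficiencies equal 100.
X = np.array([[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1]], float)
print(efficiencies(X, X))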
Using several The search for a D-optimal design should be made using several
excursions excursions or trips. In each trip, JMP 3.2 chooses a different set of
(or trips) random seed points, which can possibly lead to different designs. The
recommended Save button saves the best design found. The standard error of
prediction is also saved under the variable OptStdPred in the table.
The selected The D-optimal design using 12 runs that JMP 3.2 created is listed
design should below in standard order. The design runs should be randomized before
be the treatment combinations are executed.
randomized
Parameter To see the correlations of the parameter estimates for the best design
estimates are found, you can click on the Correlations button in the D-optimal
usually Search Control Panel. In most D-optimal designs, the correlations
correlated among the estimates are non-zero. However, in this particular example,
the correlations are zero.
Other Note: Other software packages (or even other releases of JMP) may
software may have different procedures for generating D-optimal designs - the above
generate a example is a highly software dependent illustration of how to generate
different a D-optimal design.
D-optimal
design
5. Process Improvement
5.5. Advanced topics
5.5.2. What is a computer-aided design?
Properties The properties of this final design will probably not compare with those
of this final of the original design and there may exist some correlation among the
design may estimates. However, instead of not being able to use any of the data for
not compare analysis, generating the replacement runs from a computer-aided design,
with those of a D-optimal design for example, allows one to analyze the data.
the original Furthermore, computer-aided designs can be used to augment a classical
design design with treatment combinations that will break alias chains among
the terms in the model or permit the estimation of curvilinear effects.
5. Process Improvement
5.5. Advanced topics
The DOE The experimental optimization of response surface models differs from
approach to classical optimization techniques in at least three ways:
optimization
Randomness 2. The response models are fit from experimental data that usually
(sampling contain random variability due to uncontrollable or unknown
variability) causes. This implies that an experiment, if repeated, will result in
affects the a different fitted response surface model that might lead to
final different optimal operating conditions. Therefore, sampling
answers and variability should be considered in experimental optimization.
should be
taken into In contrast, in classical optimization techniques the functions are
account deterministic and given.
Optimization 3. The fitted responses are local approximations, implying that the
process optimization process requires the input of the experimenter (a
requires person familiar with the process). This is in contrast with
input of the classical optimization which is always automated in the form of
experimenter some computer algorithm.
5. Process Improvement
5.5. Advanced topics
5.5.3. How do you optimize a process?
If there is The second phase is performed when there is lack of linear fit in Phase I, and instead, a
lack of fit for second-order or quadratic polynomial regression model of the general form
linear
models, y(x) = b0 + b'x + x'Bx
quadratic
models are
tried next is fit. Not all responses will require a quadratic fit, and in such cases Phase I is stopped when the
response of interest cannot be improved any further. Each phase is explained and illustrated in the
next few sections.
"Flowchart" The following is a flow chart showing the two phases of experimental optimization.
for two
phases of
experimental
optimization
5. Process Improvement
5.5. Advanced topics
5.5.3. How do you optimize a process?
5.5.3.1. Single response case
Experimental strategies for fitting this type of model were discussed earlier. Usually, a 2^(k-p)
fractional factorial experiment is conducted with repeated runs at the current operating
conditions (which serve as the origin of coordinates in orthogonally coded factors).
Determine the The idea behind "Phase I" is to keep experimenting along the direction of steepest ascent (or
directions of descent, as required) until there is no further improvement in the response. At that point, a new
steepest fractional factorial experiment with center runs is conducted to determine a new search
ascent and
direction. This process is repeated until at some point significant curvature in the fitted response is detected.
continue
This implies that the operating conditions X1, X2, ...,Xk are close to where the maximum (or
experimenting
until no minimum, as required) of Y occurs. When significant curvature, or lack of fit, is detected, the
further experimenter should proceed with "Phase II". Figure 5.2 illustrates a sequence of line searches
improvement when seeking a region where curvature exists in a problem with 2 factors (i.e., k=2).
occurs - then
iterate the
process
Two main There are two main decisions an engineer must make in Phase I:
decisions: 1. determine the search direction;
search
2. determine the length of the step to move from the current operating conditions.
direction and
length of step Figure 5.3 shows a flow diagram of the different iterative tasks required in Phase I. This
diagram is intended as a guideline and should not be automated in such a way that the
experimenter has no input in the optimization process.
Flow chart of
iterative
search
process
FIGURE 5.3: Flow Chart for the First Phase of the Experimental Optimization
Procedure
The direction Suppose a first-order model (like above) has been fit and provides a useful approximation. As
of steepest long as lack of fit (due to pure quadratic curvature and interactions) is very small compared to
ascent is the main effects, steepest ascent can be attempted. To determine the direction of maximum
determined by improvement we use
the gradient
of the fitted 1. the estimated direction of steepest ascent, given by the gradient of the fitted response, if
model the objective is to maximize Y;
2. the estimated direction of steepest descent, given by the negative of the gradient of the
fitted response, if the objective is to minimize Y.
The direction The direction of the gradient, g, is given by the values of the parameter estimates, that is, g' =
of steepest (b1, b2, ..., bk). Since the parameter estimates b1, b2, ..., bk depend on the scaling convention for
ascent the factors, the steepest ascent (descent) direction is also scale dependent. That is, two
depends on experimenters using different scaling conventions will follow different paths for process
the scaling improvement. This does not diminish the general validity of the method since the region of the
convention - search, as given by the signs of the parameter estimates, does not change with scale. An equal
equal variance scaling convention, however, is recommended. The coded factors xi, in terms of the
variance factors in the original units of measurement, Xi, are obtained from the relation
scaling is
recommended xi = (Xi - (Xi,high + Xi,low)/2) / ((Xi,high - Xi,low)/2)
This coding convention is recommended since it provides parameter estimates that are scale
independent, generally leading to a more reliable search direction. The coordinates of the factor
settings in the direction of steepest ascent, positioned a distance ρ from the origin, are given
by the solution of: maximize y(x) = b0 + SUM(j) bj xj subject to SUM(j) xj^2 = ρ^2.
Solution is a This problem can be solved with the aid of an optimization solver (e.g., the solver option of a
simple spreadsheet). However, in this case this is not really needed, as the solution is a simple
equation equation that yields the coordinates xi = ρ bi / sqrt(SUM(j=1..k) bj^2), i = 1, 2, ..., k.
Equation can An engineer can compute this equation for different increasing values of ρ and obtain different
be computed factor settings, all on the steepest ascent direction.
for increasing
values of ρ To see the details that explain this equation, see Technical Appendix 5A.
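In code, the whole path is one line of algebra. A minimal Python sketch (the gradient values here are placeholders, not the handbook's fitted estimates):

import numpy as np

def steepest_ascent_path(b, rhos):
    # x_i = rho * b_i / sqrt(sum_j b_j^2): points a distance rho from
    # the origin along the estimated steepest ascent direction.
    u = np.asarray(b) / np.linalg.norm(b)
    return np.outer(rhos, u)

b = [-2.0, 17.0]                      # placeholder parameter estimates
print(steepest_ascent_path(b, np.arange(1.0, 4.0)))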
Optimization It has been concluded (perhaps after a factor screening experiment) that the yield (Y, in %) of a
by search chemical process is mainly affected by the temperature (X1, in °C) and by the reaction time
example (X2, in minutes). Due to safety reasons, the region of operation is limited. [The constraint
inequalities bounding X1 and X2 are not reproduced here.]
Factor levels The process is currently run at a temperature of 200 °C and a reaction time of 200 minutes. A
process engineer decides to run a 2^2 full factorial experiment with factor levels at
factor low center high
X1 (°C) 170 200 230
X2 (minutes) 150 200 250
Orthogonally Five repeated runs at the center levels are conducted to assess lack of fit. The orthogonally
coded factors coded factors are x1 = (X1 - 200)/30 and x2 = (X2 - 200)/50.
ANOVA table The corresponding ANOVA table for a first-order polynomial model, obtained using the
DESIGN EASE statistical software, is
[ANOVA table not reproduced; columns: SOURCE, SUM OF SQUARES, DF, MEAN SQUARE, F VALUE, PROB > F.]
Resulting It can be seen from the ANOVA table that there is no significant lack of linear fit due to an
model interaction term and there is no evidence of curvature. Furthermore, there is evidence that the
first-order model is significant. Using the DESIGN EXPERT statistical software, we obtain the
resulting model in the coded variables. [The fitted first-order equation is not reproduced here.]
Diagnostic The usual diagnostic checks show conformance to the regression assumptions, although the R2
checks value is not very high: R2 = 0.6580.
Determine
To maximize the fitted response, we use the direction of steepest ascent. The engineer selects ρ = 1 since a
level of
factors for point on the steepest ascent direction one unit (in the coded units) from the origin is desired.
next run Then from the equation above for the predicted Y response, the coordinates of the factor levels
using for the next run are given by
direction of
steepest x1 = ρ b1 / sqrt(b1^2 + b2^2) = -0.1152
ascent
and x2 = ρ b2 / sqrt(b1^2 + b2^2) = 0.9933
This means that to improve the process, for every (-0.1152)(30) = -3.456 °C that temperature is
varied (decreased), the reaction time should be varied by (0.9933)(50) = 49.66 minutes.
===========================================================
Technical Appendix 5A: finding the factor settings on the steepest ascent direction a
specified distance from the origin
Details of The problem of finding the factor settings on the steepest ascent/descent direction that are
how to located a distance ρ from the origin is given by the optimization problem
determine the
path of maximize y(x) = b0 + SUM(j=1..k) bj xj
steepest
ascent subject to SUM(j=1..k) xj^2 = ρ^2
Solve using a To solve it, use a Lagrange multiplier approach. First, add a penalty for solutions not
Lagrange satisfying the constraint (since we want a direction of steepest ascent, we maximize, and
multiplier therefore the penalty is negative). For steepest descent we minimize and the penalty term is
approach added instead. The Lagrangian is L(x, λ) = b0 + SUM(j) bj xj - λ (SUM(j) xj^2 - ρ^2),
and its partial derivatives with respect to x and λ are equated to zero.
Solve two These two equations have two unknowns (the vector x and the scalar λ) and thus can be solved
equations in yielding the desired solution:
two unknowns xi = ρ bi / sqrt(SUM(j=1..k) bj^2)
Multiples of From this equation we can see that any multiple ρ of the direction of the gradient (given by
the direction bi / sqrt(SUM(j) bj^2)) will lead to points on the steepest ascent direction. For steepest
of the descent, use instead -bi in the numerator of the equation above.
gradient
5. Process Improvement
5.5. Advanced topics
5.5.3. How do you optimize a process?
5.5.3.1. Single response case
"Randomness" The direction given by the gradient g' = (b0, b2, ... , bk) constitutes only a single (point) estimate
means that the based on a sample of N runs. If a different set of N runs were conducted, these would provide
steepest different parameter estimates, which in turn would give a different gradient. To account for this
ascent sampling variability, Box and Draper gave a formula for constructing a "cone" around the
direction is direction of steepest ascent that with certain probability contains the true (unknown) system
just an
estimate and it gradient given by (β1, β2, ..., βk). The width of the confidence cone is useful to assess how
is possible to reliable an estimated search direction is.
construct a
confidence Figure 5.4 shows such a cone for the steepest ascent direction in an experiment with two factors.
"cone' around If the cone is so wide that almost every possible direction is inside the cone, an experimenter
this direction should be very careful in moving too far from the current operating conditions along the path of
estimate steepest ascent or descent. Usually this will happen when the linear fit is quite poor (i.e., when the
R2 value is low). Thus, plotting the confidence cone is not so important as computing its width.
If you are interested in the details on how to compute such a cone (and its width), see Technical
Appendix 5B.
Graph of a
confidence
cone for the
steepest
ascent
direction
FIGURE 5.4: A Confidence Cone for the Steepest Ascent Direction in an Experiment with 2
Factors
=============================================================
Technical Appendix 5B: Computing a Confidence Cone on the Direction of Steepest Ascent
Details of how Suppose the response of interest is adequately described by a first-order polynomial model.
to construct a Consider the inequality
confidence
cone for the SUM(j=1..k) bj^2 - [SUM(j=1..k) bj xj]^2 / SUM(j=1..k) xj^2 <= Cjj s^2 (k-1) F(α; k-1, n-p)
direction of
steepest
ascent where Cjj is the j-th diagonal element of the matrix (X'X)^-1 (for j = 1, ..., k these values are all
equal if the experimental design is a 2^(k-p) factorial of at least Resolution III), and X is the model
matrix of the experiment (including columns for the intercept and second-order terms, if any). Any
operating condition with coordinates x' = (x1, x2, ..., xk) that satisfies this inequality generates a
direction that lies within the 100(1-α)% confidence cone of steepest ascent (or of steepest
descent, according to the sign of SUM(j) bj xj).
Inequality The inequality defines a cone with the apex at the origin and center line located along the gradient
defines a cone
of y(x).
A measure of A measure of "goodness" of a search direction is given by the fraction φ of directions excluded by
goodness of a the 100(1-α)% confidence cone around the steepest ascent/descent direction (see Box and
search Draper, 1987). [The formula for φ is not reproduced here.] In it, T(k-1)(·) denotes the complement
direction of the Student's t distribution function with k-1 degrees of freedom (that is, T(k-1)(x) =
P(t(k-1) >= x)) and F(α; k-1, n-p) denotes an α percentage point of the F distribution with k-1 and
n-p degrees of freedom, with n-p denoting the error degrees of freedom. The value of φ
represents the fraction of directions excluded by the confidence cone. The smaller φ is, the wider
the cone is, with 0 <= φ <= 1. Note that the inequality equation and the "goodness measure"
equation are valid when operating conditions are given in coded units.
Example: Computing φ
Compute φ From the ANOVA table in the chemical experiment discussed earlier,
from ANOVA
table and Cjj s^2 Cjj = (52.3187)(1/4) = 13.0796
since Cjj = 1/4 (j=2,3) for a 2^2 factorial. The fraction of directions excluded by a 95% confidence
cone in the direction of steepest ascent is:
Compute φ φ = 0.7105
Conclusions since F(0.05; 1, 6) = 5.99. Thus 71.05% of the possible directions from the current operating point are
for this excluded with 95% confidence. This is useful information that can be used to select a step length.
example The smaller φ is, the shorter the step should be, as the steepest ascent direction is less reliable. In
this example, with high confidence, the true steepest ascent direction is within this cone of 29%
of possible directions. For k=2, 29% of 360° = 104.4°, so we are 95% confident that our estimated
steepest ascent path is within plus or minus 52.2° of the true steepest path. In this case, we should
not use a large step along the estimated steepest ascent path.
5. Process Improvement
5.5. Advanced topics
5.5.3. How do you optimize a process?
5.5.3.1. Single response case
However, the procedure below considers the original units of measurement which
are easier to deal with than the coded "distance" ρ.
Procedure The following is the procedure for selecting the step length.
for selecting 1. Choose a step length ΔXj (in natural units of measurement) for some factor
the step j. Usually, factor j is chosen to be the one engineers feel more comfortable
length varying, or the one with the largest |bj|. The value of ΔXj can be based on
the width of the confidence cone around the steepest ascent/descent
direction. Very wide cones indicate that the estimated steepest ascent/descent
direction is not reliable, and thus ΔXj should be small. This usually occurs
when the R2 value is low. In such a case, additional experiments can be
conducted in the current experimental region to obtain a better model fit and
a better search direction.
2. Transform to coded units: Δxj = ΔXj / sj,
with sj denoting the scale factor used for factor j (e.g., sj = rangej/2).
3. Set the step of every other factor in proportion to its parameter estimate: Δxi = (bi/bj) Δxj.
4. Transform back to natural units: ΔXi = (Δxi)(si). Here, choosing ΔX2 = 50 minutes gives
Δx2 = 50/50 = 1 and ΔX1 = (-0.1160)(30) = -3.48 °C.
Thus the step size is ΔX' = (-3.48 °C, 50 minutes).
Procedure The following is the procedure for conducting experiments along the direction of
for maximum improvement.
conducting 1. Given current operating conditions X0' = (X1, X2, ..., Xk) and a step size ΔX'
experiments
along the = (ΔX1, ΔX2, ..., ΔXk), perform experiments at factor levels X0 + ΔX, X0
direction of + 2ΔX, X0 + 3ΔX, ... as long as improvement in the response Y (decrease or
maximum increase, as desired) is observed (see the sketch after this list).
improvement 2. Once a point has been reached where there is no further improvement, a new
first-order experiment (e.g., a 2k-p fractional factorial) should be performed
with repeated center runs to assess lack of fit. If there is no significant
evidence of lack of fit, the new first-order model will provide a new search
direction, and another iteration is performed as indicated in Figure 5.3.
Otherwise (there is evidence of lack of fit), the experimental design is
augmented and a second-order model should be fitted. That is, the
experimenter should proceed to "Phase II".
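A sketch of this search loop in Python (the response function here is a made-up stand-in for running real experiments):

import numpy as np

def climb(x0, step, observe, max_steps=10):
    # March along x0 + k*step; stop after two consecutive drops in the
    # observed response, as in the example that follows.
    x, best_x, best_y, drops = np.array(x0, float), None, -np.inf, 0
    for _ in range(max_steps):
        x = x + step
        y = observe(x)
        if y > best_y:
            best_x, best_y, drops = x.copy(), y, 0
        else:
            drops += 1
            if drops == 2:
                break
    return best_x, best_y

# Hypothetical yield surface standing in for physical runs.
observe = lambda x: 75 - 0.002 * (x[0] - 190)**2 - 0.0005 * (x[1] - 350)**2
print(climb([200.0, 200.0], np.array([-3.5, 50.0]), observe))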
Step 1: Step 1:
increase
factor levels Given X0 = (200 °C, 200 minutes) and ΔX = (-3.48 °C, 50 minutes), the next
by experiments were performed as follows (the step size in temperature was rounded
to -3.5 °C for practical reasons):
X1 X2 x1 x2 Y (= yield)
X0 200 200 0 0
X0 + ΔX 196.5 250 -0.1160 1 56.2
X0 + 2ΔX 193.0 300 -0.2320 2 71.49
X0 + 3ΔX 189.5 350 -0.3480 3 75.63
X0 + 4ΔX 186.0 400 -0.4640 4 72.31
X0 + 5ΔX 182.5 450 -0.5800 5 72.10
Since the goal is to maximize Y, the point of maximum observed response is X1 =
189.5 °C, X2 = 350 minutes. Notice that the search was stopped after 2 consecutive
drops in response, to assure that we have passed by the "peak" of the "hill".
Five center runs (at X1 = 189.5, X2 = 350) were repeated to assess lack of fit. The
experimental results were:
x1 x2 X1 X2 Y (= yield)
-1 -1 159.5 300 64.33
+1 -1 219.5 300 51.78
-1 +1 159.5 400 77.30
+1 +1 219.5 400 45.37
0 0 189.5 350 62.08
0 0 189.5 350 79.36
0 0 189.5 350 75.29
0 0 189.5 350 73.81
0 0 189.5 350 69.45
The corresponding ANOVA table for a linear model, obtained using the same statistical
software, is shown below.
[ANOVA table not reproduced; columns: SOURCE, SUM OF SQUARES, DF, MEAN SQUARE, F VALUE, PROB > F.]
5. Process Improvement
5.5. Advanced topics
5.5.3. How do you optimize a process?
5.5.3.1. Single response case
Second- Once a linear model exhibits lack of fit or when significant curvature is detected, the experimental
order design used in Phase I (recall that a 2^(k-p) factorial experiment might be used) should be augmented
polynomial with axial runs on each factor to form what is called a central composite design. This
model experimental design allows estimation of a second-order polynomial of the form
y(x) = b0 + b'x + x'Bx.
Steps to find If the corresponding analysis of variance table indicates no lack of fit for this model, the engineer
optimal can proceed to determine the estimated optimal operating conditions.
operating 1. Using some graphics software, obtain a contour plot of the fitted response. If the number of
conditions factors (k) is greater than 2, then plot contours in all planes corresponding to all the
possible pairs of factors. For k greater than, say, 5, this could be too cumbersome (unless
the graphic software plots all pairs automatically). In such a case, a "canonical analysis" of
the surface is recommended (see Technical Appendix 5D).
2. Use an optimization solver to maximize or minimize (as desired) the estimated response .
3. Perform a confirmation experiment at the estimated optimal operating conditions given by
the solver in step 2.
Illustrate with We illustrate these steps with the DESIGN-EXPERT software and our chemical experiment
DESIGN- discussed before. For a technical description of a formula that provides the coordinates of the
EXPERT stationary point of the surface, see Technical Appendix 5C.
software
Experimental Recall that in the chemical experiment, the ANOVA table, obtained from using an experiment run
results for around the coordinates X1 = 189.5, X2 = 350, indicated significant curvature effects. Augmenting
axial runs
the 2^2 factorial experiment with axial runs at ±1.414 in coded units to achieve a rotatable central
composite experimental design, the following experimental results were obtained:
x1 x2 X1 X2 Y (= yield)
-1.414 0 147.08 350 72.58
+1.414 0 231.92 350 37.42
0 -1.414 189.5 279.3 54.63
0 +1.414 189.5 420.7 54.18
ANOVA table The corresponding ANOVA table for the different effects, based on the sequential sum of squares
procedure of the DESIGN-EXPERT software, is
[ANOVA table not reproduced; columns: SOURCE, SUM OF SQUARES, DF, MEAN SQUARE, F VALUE, PROB > F.]
Lack of fit From the table, the linear and quadratic effects are significant. The lack of fit tests and auxiliary
tests and diagnostic statistics are:
auxillary
diagnostic [Lack-of-fit table not reproduced; columns: MODEL, SUM OF SQUARES, DF, MEAN SQUARE, F VALUE, PROB > F.]
statistics
Step 1:
Contour plot A contour plot of this function (Figure 5.5) shows that it appears to have a single optimum point
of the fitted in the region of the experiment (this optimum is calculated below to be (-.9285,.3472), in coded
response x1, x2 units, with a predicted response value of 77.59).
function
3D plot of the Since there are only two factors in this example, we can also obtain a 3D plot of the fitted
fitted response against the two factors (Figure 5.6).
response
function
Step 2:
Optimization The optimization routine in DESIGN-EXPERT was invoked for maximizing the fitted response y.
point The results are X1* = 161.64 °C, X2* = 367.32 minutes. The estimated yield at the optimal point
is y(X*) = 77.59%.
Step 3:
Confirmation A confirmation experiment was conducted by the process engineer at settings X1 = 161.64, X2 =
experiment 367.32. The observed response was Y(X*) = 76.5%, which is satisfactorily close to the estimated
optimum.
==================================================================
Technical Appendix 5C: Finding the Factor Settings for the Stationary Point of a Quadratic
Response
1. Rewrite the fitted second-order model in matrix form as y(x) = b0 + b'x + x'Bx, where b' =
(b1, b2, ..., bk) is the vector of first-order parameter estimates, B is a matrix of second-order
parameter estimates and x' = (x1, x2, ..., xk) is the vector of controllable factors. Notice that
the off-diagonal elements of B are equal to half the two-factor interaction coefficients.
2. Equating the partial derivatives of y with respect to x to zeroes and solving the resulting
system of equations, the coordinates of the stationary point of the response are given by
x* = -(1/2) B^-1 b
Nature of the The nature of the stationary point (whether it is a point of maximum response, minimum
stationary response, or a saddle point) is determined by the matrix B. The two-factor interactions do not, in
point is general, let us "see" what type of point x* is. One thing that can be said is that if the diagonal
determined by elements of B (the bii) have mixed signs, x* is a saddle point. Otherwise, it is necessary to look at
B the characteristic roots or eigenvalues of B to see whether B is "positive definite" (so x* is a point
of minimum response) or "negative definite" (the case in which x* is a point of maximum
response). This task is easier if the two-factor interactions are "eliminated" from the fitted
equation as is described in Technical Appendix 5D.
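The stationary point and its classification are a few lines of linear algebra. In the Python sketch below, b comes from the text; B is a hypothetical reconstruction chosen so that the stationary point matches the coded coordinates (-0.9285, 0.3472) quoted for this example:

import numpy as np

b = np.array([-11.78, 0.74])                       # first-order estimates
B = np.array([[-7.25, -2.425],                     # hypothetical second-order
              [-2.425, -7.55]])                    # matrix (illustration only)

x_star = -0.5 * np.linalg.solve(B, b)              # x* = -(1/2) B^-1 b
lam = np.linalg.eigvalsh(B)                        # eigenvalues of symmetric B
kind = ("maximum" if np.all(lam < 0) else
        "minimum" if np.all(lam > 0) else "saddle point")
print(x_star.round(4), lam.round(3), kind)         # [-0.9285  0.3472] ... maximum

Both eigenvalues come out negative, so this illustrative B classifies x* as a point of maximum response, consistent with the example.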
Example of The fitted quadratic equation in the chemical experiment discussed in Section 5.5.3.1.1 is, in
computing the coded units,
stationary
point
from which we obtain b' = (-11.78, 0.74) and the matrix B, giving the stationary point in coded
units x* = (-0.9285, 0.3472). Transforming back to the original units of measurement, the
coordinates of the stationary point are X1* = 161.64 °C, X2* = 367.32 minutes.
Notice this is the same solution as was obtained by using the optimization routine of
DESIGN-EXPERT (see section 5.5.3.1.1). The predicted response at the stationary point is y(X*)
= 77.59%.
Case for a Whether the stationary point X* represents a point of maximum or minimum response, or is just a
single saddle point, is determined by the matrix of second-order coefficients, B. In the simpler case of
controllable just a single controllable factor (k=1), B is a scalar proportional to the second derivative of y(x)
factor with respect to x. If d2y/dx2 is positive, recall from calculus that the function y(x) is convex
("bowl shaped") and x* is a point of minimum response.
Case for Unfortunately, the multiple factor case (k>1) is not so easy since the two-factor interactions (the
multiple off-diagonal elements of B) obscure the picture of what is going on. A recommended procedure
controllable for analyzing whether B is "positive definite" (we have a minimum) or "negative definite" (we
factors not have a maximum) is to rotate the axes x1, x2, ..., xk so that the two-factor interactions disappear. It
so easy is also customary (Box and Draper, 1987; Khuri and Cornell, 1987; Myers and Montgomery,
1995) to translate the origin of coordinates to the stationary point so that the intercept term is
eliminated from the equation of y(x). This procedure is called the canonical analysis of y(x).
Steps for 1. Define a new axis z = x - x* (translation step). The fitted equation becomes
performing
the canonical y(z) = y(x*) + z'Bz
analysis
2. Define a new axis w = E'z, with E'BE = D and D a diagonal matrix to be defined (rotation
step). The fitted equation becomes y(w) = y(x*) + λ1 w1^2 + ... + λk wk^2.
This is the so-called canonical form of the model. The elements on the diagonal of D, λi (i
= 1, 2, ..., k), are the eigenvalues of B. The columns of E', ei, are the orthonormal
eigenvectors of B, which means that the ei satisfy (B - λiI)ei = 0, ei'ej = 0 for i ≠ j, and
ei'ei = 1.0.
3. If all the λi are negative, x* is a point of maximum response. If all the λi are positive, x* is
a point of minimum response. Finally, if the λi are of mixed signs, the response is a saddle
function and x* is the saddle point.
Canonical It is nice to know that the JMP or SAS software (PROC RSREG) computes the eigenvalues λi
analysis and the orthonormal eigenvectors ei; thus there is no need to do a canonical analysis by hand.
typically
performed by
software
B matrix for Let us return to the chemical experiment example. This illustrates the method, but keep in mind
this example that when the number of factors is small (e.g., k=2 as in this example) canonical analysis is not
recommended in practice since simple contour plotting will provide sufficient information. The
fitted equation of the model yields the vector b' = (-11.78, 0.74) and the matrix B.
Compute the To compute the eigenvalues λi, we have to find all roots of the expression that results from
eigenvalues equating the determinant of B - λiI to zero. Since B is symmetric and has real coefficients, there
and find the will be k real roots λi, i = 1, 2, ..., k. To find the orthonormal eigenvectors, solve the simultaneous
orthonormal
eigenvectors equations (B - λiI)ei = 0 and ei'ei = 1.
SAS code for performing the canonical analysis: This is the hard way, of course. These computations are easily performed using the SAS software PROC RSREG. The SAS program applied to our example is:

data;
input x1 x2 y;
cards;
-1 -1 64.33
1 -1 51.78
-1 1 77.30
1 1 45.37
0 0 62.08
0 0 79.36
0 0 75.29
0 0 73.81
0 0 69.45
-1.414 0 72.58
1.414 0 37.42
0 -1.414 54.63
0 1.414 54.18
;
proc rsreg;
model y=x1 x2 / nocode lackfit;
run;
The "nocode" option was used since the factors had been input in coded form.
SAS output from the canonical analysis: The corresponding output from the SAS canonical analysis is as follows:

Canonical Analysis of Response Surface

            Critical
  Factor    Value
  X1        -0.922
  X2         0.3468

               Eigenvectors
  Eigenvalues     X1      X2
5. Process Improvement
5.5. Advanced topics
5.5.3. How do you optimize a process?
5.5.3.1. Single response case
Ridge analysis "Ridge Analysis", as developed by Hoerl (1959), Hoerl (1964) and
is a method for Draper (1963), is an optimization technique that finds factor settings
finding optimal x* such that they
factor settings
that satisfy optimize (x) = b0 + b'x + x'Bx
certain 2
subject to: x'x =
constraints
The solution x* to this problem provides operating conditions that
yield an estimated absolute maximum or minimum response on a
sphere of radius . Different solutions can be obtained by trying
different values of .
Solve with non-linear programming software: The original formulation of Ridge Analysis was based on the eigenvalues of a stationarity system. With the wide availability of non-linear programming codes, Ridge Analysis problems can be solved without recourse to eigenvalue analysis.
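A minimal sketch of the non-linear programming approach using scipy's SLSQP solver; the model coefficients below are placeholders, not fitted values from the handbook:

import numpy as np
from scipy.optimize import minimize

# Placeholder fitted second-order model.
b0, b = 65.0, np.array([-11.78, 0.74])
B = np.array([[-7.0, -2.0],
              [-2.0, -8.0]])

def yhat(x):
    return b0 + b @ x + x @ B @ x

rho = 1.0  # radius of the spherical constraint x'x = rho**2

# Maximize yhat on the sphere: minimize -yhat subject to x'x - rho^2 = 0.
res = minimize(lambda x: -yhat(x), x0=np.array([0.1, 0.1]),
               constraints=[{"type": "eq",
                             "fun": lambda x: x @ x - rho**2}])
print("x* =", res.x, " yhat(x*) =", yhat(res.x))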
5. Process Improvement
5.5. Advanced topics
5.5.3. How do you optimize a process?
5.5.3.2. Multiple response case
A weighted priority strategy is described using the path of steepest ascent for each response: The following is a weighted priority strategy using the path of steepest ascent for each response.
1. Compute the gradients gi (i = 1, 2, ..., k) of all responses as explained in section 5.5.3.1.1. If one of the responses is clearly of primary interest compared to the others, use only the gradient of this response and follow the procedure of section 5.5.3.1.1. Otherwise, continue with step 2.
2. Determine relative priorities πi for each of the k responses. Then, the weighted gradient for the search direction is given by the priority-weighted sum of the individual gradients, d = π1g1 + π2g2 + ... + πkgk.
Weighting factors based on R²: The confidence cone for the direction of maximum improvement explained in section 5.5.3.1.2 can be used to weight down "poor" response models that provide very wide cones and unreliable directions. Since the width of the cone is proportional to (1 - R²), we can use priorities proportional to the fit of each model, πi = Ri² / (R1² + R2² + ... + Rk²).
Single response steepest ascent procedure: Given a weighted direction of maximum improvement, we can follow the single-response steepest ascent procedure as in section 5.5.3.1.1 by selecting points with coordinates xi* = ρ di, i = 1, 2, ..., k. These and related issues are explained more fully in Del Castillo (1996). A sketch of the weighted-gradient computation follows.
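A minimal numpy sketch of the weighted-gradient search direction; the gradients and R² values are placeholders:

import numpy as np

# Gradients of each fitted response at the current operating point
# (placeholder values for two responses in two factors).
gradients = np.array([[2.5, -1.0],    # g1: gradient of response 1
                      [0.8,  0.6]])   # g2: gradient of response 2

# R^2 of each fitted response model (placeholders).
r_squared = np.array([0.90, 0.61])

# Priorities proportional to R^2, normalized to sum to one.
priorities = r_squared / r_squared.sum()

# Weighted gradient for the search direction, scaled to unit length.
d = priorities @ gradients
d = d / np.linalg.norm(d)

# Points along the path of steepest ascent at increasing radii rho.
for rho in (0.5, 1.0, 1.5):
    print(rho, rho * d)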
with R² = 0.5977. We wish to maximize the mean yield while minimizing the standard deviation of the yield.

Find relative priorities: Since there are no clear priorities, we use the quality of fit (the R² of each model) as the priority:
5. Process Improvement
5.5. Advanced topics
5.5.3. How do you optimize a process?
5.5.3.2. Multiple response case
Desirability function for "target is best": If a response is of the "target is best" kind, then its individual desirability function rises from zero at the lower bound Li to one at the target Ti and falls back to zero at the upper bound Ui, with the exponents s and t determining how important it is to hit the target value. For s = t = 1, the desirability function increases linearly towards Ti; for s < 1, t < 1, the function is convex; and for s > 1, t > 1, the function is concave (see the example below for an illustration). If the response is instead to be maximized, the same form is used with Ti interpreted as a large enough value for the response.
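A minimal sketch of the Derringer and Suich (1980) "target is best" individual desirability and the overall desirability (the geometric mean of the individual values); the bounds in the usage line are placeholders:

import numpy as np

def desirability_target(y, L, T, U, s=1.0, t=1.0):
    """Derringer-Suich 'target is best' individual desirability."""
    if y < L or y > U:
        return 0.0
    if y <= T:
        return ((y - L) / (T - L)) ** s
    return ((U - y) / (U - T)) ** t

def overall_desirability(ds):
    """Overall desirability D: geometric mean of the individual d's."""
    ds = np.asarray(ds, dtype=float)
    return ds.prod() ** (1.0 / len(ds))

print(desirability_target(y=136.4, L=120, T=170, U=190))  # placeholder bounds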
Example:

An example using the desirability approach: Derringer and Suich (1980) present the following multiple response experiment arising in the development of a tire tread compound. The controllable factors are: x1, hydrated silica level; x2, silane coupling agent level; and x3, sulfur level. The four responses to be optimized and their desired ranges are:
Experimental runs from a central composite design: The following experiments were conducted using a central composite design.

Run
Number    x1      x2      x3     Y1     Y2    Y3    Y4
1       -1.00   -1.00   -1.00   102    900   470   67.5
2 +1.00 -1.00 -1.00 120 860 410 65.0
3 -1.00 +1.00 -1.00 117 800 570 77.5
4 +1.00 +1.00 -1.00 198 2294 240 74.5
5 -1.00 -1.00 +1.00 103 490 640 62.5
6 +1.00 -1.00 +1.00 132 1289 270 67.0
7 -1.00 +1.00 +1.00 132 1270 410 78.0
8 +1.00 +1.00 +1.00 139 1090 380 70.0
9 -1.63 0.00 0.00 102 770 590 76.0
10 +1.63 0.00 0.00 154 1690 260 70.0
11 0.00 -1.63 0.00 96 700 520 63.0
12 0.00 +1.63 0.00 163 1540 380 75.0
13 0.00 0.00 -1.63 116 2184 520 65.0
14 0.00 0.00 +1.63 153 1784 290 71.0
15 0.00 0.00 0.00 133 1300 380 70.0
16 0.00 0.00 0.00 133 1300 380 68.5
17 0.00 0.00 0.00 140 1145 430 68.0
18 0.00 0.00 0.00 142 1090 430 68.0
19 0.00 0.00 0.00 145 1260 390 69.0
20 0.00 0.00 0.00 142 1344 390 70.0
Fitted response models: Using ordinary least squares and standard diagnostics, the fitted responses are:
Optimization performed by Design-Expert software: Optimization of D with respect to x was carried out using the Design-Expert software. Figure 5.7 shows the individual desirability functions di(ŷi) for each of the four responses. The functions are linear since the values of s and t were set equal to one. A dot indicates the best solution found by the Design-Expert solver.
FIGURE 5.7 Desirability Functions and Optimal Solution for Example Problem
Best solution: The best solution is (x*)' = (-0.10, 0.15, -1.0) and results in:
d1(ŷ1) = 0.34 (ŷ1(x*) = 136.4)
3D plot of the overall desirability function: Figure 5.8 shows a 3D plot of the overall desirability function D(x) for the (x2, x3) plane when x1 is fixed at -0.10. The function D(x) is quite "flat" in the vicinity of the optimal solution, indicating that small variations around x* are predicted not to change the overall desirability drastically. However, the importance of performing confirmatory runs at the estimated optimal operating conditions should be emphasized. This is particularly true in this example, given the poor fit of the response models (e.g., for ŷ2).
5. Process Improvement
5.5. Advanced topics
5.5.3. How do you optimize a process?
5.5.3.2. Multiple response case
Optimization of dual response systems: The optimization of dual response systems (DRS) consists of finding operating conditions x that optimize the primary response(s) subject to the secondary response(s) being held at target and to a spherical constraint x'x ≤ ρ², with T denoting the target value for the secondary response, p the number of primary responses (i.e., responses to be optimized), s the number of secondary responses (i.e., responses to be constrained), and ρ the radius of a spherical constraint that limits the region in the controllable factor space where the search should be undertaken. The value of ρ should be chosen with the purpose of avoiding solutions that extrapolate too far outside the region where the experimental data were obtained.
Nonlinear programming software required for DRS: In a DRS, the response models can be linear, quadratic, or even cubic polynomials. A nonlinear programming algorithm has to be used for the optimization of a DRS. For the particular case of quadratic responses, an equality constraint for the secondary response, and a spherical region of experimentation, specialized optimization algorithms exist that guarantee globally optimal solutions. In such a case, the algorithm DRSALG can be used (download from http://www.nist.gov/cgi-bin/exit_nist.cgi?url=http://www.stat.cmu.edu/jqt/29-3), but a Fortran compiler is necessary.
More general case: In the more general case of inequality constraints or a cubical region of experimentation, a general-purpose nonlinear solver must be used and several starting points should be tried to avoid local optima. This is illustrated in the next section.
Example (problem setup): The values of three components (x1, x2, x3) of a propellant need to be selected to maximize a primary response, burning rate (Y1), subject to satisfactory levels of two secondary responses; namely, the variance of the burning rate (Y2) and the cost (Y3). The three components must add to 100% of the mixture. With fitted quadratic models ŷ1, ŷ2, and ŷ3, the problem is to maximize ŷ1(x) subject to:

    ŷ2(x) ≤ 4.5
    ŷ3(x) ≤ 20
    x1 + x2 + x3 = 1.0
    0 ≤ x1 ≤ 1
    0 ≤ x2 ≤ 1
    0 ≤ x3 ≤ 1
Solve using Excel solver function: We can use Microsoft Excel's "solver" to solve this problem. The table below shows an Excel spreadsheet that has been set up with the problem above. Cells B2:B4 contain the decision variables (cells to be changed), cell E2 is to be maximized, and all the constraints need to be entered appropriately. The figure shows the spreadsheet after the solver completes the optimization. The solution is (x*)' = (0.212, 0.343, 0.443), which provides ŷ1 = 106.62, ŷ2 = 4.17, and ŷ3 = 18.23. Therefore, both secondary responses are below the specified upper bounds. The solver should be run from a variety of starting points (i.e., try different initial values in cells B2:B4 prior to starting the solver) to avoid local optima. Once again, confirmatory experiments should be conducted at the estimated optimal operating conditions.
Excel spreadsheet:

      A         B           C    D        E
  1   Factors                    Responses
  2   x1        0.21233          Y1(x)    106.6217
  3   x2        0.343725         Y2(x)      4.176743
  4   x3        0.443946         Y3(x)     18.23221
  5   Additional constraint
  6   x1 + x2 + x3 = 1.000001
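The same multi-start search can be reproduced with a general-purpose nonlinear solver; a minimal scipy sketch in which the three fitted models are placeholders (the handbook's actual coefficients are not given here):

import numpy as np
from scipy.optimize import minimize

# Placeholder fitted models for burning rate (y1), variance (y2), cost (y3).
def y1(x): return 100 + 30*x[0] + 20*x[1] + 10*x[2] - 25*x[0]*x[1]
def y2(x): return 3 + 2*x[0]**2 + 3*x[1]**2 + x[2]**2
def y3(x): return 25*x[0] + 15*x[1] + 12*x[2]

cons = [{"type": "eq",   "fun": lambda x: x.sum() - 1.0},  # mixture constraint
        {"type": "ineq", "fun": lambda x: 4.5 - y2(x)},    # y2(x) <= 4.5
        {"type": "ineq", "fun": lambda x: 20.0 - y3(x)}]   # y3(x) <= 20

best = None
# Try several starting points to avoid local optima, as the text advises.
for x0 in ([1/3, 1/3, 1/3], [0.6, 0.2, 0.2], [0.2, 0.2, 0.6]):
    res = minimize(lambda x: -y1(x), x0=np.array(x0),
                   bounds=[(0, 1)] * 3, constraints=cons)
    if res.success and (best is None or -res.fun > -best.fun):
        best = res
print(best.x, y1(best.x), y2(best.x), y3(best.x))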
5. Process Improvement
5.5. Advanced topics
Standard mixture designs and constrained mixture designs: When the mixture components are subject to the constraint that they must sum to one, there are standard mixture designs for fitting standard models, such as Simplex-Lattice designs and Simplex-Centroid designs. When mixture components are subject to additional constraints, such as a maximum and/or minimum value for each component, designs other than the standard mixture designs, referred to as constrained mixture designs or Extreme-Vertices designs, are appropriate.
Assumptions for mixture experiments: The usual assumptions made for factorial experiments are also made for mixture experiments. In particular, it is assumed that the errors are independent and identically distributed with zero mean and common variance. Another assumption that is made, as with factorial designs, is that the true underlying response surface is continuous over the region being studied.
5. Process Improvement
5.5. Advanced topics
5.5.4. What is a mixture design?
A first-order mixture model: The construction of screening designs and their corresponding models often begins with the first-order or first-degree mixture model

    E(Y) = β1x1 + β2x2 + ... + βqxq

for which the beta coefficients are non-negative and sum to one.
5. Process Improvement
5.5. Advanced topics
5.5.4. What is a mixture design?
Except for the center, all design points are on the simplex boundaries: Note that the standard Simplex-Lattice and the Simplex-Centroid designs (described later) are boundary-point designs; that is, with the exception of the overall centroid, all the design points are on the boundaries of the simplex. When one is interested in prediction in the interior, it is highly desirable to augment the simplex-type designs with interior design points.
  X1     X2     X3
  0      0      1
  0      0.667  0.333
  0      1      0
  0.333  0      0.667
  0.333  0.333  0.333
  0.333  0.667  0
  0.667  0      0.333
  0.667  0.333  0
  1      0      0

Diagram showing configuration of design runs
Definition of the canonical polynomial model used in mixture experiments: Now consider the form of the polynomial model that one might fit to the data from a mixture experiment. Due to the restriction x1 + x2 + ... + xq = 1, the form of the regression function that is fit to the data from a mixture experiment is somewhat different from the traditional polynomial fit and is often referred to as the canonical polynomial. Its form is derived using the general form of the regression function that can be fit to data collected at the points of a {q, m} simplex-lattice design and substituting into this function the dependence relationship among the xi terms. The number of terms in the {q, m} polynomial is (q+m-1)!/(m!(q-1)!), as stated previously. This is equal to the number of points that make up the associated {q, m} simplex-lattice design.
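The term count is just the binomial coefficient C(q+m-1, m); a one-function sketch:

from math import comb

# Number of terms in the {q, m} canonical polynomial, equal to the number
# of points in the {q, m} simplex-lattice design: (q+m-1)! / (m!(q-1)!).
def n_terms(q, m):
    return comb(q + m - 1, m)

print(n_terms(3, 2))  # 6 terms/points for a {3, 2} simplex-lattice
print(n_terms(3, 3))  # 10 terms/points for a {3, 3} simplex-lattice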
Example for a {q, m=1} simplex-lattice design: For example, the equation that can be fit to the points from a {q, m=1} simplex-lattice design is

    E(Y) = β0 + β1x1 + β2x2 + ... + βqxq

Multiplying β0 by (x1 + x2 + ... + xq) = 1, the resulting equation is

    E(Y) = β1*x1 + β2*x2 + ... + βq*xq, with βi* = β0 + βi
First-order canonical form: This is called the canonical form of the first-order mixture model. In general, the canonical forms of the mixture models (with the asterisks removed from the parameters) are as follows:

Summary of canonical mixture models:
    Linear:         E(Y) = Σi βixi
    Quadratic:      E(Y) = Σi βixi + ΣΣi<j βijxixj
    Cubic:          E(Y) = Σi βixi + ΣΣi<j βijxixj + ΣΣi<j δijxixj(xi - xj) + ΣΣΣi<j<k βijkxixjxk
    Special Cubic:  E(Y) = Σi βixi + ΣΣi<j βijxixj + ΣΣΣi<j<k βijkxixjxk
Diagram showing the design runs for this example
Fit a quadratic mixture model using JMP software: The design runs listed in the above table are in standard order. The actual order of the 15 treatment runs was completely randomized. JMP 3.2 will be used to analyze the results. Since there are three levels of each of the three mixture components, a quadratic mixture model can be fit to the data. The output from the model fit is shown below. Note that there was no intercept in the model. To analyze the data in JMP, create a new table with one column corresponding to the observed elongation values. Select Fit Model and create the quadratic mixture model (this will look like the 'traditional' interactions regression model obtained from standard classical designs). Check the No Intercept box on the Fit Model screen. Click on Run Model. The output is shown below.
Analysis of Variance
Parameter Estimates
Interpretation of the JMP output: Under the parameter estimates section of the output are the individual t-tests for each of the parameters in the model. The three cross-product terms are significant (X1*X2, X3*X1, X3*X2), indicating a significant quadratic fit.
Conclusions from the fitted quadratic model: Since b3 > b1 > b2, one can conclude that component 3 (polypropylene) produces yarn with the highest elongation. Additionally, since b12 and b13 are positive, blending components 1 and 2 or components 1 and 3 produces higher elongation values than would be expected just by averaging the elongations of the pure blends. This is an example of 'synergistic' blending effects. Components 2 and 3 have antagonistic blending effects because b23 is negative.
Contour plot of the predicted elongation values: The figure below is the contour plot of the elongation values. From the plot it can be seen that if maximum elongation is desired, a blend of components 1 and 3 should be chosen, consisting of about 75%-80% component 3 and 20%-25% component 1.
5. Process Improvement
5.5. Advanced topics
5.5.4. What is a mixture design?
A Simplex-Centroid design consists of the q permutations of (1, 0, ..., 0), or single-component blends; the permutations of (1/2, 1/2, 0, ..., 0), or binary mixtures; the permutations of (1/3, 1/3, 1/3, 0, ..., 0); and so on, with finally the overall centroid point (1/q, 1/q, ..., 1/q), or q-nary mixture.
Model supported by simplex-centroid designs: The design points in the Simplex-Centroid design will support the polynomial

    E(Y) = Σi βixi + ΣΣi<j βijxixj + ΣΣΣi<j<k βijkxixjxk + ... + β12...q x1x2...xq

which is the qth-order mixture polynomial. For q = 2, this is the quadratic model. For q = 3, this is the special cubic model.
Example of runs for three and four components: For example, the fifteen runs for a four-component (q = 4) simplex-centroid design are:
(1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1), (.5,.5,0,0), (.5,0,.5,0), ..., (0,0,.5,.5), (1/3,1/3,1/3,0), ..., (0,1/3,1/3,1/3), (1/4,1/4,1/4,1/4).
The runs for a three-component simplex-centroid design of degree 2 are
(1,0,0), (0,1,0), (0,0,1), (.5,.5,0), (.5,0,.5), (0,.5,.5), (1/3,1/3,1/3).
However, in order to fit a first-order model with q = 4, only the five runs with a "1" and all "1/4's" would be needed. To fit a second-order model, add the six runs with a ".5" (this also fits a saturated third-order model, with no degrees of freedom left for error). A sketch for generating these runs follows.
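A minimal itertools sketch that enumerates the 2^q - 1 runs of a q-component simplex-centroid design:

from itertools import combinations

def simplex_centroid(q):
    """All 2**q - 1 runs of a q-component simplex-centroid design:
    for every non-empty subset of components, an equal-parts blend."""
    runs = []
    for size in range(1, q + 1):
        for subset in combinations(range(q), size):
            run = [0.0] * q
            for i in subset:
                run[i] = 1.0 / size
            runs.append(tuple(run))
    return runs

for run in simplex_centroid(3):
    print(run)  # 7 runs: 3 pure blends, 3 binary blends, 1 centroid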
5. Process Improvement
5.5. Advanced topics
5.5.4. What is a mixture design?
Typical additional constraints:

    x1 + x2 + ... + xq = 1
    Li ≤ xi ≤ Ui, for i = 1, 2, ..., q

with Li ≥ 0 and Ui ≤ 1.
Example using only lower bounds: Consider the following case in which only the lower bounds in the above equation are imposed, so that the constrained mixture problem becomes

    x1 + x2 + ... + xq = 1
    Li ≤ xi ≤ 1, for i = 1, 2, ..., q

Assume we have a three-component mixture problem with constraints

    0.3 ≤ x1,  0.4 ≤ x2,  0.1 ≤ x3
Feasible mixture region: The feasible mixture space is shown in the figure below. Note that the existence of lower bounds does not affect the shape of the mixture region; it is still a simplex region. In general, this will always be the case if only lower bounds are imposed on any of the component proportions.
Diagram showing the feasible mixture space

Formula for pseudo components: The pseudo components are defined as

    xi* = (xi - Li) / (1 - L)

with L = L1 + L2 + ... + Lq < 1.
Computation of the pseudo components for the example: In the three-component example above, L = 0.3 + 0.4 + 0.1 = 0.8, so the pseudo components are x1* = (x1 - 0.3)/0.2, x2* = (x2 - 0.4)/0.2, and x3* = (x3 - 0.1)/0.2.
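A minimal numpy sketch of the transformation and its inverse for this example:

import numpy as np

# Lower bounds from the example: x1 >= 0.3, x2 >= 0.4, x3 >= 0.1
L = np.array([0.3, 0.4, 0.1])

def to_pseudo(x, L):
    """Map original proportions to pseudo components:
    x_i* = (x_i - L_i) / (1 - sum(L))."""
    return (x - L) / (1.0 - L.sum())

def from_pseudo(x_star, L):
    """Inverse map, back to the original component proportions."""
    return L + (1.0 - L.sum()) * x_star

x = np.array([0.5, 0.4, 0.1])                   # a feasible blend (sums to 1)
print(to_pseudo(x, L))                          # -> [1.0, 0.0, 0.0]
print(from_pseudo(np.array([1.0, 0.0, 0.0]), L))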
Select appropriate design: In terms of the pseudo components, the experimenter has the choice of selecting a Simplex-Lattice or a Simplex-Centroid design, depending on the objectives of the experiment.
Use of pseudo components (after transformation) is recommended: It is recommended that the pseudo components be used to fit the mixture model. This is due to the fact that the constrained design space will usually have relatively high levels of multicollinearity among the predictors. Once the final predictive model for the pseudo components has been determined, the equation in terms of the original components can be determined by substituting the relationship between xi and xi*.
Extreme vertices designs are another option: Note that there are other mixture designs that cover only a sub-portion or smaller space within the simplex. These types of mixture designs (not covered here) are referred to as extreme vertices designs. (See chapter 11 of Myers and Montgomery (1995) or Cornell (1990).)
5. Process Improvement
5.5. Advanced topics
5.5.4. What is a mixture design?
Example of three mixture components and three process variables: For example, consider three mixture components (x1, x2, x3) with three process variables (z1, z2, z3). The dimensionality of the region is 5. The combined region of interest for the three mixture components and three process variables is shown in the two figures below. The complete space of the design can be viewed in either of two ways. The first diagram shows the idea of a full factorial at each vertex of the three-component simplex region. The second diagram shows the idea of a three-component simplex region at each point in the full factorial. In either case, the same overall process space is being investigated.
Diagram showing the simplex region of a 3-component mixture with a 2^3 full factorial at each pure mixture run

Diagram showing the process space of a 2^3 full factorial with the 3-component simplex region at each point of the full factorial
Additional options available: As can be seen from the above diagrams, setting up the design configurations in the process variables and mixture components involves setting up either a mixture design at each point of a configuration in the process variables, or, similarly, creating a factorial arrangement in the process variables at each point of composition in the mixture components. For the example depicted in the above two diagrams, this is not the only design available for this number of mixture components with the specified number of process variables. Another option might be to run a fractional factorial design at each vertex or point of the mixture design, with the same fraction run at each mixture design point. Still another option might be to run a fractional factorial design at each vertex or point of the mixture design, with a different fraction run at each mixture design point.
5. Process Improvement
5.5. Advanced topics
Example of nested data: Figure 5.15 below represents a batch process that uses 7 monitor wafers in each run. The plan further calls for measuring a response on each wafer at each of 9 sites. The organization of the sampling plan has a hierarchical or nested structure: the batch run is the topmost level, the second level is an individual wafer, and the third level is the site on the wafer.

The total amount of data generated per batch run will be 7*9 = 63 data points. One approach to analyzing these data would be to compute the mean of all these points as well as their standard deviation and use those results as responses for each run.
Diagram illustrating the example
Sites nested within wafers and wafers nested within runs: Analyzing the data as suggested above is not absolutely incorrect, but doing so loses information that one might otherwise obtain. For example, site 1 on wafer 1 is physically different from site 1 on wafer 2 or on any other wafer. The same is true for any of the sites on any of the wafers. Similarly, wafer 1 in run 1 is physically different from wafer 1 in run 2, and so on. To describe this situation one says that sites are nested within wafers while wafers are nested within runs.
Treating wafers and sites as random effects allows calculation of variance estimates: Because the wafers and the sites represent unwanted sources of variation, and because one of the objectives is to reduce the process sensitivity to these sources of variation, treating wafers and sites as random effects in the analysis of the data is a reasonable approach. In other words, nested variation is often another way of saying nested random effects or nested sources of noise. If the factors "wafers" and "sites" are treated as random effects, then it is possible to estimate a variance component due to each source of variation through analysis of variance techniques. Once estimates of the variance components have been obtained, an investigator is then able to determine the largest source of variation in the process under experimentation, and also determine the magnitudes of the other sources of variation in relation to the largest source.
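As a sketch of how such variance components might be estimated, the following uses statsmodels' mixed linear model with a variance component for wafers nested within runs; the data file name and column names are assumptions:

import pandas as pd
import statsmodels.formula.api as smf

# Assumed columns: y (response), run (batch run id), wafer (1-7 within run).
df = pd.read_csv("wafers.csv")  # hypothetical file name

# Nested random effects: runs as groups, wafers as a variance component
# nested within runs; site-to-site variation becomes the residual.
model = smf.mixedlm("y ~ 1", df, groups="run",
                    vc_formula={"wafer": "0 + C(wafer)"})
result = model.fit()
print(result.summary())  # variance components for run, wafer, and residual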
Split-Plot Designs
A split-plot experiment example: An experiment run under one of the above three situations usually results in a split-plot type of design. Consider an experiment to examine electroplating of aluminum (non-aqueous) on copper strips. The three factors of interest are: current (A); solution temperature (T); and the solution concentration of the plating agent (S). Plating rate is the measured response. There are a total of 16 copper strips available for the experiment. The treatment combinations to be run (orthogonally scaled) are listed below in standard order (i.e., they have not been randomized):
-1 -1 -1
-1 -1 +1
-1 +1 -1
-1 +1 +1
+1 -1 -1
+1 -1 +1
+1 +1 -1
+1 +1 +1
Concentration is hard to vary, so minimize the number of times it is changed: Consider running the experiment under the first condition listed above, with the factor solution concentration of the plating agent (S) being hard to vary. Since this factor is hard to vary, the experimenter would like to randomize the treatment combinations so that the solution concentration factor has a minimal number of changes. In other words, the randomization of the treatment runs is restricted somewhat by the level of the solution concentration factor.
Performing replications: Once one complete replicate of the experiment has been completed, a second replicate is performed with a set of four copper strips processed for a given level of solution concentration before changing the concentration and processing the remaining four strips. Note that the levels for the remaining two factors can still be randomized. In addition, the level of concentration that is run first in the replication runs can also be randomized.
Whole plot and subplot factors: Running the experiment in this way results in a split-plot design. Solution concentration is known as the whole-plot factor and the subplot factors are the current and the solution temperature.
Definition of experimental units and whole-plot and subplot factors for this experiment: A split-plot design has more than one size of experimental unit. In this experiment, one size of experimental unit is an individual copper strip. The treatments or factors that were applied to the individual strips are solution temperature and current (these factors were changed each time a new strip was placed in the solution). The other, larger size of experimental unit is a set of four copper strips. The treatment or factor that was applied to a set of four strips is solution concentration (this factor was changed after four strips were processed). The smaller experimental unit is referred to as the subplot experimental unit, while the larger experimental unit is referred to as the whole-plot unit.
Each size of experimental unit leads to an error term in the model for the experiment: There are 16 subplot experimental units for this experiment. Solution temperature and current are the subplot factors in this experiment. There are four whole-plot experimental units in this experiment. Solution concentration is the whole-plot factor in this experiment. Since there are two sizes of experimental units, there are two error terms in the model, one that corresponds to the whole-plot error or whole-plot experimental unit and one that corresponds to the subplot error or subplot experimental unit.
Partial ANOVA table: The ANOVA table for this experiment would look, in part, as follows:
Source DF
Replication 1
Concentration 1
Error (Whole plot) = Rep*Conc 1
Temperature 1
Rep*Temp 1
Current 1
Rep*Current 1
Temp*Conc 1
Rep*Temp*Conc 1
Temp*Current 1
Rep*Temp*Current 1
Current*Conc 1
Rep*Current*Conc 1
Temp*Current*Conc 1
Error (Subplot) =Rep*Temp*Current*Conc 1
The first three sources are from the whole-plot level, while the next 12
are from the subplot portion. A normal probability plot of the 12
subplot term estimates could be used to look for significant terms.
A batch process leads to a different experiment, also a split-plot: Consider running the experiment under the second condition listed above (i.e., a batch process) for which four copper strips are placed in the solution at one time. A specified level of current can be applied to an individual strip within the solution. The same 16 treatment combinations (a replicated 2^3 factorial) are run as were run under the first scenario. However, the way in which the experiment is performed would be different. There are four treatment combinations of solution temperature and solution concentration: (-1, -1), (-1, 1), (1, -1), (1, 1). The experimenter randomly chooses one of these four treatments to set up first. Four copper strips are placed in the solution. Two of the four strips are randomly assigned to the low current level. The remaining two strips are assigned to the high current level. The plating is performed and the response is measured. A second treatment combination of temperature and concentration is chosen and the same procedure is followed. This is done for all four temperature / concentration combinations.
This is also a split-plot design: Running the experiment in this way also results in a split-plot design in which the whole-plot factors are now solution concentration and solution temperature, and the subplot factor is current.
Subplot experimental unit: The smaller size of experimental unit is again referred to as the subplot experimental unit. There are 16 subplot experimental units for this experiment. Current is the subplot factor in this experiment.
Two error terms in the model: There are two sizes of experimental units and there are two error terms in the model: one that corresponds to the whole-plot error or whole-plot experimental unit, and one that corresponds to the subplot error or subplot experimental unit.

The ANOVA table for this second experiment would look, in part, as follows:

Source DF
Concentration 1
Temperature 1
Error (Whole plot) = Conc*Temp 1
Current 1
Conc*Current 1
Temp*Current 1
Conc*Temp*Current 1
Error (Subplot) 8
The first three sources come from the whole-plot level and the next 5
come from the subplot level. Since there are 8 degrees of freedom for
the subplot error term, this MSE can be used to test each effect that
involves current.
Running the experiment under the third scenario: Consider running the experiment under the third scenario listed above. There is only one copper strip in the solution at one time. However, two strips, one at the low current and one at the high current, are processed one right after the other under the same temperature and concentration setting. Once two strips have been processed, the concentration is changed and the temperature is reset to another combination. Two strips are again processed, one after the other, under this temperature and concentration setting. This process is continued until all 16 copper strips have been processed.
This is also a split-plot design: Running the experiment in this way also results in a split-plot design in which the whole-plot factors are again solution concentration and solution temperature and the subplot factor is current. In this experiment, one size of experimental unit is an individual copper strip. The treatment or factor that was applied to the individual strips is current (this factor was changed each time for a different strip within the solution). The other, larger size of experimental unit is a set of two copper strips. The treatments or factors that were applied to a pair of two strips are solution concentration and solution temperature (these factors were changed after two strips were processed). The smaller experimental unit is referred to as the subplot experimental unit.
Current is the subplot factor and temperature and concentration are the whole-plot factors: There are 16 subplot experimental units for this experiment. Current is the subplot factor in the experiment. There are eight whole-plot experimental units in this experiment. Solution concentration and solution temperature are the whole-plot factors. There are two error terms in the model, one that corresponds to the whole-plot error or whole-plot experimental unit, and one that corresponds to the subplot error or subplot experimental unit.
Partial ANOVA table: The ANOVA for this (third) approach is, in part, as follows:
Source DF
Concentration 1
Temperature 1
Conc*Temp 1
Error (Whole plot) 4
Current 1
Conc*Current 1
Temp*Current 1
Conc*Temp*Current 1
Error (Subplot) 4
The first four terms come from the whole-plot analysis and the next 5
terms come from the subplot analysis. Note that we have separate error
terms for both the whole plot and the subplot effects, each based on 4
degrees of freedom.
Primary distinction of split-plot designs is that they have more than one experimental unit size (and therefore more than one error term): As can be seen from these three scenarios, one of the major differences between split-plot designs and simple factorial designs is the number of different sizes of experimental units in the experiment. Split-plot designs have more than one size of experimental unit, i.e., more than one error term. Since these designs involve different sizes of experimental units and different variances, the standard errors of the various mean comparisons involve one or more of the variances. Specifying the appropriate model for a split-plot design involves being able to identify each size of experimental unit, which depends on how an experimental unit is defined relative to the design structure (for example, a completely randomized design versus a randomized complete block design) and the treatment structure (for example, a full 2^3 factorial, a resolution V half fraction, a two-way treatment structure with a control group, etc.). As a result of having more than one size of experimental unit, the appropriate model used to analyze split-plot designs is a mixed model.
Using the wrong model can lead to invalid conclusions: If the data from an experiment are analyzed with only one error term used in the model, misleading and invalid conclusions can be drawn from the results. For a more detailed discussion of these designs and the appropriate analysis procedures, see Milliken and Johnson, Analysis of Messy Data, Vol. 1.
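As a sketch of fitting such a mixed model for the first electroplating scenario with statsmodels, where a random intercept per whole plot supplies the whole-plot error term and the residual supplies the subplot error term; the data file and column names are assumptions:

import pandas as pd
import statsmodels.formula.api as smf

# Assumed columns: rate (response), conc, temp, current (factor levels),
# wholeplot (id of the set of four strips processed together).
df = pd.read_csv("plating.csv")  # hypothetical file name

# Fixed effects for the treatment factors; random intercept per whole plot.
model = smf.mixedlm("rate ~ C(conc) * C(temp) * C(current)",
                    df, groups="wholeplot")
print(model.fit().summary())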
Strip-Plot Designs
Strip-plot designs often result from experiments that are conducted over two or more process steps: Similar to a split-plot design, a strip-plot design can result when some type of restricted randomization has occurred during the experiment. A simple factorial design can result in a strip-plot design depending on how the experiment was conducted. Strip-plot designs often result from experiments that are conducted over two or more process steps in which each process step is a batch process, i.e., completing each treatment combination of the experiment requires more than one processing step with experimental units processed together at each process step. As in the split-plot design, strip-plot designs result when the randomization in the experiment has been restricted in some way. As a result of the restricted randomization that occurs in strip-plot designs, there are multiple sizes of experimental units. Therefore, there are different error terms or different error variances that are used to test the factors of interest in the design. A traditional strip-plot design has three sizes of experimental units.
Example with two steps and three factor variables: Consider the following example from the semiconductor industry. An experiment requires an implant step and an anneal step. At both the anneal and the implant steps there are three factors to test. The implant process accommodates 12 wafers in a batch, and implanting a single wafer under a specified set of conditions is not practical, nor does doing so represent economical use of the implanter. The anneal furnace can handle up to 100 wafers.
Explanation of the diagram that illustrates the design structure of the example: The figure below shows the design structure for how the experiment was run. The rectangles at the top of the diagram represent the settings for a two-level factorial design for the three factors in the implant step (A, B, C). Similarly, the rectangles at the lower left of the diagram represent a two-level factorial design for the three factors in the anneal step (D, E, F).
The arrows connecting each set of rectangles to the grid in the center
of the diagram represent a randomization of trials in the experiment.
The horizontal elements in the grid represent the experimental units for
the anneal factors. The vertical elements in the grid represent the
experimental units for the implant factors. The intersection of the
vertical and horizontal elements represents the experimental units for
the interaction effects between the implant factors and the anneal
factors. Therefore, this experiment contains three sizes of experimental
units, each of which has a unique error term for estimating the
significance of effects.
Diagram of the strip-plot design
Physical meaning of the experimental units: To put actual physical meaning to each of the experimental units in the above example, consider each cell in the grid as an individual wafer. A batch of eight wafers goes through the implant step first. According to the figure, treatment combination #3 in factors A, B, and C is the first implant treatment run. This implant treatment is applied to all eight wafers at once. Once the first implant treatment is finished, another set of eight wafers is implanted with treatment combination #5 of factors A, B, and C. This continues until the last batch of eight wafers is implanted with treatment combination #6 of factors A, B, and C. Once all of the eight treatment combinations of the implant factors have been run, the anneal step starts. The first anneal treatment combination to be run is treatment combination #5 of factors D, E, and F. This anneal treatment combination is applied to a set of eight wafers, with each of these eight wafers coming from one of the eight implant treatment combinations. After this first batch of wafers has been annealed, the procedure is repeated until all eight anneal treatment combinations have been applied.
Three sizes of experimental units: Running the experiment in this way results in a strip-plot design with three sizes of experimental units. A set of eight wafers that are implanted together is the experimental unit for the implant factors A, B, and C and for all of their interactions. There are eight experimental units for the implant factors. A different set of eight wafers is annealed together. This different set of eight wafers is the second size of experimental unit and is the experimental unit for the anneal factors D, E, and F and for all of their interactions. The third size of experimental unit is a single wafer. This is the experimental unit for all of the interaction effects between the implant factors and the anneal factors.
Replication: Actually, the above figure of the strip-plot design represents one block or one replicate of this experiment. If the experiment contains no replication and the model for the implant step contains only the main effects and two-factor interactions, the three-factor interaction term A*B*C (1 degree of freedom) provides the error term for the estimation of effects within the implant experimental unit. Invoking a similar model for the anneal experimental unit produces the three-factor interaction term D*E*F as the error term (1 degree of freedom) for effects within the anneal experimental unit.
Further information: For more details about strip-plot designs, see Milliken and Johnson (1987) or Miller (1997).
5. Process Improvement
5.5. Advanced topics
If these were the only aspects of "Taguchi Designs," there would be little additional
reason to consider them over and above our previous discussion on factorials.
"Taguchi" designs are similar to our familiar fractional factorial designs. However,
Taguchi has introduced several noteworthy new ways of conceptualizing an
experiment that are very valuable, especially in product development and industrial
engineering, and we will look at two of his main ideas, namely Parameter Design
and Tolerance Design.
Parameter Design
Taguchi advocated using inner and outer array designs to take into account noise factors (outer) and design factors (inner): The aim here is to make a product or process less variable (more robust) in the face of variation over which we have little or no control. A simple fictitious example might be that of the starter motor of an automobile that has to perform reliably in the face of variation in ambient temperature and varying states of battery weakness. The engineer has control over, say, the number of armature turns, the gauge of armature wire, and the ferric content of the magnet alloy.

Conventionally, one can view this as an experiment in five factors. Taguchi has pointed out the usefulness of viewing it as a set-up of three inner array factors (turns, gauge, ferric %) over which we have design control, plus an outer array of factors over which we have control only in the laboratory (temperature, battery voltage).
Pictorial representation of Taguchi designs: Pictorially, we can view this design as being a conventional design in the inner array factors (compare Figure 3.1) with the addition of a "small" outer array factorial design at each corner of the "inner array" box.

Let I1 = "turns," I2 = "gauge," I3 = "ferric %," E1 = "temperature," and E2 = "voltage." Then we construct a 2^3 design "box" for the I's, and at each of the eight corners so constructed, we place a 2^2 design "box" for the E's, as is shown in Figure 5.17.
An example of an inner and outer array designed experiment: We now have a total of 8 x 4 = 32 experimental settings, or runs. These are set out in Table 5.7, in which the 2^3 design in the I's is given in standard order on the left of the table and the 2^2 design in the E's is written out sideways along the top. Note that the experiment would not be run in the standard order but should, as always, have its runs randomized. The output measured is the percent of (theoretical) maximum torque.
Table showing the Taguchi design and the responses from the experiment:

TABLE 5.7 Design table, in standard order(s), for the parameter design of Figure 5.17

Run                  E1   -1   +1   -1   +1   Output   Output
Number  I1  I2  I3   E2   -1   -1   +1   +1   MEAN     STD. DEV
1 -1 -1 -1 75 86 67 98 81.5 13.5
2 +1 -1 -1 87 78 56 91 78.0 15.6
3 -1 +1 -1 77 89 78 8 63.0 37.1
4 +1 +1 -1 95 65 77 95 83.0 14.7
5 -1 -1 +1 78 78 59 94 77.3 14.3
6 +1 -1 +1 56 79 67 94 74.0 16.3
7 -1 +1 +1 79 80 66 85 77.5 8.1
8 +1 +1 +1 71 80 73 95 79.8 10.9
Interpretation of the table: Note that there are four outputs measured on each row. These correspond to the four 'outer array' design points at each corner of the 'inner array' box. As there are eight corners of the inner array box, there are eight rows in all.

Each row yields a mean and standard deviation of the percent of maximum torque. Ideally there would be one row that had both the highest average torque and the lowest standard deviation (variability). Row 4 has the highest torque and row 7 has the lowest variability, so we are forced to compromise. We can't simply 'pick the winner.'
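The row means and standard deviations in Table 5.7 are easy to reproduce; a minimal numpy sketch:

import numpy as np

# Outputs for each inner-array run (rows) across the four outer-array
# settings (columns), copied from Table 5.7.
outputs = np.array([
    [75, 86, 67, 98],
    [87, 78, 56, 91],
    [77, 89, 78,  8],
    [95, 65, 77, 95],
    [78, 78, 59, 94],
    [56, 79, 67, 94],
    [79, 80, 66, 85],
    [71, 80, 73, 95],
])

mean = outputs.mean(axis=1)
std = outputs.std(axis=1, ddof=1)   # sample standard deviation
for i, (m, s) in enumerate(zip(mean, std), start=1):
    print(f"run {i}: mean = {m:.1f}, std = {s:.1f}")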
Use contour plots to see inside the box: One might also observe that all the outcomes occur at the corners of the design 'box', which means that we cannot see 'inside' the box. An optimum point might occur within the box, and we can search for such a point using contour plots. Contour plots were illustrated in the example of response surface design analysis given in Section 4.
Fractional factorials: Note that we could have used fractional factorials for either the inner or outer array designs, or for both.
Tolerance Design
Taguchi also advocated tolerance studies to determine, based on a loss or cost function, which variables have critical tolerances that need to be tightened: This section deals with the problem of how, and when, to specify tightened tolerances for a product or a process so that quality and performance/productivity are enhanced. Every product or process has a number, perhaps a large number, of components. We explain here how to identify the critical components to target when tolerances have to be tightened.

It is a natural impulse to believe that the quality and performance of any item can easily be improved by merely tightening up on some or all of its tolerance requirements. By this we mean that if the old version of the item specified, say, machining to ± 1 micron, we naturally believe that we can obtain better performance by specifying machining to ± ½ micron.

This can become expensive, however, and is often not a guarantee of much better performance. One has merely to witness the high initial and maintenance costs of such tight-tolerance-level items as space vehicles, expensive automobiles, etc. to realize that tolerance design, the selection of critical tolerances and the re-specification of those critical tolerances, is not a task to be undertaken without careful thought. In fact, it is recommended that only after extensive parameter design studies have been completed should tolerance design be performed as a last resort to improve quality and productivity.
Example
Example: measurement of an electronic component made up of two components: Customers for an electronic component complained to their supplier that the measurement reported by the supplier on the as-delivered items appeared to be imprecise. The supplier undertook to investigate the matter.

The supplier's engineers reported that the measurement in question was made up of two components, which we label x and y, and the final measurement M was reported according to the standard formula

    M = K x/y

with 'K' a known physical constant. Components x and y were measured separately in the laboratory using two different techniques, and the results combined by software to produce M. Buying new measurement devices for both components would be prohibitively expensive, and it was not even known by how much the x or y component tolerances should be improved to produce the desired improvement in the precision of M.
Taylor series expansion: Assume that in a measurement of a standard item the 'true' value of x is x0 and for y it is y0. Let f(x, y) = M; then the Taylor series expansion for f(x, y) is

    f(x, y) = f(x0, y0) + (x - x0) ∂f/∂x + (y - y0) ∂f/∂y + higher-order terms

with all the partial derivatives, '∂f/∂x', etc., evaluated at (x0, y0).
Assume the distribution of y is normal: It is also assumed known that the y measurements show a normal distribution around y0, with standard deviation σy = 0.004 y-units. Thus Ty = ±3σy = ±0.012.
Worst case values: Now ±Tx and ±Ty may be thought of as 'worst case' values for (x - x0) and (y - y0). Substituting Tx for (x - x0) and Ty for (y - y0) in the expanded formula for M(x, y), we have

    M = K x0/y0 + Tx (K/y0) - Ty (K x0/y0²) + higher-order terms

Drop some terms: The Tx², Ty², and TxTy terms, and all terms of higher order, are going to be at least an order of magnitude smaller than the terms in Tx and in Ty, and for this reason we drop them, so that

    M ≈ K x0/y0 + Tx (K/y0) - Ty (K x0/y0²)
Worst case Euclidean distance: Thus, a 'worst case' Euclidean distance of M(x, y) from its ideal value K x0/y0 is (approximately)

    Δ = sqrt[ (Tx K/y0)² + (Ty K x0/y0²)² ]

This shows the relative contributions of the components to the variation in the measurement.
Economic decision: As y0 is a known quantity and reduction in Tx and in Ty each carries its own price tag, it becomes an economic decision whether one should spend resources to reduce Tx or Ty, or both.
Simulation is an alternative to the Taylor series approximation: In this example, we have used a Taylor series approximation to obtain a simple expression that highlights the relative contributions of Tx and Ty. Alternatively, one might simulate values of M = K*x/y, given a specified (Tx, Ty) and (x0, y0), and then summarize the results with a model for the variability of M as a function of (Tx, Ty).
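A minimal Monte Carlo sketch of that simulation approach; K, x0, y0, and Tx are placeholders (Ty = 0.012 is from the text):

import numpy as np

rng = np.random.default_rng(0)

K, x0, y0 = 1.0, 1.0, 1.0   # placeholder constants
Tx, Ty = 0.010, 0.012       # tolerance half-widths; Ty from the text

# Simulate x and y around their true values, taking Ti = 3 * sigma_i.
n = 100_000
x = rng.normal(x0, Tx / 3, n)
y = rng.normal(y0, Ty / 3, n)
M = K * x / y

# Compare the simulated spread of M with the Taylor-series approximation.
taylor_sd = np.sqrt((K / y0) ** 2 * (Tx / 3) ** 2
                    + (K * x0 / y0**2) ** 2 * (Ty / 3) ** 2)
print("simulated sd of M:", M.std(ddof=1))
print("Taylor-series sd :", taylor_sd)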
Functional form may not be available: In other applications, no functional form is available and one must use experimentation to empirically determine the optimal tolerance design. See Bisgaard and Steinberg (1997).
5. Process Improvement
5.5. Advanced topics
Semifolding Example
Four experimental factors: We have four experimental factors to investigate, namely X1, X2, X3, and X4, and we have designed and run a 2^(4-1) fractional factorial design. Such a design has eight runs, or rows, if we don't count center point runs (or replications).
Resolution IV design: The 2^(4-1) design is of resolution IV, which means that main effects are confounded with, at worst, three-factor interactions, and two-factor interactions are confounded with other two-factor interactions.
Design matrix: The design matrix, in standard order, is shown in Table 5.8 along with all the two-factor interaction columns. Note that the column for X4 is constructed by multiplying the columns for X1, X2, and X3 together (i.e., 4=123).

Table 5.8 The 2^(4-1) design plus 2-factor interaction columns shown in standard order. Note that 4=123.

Run                             Two-Factor Interaction Columns
Number  X1  X2  X3  X4   X1*X2  X1*X3  X1*X4  X2*X3  X2*X4  X3*X4
1 -1 -1 -1 -1 +1 +1 +1 +1 +1 +1
2 +1 -1 -1 +1 -1 -1 +1 +1 -1 -1
3 -1 +1 -1 +1 -1 +1 -1 -1 +1 -1
4 +1 +1 -1 -1 +1 -1 -1 -1 -1 +1
5 -1 -1 +1 +1 +1 -1 -1 -1 -1 +1
6 +1 -1 +1 -1 -1 +1 -1 -1 +1 -1
7 -1 +1 +1 -1 -1 -1 +1 +1 -1 -1
8 +1 +1 +1 +1 +1 +1 +1 +1 +1 +1
Confounding of two-factor interactions: Note also that 12=34, 13=24, and 14=23. These follow from the generating relationship 4=123 and tell us that we cannot estimate any two-factor interaction that is free of some other two-factor alias.
Alternative method requiring fewer runs: There is a way, however, to obtain what we want while adding only four more runs. These runs are selected in the following manner: take the four rows of Table 5.8 that have '-1' in the 'X1' column and switch the '-' sign under X1 to '+' to obtain the four-row table of Table 5.9. This is called a foldover on X1, choosing the subset of runs with X1 = -1. Note that this choice of 4 runs is not unique, and that if the initial design suggested that X1 = -1 were a desirable level, we would have chosen to experiment at the other four treatment combinations that were omitted from the initial design.
Table with new design points added to the original design points: Add this new block of rows to the bottom of Table 5.8 to obtain a design in twelve rows. We show this in Table 5.10, adding in the two-factor interaction columns as well for illustration (these are not needed when we do the runs).

TABLE 5.10 A twelve-run design based on the 2^(4-1), also showing all two-factor interaction columns

Run                             Two-Factor Interaction Columns
Number  X1  X2  X3  X4   X1*X2  X1*X3  X1*X4  X2*X3  X2*X4  X3*X4
1 -1 -1 -1 -1 +1 +1 +1 +1 +1 +1
2 +1 -1 -1 +1 -1 -1 +1 +1 -1 -1
3 -1 +1 -1 +1 -1 +1 -1 -1 +1 -1
4 +1 +1 -1 -1 +1 -1 -1 -1 -1 +1
5 -1 -1 +1 +1 +1 -1 -1 -1 -1 +1
6 +1 -1 +1 -1 -1 +1 -1 -1 +1 -1
7 -1 +1 +1 -1 -1 -1 +1 +1 -1 -1
8 +1 +1 +1 +1 +1 +1 +1 +1 +1 +1
9 +1 -1 -1 -1 -1 -1 -1 +1 +1 +1
10 +1 +1 -1 +1 +1 -1 +1 -1 +1 -1
11 +1 -1 +1 +1 -1 +1 +1 -1 -1 +1
12 +1 +1 +1 -1 +1 +1 -1 +1 -1 -1
Design is resolution V: Examine the two-factor interaction columns and convince yourself that no two are alike. This means that no two-factor interaction involving X1 is aliased with any other two-factor interaction. Thus, the design is resolution V, which is not always the case when constructing these types of ¾ foldover designs.
Estimating X1 two-factor interactions: What we now have is a design with 12 runs, with which we can estimate all the two-factor interactions involving X1 free of aliasing with any other two-factor interaction. It is called a ¾ design because it has ¾ the number of rows of the next regular factorial design (a 2^4).
Standard errors of effect estimates: If one fits a model with an intercept, a block effect, the four main effects, and the six two-factor interactions, then each coefficient has a standard error of σ/8^(1/2), instead of σ/12^(1/2), because the design is not orthogonal and each estimate is correlated with two other estimates. Note that no degrees of freedom exist for estimating σ. Instead, one should plot the 10 effect estimates using a normal (or half-normal) effects plot to judge which effects to declare significant.
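A minimal numpy sketch that constructs this 12-run semifold design and verifies that all two-factor interaction columns are distinct:

import numpy as np
from itertools import product

# Build the 2^(4-1) design with generator 4 = 123.
base = np.array([row + (row[0] * row[1] * row[2],)
                 for row in product((-1, 1), repeat=3)])

# Semifold: take the rows with X1 = -1 and switch X1 to +1.
fold = base[base[:, 0] == -1].copy()
fold[:, 0] = 1
design = np.vstack([base, fold])          # 12 runs

# Two-factor interaction columns; check that no two are alike.
pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
tfi = np.column_stack([design[:, i] * design[:, j] for i, j in pairs])
for a in range(len(pairs)):
    for b in range(a + 1, len(pairs)):
        assert not np.array_equal(tfi[:, a], tfi[:, b])
print("all", len(pairs), "two-factor interaction columns are distinct")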
Estimate all main effects and two-factor interactions for 8 factors: Suppose we wish to run an experiment for k=8 factors, with which we want to estimate all main effects and two-factor interactions. We could use the 64-run resolution V design described in the summary table of fractional factorial designs, but this would require a 64-run experiment to estimate the 1 + 8 + 28 = 37 desired coefficients. In this context, and especially for larger resolution V designs, ¾ of the design points will generally suffice.
Table containing the design matrix:

X1  X2  X3  X4  X5  X6  X7  X8
+1  -1  -1  -1  -1  -1  -1  -1
-1  +1  -1  -1  -1  -1  -1  -1
+1 +1 -1 -1 -1 -1 +1 +1
-1 -1 +1 -1 -1 -1 -1 +1
-1 +1 +1 -1 -1 -1 +1 -1
+1 +1 +1 -1 -1 -1 -1 +1
-1 -1 -1 +1 -1 -1 -1 +1
+1 -1 -1 +1 -1 -1 +1 -1
+1 +1 -1 +1 -1 -1 -1 +1
-1 -1 +1 +1 -1 -1 +1 +1
+1 -1 +1 +1 -1 -1 -1 -1
-1 +1 +1 +1 -1 -1 -1 -1
-1 -1 -1 -1 +1 -1 +1 -1
-1 +1 -1 -1 +1 -1 -1 +1
+1 +1 -1 -1 +1 -1 +1 -1
+1 -1 +1 -1 +1 -1 +1 +1
-1 +1 +1 -1 +1 -1 +1 +1
+1 +1 +1 -1 +1 -1 -1 -1
-1 -1 -1 +1 +1 -1 -1 -1
+1 -1 -1 +1 +1 -1 +1 +1
-1 +1 -1 +1 +1 -1 +1 +1
-1 -1 +1 +1 +1 -1 +1 -1
+1 -1 +1 +1 +1 -1 -1 +1
+1 +1 +1 +1 +1 -1 +1 -1
-1 -1 -1 -1 -1 +1 +1 -1
+1 -1 -1 -1 -1 +1 -1 +1
+1 +1 -1 -1 -1 +1 +1 -1
-1 -1 +1 -1 -1 +1 -1 -1
+1 -1 +1 -1 -1 +1 +1 +1
-1 +1 +1 -1 -1 +1 +1 +1
+1 -1 -1 +1 -1 +1 +1 +1
-1 +1 -1 +1 -1 +1 +1 +1
+1 +1 -1 +1 -1 +1 -1 -1
-1 -1 +1 +1 -1 +1 +1 -1
-1 +1 +1 +1 -1 +1 -1 +1
+1 +1 +1 +1 -1 +1 +1 -1
-1 -1 -1 -1 +1 +1 +1 +1
+1 -1 -1 -1 +1 +1 -1 -1
-1 +1 -1 -1 +1 +1 -1 -1
-1 -1 +1 -1 +1 +1 -1 +1
+1 -1 +1 -1 +1 +1 +1 -1
+1 +1 +1 -1 +1 +1 -1 +1
-1 -1 -1 +1 +1 +1 -1 +1
-1 +1 -1 +1 +1 +1 +1 -1
+1 +1 -1 +1 +1 +1 -1 +1
+1 -1 +1 +1 +1 +1 -1 -1
-1 +1 +1 +1 +1 +1 -1 -1
+1 +1 +1 +1 +1 +1 +1 +1
Good precision for coefficient estimates: This design provides 11 degrees of freedom for error and also provides good precision for coefficient estimates (the coefficient standard errors take one of two values, depending on the term, because the design is not orthogonal).
Further information: More about John's ¾ designs can be found in John (1971) or Diamond (1989).
5. Process Improvement
5.5. Advanced topics
Useful for 4 or 5 factors: This could be particularly useful when you have a design containing four or five factors and you wish to use only the experimental units from one lot (i.e., batch).
Table containing the design matrix for four factors: The following is a design for four factors. You would want to randomize these runs before implementing them; -1 and +1 represent the low and high settings, respectively, of each factor, and -a and +a denote the axial (star) settings.

Run  X1  X2  X3  X4
 1   +1  -1  -1  -1
 2   -1  +1  -1  -1
 3   -1  -1  +1  -1
 4   +1  +1  +1  -1
 5   +1  -1  -1  +1
 6   -1  +1  -1  +1
 7   -1  -1  +1  +1
 8   +1  +1  +1  +1
 9   -a   0   0   0
10   +a   0   0   0
11    0  -a   0   0
12    0  +a   0   0
13    0   0  -a   0
14    0   0  +a   0
15    0   0   0  -a
16    0   0   0  +a
17    0   0   0   0
18    0   0   0   0
19    0   0   0   0
20    0   0   0   0
Small composite designs not rotatable: However, small composite designs are not rotatable, regardless of the choice of a. For small composite designs, a should not be smaller than (number of factorial runs)^(1/4) nor larger than k^(1/2).
5. Process Improvement
5.5. Advanced topics
Starting point
Problem category: The problem category we will address is the screening problem. Two characteristics of screening problems are:
1. There are many factors to consider.
2. Each of these factors may be either continuous or discrete.
Desired output: The desired output from the analysis of a screening problem is:
● A ranked list (by order of importance) of factors.
● A good model.
● Insight.
Design type: In particular, the EDA approach is applied to 2^k full factorial and 2^(k-p) fractional factorial designs.
10-Step process: The following is a 10-step EDA process for analyzing the data from 2^k full factorial and 2^(k-p) fractional factorial designs.
1. Ordered data plot
2. DEX scatter plot
3. DEX mean plot
4. Interaction effects matrix plot
5. Block plot
6. DEX Youden plot
7. |Effects| plot
8. Half-normal probability plot
9. Cumulative residual standard deviation plot
10. DEX contour plot
Each of these plots will be presented with the following format:
● Purpose of the plot
Data set

Defective springs data: The plots presented in this section are demonstrated with a data set from Box and Bisgaard (1987).
These data are from a 2^3 full factorial data set that contains the following variables:
1. Response variable Y = percentage of springs without cracks
2. Factor 1 = oven temperature (2 levels: 1450 and 1600 F)
3. Factor 2 = carbon concentration (2 levels: .5% and .7%)
4. Factor 3 = quench temperature (2 levels: 70 and 120 F)
Y X1 X2 X3
Percent Oven Carbon Quench
Acceptable Temperature Concentration Temperature
----------------------------------------------------
67 -1 -1 -1
79 +1 -1 -1
61 -1 +1 -1
75 +1 +1 -1
59 -1 -1 +1
90 +1 -1 +1
52 -1 +1 +1
87 +1 +1 +1
You can read this file into Dataplot with the following commands:
SKIP 25
READ BOXSPRIN.DAT Y X1 X2 X3
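For comparison, the main effects for these data can be computed in a few lines; a minimal numpy sketch (effect = average response at the "+" setting minus average at the "-" setting):

import numpy as np

# Defective springs data from Box and Bisgaard (1987), coded units.
y  = np.array([67, 79, 61, 75, 59, 90, 52, 87], dtype=float)
x1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
x2 = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
x3 = np.array([-1, -1, -1, -1, 1, 1, 1, 1], dtype=float)

for name, x in (("X1", x1), ("X2", x2), ("X3", x3)):
    effect = y[x > 0].mean() - y[x < 0].mean()
    print(f"{name} effect: {effect:+.2f}")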
5. Process Improvement
5.5. Advanced topics
5.5.9. An EDA approach to experimental design
Motivation: To determine the best setting, an obvious place to start is the best response value. What
constitutes "best"? Are we trying to maximize the response, minimize the response, or hit a
specific target value? This non-statistical question must be addressed and answered by the
analyst. For example, if the project goal is ultimately to achieve a large response, then the desired
experimental goal is maximization. In such a case, the analyst would note from the plot the
largest response value and the corresponding combination of the k-factor settings that yielded that
best response.
Plot for defective springs data: Applying the ordered response plot to the defective springs data set yields the following plot.
How to interpret: From the ordered data plot, we look for the following:
1. best settings;
2. most important factor.
Best Settings (Based on the Data):
At the best (highest or lowest or target) response value, what are the corresponding settings for
each of the k factors? This defines the best setting based on the raw data.
Most Important Factor:
For the best response point and for the nearby neighborhood of near-best response points, which
(if any) of the k factors has consistent settings? That is, for the subset of response values that is
best or near-best, do all of these values emanate from an identical level of some factor?
Alternatively, for the best half of the data, does this half happen to result from some factor with a
common setting? If yes, then the factor that displays such consistency is an excellent candidate
for being declared the "most important factor". For a balanced experimental design, when all of
the best/near-best response values come from one setting, it follows that all of the
worst/near-worst response values will come from the other setting of that factor. Hence that factor
becomes "most important".
At the bottom of the plot, step through each of the k factors and determine which factor, if any,
exhibits such behavior. This defines the "most important" factor.
Conclusions for the defective springs data: The application of the ordered data plot to the defective springs data set results in the following conclusions:
1. Best Settings (Based on the Data):
(X1,X2,X3) = (+,-,+) = (+1,-1,+1) is the best setting since
1. the project goal is maximization of the percent acceptable springs;
2. Y = 90 is the largest observed response value; and
3. (X1,X2,X3) = (+,-,+) at Y = 90.
2. Most important factor:
X1 is the most important factor since the four largest response values (90, 87, 79, and 75)
have factor X1 at +1, and the four smallest response values (52, 59, 61, and 67) have factor
X1 at -1.
5. Process Improvement
5.5. Advanced topics
5.5.9. An EDA approach to experimental design
Motivation The scatter plot is the primary data analysis tool for determining if and how a response relates to
another factor. Determining if such a relationship exists is a necessary first step in converting
statistical association to possible engineering cause-and-effect. Looking at how the raw data
change as a function of the different levels of a factor is a fundamental step which, it may be
argued, should never be skipped in any data analysis.
From such a foundational plot, the analyst invariably extracts information dealing with location
shifts, variation shifts, and outliers. Such information may easily be washed out by other "more
advanced" quantitative or graphical procedures (even computing and plotting means!). Hence
there is motivation for the dex scatter plot.
If we were interested in assessing the importance of a single factor, and since "important" by
default means shift in location, then the simple scatter plot is an ideal tool. A large shift (with
little data overlap) in the body of the data from the "-" setting to the "+" setting of a given factor
would imply that the factor is important. A small shift (with much overlap) would imply the
factor is not important.
The dex scatter plot is actually a sequence of k such scatter plots with one scatter plot for each
factor.
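A minimal matplotlib sketch of such a k-panel display (the panel layout and axis labels are our own choices) is:

    import numpy as np
    import matplotlib.pyplot as plt

    Y = np.array([67, 79, 61, 75, 59, 90, 52, 87], dtype=float)
    factors = {"X1": np.array([-1, 1, -1, 1, -1, 1, -1, 1]),
               "X2": np.array([-1, -1, 1, 1, -1, -1, 1, 1]),
               "X3": np.array([-1, -1, -1, -1, 1, 1, 1, 1])}

    # One scatter panel per factor, shared response axis for comparability
    fig, axes = plt.subplots(1, len(factors), sharey=True, figsize=(9, 3))
    for ax, (name, x) in zip(axes, factors.items()):
        ax.plot(x, Y, "o")              # raw responses at the - and + settings
        ax.set_xticks([-1, 1])
        ax.set_xlabel(name)
    axes[0].set_ylabel("% acceptable springs")
    plt.tight_layout()
    plt.show()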
Plot for defective springs data: The dex scatter plot for the defective springs data set is as follows.
How to interpret: As discussed previously, the dex scatter plot is used to look for the following:
1. Most Important Factors;
2. Best Settings of the Most Important Factors;
3. Outliers.
Each of these will be discussed in turn.
Most Important Factors:
For each of the k factors, as we go from the "-" setting to the "+" setting within the factor, is there
a location shift in the body of the data? If yes, then
1. Which factor has the biggest such data location shift (that is, has least data overlap)? This
defines the "most important factor".
2. Which factor has the next biggest shift (that is, has next least data overlap)? This defines
the "second most important factor".
3. Continue for the remaining factors.
In practice, the dex scatter plot will typically only be able to discriminate the most important
factor (largest shift) and perhaps the second most important factor (next largest shift). The degree
of overlap in remaining factors is frequently too large to ascertain with certainty the ranking for
other factors.
Best Settings for the Most Important Factors:
For each of the most important factors, which setting ("-" or "+") yields the "best" response?
In order to answer this question, the engineer must first define "best". This is done with respect to
the overall project goal in conjunction with the specific response variable under study. For some
experiments (e.g., maximizing the speed of a chip), "best" means we are trying to maximize the
response (speed). For other experiments (e.g., semiconductor chip scrap), "best" means we are
trying to minimize the response (scrap). For yet other experiments (e.g., designing a resistor)
"best" means we are trying to hit a specific target (the specified resistance). Thus the definition of
"best" is an engineering precursor to the determination of best settings.
Suppose the analyst is attempting to maximize the response. In such a case, the analyst would
proceed as follows:
1. For factor 1, for what setting (- or +) is the body of the data higher?
2. For factor 2, for what setting (- or +) is the body of the data higher?
3. Continue for the remaining factors.
The resulting k-vector of best settings:
(x1best, x2best, ..., xkbest)
is thus theoretically obtained by looking at each factor individually in the dex scatter plot and
choosing the setting (- or +) that has the body of data closest to the desired optimal (maximal,
minimal, target) response.
As indicated earlier, the dex scatter plot will typically be able to estimate best settings for only the
first few important factors. Again, the degree of data overlap precludes ascertaining best settings
for the remaining factors. Other tools, such as the dex mean plot, will do a better job of
determining such settings.
Outliers:
Do any data points stand apart from the bulk of the data? If so, then such values are candidates for
further investigation as outliers. For multiple outliers, it is of interest to note if all such anomalous
data cluster at the same setting for any of the various factors. If so, then such settings become
candidates for avoidance or inclusion, depending on the nature (bad or good), of the outliers.
Conclusions for the defective springs data: The application of the dex scatter plot to the defective springs data set results in the following conclusions:
1. Most Important Factors:
1. X1 (most important);
2. X2 (of lesser importance);
3. X3 (of least importance).
that is,
❍ factor 1 definitely looks important;
❍ factor 2 is of lesser importance;
❍ factor 3 has too much overlap to be important with respect to location, but is flagged
for further investigation due to potential differences in variation.
2. Best Settings:
(X1,X2,X3) = (+,-,-) = (+1,-1,-1)
3. Outliers: None detected.
5. Process Improvement
5.5. Advanced topics
5.5.9. An EDA approach to experimental design
Motivation If we were interested in assessing the importance of a single factor, and since important, by
default, means shift in location, and the average is the simplest location estimator, a reasonable
graphics tool to assess a single factor's importance would be a simple mean plot. The vertical axis
of such a plot would be the mean response for each setting of the factor, and the horizontal axis
would be the two settings of the factor: "-" and "+" (-1 and +1). A large difference in the two
means would imply the factor is important while a small difference would imply the factor is not
important.
The dex mean plot is actually a sequence of k such plots, with one mean plot for each factor. To
assist in comparability and relative importance, all of the mean plots are on the same scale.
Plot for defective springs data: Applying the dex mean plot to the defective springs data yields the following plot.
How to interpret: From the dex mean plot, we look for the following:
1. A ranked list of factors from most important to least important.
2. The best settings for each factor (on average).
Ranked List of Factors--Most Important to Least Important:
For each of the k factors, as we go from the "-" setting to the "+" setting for the factor, is there a
shift in location of the average response?
If yes, we would like to identify the factor with the biggest shift (the "most important factor"), the
next biggest shift (the "second most important factor"), and so on until all factors are accounted
for.
Since we are only plotting the means and each factor has identical (-,+) = (-1,+1) coded factor
settings, the above simplifies to
1. What factor has the steepest line? This is the most important factor.
2. The next steepest line? This is the second most important factor.
3. Continue for the remaining factors.
This ranking of factors based on local means is the most important step in building the definitive
ranked list of factors as required in screening experiments.
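In code, this ranking reduces to comparing the two setting means for each factor; a sketch, assuming the same coded arrays as before:

    import numpy as np

    Y = np.array([67, 79, 61, 75, 59, 90, 52, 87], dtype=float)
    factors = {"X1": np.array([-1, 1, -1, 1, -1, 1, -1, 1]),
               "X2": np.array([-1, -1, 1, 1, -1, -1, 1, 1]),
               "X3": np.array([-1, -1, -1, -1, 1, 1, 1, 1])}

    # "Steepness" of each mean line: mean at "+" minus mean at "-"
    slopes = {name: Y[x == 1].mean() - Y[x == -1].mean()
              for name, x in factors.items()}
    for name, d in sorted(slopes.items(), key=lambda kv: -abs(kv[1])):
        print(name, "mean(+) - mean(-) =", d)   # X1: 23.0, X2: -5.0, X3: 1.5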
Best Settings (on Average):
For each of the k factors, which setting (- or +) yields the "best" response?
In order to answer this, the engineer must first define "best". This is done with respect to the
overall project goal in conjunction with the specific response variable under study. For some
experiments, "best" means we are trying to maximize the response (e.g., maximizing the speed of
a chip). For other experiments, "best" means we are trying to minimize the response (e.g.,
semiconductor chip scrap). For yet other experiments, "best" means we are trying to hit a specific
target (e.g., designing a resistor to match a specified resistance). Thus the definition of "best" is a
precursor to the determination of best settings.
For example, suppose the analyst is attempting to maximize the response. In that case, the analyst
would proceed as follows:
1. For factor 1, what setting (- or +) has the largest average response?
2. For factor 2, what setting (- or +) has the largest average response?
3. Continue for the remaining factors.
The resulting k-vector of best settings:
(x1best, x2best, ..., xkbest)
is in general obtained by looking at each factor individually in the dex mean plot and choosing
that setting (- or +) that has an average response closest to the desired optimal (maximal, minimal,
target) response.
This candidate for best settings is based on the averages. This k-vector of best settings should be
similar to that obtained from the dex scatter plot, though the dex mean plot is easier to interpret.
Conclusions for the defective springs data: The application of the dex mean plot to the defective springs data set results in the following conclusions:
1. Ranked list of factors (excluding interactions):
1. X1 (most important). Qualitatively, this factor looks definitely important.
2. X2 (of lesser importance). Qualitatively, this factor is a distant second to X1.
3. X3 (unimportant). Qualitatively, this factor appears to be unimportant.
2. Best settings (on average):
(X1,X2,X3) = (+,-,+) = (+1,-1,+1)
5. Process Improvement
5.5. Advanced topics
5.5.9. An EDA approach to experimental design
A 2-factor interaction can be formed for each pair of the k factors, so there are k*(k-1)/2 of them.
Thus for k = 3, the number of 2-factor interactions = 3, while for k = 7, the number of 2-factor
interactions = 21.
It is important to distinguish between the number of interactions that are active in a given
experiment versus the number of interactions that the analyst is capable of making definitive
conclusions about. The former depends only on the physics and engineering of the problem. The
latter depends on the number of factors, k, the choice of the k factors, the constraints on the
number of runs, n, and ultimately on the experimental design that the analyst chooses to use. In
short, the number of possible interactions is not necessarily identical to the number of
interactions that we can detect.
Note that
1. with full factorial designs, we can uniquely estimate interactions of all orders;
2. with fractional factorial designs, we can uniquely estimate only some (or at times no)
interactions; the more fractionated the design, the fewer interactions that we can estimate.
Definition The interaction effects matrix plot is an upper right-triangular matrix of mean plots consisting of
k main effects plots on the diagonal and k*(k-1)/2 2-factor interaction effects plots on the
off-diagonal.
In general, interactions are not the same as the usual (multiplicative) cross-products. However,
for the special case of 2-level designs coded as (-,+) = (-1, +1), the interactions are identical to
cross-products. By way of contrast, if the 2-level designs are coded otherwise (e.g., the (1,2)
notation espoused by Taguchi and others), then this equivalence is not true. Mathematically,
{-1,+1} x {-1,+1} => {-1,+1}
but
{1,2} x {1,2} => {1,2,4}
Thus, coding does make a difference. We recommend the use of the (-,+) coding.
It is remarkable that with the - and + coding, the 2-factor interactions are dealt with, interpreted,
and compared in the same way that the k main effects are handled. It is thus natural to include
both 2-factor interactions and main effects within the same matrix plot for ease of comparison.
For the off-diagonal terms, the first construction step is to form the horizontal axis values, which
will be the derived values (also - and +) of the cross-product. For example, the settings for the
X1*X2 interaction are derived by simple multiplication from the data as shown below.
X1 X2 X1*X2
- - +
+ - -
- + -
+ + +
Thus X1, X2, and X1*X2 all form a closed (-, +) system. The advantage of the closed system is
that graphically interactions can be interpreted in the exact same fashion as the k main effects.
After the entire X1*X2 vector of settings has been formed in this way, the vertical axis of the
X1*X2 interaction plot is formed:
1. the plot point above X1*X2 = "-" is simply the mean of all response values for which
X1*X2 = "-"
2. the plot point above X1*X2 = "+" is simply the mean of all response values for which
X1*X2 = "+".
We form the plots for the remaining 2-factor interactions in a similar fashion.
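A sketch of this construction for the X1*X2 subplot (assuming the springs data arrays defined earlier):

    import numpy as np

    Y  = np.array([67, 79, 61, 75, 59, 90, 52, 87], dtype=float)
    X1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1])
    X2 = np.array([-1, -1, 1, 1, -1, -1, 1, 1])

    X12 = X1 * X2                       # derived settings for X1*X2
    mean_minus = Y[X12 == -1].mean()    # plot point above X1*X2 = "-"
    mean_plus  = Y[X12 == +1].mean()    # plot point above X1*X2 = "+"
    print(X12, mean_minus, mean_plus)   # effect E12 = mean_plus - mean_minus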
All the mean plots, for both main effects and 2-factor interactions, have a common scale to
facilitate comparisons. Each mean plot has
1. Vertical Axis: The mean response for a given setting (- or +) of a given factor or a given
2-factor interaction.
2. Horizontal Axis: The 2 settings (- and +) within each factor, or within each 2-factor
interaction.
3. Legend:
1. A tag (1, 2, ..., k, 12, 13, etc., with 1 = X1, 2 = X2, ..., k = Xk, 12 = X1*X2, 13 =
X1*X3, 35 = X3*X5, 123 = X1*X2*X3, etc.) which identifies the particular mean
plot; and
2. The least squares estimate of the factor (or 2-factor interaction) effect. These effect
estimates are large in magnitude for important factors and near-zero in magnitude for
unimportant factors.
In a later section, we discuss in detail the models associated with full and fractional factorial
2-level designs. One such model representation is
Y = constant + 0.5*(B1*X1 + B2*X2 + ... + B12*X1*X2 + ...) + error
Written in this form (with the leading 0.5), it turns out that the coefficients Bi are identically the
effects due to the factors Xi. Further, each least squares estimate turns out to be, due to
orthogonality, the simple difference of means at the + setting and the - setting. This is true for the
k main factors. It is also true for all 2-factor and multi-factor interactions.
Thus, visually, the difference in the mean values on the plot is identically the least squares
estimate for the effect. Large differences (steep lines) imply important factors while small
differences (flat lines) imply unimportant factors.
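The following sketch computes all six effect estimates for the springs data as simple differences of means (the dictionary bookkeeping is ours); the resulting values agree with the legend values quoted in the Conclusions below:

    import numpy as np

    Y = np.array([67, 79, 61, 75, 59, 90, 52, 87], dtype=float)
    cols = {"X1": np.array([-1, 1, -1, 1, -1, 1, -1, 1]),
            "X2": np.array([-1, -1, 1, 1, -1, -1, 1, 1]),
            "X3": np.array([-1, -1, -1, -1, 1, 1, 1, 1])}
    cols["X1*X2"] = cols["X1"] * cols["X2"]
    cols["X1*X3"] = cols["X1"] * cols["X3"]
    cols["X2*X3"] = cols["X2"] * cols["X3"]

    for name, x in cols.items():
        effect = Y[x == 1].mean() - Y[x == -1].mean()   # difference of means
        print(name, "effect =", effect)
    # X1: 23.0, X2: -5.0, X3: 1.5, X1*X2: 1.5, X1*X3: 10.0, X2*X3: 0.0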
In earlier sections, a somewhat different form of the model is used (without the leading 0.5). In
this case, the plotted effects are not necessarily equivalent to the least squares estimates. When
using a given software program, you need to be aware what convention for the model the
software uses. In either case, the effects matrix plot is still useful. However, the estimates of the
coefficients in the model are equal to the effect estimates only if the above convention for the
model is used.
Motivation As discussed in detail above, the next logical step beyond main effects is displaying 2-factor
interactions, and this plot matrix provides a convenient graphical tool for examining the relative
importance of main effects and 2-factor interactions in concert. To do so, we make use of the
striking aspect that in the context of 2-level designs, the 2-factor interactions are identical to
cross-products and the 2-factor interaction effects can be interpreted and compared the same way
as main effects.
Plot for defective springs data: Constructing the interaction effects matrix plot for the defective springs data set yields the following plot.
How to interpret: From the interaction effects matrix, we can draw three important conclusions:
1. Important Factors (including 2-factor interactions);
2. Best Settings;
3. Confounding Structure (for fractional factorial designs).
We discuss each of these in turn.
1. Important factors (including 2-factor interactions):
Jointly compare the k main factors and the k*(k-1)/2 2-factor interactions. For each of these
subplots, as we go from the "-" setting to the "+" setting within a subplot, is there a shift in
location of the average data (yes/no)? Since all subplots have a common (-1, +1) horizontal
axis, questions involving shifts in location translate into questions involving steepness of
the mean lines (large shifts imply steep mean lines while no shifts imply flat mean lines).
1. Identify the factor or 2-factor interaction that has the largest shift (based on
averages). This defines the "most important factor". The largest shift is determined
by the steepest line.
2. Identify the factor or 2-factor interaction that has the next largest shift (based on
averages). This defines the "second most important factor". This shift is determined
by the next steepest line.
3. Continue for the remaining factors.
This ranking of factors and 2-factor interactions based on local means is a major step in
building the definitive list of ranked factors as required for screening experiments.
2. Best settings:
For each factor (of the k main factors along the diagonal), which setting (- or +) yields the
"best" (highest/lowest) average response?
Note that the experimenter has the ability to change settings for only the k main factors, not
for any 2-factor interactions. Although a setting of some 2-factor interaction may yield a
better average response than the alternative setting for that same 2-factor interaction, the
experimenter is unable to set a 2-factor interaction setting in practice. That is to say, there
is no "knob" on the machine that controls 2-factor interactions; the "knobs" only control the
settings of the k main factors.
How then does this matrix of subplots serve as an improvement over the k best settings that
one would obtain from the dex mean plot? There are two common possibilities:
1. Steep Line:
For those main factors along the diagonal that have steep lines (that is, are
important), choose the best setting directly from the subplot. This will be the same as
the best setting derived from the dex mean plot.
2. Flat line:
For those main factors along the diagonal that have flat lines (that is, are
unimportant), the naive conclusion to use either setting, perhaps giving preference to
the cheaper setting or the easier-to-implement setting, may be unwittingly incorrect.
In such a case, the use of the off-diagonal 2-factor interaction information from the
interaction effects matrix is critical for deducing the better setting for this nominally
"unimportant" factor.
To illustrate this, consider the following example:
■ Suppose the factor X1 subplot is steep (important) with the best setting for X1
at "+".
■ Suppose the factor X2 subplot is flat (unimportant) with both settings yielding
about the same mean response.
Then what setting should be used for X2? To answer this, consider the following two
cases:
1. Case 1. If the X1*X2 interaction plot happens also to be flat (unimportant),
then choose either setting for X2 based on cost or ease.
2. Case 2. On the other hand, if the X1*X2 interaction plot is steep (important),
then this dictates a preferred setting for X2 not based on cost or ease.
To be specific for case 2, if X1*X2 is important, with X1*X2 = "+" being the better
setting, and if X1 is important, with X1 = "+" being the better setting, then this
implies that the best setting for X2 must be "+" (to assure that X1*X2 (= +*+) will
also be "+"). The reason for this is that since we are already locked into X1 = "+",
and since X1*X2 = "+" is better, then the only way we can obtain X1*X2 = "+" with
X1 = "+" is for X2 to be "+" (if X2 were "-", then X1*X2 with X1 = "+" would yield
X1*X2 = "-").
In general, if X1 is important, X1*X2 is important, and X2 is not important, then
there are 4 distinct cases for deciding what the best setting is for X2:
X1 X1*X2 => X2
+ + +
+ - -
- + -
- - +
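This 4-case table is equivalent to a one-line sign rule: multiplying x1*x2 = x12 through by x1 (and using x1*x1 = +1) gives x2 = x1*x12. A sketch (the function name is ours):

    # Best setting for an unimportant X2 driven by important X1 and X1*X2:
    # from x1 * x2 = x12 and x1 * x1 = +1, it follows that x2 = x1 * x12.
    def best_x2(x1_best: int, x12_best: int) -> int:
        return x1_best * x12_best

    for x1, x12 in [(+1, +1), (+1, -1), (-1, +1), (-1, -1)]:
        print(x1, x12, "->", best_x2(x1, x12))   # reproduces the 4-case table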
By similar reasoning, examining each factor and pair of factors, we thus arrive at a
resulting vector of the k best settings:
(x1best, x2best, ..., xkbest)
This average-based k-vector should be compared with best settings k-vectors
obtained from previous steps (in particular, from step 1 in which the best settings
were drawn from the best data value).
When the average-based best settings and the data-based best settings agree, we
benefit from the increased confidence given our conclusions.
When the average-based best settings and the data-based best settings disagree, then
what settings should the analyst finally choose? Note that the average-based settings and the
data-based settings will almost invariably be identical for all "important" factors; the factors for
which they do differ are virtually always "unimportant". Given such disagreement, the analyst
has three options:
1. Use the average-based settings for minor factors. This has the advantage of a
broader (average) base of support.
2. Use the data-based settings for minor factors. This has the advantage of
demonstrated local optimality.
3. Use the cheaper or more convenient settings for the local factor. This has the
advantage of practicality.
Thus the interaction effects matrix yields important information not only about the ranked
list of factors, but also about the best settings for each of the k main factors. This matrix of
subplots is one of the most important tools for the experimenter in the analysis of 2-level
screening designs.
3. Confounding Structure (for Fractional Factorial Designs)
When the interaction effects matrix is used to analyze 2-level fractional (as opposed to full)
factorial designs, important additional information can be extracted from the matrix
regarding confounding structure.
It is well-known that all fractional factorial designs have confounding, a property whereby
every estimated main effect is confounded/contaminated/biased by some high-order
interactions. The practical effect of this is that the analyst is unsure of how much of the
estimated main effect is due to the main factor itself and how much is due to some
confounding interaction. Such contamination is the price that is paid by examining k
factors with a sample size n that is less than the full factorial n = 2^k runs.
It is a "fundamental theorem" of the discipline of experimental design that for a given
number of factors k and a given number of runs n, some fractional factorial designs are
better than others. "Better" in this case means that the intrinsic confounding that must exist
in all fractional factorial designs has been minimized by the choice of design. This
minimization is done by constructing the design so that the main effect confounding is
pushed to as high an order interaction as possible.
The rationale behind this is that in physical science and engineering systems it has been
found that the "likelihood" of high-order interactions being significant is small (compared
to the likelihood of main effects and 2-factor interactions being significant). Given this, we
would prefer that such inescapable main effect confounding be with the highest order
interaction possible, and hence the bias to the estimated main effect be as small as possible.
The worst designs are those in which the main effect confounding is with 2-factor
interactions. This may be dangerous because in physical/engineering systems, it is quite
common for Nature to have some real (and large) 2-factor interactions. In such a case, the
2-factor interaction effect will be inseparably entangled with some estimated main effect,
and so the experiment will be flawed in that
1. ambiguous estimated main effects and
2. an ambiguous list of ranked factors
will result.
If the number of factors, k, is large and the number of runs, n, is constrained to be small,
then confounding of main effects with 2-factor interactions is unavoidable. For example, if
we have k = 7 factors and can afford only n = 8 runs, then the corresponding 2-level
fractional factorial design is a 2^(7-4), which necessarily will have main effects confounded
with three 2-factor interactions. This cannot be avoided.
On the other hand, situations arise in which 2-factor interaction confounding with main
effects results not from constraints on k or n, but from poor design construction. For example,
if we have k = 7 factors and can afford n = 16 runs, a poorly constructed design might have
main effects confounded with 2-factor interactions, but a well-constructed design with the
same k = 7, n = 16 would have main effects confounded with 3-factor interactions but no
2-factor interactions. Clearly, this latter design is preferable in terms of minimizing main
effect confounding/contamination/bias.
For those cases in which we do have main effects confounded with 2-factor interactions, an
important question arises:
For a particular main effect of interest, how do we know which 2-factor
interaction(s) confound/contaminate that main effect?
The usual answer to this question is by means of generator theory, confounding tables, or
alias charts. An alternate complementary approach is given by the interaction effects
matrix. In particular, if we are examining a 2-level fractional factorial design and
1. if we are not sure that the design has main effects confounded with 2-factor
interactions, or
2. if we are sure that we have such 2-factor interaction confounding but are not sure
what effects are confounded,
then how can the interaction effects matrix be of assistance? The answer to this question is
that the confounding structure can be read directly from the interaction effects matrix.
For example, for a 7-factor experiment, if, say, the factor X3 is confounded with the
2-factor interaction X2*X5, then
1. the appearance of the factor X3 subplot and the appearance of the 2-factor interaction
X2*X5 subplot will necessarily be identical, and
2. the value of the estimated main effect for X3 (as given in the legend of the main
effect subplot) and the value of the estimated 2-factor interaction effect for X2*X5
(as given in the legend of the 2-factor interaction subplot) will also necessarily be
identical.
The above conditions are necessary, but not sufficient, for the effects to be confounded.
Hence, in the absence of tabular descriptions (from your statistical software program) of
the confounding structure, the interaction effects matrix offers the following graphical
alternative for deducing confounding structure in fractional factorial designs:
1. Scan the main factors along the diagonal subplots and choose the subset of factors
that are "important".
2. For each of the "important" factors, scan all of the 2-factor interactions and compare
the main factor subplot and estimated effect with each 2-factor interaction subplot
and estimated effect.
3. If there is no match, this implies that the main effect is not confounded with any
2-factor interaction.
4. If there is a match, this implies that the main effect may be confounded with that
2-factor interaction.
5. If none of the main effects are confounded with any 2-factor interactions, we can
have high confidence in the integrity (non-contamination) of our estimated main
effects.
6. In practice, for highly-fractionated designs, each main effect may be confounded
with several 2-factor interactions. For example, for a 2^(7-4) fractional factorial design,
each main effect will be confounded with three 2-factor interactions. These 1 + 3 = 4
identical subplots will be blatantly obvious in the interaction effects matrix.
Finally, what happens in the case in which the main effects of the design are not confounded
with 2-factor interactions (no diagonal subplot matches any off-diagonal subplot)? In such a
case, does the interaction effects matrix offer any useful further insight and information?
The answer to this question is yes because even though such designs have main effects
unconfounded with 2-factor interactions, it is fairly common for such designs to have
2-factor interactions confounded with one another, and on occasion it may be of interest to
the analyst to understand that confounding. A specific example of such a design is a 2^(4-1)
design formed with X4 settings = X1*X2*X3. In this case, the 2-factor-interaction
confounding structure may be deduced by comparing all of the 2-factor interaction subplots
(and effect estimates) with one another. Identical subplots and effect estimates hint strongly
that the two 2-factor interactions are confounded. As before, such comparisons provide
necessary (but not sufficient) conditions for confounding. Most statistical software for
analyzing fractional factorial experiments will explicitly list the confounding structure.
Conclusions for the defective springs data: The application of the interaction effects matrix plot to the defective springs data set results in the following conclusions:
1. Ranked list of factors (including 2-factor interactions):
1. X1 (estimated effect = 23.0)
2. X1*X3 (estimated effect = 10.0)
3. X2 (estimated effect = -5.0)
4. X3 (estimated effect = 1.5)
5. X1*X2 (estimated effect = 1.5)
6. X2*X3 (estimated effect = 0.0)
Factor 1 definitely looks important. The X1*X3 interaction looks important. Factor 2 is of
lesser importance. All other factors and 2-factor interactions appear to be unimportant.
2. Best Settings (on the average):
(X1,X2,X3) = (+,-,+) = (+1,-1,+1)
5. Process Improvement
5.5. Advanced topics
5.5.9. An EDA approach to experimental design
Output: The block plot yields
1. Important factors (of the k factors and the 2-factor interactions); and
2. Best settings for these factors.
Definition: The block plot is a series of k basic block plots, with each basic block plot for a main effect.
Each basic block plot asks the question as to whether that particular factor is important:
1. The first block plot asks the question: "Is factor X1 important?"
2. The second block plot asks the question: "Is factor X2 important?"
3. Continue for the remaining factors.
The i-th basic block plot, which targets factor i and asks whether factor Xi is important, is formed
by:
● Vertical Axis: Response
● Horizontal Axis: All 2^(k-1) possible combinations of the (k-1) non-target factors (that is,
"robustness" factors). For example, for the block plot focusing on factor X1 from a 2^3 full
factorial experiment, the horizontal axis will consist of all 2^(3-1) = 4 distinct combinations of
factors X2 and X3. We create this robustness factors axis because we are interested in
determining if X1 is important robustly. That is, we are interested in whether X1 is
important not only in a general/summary kind of way, but also whether the importance of X1
is universally and consistently valid over each of the 2^(3-1) = 4 combinations of factors X2
and X3. These 4 combinations are (X2,X3) = (+,+), (+,-), (-,+), and (-,-). The robustness
factors on the horizontal axis change from one block plot to the next. For example, for the k
= 3 factor case:
1. the block plot targeting X1 will have robustness factors X2 and X3;
2. the block plot targeting X2 will have robustness factors X1 and X3;
3. the block plot targeting X3 will have robustness factors X1 and X2.
● Plot Character: The setting (- or +) for the target factor Xi. Each point in a block plot has an
associated setting for the target factor Xi. If Xi = "-", the corresponding plot point will be
"-"; if Xi = "+", the corresponding plot point will be "+".
For a particular combination of robustness factor settings (horizontally), there will be two points
plotted above it (vertically):
1. one plot point for Xi = "-"; and
2. the other plot point for Xi = "+".
In a block plot, these two plot points are surrounded by a box (a block) to focus the eye on the
internal within-block differences as opposed to the distraction of the external block-to-block
differences. Internal block differences reflect on the importance of the target factor (as desired).
External block-to-block differences reflect on the importance of various robustness factors, which
is not of primary interest.
Large within-block differences (that is, tall blocks) indicate a large local effect on the response
which, since all robustness factors are fixed for a given block, can only be attributed to the target
factor. This identifies an "important" target factor. Small within-block differences (small blocks)
indicate that the target factor Xi is unimportant.
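A sketch of the bookkeeping behind the block plot targeting X1 (the drawing itself is omitted; the grouping logic is our own rendering of the definition above):

    import numpy as np

    Y  = np.array([67, 79, 61, 75, 59, 90, 52, 87], dtype=float)
    X1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1])
    X2 = np.array([-1, -1, 1, 1, -1, -1, 1, 1])
    X3 = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

    # One block per (X2, X3) combination; each block holds Y at X1 = - and +
    for x2 in (-1, +1):
        for x3 in (-1, +1):
            block = (X2 == x2) & (X3 == x3)
            y_minus = Y[block & (X1 == -1)][0]
            y_plus  = Y[block & (X1 == +1)][0]
            print(f"(X2,X3)=({x2:+d},{x3:+d}): local X1 effect = "
                  f"{y_plus - y_minus:+.0f}")
    # All four local effects (+12, +31, +14, +35) are positive; their average
    # is 23, the X1 effect estimate.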
Plot for defective springs data: Applying the block plot to the defective springs data set yields the following plot.
How to interpret: From the block plot, we are looking for the following:
1. Important factors (including 2-factor interactions);
2. Best settings for these factors.
We will discuss each of these in turn.
Important factors (including 2-factor interactions):
Look at each of the k block plots. Within a given block plot,
Are the corresponding block heights consistently large as we scan across the within-plot
robustness factor settings (yes/no), and are the within-block sign patterns (+ above -, or -
above +) consistent across all robustness factor settings (yes/no)?
To facilitate intercomparisons, all block plots have the same vertical axis scale. Across such block
plots,
1. Which plot has the consistently largest block heights, along with consistent arrangement of
within-block +'s and -'s? This defines the "most important factor".
2. Which plot has the consistently next-largest block heights, along with consistent
arrangement of within-block +'s and -'s? This defines the "second most important factor".
3. Continue for the remaining factors.
This scanning and comparing of the k block plots easily leads to the identification of the most
important factors. This identification has the additional virtue over previous steps in that it is
robust. For a given important factor, the consistency of block heights and sign arrangement across
robustness factors gives additional credence to the robust importance of that factor. The factor is
important (the change in the response will be large) irrespective of what settings the robustness
factors have. For example, suppose the 4 block heights (local X1 effects) for the block plot
targeting X1 all turn out to be about 30 and all have the same sign. If factor X1 were in fact
unimportant, then from binomial considerations there would be only one chance in 2^(4-1) = 8
(that is, 12.5%) of the 4 local X1 effects having the same sign (i.e., all positive or all negative).
The usual statistical cutoff of 5% has not been achieved here, but the 12.5% is suggestive.
Further, the consistency of the 4 X1 effects (all near 30) is evidence of a robustness of the X1
effect over the settings of the other two factors. In summary, the above suggests:
1. Factor 1 is probably important (the issue of how large the effect has to be in order to be
considered important will be discussed in more detail in a later section); and
2. The estimated factor 1 effect is about 30 units.
On the other hand, suppose the 4 block heights for factor 1 vary in the following cyclic way:
(X2,X3) block height (= local X1 effect)
(+,+) 30
(+,-) 20
(-,+) 30
(-,-) 20
That is, the local X1 effect is
1. 30 whenever X3 = "+", and
2. 20 whenever X3 = "-".
When the factor X1 effect is not consistent, but in fact changes depending on the setting of factor
X3, then definitionally that is said to be an "X1*X3 interaction". That is precisely the case here,
and so our conclusions would be:
1. factor X1 is probably important;
2. the estimated factor X1 effect is 25 (= average of 30,20,30,and 20);
3. the X1*X3 interaction is probably important;
4. the estimated X1*X3 interaction is about 10 (= the change in the factor X1 effect as X3
changes = 30 - 20 = 10);
5. hence the X1*X3 interaction is less important than the X1 effect.
Note that we are using the term important in a qualitative sense here. More precise determinations
of importance in terms of statistical or engineering significance are discussed in later sections.
The block plot gives us the structure and the detail to allow such conclusions to be drawn and to
be understood. It is a valuable adjunct to the previous analysis steps.
Best settings:
After identifying important factors, it is also of use to determine the best settings for these factors.
As usual, best settings are determined for main effects only (since main effects are all that the
engineer can control). Best settings for interactions are not done because the engineer has no
direct way of controlling them.
In the block plot context, this determination of best factor settings is done simply by noting which
factor setting (+ or -) within each block is closest to that which the engineer is ultimately trying to
achieve. In the defective springs case, since the response variable is % acceptable springs, we are
clearly trying to maximize (as opposed to minimize, or hit a target) the response and the ideal
optimum point is 100%. Given this, we would look at the block plot of a given important factor
and note within each block which factor setting (+ or -) yields a data value closest to 100% and
then select that setting as the best for that factor.
From the defective springs block plots, we would thus conclude that
1. the best setting for factor 1 is +;
2. the best setting for factor 2 is -;
3. the best setting for factor 3 cannot be easily determined.
Conclusions for the defective springs data: In summary, applying the block plot to the defective springs data set results in the following conclusions:
1. Unranked list of important factors (including interactions):
❍ X1 is important;
❍ X2 is important;
❍ X1*X3 is important.
2. Best Settings:
(X1,X2,X3) = (+,-,?) = (+1,-1,?)
5. Process Improvement
5.5. Advanced topics
5.5.9. An EDA approach to experimental design
Motivation Definitionally, if a factor is unimportant, the "+" average will be approximately the same as
the "-" average, and if a factor is important, the "+" average will be considerably different
from the "-" average. Hence a plot that compares the "+" averages with the "-" averages
directly seems potentially informative.
From the definition above, the dex Youden plot is a scatter plot with the "+" averages on
the vertical axis and the "-" averages on the horizontal axis. Thus, unimportant factors will
tend to cluster in the middle of the plot and important factors will tend to be far removed
from the middle.
Because of an arithmetic identity which requires that the average of any corresponding "+"
and "-" means must equal the grand mean, all points on a dex Youden plot will lie on a -45
degree diagonal line. Or to put it another way, for each factor
average (+) + average (-) = constant (with constant = grand mean)
So
average (+) = constant - average (-)
Therefore, the slope of the line is -1 and all points lie on the line. Important factors will plot
well-removed from the center because average (+) = average (-) at the center.
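A sketch computing the dex Youden plot coordinates and checking the grand-mean identity (the assertion is our own check, not part of the plot):

    import numpy as np

    Y = np.array([67, 79, 61, 75, 59, 90, 52, 87], dtype=float)
    factors = {"X1": np.array([-1, 1, -1, 1, -1, 1, -1, 1]),
               "X2": np.array([-1, -1, 1, 1, -1, -1, 1, 1]),
               "X3": np.array([-1, -1, -1, -1, 1, 1, 1, 1])}
    grand_mean = Y.mean()                          # 71.25

    for name, x in factors.items():
        m_minus, m_plus = Y[x == -1].mean(), Y[x == 1].mean()
        # identity: the "+" and "-" means always average to the grand mean
        assert np.isclose((m_minus + m_plus) / 2, grand_mean)
        print(name, "(average(-), average(+)) =", (m_minus, m_plus))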
Plot for defective springs data: Applying the dex Youden plot to the defective springs data set yields the following plot.
Conclusions for the defective springs data: The application of the dex Youden plot to the defective springs data set results in the following conclusions:
1. Ranked list of factors (including interactions):
1. X1 (most important)
2. X1*X3 (next most important)
3. X2
4. other factors are of lesser importance
2. Separation of factors into important/unimportant categories:
❍ Important: X1, X1*X3, and X2
❍ Unimportant: the remainder
5. Process Improvement
5.5. Advanced topics
5.5.9. An EDA approach to experimental design
Motivation Because of the difference-of-means definition of the least squares estimates, and because of the
fact that all factors (and interactions) are standardized by taking on values of -1 and +1
(simplified to - and +), the resulting estimates are all on the same scale. Therefore, comparing and
ranking the estimates based on magnitude makes eminently good sense.
Moreover, since the sign of each estimate is completely arbitrary and will reverse depending on
how the initial assignments were made (e.g., we could assign "-" to treatment A and "+" to
treatment B or just as easily assign "+" to treatment A and "-" to treatment B), forming a ranking
based on magnitudes (as opposed to signed effects) is preferred.
Given that, the ultimate and definitive ranking of factor and interaction effects will be made based
on the ranked (magnitude) list of such least squares estimates. Such rankings are given
graphically, Pareto-style, within the plot; the rankings are given quantitatively by the tableau in
the upper right region of the plot. For the case when we have fractional (versus full) factorial
designs, the upper right tableau also gives the confounding structure for whatever design was
used.
If a factor is important, the "+" average will be considerably different from the "-" average, and so
the absolute value of the difference will be large. Conversely, unimportant factors have small
differences in the averages, and so the absolute value will be small.
We choose to form a Pareto chart of such |effects|. In the Pareto chart, the largest effects (= most
important factors) will be presented first (to the left) and then progress down to the smallest
effects (= least important factors) to the right.
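The ranking computation behind the Pareto chart is a one-line sort on magnitudes; a sketch using the effect values derived earlier for the springs data:

    effects = {"X1": 23.0, "X2": -5.0, "X3": 1.5,
               "X1*X2": 1.5, "X1*X3": 10.0, "X2*X3": 0.0}

    # Rank by magnitude (signs are arbitrary), largest first
    for name, e in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
        print(f"{name:6s} |effect| = {abs(e):4.1f}")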
Plot for defective springs data: Applying the |effects| plot to the defective springs data yields the following plot.
Conclusions for the defective springs data: The application of the |effects| plot to the defective springs data set results in the following conclusions:
1. Ranked list of factors (including interactions):
1. X1 (most important)
2. X1*X3 (next most important)
3. X2
4. other factors are of lesser importance
2. Separation of factors into important/unimportant categories:
❍ Important: X1, X1*X3, and X2
❍ Unimportant: the remainder
5. Process Improvement
5.5. Advanced topics
5.5.9. An EDA approach to experimental design
5.5.9.7. |Effects| plot
ANOVA The virtue of ANOVA is that it is a powerful, flexible tool with many
applications. The drawback of ANOVA is that
● it is heavily quantitative and non-intuitive;
t confidence intervals: Confidence intervals for the effects, using the t-distribution, are also
heavily used for determining factor significance. As part of the t approach, one first needs to
determine sd(effect), the standard deviation of an effect. For 2-level full and fractional factorial
designs, such a standard deviation is related to sigma, the standard deviation of an observation
under fixed conditions, via the formula:
sd(effect) = 2*sigma/sqrt(n)
with n the total number of runs.
Non-statistical considerations: The above statistical methods can and should be used.
Additionally, the non-statistical considerations discussed in the next few sections are
frequently insightful in practice and have their place in the EDA approach as advocated here.
5. Process Improvement
5.5. Advanced topics
5.5.9. An EDA approach to experimental design
5.5.9.7. |Effects| plot
Apply with caution: As with any rule-of-thumb, some caution should be used in applying
this criterion. Specifically, if the largest effect is in fact not very large,
this rule-of-thumb may not be useful.
Factor labels: As a consequence of this "kinking", note the labels on the far-right
margin of the plot. Factors to the left of and above the kink point tend to
have far-right labels that are distinct and isolated. Factors at, to the right of, and
below the kink point tend to have far-right labels that are overstruck and
hard to read. A (rough) rule-of-thumb would then be to declare as
important those factors/interactions whose far-right labels are easy to
distinguish, and to declare as unimportant those factors/interactions
whose far-right labels are overwritten and hard to distinguish.
5. Process Improvement
5.5. Advanced topics
5.5.9. An EDA approach to experimental design
If the design is a fractional factorial, the confounding structure is provided for main effects
and 2-factor interactions.
Motivation: To provide a rationale for the half-normal probability plot, we first discuss the motivation for the
normal probability plot (which also finds frequent use in these 2-level designs).
The basis for the normal probability plot is the mathematical form for each (and all) of the
estimated effects. As discussed for the |effects| plot, the estimated effects are the optimal least
squares estimates. Because of the orthogonality of the 2^k full factorial and the 2^(k-p) fractional
factorial designs, all least squares estimators for main effects and interactions simplify to the
form:
estimated effect = average(+) - average(-)
with average(+) the average of all response values for which the factor or interaction takes on a
"+" value, and average(-) the average of all response values for which the factor or interaction
takes on a "-" value.
Under rather general conditions, the Central Limit Theorem allows that this
difference-of-averages form for the estimated effects tends to follow a normal distribution
(for a large enough sample size n).
The question arises as to what normal distribution; that is, a normal distribution with what mean
and what standard deviation? Since all estimators have an identical form (a difference of
averages), the standard deviations, though unknown, will in fact be the same under the
assumption of constant sigma. This is good in that it simplifies the normality analysis.
As for the means, however, there will be differences from one effect to the next, and these
differences depend on whether a factor is unimportant or important. Unimportant factors are
those that have near-zero effects and important factors are those whose effects are considerably
removed from zero. Thus, unimportant effects tend to have a normal distribution centered
near zero while important effects tend to have a normal distribution centered at their
respective true large (but unknown) effect values.
In the simplest experimental case, if the experiment were such that no factors were
important (that is, all effects were near zero), the (n-1) estimated effects would behave like
random drawings from a normal distribution centered at zero. We can test for such
normality (and hence test for a null-effect experiment) by using the normal probability plot.
Normal probability plots are easy to interpret. In simplest terms:
if linear, then normal
If the normal probability plot of the (n-1) estimated effects is linear, this implies that all of
the true (unknown) effects are zero or near-zero. That is, no factor is important.
On the other hand, if the truth behind the experiment is that there is exactly one factor that
was important (that is, significantly non-zero), and all remaining factors are unimportant
(that is, near-zero), then the normal probability plot of all (n-1) effects is near-linear for the
(n-2) unimportant factors and the remaining single important factor would stand well off
the line.
Similarly, if the experiment were such that some subset of factors were important and all
remaining factors were unimportant, then the normal probability plot of all (n-1) effects
would be near-linear for all unimportant factors with the remaining important factors all
well off the line.
In real life, with the number of important factors unknown, this suggests that one could
form a normal probability plot of the (n-1) estimated effects and draw a line through those
(unimportant) effects in the vicinity of zero. The remaining effects, well off the line, are then
identified and declared important.
The above rationale and methodology works well in practice, with the net effect that the
normal probability plot of the effects is an important, commonly used, and successfully
employed tool for identifying important factors in 2-level full and fractional factorial experiments.
Following the lead of Cuthbert Daniel (1976), we augment the methodology and arrive at a
further improvement. Specifically, the sign of each estimate is completely arbitrary and will
reverse depending on how the initial assignments were made (e.g., we could assign "-" to
treatment A and "+" to treatment B or just as easily assign "+" to treatment A and "-" to
treatment B).
This arbitrariness is addressed by dealing with the effect magnitudes rather than the signed
effects. If the signed effects follow a normal distribution, the absolute values of the effects
follow a half-normal distribution.
In this new context, one tests for important versus unimportant factors by generating a
half-normal probability plot of the absolute value of the effects. As before, linearity implies
half-normality, which in turn implies all factors are unimportant. More typically, however,
the half-normal probability plot will be only partially linear. Unimportant (that is,
near-zero) effects manifest themselves as being near zero and on a line while important
(that is, large) effects manifest themselves by being off the line and well-displaced from zero.
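A sketch of the half-normal probability plot of the |effects| (SciPy's halfnorm quantiles with (i - 0.5)/m plotting positions; that plotting-position convention is our choice, not mandated by the text):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import halfnorm

    effects = {"X1": 23.0, "X2": -5.0, "X3": 1.5,
               "X1*X2": 1.5, "X1*X3": 10.0, "X2*X3": 0.0}
    abs_eff = np.sort(np.abs(list(effects.values())))  # ordered |effects|
    m = len(abs_eff)
    q = halfnorm.ppf((np.arange(1, m + 1) - 0.5) / m)  # theoretical quantiles

    plt.plot(q, abs_eff, "o")  # off-line, well-displaced points -> important
    plt.xlabel("half-normal quantiles")
    plt.ylabel("ordered |effects|")
    plt.show()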
Plot for defective springs data: The half-normal probability plot of the effects for the defective springs data set is as follows.
How to interpret: From the half-normal probability plot, we look for the following:
1. Identifying Important Factors:
Determining the subset of important factors is the most important task of the half-normal
probability plot of |effects|. As discussed above, the estimated |effect| of an unimportant
factor will typically be on or close to a near-zero line, while the estimated |effect| of an
important factor will typically be displaced well off the line.
The separation of factors into important/unimportant categories is thus done by answering
the question:
Which points on the half-normal probability plot of |effects| are large and well-off
the linear collection of points drawn in the vicinity of the origin?
This line of unimportant factors typically encompasses the majority of the points on the
plot. The procedure consists, therefore, of the following:
1. identifying this line of near-zero (unimportant) factors; then
2. declaring the remaining off-line factors as important.
Note that the half-normal probability plot of |effects| and the |effects| plot have the same
vertical axis; namely, the ordered |effects|, so the following discussion about right-margin
factor identifiers is relevant to both plots. As a consequence of the natural on-line/off-line
segregation of the |effects| in half-normal probability plots, factors off-line tend to have
far-right labels that are distinct and isolated while factors near the line tend to have
far-right labels that are overstruck and hard to read. The rough rule-of-thumb would then
be to declare as important those factors/interactions whose far-right labels are easy to
distinguish and to declare as unimportant those factors/interactions whose far-right labels
are overwritten and hard to distinguish.
2. Ranked List of Factors (including interactions):
This is a minor objective of the half-normal probability plot (it is better done via the
|effects| plot). To determine the ranked list of factors from a half-normal probability plot,
simply scan the vertical axis |effects|:
1. Which |effect| is largest? Note the factor identifier associated with this largest |effect|
(this is the "most important factor").
2. Which |effect| is next in size? Note the factor identifier associated with this next
largest |effect| (this is the "second most important factor").
3. Continue for the remaining factors. In practice, the bottom end of the ranked list (the
unimportant factors) will be hard to extract because of overstriking, but the top end
of the ranked list (the important factors) will be easy to determine.
In summary, it should be noted that since the signs of the estimated effects are arbitrary, we
recommend the use of the half-normal probability plot of |effects| technique over the normal
probability plot of the |effects|. These probability plots are among the most commonly-employed
EDA procedures for identification of important factors in 2-level full and fractional factorial
designs. The half-normal probability plot enjoys widespread usage across both "classical" and
Taguchi camps.
It deservedly plays an important role in our recommended 10-step graphical procedure for the
analysis of 2-level designed experiments.
Conclusions for the defective springs data: The application of the half-normal probability plot to the defective springs data set results in the following conclusions:
1. Ranked list of factors (including interactions):
1. X1 (most important)
2. X1*X3 (next most important)
3. X2
4. other factors are of lesser importance
2. Separation of factors into important/unimportant categories:
Important: X1, X1*X3, and X2
Unimportant: the remainder
5. Process Improvement
5.5. Advanced topics
5.5.9. An EDA approach to experimental design
are "close" to the observed raw data values Y. To this end, two tasks exist:
1. Determine a good functional form f;
2. Determine good estimates for the coefficients in that function f.
For example, if we had two factors X1 and X2, our goal would be to
1. determine some function Y = f(X1,X2); and
2. estimate the parameters in f
such that the resulting model would yield predicted values that are as close as possible to the
observed response values Y. If the form f has been wisely chosen, a good model will result and
that model will have the characteristic that the differences ("residuals" = Y - Ŷ) will be uniformly
near zero. On the other hand, a poor model (from a poor choice of the form f) will have the
characteristic that some or all of the residuals will be "large".
For a given model, a statistic that summarizes the quality of the fit via the typical size of the n
residuals is the residual standard deviation:
s_res = sqrt( (sum over i of r_i^2) / (n - p) )
with p denoting the number of terms in the model (including the constant term) and r_i denoting
the i-th residual. We are also assuming that the mean of the residuals is zero, which will be the
case for models with a constant term that are fit using least squares.
If we have a good-fitting model, s_res will be small. If we have a poor-fitting model, s_res will be
large.
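A direct transcription of this statistic (a sketch; the fitted values are supplied by the caller):

    import numpy as np

    def residual_sd(y, y_pred, p):
        """s_res = sqrt(sum(r_i^2) / (n - p)), with p = number of model terms
        (including the constant)."""
        r = np.asarray(y, float) - np.asarray(y_pred, float)
        return np.sqrt(np.sum(r**2) / (len(r) - p))

    Y = np.array([67, 79, 61, 75, 59, 90, 52, 87], dtype=float)
    # Baseline model: predict the grand mean for every run (p = 1 term)
    print(residual_sd(Y, np.full(8, Y.mean()), p=1))   # -> 13.7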
For a given data set, each proposed model has its own quality of fit, and hence its own residual
standard deviation. Clearly, the residual standard deviation is more of a model-descriptor than a
data-descriptor. Whereas "nature" creates the data, the analyst creates the models. Theoretically,
for the same data set, it is possible for the analyst to propose an indefinitely large number of
models.
In practice, however, an analyst usually forwards only a small, finite number of plausible models
for consideration. Each model will have its own residual standard deviation. The cumulative
residual standard deviation plot is simply a graphical representation of this collection of residual
standard deviations for various models. The plot is beneficial in that
1. good models are distinguished from bad models;
2. simple good models are distinguished from complicated good models.
In summary, then, the cumulative residual standard deviation plot is a graphical tool to help
assess
1. which models are poor (least desirable); and
2. which models are good but complex (more desirable); and
3. which models are good and simple (most desirable).
Output The outputs from the cumulative residual standard deviation plot are
1. Primary: A good-fitting prediction equation consisting of an additive constant plus the
most important main effects and interactions.
2. Secondary: The residual standard deviation for this good-fitting model.
Plot for defective springs data: Applying the cumulative residual standard deviation plot to the defective springs data set yields the following plot.
How to interpret: As discussed in detail under question 4 in the Motivation section, the cumulative residual
standard deviation "curve" will characteristically decrease left to right as we add more terms to
the model. The incremental improvement (decrease) tends to be large at the beginning when
important factors are being added, but then the decrease tends to be marginal at the end as
unimportant factors are being added.
Including all terms would yield a perfect fit (residual standard deviation = 0) but would also result
in an unwieldy model. Including only the first term (the average) would yield a simple model
(only one term!) but typically will fit poorly. Although a formal quantitative stopping rule can be
developed based on statistical theory, a less-rigorous (but good) alternative stopping rule that is
graphical, easy to use, and highly effective in practice is as follows:
Keep adding terms to the model until the curve's "elbow" is encountered. The "elbow
point" is that value in which there is a consistent, noticeably shallower slope (decrease) in
the curve. Include all terms up to (and including) the elbow point (after all, each of these
included terms decreased the residual standard deviation by a large amount). Exclude any
terms after the elbow point since all such successive terms decreased the residual standard
deviation so slowly that the terms were "not worth the complication of keeping".
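A sketch reproducing the cumulative curve for the springs data by adding terms in the order used in the walk-through below and refitting by least squares at each step (NumPy's lstsq; the term ordering follows the text):

    import numpy as np

    Y  = np.array([67, 79, 61, 75, 59, 90, 52, 87], dtype=float)
    X1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
    X2 = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
    X3 = np.array([-1, -1, -1, -1, 1, 1, 1, 1], dtype=float)

    terms = [("average", np.ones(8)), ("X1", X1), ("X1*X3", X1 * X3),
             ("X2", X2), ("X3", X3), ("X1*X2", X1 * X2),
             ("X2*X3", X2 * X3), ("X1*X2*X3", X1 * X2 * X3)]

    cols = []
    for p, (name, col) in enumerate(terms, start=1):
        cols.append(col)
        A = np.column_stack(cols)                      # model with p terms
        coef, *_ = np.linalg.lstsq(A, Y, rcond=None)   # least squares refit
        r = Y - A @ coef
        rsd = np.sqrt(np.sum(r**2) / (8 - p)) if p < 8 else 0.0
        print(f"+ {name:9s} p={p}  rsd={rsd:5.2f}")
    # First steps: 13.72, 6.58, 3.45, 1.54 -- matching the walk-through below.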
From the residual standard deviation plot for the defective springs data, we note the following:
1. The residual standard deviation (rsd) for the "baseline" model Ŷ = 71.25 (the average
alone) is s_res = 13.7.
2. As we add the next term, X1, the rsd drops nearly 7 units (from 13.7 to 6.6).
3. If we add the term X1*X3, the rsd drops another 3 units (from 6.6 to 3.4).
4. If we add the term X2, the rsd drops another 2 units (from 3.4 to 1.5).
5. When the term X3 is added, the reduction in the rsd (from about 1.5 to 1.3) is negligible.
6. Thereafter to the end, the total reduction in the rsd is from only 1.3 to 0.
In step 5, note that when we have effects of equal magnitude (the X3 effect is equal to the X1*X2
interaction effect), we prefer including a main effect before an interaction effect and a
lower-order interaction effect before a higher-order interaction effect.
In this case, the "kink" in the residual standard deviation curve is at the X2 term. Prior to that, all
added terms (including X2) reduced the rsd by a large amount (7, then 3, then 2). After the
addition of X2, the reduction in the rsd was small (all less than 1): .2, then .8, then .5, then 0.
The final recommended model in this case thus involves p = 4 terms:
1. the average (= 71.25)
2. factor X1
3. the X1*X3 interaction
4. factor X2
The fitted model thus takes on the form
Ŷ = average + 0.5*(B1*X1 + B13*X1*X3 + B2*X2)
The motivation for using the 0.5 term was given in an earlier section.
The least squares estimates for the coefficients in this model are
average = 71.25
B1 = 23
B13 = 10
B2 = -5
The B1 = 23, B13 = 10, and B2 = -5 least squares values are, of course, identical to the estimated
effect values E1 = 23, E13 = 10, and E2 = -5 (each effect being the mean response at the +1 setting
minus the mean response at the -1 setting) as previously derived in step 7 of this recommended
10-step DEX analysis procedure.
The final fitted model is thus
Ŷ = 71.25 + 0.5*(23*X1 + 10*X1*X3 - 5*X2)
Applying this prediction equation to the 8 design points yields predicted values Ŷ that are close
to the data Y, and residuals (Res = Y - Ŷ) that are close to zero:
X1 X2 X3 Y Ŷ Res
- - - 67 67.25 -0.25
+ - - 79 80.25 -1.25
- + - 61 62.25 -1.25
+ + - 75 75.25 -0.25
- - + 59 57.25 +1.75
+ - + 90 90.25 -0.25
- + + 52 52.25 -0.25
+ + + 87 85.25 +1.75
Computing the residual standard deviation:
sres = sqrt( SUM(Res^2) / (n - p) )
with n = number of data points = 8, and p = 4 = number of estimated coefficients (including the
average) yields
sres = 1.54 (= 1.5 if rounded to 1 decimal place)
This detailed sres = 1.54 calculation brings us full circle, for 1.54 is the value given above the X3
term on the cumulative residual standard deviation plot.
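The arithmetic can be verified with a few lines of Python; this is only an illustrative sketch using the fitted model above:

    import numpy as np

    # Defective springs design points (X1, X2, X3) and observed responses Y
    X = np.array([[-1,-1,-1],[ 1,-1,-1],[-1, 1,-1],[ 1, 1,-1],
                  [-1,-1, 1],[ 1,-1, 1],[-1, 1, 1],[ 1, 1, 1]])
    Y = np.array([67, 79, 61, 75, 59, 90, 52, 87], dtype=float)

    # Fitted model: Yhat = 71.25 + 0.5*(23*X1 + 10*X1*X3 - 5*X2)
    Yhat = 71.25 + 0.5*(23*X[:,0] + 10*X[:,0]*X[:,2] - 5*X[:,1])
    res = Y - Yhat                            # matches the Res column above
    n, p = len(Y), 4                          # 8 runs, 4 terms (incl. average)
    sres = np.sqrt(np.sum(res**2) / (n - p))
    print(round(sres, 2))                     # 1.54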
Conclusions for the defective springs data
The application of the cumulative residual standard deviation plot to the defective springs data
set results in the following conclusions:
1. Good-fitting parsimonious (constant + 3 terms) model:
Ŷ = 71.25 + 0.5*(23*X1 + 10*X1*X3 - 5*X2)
2. Residual standard deviation for this model (as a measure of the goodness-of-fit for the
model):
sres = 1.54
Predicted values and residuals
For given settings of the factors X1 to Xk, a fitted model will yield predicted values Ŷ. For each
(and every) setting of the Xi's, a "perfect-fit" model is one in which the predicted values are
identical to the observed responses Y at these Xi's. In other words, a perfect-fit model would
yield a vector of predicted values identical to the observed vector of response values. For these
same Xi's, a "good-fitting" model is one that yields predicted values "acceptably near", but not
necessarily identical to, the observed responses Y.
The residuals (= deviations = errors) of a model are the vector of differences (Y - Ŷ) between the
responses and the predicted values from the model. For a perfect-fit model, the vector of
residuals would be all zeros. For a good-fitting model, the vector of residuals will be acceptably
(from an engineering point of view) close to zero.
Sum of absolute residuals
Since a model's adequacy is inversely related to the size of its residuals, one obvious statistic is
the sum of the absolute residuals:
SUM |ri|
Clearly, for a fixed n, the smaller this sum is, the smaller are the residuals, which implies the
closer the predicted values are to the raw data Y, and hence the better the fitted model. The
primary disadvantage of this statistic is that it may grow larger simply as the sample size n
grows larger.
Average absolute residual
A better metric that does not change (much) with increasing sample size is the average absolute
residual:
(1/n) * SUM |ri|
with n denoting the number of response values. Again, small values for this statistic imply
better-fitting models.
Square root of the average squared residual
An alternative, but similar, metric that has better statistical properties is the square root of the
average squared residual:
sqrt( (1/n) * SUM ri^2 )
As with the previous statistic, the smaller this statistic, the better the model.
Residual standard deviation
Our final metric, which is used directly in inferential statistics, is the residual standard deviation:
sres = sqrt( SUM ri^2 / (n - p) )
with p denoting the number of terms in the model, including the constant term.
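The four summaries can be computed side by side; the following Python sketch (function name ours) makes the contrast concrete:

    import numpy as np

    def fit_summaries(y, y_pred, p):
        """The four residual summaries discussed above."""
        r = np.asarray(y, dtype=float) - np.asarray(y_pred, dtype=float)
        n = len(r)
        return {
            "sum of |residuals|": np.sum(np.abs(r)),          # grows with n
            "average |residual|": np.mean(np.abs(r)),         # stable in n
            "rms residual":       np.sqrt(np.mean(r**2)),
            "residual std dev":   np.sqrt(np.sum(r**2) / (n - p)),
        }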
The simplest possible model is the trivial one consisting solely of the average: Ŷ = Ȳ. The
predicted values for this fitted trivial model are thus given by a vector consisting of the same
value (namely Ȳ) throughout. The residual vector for this model will thus simplify to simple
deviations from the mean:
ri = Yi - Ȳ
In general, this trivial model will not fit well enough (because the common value Ȳ will not be
close enough to some of the raw responses Y), and so the model must be augmented--but how?
Next-step model
The logical next-step proposed model will consist of the above additive constant plus some term
that will improve the predicted values the most. This will equivalently reduce the residuals the
most and thus reduce the residual standard deviation the most.
Using the most important effects
As it turns out, it is a mathematical fact that the factor or interaction that has the largest
estimated effect will necessarily, after being included in the model, yield the "biggest bang for
the buck" in terms of improving the predicted values toward the response values Y. Hence at this
point the model-building process and the effect estimation process merge.
In the previous steps in our analysis, we developed a ranked list of
factors and interactions. We thus have a ready-made ordering of the
terms that could be added, one at a time, to the model. This ranked list
of effects is precisely what we need to cumulatively build more
complicated, but better fitting, models.
Step through the ranked list of factors
Our procedure will thus be to step through, one by one, the ranked list of effects, cumulatively
augmenting our current model by the next term in the list, and then compute (for all n design
points) the predicted values, residuals, and residual standard deviation. We continue this
one-term-at-a-time augmentation until the predicted values are acceptably close to the observed
responses Y (and hence the residuals and residual standard deviation become acceptably close to
zero).
Starting with the simple average, each cumulative model in this iteration
process will have its own associated residual standard deviation. In
practice, the iteration continues until the residual standard deviations
become sufficiently small.
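A minimal Python sketch of this step-through procedure follows; the argument names are ours, and ranked_terms is assumed to already be ordered by the magnitude of the estimated effects:

    import numpy as np

    def cumulative_rsd(term_columns, y, ranked_terms):
        """Augment the model one ranked term at a time and record the
        residual standard deviation of each cumulative least squares fit."""
        y = np.asarray(y, dtype=float)
        n = len(y)
        cols = [np.ones(n)]                  # start with the average alone
        rsds = []
        for name in [None] + list(ranked_terms):
            if name is not None:
                cols.append(term_columns[name])   # add the next ranked term
            A = np.column_stack(cols)
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            r = y - A @ coef
            p = A.shape[1]
            rsds.append(np.sqrt(np.sum(r**2) / (n - p)) if n > p else 0.0)
        return rsds        # plot left to right and look for the "elbow"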
Stopping rule
The cumulative residual standard deviation plot provides a visual answer to the question:
What is a good model?
by answering the related question:
When do we stop adding terms to the cumulative model?
Graphically, the most common stopping rule for adding terms is to
cease immediately upon encountering the "elbow". We include all
terms up to and including the elbow point since each of these terms
decreased the residual standard deviation by a large amount. However,
we exclude any terms afterward since these terms do not decrease the
residual standard deviation fast enough to warrant inclusion in the
model.
Ordered linear combination
The listing above has the terms ordered with the main effects, then the 2-factor interactions, then
the 3-factor interactions, etc. In practice, it is recommended that the terms be ordered by
importance (whether they be main effects or interactions). Aside from providing a functional
representation of the response, models should help reinforce what is driving the response, which
such a re-ordering does. Thus for k = 2, if factor 2 is most important, the 2-factor interaction is
next in importance, and factor 1 is least important, then it is recommended that the above
ordering of
Ŷ = B0 + B1*X1 + B2*X2 + B12*X1*X2
be rewritten as
Ŷ = B0 + B2*X2 + B12*X1*X2 + B1*X1
Effect
For factor X2, the effect E2 is defined as the change in the mean response as we proceed from
the "-" setting of the factor to the "+" setting of the factor. Symbolically:
E2 = mean(Y at X2 = +1) - mean(Y at X2 = -1)
Note that the estimated effect E2 value does not involve a model per se, and is definitionally
invariant to any other factors and interactions that may affect the response. We examined and
derived the factor effects E in the previous steps of the general DEX analysis procedure.
On the other hand, the estimated coefficient B2 in a model is defined as the value that results
when we place the model into the least squares fitting process (regression). The value returned
for B2 depends, in general, on the form of the model, on what other terms are included in the
model, and on the experimental design that was run. The least squares estimate for B2 is mildly
complicated since it involves a behind-the-scenes design matrix multiplication and inversion.
The coefficient values B that result are generally obscured by the mathematics needed to give
the coefficients the collective property that the fitted model as a whole yields a minimum sum of
squared deviations ("least squares").
Why is 1/2 the appropriate multiplicative term in these orthogonal models?
Given the computational simplicity of orthogonal designs, why then is 1/2 the appropriate
multiplicative constant? Why not 1/3, 1/4, etc.? To answer this, we revisit our specified desire
that
when we view the final fitted model and look at the coefficient associated with X2, say,
we want the value of the coefficient B2 to reflect identically the expected total change
(ΔY) in the response Y as we proceed from the "-" setting of X2 to the "+" setting of X2
(that is, we would like the estimated coefficient B2 to be identical to the estimated effect
E2 for factor X2).
Thus in glancing at the final model with this form, the coefficients B of the model will
immediately reflect not only the relative importance of the coefficients, but will also reflect
(absolutely) the effect of the associated term (main effect or interaction) on the response.
In general, the least squares estimate of a coefficient in a linear model will yield a coefficient
that is essentially a slope:
B = (change in response) / (change in factor levels)
associated with a given factor X. Thus in order to achieve the desired interpretation of the
coefficients B as being the raw change in the response (ΔY), we must account for and remove
the change in X (ΔX).
With the main factors coded as -1 and +1, the products that define the interaction terms obey
-1*-1 = +1
-1*+1 = -1
+1*-1 = -1
+1*+1 = +1
and hence the set of values that the 2-factor interactions (and all interactions) take on is the
closed set {-1,+1}.
This -1 and +1 notation is superior in its consistency to the (1,2) notation of Taguchi, in which
1*1 = 1
1*2 = 2
2*1 = 2
2*2 = 4
which yields the inconsistent set {1,2,4}. To circumvent this, we would need to replace
multiplication with modular multiplication (see page 440 of Ryan (2000)). Hence, with the
-1,+1 values for the main factors, we also have -1,+1 values for all interactions, which in turn
yields (for all terms) a consistent ΔX of
ΔX = (+1) - (-1) = +2
In summary then,
B = (ΔY) / (ΔX) = (ΔY) / 2 = (1/2) * (ΔY)
and so to achieve our goal of having the final coefficients reflect ΔY only, we simply gather up
all of the 2's in the denominator and create a leading multiplicative constant of 1/2.
Example for the k = 1 case
For example, for the trivial k = 1 case, the obvious model
Y = intercept + slope*X1 = c + (ΔY/ΔX)*X1
becomes
Y = c + (1/ΔX)*(ΔY)*X1
or simply
Y = c + (1/2)*(ΔY)*X1
Y = c + (1/2)*(factor 1 effect)*X1
Y = c + (1/2)*(B*)*X1, with B* = 2B = E
This k = 1 factor result is easily seen to extend to the general k-factor case.
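The coefficient/effect relationship is easy to check numerically. The following Python sketch fits the document's 2^2 example (responses 2, 4, 6, 8) by ordinary least squares and confirms that the raw slope is half the estimated effect:

    import numpy as np

    X1 = np.array([-1,  1, -1,  1])
    X2 = np.array([-1, -1,  1,  1])
    Y  = np.array([ 2.,  4.,  6.,  8.])

    # Estimated effects: mean response at "+" minus mean response at "-"
    E1 = Y[X1 == 1].mean() - Y[X1 == -1].mean()   # 2.0
    E2 = Y[X2 == 1].mean() - Y[X2 == -1].mean()   # 4.0

    # Least squares fit of Y = c + B1*X1 + B2*X2 (no 1/2 convention)
    A = np.column_stack([np.ones(4), X1, X2])
    c, B1, B2 = np.linalg.lstsq(A, Y, rcond=None)[0]
    # The raw slopes are half the effects, since deltaX = (+1) - (-1) = 2
    print(np.isclose(B1, E1/2), np.isclose(B2, E2/2))   # True True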
Example
To illustrate the latter point in detail, suppose the (-1,+1) factor X1 is really a coding of
temperature T, with the original temperature ranging from 300 to 350 degrees, and the (-1,+1)
factor X2 is really a coding of time t, with the original time ranging from 20 to 30 minutes.
Given that, a linear model in the original temperature T and time t would yield coefficients
whose magnitudes depend on the magnitudes of T (300 to 350) and t (20 to 30), and whose
values would change if we decided to change the units of T (e.g., from Fahrenheit degrees to
Celsius degrees) or t (e.g., from minutes to seconds). All of this is avoided by carrying out the
fit not in the original units for T (300, 350) and t (20, 30), but in the coded units of X1 (-1,+1)
and X2 (-1,+1). The resulting coefficients are unit-invariant, and thus the coefficient magnitudes
reflect the true contribution of the factors and interactions without regard to the unit of
measurement.
Coding does not lead to loss of generality
Such coding leads to no loss of generality since the coded factor may be expressed as a simple
linear relation of the original factor (X1 to T, X2 to t). The unit-invariant coded coefficients may
be easily transformed to unit-sensitive original coefficients if so desired.
Table of fitted values
Substituting the values for the four design points into this equation yields the following fitted
values:
X1 X2 Y Ŷ
- - 2 2
+ - 4 4
- + 6 6
+ + 8 8
Perfect fit
This is a perfect-fit model. Such perfect-fit models will result anytime (in this orthogonal
2-level design family) we include all main effects and all interactions. Remarkably, this is true
not only for k = 2 factors, but for general k.
Residuals
For a given model (any model), the difference between the response value Y and the predicted
value Ŷ is referred to as the "residual":
residual = Y - Ŷ
The perfect-fit full-blown (all main factors and all interactions of all orders) models will have
all residuals identically zero. The perfect fit is a mathematical property that comes if we choose
to use the linear model with all possible terms.
Price for perfect fit
What price is paid for this perfect fit? One price is that the variance of Ŷ is increased
unnecessarily. In addition, we have a non-parsimonious model. We must compute and carry the
average and the coefficients of all main effects and all interactions. Including the average, there
will in general be 2^k coefficients to fully describe the fitting of the n = 2^k points. This is very
much akin to the Y = f(X) polynomial fitting of n distinct points. It is well known that this may
be done "perfectly" by fitting a polynomial of degree n-1. It is comforting to know that such
perfection is mathematically attainable, but in practice do we want to do this all the time or even
anytime? The answer is generally "no" for two reasons:
1. Noise: It is very common that the response data Y has noise (= error) in it. Do we want to
go out of our way to fit such noise? Or do we want our model to filter out the noise and
just fit the "signal"? For the latter, fewer coefficients may be in order, in the same spirit
that we may forego a perfect-fitting (but jagged) 11th-degree polynomial to 12 data
points, and opt instead for an imperfect (but smoother) 3rd-degree polynomial fit to the
12 points.
2. Parsimony: For full factorial designs, to fit the n = 2^k points we would need to compute
2^k coefficients. We gain information by noting the magnitude and sign of such
coefficients, but numerically we have n data values Y as input and n coefficients B as
output, and so no numerical reduction has been achieved. We have simply used one set of
n numbers (the data) to obtain another set of n numbers (the coefficients). Not all of these
coefficients will be equally important. At times that importance becomes clouded by the
sheer volume of the n = 2^k coefficients. Parsimony suggests that our result should be
simpler and more focused than our n starting points. Hence fewer retained coefficients
are called for.
The net result is that in practice we almost always give up the perfect, but unwieldy,
model for an imperfect, but parsimonious, model.
Imperfect fit
The above calculations illustrated the computation of predicted values for the full model. On the
other hand, as discussed above, it will generally be convenient for signal or parsimony purposes
to deliberately omit some unimportant factors. When the analyst chooses such a model, we note
that the methodology for computing predicted values is precisely the same. In such a case,
however, the resulting predicted values will in general not be identical to the original response
values Y; that is, we no longer obtain a perfect fit. Thus, linear models that omit some terms will
have virtually all non-zero residuals.
A good fit at the design points does not guarantee that the derived model will be good at some
distance from the design points.
Applies only for continuous factors
Of course, all such discussion of interpolation and extrapolation makes sense only in the context
of continuous ordinal factors such as temperature, time, pressure, size, etc. Interpolation and
extrapolation make no sense for discrete non-ordinal factors such as suppliers, operators, design
types, etc.
Example
For example, for the k = 2 factor (temperature (300 to 350) and time (20 to 30)) experiment
discussed in the previous sections, the usual 4-run 2^2 full factorial design may be replaced by
the following 5-run 2^2 full factorial design with a center point.
X1 X2 Y
- - 2
+ - 4
- + 6
+ + 8
0 0
Predicted value for the center point
Since "-" stands for -1 and "+" stands for +1, it is natural to code the center point as (0,0). Using
the recommended model
Ŷ = 5 + 1*X1 + 2*X2
the predicted value at the center point is Ŷ = 5 + 1*(0) + 2*(0) = 5.
Replicated center points
In practice, this center point value frequently has two, or even three or more, replications. This
not only provides a reference point for assessing the interpolative power of the model at the
center, but it also allows us to compute model-free estimates of the natural error in the data. This
in turn allows us a more rigorous method for computing the uncertainty for individual
coefficients in the model and for rigorously carrying out a lack-of-fit test for assessing general
model adequacy.
Predicting the response for the interpolated point
The important next step is to convert the raw (in units of the original factors T and t)
interpolation point into a coded (in units of X1 and X2) interpolation point. From the graph or
otherwise, we note that a linear translation between T and X1, and between t and X2, yields
T = 300 => X1 = -1
T = 350 => X1 = +1
thus
X1 = 0 is at T = 325
|-------------|-------------|
-1 ? 0 +1
300 310 325 350
and so
T = 310 => X1 = -0.6
Similarly,
t = 20 => X2 = -1
t = 30 => X2 = +1
therefore,
X2 = 0 is at t = 25
|-------------|-------------|
-1 0 ? +1
20 25 26 30
thus
t = 26 => X2 = +0.2
Substituting X1 = -0.6 and X2 = +0.2 into the prediction equation
Ŷ = 5 + 1*X1 + 2*X2
yields the predicted value
Ŷ = 5 + 1*(-0.6) + 2*(+0.2) = 4.8
Graphical representation of response value for interpolated data point [figure]
Pseudo-data
The predicted values from the modeling effort may be viewed as pseudo-data: data obtained
without the experimental effort. Such "free" data can add tremendously to the insight via the
application of graphical techniques (in particular, the contour plots), and can add significant
insight and understanding as to the nature of the response surface relating Y to the X's.
But, again, a final word of caution: the "pseudo-data" that result from the modeling process are
exactly that, pseudo-data. They are not real data, and so the model and the model's predicted
values must be validated by additional confirmatory (real) data points. A more balanced
approach is that:
Models may be trusted as "real" [that is, to generate predicted values and contour
curves], but must always be verified [that is, by the addition of confirmatory data points].
The rule of thumb is thus to take advantage of the available and recommended model-building
mechanics for these 2-level designs, but treat the resulting derived model with an equal dose of
both optimism and caution.
5. Process Improvement
5.5. Advanced topics
5.5.9. An EDA approach to experimental design
5.5.9.10. DEX contour plot
More specifically, the dex contour plot is constructed and utilized via the following 7 steps:
1. Axes
2. Contour Curves
3. Optimal Response Value
4. Best Corner
5. Steepest Ascent/Descent
6. Optimal Curve
7. Optimal Setting
with
1. Axes: Choose the two most important factors in the experiment as the two axes on the plot.
2. Contour Curves: Based on the fitted model and the best data settings for all of the
remaining factors, draw contour curves involving the two dominant factors. This yields a
graphical representation of the response surface. The details for constructing linear contour
curves are given in a later section.
3. Optimal Response Value: Identify the theoretical value of the response that constitutes "best."
In particular, what value would we like to have seen for the response?
4. Best "Corner": The contour plot will have four "corners" for the two most important factors
Xi and Xj: (Xi,Xj) = (-,-), (-,+), (+,-), and (+,+). From the data, identify which of these four
corners yields the highest average response Ȳ.
5. Steepest Ascent/Descent: From this optimum corner point, and based on the nature of the
contour lines near that corner, step out in the direction of steepest ascent (if maximizing) or
steepest descent (if minimizing).
6. Optimal Curve: Identify the curve on the contour plot that corresponds to the ideal optimal
value.
7. Optimal Setting: Determine where the steepest ascent/descent line intersects the optimum
contour curve. This point represents our "best guess" as to where we could have run our
experiment so as to obtain the desired optimal response.
Motivation In addition to increasing insight, most experiments have a goal of optimizing the response. That
is, of determining a setting (X10, X20, ..., Xk0) for which the response is optimized.
The tool of choice to address this goal is the dex contour plot. For a pair of factors Xi and Xj, the
dex contour plot is a 2-dimensional representation of the 3-dimensional Y = f(Xi,Xj) response
surface. The position and spacing of the isocurves on the dex contour plot are an easily
interpreted reflection of the nature of the surface.
In terms of the construction of the dex contour plot, there are three aspects of note:
1. Pairs of Factors: A dex contour plot necessarily has two axes (only); hence only two out of
the k factors can be represented on this plot. All other factors must be set at a fixed value
(their optimum settings as determined by the ordered data plot, the dex mean plot, and the
interaction effects matrix plot).
2. Most Important Factor Pair: Many dex contour plots are possible. For an experiment with k
factors, there are k*(k-1)/2 distinct factor pairs; for example, for k = 4 factors there are 6 possible
contour plots: X1 and X2, X1 and X3, X1 and X4, X2 and X3, X2 and X4, and X3 and X4. In
practice, we usually generate only one contour plot, involving the two most important factors.
3. Main Effects Only: The contour plot axes involve main effects only, not interactions. The
rationale for this is that the "deliverable" for this step is k settings, a best setting for each of
the k factors. These k factors are real and can be controlled, and so optimal settings can be
used in production. Interactions are of a different nature as there is no "knob on the
machine" by which an interaction may be set to -, or to +. Hence the candidates for the axes
on contour plots are main effects only--no interactions.
In summary, the motivation for the dex contour plot is that it is an easy-to-use graphic that
provides insight as to the nature of the response surface, and provides a specific answer to the
question "Where (else) should we have collected the data so to have optimized the response?".
Plot for defective springs data
Applying the dex contour plot to the defective springs data set yields the following plot.
How to interpret
From the dex contour plot for the defective springs data, we note the following regarding the 7
framework issues:
● Axes
● Contour curves
● Optimal response value
● Optimal response curve
● Best corner
● Steepest ascent/descent
● Optimal setting
Conclusions for the defective springs data
The application of the dex contour plot to the defective springs data set results in the following
conclusions:
1. Optimal settings for the "next" run:
Coded : (X1,X2,X3) = (+1.5,-1.0,+1.3)
Uncoded: (OT,CC,QT) = (1637.5,0.5,127.5)
2. Nature of the response surface:
The X1*X3 interaction is important, hence the effect of factor X1 will change depending on
the setting of factor X3.
Possible choices
In general, the two axes of the contour plot could consist of
● X1 and X2,
● X1 and X3, or
● X2 and X3.
In this case, since X1 is the top item in the ranked list, with an
estimated effect of 23, X1 is the most important factor and so will
occupy the horizontal axis of the contour plot. The admissible list thus
reduces to
● X1 and X2, or
● X1 and X3.
To decide between these two pairs, we look to the second item in the
ranked list. This is the interaction term X1*X3, with an estimated effect
of 10. Since interactions are not allowed as contour plot axes, X1*X3
must be set aside. On the other hand, the components of this interaction
(X1 and X3) are not to be set aside. Since X1 has already been
identified as one axis in the contour plot, this suggests that the other
component (X3) be used as the second axis. We do so. Note that X3
itself does not need to be important (in fact, it is noted that X3 is
ranked fourth in the listed table with a value of 1.5).
In summary then, for this example the contour plot axes are:
Horizontal Axis: X1
Vertical Axis: X3
Four cases for recommended choice of axes
Other cases can be more complicated. In general, the recommended rule for selecting the two
plot axes is that they be drawn from the first two items in the ranked list of factors. The
following four cases cover most situations in practice:
● Case 1:
1. Item 1 is an interaction (e.g., X2*X4);
2. Item 2 is anything.
Recommended choice:
1. Horizontal axis: component 1 from the item 1 interaction (e.g., X2);
2. Vertical axis: component 2 from the item 1 interaction (e.g., X4).
Constructing the contour curves
As for the details of the construction of the contour plot, we draw on the model-fitting results
that were achieved in the cumulative residual standard deviation plot. In that step, we derived
the following good-fitting prediction equation:
Ŷ = 71.25 + 0.5*(23*X1 + 10*X1*X3 - 5*X2)
The contour plot has axes of X1 and X3. X2 is not included, and so a fixed value of X2 must be
assigned. The response variable is the percentage of acceptable springs, so we are attempting to
maximize the response. From the ordered data plot, the main effects plot, and the interaction
effects matrix plot of the general analysis sequence, we saw that the best setting for factor X2
was "-". The best observed response data value (Y = 90) was achieved with the run (X1,X2,X3)
= (+,-,+), which has X2 = "-". Also, the average response for X2 = "-" was 73 while the average
response for X2 = "+" was 68. We thus set X2 = -1 in the prediction equation to obtain
Ŷ = 71.25 + 0.5*(23*X1 + 10*X1*X3 + 5) = 73.75 + 11.5*X1 + 5*X1*X3
This equation involves only X1 and X3 and is immediately usable for the X1 and X3 contour
plot. The raw response values in the data ranged from 52 to 90. The response is a percentage,
which implies that the theoretical worst is Y = 0 and the theoretical best is Y = 100.
To generate the contour curve for, say, Y = 70, we solve
70 = 73.75 + 11.5*X1 + 5*X1*X3
for X3 as a function of X1:
X3 = (70 - 73.75 - 11.5*X1) / (5*X1)
Values for X1
For these X3 = g(X1) equations, what values should be used for X1? Since X1 is coded in the
range -1 to +1, we recommend expanding the horizontal axis to -2 to +2 to allow extrapolation.
In practice, for the dex contour plot generated previously, we chose to generate X1 values from
-2, at increments of 0.05, up to +2. For most data sets, this gives a smooth enough curve for
proper interpretation.
Values for Y
What values should be used for Y? Since the total theoretical range for the response Y (= percent
acceptable springs) is 0% to 100%, we chose to generate contour curves starting with 0, at
increments of 5, and ending with 100. We thus generated 21 contour curves. Many of these
curves did not appear since they were beyond the -2 to +2 plot range for the X1 and X3 factors.
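A short Python sketch of this curve generation, using the rearranged equation derived above (variable names ours; the plotting call itself is left to the reader):

    import numpy as np

    # Rearranged prediction equation with X2 fixed at -1:
    #   Yhat = 73.75 + 11.5*X1 + 5*X1*X3
    # Solving for X3 given a contour level Y (undefined at X1 = 0):
    x1 = np.arange(-2.0, 2.0001, 0.05)
    x1 = x1[np.abs(x1) > 1e-9]            # avoid division by zero at X1 = 0
    for level in range(0, 101, 5):        # 21 contour levels: Y = 0, 5, ..., 100
        x3 = (level - 73.75 - 11.5*x1) / (5.0*x1)
        inside = np.abs(x3) <= 2          # keep points in the plotting window
        # e.g., plt.plot(x1[inside], x3[inside]) with matplotlib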
Summary In summary, the contour plot curves are generated by making use of the
(rearranged) previously derived prediction equation. For the defective springs data,
the appearance of the contour plot implied a strong X1*X3 interaction.
Optimal value for this example
For the defective springs data, the response (percentage of acceptable springs) ranged from Y =
52 to 90. The theoretical worst value would be 0 (= no springs are acceptable), and the
theoretical best value would be 100 (= 100% of the springs are acceptable). Since we are trying
to maximize the response, the selected optimal value is 100.
5. Process Improvement
5.5. Advanced topics
5.5.9. An EDA approach to experimental design
5.5.9.10. DEX contour plot
Use the raw data
This is done by using the raw data, extracting out the two "axes factors", computing the average
response at each of the four corners, then choosing the corner with the best average (a quick
computation is sketched after the table below).
For the defective springs data, the raw data were
X1 X2 X3 Y
- - - 67
+ - - 79
- + - 61
+ + - 75
- - + 59
+ - + 90
- + + 52
+ + + 87
The two plot axes are X1 and X3 and so the relevant raw data collapses
to
X1 X3 Y
- - 67
+ - 79
- - 61
+ - 75
- + 59
+ + 90
- + 52
+ + 87
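For this data, the corner averages can be computed in a few lines of Python (an illustrative sketch):

    import numpy as np

    X1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1])
    X3 = np.array([-1,-1, -1,-1,  1, 1,  1, 1])
    Y  = np.array([67.,79., 61.,75., 59.,90., 52.,87.])

    for a in (-1, 1):
        for b in (-1, 1):
            corner = (X1 == a) & (X3 == b)
            print((a, b), Y[corner].mean())
    # (-1,-1): 64.0, (-1,+1): 55.5, (+1,-1): 77.0, (+1,+1): 88.5
    # -> the (+,+) corner, with average 88.5, is best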
Defective springs example
Since our goal for the defective springs problem is to maximize the response, we seek the path
of steepest ascent. Our starting point is the best corner (the upper right corner (+,+)), which has
an average response value of 88.5. The contour lines for this plot have increments of 5 units. As
we move from left to right across the contour plot, the contour lines go from low to high
response values. In the plot, we have drawn the maximum contour level, 105, as a thick line. For
easier identification, we have also drawn the contour level of 90 as a thick line. This contour
level of 90 is immediately to the right of the best corner.
Conclusions on steepest ascent for defective springs example
The nature of the contour curves in the vicinity of (+,+) suggests a path of steepest ascent
1. in the "northeast" direction;
2. about 30 degrees above the horizontal.
Defective springs example
For the defective springs data, we search for the Y = 100 contour curve. As determined in the
steepest ascent/descent section, the Y = 90 curve is immediately outside the (+,+) point. The
next curve to the right is the Y = 95 curve, and the next curve beyond that is the Y = 100 curve.
This is the optimal response curve.
Theoretically, any (X1,X3) setting along the optimal curve would generate the desired response
of Y = 100. In practice, however, this is true only if our estimated contour surface is identical to
"nature's" response surface. In reality, the plotted contour curves are estimates of the truth based
on the available (and "noisy") n = 8 data values. We are confident of the contour curves in the
vicinity of the data points (the four corner points on the chart), but as we move away from the
corner points, our confidence in the contour curves decreases. Thus the point on the Y = 100
optimal response curve that is "most likely" to be valid is the one that is closest to a corner
point. Our objective then is to locate that "near-point".
Defective springs example
In terms of the defective springs contour plot, we draw a line from the best corner, (+,+),
outward and perpendicular to the Y = 90, Y = 95, and Y = 100 contour curves. The Y = 100
intersection yields the "nearest point" on the optimal response curve.
Having done so, it is of interest to note the coordinates of that optimal setting. In this case, from
the graph, that setting is (in coded units) approximately at
(X1 = 1.5, X3 = 1.3)
Table of coded and uncoded factors
With the determination of this setting, we have thus, in theory, formally completed our original
task. In practice, however, more needs to be done. We need to know "What is this optimal
setting, not just in the coded units, but also in the original (uncoded) units?" That is, what does
(X1 = 1.5, X3 = 1.3) correspond to in the units of the original data?
To deduce this, we need to refer back to the original (uncoded) factors in this problem. They
were:
Coded Factor   Uncoded Factor
X1             OT: Oven Temperature
X2             CC: Carbon Concentration
X3             QT: Quench Temperature
Uncoded and coded factor settings
These factors had settings--what were the settings of the coded and uncoded factors? From the
original description of the problem, the uncoded factor settings were:
1. Oven Temperature (1450 and 1600 degrees)
2. Carbon Concentration (0.5% and 0.7%)
3. Quench Temperature (70 and 120 degrees)
with the usual settings for the corresponding coded factors:
1. X1 (-1,+1)
2. X2 (-1,+1)
3. X3 (-1,+1)
Diagram
To determine the corresponding setting for (X1 = 1.5, X3 = 1.3), we thus refer to the following
diagram, which mimics a scatter plot of response averages--oven temperature (OT) on the
horizontal axis and quench temperature (QT) on the vertical axis:
The "X" on the chart represents the "near point" setting on the optimal curve.
Optimal setting for X1 (oven temperature)
To determine what "X" is in uncoded units, we note (from the graph) that a linear
transformation between OT and X1 as defined by
OT = 1450 => X1 = -1
OT = 1600 => X1 = +1
yields
X1 = 0 being at OT = (1450 + 1600) / 2 = 1525
thus
|-------------|-------------|
X1: -1 0 +1
OT: 1450 1525 1600
and so X1 = +2, say, would be at oven temperature OT = 1675:
|-------------|-------------|-------------|
X1: -1 0 +1 +2
OT: 1450 1525 1600 1675
and hence the optimal X1 setting of 1.5 must be at
OT = 1600 + 0.5*(1675-1600) = 1637.5
Optimal setting for X3 (quench temperature)
Similarly, from the graph we note that a linear transformation between quench temperature QT
and coded factor X3 as specified by
QT = 70 => X3 = -1
QT = 120 => X3 = +1
yields
X3 = 0 being at QT = (70 + 120) / 2 = 95
as in
|-------------|-------------|
X3: -1 0 +1
QT: 70 95 120
and so X3 = +2, say, would be at quench temperature QT = 145:
|-------------|-------------|-------------|
X3: -1 0 +1 +2
QT: 70 95 120 145
Hence, the optimal X3 setting of 1.3 must be at
QT = 120 + 0.3*(145-120) = 127.5
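These linear transformations are one-liners in Python; a small sketch (function name ours) confirms both optimal settings:

    def coded_to_uncoded(x, low, high):
        """Map a coded value on the -1..+1 scale back to raw units."""
        return low + (x + 1.0) / 2.0 * (high - low)

    print(coded_to_uncoded(1.5, 1450, 1600))   # OT = 1637.5 degrees
    print(coded_to_uncoded(1.3,   70,  120))   # QT = 127.5 degrees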
5. Process Improvement
5.6. Case Studies
5.6.1. Eddy Current Probe Sensitivity Case Study
Statistical Goals
The case study illustrates the analysis of a 2^3 full factorial experimental design. The specific
statistical goals of the experiment are:
1. Determine the important factors that affect sensitivity.
2. Determine the settings that maximize sensitivity.
3. Determine a prediction equation that functionally relates sensitivity to the various factors.
Data Used in the Analysis
There were three detector wiring component factors under consideration:
1. X1 = Number of wire turns
2. X2 = Wire winding distance
3. X3 = Wire gauge
Since the maximum number of runs that could be afforded timewise and costwise in this
experiment was n = 10, a 2^3 full factorial experiment (involving n = 8 runs) was chosen. With
an eye to the usual monotonicity assumption for 2-level factorial designs, the selected settings
for the three factors were as follows:
1. X1 = Number of wire turns : -1 = 90, +1 = 180
2. X2 = Wire winding distance: -1 = 0.38, +1 = 1.14
3. X3 = Wire gauge : -1 = 40, +1 = 48
The experiment was run with the 8 settings executed in random order. The following data
resulted.
   Y        X1        X2        X3
Probe     Number    Winding   Wire     Run
Impedance of Turns  Distance  Gauge    Sequence
-------------------------------------------------
1.70 -1 -1 -1 2
4.57 +1 -1 -1 8
0.55 -1 +1 -1 3
3.39 +1 +1 -1 6
1.51 -1 -1 +1 7
4.59 +1 -1 +1 1
0.67 -1 +1 +1 4
4.29 +1 +1 +1 5
Note that the independent variables are coded as +1 and -1. These
represent the low and high settings for the levels of each variable.
Factorial designs often have 2 levels for each factor (independent
variable) with the levels being coded as -1 and +1. This is a scaling of
the data that can simplify the analysis. If desired, these scaled values can
be converted back to the original units of the data for presentation.
Conclusions from the Ordered Data Plot
We can make the following conclusions based on the ordered data plot.
1. Important Factors: The 4 highest response values have X1 = + while the 4 lowest response
values have X1 = -. This implies factor 1 is the most important factor. When X1 = -, the -
values of X2 are higher than the + values of X2. Similarly, when X1 = +, the - values of X2
are higher than the + values of X2. This implies X2 is important, but less so than X1. There
is no clear pattern for X3.
2. Best Settings: In this experiment, we are using the device as a detector, and so high
sensitivities are desirable. Given this, our first pass at best settings yields (X1 = +1, X2 =
-1, X3 = either).
Plot the Data: DEX Scatter Plot
The next step in the analysis is to generate a dex scatter plot.
Conclusions from the DEX Scatter Plot
We can make the following conclusions based on the dex scatter plot.
1. Important Factors: Factor 1 (Number of Turns) is clearly important. When X1 = -1, all 4
sensitivities are low, and when X1 = +1, all 4 sensitivities are high. Factor 2 (Winding
Distance) is less important. The 4 sensitivities for X2 = -1 are slightly higher, as a group,
than the 4 sensitivities for X2 = +1. Factor 3 (Wire Gauge) does not appear to be important
at all. The sensitivity is about the same (on the average) regardless of the settings for X3.
2. Best Settings: In this experiment, we are using the device as a detector, so high sensitivities
are desirable. Given this, our first pass at best settings yields (X1 = +1, X2 = -1, X3 =
either).
3. There do not appear to be any significant outliers.
Check for Main Effects: DEX Mean Plot
One of the primary questions is: what are the most important factors? The ordered data plot and
the dex scatter plot provide useful summary plots of the data. Both of these plots indicated that
factor X1 is clearly important, X2 is somewhat important, and X3 is probably not important.
The dex mean plot shows the main effects. This provides probably the easiest to interpret
indication of the important factors.
Conclusions from the DEX Mean Plot
The dex mean plot (or main effects plot) reaffirms the ordering of the dex scatter plot, but
additional information is gleaned because the eyeball distance between the mean values gives an
approximation to the least squares estimate of the factor effects.
We can make the following conclusions from the dex mean plot.
1. Important Factors:
X1 (effect = large: about 3 ohms)
X2 (effect = moderate: about -1 ohm)
X3 (effect = small: about 1/4 ohm)
2. Best Settings: As before, choose the factor settings that (on the average) maximize the
sensitivity:
(X1,X2,X3) = (+,-,+)
Comparison of Plots
All of these plots are used primarily to detect the most important factors. Because it plots a
summary statistic rather than the raw data, the dex mean plot shows the main effects most clearly.
However, it is still recommended to generate either the ordered data plot or the dex scatter plot
(or both). Since these plot the raw data, they can sometimes reveal features of the data that might
be masked by the dex mean plot.
Conclusions from the DEX Interaction Effects Plot
We can make the following conclusions from the dex interaction effects plot.
1. Important Factors: Looking for the plots that have the steepest lines (that is, largest
effects), we note that:
❍ X1 (number of turns) is the most important effect: estimated effect = 3.1025;
❍ X2 (winding distance) is next most important: estimated effect = -0.8675;
2. Best Settings: As with the main effects plot, the best settings to maximize the sensitivity
are
(X1,X2,X3) = (+1,-1,+1)
but with the X3 setting of +1 mattering little.
Conclusions from the Block Plots
It is recalled that the block plot assesses factor importance by the degree of consistency
(robustness) of the factor effect over a variety of conditions. In this light, we can make the
following conclusions from the block plots.
1. Relative Importance of Factors: All of the bar heights in plot 1 (turns) are greater than the
bar heights in plots 2 and 3. Hence, factor 1 is more important than factors 2 and 3.
2. Statistical Significance: In plot 1, looking at the levels within each bar, we note that the
response for level 2 is higher than level 1 in each of the 4 bars. By chance, this happens
with probability 1/(2^4) = 1/16 = 6.25%. Hence, factor 1 is near-statistically significant at
the 5% level. Similarly, for plot 2, level 1 is greater than level 2 for all 4 bars. Hence,
factor 2 is near-statistically significant. For factor 3, there is no consistent ordering within
all 4 bars and hence factor 3 is not statistically significant. Rigorously speaking then,
factors 1 and 2 are not statistically significant (since 6.25% is not < 5%); on the other
hand, such near-significance is suggestive to the analyst that such factors may in fact be
important, and hence warrant further attention.
Note that the usual method for determining statistical significance is to perform an analysis
of variance (ANOVA). ANOVA is based on normality assumptions. If these normality
assumptions are in fact valid, then ANOVA methods are the most powerful method for
determining statistical significance. The advantage of the block plot method is that it is
based on less rigorous assumptions than ANOVA. At an exploratory stage, it is useful to
know that our conclusions regarding important factors are valid under a wide range of
assumptions.
3. Interactions: For factor 1, the 4 bars do not change height in any systematic way and hence
there is no evidence of X1 interacting with either X2 or X3. Similarly, there is no evidence
of interactions for factor 2.
Data from factorial designs with two levels can be analyzed using the Yates technique,
which is described in Box, Hunter, and Hunter. The Yates technique utilizes the
special structure of these designs to simplify the computation and presentation of the
fit.
Dataplot Output
Dataplot generated the following output for the Yates analysis.
Description of Yates Output
In fitting 2-level factorial designs, Dataplot takes advantage of the special structure of these
designs in computing the fit and printing the results. Specifically, the main effects and
interaction effects are printed in sorted order from most significant to least significant. It also
prints the t-value for the term and the residual standard deviation obtained by fitting the model
with that term and the mean (the column labeled RESSD MEAN + TERM), and for the model
with that term, the mean, and all other terms that are more statistically significant (the column
labeled RESSD MEAN + CUM TERMS).
Of the five columns of output, the most important are the first (which is the identifier), the
second (the least squares estimated effect = the difference of means), and the last (the residual
standard deviation for the cumulative model, which will be discussed in more detail in the next
section).
Conclusions
In summary, the Yates analysis provides us with the following ranked list of important factors.
1. X1 (Number of Turns): effect estimate = 3.1025 ohms
2. X2 (Winding Distance): effect estimate = -0.8675 ohms
3. X2*X3 (Winding Distance with Wire Gauge): effect estimate = 0.2975 ohms
4. X1*X3 (Number of Turns with Wire Gauge): effect estimate = 0.2475 ohms
5. X3 (Wire Gauge): effect estimate = 0.2125 ohms
6. X1*X2*X3 (Number of Turns with Winding Distance with Wire Gauge): effect estimate = 0.1425 ohms
7. X1*X2 (Number of Turns with Winding Distance): effect estimate = 0.1275 ohms
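These effect estimates follow directly from the data table above: each is the mean response at the "+" level minus the mean response at the "-" level. A short Python check (illustrative sketch):

    import numpy as np

    Y  = np.array([1.70, 4.57, 0.55, 3.39, 1.51, 4.59, 0.67, 4.29])
    X1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1])
    X2 = np.array([-1,-1,  1, 1, -1,-1,  1, 1])
    X3 = np.array([-1,-1, -1,-1,  1, 1,  1, 1])

    def effect(x):
        """Mean response at the + level minus mean response at the - level."""
        return Y[x == 1].mean() - Y[x == -1].mean()

    print(effect(X1))        # 3.1025
    print(effect(X2))        # -0.8675
    print(effect(X2 * X3))   # 0.2975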
Default Model: Grand Mean
At the top of the Yates table, if none of the factors are important, the prediction equation
defaults to the mean of all the response values (the overall or grand mean). That is,
Ŷ = Ȳ = 2.65875
From the last column of the Yates table, it can be seen that this simplest of all models has a
residual standard deviation (a measure of goodness of fit) of 1.74106 ohms.
Finding a good-fitting model was not one of the stated goals of this experiment, but the
determination of a good-fitting model is "free" along with the rest of the analysis, and
so it is included.
Conclusions
From the last column of the Yates table, we can summarize the following prediction equations:
● Ŷ = 2.65875 (the grand mean alone), with residual standard deviation 1.74106;
● Ŷ = 2.65875 + 0.5*(3.10250*X1 - 0.86750*X2), with residual standard deviation 0.30429.
Best Settings
Also, from the various analyses, we conclude that the best design settings (on the average) for a
high-sensitivity detector are
(X1,X2,X3) = (+,-,+)
that is,
number of turns = 180,
winding distance = 0.38, and
wire gauge = 48.
Can We Extract More From the Data?
Thus, in a very real sense, the analysis is complete. We have achieved the two most important
stated goals of the experiment:
1. gaining insight into the most important factors, and
2. ascertaining the optimal production settings.
On the other hand, more information can be squeezed from the data, and that is what this section
and the remaining sections address.
1. First of all, we focus on the problem of taking the ranked list of
factors and objectively ascertaining which factors are "important"
versus "unimportant".
2. In a parallel fashion, we use the subset of important factors
derived above to form a "final" prediction equation that is good
(that is, having a sufficiently small residual standard deviation)
while being parsimonious (having a small number of terms),
compared to the full model, which is perfect (having a residual
standard deviation = 0, that is, the predicted values = the raw
data), but is unduly complicated (consisting of a constant + 7
terms).
Criteria for Including Terms in the Model
The criteria that we can use in determining whether to keep a factor in the model can be
summarized as follows.
1. Effects: Engineering Significance
2. Effects: 90% Numerical Significance
3. Effects: Statistical Significance
4. Effects: Half-normal Probability Plots
5. Averages: Youden Plot
The first four criteria focus on effect estimates with three numerical criteria and one graphical
criterion. The fifth criterion focuses on averages. We discuss each of these criteria in detail in the
following sections. The last section summarizes the conclusions based on all of the criteria.
That is, declare a factor as "important" if its effect is more than 2 standard deviations away from 0
(0, by definition, meaning "no effect"). The difficulty with this is that in order to invoke this rule
we need σ = the standard deviation of an observation.
For the current case study, ignoring 3-factor interactions and higher-order interactions leads to an
estimate of σ based on omitting only a single term: the X1*X2*X3 interaction.
Thus for this current case study, if one assumes that the 3-factor interaction is nil and hence
represents a single drawing from a population centered at zero, an estimate of the standard
deviation of an effect is simply the estimate of the interaction effect (0.1425). Two such effect
standard deviations is 0.2850. This rule becomes: keep all effects with |effect| > 0.2850. This
results in keeping three terms: X1 (3.10250), X2 (-0.86750), and X2*X3 (0.29750).
Effects: Half-Normal Probability Plot
The half-normal probability plot clearly shows two factors displaced off the line, and we see that
those two factors are factor 1 and factor 2. In conclusion, keep two factors: X1 (3.10250) and X2
(-0.86750).
Averages: Youden Plot
A dex Youden plot can be used in the following way: keep a factor as "important" if it is
displaced away from the central-tendency bunch in a Youden plot of high and low averages.
For the case study at hand, the Youden plot clearly shows a cluster of points near the grand
average (2.65875) with two displaced points above (factor 1) and below (factor 2). Based on the
Youden plot, we thus conclude to keep two factors: X1 (3.10250) and X2 (-0.86750).
Conclusions
In summary, the criteria for specifying "important" factors yielded the following:
1. Effects, Engineering Significance: X1, X2
2. Effects, Numerical Significance: X1, X2 (X2*X3 is borderline)
3. Effects, Statistical Significance: X1, X2, X2*X3
4. Effects, Half-Normal Probability Plot: X1, X2
5. Averages, Youden Plot: X1, X2
All the criteria select X1 and X2. One also includes the X2*X3 interaction term (and it is
borderline for another criterion).
We thus declare the following consensus:
1. Important Factors: X1 and X2
2. Parsimonious Prediction Equation:
Ŷ = 2.65875 + 0.5*(3.10250*X1 - 0.86750*X2)
Table of Residuals
If the model fits well, Ŷ should be near Y for all 8 design points. Hence the 8 residuals should all
be near zero. The 8 predicted values and residuals for the model with these data are:
X1 X2 X3 Y Ŷ Res
-1 -1 -1 1.70 1.54125 0.15875
+1 -1 -1 4.57 4.64375 -0.07375
-1 +1 -1 0.55 0.67375 -0.12375
+1 +1 -1 3.39 3.77625 -0.38625
-1 -1 +1 1.51 1.54125 -0.03125
+1 -1 +1 4.59 4.64375 -0.05375
-1 +1 +1 0.67 0.67375 -0.00375
+1 +1 +1 4.29 3.77625 0.51375
Residual Standard Deviation
What is the magnitude of the typical residual? There are several ways to compute this, but the
statistically optimal measure is the residual standard deviation:
sres = sqrt( SUM ri^2 / (N - P) )
with ri denoting the ith residual, N = 8 the number of observations, and P = 3 the number of
fitted parameters. From the Yates table, the residual standard deviation is 0.30429.
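The table and the 0.30429 value can be reproduced with a few lines of Python (an illustrative sketch using the parsimonious model above):

    import numpy as np

    Y  = np.array([1.70, 4.57, 0.55, 3.39, 1.51, 4.59, 0.67, 4.29])
    X1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1])
    X2 = np.array([-1,-1,  1, 1, -1,-1,  1, 1])

    # Parsimonious model: Yhat = 2.65875 + 0.5*(3.10250*X1 - 0.86750*X2)
    Yhat = 2.65875 + 0.5*(3.10250*X1 - 0.86750*X2)
    r = Y - Yhat
    N, P = 8, 3
    print(np.sqrt(np.sum(r**2) / (N - P)))   # ~0.30429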
How Should Residuals Behave?
If the prediction equation is adequate, the residuals from that equation should behave like random
drawings (typically from an approximately normal distribution), and should, since presumably
random, have no structural relationship with any factor. This includes any and all potential terms
(X1, X2, X3, X1*X2, X1*X3, X2*X3, X1*X2*X3).
Further, if the model is adequate and complete, the residuals should have no structural
relationship with any other variables that may have been recorded. In particular, this includes the
run sequence (time), which is really serving as a surrogate for any physical or environmental
variable correlated with time. Ideally, all such residual scatter plots should appear structureless.
Any scatter plot that exhibits structure suggests that the factor should have been formally
included as part of the prediction equation.
Validating the prediction equation thus means that we do a final check as to whether any other
variables may have been inadvertently left out of the prediction equation, including variables
drifting with time.
The graphical residual analysis thus consists of scatter plots of the residuals versus all 3 factors
and 4 interactions (all such plots should be structureless), a scatter plot of the residuals versus run
sequence (which also should be structureless), and a normal probability plot of the residuals
(which should be near linear). We present such plots below.
Residual Plots
The first plot is a normal probability plot of the residuals. The second plot is a run sequence plot
of the residuals. The remaining plots are plots of the residuals against each of the factors and each
of the interaction terms.
Global Prediction
How does one predict the response at points other than those used in the experiment? The
prediction equation yields good results at the 8 combinations of coded -1 and +1 values for the
three factors:
1. X1 = Number of turns = 90 and 180
2. X2 = Winding distance = 0.38 and 1.14
3. X3 = Wire gauge = 40 and 48
What, however, would one expect the detector to yield at target settings of, say,
1. Number of turns = 150
2. Winding distance = 0.50
3. Wire gauge = 46
Based on the fitted equation, we first translate the target values into coded target values as
follows:
coded target = -1 + 2*(target - low)/(high - low)
Hence the coded target values are
1. X1 = -1 + 2*(150-90)/(180-90) = 0.333333
2. X2 = -1 + 2*(0.50-0.38)/(1.14-0.38) = -0.684211
3. X3 = -1 + 2*(46-40)/(48-40) = 0.5000
Thus the raw data
(Number of turns, Winding distance, Wire gauge) = (150, 0.50, 46)
translates into the coded
(X1,X2,X3) = (0.333333, -0.684211, 0.50000)
on the -1 to +1 scale.
Inserting these coded values into the fitted equation yields, as desired, a predicted value of
Ŷ = 2.65875 + 0.5*(3.10250*(0.333333) - 0.86750*(-0.684211)) = 3.47261
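The same translate-then-predict computation in Python (an illustrative sketch; the function name is ours):

    def to_coded(target, low, high):
        """Translate a raw target value to the coded -1..+1 scale."""
        return -1 + 2.0 * (target - low) / (high - low)

    x1 = to_coded(150,   90, 180)     # 0.333333
    x2 = to_coded(0.50, 0.38, 1.14)   # -0.684211
    yhat = 2.65875 + 0.5 * (3.10250*x1 - 0.86750*x2)
    print(round(yhat, 5))             # 3.47261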
The above procedure can be carried out for any values of turns, distance, and gauge. This is
subject to the usual cautions that equations that are good near the data point vertices may not
necessarily be good everywhere in the factor space. Interpolation is a bit safer than extrapolation,
but it is not guaranteed to provide good results, of course. One would feel more comfortable
about interpolation (as in our example) if additional data had been collected at the center point
and the center point data turned out to be in good agreement with predicted values at the center
point based on the fitted model. In our case, we had no such data and so the sobering truth is that
the user of the equation is assuming something in which the data set as given is not capable of
suggesting one way or the other. Given that assumption, we have demonstrated how one may
cautiously but insightfully generate predicted values that go well beyond our limited original data
set of 8 points.
Global Determination of Best Settings
In order to determine the best settings for the factors, we can use a dex contour plot. The dex
contour plot is generated for the two most significant factors and shows the value of the response
variable at the vertices (i.e., the -1 and +1 settings for the factor variables) and indicates the
direction that maximizes (or minimizes) the response variable. If you have more than two
significant factors, you can generate a series of dex contour plots with each one using two of the
important factors.
DEX Contour Plot
The following is the dex contour plot of the number of turns and the winding distance.
The maximum value of the response variable (eddy current) corresponds to X1 (number of turns)
equal to -1 and X2 (winding distance) equal to +1. The thickened line in the contour plot
corresponds to the direction that maximizes the response variable. This information can be used
in planning the next phase of the experiment.
The dex contour plot gave us the best settings for the factors (X1 = -1
and X2 = 1).
Next Step
Full and fractional designs are typically used to identify the most important factors. In some
applications, this is sufficient and no further experimentation is performed. In other applications,
it is desired to maximize (or minimize) the response variable. This typically involves the use of
response surface designs. The dex contour plot can provide guidance on the settings to use for
the factor variables in this next phase of the experiment.
This is a common sequence for designed experiments in engineering and
scientific applications. Note the iterative nature of this approach. That is,
you typically do not design one large experiment to answer all your
questions. Rather, you run a series of smaller experiments. The initial
experiment or experiments are used to identify the important factors.
Once these factors are identified, follow-up experiments can be run to
fine tune the optimal settings (in terms of maximizing/minimizing the
response variable) for these most important factors.
For this particular case study, a response surface design was not used.
Click on the links below to start Dataplot and run this case study yourself. Each step may use results from previous steps, so please be patient. Wait until the software verifies that the current step is complete before clicking on the next step. The links in this column will connect you with more detailed information about each analysis step from the case study description.
1. Perform a Yates fit to estimate the main effects and interaction effects. The Yates analysis shows that the factor 1 and factor 2 main effects are significant, and the interaction for factors 2 and 3 is at the boundary of statistical significance.
6. Model selection
7. Model validation
1. Compute residuals and predicted values from the partial model suggested by the Yates analysis. Check the link for the values of the residuals and predicted values.
1. Generate a dex contour plot using factors 1 and 2. The dex contour plot shows X1 = -1 and X2 = +1 to be the best settings.
5. Process Improvement
5.6. Case Studies
5.6.2. Sonoluminescent Light Intensity Case Study
Response Variable, Factor Variables, and Factor-Level Settings
This experiment utilizes the following response and factor variables.
1. Response Variable (Y) = The sonoluminescent light intensity.
2. Factor 1 (X1) = Molarity (amount of Solute). The coding is -1 for 0.10 mol and +1 for 0.33 mol.
3. Factor 2 (X2) = Solute type. The coding is -1 for sugar and +1 for glycerol.
4. Factor 3 (X3) = pH. The coding is -1 for 3 and +1 for 11.
5. Factor 4 (X4) = Gas type in water. The coding is -1 for helium and +1 for air.
6. Factor 5 (X5) = Water depth. The coding is -1 for half and +1 for full.
7. Factor 6 (X6) = Horn depth. The coding is -1 for 5 mm and +1 for 10 mm.
8. Factor 7 (X7) = Flask clamping. The coding is -1 for unclamped and +1 for clamped.
This data set has 16 observations. It is a 2^(7-3) design with no center points.
Goal of the Experiment
This case study demonstrates the analysis of a 2^(7-3) fractional factorial experimental design. The goals of this case study are:
1. Determine the important factors that affect the sonoluminescent
light intensity. Specifically, we are trying to maximize this
intensity.
2. Determine the best settings of the seven factors so as to maximize
the sonoluminescent light intensity.
Data Used in the Analysis
The following are the data used for this analysis; the data set is given in Yates order (only the first four of the 16 runs are reproduced here).

    Y        X1       X2     X3    X4     X5     X6      X7
  Light            Solute          Gas   Water   Horn    Flask
Intensity Molarity  type    pH   Type   Depth   Depth  Clamping
------------------------------------------------------------------
   80.6    -1.0     -1.0   -1.0  -1.0   -1.0    -1.0    -1.0
   66.1     1.0     -1.0   -1.0  -1.0   -1.0     1.0     1.0
   59.1    -1.0      1.0   -1.0  -1.0    1.0    -1.0     1.0
   68.9     1.0      1.0   -1.0  -1.0    1.0     1.0    -1.0

Reading Data into Dataplot
These data can be read into Dataplot with the following commands:
SKIP 25
READ INN.DAT Y X1 TO X7
Conclusions from the Ordered Data Plot
We can make the following conclusions based on the ordered data plot.
1. Two points clearly stand out. The first 13 points lie in the 50 to 100 range, the next point is greater than 100, and the last two points are greater than 300.
2. Important Factors: For these two highest points, factors X1, X2, X3, and X7 have the same values (namely, +, -, +, -, respectively) while X4, X5, and X6 have differing values. We conclude that X1, X2, X3, and X7 are potentially important factors, while X4, X5, and X6 are not.
3. Best Settings: Our first pass makes use of the settings at the observed maximum (Y = 373.8). The settings for this maximum are (+, -, +, -, +, -, -).
Plot the Data: DEX Scatter Plot
The next step in the analysis is to generate a dex scatter plot.
Conclusions from the DEX Scatter Plot
We can make the following conclusions based on the dex scatter plot.
1. Important Factors: Again, two points dominate the plot. For X1, X2, X3, and X7, these two points emanate from the same setting, (+, -, +, -), while for X4, X5, and X6 they emanate from different settings. We conclude that X1, X2, X3, and X7 are potentially important, while X4, X5, and X6 are probably not important.
2. Best Settings: Our first pass at best settings yields (X1 = +, X2 = -, X3 = +, X4 = either, X5 = either, X6 = either, X7 = -).
Check for Main Effects: DEX Mean Plot
The dex mean plot is generated to more clearly show the main effects:
Conclusions from the DEX Mean Plot
We can make the following conclusions from the dex mean plot.
1. Important Factors:
X2 (effect = large: about -80)
X7 (effect = large: about -80)
X1 (effect = large: about 70)
X3 (effect = large: about 65)
X6 (effect = small: about -10)
X5 (effect = small: between 5 and 10)
X4 (effect = small: less than 5)
2. Best Settings: Here we step through each factor, one by one, and choose the setting that yields the highest average for the sonoluminescent light intensity:
(X1,X2,X3,X4,X5,X6,X7) = (+,-,+,+,+,-,-)
Comparison of Plots
All three of the above plots are used primarily to determine the most important factors. Because it plots a summary statistic rather than the raw data, the dex mean plot shows the ordering of the main effects most clearly. However, it is still recommended to generate either the ordered data plot or the dex scatter plot (or both). Since these plot the raw data, they can sometimes reveal features of the data that might be masked by the dex mean plot.
In this case, the ordered data plot and the dex scatter plot clearly show two dominant points. This
feature would not be obvious if we had generated only the dex mean plot.
Interpretation-wise, the most important factor X2 (solute) will, on the average, change the light
intensity by about 80 units regardless of the settings of the other factors. The other factors are
interpreted similarly.
In terms of the best settings, note that the ordered data plot, based on the maximum response
value, yielded
+, -, +, -, +, -, -
A consensus best value, with "." indicating a setting on which the three plots disagree, would be
+, -, +, ., +, -, -
Note that the factor on which the settings disagree, X4, is in each case an "unimportant" factor.
Conclusions from the DEX Interaction Effects Plot
We make the following conclusions from the dex interaction effects plot.
1. Important Factors: Looking for the plots that have the steepest lines (that is, the largest effects), and noting that the legends on each subplot give the estimated effect, we have that
❍ The diagonal plots are the main effects. The important factors are: X2, X7, X1, and X3. These four factors have |effect| > 60. The remaining three factors have |effect| < 10.
❍ The off-diagonal plots are the 2-factor interaction effects. Of the 21 2-factor interactions, 9 are nominally important, but they fall into three groups of three:
■ X1*X3, X4*X6, X2*X7 (effect = 70)
■ X2*X3, X4*X5, X1*X7 (effect = -63.5)
■ X1*X2, X5*X6, X3*X7 (effect = -59.6)
All remaining 2-factor interactions are small, having an |effect| < 20. A virtue of the interaction effects matrix plot is that the confounding structure of this Resolution IV design can be read off the plot. In this case, the fact that X1*X3, X4*X6, and X2*X7 all have effect estimates identical to 70 is not a mathematical coincidence. It is a reflection of the fact that for this design, the three 2-factor interactions in each group are confounded. This is also true for the other two sets of three (X2*X3, X4*X5, X1*X7, and X1*X2, X5*X6, X3*X7).
2. Best Settings: Reading down the diagonal plots, we select, as before, the best settings "on
the average":
(X1,X2,X3,X4,X5,X6,X7) = (+,-,+,+,+,-,-)
For the more important factors (X1, X2, X3, X7), we note that the best settings (+, -, +, -)
are consistent with the best settings for the 2-factor interactions (cross-products):
X1: +, X2: - with X1*X2: -
X1: +, X3: + with X1*X3: +
X1: +, X7: - with X1*X7: -
X2: -, X3: + with X2*X3: -
X2: -, X7: - with X2*X7: +
X3: +, X7: - with X3*X7: -
Conclusions from the Block Plots
We can make the following conclusions from the block plots.
1. Relative Importance of Factors: Because of the expanded vertical axis, due to the two "outliers", the block plot is not particularly revealing. Block plots based on alternatively scaled data (e.g., LOG(Y)) would be more informative.
Sample Youden Plot
Conclusions from the Youden Plot
We can make the following conclusions from the Youden plot.
1. In the upper left corner are the interaction term X1*X3 and the main effects X1 and X3.
2. In the lower right corner are the main effects X2 and X7 and the interaction terms X2*X3
and X1*X2.
3. The remaining terms are clustered in the center, which indicates that such effects have
averages that are similar (and hence the effects are near zero), and so such effects are
relatively unimportant.
4. On the far right of the plot, the confounding structure is given (e.g., 13: 13+27+46), which
suggests that the information on X1*X3 (on the plot) must be tempered with the fact that
X1*X3 is confounded with X2*X7 and X4*X6.
Sample |Effects| Plot
Conclusions from the |Effects| Plot
We can make the following conclusions from the |effects| plot.
1. A ranked list of main effects and interaction terms is:
X2
X7
X1*X3 (confounded with X2*X7 and X4*X6)
X1
X3
X2*X3 (confounded with X4*X5 and X1*X7)
X1*X2 (confounded with X3*X7 and X5*X6)
X3*X4 (confounded with X1*X6 and X2*X5)
X1*X4 (confounded with X3*X6 and X5*X7)
X6
X5
X1*X2*X4 (confounded with other 3-factor interactions)
X4
X2*X4 (confounded with X3*X5 and X6*X7)
X1*X5 (confounded with X2*X6 and X4*X7)
2. From the graph, there is a clear dividing line between the first seven effects (all |effect| >
50) and the last eight effects (all |effect| < 20). This suggests we retain the first seven terms
as "important" and discard the remaining as "unimportant".
3. Again, the confounding structure on the right reminds us that, for example, the nominal
effect size of 70.0125 for X1*X3 (molarity*pH) can come from an X1*X3 interaction, an
X2*X7 (solute*clamping) interaction, an X4*X6 (gas*horn depth) interaction, or any
mixture of the three interactions.
Sample Half-Normal Probability Plot
Conclusions from the Half-Normal Probability Plot
We can make the following conclusions from the half-normal probability plot.
1. The points in the plot divide into two clear clusters:
❍ An upper cluster (|effect| > 60).
❍ A lower cluster (|effect| < 20).
Sample Cumulative Residual Standard Deviation Plot
Conclusions from the Cumulative Residual SD Plot
We can make the following conclusions from the cumulative residual standard deviation plot.
1. The baseline model consisting only of the average (ȳ = 110.6063) has a high residual standard deviation (95).
2. The cumulative residual standard deviation shows a significant and steady decrease as the following terms are added to the average: X2, X7, X1*X3, X1, X3, X2*X3, and X1*X2. Including these terms reduces the cumulative residual standard deviation from approximately 95 to approximately 17.
3. Exclude from the model any term after X1*X2, as the decrease in the residual standard deviation becomes relatively small.
4. From the |effects| plot, we see that the average is 110.6063, the estimated X2 effect is -78.6126, and so on. We use these estimates to form the prediction equation: the predicted response is the average plus one-half of each retained effect estimate times its corresponding term.
Note that X1*X3 is confounded with X2*X7 and X4*X6, X1*X5 is confounded with X2*X6
and X4*X7, and X1*X2 is confounded with X3*X7 and X5*X6.
From the above graph, we see that the residual standard deviation for this model is
approximately 17.
Sample DEX Contour Plot
Conclusions from the DEX Contour Plot
We can make the following conclusions from the dex contour plot.
1. The best (high light intensity) setting for X2 is "-" and the best setting for X7 is "-". This combination yields an average response of approximately 224. The next highest average response from any other combination of these factors is only 76.
2. The non-linear nature of the contour lines implies that the X2*X7 interaction is important.
3. On the left side of the plot from top to bottom, the contour lines start at 0, increment by 50
and stop at 400. On the bottom of the plot from right to left, the contour lines start at 0,
increment by 50 and stop at 400.
To achieve a light intensity of, say 400, this suggests an extrapolated best setting of (X2,
X7) = (-2,-2).
4. Such extrapolation only makes sense if X2 and X7 are continuous factors. Such is not the
case here. In this example, X2 is solute (-1 = sugar and +1 = glycerol) and X7 is flask
clamping (-1 is unclamped and +1 is clamped). Both factors are discrete, and so
extrapolated settings are not possible.
3. X1*X3 (Molarity*pH) or
X2*X7 (Solute*Clamping)
(effect = 70.0)
6. X2*X3 (Solute*pH) or
X4*X5 (Gas*Water Depth) or
X1*X7 (Molarity*Clamping)
(effect = -63.5)
7. X1*X2 (Molarity*Solute) or
X3*X7 (pH*Clamping)
(effect = -59.6)
Best Settings The best settings to maximize sonoluminescent light intensity are
● X1 (Molarity) + (0.33 mol)
● X2 (Solute) - (sugar)
● X3 (pH) + (11)
● X4 (Gas) . (either)
● X7 (Clamping) - (unclamped)
Click on the links below to start Dataplot and run this case study yourself. Each step may use results from previous steps, so please be patient. Wait until the software verifies that the current step is complete before clicking on the next step. The links in this column will connect you with more detailed information about each analysis step from the case study description.
1. Generate a dex contour plot using factors 2 and 7. The dex contour plot shows X2 = -1 and X7 = -1 to be the best settings.
5. Process Improvement
5.8. References
Chapter-specific references
Bisgaard, S. and Steinberg, D. M., (1997), "The Design and Analysis of 2^(k-p) Prototype Experiments," Technometrics, 39, 1, 52-62.
Box, G. E. P., and Draper, N. R., (1987), Empirical Model Building
and Response Surfaces, John Wiley & Sons, New York, NY.
Box, G. E. P., and Hunter, J. S., (1954), "A Confidence Region for the
Solution of a Set of Simultaneous Equations with an Application to
Experimental Design," Biometrika, 41, 190-199.
Box, G. E. P., and Wilson, K. B., (1951), "On the Experimental
Attainment of Optimum Conditions," Journal of the Royal Statistical
Society, Series B, 13, 1-45.
Capobianco, T. E., Splett, J. D., and Iyer, H. K., (1990), "Eddy Current Probe Sensitivity as a Function of Coil Construction Parameters," Research in Nondestructive Evaluation, Vol. 2, pp. 169-186, December, 1990.
Cornell, J. A., (1990), Experiments with Mixtures: Designs, Models,
and the Analysis of Mixture Data, John Wiley & Sons, New York,
NY.
Del Castillo, E., (1996), "Multiresponse Optimization Confidence
Regions," Journal of Quality Technology, 28, 1, 61-70.
Derringer, G., and Suich, R., (1980), "Simultaneous Optimization of
Several Response Variables," Journal of Quality Technology, 12, 4,
214-219.
Draper, N.R., (1963), "Ridge Analysis of Response Surfaces,"
Technometrics, 5, 469-479.
Hoerl, A. E., (1959), "Optimum Solution of Many Variables
Equations," Chemical Engineering Progress, 55, 67-78.
Hoerl, A. E., (1964), "Ridge Analysis," Chemical Engineering
Symposium Series, 60, 67-77.
Khuri, A. I., and Cornell, J. A., (1987), Response Surfaces, Marcel
Dekker, New York, NY.
Well Known General References
Box, G. E. P., Hunter, W. G., and Hunter, J. S., (1978), Statistics for Experimenters, John Wiley & Sons, Inc., New York, NY.
Diamond, W. J., (1989), Practical Experimental Designs, Second Ed.,
Van Nostrand Reinhold, NY.
John, P. W. M., (1971), Statistical Design and Analysis of
Experiments, SIAM Classics in Applied Mathematics, Philadelphia,
PA.
Milliken, G. A., and Johnson, D. E., (1984), Analysis of Messy Data,
Vol. 1, Van Nostrand Reinhold, NY.
Montgomery, D. C., (2000), Design and Analysis of Experiments,
Fifth Edition, John Wiley & Sons, New York, NY.
Case studies for different industries
Snee, R. D., Hare, L. B., and Trout, J. R., (1985), Experiments in Industry: Design, Analysis and Interpretation of Results, Milwaukee, WI, American Society for Quality.
Case studies in Process Improvement, including DOE, in the Semiconductor Industry
Czitrom, V., and Spagon, P. D., (1997), Statistical Case Studies for Industrial Process Improvement, Philadelphia, PA, ASA-SIAM Series on Statistics and Applied Probability.
1. Introduction [6.1.]
1. How did Statistical Quality Control Begin? [6.1.1.]
2. What are Process Control Techniques? [6.1.2.]
3. What is Process Control? [6.1.3.]
4. What to do if the process is "Out of Control"? [6.1.4.]
5. What to do if "In Control" but Unacceptable? [6.1.5.]
6. What is Process Capability? [6.1.6.]
5. Tutorials [6.5.]
1. What do we mean by "Normal" data? [6.5.1.]
2. What do we do when data are "Non-normal"? [6.5.2.]
3. Elements of Matrix Algebra [6.5.3.]
1. Numerical Examples [6.5.3.1.]
2. Determinant and Eigenstructure [6.5.3.2.]
4. Elements of Multivariate Analysis [6.5.4.]
1. Mean Vector and Covariance Matrix [6.5.4.1.]
2. The Multivariate Normal Distribution [6.5.4.2.]
3. Hotelling's T squared [6.5.4.3.]
1. T2 Chart for Subgroup Averages -- Phase I [6.5.4.3.1.]
2. T2 Chart for Subgroup Averages -- Phase II [6.5.4.3.2.]
3. Chart for Individual Observations -- Phase I [6.5.4.3.3.]
4. Chart for Individual Observations -- Phase II [6.5.4.3.4.]
5. Charts for Controlling Multivariate Variability [6.5.4.3.5.]
6. Constructing Multivariate Charts [6.5.4.3.6.]
5. Principal Components [6.5.5.]
1. Properties of Principal Components [6.5.5.1.]
2. Numerical Example [6.5.5.2.]
7. References [6.7.]
6.1. Introduction
Contents of Section
This section discusses the basic concepts of statistical process control, quality control and process capability.
The concept of quality control in manufacturing was first advanced by Walter Shewhart
The first to apply the newly discovered statistical methods to the problem of quality control was Walter A. Shewhart of the Bell Telephone Laboratories. He issued a memorandum on May 16, 1924 that featured a sketch of a modern control chart.
Shewhart kept improving and working on this scheme, and in 1931 he published a book on statistical quality control, "Economic Control of Quality of Manufactured Product", published by Van Nostrand in New York. This book set the tone for subsequent applications of statistical methods to process control.
Contributions of Dodge and Romig to sampling inspection
Two other Bell Labs statisticians, H. F. Dodge and H. G. Romig, spearheaded efforts in applying statistical theory to sampling inspection. The work of these three pioneers constitutes much of what nowadays comprises the theory of statistical quality control.
There is much more to say about the history of statistical quality control, and the interested reader is invited to peruse one or more of the references. A very good summary of the historical background of SQC is found in chapter 1 of "Quality Control and Industrial Statistics" by Acheson J. Duncan. See also Juran (1997).
Typical process control techniques
There are many ways to implement process control. Key monitoring and investigating tools include:
● Histograms
● Check Sheets
● Pareto Charts
● Cause and Effect Diagrams
● Defect Concentration Diagrams
● Scatter Diagrams
● Control Charts
All these are described in Montgomery (2000). This chapter will focus (in Section 3) on control chart methods, specifically:
● Classical Shewhart Control charts,
● Cumulative Sum (CUSUM) charts
● Exponentially Weighted Moving Average (EWMA) charts
● Multivariate control charts
Tools of statistical quality control
Several techniques can be used to investigate the product for defects or defective pieces after all processing is complete. Typical tools of SQC (described in section 2) are:
● Lot Acceptance sampling plans
● Skip lot sampling plans
● Military (MIL) Standard sampling plans
Process must be stable
Note that the process must be stable before it can be centered at a target value or its overall variation can be reduced.
A process capability index uses both the process variability and the process specifications to determine whether the process is "capable"
We are often required to compare the output of a stable process with the process specifications and make a statement about how well the process meets specification. To do this we compare the natural variability of a stable process with the process specification limits.
A capable process is one where almost all the measurements fall inside the specification limits. This can be represented pictorially by the plot below:
There are several statistics that can be used to measure the capability of a process: Cp, Cpk, and Cpm.
Most capability index estimates are valid only if the sample size used is "large enough", generally thought to be about 50 independent data values.
The Cp, Cpk, and Cpm statistics assume that the population of data values is normally distributed. Assuming a two-sided specification, if μ and σ are the mean and standard deviation, respectively, of the normal data and USL, LSL, and T are the upper specification limit, lower specification limit, and target value, respectively, then the population capability indices are defined as follows:
Definitions of various process capability indices
Cp = (USL - LSL) / (6σ)
Cpk = min[ (USL - μ) / (3σ), (μ - LSL) / (3σ) ]
Cpm = (USL - LSL) / (6·sqrt(σ² + (μ - T)²))
Sample estimates of capability indices
Sample estimators for these indices are given below; estimators are indicated with a "hat" over them and are obtained by replacing μ and σ with the sample mean x̄ and sample standard deviation s:
Ĉp = (USL - LSL) / (6s)
Ĉpk = min[ (USL - x̄) / (3s), (x̄ - LSL) / (3s) ]
Ĉpm = (USL - LSL) / (6·sqrt(s² + (x̄ - T)²))
The estimator for Cpk can also be expressed as Ĉpk = Ĉp(1 - k̂), where k is a scaled distance between the midpoint of the specification range, m, and the process mean, μ.
Denote the midpoint of the specification range by m = (USL + LSL)/2. The distance between the process mean, μ, and the optimum, which is m, is μ - m, where μ ≥ m. The scaled distance is
k = |m - μ| / ((USL - LSL)/2)
(the absolute value takes care of the case when μ < m). To determine the estimated value, k̂, we estimate μ by x̄. Note that k̂ ≥ 0, and k̂ ≤ 1 when x̄ lies within the specification limits.
The estimator for the Cp index, adjusted by the k factor, is
Ĉpk = Ĉp(1 - k̂)
Plot showing Cp for varying process widths
To get an idea of the value of the Cp statistic for varying process widths, consider the following plot.
Translating capability into "rejects"
USL - LSL    6σ      8σ       10σ      12σ
Cp           1.00    1.33     1.66     2.00
Rejects      .27%    64 ppm   .6 ppm   2 ppb
where ppm = parts per million and ppb = parts per billion. Note that the reject figures are based on the assumption that the distribution is centered at μ = m.
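The reject fractions follow directly from the normal distribution: a centered process with a given Cp has its specification limits at ±3·Cp standard deviations from the mean. A short Python sketch (using scipy; an illustration, not Handbook software):

from scipy.stats import norm

# Reject fraction for a centered normal process with capability Cp:
# the spec limits sit at +/- 3*Cp standard deviations from the mean.
for cp in [1.00, 1.33, 1.66, 2.00]:
    rejects = 2 * norm.cdf(-3 * cp)
    print(f"Cp = {cp:.2f}  reject fraction = {rejects:.1e}")
# ~2.7e-3 (.27%), ~6.6e-5 (~64 ppm), ~6.4e-7 (~.6 ppm), ~2.0e-9 (~2 ppb)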
We have discussed the situation with two spec limits, the USL and LSL. This is known as the bilateral or two-sided case. There are many cases where only the lower or upper specification is used. Using one spec limit is called unilateral or one-sided. The corresponding capability indices are
One-sided specifications and the corresponding capability indices
Cpu = (USL - μ) / (3σ)  and  Cpl = (μ - LSL) / (3σ)
where μ and σ are the process mean and standard deviation, respectively.
Estimators of Cpu and Cpl are obtained by replacing μ and σ by x̄ and s, respectively. The following relationship holds:
Cp = (Cpu + Cpl) / 2
This can be represented pictorially by the plot below.
Confidence intervals for indices
Assuming normally distributed process data, the distribution of the sample Ĉp follows from a Chi-square distribution, and Ĉpu and Ĉpl have distributions related to the non-central t distribution. Fortunately, approximate confidence limits related to the normal distribution have been derived. Various approximations to the distribution of Ĉpk have been proposed, including those given by Bissell (1990), and we will use a normal approximation.
The resulting formulas for 100(1-α)% confidence limits for Cp are:
Ĉp·sqrt(χ²(α/2, ν)/ν) ≤ Cp ≤ Ĉp·sqrt(χ²(1-α/2, ν)/ν)
where ν = n - 1 is the degrees of freedom and χ²(γ, ν) denotes the γ quantile of the Chi-square distribution with ν degrees of freedom.
Confidence Intervals for Cpu and Cpl
Approximate 100(1-α)% confidence limits for Cpu with sample size n are:
Ĉpu ± z(1-α)·sqrt(1/(9n) + Ĉpu²/(2(n-1)))
with z denoting the percent point function of the standard normal distribution. If α is not known, set it to 0.05.
Limits for Cpl are obtained by replacing Ĉpu by Ĉpl.
Confidence Interval for Cpk
Zhang et al. (1990) derived the exact variance for the estimator of Cpk as well as an approximation for large n. The reference paper is Zhang, Stenback, and Wardrop (1990), "Interval Estimation of the Process Capability Index," Communications in Statistics: Theory and Methods, 19(21), 4455-4470. The exact variance expression and its large-n approximation are given in that paper; approximate confidence limits for Cpk then follow from a normal approximation using that variance.
It is important to note that the sample size should be at least 25 before these approximations are valid. In general, however, we need n ≥ 100 for capability studies. Another point to observe is that variations are not negligible due to the randomness of capability indices.
An example
For a certain process the USL = 20 and the LSL = 8. The observed process average is x̄ = 16, and the standard deviation is s = 2. From this we obtain
Ĉp = (USL - LSL) / (6s) = (20 - 8) / 12 = 1.0
This means that the process is capable as long as it is located at the midpoint, m = (USL + LSL)/2 = 14.
But it is not, since x̄ = 16. The k̂ factor is found by
k̂ = |m - x̄| / ((USL - LSL)/2) = |14 - 16| / 6 = 0.3333
and
Ĉpk = Ĉp(1 - k̂) = 0.6667
We would like to have Ĉpk at least 1.0, so this is not a good process. If possible, reduce the variability or/and center the process. We can compute the Ĉpu and Ĉpl:
Ĉpu = (USL - x̄) / (3s) = (20 - 16) / 6 = 0.6667
Ĉpl = (x̄ - LSL) / (3s) = (16 - 8) / 6 = 1.3333
From this we see that Ĉpu, which is the smallest of the above indices, is 0.6667. Note that the formula Ĉpk = Ĉp(1 - k̂) is the algebraic equivalent of the min(Ĉpu, Ĉpl) definition.
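The arithmetic of this example can be verified with a few lines of Python (a sketch of the definitions above, not part of the Handbook's software):

def capability(usl, lsl, xbar, s):
    # Sample capability indices for a two-sided specification.
    cp  = (usl - lsl) / (6 * s)
    cpu = (usl - xbar) / (3 * s)
    cpl = (xbar - lsl) / (3 * s)
    k   = abs((usl + lsl) / 2 - xbar) / ((usl - lsl) / 2)
    cpk = min(cpu, cpl)          # equals cp * (1 - k)
    return cp, cpu, cpl, cpk, k

print(capability(usl=20, lsl=8, xbar=16, s=2))
# -> (1.0, 0.667, 1.333, 0.667, 0.333), matching the values above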
What you can do with non-normal data
The indices that we considered thus far are based on normality of the process distribution. This poses a problem when the process distribution is not normal. Without going into the specifics, we can list some remedies.
1. Transform the data so that they become approximately normal. A popular transformation is the Box-Cox transformation.
2. Use or develop another set of indices that apply to nonnormal distributions. One statistic is called Cnpk (for non-parametric Cpk). Its estimator is calculated by
Ĉnpk = min[ (USL - median) / (p(0.995) - median), (median - LSL) / (median - p(0.005)) ]
where p(0.995) is the 99.5th percentile of the data and p(.005) is the 0.5th percentile of the data.
For additional information on nonnormal distributions, see Johnson and Kotz (1993).
There is, of course, much more that can be said about the case of nonnormal data. However, if a Box-Cox transformation can be successfully performed, one is encouraged to use it.
Definition of Lot Acceptance Sampling
Dodge reasoned that a sample should be picked at random from the lot, and on the basis of information that was yielded by the sample, a decision should be made regarding the disposition of the lot. In general, the decision is either to accept or reject the lot. This process is called Lot Acceptance Sampling or just Acceptance Sampling.
Acceptance Quality Control and Acceptance Sampling
It was pointed out by Harold Dodge in 1969 that Acceptance Quality Control is not the same as Acceptance Sampling. The latter depends on specific sampling plans, which when implemented indicate the conditions for acceptance or rejection of the immediate lot that is being inspected. The former may be implemented in the form of an Acceptance Control Chart. The control limits for the Acceptance Control Chart are computed using the specification limits and the standard deviation of what is being monitored (see Ryan, 2000 for details).
Definitions of basic Acceptance Sampling terms
Deriving a plan, within one of the categories listed above, is discussed in the pages that follow. All derivations depend on the properties you want the plan to have. These are described using the following terms:
● Acceptable Quality Level (AQL): The AQL is a percent defective that is the base line requirement for the quality of the producer's
product. The producer would like to design a sampling plan such
that there is a high probability of accepting a lot that has a defect
level less than or equal to the AQL.
● Lot Tolerance Percent Defective (LTPD): The LTPD is a
designated high defect level that would be unacceptable to the
consumer. The consumer would like the sampling plan to have a
low probability of accepting a lot with a defect level as high as
the LTPD.
● Type I Error (Producer's Risk): This is the probability, for a given (n,c) sampling plan, of rejecting a lot that has a defect level equal to the AQL. The producer suffers when this occurs, because a lot with acceptable quality was rejected. The symbol α is commonly used for the Type I error, and typical values for α range from 0.2 to 0.01.
● Type II Error (Consumer's Risk): This is the probability, for a given (n,c) sampling plan, of accepting a lot with a defect level equal to the LTPD. The consumer suffers when this occurs, because a lot with unacceptable quality was accepted. The symbol β is commonly used for the Type II error, and typical values range from 0.2 to 0.01.
● Operating Characteristic (OC) Curve: This curve plots the
probability of accepting the lot (Y-axis) versus the lot fraction or
percent defectives (X-axis). The OC curve is the primary tool for
displaying and investigating the properties of a LASP.
● Average Outgoing Quality (AOQ): A common procedure, when
sampling and testing is non-destructive, is to 100% inspect
rejected lots and replace all defectives with good units. In this
case, all rejected lots are made perfect and the only defects left
are those in lots that were accepted. AOQ's refer to the long term
defect level for this combined LASP and 100% inspection of
rejected lots process. If all lots come in with a defect level of
exactly p, and the OC curve for the chosen (n,c) LASP indicates a
probability pa of accepting such a lot, over the long run the AOQ
can easily be shown to be:
AOQ = pa·p·(N - n) / N
where N is the lot size.
The final choice is a tradeoff decision
Making a final choice between single or multiple sampling plans that have acceptable properties is a matter of deciding whether the average sampling savings gained by the various multiple sampling plans justifies the additional complexity of these plans and the uncertainty of not knowing how much sampling and inspection will be done on a day-by-day basis.
AQL is foundation of standard
The foundation of the Standard is the acceptable quality level or AQL. In the following scenario, a certain military agency, called the Consumer from here on, wants to purchase a particular product from a supplier, called the Producer from here on.
In applying the Mil. Std. 105D it is expected that there is perfect
agreement between Producer and Consumer regarding what the AQL is
for a given product characteristic. It is understood by both parties that
the Producer will be submitting for inspection a number of lots whose
quality level is typically as good as specified by the Consumer.
Continued quality is assured by the acceptance or rejection of lots
following a particular sampling plan and also by providing for a shift to
another, tighter sampling plan, when there is evidence that the
Producer's product does not meet the agreed-upon AQL.
Standard offers 3 types of sampling plans
Mil. Std. 105E offers three types of sampling plans: single, double and multiple plans. The choice is, in general, up to the inspectors.
Because of the three possible selections, the standard does not give a sample size, but rather a sample code letter. This, together with the decision on the type of plan, yields the specific sampling plan to be used.
Steps in the standard
The steps in the use of the standard can be summarized as follows:
1. Decide on the AQL.
2. Decide on the inspection level.
3. Determine the lot size.
4. Enter the table to find sample size code letter.
5. Decide on type of sampling to be used.
6. Enter proper table to find the plan to be used.
7. Begin with normal inspection, follow the switching rules and the rule for stopping the inspection (if needed).
Additional information
There is much more that can be said about Mil. Std. 105E (and 105D). The interested reader is referred to references such as Montgomery (2000), Schilling (tables 11-2 to 11-17), and Duncan (pages 214-248). There is also (currently) a web site developed by Galit Shmueli that will develop sampling plans interactively with the user, according to Military Standard 105E (ANSI/ASQC Z1.4, ISO 2859) tables.
Number of defectives is approximately binomial
It is instructive to show how the points on this curve are obtained, once we have a sampling plan (n,c) - later we will demonstrate how a sampling plan (n,c) is obtained.
We assume that the lot size N is very large, as compared to the sample size n, so that removing the sample doesn't significantly change the remainder of the lot, no matter how many defects are in the sample. Then the distribution of the number of defectives, d, in a random sample of n items is approximately binomial with parameters n and p, where p is the fraction of defectives per lot.
The binomial distribution
The probability of observing exactly d defectives is given by
P(d) = [n! / (d!(n-d)!)] · p^d · (1-p)^(n-d)
and the probability of acceptance is the probability of observing c or fewer defectives, Pa = P(d ≤ c).
Sample table for Pa, Pd using the binomial distribution
Using this formula with n = 52, c = 3, and p = .01, .02, ..., .12 we find

Pa      Pd
.998    .01
.980    .02
.930    .03
.845    .04
.739    .05
.620    .06
.502    .07
.394    .08
.300    .09
.223    .10
.162    .11
.115    .12
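These Pa values can be reproduced with a short Python sketch of the binomial acceptance probability (an illustration, not Handbook software):

from math import comb

def prob_accept(n, c, p):
    # Pa = P(d <= c) for d ~ Binomial(n, p).
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(c + 1))

for k in range(1, 13):
    p = k / 100
    print(f"p = {p:.2f}  Pa = {prob_accept(52, 3, p):.3f}")   # .998, .980, .930, ...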
Equations for calculating a sampling plan with a given OC curve
In order to design a sampling plan with a specified OC curve one needs two designated points. Let us design a sampling plan such that the probability of acceptance is 1-α for lots with fraction defective p1 and the probability of acceptance is β for lots with fraction defective p2. Typical choices for these points are: p1 is the AQL, p2 is the LTPD, and α, β are the Producer's Risk (Type I error) and Consumer's Risk (Type II error), respectively.
Sample table of AOQ versus p
Setting p = .01, .02, ..., .12, we can generate the following table

AOQ     p
.0100   .01
.0196   .02
.0278   .03
.0338   .04
.0369   .05
.0372   .06
.0351   .07
.0315   .08
.0270   .09
.0223   .10
.0178   .11
.0138   .12
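Since AOQ ≈ pa·p when N >> n, the AOQ column, and the AOQL discussed below, can be computed the same way. A self-contained Python sketch (an illustration only):

from math import comb

def prob_accept(n, c, p):
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(c + 1))

# AOQ ~ Pa(p) * p for rectifying inspection with N >> n.
aoq = {k / 100: prob_accept(52, 3, k / 100) * k / 100 for k in range(1, 13)}
p_max = max(aoq, key=aoq.get)
print(f"AOQL ~ {aoq[p_max]:.4f} at p = {p_max:.2f}")   # ~0.0372 at p = 0.06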
Interpretation of AOQ plot
From examining this curve we observe that when the incoming quality is very good (very small fraction of defectives coming in), then the outgoing quality is also very good (very small fraction of defectives going out). When the incoming lot quality is very bad, most of the lots are rejected and then inspected. The "duds" are eliminated or replaced by good ones, so that the quality of the outgoing lots, the AOQ, becomes very good. In between these extremes, the AOQ rises, reaches a maximum, and then drops.
The maximum ordinate on the AOQ curve represents the worst possible quality that results from the rectifying inspection program. It is called the average outgoing quality limit (AOQL).
From the table we see that the AOQL = 0.0372 at p = .06 for the above example.
One final remark: if N >> n, then AOQ ≈ pa·p.
Sample table of ATI versus p
Setting p = .01, .02, ..., .14 generates the following table

ATI     p
70      .01
253     .02
753     .03
1584    .04
2655    .05
3836    .06
5007    .07
6083    .08
7012    .09
7779    .10
8388    .11
8854    .12
9201    .13
9453    .14
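The ATI values follow from ATI = n + (1 - Pa)(N - n), the expected amount of inspection per lot when rejected lots are 100% inspected. The Python sketch below assumes a lot size of N = 10000 (this value is not restated here, but it reproduces the table):

from math import comb

def prob_accept(n, c, p):
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(c + 1))

# Average Total Inspection per lot under rectifying inspection.
N, n, c = 10000, 52, 3   # N = 10000 is an assumption of this illustration
for k in range(1, 15):
    p = k / 100
    ati = n + (1 - prob_accept(n, c, p)) * (N - n)
    print(f"p = {p:.2f}  ATI = {ati:.0f}")   # 70, 253, 753, ...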
Plot of ATI versus p
A plot of ATI versus p, the Incoming Lot Quality (ILQ), is given below.
Design of a double sampling plan
The parameters required to construct the OC curve are similar to the single sample case. The two points of interest are (p1, 1-α) and (p2, β), where p1 is the lot fraction defective for plan 1 and p2 is the lot fraction defective for plan 2. As far as the respective sample sizes are concerned, the second sample size must be equal to, or an even multiple of, the first sample size.
There exist a variety of tables that assist the user in constructing double and multiple sampling plans. The index to these tables is the p2/p1 ratio, where p2 > p1. One set of tables, taken from the Army Chemical Corps Engineering Agency for α = .05 and β = .10, is given below:
Tables for n1 = n2
The table is indexed by R = p2/p1 and gives the accept numbers c1 and c2 together with approximation values of pn1 for P = .95 and P = .10.
Example
The left holds α constant at 0.05 (P = 0.95 = 1 - α) and the right holds β constant at 0.10 (P = 0.10). Then holding α constant we find pn1 = 1.16, so n1 = 1.16/p1 = 116. And, holding β constant we find pn1 = 5.39, so n1 = 5.39/p2 = 108. Thus the desired sampling plan is
n1 = 108 c1 = 2 n2 = 108 c2 = 4
If we opt for n2 = 2n1, and follow the same procedure using the appropriate
table, the plan is:
n1 = 77 c1 = 1 n2 = 154 c2 = 4
The first plan needs fewer samples if the number of defectives in sample 1 is greater than 2, while the second plan needs fewer samples if the number of defectives in sample 1 is less than 2.
Construction of the ASN curve
Since when using a double sampling plan the sample size depends on whether or not a second sample is required, an important consideration for this kind of sampling is the Average Sample Number (ASN) curve. This curve plots the ASN versus p', the true fraction defective in an incoming lot.
We will illustrate how to calculate the ASN curve with an example. Consider a
double-sampling plan n1 = 50, c1= 2, n2 = 100, c2 = 6, where n1 is the sample
size for plan 1, with accept number c1, and n2, c2, are the sample size and
accept number, respectively, for plan 2.
Let p' = .06. Then the probability of acceptance on the first sample, which is the
chance of getting two or less defectives, is .416 (using binomial tables). The
probability of rejection on the second sample, which is the chance of getting
more than six defectives, is (1-.971) = .029. The probability of making a
decision on the first sample is .445, equal to the sum of .416 and .029. With
complete inspection of the second sample, the average size sample is equal to
the size of the first sample times the probability that there will be only one
sample plus the size of the combined samples times the probability that a
second sample will be necessary. For the sampling plan under consideration,
the ASN with complete inspection of the second sample for a p' of .06 is
50(.445) + 150(.555) = 106
The general formula for an average sample number curve of a double-sampling
plan with complete inspection of the second sample is
ASN = n1P1 + (n1 + n2)(1 - P1) = n1 + n2(1 - P1)
where P1 is the probability of a decision on the first sample. The graph below
shows a plot of the ASN versus p'.
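The ASN computation is easy to script. The following Python sketch (an illustration) evaluates the formula for the double sampling plan above:

from math import comb

def binom_cdf(k, n, p):
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(k + 1))

def asn(p, n1=50, c1=2, n2=100, c2=6):
    # Decision on sample 1: accept if d1 <= c1, reject if d1 > c2.
    p1 = binom_cdf(c1, n1, p) + (1 - binom_cdf(c2, n1, p))
    return n1 + n2 * (1 - p1)

print(f"{asn(0.06):.1f}")   # 105.5; the text rounds the intermediate .445 to get 106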
The ASN curve for a double sampling plan
Procedure for multiple sampling
The procedure commences with taking a random sample of size n1 from a large lot of size N and counting the number of defectives, d1.
If d1 ≤ a1, the lot is accepted.
If d1 ≥ r1, the lot is rejected.
If a1 < d1 < r1, another sample is taken.
If subsequent samples are required, the first sample procedure is repeated sample by sample. For each sample, the total number of defectives found at any stage, say stage i, is
Di = d1 + d2 + ... + di
Item-by-item and group sequential sampling
The sequence can be one sample at a time, and then the sampling process is usually called item-by-item sequential sampling. One can also select sample sizes greater than one, in which case the process is referred to as group sequential sampling. Item-by-item is more popular so we concentrate on it. The operation of such a plan is illustrated below:
Diagram of item-by-item sampling
Equations for the limit lines
The equations for the two limit lines are functions of the parameters p1, α, p2, and β:
xa = -h1 + s·n (acceptance line)
xr = h2 + s·n (rejection line)
where
h1 = log[(1 - α)/β] / g
h2 = log[(1 - β)/α] / g
s = log[(1 - p1)/(1 - p2)] / g
and g = log[ p2(1 - p1) / (p1(1 - p2)) ].
Instead of using the graph to determine the fate of the lot, one can resort to generating tables (with the help of a computer program).
n    accept  reject      n    accept  reject
1      x       x        14      x       2
2      x       2        15      x       2
3      x       2        16      x       3
4      x       2        17      x       3
5      x       2        18      x       3
6      x       2        19      x       3
7      x       2        20      x       3
8      x       2        21      x       3
9      x       2        22      x       3
10     x       2        23      x       3
11     x       2        24      0       3
12     x       2        25      0       3
13     x       2        26      0       3
where "x" indicates that acceptance or rejection is not yet possible at that stage.
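A table like this can be generated directly from the limit-line equations. The Python sketch below does so for illustrative parameters (the values of p1, p2, alpha, and beta are chosen arbitrarily here; they are not the ones behind the table above):

from math import log, floor, ceil

p1, p2, alpha, beta = 0.01, 0.08, 0.05, 0.10   # illustrative values only
g  = log(p2 * (1 - p1) / (p1 * (1 - p2)))
h1 = log((1 - alpha) / beta) / g
h2 = log((1 - beta) / alpha) / g
s  = log((1 - p1) / (1 - p2)) / g

for n in range(1, 27):
    acc = floor(-h1 + s * n)   # accept if d <= acc (impossible while acc < 0)
    rej = ceil(h2 + s * n)     # reject if d >= rej (impossible while rej > n)
    print(n, acc if acc >= 0 else "x", rej if rej <= n else "x")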
The f and i parameters
The parameters f and i are essential to calculating the probability of acceptance for a skip-lot sampling plan. In this scheme, i, called the clearance number, is a positive integer and the sampling fraction f is such that 0 < f < 1. Hence, when f = 1 there is no longer skip-lot sampling. The calculation of the acceptance probability for the skip-lot sampling plan is performed via the following formula
Pa(f, i) = [f·P + (1 - f)·P^i] / [f + (1 - f)·P^i]
where P is the probability of accepting a lot with the reference single sampling plan.
ASN of skip-lot sampling plan
An important property of skip-lot sampling plans is the average sample number (ASN). The ASN of a skip-lot sampling plan is
ASN(skip-lot) = F · ASN(reference)
where F is defined by
F = f / [(1 - f)·P^i + f]
Therefore, since 0 < F < 1, it follows that the ASN of skip-lot sampling is smaller than the ASN of the reference sampling plan.
In summary, skip-lot sampling is preferred when the quality of the submitted lots is excellent and the supplier can demonstrate a proven track record.
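Both quantities are simple to compute; here is a Python sketch of the formulas above (the P, f, and i values are illustrative):

def skip_lot(P, f, i):
    # Acceptance probability and ASN factor for a skip-lot plan, given the
    # reference plan's acceptance probability P, sampling fraction f,
    # and clearance number i.
    pi = P ** i
    pa = (f * P + (1 - f) * pi) / (f + (1 - f) * pi)
    F  = f / ((1 - f) * pi + f)
    return pa, F

pa, F = skip_lot(P=0.98, f=0.25, i=4)   # illustrative values
print(round(pa, 4), round(F, 4))        # ~0.9947, ~0.2655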
Chart demonstrating basis of control chart
Why control charts "work"
The control limits as pictured in the graph might be .001 probability limits. If so, and if chance causes alone were present, the probability of a point falling above the upper limit would be one out of a thousand, and similarly, a point falling below the lower limit would be one out of a thousand. We would be searching for an assignable cause if a point were to fall outside these limits. Where we put these limits will determine the risk of undertaking such a search when in reality there is no assignable cause for variation.
Since two out of a thousand is a very small risk, the 0.001 limits may be said to give practical assurances that, if a point falls outside these limits, the variation was caused by an assignable cause. It must be noted that two out of one thousand is a purely arbitrary number. There is no reason why it could not have been set to one out of a hundred or even larger. The decision would depend on the amount of risk the management of the quality control program is willing to take. In general (in the world of quality control) it is customary to use limits that approximate the 0.002 standard.
Letting X denote the value of a process characteristic, if the system of chance causes generates a variation in X that follows the normal distribution, the 0.001 probability limits will be very close to the 3σ limits. From normal tables we glean that the 3σ tail area in one direction is 0.00135, or in both directions 0.0027. For normal distributions, therefore, the 3σ limits are the practical equivalent of 0.001 probability limits.
Strategies for dealing with out-of-control findings
If a data point falls outside the control limits, we assume that the process is probably out of control and that an investigation is warranted to find and eliminate the cause or causes.
Does this mean that when all points fall within the limits, the process is
in control? Not necessarily. If the plot looks non-random, that is, if the
points exhibit some form of systematic behavior, there is still
something wrong. For example, if the first 25 of 30 points fall above
the center line and the last 5 fall below the center line, we would wish
to know why this is so. Statistical methods to detect sequences or
nonrandom patterns can be applied to the interpretation of control
charts. To be sure, "in control" implies that all points are between the
control limits and they form a random pattern.
C4 factor
c4 = sqrt(2/(n-1)) · Γ(n/2) / Γ((n-1)/2)
With this definition the reader should have no problem verifying that the c4 factor for n = 10 is .9727.
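The verification can be done numerically; a Python sketch using the gamma function:

from math import gamma, sqrt

def c4(n):
    # Bias-correction factor: E[s] = c4 * sigma for samples of size n
    # from a normal distribution.
    return sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)

print(round(c4(10), 4))   # 0.9727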
Mean and standard deviation of the estimators
So the mean or expected value of the sample standard deviation is c4·σ.
The standard deviation of the sample standard deviation is
σ(s) = σ·sqrt(1 - c4²)
WECO rules based on probabilities
The WECO rules are based on probability. We know that, for a normal distribution, the probability of encountering a point outside ±3σ is 0.3%. This is a rare event. Therefore, if we observe a point outside the control limits, we conclude the process has shifted and is unstable. Similarly, we can identify other events that are equally rare and use them as flags for instability. The probability of observing two points out of three in a row between 2σ and 3σ and the probability of observing four points out of five in a row between 1σ and 2σ are also about 0.3%.
WECO rules increase false alarms
Note: While the WECO rules increase a Shewhart chart's sensitivity to trends or drifts in the mean, there is a severe downside to adding the WECO rules to an ordinary Shewhart control chart that the user should
understand. When following the standard Shewhart "out of control"
rule (i.e., signal if and only if you see a point beyond the plus or minus
3 sigma control limits) you will have "false alarms" every 371 points
on the average (see the description of Average Run Length or ARL on
the next page). Adding the WECO rules increases the frequency of
false alarms to about once in every 91.75 points, on the average (see
Champ and Woodall, 1987). The user has to decide whether this price
is worth paying (some users add the WECO rules, but take them "less
seriously" in terms of the effort put into troubleshooting activities when
out of control signals occur).
With this background, the next page will describe how to construct
Shewhart variables control charts.
x̄ and S Shewhart Control Charts
We begin with x̄ and s charts. We should use the s chart first to determine if the distribution for the process characteristic is stable.
Let us consider the case where we have to estimate σ by analyzing past data. Suppose we have m preliminary samples at our disposition, each of size n, and let si be the standard deviation of the ith sample. Then the average of the m standard deviations is
s̄ = (s1 + s2 + ... + sm) / m
x̄ and R control charts
If the sample size is relatively small (say equal to or less than 10), we can use the range instead of the standard deviation of a sample to construct control charts on x̄ and the range, R. The range of a sample is simply the difference between the largest and smallest observation.
There is a statistical relationship (Patnaik, 1946) between the mean range for data from a normal distribution and σ, the standard deviation of that distribution. This relationship depends only on the sample size, n. The mean of R is d2·σ, where the value of d2 is also a function of n. An estimator of σ is therefore R̄/d2.
Armed with this background we can now develop the x̄ and R control chart.
Let R1, R2, ..., Rk be the ranges of k samples. The average range is
R̄ = (R1 + R2 + ... + Rk) / k
The R chart
This chart controls the process variability since the sample range is related to the process standard deviation. The center line of the R chart is the average range.
To compute the control limits we need an estimate of the standard deviation of W = R/σ. This can be found from the distribution of W = R/σ (assuming that the items that we measure follow a normal distribution). The standard deviation of W is d3, and is a known function of the sample size, n. It is tabulated in many textbooks on statistical quality control.
Therefore, since R = W·σ, the standard deviation of R is σ(R) = d3·σ. But since the true σ is unknown, we may estimate σ(R) by
σ̂(R) = d3·(R̄/d2)
As was the case with the control chart parameters for the subgroup averages, defining another set of factors will ease the computations, namely:
D3 = 1 - 3·d3/d2 and D4 = 1 + 3·d3/d2. These yield
UCL = D4·R̄, Center Line = R̄, LCL = D3·R̄
Efficiency of R versus S

n     Relative Efficiency
2     1.000
3     0.992
4     0.975
5     0.955
6     0.930
10    0.850

A typical sample size is 4 or 5, so not much is lost by using the range for such sample sizes.
which is the absolute value of the first difference (e.g., the difference between two consecutive data points) of the data. Analogous to the Shewhart control chart, one can plot both the data (which are the individuals) and the moving range.
Individuals control limits for an observation
For the control chart for individual measurements, the lines plotted are:
UCL = x̄ + 3·MR̄/1.128
Center Line = x̄
LCL = x̄ - 3·MR̄/1.128
where x̄ is the average of all the individuals and MR̄ is the average of all the moving ranges of two observations. Keep in mind that either or both averages may be replaced by a standard or target, if available. (Note that 1.128 is the value of d2 for n = 2.)
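A minimal Python sketch of these limits (the data below are illustrative stand-ins, not the case-study values):

def individuals_limits(x):
    # Control limits for an individuals (X-MR) chart, per the formulas above.
    mr = [abs(b - a) for a, b in zip(x, x[1:])]   # moving ranges of size 2
    xbar = sum(x) / len(x)
    mrbar = sum(mr) / len(mr)
    return xbar - 3 * mrbar / 1.128, xbar, xbar + 3 * mrbar / 1.128

lcl, center, ucl = individuals_limits(
    [49.6, 47.6, 49.9, 51.3, 47.8, 51.2, 52.6, 52.4, 53.6, 52.1])
print(round(lcl, 2), round(center, 2), round(ucl, 2))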
Example of moving range
The following example illustrates the control chart for individual observations. A new process was studied in order to monitor flow rate. The first 10 batches resulted in the data plotted below.
Example of individuals chart
The process is in control, since none of the plotted points fall outside either the UCL or LCL.
Alternative for constructing individuals control chart
Note: Another way to construct the individuals chart is by using the standard deviation. Then we can obtain the chart from
x̄ ± 3·s/c4
It is preferable to have the limits computed this way for the start of Phase 2.
Definition of cumulative sum
Sm = (x1 - μ̂0) + (x2 - μ̂0) + ... + (xm - μ̂0), the cumulative sum of the deviations of the first m sample values from the target (or estimated in-control) mean μ̂0.
Sample V-Mask demonstrating an out-of-control process
Interpretation of the V-Mask on the plot
In the diagram above, the V-Mask shows an out of control situation because of the point that lies above the upper arm. By sliding the V-Mask backwards so that the origin point covers other cumulative sum data points, we can determine the first point that signaled an out-of-control situation. This is useful for diagnosing what might have caused the process to go out of control.
From the diagram it is clear that the behavior of the V-Mask is
determined by the distance k (which is the slope of the lower arm) and
the rise distance h. These are the design parameters of the V-Mask.
Note that we could also specify d and the vertex angle (or, as is more common in the literature, θ = 1/2 the vertex angle) as the design parameters, and we would end up with the same V-Mask.
In practice, designing and manually constructing a V-Mask is a
complicated procedure. A cusum spreadsheet style procedure shown
below is more practical, unless you have statistical software that
automates the V-Mask methodology. Before describing the spreadsheet
approach, we will look briefly at an example of a software V-Mask.
JMP example of V-Mask
An example will be used to illustrate how to construct and apply a V-Mask procedure using JMP. The 20 data points
324.925, 324.675, 324.725, 324.350, 325.350, 325.225, 324.125,
324.525, 325.225, 324.600, 324.625, 325.150, 328.325, 327.250,
327.825, 328.500, 326.675, 327.775, 326.875, 328.350
are each the average of samples of size 4 taken from a process that has an estimated mean of 325. Based on process data, the process standard deviation is 1.27 and therefore the sample means used in the cusum procedure have a standard deviation of 1.27/√4 = 0.635.
After inputting the 20 sample means and selecting "control charts"
from the pull down "Graph" menu, JMP displays a "Control Charts"
screen and a "CUSUM Charts" screen. Since each sample mean is a
separate "data point", we choose a constant sample size of 1. We also
choose the option for a two sided Cusum plot shown in terms of the
original data.
JMP allows us a choice of either designing via the method using h and
k or using an alpha and beta design approach. For the latter approach
we must specify
● α, the probability of a false alarm, i.e., concluding that a shift in the process has occurred, while in fact it did not
● β, the probability of not detecting that a shift in the process mean has, in fact, occurred
● δ (delta), the amount of shift in the process mean that we wish to detect, expressed as a multiple of the standard deviation of the data points (which are the sample means).
Note: Technically, alpha and beta are calculated in terms of one
sequential trial where we monitor Sm until we have either an
out-of-control signal or Sm returns to the starting point (and the
monitoring begins, in effect, all over again).
JMP output from CUSUM procedure
When we click on chart we see the V-Mask placed over the last data point. The mask clearly indicates an out of control situation.
We next "grab" the V-Mask and move it back to the first point that
indicated the process was out of control. This is point number 14, as
shown below.
JMP CUSUM chart after moving V-Mask to first out-of-control point
A spreadsheet approach to cusum monitoring
Most users of cusum procedures prefer tabular charts over the V-Mask. The V-Mask is actually a carry-over of the pre-computer era. The tabular method can be quickly implemented by standard spreadsheet software.
To generate the tabular form we use the h and k parameters expressed in the original data units. It is also possible to use sigma units.
The following quantities are calculated:
Shi(i) = max(0, Shi(i-1) + xi - target - k)
Slo(i) = max(0, Slo(i-1) + target - k - xi)
with Shi(0) = Slo(0) = 0. If either Shi(i) or Slo(i) exceeds the decision limit h, the process is out of control.
Example of spreadsheet calculations
We will construct a cusum tabular chart for the example described above. For this example, the JMP parameter table gave h = 4.1959 and k = .3175. Using these design values, the tabular form of the example has the columns

Group | x | x-325 | x-325-k | Shi | 325-k-x | Slo | Cusum

with target = 325, h = 4.1959, and k = 0.3175.
If the distance between a plotted point and the lowest previous point is equal
to or greater than h, one concludes that the process mean has shifted
(increased).
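The recursions are easy to script. The Python sketch below (an illustration) applies them to the 20 sample means listed earlier with the JMP design values; consistent with the V-Mask analysis, the high-side sum first exceeds h at observation 14:

def tabular_cusum(x, target, k, h):
    # Tabular (spreadsheet) cusum: indices where Shi or Slo exceeds h.
    shi = slo = 0.0
    signals = []
    for i, xi in enumerate(x, start=1):
        shi = max(0.0, shi + xi - target - k)
        slo = max(0.0, slo + target - k - xi)
        if shi > h or slo > h:
            signals.append(i)
    return signals

means = [324.925, 324.675, 324.725, 324.350, 325.350, 325.225, 324.125,
         324.525, 325.225, 324.600, 324.625, 325.150, 328.325, 327.250,
         327.825, 328.500, 326.675, 327.775, 326.875, 328.350]
print(tabular_cusum(means, target=325, k=0.3175, h=4.1959))
# -> [14, 15, ..., 20]; the first signal is at observation 14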
h is decision Hence, h is referred to as the decision limit. Thus the sample size n,
limit reference value k, and decision limit h are the parameters required for
operating a one-sided CUSUM chart. If one has to control both positive and
negative deviations, as is usually the case, two one-sided charts are used,
with respective values k1, k2, (k1 > k2) and respective decision limits h and
-h.
Standardizing shift in mean and decision limit
The shift in the mean that the chart is designed to detect is expressed through the reference value k. If we are dealing with normally distributed measurements, we can standardize the reference value and the decision limit by dividing by the standard deviation of the measurements:
ks = k/σx and hs = h/σx
Determination of the ARL, given h and k
The average run length (ARL) at a given quality level is the average number of samples (subgroups) taken before an action signal is given. The standardized parameters ks and hs together with the sample size n are usually selected to yield approximate ARL's L0 and L1 at acceptable and rejectable quality levels μ0 and μ1 respectively. We would like to see a high ARL, L0, when the process is on target (i.e., in control), and a low ARL, L1, when the process mean shifts to an unsatisfactory level.
In order to determine the parameters of a CUSUM chart, the acceptable and
rejectable quality levels along with the desired respective ARL ' s are usually
specified. The design parameters can then be obtained by a number of ways.
Unfortunately, the calculations of the ARL for CUSUM charts are quite
involved.
There are several nomographs available from different sources that can be
utilized to find the ARL's when the standardized h and k are given. Some of
the nomographs solve the unpleasant integral equations that form the basis
of the exact solutions, using an approximation of Systems of Linear
Algebraic Equations (SLAE). This Handbook used a computer program that
furnished the required ARL's given the standardized h and k. An example is
given below:
Example of finding ARL's given the standardized h and k

mean shift   ARL       ARL       Shewhart
(k = .5)     (h = 4)   (h = 5)
0            336       930       371.00
.25          74.2      140       281.14
.5           26.6      30.0      155.22
.75          13.3      17.0      81.22
1.00         8.38      10.4      44.0
1.50         4.75      5.75      14.97
2.00         3.34      4.01      6.30
2.50         2.62      3.11      3.24
3.00         2.19      2.57      2.00
4.00         1.71      2.01      1.19
Using the table
If k = .5, then the shift of the mean (in multiples of the standard deviation of the mean) is obtained by adding .5 to the first column. For example, to detect a mean shift of 1 sigma at h = 4, the ARL = 8.38 (at the first-column entry of .5).
The last column of the table contains the ARL's for a Shewhart control chart
at selected mean shifts. The ARL for Shewhart = 1/p, where p is the
probability for a point to fall outside established control limits. Thus, for
3-sigma control limits and assuming normality, the probability to exceed the
upper control limit = .00135 and to fall below the lower control limit is also
.00135 and their sum = .0027. (These numbers come from standard normal
distribution tables or computer programs, setting z = 3). Then the ARL =
1/.0027 = 370.37. This says that when a process is in control one expects an
out-of-control signal (false alarm) every 371 runs.
ARL if a 1 When the means shifts up by 1 sigma, then the distance between the upper
sigma shift control limit and the shifted mean is 2 sigma (instead of 3 ). Entering
has occurred normal distribution tables with z = 2 yields a probability of p = .02275 to
exceed this value. The distance between the shifted mean and the lower limit
is now 4 sigma and the probability of a value falling below it (z = -4) is
only .000032 and can be ignored. The ARL is 1 / .02275 = 43.96.
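The two ARL calculations above are easy to reproduce; here is a sketch in Python with scipy (the function name is illustrative):

    from scipy.stats import norm

    def shewhart_arl(shift, limit=3.0):
        """ARL = 1/p, where p is the probability that a point falls
        outside the +/- limit-sigma control lines when the mean has
        shifted by `shift` standard deviations."""
        p = (1 - norm.cdf(limit - shift)) + norm.cdf(-limit - shift)
        return 1.0 / p

    print(shewhart_arl(0.0))   # ~370.4, the in-control false-alarm rate
    print(shewhart_arl(1.0))   # ~43.9, including the negligible lower tail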
Shewhart is The conclusion can be drawn that the Shewhart chart is superior for
better for detecting large shifts and the CUSUM scheme is faster for small shifts. The
detecting break-even point is a function of h, as the table shows.
large shifts,
CUSUM is
faster for
small shifts
Comparison For the Shewhart chart control technique, the decision regarding the
of Shewhart state of control of the process at any time, t, depends solely on the most
control recent measurement from the process and, of course, the degree of
chart and 'trueness' of the estimates of the control limits from historical data. For
EWMA the EWMA control technique, the decision depends on the EWMA
control statistic, which is an exponentially weighted average of all prior data,
chart including the most recent measurement.
techniques
By the choice of weighting factor, λ, the EWMA control procedure can
be made sensitive to a small or gradual drift in the process, whereas the
Shewhart control procedure can only react when the last data point is
outside a control limit.
Choice of The parameter λ determines the rate at which 'older' data enter into the
weighting calculation of the EWMA statistic. A value of λ = 1 implies that only
factor the most recent measurement influences the EWMA (degrades to a
Shewhart chart). Thus, a large value of λ gives more weight to
recent data and less weight to older data; a small value of λ gives more
weight to older data. The value of λ is usually set between 0.2 and 0.3
(Hunter) although this choice is somewhat arbitrary. Lucas and Saccucci
(1990) give tables that help the user select λ.
Definition of The center line for the control chart is the target value or EWMA0. The
control control limits are:
limits for
EWMA              UCL = EWMA0 + k s_ewma
                  LCL = EWMA0 - k s_ewma

where s_ewma = s sqrt(λ/(2-λ)) is the (asymptotic) standard deviation of the
EWMA statistic and the factor k is either set equal to 3 or chosen using the
Lucas and Saccucci (1990) tables. The data are assumed to be independent and
these tables also assume a normal population.
As with all control procedures, the EWMA procedure depends on a
database of measurements that are truly representative of the process.
Once the mean value and standard deviation have been calculated from
this database, the process can enter the monitoring stage, provided the
process was in control when the data were collected. If not, then the
usual Phase 1 work would have to be completed first.
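A minimal sketch of the EWMA computation and its control limits, assuming a known target and process standard deviation (the names are illustrative):

    import numpy as np

    def ewma_chart(y, target, s, lam=0.3, k=3.0):
        """EWMA_t = lam*y_t + (1-lam)*EWMA_{t-1}, with EWMA_0 = target;
        the limits use the asymptotic standard deviation of the EWMA."""
        ewma, prev = [], target
        for yt in y:
            prev = lam * yt + (1 - lam) * prev
            ewma.append(prev)
        s_ewma = s * np.sqrt(lam / (2 - lam))
        return np.array(ewma), target + k * s_ewma, target - k * s_ewma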
Sample data Consider the following data consisting of 20 points where 1 - 10 are on
the top row from left to right and 11-20 are on the bottom row from left
to right:
EWMA These data represent control measurements from the process which is to
statistics for be monitored using the EWMA control chart technique. The
sample data corresponding EWMA statistics that are computed from this data set
are:
Interpretation The red dots are the raw data; the jagged line is the EWMA statistic
of EWMA over time. The chart tells us that the process is in control because all
control chart EWMAt lie between the control limits. However, there seems to be a
trend upwards for the last 5 periods.
There is another chart which handles defects per unit, called the u chart
(for unit). This applies when we wish to work with the average number
of nonconformities per unit of product.
For additional references, see Woodall (1997), which reviews papers
showing examples of attribute control charting, including examples
from semiconductor manufacturing such as those examining the spatial
dependence of defects.
The inspection Before the control chart parameters are defined there is one more
unit definition: the inspection unit. We shall count the number of defects
that occur in a so-called inspection unit. More often than not, an
inspection unit is a single unit or item of product; for example, a
wafer. However, sometimes the inspection unit could consist of five
wafers, or ten wafers and so on. The size of the inspection units may
depend on the recording facility, measuring equipment, operators, etc.
Suppose that defects occur in a given inspection unit according to the
Poisson distribution, with parameter c (often denoted by np or the
Greek letter λ). In other words,

Control charts
for counts,
using the          P(x) = (e^(-c) c^x) / x!
Poisson
distribution where x is the number of defects and c > 0 is the parameter of the
Poisson distribution. It is known that both the mean and the variance
of this distribution are equal to c. Then the k-sigma control chart is

    UCL = c + k sqrt(c),   center line = c,   LCL = c - k sqrt(c).
If the LCL comes out negative, then there is no lower control limit.
This control scheme assumes that a standard value for c is available. If
this is not the case then c may be estimated as the average of the
number of defects in a preliminary sample of inspection units, call it
c-bar. Usually k is set to 3 by many practitioners.
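A short sketch of these c chart limits in Python (illustrative names, with c estimated from a preliminary sample):

    import numpy as np

    def c_chart_limits(counts, k=3.0):
        """Estimate c by the average count and return (LCL, CL, UCL);
        a negative LCL means there is no lower limit, reported as 0."""
        c_bar = np.mean(counts)
        half_width = k * np.sqrt(c_bar)
        return max(0.0, c_bar - half_width), c_bar, c_bar + half_width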
Control chart An example may help to illustrate the construction of control limits for
example using counts data. We are inspecting 25 successive wafers, each containing
counts 100 chips. Here the wafer is the inspection unit. The observed number
of defects are

    Wafer   Defects      Wafer   Defects
      8       12           21      11
      9       10           22      19
     10       17           23      16
     11       19           24      31
     12       17           25      13
     13       14
    [rows for the remaining wafers not shown]
Normal We have seen that the 3-sigma limits for a c chart, where c represents
approximation the number of nonconformities, are given by
to Poisson is
adequate            UCL = c + 3 sqrt(c),   center line = c,   LCL = c - 3 sqrt(c)
when the
mean of the
Poisson is at
least 5 where it is assumed that the normal approximation to the Poisson
distribution holds, hence the symmetry of the control limits. It is
shown in the literature that the normal approximation to the Poisson is
adequate when the mean of the Poisson is at least 5. When applied to
the c chart this implies that the mean of the defects should be at least
5. This requirement will often be met in practice, but still, when the
mean is smaller than 9 (solving the above equation) there will be no
lower control limit.
Let the mean be 10. Then the lower control limit = 0.513. However,
P(X = 0) = .000045, using the Poisson formula. This is only 1/30 of the
assumed area of .00135. So one has to raise the lower limit so as to get
as close as possible to .00135. From Poisson tables or computer
software we find that P(X ≤ 1) = .0005 and P(X ≤ 2) = .0027, so the lower
limit should actually be 2 or 3.
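These Poisson tail probabilities can be checked directly, for example with scipy:

    from scipy.stats import poisson

    mean = 10
    print(poisson.cdf(0, mean))   # P(X <= 0), about .000045
    print(poisson.cdf(1, mean))   # P(X <= 1), about .0005
    print(poisson.cdf(2, mean))   # P(X <= 2), about .0028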
on the original scale, but they require special tables to obtain the
limits. Of course, software might be used instead.
Warning for Note: In general, it is not a good idea to use 3-sigma limits for
highly skewed distributions that are highly skewed (see Ryan and Schwertman (1997)
distributions for more about the possibly extreme consequences of doing this).
The
binomial
distribution
model for
number of
defectives in
a sample
The mean of D is np and the variance is np(1-p). The sample proportion
nonconforming is the ratio of the number of nonconforming units in the
sample, D, to the sample size n,

    p-hat = D / n.

The mean of p-hat is p and its variance is p(1-p)/n.

p control If the true fraction nonconforming p is known (or a standard value is
charts for given), then the center line and control limits of the fraction
lot nonconforming control chart are
proportion
defective
    UCL = p + 3 sqrt(p(1-p)/n),   center line = p,   LCL = p - 3 sqrt(p(1-p)/n).
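A corresponding sketch for the p chart limits (illustrative names):

    import numpy as np

    def p_chart_limits(p, n, k=3.0):
        """k-sigma limits for the fraction nonconforming, given a known
        (or standard) value of p and sample size n."""
        half_width = k * np.sqrt(p * (1 - p) / n)
        return max(0.0, p - half_width), p, p + half_width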
Hotelling As in the univariate case, when data are grouped, the T2 chart can be
charts for paired with a chart that displays a measure of variability within the
both means subgroups for all the analyzed characteristics. The combined T2 and
and
dispersion (dispersion) charts are thus a multivariate counterpart of the univariate
x-bar and S (or x-bar and R) charts.
Additional For more details and examples see the next page and also Tutorials,
discussion section 5, subsections 4.3, 4.3.1 and 4.3.2. An introduction to Elements of
multivariate analysis is also given in the Tutorials.
The constant c is the sample size from which the covariance matrix was
estimated.
Mean and Let X1, ..., Xn be n p-dimensional vectors of observations that are sampled
Covariance independently from Np(m, Σ) with p < n-1, where Σ is the covariance
matrices matrix of X. The observed mean vector x-bar and the sample dispersion
matrix S are used to estimate m and Σ.
Additional See Tutorials (section 5), subsections 4.3, 4.3.1 and 4.3.2 for more
discussion details and examples. An introduction to Elements of multivariate
analysis is also given in the Tutorials.
Another way Another way to analyze the data is to use principal components. For
to monitor each multivariate measurement (or observation), the principal
multivariate components are linear combinations of the standardized p variables (to
data: standardize, subtract their respective targets and divide by their
Principal standard deviations). The principal components have two important
Components advantages:
control 1. the new variables are uncorrelated (or almost)
charts
2. very often, a few (sometimes 1 or 2) principal components may
capture most of the variability in the data so that we do not have
to use all of the p principal components for control.
Additional More details and examples are given in the Tutorials (section 5).
discussion
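A minimal numpy sketch of this principal-components monitoring idea; the targets and standard deviations used for standardization are assumed known:

    import numpy as np

    def pc_scores(X, targets, sds, n_components=2):
        """Standardize each variable against its target, then project
        the data onto the leading eigenvectors of the correlation
        matrix; the resulting score columns are what would be charted."""
        Z = (X - targets) / sds
        R = np.corrcoef(Z, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(R)          # ascending order
        V = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
        return Z @ V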
    Zi = Λ Xi + (I - Λ) Z_{i-1},   i = 1, 2, ..., n

where Zi is the ith EWMA vector, Xi is the ith observation vector, Z0 is
the vector of variable values from the historical data, Λ = diag(λ1, λ2,
..., λp) is a diagonal matrix with λ1, λ2, ..., λp on the main diagonal,
and p is the number of variables; that is, the number of elements in each
vector.
Illustration of The following illustration may clarify this. There are p variables and
multivariate each variable contains n observations. The input data matrix looks like:
EWMA
Simplification It has been shown (Lowry et al., 1992) that the (k,l)th element of
Σ_Zi, the covariance matrix of the ith EWMA, is

    Σ_Zi(k,l) = λk λl [1 - (1-λk)^i (1-λl)^i] / (λk + λl - λk λl) σ(k,l)

where σ(k,l) is the (k,l)th element of Σ, the covariance matrix of the X's.
If λ1 = λ2 = ... = λp = λ, the expression simplifies to

    Σ_Zi(k,l) = [λ / (2-λ)] [1 - (1-λ)^(2i)] σ(k,l).

Table for The following table gives the values of (1-λ)^(2i) for selected values
selected of λ and i (the columns are indexed by 2i).
1-λ \ 2i:   4     6     8    10    12    20    30    40    50
.9 .656 .531 .430 .349 .282 .122 .042 .015 .005
.8 .410 .262 .168 .107 .069 .012 .001 .000 .000
.7 .240 .118 .058 .028 .014 .001 .000 .000 .000
.6 .130 .047 .017 .006 .002 .000 .000 .000 .000
.5 .063 .016 .004 .001 .000 .000 .000 .000 .000
.4 .026 .004 .001 .000 .000 .000 .000 .000 .000
.3 .008 .001 .000 .000 .000 .000 .000 .000 .000
.2 .002 .000 .000 .000 .000 .000 .000 .000 .000
.1 .000 .000 .000 .000 .000 .000 .000 .000 .000
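Any row of this table is easy to regenerate; for example, the 1-λ = .9 row:

    lam = 0.1   # so 1 - lam = 0.9
    for two_i in (4, 6, 8, 10, 12, 20, 30, 40, 50):
        print(two_i, round((1 - lam) ** two_i, 3))   # .656, .531, .430, ...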
*****************************************************
* Multi-Variate EWMA Control Chart *
*****************************************************
The UCL = 5.938 for α = .05. Smaller choices of α are also used.
● Sales Forecasting
● Budgetary Analysis
● Yield Projections
● Inventory Studies
● Workload Projections
● Utility Studies
● Census Analysis
There are Techniques: The fitting of time series models can be an ambitious
many undertaking. There are many methods of model fitting including the
methods following:
used to ● Box-Jenkins ARIMA models
model and
forecast ● Box-Jenkins Multivariate Models
time series ● Holt-Winters Exponential Smoothing (single, double, triple)
The user's application and preference will decide the selection of the
appropriate technique. It is beyond the realm and intention of the
authors of this handbook to cover all these methods. The overview
presented here will start by looking at some basic smoothing techniques:
● Averaging Methods
Taking We will first investigate some averaging methods, such as the "simple"
averages is average of all past data.
the simplest
way to A manager of a warehouse wants to know how much a typical supplier
smooth data delivers in 1000 dollar units. He/she takes a sample of 12 suppliers, at
random, obtaining the following results:
Supplier Amount Supplier Amount
1 9 7 11
2 8 8 7
3 9 9 13
4 12 10 9
5 9 11 11
6 12 12 10
The computed mean or average of the data = 10. The manager decides
to use this as the estimate for expenditure of a typical supplier.
Is this a good or bad estimate?

We shall compute the error (deviation from the estimate of 10) and the
squared error for each supplier:

    Supplier   Amount   Error   Squared Error
    1             9       -1        1
2 8 -2 4
3 9 -1 1
4 12 2 4
5 9 -1 1
6 12 2 4
7 11 1 1
8 7 -3 9
9 13 3 9
10 9 -1 1
11 11 1 1
12 10 0 0
Table of So how good was the estimator for the amount spent for each supplier?
MSE results Let us compare the estimate (10) with the following estimates: 7, 9, and
for example 12. That is, we estimate that each supplier will spend $7, or $9 or $12.
using
different Performing the same calculations we arrive at:
estimates Estimator 7 9 10 12
SSE 144 48 36 84
MSE 12 4 3 7
The estimator with the smallest MSE is the best. It can be shown
mathematically that the estimator that minimizes the MSE for a set of
random data is the mean.
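The SSE/MSE comparison above can be reproduced with a few lines of Python:

    import numpy as np

    amounts = np.array([9, 8, 9, 12, 9, 12, 11, 7, 13, 9, 11, 10])
    for estimate in (7, 9, 10, 12):
        sse = np.sum((amounts - estimate) ** 2)
        print(estimate, sse, sse / len(amounts))
    # the mean (10) gives SSE = 36 and MSE = 3, the smallest of the four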
Table Next we will examine the mean to see how well it predicts net income
showing over time.
squared
error for the The next table gives the income before taxes of a PC manufacturer
mean for between 1985 and 1994.
sample data
    Year    $ (millions)     Mean      Error    Squared Error
1985 46.163 48.776 -2.613 6.828
1986 46.998 48.776 -1.778 3.161
1987 47.816 48.776 -0.960 0.922
1988 48.311 48.776 -0.465 0.216
1989 48.758 48.776 -0.018 0.000
1990 49.164 48.776 0.388 0.151
1991 49.548 48.776 0.772 0.596
1992 49.915 48.776 1.139 1.297
1993 50.315 48.776 1.539 2.369
1994 50.768 48.776 1.992 3.968
The mean is The question arises: can we use the mean to forecast income if we
not a good suspect a trend? A look at the graph below shows clearly that we should
estimator not do this.
when there
are trends
Moving The next table summarizes the process, which is referred to as Moving
average Averaging. The general expression for the moving average is
example Mt = [ Xt + Xt-1 + ... + Xt-N+1] / N
1 9
2 8
3 9 8.667 0.333 0.111
4 12 9.667 2.333 5.444
5 9 10.000 -1.000 1.000
6 12 11.000 1.000 1.000
7 11 10.667 0.333 0.111
8 7 10.000 -3.000 9.000
9 13 10.333 2.667 7.111
10 9 9.667 -0.667 0.444
11 11 11.000 0 0
12 10 10.000 0 0
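The moving average column above can be computed with a short helper (the name is illustrative):

    import numpy as np

    def moving_average(x, N=3):
        """M_t = (x_t + x_{t-1} + ... + x_{t-N+1}) / N, defined from t = N on."""
        return np.convolve(np.asarray(x, float), np.ones(N) / N, mode="valid")

    print(moving_average([9, 8, 9, 12, 9, 12, 11, 7, 13, 9, 11, 10]))
    # [ 8.667  9.667 10.  11.  10.667 10.  10.333  9.667 11.  10. ]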
When the averaging period is an even number (say 4), the moving averages
fall between time periods, so adjacent moving averages are themselves
averaged to center them on a period. For the same supplier data this
centered moving average of 4 gives:

    Period    Value    Centered MA of 4
      1         9
      2         8
      3         9          9.5
      4        12         10.0
      5         9         10.75
      6        12
      7        11
The first The initial EWMA plays an important role in computing all the
forecast is subsequent EWMA's. Setting S2 to y1 is one method of initialization.
very Another way is to set it to the target of the process.
important
Still another possibility would be to average the first four or five
observations.
It can also be shown that the smaller the value of α, the more important
is the selection of the initial EWMA. The user would be wise to try a
few methods, (assuming that the software has them available) before
finalizing the settings.
Expand Let us expand the basic equation by first substituting for St-1 in the
basic basic equation to obtain
equation
    St = α yt-1 + (1-α) [ α yt-2 + (1-α) St-2 ]
       = α yt-1 + α(1-α) yt-2 + (1-α)^2 St-2
Summation By substituting for St-2, then for St-3, and so forth, until we reach S2
formula for (which is just y1), it can be shown that the expanding equation can be
basic written as:
equation
    St = α Σ_{i=1}^{t-2} (1-α)^(i-1) y_{t-i} + (1-α)^(t-2) S2
Expanded For example, the expanded equation for the smoothed value S5 is:
equation for
S5
    S5 = α [ (1-α)^0 y4 + (1-α)^1 y3 + (1-α)^2 y2 ] + (1-α)^3 S2

From the last formula we can see that the summation term shows that
the contribution to the smoothed value St becomes less at each
consecutive time period.
Example for Let α = .3. Observe that the weights α(1-α)^t decrease exponentially
α = .3 (geometrically) with time.
Value weight
last y1 .2100
y2 .1470
y3 .1029
y4 .0720
How do you The speed at which the older responses are dampened (smoothed) is a
choose the function of the value of α. When α is close to 1, dampening is quick
weight and when α is close to 0, dampening is slow. This is illustrated in the
parameter? table below:

    ---------------> towards past observations
    (1-α)    (1-α)^2    (1-α)^3    (1-α)^4

We choose the best value for α, that is, the value which results in the
smallest MSE.
Example Let us illustrate this principle with an example. Consider the following
data set consisting of 12 observations taken over time:
    Time    yt    S (α=.1)    Error    Error squared
1 71
2 70 71 -1.00 1.00
3 69 70.9 -1.90 3.61
4 68 70.71 -2.71 7.34
5 64 70.44 -6.44 41.47
6 65 69.80 -4.80 23.04
7 72 69.32 2.68 7.18
8 78 69.58 8.42 70.90
9 75 70.43 4.57 20.88
10 75 70.88 4.12 16.97
11 75 71.29 3.71 13.76
12 70 71.67 -1.67 2.79
The sum of the squared errors (SSE) = 208.94. The mean of the squared
errors (MSE) is the SSE /11 = 19.0.
Calculate The MSE was again calculated for α = .5 and turned out to be 16.29, so
for different in this case we would prefer an α of .5. Can we do better? We could
values of α apply the proven trial-and-error method. This is an iterative procedure
beginning with a range of α between .1 and .9. We determine the best
initial choice for α and then search between α - Δ and α + Δ. We
could repeat this perhaps one more time to find the best α to 3 decimal
places.
Nonlinear But there are better search methods, such as the Marquardt procedure.
optimizers This is a nonlinear optimizer that minimizes the sum of squares of
can be used residuals. In general, most well designed statistical software programs
should be able to find the value of α that minimizes the MSE.
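A simple grid search over α, in the spirit of the trial-and-error method above (the helper name is illustrative):

    import numpy as np

    y = np.array([71, 70, 69, 68, 64, 65, 72, 78, 75, 75, 75, 70], float)

    def smoothing_mse(alpha, y):
        """Single exponential smoothing with S2 = y1; the one-step errors
        y_t - S_t are accumulated from the second observation on."""
        s, sse = y[0], 0.0
        for t in range(1, len(y)):
            sse += (y[t] - s) ** 2
            s = alpha * y[t] + (1 - alpha) * s
        return sse / (len(y) - 1)

    grid = np.arange(0.1, 1.0, 0.1)
    best = min(grid, key=lambda a: smoothing_mse(a, y))
    print(best, smoothing_mse(0.1, y))   # MSE at alpha = .1 is ~19.0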
Sample plot
showing
smoothed
data for 2
values of α
In other words, the new forecast is the old one plus an adjustment for
the error that occurred in the last forecast.
Bootstrapping of Forecasts
Bootstrapping What happens if you wish to forecast from some origin, usually the
forecasts last data point, and no actual observations are available? In this
situation we have to modify the formula to become:

    S_{t+1} = α y_origin + (1-α) S_t

where y_origin remains constant.
Example of Bootstrapping
Example The last data point in the previous example was 70 and its forecast
(smoothed value S) was 71.7. Since we do have the data point and the
forecast available, we can calculate the next forecast using the regular
formula
Table The following table displays the comparison between the two methods:
comparing
two methods    Period    Bootstrap forecast    Data    Single Smoothing Forecast
13 71.50 75 71.5
14 71.35 75 71.9
15 71.21 74 72.2
16 71.09 78 72.4
17 70.98 86 73.0
Sample data Let us demonstrate this with the following data set smoothed with an
set with trend α of 0.3:
Data Fit
6.4
5.6 6.4
7.8 6.2
8.8 6.7
11.0 7.3
11.6 8.4
16.7 9.4
15.3 11.6
21.6 12.7
22.4 15.4
Note that the current value of the series is used to calculate its smoothed
value replacement in double exponential smoothing.
Initial Values
Several As in the case for single smoothing, there are a variety of schemes to set
methods to initial values for St and bt in double smoothing.
choose the
initial S1 is in general set to y1. Here are three suggestions for b1:
values
    b1 = y2 - y1
    b1 = [(y2 - y1) + (y3 - y2) + (y4 - y3)] / 3
    b1 = (yn - y1)/(n - 1)
Comments
Meaning of The first smoothing equation adjusts St directly for the trend of the
the previous period, bt-1, by adding it to the last smoothed value, St-1. This
smoothing helps to eliminate the lag and brings St to the appropriate base of the
equations
current value.
The second smoothing equation then updates the trend, which is
expressed as the difference between the last two values. The equation is
similar to the basic form of single smoothing, but here applied to the
updating of the trend.
Non-linear The values for α and γ can be obtained via non-linear optimization
optimization techniques, such as the Marquardt Algorithm.
techniques
can be used
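A compact Python sketch of double exponential smoothing as just described, using the S1 = y1, b1 = y2 - y1 initialization (the function name is illustrative):

    def double_smoothing(y, alpha, gamma):
        """Returns the final level S and trend b; forecast m periods
        ahead from the last observation with F = S + m*b."""
        S, b = y[0], y[1] - y[0]
        for t in range(1, len(y)):
            S, S_prev = alpha * y[t] + (1 - alpha) * (S + b), S
            b = gamma * (S - S_prev) + (1 - gamma) * b
        return S, b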
Example
Comparison of Forecasts
Table To see how each method predicts the future, we computed the first five
showing forecasts from the last observation as follows:
single and Period Single Double
double
exponential 11 22.4 25.8
smoothing 12 22.4 28.7
forecasts 13 22.4 31.7
14 22.4 34.6
15 22.4 37.6
Plot A plot of these results (using the forecasted double smoothing values) is
comparing very enlightening.
single and
double
exponential
smoothing
forecasts
This graph indicates that double smoothing follows the data much more
closely than single smoothing. Furthermore, for forecasting single smoothing
cannot do better than projecting a straight horizontal line, which is not
very likely to occur in reality. So in this case double smoothing is
preferred.
To handle In this case double smoothing will not work. We now introduce a third
seasonality, equation to take care of seasonality (sometimes called periodicity). The
we have to resulting set of equations is called the "Holt-Winters" (HW) method after
add a third the names of the inventors.
parameter
The basic equations for their method (in the multiplicative seasonal form)
are given by:

    St = α yt / I_{t-L} + (1-α) (S_{t-1} + b_{t-1})     (overall smoothing)
    bt = γ (St - S_{t-1}) + (1-γ) b_{t-1}               (trend smoothing)
    It = β yt / St + (1-β) I_{t-L}                      (seasonal smoothing)
    F_{t+m} = (St + m bt) I_{t-L+m}                     (forecast)

where
● y is the observation
● S is the smoothed observation
● b is the trend factor
● I is the seasonal index
● F is the forecast at m periods ahead
● L is the length of the season
● t is an index denoting a time period

and α, β, and γ are constants that must be estimated in such a way that the
MSE of the error is minimized. This is best left to a good software package.
Complete To initialize the HW method we need at least one complete season's data to
season determine initial estimates of the seasonal indices I t-L.
needed
L periods A complete season's data consists of L periods. And we need to estimate the
in a season trend factor from one period to the next. To accomplish this, it is advisable
to use two complete seasons; that is, 2L periods.
How to get The general formula to estimate the initial trend is given by
initial
estimates
for trend       b = (1/L) [ (y_{L+1} - y1)/L + (y_{L+2} - y2)/L + ... + (y_{2L} - yL)/L ]
and
seasonality
parameters
As we will see in the example, we work with data that consist of 6 years
with 4 periods (that is, 4 quarters) per year. Then L = 4 and

    b = (1/4) [ (y5 - y1)/4 + (y6 - y2)/4 + (y7 - y3)/4 + (y8 - y4)/4 ].
Step 3: Step 3: Now the seasonal indices are formed by computing the average of
form each row. Thus the initial seasonal indices (symbolically) are:
seasonal I1 = ( y1/A1 + y5/A2 + y9/A3 + y13/A4 + y17/A5 + y21/A6)/6
indices I2 = ( y2/A1 + y6/A2 + y10/A3 + y14/A4 + y18/A5 + y22/A6)/6
I3 = ( y3/A1 + y7/A2 + y11/A3 + y15/A4 + y19/A5 + y23/A6)/6
I4 = ( y4/A1 + y8/A2 + y12/A3 + y16/A4 + y20/A5 + y24/A6)/6
We now know the algebra behind the computation of the initial estimates.
The next page contains an example of triple exponential smoothing.
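A numpy sketch of these initial estimates (trend from the first two seasons, indices averaged over all complete seasons); the function name is illustrative:

    import numpy as np

    def hw_initial_estimates(y, L):
        """Initial trend b and seasonal indices I_1..I_L for Holt-Winters."""
        y = np.asarray(y, float)
        b0 = np.mean((y[L:2*L] - y[:L]) / L)            # initial trend formula
        seasons = y[:(len(y) // L) * L].reshape(-1, L)  # one row per season
        A = seasons.mean(axis=1, keepdims=True)         # season averages A_j
        I0 = (seasons / A).mean(axis=0)                 # average of y/A per period
        return b0, I0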
Plot of raw
data with
single,
double, and
triple
exponential
forecasts

    6906    .4694
    5054    .1086   1.000
     936    1.000   1.000
     520    .7556   0.000   .9837
Computation The data set consists of quarterly sales data. The season is 1 year and
of initial since there are 4 quarters per year, L = 4. Using the formula we obtain:
trend
Table of
initial              Year:      1      2       3      4      5       6
seasonal
indices         Quarter 1     362    382     473    544    628     627
                Quarter 2     385    409     513    582    707     725
                Quarter 3     432    498     582    681    773     854
                Quarter 4     341    387     474    557    592     661

                Average Aj    380    419    510.5   591    675    716.75
In this example we used the full 6 years of data. Other schemes may use
only 3, or some other number of years. There are also a number of ways
to compute initial estimates.
Data Each line contains the CO2 concentration (mixing ratio in dry air,
expressed in the WMO X85 mole fraction scale, maintained by the Scripps
Institution of Oceanography). In addition, it contains the year, month, and
a numeric value for the combined month and year. This combined date is
useful for plotting purposes.
6.4.4.2. Stationarity
Stationarity A common assumption in many time series techniques is that the
data are stationary.
A stationary process has the property that the mean, variance and
autocorrelation structure do not change over time. Stationarity can
be defined in precise mathematical terms, but for our purpose we
mean a flat looking series, without trend, constant variance over
time, a constant autocorrelation structure over time and no periodic
fluctuations (seasonality).
If the time series is not stationary, we can often transform it to
stationarity with one of the following techniques.
1. We can difference the data. That is, given the series Zt, we
create the new series Yi = Zi - Zi-1.
The differenced data will contain one less point than the
original data. Although you can difference the data more than
once, one difference is usually sufficient.
2. If the data contain a trend, we can fit some type of curve to
the data and then model the residuals from that fit. Since the
purpose of the fit is to simply remove long term trend, a
simple fit, such as a straight line, is typically used.
3. For non-constant variance, taking the logarithm or square root
of the series may stabilize the variance. For negative data, you
can add a suitable constant to make all the data positive before
applying the transformation. This constant can then be
subtracted from the model to obtain predicted (i.e., the fitted)
values and forecasts for future points.
The above techniques are intended to generate series with constant
location and scale.
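Each of these three transformations is a one-liner in Python:

    import numpy as np

    def difference(z):
        """Y_i = Z_i - Z_{i-1}; one point shorter than the original."""
        return np.diff(z)

    def detrend(z):
        """Residuals from a straight-line fit against time."""
        t = np.arange(len(z))
        slope, intercept = np.polyfit(t, z, 1)
        return z - (intercept + slope * t)

    def log_stabilize(z, constant=0.0):
        """Log transform; add a constant first if the data can be negative."""
        return np.log(np.asarray(z, float) + constant)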
Example The following plots are from a data set of monthly CO2
concentrations.
Run Sequence
Plot
The initial run sequence plot of the data indicates a rising trend. A
visual inspection of this plot indicates that a simple linear fit should
be sufficient to remove this upward trend.
This plot also shows periodical behavior. This is discussed in the
next section.
Linear Trend
Removed
This plot contains the residuals from a linear fit to the original data.
After removing the linear trend, the run sequence plot indicates that
the data have a constant location and variance, although the pattern
of the residuals shows that the data depart from the model in a
systematic way.
6.4.4.3. Seasonality
Seasonality Many time series display seasonality. By seasonality, we mean periodic
fluctuations. For example, retail sales tend to peak for the Christmas
season and then decline after the holidays. So time series of retail sales
will typically show increasing sales from September through December
and declining sales in January and February.
Seasonality is quite common in economic time series. It is less common
in engineering and scientific data.
If seasonality is present, it must be incorporated into the time series
model. In this section, we discuss techniques for detecting seasonality.
We defer modeling of seasonality until later sections.
Both the seasonal subseries plot and the box plot assume that the
seasonal periods are known. In most cases, the analyst will in fact know
this. For example, for monthly data, the period is 12 since there are 12
months in a year. However, if the period is not known, the
autocorrelation plot can help. If there is significant seasonality, the
autocorrelation plot should show spikes at lags equal to the period. For
example, for monthly data, if there is a seasonality effect, we would
expect to see significant peaks at lag 12, 24, 36, and so on (although the
intensity may decrease the further out we go).
Example The following plots are from a data set of southern oscillations for
without predicting El Niño.
Seasonality
Run
Sequence
Plot
Seasonal
Subseries
Plot
The means for each month are relatively close and show no obvious
pattern.
Box Plot
Example The following plots are from a data set of monthly CO2 concentrations.
with A linear trend has been removed from these data.
Seasonality
Run
Sequence
Plot
Seasonal
Subseries
Plot
The seasonal subseries plot shows the seasonal pattern more clearly.
Box Plot
As with the seasonal subseries plot, the seasonal pattern is quite evident
in the box plot.
Sample Plot
This plot allows you to detect both between group and within group
patterns.
If there is a large number of observations, then a box plot may be
preferable.
Questions The seasonal subseries plot can provide answers to the following
questions:
1. Do the data exhibit a seasonal pattern?
2. What is the nature of the seasonality?
3. Is there a within-group pattern (e.g., do January and July exhibit
similar patterns)?
4. Are there any outliers once seasonality has been accounted for?
Software Seasonal subseries plots are available in a few general purpose statistical
software programs. They are available in Dataplot. It may be possible to
write macros to generate this plot in most statistical software programs
that do not provide it directly.
Trend, One approach is to decompose the time series into a trend, seasonal,
Seasonal, and residual component.
Residual
Decompositions Triple exponential smoothing is an example of this approach. Another
example, called seasonal loess, is based on locally weighted least
squares and is discussed by Cleveland (1993). We do not discuss
seasonal loess in this handbook.
    Xt = μ + At - θ1 A_{t-1} - θ2 A_{t-2} - ... - θq A_{t-q}

where Xt is the time series, μ is the mean of the series, A_{t-i} are white
noise, and θ1, ..., θq are the parameters of the model. The value of q
is called the order of the MA model.
That is, a moving average model is conceptually a linear regression of
the current value of the series against the white noise or random
shocks of one or more prior values of the series. The random shocks
at each point are assumed to come from the same distribution,
typically a normal distribution, with location at zero and constant
scale. The distinction in this model is that these random shocks are
propagated to future values of the time series. Fitting the MA
estimates is more complicated than with AR models because the error
terms are not observable. This means that iterative non-linear fitting
procedures need to be used in place of linear least squares. MA
models also have a less obvious interpretation than AR models.
Sometimes the ACF and PACF will suggest that a MA model would
be a better model choice and sometimes both AR and MA terms
should be used in the same model (see Section 6.4.4.5).
Note, however, that the error terms after the model is fit should be
independent and follow the standard assumptions for a univariate
process.
Box-Jenkins Box and Jenkins popularized an approach that combines the moving
Approach average and the autoregressive approaches in the book "Time Series
Analysis: Forecasting and Control" (Box, Jenkins, and Reinsel,
1994).
Although both autoregressive and moving average approaches were
already known (and were originally investigated by Yule), the
contribution of Box and Jenkins was in developing a systematic
methodology for identifying and estimating models that could
incorporate both approaches. This makes Box-Jenkins models a
powerful class of models. The next several sections will discuss these
models in detail.
    Xt = μ + φ1 X_{t-1} + ... + φp X_{t-p} + At - θ1 A_{t-1} - ... - θq A_{t-q}

where the terms in the equation have the same meaning as given for the
AR and MA model.
Stages in There are three primary stages in building a Box-Jenkins time series
Box-Jenkins model.
Modeling 1. Model Identification
2. Model Estimation
3. Model Validation
Detecting Stationarity can be assessed from a run sequence plot. The run
stationarity sequence plot should show constant location and scale. It can also be
detected from an autocorrelation plot. Specifically, non-stationarity is
often indicated by an autocorrelation plot with very slow decay.
Identify p and q Once stationarity and seasonality have been addressed, the next step
is to identify the order (i.e., the p and q) of the autoregressive and
moving average terms.
Autocorrelation The primary tools for doing this are the autocorrelation plot and the
and Partial partial autocorrelation plot. The sample autocorrelation plot and the
Autocorrelation sample partial autocorrelation plot are compared to the theoretical
Plots behavior of these plots when the order is known.
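For reference, the sample autocorrelations behind these plots can be computed directly (libraries such as statsmodels also provide acf and pacf functions):

    import numpy as np

    def sample_acf(x, nlags):
        """r_k = sum (x_t - xbar)(x_{t-k} - xbar) / sum (x_t - xbar)^2."""
        xc = np.asarray(x, float) - np.mean(x)
        denom = np.sum(xc ** 2)
        return np.array([np.sum(xc[k:] * xc[:-k]) / denom
                         for k in range(1, nlags + 1)])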
Examples We show a typical series of plots for performing the initial model
identification for
1. the southern oscillations data and
2. the CO2 monthly concentrations data.
Run Sequence
Plot
Seasonal
Subseries Plot
Autocorrelation
Plot
Partial
Autocorrelation
Plot
Run Sequence
Plot
The initial run sequence plot of the data indicates a rising trend. A
visual inspection of this plot indicates that a simple linear fit should
be sufficient to remove this upward trend.
Linear Trend
Removed
This plot contains the residuals from a linear fit to the original data.
After removing the linear trend, the run sequence plot indicates that
the data have a constant location and variance, which implies
stationarity.
However, the plot does show seasonality. We generate an
autocorrelation plot to help determine the period followed by a
seasonal subseries plot.
Autocorrelation
Plot
Seasonal
Subseries Plot
Autocorrelation
Plot for
Seasonally
Differenced
Data
Partial
Autocorrelation
Plot of
Seasonally
Differenced
Data
remaining seasonality.
In summary, our initial attempt would be to fit an AR(2) model with a
seasonal AR(12) term on the data with a linear trend line removed.
We could try the model both with and without seasonal differencing
applied. Model validation should be performed before accepting this
as a final model.
Sample Plot
Questions The partial autocorrelation plot can help provide answers to the
following questions:
1. Is an AR model appropriate for the data?
2. If an AR model is appropriate, what order should we use?
Case Study The partial autocorrelation plot is demonstrated in the Negiz data case
study.
4-Plot of As discussed in the EDA chapter, one way to assess if the residuals
Residuals from the Box-Jenkins model follow the assumptions is to generate a
4-plot of the residuals and an autocorrelation plot of the residuals. One
could also look at the value of the Ljung-Box (1978) statistic.
Output from other software programs will be similar, but not identical.
Model With the SEMSTAT program, you start by entering a valid file name or
Identification you can select a file extension to search for files of particular
Section interest. In this program, if you press the enter key, ALL file names
in the directory are displayed.
Enter FILESPEC or EXTENSION (1-3 letters): To quit, press F10.
? bookf.bj
MAX MIN MEAN VARIANCE NO. DATA
80.0000 23.0000 51.7086 141.8238 70
Do you wish to make transformations? y/n n
Input order of difference or 0: 0
Input period of seasonality (2-12) or 0: 0
Model
Fitting Enter FILESPEC or EXTENSION (1-3 letters): To quit, press F10.
Section
? bookf.bj
MAX MIN MEAN VARIANCE NO. DATA
80.0000 23.0000 51.7086 141.8238 70
Do you wish to make transformations? y/n n
Input order of difference or 0: 0
Input NUMBER of AR terms: 2
Input NUMBER of MA terms: 0
Input period of seasonality (2-12) or 0: 0
*********** OUTPUT SECTION ***********
AR estimates with Standard Errors
Phi 1 : -0.3397 0.1224
Phi 2 : 0.1904 0.1223
Forecasting
Section
---------------------------------------------------
FORECASTING SECTION
---------------------------------------------------
Analyzing If you observe very large autocorrelations at lags spaced n periods apart, for
Autocorrelation example at lags 12 and 24, then there is evidence of periodicity. That effect
Plot for should be removed, since the objective of the identification stage is to reduce
Seasonality the autocorrelations throughout. So if simple differencing was not enough,
try seasonal differencing at a selected period. In the above case, the period is
12. It could, of course, be any value, such as 4 or 6.
The number of seasonal terms is rarely more than 1. If you know the shape of
your forecast function, or you wish to assign a particular shape to the forecast
function, you can select the appropriate number of terms for seasonal AR or
seasonal MA models.
The book by Box and Jenkins, Time Series Analysis: Forecasting and Control
(the later edition is Box, Jenkins and Reinsel, 1994) has a discussion of these
forecast functions on pages 326 - 328. Again, if you have only a faint notion,
but you do know that there was a trend upwards before differencing, pick a
seasonal MA term and see what comes out in the diagnostics.
The results after taking a seasonal difference look good!
Model Fitting Now we can proceed to the estimation, diagnostics and forecasting routines.
Section The following program is again executed from a menu and issues the
following flow of output:
Enter FILESPEC or EXTENSION (1-3 letters):
To quit press F10.
? bookg.bj
MAX MIN MEAN VARIANCE NO. DATA
622.0000 104.0000 280.2986 14391.9170 144
Do you wish to make transformations? y/n    y
    (we selected a square root transformation because a closer
    inspection of the plot revealed increasing variances over time)
Statistics of Transformed series:
Mean: 5.542 Variance 0.195
Input order of difference or 0: 1
Input NUMBER of AR terms: Blank defaults to 0
Input NUMBER of MA terms: 1
Input period of seasonality (2-12) or 0: 12
Input order of seasonal difference or 0: 1
Input NUMBER of seasonal AR terms: Blank defaults to 0
Input NUMBER of seasonal MA terms: 1
Statistics of Differenced series:
Forecasting Defaults are obtained by pressing the enter key, without input.
Section Default for number of periods ahead from last period = 6.
Default for the confidence band around the forecast = 90%.
Next Period Lower Forecast Upper
145 423.4257 450.1975 478.6620
146 382.9274 411.6180 442.4583
147 407.2839 441.9742 479.6191
148 437.8781 479.2293 524.4855
149 444.3902 490.1471 540.6153
150 491.0981 545.5740 606.0927
151 583.6627 652.7856 730.0948
152 553.5620 623.0632 701.2905
153 458.0291 518.6510 587.2965
Interesting There are a few interesting properties associated with the phi or AR
properties of parameter matrices. Consider the following example for a bivariate
parameter series with n =2, p = 2, and q = 0. The ARMAV(2,0) model is:
matrices
Without loss of generality, assume that the X series is input and the Y series
is output and that the mean vector μ = (0,0).
Therefore, transform the observations by subtracting their respective averages.
Diagonal The diagonal terms of each Phi matrix are the scalar estimates for each series,
terms of in this case:
Phi matrix
    φ1.11, φ2.11 for the input series X, and
    φ1.22, φ2.22 for the output series Y.
Transfer The lower off-diagonal elements represent the influence of the input on the
mechanism output.
This is called the "transfer" mechanism or transfer-function model as
discussed by Box and Jenkins in Chapter 11. The terms here correspond to
their terms.
The upper off-diagonal terms represent the influence of the output on the
input.
Feedback This is called "feedback". The presence of feedback can also be seen as a high
value for a coefficient in the correlation matrix of the residuals. A "true"
transfer model exists when there is no feedback.
This can be seen by expressing the matrix form into scalar form:
Delay Finally, delay or "dead" time can be measured by studying the lower
off-diagonal elements again.
If, for example, φ1.21 is non-significant, the delay is 1 time period.
Plots of The plots of the input and output series are displayed below.
input and
output
series
-------------------------------------------------------------------------------
Statistics on the Residuals
MEANS
-0.0000 0.0000
COVARIANCE MATRIX
0.01307 -0.00118
-0.00118 0.06444
CORRELATION MATRIX
1.0000 -0.0407
-0.0407 1.0000
----------------------------------------------------------------------
--------------------------------------------------------
FORECASTING SECTION
--------------------------------------------------------
The forecasting method is an extension of the model and follows the
theory outlined in the previous section. Based on the estimated variances
and the number of forecasts, we can compute the forecasts and their
confidence limits.
The user, in this software, is able to choose how many forecasts to
obtain, and at what confidence levels.
Defaults are obtained by pressing the enter key, without input.
Default for number of periods ahead from last period = 6.
Default for the confidence band around the forecast = 90%.
How many periods ahead to forecast? 6
Enter confidence level for the forecast limits : .90:
SERIES: 1
6.5. Tutorials
Tutorial 1. What do we mean by "Normal" data?
contents 2. What do we do when data are "Non-normal"?
3. Elements of Matrix Algebra
1. Numerical Examples
2. Determinant and Eigenstructure
4. Elements of Multivariate Analysis
1. Mean vector and Covariance Matrix
2. The Multivariate Normal Distribution
3. Hotelling's T2
1. Example of Hotelling's T2 Test
2. Example 1 (continued)
3. Example 2 (multiple groups)
4. Hotelling's T2 Chart
5. Principal Components
1. Properties of Principal Components
2. Numerical Example
Normal The normal probability density function with mean μ and standard
probability deviation σ is
distribution
    f(x) = (1 / (σ sqrt(2π))) exp( -(x - μ)^2 / (2σ^2) ).
Parameters The parameters of the normal distribution are the mean μ and the
of normal standard deviation σ (or the variance σ^2). A special notation is
distribution employed to indicate that X is normally distributed with these
parameters, namely
X ~ N(μ, σ) or X ~ N(μ, σ^2).
Tables for the Tables of the cumulative standard normal distribution are given in
cumulative every statistics textbook and in the handbook. A rich variety of
standard approximations can be found in the literature on numerical methods.
normal
distribution For example, if μ = 0 and σ = 1 then the area under the curve from μ - 1σ
to μ + 1σ is the area from 0 - 1 to 0 + 1, which is 0.6827. Since
most standard normal tables give area to the left of the lookup value,
they will have for z = 1 an area of .8413 and for z = -1 an area of .1587.
By subtraction we obtain the area between -1 and +1 to be .8413 -
.1587 = .6826.
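With software the same area comes straight from the cumulative distribution function:

    from scipy.stats import norm
    print(norm.cdf(1) - norm.cdf(-1))   # 0.6827 (to four decimal places)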
The Box-Cox
Transformation
Given the vector of data observations x = x1, x2, ..., xn, one way to select the
power λ is to use the λ that maximizes the logarithm of the likelihood
function
The logarithm
of the          f(x, λ) = -(n/2) log[ (1/n) Σ_{i=1}^n (xi(λ) - x̄(λ))^2 ] + (λ-1) Σ_{i=1}^n log xi
likelihood
function where

    xi(λ) = (xi^λ - 1) / λ   for λ ≠ 0,   xi(λ) = log xi   for λ = 0,

and x̄(λ) is the arithmetic mean of the transformed data.
Confidence In addition, a confidence bound (based on the likelihood ratio statistic) can
bound for λ be constructed for λ as follows: a set of λ values that represent an
approximate 100(1-α)% confidence bound for λ is formed from those λ
that satisfy

    f(x, λ) ≥ f(x, λ̂) - 0.5 χ²(α; 1)

where λ̂ is the maximizing value and χ²(α; 1) is the upper α percentage
point of the chi-square distribution with 1 degree of freedom.
Example of the To illustrate the procedure, we used the data from Johnson and Wichern's
Box-Cox textbook (Prentice Hall 1988), Example 4.14. The observations are
scheme microwave radiation measurements.
Table of The values of the log-likelihood function obtained by varying λ from -2.0
log-likelihood to 2.0 are given below.
values for
various values      λ       LLF        λ        LLF        λ       LLF
of λ
                  -2.0     7.1146    -0.6     89.0587     0.7    103.0322
                  -1.9    14.1877    -0.5     92.7855     0.8    101.3254
                  -1.8    21.1356    -0.4     96.0974     0.9     99.3403
                  -1.7    27.9468    -0.3     98.9722     1.0     97.1030
                  -1.6    34.6082    -0.2    101.3923     1.1     94.6372
                  -1.5    41.1054    -0.1    103.3457     1.2     91.9643
                  -1.4    47.4229     0.0    104.8276     1.3     89.1034
                  -1.3    53.5432     0.1    105.8406     1.4     86.0714
                  -1.2    59.4474     0.2    106.3947     1.5     82.8832
                  -1.1    65.1147     0.3    106.5069     1.6     79.5521
                  -0.9    75.6471     0.4    106.1994     1.7     76.0896
                  -0.8    80.4625     0.5    105.4985     1.8     72.5061
                  -0.7    84.9421     0.6    104.4330     1.9     68.8106
The Box-Cox transform is also discussed in Chapter 1 under the Box-Cox
Linearity Plot and the Box-Cox Normality Plot. The Box-Cox normality
plot discussion provides a graphical method for choosing λ to transform a
data set to normality: the criterion is the λ that maximizes the correlation
coefficient of a normal probability plot of the transformed data. The
criterion used to choose λ for the Box-Cox linearity plot is the value of λ
that maximizes the correlation between the transformed x-values and the
y-values.
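The λ search itself is a one-dimensional maximization; here is a sketch of the profile log-likelihood above in Python (scipy.stats.boxcox performs an equivalent maximization):

    import numpy as np

    def boxcox_llf(lam, x):
        """Profile log-likelihood of the Box-Cox power lam for data x > 0."""
        x = np.asarray(x, float)
        xt = np.log(x) if lam == 0 else (x ** lam - 1) / lam
        return -len(x) / 2 * np.log(xt.var()) + (lam - 1) * np.log(x).sum()

    # grid search over the same range as the table above:
    # best = max(np.arange(-2.0, 2.01, 0.1), key=lambda l: boxcox_llf(l, data))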
Basic Vectors and matrices are arrays of numbers. The algebra for symbolic
definitions operations on them is different from the algebra for operations on
and scalars, or single numbers. For example there is no division in matrix
operations of algebra, although there is an operation called "multiplying by an
matrix inverse". It is possible to express the exact equivalent of matrix algebra
algebra - equations in terms of scalar algebra expressions, but the results look
needed for rather messy.
multivariate
analysis It can be said that the matrix algebra notation is shorthand for the
corresponding scalar longhand.
Sum of two The sum of two vectors (say, a and b) is the vector of sums of
vectors corresponding elements.
Sample matrices If
then
Matrix addition,
subtraction, and
multiplication
and
Multiply matrix To multiply a matrix by a given scalar, each element of the matrix
by a scalar is multiplied by that scalar.
Identity matrix To augment the notion of the inverse of a matrix, A-1 (A inverse) we
notice the following relation
A-1A = A A-1 = I
I is a matrix of the form

    I = | 1  0  ...  0 |
        | 0  1  ...  0 |
        | .  .   .   . |
        | 0  0  ...  1 |

(ones on the main diagonal and zeros elsewhere).
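The relation A^(-1) A = I is easy to verify numerically:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    print(np.linalg.inv(A) @ A)   # the 2x2 identity, up to rounding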
Definition of the The characteristic equation of a 2x2 matrix A is det(A - λI) = 0;
characteristic written out,
equation for 2x2
matrix          λ^2 - (a11 + a22) λ + (a11 a22 - a12 a21) = 0.
Definition of A matrix with rows and columns exchanged in this manner is called the
Transpose transpose of the original matrix.
Definition of The mean vector consists of the means of each variable and the
mean vector variance-covariance matrix consists of the variances of the variables
and along the main diagonal and the covariances between each pair of
variance- variables in the other matrix positions.
covariance
matrix The formula for computing the covariance of the variables X and Y is

    cov(X, Y) = Σ_{i=1}^n (Xi - X̄)(Yi - Ȳ) / (n - 1),

with n denoting the number of observations. The mean vector contains the
arithmetic averages of the variables, and the (unbiased)
variance-covariance matrix S is computed by applying this formula to each
pair of variables.
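A small numpy check of these definitions, on a hypothetical n = 4 by p = 3 data matrix:

    import numpy as np

    X = np.array([[1.0, 2.0, 0.5],
                  [2.0, 1.5, 1.0],
                  [3.0, 3.5, 0.0],
                  [2.5, 3.0, 1.5]])

    mean_vector = X.mean(axis=0)
    S = np.cov(X, rowvar=False)   # unbiased: divides by n - 1
    print(mean_vector)
    print(S)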
Centroid, The mean vector is often referred to as the centroid and the
dispersion variance-covariance matrix as the dispersion or dispersion matrix. Also,
matrix the terms variance-covariance matrix and covariance matrix are used
interchangeably.
where m = (m1, ..., mp) is the vector of means and Σ is the variance-covariance
matrix of the multivariate normal distribution. The shortcut notation for this density
is

    X ~ Np(m, Σ).

Univariate When p = 1, the one-dimensional vector X = X1 has the normal distribution with
normal mean m and variance σ^2, with density
distribution
    f(x) = (1 / (σ sqrt(2π))) exp( -(x - m)^2 / (2σ^2) ).
Result does Although this result applies to hypothesis testing, it does not apply
not apply directly to multivariate Shewhart-type charts (for which there is no
directly to μ0), although the result might be used as an approximation when a
multivariate large sample is used and data are in subgroups, with the upper control
Shewhart-type limit (UCL) of a chart based on the approximation.
charts
Selection of Three-sigma units are generally not used with multivariate charts,
different however, which makes the selection of different control limit forms for
control limit each Phase (based on the relevant distribution theory), a natural
forms for choice.
each Phase
Obtaining the Each x̄i, i = 1, 2, ..., p, is obtained the same way as with an x̄ chart,
x̄i namely, by taking k subgroups of size n and computing

    x̄i = (1/k) Σ_{l=1}^k x̄il.

Here x̄il is used to denote the average for the lth subgroup of the ith
variable. That is,

    x̄il = (1/n) Σ_{r=1}^n xilr

with xilr denoting the rth observation (out of n) for the ith variable in
the lth subgroup.
Estimating The variances and covariances are similarly averaged over the
the variances subgroups. Specifically, the sij elements of the variance-covariance
and matrix S are obtained as
covariances
Compare T2 As with an x̄ chart (or any other chart), the k subgroups would be
against tested for control by computing k values of T2 and comparing each
control against the UCL. If any value falls above the UCL (there is no lower
values control limit), the corresponding subgroup would be investigated.
Formula for Each of the k values of T2 given in the equation above would be
the upper compared with the upper control limit
control limit
    UCL = [ p(k-1)(n-1) / (kn - k - p + 1) ] F(α; p, kn-k-p+1).
Lower A lower control limit is generally not used in multivariate control chart
control limits applications, although some control chart methods do utilize a LCL.
Although a small value for T2 might seem desirable, a value that is
very small would likely indicate a problem of some type as we would
not expect every element of the subgroup mean vector to be virtually
equal to every element of the estimated mean vector.
Delete As with any Phase I control chart procedure, if there are any points that
out-of-control plot above the UCL and can be identified as corresponding to
points once out-of-control conditions that have been corrected, the point(s) should
cause be deleted and the UCL recomputed. The remaining points would then
discovered be compared with the new UCL and the process continued as long as
and corrected necessary, remembering that points should be deleted only if their
correspondence with out-of-control conditions can be identified and the
cause(s) of the condition(s) were removed.
Illustration To illustrate, assume that a subgroups had been discarded (with possibly a = 0) so
that k - a subgroups are used in obtaining the mean vector and covariance matrix
estimates. We shall mark these two values with asterisks to distinguish them from
the original values computed before any subgroups were deleted. Future values to
be plotted on the multivariate chart would then be obtained from
Phase II
control
limits
with a denoting the number of the original subgroups that are deleted before
computing and . Notice that the equation for the control limits for Phase II
given here does not reduce to the equation for the control limits for Phase I when a
= 0, nor should we expect it to since the Phase I UCL is used when testing for
control of the entire set of subgroups that is used in computing and .
Delete As in the case when subgroups are used, if any points plot outside these
points if control limits and special cause(s) that were subsequently removed can
special be identified, the point(s) would be deleted and the control limits
cause(s) are recomputed, making the appropriate adjustments on the degrees of
identified freedom, and re-testing the remaining points against the new limits.
and
corrected
Further The control limit expressions given in this section and the immediately
Information preceding sections are given in Ryan (2000, Chapter 9).
Inverse While these principal factors represent or replace one or more of the
transformation original variables, it should be noted that they are not just a one-to-one
not possible transformation, so inverse transformations are not possible.
Original data To shed light on the structure of principal components analysis, let
matrix us consider a multivariate data matrix X, with n rows and p columns.
The p elements of each row are scores or measurements on a subject
such as height, weight and age.
Linear Next, standardize the X matrix so that each column mean is 0 and
function that each column variance is 1. Call this matrix Z. Each column is a vector
maximizes variable, zi, i = 1, . . . , p. The main idea behind principal component
variance analysis is to derive a linear function y for each of the vector variables
zi. This linear function possesses an extremely important property;
namely, its variance is maximized.
Number of At this juncture you may be tempted to say: "so what?". To answer
parameters to this let us look at the intercorrelations among the elements of a vector
estimate variable. The number of parameters to be estimated for a p-element
increases variable is
rapidly as p ● p means
increases
● p variances
● (p^2 - p)/2 covariances

for a total of 2p + (p^2 - p)/2 parameters.
Orthogonal To produce a transformation vector for y for which the elements are
transformations uncorrelated is the same as saying that we want V such that Dy is a
simplify things diagonal matrix. That is, all the off-diagonal elements of Dy must be
zero. This is called an orthogonalizing transformation.
Infinite number There are an infinite number of values for V that will produce a
of values for V diagonal Dy for any correlation matrix R. Thus the mathematical
problem "find a unique V such that Dy is diagonal" cannot be solved
as it stands. A number of famous statisticians such as Karl Pearson
and Harold Hotelling pondered this problem and suggested a
"variance maximizing" solution.
Constrain v to The constraint on the numbers in v1 is that the sum of the squares of
generate a the coefficients equals 1. Expressed mathematically, we wish to
unique solution maximize

    (1/(n-1)) Σ_{i=1}^n y1i^2

where

    y1i = v1' zi

subject to the constraint v1' v1 = 1.
The eigenstructure
Lagrange Let
multiplier
approach        φ1 = v1' R v1 - λ1 (v1' v1 - 1),

introducing the restriction on v1 via the Lagrange multiplier λ1. It can be
shown (T.W. Anderson, 1958, page 347, theorem 8) that the vector of partial
derivatives is

    ∂φ1/∂v1 = 2 R v1 - 2 λ1 v1,

and setting this equal to zero, dividing out 2 and factoring gives

    (R - λ1 I) v1 = 0.
Largest Specifically, the largest eigenvalue, λ1, and its associated vector, v1,
eigenvalue are required. Solving for this eigenvalue and vector is another
mammoth numerical task that can realistically only be performed by
a computer. In general, software is involved and the algorithms are
complex.
Remaining p After obtaining the first eigenvalue, the process is repeated until all p
eigenvalues eigenvalues are computed.
Principal Factors
Scale to zero It was mentioned before that it is helpful to scale any transformation
means and unit y of a vector variable z so that its elements have zero means and unit
variances variances. Such a standardized transformation is called a factoring of
z, or of R, and each linear component of the transformation is called
a factor.
Deriving unit Now, the principal components already have zero means, but their
variances for variances are not 1; in fact, they are the eigenvalues, comprising the
principal diagonal elements of L. It is possible to derive the principal factor
components with unit variance from the principal component as follows
where

    B = V L^(-1/2).
B matrix The matrix B is then the matrix of factor score coefficients for
principal factors.
Dimensionality The number of eigenvalues, N, used in the final set determines the
of the set of dimensionality of the set of factor scores. For example, if the original
factor scores test consisted of 8 measurements on 100 subjects, and we extract 2
eigenvalues, the set of factor scores is a matrix of 100 rows by 2
columns.
Factor Structure
The The matrix SS' is the source of the "explained" correlations among the
communality variables. Its diagonal is called "the communality".
Rotation
Factor analysis If this correlation matrix, i.e., the factor structure matrix, does not
help much in the interpretation, it is possible to rotate the axis of the
principal components. This may result in the polarization of the
correlation coefficients. Some practitioners refer to rotation after
generating the factor structure as factor analysis.
Communality
Formula for A measure of how well the selected factors (principal components)
communality "explain" the variance of each of the variables is given by a statistic
statistic called communality. This is defined by

    c_k = Σ_{i=1}^m (S_ki)^2,

the sum over the m retained factors of the squared entries of row k of
the factor structure matrix S.
Explanation of That is: the square of the correlation of variable k with factor i gives
communality the part of the variance accounted for by that factor. The sum of these
statistic squares for n factors is the communality, or explained variance, for
that variable (row).
Main steps to In summary, here are the main steps to obtain the eigenstructure for a
obtaining correlation matrix.
eigenstructure 1. Compute R, the correlation matrix of the original data. R is
for a also the correlation matrix of the standardized data.
correlation
2. Obtain the characteristic equation of R which is a polynomial
matrix
of degree p (the number of variables), obtained from
expanding the determinant of |R - λI| = 0 and solving for the
roots λi, that is: λ1, λ2, ..., λp.
3. Then solve for the columns of the V matrix, (v1, v2, ..., vp). The
roots, λi, are called the eigenvalues (or latent values). The
columns of V are called the eigenvectors.
Sample data Let us analyze the following 3-variate dataset with 10 observations. Each
set observation consists of 3 measurements on a wafer: thickness, horizontal
displacement and vertical displacement.
Solve for the Next solve for the roots of R, using software:
roots of R
    root    value    cumulative proportion
     1      1.769           .590
     2       .927           .899
     3       .304          1.000
Notice that
● Each eigenvalue satisfies |R - λI| = 0.
● The sum of the eigenvalues = 3 = p, which is equal to the trace of R (i.e., the
sum of the main diagonal elements).
● The determinant of R is the product of the eigenvalues.
● The product is λ1 x λ2 x λ3 = .499.
Compute the Substituting the first eigenvalue of 1.769 and R in the appropriate equation we
first column obtain
of the V
matrix
This is the matrix expression for 3 homogeneous equations with 3 unknowns and
yields the first column of V: .64 .69 -.34 (again, a computerized solution is
indispensable).
Compute the Repeating this procedure for the other 2 eigenvalues yields the matrix V
remaining
columns of
the V matrix
Notice that if you multiply V by its transpose, the result is an identity matrix,
V'V=I.
Compute the Now form the matrix L^(1/2), which is a diagonal matrix whose elements are
L^(1/2) matrix the square roots of the eigenvalues of R. Then obtain S, the factor
structure, using S = V L^(1/2).
So, for example, .91 is the correlation between variable 2 and the first principal
component.
Compute the Next compute the communality, using the first two eigenvalues only
communality
This means that the first two principal components "explain" 86.62% of the first
variable, 84.20 % of the second variable, and 98.76% of the third.
Compute the The coefficient matrix, B, is formed using the reciprocals of the diagonals
coefficient of L^(1/2); that is, B = V L^(-1/2).
matrix
Compute the Finally, we can compute the factor scores from ZB, where Z is X converted to
principal standard score form. These columns are the principal factors.
factors
Principal These factors can be plotted against the indices, which could be times. If time is
factors used, the resulting plot is an example of a principal factors control chart.
control
chart
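All of the steps in this numerical example can be chained together in a few lines of numpy; X is assumed to be the n-by-p raw data matrix, and the helper name is illustrative:

    import numpy as np

    def principal_factors(X, n_factors=2):
        """Eigenstructure of R, factor structure S = V L^(1/2),
        communalities, and principal factor scores Z B."""
        Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
        R = np.corrcoef(X, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(R)         # ascending eigenvalues
        order = np.argsort(eigvals)[::-1]
        L, V = eigvals[order], eigvecs[:, order]
        S = V * np.sqrt(L)                           # factor structure
        communality = (S[:, :n_factors] ** 2).sum(axis=1)
        B = V / np.sqrt(L)                           # score coefficients
        return L, V, S, communality, Z @ B[:, :n_factors]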
Semiconductor One of the assumptions in using classical Shewhart SPC charts is that the only
processing source of variation is from part to part (or within subgroup variation). This is
creates the case for most continuous processing situations. However, many of today's
multiple processing situations have different sources of variation. The semiconductor
sources of industry is one of the areas where the processing creates multiple sources of
variability to variation.
monitor
In semiconductor processing, the basic experimental unit is a silicon wafer.
Operations are performed on the wafer, but individual wafers can be grouped
multiple ways. In the diffusion area, up to 150 wafers are processed in one
time in a diffusion tube. In the etch area, single wafers are processed
individually. In the lithography area, the light exposure is done on sub-areas of
the wafer. There are many times during the production of a computer chip
where the experimental unit varies and thus there are different sources of
variation in this batch processing environment.
The following is a case study of a lithography process. Five sites are measured
on each wafer, three wafers are measured in a cassette (typically a grouping of
24 - 25 wafers) and thirty cassettes of wafers are used in the study. The width
of a line is the measurement under study. There are two line width variables.
The first is the original data and the second has been cleaned up somewhat.
This case study uses the raw data. The entire data table is 450 rows long with
six columns.
Case study data: wafer line width measurements

Cassette  Wafer  Site   Raw Line Width   Sequence   Cleaned Line Width
=====================================================================
1 1 Top 3.199275 1 3.197275
1 1 Lef 2.253081 2 2.249081
1 1 Cen 2.074308 3 2.068308
1 1 Rgt 2.418206 4 2.410206
1 1 Bot 2.393732 5 2.383732
1 2 Top 2.654947 6 2.642947
1 2 Lef 2.003234 7 1.989234
1 2 Cen 1.861268 8 1.845268
1 2 Rgt 2.136102 9 2.118102
1 2 Bot 1.976495 10 1.956495
1 3 Top 2.887053 11 2.865053
1 3 Lef 2.061239 12 2.037239
1 3 Cen 1.625191 13 1.599191
1 3 Rgt 2.304313 14 2.276313
1 3 Bot 2.233187 15 2.203187
2 1 Top 3.160233 16 3.128233
2 1 Lef 2.518913 17 2.484913
2 1 Cen 2.072211 18 2.036211
2 1 Rgt 2.287210 19 2.249210
2 1 Bot 2.120452 20 2.080452
2 2 Top 2.063058 21 2.021058
2 2 Lef 2.217220 22 2.173220
2 2 Cen 1.472945 23 1.426945
2 2 Rgt 1.684581 24 1.636581
2 2 Bot 1.900688 25 1.850688
2 3 Top 2.346254 26 2.294254
2 3 Lef 2.172825 27 2.118825
2 3 Cen 1.536538 28 1.480538
2 3 Rgt 1.966630 29 1.908630
2 3 Bot 2.251576 30 2.191576
3 1 Top 2.198141 31 2.136141
3 1 Lef 1.728784 32 1.664784
3 1 Cen 1.357348 33 1.291348
3 1 Rgt 1.673159 34 1.605159
3 1 Bot 1.429586 35 1.359586
3 2 Top 2.231291 36 2.159291
3 2 Lef 1.561993 37 1.487993
3 2 Cen 1.520104 38 1.444104
3 2 Rgt 2.066068 39 1.988068
4-Plot of
Data
Run
Sequence
Plot of Data
Numerical
Summary
SUMMARY

LOCATION MEASURES                        DISPERSION MEASURES
MIDRANGE     = 0.2957607E+01             RANGE        = 0.4422122E+01
MEAN         = 0.2532284E+01             STAND. DEV.  = 0.6937559E+00
MIDMEAN      = 0.2393183E+01             AV. AB. DEV. = 0.5482042E+00
MEDIAN       = 0.2453337E+01             MINIMUM      = 0.7465460E+00
                                         LOWER QUART. = 0.2046285E+01
                                         LOWER HINGE  = 0.2048139E+01
                                         UPPER HINGE  = 0.2971948E+01
                                         UPPER QUART. = 0.2987150E+01
                                         MAXIMUM      = 0.5168668E+01

RANDOMNESS MEASURES                      DISTRIBUTIONAL MEASURES
AUTOCO COEF  = 0.6072572E+00             ST. 3RD MOM. = 0.4527434E+00
             = 0.0000000E+00             ST. 4TH MOM. = 0.3382735E+01
             = 0.0000000E+00             ST. WILK-SHA = 0.6957975E+01
                                         UNIFORM PPCC = 0.9681802E+00
                                         NORMAL PPCC  = 0.9935199E+00
                                         TUK -.5 PPCC = 0.8528156E+00
                                         CAUCHY PPCC  = 0.5245036E+00
This summary generates a variety of statistics. In this case, we are primarily interested in
the mean and standard deviation. From this summary, we see that the mean is 2.53 and
the standard deviation is 0.69.
Plot response against individual factors
The next step is to plot the response against each individual factor. For
comparison, we generate both a scatter plot and a box plot of the data.
The scatter plot shows more detail. However, comparisons are usually
easier to see with the box plot, particularly as the number of data points
and groups becomes larger.
Scatter plot
of width
versus
cassette
Box plot of
width versus
cassette
Interpretation We can make the following conclusions based on the above scatter and
box plots.
1. There is considerable variation in the location for the various
cassettes. The medians vary from about 1.7 to 4.
2. There is also some variation in the scale.
3. There are a number of outliers.
Scatter plot
of width
versus wafer
Box plot of
width versus
wafer
Interpretation We can make the following conclusions based on the above scatter and
box plots.
1. The locations for the 3 wafers are relatively constant.
2. The scales for the 3 wafers are relatively constant.
3. There are a few outliers on the high side.
4. It is reasonable to treat the wafer factor as homogeneous.
Scatter plot
of width
versus site
Box plot of
width versus
site
Interpretation We can make the following conclusions based on the above scatter and
box plots.
1. There is some variation in location based on site. The center site
in particular has a lower median.
2. The scales are relatively constant across sites.
3. There are a few outliers.
Dex mean We can use the dex mean plot and the dex standard deviation plot to
and sd plots show the factor means and standard deviations together for better
comparison.
Dex mean
plot
Dex sd plot
Summary The above graphs show that there are differences between the lots and
the sites.
There are various ways we can create subgroups of this dataset: each
lot could be a subgroup, each wafer could be a subgroup, or each site
measured could be a subgroup (with only one data value in each
subgroup).
Recall that for a classical Shewhart Means chart, the average within
subgroup standard deviation is used to calculate the control limits for
the Means chart. However, on the means chart you are monitoring the
subgroup mean-to-mean variation. There is no problem if you are in a
continuous processing situation - this becomes an issue if you are
operating in a batch processing environment.
We will look at various control charts based on different subgroupings
next.
Site as The first pair of control charts use the site as the subgroup. However,
subgroup since site has a subgroup size of one we use the control charts for
individual measurements. A moving average and a moving range chart
are shown.
Moving
average
control chart
Moving
range control
chart
Wafer as The next pair of control charts use the wafer as the subgroup. In this
subgroup case, that results in a subgroup size of 5. A mean and a standard
deviation control chart are shown.
Mean control
chart
SD control
chart
Note that there is no LCL here because of the small subgroup size.
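The X-bar and S limits for such subgroups can be computed directly; here is a minimal sketch in Python using the standard c4-based constants (an illustration, not the handbook's own code). For n = 5 the computed lower limit of the S chart truncates at zero, which is why no LCL appears.

    import numpy as np
    from math import gamma, sqrt

    def xbar_s_limits(subgroups):
        # subgroups: 2-D array, one row per subgroup (e.g., 5 sites per wafer)
        sub = np.asarray(subgroups, dtype=float)
        n = sub.shape[1]
        xbar = sub.mean(axis=1)
        s = sub.std(axis=1, ddof=1)
        xbarbar, sbar = xbar.mean(), s.mean()

        c4 = sqrt(2.0 / (n - 1)) * gamma(n / 2.0) / gamma((n - 1) / 2.0)
        sigma_hat = sbar / c4                    # unbiased estimate of sigma

        xbar_lcl = xbarbar - 3 * sigma_hat / sqrt(n)
        xbar_ucl = xbarbar + 3 * sigma_hat / sqrt(n)
        s_lcl = max(0.0, sbar - 3 * sigma_hat * sqrt(1 - c4 ** 2))  # 0 for n = 5
        s_ucl = sbar + 3 * sigma_hat * sqrt(1 - c4 ** 2)
        return (xbar_lcl, xbar_ucl), (s_lcl, s_ucl)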
Cassette as The next pair of control charts use the cassette as the subgroup. In this
subgroup case, that results in a subgroup size of 15. A mean and a standard
deviation control chart are shown.
Mean control
chart
SD control
chart
Interpretation Which of these subgroupings of the data is correct? As you can see,
each subgrouping produces a different chart. Part of the answer lies in
the manufacturing requirements for this process. Another aspect that
can be statistically determined is the magnitude of each of the sources
of variation. In order to understand our data structure and how much
variation each of our sources contribute, we need to perform a variance
component analysis. The variance component analysis for this data set
is shown below.
Equating If your software does not generate the variance components directly,
mean squares they can be computed from a standard analysis of variance output by
with expected equating mean squares (MSS) to expected mean squares (EMS).
values
JMP ANOVA Below we show SAS JMP 4 output for this dataset that gives the SS,
output MSS, and components of variance (the model entered into JMP is a
nested, random factors model). The EMS table contains the
coefficients needed to write the equations setting MSS values equal to
their EMS's. This is further described below.
Variance From the ANOVA table, labelled "Tests wrt to Random Effects" in the
Components JMP output, we can make the following variance component
Estimation calculations:
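As an illustration of equating mean squares to their expected values, here is a sketch of the method-of-moments calculation for this nested design in Python; the mean square values below are hypothetical placeholders, not the JMP output (which is not reproduced in this extraction).

    # Method-of-moments variance components for a nested random-effects model
    # (cassette -> wafer -> site). MS values are hypothetical placeholders;
    # in practice they come from the ANOVA table (e.g., the JMP output).
    a, b, n = 30, 3, 5         # cassettes, wafers per cassette, sites per wafer
    ms_cassette = 0.95         # hypothetical mean squares
    ms_wafer = 0.10
    ms_site = 0.04             # residual (site-to-site) mean square

    var_site = ms_site                                  # EMS: sigma_e^2
    var_wafer = (ms_wafer - ms_site) / n                # EMS: sigma_e^2 + n*sigma_w^2
    var_cassette = (ms_cassette - ms_wafer) / (n * b)   # EMS: ... + n*b*sigma_c^2

    total = var_site + var_wafer + var_cassette
    for name, v in [("cassette", var_cassette), ("wafer", var_wafer), ("site", var_site)]:
        print(f"{name:9s} {v:8.5f}  {100 * v / total:5.1f}% of total")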
Chart only Another solution would be to have one chart on the largest source of
most variation. This would mean we would have one set of charts that
important monitor the lot-to-lot variation. From a manufacturing standpoint, this
source of would be unacceptable.
variation
Use boxplot We could create a non-standard chart that would plot all the individual
type chart data values and group them together in a boxplot type format by lot. The
control limits could be generated to monitor the individual data values
while the lot-to-lot variation would be monitored by the patterns of the
groupings. This would take special programming and management
intervention to implement non-standard charts in most floor shop control
systems.
Alternate A commonly applied solution is the first option; have multiple charts on
form for this process. When creating the control limits for the lot means, care
mean must be taken to use the lot-to-lot variation instead of the within lot
control variation. The resulting control charts are: the standard
chart individuals/moving range charts (as seen previously), and a control chart
on the lot means that is different from the previous lot means chart. This
new chart uses the lot-to-lot variation to calculate control limits instead
of the average within-lot standard deviation. The accompanying
standard deviation chart is the same as seen previously.
Mean
control
chart using
lot-to-lot
variation
The control limits labeled with "UCL" and "LCL" are the standard
control limits. The control limits labeled with "UCL: LL" and "LCL:
LL" are based on the lot-to-lot variation.
Work this example yourself: the case study can be repeated step by step in
Dataplot. The conclusions for the main analysis steps are summarized below.
1. Numerical summary of WIDTH: the summary shows the mean line width
is 2.53 and the standard deviation of the line width is 0.69.
4. Subgroup analysis.
7. Generate an analysis of variance (not currently implemented in
Dataplot for nested datasets): the analysis of variance and
components of variance calculations show that cassette-to-cassette
variation is 54% of the total and site-to-site variation is 36% of
the total.
8. Generate a mean control chart using lot-to-lot variation: the mean
control chart shows one point that is on the boundary of being out
of control.
Data These data were collected from an aerosol mini-spray dryer device. The
Collection purpose of this device is to convert a slurry stream into deposited
particles in a drying chamber. The device injects the slurry at high
speed. The slurry is pulverized as it enters the drying chamber when it
comes into contact with a hot gas stream at low humidity. The liquid
contained in the pulverized slurry particles is vaporized, then
transferred to the hot gas stream leaving behind dried small-sized
particles.
The response variable is particle size, which is collected equidistant in
time. There are a variety of associated variables that may affect the
injection process itself and hence the size and quality of the deposited
particles. For this case study, we restrict our analysis to the response
variable.
Aerosol The data set consists of particle sizes collected over time. The basic
Particle Size distributional properties of this process are of interest in terms of
Dynamic distributional shape, constancy of size, and variation in size. In
Modeling addition, this time series may be examined for autocorrelation structure
and Control to determine a prediction model of particle size as a function of
time--such a model is frequently autoregressive in nature. Such a
high-quality prediction equation would be essential as a first step in
developing a predictor-corrective recursive feedback mechanism which
would serve as the core in developing and implementing real-time
dynamic corrective algorithms. The net effect of such algorithms is, of
course, a particle size distribution that is much less variable, much
more stable in nature, and of much higher quality. All of this results in
final ceramic mold products that are more uniform and predictable
across a wide range of important performance characteristics.
For the purposes of this case study, we restrict the analysis to
determining an appropriate Box-Jenkins model of the particle size.
Case study
data 115.36539
114.63150
114.63150
116.09940
116.34400
116.09940
116.34400
116.83331
116.34400
116.83331
117.32260
117.07800
117.32260
117.32260
117.81200
117.56730
118.30130
117.81200
118.30130
117.81200
118.30130
118.30130
118.54590
118.30130
117.07800
116.09940
118.30130
118.79060
118.05661
118.30130
118.54590
118.30130
118.54590
118.05661
118.30130
118.54590
118.30130
118.30130
118.30130
118.30130
118.05661
118.30130
117.81200
118.30130
117.32260
117.32260
117.56730
117.81200
117.56730
117.81200
117.81200
117.32260
116.34400
116.58870
116.83331
116.58870
116.83331
116.83331
117.32260
116.34400
116.09940
115.61010
115.61010
115.61010
115.36539
115.12080
115.61010
115.85471
115.36539
115.36539
115.36539
115.12080
114.87611
114.87611
115.12080
114.87611
114.87611
114.63150
114.63150
114.14220
114.38680
114.14220
114.63150
114.87611
114.38680
114.87611
114.63150
114.14220
114.14220
113.89750
114.14220
113.89750
113.65289
113.65289
113.40820
113.40820
112.91890
113.40820
112.91890
113.40820
113.89750
113.40820
113.65289
113.89750
113.65289
113.65289
113.89750
113.65289
113.16360
114.14220
114.38680
113.65289
113.89750
113.89750
113.40820
113.65289
113.89750
113.65289
113.65289
114.14220
114.38680
114.63150
115.61010
115.12080
114.63150
114.38680
113.65289
113.40820
113.40820
113.16360
113.16360
113.16360
113.16360
113.16360
112.42960
113.40820
113.40820
113.16360
113.16360
113.16360
113.16360
111.20631
112.67420
112.91890
112.67420
112.91890
113.16360
112.91890
112.67420
112.91890
112.67420
112.91890
113.16360
112.67420
112.67420
112.91890
113.16360
112.67420
112.91890
111.20631
113.40820
112.91890
112.67420
113.16360
113.65289
113.40820
114.14220
114.87611
114.87611
116.09940
116.34400
116.58870
116.09940
116.34400
116.83331
117.07800
117.07800
116.58870
116.83331
116.58870
116.34400
116.83331
116.83331
117.07800
116.58870
116.58870
117.32260
116.83331
118.79060
116.83331
117.07800
116.58870
116.83331
116.34400
116.58870
116.34400
116.34400
116.34400
116.09940
116.09940
116.34400
115.85471
115.85471
115.85471
115.61010
115.61010
115.61010
115.36539
115.12080
115.61010
115.85471
115.12080
115.12080
114.87611
114.87611
114.38680
114.14220
114.14220
114.38680
114.14220
114.38680
114.38680
114.38680
114.38680
114.38680
114.14220
113.89750
114.14220
113.65289
113.16360
112.91890
112.67420
112.42960
112.42960
112.42960
112.18491
112.18491
112.42960
112.18491
112.42960
111.69560
112.42960
112.42960
111.69560
111.94030
112.18491
112.18491
112.18491
111.94030
111.69560
111.94030
111.94030
112.42960
112.18491
112.18491
111.94030
112.18491
112.18491
111.20631
111.69560
111.69560
111.69560
111.94030
111.94030
112.18491
111.69560
112.18491
111.94030
111.69560
112.18491
110.96170
111.69560
111.20631
111.20631
111.45100
110.22771
109.98310
110.22771
110.71700
110.22771
111.20631
111.45100
111.69560
112.18491
112.18491
112.18491
112.42960
112.67420
112.18491
112.42960
112.18491
112.91890
112.18491
112.42960
111.20631
112.42960
112.42960
112.42960
112.42960
113.16360
112.18491
112.91890
112.91890
112.67420
112.42960
112.42960
112.42960
112.91890
113.16360
112.67420
113.16360
112.91890
112.42960
112.67420
112.91890
112.18491
112.91890
113.16360
112.91890
112.91890
112.91890
112.67420
112.42960
112.42960
113.16360
112.91890
112.67420
113.16360
112.91890
113.16360
112.91890
112.67420
112.91890
112.67420
112.91890
112.91890
112.91890
113.16360
112.91890
112.91890
112.18491
112.42960
112.42960
112.18491
112.91890
112.67420
112.42960
112.42960
112.18491
112.42960
112.67420
112.42960
112.42960
112.18491
112.67420
112.42960
112.42960
112.67420
112.42960
112.42960
112.42960
112.67420
112.91890
113.40820
113.40820
113.40820
112.91890
112.67420
112.67420
112.91890
113.65289
113.89750
114.38680
114.87611
114.87611
115.12080
115.61010
115.36539
115.61010
115.85471
116.09940
116.83331
116.34400
116.58870
116.58870
116.34400
116.83331
116.83331
116.83331
117.32260
116.83331
117.32260
117.56730
117.32260
117.07800
117.32260
117.81200
117.81200
117.81200
118.54590
118.05661
118.05661
117.56730
117.32260
117.81200
118.30130
118.05661
118.54590
118.05661
118.30130
118.05661
118.30130
118.30130
118.30130
118.05661
117.81200
117.32260
118.30130
118.30130
117.81200
117.07800
118.05661
117.81200
117.56730
117.32260
117.32260
117.81200
117.32260
117.81200
117.07800
117.32260
116.83331
117.07800
116.83331
116.83331
117.07800
115.12080
116.58870
116.58870
116.34400
115.85471
116.34400
116.34400
115.85471
116.58870
116.34400
115.61010
115.85471
115.61010
115.85471
115.12080
115.61010
115.61010
115.85471
115.61010
115.36539
114.87611
114.87611
114.63150
114.87611
115.12080
114.63150
114.87611
115.12080
114.63150
114.38680
114.38680
114.87611
114.63150
114.63150
114.63150
114.63150
114.63150
114.14220
113.65289
113.65289
113.89750
113.65289
113.40820
113.40820
113.89750
113.89750
113.89750
113.65289
113.65289
113.89750
113.40820
113.40820
113.65289
113.89750
113.89750
114.14220
113.65289
113.40820
113.40820
113.65289
113.40820
114.14220
113.89750
114.14220
113.65289
113.65289
113.65289
113.89750
113.16360
113.16360
113.89750
113.65289
113.16360
113.65289
113.40820
112.91890
113.16360
113.16360
113.40820
113.40820
113.65289
113.16360
113.40820
113.16360
113.16360
112.91890
112.91890
112.91890
113.65289
113.65289
113.16360
112.91890
112.67420
113.16360
112.91890
112.67420
112.91890
112.91890
112.91890
111.20631
112.91890
113.16360
112.42960
112.67420
113.16360
112.42960
112.67420
112.91890
112.67420
111.20631
112.42960
112.67420
112.42960
113.16360
112.91890
112.67420
112.91890
112.42960
112.67420
112.18491
112.91890
112.42960
112.18491
Run Sequence
Plot
Interpretation We can make the following conclusions from the run sequence plot.
of the Run 1. The data show strong and positive autocorrelation.
Sequence Plot
2. There does not seem to be a significant trend or any obvious
seasonal pattern in the data.
The next step is to examine the sample autocorrelations using the
autocorrelation plot.
Autocorrelation
Plot
Run Sequence
Plot of
Differenced
Data
Interpretation The run sequence plot of the differenced data shows that the mean of
of the Run the differenced data is around zero, with the differenced data less
Sequence Plot autocorrelated than the original data.
The next step is to examine the sample autocorrelations of the
differenced data.
Autocorrelation
Plot of the
Differenced
Data
Partial
Autocorrelation
Plot of the
Differenced
Data
Interpretation The partial autocorrelation plot of the differenced data with 95%
of the Partial confidence bands shows that only the partial autocorrelations of the
Autocorrelation first and second lag are significant. This suggests an AR(2) model for
Plot of the the differenced data.
Differenced
Data
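These identification plots can be reproduced with standard time series software; here is a sketch in Python using statsmodels (not the handbook's Dataplot commands), assuming the particle sizes have been saved to a hypothetical file named particle_size.dat.

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

    # y: the particle-size series (assumed loaded into a 1-D numpy array)
    y = np.loadtxt("particle_size.dat")   # hypothetical file name

    dy = np.diff(y)                       # first difference removes the drift

    fig, axes = plt.subplots(2, 1)
    plot_acf(dy, lags=25, ax=axes[0])     # slow decay gone after differencing
    plot_pacf(dy, lags=25, ax=axes[1])    # cutoff after lag 2 suggests AR(2)
    plt.show()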
Dataplot ARMA Output for the AR(2) Model
Based on the differenced data, Dataplot generated the following estimation
output for the AR(2) model:

#############################################################
# NONLINEAR LEAST SQUARES ESTIMATION FOR THE PARAMETERS OF #
# AN ARIMA MODEL USING BACKFORECASTS #
#############################################################

MODEL SPECIFICATION
FACTOR   (P D Q)   S
   1      2 1 0    1

INDEX   TYPE            ORDER   FIXED   PARAMETER STARTING   STEP SIZE FOR
                                        VALUES (PAR)         APPROXIMATING
                                                             DERIVATIVE (STP)
  1     AR (FACTOR 1)     1      NO     0.10000000E+00       0.77167549E-06
  2     AR (FACTOR 1)     2      NO     0.10000000E+00       0.77168311E-06
  3     MU                       NO     0.00000000E+00       0.80630875E-06

CONVERGENCE CRITERION FOR RESIDUAL SUM OF SQUARES (STOPSS)   0.1000E-09
MAXIMUM SCALED RELATIVE CHANGE IN THE PARAMETERS (STOPP)     0.1489E-07

NONDEFAULT VALUES....

ESTIMATES FROM LEAST SQUARES FIT
(PARAMETER, STANDARD DEVIATION, RATIO, APPROXIMATE 95% CONFIDENCE LIMITS)
FACTOR 1
AR 1   -0.40604575E+00   0.41885445E-01   -9.69   (-0.47505616E+00, -0.33703534E+00)
AR 2   -0.16414479E+00   0.41836922E-01   -3.92   (-0.23307525E+00, -0.95214321E-01)
MU     -0.52091780E-02   0.11972592E-01   -0.44   (-0.24935207E-01,  0.14516851E-01)
Interpretation The first section of the output identifies the model and shows the starting values for the fit.
of Output This output is primarily useful for verifying that the model and starting values were
correctly entered.
The section labeled "ESTIMATES FROM LEAST SQUARES FIT" gives the parameter
estimates, standard errors from the estimates, and 95% confidence limits for the
parameters. A confidence interval that contains zero indicates that the parameter is not
statistically significant and could probably be dropped from the model.
The model for the differenced data, Yt, is an AR(2) model:

    Yt = -0.4060 Yt-1 - 0.1641 Yt-2 + et

with an estimated residual standard deviation of approximately 0.44 (the constant
term is omitted because its confidence interval contains zero).
It is often more convenient to express the model in terms of the original data, Xt, rather
than the differenced data. From the definition of the difference, Yt = Xt - Xt-1, we can make
the appropriate substitutions into the above equation:

    Xt - Xt-1 = -0.4060 (Xt-1 - Xt-2) - 0.1641 (Xt-2 - Xt-3) + et

which rearranges to

    Xt = 0.5940 Xt-1 + 0.2419 Xt-2 + 0.1641 Xt-3 + et
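Equivalent estimates can be obtained from other ARIMA software; here is a sketch in Python using statsmodels. Sign conventions for the AR coefficients differ between packages, so only the magnitudes should be compared with the Dataplot values.

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    y = np.loadtxt("particle_size.dat")   # hypothetical file name, as before

    # ARIMA(2,1,0) on the original series: statsmodels differences internally,
    # so this corresponds to the AR(2) fit to the differenced data.
    fit = ARIMA(y, order=(2, 1, 0)).fit()
    print(fit.summary())   # AR estimates comparable in magnitude to 0.406, 0.164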
Dataplot ARMA Output for the MA(1) Model
Alternatively, based on the differenced data, Dataplot generated the following
estimation output for an MA(1) model:

#############################################################
# NONLINEAR LEAST SQUARES ESTIMATION FOR THE PARAMETERS OF #
# AN ARIMA MODEL USING BACKFORECASTS #
#############################################################

MODEL SPECIFICATION
FACTOR   (P D Q)   S
   1      0 1 1    1

INDEX   TYPE            ORDER   FIXED   PARAMETER STARTING   STEP SIZE FOR
                                        VALUES (PAR)         APPROXIMATING
                                                             DERIVATIVE (STP)
  1     MU                       NO     0.00000000E+00       0.20630657E-05
  2     MA (FACTOR 1)     1      NO     0.10000000E+00       0.34498203E-07

NONDEFAULT VALUES....

ESTIMATES FROM LEAST SQUARES FIT
(PARAMETER, STANDARD DEVIATION, RATIO, APPROXIMATE 95% CONFIDENCE LIMITS)
FACTOR 1
MU     -0.51160754E-02   0.11431230E-01   -0.45   (-0.23950101E-01,  0.13717950E-01)
MA 1    0.39275694E+00   0.39028474E-01   10.06   ( 0.32845386E+00,  0.45706001E+00)
Interpretation of the Output
The model for the differenced data, Yt, is an MA(1) model (equivalently, an
ARIMA(0,1,1) model for Xt):

    Yt = et - 0.3928 et-1

with an estimated residual standard deviation of approximately 0.44 (again, the
constant term is omitted because its confidence interval contains zero).
4-Plot of The 4-plot is a convenient graphical technique for model validation in that it
Residuals from tests the assumptions for the residuals on a single graph.
ARIMA(2,1,0)
Model
Interpretation We can make the following conclusions based on the above 4-plot.
of the 4-Plot 1. The run sequence plot shows that the residuals do not violate the
assumption of constant location and scale. It also shows that most of
the residuals are in the range (-1, 1).
2. The lag plot indicates that the residuals are not autocorrelated at lag 1.
3. The histogram and normal probability plot indicate that the normal
distribution provides an adequate fit for this model.
Autocorrelation In addition, the autocorrelation plot of the residuals from the ARIMA(2,1,0)
Plot of model was generated.
Residuals from
ARIMA(2,1,0)
Model
Interpretation of the Autocorrelation Plot
The autocorrelation plot shows that for the first 25 lags, all sample
autocorrelations except those at lags 7 and 18 fall inside the 95% confidence
bounds, indicating the residuals appear to be random.
Ljung-Box Test Instead of checking the autocorrelation of the residuals, portmanteau tests
for such as the test proposed by Ljung and Box (1978) can be used. In this
Randomness example, the test of Ljung and Box indicates that the residuals are random at
for the the 95% confidence level and thus the model is appropriate. Dataplot
ARIMA(2,1,0) generated the following output for the Ljung-Box test.
Model
LJUNG-BOX TEST FOR RANDOMNESS
1. STATISTICS:
NUMBER OF OBSERVATIONS = 559
LAG TESTED = 24
LAG 1 AUTOCORRELATION = -0.1012441E-02
LAG 2 AUTOCORRELATION = 0.6160716E-02
LAG 3 AUTOCORRELATION = 0.5182213E-02
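The same portmanteau test is available in standard software; here is a sketch in Python using statsmodels, where `fit` is the ARIMA(2,1,0) model object from the earlier sketch (not the handbook's Dataplot run).

    from statsmodels.stats.diagnostic import acorr_ljungbox

    # Ljung-Box test of the ARIMA(2,1,0) residuals at lag 24, mirroring the
    # Dataplot output above; model_df=2 accounts for the two fitted AR terms.
    print(acorr_ljungbox(fit.resid, lags=[24], model_df=2))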
4-Plot of The 4-plot is a convenient graphical technique for model validation in that it
Residuals from tests the assumptions for the residuals on a single graph.
ARIMA(0,1,1)
Model
Interpretation We can make the following conclusions based on the above 4-plot.
of the 4-Plot 1. The run sequence plot shows that the residuals do not violate the
from the assumption of constant location and scale. It also shows that most of
ARIMA(0,1,1) the residuals are in the range (-1, 1).
Model
2. The lag plot indicates that the residuals are not autocorrelated at lag 1.
3. The histogram and normal probability plot indicate that the normal
distribution provides an adequate fit for this model.
This 4-plot of the residuals indicates that the fitted model is an adequate
model for these data.
Autocorrelation The autocorrelation plot of the residuals from ARIMA(0,1,1) was generated.
Plot of
Residuals from
ARIMA(0,1,1)
Model
Interpretation of the Autocorrelation Plot
Similar to the result for the ARIMA(2,1,0) model, it shows that for the first
25 lags, all sample autocorrelations except those at lags 7 and 18 fall inside
the 95% confidence bounds, indicating the residuals appear to be random.
Ljung-Box Test for Randomness of the Residuals for the ARIMA(0,1,1) Model
The Ljung and Box test is also applied to the residuals from the
ARIMA(0,1,1) model. The test indicates that the residuals are random at the
99% confidence level, but not at the 95% level.
Dataplot generated the following output for the Ljung-Box test.

LJUNG-BOX TEST FOR RANDOMNESS
1. STATISTICS:
NUMBER OF OBSERVATIONS = 559
LAG TESTED = 24
LAG 1 AUTOCORRELATION = -0.1280136E-01
LAG 2 AUTOCORRELATION = -0.3764571E-02
LAG 3 AUTOCORRELATION = 0.7015200E-01
99 % POINT = 42.97982
Work this example yourself: the conclusions for the main analysis steps are
summarized below.
1. Run sequence plot of Y: the run sequence plot shows that the data show
strong and positive autocorrelation.
4. Model validation.
6.7. References
Selected References
Army Chemical Corps (1953). Master Sampling Plans for Single, Duplicate, Double and
Multiple Sampling, Manual No. 2.
Bissell, A. F. (1990). "How Reliable is Your Capability Index?", Applied Statistics, 39,
331-340.
Champ, C.W., and Woodall, W.H. (1987). "Exact Results for Shewhart Control Charts
with Supplementary Runs Rules", Technometrics, 29, 393-399.
Duncan, A. J. (1986). Quality Control and Industrial Statistics, 5th ed., Irwin,
Homewood, IL.
Hotelling, H. (1947). Multivariate Quality Control. In C. Eisenhart, M. W. Hastay, and
W. A. Wallis, eds. Techniques of Statistical Analysis. New York: McGraw-Hill.
Juran, J. M. (1997). "Early SQC: A Historical Supplement", Quality Progress, 30(9)
73-81.
Kotz, S. and Johnson, N. L. (1992). Process Capability Indices, Chapman & Hall,
London.
Montgomery, D. C. (2000). Introduction to Statistical Quality Control, 4th ed., Wiley,
New York, NY.
Lowry, C. A., Woodall, W. H., Champ, C. W., and Rigdon, S. E. (1992). "A Multivariate
Exponentially Weighted Moving Average Chart", Technometrics, 34, 46-53.
Lucas, J. M. and Saccucci, M. S. (1990). "Exponentially weighted moving average
control schemes: Properties and enhancements", Technometrics 32, 1-29.
Ott, E. R. and Schilling, E. G. (1990). Process Quality Control, 2nd ed., McGraw-Hill,
New York, NY.
Quesenberry, C. P. (1993). "The effect of sample size on estimated limits for X-bar and X
control charts", Journal of Quality Technology, 25(4), 237-247.
Ryan, T.P. (2000). Statistical Methods for Quality Improvement, 2nd ed., Wiley, New
York, NY.
Ryan, T. P. and Schwertman, N. C. (1997). "Optimal limits for attributes control charts",
Journal of Quality Technology, 29 (1), 86-98.
Schilling, E. G. (1982). Acceptance Sampling in Quality Control, Marcel Dekker, New
York, NY.
Tracy, N. D., Young, J. C. and Mason, R. L. (1992). "Multivariate Control Charts for
Individual Observations", Journal of Quality Technology, 24(2), 88-95.
Woodall, W. H. (1997). "Control Charting Based on Attribute Data: Bibliography and
Review", Journal of Quality Technology, 29, 172-183.
Woodall, W. H. and Adams, B. M. (1993). "The Statistical Design of CUSUM Charts",
Quality Engineering, 5(4), 559-570.
Zhang, Stenback, and Wardrop (1990). "Interval Estimation of the Process Capability
Index", Communications in Statistics: Theory and Methods, 19(21), 4455-4470.
Statistical Analysis
Anderson, T. W. (1984). Introduction to Multivariate Statistical Analysis, 2nd ed., Wiley,
New York, NY.
Johnson, R. A. and Wichern, D. W. (1998). Applied Multivariate Statistical Analysis,
Fourth Ed., Prentice Hall, Upper Saddle River, NJ.
1. Introduction [7.1.]
1. What is the scope? [7.1.1.]
2. What assumptions are typically made? [7.1.2.]
3. What are statistical tests? [7.1.3.]
1. Critical values and p values [7.1.3.1.]
4. What are confidence intervals? [7.1.4.]
5. What is the relationship between a test and a confidence interval? [7.1.5.]
6. What are outliers in the data? [7.1.6.]
7. What are trends in sequential process or product data? [7.1.7.]
5. References [7.5.]
7.1. Introduction
Goals of this The primary goal of this section is to lay a foundation for understanding
section statistical tests and confidence intervals that are useful for making
decisions about processes and comparisons among processes. The
materials covered are:
● Scope
● Assumptions
● Introduction to hypothesis testing
● Introduction to confidence intervals
● Relationship between hypothesis testing and confidence intervals
● Outlier detection
● Detection of sequential trends in data or processes
Hypothesis This chapter explores the types of comparisons which can be made from
testing and data and explains hypothesis testing, confidence intervals, and the
confidence interpretation of each.
intervals
Validity of tests The validity of the tests described in this chapter depends on the
following assumptions:
1. The data come from a single process that can be represented
by a single statistical distribution.
2. The distribution is a normal distribution.
3. The data are uncorrelated over time.
Concept of A classic use of a statistical test occurs in process control studies. For
null example, suppose that we are interested in ensuring that photomasks in a
hypothesis production process have mean linewidths of 500 micrometers. The null
hypothesis, in this case, is that the mean linewidth is 500 micrometers.
Implicit in this statement is the need to flag photomasks which have
mean linewidths that are either much greater or much less than 500
micrometers. This translates into the alternative hypothesis that the
mean linewidths are not equal to 500 micrometers. This is a two-sided
alternative because it guards against alternatives in opposite directions;
namely, that the linewidths are too small or too large.
The testing procedure works this way. Linewidths at random positions
on the photomask are measured using a scanning electron microscope. A
test statistic is computed from the data and tested against pre-determined
upper and lower critical values. If the test statistic is greater than the
upper critical value or less than the lower critical value, the null
hypothesis is rejected because there is evidence that the mean linewidth
is not 500 micrometers.
One-sided Null and alternative hypotheses can also be one-sided. For example, to
tests of ensure that a lot of light bulbs has a mean lifetime of at least 500 hours,
hypothesis a testing program is implemented. The null hypothesis, in this case, is
that the mean lifetime is greater than or equal to 500 hours. The
complement or alternative hypothesis that is being guarded against is
that the mean lifetime is less than 500 hours. The test statistic is
compared with a lower critical value, and if it is less than this limit, the
null hypothesis is rejected.
Thus, a statistical test requires a pair of hypotheses; namely,
● H0: a null hypothesis
● Ha: an alternative hypothesis.
Significance levels
The null hypothesis is a statement about a belief. We may doubt that the
null hypothesis is true, which might be why we are "testing" it. The
alternative hypothesis might, in fact, be what we believe to be true. The
test procedure is constructed so that the risk of rejecting the null
hypothesis, when it is in fact true, is small. This risk, α, is often
referred to as the significance level of the test. By having a test with a
small value of α, we feel that we have actually "proved" something
when we reject the null hypothesis.
Errors of the second kind
The risk of failing to reject the null hypothesis when it is in fact false is
not chosen by the user but is determined, as one might expect, by the
magnitude of the real discrepancy. This risk, β, is usually referred to as
the error of the second kind. Large discrepancies between reality and the
null hypothesis are easier to detect and lead to small errors of the second
kind; while small discrepancies are more difficult to detect and lead to
large errors of the second kind. Also the β risk increases as the α risk
decreases. The risks of errors of the second kind are usually summarized
by an operating characteristic (OC) curve for the test. OC curves for
several types of tests are shown in Natrella (1962).
Guidance in This chapter gives methods for constructing test statistics and their
this chapter corresponding critical values for both one-sided and two-sided tests for
the specific situations outlined under the scope. It also provides
guidance on the sample sizes required for these tests.
Further guidance on statistical hypothesis testing, significance levels and
critical regions, is given in Chapter 1.
Information in This chapter gives formulas for the test statistics and points to the
this chapter appropriate tables of critical values for tests of hypothesis regarding
means, standard deviations, and proportion defectives.
Good practice It is good practice to decide in advance of the test how small a p-value
is required to reject the test. This is exactly analogous to choosing a
significance level, α, for the test. For example, we decide either to reject
the null hypothesis if the test statistic exceeds the critical value (for
α = 0.05) or, analogously, to reject the null hypothesis if the p-value is
smaller than 0.05. It is important to understand the relationship
between the two concepts because some statistical software packages
report p-values rather than critical values.
One and In the same way that statistical tests can be one or two-sided, confidence
two-sided intervals can be one or two-sided. A two-sided confidence interval
confidence brackets the population parameter from above and below. A one-sided
intervals confidence interval brackets the population parameter either from above
or below and furnishes an upper or lower bound to its magnitude.
Guidance in This chapter provides methods for estimating the population parameters
this chapter and confidence intervals for the situations described under the scope.
Problem In the normal course of events, population standard deviations are not
with known, and must be estimated from the data. Confidence intervals,
unknown given the same confidence level, are by necessity wider if the standard
standard deviation is estimated from limited data because of the uncertainty in
deviation this estimate. Procedures for creating confidence intervals in this
situation are described fully in this chapter.
More information on confidence intervals can also be found in Chapter
1.
Hypothesis test
for the mean For the test, the sample mean, Ȳ, is calculated from N linewidths
chosen at random positions on each photomask. For the purpose of
the test, it is assumed that the standard deviation, σ, is known from a
long history of this process. A test statistic is calculated from these
sample statistics, and the null hypothesis is rejected if

    |Ȳ - 500| > z(α/2) σ/√N

where z(α/2) is the upper α/2 critical value of the standard normal
distribution.
Equivalent confidence interval
With some algebra, it can be seen that the null hypothesis is rejected
if and only if the value 500 micrometers is not in the confidence
interval

    Ȳ ± z(α/2) σ/√N

In fact, all values bracketed by this interval would be accepted as null
values for a given set of test data.
Box plot The box plot is a useful graphical display for describing the behavior of
construction the data in the middle as well as at the ends of the distributions. The box
plot uses the median and the lower and upper quartiles (defined as the
25th and 75th percentiles). If the lower quartile is Q1 and the upper
quartile is Q2, then the difference (Q2 - Q1) is called the interquartile
range or IQ.
Box plots A box plot is constructed by drawing a box between the upper and lower
with fences quartiles with a solid line drawn across the box to locate the median.
The following quantities (called fences) are needed for identifying
extreme values in the tails of the distribution:
1. lower inner fence: Q1 - 1.5*IQ
2. upper inner fence: Q2 + 1.5*IQ
3. lower outer fence: Q1 - 3*IQ
4. upper outer fence: Q2 + 3*IQ
From an examination of the fence points and the data, one point (1441)
exceeds the upper inner fence and stands out as a mild outlier; there are
no extreme outliers.
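A sketch of the fence calculation in Python (using the conventional quartile labels Q1 and Q3 for what the text calls Q1 and Q2):

    import numpy as np

    def fences(data):
        """Inner and outer box-plot fences from the quartiles."""
        q1, q3 = np.percentile(data, [25, 75])
        iq = q3 - q1
        return {"lower inner": q1 - 1.5 * iq, "upper inner": q3 + 1.5 * iq,
                "lower outer": q1 - 3.0 * iq, "upper outer": q3 + 3.0 * iq}

    # Points between the inner and outer fences are mild outliers (like the
    # value 1441 in the text); points beyond the outer fences are extreme.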
JMP Output from a JMP command is shown below. The plot shows a
software histogram of the data on the left and a box plot with the outlier
output identified as a point on the right. Clicking on the outlier while in JMP
showing the identifies the data point as 1441.
outlier box
plot
Other trend A non-parametric approach for detecting significant trends known as the
tests Reverse Arrangement Test is described in Chapter 8.
Hypothesis Goodness-of-fit tests are a form of hypothesis testing where the null
Test model for and alternative hypotheses are
Goodness-of-fit
H0: Sample data come from the stated distribution.
HA: Sample data do not come from the stated distribution.
Problems with A second issue that affects a test is whether the data are censored.
censored data When data are censored, sample values are in some way restricted.
Censoring occurs if the range of potential values is limited such that
values from one or both tails of the distribution are unavailable (e.g.,
right and/or left censoring - where high and/or low values are
missing). Censoring frequently occurs in reliability testing, when
either the testing time or the number of failures to be observed is
fixed in advance. A thorough treatment of goodness-of-fit testing
under censoring is beyond the scope of this document. See
D'Agostino & Stephens (1986) for more details.
Rule-of-thumb One rule-of-thumb suggests using the value 2 n^(2/5) as a good starting
for number of point for choosing the number of groups. Another well known
groups rule-of-thumb requires every group to have at least 5 data points.
K-S Details on the construction and interpretation of the K-S test statistic, D,
procedure and examples for several distributions are outlined in Chapter 1.
The Critical values associated with the test statistic, D, are difficult to
probability compute for finite sample sizes, often requiring Monte Carlo simulation.
associated However, some general purpose statistical software programs, including
with the test Dataplot, support the Kolmogorov-Smirnov test at least for some of the
statistic is more common distributions. Tabled values can be found in Birnbaum
difficult to (1952). A correction factor can be applied if the parameters of the
compute. distribution are estimated with the same data that are being tested. See
D'Agostino and Stephens (1986) for details.
Requires critical The Anderson-Darling test makes use of the specific distribution
values for each in calculating critical values. This has the advantage of allowing
distribution a more sensitive test and the disadvantage that critical values
must be calculated for each distribution. Tables of critical values
are not given in this handbook (see Stephens 1974, 1976, 1977,
and 1979) because this test is usually applied with a statistical
software program that produces the relevant critical values.
Currently, Dataplot computes critical values for the
Anderson-Darling test for the following distributions:
● normal
● lognormal
● Weibull
where the x(i) are the ordered sample values (x(1) is the smallest)
and the ai are constants generated from the means, variances and
covariances of the order statistics of a sample of size n from a
normal distribution (see Pearson and Hartley (1972, Table 15)).
Typical null hypotheses
The corresponding null hypotheses that test the true mean, μ, against
the standard or assumed mean, μ0, are:
1. H0: μ = μ0
2. H0: μ <= μ0
3. H0: μ >= μ0
Test statistic where the standard deviation is not known
The basic statistics for the test are the sample mean and the standard
deviation. The form of the test statistic depends on whether the
population standard deviation, σ, is known or is estimated from the data
at hand. The more typical case is where the standard deviation must be
estimated from the data, and the test statistic is

    t = (Ȳ - μ0) / (s/√N)

where the sample mean is Ȳ and the sample standard deviation is s.
Test statistic where the standard deviation is known
If the standard deviation is known, the form of the test statistic is

    z = (Ȳ - μ0) / (σ/√N)

For case (1), the test statistic is compared with z(α/2), which is the upper
α/2 critical value from the standard normal distribution, and similarly
for cases (2) and (3).
Caution If the standard deviation is assumed known for the purpose of this test,
this assumption should be checked by a test of hypothesis for the
standard deviation.
The test is Over a long run the process average for wafer particle counts has been
two-sided 50 counts per wafer, and on the basis of the sample, we want to test
whether a change has occurred. The null hypothesis that the process
mean is 50 counts is tested against the alternative hypothesis that the
process mean is not equal to 50 counts. The purpose of the two-sided
alternative is to rule out a possible process change in either direction.
Even though there is a history on this process, it has not been stable
enough to justify the assumption that the standard deviation is known.
Therefore, the appropriate test statistic is the t-statistic. Substituting the
sample mean, sample standard deviation, and sample size into the
formula for the test statistic gives a value of
t = 1.782
with degrees of freedom = N - 1 = 9. This value is tested against the
upper critical value
t0.025;9 = 2.262
from the t-table where the critical value is found under the column
labeled 0.025 for the probability of exceeding the critical value and in
the row for 9 degrees of freedom. The critical value t(α/2) is used instead
of t(α) because of the two-sided alternative (two-tailed test) which
requires equal probabilities in each tail of the distribution that add to α.
Conclusion Because the value of the test statistic falls in the interval (-2.262, 2.262),
we cannot reject the null hypothesis and, therefore, we may continue to
assume the process mean is 50 counts.
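A sketch of this two-sided test in Python with scipy; the particle counts below are hypothetical stand-ins for the N = 10 observations, which are not listed in this section.

    import numpy as np
    from scipy import stats

    # Two-sided one-sample t-test for the wafer particle-count example.
    counts = np.array([55, 48, 52, 57, 46, 51, 54, 58, 49, 53])  # hypothetical

    t_stat, p_value = stats.ttest_1samp(counts, popmean=50)
    t_crit = stats.t.ppf(1 - 0.025, df=len(counts) - 1)   # t(0.025; 9) = 2.262

    # Fail to reject H0 when |t_stat| <= t_crit (equivalently p_value > 0.05).
    print(t_stat, t_crit, p_value)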
General form Tests of hypotheses that can be made from a single sample of data
of confidence were discussed on the foregoing page. As with null hypotheses,
intervals confidence intervals can be two-sided or one-sided, depending on the
where the question at hand. The general form of confidence intervals, for the
standard three cases discussed earlier, where the standard deviation is unknown
deviation is are:
unknown 1. Two-sided confidence interval for μ: Ȳ ± t(α/2; N-1) s/√N
Confidence level
The confidence intervals are constructed so that the probability of the
interval containing the mean is 1 - α. Such intervals are referred to as
100(1-α)% confidence intervals.
Interpretation The 95% confidence interval includes the null hypothesis if, and only
if, it would be accepted at the 5% level. This interval includes the null
hypothesis of 50 counts so we cannot reject the hypothesis that the
process mean for particle counts is 50. The confidence interval
includes all null hypothesis values for the population mean that would
be accepted by an hypothesis test at the 5% significance level. This
assumes, of course, a two-sided alternative.
Application - For example, suppose that we wish to estimate the average daily yield,
estimating a , of a chemical process by the mean of a sample, Y1, ..., YN, such that
minimum the error of estimation is less than δ with a probability of 95%. This
sample size, means that a 95% confidence interval centered at the sample mean
N, for should be
limiting the
error in the
estimate of
the mean
and if the standard deviation is known, the half-width of the interval,
z(α/2) σ/√N, must not exceed δ. The upper critical value from the normal
distribution for α/2 = 0.025 is 1.96. Therefore,

    N >= (1.96 σ/δ)²
Controlling the risk of accepting a false hypothesis
To control the risk of accepting a false hypothesis, we set not only α,
the probability of rejecting the null hypothesis when it is true, but also
β, the probability of accepting the null hypothesis when in fact the
population mean is μ0 + δ, where δ is the difference or shift we want to
detect.
Standard deviation assumed to be known
The minimum sample size, N, is shown below for two- and one-sided
tests of hypotheses with σ assumed to be known:

    one-sided test:  N >= (z(α) + z(β))² σ²/δ²
    two-sided test:  N >= (z(α/2) + z(β))² σ²/δ²

The quantities z(α) and z(β) are upper critical values from the normal
distribution.
More often we must compute the sample size with the population standard deviation being unknown
The procedures for computing sample sizes when the standard
deviation is not known are similar to, but more complex than, when the
standard deviation is known. The formulation depends on the
t-distribution, where the minimum sample size is given by

    N >= (t(α) + t(β))² s²/δ²

with the t critical values taken from the t-distribution.
The drawback is that critical values of the t-distribution depend on
known degrees of freedom, which in turn depend upon the sample size
which we are trying to estimate.
Iterate on the initial estimate using critical values from the t-table
Therefore, the best procedure is to start with an initial estimate based on
a sample standard deviation and iterate. Take the example discussed
above where the minimum sample size is computed to be N = 9.
This estimate is low. Now use the formula above with degrees of
freedom N - 1 = 8, which gives a second estimate of

    N = (1.860 + 1.397)² = 10.6 ≈ 11.
It is possible to apply another iteration using degrees of freedom 10,
but in practice one iteration is usually sufficient. For the purpose of this
example, results have been rounded to the closest integer; however,
computer programs for finding critical values from the t-distribution
allow non-integer degrees of freedom.
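A sketch of the iteration in Python with scipy, reproducing the example's values (one-sided test, α = 0.05, β = 0.10, shift equal to one sample standard deviation):

    import math
    from scipy import stats

    def sample_size_t(alpha, beta, delta_over_s, n0=10, max_iter=25):
        """Iterate N = ((t_alpha + t_beta) * s / delta)^2 until it stabilizes.

        Degrees of freedom for the t critical values depend on N itself,
        so start from an initial guess and repeat. One-sided test; for a
        two-sided test use alpha/2.
        """
        n = n0
        for _ in range(max_iter):
            df = n - 1
            t_a = stats.t.ppf(1 - alpha, df)
            t_b = stats.t.ppf(1 - beta, df)
            n_new = math.ceil(((t_a + t_b) / delta_over_s) ** 2)
            if n_new == n:
                return n
            n = n_new
        return n

    print(sample_size_t(0.05, 0.10, 1.0, n0=9))   # converges to 11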
Table showing minimum sample sizes for a two-sided test
The table below gives sample sizes for a two-sided test of hypothesis
that the mean is a given value, with the shift to be detected a multiple
of the standard deviation. For a one-sided test at significance level α,
look under the value of 2α in column 1.

Sample Size Table for Two-Sided Tests
  α     β    δ = .5σ   δ = σ   δ = 1.5σ
.01 .01 98 25 11
.01 .05 73 18 8
.01 .10 61 15 7
.01 .20 47 12 6
.01 .50 27 7 3
.05 .01 75 19 9
.05 .05 53 13 6
.05 .10 43 11 5
.05 .20 33 8 4
.05 .50 16 4 3
.10 .01 65 16 8
.10 .05 45 11 5
.10 .10 35 9 4
.10 .20 25 7 3
.10 .50 11 3 3
.20 .01 53 14 6
.20 .05 35 9 4
.20 .10 27 7 3
.20 .20 19 5 3
.20 .50 7 3 3
Corresponding null hypotheses
The corresponding null hypotheses that test the true standard
deviation, σ, against the nominal value, σ0, are:
1. H0: σ = σ0
2. H0: σ <= σ0
3. H0: σ >= σ0
The corresponding test statistic for all three cases is

    T = (N - 1) s² / σ0²

where, for case (1), the null hypothesis is rejected if T falls outside the
interval bounded by χ²(α/2; N-1), the upper critical value from the chi-square
distribution with N-1 degrees of freedom, and its lower counterpart, and
similarly for cases (2) and (3). Critical values can be found in the chi-square
table in Chapter 1.
Example A supplier of 100 ohm.cm silicon wafers claims that his fabrication
process can produce wafers with sufficient consistency so that the
standard deviation of resistivity for the lot does not exceed 10
ohm.cm. A sample of N = 10 wafers taken from the lot has a standard
deviation of 13.97 ohm.cm. Is the supplier's claim reasonable? This
question falls under null hypothesis (2) above. For a test at
significance level α = 0.05, the test statistic is

    T = (N - 1) s² / σ0² = 9 (13.97)² / (10)² = 17.56
Since the test statistic (17.56) exceeds the critical value (16.92) of the
chi-square distribution with 9 degrees of freedom, the manufacturer's
claim is rejected.
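A sketch of this calculation in Python with scipy:

    from scipy import stats

    # Chi-square test of H0: sigma <= 10 for the resistivity example.
    N, s, sigma0 = 10, 13.97, 10.0

    T = (N - 1) * s**2 / sigma0**2               # 17.56
    crit = stats.chi2.ppf(0.95, df=N - 1)        # 16.92

    reject = T > crit                            # True: the claim is rejected
    print(T, crit, reject)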
where χ²(α/2; N-1) for case (1) is the upper critical value from the
chi-square distribution with N-1 degrees of freedom and similarly for
cases (2) and (3). Critical values can be found in the chi-square table in
Chapter 1.
Choice of risk level can change the conclusion
Confidence interval (1) is equivalent to a two-sided test for the standard
deviation. That is, if the hypothesized or nominal value, σ0, is not
contained within these limits, then the hypothesis that the standard
deviation is equal to the nominal value is rejected.
Alternatives are specific departures from the null hypothesis
This procedure is stated in terms of changes in the variance, not the standard
deviation, which makes it somewhat difficult to interpret. Tests that are generally of
interest are stated in terms of δ, a discrepancy from the hypothesized variance. For
example:
1. Is the true variance larger than its hypothesized value by δ?
2. Is the true variance smaller than its hypothesized value by δ?
That is, the tests of interest are:
1. H0: σ² = σ0² + δ, δ >= 0
2. H0: σ² = σ0² - δ, δ >= 0
Interpretation The experimenter wants to assure that the probability of erroneously accepting the
null hypothesis of unchanged variance is at most β. The sample size, N, required
for this type of detection depends on the factor 1 + δ/σ0²; the significance level, α;
and the risk, β.
First choose the level of significance and beta risk
The sample size is determined by first choosing appropriate values of α and β and
then following the directions below to find the degrees of freedom, ν, from the
chi-square distribution.
1. Find ν such that the chi-square probability Pr[χ²(ν) <= χ²(1-α; ν) / R]
equals β, where R = 1 + δ/σ0².
2. Then the minimum sample size is N = ν + 1.
Hints on using software packages to do the calculations
The quantity χ²(1-α; ν) is the critical value from the chi-square distribution with ν
degrees of freedom which is exceeded with probability α. It is sometimes referred
to as the percent point function (PPF) or the inverse chi-square function. The
probability that is evaluated to get β is called the cumulative distribution function
(CDF).
Example Consider the case where the variance for resistivity measurements on a lot of
silicon wafers is claimed to be 100 ohm.cm. A buyer is unwilling to accept a
shipment if δ is greater than 55 ohm.cm for a particular lot. This problem falls
under case (1) above. The question is how many samples are needed to assure risks
of α = 0.05 and β = 0.01.
Calculations The procedure for performing these calculations using Dataplot is as follows:
using
Dataplot let d=55
let var = 100
let r = 1 + d/(var)
let function cnu=chscdf(chsppf(.95,nu)/r,nu) - 0.01
let a = roots cnu wrt nu for nu = 1 200
Dataplot returns a value of 169.5. Therefore, the minimum sample size needed to
guarantee the risk level is N = 170.
Alternatively, we could generate a table using the following Dataplot commands:
let d=55
let var = 100
let r = 1 + d/(var)
let nu = 1 1 200
let bnu = chsppf(.95,nu)
let bnu=bnu/r
let cnu=chscdf(bnu,nu)
print nu bnu cnu for nu = 165 1 175
Dataplot The Dataplot output, for calculations between 165 and 175 degrees of freedom, is
output shown below.
VARIABLES--
NU BNU CNU
0.1650000E+03 0.1264344E+03 0.1136620E-01
0.1660000E+03 0.1271380E+03 0.1103569E-01
0.1670000E+03 0.1278414E+03 0.1071452E-01
0.1680000E+03 0.1285446E+03 0.1040244E-01
0.1690000E+03 0.1292477E+03 0.1009921E-01
0.1700000E+03 0.1299506E+03 0.9804589E-02
0.1710000E+03 0.1306533E+03 0.9518339E-02
0.1720000E+03 0.1313558E+03 0.9240230E-02
0.1730000E+03 0.1320582E+03 0.8970034E-02
0.1740000E+03 0.1327604E+03 0.8707534E-02
0.1750000E+03 0.1334624E+03 0.8452513E-02
The value of CNU which is closest to 0.01 is 0.010099; this has degrees of freedom
ν = 169. Therefore, the minimum sample size needed to guarantee the risk level is
N = 170.
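The same root-finding can be done in Python with scipy, mirroring the Dataplot program above:

    from scipy import stats, optimize

    # Find nu such that CDF( chi2_{1-alpha}(nu) / R, nu ) = beta, R = 1 + delta/var.
    delta, var, alpha, beta = 55.0, 100.0, 0.05, 0.01
    R = 1 + delta / var

    f = lambda nu: stats.chi2.cdf(stats.chi2.ppf(1 - alpha, nu) / R, nu) - beta
    nu = optimize.brentq(f, 1, 200)     # about 169.5
    N = int(nu) + 1                     # minimum sample size, N = 170
    print(nu, N)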
Calculations The procedure for doing the calculations using an EXCEL spreadsheet is shown
using EXCEL below. The EXCEL calculations begin with 1 degree of freedom and iterate to the
correct solution.
Test statistic based on a normal approximation
Given a random sample of measurements Y1, ..., YN from a population,
the proportion of items that are judged defective from these N
measurements is denoted p̂. The test statistic

    z = (p̂ - p0) / sqrt( p0 (1 - p0) / N )

relies on a normal approximation to the binomial distribution.
Restriction on Because the test is approximate, N needs to be large for the test to be
sample size valid. One criterion is that N should be chosen so that
min{Np0, N(1 - p0)} >= 5
Interpretation Because the test statistic is less than the critical value (1.645), we
cannot reject hypothesis (3) and, therefore, we cannot conclude that
the new fabrication method is degrading the quality of the wafers. The
new process may, indeed, be worse, but more evidence would be
needed to reach that conclusion at the 95% confidence level.
Formulas for the confidence intervals
The Agresti-Coull confidence bounds are computed from

    p̃ ± z(α/2) sqrt( p̃ (1 - p̃) / ñ )

where ñ = N + z(α/2)² and p̃ = (x + z(α/2)²/2) / ñ, with x the observed
number of defectives.
Procedure This approach can be substantiated on the grounds that it is the exact
does not algebraic counterpart to the (large-sample) hypothesis test given in
strongly section 7.2.4 and is also supported by the research of Agresti and Coull.
depend on One advantage of this procedure is that its worth does not strongly
values of p depend upon the value of n and/or p, and indeed was recommended by
and n Agresti and Coull for virtually all combinations of n and p.
Another advantage is that the lower limit cannot be negative
Another advantage is that the lower limit cannot be negative. That is not
true for the confidence expression most frequently used:

    p̂ ± z(α/2) sqrt( p̂ (1 - p̂) / N )
A confidence limit approach that produces a lower limit which is an
impossible value for the parameter for which the interval is constructed
is an inferior approach. This also applies to limits for the control charts
that are discussed in Chapter 6.
The lower limit from this calculation is p = 0.09577.
Conclusion from the example
Since the lower bound (0.09577) does not exceed 0.10, the null
hypothesis that the proportion defective is at most .10, which was
given in the preceding section, would not be rejected if we used the
confidence interval to test the hypothesis. Of course a confidence
interval has value in its own right and does not have to be used for
hypothesis testing.
Construction If the number of failures is very small or if the sample size N is very
of exact small, symmetric confidence limits that are approximated using the
two-sided normal distribution may not be accurate enough for some applications.
confidence An exact method based on the binomial distribution is shown next. To
intervals construct a two-sided confidence interval at the 100(1 - )% confidence
based on the level for the true proportion defective p where Nd defects are found in a
binomial sample of size N follow the steps below.
distribution 1. Solve the equation Sum(k=0 to Nd) C(N,k) pU^k (1-pU)^(N-k) = α/2
for pU, and the analogous equation
Sum(k=0 to Nd-1) C(N,k) pL^k (1-pL)^(N-k) = 1 - α/2 for pL.
Note The interval {pL, pU} is an exact 100(1 - )% confidence interval for p.
However, it is not symmetric about the observed proportion defective,
.
Example of The equations above that determine pL and pU can easily be solved
calculation using functions built into EXCEL. Take as an example the situation
of upper where twenty units are sampled from a continuous production line and
limit for four items are found to be defective. The proportion defective is
binomial estimated to be = 4/20 = 0.20. The calculation of a 90% confidence
confidence
interval for the true proportion defective, p, is demonstrated using
intervals
EXCEL spreadsheets.
using
EXCEL
Final step 4. Click OK in the GOAL SEEK box. The number in A1 will
change from 0.5 to PU. The picture below shows the final result.
Example of The calculation of the lower limit is similar. To solve for pL:
calculation 1. Open an EXCEL spreadsheet and put the starting value of 0.5 in
of lower the A1 cell.
limit for
binomial 2. Put =BINOMDIST(Nd -1, N, A1, TRUE) in B1, where Nd -1 = 3
confidence and N = 20.
limits using 3. Open the Tools menu and click on GOAL SEEK. The GOAL
EXCEL SEEK box requires 3 entries.
❍ B1 in the "Set Cell" box
Final step 4. Click OK in the GOAL SEEK box. The number in A1 will
change from 0.5 to pL. The picture below shows the final result.
Calculations This computation can also be performed using the following Dataplot
using program.
Dataplot
. Initialize
let p = 0.5
let nd = 4
let n = 20
. Define the functions
let function fu = bincdf(4,p,20) - 0.05
let function fl = bincdf(3,p,20) - 0.95
. Calculate the roots
let pu = roots fu wrt p for p = .01 .99
let pl = roots fl wrt p for p = .01 .99
. print the results
let pu1 = pu(1)
let pl1 = pl(1)
print "PU = ^pu1"
print "PL = ^pl1"
Dataplot generated the following results.
PU = 0.401029
PL = 0.071354
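The same exact interval can be computed in Python with scipy; the commented lines show the equivalent closed form via the beta distribution.

    from scipy import stats, optimize

    # Exact two-sided 90% interval for Nd = 4 defectives in N = 20,
    # mirroring the Dataplot program above.
    N, Nd, alpha = 20, 4, 0.10

    pu = optimize.brentq(lambda p: stats.binom.cdf(Nd, N, p) - alpha / 2,
                         0.01, 0.99)
    pl = optimize.brentq(lambda p: stats.binom.cdf(Nd - 1, N, p) - (1 - alpha / 2),
                         0.01, 0.99)
    print(pl, pu)   # about 0.0714 and 0.4010

    # Equivalently, via the beta distribution:
    # pl = stats.beta.ppf(alpha / 2, Nd, N - Nd + 1)
    # pu = stats.beta.ppf(1 - alpha / 2, Nd + 1, N - Nd)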
Interpretation and sample size for high probability of detecting a change
This requirement on the sample size only guarantees that a change of size
δ is detected with 50% probability. The derivation of the sample size
when we are interested in protecting against a change δ with probability
1 - β (where β is small) is
1. For a two-sided test:  N >= (z(α/2) + z(β))² p (1 - p) / δ²
where z(β) is the upper critical value from the normal distribution that is
exceeded with probability β.
Value for the The equations above require that p be known. Usually, this is not the
true case. If we are interested in detecting a change relative to an historical or
proportion hypothesized value, this value is taken as the value of p for this purpose.
defective Note that taking the value of the proportion defective to be 0.5 leads to
the largest possible sample size.
Example of calculating sample size for testing proportion defective: Suppose that a department manager needs to be able to detect any change above 0.10 in the current proportion defective of his product line, which is running at approximately 10% defective. He is interested in a one-sided test and does not want to stop the line except when the process has clearly degraded and, therefore, he chooses a significance level for the test of 5%. Suppose, also, that he is willing to take a risk of 10% of failing to detect a change of this magnitude. With these criteria:
1. z(.05) = 1.645; z(.10) = 1.282
2. δ = 0.10
3. p = 0.10
and the minimum sample size for a one-sided test procedure is

N = (1.645 + 1.282)² (0.10)(0.90) / (0.10)² = 77.1, which we round up to N = 78.
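A short Python sketch of this sample size calculation (scipy assumed; the function name is ours, not the handbook's):

    import math
    from scipy.stats import norm

    def sample_size(p, delta, alpha, beta, two_sided=False):
        # N = (z_alpha + z_beta)^2 p(1-p) / delta^2, rounded up
        z_a = norm.ppf(1 - (alpha / 2 if two_sided else alpha))
        z_b = norm.ppf(1 - beta)
        return math.ceil((z_a + z_b) ** 2 * p * (1 - p) / delta ** 2)

    print(sample_size(p=0.10, delta=0.10, alpha=0.05, beta=0.10))   # 78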
Test statistic based on a normal approximation: If, for a sample of area A with a defect density target of D0, a defect count of C is observed, then the test statistic

Z = (C - A·D0) / sqrt(A·D0)

can be used exactly as shown in the discussion of the test statistic for fraction defectives in the preceding section.
Testing the hypothesis that the process defect density is less than or equal to D0: For example, after choosing a sample size of area A (see below for the sample size calculation) we can reject that the process defect density is less than or equal to the target D0 if the number of defects C in the sample is greater than CA, where

CA = A·D0 + Z·sqrt(A·D0)

and Z is the upper 100×(1-α) percentile of the standard normal distribution. The test significance level is 100×(1-α)%. For a 90% significance level (α = .10) use Z = 1.282 and for a 95% test (α = .05) use Z = 1.645. α is the maximum risk that an acceptable process with a defect density at least as low as D0 "fails" the test.
Choice of sample size (or area) to examine for defects: In order to determine a suitable area A to examine for defects, you first need to choose an unacceptable defect density level. Call this unacceptable defect density D1 = kD0, where k > 1.
We want the probability of "passing" the test (and not rejecting the hypothesis that the true level is D0 or better) to be less than or equal to β when, in fact, the true defect level is D1 or worse. Typically β will be .2, .1 or .05. Then we need to count defects in a sample of area A, where A is equal to

A = ( (Z(α)·sqrt(D0) + Z(β)·sqrt(D1)) / (D1 - D0) )²
Example: Suppose the target is D0 = 4 defects per wafer and we want to verify a new process meets that target. We choose α = .1 to be the chance of failing the test if the new process is as good as D0 (α = the Type I error probability or the "producer's risk") and we choose β = .1 for the chance of passing the test if the new process is as bad as 6 defects per wafer (β = the Type II error probability or the "consumer's risk"). That means Z(α) = 1.282 and Z(1-β) = -1.282. The required area is

A = ( (1.282·sqrt(4) + 1.282·sqrt(6)) / (6 - 4) )² = 8.1

which we round up to 9.
The test criterion is to "accept" that the new process meets target unless the number of defects in the sample of 9 wafers exceeds

CA = A·D0 + Z·sqrt(A·D0) = 9(4) + 1.282·sqrt(36) = 43.7

In other words, the reject criterion for the test of the new process is 44 or more defects in the sample of 9 wafers.
Note: Technically, all we can say if we run this test and end up not rejecting is that we do not have statistically significant evidence that the new process exceeds target. However, the way we chose the sample size for this test assures us we most likely would have had statistically significant evidence for rejection if the process had been as bad as 1.5 times the target.
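The two calculations in this example are easy to script; a Python sketch (illustrative only, scipy assumed):

    import math
    from scipy.stats import norm

    D0, D1 = 4.0, 6.0
    z_a = norm.ppf(0.90)     # 1.282 (alpha = .1)
    z_b = norm.ppf(0.90)     # 1.282 (beta = .1)

    # required sample area, rounded up
    A = math.ceil(((z_a * math.sqrt(D0) + z_b * math.sqrt(D1)) / (D1 - D0)) ** 2)
    # reject when the defect count exceeds C_A
    C_A = A * D0 + z_a * math.sqrt(A * D0)

    print(A)                    # 9 wafers
    print(math.floor(C_A) + 1)  # reject at 44 or more defects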
Various methods: Several types of intervals about the mean that contain a large percentage of the population values are discussed in this section.
● Approximate intervals that contain most of the population values
● Percentiles
● Tolerance intervals for a normal distribution
● Tolerance intervals using EXCEL
● Tolerance intervals based on the smallest and largest
observations
Intervals that apply to any distribution: The Bienayme-Chebyshev rule states that regardless of how the data are distributed, the percentage of observations that are contained within a distance of k standard deviations of the mean is at least (1 - 1/k²)100%.
Exact intervals for the normal distribution: The Bienayme-Chebyshev rule is conservative because it applies to any distribution. For a normal distribution, a higher percentage of the observations are contained within k standard deviations of the mean, as shown in the following table.
Percentage of observations contained within k standard deviations of the mean

k, No. of Standard Deviations   Empirical Rule   Bienayme-Chebyshev   Normal Distribution
1                               67%              N/A                  68.26%
2                               90-95%           at least 75%         95.44%
3                               99-100%          at least 88.89%      99.73%
4                               N/A              at least 93.75%      99.99%
7.2.6.2. Percentiles
Definitions of order statistics and ranks: For a series of measurements Y1, ..., YN, denote the data ordered in increasing order of magnitude by Y[1], ..., Y[N]. These ordered data are called order statistics. If Y[j] is the order statistic that corresponds to the measurement Yi, then the rank for Yi is j; i.e., ri = j.
Definition of percentiles: Order statistics provide a way of estimating proportions of the data that should fall above and below a given value, called a percentile. The pth percentile is a value, Y(p), such that at most (100p)% of the measurements are less than this value and at most 100(1 - p)% are greater. The 50th percentile is called the median.
Percentiles split a set of ordered data into hundredths. (Deciles split ordered data into tenths.) For example, 70% of the data should fall below the 70th percentile.
Example and interpretation: For the purpose of illustration, twelve measurements from a gage study are shown below. The measurements are resistivities of silicon wafers measured in ohm.cm.

 i    Yi        Y[i]      Rank of Yi
 1    95.1772   95.0610    9
 2    95.1567   95.0925    6
 3    95.1937   95.1065   10
 4    95.1959   95.1195   11
 5    95.1442   95.1442    5
 6    95.0610   95.1567    1
 7    95.1591   95.1591    7
 8    95.1195   95.1682    4
 9    95.1065   95.1772    3
10    95.0925   95.1937    2
11    95.1990   95.1959   12
12    95.1682   95.1990    8
Note that there are other ways of calculating percentiles in common use: Some software packages (EXCEL, for example) set 1 + p(N-1) equal to k + d, then proceed as above. The two methods give fairly similar results.
A third way of calculating percentiles (given in some elementary textbooks) starts by calculating pN. If that is not an integer, round up to the next highest integer k and use Y[k] as the percentile estimate. If pN is an integer k, use .5(Y[k] + Y[k+1]). A sketch of this third rule appears below.
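A Python sketch of the textbook rule just described (the function is illustrative, not from the handbook):

    import math

    def percentile(data, p):
        y = sorted(data)
        k = p * len(y)
        if k != int(k):
            return y[math.ceil(k) - 1]        # round up to the next order statistic
        k = int(k)
        return 0.5 * (y[k - 1] + y[k])        # pN integer: average Y[k] and Y[k+1]

    resistivities = [95.1772, 95.1567, 95.1937, 95.1959, 95.1442, 95.0610,
                     95.1591, 95.1195, 95.1065, 95.0925, 95.1990, 95.1682]
    print(percentile(resistivities, 0.90))    # 0.9*12 = 10.8 -> Y[11] = 95.1959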
Difference between confidence and tolerance intervals: Confidence limits are limits within which we expect a given population parameter, such as the mean, to lie. Statistical tolerance limits are limits within which we expect a stated proportion of the population to lie. Confidence intervals shrink towards zero as the sample size increases. Tolerance intervals tend towards a fixed value as the sample size increases.
Three types of tolerance intervals: Three types of questions can be addressed by tolerance intervals. Question (1) leads to a two-sided interval; questions (2) and (3) lead to one-sided intervals.
intervals 1. What interval will contain p percent of the population measurements?
2. What interval guarantees that p percent of population measurements will
not fall below a lower limit?
3. What interval guarantees that p percent of population measurements will
not exceed an upper limit?
Tolerance intervals for measurements from a normal distribution: For the questions above, the corresponding tolerance intervals are defined by lower (L) and upper (U) tolerance limits which are computed from a series of measurements Y1, ..., YN:
1. L = Ȳ - k2·s, U = Ȳ + k2·s (two-sided interval)
2. L = Ȳ - k1·s (lower limit only)
3. U = Ȳ + k1·s (upper limit only)
where the k factors are determined so that the intervals cover at least a proportion p of the population with confidence γ.
Calculation of k factor for a two-sided tolerance limit for a normal distribution: If the data are from a normally distributed population, an approximate value for the k2 factor as a function of p and γ for a two-sided tolerance interval (Howe, 1969) is

k2 = z((1+p)/2) · sqrt( ν(1 + 1/N) / χ²(1-γ; ν) )

where χ²(1-γ; ν) is the critical value of the chi-square distribution with ν = N - 1 degrees of freedom that is exceeded with probability γ.
Example of calculation: For example, suppose that we take a sample of N = 43 silicon wafers from a lot and measure their thicknesses in order to find tolerance limits within which a proportion p = 0.90 of the wafers in the lot fall with probability γ = 0.99.
Use of tables in calculating two-sided tolerance intervals: Values of the k2 factor as a function of p and γ are tabulated in some textbooks, such as Dixon and Massey (1969). To use the tables in this handbook, follow the steps outlined below:
1. Calculate α = (1 - p)/2 = 0.05.
2. Go to the table of upper critical values of the normal distribution and read z(0.95) = 1.645.
3. Obtain the chi-square critical value χ²(1-γ; ν) = χ²(0.01; 42) = 23.65.
4. Calculate k2 = 1.645 · sqrt(42(1 + 1/43)/23.65) = 2.217.
The tolerance limits are then computed from the sample mean, Ȳ, and standard deviation, s, according to case (1).
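The same factor can be computed directly instead of from tables; a Python sketch (scipy assumed, not the handbook's tool) of Howe's approximation for this example:

    import math
    from scipy.stats import norm, chi2

    N, p, gamma = 43, 0.90, 0.99
    nu = N - 1
    z = norm.ppf(1 - (1 - p) / 2)        # z_0.95 = 1.645
    chi2_crit = chi2.ppf(1 - gamma, nu)  # lower 1% point, about 23.65

    k2 = z * math.sqrt(nu * (1 + 1 / N) / chi2_crit)
    print(round(k2, 3))                  # about 2.217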
Important note: The notation for the critical value of the chi-square distribution can be confusing. Values as tabulated are, in a sense, already squared; whereas the critical value for the normal distribution must be squared in the formula above.
Another note: The notation for tail probabilities in Dataplot is the converse of the notation used in this handbook. Therefore, in the example above it is necessary to specify the critical value for the chi-square distribution, say, as chsppf(1-.99, 42) and similarly for the critical value for the normal distribution.
Direct calculation of tolerance intervals using Dataplot: Dataplot also has an option for calculating tolerance intervals directly from the data. The commands for producing tolerance intervals from twenty-five measurements of resistivity from a quality control study at a confidence level of 99% are:

read 100ohm.dat cr wafer mo day h min op hum ...
probe temp y sw df
tolerance y
Automatic output is given for several levels of coverage; the tolerance interval for 90% coverage is the 90.0 row of the output below:
NUMBER OF OBSERVATIONS = 25
SAMPLE MEAN = 97.069832
SAMPLE STANDARD DEVIATION = 0.26798090E-01
CONFIDENCE = 99.%
COVERAGE (%) LOWER LIMIT UPPER LIMIT
50.0 97.04242 97.09724
75.0 97.02308 97.11658
90.0 97.00299 97.13667
95.0 96.99020 97.14946
99.0 96.96522 97.17445
99.9 96.93625 97.20341
For a one-sided tolerance limit, an approximate value for the factor is

k1 = (zp + sqrt(zp² - a·b)) / a, with a = 1 - zγ²/(2ν) and b = zp² - zγ²/N

where zp is the critical value from the normal distribution that is exceeded with probability 1-p and zγ is the critical value from the normal distribution that is exceeded with probability 1-γ.
Dataplot commands for calculating the k factor for a one-sided tolerance interval: For the example above, it may also be of interest to guarantee with 0.99 probability (or 99% confidence) that 90% of the wafers have thicknesses less than an upper tolerance limit. This problem falls under case (3), and the Dataplot commands for calculating the factor for the one-sided tolerance interval are:

let n = 43
let p = .90
let g = .99
let nu = n-1
let zp = norppf(p)
let zg=norppf(g)
let a = 1 - ((zg**2)/(2*nu))
let b = zp**2 - (zg**2)/n
let k1 = (zp + (zp**2 - a*b)**.5)/a
and the output is: K1 = 1.8752
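A Python transcription (scipy assumed) of the Dataplot macro above gives the same factor:

    import math
    from scipy.stats import norm

    n, p, g = 43, 0.90, 0.99
    nu = n - 1
    zp, zg = norm.ppf(p), norm.ppf(g)

    a = 1 - zg ** 2 / (2 * nu)
    b = zp ** 2 - zg ** 2 / n
    k1 = (zp + math.sqrt(zp ** 2 - a * b)) / a
    print(round(k1, 4))                  # about 1.8752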
Iterative method: Unfortunately, r can only be found by iteration from the integral above which defines limits within which p percent of the normal distribution lies. An EXCEL calculation is illustrated below for the same problem as on the previous page, except where N = 220 measurements are made of thickness. We wish to find tolerance intervals that contain a proportion p = 0.90 of the wafers with probability γ = 0.99.
The EXCEL commands for this calculation are shown below. The calculations are approximate and depend on the starting value for r, which is taken to be zero in this example. Calculations should be correct to three significant digits.
Iteration step in EXCEL: Click on the green V (not shown here) or press the Enter key. Click on TOOLS and then on GOALSEEK. A drop down menu appears. Then,
● Enter C1 (if it is not already there) in the cell in the row labeled: "Set cell:"
● Enter 0.9 (which is p) in the cell at the row labeled: "To value:"
● Press Enter
Calculation You can also perform this calculation using the following Dataplot macro.
in Dataplot
. Initialize
let r = 0
let n = 220
let c1 = 1/sqrt(n)
. Compute R
let function f = norcdf(c1+r) - norcdf(c1-r) - 0.9
let z = roots f wrt r for r = -4 4
let r = z(1)
. Compute K2
let c2 = (n-1)
let k2 = r*sqrt(c2/chsppf(0.01,c2))
. Print results
print "R = ^r"
print "K2 = ^k2"
Dataplot generates the following output.
R = 1.644854
K2 = 1.849208
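For reference, a Python version (scipy assumed) of the same iteration:

    import math
    from scipy.stats import norm, chi2
    from scipy.optimize import brentq

    n, p, gamma = 220, 0.90, 0.99
    c1 = 1 / math.sqrt(n)

    # r solves Phi(c1 + r) - Phi(c1 - r) = p
    r = brentq(lambda x: norm.cdf(c1 + x) - norm.cdf(c1 - x) - p, 0.01, 4.0)
    k2 = r * math.sqrt((n - 1) / chi2.ppf(1 - gamma, n - 1))

    print(round(r, 6))    # 1.644854
    print(round(k2, 6))   # 1.849208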
Tolerance intervals based on largest and smallest observations: One obvious choice for a two-sided tolerance interval for an unknown distribution is the interval between the smallest and largest observations from a sample of Y1, ..., YN measurements. This choice does not allow us to choose the confidence and coverage levels that are desired, but it does permit calculation of combinations of confidence and coverage that match this choice.
Dataplot calculations for distribution-free tolerance intervals: The Dataplot commands for calculating confidence and coverage levels corresponding to a tolerance interval defined as the interval between the smallest and largest observations are given below. The commands that are invoked for twenty-five measurements of resistivity from a quality control study are the same as for producing tolerance intervals for a normal distribution; namely,
What is the optimal sample size? Another question of interest is, "How large should a sample be so that one can be assured with probability γ that the tolerance interval will contain at least a proportion p of the population?"
Approximation for N: A rather good approximation for the required sample size is given by

N ≈ (1/4) · χ²(1-γ; 4) · (1 + p)/(1 - p) + 1/2

where χ²(1-γ; 4) is the upper critical value of the chi-square distribution with 4 degrees of freedom that is exceeded with probability 1 - γ.
Example of the effect of p on the sample size: Suppose we want to know how many measurements to make in order to guarantee that the interval between the smallest and largest observations covers a proportion p of the population with probability γ = 0.95. From the table for the upper critical value of the chi-square distribution, look under the column labeled 0.05 in the row for 4 degrees of freedom. The value is found to be χ²(0.05; 4) = 9.488 and calculations are shown below for p equal to 0.90 and 0.99.
For p = 0.90: N ≈ (9.488/4)(1.90/0.10) + 0.5 = 45.6, so take N = 46.
For p = 0.99: N ≈ (9.488/4)(1.99/0.01) + 0.5 = 472.5, so take N = 473.
The goal is to determine if the two processes are similar: In order to answer this question, data on piston rings are collected for each process. For example, on a particular day, data on the diameters of ten piston rings from each process are measured over a one-hour time frame.
To determine if the two processes are similar, we are interested in
answering the following questions:
1. Do the two processes produce piston rings with the same
diameter?
2. Do the two processes have similar variability in the diameters of
the rings produced?
Unknown standard deviation: The second question assumes that one does not know the standard deviation of either process and therefore it must be estimated from the data. This is usually the case, and the tests in this section assume that the population standard deviations are unknown.
Assumption of a normal distribution: The statistical methodology used (i.e., the specific test to be used) to answer these two questions depends on the underlying distribution of the measurements. The tests in this section assume that the data are normally distributed.
Typical null hypotheses: The corresponding null hypotheses that test the true mean of the first process, μ1, against the true mean of the second process, μ2, are:
1. H0: μ1 = μ2
2. H0: μ1 ≤ μ2
3. H0: μ1 ≥ μ2
Note that as previously discussed, our choice of which null hypothesis to use is
typically made based on one of the following considerations:
1. When we are hoping to prove something new with the sample data, we
make that the alternative hypothesis, whenever possible.
2. When we want to continue to assume a reasonable or traditional
hypothesis still applies, unless very strong contradictory evidence is
present, we make that the null hypothesis, whenever possible.
Basic statistics from the two processes: The basic statistics for the test are the sample means

Ȳ1 = (1/N1) Σ y1i ;  Ȳ2 = (1/N2) Σ y2i

and the sample standard deviations s1 and s2, with degrees of freedom ν1 = N1 - 1 and ν2 = N2 - 1, respectively.
Form of the test statistic where the two processes have equivalent standard deviations: If the standard deviations from the two processes are equivalent, and this should be tested before this assumption is made, the test statistic is

t = (Ȳ1 - Ȳ2) / ( sp · sqrt(1/N1 + 1/N2) )

where the pooled standard deviation is estimated as

sp = sqrt( ((N1 - 1)s1² + (N2 - 1)s2²) / (N1 + N2 - 2) )

with ν = N1 + N2 - 2 degrees of freedom.
Form of the test statistic where the two processes do NOT have equivalent standard deviations: If it cannot be assumed that the standard deviations from the two processes are equivalent, the test statistic is

t = (Ȳ1 - Ȳ2) / sqrt( s1²/N1 + s2²/N2 )

The degrees of freedom are not known exactly but can be estimated using the Welch-Satterthwaite approximation

ν ≈ (s1²/N1 + s2²/N2)² / [ (s1²/N1)²/(N1 - 1) + (s2²/N2)²/(N2 - 1) ]
Test strategies: The strategy for testing the hypotheses under (1), (2) or (3) above is to calculate the appropriate t statistic from one of the formulas above, and then perform a test at significance level α, where α is chosen to be small, typically .01, .05 or .10. The hypothesis associated with each case enumerated above is rejected if:
1. |t| > t(α/2; ν)
2. t > t(α; ν)
3. t < -t(α; ν)
Explanation of critical values: The critical values from the t table depend on the significance level and the degrees of freedom in the standard deviation. For hypothesis (1), t(α/2; ν) is the upper α/2 critical value from the t table with ν degrees of freedom, and similarly for hypotheses (2) and (3).
Example of unequal number of data points: A new procedure (process 2) to assemble a device is introduced and tested for possible improvement in time of assembly. The question being addressed is whether the mean, μ2, of the new assembly process is smaller than the mean, μ1, for the old assembly process (process 1). We choose to test hypothesis (2) in the hope that we will reject this null hypothesis and thereby feel we have a strong degree of confidence that the new process is an improvement worth implementing. Data (in minutes required to assemble a device) for both the new and old processes are listed below along with their relevant statistics.
Device   Process 1 (old)   Process 2 (new)
  1           32                36
  2           37                31
  3           35                30
  4           28                31
  5           41                34
  6           44                36
  7           35                29
  8           31                32
  9           34                31
 10           38
 11           42

The relevant statistics are N1 = 11, Ȳ1 = 36.09, s1 = 4.908 and N2 = 9, Ȳ2 = 32.22, s2 = 2.539.
Decision process: For a one-sided test at the 5% significance level, go to the t table for the 5% significance level and look up the critical value for degrees of freedom ν = 16. The critical value is 1.746. Thus, hypothesis (2) is rejected because the test statistic (t = 2.269) is greater than 1.746, and we therefore conclude that process 2 has improved assembly time (smaller mean) over process 1.
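The same conclusion can be reached with standard software; a Python check (scipy assumed; not the handbook's own calculation):

    from scipy.stats import ttest_ind

    old = [32, 37, 35, 28, 41, 44, 35, 31, 34, 38, 42]   # process 1
    new = [36, 31, 30, 31, 34, 36, 29, 32, 31]           # process 2

    # Welch test of H0: mean(old) <= mean(new) vs Ha: mean(old) > mean(new)
    t, p = ttest_ind(old, new, equal_var=False, alternative="greater")
    print(round(t, 3), round(p, 3))   # t about 2.27, one-sided p about 0.02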
Given two random samples of N measurements each, (Y1, ..., YN) and (Z1, ..., ZN), from two populations, the data are said to be paired if the ith measurement on the first sample is naturally paired with the ith measurement on the second sample. For example, if N supposedly identical products are chosen from a production line, and each one, in turn, is tested with first one measuring device and then with a second measuring device, it is possible to decide whether the measuring devices are compatible; i.e., whether there is a difference between the two measurement systems. Similarly, if "before" and "after" measurements are made with the same device on N objects, it is possible to decide if there is a difference between "before" and "after"; for example, whether a cleaning process changes an important characteristic of an object. Each "before" measurement is paired with the corresponding "after" measurement, and the differences

di = Yi - Zi, i = 1, ..., N

are calculated.
Basic statistics for the test: The mean and standard deviation for the differences are calculated as

d̄ = (1/N) Σ di

and

sd = sqrt( Σ (di - d̄)² / (N - 1) )

Test statistic based on the t distribution: The paired-sample t-test is used to test for the difference of two means before and after a treatment. The test statistic is

t = d̄ / (sd / sqrt(N))

which follows a t distribution with ν = N - 1 degrees of freedom. The corresponding confidence intervals for the difference of two means are:
- Paired observations: d̄ ± t(α/2; ν) sd/sqrt(N) (where ν = N - 1)
- Unpaired observations, equivalent standard deviations: (Ȳ1 - Ȳ2) ± t(α/2; ν) sp sqrt(1/N1 + 1/N2) (where ν = N1 + N2 - 2)
- Unpaired observations, standard deviations not equivalent: (Ȳ1 - Ȳ2) ± t(α/2; ν) sqrt(s1²/N1 + s2²/N2) (where ν is estimated by the Welch-Satterthwaite approximation)
Interpretation of confidence interval: One interpretation of the confidence interval for means is that if zero is contained within the confidence interval, the two population means are equivalent.
Typical null hypotheses: The corresponding null hypotheses that test the true standard deviation of the first process, σ1, against the true standard deviation of the second process, σ2, are:
1. H0: σ1 = σ2
2. H0: σ1 ≤ σ2
3. H0: σ1 ≥ σ2
Basic statistics from the two processes: The basic statistics for the test are the sample variances s1² and s2², computed from N1 and N2 measurements, and the test statistic is their ratio

F = s1² / s2²
Test strategies: The strategy for testing the hypotheses under (1), (2) or (3) above is to calculate the F statistic from the formula above, and then perform a test at significance level α, where α is chosen to be small, typically .01, .05 or .10. The hypothesis associated with each case enumerated above is rejected if:
1. F > F(α/2; ν1, ν2) or F < F(1-α/2; ν1, ν2)
2. F > F(α; ν1, ν2)
3. F < F(1-α; ν1, ν2)
where ν1 = N1 - 1 and ν2 = N2 - 1.
Explanation of critical values: The critical values from the F table depend on the significance level and the degrees of freedom in the standard deviations from the two processes. For hypothesis (1):
● F(α/2; ν1, ν2) is the upper critical value from the F table with ν1 and ν2 degrees of freedom, and
● F(1-α/2; ν1, ν2) = 1 / F(α/2; ν2, ν1)
which means that only upper critical values are required for two-sided tests. However, note that the degrees of freedom are interchanged in the ratio. For example, for a two-sided test at significance level 0.05, go to the F table labeled "2.5% significance level".
● For F(α/2; ν1, ν2), look across the top of the table for ν1 and down the table for ν2.
● For F(α/2; ν2, ν1), reverse the order of the degrees of freedom; i.e., look across the top of the table for ν2 and down the table for ν1.
Critical values for cases (2) and (3) are defined similarly, except that the critical values for the one-sided tests are based on α rather than on α/2.
Two-sided confidence interval: The two-sided confidence interval for the ratio of the two unknown variances (squares of the standard deviations) is shown below.
Two-sided confidence interval with 100(1-α)% coverage for σ1²/σ2²:

(s1²/s2²) · (1/F(α/2; ν1, ν2)) ≤ σ1²/σ2² ≤ (s1²/s2²) · F(α/2; ν2, ν1)
The hypothesis of equal proportions can be tested using a z statistic: If the samples are reasonably large we can use the normal approximation to the binomial to develop a test similar to testing whether two normal means are equal.
Let sample 1 have x1 defects out of n1 and sample 2 have x2 defects out of n2. Calculate the proportion of defects for each sample and the z statistic below:

z = (p̂1 - p̂2) / sqrt( p̂(1 - p̂)(1/n1 + 1/n2) )

where p̂1 = x1/n1, p̂2 = x2/n2, and the pooled estimate is p̂ = (x1 + x2)/(n1 + n2).
Compare |z| to the normal z(α/2) table value for a two-sided test. For a one-sided test, assuming the alternative hypothesis is p1 > p2, compare z to z(α). If the alternative hypothesis is p1 < p2, compare z to -z(α).
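As an illustration, the sketch below (Python; not part of the handbook) applies this statistic to two of the diode lots from the example later in this section, 36 and 63 defectives out of 300 each:

    import math

    def two_proportion_z(x1, n1, x2, n2):
        p1, p2 = x1 / n1, x2 / n2
        p_pool = (x1 + x2) / (n1 + n2)    # pooled proportion
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
        return (p1 - p2) / se

    z = two_proportion_z(x1=36, n1=300, x2=63, n2=300)
    print(round(z, 2))   # about -2.97; |z| > 1.96, so these two
                         # proportions differ at the .05 level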
Example of a 2x2 contingency table: In other words, every subject in each group has one of two possible scores. These scores are represented by frequencies in a 2x2 contingency table. The following discussion, using a 2x2 contingency table, illustrates how the test operates.
We are working with two independent groups, such as experiments and controls, males and females, the Chicago Bulls and the New York Knicks, etc.

           -      +     Total
Group I    A      B     A+B
Group II   C      D     C+D
Total     A+C    B+D     N
Determine whether two groups differ in the proportion with which they fall into two classifications: Fisher's test determines whether the two groups differ in the proportion with which they fall into the two classifications. For the table above, the test would determine whether Group I and Group II differ significantly in the proportion of plusses and minuses attributed to them.
The method proceeds as follows: the exact probability of observing a particular set of frequencies in a 2 × 2 table, when the marginal totals are regarded as fixed, is given by the hypergeometric distribution

p = ( (A+B)! (C+D)! (A+C)! (B+D)! ) / ( N! A! B! C! D! )
But the test does not just look at the observed case. If needed, it also
computes the probability of more extreme outcomes, with the same
marginal totals. By "more extreme", we mean relative to the null
hypothesis of equal proportions.
Example of Fisher's test: This will become clear in the next illustrative example. Consider the following set of 2 x 2 contingency tables:

Observed Data        More extreme outcomes with same marginals
     (a)                  (b)                  (c)
  2   5 |  7           1   6 |  7           0   7 |  7
  3   2 |  5           4   1 |  5           5   0 |  5
  5   7 | 12           5   7 | 12           5   7 | 12
Table (a) shows the observed frequencies and tables (b) and (c) show the two more extreme distributions of frequencies that could occur with the same marginal totals 7, 5. Given the observed data in table (a), we wish to test the null hypothesis at, say, α = .05.
Applying the previous formula to tables (a), (b), and (c), we obtain pa = .26515, pb = .04419, and pc = .00126, respectively.
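These probabilities can be checked with scipy (an assumption of this note, not the handbook's tool); the one-sided Fisher p-value is the sum pa + pb + pc:

    from scipy.stats import hypergeom, fisher_exact

    # X = upper-left cell count; margins fixed: rows 7 and 5, first column 5
    rv = hypergeom(M=12, n=7, N=5)
    for k in (2, 1, 0):
        print(k, round(rv.pmf(k), 5))      # .26515, .04419, .00126

    _, p = fisher_exact([[2, 5], [3, 2]], alternative="less")
    print(round(p, 5))                      # 0.31061 = pa + pb + pc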
Tocher's Modification
Tocher's modification makes Fisher's test less conservative: Tocher (1950) showed that a slight modification of the Fisher test makes it a more useful test. Tocher starts by isolating the probability of all cases more extreme than the observed one. In this example that is

pb + pc = .04419 + .00126 = .04545
Now, if this probability is larger than α, we cannot reject H0. But if this probability is less than α, while the probability that we got from Fisher's test is greater than α (as is the case in our example), then Tocher advises computing the following ratio:

(α - the probability of all more extreme cases) / (the probability of the observed case) = (.05 - .04545) / .26515 ≈ .017
Definition of Type II censored data: Type II censored data occur when a life test is terminated exactly when a pre-specified number of failures have occurred. The remaining units have not yet failed. If n units were on test, and the pre-specified number of failures is r (where r is less than or equal to n), then the test ends at tr = the time of the r-th failure.
Test of equality of θ1 and θ2 and confidence interval for θ1/θ2: Letting

Ti = ti1 + ti2 + ... + ti,ri + (ni - ri) ti,(ri), i = 1, 2 (the total unit test time for sample i)

and

θ̂i = Ti / ri

then 2Ti/θi has a chi-square distribution with 2ri degrees of freedom, and

U = (θ̂1/θ1) / (θ̂2/θ2)

has an F distribution with (2r1, 2r2) degrees of freedom. A test of H0: θ1/θ2 = 1 against Ha: θ1/θ2 ≠ 1 at significance level α rejects H0 when θ̂1/θ̂2 falls outside the interval (1/F(α/2; 2r1, 2r2), F(α/2; 2r2, 2r1)), and

( (θ̂1/θ̂2) / F(α/2; 2r1, 2r2) , (θ̂1/θ̂2) · F(α/2; 2r2, 2r1) )

is a 100(1-α)% confidence interval for θ1/θ2, where F(α/2; ·, ·) denotes the upper α/2 critical value of the F distribution.
Two samples of size 10 from exponential distributions were put on life test. The first sample was censored after 7 failures and the second sample was censored after 5 failures. The times to failure were:
Sample 1: 125 189 210 356 468 550 610
Sample 2: 170 234 280 350 467
So r1 = 7, r2 = 5 and t1,(r1) = 610, t2,(r2) = 467. Then
T1 = (125 + 189 + 210 + 356 + 468 + 550 + 610) + 3(610) = 4338 and θ̂1 = 4338/7 = 619.7
T2 = (170 + 234 + 280 + 350 + 467) + 5(467) = 3836 and θ̂2 = 3836/5 = 767.2
so that θ̂1/θ̂2 = 0.808. With F(.025; 14, 10) = 3.550 and F(.025; 10, 14) = 3.147, the interval (0.808/3.550, 0.808 × 3.147), and (.228, 2.542) is a 95% confidence interval for the ratio of the unknown means. The value of 1 is within this range, which is another way of showing that we cannot reject the null hypothesis at the 5% significance level.
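A Python sketch (scipy assumed) that reproduces this interval:

    from scipy.stats import f

    fail1 = [125, 189, 210, 356, 468, 550, 610]
    fail2 = [170, 234, 280, 350, 467]
    n1 = n2 = 10
    r1, r2 = len(fail1), len(fail2)

    # total unit test time: failures plus censored survivors at the last failure
    T1 = sum(fail1) + (n1 - r1) * fail1[-1]
    T2 = sum(fail2) + (n2 - r2) * fail2[-1]
    ratio = (T1 / r1) / (T2 / r2)

    lower = ratio / f.ppf(0.975, 2 * r1, 2 * r2)
    upper = ratio * f.ppf(0.975, 2 * r2, 2 * r1)
    print(round(lower, 3), round(upper, 3))   # about (0.228, 2.542)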
Ua = n1·n2 + .5(n1)(n1 + 1) - Ta
or
Ub = n1·n2 + .5(n2)(n2 + 1) - Tb
where Ua + Ub = n1·n2.
Null hypothesis: The null hypothesis is that the populations have the same median. The alternative hypothesis is that the medians are NOT the same.
Test statistic: The test statistic, U, is the smaller of Ua and Ub. For sample sizes larger than 20, we can use the normal z as follows:
z = [ U - E(U) ] / σ
where
E(U) = n1·n2/2 and σ = sqrt( n1·n2(n1 + n2 + 1)/12 )
The critical value is the normal tabled z for α/2 for a two-tailed test, or z at the α level for a one-tail test.
For small samples, use tables, which are readily available in most textbooks on nonparametric statistics.
Example
An illustrative example of the U test: Two processing systems were used to clean wafers. The following data represent the (coded) particle counts. The null hypothesis is that there is no difference between the means of the particle counts; the alternative hypothesis is that there is a difference. The solution shows the typical kind of output software for this procedure would generate, based on the large sample approximation.
Group A Rank Group B Rank
.55 8 .49 5
.67 15.5 .68 17
.43 1 .59 9.5
.51 6 .72 19
.48 3.5 .67 15.5
.60 11 .75 20.5
.71 18 .65 13.5
.53 7 .77 22
.44 2 .62 12
.65 13.5 .48 3.5
.75 20.5 .59 9.5
U = 40
E(U) = 60.500000
The z-test statistic = 1.346133
The critical value = ±1.960395
Φ(1.346133) = 0.910870
Right Tail Area = 0.089130
Cannot reject the null hypothesis.
A two-sided confidence interval about U - E(U) is:
Prob {-9.3545 < DELTA < 50.3545 } = 0.9500
DELTA is the absolute difference between U and E(U).
The test statistic is given by: (DELTA / SIGMA).
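A Python check of this output (scipy assumed; scipy reports the U statistic for the first sample):

    from scipy.stats import mannwhitneyu

    a = [.55, .67, .43, .51, .48, .60, .71, .53, .44, .65, .75]
    b = [.49, .68, .59, .72, .67, .75, .65, .77, .62, .48, .59]

    # large-sample normal approximation without continuity correction,
    # matching the z = 1.346 shown above
    u, p = mannwhitneyu(a, b, use_continuity=False, alternative="two-sided",
                        method="asymptotic")
    print(u, round(p, 3))   # U for sample a is 40; two-sided p about 0.18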
Compute the sum of the ranks for each sample: The next step is to compute the sum of the ranks for each of the original samples. The KW test determines whether these sums of ranks are so different by sample that they are not likely to have all come from the same population.
Test statistic follows a χ² distribution: It can be shown that if the k samples come from the same population, that is, if the null hypothesis is true, then the test statistic, H, used in the KW procedure is distributed approximately as a chi-square statistic with df = k - 1, provided that the sample sizes of the k samples are not too small (say, ni > 4, for all i). H is defined as follows:

H = [ 12 / (N(N+1)) ] Σ_{i=1}^{k} Ri²/ni - 3(N + 1)
where
● k = number of samples (groups)
● ni = number of observations for the i-th sample or group
● N = total number of observations (sum of all the ni)
● Ri = sum of ranks for group i
Example
An illustrative example: The following data are from a comparison of four investment firms. The observations represent percentage of growth during a three-month period for recommended funds; the values shown are the ranks of the observations.

  A      B      C      D
 17     10      2     11
 19    4.5    4.5      9
 14      6      3     12
 15     13      7     16
         8      1     18
The rank sums are RA = 65, RB = 41.5, RC = 17.5, and RD = 66, so that

H = [12 / (19 × 20)] (65²/4 + 41.5²/5 + 17.5²/5 + 66²/5) - 3(20) = 13.678

From the chi-square table in Chapter 1, the critical value for α = .05 with df = k - 1 = 3 is 7.812. Since 13.678 > 7.812, we reject the null hypothesis.
Note that the rejection region for the KW procedure is one-sided,
since we only reject the null hypothesis when the H statistic is too
large.
The KW test is implemented in the Dataplot command KRUSKAL
WALLIS TEST Y X .
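Equivalently, in Python (scipy assumed; scipy applies a small correction for the tied ranks, so H differs slightly from the hand calculation):

    from scipy.stats import kruskal

    # ranks for firms A, B, C, D; the KW statistic depends only on ranks
    A = [17, 19, 14, 15]
    B = [10, 4.5, 6, 13, 8]
    C = [2, 4.5, 3, 7, 1]
    D = [11, 9, 12, 16, 18]

    h, p = kruskal(A, B, C, D)
    print(round(h, 2), round(p, 4))   # H about 13.69, p about 0.003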
Null hypothesis: Bartlett's test is a commonly used test for equal variances. Let's examine the null and alternative hypotheses:

H0: σ1² = σ2² = ... = σk²

against

Ha: σi² ≠ σj² for at least one pair (i, j).

Test statistic: Assume we have samples of size ni from the i-th population, i = 1, 2, ..., k, and the usual variance estimates from each sample, s1², s2², ..., sk². The test statistic is

M = (N - k) ln(sp²) - Σ (ni - 1) ln(si²)

where N = Σ ni and sp² = Σ (ni - 1)si² / (N - k) is the pooled variance.
Distribution of the test statistic: When none of the degrees of freedom is small, Bartlett showed that M is distributed approximately as χ² with k - 1 degrees of freedom. The chi-square approximation is generally acceptable if all the ni are at least 5.
Bartlett's test is not robust: This test is not robust; it is very sensitive to departures from normality.
An alternative description of Bartlett's test, which also describes how Dataplot implements the test, appears in Chapter 1.
The Levene test for equality of variances: Levene's test offers a more robust alternative to Bartlett's procedure. That means it will be less likely to reject a true hypothesis of equality of variances just because the distributions of the sampled populations are not normal. When non-normality is suspected, Levene's procedure is a better choice than Bartlett's.
Levene's test and its implementation in DATAPLOT were described in
Chapter 1. This description also includes an example where the test is
applied to the gear data. Levene's test does not reject the assumption of
equality of batch variances for these data. This differs from the
conclusion drawn from Bartlett's test and is a better answer if, indeed,
the batch population distributions are non-normal.
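Both tests are a single call in scipy (an assumption of this note; the data below are synthetic for illustration, not the gear data):

    import numpy as np
    from scipy.stats import bartlett, levene

    rng = np.random.default_rng(7)
    g1 = rng.normal(10, 1.0, 30)
    g2 = rng.normal(10, 1.2, 30)
    g3 = rng.normal(10, 1.1, 30)

    print(bartlett(g1, g2, g3))   # sensitive to departures from normality
    print(levene(g1, g2, g3))     # robust alternative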
The ANOVA procedure is one of the most powerful statistical techniques: ANOVA is a general technique that can be used to test the hypothesis that the means among two or more groups are equal, under the assumption that the sampled populations are normally distributed.
A couple of questions come immediately to mind: what means? and why analyze variances in order to derive conclusions about the means? Both questions will be answered as we delve further into the subject.
The 1-way ANOVA: In the experiment above, there is only one factor, temperature, and the analysis of variance that we will be using to analyze the effect of temperature is called a one-way or one-factor ANOVA.
The 2-way or 3-way ANOVA: We could have opted to also study the effect of positions in the oven. In this case there would be two factors, temperature and oven position. Here we speak of a two-way or two-factor ANOVA. Furthermore, we may be interested in a third factor, the effect of time. Now we deal with a three-way or three-factor ANOVA. In each of these ANOVAs we test a variety of hypotheses of equality of means (or average responses when the factors are varied).
Hypotheses that can be tested in an ANOVA: First consider the one-way ANOVA. The null hypothesis is: there is no difference in the population means of the different levels of factor A (the only factor).
The alternative hypothesis is: the means are not the same.
For the 2-way ANOVA, the possible null hypotheses are:
1. There is no difference in the means of factor A
2. There is no difference in means of factor B
3. There is no interaction between factors A and B
The alternative hypothesis for cases 1 and 2 is: the means are not equal.
The alternative hypothesis for case 3 is: there is an interaction between
A and B.
For the 3-way ANOVA: The main effects are factors A, B and C. The
2-factor interactions are: AB, AC, and BC. There is also a three-factor
interaction: ABC.
For each of the seven cases the null hypothesis is the same: there is no
difference in means, and the alternative hypothesis is the means are not
equal.
The n-way ANOVA: In general, the number of main effects and interactions can be found by the following expression:

2^n = C(n,0) + C(n,1) + C(n,2) + ... + C(n,n)

The first term is for the overall mean, and is always 1. The second term is for the number of main effects. The third term is for the number of 2-factor interactions, and so on. The last term is for the n-factor interaction and is always 1.
In what follows, we will discuss only the 1-way and 2-way ANOVA.
Sums of squares and degrees of freedom: The variance of the N observations is

s² = Σ (yi - ȳ)² / (N - 1)

The numerator part is called the sum of squares of deviations from the mean, and the denominator is called the degrees of freedom.
The variance, after some algebra, can be rewritten as:

s² = [ Σ yi² - (Σ yi)²/N ] / (N - 1)

The first term in the numerator is called the "raw sum of squares" and the second term is called the "correction term for the mean". Another name for the numerator is the "corrected sum of squares", and this is usually abbreviated by Total SS or SS(Total).
The total SS can be partitioned into a sum of squares for treatments and a sum of squares for error, with the treatment sum of squares given by

SST = Σ_{i=1}^{k} ni (ȳi. - ȳ..)²

where k is the number of treatments and the bar over the y.. denotes the "grand" or "overall" mean. Each ni is the number of observations for treatment i. The total number of observations is N (the sum of the ni).
Fixed effects model: The errors εij are assumed to be normally and independently (NID) distributed, with mean zero and variance σ². μ is always a fixed parameter, and τ1, τ2, ..., τk are considered to be fixed parameters if the levels of the treatment are fixed, and not a random sample from a population of possible levels. It is also assumed that μ is chosen so that

Σ τi = 0, i = 1, ..., k

holds.
Random effects model: If the k levels of treatment are chosen at random, the model equation remains the same. However, now the τi's are random variables assumed to be NID(0, στ²). This is the random effects model.
Whether the levels are fixed or random depends on how these levels are chosen in a given experiment.
Ratio of MST and MSE: When the null hypothesis of equal means is true, the two mean squares estimate the same quantity (error variance), and should be of approximately equal magnitude. In other words, their ratio should be close to 1. If the null hypothesis is false, MST should be larger than MSE.
Divide sum of squares by degrees of freedom to obtain mean squares: The mean squares are formed by dividing the sum of squares by the associated degrees of freedom.
Let N = Σ ni. Then the degrees of freedom for treatment is DFT = k - 1, and the degrees of freedom for error is DFE = N - k.
The corresponding mean squares are:
MST = SST / DFT
MSE = SSE / DFE
The F-test: The test statistic used in testing the equality of treatment means is F = MST / MSE.
The critical value is the tabular value of the F distribution, based on the chosen α level and the degrees of freedom DFT and DFE.
The calculations are displayed in an ANOVA table, as follows:

ANOVA table
Source              SS     DF      MS             F
Treatments          SST    k - 1   SST/(k - 1)    MST/MSE
Error               SSE    N - k   SSE/(N - k)
Total (corrected)   SS     N - 1
The word "source" stands for source of variation. Some authors prefer
to use "between" and "within" instead of "treatments" and "error",
respectively.
A numerical example: The data below resulted from measuring the difference in resistance resulting from subjecting identical resistors to three different temperatures for a period of 24 hours. The sample size of each group was 5. In the language of Design of Experiments, we have an experiment in which each of three treatments was replicated 5 times.

Example ANOVA table:
Source SS DF MS F
Interpretation of the ANOVA table: The test statistic is the F value of 9.59. Using an α of .05, we have F(.05; 2, 12) = 3.89 (see the F distribution table in Chapter 1). Since the test statistic is much larger than the critical value, we reject the null hypothesis of equal population means and conclude that there is a (statistically) significant difference among the population means. The p-value for 9.59 is .00325, so the test statistic is significant at that level.
Techniques for further analysis: The populations here are resistor readings while operating under the three different temperatures. What we do not know at this point is whether the three means are all different or which of the three means is different from the other two, and by how much.
There are several techniques we might use to further analyze the
differences. These are:
● constructing confidence intervals around the difference of two
means,
● estimating combinations of factor levels with confidence bounds
● multiple comparisons of combinations of factor levels tested
simultaneously.
Formula for the confidence interval: The formula for a (1 - α)100% confidence interval for the difference between two treatment means is:

(ȳi. - ȳj.) ± t(α/2; N-k) · sqrt( σ̂² (1/ni + 1/nj) )

where σ̂² = MSE.
Computation of the confidence interval for μ3 - μ1: For the example, we have the following quantities for the formula:
● ȳ3. = 8.56
● ȳ1. = 5.34
● sqrt( MSE(1/5 + 1/5) ) = 0.763
● t(.025; 16) = 2.120
Substituting these values yields (8.56 - 5.34) ± 2.120(0.763) or 3.22 ± 1.616.
That is, the confidence interval is from 1.604 to 4.836.
Example 1
Example for a 4-level treatment (or 4 different treatments): The data in the accompanying table resulted from an experiment run in a completely randomized design in which each of four treatments was replicated five times.

                                          Total    Mean
Group 1      6.9   5.4   5.8   4.6  4.0   26.70    5.34
Group 2      8.3   6.8   7.8   9.2  6.5   38.60    7.72
Group 3      8.0  10.5   8.1   6.9  9.3   42.80    8.56
Group 4      5.8   3.8   6.1   5.6  6.2   27.50    5.50
All Groups                               135.60    6.78
1-Way ANOVA table layout: This experiment can be illustrated by the table layout for this 1-way ANOVA experiment shown below:

Level i   Sample j: 1    2    ...  5     Sum    Mean    N
1              Y11   Y12  ...  Y15       Y1.    ȳ1.    n1
2              Y21   Y22  ...  Y25       Y2.    ȳ2.    n2
3              Y31   Y32  ...  Y35       Y3.    ȳ3.    n3
4              Y41   Y42  ...  Y45       Y4.    ȳ4.    n4
All                                      Y..    ȳ..    nt
The estimate for the mean of group 1 is 5.34, and the sample size is n1
= 5.
Computing the confidence interval: Since the confidence interval is two-sided, the entry α/2 value for the t-table is .5(1 - .95) = .025, and the associated degrees of freedom is N - 4, or 20 - 4 = 16.
From the t table in Chapter 1, we obtain t(.025; 16) = 2.120.
Definition of contrasts and orthogonal contrasts: Definitions
A contrast is a linear combination of 2 or more factor level means with coefficients that sum to zero.
Two contrasts are orthogonal if the sum of the products of corresponding coefficients (i.e., coefficients for the same means) adds to zero.
Formally, the definition of a contrast is expressed below, using the notation μi for the i-th treatment mean:
C = c1·μ1 + c2·μ2 + ... + cj·μj + ... + ck·μk
where
c1 + c2 + ... + cj + ... + ck = Σ cj = 0
Simple contrasts include the case of the difference between two factor means, such as μ1 - μ2. If one wishes to compare treatments 1 and 2 with treatment 3, one way of expressing this is by: μ1 + μ2 - 2μ3. Note that
μ1 - μ2 has coefficients +1, -1
μ1 + μ2 - 2μ3 has coefficients +1, +1, -2.
These coefficients sum to zero.
An example of three orthogonal contrasts for four factor level means:

      μ1   μ2   μ3   μ4
c1    +1    0    0   -1
c2     0   +1   -1    0
c3    +1   -1   -1   +1
Estimation of contrasts: As might be expected, contrasts are estimated by taking the same linear combination of treatment mean estimators. In other words:
The estimator of C = Σ cj μj is Ĉ = Σ cj ȳj
and the estimated variance of Ĉ is
s²(Ĉ) = σ̂² Σ cj²/nj
Coefficients do not have to sum to zero for linear combinations: A linear combination is given by the definition

L = Σ cj μj

with no restrictions on the coefficients cj. This resembles a contrast, but the coefficients do not have to sum to zero.
The model for the two-way ANOVA is

yijk = μ + τi + βj + (τβ)ij + εijk

where μ is the overall mean response, τi is the effect due to the i-th level of factor A, βj is the effect due to the j-th level of factor B, and (τβ)ij is the effect due to any interaction between the i-th level of A and the j-th level of B.
Fixed factors and fixed effects models: At this point, consider the levels of factor A and of factor B chosen for the experiment to be the only levels of interest to the experimenter, such as predetermined levels for temperature settings or the length of time for a process step. The factors A and B are said to be fixed factors and the model is a fixed-effects model. Random factors will be discussed later.
The ANOVA table can be used to test hypotheses about the effects and interactions: The various hypotheses that can be tested using this ANOVA table concern whether the different levels of Factor A, or Factor B, really make a difference in the response, and whether the AB interaction is significant (see previous discussion of ANOVA hypotheses).
The balanced 2-way factorial layout: Factor A has 1, 2, ..., a levels. Factor B has 1, 2, ..., b levels. There are ab treatment combinations (or cells) in a complete factorial layout. Assume that each treatment cell has r independent observations (known as replications). When each cell has the same number of replications, the design is a balanced factorial. In this case, the abr data points {yijk} can be shown pictorially as follows:

                                 Factor B
            1                   2                 ...   b
Factor  1   y111, y112, ..., y11r   y121, y122, ..., y12r   ...   y1b1, y1b2, ..., y1br
A       2   y211, y212, ..., y21r   y221, y222, ..., y22r   ...   y2b1, y2b2, ..., y2br
        .   ...                 ...                       ...
        a   ya11, ya12, ..., ya1r   ya21, ya22, ..., ya2r   ...   yab1, yab2, ..., yabr
How to obtain sums of squares for the balanced factorial layout: Next, we will calculate the sums of squares needed for the ANOVA table.
● Let Ai be the sum of all observations of level i of factor A, i = 1, ..., a. The Ai are the row sums.
● Let Bj be the sum of all observations of level j of factor B, j = 1, ..., b. The Bj are the column sums.
● Let (AB)ij be the sum of all observations of level i of A and level j of B. These are the cell sums.
● Let r be the number of replicates in the experiment; that is, the number of times each factorial treatment combination appears in the experiment.
Then the total number of observations for each level of factor A is rb, the total number of observations for each level of factor B is ra, and the total number of observations for each interaction is r.
These expressions are used to calculate the ANOVA table entries for the (fixed effects) 2-way ANOVA.
Materials (B)
LABS (A) 1 2 3
4.1 3.1 3.5
1 3.9 2.8 3.2
4.3 3.3 3.6
2.7 1.9 2.7
2 3.1 2.2 2.3
2.6 2.3 2.5
Row and column sums: The preliminary part of the analysis yields a table of row and column sums.

            Material (B)
Lab (A)       1      2      3    Total
1           12.3    9.2   10.3    31.8
2            8.4    6.4    7.5    22.3
Total       20.7   15.6   17.8    54.1

The ANOVA table for these data has the form:
Source SS df MS F p-value
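A Python sketch (numpy/scipy assumed; not the handbook's own code) that fills in the ANOVA table entries from the formulas above:

    import numpy as np
    from scipy.stats import f as f_dist

    # y[i, j, k]: lab i, material j, replicate k
    y = np.array([[[4.1, 3.9, 4.3], [3.1, 2.8, 3.3], [3.5, 3.2, 3.6]],
                  [[2.7, 3.1, 2.6], [1.9, 2.2, 2.3], [2.7, 2.3, 2.5]]])
    a, b, r = y.shape
    CM = y.sum() ** 2 / (a * b * r)                        # correction for the mean

    ss_total = (y ** 2).sum() - CM
    ss_a = (y.sum(axis=(1, 2)) ** 2).sum() / (b * r) - CM  # labs
    ss_b = (y.sum(axis=(0, 2)) ** 2).sum() / (a * r) - CM  # materials
    ss_cells = (y.sum(axis=2) ** 2).sum() / r - CM
    ss_ab = ss_cells - ss_a - ss_b                         # interaction
    ss_e = ss_total - ss_cells                             # error
    df_e = a * b * (r - 1)

    for name, ss, df in [("A (labs)", ss_a, a - 1),
                         ("B (materials)", ss_b, b - 1),
                         ("AB", ss_ab, (a - 1) * (b - 1))]:
        F = (ss / df) / (ss_e / df_e)
        print(name, round(ss, 4), df, round(F, 2),
              round(f_dist.sf(F, df, df_e), 4))
    print("Error", round(ss_e, 4), df_e)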
A fixed level of a factor or variable means that the levels in the experiment are the only ones we are interested in: In the previous example, the levels of the factor temperature were considered as fixed; that is, the three temperatures were the only ones that we were interested in (this may sound somewhat unlikely, but let us accept it without opposition). The model employed for fixed levels is called a fixed model. When the levels of a factor are random, such as operators, days, lots or batches, where the levels in the experiment might have been chosen at random from a large number of possible levels, the model is called a random model, and inferences are to be extended to all levels of the population.
Data for the example: A company supplies a customer with a large number of batches of raw materials. The customer makes three sample determinations from each of 5 randomly selected batches to control the quality of the incoming material. The model is

yij = μ + τi + εij

and the k levels (e.g., the batches) are chosen at random from a population with variance στ². The data are shown below.

Batch
  1    2    3    4    5
 74   68   75   72   79
 76   71   77   74   81
 75   72   77   73   79
ANOVA table for example: A 1-way ANOVA is performed on the data with the following results:

ANOVA
Source                 SS       df    MS       EMS
Treatment (batches)   147.74     4   36.935    σ² + 3στ²
Error                  17.99    10    1.799    σ²
Interpretation of the ANOVA table: The computations that produce the SS are the same for both the fixed and the random effects model. For the random model, however, the treatment sum of squares, SST, is an estimate of {σ² + 3στ²}. This is shown in the EMS (Expected Mean Squares) column of the ANOVA table.
The test statistic from the ANOVA table is F = 36.94 / 1.80 = 20.5.
If we had chosen an α value of .01, then the F value from the table in Chapter 1 for a df of 4 in the numerator and 10 in the denominator is 5.99.
Method of moments: Since the test statistic is larger than the critical value, we reject the hypothesis of equal means. Since these batches were chosen via a random selection process, it may be of interest to find out how much of the variance in the experiment might be attributed to batch differences and how much to random error. In order to answer these questions, we can use the EMS column. The estimate of σ² is 1.80 and the computed treatment mean square of 36.94 is an estimate of σ² + 3στ². Setting the MS values equal to the EMS values (this is called the Method of Moments), we obtain

σ̂² = 1.80 and σ̂τ² = (36.94 - 1.80)/3 = 11.71
Contingency Table approach: When items are classified according to two or more criteria, it is often of interest to decide whether these criteria act independently of one another.
For example, suppose we wish to classify defects found in wafers produced in a
manufacturing plant, first according to the type of defect and, second, according to the
production shift during which the wafers were produced. If the proportions of the various
types of defects are constant from shift to shift, then classification by defects is
independent of the classification by production shift. On the other hand, if the
proportions of the various defects vary from shift to shift, then the classification by
defects depends upon or is contingent upon the shift classification and the classifications
are dependent.
In the process of investigating whether one method of classification is contingent upon
another, it is customary to display the data by using a cross classification in an array
consisting of r rows and c columns called a contingency table. A contingency table
consists of r x c cells representing the r x c possible outcomes in the classification
process. Let us construct an industrial case:
Industrial example: A total of 309 wafer defects were recorded and the defects were classified as being one of four types, A, B, C, or D. At the same time each wafer was identified according to the production shift in which it was manufactured, 1, 2, or 3.
Column probabilities: Let pA be the probability that a defect will be of type A. Likewise, define pB, pC, and pD as the probabilities of observing the other three types of defects. These probabilities, which are called the column probabilities, will satisfy the requirement
pA + pB + pC + pD = 1
Row probabilities: By the same token, let pi (i = 1, 2, or 3) be the row probability that a defect will have occurred during shift i, where
p1 + p2 + p3 = 1
Multiplicative Law of Probability: Then if the two classifications are independent of each other, a cell probability will equal the product of its respective row and column probabilities in accordance with the Multiplicative Law of Probability.
Example of obtaining column and row probabilities: For example, the probability that a particular defect will occur in shift 1 and is of type A is (p1)(pA). While the numerical values of the cell probabilities are unspecified, the null hypothesis states that each cell probability will equal the product of its respective row and column probabilities. This condition implies independence of the two classifications. The alternative hypothesis is that this equality does not hold for at least one cell.
In other words, we state the null hypothesis as H0: the two classifications are independent, while the alternative hypothesis is Ha: the classifications are dependent.
To obtain the observed column probability, divide the column total by the grand total, n. Denoting the total of column j as cj, we get

p̂j = cj / n, j = A, B, C, D

Similarly, the row probabilities p1, p2, and p3 are estimated by dividing the row totals r1, r2, and r3 by the grand total n, respectively:

p̂i = ri / n, i = 1, 2, 3
Expected cell frequencies: Denote the observed frequency of the cell in row i and column j of the contingency table by nij. Then we have, for the estimated expected cell frequency when H0 is true,

êij = (ri · cj) / n

In other words, when the row and column classifications are independent, the estimated expected value of the observed cell frequency nij in an r x c contingency table is equal to its respective row and column totals divided by the total frequency.
The estimated cell frequencies are shown in parentheses in the contingency table above.
Test statistic: From here we use the expected and observed frequencies shown in the table to calculate the value of the test statistic

χ² = Σi Σj (nij - êij)² / êij
df = (r-1)(c-1): The next step is to find the appropriate number of degrees of freedom associated with the test statistic. Leaving out the details of the derivation, we state the result:
The number of degrees of freedom associated with a contingency table consisting of r rows and c columns is (r-1)(c-1).
So for our example we have (3-1)(4-1) = 6 d.f.
Testing the null hypothesis: In order to test the null hypothesis, we compare the test statistic with the critical value of χ² at a selected value of α. Let us use α = .05. Then the critical value is χ²(.05; 6), which is 12.5916 (see the chi-square table in Chapter 1). Since the test statistic of 19.18 exceeds the critical value, we reject the null hypothesis and conclude that there is significant evidence that the proportions of the different defect types vary from shift to shift. In this case, the p-value of the test statistic is .00387.
Testing for homogeneity of proportions using the chi-square distribution via contingency tables: When we have samples from n populations (i.e., lots, vendors, production runs, etc.), we can test whether there are significant differences in the proportion defectives for these populations using a contingency table approach. The contingency table we construct has two rows and n columns.
To test the null hypothesis of no difference in the proportions among the n populations
H0: p1 = p2 = ... = pn
against the alternative that not all n population proportions are equal
H1: Not all pi are equal (i = 1, 2, ..., n)
The critical value: The critical value is obtained from the χ² distribution table with degrees of freedom (2-1)(n-1) = n-1, at a given level of significance.
An illustrative example
Data for the example: Diodes used on a printed circuit board are produced in lots of size 4000. To study the homogeneity of lots with respect to a demanding specification, we take random samples of size 300 from 5 consecutive lots and test the diodes. The results are:
Lot
Results 1 2 3 4 5 Totals
Nonconforming 36 46 42 63 38 225
Conforming 264 254 258 237 262 1275
Totals 300 300 300 300 300 1500
Computation of the overall proportion of nonconforming units: Assuming the null hypothesis is true, we can estimate the single overall proportion of nonconforming diodes by pooling the results of all the samples as

p̂ = 225 / 1500 = 0.15
Table of expected frequencies

                        Lot
Results           1     2     3     4     5   Totals
Nonconforming    45    45    45    45    45     225
Conforming      255   255   255   255   255    1275
Totals          300   300   300   300   300    1500
Table for computing the test statistic: We use the observed and expected values from the tables above to compute the χ² test statistic. The calculations are presented below:

  fo     fc    (fo - fc)   (fo - fc)²   (fo - fc)²/fc
  36     45       -9           81           1.800
  46     45        1            1           0.022
  42     45       -3            9           0.200
  63     45       18          324           7.200
  38     45       -7           49           1.089
 264    255        9           81           0.318
 254    255       -1            1           0.004
 258    255        3            9           0.035
 237    255      -18          324           1.271
 262    255        7           49           0.192
                                   χ² =    12.131

With (2-1)(5-1) = 4 degrees of freedom and α = .05, the critical value is χ²(.05; 4) = 9.488. Since the test statistic of 12.131 exceeds 9.488, we reject the null hypothesis and conclude that the lots are not homogeneous.
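The whole test is one call in Python (scipy assumed, not the handbook's tool):

    import numpy as np
    from scipy.stats import chi2_contingency

    observed = np.array([[36, 46, 42, 63, 38],        # nonconforming
                         [264, 254, 258, 237, 262]])  # conforming

    stat, p, dof, expected = chi2_contingency(observed)
    print(round(stat, 3), dof, round(p, 4))   # 12.131, 4, p about 0.0164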
Typical questions: Questions concerning the reason for the rejection of the null hypothesis arise in the form of:
● "Which mean(s) or proportion(s) differ from a standard or from each other?"
● "Does the mean of treatment 1 differ from that of treatment 2?"
Multiple Comparison test procedures are needed: One popular way to investigate the cause of rejection of the null hypothesis is a Multiple Comparison Procedure. These are methods which examine or compare more than one pair of means or proportions at the same time.
Note: Doing pairwise comparison procedures over and over again for all possible pairs will not, in general, work. This is because the overall significance level is not the α that was specified for a single pair comparison.
ANOVA F test is a preliminary test: The ANOVA uses the F test to determine whether there exists a significant difference among treatment means or interactions. In this sense it is a preliminary test that informs us if we should continue the investigation of the data at hand.
If the null hypothesis (no difference among treatments or interactions)
is accepted, there is an implication that no relation exists between the
factor levels and the response. There is not much we can learn, and we
are finished with the analysis.
When the F test rejects the null hypothesis, we usually want to
undertake a thorough analysis of the nature of the factor-level effects.
Procedures for performing multiple comparisons: If the decision on what comparisons to make is withheld until after the data are examined, the following procedures can be used:
● Tukey's Method to test all possible pairwise differences of means to determine if at least one difference is significantly different from 0.
● Scheffé's Method to test all possible contrasts at the same time, to see if at least one is significantly different from 0.
● Bonferroni Method to test, or put simultaneous confidence intervals around, a pre-selected group of contrasts.
Procedure for proportion defective data: When we are dealing with population proportion defective data, the Marascuilo procedure can be used to simultaneously examine comparisons between all groups after the data have been collected.
The studentized range q: The Tukey method uses the studentized range distribution. Suppose we have r independent observations y1, ..., yr from a normal distribution with mean μ and variance σ². Let w be the range for this set; i.e., the maximum minus the minimum. Now suppose that we have an estimate s² of the variance σ² which is based on ν degrees of freedom and is independent of the yi. The studentized range is defined as

q = w / s
The distribution of q is tabulated in many textbooks and can be calculated using Dataplot: The distribution of q has been tabulated and appears in many textbooks on statistics. In addition, Dataplot has a CDF function (SRACDF) and a percentile function (SRAPPF) for q.
As an example, let r = 5 and ν = 10. The 95th percentile is q(.05; 5, 10) = 4.65. This means that the probability is .95 that the studentized range of 5 observations does not exceed 4.65.
Tukey's Method
Confidence limits for Tukey's method: The Tukey confidence limits for all pairwise comparisons with confidence coefficient of at least 1 - α are:

(ȳi. - ȳj.) ± ( q(α; k, N-k) / sqrt(2) ) · σ̂ · sqrt(1/ni + 1/nj)

Notice that the point estimator and the estimated variance are the same as those for a single pairwise comparison that was illustrated previously. The only difference between the confidence limits for simultaneous comparisons and those for a single comparison is the multiple of the estimated standard deviation.
Also note that the sample sizes must be equal when using the studentized range approach.
Example
Unequal sample sizes: It is possible to work with unequal sample sizes. In this case, one has to calculate the estimated standard deviation for each pairwise comparison. The Tukey procedure for unequal sample sizes is sometimes referred to as the Tukey-Kramer Method.
Simultaneous confidence interval: It can be shown that the probability is 1 - α that all confidence limits of the type

Ĉ ± sqrt( (k - 1) F(α; k-1, N-k) ) · sĈ

are simultaneously correct, where sĈ is the estimated standard deviation of the contrast estimate Ĉ.
The confidence limits for C1 are -.5 ± 3.12(.5158) = -.5 ± 1.608, and for C2 they are .34 ± 1.608.
Tukey preferred when only pairwise comparisons are of interest: If only pairwise comparisons are to be made, the Tukey method will result in a narrower confidence limit, which is preferable.
Consider for example the comparison between μ3 and μ1.
Tukey: 1.13 < μ3 - μ1 < 5.31
Scheffé: 0.95 < μ3 - μ1 < 5.49
Scheffé preferred when many contrasts are of interest: In the general case when many or all contrasts might be of interest, the Scheffé method tends to give narrower confidence limits and is therefore the preferred method.
Applies for a finite number of contrasts: This method applies to an ANOVA situation when the analyst has picked out a particular set of pairwise comparisons or contrasts or linear combinations in advance. This set is not infinite, as in the Scheffé case, but may exceed the set of pairwise comparisons specified in the Tukey procedure.
Valid for both equal and unequal sample sizes: The Bonferroni method is valid for equal and unequal sample sizes. We restrict ourselves to only linear combinations or comparisons of treatment level means (pairwise comparisons and contrasts are special cases of linear combinations). We denote the number of statements or comparisons in the finite set by g.
Formula for Bonferroni confidence interval: In summary, the Bonferroni method states that the confidence coefficient is at least 1 - α that simultaneously all the following confidence limits for the g linear combinations Ci are "correct" (or capture their respective true values):

Ĉi ± t(α/(2g); N-k) · sĈi

where sĈi is the estimated standard deviation of Ĉi.
Compute the Bonferroni simultaneous confidence interval: For a 95 percent overall confidence coefficient using the Bonferroni method, the t-value is t(.05/(2*2); 16) = t(.0125; 16) = 2.473 (see the t-distribution critical value table in Chapter 1). Now we can calculate the confidence intervals for the two contrasts. For C1 we have confidence limits -.5 ± 2.473(.5158) and for C2 we have confidence limits .34 ± 2.473(.5158).
Thus, the confidence intervals are:
-1.776 ≤ C1 ≤ 0.776
-0.936 ≤ C2 ≤ 1.616
No one comparison method is uniformly best, each has its uses:
1. If all pairwise comparisons are of interest, Tukey has the edge. If only a subset of pairwise comparisons are required, Bonferroni may sometimes be better.
2. When the number of contrasts to be estimated is small (about as many as there are factors), Bonferroni is better than Scheffé. Actually, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.
3. Many computer packages include all three methods. So, study the output and select the method with the smallest confidence band.
4. No single method of multiple comparisons is uniformly best among all the methods.
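To make point 3 concrete, the three multipliers for the example used in this section (k = 4 groups, 16 error degrees of freedom, g = 2 contrasts, α = .05) can be computed with scipy (an assumption of this note):

    import math
    from scipy.stats import studentized_range, f, t

    k, nu, g, alpha = 4, 16, 2, 0.05

    tukey = studentized_range.ppf(1 - alpha, k, nu) / math.sqrt(2)
    scheffe = math.sqrt((k - 1) * f.ppf(1 - alpha, k - 1, nu))
    bonferroni = t.ppf(1 - alpha / (2 * g), nu)

    # about 2.86, 3.12, 2.47; each multiplies the estimated standard
    # deviation of the comparison to give the interval half-width
    print(round(tukey, 3), round(scheffe, 3), round(bonferroni, 3))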
Marascuilo procedure allows comparison of all possible pairs of proportions: Rejecting the null hypothesis only allows us to conclude that not (in this case) all lots are equal with respect to the proportion of defectives. However, it does not tell us which lot or lots caused the rejection.
The Marascuilo procedure enables us to simultaneously test the differences of all pairs of proportions when there are several populations under investigation.
Step 3: compare test statistics against corresponding critical values: The third and last step is to compare each of the k(k-1)/2 test statistics against its corresponding critical value rij. Those pairs that have a test statistic that exceeds the critical value are significant at the α level.
Example
Sample proportions: To illustrate the Marascuilo procedure, we use the data from the previous example. Since there were 5 lots, there are (5 x 4)/2 = 10 possible pairwise comparisons to be made and ten critical ranges to compute. The five sample proportions are:
p1 = 36/300 = .120
p2 = 46/300 = .153
p3 = 42/300 = .140
p4 = 63/300 = .210
p5 = 38/300 = .127
Note: The values in this table were computed with the following
Dataplot macro.
7.5. References
Primary References: Agresti, A. and Coull, B. A. (1998). "Approximate is better than 'exact' for interval estimation of binomial proportions", The American Statistician, 52(2), 119-126.
Berenson M.L. and Levine D.M. (1996) Basic Business Statistics,
Prentice-Hall, Englewood Cliffs, New Jersey.
Bhattacharyya, G. K., and R. A. Johnson, (1997). Statistical Concepts
and Methods, John Wiley and Sons, New York.
Birnbaum, Z. W. (1952). "Numerical tabulation of the distribution of
Kolmogorov's statistic for finite sample size", Journal of the American
Statistical Association, 47, page 425.
Brown, L. D., Cai, T. T., and DasGupta, A. (2001). "Interval estimation for a binomial proportion", Statistical Science, 16(2), 101-133.
Diamond, W. J. (1989). Practical Experiment Designs, Van-Nostrand
Reinhold, New York.
Dixon, W. J. and Massey, F.J. (1969). Introduction to Statistical
Analysis, McGraw-Hill, New York.
Draper, N. and Smith, H., (1981). Applied Regression Analysis, John
Wiley & Sons, New York.
Hicks, C. R. (1973). Fundamental Concepts in the Design of
Experiments, Holt, Rinehart and Winston, New York.
Hollander, M. and Wolfe, D. A. (1973). Nonparametric Statistical
Methods, John Wiley & Sons, New York.
Howe, W. G. (1969). "Two-sided Tolerance Limits for Normal Populations - Some Improvements", Journal of the American Statistical Association, 64, pages 610-620.
Kendall, M. and Stuart, A. (1979). The Advanced Theory of Statistics,
Volume 2: Inference and Relationship. Charles Griffin & Co. Limited,
London.