DMAIC - Improve Phase

[Figure: Improve Phase roadmap. SIPOC, VOC and Project Scope lead into P-Map, XY and FMEA, then Capability, then Box Plot, Scatter Plots and Regression, then Fractional Factorial, Full Factorial and Center Points, progressively narrowing the candidate inputs (X1) through (X11) down to a vital few.]
Certified Lean Six Sigma Black Belt Book
Lean Six Sigma Black Belt Training
Improve Phase
Welcome to Improve
Now that we have completed the Analyze Phase we are going to jump into the Improve Phase. In
Welcome to Improve we will give you a brief look at the topics we are going to cover.
Certified Lean Six Sigma Black Belt Book Copyright OpenSourceSixSigma.com

Overview
Well, now that the Analyze Phase is over, on to a more difficult phase. The good news is… you’ll hardly ever use this stuff, so pay close attention!
We will examine the meaning of each of these and show you how to apply them.
Welcome to Improve
Process Modeling: Regression
Advanced Process Modeling: MLR
Designing Experiments
Experimental Methods
Full Factorial Experiments
Fractional Factorial Experiments
Wrap Up & Action Items
DMAIC Roadmap
Champion/Process Owner
Identify Problem Area
Determine Appropriate Project Focus

Define
Estimate COPQ
Establish Team
Measure
Assess Stability, Capability, and Measurement Systems
Identify and Prioritize All X’s

Analyze
Prove/ Disprove Impact X’s Have On Problem

Improve
Identify, Prioritize, Select Solutions Control or Eliminate X’s Causing Problems
Implement Solutions to Control or Eliminate Xs Causing Problems

Control
Implement Control Plan to Ensure Problem Doesn’t Return
Verify Financial Impact
We are currently in the Improve Phase and by now you may be quite sick of Six Sigma, really! In this
module we are going to look at additional approaches to process modeling; it's actually quite fun in a
weird sort of way!

Improve Phase
Analysis Complete
Identify Few Vital X’s
Experiment to Optimize Value of X’s
Simulate the New Process
Validate New Process
Implement New Process
Ready for Control
After completing the Improve Phase you will be able to put to use the steps as depicted here.

Lean Six Sigma Black Belt Training
Improve Phase
Process Modeling Regression
Now we will continue in the Improve Phase with “Process Modeling: Regression”.

Overview
Welcome to Improve
Process Modeling: Regression
  - Correlation
  - Introduction to Regression
  - Simple Linear Regression
Advanced Process Modeling: MLR
Designing Experiments
Experimental Methods
Full Factorial Experiments
Fractional Factorial Experiments
Wrap Up & Action Items
In this module of Process Modeling we will study Correlation, Introduction to Regression and
Simple Linear Regression. These are some powerful tools in our data analysis tool box.
We will examine the meaning of each of these and show you how to apply them.

Correlation
The primary purpose of linear correlation analysis is to measure the strength of linear
association between two variables (X and Y). You have already seen correlation graphically when
you created a Scatter Plot.
If, as X increases, there is no definite shift in the values of Y, there is no correlation, or no
association, between X and Y.
If, as X increases, there is a shift in the values of Y, there is a correlation.
The correlation is positive when Y tends to increase as X increases and negative when Y tends to
decrease as X increases.
If the ordered pairs (X, Y) tend to follow a straight line path, there is a linear correlation.
The preciseness of the shift in Y as X increases determines the strength of the linear correlation.
To conduct the study you need:
- Bivariate Data – Two pieces of data that are variable
- Bivariate data is comprised of ordered pairs (X, Y)
- X is the independent variable
- Y is the dependent variable

Correlation Coefficient
Ho: No Correlation (Ho ho ho….)
Ha: There is Correlation (Ha ha ha….)
The null hypothesis for correlation is that there is no correlation; the alternative is that there is
correlation.
The correlation coefficient always assumes a value between –1 and +1.
The correlation coefficient of the population, R, is estimated by the sample correlation
coefficient, r, which is calculated from the sample (X, Y) pairs.
Types and Magnitude of Correlation
The graphics shown here are labeled as the type and magnitude of their correlation: Strong,
Moderate or Weak correlation.

Limitations of Correlation
• A strong positive or negative correlation between X and Y does not indicate causality.
• Correlation provides an indication of the strength but does not provide us with an exact
numerical relationship (i.e. Y=f(x)).
• The magnitude of the correlation coefficient is somewhat relative and should be used with
caution.
• Just like any other statistic, you need to assess whether the correlation coefficient is
statistically significant, as well as practically significant.
• As usual, statistical significance is judged by comparing a p-value with the chosen degree of
alpha risk.
• Guidelines for practical significance are as follows:
– If | r | > 0.80, relationship is practically significant
– If | r | < 0.20, relationship is not practically significant
[Figure: scale of r from -1.0 to +1.0, marked at -0.8, -0.2, 0, 0.2 and 0.8, labeled from left to right "Area of negative linear correlation", "No linear correlation" and "Area of positive linear correlation".]
To properly understand regression you must first understand correlation. Once a relationship is
described, then a regression can be performed.
A strong positive or negative correlation between X and Y does not indicate causality. Correlation
provides an indication of the strength but does not provide us with an exact numerical relationship.
Regression, however, provides us with that: more specifically, a Y = f(x) equation. Just like any
other statistic, be sure to assess whether the correlation coefficient is both statistically
significant and practically significant.
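The practical significance guidelines can be sketched as a small helper. This is only an illustration: the function name `practical_significance` is hypothetical, the 0.80 and 0.20 cut-offs are the ones quoted in the guidelines, and treating everything in between as a judgment call is an assumption, since the guidelines give no rule for that range.

```python
def practical_significance(r: float) -> str:
    """Classify a correlation coefficient r using the |r| guidelines."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    magnitude = abs(r)
    if magnitude > 0.80:
        return "practically significant"
    if magnitude < 0.20:
        return "not practically significant"
    # No guideline is given for 0.20 <= |r| <= 0.80 (assumption: judgment call).
    return "judgment call"


for r in (0.935, -0.85, 0.50, 0.10):
    print(r, "->", practical_significance(r))
```

Remember that this check says nothing about statistical significance, which still has to be judged from the p-value.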
Correlation Example
Open MINITABTM worksheet RB Stats Correlation.mtw
The correlation coefficient [r]:
• Is a positive value if one variable increases as the other variable increases.
• Is a negative value if one variable decreases as the other increases.
Correlation Formula
$$ r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \, \sum (Y_i - \bar{Y})^2}} $$
X values (Payton carries)    Y values (Payton yards)
196                          679
311                          1390
339                          1852
333                          1359
369                          1610
317                          1460
339                          1222
148                          596
314                          1421
381                          1684
324                          1551
321                          1333
146                          586
We will use some data from a National Football League player, Walter Payton of the Chicago
Bears. Open MINITABTM worksheet “RB Stats Correlation.mtw” as shown here.
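For readers without MINITABTM at hand, the correlation coefficient can also be computed directly from the Correlation Formula. The sketch below is a minimal illustration in plain Python; the function name `pearson_r` is our own choice and the two lists are the carries and yards values from the worksheet listing.

```python
import math

# Walter Payton data from the worksheet listing (X = carries, Y = yards).
carries = [196, 311, 339, 333, 369, 317, 339, 148, 314, 381, 324, 321, 146]
yards = [679, 1390, 1852, 1359, 1610, 1460, 1222, 596, 1421, 1684, 1551, 1333, 586]


def pearson_r(x, y):
    """Sample correlation coefficient r for paired data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and hold at least two points")
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(carries, yards)
print(f"r = {r:.3f}")
```

A value of r close to +1 here indicates a strong positive linear association between carries and yards.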

Correlation Analysis
Get outta my way!
In MINITABTM select “Graph>Scatter Plot>Simple”. The following “Scatterplot – Simple” window will
open. To select your Y variable double-click on “payton yards” from the left hand box. For the X variable
double-click “payton carries” from the same box. To enable MINITABTM for the use of a “Lowess Scatter
Plot” click on the “Data View…” button and select the “Smoother” tab… from there you will see a Lowess
option. Select this option and click “OK”.
Correlation Example
Do you observe any correlation in this graph?
[Figure: Scatterplot of payton yards vs payton carries with Lowess smoother. Y axis: payton yards; X axis: payton carries.]
Lowess stands for LOcally-WEighted Scatterplot Smoother. The Lowess routine fits a smoothed line
to the data which should be used to explore the relationship between two variables without fitting
a specific model, such as a regression line or theoretical distribution. Lowess smoothers are most
useful when the curvature of the relationship does not change sharply. In this example it appears
that there is correlation in the data.

Correlation Example (cont.)

Now we will generate the correlation coefficient using MINITABTM. Follow the MINITABTM command
path shown here and, for “Variables:”, double-click on “payton carries” and “payton yards” from the
left box.
Results for: RB STATS CORRELATION.MTW
Scatterplot of Payton yards vs Payton carries
Correlations: Payton carries, Payton yards
Pearson correlation of Payton carries and Payton yards = 0.935
P-Value = 0.000
Correlation coefficient is high and the P-value is low. Reject the null hypothesis, there is a
correlation.
The correlation coefficient is high at 0.935, which corresponds to the graph on the previous slide
that shows positive correlation.
The P-value is low (reported as 0.000) so we reject the null hypothesis by saying that there is
significant correlation between Payton’s carries and the number of yards.
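The decision to reject the null hypothesis can be sanity-checked by hand. The sketch below assumes the standard t statistic for a correlation coefficient, t = r·sqrt(n − 2)/sqrt(1 − r²), an alpha risk of 0.05, and the two-sided t table value of 2.201 for 11 degrees of freedom. MINITABTM reports the P-value directly; comparing against a table value is a simplification used here because the Python standard library has no t distribution.

```python
import math

r = 0.935            # Pearson correlation from the MINITAB output
n = 13               # number of (carries, yards) pairs in the worksheet
t_critical = 2.201   # two-sided t table value, alpha = 0.05, n - 2 = 11 df

# Test statistic for Ho: no correlation
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

reject_ho = abs(t_stat) > t_critical
print(f"t = {t_stat:.2f}, reject Ho: {reject_ho}")
```

Because the statistic is far beyond the table value, the conclusion agrees with the low P-value in the output.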
Regression Analysis
Correlation only tells us the strength of a relationship, not the numerical relationship.
The last step to proper analysis of Continuous Data is to determine the regression equation.
The regression equation can mathematically predict Y for any given X.
The regression equation from MINITABTM is the BEST FIT for the plotted data.
Prediction Equations:
Y = a + bx (Linear or 1st order model)
Y = a + bx + cx^2 (Quadratic or 2nd order model)
Y = a + bx + cx^2 + dx^3 (Cubic or 3rd order model)
Y = a(b^x) (Exponential)
Correlation ONLY tells us the strength of a relationship while Regression gives the mathematical
relationship or the prediction model.

Simple vs. Multiple Regression
Simple Regression:
– One X, One Y
– Analyze in MINITABTM using
• Stat>Regression>Fitted Line Plot or
• Stat>Regression>Regression
Multiple Regression:
– Two or More X’s, One Y
– Analyze in MINITABTM using
• Stat>Regression>Regression
In both cases the R-sq value tells us the amount of variation explained by our model.
In Simple Regression there is only one X; X’s are commonly referred to as predictors or
regressors. Multiple Regression allows many X’s. Recall we are only presenting Simple Regression
in this module and will present Multiple Regression in detail in the next module.
Regression Analysis Graphical Output
[Figure: Fitted Line Plot. payton yards = -163.5 + 4.916 payton carries; S = 153.985, R-Sq = 87.3%, R-Sq(adj) = 86.2%. Y axis: payton yards; X axis: payton carries.]
There are two ways to perform a Simple Regression. One is the Fitted Line Plot which will give a
Scatter Plot with a Fitted Line and will generate a limited regression equation in the Session Window
of MINITABTM as shown above.
Follow the MINITABTM command prompt shown here, double-click “payton yards” for Response (Y)
and double-click “payton carries” for the Predictor (X) and click “OK” which will produce this output.

Regression Analysis Statistical Output
Stat > Regression > Regression
Regression Analysis: payton yards versus payton carries
The regression equation is
Payton yards = -163.497 + 4.91622 Payton carries
S = 153.985   R-Sq = 87.3%   R-Sq(adj) = 86.2%

Analysis of Variance
Source       DF       SS       MS        F      P
Regression    1  1798587  1798587  75.8531  0.000
Error        11   260826    23711
Total        12  2059413

The MS column contains the Mean Squares.
R-Sq value of 87.3% = 1798587 / 2059413
R-Sq(adj) of 86.2% = (1798587 – 23711) / 2059413
The R-Sq value of 87.3% quantifies the strength of the association between Carries and Yards. In
this case, our prediction equation explains 87.3% of the total variation seen in “Yards”. 12.7% of
the variation seen in “Yards” is not explained by our equation.
Let’s look at the Regression Analysis Statistical Output. The difference between R squared and
adjusted R squared is not terribly important in Simple Regression.
In Multiple Regression where there are many X’s it becomes more important which you will see
in the next module.
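To see where the numbers in the statistical output come from, the fit can be reproduced with the ordinary least squares formulas. This is a minimal sketch in plain Python, not the MINITABTM routine; the variable names are our own, the data are the carries and yards values from the worksheet, and the adjusted R-Sq line uses the same (SS Regression – MS Error) / SS Total form quoted with the output, which holds for a single X.

```python
carries = [196, 311, 339, 333, 369, 317, 339, 148, 314, 381, 324, 321, 146]
yards = [679, 1390, 1852, 1359, 1610, 1460, 1222, 596, 1421, 1684, 1551, 1333, 586]

n = len(carries)
x_bar = sum(carries) / n
y_bar = sum(yards) / n

# Least squares slope and intercept for yards = intercept + slope * carries
sxx = sum((x - x_bar) ** 2 for x in carries)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(carries, yards))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Analysis of Variance sums of squares
fits = [intercept + slope * x for x in carries]
ss_total = sum((y - y_bar) ** 2 for y in yards)
ss_error = sum((y - f) ** 2 for y, f in zip(yards, fits))
ss_regression = ss_total - ss_error
ms_error = ss_error / (n - 2)

r_sq = ss_regression / ss_total
r_sq_adj = (ss_regression - ms_error) / ss_total  # single-X form

print(f"yards = {intercept:.3f} + {slope:.5f} carries")
print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}")
```

The printed equation and R-Sq values can be compared line by line with the Session Window output.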
Regression (Prediction) Equation
The Regression Analysis generates a prediction model based on the best fit line through the data
represented by the equation shown here.
Regression Analysis: Payton yards versus Payton carries
The regression equation is
Payton yards = -163.497 + 4.91622 Payton carries
In this equation -163.497 is the Constant, 4.91622 is the Coefficient and Payton carries is the
Level of X.
To predict the number of yards that Payton would run if he had 250 carries you simply fill in that
value in the prediction equation above and solve.
Payton yards = -163.497 + 4.91622(250) = 1,065.6
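The same substitution can be written as a tiny function. The sketch below simply hard-codes the Constant and Coefficient from the regression equation; the function name `predict_yards` is hypothetical.

```python
INTERCEPT = -163.497  # Constant from the regression equation
SLOPE = 4.91622       # Coefficient for Payton carries


def predict_yards(carries: float) -> float:
    """Predicted yards for a given number of carries."""
    return INTERCEPT + SLOPE * carries


print(round(predict_yards(250), 1))  # 1065.6
```

The worksheet only covers seasons with roughly 146 to 381 carries, so predictions far outside that range should be treated with caution.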

Regression (Prediction) Equation (cont.)
You could make a fairly accurate estimate by using the Fitted Line Plot also.
Compare to the Fitted Line.
[Figure: Fitted Line Plot. payton yards = -163.5 + 4.916 payton carries; S = 153.985, R-Sq = 87.3%, R-Sq(adj) = 86.2%. The estimate read from the line at 250 carries is marked as ~1067 yds.]
Regression Graphical Output
For a demonstration, check other regression fits.
Stat>Regression>Fitted Line Plot
Quadratic and Cubic – Check the r² value against the linear model to determine if the difference
between the variance explained by our equation is significant.
MINITABTM will also generate both quadratic and cubic fits. Select the appropriate variables for (Y) and
(X) and for the type of Regression Model choose “Quadratic” or “Cubic” for the regression model type.

Regression Graphical Output (cont.)
[Figure: Quadratic Fitted Line Plot. payton yards = -199.7 + 5.239 payton carries - 0.00064 payton carries**2; S = 161.474, R-Sq = 87.3%, R-Sq(adj) = 84.8%.]
[Figure: Cubic Fitted Line Plot. payton yards = 2188 - 24.71 payton carries + 0.1147 payton carries**2 - 0.000141 payton carries**3; S = 164.218, R-Sq = 88.2%, R-Sq(adj) = 84.3%.]
Use the best fitting equation by looking at the R-Sq value. If it improves significantly, or if the
assumptions of the residuals are better met as a result of utilizing the quadratic or cubic equation
you should use it.
Here there is no big difference so we will stick with the linear model.
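One way to see why the extra terms do not pay off is to look at adjusted R-Sq, which penalizes each added term. The sketch below assumes the standard adjusted R-Sq formula, 1 − (1 − R-Sq)(n − 1)/(n − p − 1), where p is the number of predictor terms; the R-Sq inputs are the rounded values from the three fitted line plots and the function name is our own.

```python
def adjusted_r_sq(r_sq: float, n: int, terms: int) -> float:
    """Adjusted R-Sq for a model with `terms` predictor terms fitted to n points."""
    return 1 - (1 - r_sq) * (n - 1) / (n - terms - 1)


n = 13  # observations in the Payton worksheet
models = {"linear": (0.873, 1), "quadratic": (0.873, 2), "cubic": (0.882, 3)}

for name, (r_sq, terms) in models.items():
    adj = adjusted_r_sq(r_sq, n, terms)
    print(f"{name}: R-Sq = {r_sq:.1%}, R-Sq(adj) = {adj:.1%}")
```

The quadratic and cubic results agree with the 84.8% and 84.3% shown on the plots; the linear result can differ from the reported 86.2% in the last digit because the R-Sq input is rounded. Adjusted R-Sq falls as terms are added, which supports staying with the linear model.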
Residuals
Regression Analysis relies on assumptions about the residuals (differences between predicted and
actual Y values).
Analyze the residuals to look for evidence of an outlier (which could mean a typo or some
assignable cause) or nonlinearity.
As in ANOVA, the residuals should:
– Be normally distributed (normal plot of residuals)
– Be independent of each other
• no patterns (random)
• data must be time ordered (residuals vs. order graph)
– Have a constant variance (visual, see residuals versus fits chart; there should be approximately
the same number of residuals above and below the line, equally spread)

Residuals (cont.)
Residual Plots can be generated from both the Fitted Line Plot and regression selection when using
MINITABTM.
The Standardized residual is also known as the Studentized residual or internally Studentized
residual. The standardized residual is the residual divided by an estimate of its Standard
Deviation. This form of the residual takes into account that the residuals may have different
variances, which can make it easier to detect outliers.
Here we produced the graph by selecting the “Four in one” option.
[Figure: Residual Plots for payton yards (Four in one). Panels: Normal Probability Plot of the Residuals (Normality assumption); Residuals Versus the Fitted Values (Equal variance assumption); Histogram of the Residuals; Residuals Versus the Order of the Data (Independence assumption).]

Residual Analysis
Standardized residuals greater than 2 and less than -2 are usually considered large and MINITABTM
labels these observations with an R in the table of unusual observations or fits and residuals.
Stat>Regression>Regression
Regression Analysis: payton yards versus payton carries
The regression equation is
payton yards = - 163 + 4.92 payton carries
Predictor     Coef  SE Coef      T      P
Constant    -163.5    172.0  -0.95  0.362
payton c    4.9162   0.5645   8.71  0.000
S = 154.0   R-Sq = 87.3%   R-Sq(adj) = 86.2%
Analysis of Variance
Source          DF       SS       MS      F      P
Regression       1  1798587  1798587  75.85  0.000
Residual Error  11   260826    23711
Total           12  2059413
Unusual Observations
Obs  payton c  payton y     Fit  SE Fit  Residual  St Resid
  3       339    1852.0  1503.1    49.3     348.9     2.39R
R denotes an observation with a large standardized residual
St Resid is the Standardized Residual, that is, the Residual expressed in Standard Deviations.
Unusual observations will be discussed later.
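The R flag can be reproduced by hand. The sketch below assumes the usual internally Studentized form for simple regression: each residual is divided by S·sqrt(1 − h), where h = 1/n + (x − x̄)²/Sxx is the leverage of that point. It is an illustration in plain Python, not the MINITABTM implementation, and the variable names are our own.

```python
import math

carries = [196, 311, 339, 333, 369, 317, 339, 148, 314, 381, 324, 321, 146]
yards = [679, 1390, 1852, 1359, 1610, 1460, 1222, 596, 1421, 1684, 1551, 1333, 586]

n = len(carries)
x_bar = sum(carries) / n
y_bar = sum(yards) / n
sxx = sum((x - x_bar) ** 2 for x in carries)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(carries, yards))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Raw residuals and S (standard error of the regression)
residuals = [y - (intercept + slope * x) for x, y in zip(carries, yards)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Leverage of each point; a residual's estimated std dev is s * sqrt(1 - h)
leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in carries]
std_resid = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

# Observations (1-based) whose standardized residual is beyond +/- 2
unusual = [i + 1 for i, r in enumerate(std_resid) if abs(r) > 2]
print("Unusual observations:", unusual)
print(f"St Resid for observation 3: {std_resid[2]:.2f}")
```

Only the observation with 339 carries and 1852 yards exceeds the ±2 guideline, which matches the single row in the Unusual Observations table.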
Normal Probability Plot of Residuals
To view a normal probability plot in MINITABTM select “Stat>Regression>Fitted Line Plot” and click
on the “Graph” button. You will notice underneath “Residual Plots” there are four options to choose
from. For this example select “Normal plot of residuals”. We will test Residuals vs. Fitted Values
and Residual vs. Order of Data in the next few pages.
Normally distributed response assumption.
Residuals should lie near the straight line (to within a fat pencil of each other).
[Figure: Normal Probability Plot of the Residuals (response is payton yards). Y axis: Percent; X axis: Standardized Residual.]
As you can see the Normal probability plot of residuals evaluates the Normally Distributed response
assumption. The residuals should lie near the straight line to within a fat pencil. Looking at a
Normal probability plot to determine normality takes a little practice. Technically speaking,
however, it is inappropriate to generate an Anderson-Darling or any other Normality test that
generates a p-value to determine normality. The reason is that residuals are not independent and do
not meet a basic assumption for using the Normality tests. Dr. Douglas Montgomery of Arizona State
University coined the phrase “fat pencil test” much to the chagrin of many of his colleagues.
Residuals vs Fitted Values

Equal Variance Assumption
Should be randomly scattered with no patterns.
[Figure: Residuals Versus the Fitted Values (response is payton yards). Y axis: Standardized Residual; X axis: Fitted Value.]
Residuals versus Fitted Values evaluates the Equal Variance Assumption. Here you want to have a
random scattering of points.
You DO NOT want to see a “funnel effect” where the residuals get bigger and bigger as the Fitted
Value gets bigger or smaller.
Residuals vs Order of Data
Independence Assumption
Should show no trends either up or down and should have approximately the same number of points
above and below the line (approximately constant variance).
[Figure: Residuals Versus the Order of the Data (response is payton yards). Y axis: Standardized Residual; X axis: Observation Order.]
Residuals versus the order of data is used to evaluate the Independence Assumption. It should not
show trends either up or down and should have approximately the same number of points above
and below the line.

Modeling Y=f(x) Exercise
Exercise objective: To gain an understanding of how to use the regression/correlation function in
MINITABTM. Examine correlation and regression for the Dorsett data in the RB Stats Correlation
file (RB Stats Correlation.mtw) and answer the following questions.
1. What is the type and magnitude of the correlation?
a. Strong Positive
b. Moderate Positive
c. Weak Positive
d. Strong Negative
2. What is the prediction equation?
3. What is the predicted value or yardage if Dorsett carries the football 325 times?
4. Are all assumptions met?

Modeling Y=f(x) Exercise: Question 1 Solution
To determine the Type and Magnitude of the relationship we need to run a basic Scatter Plot.
Select “Simple”.
For “Y variable” enter Dorsett yards; for “X variable” enter Dorsett carries.
The Scatter Plot demonstrates a “Strong Positive Correlation”.
[Figure: Scatterplot of dorsett yards vs dorsett carries. Y axis: dorsett yards; X axis: dorsett carries.]

To determine the prediction equation we need to run a Fitted Line Plot.
Stat > Regression > Fitted Line Plot…
For “Response Y” enter Dorsett yards.
For “Predictor X” enter Dorsett carries.
The prediction equation is shown here…
[Figure: Fitted Line Plot. dorsett yards = -160.1 + 4.993 dorsett carries; S = 79.3033, R-Sq = 95.0%, R-Sq(adj) = 94.5%.]

If Dorsett carries the football 325 times the predicted value would be
determined as follows…
Step 1: Dorsett Yards = -160.1 + 4.993 (Dorsett Carries)
Step 2: Dorsett Yards = -160.1 + 4.993 (325)
Step 3: Dorsett Yards = -160.1 + 1622.725
Solution: Dorsett Yards = 1462.625
If Dorsett carries the football 325 times, the prediction equation estimates that he would gain
approximately 1462.6 yards.
All three assumptions have been satisfied.
The Normality Assumptions have been satisfied.
The Equal Variance Assumptions have been satisfied.
The Independence Assumptions have been satisfied.
[Figure: Residual Plots for dorsett yards. Panels: Normal Probability Plot (N = 12, AD = 0.309, P-Value = 0.510); Residuals Versus the Fitted Values; Histogram of the Residuals; Residuals Versus the Order of the Data.]
Ah, so much satisfaction!

At this point, you should be able to:
Perform the steps in a Correlation and a Regression Analysis
Explain when Correlation and Regression are appropriate
You have now completed Improve Phase – Process Modeling Regression.
Lean Six Sigma Black Belt Training
Improve Phase
Advanced Process Modeling
Now we will continue with the Improve Phase “Advanced Process Modeling MLR”.

Overview
Welcome to Improve
Process Modeling: Regression
Advanced Process Modeling: MLR
  - Review Corr./Regression
  - Non-Linear Regression
  - Transforming Process Data
  - Multiple Regression
Designing Experiments
Experimental Methods
Full Factorial Experiments
Fractional Factorial Experiments
Wrap Up & Action Items
The core fundamentals of this phase are as shown.
We will examine the meaning of each of these and show you how to apply them.
Correlation and Linear Regression Review
Correlation and Linear Regression are used:
– With historical process data. It is NOT a form of experimentation.
– To determine if two variables are related in a linear fashion.
– To understand the strength of the relationship.
– To understand what happens to the value of Y when the value of X is increased by one unit.
– To establish a prediction equation that will enable us to predict Y for any level of X.
Correlation explores association. Correlation and regression do not imply a causal relationship.
Designed experiments allow for true cause and effect relationships.
Correlations: Stirrate, Impurity
Pearson correlation of Stirrate and Impurity = 0.966
P-Value = 0.000
Recall the Simple Linear Regression and Correlation presented in the previous module. The
essential tools presented here describe the relationship between two variables: an independent or
input factor and, typically, an output response. Causation is NOT proved by these tools; they only
show a statistical association between the two variables.

Correlation Review
Correlation is used to measure the linear relationship between two continuous variables
(bi-variate data).
Pearson correlation coefficient “r” will always fall between –1 and +1.
A Correlation of –1 indicates a strong negative relationship: as one factor increases the other
decreases.
A Correlation of +1 indicates a strong positive relationship: as one factor increases so does the
other.
P-Value > 0.05, Ho: No relationship
P-Value < 0.05, Ha: Is relationship
[Figure: scale of “r” from -1.0 through 0 to +1.0 with Decision Points, labeled Strong Correlation at -1.0, No Correlation at 0 and Strong Correlation at +1.0.]
The Pearson coefficient, represented here as “r”, shows the strength of a relationship in
Correlation. The coefficient can only take values between -1 and +1, and a value of zero indicates
NO linear relationship.
The P-value indicates the statistical confidence in our conclusion that a relationship exists,
while the Pearson correlation coefficient shows the “strength” of the relationship. For example,
with the alpha risk set at .05, a P-value below .05 means we have more than 95% confidence that a
relationship exists between the two factors tested.
Linear Regression Review

Linear Regression is used to model the relationship between a continuous response variable (Y)
and one or more continuous independent variables (X). The independent predictor variables are
most often continuous but can be ordinal.
– Example of ordinal - Shift 1, 2, 3, etc.
P-Value > 0.05, Ho: Regression equation is not significant
P-Value < 0.05, Ha: Regression equation is significant
[Figure: Fitted Line Plot. Impurity = -0.289 + 0.4566 Stirrate; S = 0.919316, R-Sq = 93.4%, R-Sq(adj) = 92.7%. Callout: the change in Y-value for every one unit change in (X) Stirrate (Slope of the Line).]
Here Stir Rate is directly related to the Impurity of the process: each one-unit increase in Stir
Rate is associated with a 0.4566 increase in Impurity. With Stir Rate set at 30, Impurity is
calculated as 30 times 0.4566 minus 0.289, which gives an Impurity of about 13.4. Granted, there is
error in our model: the red points do not lie exactly on the blue line.
The dependent response variable is Impurity and the Stir Rate is the independent predictor; both
variables in this example are continuous.

Correlation Review
Correlation only tells us the strength of a linear relationship, not the numerical relationship.
The last step to proper analysis of continuous data is to determine the regression equation.
The regression equation can mathematically predict Y for any given X.
The regression equation from MINITABTM is the best fit for the plotted data.
Prediction Equations:
Y = a + bx (Linear or 1st order model)
Y = a + bx + cx^2 (Quadratic or 2nd order model)
Y = a + bx + cx^2 + dx^3 (Cubic or 3rd order model)
Y = a(b^x) (Exponential)
Correlation leaves out the numerical relationship. Correlation shows the strength of a linear
relationship; the mathematical relationship is given by the prediction equation of regression. As
noted, correlations and regressions are not proven causal relationships; we are only attempting to
show a statistical association. REGRESSION equations can describe simple linear, quadratic, cubic
or exponential relationships and predict the output (Y). More complex relationships are coming up.
Simple vs. Multiple Regression Review
Simple Regression
– One X, One Y
– Analyze in MINITABTM using
• Stat>Regression>Fitted Line Plot or
• Stat>Regression>Regression
Multiple Regression
– Two or More X’s, One Y
– Analyze in MINITABTM using
• Stat>Regression>Best Subsets
• Stat>Regression>Regression
In both cases the R-sq value estimates the amount of variation explained by the model.
Simple Regressions have one X, referred to as the regressor or predictor. When multiple X’s are
used to explain the output or response variable, it is a Multiple Regression.
The strength of the regression is quantified by R squared, which gives the proportion of the
overall variation in the output (Y) that is explained by the regression equation.

Regression Step Review
The basic steps to follow in Regression are as follows:
1. Create Scatter Plot (Graph>Scatterplot)
2. Determine correlation (Stat>Basic Statistics>Correlation – p-value less than 0.05)
3. Run fitted line plot choosing linear option (Stat>Regression>Fitted Line Plot)
4. Run regression (Stat>Regression>Regression) (Unusual Observations?)
5. Evaluate R², adjusted R² and p-values
6. Run non-linear regression if necessary (Stat>Regression>Fitted Line Plot)
7. Analyze residuals to validate assumptions. (Stat>Regression>Fitted Line Plot>Graphs)
   1. Normally distributed
   2. Equal variance
   3. Independence
   4. Confirm one or two points do not overly influence model.
One step at a time….
How to run a Regression is outlined above. First, use a Scatter Plot to understand the variation between the X’s and Y’s; then run a Correlation Analysis to indicate whether a potential linear relationship exists. The third step finds the linear mathematical relationship, that is, the prediction equation, and the fourth finds the strength of the linear relationship that exists. The fifth step evaluates what percentage of the variation in the output is explained by variation in the input, along with the statistical confidence in our Linear Regression.
The usual convention is that a statistical confidence of 95% or above is needed to conclude a Linear Regression exists. If the conclusions are unsatisfactory, step 6 is the contingency: we consider a potential Non-linear Regression. This is only necessary if we cannot find a regression equation that explains, statistically and practically, the variation of the output in terms of the input. Step 7, covered in subsequent slides, analyzes the model error; validating the residuals is a necessity for a valid model.
Simple Regression Example
This data set is from the mining industry. It is an evaluation of ore concentrators.
[Figure: Scatterplot of PGM concentrate (g/ton) vs Agitator RPM]
Recalling the tools learned throughout the Analyze Phase, presented here is a simple Regression example examining a piece of equipment at a mining company. Following the Regression steps, we plot the output against the input and look at how the PGM concentrate output changes with how fast the equipment is agitated.
Open the MINITAB™ file named “Concentrator.MTW”. The output is always plotted on the Y axis (dependent) and the input is always plotted on the X axis (independent).

Example Correlation
Correlations: PGM concentrate (g/ton), Agitator RPM
Pearson correlation of PGM concentrate (g/ton) and Agitator RPM = 0.847
P-Value = 0.001
Determining whether a linear relationship exists is the 2nd step. With a Pearson correlation coefficient of .847 and a P-value below .05, we have very strong statistical confidence in a linear relationship. If no correlation existed the coefficient would be closer to zero, remember?
Example Regression Line
[Figure: Fitted Line Plot, PGM concentrate (g/ton) = 1.119 + 1.333 Agitator RPM; S = 9.08220, R-Sq = 71.8%, R-Sq(adj) = 69.0%]
Now we find the prediction equation of the linear relationship between two factors: the output response and the input variable. Grams per ton of PGM concentrate is the output and the RPM of the agitator is the input. The positive slope, consistent with a correlation coefficient greater than zero, means that as the agitator’s RPM increases so does the PGM concentrate. The slope of the Linear Regression equals 1.333. Did you recall that the Pearson correlation coefficient was greater than zero?

Example Linear Regression
Regression Analysis: PGM concentrate (g/ton) versus Agitator RPM
The regression equation is
PGM concentrate (g/ton) = 1.12 + 1.33 Agitator RPM

Predictor      Coef     SE Coef   T      P
Constant       1.119    7.106     0.16   0.878
Agitator RPM   1.3332   0.2642    5.05   0.001

S = 9.08220   R-Sq = 71.8%   R-Sq(adj) = 69.0%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       1    2101.1   2101.1   25.47   0.001
Residual Error   10   824.9    82.5
Total            11   2925.9

Unusual Observations
Obs   Agitator RPM   PGM concentrate (g/ton)   Fit     SE Fit   Residual   St Resid
3     32.0           23.30                     43.78   3.21     -20.48     -2.41R
R denotes an observation with a large standardized residual.

Notice the unusual observation may indicate that a non-linear analysis may explain more of the variation in the data.
Shown here is a Linear Regression that explains about 70% of the process variation. Considering step five, with 12 data points MINITAB™ flags one observation with a large residual. R squared, R squared adjusted and the listing of unusual observations all pertain to our full Regression Analysis. With these concerns in mind, refer to the MINITAB™ window (if necessary); a Non-linear Regression might be worth considering.
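As a check on how the summary statistics relate to the Analysis of Variance table, the following sketch recomputes R-Sq, R-Sq(adj), S and the flagged residual from the figures printed in the session output above. It is a worked illustration only; the variable names are my own, and the formulas are the standard definitions rather than anything taken from MINITAB™ itself.

```python
from math import sqrt

# Figures copied from the session output above.
ss_regression = 2101.1
ss_error = 824.9
ss_total = 2925.9
df_error = 10
df_total = 11
b0, b1 = 1.119, 1.3332          # Constant and Agitator RPM coefficients

# R-squared: share of total variation explained by the model.
r_sq = ss_regression / ss_total

# Adjusted R-squared: compares mean squares, penalizing extra terms.
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)

# S: estimated standard deviation of the model error.
s = sqrt(ss_error / df_error)

# Unusual observation 3: Agitator RPM = 32.0, observed PGM = 23.30.
fit = b0 + b1 * 32.0
residual = 23.30 - fit          # observed minus predicted

print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}, S = {s:.2f}")
print(f"Fit = {fit:.2f}, Residual = {residual:.2f}")
```

The recomputed values agree with the session output to the precision shown there, which is a useful sanity check when reading any regression table.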
Example Regression Line
Stat>Regression>Fitted Line Plot
[Figure: Fitted Line Plot, PGM concentrate (g/ton) = 30.53 - 1.460 Agitator RPM + 0.05586 Agitator RPM**2; S = 7.61499, R-Sq = 82.2%, R-Sq(adj) = 78.2%]
Notice how the new line is more appropriate for our plot. This is the result of choosing a Non-linear Regression, here a Quadratic Regression, which is selected simply by clicking the “Quadratic” model option. The curved line lies closer to the plotted points. Can you see the difference?

Example Linear and Non-Linear Regression
Linear Model
Regression Analysis: PGM concentrate (g/ton) versus Agitator RPM
PGM concentrate (g/ton) = 1.119 + 1.333 Agitator RPM
S = 9.08220   R-Sq = 71.8%   R-Sq(adj) = 69.0%

Analysis of Variance
Source       DF   SS        MS        F       P
Regression   1    2101.07   2101.07   25.47   0.001
Error        10   824.86    82.49
Total        11   2925.93

Non-Linear Model
Polynomial Regression Analysis: PGM concentrate (g/ton) versus Agitator RPM
PGM concentrate (g/ton) = 30.53 - 1.460 Agitator RPM + 0.05586 Agitator RPM**2
S = 7.61499   R-Sq = 82.2%   R-Sq(adj) = 78.2%

Analysis of Variance
Source       DF   SS        MS        F       P
Regression   2    2404.04   1202.02   20.73   0.000
Error        9    521.89    57.99
Total        11   2925.93

Sequential Analysis of Variance
Source      DF   SS        F       P
Linear      1    2101.07   25.47   0.001
Quadratic   1    302.97    5.22    0.048

More variation is explained using the non-linear model since the R-Squared is higher and the S statistic is lower, which is the estimated Standard Deviation of the error in the model.
Here we have both Regression models. R squared is higher for the Non-linear model than for the Linear model, so it explains more of the process variation. In addition, S, the estimated Standard Deviation of the errors, is lower for the Non-linear model.
Standard Deviation was covered earlier in the Measure Phase; review it if necessary. Now consider the model error. Do not be perplexed: model error has many sources of its own. The output may depend on other input variables, and measurement system error in the output and inputs can also contribute. The MINITAB™ Session Window displays these Regression Analyses; feel free to use it.
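The comparison between the two models can be reproduced from the ANOVA figures alone. The sketch below is a worked illustration using the sums of squares printed above; the dictionary layout and function names are hypothetical. It also shows where the sequential F value for the quadratic term comes from: the extra sum of squares gained by adding the squared term, divided by the error mean square of the larger model.

```python
# Figures copied from the two Analysis of Variance tables above.
ss_total, df_total = 2925.93, 11
linear = {"ss_reg": 2101.07, "ss_err": 824.86, "df_err": 10}
quadratic = {"ss_reg": 2404.04, "ss_err": 521.89, "df_err": 9}

def r_sq(model):
    """Share of total variation explained by the model."""
    return model["ss_reg"] / ss_total

def r_sq_adj(model):
    """R-squared adjusted for the number of terms in the model."""
    return 1 - (model["ss_err"] / model["df_err"]) / (ss_total / df_total)

# Extra variation explained by adding the quadratic term (1 degree of freedom).
extra_ss = quadratic["ss_reg"] - linear["ss_reg"]
# Sequential F: extra SS per degree of freedom over the quadratic model's error MS.
f_quadratic = extra_ss / (quadratic["ss_err"] / quadratic["df_err"])

for name, model in (("Linear", linear), ("Quadratic", quadratic)):
    print(f"{name}: R-Sq = {r_sq(model):.1%}, R-Sq(adj) = {r_sq_adj(model):.1%}")
print(f"Extra SS = {extra_ss:.2f}, sequential F = {f_quadratic:.2f}")
```

Because adjusted R-squared also rises when the quadratic term is added, the improvement is not simply a side effect of adding one more term.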
Example Residual Analysis
The recommendation here is to use standardized residuals and the “Four in one” option for plotting. To reach these options, “Graph” in the upper left of the window needs to be clicked. Modeling appropriately and then analyzing the residuals completes the seventh step.


Example Residual Analysis
Normally distributed residuals (Normal Probability Plot)
Equal variance (Residuals vs. Fitted Values)
Independence (Residuals vs. Order of Data)
Having selected the “Four in one” option, we now see all four plots together and keep at the forefront the assumptions required for a valid Regression. The residuals should show no pattern across the order in which the data were collected and should have similar variation across the Fitted Values; moreover, in a valid Regression the residuals will be Normally distributed.
In the upper right graph the residuals show no major differences in variation across the Fitted Values. The bottom right graph shows the residuals are randomly placed, with no pattern. For Normality, the bottom left graph (the Histogram) shows roughly a bell curve, and the upper left graph (the Normal Probability Plot) shows the residuals lying near the blue line. Now, have we met the necessary criteria? With the residual data points randomly dispersed, the last check is to confirm that no single point has an undue impact on the model.
Non-Linear Relationships Summary
Methods to find Non-linear Relationships:
– Scatter Plot indicating curvature.
– Unusual observations in Linear Regression model.
– Trends of the Residual versus the Fitted Values Plot in simple Linear Regression.
– Subject matter expert knowledge or team experience.
One way to identify a Non-linear Relationship is graphical: on a Scatter Plot of output against input, curvature is often self evident. In step four of the Regression Analysis methodology, unusual observations prompt us to look more closely at Fitted Line Plots to see which model suits the historical data. To detect non-linearity, look carefully at the Residuals vs. Fitted Values graph of a Linear Regression; clustering and/or trends in the residuals can point to a Non-linear Regression. Relying on a team or an expert who has prior knowledge can also provide much information.

Types of Non-Linear Relationships
The simple Linear Model, the quadratic model, the logarithm model and the inverse model describe the more conventional relationships between outputs and inputs.
Oh, which formula to use?!
Mailing Response Example
This example will demonstrate how to use confidence and prediction intervals.
What percent discount should be offered to achieve a minimum 10% response from the mailing?
The discount is in sales coupons being sent in the mail.
Clip ’em!
Open the MINITAB™ file called “Mailing Response vs. Discount.mtw”. It contains data from a retail store chain relating the discount amount to the response of customers to the mailed coupons. The input variable is in C1 and the output is in C2. Belts need to establish the discount rate that will yield a 10% response from the customers mailed. The measured % response is the percentage of customers receiving the mailing who used the coupons to buy merchandise.

Mailing Response Scatter Plot
[Figure: Scatterplot of % response from mailing vs % discount]
The output is plotted against the input, with the output always on the Y-axis. Notice we have some curvature.
Mailing Response Correlation
Correlations: % discount, % response from mailing
Pearson Correlation of % discount and % response from mailing = 0.972
P-Value = 0.000
Now we test for a Linear Relationship by running a Correlation. The results give us strong statistical confidence because the P-value is under .05.
Do you notice the Pearson Correlation Coefficient is almost 1.0, indicating a strong correlation?

Mailing Response Fitted Line Plot
[Figure: Fitted Line Plot, % response from mailing = - 11.22 + 1.830 % discount; S = 5.60971, R-Sq = 94.5%, R-Sq(adj) = 94.1%]
Regression Analysis: % response from mailing versus % discount
The regression equation is
% response from mailing = - 11.2 + 1.83 % discount

Predictor    Coef      SE Coef   T       P
Constant     -11.215   2.541     -4.41   0.001
% discount   1.8301    0.1179    15.52   0.000

S = 5.60971   R-Sq = 94.5%   R-Sq(adj) = 94.1%

Source           DF   SS       MS       F        P
Regression       1    7580.0   7580.0   240.87   0.000
Residual Error   14   440.6    31.5
Total            15   8020.5

Note there are no unusual observations. Even though the R squared values are high, a Non-linear fit may be better based on the Fitted Line Plot.
This model shows a very high R-squared of 94%. Having noticed the curvature earlier, the next step in the methodology is to consider a Non-linear Regression Analysis.
Mailing Response Non-Linear Fitted Line Plot
[Figure: Fitted Line Plot, % response from mailing = - 0.416 + 0.1526 % discount + 0.04166 % discount**2; S = 2.91382, R-Sq = 98.6%, R-Sq(adj) = 98.4%]
The R squared value for the Non-linear fit increased to 98.6% from 94.5% in the Linear Regression.
Polynomial Regression Analysis: % response from mailing versus % discount
% response from mailing = - 0.416 + 0.1526 % discount + 0.04166 % discount**2
S = 2.91382   R-Sq = 98.6%   R-Sq(adj) = 98.4%

Source       DF   SS        MS        F        P
Regression   2    7910.14   3955.07   465.83   0.000
Error        13   110.37    8.49
Total        15   8020.51

Sequential Analysis of Variance
Source      DF   SS        F        P
Linear      1    7579.95   240.87   0.000
Quadratic   1    330.19    38.89    0.000

We are satisfied! The application of a Non-linear Regression Model shows an increased R-squared.

Confidence and Prediction Intervals
In order to answer the original question it is necessary to evaluate the confidence and prediction intervals.
What percent discount should be offered to achieve a 10% response from the mailing?
…..Options
A powerful option in the Fitted Line Plot analysis is the interval display. After opening the Stat>Regression>Fitted Line Plot command, click “Options”. Now select “Display confidence interval” and “Display prediction interval” and leave the Confidence Level at 95%.
[Figure: Fitted Line Plot with 95% CI and 95% PI, % response from mailing = - 0.416 + 0.1526 % discount + 0.04166 % discount**2; S = 2.91382, R-Sq = 98.6%, R-Sq(adj) = 98.4%]
Manually draw a horizontal line at 10%.
Manually draw a vertical line where it intersects the lower prediction interval line.
With 95% confidence, a discount of 18% should create at least a 10% response from the mailing.
Look at what has changed in the MINITAB™ window after selecting both interval options, Confidence and Prediction. Each interval is color coded: red is Confidence and green is Prediction. In the previous “Options” box we can widen or narrow the intervals by changing the Confidence Level. The Prediction intervals show the range within which the data are expected to fall at a particular confidence level, here 95%. The horizontal line matters a great deal, but to answer the original question it is the Prediction interval that is of most importance. At a 95% Confidence Level, between 10% and 23% of customers would respond to a mailed 18% coupon; moreover, if we had drawn the horizontal line incorrectly we would have ended up with a response of 10% or less.

Confidence and Prediction Intervals
[Figure: Fitted Line Plot with 95% CI and 95% PI, % response from mailing = - 0.416 + 0.1526 % discount + 0.04166 % discount**2; S = 2.91382, R-Sq = 98.6%, R-Sq(adj) = 98.4%]
The Prediction Interval is the range where a new observation is expected to fall. In this case, we are 95% confident an 18% discount will yield between 10% and 23% response from the mailing.
The Confidence Interval is the range where the prediction equation is expected to fall. The true prediction equation could be different. However, given the data we are 95% confident that the true prediction equation falls within the Confidence Intervals.
Having less data available to estimate the regression equation usually causes the Confidence Intervals to flare out at the extreme ends. The true prediction equation, if it could be known, is expected to lie within the red lines that mark the 95% Confidence Intervals.
For the question of achieving a 10% or greater response, finding the regression equation is less important than estimating where new data are expected to fall. The prediction intervals provide a degree of confidence in how the customers will respond, and this estimate is of great importance.
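To see why the answer is read from the lower prediction limit rather than from the fitted curve, the sketch below solves the fitted quadratic for the discount at which the predicted (average) response reaches 10%. It uses only the coefficients from the session output; the function names are hypothetical, and the prediction limits themselves are not computed here because they require the raw data from the worksheet.

```python
from math import sqrt

# Coefficients of the fitted quadratic from the session output.
B0, B1, B2 = -0.416, 0.1526, 0.04166

def predicted_response(discount):
    """Fitted % response for a given % discount."""
    return B0 + B1 * discount + B2 * discount ** 2

def discount_for_predicted_response(target):
    """Positive root of B2*d^2 + B1*d + (B0 - target) = 0."""
    c = B0 - target
    return (-B1 + sqrt(B1 ** 2 - 4 * B2 * c)) / (2 * B2)

d_fit = discount_for_predicted_response(10.0)
print(f"Fitted curve reaches 10% response at about {d_fit:.1f}% discount")
print(f"Fitted response at an 18% discount: {predicted_response(18):.1f}%")
```

The fitted curve crosses 10% at a noticeably smaller discount than 18%. A discount chosen from the fitted curve alone would give roughly even odds of falling short of 10%, which is why the example reads the discount from where the lower 95% prediction limit crosses the 10% line.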
Residual Analysis
To complete the example, the Residual Analysis validates the assumptions for Regression Analysis.
[Figure: Residual Plots for % response from mailing. Panels: Normal Probability Plot of the Residuals; Residuals Versus the Fitted Values; Histogram of the Residuals; Residuals Versus the Order of the Data]
Confirming validity by examining our residuals, step seven, comes next. A high R-squared means much of the variation in the output is explained, but from that information alone we cannot conclude it is a sufficient model. We can have confidence in our model because all three assumptions are satisfied: the residuals are Normally distributed, randomly distributed across the observation order, and have similar variance across the fitted values. The store should give a discount of 18% and see whether at least 10% of the customers mailed redeem their coupons.
Now does the present data for the response fit the equation as predicted?

Transforming Process Data
In the case where data is Non-linear it is possible to perform Regression using two different methods:
– Non-linear Regression (already discussed)
– Linear Regression on transformed data
Either the X or Y may be transformed.
Any statistical tool that requires transformation uses these methods.
Advantages of transforming data:
– Linear Regression is easier to visually understand and manage.
– Non-normal data can be changed to resemble Normal data for statistical analyses where Normality is required.
Disadvantages of transforming data:
– Difficult to understand transformed units.
– Difficult without automation or computers.
More often than not, Belts find data that is not Normally distributed. We have learned to do Non-linear Regression, but another approach is to transform the data so that Linear Regression can be used. Either outputs or inputs can be transformed. Many people will wonder “what’s the point?” Simplicity is very valuable.
Data that is asymmetric can often be transformed to make it more symmetric using a numeric function which operates more strongly on large numbers than small ones, such as logarithms and roots.
Transform Rules:
1. The transform must preserve the relative order of the data.
2. The transform must be a smooth and continuous function.
3. Most often useful when ratio of largest to smallest value is greater than two (2). In most cases, the transform will have little effect when this rule is violated.
4. All external reference points (spec limits, etc.) must use the same transform.
x_trans = x^p when p is not 0; x_trans = log(x) when p = 0

Transformation    Power (p)
Cube              3
Square            2
No Change         1
Square Root       0.5
Logarithm         0
Reciprocal Root   -0.5
Reciprocal        -1
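The table of powers can be expressed as one small function. The sketch below is illustrative only: the function name `power_transform` and the sample values are hypothetical, and the natural log is assumed for the p = 0 case.

```python
from math import log

def power_transform(x, p):
    """Return x**p, or the natural log of x when p is 0 (x must be positive)."""
    if x <= 0:
        raise ValueError("x must be positive for these transforms")
    return log(x) if p == 0 else x ** p

# Powers from the table above.
LADDER = {
    "Cube": 3, "Square": 2, "No Change": 1, "Square Root": 0.5,
    "Logarithm": 0, "Reciprocal Root": -0.5, "Reciprocal": -1,
}

# Hypothetical right-skewed data; largest/smallest ratio is well above 2.
data = [1.0, 2.0, 4.0, 9.0, 25.0, 100.0]
print("ratio of largest to smallest:", max(data) / min(data))

for name, p in LADDER.items():
    transformed = [round(power_transform(x, p), 3) for x in data]
    print(f"{name:16s} p={p:>4}: {transformed}")
```

Printing the transformed values side by side shows how powers below 1 pull in the large values much more strongly than the small ones, which is what reduces a right skew.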

Effect of Transformation
[Figure: two histograms of Frequency. Before Transform: Right Skew. After Transform: Sqrt.]
The transformed data now shows a Normal distribution.
Using a mathematical function we have transformed this data. This example shows how simple it can be: taking the square root of the data made the distribution Normal. The trouble is finding the appropriate transform function.
Transforming Data Using MINITAB™
The Box Cox transform procedure in MINITAB™ is a method of determining the transform power (called “lambda” in the software) for a set of data.
Transform.MTW
Stat>Control Charts>Box-Cox Transformation
To aid the Belt in finding an appropriate transform, MINITAB™ provides a function known as the Box Cox Transformation.

Box Cox Transform
[Figure: Box-Cox Plot of Pos skew (StDev vs. Lambda, with Lower CL, Upper CL and Limit marked). Lambda (using 95.0% confidence): Estimate 0.337726, Lower CL 0.136963, Upper CL 0.537207, Best Value 0.500000]
[Figure, Before Transform: Probability Plot of Pos skew (Normal). Mean 1.050, StDev 0.8495, N 100, AD 2.883, P-Value <0.005]
[Figure, After Transform: Probability Plot of BoxCox (Normal). Mean 0.9469, StDev 0.3934, N 100, AD 0.265, P-Value 0.687]
x^0.50 or √x
MINITAB™ has selected a transform: in the upper graph it reports a best lambda of .5. Lambda is the power applied to the data, so here the transform is a square root. The two probability plots show the data before and after the transform. The right plot shows the new data set after it has been transformed by the square root. The left plot shows the original non-Normal distribution, with red dots falling away from the blue line and a P-value under .05. You can confirm the change in distribution at your discretion using “Stat>Basic Statistics>Normality Test”.
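The Box-Cox plot shows StDev against Lambda, which suggests how the search works: try a range of lambdas and keep the one giving the smallest standard deviation of a suitably standardized transformed variable. The sketch below follows that commonly described approach. It is an assumption on my part that this mirrors what the software does; the function names and the small data set are hypothetical and this is not MINITAB™ code.

```python
from math import exp, log, sqrt

def standardized_boxcox(data, lam):
    """Box-Cox transform scaled by the geometric mean so that
    standard deviations are comparable across lambdas."""
    g = exp(sum(log(y) for y in data) / len(data))   # geometric mean
    if lam == 0:
        return [g * log(y) for y in data]
    return [(y ** lam - 1) / (lam * g ** (lam - 1)) for y in data]

def stdev(values):
    """Sample standard deviation."""
    m = sum(values) / len(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def best_lambda(data, grid):
    """Lambda on the grid that minimizes the standardized StDev."""
    return min(grid, key=lambda lam: stdev(standardized_boxcox(data, lam)))

# Grid from -2.0 to 3.0 in steps of 0.25.
grid = [i * 0.25 for i in range(-8, 13)]

# Hypothetical positive data that is symmetric on the log scale,
# so a log transform (lambda = 0) is the natural choice.
data = [exp(z) for z in (-1.0, -0.5, 0.0, 0.5, 1.0)]
print("best lambda on grid:", best_lambda(data, grid))
```

In practice the estimate is usually rounded to a convenient power such as 0.5 or 0, as the “Best Value” in the plot above illustrates.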
Transforming Without the Box Cox Routine
Transform.MTW
An alternative method of transforming data is to use standard transforms.
The square root and natural log transforms are most commonly used.
A disadvantage of using the Box Cox transformation is the difficulty in reversing the transformation.
The column of process data is in C1, labeled Pos Skew. Remember this data was not normally distributed as determined with the Anderson Darling normality test.
Using the MINITAB™ calculator, calculate the square root of each observation in C1 and store in C3, calling it “Square Root”.
The “Calc>Calculator” command in MINITAB™ lets you do a transformation yourself. Type a new column name in “Store result in variable:”. With your data set already in the worksheet, next place the cursor in the “Expression:” box. Find the name of the function in the lower right area of the window and double click it. Before executing the transformation, make sure the word “number” is highlighted within the function so that the column to be transformed takes its place in the “Expression:” box. Once you click the “OK” button, the transformed data will appear alongside the unchanged data.

Transforming Without the Box Cox Routine
The output should resemble this view. Confirm if the new data set found in C3 is normally distributed.
[Figure: Probability Plot of Square Root (Normal). Mean 0.9469, StDev 0.3934, N 100, AD 0.265, P-Value 0.687]
Our transform is the square root, the same as the Box Cox transform of lambda = 0.5.
Transform.MTW
For the majority of MINITAB™ commands the order of columns is unimportant, so it is not a problem if the square root data appears in a different column. After creating the transformed data set in the column labeled “Square Root”, it is necessary to confirm that the new data is Normally Distributed. Remember the “Stat>Basic Statistics>Normality Test” command from the Measure Phase; it is now of great importance. Interestingly enough, the best transformation Box Cox found was the same square root we executed.
Multiple Linear Regression
Regressions are run on historical process data. It is NOT a form of experimentation.
Multiple Linear Regression investigates multiple input variables’ effect on an output simultaneously.
– If R² is not as high as desired in the Simple Linear Regression.
– Process knowledge implies more than one input affects the output.
The assumptions for residuals with Simple Regressions are still necessary for Multiple Linear Regressions.
An additional assumption for MLR is the independence of predictors (X’s).
– MINITAB™ can test for multicollinearity (correlation between the predictors or X’s).
Model error (residuals) is impacted by the addition of measurement error for all the input variables.
In review, Regression is run on historical data; it is not a form of experimentation. So far we have covered Regression involving one input and one output. Multiple Linear Regression, where applicable, lets us model one output against more than one input at the same time. It is useful if you have not yet explained enough of the output variation; recall that R-squared measures the amount of variation in the output explained by the inputs you selected. Multiple Linear Regression assumes the inputs are independent of one another, that is, no correlation exists between them. Each input has its own slope in the equation, and the epsilon at the end of the equation is a reminder that every Regression has model error.

Definitions of MLR Equation Elements
The definitions for the elements of the Multiple Linear Regression model are as follows:
Y = β0 + β1X1 + β2X2 + β3X3 + ε
Y: The response (dependent) variable.
X1, X2, X3: The predictor (independent) inputs. The predictor variables used to explain the variation in the observed response variable, Y.
β0: The value of Y when all the explanatory variables (the Xs) are equal to zero.
β1, β2, β3 (Partial Regression Coefficient): The amount by which the response variable (Y) changes when the corresponding Xi changes by one unit with the other input variables remaining constant.
ε (Error or Residual): The observed Y minus the predicted value of Y from the Regression.
Simple and multiple linear equations are very similar. In Multiple Linear Regression each input has its own partial regression coefficient, whereas Simple Linear Regression has only beta zero and beta one. Earlier in this module we ran Regressions; do you recall the residuals we had? Residuals are defined as the observed value minus the predicted value.
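The definitions above translate directly into a few lines of code. In this sketch the coefficients, inputs and observed value are hypothetical numbers chosen so the arithmetic is easy to follow, and `predict` is an illustrative helper, not a MINITAB™ function.

```python
def predict(betas, xs):
    """Predicted Y = b0 + b1*x1 + ... + bk*xk.
    betas = [b0, b1, ..., bk]; xs = [x1, ..., xk]."""
    b0, *slopes = betas
    return b0 + sum(b * x for b, x in zip(slopes, xs))

# Hypothetical coefficients and inputs, for illustration only.
betas = [2.0, 0.5, -1.0, 3.0]      # beta0, beta1, beta2, beta3
xs = [4.0, 1.0, 2.0]               # X1, X2, X3
observed_y = 10.0

y_hat = predict(betas, xs)         # 2.0 + 0.5*4 - 1.0*1 + 3.0*2
residual = observed_y - y_hat      # observed minus predicted

# Partial regression coefficient: raise X1 by one unit, hold X2 and X3 constant.
change = predict(betas, [5.0, 1.0, 2.0]) - y_hat

print("predicted Y:", y_hat)
print("residual:", residual)
print("change in Y per unit of X1:", change)
```

The last line shows the meaning of a partial regression coefficient: the change in predicted Y equals β1 when only X1 moves by one unit.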
MLR Step Review
The basic steps to follow in multiple linear regression are:
1. Create matrix plot (Graph>Matrix Plot)
2. Run Best Subsets Regression (Stat>Regression>Best Subsets)
3. Evaluate R², adjusted R², Mallows’ Cp, number of predictors and S.
4. Iteratively determine appropriate Regression model. (Stat>Regression>Regression>Options)
5. Analyze residuals (Stat>Regression>Regression>Graphs)
   1. Normally distributed
   2. Equal variance
   3. Independence
   4. Confirm one or two points do not overly influence model.
6. Verify your model by running present process data to confirm your model error.
With many input variables and only one output, it can be tedious to find which inputs the variation comes from. A Matrix Plot greatly speeds up the process by showing which inputs appear to impact the output the most. After narrowing the field of variables, use the Best Subsets command to build the Multiple Linear Regression; we identify the appropriate model by examining R-squared, adjusted R-squared, the number of predictors, S and Mallows’ Cp. Following this we must iteratively confirm that the inputs are statistically significant. What remains is to validate the model by analyzing the residuals and, especially for Multiple Linear Regressions, we MUST verify the model by running present process data through it.

Multiple Linear Regression Model Selection
When comparing and verifying models consider the following:
1. Should be a reasonably small difference between R² and R²-adjusted (much less than 10% difference).
2. When more terms are included in the model, does the adjusted R² increase?
3. Use the statistic Mallows’ Cp. It should be small and less than the number of terms in the model.
4. Models with smaller S (standard deviation of error for the model) are desired.
5. Simpler models should be weighed against models with multiple predictors (independent variables).
6. The best technique is to use MINITAB™’s Best Subsets command.
“Best Subsets Regression” in MINITAB™ reports multiple statistics for each candidate model. Using these guidelines, it is in our best interest to choose the simplest adequate Multiple Linear Regression model.
Flight Regression Example
An airplane manufacturer wanted to see what variables affect flight speed. The historical data available covered a period of 10 months.
Flight Regression MLR.MTW
Open the MINITAB™ file “Flight Regression MLR.MTW”; it contains the historical data being analyzed by the airplane manufacturer. The output is flight speed and the other columns contain input variables. With these we will build a Matrix Plot to see whether relationships among the variables are apparent. In the “Graph variables:” box we enter all inputs and outputs.

Flight Regression Example Matrix Plot
Look for plots that show correlation.
[Figure: Matrix Plot of Flight Speed, Altitude, Turbine Angle, Fuel/Air ratio, ICR, Temp. Flight Speed is labeled “Output Response”; the remaining variables are labeled “Predictors”.]
Since 2 or more predictors show correlation, run MLR.
Now we are given a fairly confusing graph of outputs and inputs to interpret. Do not be discouraged; it is simply a grid of scatter plots pairing the variables with one another, for example flight speed vs. altitude. Seeing at least two inputs showing correlation indicates the need to continue with a Multiple Linear Regression. The lower half of the matrix contains the same data as the upper half, just with the axes reversed.
Flight Regression Example Best Subsets
Best Subsets Regression: Flight Speed versus Altitude, Turbine Angl, ...
Response is Flight Speed
Predictor columns (an X marks each predictor included in the model): Altitude, Turbine Angle, Fuel/Air ratio, ICR, Temp

Vars  R-Sq  R-Sq(adj)  Mallows C-p  S       Predictors
1     72.1  71.1       38.4         28.054  X
1     39.4  37.2       112.8        41.358  X
2     85.9  84.8       9.0          20.316  X X
2     82.0  80.6       17.9         22.958  X X
3     87.5  85.9       7.5          19.561  X X X
3     86.5  84.9       9.6          20.267  X X X
4     89.1  87.3       5.7          18.589  X X X X
4     88.1  86.1       8.2          19.481  X X X X
5     89.9  87.7       6.0          18.309  X X X X X
In MINITABTM the “Best Subsets Regression” command is efficient and powerful because it evaluates all inputs against a single output. This command can be helpful in other circumstances as well. Place all inputs of interest in the “Free predictors:” box and the output column of data in the “Response:” box. The evaluation is then done and the results are given in rows: the 1st column is the number of variables, the 2nd column is R-squared, the 3rd column is R-squared adjusted, the 4th column is Mallows’ Cp, the 5th column is the standard deviation of the model error (S) and the remaining columns mark which input variables are in the model.

Flight Regression Example Model Selection
Best Subsets Regression: Flight Speed versus Altitude, Turbine Angl, ... (same output as above; the X columns list all the predictors, the X’s)
What model would you select?
Let’s consider the 5 predictor model:
• Highest R-Sq(adj)
• Low Mallows’ Cp (6.0; only the best 4 predictor model, at 5.7, is lower)
• Lowest S
• However there are many terms.
In choosing the correct model our attention goes to the 5 term model at the bottom of the list. Are all of its terms statistically significant?
Stat>Regression>Regression>Options
Let’s go back to “Stat>Regression>Regression” again. Place the output in the “Response:” box and the inputs in the “Predictors:” box, then click on the “Options” button.

Flight Regression Example Model Selection
Regression Analysis: Flight Speed versus Altitude, Turbine Angle, ...

Flight Speed = 770 + 0.153 Altitude + 5.81 Turbine Angle + 8.70 Fuel/Air ratio - 52.3 ICR + 4.11 Temp

Predictor       Coef     SE Coef  T      P      VIF
Constant        770.4    229.7    3.35   0.003
Altitude        0.15318  0.06605  2.32   0.030  2.3
Turbine Angle   5.806    2.843    2.04   0.053  1.4
Fuel/Air ratio  8.696    3.327    2.61   0.016  3.2
ICR             -52.269  6.157    -8.49  0.000  2.6
Temp            4.107    3.114    1.32   0.200  5.4

S = 18.3088  R-Sq = 89.9%  R-Sq(adj) = 87.7%

The VIF for Temp indicates it should be removed from the model. Go back to the Best Subsets analysis and select the best model that does not include the predictor Temp.
Variance Inflation Factor (VIF) detects correlation among predictors.
• VIF = 1 indicates no relation among predictors
• VIF > 1 indicates predictors are correlated to some degree
• VIF between 5 and 10 indicates regression coefficients are poorly estimated and are unacceptable.
Do you notice anything different here? A new column, labeled VIF, has appeared in the output; it flags high correlation among the inputs. Temp has a high VIF (5.4), so we will remove it.
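To make the VIF column less of a black box, here is a minimal sketch of the usual definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. It assumes NumPy is available and uses hypothetical, randomly generated predictors rather than the flight data; the function name `vif` is illustrative.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of the predictor matrix X."""
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept plus the other predictors
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r_sq = 1 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        factors.append(1 / (1 - r_sq))
    return np.array(factors)

# Hypothetical predictors: a and b are independent, c is nearly a + b.
rng = np.random.default_rng(1)
n = 200
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + b + rng.normal(scale=0.3, size=n)

vif_independent = vif(np.column_stack([a, b]))
vif_correlated = vif(np.column_stack([a, b, c]))
print("independent predictors:", vif_independent.round(2))
print("with correlated predictor c:", vif_correlated.round(2))
```

With only the two independent predictors the values sit close to 1; adding the nearly redundant third predictor pushes the values well above the level of 5 that the slide treats as unacceptable.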
Best Subsets Regression: Flight Speed versus Altitude, Turbine Angl, ... (same output as before)
Note: It is not necessary to re-run the Best Subsets analysis. The numbers do not change.
Select a model with 4 terms because Temp was removed as a predictor since it had correlation with the other variables. Re-run the regression.
To start step four we consider the best Regression model that does not include Temp. The Best Subsets results already cover this model, so we need not rerun that command.

Flight Regression Example Model Selection (cont.)
Regression Analysis: Flight Speed versus Altitude, Turbine Angle, ...

Flight Speed = 616 + 0.117 Altitude + 6.70 Turbine Angle + 12.2 Fuel/Air ratio - 48.2 ICR

Predictor       Coef     SE Coef  T      P      VIF
Constant        616.1    200.7    3.07   0.005
Altitude        0.11726  0.06109  1.92   0.067  1.9
Turbine Angle   6.702    2.802    2.39   0.025  1.3
Fuel/Air ratio  12.151   2.082    5.84   0.000  1.2
ICR             -48.158  5.391    -8.93  0.000  1.9

S = 18.5889  R-Sq = 89.1%  R-Sq(adj) = 87.3%

The VIF values are NOW acceptable.
Evaluate the p-values.
• If p > 0.05, the term(s) should be removed from the regression.
Remove Altitude, re-run model.
After removing Temp, we rerun the “Stat>Regression>Regression” command with the four remaining terms in the “Predictors:” box. We want 95% confidence, so Altitude (p = 0.067) is removed and the Multiple Linear Regression is run again.
Regression Analysis: Flight Speed versus Turbine Angl, Fuel/Air rat, ICR

Flight Speed = 887 + 4.82 Turbine Angle + 12.1 Fuel/Air ratio - 55.0 ICR

Predictor       Coef     SE Coef  T       P      VIF
Constant        886.6    150.4    5.90    0.000
Turbine Angle   4.822    2.763    1.75    0.093  1.1
Fuel/Air ratio  12.106   2.191    5.53    0.000  1.2
ICR             -55.009  4.251    -12.94  0.000  1.1

S = 19.5613  R-Sq = 87.5%  R-Sq(adj) = 85.9%

The P-value for Turbine Angle now indicates it should be removed, because p > 0.05. Re-run the Regression.
Here we have removed Altitude from the “Predictors:” box and the Regression output now shows that Turbine Angle is not statistically significant.

Flight Regression Final Regression Model
Regression Analysis: Flight Speed versus Fuel/Air ratio, ICR

Flight Speed = 1101 + 10.9 Fuel/Air ratio - 55.2 ICR

Predictor       Coef     SE Coef  T       P      VIF
Constant        1101.04  90.00    12.23   0.000
Fuel/Air ratio  10.921   2.163    5.05    0.000  1.1
ICR             -55.197  4.414    -12.51  0.000  1.1

S = 20.3162  R-Sq = 85.9%  R-Sq(adj) = 84.8%

Source          DF  SS     MS     F      P
Regression      2   65500  32750  79.35  0.000
Residual Error  26  10731  413
Total           28  76231

Source          DF  Seq SS
Fuel/Air ratio  1   951
ICR             1   64549

Unusual Observations
Obs  Fuel/Air ratio  Flight Speed  Fit     SE Fit  Residual  St Resid
1    40.6            618.00        624.29  11.55   -6.29     -0.38 X
22   36.3            578.00        524.45  5.43    53.55     2.74R
R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

This is the final Regression model because all remaining terms are statistically significant (we wanted 95% confidence, or a P-value of less than 0.05) and the R-Sq shows the remaining terms explain 85.9% of the variation of flight speed.
Note the ICR predictor accounts for 84.7% of the variation. 84.7% = 64549/76231
Consider removing this outlier but be careful, this is historical data that has no further information. Remember, the objective is to get information that can be used in a Designed Experiment where true cause and effect relationships can be established.
Shown here is the entire Regression output for the final Multiple Linear Regression model. We have 2 predictor variables and both are statistically significant.
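The summary statistics in the final output follow directly from the sums of squares and degrees of freedom in the ANOVA table. The short sketch below simply redoes that arithmetic with the numbers printed above; the variable names are illustrative.

```python
# Values copied from the final regression output above.
ss_regression = 65500.0
ss_error = 10731.0
ss_total = 76231.0
df_regression, df_error, df_total = 2, 26, 28
seq_ss_icr = 64549.0

r_sq = ss_regression / ss_total                               # share of variation explained
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)  # penalized for number of terms
s = (ss_error / df_error) ** 0.5                              # standard deviation of the error
f_stat = (ss_regression / df_regression) / (ss_error / df_error)
icr_share = seq_ss_icr / ss_total                             # share attributed to ICR

print(f"R-Sq      = {100 * r_sq:.1f}%")
print(f"R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"S         = {s:.2f}")
print(f"F         = {f_stat:.2f}")
print(f"ICR share = {100 * icr_share:.1f}%")
```

The small difference between the S computed here and the reported 20.3162 comes from the sums of squares being rounded to whole numbers in the printed table.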
Flight Regression Example Residual Analysis
Now that we have a final model, it is VITAL to analyze the residuals to confirm the model is valid. How do we do this? By generating the residual plots with the appropriate graph commands and checking them against the assumptions.

Flight Regression Example Residual Analysis (cont.)
[Figure: Residual Plots for Flight Speed. Panels: Normal Probability Plot of the Residuals (Percent vs. Standardized Residual); Residuals Versus the Fitted Values (Standardized Residual vs. Fitted Value); Histogram of the Residuals (Frequency vs. Standardized Residual); Residuals Versus the Order of the Data (Standardized Residual vs. Observation Order).]
• Normally distributed residuals (Normal Probability Plot)
• Equal variance (Residuals vs. Fitted Values)
• Independence (Residuals vs. Order of Data)
It appears our model is valid and the residuals are satisfactory!
Perform Non-Linear Regression Analysis
Perform Multiple Linear Regression Analysis (MLR)
Examine Residuals Analysis and understand its effects
You have now completed Improve Phase – Advanced Process Modeling.
Lean Six Sigma

Black Belt Training
Improve Phase
Designing Experiments
Now we are going to continue with the Improve Phase “Designing Experiments”.

Overview
Within this module we will provide an introduction to Design of Experiments, explain what they are, how they work and when to use them.
Welcome to Improve
Process Modeling: Regression
Advanced Process Modeling: MLR
Designing Experiments
– Reasons for Experiments
– Graphical Analysis
– DOE Methodology
Experimental Methods
Full Factorial Experiments
Fractional Factorial Experiments
Wrap Up & Action Items
Project Status Review
• Understand our problem and its impact on the business. (Define)
• Established firm objectives/goals for improvement. (Define)
• Quantified our output characteristic. (Define)
• Validated the measurement system for our output characteristic. (Measure)
• Identified the process input variables in our process. (Measure)
• Narrowed our input variables to the potential “X’s” through Statistical Analysis. (Analyze)
• Selected the vital few X’s to optimize the output response(s). (Improve)
• Quantified the relationship of the Y’s to the X’s with Y=f(x). (Improve)

Six Sigma Strategy
[Figure: funnel that narrows the input variables (X1) through (X11) as successive tools are applied: SIPOC, VOC and Project Scope; P-Map, X-Y Matrix, FMEA and Capability; Box Plot, Scatter Plots and Regression; Fractional Factorial, Full Factorial and Center Points.]
This graphic should be familiar by now. By using the tools in sequence we filter the many input variables down to those that drive defects. In the Improve Phase of the Six Sigma methodology we turn to designed experiments, which apply to transactional, manufacturing and research processes alike.
Reasons for Experiments

The Analyze Phase narrowed down the many inputs to a critical few; now it is necessary to determine the proper settings for the vital few inputs because:
– The vital few potentially have interactions.
– The vital few will have preferred ranges to achieve optimal results.
– We need to confirm cause and effect relationships among factors identified in the Analyze Phase (e.g. regression).
Understanding the reason for an experiment can help in selecting the design and focusing the efforts of an experiment.
Reasons for experimenting are:
– Problem Solving (Improving a process response)
– Optimizing (Highest yield or lowest customer complaints)
– Robustness (Constant response time)
– Screening (Further screening of the critical few to the vital few X’s)
Design where you’re going - be sure you get there!
Designs of Experiments help the Belt to understand the cause and effect between the process output or outputs of interest and the vital few inputs. Some of these causes and effects may include the impact of interactions, often referred to as synergistic or cancelling effects.

Desired Results of Experiments

Designed experiments allow us to describe a mathematical relationship between the inputs and outputs. However, often the mathematical equation is not necessary or used, depending on the focus of the experiment.
Problem Solving
– Eliminate defective products or services.
– Reduce cycle time of handling transactional processes.
Optimizing
– Mathematical model is desired to move the process response.
– Opportunity to meet differing customer requirements (specifications or VOC).
Robust Design
– Provide consistent process or product performance.
– Desensitize the output response(s) to input variable changes including NOISE variables.
– Design processes knowing which input variables are difficult to maintain.
Screening
– Past process data is limited or statistical conclusions prevented good narrowing of critical factors in Analyze Phase.
When it rains it PORS!
DOE Models vs. Physical Models

Here we have models that are the result of designed experiments. Many have difficulty distinguishing DOE models from physical models. A physical model is built on biology, chemistry or physics, usually includes many variables and typically needs complex mathematics such as calculus to describe the process. The DOE model does not include every variable or any complex calculus: it includes only the most important variables and describes the variation of the data collected. DOE will focus on the specific region of interest.
What are the differences between DOE modeling and physical models?
– A Physical model is known by theory using concepts of physics, chemistry, biology, etc...
– Physical models explain outside area of immediate project needs and include more variables than typical DOE models.
– DOE describes only a small region of the experimental space.
The objective is to minimize the response. The physical model is not important for our business objective. The DOE Model will focus in the region of interest.

Definition for Design of Experiments
Design of Experiments (DOE) is a scientific method of planning and conducting an experiment that will yield the true cause-and-effect relationship between the X variables and the Y variables of interest.
DOE allows the experimenter to study the effect of many input variables that may influence the product or process simultaneously, as well as possible interaction effects (for example synergistic effects).
The end result of many experiments is to describe the results as a mathematical function.
y = f(x)
The goal of DOE is to find a design that will produce the information required at a minimum cost.
Properly designed DOE’s are more efficient experiments.
Design of Experiments shows the cause and effect relationship between the variables of interest, X and Y. The input variables for designed experiments were identified in the Analyze Phase; the experiments are then executed in the Improve Phase. DOE tightly controls the input variables and carefully monitors the uncontrollable variables.
One Factor at a Time is NOT a DOE
One Factor at a Time (OFAT) is an experimental style but not a planned experiment or DOE.
The graphic shows yield contours for a process that are unknown to the experimenter.

Trial  Press (psi)  Temp (C)  Yield
1      125          30        74
2      125          31        80
3      125          32        85
4      125          33        92
5      125          34        86
6      130          33        85
7      120          33        90

[Figure: contour plot of yield (contours at 75, 80, 85, 90 and 95, unknown to the experimenter) over Temperature (C), 30 to 35, and Pressure (psi), 120 to 135, with trials 1 to 7 marked. Labels mark the optimum identified with OFAT and the true optimum available with DOE.]
Let’s assume a Belt has found in the Analyze Phase that pressure and temperature impact his process and no one knows what yield is achieved for the possible temperature and pressure combinations.
If a Belt inefficiently did a One Factor at a Time experiment (referred to as OFAT), one variable would be selected to change first while the other variable is held constant. Once the desired result was observed, the first variable is set at that level and the second variable is changed. Basically, you pick the winner of the combinations tested.
The curves shown on the graph above represent constant process yield, as they would appear if the Belt knew the theoretical relationship between the variables and the process output. These contour lines are familiar if you’ve ever done hiking in the mountains and looked at an elevation map which shows contours of constant elevation. As a test we decided to increase temperature to achieve a higher yield. After achieving a maximum yield with temperature, we then decided to change the other factor, pressure. We then came to the conclusion the maximum yield is near 92% because it was the highest yield noted in our 7 trials.
With the Six Sigma methodology, we use DOE which would have found a higher yield using equations. Many sources state that OFAT experimentation is inefficient when compared with DOE methods. Some people call it hit or miss. Luck has a lot to do with results using OFAT methods.
Types of Experimental Designs
The most common types of DOE’s are:
– Fractional Factorials
  • 4-15 input variables
– Full Factorials
  • 2-5 input variables
– Response Surface Methods (RSM)
  • 2-4 input variables
[Figure: stacked design types: Response Surface, Full Factorial, Fractional Factorials.]
DOE is iterative in nature and may require more than one experiment at times. As we learn more about the important variables, our approach will change as well. If we have a very good understanding of our process maybe we will only need one experiment; if not, we very well may need a series of experiments.
Fractional Factorials or screening designs are used when the process or product knowledge is low. We may have a long list of possible input variables (often referred to as factors) and need to screen them down to a more reasonable or workable level.
Full Factorials are used when it is necessary to fully understand the effects of interactions and when there are between 2 and 5 input variables.
Response surface methods (not typically applicable) are used to optimize a response, typically when the response surface has significant curvature.
Value Chain
The general notation used to designate a full factorial design is given by:
2^k
• Where k is the number of input variables or factors.
  – 2 is the number of “levels” that will be used for each factor.
• Quantitative or qualitative factors can be used.
The general notation is 2 to the k, where k is the number of input variables or factors and 2 is the number of levels used for every factor. If the experiment called for 3 factors, each with 2 levels, it would be a 2 cubed design. The notation also gives the number of experimental runs, here 2 x 2 x 2 = 8. Two levels and four factors are shown at the bottom of our slide; using the notation, how many runs would be involved in this design? 16 is the answer, of course.
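The run count implied by the notation can be checked by listing the runs. The sketch below is an illustration only: it uses Python’s standard library, the function name `full_factorial` is hypothetical, and it orders the coded runs so that the first factor changes fastest, which matches the run tables used in this module.

```python
from itertools import product

def full_factorial(k):
    """Coded runs (-1 = low, +1 = high) of a 2^k design, first factor changing fastest."""
    # product() varies its last position fastest, so each tuple is reversed.
    return [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]

runs_3 = full_factorial(3)
for number, run in enumerate(runs_3, start=1):
    print(number, run)

print("runs in a 2^4 design:", len(full_factorial(4)))
```

For k = 3 this lists 8 runs, and for k = 4 it lists 16, in agreement with the answer above.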

Visualization of 2 Level Full Factorial
[Figure: square representing the 2^2 design, with corners (-1,-1), (+1,-1), (-1,+1) and (+1,+1). Temp runs from 300F to 350F on the horizontal axis and Press from 500 to 600 on the vertical axis.]
Coded levels for factors:
T   P   T*P
-1  -1  +1
+1  -1  -1
-1  +1  -1
+1  +1  +1
Four experimental runs (uncoded levels for factors):
• Temp = 300, Press = 500
• Temp = 350, Press = 500
• Temp = 300, Press = 600
• Temp = 350, Press = 600
Let’s consider a 2 squared design, which means we have 2 levels for 2 factors. The factors of interest are temperature and pressure. There are several ways to visualize this 2 level Full Factorial design. In experimenting we often use what’s called coded variables. Coding simplifies the notation. The low level for a factor is minus one, the high level is plus one. Coding is not very friendly when trying to run an experiment so we use uncoded or actual variable levels. In our example 300 degrees is the low level and 350 degrees is the high level for temperature.
Back when we had to calculate the effects of experiments by hand it was much simpler to use coded variables. Also when you look at the prediction equation generated you could easily tell which variable had the largest effect. Coding also helps us explain some of the math involved in DOE.
Fortunately for us, MINITABTM calculates the equations for both coded and uncoded data.
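Moving between coded and uncoded levels is a linear rescaling around the midpoint of the range. The sketch below shows one common way to write it, using the temperature (300 to 350) and pressure (500 to 600) levels from this example; the function names are illustrative.

```python
def to_coded(value, low, high):
    """Map an actual (uncoded) setting onto the -1 .. +1 coded scale."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return (value - center) / half_range

def to_uncoded(coded, low, high):
    """Map a coded level back to the actual (uncoded) setting."""
    return (high + low) / 2 + coded * (high - low) / 2

print(to_coded(300, 300, 350))   # low temperature
print(to_coded(350, 300, 350))   # high temperature
print(to_uncoded(-1, 500, 600))  # low pressure
print(to_uncoded(+1, 500, 600))  # high pressure
```

A setting halfway between low and high, such as a temperature of 325, maps to a coded value of 0.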
Graphical DOE Analysis - The Cube Plot
Consider a 2^3 design on a catapult...

Run Number  A: Start Angle  B: Stop Angle  C: Fulcrum  Response: Meters Traveled
1           -1              -1             -1          2.10
2           1               -1             -1          0.90
3           -1              1              -1          3.35
4           1               1              -1          1.50
5           -1              -1             1           5.15
6           1               -1             1           2.40
7           -1              1              1           8.20
8           1               1              1           4.55

[Figure: cube with Start Angle, Stop Angle and Fulcrum as its three axes and the eight responses from the table at its corners.]
What are the inputs being manipulated in this design?
How many runs are there in this experiment?
The representation here is a 2 cubed design, 2 levels of three factors, and shows a treatment combination table using coded input level settings. The table has 8 experimental runs. Run 5 shows the start angle and stop angle at the low level and the fulcrum at the high level.

Graphical DOE Analysis - The Cube Plot (cont.)
Stat>DOE>Factorial>Factorial Plots … Cube, select response and factors
This graph is used by the experimenter to visualize how the response data is distributed across the experimental space.
How do you read or interpret this plot?
[Figure: Cube Plot (fitted means) for Distance, from Catapult.mtw. The axes are Start Angle, Stop Angle and Fulcrum, each from -1 to 1, and the corners show the mean distances 2.10, 0.90, 3.35, 1.50, 5.15, 2.40, 8.20 and 4.55. A callout pointing at the corner values asks “What are these?”]
MINITABTM generates various plots; the cube plot is one. Open the MINITABTM worksheet “Catapult.mtw”.
This cube plot is a 2 cubed design for a catapult using three variables:
Start Angle
Stop Angle
Fulcrum
Here we used coded variable level settings so we do not know what the actual process settings were in uncoded units. The data means for the response distances are the boxes on the corners of the cube. If we set the stop angle high, start angle low and fulcrum high we would expect to launch a ball about 8.2 meters with the catapult.
Make sense?
Graphical DOE Analysis - The Main Effects Plot
This graph is used to see the relative effect of each factor on the output response.
Hint: Check the slope!
[Figure: Main Effects Plot (data means) for Distance. Three panels, Start Angle, Stop Angle and Fulcrum, each plotting Mean of Distance (2.0 to 5.0) at the -1 and 1 levels.]
Stat>DOE>Factorial>Factorial Plots … Main Effects, select response and factors
Which factor has the largest impact on the output?
The Main Effects Plots shown here display the effect that the input values have on the output response.
The y axis is the same for each of the plots so they can be compared side by side.
Which has the steepest slope? What has the largest impact on the output?
Answer: Fulcrum
Main Effects Plot’s Creation
Avg. distance at Low Setting of Start Angle: (2.10 + 3.35 + 5.15 + 8.20)/4 = 18.80/4 = 4.70
Avg. distance at High Setting of Start Angle: (0.90 + 1.50 + 2.40 + 4.55)/4 = 9.35/4 = 2.34
[Figure: Main Effects Plot (data means) for Distance, with panels for Start Angle, Stop Angle and Fulcrum at the -1 and 1 levels; the vertical axis (Dist) runs from 2.0 to 5.2.]

Run #  Start Angle  Stop Angle  Fulcrum  Distance
1      -1           -1          -1       2.10
2      1            -1          -1       0.90
3      -1           1           -1       3.35
4      1            1           -1       1.50
5      -1           -1          1        5.15
6      1            -1          1        2.40
7      -1           1           1        8.20
8      1            1           1        4.55

In order to create the Main Effects Plot we must be able to calculate the average response at the low and high levels for each Main Effect. The coded values are used to show which responses must be used to calculate the average.
Let’s review what is happening here. How many experimental runs were operated with the start angle at the high level, or 1? The answer is 4 experimental runs: run numbers 2, 4, 6 and 8. If we take the 4 distances, or process outputs, from those runs and average them, we see the average distance when the process had the start angle at the high level was 2.34 meters. The second dot from the left in the Main Effects Plot shows the distance of 2.34 with the start angle at the high level.
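The same averaging can be done for all three factors at once. The sketch below uses the eight catapult runs from the table above; the function name `level_means` and the definition of an effect as the high average minus the low average are illustrative choices.

```python
# (start angle, stop angle, fulcrum, distance) for runs 1 to 8, coded levels
runs = [
    (-1, -1, -1, 2.10), (1, -1, -1, 0.90), (-1, 1, -1, 3.35), (1, 1, -1, 1.50),
    (-1, -1, 1, 5.15), (1, -1, 1, 2.40), (-1, 1, 1, 8.20), (1, 1, 1, 4.55),
]
factors = ["Start Angle", "Stop Angle", "Fulcrum"]

def level_means(runs, index):
    """Average distance at the low (-1) and high (+1) level of one factor."""
    low = [run[3] for run in runs if run[index] == -1]
    high = [run[3] for run in runs if run[index] == 1]
    return sum(low) / len(low), sum(high) / len(high)

means = {name: level_means(runs, i) for i, name in enumerate(factors)}
effects = {name: high - low for name, (low, high) in means.items()}
largest = max(effects, key=lambda name: abs(effects[name]))

for name in factors:
    low, high = means[name]
    print(f"{name}: low {low:.2f}, high {high:.2f}, effect {effects[name]:+.2f}")
print("largest effect:", largest)
```

The two plotted points for each factor are its low and high averages, so the slope of each line in the Main Effects Plot is proportional to the effect printed here.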
Interaction Definition
Interactions occur when variables act together to impact the output of the process. Interaction plots are constructed by plotting both variables together on the same graph. They take the form of the graph below. Note that in this graph, the relationship between variable “A” and Y changes as the level of variable “B” changes. When “B” is at its high (+) level, variable “A” has almost no effect on Y. When “B” is at its low (-) level, A has a strong effect on Y. The feature of interactions is non-parallelism between the two lines.
[Figure: interaction plot of Output Y (Lower to Higher) against A (- to +), with one line for B- and one for B+. Callouts: “When B changes from low to high, the output drops dramatically.” and “When B changes from low to high, the output drops very little.”]

Degrees of Interaction Effect

[Figure: five interaction plots of Y (Low to High) against A (- to +), each with lines for B- and B+: Some Interaction, No Interaction, Full Reversal, Strong Interaction and Moderate Reversal.]
Degrees of interaction can be related to non-parallelism: the more non-parallel the lines are, the stronger the interaction.
A common misunderstanding is that the lines must actually cross each other for an interaction to exist but that’s NOT true. The lines may cross at some level OUTSIDE of the experimental region, but we really don’t know that. Parallel lines show absolutely no interaction and in all likelihood will never cross.
Interaction Plot Creation
[Figure: Interaction Plot (data means) for Distance. Mean distance (1.5 to 6.5) is plotted against Fulcrum (-1 and 1) with one line per Start Angle level (-1 and 1). A green arrow marks the point (0.90 + 1.50)/2 = 1.20 and a purple arrow marks the point (4.55 + 2.40)/2 = 3.48.]

Run #  Start Angle  Stop Angle  Fulcrum  Distance
1      -1           -1          -1       2.10
2      1            -1          -1       0.90
3      -1           1           -1       3.35
4      1            1           -1       1.50
5      -1           -1          1        5.15
6      1            -1          1        2.40
7      -1           1           1        8.20
8      1            1           1        4.55

Calculating the points to plot the interaction is not as straightforward as it was in the Main Effects Plot. Here we have four points to plot and, since there are only 8 data points, each average will be created using data points from 2 experimental runs. This plot is the interaction of Fulcrum with Start Angle on the distance. Starting with the point indicated with the green arrow above, we must find the response data when the fulcrum is set low and start angle is set high (notice the color coding MINITABTM uses in the upper right hand corner of the plot for the second factor). The point indicated with the purple arrow is where fulcrum is set high and start angle is high. Take a few moments to verify the remaining two points plotted.
Let’s review what is happening here. The dot indicated by the green arrow is the mean distance when the fulcrum is at the low level, as indicated by a -1, and the start angle is at the high level, as indicated by a 1. Experimental runs 2 and 4 had the process running at those conditions, so the distances from those two experimental runs are averaged and plotted at a value of 1.2 on the vertical axis. You can note the red dotted line shown is for when the start angle is at the high level as indicated by a 1.
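All four points of the Fulcrum by Start Angle interaction plot can be verified with a few lines of code. The sketch below again uses the eight catapult runs from the table; the function name `cell_mean` is illustrative.

```python
# (start angle, stop angle, fulcrum, distance) for runs 1 to 8, coded levels
runs = [
    (-1, -1, -1, 2.10), (1, -1, -1, 0.90), (-1, 1, -1, 3.35), (1, 1, -1, 1.50),
    (-1, -1, 1, 5.15), (1, -1, 1, 2.40), (-1, 1, 1, 8.20), (1, 1, 1, 4.55),
]

def cell_mean(runs, start_angle, fulcrum):
    """Mean distance over the runs at one Start Angle / Fulcrum combination."""
    values = [dist for start, _stop, fulc, dist in runs
              if start == start_angle and fulc == fulcrum]
    return sum(values) / len(values)

for start_angle in (-1, 1):
    low = cell_mean(runs, start_angle, -1)
    high = cell_mean(runs, start_angle, 1)
    print(f"Start Angle {start_angle:+d}: fulcrum low {low:.3f}, "
          f"fulcrum high {high:.3f}, rise {high - low:.3f}")
```

The rise from fulcrum low to fulcrum high is different at the two start angle levels, so the two lines are not parallel, which is the sign of an interaction described earlier.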
Graphical DOE Analysis - The Interaction Plots
Stat>DOE>Factorial>Factorial Plots … Interactions, select response and factors
When you select more than two variables, MINITABTM generates an Interaction Plot Matrix which allows you to look at interactions simultaneously. The plot at the upper right shows the effects of Start Angle on Y at the two different levels of Fulcrum. The red line shows the effects of Fulcrum on Y when Start Angle is at its high level. The black line represents the effects of Fulcrum on Y when Start Angle is at its low level.
[Figure: Interaction Plot (data means) for Distance. Matrix of panels for Start Angle, Stop Angle and Fulcrum, each at levels -1 and 1.]
Based on how many factors you select, MINITABTM will create a number of interaction plots. Here there are 3 factors selected so it generates the 3 interaction plots. These are referred to as 2-way interactions.
MINITABTM will also plot the mirror images, just in case it is easier to interpret with the variables flipped. If you care to create the mirror image of the interaction plots, click on “Options” while creating the interaction plots and check the box for “Draw full interaction plot matrix”. These mirror images present the same data but visually may be easier to understand.
Stat>DOE>Factorial>Factorial Plots … Interactions, select response and factors
The plots at the lower left in the graph above (outlined in blue) are the “mirror image” plots of those in the upper right. It is often useful to look at each interaction in both representations.
[Figure: full Interaction Plot matrix (data means) for Distance, with panels for Start Angle, Stop Angle and Fulcrum at levels -1 and 1 and the lower left mirror-image plots included.]
Choose this option for the additional plots.

DOE Methodology
1. Define the practical problem
2. Establish the experimental objective
3. Select the output (response) variables
4. Select the input (independent) variables
5. Choose the levels for the input variables
6. Select the experimental design
7. Execute the experiment and collect data
8. Analyze the data from the designed experiment and draw statistical conclusions
9. Draw practical solutions
10. Replicate or validate the experimental results
11. Implement solutions
Generate Full Factorial Designs in MINITABTM
It is easy to generate full factorial designs in MINITABTM. Follow the command path shown here. These are the designs that MINITABTM will create. They are color coded using red, yellow and green. Green are the “go” designs, yellow are the “use caution” designs and red are the “stop, wait and think” designs. The colors have a meaning similar to that of street lights.

479
Create Three Factor Full Factorial Design
Stat>DOE>Factorial>Create Factorial Design
Let’s create a three factor full factorial design using the MINITAB™ command shown at the top of the graphic above. The design we selected will give us all possible experimental combinations of 3 factors using 2 levels for each factor.
Be sure to change the number of factors in the upper left to “3”. Also be sure to click on the “Full factorial” line within the Designs box shown in the lower right of the graphic.
In the “Options” box of the upper left MINITAB™ display, one can change the order of the experimental runs. To view the design in standard order (not randomized for now), be sure to uncheck the default of “Randomize runs” in the “Options” tab. “Un-checking” means no checkmark is in the white box next to “Randomize runs”.
Create Three Factor Full Factorial Design (cont.)

Enter the names of the three factors as well as the numbers for the levels shown in the lower right portion of this graphic. To reach this display, click on “Factors…” in the upper left hand display. Remember when we discussed uncoded levels? The process settings of 140 and 180 for the start angle are examples of uncoded levels.
Three Factor Full Factorial Design

Here is the worksheet MINITAB™ creates. If you had left the randomize runs selection checked in the Options box, your design would be in a different order than shown. Notice the structure of the last 3 columns where the factors are shown. The first factor, start angle, alternates from low to high as you read down the column. The second factor, stop angle, has 2 low then 2 high all the way down the column and the third factor, fulcrum, has 4 low then 4 high. Notice the structure just keeps doubling the pattern. If we had created a 4 factor full factorial design, the fourth factor column would have had 8 rows at the low setting then 8 rows at the high setting. You can see it is very easy to create a full factorial design. This standard order, as we call it, is not however the recommended order in which an experiment should be run. We will discuss this in detail as we continue through the modules.
One warning to you as a Belt new to using MINITAB™: never copy, paste, delete or move columns within the first 7 columns or MINITAB™ may not recognize the design you are attempting to use.
Is our experiment done? Not at all. The process must now be run at the 8 sets of experimental conditions shown above and the output or outputs of interest must be recorded in columns to the right of the first 7 columns shown. After we have collected the data we will then analyze the experiment. Remember the 11 Step DOE methodology from earlier?
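The doubling pattern of the standard order described above can be sketched in a few lines of Python. This is only an illustration outside MINITAB: the function name `full_factorial` is hypothetical, and coded levels of -1 and +1 are assumed in place of the uncoded process settings.

```python
from itertools import product

def full_factorial(factor_names):
    """Return the runs of a 2-level full factorial in standard order (coded -1/+1)."""
    k = len(factor_names)
    runs = []
    # product() varies the LAST position fastest; reversing each tuple makes
    # the FIRST factor alternate fastest, as in the standard-order worksheet.
    for levels in product((-1, 1), repeat=k):
        runs.append(dict(zip(factor_names, reversed(levels))))
    return runs

design = full_factorial(["Start Angle", "Stop Angle", "Fulcrum"])
for std_order, run in enumerate(design, start=1):
    print(std_order, run)
```

Reading down the printed columns, the first factor alternates every run, the second changes every 2 runs and the third every 4 runs.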

Determine the reason for experimenting
Describe the difference between a physical model and a DOE model
Explain an OFAT experiment and its primary weakness
Given Main Effects Plots and Interaction Plots, determine which effects and interactions may be significant
Create a Full Factorial Design
You have now completed Improve Phase – Designing Experiments.
Lean Six Sigma

Black Belt Training
Improve Phase
Experimental Methods
Now we will continue with the Improve Phase “Experimental Methods”.

Within this module we will go through a basic introduction to Designing Experiments.
Welcome to Improve
Process Modeling: Regression
Advanced Process Modeling: MLR
Designing Experiments
Experimental Methods
  Methodology
  Considerations
  Steps
Full Factorial Experiments
Fractional Factorial Experiments
Wrap Up & Action Items
DOE Methodology
In this module we will describe the 11 step DOE methodology, some basic concepts and lots of fun and exciting terminology. Once again, great content for dinner conversation later tonight!
1. Define the Practical Problem
2. Establish the Experimental Objective
3. Select the Output (response) Variables
4. Select the Input (independent) Variables
5. Choose the Levels for the Input Variables
6. Select the Experimental Design
7. Execute the Experiment and Collect Data
8. Analyze the Data from the Designed Experiment and draw Statistical Conclusions
9. Draw Practical Solutions
10. Replicate or Validate the Experimental Results
11. Implement Solutions
Questions to Design Selection
Project Management Considerations
What is the process environment:

1. How much access to the process?
2. Are the team members and any subject matter experts fully involved?
3. Who are the process owners and stakeholders?
4. Are the process owners involved?
5. Do the process owners know what a DOE is?
6. Do the process owners know what the DOE means to them?
7. How many runs can you afford (time and money)?
8. Will you run the DOE at the process or in a lab?
9. What noise variables need to be designed around?
10. How large of an experimental region will be explored for the DOE?
So you’ve decided to use Designed Experiments. Shown here are 10 basic project management
considerations before running any experiment. This is obviously not an exhaustive list, but certainly
some important questions to consider and answer.
What is behind some of these questions? Let’s briefly discuss a few aspects individually.
1. Access to a process is necessary for proper monitoring and execution of a project. If access is restricted for whatever reason, then a workaround must exist.
2. If the team members or subject matter experts aren’t fully involved, then potential conflicts or unrealistic designs may be awaiting you, resulting in a poor experiment.
3. If the Process Owners and stakeholders are unknown to you before execution of an experiment, rude awakenings such as cancellations, scheduling conflicts and other nightmares can occur.
4. No one wants to be told what will happen to the process they are managing, so if you don’t involve them in the experimental design, even if it only involves reviewing the team’s designed experiment, how do you expect cooperation?
5. If the Process Owners don’t understand what your DOE is, how can they assist you?
6. Does your DOE intend to make a wide range of quality product or potentially produce an unacceptable product in the quest to improve the process? If the Process Owner has never known what your DOE intentions were, how can they not be upset if they are surprised by the results of the DOE?
7. Time and money impact scheduling, randomization and testing concerns. All of these must be considered, especially when using the actual process.
8. It is often desirable to run DOE’s in a pilot plant or facility but this is not often the case. If a pilot facility is to be used, do the results match the process when translated outside of the laboratory?
9. Noise variables cannot be controlled, by definition, but if ambient weather is considered to have an effect on your process, why would you execute an experiment when a cold or warm front is passing through your area? This is one example of a known disturbance being designed around.
10. Manage your project to know if the DOE is intended to stretch the boundaries of conceived product creation or work well within a small experimental area.
There are many considerations to weigh. Often learning comes through experience, so if you are unsure about your future experiment in this project or another, consult with mentors or Six Sigma Belts.
Questions to Design Selection (cont.)
Technical Considerations
What are the objectives/goals for the experiment:
1. What factors are important? (narrowed from Analyze Phase)
2. What is the operating range for each factor?
3. How can I minimize both the cost of DOE and the cost of running the process?
4. How much change in the process do we require?
5. How close to optimal does the process currently run?
6. Are we tackling a centering or variation problem?
7. What is the impact to the process while running the DOE?
8. What is the cost of competing DOE designs?
9. What do you know about the process interactions?
[Figure: process distribution relative to the USL, with markers from 1 Sigma to 6 Sigma.]
These technical considerations need to be answered before running an experiment. Making sense of all of them at present is not necessary.
DOE Methodology Step 1
First define the problem in a practical sense. Will one experiment achieve all that is necessary? In certain circumstances it takes multiple experiments. Notice an example of this shown here.
1. Define the Practical Problem
• Write down how the experiment connects with the original project scope. Practically speaking, what is this experiment supposed to accomplish?
1. Identify Root Cause
2. Measure Variation
3. Measure Output Response
• Have the measurement systems been verified for the Input Variables and Output Response?
A circuit board manufacturer wanted to identify what factors impact the adhesion level between circuit boards. The factors and output had satisfactory gage R&R results of less than 15% study variation.

In Step 2 we have to determine the critical characteristic and the desired outcome.
2. Establish the Experimental Objective
• Objective must include the critical characteristics and the desired outcome.
– If the experiment and project is tackling recurring issues, consider a different critical characteristic.
• The characteristic may require a different physical phenomenon being measured or a differing measurement system.
• The measurement system precision and accuracy may influence the specific output to be measured.
• Identify the desired experimental outcome.
1. Eliminate Root Cause
2. Reduce Variation
3. Achieve a target
4. Maximize Output Response
5. Minimize Output Response
6. Robust process or product
In Step 3, knowing that a DOE is going to be performed, does it make sense to go the extra mile? Let’s get our money’s worth by measuring more than one output if it could benefit us in any way.
• Is the output(s) qualitative or quantitative?
• What were the past Response Variable’s baseline results?
• Is the output(s) typically under statistical control?
• Does the output(s) vary with time?
• How much change in the output(s) do you want to detect?
• Is the measurement system adequate with the same units of measure as identified in Step 1?
– For experimental reasons, this measurement may be different than your past outputs considered.
• How many outputs?
The output is tackiness and is measured in Newtons (force). The output measurement must be done within an hour of production and the measurement system has not changed. We want to detect at least a change in tackiness of 15 Newtons in the Response Variable.

Step 4 is to select the input or independent variables. At this point you should have a decent understanding of the variables that need to be explored as a result of the work accomplished in the previous phases.
4. Select the Input (independent) Variables
• Use the Analyze Phase and subject matter experts to select these factors.
• All factors must be independent of each other.
• Consider past results from previous experiments.
• Test the most likely candidates first.
• Factors not included in the designed experiment should be held constant and recorded.
• Noise or uncontrollable factors (typically environmental conditions) should be monitored and the experimental design may be impacted (see Step 6).
The inputs selected by the team following the Six Sigma methodology are dwell time (sec), temperature of solution (deg F) and concentration of solution (% solids). Noise factors of ambient temperature and humidity were recorded and monitored.
Step 5 is to choose the levels for the input variables. The factor levels must be considered to create the desired change in the output response as identified in Step 3. Poor choices for input variable level settings could very well render an experiment useless, so be smart.
5. Choose the Levels for the Input Variables
• Factor levels must be considered to create the desired change in Output Response identified in Step 3.
• Do NOT create unsafe conditions or go beyond the feasibility of the process.
– This does NOT mean constraining Input Variable levels to the current process range.
– Be wary if operating near the extremes or operating limits.
• Realize some experimental runs may produce unacceptable product or process results. These results must be weighed against the risk of future production.
• Even when designing your experiment with coded levels for the factors, the team MUST be aware of what the levels mean in the process language.
• Factor levels can be impacted by the Experimental Objective in Step 2.
– Screening experiments have wider settings for factors
– Full Factorials have narrower settings than screening experiments
– Response surface Designed Experiments have quite narrow settings
DOE Methodology Step 5 (cont.)
Do not set the levels too wide; this may cause the experiment to miss a valuable change in the output response. Sketching what you expect the response to look like helps a great deal.
5. Choose the Levels for the Input Variables
• Setting the factor levels too wide may cause the experiment to miss an important region or change in the Output Response.
[Figure: Output Response versus Factor Settings with the “-” and “+” levels set too wide. Results of the experiment show no significant difference in settings.]
Be aware you do not want to set the factor levels too narrow either. We could be shown no difference in the output to input relationship.
5. Choose the Levels for the Input Variables
• Setting the factor levels too narrow will show no difference in the output or not give enough statistical confidence in the effect of the factor on the output relative to the noise in the experiment.
[Figure: Output Response versus Factor Settings with the “-” and “+” levels set too narrow.]
Input variable level settings should be set far enough apart to detect a difference in the response and to have enough statistical confidence in the change of the output relative to the experimental noise.
5. Choose the Levels for the Input Variables
• Should be set far enough apart to detect a difference in the response and to have enough statistical confidence in the change of the output relative to the experimental noise.
[Figure: Output Response versus Factor Settings, with the “-” and “+” levels marked on the curve.]
The experiment is using coded levels:
Dwell time: +1 (20 sec); -1 (10 sec)
Temp of sol’n: +1 (80 deg F); -1 (100 deg F)
Conc. of sol’n: +1 (40%); -1 (20%)
Assume this graphic was a sketch generated from our basic understanding of the theory. We don’t know exactly what factor setting would produce the output response but we do know the general shape of the curve. Notice that we stayed away from the sharp peak. It is very easy to slide off such a steep peak; unless your process controls are very tight it would be better to find the nice robust region where the output response is high but flat, meaning that the factor settings can change a bit without much effect on the output response. If the concern about spending too much time on this comes up, consider how many defects are taken in when the statistical significance is deemed inadequate.
You might think we have spent too much time on just setting the levels for the input variables or factors in your experiment. However, consider the lessons of others who have had to go back to their Process Owners or Champions and explain that no factors were deemed statistically significant because the design was inadequate.
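To keep the team aware of what coded levels mean in process language, the conversion between coded and uncoded settings can be written out. The sketch below is a minimal illustration using the dwell time (10 and 20 sec) and concentration (20% and 40%) levels from the example; the function names `to_coded` and `to_uncoded` are hypothetical.

```python
def to_coded(value, low, high):
    """Map an uncoded process setting onto the coded scale (low -> -1, high -> +1)."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return (value - center) / half_range

def to_uncoded(coded, low, high):
    """Map a coded level back to the process setting it represents."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return center + coded * half_range

# Dwell time levels from the example: -1 is 10 sec, +1 is 20 sec.
print(to_coded(20, low=10, high=20))    # high level
print(to_coded(15, low=10, high=20))    # midpoint of the range
# Concentration levels from the example: -1 is 20%, +1 is 40%.
print(to_uncoded(-1, low=20, high=40))  # low level in % solids
```

The midpoint of each range maps to a coded value of 0, which is the setting used for Center Points in later modules.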
Step 6 is to select the experimental design. In the green where it says full, we have full factorials; the rest of the factorials we will discuss later. Here we are selecting the Experimental Design.
• Factorial Design (full vs. fractional)
– Full designs typically have 5 or fewer factors
• All interactions can be estimated
– Screening or Fractional Factorial designs have many factors
• Not all interactions can be estimated

Step 6 involves selecting the Experimental Design. DOE’s can be designed in many ways but balanced and orthogonal designs are highly encouraged. MINITAB™ will always design a balanced and orthogonal design if you use the program to design your experiment.
Balanced and orthogonal designs are highly encouraged and the definition of balanced and orthogonal is covered in a later module.
Center Points are used for investigating curvature and advanced designs. Center Points are covered in a later module.
Blocking can be used to account for noise variables and is covered in a later module.
[Cartoon caption: “I’m keeping out the Noise, coach!!”]
Remember our advice that subject matter experts along with your team members should pay attention to their experience and the previously gathered and analyzed data. If curvature is suspected, center points are used to confirm if curvature exists within the experimental region.
Remembering that noise variables can’t be controlled but can be managed around, blocking is a technique for managing your experiment around noise variables considered of importance. Remember, you are interested in understanding the effects and interactions of your controlled variables so you want statistical confidence.
Randomization has an impact on your statistical confidence because your experimental noise is spread across the runs.
What would happen if another unknown significant variable changed halfway through our experiment? It is possible that an unknown significant variable such as machine warm up would get confused with the C variable because, without randomization, all the low levels would be generated first and then all the high levels.
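The risk described here is easy to see in a small sketch. In standard order the third factor (C) is low for the whole first half of the experiment, so anything that drifts over time lines up with it; shuffling the run order breaks that alignment. The code below is an illustration only, and the fixed seed is an assumption made purely so the shuffle is repeatable.

```python
import random
from itertools import product

# 2^3 design in standard order as (A, B, C) tuples of coded levels.
standard_order = [tuple(reversed(levels)) for levels in product((-1, 1), repeat=3)]

# Without randomization, C is at its low level for the entire first half.
print([run[2] for run in standard_order])

rng = random.Random(2024)      # hypothetical seed, chosen only for repeatability
run_order = standard_order[:]  # copy so the standard order is preserved
rng.shuffle(run_order)

for run_number, run in enumerate(run_order, start=1):
    print(run_number, run)
```

The same 8 treatment combinations are run either way; only the sequence in which they are executed changes.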

Determining sample size is very similar to what we did in the Analyze Phase. There are a few distinctions. Many of the values are self-explanatory. As in the Analyze Phase, we are typically solving for the number of replicates, but you can work the numbers backwards as we did before and estimate how big an effect could be detected.
Sample size must be determined.
[Figure: MINITAB™ power and sample size dialog with callouts: “Determined by Step 4. For full factorials, this equals 2^factors”; “Specified in Step 2”; “See first slide of Step 6”; “Typically 0.9”; “σ of process output variable”.]
After the number of replicates is determined, we must decide the sampling strategy.
“Number of corner points” is the number of experimental runs in the base design before any replication or center points are added.
“Effects” is the same as delta in the Analyze Phase: how much of a difference do you need to detect?
You have the choice of using real values or simply estimating in terms of Standard Deviations. If you use an estimate in Standard Deviations, then the Standard Deviation should be 1.0.
Here we have a 2 cubed design which gives us 8 corner points and have used an effect of 2 Standard Deviations to determine the sample size. MINITAB™ then shows us that we need to have 2 Reps.
A sample size of 2 is indicated for the example shown. What does this mean?
Power and Sample Size
2-Level Factorial Design
Alpha = 0.05  Assumed standard deviation = 1
Factors: 3  Base Design: 3, 8
Blocks: none

Center                Total  Target
Points  Effect  Reps  Runs   Power   Actual Power
     0       2     2    16     0.9       0.936743

WHAT THE HECK IS A REP??
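The actual power in the output can be roughly reproduced by hand. The sketch below is not MINITAB's algorithm; it uses a textbook normal approximation to the noncentral t distribution and assumes the full model is fitted, so that the 16 runs leave 8 degrees of freedom for error. The critical value 2.306 is the two-sided 5% t value for 8 degrees of freedom, and the function name `approx_power` is hypothetical.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def approx_power(effect, sigma, total_runs, error_df, t_crit):
    """Approximate power to detect an effect in a 2-level factorial design."""
    # An effect is the difference between two averages of total_runs/2 runs,
    # so its standard error is 2*sigma/sqrt(total_runs).
    delta = effect / (2.0 * sigma / sqrt(total_runs))
    # Normal approximation to the upper tail of the noncentral t distribution
    # (the negligible lower tail is ignored).
    numerator = t_crit * (1.0 - 1.0 / (4.0 * error_df)) - delta
    denominator = sqrt(1.0 + t_crit ** 2 / (2.0 * error_df))
    return 1.0 - normal_cdf(numerator / denominator)

# Effect of 2 standard deviations, 2 reps of a 2^3 design (16 runs).
power = approx_power(effect=2.0, sigma=1.0, total_runs=16, error_df=8, t_crit=2.306)
print(round(power, 3))
```

Under these assumptions the approximation lands close to the actual power reported in the output, comfortably above the 0.9 target.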


A rep is a replication. A replication is an independent observation of the run that represents variation from experimental run to experimental run. A replication is NOT a duplicate or a repeat. Look at the two designs shown here. The first is a single replicate design, which means there is only one value for each unique experimental run. The terminology is a bit confusing, but don’t worry.
Replication of an experimental run is an independent observation of the run that represents variation from experimental run to experimental run.
– A replicate must be made at a unique time or sequence in the experiment.
[Figure: worksheets for a Single Replicate Design and a Replicated Design (2).]
The replicated design has double the runs. The design is fully randomized whenever possible so this is not the order in which it is run.
Notice how experimental runs #1 and #9 have the three factors, start angle, stop angle and fulcrum, running with the same combination of levels; experimental run #9 is a replicate of run #1.
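The relationship between run #1 and run #9 can be checked with a short sketch. It builds the 8-run base design in standard order with coded levels (an assumption; the worksheet shows the actual settings) and stacks 2 replicates of it.

```python
from itertools import product

# Base 2^3 design in standard order: (start angle, stop angle, fulcrum).
base = [tuple(reversed(levels)) for levels in product((-1, 1), repeat=3)]

replicates = 2
replicated = base * replicates  # runs 1-8 followed by runs 9-16

print(len(replicated))               # total number of runs
print(replicated[0], replicated[8])  # run #1 and run #9 (lists are 0-indexed)
```

Each treatment combination appears once per replicate, which is why the replicated design has double the runs of the single replicate design.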
Additional considerations are required when determining what a sample size means.
For the experimental results to be representative of the process, sample across the largest family of variation.
– It is also necessary to determine how to define a representative sample and experimental unit.
• Characteristics of a representative sample are:
– Repeatable measurement and represents natural variation of the process.
• An experimental unit is the basic unit to which an experimental run can be applied and includes all the qualities of a representative sample.
Recall from the Analyze Phase the Multi-Vari tool described the three families of variation. Consider these families of variation to determine how to sample with replication for an experiment.
– Within Unit or Positional
• Within piece variation related to the geometry of the part.
• Variation across a single unit containing many individual parts such as a wafer containing many computer processors.
• Location in a batch process such as plating.
– Between Unit or Cyclical
• Variation among consecutive pieces.
• Variation among groups of pieces.
• Variation among consecutive batches.
– Temporal or Over Time
• Shift-to-Shift
• Day-to-Day
• Week-to-Week

Step 7 is to Execute the Experiment and Collect Data.
7. Execute the Experiment and Collect Data
• Discuss the experimental scope, time and cost with the process owners prior to the experiment.
• Some team members must be present during the entire experiment.
• After the experiment has started, are you getting output responses you expected?
– If not, quickly evaluate for Noise or other factors and consider stopping or canceling the experiment.
• Use a log book to make notes of observations, other factor settings, etc.
• Communicate with the operators, technicians and staff about the experimental details and why the experiment is being conducted before running the experiment.
– This communication can prevent “helping” by the operators, technicians, etc. that might damage your experimental design.
• Alert the laboratory or quality technicians if your experiment will increase the number of samples arriving during the experiment.

Step 8 is to Analyze the data from the Designed Experiment and draw Statistical Conclusions.
8. Analyze the Data from the Designed Experiment and draw Statistical Conclusions
• Graphical Analysis has already been covered in the previous modules.
• Further analysis of “reducing” the model to the significant terms will be covered in the next module.
• The final model fitting will occur.
• Terms in the final DOE equation will have the statistical confidence you needed.
• Diagnose the residuals similarly to that of Regression Analysis.
• Details of this step are covered in the next module.

Step 9 is to Draw Practical Solutions.
• This will be covered in detail in the next module.

• Even if terms or factors are statistically significant, for practical
significance the term might be removed.
• “ Stat>DOE>Factorial>Response Optimizer” will help the project team
find where the vital few factors need to be targeted to achieve the
desired output response.
– This will be covered in detail in the next module.
• This step is how the project team determines the project’s potential
success.
• Immediately share the results with the process owner for feedback on
implementation of the experimental results.


Step 10 is to Replicate or Validate the Experimental Results.
10. Replicate or Validate the Experimental Results
• After finding the Practical Results from Step 9, verify the results:
– Set the factors at the Practical Results found with Step 9 and see if the process output responds as expected. This verification replicates the result of the experiment.
– Do not forget your model has some error.

And the final step is to Implement Solutions. We spend so much time with the 11 step methodology for a couple of reasons. One, it is easy to get confused or excited about running a Designed Experiment. Two, experiments are easy to design with the help of MINITAB™ but difficult to execute appropriately and achieve statistical results unless you follow a planning approach as we have discussed here. Overall there is a lot that can be overlooked or not done properly. Take your time and follow this process; it will give you better results.
11. Implement Solutions
• If the objective of the experiment was accomplished and the Business Case is satisfied, then proceed to the Control Plan which is covered in the Control Phase.
• Do not just run experiments and not implement the solutions.
• Further experiments may need to be designed to further change the output to satisfy the Business Case.
– This possible need for another experiment is why we stated in earlier modules that DOE’s can be an iterative process.
You will probably not fully appreciate all the comments in the modules of this phase until you have designed, managed, executed and analyzed a few real life experiments for yourself.
Be able to Design, Conduct and Analyze an Experiment
You have now completed Improve Phase – Experimental Methods.
Lean Six Sigma

Black Belt Training
Improve Phase
Full Factorial Experiments
Now we will continue in the Improve Phase with “Full Factorials”.

In this module we will discuss the Full Factorial in detail.
Welcome to Improve
Process Modeling: Regression
Advanced Process Modeling: MLR
Designing Experiments
Experimental Methods
Full Factorial Experiments
  Mathematical Models
  Balance and Orthogonality
  Fit and Diagnose Model
  Center Points
Fractional Factorial Experiments
Wrap Up & Action Items
Why Use Full Factorial Designs
Two level Full Factorial designs are the most powerful and efficient set of experiments.
2^k Full Factorial designs are used to:
• Investigate multiple factors at only two levels, requiring fewer runs than multi-level designs.
• Investigate a large number of factors simultaneously in relatively few runs.
• Provide insight into potential interactions.
• Frequently used in industrial DOE applications because of simplicity and ease of analysis.
• Obtain a mathematical relationship between X’s and Y’s.
• Determine a numerical, mathematical relationship to identify the most important or critical factors in the experiments.
Full Factorial designs are used when:
• There are five or fewer factors.
• You know the critical factors and need to explain interactions.
• Optimizing processes.
Mathematical Output of Experiments
• The end result of a DOE is a mathematical function to describe the results of the experiment.
• For the 2^k Factorial designs this module discusses, linear relationships are covered.
• All models will have some error as shown by the ε in the below equation.
• The mathematical equation below is the prediction from the experimental data. Notice there is no error term in this form.
• Ŷ is the predicted output response as a function of the input variables used in the experiment.
This may look similar to regression, but the important difference is that DOE is considered true cause and effect because of the controlled nature of experimentation. This is an important tool in manufacturing environments.
The only difference between the model equation and the prediction equation shown is that the prediction equation is simplified for describing the data gathered in the experiment and using it to predict future events. Just because you end up with a prediction equation in an experiment does not mean it is a good predictive model. We will discuss this further when we introduce Center Points.
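The slide equations did not survive in this text, so the following is only a sketch of the general form such equations usually take, written for two factors. The symbols $\beta$ for the true coefficients and $b$ for their estimates are assumed notation, not necessarily the notation of the original slide.

```latex
% Model equation: includes the error term \varepsilon
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon

% Prediction equation: estimated coefficients, no error term
\hat{Y} = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2
```

The two forms have the same terms; the prediction equation replaces the unknown coefficients with the estimates from the experimental data and drops $\varepsilon$.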

Linear Mathematical Model
The linear model is sufficient for most industrial experimental objectives.
The linear model can explain response planes and twisted response surfaces because of interactions.
– The following is a linear prediction model used in two-level full or fractional factorials.
[Figure: two surface plots of % Reacted, one against Cn and Ct and one against Cn and T.]
Linear Models are usually sufficient for most industrial experimental objectives. This goes back to the difference between a physical model and a DOE model. Just because we know by theory that the model should not be linear, it may express itself as sufficiently Linear in the particular design space.
People can get confused between the concept of curvature and twisted response planes. We do not have enough information (not enough levels for each variable) to describe true curvature. Take a piece of paper which will represent 2 input variables. Lift opposite corners. That is a graphical representation of an interaction. The response plane (paper) is twisted. Now lift up the paper to eye level and rotate until the projection looks like a curved line. We are simply looking at the projection of the twisted plane with Linear Models. There may be true curvature in the real world; we simply can’t describe it with a Linear Model.
HOWEVER, in most manufacturing processes the Linear Model is very powerful because of the constrained design space. Draw a box on the paper and hold it up by two opposite corners. Depending on how much twist you give the paper and how big the box is, you will either see a curve or not in the defined space.
The surface plot on the left has no significant interaction, but both Main Effects are significant. The surface plot on the right shows a significant interaction with T and Cn.
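The paper analogy can be checked numerically. The sketch below evaluates a linear prediction model with an interaction term; the coefficient values are hypothetical and chosen only for illustration. Along an edge of the design space the response changes by equal steps (a straight line), while along the diagonal the steps are unequal, which is the curved-looking projection of the twisted plane.

```python
def predict(x1, x2, b0=55.0, b1=3.0, b2=2.0, b12=4.0):
    """Linear prediction model with an interaction term (hypothetical coefficients)."""
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2

# Along the edge x2 = -1 the response is a straight line in x1.
edge = [predict(x1, -1) for x1 in (-1, 0, 1)]
print(edge, [edge[1] - edge[0], edge[2] - edge[1]])

# Along the diagonal x1 = x2 the interaction makes the steps unequal.
diagonal = [predict(t, t) for t in (-1, 0, 1)]
print(diagonal, [diagonal[1] - diagonal[0], diagonal[2] - diagonal[1]])
```

Setting `b12=0.0` removes the twist, and the diagonal becomes a straight line as well.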

Quadratic Mathematical Model

True curvature can be described using the Quadratic Model. The squared term in the model gives us the ability to describe true curvature. With the ability of describing curvature comes a cost: the experiment gets much bigger. Central composite designs are an example of a Quadratic Model. Here is a surface plot of true curvature in a Quadratic Model. This shape is referred to as a saddle for obvious reasons.
Quadratic Models can be obtained with designs not described in this module.
Quadratic Models explain curvature, maximums, minimums and twisted maximums and minimums when interactions are active.
– The following is the quadratic prediction model used in some response surface models not covered in this training.
– The simpler 2^k models do not include enough information to generate the Quadratic Model.
[Figure: surface plot of C6 against A and B showing a saddle shape.]
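The quadratic prediction equation itself is missing from this text. As a sketch of its usual general form for two factors, with $b$ as assumed notation for the estimated coefficients:

```latex
\hat{Y} = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2 + b_{11} x_1^2 + b_{22} x_2^2
```

The squared terms $b_{11} x_1^2$ and $b_{22} x_2^2$ are what allow the model to describe true curvature. Estimating them needs at least three levels of each factor, which is why a two-level design cannot generate this model.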
Nomenclature for Factorial Experiment

The nomenclature for 2-level designs is 2 to the k (2^k). If you had an experiment with 3 factors it
would be a 2 cubed (2^3) design. If you simply do the math, the result is the number of experimental
runs in the basic design.
2-level designs are most commonly used:
– 2^k, where k is the number of factors.
– The total number of runs in the design is equal to the result of the math.
• Example: 3 factors
• 2^3 = 8 runs
Other designs have more levels in the factorial designs.
– An example is a 3^4 factorial design with 4 factors at 3 levels for each factor.
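The run-count arithmetic above can be sketched in a few lines of Python. This is only an illustration; the function name `basic_runs` is made up for this example and is not part of MINITAB.

```python
def basic_runs(levels: int, factors: int) -> int:
    """Runs in the basic full factorial design: levels raised to the number of factors."""
    return levels ** factors


# 2-level design with 3 factors (2 cubed)
print("2-level, 3 factors:", basic_runs(2, 3))
# 3-level design with 4 factors
print("3-level, 4 factors:", basic_runs(3, 4))
```

Notice how quickly the run count grows once you add levels, which is one reason 2-level designs are the most commonly used.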

Treatment Combinations
Treatment combinations, or experimental runs, show how to set the levels for each of the factors.
Minuses and plusses can be used to indicate low and high factor level settings; center points are
indicated with zeros.
If the process is evaluated with combinations of the temperature set at 10 and 20 degrees and
pressure at 50 and 100 psi, an example of an experimental run or treatment combination would be
20 degrees and 50 psi.
– This 2^2 design shown below has 2 factors at 2 levels.
– A total of 4 treatment combinations are in this experiment.

                  Temperature
                  10     20
Pressure    50    1      2
            100   3      4

The treatment combination for run number 2 is: Temperature at 20 deg and Pressure at 50 psi.
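As a rough sketch, the four treatment combinations of this temperature and pressure example can be listed with Python's `itertools.product`. The variable names are assumptions chosen for this illustration.

```python
from itertools import product

temperature = [10, 20]   # degrees (low, high)
pressure = [50, 100]     # psi (low, high)

# product() varies its last argument fastest, so pressure is passed first
# to make temperature change fastest, matching the run numbering in the table.
runs = [(t, p) for p, t in product(pressure, temperature)]

for number, (t, p) in enumerate(runs, start=1):
    print(f"Run {number}: Temperature {t} deg, Pressure {p} psi")
```

Run number 2 in this listing is Temperature at 20 degrees and Pressure at 50 psi, the same combination called out in the table.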
Standard Order of 2 Level Designs

Dr. Frank Yates created this standard order to aid in calculating each effect by hand. Thank
goodness we no longer have to perform hand calculations. It is common to draw a cube for a
2 cubed design as shown.
The design matrices for 2^k factorials are shown in standard order (not randomized).
– The low level is indicated by a “-” and the high level by a “+”.
– This order is commonly referred to as Yates standard order, after Dr. Frank Yates.
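For readers who like to see the pattern, here is a minimal sketch that generates a 2^k design matrix in standard order. The function name `standard_order` is hypothetical; MINITAB builds this table for you.

```python
def standard_order(k: int) -> list[list[int]]:
    """Return the 2^k design matrix in standard (Yates) order, coded -1 / +1.

    The first factor alternates every run, the second every 2 runs,
    the third every 4 runs, and so on.
    """
    design = []
    for run in range(2 ** k):
        row = [+1 if (run >> factor) & 1 else -1 for factor in range(k)]
        design.append(row)
    return design


for number, row in enumerate(standard_order(3), start=1):
    signs = " ".join("+" if x > 0 else "-" for x in row)
    print(number, signs)
```

The first run has every factor at its low level and the last run has every factor at its high level.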

Full Factorial Design with 4 Factors
Here we have the standard notation for a 2 to the 4 design and, above, a common representation
using 2 cubes: one for the low level of the 4th factor and one for the high.
Full Factorial Design

Stat>DOE>Factorials>Create Factorial Design
This design is in coded units because it simply lists minus and plus signs for the factor levels.
Coded units provide some advantages in the analysis but are not useful for process owners when
running an experiment.
The table is also referred to as a Table of Contrasts.
Let’s walk through and design a 2 cubed design again for practice. You can name the columns A,
B and C or any name you’d like.
This table created with the factors is referred to as a table of contrasts. The contrast columns are
the minus ones and plus ones in the factor columns. In order to calculate contrast columns for
interactions, we need the contrast columns for the main factors.
Warning: whatever you do, do not change the names of the columns by simply typing over the
names. MINITAB™ creates a model that it uses for the analysis later. If it can’t find the column
names used to generate the worksheet, it will give an error message.

Balanced Design
Factorial Designs should be balanced for proper interpretation of the mathematical equation.
An experiment is balanced when each factor has the same number of experimental runs at both
high and low levels.
Summing the signs of the column contrast should yield a zero. In this example, there are 2 minuses
and 2 plusses.
Balance simplifies the math necessary to analyze the experiment.
– If you always use the designs MINITAB™ provides, they will always be balanced.

Run     A   B
1       -   -
2       +   -
3       -   +
4       +   +
∑ Xi    0   0

MINITAB™ creates balanced, orthogonal designs. If they aren’t changed, this isn’t a problem.
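The balance check described above (each contrast column sums to zero) can be sketched as follows. The 2 factor design is the one from the table; the variable names are assumptions made for this illustration.

```python
# Coded design from the table: columns are A and B, one tuple per run.
design = [
    (-1, -1),
    (+1, -1),
    (-1, +1),
    (+1, +1),
]

# Sum each contrast column; a balanced design gives zero for every column.
column_sums = [sum(column) for column in zip(*design)]
is_balanced = all(total == 0 for total in column_sums)

print("Column sums:", column_sums)
print("Balanced:", is_balanced)
```

If a run were lost or a level were changed by hand, one of the sums would no longer be zero and the design would be out of balance.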
Orthogonal Design
An Orthogonal Design allows each effect in an experiment to be measured independently; the
effect columns are vectors that are at 90 degrees to each other.
If every interaction for all possible variable pairs sums to zero, the design is orthogonal.
With an Orthogonal Design, if an interaction is found to be significant, it is because of the data and
not the experimental design.
– If you always use the designs MINITAB™ provides, they will always be orthogonal and balanced.

Run        A   B   C   AB  AC  BC
1          -   -   +   +   -   -
2          +   -   -   -   -   +
3          -   +   -   -   +   -
4          +   +   +   +   +   +
∑ XiXj =               0   0   0
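The orthogonality check can be sketched the same way: multiply each pair of factor columns run by run to get the interaction column, then sum it. The columns below are copied from the table above; the dictionary layout is just an assumption for this example.

```python
from itertools import combinations

# Coded factor columns from the table above (one entry per run).
factors = {
    "A": [-1, +1, -1, +1],
    "B": [-1, -1, +1, +1],
    "C": [+1, -1, -1, +1],
}

pair_sums = {}
for (name1, col1), (name2, col2) in combinations(factors.items(), 2):
    # The interaction contrast is the run-by-run product of the two columns.
    interaction = [x * y for x, y in zip(col1, col2)]
    pair_sums[name1 + name2] = sum(interaction)

is_orthogonal = all(total == 0 for total in pair_sums.values())
print(pair_sums, is_orthogonal)
```

Every pair sums to zero here, which is the condition the slide gives for an orthogonal design.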

Biomedical Production Example
In this example we will walk through the 11 Step DOE methodology.

The biomedical firm is attempting to increase the yield of a specific
protein expression for use in research by universities and
pharmaceutical companies.

• Increase the yield by 50% of current production. The Measurement System Analysis for yield has
been verified. The baseline for the primary metric of yield is at 50%. The objective of the Project
Charter required the team to achieve at least a 50% increase in yield.
2. Establish the Experimental Objective

• Maximize the yield.
3. Select the Output (response) Variables

• Yield of protein expression is the only output of interest.
• It is desirable to change the yield from 50% to at least 75%.
4. Select the Input (independent) Variables

• Temperature
• Concentration
• Catalyst
• Noise and other variables such as ambient room temperature and technician will be recorded
during the experiment.

• The following levels were determined with tools from the Analyze Phase such
as Regression, Box Plots, Hypothesis Testing and Scatter Plots. The levels
were set far enough to attempt large yield changes to get statistical
confidence in our results.
– Temperature C (25, 45)
– Concentration % (5, 15)
– Catalyst (Supplier A, Supplier B)

• A Full Factorial Design is desired because the team has no knowledge of the
interactions and the number of factors is only 3.
• Randomization is desired because of statistical confidence.
• Randomization is possible because all factors can be changed easily without
large, long disruptions to the process.
• The sample size will be based on a delta of 2 Standard Deviations.
Stat>Power and Sample Size>2-level Full Factorial
Power and Sample Size

2-Level Factorial Design
Alpha = 0.05  Assumed standard deviation = 1
Factors: 3  Base Design: 3, 8
Blocks: none

Center                   Total   Target
Points   Effect   Reps   Runs    Power    Actual Power
0        2        2      16      0.9      0.93674

Biomedical Production Example (cont.)
Stat>DOE> Create Factorial Design
When creating the worksheet in MINITAB™ be sure to change the default in the “Number of
replicates:” window to 2.
Enter the names of the factors and their levels here in MINITAB™. This is where these are created,
so remember to do it here; it will not carry through if you only do it in the worksheet itself.
For ease of data entry for the results of the DOE, we have turned off “Randomize runs” by
deselecting it in this “Options…” tab. You will almost always use the randomization selection when
creating designs for real experiments. There are some exceptions that we will cover later in this
module.


In an empty column, type in ‘Yield’ where we will place the experimental results. Column C8 was
selected in this example.
Do NOT edit, copy, paste or alter anything in the first 7 columns or MINITAB™ will not understand
the worksheet.
If we had more than one response we would have added that as a column as well.
Take a moment to look at your worksheet. It should look the same as the one shown here. Why is
the supplier column justified to the left instead of the right? That’s right, it’s a text column.
7. Execute the Experiment and Collect Data
• Enter the results of the experiment in the column labeled “Yield”, our output.
• The ambient room temperature and technician were recorded per our original plan but we did not
place the information into this worksheet.
Even though we do not have a number for the supplier variable, MINITAB™ will handle the
calculations easily. In fact, it would be misleading to assign numbers to the variable names to trick
MINITAB™ into thinking it was a continuous variable. There is no “in between” value for 2 different
suppliers.
Type in the yield information in the worksheet you created. Over the next several graphics we will
walk through the analysis.
8. Analyze the Data from the Designed Experiment
Stat>DOE>Factorial>Analyze Factorial Design
Select “Normal” and “Pareto” effects plots. Select “Standardized” residuals.
We first need to estimate the effects for ALL possible effects in the design, including all main
effects and all interaction effects. Then we will decide which ones are important to describing the
variation in the data set.
We will remove the effects that are not important to describing the variation in the data set and
re-run the model with only those effects. This is similar to the work you have already done in
Regression Analysis. After we have run the final model fit we will check our Residual Analysis to
validate our assumptions, the same as in Regression.


Select the “Terms” tab and you will see MINITAB™ automatically selects all possible terms for the
design you are using. MINITAB™ defaults with all effects in the model. After the significant effects
are determined, the insignificant effects will be removed. If any of the seven effects listed here are
found to be insignificant in explaining error then we need to remove them from the model soon.
We have selected two graphical tools to help us select the correct model. The Normal Probability
Plot assumes that insignificant effects, or effects that have values close to zero, are due to noise,
which is distributed Normally. Any insignificant effects should plot closely to the Normal Probability
line. Effects that are large are plotted off the straight line, indicated in red and labeled. This method
is referred to as Daniel’s method in some literature.
[Figure: Normal Probability Plot of the Standardized Effects (response is Yield, Alpha = .05); factors A = Temp, B = Conc, C = Supplier; effects A and AC are flagged as significant.]
The Pareto Chart also shows us the significant effects based on the selected alpha level. Any
effect that goes beyond the red line is considered significant.
[Figure: Pareto Chart of the Standardized Effects (response is Yield, Alpha = .05); reference line at 2.31; terms in order A, AC, B, BC, ABC, AB, C.]
At this point, Temperature and the interaction of Temperature with supplier are the significant
Effects.
In the Session Window under the factorial fit, any effect that has a P-value less than 0.05 (for an
alpha of 0.05) is considered significant. Look for the factorial fit information. We interpret this the
same way as we do any other statistical test.
Notice that all three methods of determining what effects belong in the final model fit agree.
What does this tell us? There are 2 significant Effects that should be in this model.

Factorial Fit: Yield versus Temp, Conc, Supplier
Estimated Effects and Coefficients for Yield (coded units)
Term                 Effect     Coef      SE Coef   T        P
Constant                        61.1250   0.1811    337.44   0.000
Temp                 23.4500    11.7250   0.1811    64.73    0.000
Conc                 0.5750     0.2875    0.1811    1.59     0.151
Supplier             0.0000     0.0000    0.1811    0.00     1.000
Temp*Conc            -0.0250    -0.0125   0.1811    -0.07    0.947
Temp*Supplier        10.0500    5.0250    0.1811    27.74    0.000
Conc*Supplier        -0.4750    -0.2375   0.1811    -1.31    0.226
Temp*Conc*Supplier   0.1750     0.0875    0.1811    0.48     0.642
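To see how the columns of the factorial fit table relate to each other, here is a small sketch using the Effect and SE Coef values from the table. In coded units the coefficient is half the effect, and T is the coefficient divided by its standard error. The cutoff of 2.31 is the reference line read off the Pareto Chart; treating it as the significance threshold in code is an assumption of this illustration, and small differences from the printed T values come from rounding in the table.

```python
SE_COEF = 0.1811      # standard error of every coefficient in the table
T_REFERENCE = 2.31    # reference line shown on the Pareto chart (alpha = 0.05)

# Effects copied from the factorial fit table (coded units).
effects = {
    "Temp": 23.4500,
    "Conc": 0.5750,
    "Supplier": 0.0000,
    "Temp*Conc": -0.0250,
    "Temp*Supplier": 10.0500,
    "Conc*Supplier": -0.4750,
    "Temp*Conc*Supplier": 0.1750,
}

summary = {}
for term, effect in effects.items():
    coef = effect / 2            # coefficient is half the effect in coded units
    t_value = coef / SE_COEF     # standardized effect
    summary[term] = (coef, t_value, abs(t_value) > T_REFERENCE)

for term, (coef, t_value, significant) in summary.items():
    print(f"{term:20s} coef={coef:8.4f}  T={t_value:7.2f}  significant={significant}")
```

Only Temp and Temp*Supplier clear the reference line, which agrees with the plots and the P-values.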

Biomedical Production Example (cont.)
Re-fit the model by removing the insignificant factors.
Since we have removed the insignificant factors we need to go back and refit the model. Even
though there were only 2 significant Effects we must include all Main Effects in the model that are
involved in an interaction since we don’t completely understand the interactions.
Even though Supplier was not a significant effect, it is necessary to include it in the model because
the Temp/Supplier effect was significant. This type of model is referred to as a Hierarchical Model.
Under “Graphs” uncheck “Normal” and “Pareto” plots and include either “Individual plots” or the
“Four in one” to evaluate our assumptions with the Residual Plots. Another plot that should always
be explored is the “Residuals versus variables:” plot.
The Residual Analysis will be discussed shortly.
We need to create some factor plots before evaluating the residuals. Follow the MINITAB™ path
shown here.
Stat>DOE>Factorial>Factorial Plots
Anytime there is a significant interaction, it is useful to plot. Plot both “Main Effects Plot” and
“Interaction Plot” in this example.


The steep slope on a Main Effects Plot means the variable is significant. Flat lines, as shown for
concentration and supplier, indicate they are not significant.
[Figure: Main Effects Plot (data means) for Yield; panels for Temp (25, 45), Conc (5, 15) and Supplier (A, B); vertical axis: Mean of Yield.]
Non-parallel lines in the Interaction Plot indicate significance. The lines do not have to cross each
other to be significant. Also, they can cross slightly and still be insignificant.
[Figure: Interaction Plot (data means) for Yield; panels for each pair of Temp (25, 45), Conc (5, 15) and Supplier (A, B).]
The interaction plot shows all the plots with the variables you selected in the previous MINITAB™
command. The interaction of interest for our example is temperature with supplier. Here it looks like
high temperature with supplier B gives the highest yield which, in our case, is exactly what we
want.
Review the fitted table in your Session Window. This provides a lot of information that we will
explore later in the module; for now notice the P-values. The model is significant.

Factorial Fit: Yield versus Temp, Supplier
Estimated Effects and Coefficients for Yield (coded units)
Term            Effect     Coef      SE Coef   T        P
Constant                   61.1250   0.1847    330.94   0.000
Temp            23.4500    11.7250   0.1847    63.48    0.000
Supplier        0.0000     0.0000    0.1847    0.00     1.000
Temp*Supplier   10.0500    5.0250    0.1847    27.21    0.000

S = 0.738805  R-Sq = 99.75%  R-Sq(adj) = 99.69%

Analysis of Variance for Yield (coded units)
Source               DF   Seq SS    Adj SS    Adj MS    F         P
Main Effects         2    2199.61   2199.61   1099.80   2014.91   0.000
2-Way Interactions   1    404.01    404.01    404.01    740.17    0.000
Residual Error       12   6.55      6.55      0.55
  Pure Error         12   6.55      6.55      0.55
Total                15   2610.17
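The R-Sq and R-Sq(adj) values in the session output can be reproduced from the Sum of Squares column. The sketch below uses the standard definitions of those two statistics with the numbers from the ANOVA table; the variable names are assumptions made for this illustration.

```python
# Values copied from the Analysis of Variance table for Yield.
ss_main = 2199.61
ss_interaction = 404.01
ss_error = 6.55
df_error = 12
df_total = 15

ss_total = ss_main + ss_interaction + ss_error

# R-Sq: share of the total variation explained by the model.
r_sq = 1 - ss_error / ss_total
# R-Sq(adj): same idea, but using mean squares so extra terms are penalized.
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)

print(f"Total SS  = {ss_total:.2f}")
print(f"R-Sq      = {100 * r_sq:.2f}%")
print(f"R-Sq(adj) = {100 * r_sq_adj:.2f}%")
```

Almost all of the variation in yield is explained by Temp, Supplier and their interaction, which is why the residual error is so small.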
Interpret the Residual Analysis the same as in Regression.
This shows us our 4 in 1 plot of Residuals for yield. The interpretation is the same as we’ve used in
the past for Regression.
[Figure: Residual Plots for Yield (four in one): Normal Probability Plot of the Residuals, Residuals Versus the Fitted Values, Histogram of the Residuals, Residuals Versus the Order of the Data; all panels use Standardized Residuals.]

The Residuals versus variables are most important when deciding what level to set an insignificant
factor.
A typical guideline is a difference of a factor of 3 in the spread of the Residuals between the low
and high levels of an insignificant input variable.
– In this case concentration was not significant, but we still need to make a decision on how to set
it for the process. The low level for concentration has a smaller spread of Residuals, but there is
not a difference of 3:1. Other considerations for setting the variable are cost and reducing cycle
time.
[Figure: Residuals Versus Temp and Residuals Versus Conc (response is Yield); vertical axis: Standardized Residual; the spread of residuals at each level is annotated.]
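The 3:1 guideline can be sketched as a simple ratio of spreads. The residual values below are hypothetical (they are NOT taken from the example worksheet), and measuring spread as the range (max minus min) is an assumption of this sketch, since the guideline does not say how the spread should be measured.

```python
def spread(values):
    """Range (max minus min) of a list of residuals."""
    return max(values) - min(values)


# Hypothetical standardized residuals at the low and high level of an
# insignificant factor. Illustration only, not data from the example.
residuals_low = [-0.9, -0.4, 0.1, 0.3, 0.8, -0.2, 0.5, -0.6]
residuals_high = [-1.6, -0.7, 0.2, 1.1, 1.5, -1.2, 0.9, -0.3]

larger = max(spread(residuals_low), spread(residuals_high))
smaller = min(spread(residuals_low), spread(residuals_high))
ratio = larger / smaller
exceeds_guideline = ratio >= 3

print(f"Spread ratio = {ratio:.2f}, exceeds 3:1 guideline: {exceeds_guideline}")
```

With these made-up numbers the low level has the smaller spread, but the ratio is well under 3:1, so cost and cycle time would drive the setting.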
The Response Optimizer in MINITAB™ is a great tool to visually determine where to set the input
variables to achieve the desired output response. Play with it for a while and see what you get. The
more you play around with these things, the better your understanding will be of how they work.

Stat>DOE>Factorial>Response Optimizer
Recall the objective was to maximize the yield. It is necessary to establish a target and lower limit
for the yield values.


As you can see from this, there is only one continuous input variable, for which MINITAB™ came
up with the best solution based on the data we have used.
Practical Solution:
Temp 45C
Concentration 5%
Supplier B

• Verify, verify, verify.
• Verify settings determined in the last step, by producing several typical manufacturing quantities.
• The variation or error seen in the experiment will be different than the variation seen in the
manufacturing validation.
11. Implement Solutions
• If the objective of the experiment was accomplished and the Business Case is satisfied, then
proceed to the Control Plan which is covered in the Control Phase.
output to satisfy the Business Case.
• Implement the changes necessary to maintain the new gains to the process.
Now that we have completed one example we are going to add to your knowledge base by
covering Center Points and run through another example, adding further explanation of the
statistics as well.
Center Points
A Center Point is an additional experimental run made at the physical center of the design.
– Center Points do not change the model to quadratic.
– They allow a check for adequacy of the linear model.
The Center Point provides a check to see if it is valid to say that the output response is linear
through the center of the design space.
If a straight line connecting high and low levels passes through the center of the design, the model
is adequate to predict inside the design space.
– “Curvature” is the statistic used to interpret the adequacy of the Linear Model.
– If curvature is significant the P-value will be less than 0.05.
Do NOT predict outside the design space.
As you can see in the graphic there may be an unknown hump in the Response Curve; adding
the Center Point allows us to calculate an additional statistic. If there is significant curvature in the
model all we know is that the model is not Linear. We don’t know what it is, just what it is not.
[Figure: Output Response versus Factor Settings at “-”, “c” and “+”.]
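The idea behind the curvature check can be sketched as a comparison of two averages: the factorial (corner) points and the Center Points. The response values below are hypothetical and only illustrate the arithmetic; whether the difference is significant still has to be judged against experimental error, which is what the Curvature P-value in MINITAB™ does.

```python
from statistics import mean

# Hypothetical responses, for illustration only.
factorial_runs = [30.1, 39.8, 31.0, 40.3]   # corner points of the design
center_runs = [35.2, 34.9, 35.5]            # Center Point runs

# If the response is linear through the design space, the average of the
# Center Points should sit close to the average of the corner points.
curvature_estimate = mean(center_runs) - mean(factorial_runs)

print(f"Average of factorial points: {mean(factorial_runs):.2f}")
print(f"Average of Center Points:    {mean(center_runs):.2f}")
print(f"Difference (curvature):      {curvature_estimate:.2f}")
```

A difference near zero, as in this made-up data, is what you hope to see when the Linear Model is adequate.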

Center Point Clues

A Center Point is always a good insurance policy, but is most effective when all the input factors
are Continuous.
A guideline is to run 2-4 Center Point runs distributed uniformly through the experiment when all
the input factors are continuous in a Full or Fractional Factorial.
Pseudo Center Points are used when there are discrete input variables in the model. The model
can be collapsed, creating real Center Points, if the discrete input variables are not significant.
[Figure: response Y versus x at “-”, “c” and “+”, labeled “Maximize Response”, with the question: Does it matter that the linear model is inappropriate?]
If the desire was to maximize the response (as shown in the graphic) then the model doesn’t
matter. The model is an important tool to predict output response inside the design space. If the
experimenter decides to set up another experiment to continue in the direction indicated, then
predicting is not an issue.
Panel Cleaning Example
In this example we will walk through the 11 step DOE methodology for a panel cleaning machine
using Center Points in the analysis. The manufacturing firm is attempting to start up a new panel
cleaning machine and would like to get it running quickly. They have experience with this type of
machine, but they do not have experience with this particular model of equipment.

• Start the new equipment as efficiently as possible. The need for the new equipment was
determined in the Analyze Phase.
• A Measurement System Analysis has been completed and modified to bring within acceptable
guidelines.

• Hit a target for Width of 40 +/- 5.
• Minimize variation as much as possible.

Panel Cleaning Example (cont.)
3. Select the Output (response) Variables
• Width of conductor is the only response.
4. Select the Input (independent) Variables
• Dwell Time
• Temperature
• Na2S2O8
• The experts believe that ambient temperature and humidity will have no effect on the process.
Monitors will be placed in the room to record temperature and humidity.
Na2S2O8 is Sodium Persulfate; please use that any time you see that notation.

– Dwell Time (4, 6) minutes
– Temperature (40, 80) C
– Na2S2O8 (1.8, 2.4) gm/lit

• A Full Factorial will be used since there are only 3 input variables.
• Randomization is possible because all factors can be changed easily without large, long
disruptions to the process.
• Is the sample size adequate based on a delta of 2 Standard Deviations?
You actually know the answer already, since the sample size is the same as in the previous
example; both were 2 cubed designs.
Open file “Panel Cleaning.MTW”. Look at your worksheet and find the Center Point runs. Notice the
Center Points are uniformly distributed through the design. Why are the Center Points uniformly
distributed and not random?
Center Points not only tell us something about how well the linear model works, but are also a
reality check for our data. By eyeballing the Center Point data as our experiment progresses we
can see if anything has affected our experiment that we were not expecting. If your Center Points
are dramatically different from each other, you’ve got a problem somewhere. They should be
fairly close in magnitude, at least within normal variation.

Creating Designs with Center Points

You most likely already know how to create a design with Center Points added. Simply go through
the usual steps to create a design and include Center Points.
MINITAB™ will place the Center Points randomly in the worksheet. The next few slides will
demonstrate how to move the Center Points so they are uniformly distributed.
1. Create a 3 factor design with 3 Center Points and 2 replicates; be sure to randomize the design.
Your design should look different than the one in the illustration because we more likely than not
have a different random seed that generated the designs. It is possible that our designs are the
same, but trying to calculate the odds of that occurring is not worth the bother. You should have 19
rows in your design, so if you do not, go back and fix it.
Notice the Center Points are not uniformly distributed with this random design. It is desirable to
move one Center Point near or at the beginning, middle and end.
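The 19 rows come from simple arithmetic: 8 corner runs, replicated twice, plus 3 Center Points. The sketch below checks that count and shows one hypothetical way to pick run-order positions at the beginning, middle and end; the `slots` calculation is an assumption made for illustration, not a MINITAB™ feature.

```python
factors = 3
replicates = 2
center_points = 3

# Corner runs of the 2^3 design, replicated, plus the Center Points.
total_runs = (2 ** factors) * replicates + center_points
print("Rows in the worksheet:", total_runs)

# Hypothetical target positions for the Center Points in the run order:
# first run, middle run and last run.
slots = [1, (total_runs + 1) // 2, total_runs]
print("Center Point run-order targets:", slots)
```

Any positions near the beginning, middle and end will do; the point is simply to spread the Center Points through the experiment.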

Creating Designs with Center Points

DO NOT move rows or generate new worksheets in MINITAB™’s DOE platform; it will corrupt the
model stored in memory!
To move the center points to new locations, find a Center Point and type a ‘1’ in the “RunOrder”
column. Find the original 1 and replace with the original Center Point RunOrder number.
Do the same for the Center Point you want in the middle and end of the design. We have color
coded our example for ease of understanding. The rows you move most likely will be different.
To complete the Center Point arrangement, sort the data on the RunOrder column but DO NOT
create a new worksheet.
Data>Sort
You should now have a worksheet that has a Center Point at or near the beginning, middle and
end. If your original design had the Center Points roughly in those positions, great, that saved a
little work.
7. Execute the Experiment and Collect Data

• The experiment has been run in the order shown below.
• One of the most common mistakes in DOE is typing the data in the data sheet incorrectly. Always
verify number entry!
Let’s continue on with the Panel Cleaning Example. You may close the worksheet we just used
demonstrating how to move Center Points.


8. Analyze the Data from the Designed Experiment
Analyze the experiment in MINITAB™. For fun, since you’ve already done this once in this module,
stop reading and work on your own for a while. When you think you know what should be removed
from the model, go ahead and do it.
Stat>DOE>Factorial>Analyze Factorial Design
So how did it go? Looks like the significant effects are Sodium Persulfate, temperature, the
interaction of temp with Sodium Persulfate and dwell time, in that order of importance.
[Figure: Normal Probability Plot of the Standardized Effects (response is Width, Alpha = .05); factors A = Dwell Time, B = Temp, C = Na2S2O8; effects A, B, C and BC are flagged as significant.]
[Figure: Pareto Chart of the Standardized Effects (response is Width, Alpha = .05); reference line at 2.23; terms in order C, B, BC, A, AB, AC, ABC.]
The significant effects are Na2S2O8, Temp, Dwell Time and the interaction of Temp with Na2S2O8.
Notice that all three methods of determining what effects belong in the final model fit agree. The
P-values from the analysis in the session agree as well.

Factorial Fit: Width versus Dwell Time, Temp, Na2S2O8
Estimated Effects and Coefficients for Width (coded units)
Term                       Effect    Coef     SE Coef   T        P
Constant                             34.724   0.2605    133.30   0.000
Dwell Time                 4.871     2.436    0.2605    9.35     0.000
Temp                       6.484     3.242    0.2605    12.44    0.000
Na2S2O8                    9.169     4.584    0.2605    17.60    0.000
Dwell Time*Temp            0.941     0.471    0.2605    1.81     0.101
Dwell Time*Na2S2O8         0.861     0.431    0.2605    1.65     0.129
Temp*Na2S2O8               -4.876    -2.438   0.2605    -9.36    0.000
Dwell Time*Temp*Na2S2O8    -0.199    -0.099   0.2605    -0.38    0.711
Ct Pt                                0.296    0.6556    0.45     0.662
S = 1.04201  R-Sq = 98.48%  R-Sq(adj) = 97.26%


Re-fit the model by removing the insignificant factors if you have not already done this. Be sure
to generate the necessary Residual Plots and turn off the “Normal” and “Pareto” plots.
Here we are going to define the calculations in the ANOVA table.
When working with 2 level designs you will always have 1 degree of freedom for each effect
(including interactions) which is calculated as 2 levels minus 1 equals 1 degree of freedom. In the
ANOVA table for Main Effects we have 3 degrees of freedom for the 3 Main Effects placed in the
model. There is one degree of freedom for the temperature Sodium Persulfate interaction.
A Degree of Freedom (DF) is a measure of the number of independent pieces of information used
to estimate a parameter. It is a measure of the precision of an estimate of variability. A typical
definition is n - 1 = DF; however, it depends on what parameters are being estimated.

Analysis of Variance for Width (coded units)
Source               DF   Seq SS    Adj SS    Adj MS    F        P
Main Effects         3    599.336   599.336   199.779   148.18   0.000
2-Way Interactions   1    95.111    95.111    95.111    70.55    0.000
Curvature            1    0.221     0.221     0.221     0.16     0.692
Residual Error       13   17.527    17.527    1.348
  Lack of Fit        3    6.669     6.669     2.223     2.05     0.171
  Pure Error         10   10.858    10.858    1.086
Total                18   712.195

– 3 DF for the 3 Main Effects, 1 DF for the interaction effect in the model.
– 1 DF for curvature based on the difference between the average of the factorial points and the
average of the Center Points.
– 13 DF for residual error broken into two components: Lack of Fit and Pure Error.
– Lack of Fit: 3 DF for the 3 insignificant interaction effects that were removed from the model.
– Pure Error: 10 DF: 8 from the replicated runs ((#reps - 1) * # of runs) and 2 from the Center
Points (#CP - 1).
– 18 DF for the Total (# of data points - 1).

Estimated Coefficients for Width using data in uncoded units
Term            Coef
Constant        -70.4706
Dwell Time      2.43562
Temp            1.01544
Na2S2O8         39.6625
Temp*Na2S2O8    -0.406354

The Residual error is broken into 2 sources. The 3 degrees of freedom for lack of fit are from the 3
interaction effects that were removed from the model because they were not significant in explaining
the variation of the data. The 10 degrees of freedom come from replication. The 8 runs from the
original design generated 8 degrees of freedom; in this case there were 2 replicates minus 1 equals 1
degree of freedom for each run in the design. Add to that 2 degrees of freedom from the Center
Points (3 Center Points minus 1 equals 2 degrees of freedom) and we have a total of 10 degrees of
freedom for pure error. Pure error can be defined as the failure of things treated alike to act alike,
which are the replicates.
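The degrees of freedom bookkeeping described above can be laid out as a short sketch. All counts come from the Panel Cleaning example (3 factors, 2 replicates, 3 Center Points); the variable names are assumptions made for this illustration.

```python
factors = 3
replicates = 2
center_points = 3

corner_runs = 2 ** factors                               # runs in the base design
data_points = corner_runs * replicates + center_points   # rows in the worksheet

df_total = data_points - 1
df_main = 3            # Dwell Time, Temp, Na2S2O8
df_interaction = 1     # Temp*Na2S2O8
df_curvature = 1       # factorial average versus Center Point average
df_lack_of_fit = 3     # the 3 interaction effects removed from the model
df_pure_error = (replicates - 1) * corner_runs + (center_points - 1)
df_residual = df_lack_of_fit + df_pure_error

print("Total DF:", df_total)
print("Residual DF:", df_residual, "= Lack of Fit", df_lack_of_fit,
      "+ Pure Error", df_pure_error)
print("Check:", df_main + df_interaction + df_curvature + df_residual)
```

The pieces add back up to the Total line of the ANOVA table, which is a quick sanity check worth doing on any session output.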

Adj MS = Adj SS/DF for each respective source. F = Adj MS/MS Error

Analysis of Variance for Width (coded units)
Source               DF   Seq SS    Adj SS    Adj MS    F        P
Main Effects         3    599.336   599.336   199.779   148.18   0.000
2-Way Interactions   1    95.111    95.111    95.111    70.55    0.000
Curvature            1    0.221     0.221     0.221     0.16     0.692
Residual Error       13   17.527    17.527    1.348
  Lack of Fit        3    6.669     6.669     2.223     2.05     0.171
  Pure Error         10   10.858    10.858    1.086
Total                18   712.195

– No significant curvature; the linear model is adequate.
– No significant lack of fit; the removed effects do not belong in the model.

Estimated Coefficients for Width using data in uncoded units
Term            Coef
Constant        -70.4706
Dwell Time      2.43562
Temp            1.01544
Na2S2O8         39.6625
Temp*Na2S2O8    -0.406354

Prediction Equation based on coefficients:
Ŷ = -70.47 + 2.44 * Dwell Time + 1.02 * Temp + 39.6625 * Na2S2O8 - 0.41 * Temp * Na2S2O8
Continuing here with some definitions….
The SS or Sum of Squares calculations are simply an unscaled or unadjusted measure of
dispersion or spread of the data. Seq or Sequential Sum of Squares and Adj or Adjusted Sum of
Squares are the same for DOE analyses. (There may be differences in Regression Analysis).
Adj MS or Adjusted Mean Square takes the Sum of the Squares number and scales it using the
number of degrees of freedom for that calculation. Mean Squares are the equivalent of variance.
Here we use the F statistic. An F statistic is simply variance divided by variance. In the case of
DOE it is the Variance of an effect divided by the variance due to residual error. In this platform,
MINITABTM sums the sum of the squares for certain elements of the model to report in the ANOVA
table instead of keeping them separate. The F statistic with respect to the Main Effects is calculated
by taking 199.779 and dividing by 1.348 which equals 148.18. The associated P-value is 0.000
which is less than 0.05 so our conclusion is that the model is significant.
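The F ratios quoted above can be rechecked with a few lines of Python. This is only a sketch: the sums of squares and degrees of freedom are copied from the ANOVA table for Width, while the dictionary layout and the function name `mean_square` are illustrative choices, not anything MINITABTM produces. Dividing the unrounded mean squares is what reproduces the reported 148.18.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table for Width.
anova = {
    "Main Effects": {"ss": 599.336, "df": 3},
    "Curvature": {"ss": 0.221, "df": 1},
    "Residual Error": {"ss": 17.527, "df": 13},
    "Lack of Fit": {"ss": 6.669, "df": 3},
    "Pure Error": {"ss": 10.858, "df": 10},
}


def mean_square(source):
    """Adj MS = Adj SS / DF for one row of the table."""
    row = anova[source]
    return row["ss"] / row["df"]


# Main Effects and Curvature are tested against the residual error.
ms_error = mean_square("Residual Error")
f_main = mean_square("Main Effects") / ms_error
f_curvature = mean_square("Curvature") / ms_error

# Lack of fit is tested against pure error, not the whole residual error.
f_lack_of_fit = mean_square("Lack of Fit") / mean_square("Pure Error")

print(f"F (Main Effects) = {f_main:.2f}")
print(f"F (Curvature)    = {f_curvature:.2f}")
print(f"F (Lack of Fit)  = {f_lack_of_fit:.2f}")
```

The P-values themselves come from the F distribution with the matching degrees of freedom, which the sketch leaves to the statistical software.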
Notice in this example the curvature is not significant which means our assumption of linearity is
good. Also the P-value for lack of fit is not significant. That means the effects we removed from the
model really do not belong in the model. If there was significant lack of fit, that would indicate that
some of the effects that were removed from the model actually belong in the model.
The last to discuss here is the prediction equation. Please note here the coefficients for the
prediction equation are based on uncoded units. In other words, you can use this equation directly
in real units. Let’s do an example next.

Prediction Equation
Determine the predicted value when:
– Dwell time = 4.2 minutes
– Temp = 75C
– Sodium Persulfate = 2.0
Simply insert these values into the equation and do the math.
Take a few minutes to study the equation above. It really is simply “plug and chug”. Please note, we
have taken liberties with rounding numbers! You won’t actually have to do this by hand because that
is exactly what the response optimizer does in MINITABTM.
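As a sketch of the “plug and chug”, the snippet below inserts the three settings into the prediction equation using the unrounded coefficients from the uncoded-units table. The function name `predict_width` is hypothetical and used only for illustration.

```python
def predict_width(dwell_time, temp, na2s2o8):
    """Prediction equation for Width in uncoded (real) units."""
    return (
        -70.4706
        + 2.43562 * dwell_time
        + 1.01544 * temp
        + 39.6625 * na2s2o8
        - 0.406354 * temp * na2s2o8
    )


# Settings from the exercise: 4.2 minutes, 75C, Sodium Persulfate at 2.0.
predicted = predict_width(dwell_time=4.2, temp=75, na2s2o8=2.0)
print(f"Predicted Width = {predicted:.2f}")
```

With the unrounded coefficients the result is about 34.3; the rounded equation shown on the slide gives about 34.1, which is the effect of the rounding liberties mentioned above.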
The most interesting thing to look at here is the interaction plot. The temperature with Sodium
Persulfate interaction shows there is very little difference in the predicted response as long as
Sodium Persulfate is held at the high level. But if the concentration of Sodium Persulfate is lowered,
the width drops more rapidly at a temperature of 40 degrees than if temperature was set at 80 degrees.
[Figure: Main Effects Plot (data means) for Width. Panels: Dwell Time, Temp, Na2S2O8. Point types: Corner, Center.]
[Figure: Interaction Plot (data means) for Width. Panels: Dwell Time, Temp, Na2S2O8. Point types: Corner, Center.]
This is the Cube Plot again and the average of the actual data points appear around the cube as
previously discussed.
[Figure: Cube Plot (data means) for Width. Axes: Dwell Time (4 to 6), Temp (40 to 80), Na2S2O8 (1.8 to 2.4). Factorial point means: 23.025, 25.895, 33.245, 38.395, 36.010, 41.000, 36.875, 43.350. Center point mean: 35.020.]


The Residual Plots look good. There are no assumption violations within the plots shown here.
[Figure: Residual Plots for Width (standardized residuals). Panels: Normal Probability Plot of the Residuals; Residuals Versus the Fitted Values; Histogram of the Residuals; Residuals Versus the Order of the Data.]
As depicted here the Residuals Versus Factor Plots do NOT show any differences in the variation of
the data from the low to the high values.
[Figure: Residuals Versus Dwell Time, Residuals Versus Temp and Residuals Versus Na2S2O8 (response is Width).]
9. Draw Practical Solutions
Stat>DOE>Factorial>Response Optimizer
Here we will use the Response Optimizer to draw some Practical Conclusions. Play with the Response
Optimizer and see what you can do, remembering that the original objective was to hit a target of
40 +/- 5 for the width.


This looks a little odd. The Response Optimizer has a little trick: even though each of the input
variables is continuous, if you include Center Points in the model it will treat the low, center and
high values as discrete points. As you can see the Center Points fit the linear model.
Do it again and this time turn off “Include center points in the model” so that MINITABTM will
generate its best optimization.
Is this the only solution? Are there other solutions?
Explore your options by sliding the red lines around to see the various reactions.
[Figure: Response Optimizer plot. Callouts: setting each factor at these settings will achieve the target output; predicted output.]

What if you assume Na2S2O8 is very expensive? Where would you set the variables?
Use the mouse and slide the red line for Na2S2O8 to the low level first, then adjust the other sliders
to move the predicted response to 40.
MINITABTM does an excellent job of optimizing according to the data; what it does not know are all
the quirks of your equipment, cost of raw materials, increasing throughput, etc.
Is it possible to achieve the target value of 40 with Sodium Persulfate set at the minimum value?
It looks like we can get close, but we can’t hit the target. We know our lower specification limit is 35
and it looks like we can get to 38 with the Sodium Persulfate at the low level, temp and dwell time
high. Is that good enough? Maybe, maybe not.
If you knew the spread of the data or variation and it was small you could capitalize on that capability
by using 38 as the target instead of 40 and still guarantee your customer they would never see any
product with widths smaller than 35.
Imagine if you were working with gold or platinum. What effect could that have on the bottom line?
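The “can we reach 40 with Sodium Persulfate at its minimum?” question can also be checked numerically. The sketch below is not the Response Optimizer; it simply evaluates the prediction equation over an assumed grid of Dwell Time and Temp settings (the grid spacing and the names `predict_width` and `NA_LOW` are illustrative) with Na2S2O8 fixed at 1.8.

```python
from itertools import product


def predict_width(dwell_time, temp, na2s2o8):
    """Prediction equation for Width in uncoded (real) units."""
    return (
        -70.4706
        + 2.43562 * dwell_time
        + 1.01544 * temp
        + 39.6625 * na2s2o8
        - 0.406354 * temp * na2s2o8
    )


NA_LOW = 1.8  # Sodium Persulfate held at its low level
dwell_levels = [4 + 0.5 * i for i in range(5)]  # 4.0, 4.5, ..., 6.0 (assumed grid)
temp_levels = [40 + 10 * i for i in range(5)]   # 40, 50, ..., 80 (assumed grid)

# Find the largest predicted width reachable at the low Na2S2O8 level.
best_width, best_dwell, best_temp = max(
    (predict_width(d, t, NA_LOW), d, t)
    for d, t in product(dwell_levels, temp_levels)
)

print(f"Best predicted Width = {best_width:.2f} "
      f"at Dwell Time = {best_dwell}, Temp = {best_temp}")
```

The best grid point is Dwell Time and Temp both at their high levels, with a predicted width of roughly 38, which agrees with what the sliders show.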
There is another MINITABTM function that will show the complete solution set for a targeted value.
Stat>DOE>Factorial>Overlaid Contour Plot
Look at another graphical tool you can use in MINITABTM to visualize the solution set of input
variable level settings in order to achieve the desired result.

As shown here we generate 3 different graphs as a result of changing the set point for dwell time.
The areas shown in white are the solution set for adjusting temperature and Sodium Persulfate to
get a predicted response between 35 and 45. This is an alternative to the Response Optimizer.
[Figure: three Overlaid Contour Plots of Width (contours at 35 and 45), Na2S2O8 (1.80 to 2.40) versus Temp (40 to 80), with Dwell Time held at 4 (low setting), 5 (middle setting) and 6 (high setting).]
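A rough text version of the same idea can be produced from the prediction equation. This is only a sketch under stated assumptions: the grid of Temp and Na2S2O8 values is arbitrary, the helper names are hypothetical, and the character map is a crude stand-in for the contour plot rather than a reproduction of it.

```python
def predict_width(dwell_time, temp, na2s2o8):
    """Prediction equation for Width in uncoded (real) units."""
    return (
        -70.4706
        + 2.43562 * dwell_time
        + 1.01544 * temp
        + 39.6625 * na2s2o8
        - 0.406354 * temp * na2s2o8
    )


def in_solution_set(dwell_time, temp, na2s2o8, low=35.0, high=45.0):
    """True when the predicted Width falls between the two limits."""
    return low <= predict_width(dwell_time, temp, na2s2o8) <= high


temps = range(40, 81, 10)                     # assumed grid: 40, 50, ..., 80
concentrations = [1.8, 1.95, 2.1, 2.25, 2.4]  # assumed grid for Na2S2O8

for dwell in (4, 5, 6):
    print(f"Dwell Time held at {dwell}")
    for conc in reversed(concentrations):
        # 'o' marks a grid point inside the solution set, '.' a point outside it.
        cells = "".join(
            " o" if in_solution_set(dwell, t, conc) else " ." for t in temps
        )
        print(f"  Na2S2O8 {conc:4.2f}:{cells}")
```

Each block of rows corresponds to one of the three hold values for Dwell Time, with Temp increasing from left to right.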
It’s a wrap……. Fun stuff, right?!

10. Replicate or Validate the Experimental Results
• Verify, verify, verify.
• Verify settings determined in the last step by producing several
typical manufacturing quantities.
• The variation or error seen in the experiment will be different than the
variation seen in the manufacturing validation.
11. Implement Solutions

• If the objective of the experiment was accomplished and the Business
Case is satisfied, then proceed to the Control Plan which is covered
in the Control Phase.
output to satisfy the Business Case.
• Implement the changes necessary to maintain the new gains to the
process.

• Understand how to create Balanced and Orthogonal Designs
• Explain how Fit, Diagnose and Center Points factor into an Experiment
You have now completed Improve Phase – Full Factorial Experiments.

Lean Six Sigma

Black Belt Training
Improve Phase
Fractional Factorial Experiments
Now we will continue with the Improve Phase “Fractional Factorial Designing Experiments”.

Within this module we will explore how to conduct a Fractional Factorial Experiment.

Welcome to Improve
Process Modeling: Regression
Advanced Process Modeling: MLR
Designing Experiments
Experimental Methods
Full Factorial Experiments
Fractional Factorial Experiments
  – Designs
  – Creation
  – Generators
  – Confounding & Resolution
Wrap Up & Action Items
Why Use Fractional Factorial Designs
Fractional Factorial Designs are used to:

• Analyze factors to find cause/effect relationships if the Analyze Phase was
unable to sufficiently narrow the number of factors impacting the output(s).
– Fractional Factorials are often referenced as “screening experiments”: fewer runs
with a larger number of factors.
– Fractional Factorials are usually done in early stages of the improvement process.
Fractional Factorial Design

StdOrder    A    B    C    D
    1      -1   -1   -1   -1
    2       1   -1   -1    1
    3      -1    1   -1    1
    4       1    1   -1   -1
    5      -1   -1    1    1
    6       1   -1    1   -1
    7      -1    1    1   -1
    8       1    1    1    1

Full Factorial Design

StdOrder    A    B    C    D
    1      -1   -1   -1   -1
    2       1   -1   -1   -1
    3      -1    1   -1   -1
    4       1    1   -1   -1
    5      -1   -1    1   -1
    6       1   -1    1   -1
    7      -1    1    1   -1
    8       1    1    1   -1
    9      -1   -1   -1    1
   10       1   -1   -1    1
   11      -1    1   -1    1
   12       1    1   -1    1
   13      -1   -1    1    1
   14       1   -1    1    1
   15      -1    1    1    1
   16       1    1    1    1
Fractional Factorial Designs are a powerful sub-set of Factorial Designs. As the name implies, you
may expect they are some fraction of the original Factorial Designs – and you’d be correct. The
question is what fraction?

Why Use Fractional Factorial Designs (cont.)

Fractional Factorial Designs are used to analyze factors to find cause and effect relationships if the
Analyze Phase was unable to sufficiently narrow the number of factors impacting the output(s).
Fractional Factorials are often referenced as “screening experiments” meaning that fewer runs are
required with a larger number of factors. Fractional Factorials are usually done in early stages of the
improvement process.
We’ve shown two 4 factor designs side by side so you can contrast the two designs. Notice the
Fractional Factorial Design requires only a fraction of the experimental runs to evaluate 4 input
factors. In this case, it is a half fraction. As with most things in life there is a price to be paid for
reducing the number of runs required which we will go through in detail in this module.
Fractional Factorial Designs are also used to:

• Study Main Effects and 2-way interactions if the experimenter and team has
good process knowledge and can assume higher order interactions are
negligible.
• Reduce time and cost of experiments because the number of runs has
been lowered.
– As the number of factors increases, the number of runs required to run a full 2^k
factorial experiment also increases (even without repeats or replicates)
• 3 factors: 2x2x2 = 8 runs
• 4 factors: 2x2x2x2 = 16 runs
• 5 factors: 2x2x2x2x2 = 32 runs etc….
• Be an initial experiment that can be augmented with another fraction to
reduce confounding and estimate factors of interest.
The answer is in there somewhere!!
Fractional Factorial designs are also used to study Main Effects and 2-way interactions if the
experimenter and team has good process knowledge and can assume higher order interactions are
negligible. There is the cost in a nutshell. In exchange for reducing the overall experiment’s size you will
give up the ability to evaluate higher order interactions. It turns out this is a pretty good assumption in
many cases. We’ll talk about this more later.
Fractional Factorial designs are also used to reduce the time and cost of experiments because the
number of runs is lowered. As the number of factors increases, the number of runs required to run a
full 2^k factorial experiment also increases (even without repeats or replicates) as you already know.
3 factors: requires 8 runs
4 factors: requires 16 runs
5 factors: requires 32 runs etc….
The number of runs required for a Fractional Factorial will depend on how many factors are included in
the design and how much fractioning can be tolerated based on the facts of the process.
Fractionals are also used as an initial experiment that can be augmented with another fraction to
reduce confounding and estimate factors of interest. We’ll define this as we advance through the
module.

Nomenclature for Fractional Factorials
The general notation for a Fractional Factorial is 2_R^(k-p), similar to that of a Full Factorial. Take a
few moments and read through the definitions for the notation.
– k = number of factors to be investigated
– p = number of factors assigned to an interaction column (also called
“degree of fractionating” with 1=1/2, 2=1/4, 3=1/8, etc.)
– R = design resolution (III, IV, V, etc.). It details the amount of
confounding to compare design alternatives
– 2^(k-p) = the number of experimental runs
The example 2_V^(5-1) clarifies how to use the nomenclature.
• How many factors in the experiment?
• How many runs if no repeats or replicates?
• What fractional design is this (1/8, 1/4 or 1/2)?
Let’s look at the 2 to the 5 minus 1 example here:
How many factors are in the experiment? That is the first number in the exponent or in this case, 5.
At this point we are not ready to discuss the resolution since we have not covered it yet.
How many runs if no repeats or replicates? Simply do the math. 2 to the 5 minus 1 is the same as
2 to the fourth which is 16 runs.
What Fractional Design is this? Since this design uses only half the number of runs as a Full
Factorial with 5 factors it is a half fraction.
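The arithmetic behind the notation can be sketched in a few lines. The function name `describe_design` and the dictionary it returns are illustrative only; the formulas are the ones given in the definitions above (runs = 2 to the power k minus p, compared against 2 to the power k for the Full Factorial).

```python
from fractions import Fraction


def describe_design(k, p):
    """Run count and fraction for a 2^(k-p) Fractional Factorial."""
    runs = 2 ** (k - p)
    full_factorial_runs = 2 ** k
    return {
        "factors": k,
        "runs": runs,
        "full_factorial_runs": full_factorial_runs,
        "fraction": Fraction(runs, full_factorial_runs),
    }


# The 2^(5-1) example: 5 factors, 16 runs, a half fraction of the 32-run Full Factorial.
example = describe_design(k=5, p=1)
print(example["factors"], "factors,", example["runs"], "runs,",
      "fraction", example["fraction"])
```

The same function applied to k=8, p=4 gives 16 runs and a 1/16 fraction, the size of design used in the worked example later in this module.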
Half-Fractional Experiment Creation
Recall the 2x2x2 full 3-factor, 2-level Factorial Design. Suppose we needed to investigate
a fourth factor but we could NOT increase the number of runs because of time or cost.
Select the highest order interaction to represent the levels of the fourth factor. The ABC
interaction will determine the levels for factor D.
When we replace the ABC interaction with factor D, we say the ABC 3-way interaction
was aliased or confounded with D. This experiment maintains balance and orthogonality.
– The first experimental run in the first row indicates the experiment is executed with factor D at
the low level while running all the 3 other factors at the low level.

  A    B    C   AxB  AxC  BxC  AxBxC (Factor D)
 -1   -1   -1    1    1    1   -1
  1   -1   -1   -1   -1    1    1
 -1    1   -1   -1    1   -1    1
  1    1   -1    1   -1   -1   -1
 -1   -1    1    1   -1   -1    1
  1   -1    1   -1    1   -1   -1
 -1    1    1   -1   -1    1   -1
  1    1    1    1    1    1    1

This is a half-fraction 2^(4-1) design - a Resolution IV design with only 8 runs.
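The construction just described can be sketched in Python: build the 2x2x2 Full Factorial in standard order, set D equal to the A x B x C product, then check balance and orthogonality. The variable names are illustrative; MINITABTM does all of this for you.

```python
from itertools import combinations

# Full 2x2x2 factorial in standard order (A changes fastest, C slowest),
# with the fourth factor set from the ABC interaction: D = A*B*C.
rows = []
for c in (-1, 1):
    for b in (-1, 1):
        for a in (-1, 1):
            d = a * b * c
            rows.append((a, b, c, d))

columns = list(zip(*rows))  # one tuple per factor column: A, B, C, D

# Balanced: every factor is run equally often at -1 and +1.
balanced = all(sum(col) == 0 for col in columns)

# Orthogonal: the dot product of every pair of factor columns is zero.
orthogonal = all(
    sum(x * y for x, y in zip(col_1, col_2)) == 0
    for col_1, col_2 in combinations(columns, 2)
)

for row in rows:
    print(row)
print("balanced:", balanced, "orthogonal:", orthogonal)
```

The 8 rows printed match the half-fraction design table, and both checks come out true.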

Half-Fractional Experiment Creation
Why is the design, shown as orange rows on the slide and marked with * below, called a “half”
fraction? This is the design just created on the previous slide. This is a half fraction since a full
2x2x2x2 factorial would take 16 runs. With the half fraction we can estimate the effects of 4 factors
in 8 runs. What is the cost? We lose the ability to study the higher order interaction independently!

Half Fraction Alias Structure: D = ABC
Note D settings are the same as the ABC interaction in the marked rows.

     A    B    C    D   AxB  AxC  BxC  AxBxC
 *  -1   -1   -1   -1    1    1    1   -1
     1   -1   -1   -1   -1   -1    1    1
    -1    1   -1   -1   -1    1   -1    1
 *   1    1   -1   -1    1   -1   -1   -1
    -1   -1    1   -1    1   -1   -1    1
 *   1   -1    1   -1   -1    1   -1   -1
 *  -1    1    1   -1   -1   -1    1   -1
     1    1    1   -1    1    1    1    1
    -1   -1   -1    1    1    1    1   -1
 *   1   -1   -1    1   -1   -1    1    1
 *  -1    1   -1    1   -1    1   -1    1
     1    1   -1    1    1   -1   -1   -1
 *  -1   -1    1    1    1   -1   -1    1
     1   -1    1    1   -1    1   -1   -1
    -1    1    1    1   -1   -1    1   -1
 *   1    1    1    1    1    1    1    1

Could we create a quarter fraction experiment out of the above matrix and still study four factors at
once? Why or why not?
With only 4 runs we cannot study 4 factors: 4 runs would give only 3 degrees of freedom, so the
answer is a big fat NO.
Graphical Representation of Half-Fraction

Why would we call this a half fraction? Because only half the number of runs is necessary as opposed
to that of a Full Factorial.
We have discussed half-fractional Experimental Designs for 4 factors:

The graphical representation shows the 8 runs we created on the previous 2 slides.
[Figure: two cubes with axes A, B and C (each from - to +), one for D at - and one for D at +, marking the 8 runs of the half fraction. Callout: top line of previous slide.]
Remember that D is confounded with the ABC interaction in this half-fractional design.

Design Generators
Don’t worry – MINITABTM will take care of this! THANK YOU MINITABTM!!!!
Design Generators are an easier technique to use than generating the Fractional Factorial Designs by
hand as done in the previous slides.
Design Generators help us EASILY find the confounding within the Fractional Design.
A Design Generator is the mathematical definition for how to begin aliasing a Full Factorial to create
a Fractional Factorial.
Example of a Design Generator:
Design Generator D = ABC
This means the D column is the same as the ABC

interaction column; they cannot be distinguished from
each other so are called “confounded”.
This graph helps us visually draw the conclusion of the data that we already have. We have
highlighted in green two boxes and this can very simply be filled in by the data expressed by the
generator; A times B times C equals D.
Design Generator D = ABC
• Because of the Design Generator we can now fill out the D column
– For each row of D, multiply the values in the columns of A, B and
C together and create the column
• You may correctly suspect some 2-factor interactions are
confounded
• Create contrast columns for AD, BD, CD using a similar technique
used to create the column for D
A B C AB AC BC D AD BD CD
-1 -1 -1 1 1 1
1 -1 -1 -1 -1 1
-1 1 -1 -1 1 -1
1 1 -1 1 -1 -1
-1 -1 1 1 -1 -1
1 -1 1 -1 1 -1
-1 1 1 -1 -1 1
1 1 1 1 1 1

Design Generators (cont.)
Which columns are the same?

A B C AB AC BC D AD BD CD
-1 -1 -1 1 1 1 -1 1 1 1
1 -1 -1 -1 -1 1 1 1 -1 -1
-1 1 -1 -1 1 -1 1 -1 1 -1
1 1 -1 1 -1 -1 -1 -1 -1 1
-1 -1 1 1 -1 -1 1 -1 -1 1
1 -1 1 -1 1 -1 -1 -1 1 -1
-1 1 1 -1 -1 1 -1 1 -1 -1
1 1 1 1 1 1 1 1 1 1
Why do I want to know this?
Generate this design in MINITABTM and bring up the Session Window.
MINITABTM Session Window
What does this mean?

Factors: 4     Base Design: 4, 8     Resolution: IV
Runs: 8        Replicates: 1         Fraction: 1/2
Blocks: none   Center pts (total): 0

Design Generators: D = ABC

Alias Structure
I + ABCD
A + BCD
B + ACD
C + ABD
D + ABC
AB + CD
AC + BD
AD + BC

This MINITABTM output gives the summary of what you did on the previous slides much quicker than
we can do by hand. The reason we had you do things manually earlier is so you begin to appreciate
and understand the MINITABTM output generated in the session window after you create a Fractional
Factorial design with 4 factors, half fraction with no Center Points or replicates and the number of
blocks equal to 1. You should get the same output. Try it.
Notice after the design structure an alias structure is indicated. The line under the alias structure
showing A plus BCD means the A Main Effect is confounded with the 3-way interaction BCD. Also,
later we can see the AB 2-way interaction is confounded with the CD 2-way interaction meaning that
if the interaction is statistically significant we cannot distinguish whether it is a result of the AB or
CD interaction or a combination.
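The alias structure above follows mechanically from the generator. Multiplying both sides of D = ABC by D gives I = ABCD (any column multiplied by itself is a column of +1s), and the alias of any effect is that effect multiplied by ABCD with repeated letters cancelling. A minimal sketch of that rule, with the helper name `multiply` chosen only for illustration:

```python
def multiply(word_a, word_b):
    """Multiply two effect words; a letter appearing in both cancels (A*A = I)."""
    letters = set(word_a) ^ set(word_b)  # symmetric difference drops shared letters
    return "".join(sorted(letters)) or "I"


DEFINING_WORD = "ABCD"  # from the Design Generator D = ABC

alias_of = {}
for effect in ["A", "B", "C", "D", "AB", "AC", "AD"]:
    alias_of[effect] = multiply(effect, DEFINING_WORD)
    print(f"{effect} + {alias_of[effect]}")
```

The printed pairs reproduce the lines of the Alias Structure in the session window output.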

So What is “Confounding”?
Confounding is the consequence an experimenter accepts for not running a Full Factorial Design.
When using the “Confounding” or “Alias” pattern we assume that the
higher order interactions in a Confounded effect are not significant.
– Sparsity of effects principle indicates that higher order interactions
are very rare.
• “While interactions are important they do not abound…,
interactions that are more complex than those involving
two factors are rare” Thomas B. Barker
In the past example, the D factor was Confounded with the ABC 3-way
interaction. When the effect is assigned to D which is Confounded with
ABC, we assume because of the sparsity of effects principle the effect is
entirely because of the D factor.
Remember when two items such as an interaction with a Main Effect are
Confounded, one cannot distinguish if the statistical significance is a result
of the Main Effect or the interaction or a combination.
Aliasing is another term for “Confounding”.
Confounded Effects With Fractionals

Using more enhanced visuals, here is another Fractional Design structure. Notice how in the Alias
structure A is Confounded with the BC two-way interaction. The light green box shows this most
obviously.
MINITABTM will automatically generate the alias structure which lists all the Confounded Effects.

Alias Structure
I + ABC
A + BC
B + AC
C + AB

Note: For this case
– A is Confounded with BC
– B is Confounded with AC
– C is Confounded with AB
The Confounding means any effect noted cannot be specifically assigned to either of the Confounded
factors.
– Remember we will use the sparsity principle.
[Figure: three small tables (A, BC, ABC; B, AC, ABC; C, AB, ABC) showing that each factor column has the same levels as its aliased two-way interaction column in every run, while the ABC column is +1 throughout.]
Note: This is a level III design and is NOT recommended since Confounding exists between Main
Effects and 2-factor interactions.

Experimental Resolution
2_R^(k-p)
Remember R in the nomenclature referenced the Resolution.
This useful visual aid helps you remember the definitions of the
Confounding designated by the Resolution.

Resolution III (Fully Saturated Design)
Hold up three fingers, one on one hand and two on the other. This illustrates the Confounding of
Main Effects with two-way interactions.

Resolution IV
Next hold up four fingers. The Confounding is Main Effects with three-way interactions or two-way
interactions Confounded with other two-way interactions.

The visual aid is shown through Resolution V.

Resolution V
Hold up five fingers, one on one hand and four on the other. This illustrates the Confounding of
Main Effects with four-way interactions or two-way interactions Confounded with three-way
interactions.
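The finger rule has a simple computational counterpart: the Resolution equals the length of the shortest word in the defining relation. The sketch below assumes each generator is written as a word (D = ABC becomes ABCD, E = ABCD becomes ABCDE, C = AB becomes ABC); the function names are illustrative and not part of MINITABTM.

```python
from itertools import combinations


def multiply(word_a, word_b):
    """Multiply two effect words; a letter appearing in both cancels."""
    return "".join(sorted(set(word_a) ^ set(word_b)))


def defining_relation(generator_words):
    """All products of one or more generator words."""
    words = set()
    for size in range(1, len(generator_words) + 1):
        for combo in combinations(generator_words, size):
            product = ""
            for word in combo:
                product = multiply(product, word)
            words.add(product)
    return words


def resolution(generator_words):
    """Resolution = length of the shortest word in the defining relation."""
    return min(len(word) for word in defining_relation(generator_words))


ROMAN = {3: "III", 4: "IV", 5: "V"}
for label, words in [("2^(3-1), C = AB", ["ABC"]),
                     ("2^(4-1), D = ABC", ["ABCD"]),
                     ("2^(5-1), E = ABCD", ["ABCDE"])]:
    print(label, "-> Resolution", ROMAN[resolution(words)])
```

A word of length 3 pairs one finger with two (Main Effects with 2-way interactions), a word of length 4 pairs one with three or two with two, and so on, which is the same split the hands show.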

MINITABTM Fractional Factorial Design Creation

Fortunately, MINITABTM creates the designs for us to prevent having to create a fractional factorial
by hand. This output, found in the MINITABTM session window after creating a Fractional Factorial
design, should be understood because it also informs us of the Resolution of the design.
Stat>DOE>Factorial>Create Factorial Design … 4 factors, Designs, ½ fraction

Factors: 4     Base Design: 4, 8     Resolution: IV
Runs: 8        Replicates: 1         Fraction: 1/2
Blocks: none   Center pts (total): 0

Design Generators: D = ABC

Alias Structure
I + ABCD
A + BCD
B + ACD
C + ABD
D + ABC
AB + CD
AC + BD
AD + BC

We have already seen this MINITABTM output from the Session Window after a Fractional Factorial
Design was created. We have highlighted in green an area not focused on until Resolution was
discussed. MINITABTM automatically tells us the Resolution and if we use the hands technique to
remember the Aliasing type of structure, we can save time. The Resolution can get very complicated
with those screening Fractional Factorial Designs with more than 5 factors so this help is desirable.
2_V^(5-1) Fractional Design, Resolution V
Example of a very useful Fractional Design often used for screening designs.

Run    A    B    C    D    E
  1   -1   -1   -1   -1    1
  2    1   -1   -1   -1   -1
  3   -1    1   -1   -1   -1
  4    1    1   -1   -1    1
  5   -1   -1    1   -1   -1
  6    1   -1    1   -1    1
  7   -1    1    1   -1    1
  8    1    1    1   -1   -1
  9   -1   -1   -1    1   -1
 10    1   -1   -1    1    1
 11   -1    1   -1    1    1
 12    1    1   -1    1   -1
 13   -1   -1    1    1    1
 14    1   -1    1    1   -1
 15   -1    1    1    1   -1
 16    1    1    1    1    1

Pros
• 5 factors (Main Effects)
• 10 2-way interactions
• Main Effects only Confounded with rare 4-way interactions
Cons
• 16 trials to get 5 Main Effects
• 2nd order interactions are Confounded with 3rd order
[Figure: cube diagram of the design with axes labeled A, B, C, D and E.]

MINITABTM’s Display of Available Designs

Lots of options here – once again the great MINITABTM itself!
Fractional Designs are colored boxes without “Full”.
Note: Since we discourage Design Resolution III or IV, MINITABTM has shaded these as RED and
YELLOW as a caution. GREEN is acceptable because Main Effects are not Confounded with lower
level interactions.
DOE Methodology
We have included a copy of the methodology here for you to use when following our practical
example for Fractional Factorials.
1. Define the Practical Problem
2. Establish the Experimental Objective
3. Select the Output (response) Variables
4. Select the Input (independent) Variables
5. Choose the Levels for the input variables
6. Select the Experimental Design
7. Execute the Experiment and collect data
8. Analyze the Data from the designed experiment and draw Statistical Conclusions
9. Draw Practical Solutions
10. Replicate or Validate the Experimental Results
11. Implement Solutions
Just follow these simple steps…..

Fractional Factorial Example

• 8 factors are of interest in increasing the output but process knowledge
is limited because of a previously poor gauge for the output
• The output is to be maximized
3. Select the Output Variables
• The output is labeled Y and has a Gage R&R % study variation of less
than 5%
4. Select the Input Variables
• The Input Variables are simply labeled A through H
• For simplicity’s sake in this exercise, the Levels can be expected to be
appropriately set and we will only work with coded levels
This is a two to the eighth minus four power design with a resolution of four. This design has
16 runs as you see in the graphic with all eight factors at two levels.

Select the appropriate design in MINITABTM and create this exact worksheet in
columns C1 through C12.
We have no reason to believe curvature exists and are satisfied that no
replicates are required.
For ease of this exercise, be sure NOT to have randomized the experiment.

Fractional Factorial Example (cont.)
7. Execute the Experiment and Collect the Data

Select the appropriate design in MINITABTM and create this exact worksheet in
columns C1 through C12.
We have no reason to believe curvature exists and are satisfied that no
replicates are required.
The resources and time allow us to only run the experiment with 16
treatment combinations or experimental runs.
Take a look at what Confounding exists before you jump into analysis.
8. Analyze the Data and draw Statistical Conclusions

Before doing any analysis, let’s review what Confounding exists in this highly
fractionated Factorial Design.
The Main Effects are Confounded with numerous 3-way interactions.
The 2-way interactions are Confounded with numerous 2-way interactions.
This is important and must be remembered in our analysis.


We chose to set alpha to 0.1 initially but this is not required. We find the factors with important Main
Effects are E, H and B. The 2-way interactions AC, AF and AE seem important at an alpha level of
0.1.
We want 95% confidence in our Statistical Conclusions for this example.
We have generated the initial Pareto of effects.
[Figure: Pareto Chart of the Effects (response is Y, Alpha = .10). Terms from largest to smallest effect: E, AC, H, B, AF, AE, AD, A, AG, C, AH, AB, G, F, D. Reference line at 0.26; Effect axis from 0 to 14. Lenth's PSE = 0.129375.]
A choice must be made in reducing our model or reducing the number of terms in the model. We have
chosen to look at the Confounding table generated by MINITABTM.
The AC 2-factor interaction is Confounded with other 2-way interactions but we will assume for now,
using the Confounding table from MINITABTM, that the 2-way AC interaction is actually the EH
2-factor interaction because both factors E and H are significant.
The second highest effect for a 2-factor interaction is AF. We will look at the Confounding table and
assume it is the BE 2-way interaction since the B and E factors are significant.
The 2-way interaction AE is also significant at an alpha of 0.1. We cannot find another 2-way
interaction that might be significant using just the B, E, and H factors.
If the AE interaction is kept in the model, then to maintain “hierarchical order” factors A and E must
be kept in the model.
We will now reduce the model and see if we can further reduce the model.
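The substitutions made above (AC read as EH, AF read as BE) can be traced with a short sketch. One loud assumption: the generators below, E = BCD, F = ACD, G = ABC and H = ABD, are not stated anywhere in this text. They are a commonly tabulated generator set for a 16-run, 8-factor Resolution IV design and they happen to reproduce the aliases used here, but always check the Confounding table in your own MINITABTM session.

```python
from itertools import combinations

# ASSUMED generators (not given in the text): E = BCD, F = ACD, G = ABC, H = ABD,
# written as defining words by multiplying each side by the new factor.
GENERATOR_WORDS = ["BCDE", "ACDF", "ABCG", "ABDH"]


def multiply(word_a, word_b):
    """Multiply two effect words; a letter appearing in both cancels."""
    return "".join(sorted(set(word_a) ^ set(word_b)))


def defining_relation(generator_words):
    """All products of one or more generator words."""
    words = set()
    for size in range(1, len(generator_words) + 1):
        for combo in combinations(generator_words, size):
            product = ""
            for word in combo:
                product = multiply(product, word)
            words.add(product)
    return words


def two_way_aliases(effect):
    """2-way interactions that share a column with the given effect."""
    aliases = {multiply(effect, word) for word in defining_relation(GENERATOR_WORDS)}
    return sorted(alias for alias in aliases if len(alias) == 2)


for effect in ("AC", "AF", "AE"):
    print(effect, "is Confounded with", ", ".join(two_way_aliases(effect)))
```

Under these assumed generators AC shares its column with EH, AF with BE, and AE with no interaction made up only of B, E and H, which is consistent with the reasoning above.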

The Reduced Model is shown here and we want 95% confidence to include terms.
Notice the AE 2-way interaction has the smallest effect of the statistically significant terms, and
factor A, kept in the model to maintain the “hierarchical order”, also has a small effect and is
statistically insignificant. We choose to reduce the model and remove those terms. R-sq should not be
severely impacted. If it was impacted severely, we would reconsider this choice.

Factorial Fit: Y versus A, B, E, H
Estimated Effects and Coefficients for Y (coded units)

Term       Effect    Coef     SE Coef   T        P
Constant             22.001   0.04381   502.21   0.000
A           0.144     0.072   0.04381     1.64   0.139
B           4.939     2.469   0.04381    56.37   0.000
E          12.921     6.461   0.04381   147.48   0.000
H          -6.246    -3.123   0.04381   -71.29   0.000
A*E        -0.351    -0.176   0.04381    -4.01   0.004
B*E        -3.836    -1.918   0.04381   -43.78   0.000
E*H         8.244     4.122   0.04381    94.09   0.000

S = 0.175232   R-Sq = 99.98%   R-Sq(adj) = 99.96%

Analysis of Variance for Y (coded units)

Source           DF   Seq SS    Adj SS    Adj MS    F         P
Main Effects      4   921.55    921.545   230.386   7502.91   0.000
Residual Error    8     0.25      0.246     0.031
Total            15  1252.99
The further refit model shows an adequate model because:

Simplicity of terms; which is desired but NOT required
R-sq is quite high (overly unusual for practical experiments)
No or few unusual observations which would be noted below the ANOVA in
MINITABTM’s session window
Residuals are appropriate

Factorial Fit: Y versus B, E, H

Term       Effect    Coef     SE Coef   T        P
Constant             22.001   0.07167   306.98   0.000
B           4.939     2.469   0.07167    34.46   0.000
E          12.921     6.461   0.07167    90.15   0.000
H          -6.246    -3.123   0.07167   -43.58   0.000
B*E        -3.836    -1.918   0.07167   -26.76   0.000
E*H         8.244     4.122   0.07167    57.51   0.000

S = 0.286673   R-Sq = 99.93%   R-Sq(adj) = 99.90%

Analysis of Variance for Y (coded units)

Source           DF   Seq SS    Adj SS    Adj MS    F         P
Main Effects      3   921.46    921.462   307.154   3737.52   0.000
Residual Error   10     0.82      0.822     0.082
  Lack of Fit     2     0.10      0.099     0.050      0.55   0.597
  Pure Error      8     0.72      0.722     0.090
Total            15  1252.99

The Residuals Analysis is adequate and appropriate because:

The residuals are concluded to be normally distributed
No pattern for residuals in the order or versus Fitted Value
[Figure: Residual Plots for Y. Panels: Normal Probability Plot (N = 16, AD = 0.532, P-Value = 0.146); Residuals Versus the Fitted Values; Histogram of the Residuals; Residuals Versus the Order of the Data.]
Statistical Conclusions to maintain terms in the model must consider:

Maintaining hierarchical order
A 2-way interaction must have the involved factors in the model also
High statistical confidence with the P-value less than your alpha risk
A higher R-sq or model explanation of the process changes is desired
Proper residuals and few to no unusual observations
No, no unusual
observations here…
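The hierarchy rule in the list above can be checked mechanically. The sketch below is a hypothetical helper, not a MINITAB feature: it assumes terms are written as in the session output ("B", "E", "B*E") and reports any interaction whose parent factors are missing from the model.

```python
def missing_parents(terms):
    """Return {interaction: [missing main effects]} for terms such as ["B", "E", "B*E"]."""
    present = set(terms)
    gaps = {}
    for term in terms:
        factors = term.split("*")
        if len(factors) > 1:  # only interactions have parent factors
            missing = [f for f in factors if f not in present]
            if missing:
                gaps[term] = missing
    return gaps


# The reduced model from this example keeps every parent factor.
reduced_model = ["B", "E", "H", "B*E", "E*H"]
print(missing_parents(reduced_model))                          # {}

# Dropping A while keeping A*E would break hierarchical order.
non_hierarchical = ["B", "E", "H", "A*E", "B*E", "E*H"]
print(missing_parents(non_hierarchical))                       # {'A*E': ['A']}
```

This is one way to see why A and A*E were removed together in this example rather than A alone.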


We will have to remember our Experimental Objective to increase the output Y.
Looking at the positive coefficients for B and E, we know that if we put those
factors at the high level or value of +1, the output increases.
Looking at the negative coefficient for H, we would think we should operate at
the low level or value of -1. However, the 2-way interaction of EH has a
positive coefficient that is larger in magnitude; with E at +1, setting H to -1
would result in a net decrease in the output Y, so we must set H to +1 or the
high level.
A big reminder is we have ASSUMED the 2-way interactions involved the
factors we left in the model.
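The arithmetic behind the choice for H can be written out using the coded coefficients from the reduced model (H = -3.123, E*H = 4.122), assuming E is held at +1. Only the two terms that contain H change when H is switched:

```latex
\begin{aligned}
H = -1:&\quad (-3.123)(-1) + (4.122)(+1)(-1) = 3.123 - 4.122 = -0.999 \\
H = +1:&\quad (-3.123)(+1) + (4.122)(+1)(+1) = -3.123 + 4.122 = +0.999
\end{aligned}
```

So with E at the high level, moving H from -1 to +1 raises the predicted Y by about 2.0 units, even though the main-effect coefficient for H is negative.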
Factorial Fit: Y versus B, E, H

Term        Effect     Coef  SE Coef        T      P
Constant             22.001  0.07167   306.98  0.000
B            4.939    2.469  0.07167    34.46  0.000
E           12.921    6.461  0.07167    90.15  0.000
H           -6.246   -3.123  0.07167   -43.58  0.000
B*E         -3.836   -1.918  0.07167   -26.76  0.000
E*H          8.244    4.122  0.07167    57.51  0.000

S = 0.286673   R-Sq = 99.93%   R-Sq(adj) = 99.90%
It can be difficult to optimize the solutions and get the Practical Solution
desired.
Using Response Optimizer within MINITAB™ helps us find the Practical
Solution of setting the factors left in the model all at the high level or +1.
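With only three factors at two levels, the same conclusion can be cross-checked by brute force. The sketch below is not how the Response Optimizer works internally; it is a simple enumeration of the eight corner points using the coded coefficients from the reduced model, with `COEF` and `predict` as illustrative names.

```python
from itertools import product

# Coded-unit coefficients copied from the reduced model (Y versus B, E, H).
COEF = {
    "const": 22.001,
    "B": 2.469,
    "E": 6.461,
    "H": -3.123,
    "B*E": -1.918,
    "E*H": 4.122,
}


def predict(b, e, h):
    """Predicted Y for coded factor settings, each -1 (low) or +1 (high)."""
    return (COEF["const"]
            + COEF["B"] * b
            + COEF["E"] * e
            + COEF["H"] * h
            + COEF["B*E"] * b * e
            + COEF["E*H"] * e * h)


# Enumerate all 2^3 corner points as (B, E, H) and rank them by predicted Y.
corners = sorted(product((-1, 1), repeat=3),
                 key=lambda setting: predict(*setting),
                 reverse=True)
best = corners[0]

for setting in corners:
    print(setting, round(predict(*setting), 3))
```

The top-ranked setting is B, E and H all at +1, with a predicted Y of about 30.0 in coded units. The prediction applies only at the corner points of the design; nothing here says anything about settings outside the tested range.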

Practical Conclusions to keep in the model include:

Simple models can be useful depending on the project or process
requirements
Terms with practically large enough significance even if statistically
significant
Impact of R-sq by removing a term with low effects
Ability to set and control the controllable inputs in the model may
decide on the use of terms
Robust designs or minimal variation requirements may require
close inspection of interactions’ effects on the Y
If multiple outputs are involved in the process requirements,
balancing of requirements will be necessary
That’s a lot of juggling….

After we have determined the model with 95% statistical confidence, we must
replicate the results to confirm our assumptions, such as which 2-way
interactions were significant among the Confounded ones.
If the results do not match the expected results OR the project goal,
further experimentation may be needed.
In this case, we were able to achieve 29.8 on average with the
process setting of E, B and H, so the results are considered
successful for the project.
We win, we win…!!

11. Implement Solutions
Work with the Process Owners and develop the Control Plans
to sustain your success.
Fractional Factorial Exercise
Exercise objective: Open file “bhh379.mtw” and analyze using the 11 Step methodology.
1. What kind of Factorial Design is this?
2. Generate Factorial Plots in MINITAB™.
3. Create the Statistical and Practical model.

• Explain why & how to use a Fractional Factorial Design
• Create a proper Fractional Factorial Design
• Analyze a proper model with aliased interactions
Not that kind of model!!
You have now completed Improve Phase – Fractional Factorial Experiments.
Lean Six Sigma

Black Belt Training
Improve Phase
Wrap Up and Action Items
Congratulations on completing the training portion of the Improve Phase. Now comes the
exciting and challenging part…implementing what you have learned to real world projects.

Improve Phase Overview—The Goal

The goal of the Improve Phase is to:
• Determine the optimal levels of the variables which are significantly
impacting your Primary Metric.
• Demonstrate a working knowledge of modeling as a means of
process optimization.

This is a summary of the purpose for the Improve Phase. Avoid getting into
analysis paralysis; only use DOE’s as necessary. Most problems will NOT
require the use of Designed Experiments; however, to qualify as a decent
Green Belt you at least need to have an understanding of DOE as described
above.
Improve Phase Action Items
• Listed below are the Improve Phase deliverables that each candidate
will present in a PowerPoint presentation at the beginning of the
Control Phase training.
• At this point you should all understand what is necessary to provide
these deliverables in your presentation.
– Team Members (Team Meeting Attendance)
– Primary Metric
– Secondary Metric(s)
– Experiment Justification
– Experiment Plan / Objective
– Experiment Results
– Project Plan
– Issues and Barriers
It’s your show!
Before beginning the Control Phase you should prepare a clear presentation that addresses each
topic shown here.

Six Sigma Behaviors
• Being tenacious, courageous
• Being rigorous, disciplined
• Making data-based decisions
• Embracing change & continuous learning
• Sharing best practices
Walk the Walk!
Each “player” in the Six Sigma process must be A ROLE MODEL for the Six Sigma culture.
Improve Phase - The Roadblocks
Look for the potential roadblocks and plan to address them before they
become problems:
– Lack of data
– Data presented is the best guess by functional managers
– Team members do not have the time to collect data
– Process participants do not participate in the analysis planning
– Lack of access to the process
Each phase will have roadblocks. Many will be similar throughout your project.

DMAIC Roadmap
Champion/Process Owner
Identify Problem Area
Determine Appropriate Project Focus

Define
Estimate COPQ
Establish Team
Measure
Assess Stability, Capability and Measurement Systems
Identify and Prioritize All X’s

Analyze
Prove/ Disprove Impact X’s Have On Problem

Improve
Identify, Prioritize, Select Solutions Control or Eliminate X’s Causing Problems
Implement Solutions to Control or Eliminate Xs Causing Problems

Control
Implement Control Plan to Ensure Problem Doesn’t Return
Verify Financial Impact
The objective of the Improve Phase is simple – utilize advanced statistical methods to identify
contributing variables OR more appropriately optimize variables to create a desired output.
Improve Phase
Analysis Complete
Identify Few Vital X’s
Experiment to Optimize Value of X’s
Simulate the New Process
Validate New Process
Implement New Process
Ready for Control

Over 80% of projects will realize their solutions in the Analyze Phase.
Designed Experiments can be extremely effective when used properly, but it is
imperative that a designed experiment is justified. From an application and
practical standpoint, if you can identify a solution by utilizing the strategy
and tools within the Measure and Analyze Phases, then do it. Do not force
Designed Experiments.
Remember, your sole objective in conducting a Lean Six Sigma project is to
find a solution to the problem. You created a Problem Statement and an
Objective Statement at the beginning of your project. However you can reach a
solution that achieves the stated goals in the Objective Statement, implement
it and move on to another issue. There are plenty!

Improve Phase Checklist
Improve Phase Questions
• Are the potential X’s measurable and controllable for an experiment?
• Are they of statistical and practical significance?
• How much of the problem have you explained with these X’s?
• Have you clearly justified the need for conducting a designed experiment?
• Are adequate resources available to complete the project?
• What next steps are you recommending?
These are questions that the participant should be able to answer in clear, understandable language
at the end of this phase.
Planning for Action
WHAT | WHO | WHEN | WHY | WHY NOT | HOW
A DOE to meet your problem solving strategy
Scheduling your experimental plan
Executing your planned DOE
Analysis of results from your DOE
Obtain mathematical model to represent process
Planning the pilot validation for breakthrough
Present statistical promise to process owner
Prepare for implementation of final model
Schedule resources for implementation timeline
Conclude on expected financial benefits
Over the last decade of deploying Six Sigma it has been found that the parallel application of the
tools and techniques in a real project yields the maximum success for the rapid transfer of
knowledge. It is imperative that you complete this and submit your plan for action for review with
your mentors. Thanks and good luck!

At this point, you should:
Have a clear understanding of the specific action items
Have started to develop a project plan to complete the action items
Have identified ways to deal with potential roadblocks
Be ready to apply the Six Sigma method within your business
You’re on your way!
You have now completed Improve Phase – Wrap Up and Action Items.
Lean Six Sigma

Black Belt Training
Improve Phase
Quiz
Now we will see what you have retained from the Improve Phase of the course. Please answer
these questions to the best of your ability without referencing the text. The answers are in the
Appendix. Please check your answers against the answers provided and review the sections in
the Improve Phase where your retention of the knowledge is less than you desire.

Improve Phase Quiz
1. Multiple Regressions are best used for?
A. Non-linear relationships between an X and a Y.
B. Uncertainty in the slope of the linear relationship between an X and a Y.
C. Relationships between Y and two or more X’s.
D. Replacing the use of a Designed Experiment.
2. Which relationships can be modeled with a Regression Equation? (check all that apply)
A. Simple Linear
B. Quadratic
C. Cubic
D. Multiple Linear
E. Logarithmic
3. Which statements are true about Multiple Regressions? (check all that apply)
A. Multiple Regressions are a form of experimentation.
B. The X’s are assumed to be independent of each other.
C. The X’s are assumed to not be correlated.
D. The residuals or errors are assumed to be Normally Distributed.
E. Interactions are NOT included in Multiple Linear Regressions.
F. R² and the statistical confidence of the coefficients are impacted by the measurement
error of the inputs or X’s.
4. If a process output was mathematically transformed to achieve a Normal Distribution,
then which statements are NOT true? (check all that apply)
A. Independent of the transform, the upper specification will be a larger number than the
lower specification when transformed.
B. If the transform by the Box Cox transformation command in MINITABTM generated a
lambda equal to 0.5, then the upper specification limit of 100 would then be transformed
to 10.
C. The transformation function must be a smooth and continuous function.
D. The process data is transformed but not the specification limits.
5. The results for experiments include the desire for problem solving, screening factors and
(check all that apply)
A. Physically model a process
B. Screening factors among possibilities
C. Achieving a robust design
D. Provide Regression Analysis
E. Understand the impact of an improved Measurement System
6. Which Experimental Design typically is most associated with the fewest number of input
variables or factors in the design?
A. Fractional Factorial Design
B. Full Factorial Design
C. Simple Linear Regression
D. Response Surface Design

7. The 11 step methodology recommended for performing a DOE has which item as the first
step?
A. Select the output response variable(s)
B. Select the Experimental Design
C. Select the input variables
D. Define the Practical Problem
8. How many experimental runs exist in a full factorial 2-level design for 5 factors with 2
replicates for the Corner Points and no Center Points?
A. 10
B. 16
C. 32
D. 34
E. 64
9. Which statements are true about Full Factorials? (check all that apply):
A. Full Factorials are used when 5 or fewer factors are involved.
B. Full Factorials are better for optimizing a process than Fractional Factorials.
C. Full Factorials are used instead of Fractional Factorials if interactions need to be fully
understood.
D. Full Factorials are used for screening factors if the Analyze Phase was unable to
narrow the critical factors sufficiently.
E. Full Factorials never have Center Points in the design.
10. Examples of the first step in the recommended 11 step methodology for a DOE include:
(check all that apply)
A. Consider the cost of a DOE.
B. The root cause for the defective product characteristic needs to be found.
C. The variation needs to be affected by the input factors.
D. The response time to calls needs to be reduced.
E. The DOE effect on the project timeline needs to be considered.
11. What is the best reason for not selecting too large of a difference among the factor
levels in the Experimental Design?
A. The process output must not change too much.
B. The process may show little change if curvature exists and the local maximum of the
process output is between the large differences of factor levels chosen.
C. The experimental factors have rarely been operating in such a wide range.
D. The experiment must have Center Points if the factor levels are wide.
12. Which statements are correct about Experimental Designs? (check all that apply):
A. An Experimental Design cannot be orthogonal if not balanced.
B. An Experimental Design can be a balanced design but not orthogonal although it is
encouraged to use only balanced and orthogonal designs.
C. The use of blocking can be used for accounting of the impact of Noise variables.
D. Center Points are not recommended unless the experimenter is attempting to
optimize the process.
E. A resolution IV design has only 4-way interactions confounded with Main Effects.
