Step 8:

Discover Variable Relationships

Tools Included:
Introduction to Design of Experiment
2k Factorials

Green Belt Training

1
Step 8: Purpose
• Discover variable relationships:
– Introduction to Design of Experiment, an important
tool for process improvement.
– Recognize the importance of Planning Experiments
for success.
– Become familiar with the various types and levels of
Design of Experiments and their applications,
advantages and disadvantages.
– Use DOE to Discover Variable Relationships in a
process (some projects will not use DOE in this
step).
2
DOE vs. Discover Variable Relationships

• Do all projects require a DOE to establish
variable relationships?
– NO, not all projects require a DOE.
– Many Green Belt projects will not.
• The most important reason to use a DOE is if
you have multiple factors that can affect each
other as well as the output (Y).

3
Step 8 - Other Options to Get There

If you have no DOE, variable relationships can be
determined in Step 8 through continued use of hypothesis
testing, possibly including regression analysis.
• T tests
• ANOVA
• Proportions tests
• Test for equal variances
• Chi Square
• Fitted Line Plot
• Multiple regression
• Correlation

4
Six Sigma Breakthrough Steps
Step 1 - Select Output Characteristic
Define
- Identify Key Process Input/Output Variables
Step 2 - Define Performance Standards
Step 3 - Validate Measurement System
Measure Step 4 - Establish Process Capability
Step 5 - Define Performance Objectives
Step 6 - Identify Variation Sources
Analyze
Step 7 - Screen Potential Causes
Step 8 - Discover Variable Relationships
Step 9 - Establish Operating Tolerances
Improve
Step 10 - Validate Measurement System
Step 11 - Determine Process Capability
Control Step 12 - Implement Process Controls

5
Narrow Input Variables
• Process Map
  30 - 50 Input Variables
• C&E Matrix and FMEA
  10 - 15 Key Process Input Variables (KPIVs) [Measure]
• Gage R&R, Capability; Multi-Vari Studies, Correlations
  8 - 10 KPIVs [Analyze]
• T-Test, ANOM, ANOVA; Screening DOEs
  4 - 8 Critical KPIVs
• DOEs, RSM [Improve]
• Quality Systems, SPC, Control Plans
  3 - 6 Key Leverage KPIVs [Control]

Step 8 is where we make the fixes.
We decide which Xs are important and where to set them.
6
Step 8:
Discover Variable Relationships

Introduction to Design of Experiment

7
Step 8: Questions to Answer

Introduction to Design of Experiment:


• What are the ways of learning?
• What is Design of Experiment (DOE)?
• Why use Design of Experiment (DOE)?
• What is inference space?

8
Discussion Topics

• Ways of Learning
• Definition - Design of Experiment
• Some Terminology
• Barriers to Experimentation
• Benefits - Design of Experiment
• Levels & Types of Design
• Inference Space
• Randomization

9
Ways of Learning

• Watching: A perceptive observer of naturally
occurring informative events as a process runs
its normal course. If you’re lucky, an
informative event might happen while you’re
present.
• Experimental Design: Proactively manipulate
inputs so their effect on the outputs can be
studied. You invite an informative event to
occur to demonstrate its impact on the output.
If done correctly, an experiment is efficient and
powerful.
10
Terminology
• Factor: One of the controlled or uncontrolled inputs
into a process whose influence upon a response is
being studied in an experiment. (example:
Temperature)
• Level: Value of the input factor being examined in
an experiment. (example: Temperature levels of 100
and 200 degrees C).
• Treatment Combination: A unique set of factor level
combinations. (example: Set Temperature factor at
the 100 degree C level and the pressure factor at the
50PSI level).
11
Terminology (continued)
• Coded Units: Factor level values on a common scale, e.g., -1 = Low,
+1 = High, 0 = Center Point. Standard designs are written in coded
units, and the standard order is set up in coded units. Analysis of
the final equation is easier if coded units are used.
• Uncoded Units: Actual factor level values.
Example: Temperature is the factor.
Levels           Low     Center Point   High
Uncoded Levels   100 C   150 C          200 C
Coded Levels     -1      0              +1
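The mapping between uncoded and coded units can be sketched in a few lines. This is a minimal illustration, assuming the usual linear coding in which the low level maps to -1 and the high level to +1; the function names `to_coded` and `to_uncoded` are hypothetical, not part of the course material.

```python
def to_coded(value, low, high):
    """Convert an actual (uncoded) factor value to coded units."""
    center = (low + high) / 2      # maps to 0
    half_range = (high - low) / 2  # distance from center to either level
    return (value - center) / half_range


def to_uncoded(coded, low, high):
    """Convert a coded value back to the actual factor value."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return center + coded * half_range


# Temperature example from the table: 100 C (low), 150 C (center), 200 C (high)
for temperature in (100, 150, 200):
    print(temperature, "->", to_coded(temperature, low=100, high=200))
```

With the temperature levels from the table, 100, 150 and 200 degrees C map to -1, 0 and +1.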
12
Definition
A Design of Experiment (DOE) is a systematic
set of experiments which permits one to evaluate
the effect of multiple factors on the output both
individually and together.
• It begins with the statement of experimental
objectives and ends with the reporting of the
results.
• It can eliminate the effect of all possible factors
except the ones you have changed.

13
Design Of Experiment

Design of Experiment (DOE) is a proactive tool!
• There is no such thing as a bad experiment, only
poorly designed and executed ones.
• Not every experiment will produce major results,
but most will provide information.
• New data prompts asking new questions and
generates follow-on studies.

14
Barriers
• Problem statement not clearly defined.
• Experimentation objectives not clearly defined.
• Inadequate brainstorming when planning the
experiment.
• Results of the experiment were unclear.
• The experiment was too costly.
• The experiment was too time consuming.
• There was a lack of understanding of
experimentation strategies.
15
Barriers (continued)
• There was a lack of understanding of
experimentation tools.
• Lack of confidence during the early stages of the
experiment.
• Resource Competition
• Recognizing the need for instant results.
• Lack of adequate coaching support when
conducting the experiment.

16
Benefits
• Relatively short period of time to get results.
• Relatively low cost for conducting the experiment.
• Excellent chance of detecting optimal settings.
• Very high confidence in the results.
• Provides the ability to identify independent main
and interaction effects.
• Statistical Experiments Are Rich With
Information!

17
DOE Tool Matrix
Each type of design is rated for Screening, Characterization, and
Optimization use. For this Introduction to DOE, we will focus on 2K
Full Factorial.
Type of Design: Definition
• One Factor at a Time (OFAT): Toggle one item at a time, holding
everything else constant (inexpensive, but does not see interactions
well).
• Fractional Factorials: Look at some of the combinations. Frequently
used to take a first look when there are many factors.
• Full Factorials (General Full Factorial and 2K): Look at all
combinations. Can provide final DOE results.
• Response Surface Methods: Provide mathematically optimized results
using multiple dimension mapping. Infrequently used.
• PLEX: Used to achieve the objective through limited 2k supervised
experiments done during regular production. Can evaluate different
sets of X’s.
• EVOP: Continuous improvement method used to optimize a process
during regular production. High operator involvement. Will optimize
a limited number of X’s.
18
Levels of Experiments

• Screening Designs: Used when there are many factors.
• Characterization Studies: Most of the benefits come from here.
• Optimization Studies

19
Inference Space
Inference space is the area within which you can draw
your conclusions.
• Narrow Inference: Experiment focuses on a specific
subset of the overall operation, and these studies are
not affected by Noise variables.
• Broad Inference: Usually addresses an entire
process, requiring more data to be taken over a
longer period of time, and these studies are affected
by Noise variables.
Generally, Narrow Inference Studies are done first to
control Noise and then Broad Inference Studies are
used to verify results of the Narrow Inference Studies.
20
Randomization – The Experimenter’s Insurance

• Let’s discuss a plating etch bath.
• The output of concern is etch rate – higher is
better.
• Someone wants to evaluate if adding a stirrer
to the bath will increase etch rate.
• We tell the supervisor to evaluate etch rate 20
times for both stirrer on and stirrer off.
• We are in a hurry and say he has one day to
do the test.
• What do we get?
21
Ran 20 with stirrer “off” then 20 with it “on”

[Figure: Etch Rate (60 to 110) vs. run Index (1 to 40); the first 20 runs are stirrer OFF, the last 20 runs are stirrer ON]

Avg Off = 91.70
Avg On = 74.50
(If higher is better)
22
Ran with randomization

[Figure: Etch Rate (60 to 110) vs. run Index (1 to 40); stirrer “on” and “off” runs interleaved in random order]

Avg Off = 82.0
Avg On = 88.6
23
The results

• Which tells the truth about the value of
stirring?
• The degradation of the bath over the day is
called a “lurking variable”.
• While this one would have been easy to
predict, not all lurking variables are so
obvious.
Randomize whenever feasible
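
A randomized run order for the stirrer test can be generated before the experiment starts. The sketch below is only an illustration using Python's standard `random` module; the function name `randomized_run_order` and the seed value are assumptions made for the example.

```python
import random


def randomized_run_order(n_per_setting, seed=None):
    """Return a shuffled list of stirrer settings, n_per_setting of each."""
    runs = ["on"] * n_per_setting + ["off"] * n_per_setting
    rng = random.Random(seed)  # a seed makes the run sheet reproducible
    rng.shuffle(runs)
    return runs


# 20 runs with the stirrer on and 20 with it off, in random order
run_order = randomized_run_order(20, seed=8)
for index, setting in enumerate(run_order, start=1):
    print(index, setting)
```

Because "on" and "off" runs are spread across the whole day, a slow drift such as bath degradation affects both settings about equally instead of being confused with the stirrer effect.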

24
Step 8: Questions to Answer

Introduction to Design of Experiment:


• What are the ways of learning?
• What is Design of Experiment (DOE)?
• Why use Design of Experiment (DOE)?
• What is inference space?

25
Step 8: Answers Summary
Introduction to Design of Experiment:
• There are two ways of learning: watching a process
and hoping an informative event occurs; and,
experimental design where you manipulate the
factors to invite an informative event to occur.
• A Design of Experiment (DOE) is a systematic set
of experiments that will allow evaluation of the
effects of factors on the process outputs. It is the
vehicle of the scientific method, giving
unambiguous results.

26
Step 8: Answers Summary (continued)
Introduction to Design of Experiment:
• We use DOE because there is a very high confidence
in the results and there is an increased ability to identify
the independent main and interaction effects of the
process, which can lead to detecting the optimal
settings.
• Inference space is the area within which you can draw
conclusions about an experiment. There are two
classes of inference space: Narrow, focus on a specific
subset of the process where noise will not be an effect;
and, Broad, address the entire process where noise will
be an effect.
27
Step 8: Lessons Learned
Introduction to Design of Experiment:
• Management and coaching support are a must for
effective experimentation.
• DOEs are an advanced tool to Discover Variable
Relationships, but they are not the only tool
available.

28
Step 8:
Discover Variable Relationships

Planning Experiments

29
Step 8: Questions to Answer

Planning Experiments:
• Why plan an experiment?
• What is very critical in the experiment planning
stage?
• Why perform a pilot run?
• What is repetition and replication?

30
Discussion Topics

• Initial Planning
• Planning Considerations
• Planning Method
• Conducting the Experiment
• Terminology
• Experiment Documentation
• General Advice
• Sequential Experimentation

31
DOE Planning Process

[Figure: the following feed into DOE Plan & Design: MSA, Inputs, Performance Objectives, C&E, Capability Analysis, Fishbone, Process Map]

32
Initial Planning

Initial planning of an experiment should be a
team activity. The C&E Matrix, Flowchart, FMEA,
etc., are used to maximize prior knowledge of
the process. All team members must have the
same understanding of the current process
regarding the outputs, the baseline capability,
process statistical control and stability, and
the current measurement system(s) used.

33
Planning Considerations
• What is the objective of the experiment?
• What will the experiment cost?
• What is our plan for randomization and how will we
determine the sample sizes?
• Are all the necessary players involved or informed?
• How long will it take to perform the experiment and how
are we going to analyze the data?
• Have we planned a pilot run and walked through the
process and has all necessary paperwork been
completed?
34
Planning Method
Using the DOE Worksheet:
• Define the Problem: A complete and detailed description
that clearly defines and quantifies the problem.
• Establish the Objective: Define what is to be discovered
by conducting the experiment.
• Select the Response Variable: Define each response
variable as qualitative or quantitative and the amount of
change to be detected.
• Select the Independent Factors: Define the controlled or
uncontrolled inputs whose influence upon the response
variable(s) is being studied in the experiment.
35
Planning Method (continued)
• Choose the Factor Levels: Determine the values of the
inputs being examined in the experiment, e.g.,
temperature could be set at 100 and 160 degree levels.
• Select the Experimental Design: Choose the design
for the study to be conducted; screening,
characterization, or optimization.
• Walk Through the Experiment: Perform a pilot to
ensure everyone has the same understanding about the
experiment procedures to be performed.
• Ready to Collect Data/Start the Experiment: Plan on
how to collect and record the data.
36
Issues With Problem Statements
• Response variable (Output) poorly defined, or not
quantifiable.
• Response variable does not link to customer CTQ.
• Quantification based on gut feeling, not data.
• Data source, measurement method, not indicated.
• Scale of measure and specifications not supported
by customer data.
• Stated as a predetermined solution instead of
problem.

37
Conducting the Experiment
• Document all initial information.
• Make sure the baseline conditions are included in
the experiment.
• Make sure clear responsibilities are assigned for
proper data collection.
• Watch for and record any extraneous sources of
variation.
• Always run one or more verification runs to confirm
the results, go from Narrow to Broad Inference.
38
Terminology

Repetition: Running more than one sample of a
single run (test with a single combination of factors).
Replication: Repeating the entire experiment
(usually randomized).
• Both help to reduce noise variation and identify
measurement system error.
• Both can be used in the same experiment.
• Both determine the sample size for the
experiment.
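
As a rough illustration of replication, the sketch below builds a run sheet in which the entire set of treatment combinations is repeated, with the order randomized separately inside each replicate. The function name `replicated_run_sheet`, the two-factor treatment list, and the seed are hypothetical choices for the example.

```python
import random


def replicated_run_sheet(treatments, n_replicates, seed=None):
    """Repeat the whole experiment n_replicates times, shuffling each pass."""
    rng = random.Random(seed)
    sheet = []
    for replicate in range(1, n_replicates + 1):
        order = list(treatments)
        rng.shuffle(order)  # randomize within each replicate
        sheet.extend((replicate, treatment) for treatment in order)
    return sheet


# Four treatment combinations of a two-factor, two-level experiment (coded units)
treatments = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
sheet = replicated_run_sheet(treatments, n_replicates=2, seed=1)
for replicate, treatment in sheet:
    print(replicate, treatment)
```

Repetition would instead mean taking several samples at each row of this sheet before moving to the next run.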
39
Experiment Documentation
• Problem statement and objectives of the
experiment.
• Response variables and the validated MSA used.
• Overview of the experiment design and the
procedures for conducting the experiment.
• Experimentation budget and timelines for
completion and analysis.

40
Experiment Documentation (continued)
• Listing of the team members and their
responsibility at each phase of the experiment.
• Documentation of the experiment results:
• Executive summary
• Results and data analysis
• Conclusions and recommendations
• Appendices; original data if practical, detailed
data analysis and details on instrumentation or
procedures.
41
General Advice

• The planning documentation may be more
important than the running of the experiment.
• Make sure experiment objectives have been tied to
the business results of the project.
• Focus on one experiment at a time and don’t try to
answer all the questions with one study; rely on
sequential studies.
• Spend less than 25% of the budget on the initial
experiment.

42
General Advice (continued)

• Always verify experimental results with a follow-on
study.
• It is acceptable to abandon an experiment.
• Use the two-level designs early (a two level design
is one where every factor simply has a high and a
low setting – this will be discussed later). Push the
envelope with robust levels, but keep in mind safety
for the people and the equipment.
• A final documentation is a must!

43
Sequential Experimentation

• Frequently, your first DOE will lead you to a
follow-on study with new or revised factors and/or
factor levels.
• The best time to design a sequential experiment
is immediately after the previous one is finished.

44
Step 8: Questions to Answer

Planning Experiments:
• Why plan an experiment?
• What is very critical in the experiment planning
stage?
• Why perform a pilot run?
• What is repetition and replication?

45
Step 8: Answers Summary
Planning Experiments:
• You plan an experiment to efficiently design and execute
a controlled study. This includes maximizing prior
knowledge of the process and assuring all team members
understand their assignments and roles.
• The initial steps in planning an experiment are very
critical; defining the problem and establishing the
objectives of the experiment.
• A pilot is performed to ensure everyone involved has the
same understanding of the experimentation procedures
and to avoid upsets once the experiment begins.

46
Step 8: Answers Summary (continued)
Planning Experiments:
• Repetition is running more than one sample of a
single run (a test with a single combination of
factors). Replication is repeating the entire
experiment. Both help to reduce the noise
variation, identify the measurement system error
and determine the sample sizes and both can be
used in the same experiment.

47
Step 8: Lessons Learned
Planning Experiments:
• Not planning or poorly planning an experiment
will result in doing it over.
• If response variable is poorly defined, not
quantified and/or it does not link to customer
CTQ’s, results will not apply to your objectives.
• Validating Measurement System is critical.
• Factor levels that are too broad or too narrow will
not give proper results.

48
Step 8:
Discover Variable Relationships

Full Factorials

49
Step 8: Questions to Answer

Full Factorials:
• What is the purpose of a Full Factorial?
• What are the advantages of Full Factorials?
• What is a 2k Factorial?
• Why use a 2k Factorial?
• What is a Standard Order?

50
What is a Full Factorial?
An experiment that looks at all factors and every possible
combination of the factors, e.g.,
Factor 1 = heat with 3 levels; low / med / hi
Factor 2 = dwell with 2 levels; slow / fast
Factor 3 = pressure with 2 levels; min / max

Runs  Heat  Dwell  Pressure
1     Lo    Slow   Min
2     Lo    Slow   Max
3     Lo    Fast   Min
4     Lo    Fast   Max
5     Med   Slow   Min
6     Med   Slow   Max
7     Med   Fast   Min
8     Med   Fast   Max
9     Hi    Slow   Min
10    Hi    Slow   Max
11    Hi    Fast   Min
12    Hi    Fast   Max

These are the 12 different combinations of factors that are
possible. Note: These are not randomized.
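
The list of runs can be generated rather than written by hand. The following is a small sketch using `itertools.product` from the Python standard library; the variable names are chosen only for this example.

```python
from itertools import product

heat = ("Lo", "Med", "Hi")
dwell = ("Slow", "Fast")
pressure = ("Min", "Max")

# Every combination of every level: 3 x 2 x 2 = 12 runs
runs = list(product(heat, dwell, pressure))

for run_number, (h, d, p) in enumerate(runs, start=1):
    print(run_number, h, d, p)
```

The runs come out in the same unrandomized order as the table, so the run order would still need to be shuffled before the experiment is conducted.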
51
Purpose

Full Factorials are used to
improve or optimize a process by
identifying the most critical
factors and interactions in a
process.

52
Advantages of DOE

• More efficient than One-Factor-at-a-Time
(OFAT) experiments.
• Allows the investigation of the combined
effects of factors (Interactions).
• Covers a wider experimental region than
OFAT studies.
• Identifies the critical Factors (Inputs).
• More efficient in estimating effects of both
input and noise variables on the output.

53
Advantages Of DOE
Assume we performed a One-Factor-at-a-Time (OFAT) study
as shown below, where higher response is better:
               Pressure 1   Pressure 2
Temperature 1      20           40
Temperature 2      50           12

• Run One: If we held Temperature constant at level 1 and varied
Pressure, we would conclude that Pressure at level 2 is best.
• Run Two: Then holding Pressure constant at level 2, we vary
Temperature and find Temperature 1 is best (40).
• While we would have achieved improvement, we would have
missed the optimum point (50).
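
The OFAT path described above can be traced in code and compared with looking at all four combinations. This is a sketch built on the four responses in the table; the dictionary layout and variable names are assumptions made for the illustration.

```python
# Responses keyed by (temperature level, pressure level); higher is better
response = {(1, 1): 20, (1, 2): 40, (2, 1): 50, (2, 2): 12}
levels = (1, 2)

# Run One: hold Temperature at level 1 and vary Pressure
best_pressure = max(levels, key=lambda p: response[(1, p)])

# Run Two: hold Pressure at the level chosen above and vary Temperature
best_temperature = max(levels, key=lambda t: response[(t, best_pressure)])

ofat_setting = (best_temperature, best_pressure)
ofat_response = response[ofat_setting]

# Full factorial: examine every combination
full_setting = max(response, key=response.get)
full_response = response[full_setting]

print("OFAT picks", ofat_setting, "with response", ofat_response)
print("Full factorial picks", full_setting, "with response", full_response)
```

OFAT settles on Temperature 1 and Pressure 2 (response 40), while examining all combinations finds Temperature 2 and Pressure 1 (response 50).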

54
2k Factorials
• A special case of Full Factorial DOE
– Every factor has only two levels (e.g. Low and
High)
– Centerpoints can be used to check for linearity
– Relatively Inexpensive and quick
– Most common form of DOE

We will use 2k Factorials for our
Explanation of DOE

55
Definition
A 2^k Factorial refers to an experiment with (k)
number of factors, each with (2) levels. A (2^2)
Factorial is a (2x2) Factorial. This design has two
factors with two levels each and can be completed
in (4) runs, (2x2). Likewise, a (2^3) Factorial is a
(2x2x2) design with three factors of two levels
each and can be completed in (8) runs.

In the notation 2^k, the base (2) is the number of factor
levels and the exponent (k) is the number of factors.
56
Standard Order
• The design matrix for a 2^k factorial is usually shown
in standard order.
• The low level of a factor is designated with a “-” or “-1”
and the high level is designated with a “+” or “1”.
An example of a design matrix for a (2^2) Factorial
would look like this:

Temp  Conc
-1    -1
 1    -1
-1     1
 1     1
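
A standard order design matrix can be generated for any number of two-level factors. The sketch below assumes the convention shown in the matrix above, in which the first factor alternates fastest; the function name `standard_order` is hypothetical.

```python
from itertools import product


def standard_order(k):
    """Return the coded design matrix of a two-level, k-factor full factorial.

    itertools.product varies its last element fastest, so each row is
    reversed to make the first factor alternate fastest.
    """
    return [tuple(reversed(row)) for row in product((-1, 1), repeat=k)]


for temp, conc in standard_order(2):
    print(f"{temp:>3} {conc:>3}")
```

Calling `standard_order(3)` adds a third column that stays at -1 for the first four runs and at 1 for the last four.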

57
Standard Order (continued)

A (2^3) Factorial looks like this:

Temp  Conc  Catalyst
-1    -1    -1
 1    -1    -1
-1     1    -1
 1     1    -1
-1    -1     1
 1    -1     1
-1     1     1
 1     1     1

A (2^2) design is contained in a (2^3) design.

What is the minimum number of runs needed for
a (2^4) Factorial Design Matrix?
58
Example of a 2^2 Factorial

StdOrder  Temp  Dwell
1         100   1Min
2         212   1Min
3         100   3Min
4         212   3Min

All Combinations are Covered

59
Example of a 2^3 - Coded (+/-1) Values
• This example relates two quantitative inputs (factors),
temperature and concentration, and one qualitative input
(factor), catalyst, to yield.
• The factors and levels:
– Temperature: 160 °C (-1), 180 °C (1)
– Concentration (%): 20 (-1), 40 (1)
– Catalyst: Brand A (-1), Brand B (1)
• Design:
Temp  Conc  Catalyst  Yield
-1    -1    -1        60
 1    -1    -1        72
-1     1    -1        54
 1     1    -1        68
-1    -1     1        52
 1    -1     1        83
-1     1     1        45
 1     1     1        80
This is an example of a Full Factorial Experiment with only
one observation per Treatment Combination (Cell).
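
Effects can be estimated directly from a coded design matrix like this one. The sketch below assumes the usual definition for two-level designs (average response at the +1 level minus average response at the -1 level, with an interaction using the product of two coded columns); the function names are hypothetical.

```python
# (Temp, Conc, Catalyst, Yield) in coded units, standard order
runs = [
    (-1, -1, -1, 60),
    (1, -1, -1, 72),
    (-1, 1, -1, 54),
    (1, 1, -1, 68),
    (-1, -1, 1, 52),
    (1, -1, 1, 83),
    (-1, 1, 1, 45),
    (1, 1, 1, 80),
]
TEMP, CONC, CATALYST = 0, 1, 2


def average(values):
    return sum(values) / len(values)


def main_effect(runs, column):
    """Average yield at the high level minus average yield at the low level."""
    high = [run[-1] for run in runs if run[column] == 1]
    low = [run[-1] for run in runs if run[column] == -1]
    return average(high) - average(low)


def interaction_effect(runs, col_a, col_b):
    """Same contrast, using the product of two coded columns as the sign."""
    plus = [run[-1] for run in runs if run[col_a] * run[col_b] == 1]
    minus = [run[-1] for run in runs if run[col_a] * run[col_b] == -1]
    return average(plus) - average(minus)


print("Temp:", main_effect(runs, TEMP))
print("Conc:", main_effect(runs, CONC))
print("Catalyst:", main_effect(runs, CATALYST))
print("Temp*Catalyst:", interaction_effect(runs, TEMP, CATALYST))
```

For these yields the Temperature effect is the largest, and the Temp*Catalyst interaction is larger than the Catalyst main effect, which is why interactions are assessed before main effects.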

60
Example of a 2^3 Factorial - Actual Values
StdOrder  Temp  Dwell  Pressure
1         100   1Min   50psi
2         212   1Min   50psi
3         100   3Min   50psi
4         212   3Min   50psi
5         100   1Min   80psi
6         212   1Min   80psi
7         100   3Min   80psi
8         212   3Min   80psi
All Combinations Covered

61
Advantages of 2K DOEs

• Require relatively few runs per factor studied.
• Can be the basis for more complex designs.
• Good for early investigations and can look at a
large number of factors with relatively few runs.
• Lend themselves well to sequential studies.
• Analysis is fairly easy.

62
Step 8: Questions to Answer

Full Factorials:
• What is the purpose of a Full Factorial?
• What are the advantages of Full Factorials?
• What is a 2k Factorial?
• Why use a 2k Factorial?
• What is a Standard Order?

63
Step 8: Answers Summary
Full Factorials:
• Full Factorials are used to improve or optimize a
process by identifying the most critical factors and
interactions in a process.
• Full Factorials are more efficient in estimating the
effects of both input and noise variables on the
output. They also allow investigation of the
combined effects of factors and cover a wide
experimental range.

64
Step 8: Answers Summary (continued)
Full Factorials:
• A 2k Factorial is an experiment with (k) number of factors,
each with (2) levels.
• We use a 2k Factorial because they are good for early
investigations, they require few runs, they can look at a
large number of factors, and the analysis is fairly easy.
• The Standard Order is the design matrix for 2k Factorial
experiments showing all treatment combinations in coded
units of (-) and (+) for the factor levels.

65
Step 8:
Discover Variable Relationships

2k Factorial Outputs

66
Step 8: Questions to Answer

Factorial Outputs:
• Why perform Analysis of Variance (ANOVA)?
• What are Main Effects and Interactions?
• What is Epsilon-Squared?
• What are Residuals?

67
What do we get for Output of a DOE?

• Initial and final reduced ANOVA tables
• Pareto Chart of the effects (2K only)
• Interaction Plots
• Main Effects Plots
• Epsilon Squared Values
• Residual Plots

Let’s look at these

68
ANOVA
Analyzing ANOVA Results of Full-Factorial Model
• First - Interpret the highest order interaction. The 2-way interaction
Temp*Pressure is investigated and the p-value (p>0.05) indicates the
interaction is not important.
• Second - Interpret the Main Effects. The two Main Effects of
Temperature and Pressure are important (p<.05).
Source df SS MS F p
Temp 2 0.30111 0.15056 8.47 0.009
Pressure 2 0.76778 0.38389 21.59 0.000
Temp*Pressure 4 0.06889 0.01722 0.97 0.470
Error 9 0.16000 0.01778
Total 17 1.29778

69
ANOVA (continued)
Since the Temperature*Pressure interaction was not statistically
significant, we can assume that this effect is part of the random
noise in the experiment and does not need to be included in the
final model. The next step is to re-run the model (Reduced Model)
excluding this effect. Results are shown below.

Source df SS MS F p
Temp_ 2 0.30111 0.15056 8.55 0.004
Press_ 2 0.76778 0.38389 21.80 0.000
Error 13 0.22889 0.01761
Total 17 1.29778

Results indicate that Temperature and Pressure Main Effects are
still important (p<.05).
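
The arithmetic behind the reduced model can be checked by hand: the degrees of freedom and sum of squares of the dropped interaction are pooled into the error term, and the F ratios are recomputed. The sketch below reproduces only that arithmetic from the full-model table (it does not compute p-values); the function names are hypothetical.

```python
# Full model: term -> (degrees of freedom, sum of squares)
full_model = {
    "Temp": (2, 0.30111),
    "Pressure": (2, 0.76778),
    "Temp*Pressure": (4, 0.06889),
    "Error": (9, 0.16000),
}


def reduce_model(table, dropped_terms):
    """Pool the df and SS of the dropped terms into the Error row."""
    df_error, ss_error = table["Error"]
    reduced = {}
    for term, (df, ss) in table.items():
        if term == "Error":
            continue
        if term in dropped_terms:
            df_error += df
            ss_error += ss
        else:
            reduced[term] = (df, ss)
    reduced["Error"] = (df_error, ss_error)
    return reduced


def f_ratios(table):
    """F = (SS / df) of each term divided by the Error mean square."""
    df_error, ss_error = table["Error"]
    ms_error = ss_error / df_error
    return {
        term: (ss / df) / ms_error
        for term, (df, ss) in table.items()
        if term != "Error"
    }


reduced_model = reduce_model(full_model, {"Temp*Pressure"})
reduced_f = f_ratios(reduced_model)
print(reduced_model["Error"])
print({term: round(f, 2) for term, f in reduced_f.items()})
```

The pooled error row has 13 degrees of freedom, and the recomputed F ratios round to the 8.55 and 21.80 shown in the reduced table.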

70
Pareto Chart of the Effects
[Figure: Pareto Chart of the Standardized Effects (response is CYCLE TIME, Alpha = .05); factors A: Speed and B: Feed; bars for AB, A, and B on a 0 to 5 scale]
Sometimes, you will not get P-values when you first run your ANOVA.
This is due to lack of sufficient runs. In that case, for 2K only,
you can get a Pareto Chart of the Effects.
All Factors that pass the Red line have a P-value of less than 0.05.

71
Main Effects Of DOEs
In a factorial experiment the Main Effect of a factor is defined as
the average change in the Output variable produced when the levels
of the factor change.
Pressure 1 Pressure 2
Temperature 1 20 30
Temperature 2 40 52

To determine the Main Effect for Temperature we calculate the
average yield at each level of temperature and subtract the low level
from the high level.

Let’s write these numbers down and look at them graphically.
72
How to Read an Interaction Plot

[Figure: Interaction Plot, Temp and Pressure vs Yield; Yield (20 to 50) vs. Temp (Temp 1, Temp 2), one line each for Pressure 1 and Pressure 2]
Can we identify the four experimental conditions and their values?
An Interaction Plot can be used to identify factor levels and the
Main Factor Effects.

73
How to Read an Interaction Plot - 2

[Figure: Interaction Plot, Temp and Pressure vs Yield; Yield (20 to 50) vs. Temp (Temp 1, Temp 2), one line each for Pressure 1 and Pressure 2]
Temp 1 (low) and Pressure 1 (low) = 20
Temp 2 (high) and Pressure 1 (low) = 40
Temp 1 (low) and Pressure 2 (high) = 30
Temp 2 (high) and Pressure 2 (high) = 52

74
Main Effect of Temp on Yield
[Figure: Interaction Plot, Main effect of Temp on Yield; Yield (20 to 50) vs. Temp (Temp 1, Temp 2), one line each for Pressure 1 and Pressure 2]
Average of values at Temp 1 = 25
Average of values at Temp 2 = 46
As I go from the average value of the low “Temp” settings to the
average value of the high “Temp” settings, I gain 21.

75
Main Effect of Pressure on Yield
[Figure: Interaction Plot, Main Effect of Pressure on Yield; Yield (20 to 50) vs. Temp (Temp 1, Temp 2), one line each for Pressure 1 and Pressure 2]
Average of values at Pressure 1 = 30
Average of values at Pressure 2 = 41
As I go from the average value of the low “Pressure” settings to the
average value of the high “Pressure” settings, I gain 11.

76
Main Effects - Algebraically
In a factorial experiment the Main Effect of a factor is defined as
the average change in the Output variable produced when the levels
of the factor change.
Pressure 1 Pressure 2
Temperature 1 20 30
Temperature 2 40 52

To determine the Main Effect for Temperature we calculate the
average yield at each level of temperature and subtract the low level
from the high level, as shown:
Temp = (40 + 52)/2 - (20 + 30)/2 = 46 - 25 = 21
As Temperature increases from level 1 to level 2, yield increases by 21
points, so we state that the Main Effect of Temperature is 21 points.
77
Main Effects Algebraically (continued)
Likewise, to determine the Main Effect for Pressure:

Pressure 1 Pressure 2
Temperature 1 20 30

Temperature 2 40 52

We perform this calculation:
Pressure = (30 + 52)/2 - (20 + 40)/2 = 41 - 30 = 11

As the Pressure is increased from level 1 to level 2, Yield increases
by 11 points, so we state that the Main Effect of Pressure is 11 points.
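
The same two calculations can be written as a short script. This is a sketch that uses the four yields from the table; the dictionary layout and the helper name `level_mean` are assumptions for the illustration.

```python
# Yield keyed by (temperature level, pressure level)
yield_table = {
    ("T1", "P1"): 20,
    ("T1", "P2"): 30,
    ("T2", "P1"): 40,
    ("T2", "P2"): 52,
}


def level_mean(table, position, level):
    """Average yield over all cells where the factor at `position` is `level`."""
    values = [y for key, y in table.items() if key[position] == level]
    return sum(values) / len(values)


# Main Effect = average at the high level minus average at the low level
temp_effect = level_mean(yield_table, 0, "T2") - level_mean(yield_table, 0, "T1")
pressure_effect = level_mean(yield_table, 1, "P2") - level_mean(yield_table, 1, "P1")

print("Main Effect of Temperature:", temp_effect)
print("Main Effect of Pressure:", pressure_effect)
```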

78
Interaction Definition

• If you have two or more factors that taken
together have a significantly higher or lower
effect on your Y than just the separately
evaluated effect of each factor, you have a
significant interaction.
• Example: Car stopping distance
– Factor 1: Temperature (0 deg / 80 deg)
– Factor 2: Moisture (dry / wet for 12 hours before
test)
– Result: Both Temperature and Moisture evaluated
separately make some difference, BUT when you
add 12 hours of water AND zero degrees ……
79
Data with interaction and a Question
[Figure: Interaction Plot, new data set; Yield (10 to 50) vs. Temp (Temp 1, Temp 2), one line each for Pressure 1 and Pressure 2]
If my manager said “I do not know what temperature our new boiler
will provide, BUT today I need to order the equipment that will
establish our pressure. TELL ME WHAT PRESSURE WE SHOULD RUN TO
MAXIMIZE YIELD, NOW!!!”

80
How to Manually Evaluate Interactions
In some experiments we find that the effect between the levels of
one factor is not the same for different levels of the other factor.
Consider this DIFFERENT data:
               Pressure 1   Pressure 2
Temperature 1      20           40
Temperature 2      50           12

At the first level of Pressure, what happens? Temperature = 50 - 20 = 30
At the second level of Pressure, what happens? Temperature = 12 - 40 = -28
The difference in what happens to the Response as Temperature changes
across the levels of Pressure reflects an Interaction between
Temperature and Pressure.
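
The manual evaluation can be sketched in code. The two Temperature effects are taken from the table; the single interaction number at the end uses a common convention for two-level designs (half the difference between the two effects), which is an assumption added here and not something stated on the slide.

```python
# Response keyed by (temperature level, pressure level)
response = {(1, 1): 20, (1, 2): 40, (2, 1): 50, (2, 2): 12}


def temperature_effect_at(pressure_level):
    """Change in response when Temperature goes from level 1 to level 2."""
    return response[(2, pressure_level)] - response[(1, pressure_level)]


effect_at_pressure_1 = temperature_effect_at(1)  # 50 - 20
effect_at_pressure_2 = temperature_effect_at(2)  # 12 - 40

# Assumed convention: interaction = half the difference of the two effects
interaction = (effect_at_pressure_2 - effect_at_pressure_1) / 2

print("Temperature effect at Pressure 1:", effect_at_pressure_1)
print("Temperature effect at Pressure 2:", effect_at_pressure_2)
print("Temp*Pressure interaction:", interaction)
```

When the two Temperature effects have opposite signs, as they do here, the lines on an interaction plot cross.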
81
Interaction Comparison
[Figure: two interaction plots of Mean (20 to 50) vs. Pressure (levels 1 and 2), one line per Temp level (1 and 2). Left panel: Factorial without Interaction. Right panel: Factorial with Interaction.]

No Interaction:
As we move from Pressure 1 to 2 at either constant Temperature 1
or 2, we see the same effect: the yield increases in both cases,
although the yield is higher for Temp. 2.

Interaction:
As we move from Pressure 1 to 2 at either constant Temperature 1
or 2, we see an opposing effect: the yield goes in different
directions at the two temperature levels. Given that we desire
higher yield, we must determine the best conditions.

82
Interaction Plot showing significant interaction

[Figure: Interaction Plot (data means) for CYCLE TIME; Cycle Time (15 to 25) vs. Feed (5, 10), one line each for Speed 20 and Speed 80]
If we want low cycle time, what should I set Feed at?

83
Main Effects Plot

[Figure: Main Effects Plot (data means) for CYCLE TIME; CYCLE TIME (18 to 22) vs. Speed (20, 80) and Feed (5, 10)]
For low cycle time, what are the best settings for Speed and Feed?

84
Epsilon-Squared
• The Epsilon-Squared value (e-squared), provides an indication
of the practical implications of a Main effect or an Interaction.
• It is calculated by dividing each of the Sums of Squares by the
Total Sum of Squares.

Analysis of Variance for Yield


Source DF SS e-squared
Temp 2 0.30111 23%
Pressure 2 0.76778 59%
Error 13 0.22889 18%
Total 17 1.29778

Which is the most influential Input variable in this study?
What implications does this result have for process control?
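
The e-squared column can be reproduced by dividing each Sum of Squares by the Total Sum of Squares. A minimal sketch using the values from the table above (the variable names are chosen for this example):

```python
sums_of_squares = {"Temp": 0.30111, "Pressure": 0.76778, "Error": 0.22889}
total_ss = 1.29778

# Epsilon-squared = SS of the source / Total SS
epsilon_squared = {
    source: ss / total_ss for source, ss in sums_of_squares.items()
}

for source, value in epsilon_squared.items():
    print(f"{source:<10} {value:.0%}")
```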
85
Epsilon Squared - continued

• Epsilon-Squared gives an approximation of
the “Practical Significance” of a factor.
– It shows how much of the variation is explained by a factor.
– It helps identify the most critical factors.
– Example: You have 3 factors with P-values of
0.000
• They are all statistically significant. You can
believe that they all do make a difference.
• BUT what is the size of those differences they
make?
• Epsilon-Squared gives an estimate of the size
of those differences.
86
Residual Analysis
Residual Analysis - Determine “Goodness of Fit” for the Final Model

Residuals are the difference between the observed values and
predicted or fitted values. This is the part of the observation/output
that is not explained by the model.

[Figure “Residual Model D”: Normal Plot of Residuals (Residual, -0.2 to 0.2, vs. Normal Score, -2 to 2) and Histogram of Residuals (Frequency vs. Residual)]

The Residuals Plot indicates the residuals are randomly
distributed and there appears to be nothing out of
control or a visible pattern. Therefore the model is a
Good Fit!
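
To make the definition of a residual concrete, the sketch below fits a main-effects-only model to the small Temperature/Pressure yield table used earlier in this step (20, 30, 40, 52) and subtracts the fitted values from the observed ones. The choice of data set and the model form (grand mean plus half of each effect times the coded level) are assumptions made for the illustration; they are not the model behind the plots on this slide.

```python
# Observed yield keyed by coded (temperature, pressure) levels
observed = {(-1, -1): 20, (-1, 1): 30, (1, -1): 40, (1, 1): 52}


def average(values):
    values = list(values)
    return sum(values) / len(values)


grand_mean = average(observed.values())


def effect(position):
    """Average at the +1 level minus average at the -1 level."""
    high = average(y for key, y in observed.items() if key[position] == 1)
    low = average(y for key, y in observed.items() if key[position] == -1)
    return high - low


temp_effect = effect(0)
pressure_effect = effect(1)


def fitted(temp, pressure):
    """Main-effects-only prediction in coded units."""
    return grand_mean + (temp_effect / 2) * temp + (pressure_effect / 2) * pressure


# Residual = observed value minus fitted value
residuals = {key: y - fitted(*key) for key, y in observed.items()}

for key, residual in residuals.items():
    print(key, "observed", observed[key], "fitted", fitted(*key), "residual", residual)
```

In a real analysis these residuals would then be plotted and checked for patterns.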
87
Residual Analysis
Residual Analysis for DOE
[Figure: four residual charts: (1) Normal Plot of Residuals (Residual vs. Normal Score); (2) I Chart of Residuals (Residual vs. Observation Number, with UCL, Mean, and LCL); (3) Histogram of Residuals (Frequency vs. Residual); (4) Residuals vs. Fits (Residual vs. Fit)]
Chart 1: If data fits a straight line, data is normal.
Chart 2: If data stays between the limits and has no patterns,
data is stable and errors are random.
Chart 3: If Histogram looks normal, data is normal.
Chart 4: If no patterns exist, errors are random.

88
Step 8: Questions to Answer

Factorial Outputs:
• Why perform Analysis of Variance (ANOVA)?
• What are Main Effects and Interactions?
• What is Epsilon-Squared?
• What are Residuals?

89
Step 8: Answers Summary
Factorial outputs:
• ANOVA is used to show statistical significance and an indication of the practical implications of the Main Effects and Interactions through the calculated P-Values and Epsilon-Squared Values respectively.
• A Main Effect is the average change observed in the output variable when the level of a factor is changed.
• An Interaction is the combined effect of two or more factors on the output variable.

90
Step 8: Answers Summary (continued)

• Epsilon-Squared provides the practical significance of a factor or interaction.
• Residuals are the difference between the
observed values and the predicted or fitted
values, used to show the “goodness of fit” for
the final experimental model.

91
Step 8:
Discover Variable Relationships

2k Factorials Analysis
A 12 Step Process

92
(2k) Factorial Method

(1) State the practical problem.


(2) State the factors and the levels of interest.
(3) Select the appropriate sample size based on the
effect you are trying to detect.
(4) Create the experimental data sheet with the
factors in their respective columns. Randomize
the experimental runs and conduct the
experiment.
(5) Construct the ANOVA table for the full model.
93
(2k) Factorial Method (continued)

(6) Review the ANOVA table and eliminate effects with p-values above .05. Remove these one at a time and then run the reduced model. Plot the residual graphs for further analysis. If necessary use Pareto Chart of Effects in Minitab.
(7) Analyze the Residual Plots to ensure we have a model
that fits.
(8) Assess the significance of the highest order
interactions first, p-value (< .05).

94
(2k) Factorial Method (continued)

(9) Investigate significant main effects NOT determined by interactions, p-value (< .05).
(10) State the mathematical model obtained. If possible,
calculate the Epsilon-Squared values and determine
the practical significance.
(11) Translate the mathematical model into process terms
and formulate conclusions and recommendations.
(12) Replicate optimum conditions. Plan the next
experiment or institutionalize the change.

95
2k Factorial Example

96
Let's follow the steps
(1) State the practical problem.
• Determine the effect Temperature,
Concentration, and Catalyst have on the process
Yield.
(2) State the factors and the levels of interest.
• Temp: Low = 160C and High = 180C
(Coded Values: Low = -1 and High = +1)
• Conc: Low = 20% and High = 40%
(Coded Values: Low = -1 and High = +1)
• Catalyst: Low = Brand A and High = Brand B
(Coded Values: Low = -1 and High = +1)
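The coded values above can be reproduced with a small calculation. The following is a minimal sketch, assuming the usual centering and scaling convention (subtract the midpoint, divide by half the range); the helper name `to_coded` and the dictionary names are illustrative and not part of the training material.

```python
# Natural-unit levels taken from the slide: (low, high).
LEVELS = {
    "Temp": (160.0, 180.0),   # degrees C
    "Conc": (20.0, 40.0),     # percent
}

def to_coded(value, low, high):
    """Map a natural-unit value onto the coded scale where low = -1 and high = +1."""
    center = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return (value - center) / half_range

temp_low, temp_high = LEVELS["Temp"]
print(to_coded(160, temp_low, temp_high))  # low level
print(to_coded(180, temp_low, temp_high))  # high level
print(to_coded(170, temp_low, temp_high))  # midpoint of the range

# A categorical factor such as Catalyst has no midpoint; its two brands are
# simply assigned the labels -1 and +1.
CATALYST_CODES = {"Brand A": -1, "Brand B": +1}
```

Working in coded units puts every factor on the same -1 to +1 scale, which is what allows coefficients of different factors to be compared directly.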

97
Let's follow the steps
(3) Select the appropriate sample size based on
the effect you are trying to detect.
• We would typically use Power and Sample Size
in Minitab, but for exercise purposes, we will use
the 8 runs in the data set for one replication.

98
Following Along
(4) Create the experimental data sheet with the factors in their respective columns. Randomize the experimental runs and conduct the experiment. (Note: for this example, the data is not randomized.)

StdOrder  RunOrder  Temp  Conc  Catalyst  Yield
1         1         160   20%   Brand A   60
2         2         180   20%   Brand A   72
3         3         160   40%   Brand A   54
4         4         180   40%   Brand A   68
5         5         160   20%   Brand B   52
6         6         180   20%   Brand B   83
7         7         160   40%   Brand B   45
8         8         180   40%   Brand B   80
99
Still Moving Along
(5) Construct the ANOVA table for the full model in Minitab.
If no P values are available, evaluate using Pareto
Chart of Effects for first pass at analysis.

[Figure: Pareto Chart of the Effects (response is Yield, Alpha = .10). Legend: A: Temp, B: Conc, C: Cat. Horizontal axis from 0 to 20]

100
Further Still
(6) Review the ANOVA table and eliminate effects with p-values above .05. Remove these one at a time, starting with the highest order first, and then run the reduced model. Plot the residual graphs for further analysis. If necessary use Pareto Chart of Effects in Minitab.

Next page shows Final Reduced Model

101
Final Reduced ANOVA Model – Step 6

'Yield' = Temp Catalyst Temp* Catalyst

Estimated Effects and Coefficients for Yield (coded units)

Term        Effect    Coef     StDev Coef   T        p
Constant              64.250   0.1768       363.45   0.000
Temp        23.000    11.500   0.1768       65.05    0.000
Conc        -5.000    -2.500   0.1768       -14.14   0.005
Cat         1.500     0.750    0.1768       4.24     0.051
Temp*Conc   1.500     0.750    0.1768       4.24     0.051
Temp*Cat    10.000    5.000    0.1768       28.28    0.001

What is significant about this ANOVA chart?
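To see where the numbers in this table come from, the sketch below recomputes them from the eight yields on the data sheet. It assumes the two terms missing from the table (Conc*Cat and Temp*Conc*Cat) were pooled into the error term, which leaves 2 error degrees of freedom; the closed-form p-value used here is valid only for a t distribution with exactly 2 degrees of freedom. Variable and function names are illustrative.

```python
import math
from itertools import product

# Yields in standard order (Temp changes fastest, then Conc, then Cat).
yields = [60, 72, 54, 68, 52, 83, 45, 80]
# product() varies its last element fastest, so unpack as (cat, conc, temp).
runs = [(t, c, k) for k, c, t in product((-1, 1), repeat=3)]

terms = {
    "Temp": lambda t, c, k: t,
    "Conc": lambda t, c, k: c,
    "Cat": lambda t, c, k: k,
    "Temp*Conc": lambda t, c, k: t * c,
    "Temp*Cat": lambda t, c, k: t * k,
    "Conc*Cat": lambda t, c, k: c * k,
    "Temp*Conc*Cat": lambda t, c, k: t * c * k,
}

n = len(yields)
# Coefficient = (sum of sign * yield) / n; Effect = 2 * Coefficient.
coef = {name: sum(f(*run) * y for run, y in zip(runs, yields)) / n
        for name, f in terms.items()}

# Assumption: terms left out of the reduced model are pooled into error.
dropped = ["Conc*Cat", "Temp*Conc*Cat"]
ss_error = sum(n * coef[name] ** 2 for name in dropped)
df_error = len(dropped)
se_coef = math.sqrt(ss_error / df_error / n)   # the "StDev Coef" column

def p_two_sided_df2(t):
    """Two-sided p-value for Student's t with exactly 2 degrees of freedom."""
    return 1.0 - abs(t) / math.sqrt(t * t + 2.0)

for name in terms:
    if name in dropped:
        continue
    t_stat = coef[name] / se_coef
    print(f"{name:10s} effect={2 * coef[name]:7.3f} coef={coef[name]:7.3f} "
          f"T={t_stat:7.2f} p={p_two_sided_df2(t_stat):.3f}")
```

With only 2 error degrees of freedom the p-values are sensitive to which terms are pooled, so a statistics package should be used for any real analysis.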


102
Onward

(7) Analyze the Residual Plots to ensure we have a model that fits. (errors random, normal data)
Please note that with only one test for each experimental condition, these charts are limited in development.
[Figure: Residual Analysis for DOE. Panels: Normal Plot of Residuals (Residual vs. Normal Score), I Chart of Residuals (Residual vs. Observation Number; UCL=0.9498, Mean=0, LCL=-0.9498), Histogram of Residuals (Frequency vs. Residual), Residuals vs. Fits (Residual vs. Fit)]
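The residuals behind these plots can be recomputed by hand. The sketch below is a hedged illustration: it takes the coefficients from the Step 6 table, predicts each of the eight runs in coded units, and subtracts the fit from the observed yield. The function name `predict` is illustrative.

```python
from itertools import product

# Observed yields in standard order (Temp changes fastest, then Conc, then Cat).
yields = [60, 72, 54, 68, 52, 83, 45, 80]
runs = [(t, c, k) for k, c, t in product((-1, 1), repeat=3)]

def predict(temp, conc, cat):
    """Reduced model in coded units, coefficients from the Step 6 table."""
    return (64.25 + 11.50 * temp - 2.50 * conc + 0.75 * cat
            + 0.75 * temp * conc + 5.00 * temp * cat)

fits = [predict(*run) for run in runs]
residuals = [y - fit for y, fit in zip(yields, fits)]

for run, y, fit, res in zip(runs, yields, fits, residuals):
    print(run, y, fit, res)
```

Every residual has the same magnitude because only one small term (the three-way interaction) is left unexplained, which is one reason the charts carry limited information for an unreplicated design.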

103
To determine our critical Xs
(8) Assess the significance of the highest order interactions first.
[Figure: Interaction Plot (data means) for Yield. Panels for Temp, Conc, and Cat at coded levels -1 and 1; the Cat*Conc panel is marked Not Significant]

104
Continue to Evaluate
(9) Investigate significant main effects NOT determined by interactions, p-value (<.05).
Note: In this case, this step could be skipped as all factor levels were set in step 8 “review of Interactions with P-values <0.05”.
[Figure: Main Effects Plot (data means) for Yield. Panels: Temp, Conc, Cat at coded levels -1 and 1; vertical axis: Yield, 55 to 75]

105
Time to put the answer together
(10) State the mathematical model obtained. If possible,
calculate the Epsilon-Squared values and determine
the practical significance.
Let’s look at the ANOVA table again
Term        Effect    Coef     StDev Coef   T        p
Constant              64.250   0.1768       363.45   0.000
Temp        23.000    11.500   0.1768       65.05    0.000
Conc        -5.000    -2.500   0.1768       -14.14   0.005
Cat         1.500     0.750    0.1768       4.24     0.051
Temp*Conc   1.500     0.750    0.1768       4.24     0.051
Temp*Cat    10.000    5.000    0.1768       28.28    0.001

What would the mathematical equation for this experimental model look like?
106
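Let’s look at the design geometrically.
We are still on step 10, we just need to explain what the design looks like to help understand the next slide.
[Figure: Cube Plot (data means) for Yield. Axes: Temp (160, 180), Conc (20%, 40%), Catalyst (Brand A, Brand B). Corner yields for Brand A: 60 and 72 at 20%, 54 and 68 at 40%. Corner yields for Brand B: 52 and 83 at 20%, 45 and 80 at 40%. Center of the cube: 64.250, the Constant Coef]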

107
Effects and Coefficients – step 10 Cont.
Term        Effect    Coef     StDev Coef   T        p
Constant              64.250   0.1768       363.45   0.000
Temp        23.000    11.500   0.1768       65.05    0.000
Conc        -5.000    -2.500   0.1768       -14.14   0.005
Cat         1.500     0.750    0.1768       4.24     0.051
Temp*Conc   1.500     0.750    0.1768       4.24     0.051
Temp*Cat    10.000    5.000    0.1768       28.28    0.001

The Coef of the constant is the grand average of all the data. Think of it as the center of the design cube. (64.250)

Let’s look at just one factor effect. For Temp we gain 23 units of yield as we go from the low level to the high level. If we were to go only from the center to the high level, we would gain only 11.5 units. This is the factor Coef. It is half of the factor Effect.
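The relationship between the grand average, the Effect, and the Coef can be checked directly from the data sheet. This is a minimal sketch using only the Temp and Yield columns; the variable names are illustrative.

```python
# (Temp in degrees C, Yield) pairs from the experimental data sheet.
data = [(160, 60), (180, 72), (160, 54), (180, 68),
        (160, 52), (180, 83), (160, 45), (180, 80)]

def mean(values):
    return sum(values) / len(values)

grand_mean = mean([y for _, y in data])                   # the Constant Coef
mean_high = mean([y for temp, y in data if temp == 180])  # average yield at high Temp
mean_low = mean([y for temp, y in data if temp == 160])   # average yield at low Temp

effect = mean_high - mean_low   # change in yield going from low to high
coef = effect / 2               # change in yield going from the center to high

print(grand_mean, mean_low, mean_high, effect, coef)
```

Note that the grand average plus the Coef lands exactly on the high-level average, which is the "center of the cube to the face" picture described above.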

108
What is the Prediction Equation? Step 10

Yield = 64.25 + 11.50(Temp) - 2.50(Conc) + 0.75(Cat) + 0.75(Temp*Conc) + 5.00(Temp*Cat)

Given this formula, how do we predict the maximum yield at the optimum settings?
• Look at results from evaluation of Interaction and Main
Effects plots (Steps 8 and 9) to determine proper
setting of levels, (+1 or -1)
• Use those settings in the formula at the top of the page

109
Let’s plug our values into the equation - Step 10

• From steps 8 and 9, we get the following settings to maximize yield:
– Temp = +1
– Conc = -1
– Cat = +1
• Therefore:
– Temp*Conc = (+1)(-1) = -1
– Temp*Cat = (+1)(+1)= +1

• Substitute these into the equation

110
Optimum Prediction Equation – Step 10

Yield = 64.25 + 11.50(Temp) - 2.50(Conc) + 0.75(Cat) + 0.75(Temp*Conc) + 5.00(Temp*Cat)

Maximum Predicted Yield = 64.25 + 11.50(+1) - 2.50(-1) + 0.75(+1) + 0.75(-1) + 5.00(+1) = 83.25
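As a cross-check on the chosen settings, the prediction equation can be evaluated at all eight corner settings and the best one picked automatically. This is a sketch only; `predict` is an illustrative name, and the search covers just the coded corner points (-1 and +1), not settings in between.

```python
from itertools import product

def predict(temp, conc, cat):
    """Prediction equation in coded units (each input is -1 or +1)."""
    return (64.25 + 11.50 * temp - 2.50 * conc + 0.75 * cat
            + 0.75 * temp * conc + 5.00 * temp * cat)

# Evaluate every (Temp, Conc, Cat) corner and keep the highest prediction.
corners = list(product((-1, 1), repeat=3))
best = max(corners, key=lambda setting: predict(*setting))

for setting in corners:
    print(setting, predict(*setting))
print("best setting (Temp, Conc, Cat):", best, "predicted yield:", predict(*best))
```

The search agrees with the settings read from the plots: Temp = +1, Conc = -1, Cat = +1.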

111
Epsilon Squared - Step 10
Source          SS       Epsilon Sq
Temp            1058     80.30%      Temp is most important
Conc            50       3.80%
Catalyst        4.5      0.34%
Temp*Conc       4.5      0.34%
Temp*Catalyst   200      15.18%      Temp*Cat interaction is 2nd
Error           0.5      0.04%
Total           1317.5   100.00%
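The Epsilon Sq column is each source's sum of squares divided by the total sum of squares. A minimal sketch of that arithmetic, using the SS values from the table (names are illustrative):

```python
# Sums of squares copied from the table above.
ss = {
    "Temp": 1058.0,
    "Conc": 50.0,
    "Catalyst": 4.5,
    "Temp*Conc": 4.5,
    "Temp*Catalyst": 200.0,
    "Error": 0.5,
}

ss_total = sum(ss.values())
# Epsilon squared = share of the total variation attributed to each source.
epsilon_sq = {source: value / ss_total for source, value in ss.items()}

ranked = sorted(epsilon_sq.items(), key=lambda item: item[1], reverse=True)
for source, eps in ranked:
    print(f"{source:14s} {ss[source]:8.1f} {eps:7.2%}")
```

Sorting by Epsilon Squared makes the practical ranking explicit: Temp first, the Temp*Catalyst interaction second, and everything else far behind.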

112
The End

(11) Translate the mathematical model into process terms and formulate conclusions and recommendations.
Set Temp = High (+1); Conc = Low (-1); Cat = High (+1)

(12) Replicate optimum conditions. Plan the next experiment or institutionalize the change.
Prove the optimum solution in real life production.

113
Step 8:
Discover Variable Relationships

Additional Topics

114
Additional Topics

• Power and Sample Size


• Center Points and Curvature
• Review of types of DOE
• Exercise

115
Power and Sample Size

• Minitab has a powerful Power and Sample Size Function – See Black Belt for aid.
• The purpose of the next two slides is to show the concepts:
– That the bigger the difference between the means of the two samples, the easier the difference is to see (smaller sample size needed).
– That the smaller the difference between the means of the two samples, the harder the difference is to see (larger sample size needed).

116
Large Difference Between Sample Means

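[Figure: two sample distribution curves (red and blue), with d marking the delta between their means and s marking sigma]
The delta between the curves is approximately 4 sigma.
It should be easy to see this amount of change, so a small sample size is enough.
The delta (difference) between the two samples (red and blue) is very large compared to their variation (sigma).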

117
Small Difference Between Sample Means

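[Figure: two overlapping sample distribution curves, with d marking the delta between their means and s marking sigma]
The delta between the curves is approximately 1 sigma.
It will be harder to see this small amount of change, so a large sample size is needed.
The delta is very small COMPARED to the variation (sigma).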
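The two slides above can be put into rough numbers. The sketch below is not the Minitab calculation; it uses a common normal-approximation formula for comparing two means, and the alpha of 0.05 and power of 0.80 are assumptions chosen for illustration. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(delta_in_sigmas, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means, using the normal approximation
    n = 2 * (z_alpha/2 + z_power)^2 / (delta/sigma)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / delta_in_sigmas ** 2)

print("delta = 4 sigma:", n_per_group(4))
print("delta = 1 sigma:", n_per_group(1))
```

Because the required sample size grows with the inverse square of delta/sigma, shrinking the difference from 4 sigma to 1 sigma multiplies the requirement by roughly 16. The normal approximation understates the need at very small sample sizes, so treat the 4 sigma result as a lower bound.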

118
Center Points
There is always a risk in 2-level designs of missing a
curvilinear relationship. The addition of Center Points is
an efficient way to test for curvature without adding a
large number of experimental runs. Curvature is part of
ANOVA when Center Points are used.
The same P-Value rule applies. If P< .05 then curvature
is statistically significant.
Examples follow.

Curvature: Possible curvilinear relationship missed in an experiment when using only two levels for each factor.

119
2^3 Design with center points (8 plus 3 center points)

CenterPt  Volts  Size  Time
0         180    15    7.5
1         120    10    10
1         120    20    5
1         240    10    5
1         240    20    10
0         180    15    7.5
1         240    10    10
1         120    10    5
1         240    20    5
1         120    20    10
0         180    15    7.5

1) In Minitab, a “0” in the CenterPt column indicates a center point.
2) If you use Center Points, a good practice is to have some at the beginning, in the middle, and at the end. This will help check that there was no drifting due to an unknown lurking variable.
3) A Center Point is defined by the midpoint of all of the factors.
120
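A run sheet like the one above can be generated programmatically. The following sketch assumes the same three factors and levels as the slide and places the three center points at the beginning, middle, and end; the fixed random seed is only there to make the sketch repeatable and is not part of the method.

```python
import random
from itertools import product

# Low and high settings for each factor, as on the slide.
levels = {"Volts": (120, 240), "Size": (10, 20), "Time": (5, 10)}
names = list(levels)

# Eight factorial runs: every low/high combination, flagged CenterPt = 1.
factorial = [(1,) + combo for combo in product(*(levels[name] for name in names))]
random.seed(1)   # assumption: fixed seed only so the sketch is repeatable
random.shuffle(factorial)

# The center point is the midpoint of every factor, flagged CenterPt = 0.
center = (0,) + tuple((low + high) / 2 for low, high in levels.values())

# Place center points at the beginning, in the middle, and at the end.
half = len(factorial) // 2
design = [center] + factorial[:half] + [center] + factorial[half:] + [center]

print("CenterPt", *names)
for row in design:
    print(*row)
```

Spreading the center points through the run order means a drift over time shows up as a trend among three runs that should otherwise give the same result.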
Example of No Curvature
Analysis of Variance for Yield (coded units)

Source               DF   Seq SS    Adj SS    Adj MS    F       P
Main Effects         2    2.82500   2.82500   1.41250   32.85   0.003
2-Way Interactions   1    0.00250   0.00250   0.00250   0.06    0.821
Curvature            1    0.00272   0.00272   0.00272   0.06    0.814
Residual Error       4    0.17200   0.17200   0.04300
Pure Error           4    0.17200   0.17200   0.04300
Total                8    3.00222

P-Value of 0.814 >> 0.05
Therefore Curvature is not significant in this model.
Next slide verifies that responses for both main effects appear linear.

121
Example of No Curvature - Graph
[Figure: Main Effects Plot (data means) for Yield, with Centerpoint. Panels: Temp and Time; vertical axis: Yield, 39.6 to 41.2]

122
Example of Curvature

Analysis of Variance for Yield (coded units)

Source           DF   Seq SS    Adj SS    Adj MS    F       P
Main Effects     2    2.8250    2.8250    1.4125    54.12   0.000
Curvature        1    44.1045   44.1045   44.1045   2E+03   0.000
Residual Error   5    0.1305    0.1305    0.0261
Lack of Fit      1    0.0025    0.0025    0.0025    0.08    0.794
Pure Error       4    0.1280    0.1280    0.0320
Total            8    47.0600

P-Value for curvature is 0.000
Therefore curvature is a statistically significant factor in this model.
Next slide verifies that responses for both main effects appear non-linear.
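The F column in both curvature tables is a ratio of mean squares. The sketch below recomputes it from the printed Adj MS values, assuming the denominator is the Residual Error mean square (this assumption reproduces the printed F values for the main effects and curvature rows). The function name `f_ratio` is illustrative.

```python
def f_ratio(ms_effect, ms_error):
    """F statistic: mean square of the effect divided by the error mean square."""
    return ms_effect / ms_error

# Adj MS values copied from the two ANOVA tables.
f_no_curvature = f_ratio(0.00272, 0.04300)   # "Example of No Curvature"
f_curvature = f_ratio(44.1045, 0.0261)       # "Example of Curvature"

print("no curvature example: F =", round(f_no_curvature, 2))
print("curvature example:    F =", round(f_curvature))
```

An F near zero says the center points sit about where a straight line between the low and high settings predicts; an F in the thousands says they sit far off that line.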

123
Example of Curvature - Graph
[Figure: Main Effects Plot (data means) for Yield, with Centerpoint. Panels: Temp and Time; vertical axis: Yield, 40.0 to 44.8]

Note that the center point is the point that gives the highest yield.
The dotted lines are for demo only. We only know the 3 points.
We do NOT know the shape of the curve.
124
Curvature – Take Away
1. If you do not use center points, you only know
the end points.
2. If you test for center points and you get P-value
over 0.05 (no curvature), you can assume that
there is a linear relationship and you can
interpolate between high and low settings.
3. If you test for center points and you get P-value
less than 0.05 (curvature), then there is a non-
linear relationship. All you know are the tested
points (including the center point). Need to use
different tools (regression, response surface).

125
DOE Tool Matrix
Type of Design (check marks fall under Screening, Characterization, Optimization) and Definition
• One Factor at a Time (OFAT) ✓: Toggle one item at a time holding everything else constant (inexpensive, but does not see interactions well).
• Fractional Factorials ✓ ✓: Looks at some of the combinations. Frequently used to take first look when there are many factors.
• Full Factorials ✓ ✓: Looks at all combinations. Can provide final DOE results.
• Response Surface Methods ✓: Provides mathematically optimized results using multiple dimension mapping. Infrequently used.
• PLEX ✓: Used to achieve the objective through limited 2^k supervised experiments done during regular production. Can evaluate different sets of X’s.
• EVOP ✓: Continuous improvement method used to optimize a process during regular production. High operator involvement. Will optimize limited number of X’s.

126
Exercise
• Objective: To perform a 2K (2x2x2) full factorial experiment
with the Catapult.
• Output: Distance – We want to maximize.
• Procedure:
– Select 3 Factors (Inputs)
• Factor 1: Ball Type, 2 levels (different kinds of ammo)
• Factor 2: Stop Pin, 2 levels (45 degrees and adjacent
setting)
• Factor 3: Tension Pin, 2 levels (two adjacent pin
locations)
– Four replications.
– Randomize the experimental runs.
• Complete DOE Planning Worksheet
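A randomized run sheet for this exercise can be sketched as follows. It assumes coded levels of -1 and +1 for each factor and a standard order in which the first factor changes fastest; actual level assignments (which ball, which pin position) are left to the planning worksheet.

```python
import random
from itertools import product

factors = ["Ball Type", "Stop Pin", "Tension Pin"]
replications = 4

# Standard order: the first factor changes fastest, so unpack product() reversed.
corners = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]
runs = corners * replications            # 8 combinations x 4 replications = 32 runs

# Randomize the order in which the 32 standard-order runs are performed.
order = list(range(1, len(runs) + 1))
random.shuffle(order)

print("StdOrder RunOrder", *factors)
for run_order, std_order in enumerate(order, start=1):
    print(std_order, run_order, *runs[std_order - 1])
```

Randomizing across all 32 runs, rather than within each replicate, spreads any drift (operator fatigue, rubber band wear) evenly over the factor combinations.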
127
Catapult Set Up

Factor C, Two Different Stop Pin Locations
Factor B, Two Different Tension Pin Locations

Pull back arm all the way.

Factor A, Two Different Ball Types

128
DOE Planning Worksheet
• Prior to running the experiment, plan the
following items on a flipchart:
– Problem Statement
– Objective
– Response Variable
– Factors
– Levels
– Possible Noise Variables
– Data collection method, responsible parties
– Measurement system (Gage R&R results)
• Reconvene and review your plan with the
class
129
Entry Form (partial, showing randomization)
StdOrder  RunOrder  CenterPt  Blocks  Ball Type  Stop Pin  Tension Pin  Distance
18        1         1         1       1          -1        -1
9         2         1         1       -1         -1        -1
31        3         1         1       -1         1         1
26        4         1         1       1          -1        -1
15        5         1         1       -1         1         1
17        6         1         1       -1         -1        -1
21        7         1         1       -1         -1        1
24        8         1         1       1          1         1
1         9         1         1       -1         -1        -1
20        10        1         1       1          1         -1
32        11        1         1       1          1         1
12        12        1         1       1          1         -1
13        13        1         1       -1         -1        1
14        14        1         1       1          -1        1
Data Sheet : “Step 8 - DOE Data Entry Sheet.XLS”
DOE Minitab file: “Step 8 - DOE Exercise Design.MPJ”
130
Run Exercise

• Teams of 4 - 6
• Record all shots.
• One operator fires all shots.
• Black Belt or Assistant Instructor will enter
data into Minitab and calculate final reduced
model.
• Instructor will review final model, all
significant interaction and main effects plots,
residual fits, ANOVA table, p-values, epsilon
squared.
131
Outcomes of Exercise

• A review of the final reduced ANOVA table.


• A review of the Residuals Analysis to
evaluate quality of data.
• A review of the appropriate Interaction and
Main Effects Plots.
• An identification of which factors are
statistically important and what setting they
should be at to maximize distance.
• Generated Prediction Model.

132
Step 8: Project Deliverables

If a DOE is required, the deliverables are:
• Well thought out and documented plan.
• Completion of DOE, with documentation.
– Reduced ANOVA table
– Interaction and main effects plots
– Residuals Analysis
– Summarized Results (Prediction Equation)
• Evaluation of a possible curvilinear relationship.

133
Step 8: Project Deliverables

If a DOE is not required, the deliverables are:
• Selection of appropriate hypothesis tools (t-tests, ANOVA, regression, proportion tests, chi-square, etc.)
• Identification, with statistical validation through the above tests, of which inputs are critical and what their optimal settings are.

134
