UNCLASSIFIED / FOUO
National Guard
Black Belt Training
Module 34
Analysis of Variance
(ANOVA)
ACTIVITIES
• Identify Potential Root Causes
• Reduce List of Potential Root Causes
• Confirm Root Cause to Output Relationship
• Estimate Impact of Root Causes on Key Outputs
• Prioritize Root Causes
• Complete Analyze Tollgate
TOOLS
• Value Stream Analysis
• Process Constraint ID
• Takt Time Analysis
• Cause and Effect Analysis
• Brainstorming
• 5 Whys
• Affinity Diagram
• Pareto
• Cause and Effect Matrix
• FMEA
• Hypothesis Tests
• ANOVA
• Chi Square
• Simple and Multiple Regression
Note: Activities and tools vary by project. Lists provided here are not necessarily all-inclusive.
Learning Objectives
Gain a conceptual understanding of Analysis of Variance
(ANOVA) and the ANOVA table
Be able to design and perform a one or two factor
experiment
Recognize and interpret interactions
Fully understand the ANOVA model assumptions and how
to validate them
Understand and apply multiple pair-wise comparisons
Establish a sound basis on which to learn more complex
experimental designs
Regression ANOVA
ANOVA Output
[Figure: Boxplot of Processing Time by Facility (Facility A, Facility B, Facility C)]
Is There a Difference?
[Figure: Response by Factor A, group averages marked with x]
Plotting the averages for the different methods shows a
difference, but is it statistically significant?
[Figure: Response by Factor A, with more data points per group]
Now that we have a bit more data, does
factor A make a difference? Why or why
not?
[Figure: Response by Factor A]
Sources of Variability
ANOVA looks at three sources of variability:
Total – Total variability among all observations
Between – Variation between subgroup means
(factor)
Within – Random (chance) variation within each
subgroup (noise, or statistical error)
“Between Subgroup Variation”
“Within Subgroup Variation”
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
Sums of Squares
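[Figure: response by Factor/Level (levels 1 to 4), showing the mean of each group]
$\bar{y}_j$ = mean of the jth group
i = represents a data point within the jth group
j = represents the jth group
g = number of groups; $n_j$ = number of data points in the jth group; $\bar{y}$ = mean of all data points
$$\sum_{j=1}^{g}\sum_{i=1}^{n_j}(y_{ij}-\bar{y})^2 = \sum_{j=1}^{g} n_j(\bar{y}_j-\bar{y})^2 + \sum_{j=1}^{g}\sum_{i=1}^{n_j}(y_{ij}-\bar{y}_j)^2$$
SS(Total) = SS(Between) + SS(Within)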
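The sum-of-squares identity (total = between + within) can be checked numerically. The following is a minimal Python sketch, not Minitab output. The function name `sums_of_squares` and the sample data are made up for illustration.

```python
def sums_of_squares(groups):
    """Return (SS total, SS between, SS within) for a list of groups."""
    all_values = [y for group in groups for y in group]
    grand_mean = sum(all_values) / len(all_values)

    # Total: every observation against the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)

    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        # Between: group mean against the grand mean, weighted by group size.
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        # Within: each observation against its own group mean.
        ss_within += sum((y - group_mean) ** 2 for y in group)
    return ss_total, ss_between, ss_within


# Hypothetical data: four factor levels, three observations each.
example_groups = [
    [60.0, 62.0, 61.0],
    [65.0, 66.0, 64.0],
    [68.0, 70.0, 69.0],
    [63.0, 61.0, 62.0],
]
ss_total, ss_between, ss_within = sums_of_squares(example_groups)
print(ss_total, ss_between, ss_within)
```

Whatever data is supplied, the first value should equal the sum of the other two, apart from floating-point rounding.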
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least one $\mu_k$ is different
TOTAL: SS(Total), with $\sum_{j=1}^{a} n_j - 1$ degrees of freedom
i = represents a data point within
the jth group (factor level)
j = represents the jth group (factor
level)
a = total # of groups (factor levels)
Select Graphs to go to
the Graphs dialog box
ANOVA-Boxplots
Select Boxplots of data
[Figure: Boxplot of Processing Time by Facility (Facility A, Facility B, Facility C)]
What would you conclude? Which facility has the best cycle time?
Tukey pairwise comparisons answer the question “Which ones are statistically significantly different?”
We want to determine if
there is a significant
difference in the level of
production between the
different plans.
What concerns might you
have about this experiment?
Boxplots in Minitab
Let’s start with Graphs > Boxplots
[Figure: Boxplot of production by plan (plans A to E)]
If you were the manager, what would you do?
Which intervals do not contain zero?
Is it possible for the ANOVA Table and the Tukey pairs to conflict?
[Figure: Mean production by plan (plans A to E)]
[Figure: Production by plan (plans A to E)]
Source DF SS MS F p
Error ? 2,242 ?
Total 23 6,868
Let’s say we are testing a factor that has five levels and we collect seven data points at each factor level…
How many observations would we have? 5 levels x 7 observations per level = 35 total observations
How many total degrees of freedom would we have? 35 - 1 = 34
How many degrees of freedom to estimate the factor effect? 5 levels - 1 = 4
How many degrees of freedom do we have to estimate error? 34 total - 4 factor = 30 degrees of freedom
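The same degrees-of-freedom bookkeeping can be written as a few lines of Python. This is only an illustrative sketch for a balanced one-way design. The function name `one_way_degrees_of_freedom` is hypothetical and not part of Minitab.

```python
def one_way_degrees_of_freedom(levels, observations_per_level):
    """Degrees of freedom for a balanced one-way ANOVA."""
    total_observations = levels * observations_per_level
    df_total = total_observations - 1   # one lost to the grand mean
    df_factor = levels - 1              # one fewer than the number of levels
    df_error = df_total - df_factor     # what remains estimates error
    return total_observations, df_total, df_factor, df_error


# Five levels, seven data points per level.
result = one_way_degrees_of_freedom(5, 7)
print(result)  # (35, 34, 4, 30)
```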
Note: Minitab will perform this analysis for us with the procedure called ‘Test for Equal Variances’
[Figure: Test for Equal Variances by plan, 95% Bonferroni Confidence Intervals for StDevs]
Both Bartlett's Test and Levene's Test are run on the data and are reported at the same time.
Bartlett's Test: Test Statistic 4.95, P-Value 0.292
Levene's Test: Test Statistic 0.46, P-Value 0.764
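To see what a Levene-type test statistic measures, here is a minimal Python sketch. It assumes the median-based form (absolute deviations from each group's median, followed by a one-way ANOVA F ratio on those deviations). Whether that matches the exact variant Minitab reports on this slide is an assumption. The function name and the data are made up, and no p-value is computed.

```python
from statistics import mean, median


def levene_statistic(groups):
    """One-way ANOVA F ratio computed on absolute deviations from group medians."""
    deviations = [[abs(y - median(group)) for y in group] for group in groups]
    k = len(deviations)
    n_total = sum(len(d) for d in deviations)
    grand_mean = mean(z for d in deviations for z in d)

    between = 0.0
    within = 0.0
    for d in deviations:
        d_mean = mean(d)
        between += len(d) * (d_mean - grand_mean) ** 2
        within += sum((z - d_mean) ** 2 for z in d)
    return (between / (k - 1)) / (within / (n_total - k))


# Hypothetical data: same spread in both groups, then very different spreads.
same_spread = levene_statistic([[1, 2, 3, 4, 5], [11, 12, 13, 14, 15]])
different_spread = levene_statistic([[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]])
print(same_spread, different_spread)
```

A statistic near zero is consistent with equal variances, while a large statistic (small p-value) suggests the equal-variance assumption is in doubt.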
Individual Exercise
A market research firm for the Defense Commissary Agency (DECA) believed that the sales of a given product, in units, depended on its placement.
Items placed at eye level tended to have higher sales than items placed near the floor.
Using the data in the Minitab file Sales vs Product Placement.mtw, draw some conclusions about the relationship between sales and product placement.
You will have 10 minutes to complete this exercise.
Two-Way ANOVA
                 A
            Low       High
B   Low     69, 65    80, 82
    High    42, 44    59, 63

At a high level, a Two-Way ANOVA (two factor) can be viewed as a two-factor experiment.
The factors can take on many levels; you are not limited to two levels for each.
Two-Way ANOVA
Experiments often involve the study of more than one
factor.
Factorial designs are very efficient methods to
investigate various combinations of levels of the
factors.
These designs evaluate the effect on the response
caused by different levels of factors and their
interaction.
As in the case of One-Way ANOVA, we will be building
a model and verifying some assumptions.
Error: $SS_E$, with $ab(n-1)$ degrees of freedom, $MS_E = \dfrac{SS_E}{ab(n-1)}$
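The error row, together with the factor and interaction sums of squares, can be worked through by hand for a small balanced design. The sketch below is illustrative Python, not Minitab output. The cell layout is an assumption reconstructed from the 2 x 2 table on the earlier slide, and the function name `two_way_anova` is hypothetical.

```python
from statistics import mean

# Cell data keyed by (level of A, level of B). The assignment of values to
# cells is an assumption reconstructed from the slide's 2 x 2 table.
cells = {
    ("low", "low"): [69, 65],
    ("high", "low"): [80, 82],
    ("low", "high"): [42, 44],
    ("high", "high"): [59, 63],
}


def two_way_anova(cells):
    """Sums of squares for a balanced two-factor design with replication."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    a, b = len(a_levels), len(b_levels)
    n = len(next(iter(cells.values())))  # replicates per cell

    grand = mean(y for ys in cells.values() for y in ys)
    a_mean = {i: mean(y for j in b_levels for y in cells[(i, j)]) for i in a_levels}
    b_mean = {j: mean(y for i in a_levels for y in cells[(i, j)]) for j in b_levels}
    cell_mean = {key: mean(ys) for key, ys in cells.items()}

    ss_a = b * n * sum((a_mean[i] - grand) ** 2 for i in a_levels)
    ss_b = a * n * sum((b_mean[j] - grand) ** 2 for j in b_levels)
    ss_ab = n * sum(
        (cell_mean[(i, j)] - a_mean[i] - b_mean[j] + grand) ** 2
        for i in a_levels
        for j in b_levels
    )
    ss_e = sum((y - cell_mean[key]) ** 2 for key, ys in cells.items() for y in ys)
    df_e = a * b * (n - 1)
    return {
        "SS_A": ss_a, "SS_B": ss_b, "SS_AB": ss_ab,
        "SS_E": ss_e, "df_E": df_e, "MS_E": ss_e / df_e,
    }


table = two_way_anova(cells)
print(table)
```

Under the assumed layout, the error row works out to SS_E = 20 on 4 degrees of freedom, so MS_E = 5. The interaction sum of squares is small relative to the two main effects.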
Marketing Example
AAFES is trying to introduce their own brand of candy and
wants to find out which product packaging or regions will
yield the highest sales.
They sold their candy in either a plain brown bag, a colorful
bag or a clear plastic bag at the cash register (point of sale).
AAFES had stores in regions which varied economically and
the information was captured to see if different regions
affect sales.
The data set is: Two Way ANOVA Marketing.mtw.
As a class we will analyze the data.
What is significant?
Who wants to give it a try?
[Figure: plot of means (y-axis: Mean)]
Selecting Interactions
Enter the Responses and Factors, then click OK to get the plot
Interactions Plot
[Figure: Interaction Plot for sales, Data Means (y-axis: Mean; one line per region: 1, 2, 3)]
Residual Analysis
Select Store residuals and Store fits, then select Graphs
and select Four in one (under Residual Plots) and click on OK and OK
again so we can do some model confirmation
[Figure: Residual Plots, four in one: Percent vs. Residual; Residual vs. Fitted Value; Histogram (Frequency vs. Residual); Versus Order (Residual vs. Observation Order)]
What are we looking for?
What are the assumptions we want to verify?
[Figure: Test for Equal Variances by region]
Bartlett's Test: Test Statistic 42.13, P-Value 0.000
Levene's Test: Test Statistic 17.02, P-Value 0.000
[Figure: Test for Equal Variances by packaging (color, plain, point of sale)]
Bartlett's Test: Test Statistic 70.97, P-Value 0.000
Levene's Test: Test Statistic 34.49, P-Value 0.000
ANOVA Conclusions
Did our model assumptions hold up?
How comfortable are we with the conclusions drawn?
Questions?
Takeaways
APPENDIX
$\sqrt{1093} \approx 33.1$
Multiple Comparisons
Tukey’s – Family error rate controlled
Fisher’s – Individual error rate controlled
Dunnett’s – Compares all results to a control group
Hsu’s MCB – Compares all results to a known best group
Which one do you use? In general, Tukey’s is recommended because it’s “tighter”. In other words, you will be less likely to find a difference between means (less statistical power), but you will be protected against a “false positive”, especially when there are a lot of groups.
Tukey’s makes each test at a stricter individual significance level (α' < .05) and holds the family error rate to α = .05
Fisher’s makes all tests at the specified significance level (usually α = .05) and reports the “family” error rate, α'
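To get a feel for why the family error rate grows with the number of groups, here is a rough Python sketch. It treats the pairwise comparisons as if they were independent, which they are not, so the result is only a back-of-the-envelope guide and not the value Minitab reports. Both function names are hypothetical.

```python
def pairwise_comparisons(groups):
    """Number of distinct pairs among the given number of groups."""
    return groups * (groups - 1) // 2


def approximate_family_error_rate(groups, individual_alpha=0.05):
    """Chance of at least one false positive, ASSUMING independent tests."""
    m = pairwise_comparisons(groups)
    return 1 - (1 - individual_alpha) ** m


# Five groups (for example, plans A to E) tested pairwise at alpha = .05 each.
pairs = pairwise_comparisons(5)
rate = approximate_family_error_rate(5, 0.05)
print(pairs, round(rate, 3))
```

With five groups there are ten pairs. Under the independence assumption, the chance of at least one false positive is roughly 40%, which is the motivation for controlling the family rate.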
[Figure: F-Distribution for 3 and 20 degrees of freedom (Prob vs. F-Value), marking the 10% Point, 5% Point, 1% Point, and the Observed Point]
F Distribution:
Probability Distribution Function (PDF) Plots
[Figure: F-distribution PDF curves for (N1, N2) = (1, 1), (3, 3), (5, 8), and (8, 8) d.f.; x-axis: F]
* N1 refers to the d.f. in the numerator
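The shape of these curves can be reproduced from the standard F density formula. The sketch below uses only the Python standard library. The function name `f_pdf` is hypothetical, and the evaluation points are chosen arbitrarily for illustration.

```python
from math import exp, lgamma, log


def f_pdf(x, d1, d2):
    """Density of the F distribution with d1 (numerator) and d2 (denominator) d.f."""
    if x <= 0:
        raise ValueError("x must be positive")
    # log of the Beta function B(d1/2, d2/2)
    log_beta = lgamma(d1 / 2) + lgamma(d2 / 2) - lgamma((d1 + d2) / 2)
    log_density = (
        (d1 / 2) * log(d1 / d2)
        + (d1 / 2 - 1) * log(x)
        - ((d1 + d2) / 2) * log(1 + d1 * x / d2)
        - log_beta
    )
    return exp(log_density)


# The four (numerator, denominator) d.f. pairs from the plot legend.
for d1, d2 in [(1, 1), (3, 3), (5, 8), (8, 8)]:
    print(d1, d2, [round(f_pdf(x, d1, d2), 3) for x in (0.5, 1.0, 2.0, 3.0)])
```

Plotting these values over a finer grid of F would trace out curves like those in the figure.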