ANALYZE PHASE
Sources of Variation
5/7/2015
• Multi-vari Plots.
• Confidence Intervals on Mean.
• Hypothesis Test on Mean.
• Hypothesis Test on Paired Mean.
• Hypothesis Test on Mean of Two Samples.
• Hypothesis Test on Variance of Two Samples.
• Contingency tables.
• Power & Sample Size.
• Non-parametric Tests.
Multi-vari Plots…
A multi-vari plot displays the response means at each level of every factor, for
example in a two-factor model with Factor A and Factor B, each at two levels.
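The numbers behind a multi-vari plot are just group means. As a minimal sketch (Python standard library only), the code below groups hypothetical observations (the factor levels and response values are invented for illustration) and computes the mean response for each level of each factor and for each factor combination; a plotting tool would then draw these means.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: (Factor A level, Factor B level, response).
observations = [
    ("A1", "B1", 12.0), ("A1", "B1", 14.0),
    ("A1", "B2", 10.0), ("A1", "B2", 11.0),
    ("A2", "B1", 15.0), ("A2", "B1", 16.0),
    ("A2", "B2", 13.0), ("A2", "B2", 12.0),
]


def level_means(rows, key):
    """Return the mean response for each group produced by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[-1])  # the response is the last field
    return {level: mean(values) for level, values in sorted(groups.items())}


factor_a = level_means(observations, lambda r: r[0])
factor_b = level_means(observations, lambda r: r[1])
cells = level_means(observations, lambda r: (r[0], r[1]))
print(factor_a)  # {'A1': 11.75, 'A2': 14.0}
print(cells)
```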
• For operator Ali, the night shift is generally better (the response, lead time, is
lower), regardless of production line.
• The best results (lowest lead time) are achieved under the following
conditions: operator Ali working the night shift on production line 2.
• etc.
Strategic Management and Business Analysis - 2014 6
Confidence Intervals on Mean…
Statistical Inference…
Statistical inference is the field of statistics that allows us to compare
sample statistics, such as the sample mean and sample standard
deviation, to known populations or to other samples.
Because of sampling error, the statistics from any given sample may fall
anywhere within a range of values, just as multiple samples from Deming's Red
Bead Experiment yielded different estimates of the percentage of red beads,
even though no one changed the bucket.
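To see sampling error in action, this Python sketch mimics the Red Bead Experiment. The bucket composition (4,000 beads, 20% red) and the 50-bead paddle are assumptions chosen for illustration. Every draw comes from the same unchanged bucket, yet the estimated percentage of red beads varies from draw to draw.

```python
import random


def red_bead_draws(n_draws=10, paddle=50, red=800, white=3200, seed=1):
    """Estimate % red beads from repeated paddle draws of an unchanging bucket."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    bucket = ["red"] * red + ["white"] * white  # assumed composition: 20% red
    estimates = []
    for _ in range(n_draws):
        scoop = rng.sample(bucket, paddle)  # beads go back before the next draw
        estimates.append(100 * scoop.count("red") / paddle)
    return estimates


print(red_bead_draws())
```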
Similarly, the true population parameter may lie anywhere within a
given range of our estimates. This is the basis for a confidence interval.
A key assumption is that the population is both constant (it does not change
over time) and homogeneous (a given sample is representative of the
population).
                     Sample   Population
Average              X̄        µ
Standard deviation   s        σ
1. 1-Sample Z test
(Confidence interval on the mean when σ is known)…
When we sample from a population and have historical evidence of the
sigma value, we can estimate the confidence interval of the mean at a
given confidence level (for example, 95%).
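The interval is x̄ ± z(α/2)·σ/√n. A minimal Python sketch using the standard library's `statistics.NormalDist` for the z value; the sample mean, sigma and sample sizes in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical: xbar = 10, sigma = 2; the interval narrows as n grows.
for n in (4, 16, 64):
    low, high = z_confidence_interval(10, 2, n)
    print(n, round(low, 3), round(high, 3))
```

Because the half-width scales with 1/√n, quadrupling the sample size halves the width of the interval.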
Notice that as the sample size n increases, the confidence interval gets
narrower. Conversely, with a smaller sample we are less certain where the
true mean lies.
2. 1-Sample t test
(Confidence interval on the mean when σ is unknown)…
Often in practice, we do not know the true population standard
deviation; instead, we estimate it from the sample as well. In that
case, we use Student's t distribution, which approaches a Normal
distribution as the sample size n increases. An additional parameter of
Student's t distribution is the degrees of freedom, often shown using the
Greek letter ν (pronounced "nu"), which equals n − 1 for this test statistic.
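With σ unknown, the interval becomes x̄ ± t(α/2, ν)·s/√n with ν = n − 1. The Python standard library has no t distribution, so this sketch integrates the t density numerically (Simpson's rule) and inverts it by bisection; in practice a t table or a statistics package would supply the critical value. The data values are illustrative only.

```python
import math
import statistics


def t_cdf(t, df, steps=2000):
    """Student's t CDF via Simpson's rule on the density over [0, |t|]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    half_area = total * h / 3
    return 0.5 + half_area if t >= 0 else 0.5 - half_area


def t_critical(p, df):
    """Quantile of the t distribution for p > 0.5, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_confidence_interval(data, confidence=0.95):
    """Confidence interval for the mean when sigma is estimated by s."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    t = t_critical(1 - (1 - confidence) / 2, n - 1)  # nu = n - 1
    half_width = t * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


sample = [13.75, 13.5, 16.75, 13.25, 15.5]  # illustrative measurements
print(t_confidence_interval(sample))
```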
Note that this confidence interval is wider than the one calculated from the
same statistics assuming a known population sigma. The wider interval
reflects the reduced confidence that comes from not knowing the
population standard deviation.
Hypothesis Test on Mean…
The Analyze phase in Six Sigma closely examines the many process
inputs identified in the Measure phase to determine whether they are related
to the outputs and, if a relationship exists, whether it is statistically
significant. An important tool for this analysis is hypothesis testing, which
uses statistical analysis to determine whether an observed relationship
between two or more samples is real or due to random chance. A variety of
tests are used to find statistical evidence to reject or "not to reject" a
hypothesis. Once this is accomplished, the Six Sigma team is ready to
move forward with identifying, testing, and implementing solutions that
address the root causes of failure.
Definitions…
Null Hypothesis (H0): a statement of no change; for example, the means of
the two samples are equal.
Alpha (α) Error: the probability of wrongly rejecting a true null hypothesis.
Also called a Type I error.
REMEMBER!!!!
[Figure: three distributions (Distribution 1, 2 and 3) with their means (Mean 1, 2 and 3)]
REMEMBER!!!!
We don't ACCEPT the null hypothesis; we either reject it (strong conclusion)
or fail to reject it (weak conclusion).
If (P) is low, NULL will go. If (P) is high, NULL is the guy.
Hypothesis Test on Paired Mean…
3. Paired t test
(Hypothesis test on paired samples)…
We can also compare the means of two samples to test if they are from
populations with equal means (or from the same population).
In this first case, we will consider samples that are paired: each
observation has a corresponding observation in the other sample batch.
For example, if we have two operators, or two test methods, and each piece
is measured by each operator or method, then the data is paired:
Observation 1 from Operator A is paired with Observation 1 from
Operator B.
We can apply the Hypothesis Test on Mean discussed previously to the
paired differences (d = observation A − observation B), testing µd = 0 vs. µd ≠ 0.
Current   New
13.75     11.5
13.5      9.5
16.75     12.5
13.25     14.5
15.5      14.5
Final answer: P = 0.114 > 0.05, so we fail to reject the null hypothesis.
There is no statistically significant difference between the two samples (at α = 0.05).
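A minimal Python sketch of the paired t test applied to the Current/New data above: it forms the differences, tests their mean against zero with ν = n − 1 = 4, and converts t into a two-sided p-value. Because the standard library has no t distribution, the t CDF is approximated numerically with Simpson's rule; treat this as an illustration rather than a validated statistics routine.

```python
import math
import statistics


def t_cdf(t, df, steps=2000):
    """Student's t CDF via Simpson's rule on the density over [0, |t|]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 + total * h / 3 if t >= 0 else 0.5 - total * h / 3


def paired_t_test(a, b):
    """Two-sided paired t test of H0: mean difference = 0."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    t0 = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    p = 2 * (1 - t_cdf(abs(t0), n - 1))
    return t0, p


current = [13.75, 13.5, 16.75, 13.25, 15.5]
new = [11.5, 9.5, 12.5, 14.5, 14.5]
t0, p = paired_t_test(current, new)
print(f"t = {t0:.3f}, p = {p:.3f}")  # t = 2.016, p = 0.114
print("reject H0" if p < 0.05 else "fail to reject H0")
```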
Hypothesis Test on Mean of Two Samples…
4. 2-Sample t test
(Hypothesis test on two sample means)…
In the more general two-sample case, we can use hypothesis tests to
compare the means of two samples and test whether the samples came
from populations with equal means (or from the same population).
Assuming the population distributions are Normal, we calculate the test
statistic t0 and the degrees of freedom ν used to find its critical value
as follows:
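$$t_0 = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$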
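The t0 and ν formulas for the two-sample case translate directly into code. This Python sketch computes them for two independent samples (this is the Welch form, which does not pool the variances); the two samples in the usage lines are hypothetical, and t0 would then be compared with the t critical value for ν degrees of freedom from a table or statistics package.

```python
import statistics
from math import sqrt


def welch_t(sample1, sample2):
    """Return (t0, nu) for the two-sample t test without pooling variances."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1  # s1^2 / n1
    v2 = statistics.variance(sample2) / n2  # s2^2 / n2
    t0 = (statistics.mean(sample1) - statistics.mean(sample2)) / sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t0, nu


line_a = [20.1, 19.8, 20.4, 20.0, 19.7, 20.3]  # hypothetical
line_b = [19.5, 19.9, 19.2, 19.6, 19.4, 19.8]  # hypothetical
t0, nu = welch_t(line_a, line_b)
print(f"t0 = {t0:.2f}, nu = {nu:.1f}")
```

With equal sample sizes and equal variances, ν reduces to 2(n − 1); as the variances or sample sizes become more unequal, ν falls toward the smaller sample's n − 1.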
Hypothesis Test on Variance of Two Samples…
R x C Contingency tables…
Contingency tables, also known as R x C contingency tables, refer to
data that can be arranged in tables (of rows and columns) for
comparison.
The method for analyzing a table of r rows by c columns uses the
chi-square statistic to compare the observed values with the values
expected under independence.
The null hypothesis is that the proportions (p values) are equal for each
column in each row. The alternative is that at least one of the proportions
is different.
Chi-square test…
Used to test whether two discrete variables are associated (that is,
whether one variable depends on the other).
It compares the actual observed counts with the counts expected if the
variables were independent.
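A sketch of the chi-square calculation for an r × c table in Python: under independence each expected count is (row total × column total) / grand total, the statistic sums (O − E)²/E over all cells, and the degrees of freedom are (r − 1)(c − 1). The counts are hypothetical, and 3.841 is the tabulated chi-square critical value for 1 degree of freedom at α = 0.05.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total  # under independence
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


observed = [[10, 20],  # hypothetical counts
            [30, 40]]
chi2, dof = chi_square_independence(observed)
critical = 3.841  # chi-square critical value, dof = 1, alpha = 0.05
print(round(chi2, 3), dof, "reject H0" if chi2 > critical else "fail to reject H0")
```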
Consider this the next time you read the report of the Surprising Results
Of A New Study in your newspaper. Would the unsurprising results of the
other nine researchers warrant a headline?
Beta risk…
The Beta risk is the probability of not rejecting a false null hypothesis.
Sometimes we speak instead of the power of the test, which is the
probability of correctly rejecting a false null hypothesis (power = 1 − β).
Either way, we need to recognize that even when we fail to reject, the null
hypothesis may still be false.
What influences our ability to correctly reject a false null hypothesis?
In other words, what gives a higher power (a smaller beta risk)?
[Figure: distribution at µo and the true-condition distribution, with the decision line and the β area]
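As a rough quantitative answer to this question, the Python sketch below computes the power of a two-sided one-sample Z test and an approximate sample size for a target power. It assumes Normal data and a known σ, and `delta` is the true shift of the mean away from µo that we want to detect. Power grows with a larger shift, a smaller σ, a larger n, or a larger α.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_power(delta, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample Z test when the true mean is mu0 + delta."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta * math.sqrt(n) / sigma  # shift of the test statistic in standard errors
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n for the target power (ignores the far rejection tail)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical: detect a shift of 1 unit when sigma = 1.
for n in (4, 8, 16):
    print(n, round(z_test_power(1.0, 1.0, n), 3))
print(sample_size(1.0, 1.0))
```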
Power & Sample Size…
Final answer is 5
Non-parametric Tests…
ANOVA
Assumptions.
Why ANOVA …
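Imagine that we need to compare 4 means (µ1, µ2, µ3, µ4) using the
2-sample t test. Comparing every pair of means requires 4C2 = 6 separate tests.
If each of the six tests is performed at a 95% confidence level (α = 0.05),
the overall (α) risk, treating the tests as independent, will be:
$$\alpha_{\text{overall}} = 1 - (1 - 0.05)^6 = 1 - 0.735 \approx 0.265$$
So the probability of wrongly rejecting the null hypothesis (that the 4 means
are equal) will be 26.5%…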
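The same arithmetic for any number of means, as a short Python sketch. Like the 26.5% figure, it assumes the pairwise tests are independent, so it is an approximation of the overall risk.

```python
from math import comb


def familywise_alpha(k_means, alpha=0.05):
    """Pairwise t tests among k means and the chance of at least one Type I error."""
    tests = comb(k_means, 2)  # every pair of means
    return tests, 1 - (1 - alpha) ** tests  # assumes independent tests


for k in (3, 4, 6):
    tests, overall = familywise_alpha(k)
    print(k, tests, round(overall, 3))
```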
What ANOVA is …
ANalysis Of VAriance.
It is a tool for comparing two or more means by analyzing variance with
the F statistic.
ANOVA Assumptions…
1. The treatment responses follow a normal distribution, although there is
strong evidence that the test is robust to departures from normality. In other
words, normality is not generally an issue.
Source   DF      SS      MS     F      P
Factor    3   8.157   2.719  2.88  0.062
Error    20  18.883   0.944
Total    23  27.040
From the P value (0.062 > 0.05), we conclude that there is no significant
difference between the means…
We reach the same result by comparing the calculated F statistic with the
critical F value: F = 2.88 is smaller than F(0.05; 3, 20) ≈ 3.10, so we fail to
reject the null hypothesis.
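A minimal Python sketch of how the one-way ANOVA table entries are computed: SS for the factor (between groups) and for error (within groups), DF, MS = SS/DF, and F = MS(factor) / MS(error). The three groups are hypothetical; with them F = 13, well above the tabulated critical value F(0.05; 2, 6) ≈ 5.14.

```python
from statistics import mean


def one_way_anova(groups):
    """Return DF, SS, MS and F for a one-way ANOVA on a list of samples."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    ss_factor = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)  # between groups
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)  # within groups
    df_factor, df_error = len(groups) - 1, len(values) - len(groups)
    ms_factor, ms_error = ss_factor / df_factor, ss_error / df_error
    return {
        "DF": (df_factor, df_error),
        "SS": (ss_factor, ss_error),
        "MS": (ms_factor, ms_error),
        "F": ms_factor / ms_error,
    }


table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])  # hypothetical groups
print(table)
```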
If the F-test had detected a difference in one of the means (with a p <
0.05), then we could use these confidence intervals on the mean to
determine which mean is significantly different.
A (----------*---------)
B (----------*---------)
C (---------*----------)
D (----------*---------)
Source          DF       SS       MS      F      P
Product          1  17.8506  17.8506  24.02  0.001
Representative   3   8.1019   2.7006   3.63  0.064
Interaction      3   0.2819   0.0940   0.13  0.942
Error            8   5.9450   0.7431
Total           15  32.1794
A (-------*------)
B (------*-------)
C (-------*------)
D (-------*------)
Product type confidence intervals
P1 (------*-----)
P2 (-----*-----)
Are there differences due to the Product Type (P1 or P2)? Or to the
geographic area (region) they service?
Regression Analysis
• Scatter Diagram.
Listing all the causes for a given effect in a clear, organized way makes
it easier to separate out potential problems and target areas for
improvement.
[Figure: cause-and-effect diagram, with causes and sub-causes branching toward the effect]
Bear in mind that the causes listed are all potential causes, since at this
point we have no data to show whether any of them really contributes to
the problem. In that regard, as in all brainstorming activities, avoid judging
the merits of each cause as it is offered. Only data can lead to that
judgment.
Analyze Phase:
To brainstorm potential basic process factors, which can be investigated
in a designed experiment.
Scatter Diagram.
Scatter Diagram …
Scatter diagrams are used to investigate possible correlation
(inter-dependence) between one variable and another.
For example, if we can show that weight gain in the first three months
of pregnancy correlates well with baby development, we can use weight
gain as a predictor of healthy fetal development. The alternative would be
expensive tests to monitor the actual development of the baby.
X      Y (dependent variable)
1.5    76
1.3    85
1.1    75
1.6    84
2      88
1.75   90
[Figure: scatter diagram of Y against X]
Extrapolation …
[Figure: dependent variable vs. independent variable]
Positive Correlation…
[Figure: dependent variable vs. independent variable]
Slope is positive.
Negative Correlation…
[Figure: dependent variable vs. independent variable]
Slope is negative.
Weak Correlation…
[Figure: dependent variable vs. independent variable]
Strong Correlation…
[Figure: dependent variable vs. independent variable]
Correlation coefficient …
• The main result of a correlation analysis is the correlation coefficient (or
"r"). It ranges from -1.0 to +1.0. The closer r is to +1 or -1, the more
closely the two variables are related. If r is close to 0, there is no linear
relationship between the variables.
• If r is positive, it means that as one variable gets larger the other gets
larger.
• If r is negative it means that as one gets larger, the other gets smaller
(often called an "inverse" correlation).
Determination coefficient …
Indicates how much of the variability in the dependent variable is explained
by the fitted line, that is, by the variability in the independent variable. Also
known as R-squared; for a straight-line fit with one independent variable, it
equals r².
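A short Python sketch of r and R-squared, applied to the X/Y values tabulated earlier in this section, used purely as example data:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1.5, 1.3, 1.1, 1.6, 2.0, 1.75]
y = [76, 85, 75, 84, 88, 90]
r = pearson_r(x, y)
print(f"r = {r:.3f}, R-squared = {r * r:.3f}")
```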
Stratification…
[Figure: scatter diagram of dependent vs. independent variable]
In this Scatter Diagram, there is no clear correlation between the two
variables. As we change the x-variable, we can see no clear pattern in the
y-values.
So, we will analyze the same data by grouping them into three series
based on the value of a third variable.
[Figure: the same scatter diagram with the points grouped into three series]
Suppose each of the three series, shown in the graph as yellow, orange
and blue points, represents a different cooking temperature.
• Series one, displayed here in yellow, shows a clear positive correlation.
• Series two, in orange, shows a negative correlation.
• Series three, in blue, shows no correlation.
• Scatter Diagram.
$$Y = \beta_0 + \beta_1 X + \text{error}$$
The error term is an acknowledgement that even if we could sample all
possible values, there would most likely be some
unpredictability in the outcome. It could be due to many
possibilities, including measurement error in either the
dependent or independent variable, the effects of other
unknown variables, or non-linear effects.
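The usual way to estimate β0 and β1 from data is ordinary least squares, which minimizes the sum of squared errors: b1 = Sxy/Sxx and b0 = ȳ − b1·x̄. A minimal Python sketch that fits the line and returns the residuals (the estimated error terms); the X/Y values tabulated earlier are used only as example data.

```python
def fit_line(x, y):
    """Ordinary least-squares estimates of b0 and b1 in y = b0 + b1*x + error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx  # slope
    b0 = my - b1 * mx  # intercept: the line passes through (mean x, mean y)
    residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]  # estimated error terms
    return b0, b1, residuals


x = [1.5, 1.3, 1.1, 1.6, 2.0, 1.75]
y = [76, 85, 75, 84, 88, 90]
b0, b1, residuals = fit_line(x, y)
print(f"Y = {b0:.2f} + {b1:.2f} X")
```

Least-squares residuals always sum to (numerically) zero, so their size and pattern, not their total, show how much unpredictability remains.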