11.L11 Analyze GB Phase Material

5/7/2015
ANALYZE PHASE
Strategic Management and Business Analysis - 2014 1

Sources of Variations

Multi-vari Plots.
• Confidence Intervals on Mean.
• Hypothesis Test on Mean.
• Hypothesis Test on Paired Mean.
• Hypothesis Test on Mean of Two Samples.
• Hypothesis Test on Variance of Two Samples.
• Contingency tables.
• Power & Sample Size.
• Non-parametric Tests.

What are Multi Vari Plots…?

A multi-vari plot is a graphical tool for examining interactions among
data. An interaction occurs when the change in response from one level of
a factor to another level depends upon the level of another factor.
The plot displays the response means at each level of every factor. For
example, in a two-factor model (Factor A and Factor B, each with two
levels), it shows the mean response for each of the four combinations of
levels.
This categorization of input factors helps to:
1. Reduce the number of input factors.
2. Detect the effect of interactions.
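The idea of an interaction can be sketched numerically. The example below is a minimal illustration, assuming made-up lead times for two operators on two production lines (the numbers are hypothetical and not taken from the slides). It computes the mean response for each combination and checks whether the effect of the production line differs between operators.

```python
from statistics import mean

# Hypothetical lead times for each operator / production-line combination.
lead_times = {
    ("Reda", "Line 1"): [10.0, 10.2],
    ("Reda", "Line 2"): [10.1, 9.9],
    ("Ali", "Line 1"): [12.0, 12.4],
    ("Ali", "Line 2"): [8.0, 8.4],
}

# Mean response at each combination of factor levels (what the plot displays).
cell_means = {combo: mean(values) for combo, values in lead_times.items()}

# Effect of moving from Line 1 to Line 2, computed separately per operator.
line_effect = {
    operator: cell_means[(operator, "Line 2")] - cell_means[(operator, "Line 1")]
    for operator in ("Reda", "Ali")
}

# If the line effect differs between operators, the two factors interact.
interaction = line_effect["Ali"] - line_effect["Reda"]

for combo, value in cell_means.items():
    print(combo, round(value, 2))
print("line effect per operator:", {k: round(v, 2) for k, v in line_effect.items()})
print("interaction contrast:", round(interaction, 2))
```

A contrast near zero would suggest no interaction; here the line matters for one operator but hardly at all for the other.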

Multi Vari Plot Example…


From the previous graph, we can draw the following conclusions:
• Reda's performance shows no large difference between production lines
(1 or 2), while Ali's does. This means that there is an interaction
between operator and production line.
• For Ali, the night shift is better: the response (lead time) is
generally lower regardless of production line.
• The best results (lowest lead time) are achieved under the following
conditions: operator Ali working the night shift on production line 2.
• etc.

• Multi-vari Plots.
Confidence Intervals on Mean.

Statistical Inference…
Statistical inference is the field of statistics that allows us to compare
sample statistics, such as the sample mean and sample standard
deviation, to known populations or to other samples.
For example, if we sample customer response times before and after a
process change, we will likely see a difference. Does that difference
suggest that there was an improvement, or are the two samples similar
enough to have come from the same population?
Due to sampling error, a statistic from a given sample may lie anywhere
within a range of values, just as multiple samples from Deming’s Red Bead
Experiment yielded different estimates of the percent of red beads, even
though no one changed the bucket.

Statistical Inference…
Similarly, the true population parameter may lie anywhere within a
given range of our estimates. This is the basis for a confidence interval.
A key assumption is that the population is both constant (it does not
change over time) and homogeneous (a given sample is representative of
the population).
                 Sample    Population
Average          Xbar      µ
Standard Dev.    s         σ

1. 1 Sample Z test
(Confidence intervals for Mean, while (σ) is known)…
When we sample from a population and have historical evidence of the
sigma value, we can estimate the confidence interval of the mean at a
given confidence level (for example, 95%).
In Minitab, to calculate the 95% confidence interval on the mean, use
(Stat > Basic statistics > 1 sample Z).

1. 1 Sample Z test
(Confidence intervals for Mean, while (σ) is known)…
Note that as the sample size n increases, the confidence interval gets
narrower. That is, we have less confidence about where the true mean
lies for a smaller sample.
An assumption is that the samples are from a population with a Normal
distribution. Otherwise we should use non-parametric tests.
Needs a large sample size (30 or more points).
Not commonly used, as (σ) is rarely known.

1. 1 Sample Z test (Example)…

Average waiting time for 25 patients is 35.7 minutes, with known
standard deviation of 1.8 minutes. Calculate 95% confidence interval on
mean.

1. 1 Sample Z test (Solution)…

Using Minitab,
(Stat > Basic statistics > 1 sample Z).
Final answer is: 34.99 ≤ µ ≤ 36.41
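
The same interval can be checked by hand with the formula Xbar ± z(α/2) · σ / √n. The sketch below is one way to do that using only Python's standard library; the function name is an illustrative choice, not something from the slides or from Minitab.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided confidence interval on the mean when sigma is known."""
    # z value that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Waiting-time example: n = 25, Xbar = 35.7, known sigma = 1.8
lower, upper = z_confidence_interval(xbar=35.7, sigma=1.8, n=25)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

With the example data this reproduces the interval reported above.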

2. 1 Sample t test
(Confidence intervals for Mean, while (σ) is unknown)…
Often in practice, we do not know the true population standard
deviation. Rather, we estimate it from our sample as well. In that
case, we use the Student “t” distribution, which approaches a Normal
distribution as the sample size n increases. An additional parameter of the
Student “t” distribution is the degrees of freedom, often shown using the
Greek letter ν (pronounced nu), which equals n-1 for this test statistic.
Can be done with a small sample size.
In Minitab, to calculate the 95% confidence interval on the mean, use
(Stat > Basic statistics > 1 sample t).

2. 1 Sample t test (Example)…

Average waiting time for 25 patients is 35.7 minutes, with sample
standard deviation of 1.8 minutes. Calculate 95% confidence interval on
mean.

2. 1 Sample t test (Solution)…

Using Minitab,
(Stat > Basic statistics > 1 sample t).
Final answer is: 34.96 ≤ µ ≤ 36.44
Note that the confidence interval is wider than the one calculated from
the same statistics assuming a known population sigma. The wider interval
reflects the decreased confidence that comes from having to estimate the
population standard deviation from the sample.
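
The t interval follows the same pattern as the Z interval, with the t critical value replacing z. The sketch below assumes the critical value t(0.025, 24) = 2.064, read from a standard t table, because Python's standard library has no t distribution; a statistics package would compute it directly.

```python
from math import sqrt

# Assumption: critical value for 24 degrees of freedom and 2.5% in each tail,
# taken from a printed t table rather than computed.
T_CRIT_DF24 = 2.064


def t_confidence_interval(xbar, s, n, t_crit):
    """Two-sided confidence interval on the mean when sigma is estimated by s."""
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Waiting-time example: n = 25, Xbar = 35.7, sample standard deviation s = 1.8
lower, upper = t_confidence_interval(xbar=35.7, s=1.8, n=25, t_crit=T_CRIT_DF24)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

Because 2.064 is larger than the z value of 1.96, the interval is slightly wider than in the known-sigma case.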
Hypothesis Test on Mean.

What is the hypothesis?

Example of Fantastic Jump !!!!!
Example of Risky Games !!!!!
A hypothesis is a tentative statement that proposes a possible
explanation for some phenomenon or event. A useful hypothesis is a
TESTABLE statement which may include a prediction.
Usually, a hypothesis is based on some previous observations.

Why do we use hypothesis testing?
The Analyze phase in Six Sigma closely examines the many process
inputs identified in the Measure phase to determine whether they are
related to outputs and, if a relationship does exist, whether it is
statistically significant. An important tool for this analysis is
hypothesis testing. Hypothesis testing uses statistical analysis to
determine if the observed relationship between two or more samples is
real or due to random chance. A variety of tests are used to find
statistical evidence to reject or "not to reject" a hypothesis. Once this
is accomplished, the Six Sigma team is ready to move forward with
identifying, testing, and implementing solutions to address the root
causes of failure.

What is a hypothesis test?

It is a test performed to learn how two variables might be related.
How to perform a hypothesis test:
1. State the two hypotheses: the NULL hypothesis “H0” (the statement of
no change) and the ALTERNATIVE hypothesis “H1 or Ha” (the statement
of a change).
2. Agree on the alpha (α) level (the accepted error level, or
SIGNIFICANCE). 5% is commonly used, but critical tests may use 1% or
even 0.1%.
3. Calculate the P value (a probability ranging from zero to one).
4. Compare the P and α values.

Definitions…
Null Hypothesis H0: Statement of no change, for example that the means
of the two samples are equal.
Alternative Hypothesis H1 or Ha: Statement of a change, for example that
the means of the two samples are not equal.
P Value: Probability, ranging from zero to one, of obtaining results as
extreme as or more extreme than the sample data if the null hypothesis is
true. In other words, samples this extreme or more extreme would occur
a fraction p of the time if the null is true.

Definitions…
Alpha (α) Error: Probability of wrongly rejecting a true null hypothesis.
Also called Type I error.
Beta (β) Error: Probability of wrongly failing to reject a false null
hypothesis. Also called Type II error.
Power of the test: 1 – β.

REMEMBER!!!!
Ha … Means that there is a change …
H0 (Hum…) Means that there is no change …

[Figure: three distributions (Distribution 1, 2 and 3) with their means
(Mean 1, 2 and 3).]

Hypothesis tests conclusion…

Consider the Hypothesis on the Mean. In this case, the null hypothesis
is defined to test whether the population mean is equal to a specified
value. The alternative is that the population mean is not equal to that
value.
This is known as a two-sided test, since we must test two alternatives
on either side of the specified value, as follows:
1. The Mean is larger than the specified value.
2. The Mean is smaller than the specified value.
In two-sided tests, we must split the level of significance alpha
between the two alternatives. For example, if we used a level of
significance of 5%, then alpha/2 = 2.5% would be applied to each
alternative to achieve a total level of significance of 5%.

Hypothesis tests conclusion…

Note that we could also specify our Null and Alternatives to use the
entire level of significance in a one-sided test. Say, for example, we were
interested in asserting that the average fill volume of a bottle of medicine
is 3 ml or more. In that case, we might specify the Null as the mean is
greater than or equal to 3 ml, with the alternative that the mean is less
than 3 ml. If we use a level of significance of 5%, it would be applied
entirely to the one-sided test.
If (P < α) >> Reject Null hypothesis.
If (P > α) >> Fail to reject Null hypothesis.
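
The difference between a two-sided and a one-sided test can be sketched with a one-sample Z test. The example below reuses the waiting-time figures from the Z interval example (n = 25, Xbar = 35.7, σ = 1.8) and assumes a null value of µ0 = 35 minutes, which is a hypothetical number chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_values(xbar, mu0, sigma, n):
    """Return the z statistic with its two-sided and upper one-sided P values."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))  # H1: mu != mu0
    p_upper = 1 - std_normal.cdf(z)                 # H1: mu > mu0
    return z, p_two_sided, p_upper


def decide(p_value, alpha=0.05):
    """Apply the decision rule: low P rejects the null."""
    return "reject Null" if p_value < alpha else "fail to reject Null"


# mu0 = 35.0 is a hypothetical null value, not part of the original example.
z, p_two, p_upper = z_test_p_values(xbar=35.7, mu0=35.0, sigma=1.8, n=25)
print(f"z = {z:.3f}")
print(f"two-sided P = {p_two:.4f}: {decide(p_two)}")
print(f"one-sided P = {p_upper:.4f}: {decide(p_upper)}")
```

With these numbers the two-sided test narrowly fails to reject while the one-sided test rejects, because the one-sided test applies the whole 5% to a single tail.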

REMEMBER!!!!
We don’t ACCEPT null hypothesis, we either reject (Strong conclusion)
or fail to reject (Weak conclusion).
If (P) is low, NULL will go.. If (P) is high, NULL is the guy.

Hypothesis Test on Paired Mean.

3. Paired t test
Hypothesis on paired sample…
We can also compare the means of two samples to test if they are from
populations with equal means (or from the same population).
In this first case, we will consider samples that are paired: each
observation has a corresponding observation in the other sample batch.
For example, if we have two operators, or two test methods, and we make
measurements for each piece from each operator or method, then the
data is paired: Observation 1 from Operator A is paired with Observation 1
from Operator B.
In the case of paired observations, we can calculate the difference
between the two samples for each piece, and we would expect the
average difference to be close to zero if the samples were from
populations of equal mean.

3. Paired t test
Hypothesis on paired sample…
We can use the Hypothesis Test on Mean discussed previously, applied
to the mean of the paired differences: test µ=0 vs. µ≠0
(Stat > Basic statistics > Paired t).

3. Paired t test (Example)…

To evaluate the effect of a new process design on cycle time, five
orders are processed by both the current and the new process. Test at 5%
significance (2 sided) whether µCURR = µNEW.
Current    New
13.75      11.5
13.5        9.5
16.75      12.5
13.25      14.5
15.5       14.5

3. Paired t test (Solution)…

Using Minitab,
(Stat > Basic statistics > Paired t).
Final answer is: P = 0.114 > 0.05, and consequently we fail to reject the
Null. There is no significant difference between the two samples.
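
The paired test statistic can be reproduced by hand from the five differences. The sketch below computes t = dbar / (sd / √n) and compares it with the critical value t(0.025, 4) = 2.776, which is assumed from a standard t table since the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

current = [13.75, 13.5, 16.75, 13.25, 15.5]
new = [11.5, 9.5, 12.5, 14.5, 14.5]

# Work on the per-order differences, as the paired test requires.
diffs = [c - n for c, n in zip(current, new)]
d_bar = mean(diffs)
s_d = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
t_stat = d_bar / (s_d / sqrt(len(diffs)))

# Assumption: two-sided 5% critical value for 4 degrees of freedom, from a t table.
T_CRIT_DF4 = 2.776
reject_null = abs(t_stat) > T_CRIT_DF4

print(f"mean difference = {d_bar:.2f}, t = {t_stat:.3f}")
print("reject Null" if reject_null else "fail to reject Null")
```

The statistic falls below the critical value, which agrees with the P value of 0.114 reported above.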

Hypothesis Test on Mean of Two Samples.

4. 2 sample t test
Hypothesis on two sample means…
In the more general two-sample case, we can use Hypothesis Tests to
compare the means of two samples, to test whether the samples came
from populations of equal mean (or from the same population).
Assuming the population distribution is Normal, we calculate the test
statistic t0 and the degrees of freedom ν for the critical value of the
test statistic as follows:
$$t_0 = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
$$\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$

4. 2 sample t test (Example)…
Given the following data, calculate P value.

n1=25; Xbar1=15.7; S1=1.8; S1²=3.24
n2=50; Xbar2=21.2; S2=2.5; S2²=6.25

4. 2 sample t test (Solution)…
t0 = (21.2-15.7) / SQRT[(3.24/25)+(6.25/50)] = 10.90

ν = ((3.24/25)+(6.25/50))² / (((3.24/25)²/24) + ((6.25/50)²/49))
ν = 63.6, rounded down to 63; t.025, 63 ≈ 2.00;
p=0.000
Since |t0| = 10.90 is far above the critical value, reject the Null: the
means are different.
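
The arithmetic above can be verified with a few lines of code. This is a minimal sketch of the unequal-variance formulas for t0 and ν; the function name is illustrative, and the sign of t0 depends only on which sample is listed first.

```python
from math import floor, sqrt


def two_sample_t(n1, xbar1, s1, n2, xbar2, s2):
    """Test statistic and degrees of freedom for two samples with unequal variances."""
    v1 = s1 ** 2 / n1  # s1^2 / n1
    v2 = s2 ** 2 / n2  # s2^2 / n2
    t0 = (xbar1 - xbar2) / sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t0, nu


t0, nu = two_sample_t(n1=25, xbar1=15.7, s1=1.8, n2=50, xbar2=21.2, s2=2.5)
print(f"|t0| = {abs(t0):.2f}")
print(f"nu = {nu:.1f}, rounded down to {floor(nu)}")
```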

Hypothesis Test on Variance of Two Samples.

Hypothesis on two samples variance…

We can also test whether samples come from populations of equal variance
using the F distribution. The F statistic is calculated as the ratio of
the two sample variances. The critical value of the F statistic is
calculated for alpha/2 in the two-sided case (H1: σ1 ≠ σ2) and for alpha
in a one-sided case (e.g. H1: σ1 > σ2). The degrees of freedom for the F
statistic are based on the sample size n of each sample; equal sample
sizes are not required.
DOF ν1 = n1-1; ν2 = n2-1
(Stat > Basic statistics > 2 variances).

Contingency tables.

R x C Contingency tables…
Contingency tables, also known as R x C Contingency Tables, refer to
data that can be assembled in tables (of rows and columns) for
comparison. For example:
• We may have five healthcare plans to choose from, and wish to
determine if there is a detectable difference between how these different
plans are rated by hourly and salaried employees.
• We may be interested to see if there is a difference between how men
and women rate three different television shows.
• We may need to check whether the repair rate for 4 machines is
different from shift to shift.

R x C Contingency tables…
The methodology for analyzing the (r) rows by (c) columns involves
using the chi-square statistic to compare the observed values with the
expected values, assuming independence.
The Null Hypothesis is that the proportions (p) are equal for each column
in each row. The alternative is that at least one of the proportions is
different.
The degrees of freedom equal (r-1)*(c-1), where r is the number of
rows and c is the number of columns in the Contingency table.
(Stat > tables > Chi-square test).

Chi-square test…
Used to test whether two discrete variables are associated (that is,
whether one variable depends on the other).
Two variables are associated if the distribution of observations for one
variable differs depending on the category of the second variable.
Two variables are independent if the distribution of observations for one
variable is similar for all categories of the second variable.
The test compares the actual data readings with the values expected in
case of independence.
When testing chi-square, use counts, not percentages.
To use contingency tables, the expected frequency count for each cell of
the table should be at least 5.
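
The chi-square calculation can be sketched for a small table. The counts below are hypothetical (60 men and 60 women each choosing one of three television shows) and are not from the slides. For a table with exactly 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2), which lets the sketch avoid a statistics package; other table sizes would need a chi-square table or a library.

```python
from math import exp

# Hypothetical observed counts: rows are men and women, columns are three shows.
observed = [
    [30, 20, 10],
    [20, 20, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]
assert all(e >= 5 for row in expected for e in row), "expected counts too small"

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Closed form valid only for 2 degrees of freedom.
assert dof == 2
p_value = exp(-chi2 / 2)

print(f"chi-square = {chi2:.3f}, DOF = {dof}, P = {p_value:.4f}")
print("reject Null" if p_value < 0.05 else "fail to reject Null")
```

With these made-up counts the P value is just above 0.05, so independence is not rejected at the 5% level.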
Hypothesis tests problems…

There are some predictable problems that can occur with Hypothesis
Testing that should be considered in our samples and our analysis:
• We must address and validate the key assumptions of the tests.
• Samples must be random, and we must ensure they are representative
of the population we are investigating. In surveys, low response rates
typically produce extreme value estimates (that is, they reflect the
sub-population of people who have strong opinions one way or the other).
• Samples must be from a single population. If the population is changing
over time, then estimates will be biased, with associated increases in
alpha and beta risks.

Hypothesis tests problems…

• Many of the hypothesis tests, and associated alpha and beta risks, are
dependent on normality of the population. If the population is
significantly non-normal, then the tests are not meaningful. We should
test this assumption using Goodness of Fit tests. Some tests additionally
require equal variance, which should also be tested.
• Realize that failure to reject is NOT acceptance. Rather, it means we
don’t have proof yet that the hypothesis should be rejected.
• We need to recognize that the alpha risk is real!
• Finally, we should consider the power of the samples to detect real
differences. High power implies that we are more likely to detect a
difference that truly exists.

Alpha risk is real…

Alpha is the probability of rejecting the null hypothesis when it is true.
For example, if alpha is 0.05, then there are 5 chances in 100 of
incorrectly rejecting a true null hypothesis. If n investigators are
independently researching the issue, then the probability that at least one
researcher (incorrectly) rejects the null hypothesis is 1-(1-α)^n. For
example, the chance that at least one of ten researchers, each with an
alpha risk of 0.05, will (incorrectly) reject the true null hypothesis is
40%!
Consider this the next time you read the report of the Surprising Results
Of A New Study in your newspaper. Would the unsurprising results of the
other nine researchers warrant a headline?
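
The combined risk formula is easy to evaluate for any number of independent tests. The short sketch below (the function name is illustrative) shows how quickly the risk grows.

```python
def combined_alpha(alpha, n_tests):
    """Probability that at least one of n independent tests rejects a true null."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 2, 5, 10, 20):
    print(f"{n:2d} tests at alpha = 0.05 -> combined risk = {combined_alpha(0.05, n):.3f}")
```

For ten tests the result is about 0.40, matching the 40% quoted above.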

Beta risk…
The Beta risk is the probability of not rejecting a false null hypothesis.
Sometimes, we speak instead of the power of the test, which is the
probability of correctly rejecting the false null hypothesis. Either way, we
need to recognize that even though we fail to reject, the null hypothesis
may still be false.
What influences our ability to correctly reject the false null hypothesis?
In other words, what gives a higher power (smaller beta risk)?
1. Large difference between null & alternative mean
2. Small population sigma
3. Large sample size
4. Large Significance (α) Level

Alpha & Beta risk relation…

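[Figure: the distribution under the Null Hypothesis (centered at µo) and
the distribution under the True Condition, separated by a Decision Line,
with the α and β areas marked.]
As alpha decreases, the decision line moves to the right, which increases
beta risk.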

Power & Sample Size.

Power & Sample size…

It calculates the sample size required for a given test power, or vice
versa
(Stat > Power & sample size).

Power & Sample size (Example)…

Calculate the required sample size to achieve a power of .90 in a t-test
with α=0.05, σ=1.5
Null hypothesis H0: µ=50

Alternative H1: µ=53

Power & Sample size (Solution)…

Using Minitab,
(Stat > Power & sample size).
Final answer is: a sample size of 5
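
As a rough cross-check, the sample size can be approximated with the normal-distribution formula n = ((z(α/2) + z(β)) · σ / δ)², where δ is the difference to detect. This is a sketch under the assumption that σ is known; it is not the t-based calculation Minitab performs, and it is expected to understate the answer when n is very small, because the t test must also estimate σ from the sample.

```python
from math import ceil
from statistics import NormalDist


def approx_sample_size(delta, sigma, alpha=0.05, power=0.90):
    """Normal-approximation sample size for a two-sided test on one mean."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    return ((z_alpha + z_beta) * sigma / delta) ** 2


# Difference to detect: 53 - 50 = 3, with sigma = 1.5
n_raw = approx_sample_size(delta=3, sigma=1.5)
print(f"normal approximation: n = {n_raw:.2f}, rounded up to {ceil(n_raw)}")
```

The approximation gives 3, a lower bound consistent with the larger t-based answer of 5 reported above.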

Non-parametric Tests.

Non parametric tests…

We sometimes choose to use non-parametric tests, particularly when
distributional assumptions cannot be met. A non-parametric test is one in
which there are no distributional requirements, such as Normality, for the
validity of the test.
Typically, non-parametric tests require larger sample sizes than the
parametric tests.

Non-Parametric Sign Tests on Median…

Basic form: 1/2 of data should have positive difference from stated
median
Ignores magnitude of difference
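
The sign test reduces to a binomial calculation: under the null, each observation is equally likely to fall above or below the stated median. The sketch below uses hypothetical waiting times and a hypothetical stated median of 35; neither comes from the slides.

```python
from math import comb


def sign_test(data, stated_median):
    """Two-sided sign test; observations equal to the stated median are dropped."""
    positives = sum(1 for x in data if x > stated_median)
    negatives = sum(1 for x in data if x < stated_median)
    n = positives + negatives
    k = max(positives, negatives)
    # Probability of k or more signs of one kind when each sign has chance 1/2.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return positives, negatives, min(1.0, 2 * tail)


# Hypothetical data for illustration only.
waiting_times = [36.1, 37.2, 34.8, 38.0, 36.5, 35.9, 37.7, 33.9, 36.8, 38.4]
pos, neg, p_value = sign_test(waiting_times, stated_median=35)
print(f"{pos} above, {neg} below, two-sided P = {p_value:.4f}")
```

Only the signs enter the calculation, so a value of 35.9 counts exactly as much as 38.4.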

Wilcoxon Signed Rank Test on Median…

Includes magnitude & sign of difference from median
Assumes symmetric continuous distribution
Compare to tabulated values based on n, alpha
Can be applied to differences between paired observations

Non-Parametric Two Sample Test …

Wilcoxon Rank-Sum Test
Aka Mann-Whitney test
Assumes distributions of X1 & X2 have same shape & spread
Compare to tabulated values based on n, alpha
If normal, 95% as efficient as t-test in large samples
Always ≥86% as efficient as t-test

ANOVA

Assumptions.
• One way ANOVA.
• Two Way ANOVA.
• Multi Factor ANOVA.

Why ANOVA …
Imagine that we need to compare 4 means (µ1, µ2, µ3, µ4) using the 2
sample t test.
First we need to check if (µ1 = µ2)
then (µ1 = µ3)
then (µ1 = µ4)
then (µ2 = µ3)
then (µ2 = µ4)
then (µ3 = µ4)
So, six tests must be performed to get the result.
In addition to the time and effort needed to do so, there is an important
factor to be considered, which is the (α) risk.

Why ANOVA …
If each of the six tests is performed at a 95% confidence level (α =
0.05), the overall (α) risk will be as follows:
(α) risk value = 1 – overall confidence level
= 1 – (0.95 * 0.95 * 0.95 * 0.95 * 0.95 * 0.95)
= 1 – (0.735)
= 0.265
= 26.5%
So, the probability of wrongly rejecting the Null hypothesis (4 equal
means) will be 26.5%.

What ANOVA is …
ANalysis Of VAriance.
It is a tool used for comparison of two or more means using analysis of
variance and F statistics.

ANOVA Assumptions…
1. Treatments follow a normal distribution, yet there is strong
evidence that the test is robust to departures from Normality. In other
words, Normality is not generally an issue.
2. Most important is the assumption of a common level of variance
amongst the treatments. The F-test has been found to be quite
sensitive to unequal variances, so this must always be verified before
applying the F-test. Minitab includes Bartlett’s test for data that
follows a normal distribution, and Levene’s test, which is recommended if
the data departs significantly from a normal distribution. (Stat >>
ANOVA >> Test for equal variances)
3. Finally, each treatment is influenced by only random effects. In other
words, any other possible factor influencing the treatment is equally
represented within each treatment.

• Assumptions.
One way ANOVA.
• Two Way ANOVA.

One Way ANOVA …

One-way ANOVA compares the means of several treatments of a variable.
Ex: Revenue for each of 4 sales representatives (each representative
is a treatment).
Null hypothesis (H0) is µ1 = µ2 = µ3 = µ4
Alternative hypothesis (Ha) is that one or more (µ) is different.
The F-statistic is calculated to check the hypothesis by dividing the
variation between treatments (the Treatment Mean Square MST) by the
variation within treatments (the Error Mean Square MSE), as shown in the
following slides.

One Way ANOVA Components …

ANOVA partitions the total variation in the data amongst the following
two components:
1. Variation between treatment means
MST: Treatment Mean Square
2. Variation within treatments
MSE: Error Mean Square
This is the random error common to all treatments (due to the equal
variance assumption).
The ratio of between to within variation (the F statistic) is near 1 if
the treatment means are equal. In other words, when the variation between
the treatments is not different from the variation within the treatments.

One Way ANOVA Example …

For the data below and assuming equal variances between the four
representatives, Is the variation between the representatives simply due
to random error, or are their mean revenues significantly different, at a
significance alpha of 0.05?
Rep A   Rep B   Rep C   Rep D
11.7    11.6    13.6    12.3
10.7    10.8    11.4    14.1
12.2    10.4    11.9    11.5
13.3    11.5    11.2    13.2
13.2    11.0    10.4    12.0
13.7    11.8    12.7    13.0

One Way ANOVA Solution …

Using Minitab, the output from the session window will be:
Source   DF   SS       MS      F      P
Factor    3    8.157   2.719   2.88   0.062
Error    20   18.883   0.944
Total    23   27.040
Since the P value (0.062) is greater than 0.05, we conclude that there is
no significant difference between the representatives.
We can reach the same conclusion by comparing the calculated F statistic
with the critical F value … How??
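
The ANOVA table can be rebuilt from the raw data using the definitions of MST and MSE. The sketch below does this with the standard library only. The critical value F(0.05; 3, 20) = 3.10 is assumed from a standard F table, since the standard library has no F distribution.

```python
from statistics import mean

revenue = {
    "Rep A": [11.7, 10.7, 12.2, 13.3, 13.2, 13.7],
    "Rep B": [11.6, 10.8, 10.4, 11.5, 11.0, 11.8],
    "Rep C": [13.6, 11.4, 11.9, 11.2, 10.4, 12.7],
    "Rep D": [12.3, 14.1, 11.5, 13.2, 12.0, 13.0],
}

all_values = [x for values in revenue.values() for x in values]
grand_mean = mean(all_values)

# Between-treatment variation: spread of treatment means around the grand mean.
ss_treatment = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in revenue.values())
# Within-treatment variation: spread of observations around their own mean.
ss_error = sum((x - mean(v)) ** 2 for v in revenue.values() for x in v)

df_treatment = len(revenue) - 1
df_error = len(all_values) - len(revenue)
mst = ss_treatment / df_treatment
mse = ss_error / df_error
f_stat = mst / mse

# Assumption: 5% critical value for (3, 20) degrees of freedom, from an F table.
F_CRIT = 3.10

print(f"Factor  DF={df_treatment}  SS={ss_treatment:.3f}  MS={mst:.3f}  F={f_stat:.2f}")
print(f"Error   DF={df_error}  SS={ss_error:.3f}  MS={mse:.3f}")
print("reject Null" if f_stat > F_CRIT else "fail to reject Null")
```

The calculated F is below the critical value, which leads to the same conclusion as the P value.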

What is the critical value for a statistic…

The critical value is the value of the test statistic corresponding to a
given significance level. This cutoff value determines the boundary
between rejecting the null hypothesis and failing to reject it. If the
absolute value of the calculated value from the statistical test is
greater than the critical value, then the null hypothesis is rejected in
favor of the alternative hypothesis, and vice versa.
In other words, it is the statistic value which must be reached or
exceeded in order to have a (P Value) equal to or less than the required
significance level, so that the null hypothesis can be rejected.

How can we calculate the critical value for a statistic…

1. Open Minitab.
2. Go to Calc >> Probability Distribution >> then choose the distribution.
3. Choose “Inverse cumulative probability”.
4. Enter the required values (we can get them from the session window of
the statistical test, for example one way ANOVA).
5. Choose “Input constant” and enter the cumulative probability, that is,
1 minus the significance level (0.95 if not stated).
6. Press OK.
7. From the session window, record the ABSOLUTE value of “X”.
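
The same inverse cumulative probability operation can be sketched in code for the normal distribution, which is the only one Python's standard library provides. Critical values for the t or F distributions would need a table or a statistics package.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# One-sided test: the whole alpha sits in one tail.
z_one_sided = std_normal.inv_cdf(1 - alpha)
# Two-sided test: alpha is split, so alpha / 2 sits in each tail.
z_two_sided = std_normal.inv_cdf(1 - alpha / 2)

print(f"one-sided critical z = {z_one_sided:.3f}")
print(f"two-sided critical z = {z_two_sided:.3f}")
```

A calculated z statistic whose absolute value exceeds the relevant critical value would lead to rejecting the null hypothesis.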

One Way ANOVA with Confidence Intervals…

Minitab goes a step further by providing the confidence intervals of the
mean for each treatment.
If the F-test had detected a difference in one of the means (with a p <
0.05), then we could use these confidence intervals on the mean to
determine which mean is significantly different.
In this example, the intervals overlap, consistent with our inability to
detect a statistical difference between the means. If one of the means
were statistically different, its confidence interval would not overlap one
or more of the others.

One Way ANOVA with Confidence Intervals…
A (----------*---------)
B (----------*---------)
C (---------*----------)
D (----------*---------)

• Assumptions.
• One way ANOVA.
Two Way ANOVA.

Two Way ANOVA …

Two-Way ANOVA is used when there is a second factor that occurs
within each treatment. For example, does the product type impact each
sales representative's revenue?
We sometimes refer to this second factor as a block effect within each
treatment (the treatments may also be called main effects).
The same basic assumptions are necessary as with the One-way
ANOVA, with the added provision that all other possible factors are
randomized within each block (when multiple samples occur within a
block).

Two Way ANOVA Components …

Like One Way, Two Way ANOVA partitions the total variation in the data
amongst the following two components:
1. Variation between treatment means
2. Variation within treatments
where the within-treatment variation is divided into three parts:
i. The random error common to all treatments (equal variance)
ii. The effect of the block variable
iii. The interaction of the treatment & the block. For example, is
revenue for a given sales agent affected by product type?

Two Way ANOVA Example …

For the data below and assuming equal variances between the four
representatives, Is the variation between the representatives simply due
to random error, or are their mean revenues significantly different, at a
significance alpha of 0.05? Are there differences due to the Product Type
(P1 or P2)?
Product   Rep A   Rep B   Rep C   Rep D
P1        11.7    11.6    13.6    12.3
P2        14.2    13.3    14.7    15.3
P1        12.2    10.4    11.9    11.5
P2        13.3    12.5    15.6    13.2

Two Way ANOVA Example …

In the above data, because we have repeated (or replicated)
conditions, we can also estimate the interaction between the Product
Types and Representatives. In other words, does the effect of Product
Type on sales vary depending on the Rep? That is, are some
Reps better at selling certain products, but not others?

Two Way ANOVA Solution …

Using Minitab, first we have to stack the values of all representatives
in one column, using (Data >> Stack >> Columns). The output from the
session window will be:
Source           DF   SS        MS        F       P
Product           1   17.8506   17.8506   24.02   0.001
Representative    3    8.1019    2.7006    3.63   0.064
Interaction       3    0.2819    0.0940    0.13   0.942
Error             8    5.9450    0.7431
Total            15   32.1794
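
The sums of squares in this table can be reproduced from the example data. The sketch below is a minimal balanced two-way ANOVA with replication, written with the standard library only; it computes the F ratios but not the P values, which would need an F distribution from a table or a statistics package.

```python
from statistics import mean

# Replicated observations per (product, rep) cell, taken from the example table.
cells = {
    ("P1", "A"): [11.7, 12.2], ("P1", "B"): [11.6, 10.4],
    ("P1", "C"): [13.6, 11.9], ("P1", "D"): [12.3, 11.5],
    ("P2", "A"): [14.2, 13.3], ("P2", "B"): [13.3, 12.5],
    ("P2", "C"): [14.7, 15.6], ("P2", "D"): [15.3, 13.2],
}
products = sorted({product for product, _ in cells})
reps = sorted({rep for _, rep in cells})
values = [x for obs in cells.values() for x in obs]
grand_mean = mean(values)
n_per_cell = 2


def level_mean(position, level):
    """Mean of all observations whose cell key has `level` at `position`."""
    return mean([x for key, obs in cells.items() if key[position] == level for x in obs])


ss_product = len(reps) * n_per_cell * sum(
    (level_mean(0, p) - grand_mean) ** 2 for p in products)
ss_rep = len(products) * n_per_cell * sum(
    (level_mean(1, r) - grand_mean) ** 2 for r in reps)
# Variation among cell means, minus both main effects, leaves the interaction.
ss_cells = n_per_cell * sum((mean(obs) - grand_mean) ** 2 for obs in cells.values())
ss_interaction = ss_cells - ss_product - ss_rep
ss_error = sum((x - mean(obs)) ** 2 for obs in cells.values() for x in obs)

df_product = len(products) - 1
df_rep = len(reps) - 1
df_interaction = df_product * df_rep
df_error = len(values) - len(cells)
mse = ss_error / df_error

for name, ss, df in [
    ("Product", ss_product, df_product),
    ("Representative", ss_rep, df_rep),
    ("Interaction", ss_interaction, df_interaction),
]:
    print(f"{name:15s} DF={df} SS={ss:.4f} F={(ss / df) / mse:.2f}")
print(f"{'Error':15s} DF={df_error} SS={ss_error:.4f}")
```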

Two Way ANOVA with Confidence Intervals…

Sales Rep. confidence intervals
A (-------*------)
B (------*-------)
C (-------*------)
D (-------*------)
Product type confidence intervals
P1 (------*-----)
P2 (-----*-----)

Two Way ANOVA with Confidence Intervals…

Minitab’s confidence intervals (shown with the Display Means option)
show that the difference in means for Sales Reps B and C accounts for
the relatively low p-value for Rep (0.06). The means for Product Types P1
and P2 are shown to be significantly different.
Note that this is not a preferred method for testing whether the means are
different, as the combined Type I error for the confidence intervals is much
higher than the alpha significance of each individually (as was discussed
in the first portion of this topic). A better approach using Tukey’s HSD
statistic is shown later in this topic.

• Assumptions.
• One way ANOVA.
• Two Way ANOVA.
Multi Factor ANOVA.

Multi Factor ANOVA Example…

Example data is shown of revenue for four sales representatives
(expressed in thousands of dollars). Is the variation between the reps
simply due to random error, or are their mean revenues significantly
different, at a significance alpha of 0.05?
Are there differences due to the Product Type (P1 or P2)? Or to the
geographic area (region) they service?
Prod   Area   Rep A   Rep B   Rep C   Rep D
P1     X      11.7    11.6    13.6    12.3
P2     X      14.2    13.3    14.7    15.3
P1     Y      12.2    10.4    11.9    11.5
P2     Y      13.3    12.5    15.6    13.2

Multi Factor ANOVA Example…

In the above data, we have the correct combination of runs to estimate
the two-factor interactions between the main factors: Rep and Product;
Rep and Area; Product and Area. The interactions test whether there is a
significant difference in the Rep means for different Product Types (in
other words, are some Reps better at selling certain products, but not
others?) or for different Areas, and whether there is a significant
difference in the Product means for different Areas.

Multi Factor ANOVA Solution…

In Minitab, go to Stat >> ANOVA >> General Linear model.
Enter the required data, including interactions between factors using the
symbol (*). Don’t forget to stack the data points first.
The output from the session window will be:
Source         DF   Seq SS    Adj SS    Adj MS    F       P
Rep.            3    8.1019    8.1019    2.7006    3.07   0.191
Product         1   17.8506   17.8506   17.8506   20.31   0.020
Area            1    2.3256    2.3256    2.3256    2.65   0.202
Rep.*Product    3    0.2819    0.2819    0.0940    0.11   0.951
Rep.*Area       3    0.9769    0.9769    0.3256    0.37   0.782
Product*Area    1    0.0056    0.0056    0.0056    0.01   0.941
Error           3    2.6369    2.6369    0.8790
Total          15   32.1794

Tukey’s HSD Test…

As mentioned earlier, a simple check of the confidence intervals
between all pairs of treatment means will elevate the Type I error. Tukey’s
HSD (Honestly Significant Difference) test allows consideration of all
mean differences at a combined significance level as stated.
• Reject equal means if difference in means ≥ HSD
• Minitab: use the Comparisons button within the GLM dialog
• Select main factors as Terms

Tukey’s HSD Example…

Using the Multi-Factor example data, only the Tukey 95%
simultaneous confidence interval for the absolute value of P2-P1 provides
an interval that does not include the value 0.0. This indicates that the
treatment means P1 and P2 are significantly different.
Product = P1 subtracted from P2:
Lower: 0.6207
Center: 2.112
Upper: 3.604
(--------------*--------------)
----+---------+---------+---------+--
   1.0       2.0       3.0       4.0

Regression Analysis

Cause & Effect Analysis.
• Scatter Diagram.
• Regression Linear Model.

Cause & Effect Diagram …

• Structured approach to brainstorm root causes
• Graphical representation of your understanding of relationships
• Also known as Fishbone Diagrams because of their form
• Sometimes called Ishikawa Diagrams in reference to Kaoru Ishikawa, the
Japanese engineer who popularized their use for Quality Improvement
Listing all the causes for a given effect in a clear, organized way makes
it easier to separate out potential problems and target areas for
improvement.

Cause & Effect Diagram …
[Figure: fishbone layout, with Cause branches (each carrying Sub-Cause
branches) leading to the Effect.]

Cause & Effect Diagram Example …

Cause & Effect Diagram Methodology …

To create a Cause and Effect diagram, begin by brainstorming the
potential relationships between the process and the outcome. The
outcome, or effect, is typically stated in terms of a problem rather than a
desired condition, which tends to help the brainstorming.
Bear in mind that the causes listed are all potential causes, since we
have no data at this point to support whether any of the causes really
contribute to the problem. In that regard, as in all brainstorming activities,
avoid judging the merits of each cause as it is offered. Only data can lead
to that judgment.
The major branches of the Fishbone are chosen to assist in
brainstorming, or afterwards to categorize the potential problems.
Sub-causes are added as needed, and it’s often helpful to go down several
levels of sub-causes.

Cause & Effect Diagram Methodology …

You may find it convenient to use either the 5M’s & E or the 4 P’s, either
to categorize causes on the final Fishbone or to ensure that all areas are
considered during brainstorming.
• 5M’s & E: Manpower, Machines, Methods, Material, Measurement,
Environment
• 4 P's: Policy, Procedures, Plant, People

Cause & Effect Diagram Uses …

Measure Phase:
To brainstorm potential areas, defects, or causes on which to focus process
measurements.
Analyze Phase:
To brainstorm potential basic process factors, which can be investigated
in a designed experiment.

1 -93
• Cause & Effect Analysis.
Scatter Diagram.
• Regression Linear Model.

1 -94
Scatter Diagram …
Scatter Diagrams are used to investigate possible correlation (or
inter-dependence) between one variable and another.
Correlation indicates that as one variable changes, the other also
changes. Although this may indicate a cause and effect relationship, this
is not always the case, since there may be other characteristics (possibly
many more) that are actually the cause, and both of the characteristics
being investigated are their effect.
Even though correlation alone does not prove cause and effect, it can
still be useful to establish correlation. If we know, for instance, that
there is considerable correlation between two characteristics, we can use
one to predict the other, particularly if one characteristic is easy to
measure and the other isn’t.

1 -95
Scatter Diagram …
For example, if we can show that weight gain in the first three months
of pregnancy correlates well with baby development, we can use weight
gain as a predictor of healthy fetal development. The alternative would be
expensive tests to monitor the actual development of the baby.
A Scatter Diagram is a graphical tool to examine the relationship
between data collected for two different characteristics. Although the
Scatter Diagram cannot determine the cause of such a relationship, it
can show whether or not such a relationship exists, and if so, just how
strong it is.

1 -96
Scatter Diagram Example …
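X (independent variable)    Y (dependent variable)
1.5                         76
1.3                         85
1.1                         75
1.6                         84
2                           88
1.75                        90

[Figure: scatter plot of Y (vertical axis, ticks at 70%, 80%, 90%) against X (horizontal axis, ticks at 1.0, 1.5, 2.0).]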
1 -97
Interpolation and extrapolation…

With a Scatter Diagram and a fitted model, we can predict the Y variable
from known values of the X variable. Prediction within the range of values
in the dataset used for model-fitting is known informally as
INTERPOLATION. Prediction outside this range of the data is known as
EXTRAPOLATION. Extrapolation relies strongly on the model assumptions:
the further it goes outside the data, the more room there is for the model
to fail due to wrong assumptions.
When extrapolating, it is generally advised to accompany the estimated
value of the dependent variable with a prediction interval that represents
the uncertainty. Such intervals tend to expand rapidly as the values of the
independent variable(s) move outside the range covered by the observed
data.
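The widening of the prediction interval can be illustrated with the six (X, Y) pairs from the Scatter Diagram Example. This is a rough sketch, not a prescribed method: it assumes a straight-line model with independent, normally distributed errors, and it hard-codes the two-sided 95% Student-t critical value for 4 degrees of freedom (2.776) to avoid third-party libraries.

```python
import math

# Data: the six (X, Y) pairs from the Scatter Diagram Example.
x = [1.5, 1.3, 1.1, 1.6, 2.0, 1.75]
y = [76, 85, 75, 84, 88, 90]

# ASSUMPTION: straight-line model with independent, normal errors.
# T_CRIT is the two-sided 95% Student-t value for n - 2 = 4 degrees of
# freedom, hard-coded to keep the sketch free of external libraries.
T_CRIT = 2.776

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares slope and intercept of the fitted line.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residual standard error of the fitted line.
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))


def prediction_interval(x0):
    """Return (prediction, half_width) of the 95% prediction interval at x0."""
    half_width = T_CRIT * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    return intercept + slope * x0, half_width


for x0 in (1.5, 2.0, 2.5, 3.0):
    kind = "interpolation" if min(x) <= x0 <= max(x) else "extrapolation"
    y0, hw = prediction_interval(x0)
    print(f"x0 = {x0:.2f} ({kind}): y = {y0:.1f} +/- {hw:.1f}")
```

The half-width depends on the squared distance of x0 from the mean of the observed X values, which is why it grows quickly once x0 leaves the observed range of 1.1 to 2.0.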

1 -98
Extrapolation …

1 -99
Correlation & Scatter plot…

Correlation is a statistical technique that can show whether and how
strongly pairs of variables are related. For example, height and weight are
related; taller people tend to be heavier than shorter people.
[Figure: scatter plot with the dependent variable on the vertical axis and the independent variable on the horizontal axis.]

1 -100
Positive Correlation…
[Figure: scatter plot, dependent variable on the vertical axis.]
Slope is positive.
As X increases, Y tends to increase.

1 -101
Negative Correlation…
[Figure: scatter plot, dependent variable on the vertical axis.]
Slope is negative.
As X increases, Y tends to decrease.

1 -102
Weak Correlation…
[Figure: scatter plot, dependent variable on the vertical axis.]
As X varies, the response Y is not well predicted by the line
(high error).

1 -103
Strong Correlation…
[Figure: scatter plot, dependent variable on the vertical axis.]
As X varies, the response Y is well predicted by the line
(low error).

1 -104
Correlation coefficient …
• The main result of a correlation analysis is the correlation coefficient
(or "r"). It ranges from -1.0 to +1.0. The closer r is to +1 or -1, the more
closely the two variables follow a straight-line relationship. If r is close
to 0, there is little or no linear relationship between the variables.
• If r is positive, then as one variable gets larger the other tends to get
larger.
• If r is negative, then as one variable gets larger the other tends to get
smaller (often called an "inverse" correlation).
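As a small illustration, the sketch below computes the Pearson correlation coefficient for the six (X, Y) pairs from the Scatter Diagram Example. The function name `pearson_r` and the second, curved data set are assumptions added only for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Data: the six (X, Y) pairs from the Scatter Diagram Example.
x = [1.5, 1.3, 1.1, 1.6, 2.0, 1.75]
y = [76, 85, 75, 84, 88, 90]
r = pearson_r(x, y)
print(f"r = {r:.2f}")  # about 0.72: a positive correlation

# ASSUMPTION: invented data with a perfect curved (y = x^2) relationship.
# r comes out as 0 because r only measures straight-line association.
curve_x = [-2, -1, 0, 1, 2]
curve_y = [4, 1, 0, 1, 4]
r_curve = pearson_r(curve_x, curve_y)
print(f"r for the curved data = {abs(r_curve):.2f}")
```

The second call is a reminder that an r near 0 rules out only a straight-line relationship, so it is worth looking at the Scatter Diagram itself before concluding that two variables are unrelated.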

1 -105
Determination coefficient …
The coefficient of determination, also known as R-squared, measures how
much of the variability in the dependent variable is explained by the
fitted line, that is, by the variability in the independent variable.
As a rule of thumb, the regression model should be trusted only if
R-squared is higher than 70%.
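To make the definition concrete, the following sketch fits a least-squares line to the six (X, Y) pairs from the Scatter Diagram Example and computes R-squared as one minus the ratio of unexplained to total variation. The helper names and the `THRESHOLD` constant are assumptions for illustration. The 70% figure is the threshold quoted above.

```python
# Data: the six (X, Y) pairs from the Scatter Diagram Example.
x = [1.5, 1.3, 1.1, 1.6, 2.0, 1.75]
y = [76, 85, 75, 84, 88, 90]

THRESHOLD = 0.70  # the 70% threshold quoted in the text


def fit_line(xs, ys):
    """Least-squares intercept and slope of a straight line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def r_squared(xs, ys):
    """Share of the variation in ys explained by the fitted line."""
    intercept, slope = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    # Unexplained variation: squared distances from the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(xs, ys))
    # Total variation: squared distances from the mean of ys.
    sst = sum((yi - y_bar) ** 2 for yi in ys)
    return 1 - sse / sst


r2 = r_squared(x, y)
meets_threshold = r2 > THRESHOLD
print(f"R-squared = {r2:.0%}")  # about 52%
print("Meets the 70% threshold:", meets_threshold)
```

For this small data set R-squared comes out at roughly 52%, so under the 70% threshold a straight line fitted to these six points would not be relied on for prediction.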

1 -106
Non Linear Correlation…
[Figure: scatter plot, dependent variable on the vertical axis.]
Correlation may also be non-linear in nature. This is neither good nor
bad in a general sense, although it may serve our specific interests one
way or the other. Statistically, we prefer linear regression only because the
analysis is less complicated. When the correlation is non-linear, we can
use the stratification technique.
1 -107
Stratification…
[Figure: scatter plot, dependent variable on the vertical axis.]
In this Scatter Diagram, there is no clear correlation between the two
variables. As the x-variable changes, we can see no clear pattern in the
y-values.
So we analyze the same data by grouping the points into three series
based on the value of a third variable.
1 -108
Stratification…
[Figure: the same scatter plot with points colored by series, dependent variable on the vertical axis.]
Suppose the three series, shown in the graph as yellow, orange, and blue
points, each represent a different cooking temperature.
• Series one, displayed here in yellow, shows a clear positive correlation.
• Series two, in orange, shows a negative correlation.
• Series three, in blue, shows no correlation.
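The effect of stratification can be sketched in a few lines. The data below are invented purely for illustration (they are not the points in the slide's graph), and the temperature labels "low", "medium", and "high" are hypothetical. The sketch computes the correlation coefficient for all points pooled together and then for each temperature group separately.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# ASSUMPTION: invented (x, y, cooking temperature) records for illustration.
records = [
    (1, 2, "low"), (2, 4, "low"), (3, 6, "low"), (4, 8, "low"), (5, 10, "low"),
    (1, 10, "medium"), (2, 8, "medium"), (3, 6, "medium"),
    (4, 4, "medium"), (5, 2, "medium"),
    (1, 6, "high"), (2, 5, "high"), (3, 7, "high"), (4, 5, "high"), (5, 6, "high"),
]

# Correlation with all points pooled together (no stratification).
pooled_r = pearson_r([rec[0] for rec in records], [rec[1] for rec in records])

# Group the same points by the third variable and correlate each group.
groups = defaultdict(lambda: ([], []))
for x_val, y_val, temperature in records:
    groups[temperature][0].append(x_val)
    groups[temperature][1].append(y_val)

stratified_r = {temp: pearson_r(xs, ys) for temp, (xs, ys) in groups.items()}

print(f"pooled: r = {abs(pooled_r):.2f}")
for temp, r in stratified_r.items():
    print(f"{temp}: r = {r:+.2f}")
```

In this constructed data the positive and negative groups cancel out when pooled, which is the situation the slide describes: no visible pattern overall, yet a clear pattern inside individual series.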
1 -109
• Cause & Effect Analysis.
• Scatter Diagram.
Regression Linear Model.

1 -110
What is the meaning of regression analysis…

Regression analysis is a technique for modeling and analyzing several
variables when the focus is on the relationship between a dependent
variable and one or more independent variables. More specifically, it helps
us to understand how the typical value of the dependent variable changes
when any one of the independent variables is varied while the other
independent variables are held fixed. Most commonly, regression analysis
estimates the expected value of the dependent variable given the
independent variables.

1 -111
What is the benefit of regression analysis…

The objective of regression analysis is to determine whether the
dependent variable can be adequately predicted from the independent
variable. We can never observe every potential value of x to determine y,
so it is useful to observe a sample of the possible outcomes instead and
define the relationship in the form of an equation. We can then use the
equation, or model, to predict the dependent variable for other values of
the independent variable.

1 -112
How to build regression model…

The Regression Model used for Simple Linear Regression is that of a
straight line. You might recall this equation as y equals m times x plus b,
where y is the dependent variable, x is the independent variable, m is the
slope, and b is the value of y when x equals zero (b is sometimes called
the intercept). In regression notation the same line is written as:
Y = β0 + β1 X + error
where β0 is the intercept (b) and β1 is the slope (m).
The error term is an acknowledgement that even if we could sample all
possible values, there would most likely be some unpredictability in the
outcome. It could be due to many possibilities, including measurement
error in either the dependent or independent variable, the effects of
other unknown variables, or non-linear effects.
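As a worked illustration of the model, the sketch below estimates β0 and β1 by ordinary least squares for the six (X, Y) pairs from the Scatter Diagram Example and lists the residuals, which are the sample estimates of the error term. Ordinary least squares is assumed here as the fitting method, and the function name is hypothetical.

```python
# Data: the six (X, Y) pairs from the Scatter Diagram Example.
x = [1.5, 1.3, 1.1, 1.6, 2.0, 1.75]
y = [76, 85, 75, 84, 88, 90]


def fit_simple_linear(xs, ys):
    """Ordinary least-squares estimates (b0, b1) for Y = b0 + b1 * X + error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept: value of Y when X equals zero
    return b0, b1


b0, b1 = fit_simple_linear(x, y)
print(f"Y = {b0:.2f} + {b1:.2f} X")  # about Y = 61.47 + 13.96 X

# Residuals: the part of each observed Y that the line does not explain.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
for xi, yi, e in zip(x, y, residuals):
    print(f"X = {xi:<5} Y = {yi:<3} fitted = {b0 + b1 * xi:.1f} error = {e:+.1f}")
```

With an intercept in the model, the least-squares residuals sum to zero, so the line passes through the point of the two sample means.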

1 -113