
Process Optimization. Cost Reduction. Innovation.

Six Sigma Green Belt Training


Module – Analyse Phase

GENAXIS SDN BHD PROPRIETARY INFORMATION


The information contained in this document is GENAXIS proprietary information and is disclosed in confidence. It is the property of
GENAXIS and shall not be used, disclosed to others or reproduced without the express written consent of GENAXIS. If consent is given for
reproduction in whole or in part, this notice and the notice set forth on each page of this document shall appear in any such reproduction in
whole or in part. The information contained in this document may also be controlled by Malaysia export control laws. Unauthorized export
or re-export is prohibited.

Green Belt Training Module © 2010, Genaxis Sdn Bhd, Malaysia. All rights reserved. Genaxis Proprietary Information – Subject to the restrictions on the cover page
Module Objectives

By the end of this module, participants should be able to:

• Understand what a hypothesis is and why hypothesis testing is needed

• Identify the types of hypothesis test used in a Six Sigma project

• Discuss the hypothesis testing process

• Understand the risks in hypothesis testing

• Discuss how the p-value is used for decision making

• Understand how hypothesis testing can be used on our projects to “move down the funnel”, screening out unimportant input variables

Looking for Differences

• The essential questions are often:
  - “Is this different from this?”, or
  - “Is this bigger (or smaller) than this?”
• Graphical methods may help, but statistical tests are needed to verify a difference at a stated level of confidence.
• Hypothesis testing involves using statistics to determine if there truly is a difference.
• We’ll often need to make decisions about differences in averages and differences in variability.
• A factor that causes a difference in the process output is defined as a “SIGNIFICANT FACTOR”. Looking for differences through hypothesis testing is therefore a “FUNNELLING PROCESS” in a Six Sigma project.

Hypothesis Testing…Huh?

Imagine you are a professional golfer and have been offered an endorsement contract with a new ball manufacturer, GoGolf SDN BHD.

However, you’re worried that this manufacturer is new and its product may adversely affect your game.

You played 4 rounds with this ball.

[Table: your scores for the 4 rounds and your average; values not recoverable.]

Did this new ball increase my average?

How sure can I be?

Why Learn Hypothesis Testing
In general, many problems require a decision to accept or reject a statement about a parameter.
Specifically, Six Sigma projects require hypothesis tests to determine whether or not a factor is significant to the project Y.

[Figure: bar chart of cycle time (minutes), 15.7 before changes and 11.3 after changes.]

Is the observed difference real or by chance?
Does the change in X have an effect on Y?

What is hypothesis testing?

Hypothesis Test: A statistical test to determine if there’s a difference between data sets. A decision aid in uncertain situations.

Hypothesis tests always compare two options:
• The null hypothesis (Ho) says, ‘There is no difference between the two data sets.’
  – Application in Six Sigma: If changing the setting of a factor does not affect the mean or spread of the process output, the factor is NOT SIGNIFICANT.
• The alternative hypothesis (Ha) says, ‘There is a difference between the two sets of data.’
  – Application in Six Sigma: If changing the setting of a factor does affect the mean or spread of the process output, the factor is SIGNIFICANT.

Hypothesis Test in 6 Sigma

Before → Process output data set while an input is set at the current operating setting
After → Process output data set when the same input is set at another operating setting

[Figure: two pairs of Before/After distributions, titled “Mean Shifts” and “Variation Shift”.]

Case 1: The factor is significant in shifting the process output mean (accuracy). Thus, controlling this factor at an appropriate setting can improve the process output accuracy.

Case 2: The factor is significant in reducing the process output variation (precision). Thus, controlling this factor at an appropriate setting can improve the process output precision.

Hypothesis Test in 6 Sigma
Before → Process output data set while an input is set at the current operating setting
After → Process output data set when the same input is set at another operating setting

[Figure: two pairs of Before/After distributions, titled “Mean and Variation Shifts” and “No Shift in Mean or Variation”.]

Case 1: The factor is significant in shifting the process output mean (accuracy) and also in reducing the variation. Thus, controlling this factor at an appropriate setting can improve the process output accuracy and precision.

Case 2: The factor is not significant in shifting the process mean (accuracy) nor in reducing the process output variation (precision). Thus this factor can be ignored in terms of improvement.

Thus …

Hypothesis testing enables us to move “down the funnel” by analyzing the dataset that consists of process input settings (data) versus process output data. The rule of thumb is as below:

– There is no difference: Do not focus on the variables that changed between the two groups of data – screen them out

– There is a difference: Focus on the input variables that are different between the groups

Data can come from a couple of sources:

– Passive: A process is sampled or historic sample data is obtained

– Active: A modification is made to a process and then sample data is obtained

Can Hypothesis Testing Lead Us To The Wrong Conclusions?

In hypothesis testing, relatively small samples are used to answer questions about population parameters (inferential statistics).

There is always a chance that the selected sample is not representative of the population; therefore, there is always a chance that the conclusion obtained is wrong.

With some assumptions, inferential statistics allows us to estimate the probability of getting an “odd” sample and so to quantify, through the p-value, the risk of a wrong conclusion.

Let’s go through the process of hypothesis testing via an example…

Hypothesis Testing Example
Ho: Age doesn’t matter in a company’s hiring practices
Ha: Age does matter in a company’s hiring practices

Ho: Return rate is the same for Customer A and Customer B
Ha: Return rate is not the same for Customer A and Customer B

Ho: Project X Avg. Cycle Time = Project Y Avg. Cycle Time
Ha: Project X Avg. Cycle Time ≠ Project Y Avg. Cycle Time

Ho = _______________________________________
Ha = _______________________________________

Hypothesis

State a “Null Hypothesis” (Ho)

Gather evidence (a sample of reality)

DECIDE:
What does the evidence suggest?

Reject Ho? or Don’t Reject Ho?

Decision Errors

In deciding whether or not to reject Ho, we could make one of two decision errors.

                 Your Decision
The Truth        Accept Ho                 Reject Ho
Ho True          Correct                   Type I Error (α-Risk)
Ho False         Type II Error (β-Risk)    Correct
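
The α-risk in the table can be made concrete with a small simulation. The sketch below is illustrative only and is not part of the original material: it assumes a 1-Sample Z situation (σ known and equal to 1), repeatedly draws samples from a process where Ho really is true, and counts how often Ho is rejected anyway. The function name, sample size, number of trials and seed are all hypothetical choices.

```python
import random
from statistics import NormalDist, mean


def type_i_error_rate(alpha=0.05, n=30, trials=2000, seed=1):
    """Fraction of trials in which a TRUE Ho (mean = 0, sigma = 1) is rejected."""
    rng = random.Random(seed)
    # Two-sided critical value: reject Ho when |z| exceeds it.
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # 1-Sample Z statistic with hypothesised mean 0 and known sigma = 1.
        z = mean(sample) / (1.0 / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1  # Type I error: Ho was true but we rejected it
    return rejections / trials


rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f} (alpha was 0.05)")
```

Under these assumptions the observed rejection rate should land close to the chosen α, which is what "α-Risk" means in the Ho True row.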

Example: A Trial

                     Jury’s Decision
The Truth            He’s Not Guilty           He’s Guilty
Actually Innocent    Correct                   Type I Error (α-Risk)
Actually Guilty      Type II Error (β-Risk)    Correct

Consequence of a Type I error: Innocent man goes to jail
Consequence of a Type II error: Criminal goes free

Example: Airport Security

                       Alarm’s Decision
The Truth              No Bad Stuff In Bag       Bad Stuff In Bag
No Bad Stuff In Bag    Correct                   Type I Error (α-Risk)
Bad Stuff In Bag       Type II Error (β-Risk)    Correct

Consequences of a Type I error: Customer held up at security
Consequences of a Type II error: Weapon on plane

Using the p-value for Hypothesis Testing

p-value: Probability that results like those observed could occur by chance alone if Ho were true

By default, the p-value is compared against α = 0.05

Small p-value:
→ Ho is rejected
→ Ha is accepted, and I have a p-value chance of being wrong

Large p-value:
→ Ho is not rejected
→ Ha is rejected

Traditionally for Six Sigma: If p < 0.05, we reject the Null Hypothesis and accept the Alternative Hypothesis

If p is low, Ho must GO!
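
As a rough illustration of where a p-value comes from, the sketch below converts a Z test statistic into a p-value using the standard normal distribution from the Python standard library. It is a minimal example under stated assumptions, not the method used by Minitab or SigmaXL: the function name `p_value_from_z` and the sample z values are hypothetical.

```python
from statistics import NormalDist


def p_value_from_z(z, alternative="two-sided"):
    """p-value for a statistic that is standard normal when Ho is true."""
    std_normal = NormalDist()
    if alternative == "greater":   # Ha: parameter is larger than under Ho
        return 1.0 - std_normal.cdf(z)
    if alternative == "less":      # Ha: parameter is smaller than under Ho
        return std_normal.cdf(z)
    # Two-sided: extreme results in either direction count against Ho.
    return 2.0 * (1.0 - std_normal.cdf(abs(z)))


ALPHA = 0.05
for z in (0.5, 1.96, 3.0):
    p = p_value_from_z(z)
    verdict = "reject Ho" if p < ALPHA else "fail to reject Ho"
    print(f"z = {z:4.2f}  p = {p:.4f}  ->  {verdict}")
```

The further the test statistic is from zero, the smaller the p-value, which is the pattern behind "If p is low, Ho must GO!".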

p-values are Everywhere!
[Figure: four statistical software outputs for the data set “All” (groups Data1 and Data2), each reporting a p-value.]

• Test for Equal Variances for All: F-Test statistic 0.20, P-Value 0.000; Levene's Test statistic 92.73, P-Value 0.000
• Probability Plot of All (Normal): Mean 4.318, StDev 0.4640, N 460, AD 7.234, P-Value <0.005
• Summary for All: Anderson-Darling Normality Test, A-Squared 7.23, P-Value < 0.005
• One-way ANOVA: All versus data

Source   DF      SS      MS       F      P
data      1  43.503  43.503  360.27  0.000
Error   458  55.303   0.121
Total   459  98.806

S = 0.3475   R-Sq = 44.03%   R-Sq(adj) = 43.91%

Level     N    Mean   StDev
Data1   230  4.0102  0.1994
Data2   230  4.6253  0.4491

Hypothesis and Decision Risk

P Value Is Extremely Important

Remember This Key Saying….

If P is Low, Ho
Must Go!

Types of Hypothesis Testing

Comparing Means

– 1 Factor
  – 1 Sample
    – σ known: 1-Sample Z-test (Stat → Basic Statistics → 1-Sample Z)
    – σ not known: 1-Sample t-test (Stat → Basic Statistics → 1-Sample t)
  – 2 Samples
    – Independent: 2-Sample t-test (Stat → Basic Statistics → 2-Sample t)
    – Paired samples: Paired t-test (Stat → Basic Statistics → Paired t)
  – 2 or more samples: One-Way ANOVA (Stat → ANOVA → One-Way ANOVA)
– 2 Factors: Two-Way ANOVA
– 3 or more factors: ANOVA GLM

Hypothesis Testing Of Variation – Road Map

Comparing Variances

– 1 Sample: 1 Variance Test (Descriptive Statistics)
– 2 Samples: 2 Variance test (F-test / Levene’s test)
– More than 2 Samples: Test for Equal Variance (Bartlett’s test / Levene’s test)

Process Flow Of A Hypothesis Test

Define the problem and state objectives

State a “Null Hypothesis” (Ho)

State the “Alternate Hypothesis” (Ha)

Establish significance level (α)

Collect sample data

Calculate test statistic and/or p-value

DECIDE:
What does the evidence suggest?
Reject Ho? or Fail to reject Ho?

Example: Web Server Efficiency

A corporation has several web servers; there have been consistent complaints
about the efficiency of these servers – None perform significantly better or
worse than the others

The manager responsible for the operation of these servers decided to spend
$10k modifying one of them (Server 1) to improve its efficiency – Before
spending more money, time, and resources in modifying the rest of the servers
the manager wants to know if she has “significantly” improved the efficiency of
Server 1

She has sought the help of a Black Belt to answer this question – The Black
Belt suggested collecting sample data on the efficiency (measured using data
on availability of server, processing speed etc.) of Server 1 (modified) and
Server 2 (unmodified); the collected data is given in the file Servers.xls

Based on the data, how do we determine if there is a “real” improvement in the efficiency of Server 1?

Example: Server Efficiency

Server 1 Server 2
86.1 80.2
84.9 78.7
87.8 83.6
84.2 87.2
83.5 82.1
81.4 84.7
83.3 84.5
84.7 81.5
88.4 83.4
81.0 84.7
Is the efficiency of Server 1 higher than that
of Server 2?

Example: Server Efficiency

In SigmaXL: SigmaXL → Statistical Tools → Descriptive Statistics

Practical Question: Will the modifications on Server 1 improve its efficiency when compared to the current process, represented by Server 2?

Review descriptive statistics on file: Servers.xls

Is Server 1 different from Server 2?

Let’s take a look graphically first.

Example: Server Efficiency

[Figure: histograms of efficiency for Server 1 (x-axis 81.0 to 88.0) and Server 2 (x-axis 78.7 to 86.7); y-axis: Frequency.]

• Is the mean efficiency for Server 1 significantly higher than that of Server 2?
• Are the means close enough that the difference could have occurred just by chance?
• We must show that the values we observed were so unlikely to come from the same population that they must come from two different populations

Forming A Hypothesis

Null Hypothesis (H0)
– No difference/no change
– Factor not statistically significant
– Population follows a normal distribution

Alternative Hypothesis (Ha)
– Difference/change occurred
– Factor statistically significant
– Population does not follow a normal distribution

Assume H0 to be true until proven otherwise.
Burden of proof rests with Ha.

Example: Server Efficiency

Practical Language
– Null Hypothesis, H0: No difference in average efficiency of servers
– Alternate Hypothesis, Ha: Average efficiency of Server 1 higher than Server 2

Statistical Language
H0: μ1 = μ2
Ha: μ1 > μ2

Significance Level

The significance level (α) is the risk you are willing to take of saying a difference exists when it does not.

Statistically speaking, the data supported rejection of the null hypothesis Ho, but we were wrong in doing so.

There is an α chance (for example, 5% when α = 0.05) of saying that Server 1 is more efficient than Server 2 when in fact it is not.

How Should We Select α?

Impact of Error      Possible Risk Range (α)
Minor Rework         0.1 to 0.05
Major Rework         0.05 to 0.01
Injury/Litigation    0.01 to 0.001
Death                0.001 to 0.0001

What level of risk can you afford?

Decision Criteria

If the test statistic is greater than the critical statistic, reject the null hypothesis

– Or in other words…

p < α, reject the null hypothesis

p > α, fail to reject the null hypothesis
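
The two forms of the decision rule lead to the same decision. The sketch below checks this for a one-sided (“greater than”) Z-test, which is an assumption chosen for simplicity because the standard normal distribution is available in the Python standard library; the function names are hypothetical.

```python
from statistics import NormalDist


def decide_by_critical_value(z, alpha=0.05):
    """Reject H0 when the test statistic exceeds the critical statistic."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha)  # upper-tail critical value
    return "reject H0" if z > z_crit else "fail to reject H0"


def decide_by_p_value(z, alpha=0.05):
    """Reject H0 when the upper-tail p-value is below alpha."""
    p = 1.0 - NormalDist().cdf(z)
    return "reject H0" if p < alpha else "fail to reject H0"


for z in (-1.0, 0.5, 1.6, 1.7, 2.5):
    print(z, decide_by_critical_value(z), "|", decide_by_p_value(z))
```

Both columns of the printout should agree for every z, since “statistic beyond the critical value” and “p below α” describe the same tail area.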

Example: Server Efficiency

Interpretation

p-value = 0.199

Since p-value > α-value (0.05), fail to reject H0; no statistically significant difference

Conclude that there is no evidence of a statistically significant improvement in efficiency for Server 1
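
For readers without SigmaXL, the comparison can be roughly reproduced with the Python standard library. The sketch below uses the Server 1 and Server 2 values listed earlier and assumes a pooled (equal-variance) 2-Sample t-test; the slides do not say which variant or which alternative was selected in SigmaXL, so that choice, the helper names and the numerical integration of the t distribution are all assumptions made for illustration.

```python
import math
from statistics import mean, variance

server_1 = [86.1, 84.9, 87.8, 84.2, 83.5, 81.4, 83.3, 84.7, 88.4, 81.0]
server_2 = [80.2, 78.7, 83.6, 87.2, 82.1, 84.7, 84.5, 81.5, 83.4, 84.7]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """CDF of Student's t via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def pooled_two_sample_t(x, y):
    """t statistic and degrees of freedom, assuming equal variances."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


t_stat, df = pooled_two_sample_t(server_1, server_2)
p_one_sided = 1 - t_cdf(t_stat, df)             # Ha: mu1 > mu2
p_two_sided = 2 * (1 - t_cdf(abs(t_stat), df))  # Ha: mu1 != mu2
print(f"t = {t_stat:.3f}, df = {df}")
print(f"one-sided p = {p_one_sided:.3f}, two-sided p = {p_two_sided:.3f}")
```

Under these assumptions the two-sided p-value should come out close to the 0.199 reported on the slide, and the one-sided value is half of that. Both are above α = 0.05, so the decision to fail to reject H0 is the same either way.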

Choosing the Correct Tools

The data types for input & output will determine which tools might be useful

Hypothesis Testing Road Map

Single Y (Response), Continuous
– Single X (Factor), treated as Continuous: Scatterplot, Simple Regression, Curve Fitting
– Single X (Factor), treated as Discrete: T-Test, Homogeneity of Variance, 1-Way ANOVA
– Several X’s, treated as Continuous: Multiple Regression
– Several X’s, treated as Discrete: DOE, 2,3,4,5,...-Way ANOVA

Single Y (Response), Discrete
– Single X (Factor), treated as Continuous: Logistic Regression
– Single X (Factor), treated as Discrete: Goodness of Fit, Test of Independence
– Several X’s, treated as Continuous: Multiple Logistic Regression
– Several X’s, treated as Discrete: Multiple Logistic Regression

Several Y’s
– Multivariate Statistics (not “Multi-Vari” Charts)

Hypothesis Of Mean

Why Learn Hypothesis Tests Of Mean?
Make data-driven decisions with defined confidence to determine if an input significantly affects the central tendency of a process by shifting the mean or median.

The mean is used as the measure of central tendency for a normal distribution

The median is used as the measure of central tendency for a non-normal distribution (later module)

Determine if a statistically significant difference of means exists between

– A sample and a target

– Two independent samples

– Paired samples (dependent samples)

What Are Hypothesis Tests Of Mean?

Test          Method for analyzing the differences between:

1-Sample Z    A sample mean and a target value when the population standard deviation is known.

1-Sample t    A sample mean and a target value when the population standard deviation is not known.

2-Sample t    Means obtained from two independent samples.

Paired t      Mean differences obtained from paired samples.

Note: The above tests are used when the response variable is continuous.

Hypothesis Testing Of Means – Road Map

Comparing Means

– 1 Factor
  – 1 Sample
    – σ known: 1-Sample Z-test
    – σ not known: 1-Sample t-test
  – 2 Samples
    – Independent: 2-Sample t-test
    – Paired samples: Paired t-test
  – 2 or more samples: One-Way ANOVA
– 2 Factors: Two-Way ANOVA
– 3 or more factors: ANOVA GLM

Check Point (1) – Random

The datasets MUST be random (not influenced by special-cause effects).

Non-random data requires data cleaning (brushing out only the data points that are due to special-cause effects) before a hypothesis test of mean is used.

Check Point (2) – Normality

The datasets MUST be normally distributed.

Non-normal data requires data transformation (reciprocal method) before a hypothesis test of mean is used.

Check Point (3) – Variance Check

Before proceeding with a 2-Sample t-test (and only for that test), you need to decide whether the 2 datasets have equal variances or not. The 2-Sample t-test in Minitab requires this information!

1-Sample t-test

Single Mean Comparison

μ vs. target value

σ NOT known

Practical Question (example): “Is the population statistically different from the target value?”

Statistical Question
Ho: μ = target value
Ha: μ ≠ target value

1-Sample t-test

Hypothesis test about the unknown population mean using information from one sample

Population standard deviation is not known and the distribution is normal

Note: Normality assumptions are relaxed when the number of sample observations is large (generally true when sample size > 30)
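
For reference, the test statistic behind a 1-Sample t-test is usually written as below. The notation is an assumption added here and does not appear on the slides: x̄ is the sample mean, μ0 the target value, s the sample standard deviation and n the sample size.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \text{degrees of freedom} = n - 1
```

The p-value is then the tail area of the t distribution beyond this value, in the direction (or directions) stated by Ha.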

Example: Delinquent Mortgages

A bank holds 1432 delinquent mortgages on residential properties – The Black Belt investigating the loan process wants to know if the mean appraised value (current) of the properties is greater than $175,000

A preliminary random sample of 38 delinquent properties was appraised – The sample produced a mean appraised value of $181,769

Use the data in mortgage.xls to determine if there is enough evidence that the population mean appraised value is greater than $175,000

Example: Delinquent Mortgages

Practical Problem
Is the mean appraised value of all 1432 properties really higher than $175,000, given the sample mean value of $181,769?

Statistical Problem
Is the population mean of appraised value greater than $175,000?
Null hypothesis: Mean appraised value of properties = $175,000
Alternate hypothesis: Population average value is greater than $175,000

Is there sufficient evidence to show that the population average value of the properties is greater than $175,000 at a significance level of 5%? Otherwise we will maintain the current belief, i.e. the null hypothesis

Example: Delinquent Mortgages
State the hypotheses and significance level
Ho: μ = $175,000
Ha: μ > $175,000
α = 0.05

What hypothesis test is appropriate?
– These hypotheses deal with mean values
– Only one factor for examination – Appraised value
– Comparing population mean against a target value using 1-Sample data
– Data follows a normal distribution
– Population standard deviation NOT known
– Use 1-Sample t-test

Example: Delinquent Mortgages

[Figure: normality test output for the appraised values.]

P-value > 0.05 → Data normally distributed

Example: Delinquent Mortgages

In SigmaXL: SigmaXL → Statistical Tools → 1-Sample t

Choose “greater than” for the Alternative (Ha: μ > $175,000)

Example: Delinquent Mortgages

Interpretation
p-value = 0.132
Since p-value > α-value (0.05), fail to reject Ho
Ho is retained: Not enough evidence that the average appraised value of properties with delinquent mortgages is higher than $175,000

2-Sample t-test

2-Sample Comparison

μ1 vs. μ2

Practical Question (example): “Are the two populations statistically different?”

Statistical Question
Ho: μ1 = μ2
Ha: μ1 ≠ μ2

Example: Teller vs. ATM Costs

As part of an investigation to study the transaction costs of tellers vs. ATMs, a bank has collected a random sample of 45 teller transaction costs and 53 ATM transaction costs

The data is given in file ATMTeller.xls

Perform a hypothesis test to determine if the average teller transaction cost is higher than the average ATM transaction cost by at least $0.35

Example: Teller vs. ATM Costs

Practical problem

Is the average cost of teller transactions higher than the average cost of ATM transactions by at least $0.35?

Statistical problem

Is the population mean for teller transaction costs higher than the population mean of ATM transaction costs by at least $0.35?

Null hypothesis: Difference between mean value of teller transaction costs and mean value of ATM transaction costs is equal to $0.35

Alternate hypothesis: Difference between mean value of teller transaction costs and mean value of ATM transaction costs is greater than $0.35

Example: Teller vs. ATM Costs

State the hypotheses and significance level

Ho: Teller - ATM = $0.35

Ha: Teller - ATM > $0.35


 = 0.05
What hypothesis test is appropriate?
– These hypotheses deal with mean values
– Only one factor for examination – Transaction cost
– Comparing population means based on two independent sets of
sample data
– Samples are normally distributed
– Use 2-Sample t-test
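
Outside SigmaXL, the same one-sided test could be sketched in Python with SciPy. This is only an illustration under stated assumptions: the cost values are randomly generated stand-ins (not the contents of ATMTeller.xls), the unequal-variance (Welch) form is chosen for the example, and all variable names are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical transaction costs in dollars (NOT the data in ATMTeller.xls).
rng = np.random.default_rng(seed=1)
teller = rng.normal(loc=1.60, scale=0.30, size=45)
atm = rng.normal(loc=1.10, scale=0.30, size=53)

hypothesized_diff = 0.35  # difference stated in Ho
alpha = 0.05

# Ho: mu_teller - mu_atm = 0.35    Ha: mu_teller - mu_atm > 0.35
# Shifting the teller sample down by 0.35 turns this into the usual
# one-sided comparison "shifted teller mean > ATM mean".
result = stats.ttest_ind(teller - hypothesized_diff, atm,
                         equal_var=False, alternative="greater")
t_stat, p_value = result.statistic, result.pvalue

decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}: {decision}")
```

The shift by the hypothesized difference is the only change from an ordinary 2-sample t-test; the decision rule (compare p-value to α) stays the same.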

Example: Teller vs. ATM Costs

P-value > 0.05 → Data normally distributed

Example: Teller vs. ATM Costs

Graphically, the means of the 2 distributions look different.
But how confident can we be that the difference is real?
Need to refer to the p-value.

[Figure: transaction cost plotted by group, Teller vs. ATM]

Example: Teller vs. ATM Costs

In SigmaXL > Statistical Tools > 2-Sample t

Example: Teller vs. ATM Costs

Interpretation
p-value = 0.0183
Since p-value < α-risk (0.05), reject the null hypothesis
Conclusion: the difference between the mean Teller cost and the mean ATM cost is
greater than $0.35

Hypothesis Testing Of Variance

Hypothesis Testing Of Variation – Road Map

Comparing Variances
– 2 Samples: 2 Variance test (F-test/Levene’s test)
– More than 2 Samples: Test for Equal Variance (Bartlett’s test/Levene’s test)

Comparison Of Variance: 3 Scenarios

1. Two Sample Comparison

Variances of two independent populations compared to each other.

σ1² vs. σ2²

2. More than Two Sample Comparison

Variances of more than two independent populations compared to each other.

σ1² vs. σ2² vs. σ3²

2 Variance Test

Check Point (1) – Random

The datasets MUST be random (not influenced by special-cause effects).

Non-random data requires data cleaning (brush out only the data points that
are due to special-cause effects) before hypothesis testing is used.

Example: High Potential Analyst

A key success factor for financial analysts at Sigma Finance Consulting,
Inc. is their ability to accurately predict earnings per share (EPS)
values of various corporations. The quality of these forecasts is
typically expressed as Absolute Percentage Forecast Error (APFE) =
100 * ABS (predicted EPS - actual EPS) / actual EPS

During annual performance evaluations, Sigma Finance assesses each
analyst’s capability by measuring mean and standard deviation of
his/her Absolute Percentage Forecast Error (APFE) for the prior year

Example: High Potential Analyst

It is known that a high potential analyst will have an APFE standard
deviation of less than 3%

A VP wants to know if the rookie analyst George has proven himself
(beyond reasonable doubt) to be a high potential analyst using the
above criteria

APFE data for a random sample of 18 corporations forecasted by
George is included in a MINITAB™ worksheet APFE.xls

Perform an appropriate hypothesis test at 5% significance level to
answer the VP’s question

Example: High Potential Analyst

P-value > 0.05 Data normally


distributed

Green Belt Training Moduleã 2010, Genaxis Sdn Bhd, Malaysia. All rights reserved. Genaxis Proprietary Information – Subject to the restrictions on the cover page 70
Example: High Potential Analyst

Interpretation
p-value = 0.6768
Since p-value > α-risk (0.05), fail to reject the null hypothesis
Not enough evidence that the standard deviation of George’s APFE is less than 3%
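
A test of one standard deviation against a target can be sketched with the chi-square statistic (n - 1)·s²/σ0². The snippet below is a minimal illustration, assuming Ho: σ = 3 vs. Ha: σ < 3; the 18 APFE values are made up for the example (they are not the contents of APFE.xls) and the variable names are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical APFE values in percent for 18 forecasts (NOT the data in APFE.xls).
apfe = np.array([9.1, 12.4, 7.8, 10.6, 13.9, 8.5, 11.2, 6.9, 12.8,
                 9.7, 14.3, 10.1, 8.2, 11.9, 7.4, 13.1, 9.9, 10.8])

sigma0 = 3.0   # target standard deviation from the high-potential criterion
alpha = 0.05

n = apfe.size
sample_var = apfe.var(ddof=1)          # sample variance s^2

# Ho: sigma = 3    Ha: sigma < 3
# Under Ho, (n - 1) * s^2 / sigma0^2 follows a chi-square with n - 1 df.
chi2_stat = (n - 1) * sample_var / sigma0**2
p_value = stats.chi2.cdf(chi2_stat, df=n - 1)   # lower tail for "less than"

decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(f"s = {np.sqrt(sample_var):.3f}, chi2 = {chi2_stat:.3f}, "
      f"p-value = {p_value:.4f}: {decision}")
```

A small p-value here would support the claim that the analyst's standard deviation is below 3%; a large one only means the data do not prove it.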

2 Variance Test Exercise

The hardness of water is measured in terms of the calcium ion
concentration (in parts per million). The hardness of the water in both
hot and cold water pipes in a manufacturing process was measured. A
technician claims that the hardness of cold water is more consistent than
hot water. The calcium ion concentrations of the various samples taken
are given below:

Hot Water    Cold Water
133.5        134.0
135.4        134.7
137.2        136.0
138.4        131.7
136.3        134.6
137.1        135.2
133.3        135.9
136.5        135.6
137.5
139.4

Test the null hypothesis that the hot and cold water samples have the same
variability in calcium ion concentration vs. the alternate hypothesis that
they are not the same.

Use data file Caion.xls
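
For readers working outside SigmaXL, the two-variance comparison could be sketched as follows, using the hot and cold water values listed in the exercise. The two-sided F-test is built by hand from the F distribution and Levene's test comes from SciPy; the significance level of 0.05 is an assumption, since the exercise does not state one.

```python
import numpy as np
from scipy import stats

# Calcium ion concentrations (ppm) as listed in the exercise.
hot = np.array([133.5, 135.4, 137.2, 138.4, 136.3,
                137.1, 133.3, 136.5, 137.5, 139.4])
cold = np.array([134.0, 134.7, 136.0, 131.7, 134.6, 135.2, 135.9, 135.6])
alpha = 0.05  # assumed level; not given in the exercise

var_hot = hot.var(ddof=1)
var_cold = cold.var(ddof=1)

# F-test (assumes both samples are normal): ratio of the sample variances.
f_stat = var_hot / var_cold
df_hot, df_cold = hot.size - 1, cold.size - 1
if f_stat > 1:
    tail = stats.f.sf(f_stat, df_hot, df_cold)
else:
    tail = stats.f.cdf(f_stat, df_hot, df_cold)
p_f = min(1.0, 2 * tail)  # two-sided: Ha says the variances are not the same

# Levene's test (median-centered) is the usual fallback for non-normal data.
levene_stat, p_levene = stats.levene(hot, cold, center="median")

print(f"F = {f_stat:.3f}, p-value (F-test) = {p_f:.4f}")
print(f"Levene W = {levene_stat:.3f}, p-value (Levene) = {p_levene:.4f}")
```

Which p-value to read depends on the normality check: the F-test result when both samples pass, Levene's otherwise.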

Hypothesis Tests For 2 Or More
Variances

2 Or More Sample Comparison

σ1² vs. σ2² vs. σ3²

Practical Question (example): “Are the populations’ variations statistically different?”

Statistical Question
Ho: σ1² = σ2² = σ3²
Ha: At least one σi² is different

Example: Receivables Variation

Corporate finance is concerned about the excessive month-to-month
variability in accounts receivables for its Strategic Business Units (SBU).
As part of the investigation, a Black Belt is asked to find out if we
have enough evidence to show that variances (σ²) of monthly
outstanding accounts receivables are not the same for various
business units

The Black Belt has collected outstanding receivables data for 18
months of four different business units with similar financial profile.
The data (in MM) is recorded in Receivables.xls

Perform a hypothesis test at 5% significance level to determine if the
population variances of receivables for the SBUs are different

Example: Receivables Variation

State the hypotheses and significance level

Ho: σ1² = σ2² = σ3² = σ4²

Ha: At least one of the σi² is different

α = 0.05

What hypothesis test is appropriate?

– These hypotheses deal with variances

– Comparing population variances based on four different sample data

– This data follows a normal distribution

– Apply Test for Equal Variance, choose Bartlett’s results
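
As a rough sketch of the same test outside SigmaXL, SciPy offers both Bartlett's and Levene's tests. The receivables below are randomly generated placeholders for four SBUs with 18 months each (not the contents of Receivables.xls), and the names are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical monthly receivables (in MM) for four SBUs, 18 months each.
# These are generated placeholders, NOT the data in Receivables.xls.
rng = np.random.default_rng(seed=7)
sbus = {name: rng.normal(loc=50.0, scale=5.0, size=18)
        for name in ("SBU 1", "SBU 2", "SBU 3", "SBU 4")}
groups = list(sbus.values())
alpha = 0.05

# Bartlett's test: use when every group is normally distributed.
bartlett_stat, p_bartlett = stats.bartlett(*groups)

# Levene's test: use when at least one group is not normal.
levene_stat, p_levene = stats.levene(*groups, center="median")

for label, p in (("Bartlett", p_bartlett), ("Levene", p_levene)):
    decision = "reject Ho" if p < alpha else "fail to reject Ho"
    print(f"{label}: p-value = {p:.4f} -> {decision}")
```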

Example: Receivables Variation

In SigmaXL > Statistical Tools > Test for Equal Variances > Bartlett (all normal)
In SigmaXL > Statistical Tools > Test for Equal Variances > Levene (at least one
is not normal)
In this case, all are normal

Example: Receivables Variation

Interpretation:

p-value = 0.417 (Bartlett’s test)

p-value > α (0.05): Fail to reject Ho

Insufficient evidence that variations in receivables for four SBUs are
statistically different from each other

Hypothesis Testing: Chi Square

Why Chi-Square Test For Independence?

Tests the hypothesis of independence between two variables

– Probabilities of items or subjects being classified for one variable are
tested for dependence on the classifications of the other variable

Random and Independent Sampling

Nonparametric test

– Does not require assumption of normality

– Used for attribute data

Example:
Do females feel differently than males about 3 different criteria (appearance,
performance, fixed price) used in choosing a car, or do they feel basically
the same?

Test For Independence Procedure

State the practical problem
– “Is the Y variable independent of the X variable?”
– Each combination has equal probability
State the statistical problem
– Ho: Y independent of X (no difference)
– Ha: Y dependent on X (at least one combination is different)
Calculate the test statistic χ² observed
Determine the critical value of the χ² test statistic
– Reject the null hypothesis if χ² observed > χ² critical
or
– Reject the null hypothesis if p-value < α
Translate statistical conclusion into a practical solution

Survey Example

1000 adults were surveyed to determine their opinion on an issue
– 246 agreed
– 405 opposed
– 349 were neutral
The split of males and females was also recorded
– 447 were male
– 553 were female
Data Types
– Gender is a binomial variable
– Political view is an ordinal variable with three values
This combination of two variables creates six possible categories

State The Problem

Null Hypothesis: The two variables, opinion and gender, are
independent variables; each combination has equal probability

– P (agree|female) = P (agree|male) and

– P (oppose|female) = P (oppose|male) and

– P (neutral|female) = P (neutral|male)

Alternative: The two variables, opinion and gender, are dependent
variables.

– P (agree|female) ≠ P (agree|male) and/or

– P (oppose|female) ≠ P (oppose|male) and/or

– P (neutral|female) ≠ P (neutral|male)

Analysis In MINITAB

SigmaXL > Statistical Tools > Chi-Square Test – Two Way Table

Interpreting MINITAB Results
Results for: Test of Association Examples
Chi-Square Test: Agree, Oppose, Neutral

[Output table: Observed count, Expected count, and Chi-Sq contribution for each cell]

p-value < α: evidence to reject the Null.
Opinion and gender are dependent (Ha).

Exercise: Black Belt Projects

A sample of Black Belts were asked to rate both their Six Sigma® project
performance and the average weekly hours spent with the Project
Champion discussing project details. The results are shown in the
following table. Test at the 1% level the null hypothesis of no association
between the two sets of ratings.

Flip Chart Your Analysis: If you need prompting, answers are in the
Appendix.

Time with Champion      PROJECT PERFORMANCE
HOURS                   Low    Medium    High
< 0.1                   17     21        12
0.1 - 1                 31     53        21
> 1                     17     42        71
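
One possible way to run the test of association on this table outside SigmaXL is sketched below. It uses the counts from the table above and SciPy's chi-square contingency routine; the variable names are hypothetical.

```python
import numpy as np
from scipy import stats

# Rows: hours with Champion (< 0.1, 0.1 - 1, > 1).
# Columns: project performance (Low, Medium, High).
observed = np.array([[17, 21, 12],
                     [31, 53, 21],
                     [17, 42, 71]])
alpha = 0.01  # the exercise asks for a test at the 1% level

# Ho: performance rating is independent of time spent with the Champion.
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed)

# Critical-value form of the same decision rule.
critical = stats.chi2.ppf(1 - alpha, dof)

decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print("Expected counts:")
print(np.round(expected, 2))
print(f"chi2 observed = {chi2_stat:.2f}, chi2 critical = {critical:.2f}, "
      f"df = {dof}, p-value = {p_value:.6f}: {decision}")
```

Both decision rules from the procedure slide appear here: comparing χ² observed with χ² critical, and comparing the p-value with α. They always agree.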

Analysis Of Variance
(ANOVA)

When to Use ANOVA?

• ANOVA is used to test the relationship between
two sources of variation (KPIV vs. KPOV): a
discrete X and a continuous Y. The choice of tool for
1 X and 1 Y can be summarized as follows:

                        X (KPIV, Input)
Y (KPOV, Response)      Discrete                    Continuous
Discrete                Chi Square                  Logistic Regression
                        (see Section 1)             (see Section 4)
Continuous              ANOVA                       Simple Linear Regression
                        (see Section 2)             (see Section 3)

ANOVA: 1 discrete X at 3 or more levels and a continuous Y

• ANOVA allows comparison of 3 or more groups

The basic ANOVA Question
Main Question: Do the (means of) the quantitative
variables depend on which group (given by categorical
variables) the individual is in?

[Figure: three group distributions with means µ1, µ2, µ3]

Ho: µpop1 = µpop2 = µpop3 = . . .
Ha: at least one is different

One-Way ANOVA Example –
MINITAB Analysis
SigmaXL > Statistical Tools > One-Way ANOVA
The data…

One-Way ANOVA Example – MINITAB
Analysis (Cont’d)

Example #2

A rope manufacturer wants to determine whether the percentage of cotton in a
synthetic rope fiber has an effect on the tensile strength of the rope. Other
variables have been determined to be minor contributors and thus held
constant for this study. It is known that the percentage of cotton should be
between 10 and 40 for the rope to maintain other desirable properties
(abrasion resistance, etc). Increasing the tensile strength makes the rope hold
more load.
Objective is to study the effects of changing the cotton content on the tensile
strength
The Experiment:
– Factor = % Cotton Content
– Levels = 15%, 20%, 25%, 30%, 35%
– Alpha Risk = 5% (.05)
– Sampling = five samples for each level of cotton percentage
– In order to ensure that there are no influences due to lurking
variables (hidden variables/noise), we will randomize the order of the tests,
25 runs in all

Example #2
(Cont’d)

Test was run in randomized order and the data was collected
Cotton Content.xls

We’ll analyze the data, first graphically, then using One-Way ANOVA

Example #2
One-Way ANOVA In MINITAB
We can get additional plots simultaneously with performing the
One-Way ANOVA.

SigmaXL > Statistical Tools > One-Way ANOVA

Example #2 – ANOVA Output

[Figure: Mean/CI plot of Tensile Strength by % Cotton Content (15, 20, 25, 30, 35)]

Some Statistical Items To Cover
Via The Example
The hypothesis tested in a One-Way ANOVA is of equal means

– Ho: µ15% = µ20% = µ25% = µ30% = µ35%

– Ha: At least one µ is not equal

Since p-value is < .05, we reject Ho and conclude that there is
evidence that one or more means are different

– This agrees with our initial graphical assessment via the Xbar-R
chart

Before we’re done, we need to validate the
assumptions of ANOVA thus putting us on solid
statistical ground with our conclusions.
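
A One-Way ANOVA of this design (one factor, five levels, five runs per level) could be sketched in Python as below. The tensile strength values are illustrative numbers made up for this sketch; they are not read from Cotton Content.xls, and the names are hypothetical.

```python
from scipy import stats

# Illustrative tensile strengths by % cotton content (five runs per level).
# Hypothetical values, NOT the contents of Cotton Content.xls.
strength = {
    15: [8, 9, 14, 10, 9],
    20: [13, 16, 12, 17, 18],
    25: [15, 18, 17, 19, 20],
    30: [20, 24, 22, 19, 23],
    35: [8, 11, 10, 14, 12],
}
alpha = 0.05

# Ho: all level means are equal    Ha: at least one mean differs
f_stat, p_value = stats.f_oneway(*strength.values())

decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(f"F = {f_stat:.2f}, p-value = {p_value:.6f}: {decision}")
```

Rejecting Ho only says that at least one level differs; the Mean/CI plot (or a follow-up comparison) is still needed to see which one.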

ANOVA Assumptions

When using ANOVA, we make the following assumptions regarding the
residuals

– Independence

– Normally distributed

– Equal variance (an in-control Range chart is a good indication of this)

How important are these assumptions?

– Patterns in the residuals (lack of independence) can point to serious
problems with the analysis and any conclusions you make

How do we check these assumptions?

Residual analysis or residual diagnostics
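
A minimal sketch of residual diagnostics for a One-Way ANOVA follows. It assumes the same kind of illustrative tensile strength values (hypothetical, not from Cotton Content.xls), uses the Shapiro-Wilk test for normality and Levene's test for equal variance as stand-ins for the graphical checks, and leaves independence to a plot of residuals in run order.

```python
import numpy as np
from scipy import stats

# Illustrative tensile strengths by % cotton content (hypothetical values).
strength = {
    15: [8, 9, 14, 10, 9],
    20: [13, 16, 12, 17, 18],
    25: [15, 18, 17, 19, 20],
    30: [20, 24, 22, 19, 23],
    35: [8, 11, 10, 14, 12],
}
groups = [np.array(v, dtype=float) for v in strength.values()]

# In One-Way ANOVA the fitted value is the group mean,
# so each residual is the observation minus its own group mean.
residuals = np.concatenate([g - g.mean() for g in groups])

# Normality of residuals (Ho: residuals are normally distributed).
shapiro_stat, p_normal = stats.shapiro(residuals)

# Equal variance across levels (Ho: all level variances are equal).
levene_stat, p_equal_var = stats.levene(*groups, center="median")

print(f"Normality (Shapiro-Wilk): p-value = {p_normal:.4f}")
print(f"Equal variance (Levene):  p-value = {p_equal_var:.4f}")
```

Large p-values on both checks give no reason to doubt the assumptions; independence still has to be judged from the run-order plot, which needs the actual test sequence.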

Optional Exercise – Credit Processing

A financial services company has four different sites that process credit cases.
The question has been asked if the sites are performing the same or differently.
The table below contains productivity data on the average number of cases
worked per hour for a sample of employees at each of the four sites.
File: Credit Processing.xls
Follow the process shown prior to
analyzing the data; first graphically
then using ANOVA.
Is there reason to believe the sites
are performing differently?
How would you improve productivity?

Correlation &
Regression

Regression Analysis

• Used to fit lines and curves to data
• The fitted lines
– Quantify the relationship between the process variables (X’s)
and process performance (Y)
– Help identify the vital few X’s
– Enable predictions to be made
– Identify the impact of controlling the process variables (X’s)
• Produces an equation to match the line

Terms

Correlation
– A measure of linear association
– Used when both X and Y are continuous
r values range from:
– Perfect positive relationship = 1
– No relationship = 0
– Perfect negative relationship = -1
Regression
– Provides the basis for predicting the values of a variable from
the values of one or more other variables
– Used with a continuous Y and continuous Xs, or continuous Y
and categorical Xs
r² - proportion of variation of Y explained by the prediction equation

Correlation Illustrated

[Figure: three scatter plots showing Correlation = 1, Correlation = -1, and Correlation = 0]

Correlation Example

Correlat.xls
Two test stations are used to measure power supply voltage.
Is there a correlation?

SigmaXL>Statistical Tools>Correlation Matrix

Correlation of Station 1 and Station 2 = 0.959, P-Value = 0.000

The two are highly correlated (.959)

Is this reasonable?
Are you comfortable with .959?
What does it mean to you?
How does the data actually look?
How would you find out?
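
One way to explore these questions is to compute r, its p-value, and r² directly. The sketch below uses made-up voltage readings for the two stations (not the contents of Correlat.xls) and hypothetical variable names; a scatter plot of the same two columns is the natural next step to see how the data actually look.

```python
import numpy as np
from scipy import stats

# Hypothetical voltage readings from two test stations
# (NOT the data in Correlat.xls).
station_1 = np.array([11.8, 12.1, 12.4, 11.9, 12.6, 12.0, 12.3, 12.7, 11.7, 12.2])
station_2 = np.array([11.9, 12.0, 12.5, 12.0, 12.5, 12.1, 12.2, 12.8, 11.8, 12.3])

# Pearson correlation: a measure of linear association between -1 and 1.
r, p_value = stats.pearsonr(station_1, station_2)

# r squared: proportion of variation in one variable explained by the other.
r_squared = r**2

print(f"r = {r:.3f}, p-value = {p_value:.4f}, r-squared = {r_squared:.3f}")
```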

Multiple Factor Correlation

A Black Belt evaluating a yarn spinning process wants to determine
the key process input variables (KPIVs; the “x’s”) for skein strength
(the “y”).

Her team used a cause and effect matrix to select fiber strength, fiber
length, and fiber finish as likely contributors.

Yarn.xls

SigmaXL>Statistical Tools>Correlation Matrix

Minitab Correlation

Multiple Regression Analysis

• Two or more process variables (X’s) may have an influence upon
process performance (Y).
• Multiple regression is used whenever there are two or more
possible predictor variables.
• The general form of the multiple regression equation is:

Y = b0 + b1 X1 + b2 X2 + ... + bn Xn
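
The coefficients b0 ... bn can be estimated by ordinary least squares. The sketch below is one possible illustration with two predictors; the data are randomly generated (the predictor names are borrowed from the call center example only for readability, and nothing here is read from Callque.xls).

```python
import numpy as np
from scipy import stats

# Hypothetical data: waiting time driven by queue length only.
rng = np.random.default_rng(seed=3)
n = 30
calls_in_queue = rng.integers(1, 20, size=n).astype(float)
complexity = rng.integers(1, 6, size=n).astype(float)
wait_time = 0.5 + 0.25 * calls_in_queue + rng.normal(0, 0.4, size=n)

# Design matrix: a column of ones for b0, then one column per X.
X = np.column_stack([np.ones(n), calls_in_queue, complexity])

# Least-squares estimates of b0, b1, b2.
coef, *_ = np.linalg.lstsq(X, wait_time, rcond=None)

# p-value per coefficient (Ho: that coefficient is zero).
residuals = wait_time - X @ coef
dof = n - X.shape[1]
mse = residuals @ residuals / dof
std_err = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))
t_stats = coef / std_err
p_values = 2 * stats.t.sf(np.abs(t_stats), dof)

for name, b, p in zip(("b0", "b1 (queue)", "b2 (complexity)"), coef, p_values):
    print(f"{name}: estimate = {b:.3f}, p-value = {p:.4f}")
```

A predictor whose coefficient has a large p-value is a candidate for removal, after which the reduced model is refitted.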

Queue Time Example

A call center manager would like to be able to predict the average
waiting time for the resolution of customer inquiries. Data has been
collected on two factors:
1. Number of calls in the queue
2. Problem complexity

Callque.xls

• What is/are the important factor(s)?
• What is the prediction equation?
• What are the implications?

Queue Time Example (cont.)

SigmaXL > Statistical Tools > Regression > Multiple Regression

Queue Time Example (cont.)

The p-value for problem complexity is the probability that the
differences attributed to problem complexity could have occurred by
chance.

Let’s remove problem complexity from the model.

Reduced Model

Regression Analysis: Average Time in hours versus Inquires in the Queue

A More Complex Example: Water Usage

Seventeen observations regarding water usage at a
manufacturing plant are given. There are five known process
variables and one performance variable, Y:

X1 = average monthly temperature (°F)
X2 = amount of production (pounds)
X3 = number of plant operating days in the month
X4 = number of persons on the monthly plant payroll
X5 = unknown variable that process owner said may be important
Y = monthly water usage

Use the data to mine for the “vital few” process variables that
may influence water usage.

Water.xls

Questions 1 and 2

1. Which, if any, of the normally cited performance indicators in the
file influence CEO compensation?
CEO-COMPENSATION.xls
Note: this is real data.
2. You are planning to relocate. You have collected data on the
actual selling price of 20 houses in the general area of interest.
Realestate.xls You have located a house with 3,250 square
feet, with a neighborhood rating of 4 and a condition of 3. How
much should you offer?

Non-Parametric
Statistics

What We Will Learn

• How to use the following Nonparametric tests:
– 1-Sample Sign Test
– 1-Sample Wilcoxon
– 2-Sample Mann-Whitney Test
– Kruskal-Wallis Test
– Friedman Test
• Use as alternative statistical tests.

Cautions

1. The power of nonparametric tests is less than the power of
t-tests when populations are normally distributed.
2. For a given confidence level, confidence intervals based on
nonparametric statistics will usually be wider than confidence
intervals based on normal distribution statistics.
3. However, if the sample data is from a non-normal population, the
normal distribution statistics (e.g., t-tests) may give incorrect values
for P values of tests or confidence levels of confidence intervals.

One-Sample Sign Test

• Used to determine if the median of a distribution is equal to a specified
target (similar to a one-sample t-test).
• Less powerful than a t-test for normal data.
• Less powerful than the Wilcoxon test if the data is symmetric.

1-Sample Sign Test Example

The historic median time to complete a phone transaction has been 55
seconds. A new process has been introduced. The time required to
complete the transactions can be found in: nonpara.xls

SigmaXL >Statistical Tools > Nonparametrics Tests > 1 Sample Sign

Ho: New process median ≥ 55
Ha: New process median < 55
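
The sign test only counts how many observations fall below and above the target, so it can be sketched with the binomial distribution alone. The transaction times below are made up for illustration (not the contents of nonpara.xls); observations equal to the target are dropped, as is conventional.

```python
from math import comb

# Hypothetical transaction times in seconds (NOT the data in nonpara.xls).
times = [48, 52, 57, 50, 46, 55, 53, 49, 58, 51, 47, 54, 60, 45, 52, 50]
target = 55
alpha = 0.05

below = sum(t < target for t in times)
above = sum(t > target for t in times)
n = below + above            # ties with the target are ignored

# If the median were 55, each non-tied value would fall below the target
# with probability 0.5, so "below" would follow Binomial(n, 0.5).
# Ha: median < 55, so a large count below the target is evidence against Ho.
p_value = sum(comb(n, k) for k in range(below, n + 1)) / 2**n

decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(f"below = {below}, above = {above}, p-value = {p_value:.4f}: {decision}")
```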

1-Sample Sign Test Output

Sign Test for Median: New Process

Do we reject or fail to reject the null hypothesis?
Do we adopt the new process?

1-Sample Wilcoxon

• Used to determine if a median is equal to a target.
• Distribution is assumed to be continuous and symmetric.
• More powerful than the T test for non-normal distributions.
• Takes into account the magnitude of the calculated difference from
the hypothesized median.

Example

A delivery company has offered to deliver all individual software
packages for a standard price (eliminating the necessity for weighing
individual packages) if the median weight is less than 11 pounds.
1,000 samples are collected. Can the offer be accepted?
Use data from nonpara1.xls

Data = Nonpara column titled Weight

Test for normality, and create a histogram.

The data is not normal, but it is continuous and symmetric, therefore a
1-Sample Wilcoxon test will be used.
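
A 1-Sample Wilcoxon (signed-rank) test against a target median could be sketched as follows. The twelve weights are invented for the example (the exercise uses 1,000 values from nonpara1.xls), and the test is applied to the differences from the target of 11 pounds.

```python
import numpy as np
from scipy import stats

# Hypothetical package weights in pounds (NOT the data in nonpara1.xls).
weights = np.array([10.2, 10.8, 9.9, 10.5, 11.3, 10.1,
                    10.6, 9.7, 10.3, 10.4, 11.1, 10.0])
target = 11.0
alpha = 0.05

# Ho: median weight = 11    Ha: median weight < 11
# The signed-rank test ranks the absolute differences from the target,
# so it uses the size of each difference, not just its sign.
result = stats.wilcoxon(weights - target, alternative="less")
p_value = result.pvalue

decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(f"p-value = {p_value:.4f}: {decision}")
```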

The Result

SigmaXL >Statistical Tools > Nonparametrics Tests > 1 Sample Wilcoxon

With a P value of 0, we reject the null and conclude that the median weight
is below 11 lbs.

2-Sample Mann-Whitney Test

• Extends the 1-sample Wilcoxon test to 2 samples.
• Can be used on ranked as well as continuous data.

Example:
After six months (and many changes to the product mix), the delivery
company asks for verification that the median package weight is
still below 11 lbs.
Verify that the data is not normal and is symmetric.
Perform a 2-Sample Mann-Whitney test.
Use data from nonpara2.xls
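
Comparing the original and the later weight samples with a Mann-Whitney test might look like the sketch below. Both samples are invented for illustration (not the contents of nonpara2.xls) and the names are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical package weights in pounds (NOT the data in nonpara2.xls).
before = np.array([10.2, 10.8, 9.9, 10.5, 11.3, 10.1, 10.6, 9.7, 10.3, 10.4])
after = np.array([10.4, 10.9, 10.0, 10.7, 11.2, 10.3, 10.5, 9.8, 10.6, 11.0])
alpha = 0.05

# Ho: the two samples come from populations with the same median.
# Ha: the medians differ (two-sided).
result = stats.mannwhitneyu(before, after, alternative="two-sided")
p_value = result.pvalue

decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(f"median before = {np.median(before):.2f}, "
      f"median after = {np.median(after):.2f}")
print(f"p-value = {p_value:.4f}: {decision}")
```

A sample median that has moved is not enough on its own; the p-value says whether the shift is larger than chance would explain.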

Comparing Two Samples

SigmaXL >Statistical Tools > Nonparametrics Tests > Mann Whitney

Session Window Output

Mann-Whitney Test and CI: Weight, Weight 1

Cannot reject the null hypothesis at alpha = 0.05.

Although the sample median has increased, the confidence interval for the difference contains 0 and the p-value is greater than 0.05. Therefore, we conclude that the difference in the medians is not statistically significant.
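The two-sample comparison can be sketched in Python as follows. This is a minimal illustration of the Mann-Whitney U statistic with a two-sided normal approximation, not SigmaXL's exact procedure. Both weight lists are hypothetical values, not the contents of nonpara2.xls, and the function name is an assumption of this sketch. No tie or continuity correction is applied.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_two_sided(x, y):
    """Return (U, p) for H0: both samples come from the same distribution."""
    # U counts the (x, y) pairs in which x exceeds y; ties count as one half.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    lower_tail = NormalDist().cdf(z)
    p = 2 * min(lower_tail, 1 - lower_tail)
    return u, min(p, 1.0)


# Hypothetical package weights in pounds (illustration only).
before = [9.8, 10.4, 10.9, 9.5, 10.1, 11.2, 10.6, 9.9]
after = [10.0, 10.5, 10.8, 9.7, 10.3, 11.4, 10.7, 10.2]

u_stat, p_value = mann_whitney_two_sided(before, after)
alpha = 0.05
decision = "reject H0" if p_value < alpha else "cannot reject H0"
print(f"U = {u_stat}, two-sided p = {p_value:.3f}: {decision}")
```

With these illustrative values the p-value is well above 0.05, mirroring the "cannot reject" outcome on the slide. With SciPy installed, `scipy.stats.mannwhitneyu` provides a library implementation of the same test.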

Kruskal-Wallis Test

An alternative to one-way ANOVA when the data is not normal.

Assumes the data are continuous and come from independent populations with the same shape.

Example: You wish to purchase a bicycle helmet; you obtain impact force by helmet type from the web. How would you decide which helmet to buy? Use data from Helmet.xls

Can ANOVA be used?


Test for normality
SigmaXL > Graphical Tools > Histogram & Descriptive Statistics
Test for independence
SigmaXL > Statistical Tools > Correlation Matrix

Conducting the Test

The data is not normal, but it is independent, continuous and of roughly the same shape. Therefore, a Kruskal-Wallis test can be used.

SigmaXL > Statistical Tools > Nonparametric Tests > Kruskal-Wallis

Kruskal-Wallis Output

Kruskal-Wallis Test: Impact versus Helmet

[Figure: Median/CI plot of Impact by helmet type (1, 2, 3)]

There is a significant difference among the helmet types.
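As a rough illustration of what the Kruskal-Wallis test computes, here is a minimal Python sketch for exactly three groups. The impact values are hypothetical, not the contents of Helmet.xls, and the function name is an assumption of this sketch. It assumes there are no tied values and relies on the fact that, with three groups (2 degrees of freedom), the chi-square upper-tail probability equals exp(-H/2).

```python
from math import exp


def kruskal_wallis_three_groups(groups):
    """Return (H, p) for three groups of untied continuous observations."""
    pooled = sorted(x for g in groups for x in g)
    if len(groups) != 3 or len(set(pooled)) != len(pooled):
        raise ValueError("sketch supports exactly 3 groups with no tied values")
    # With no ties, each value's rank is its 1-based position in the pooled sort.
    rank = {x: i + 1 for i, x in enumerate(pooled)}
    n_total = len(pooled)
    # Sum over groups of (rank sum squared) / (group size).
    weighted = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    # Chi-square survival function for 2 degrees of freedom.
    return h, exp(-h / 2)


# Hypothetical impact forces by helmet type (illustration only).
helmet_1 = [22.1, 24.3, 23.5, 25.0, 21.8]
helmet_2 = [18.2, 19.5, 17.9, 20.1, 18.8]
helmet_3 = [26.4, 27.9, 25.7, 28.3, 26.9]

h_stat, p_value = kruskal_wallis_three_groups([helmet_1, helmet_2, helmet_3])
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```

A p-value below 0.05 indicates that at least one helmet type differs from the others. For tied data or a different number of groups, a library routine such as `scipy.stats.kruskal` is the safer choice.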
Summary

Normal data          Non-normal data (nonparametric tests)
1-Sample t Test      1-Sample Sign Test, 1-Sample Wilcoxon Test
2-Sample t Test      2-Sample Mann-Whitney Test
One-way ANOVA        Kruskal-Wallis Test
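The summary above can also be written as a small lookup helper. This is only a sketch: the function name and parameters are hypothetical, and the rule that the Wilcoxon test is preferred over the Sign test when the data is symmetric is an assumption drawn from the Wilcoxon example in this module.

```python
def choose_test(n_samples, is_normal, is_symmetric=False):
    """Suggest a test for comparing central tendency, per the summary table."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    if n_samples == 1:
        if is_normal:
            return "1-Sample t Test"
        # Assumption: Wilcoxon for symmetric data, Sign test otherwise.
        return "1-Sample Wilcoxon Test" if is_symmetric else "1-Sample Sign Test"
    if n_samples == 2:
        return "2-Sample t Test" if is_normal else "2-Sample Mann-Whitney Test"
    return "One-way ANOVA" if is_normal else "Kruskal-Wallis Test"


# The three worked examples in this module:
print(choose_test(1, is_normal=False, is_symmetric=True))  # package weights
print(choose_test(2, is_normal=False))                     # before vs. after
print(choose_test(3, is_normal=False))                     # helmet types
```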

Tollgate Check List

Analyse Phase Presentation Slides

• Project Charter
• Sigma Value
• FMEA from Measure Phase
• Y = f(X1, X2, X3, …, Xn)
• Validation of root causes (Graphical & Statistical)
• Action Plan for Improve Phase
• Just Do It
• Project Execution Plan
• Issues / Challenges / Barriers
