
Analyze Phase

Inferential Statistics

Inferential Statistics

Welcome to Analyze
“X” Sifting
Inferential Statistics
– Inferential Statistics
– Nature of Sampling
– Central Limit Theorem
Intro to Hypothesis Testing
Hypothesis Testing ND P1
Hypothesis Testing ND P2
Hypothesis Testing NND
Hypothesis Testing Attribute Data
Wrap Up & Action Items

OSSS LSS Black Belt v10.3 - Analyze Phase 2 © Open Source Six Sigma, LLC
Nature of Inference

in·fer·ence (n.) “The act or process of deriving logical conclusions
from premises known or assumed to be true. The act of reasoning
from factual knowledge or evidence.” 1
1. Dictionary.com

Inferential Statistics – To draw inferences about the process or
population being studied by modeling patterns of data in a way that
accounts for randomness and uncertainty in the observations. 2
2. Wikipedia.com

Putting the pieces of the puzzle together…
5 Step Approach to Inferential Statistics

1. What do you want to know?

2. What tool will give you that information?

3. What kind of data does that tool require?

4. How will you collect the data?

5. How confident are you with your data summaries?

So many questions…?
Types of Error

1. Error in sampling
– Error due to differences among samples drawn at random from the
population (luck of the draw).
– This is the only source of error that statistics can accommodate.
2. Bias in sampling
– Error due to lack of independence among random samples or due to
systematic sampling procedures (e.g., measuring the height of horse
jockeys only).
3. Error in measurement
– Error in the measurement of the samples (MSA/GR&R).
4. Lack of measurement validity
– The measurement does not actually measure what it is intended to
measure (e.g., placing a probe in the wrong slot, or measuring
temperature with a thermometer that is right next to a furnace).

Population, Sample, Observation

Population
– EVERY data point that has ever been or ever will be generated
from a given characteristic.

Sample
– A portion (or subset) of the population, either at one time or over
time.

Observation
– An individual measurement.

Significance

Significance is all about differences. In general, larger differences
(or deltas) are considered to be “more significant.”
Practical difference and significance is:
– The amount of difference, change, or improvement that will be of
practical, economic or technical value to you.
– The amount of improvement required to pay for the cost of making
the improvement.
Statistical difference and significance is:
– The magnitude of difference or change required to distinguish
between a true difference, change or improvement and one that
could have occurred by chance.
Six Sigma decisions will ultimately have a return on resource
investment (RORI)* element associated with them.
– The key question of interest for our decisions is: “Is the benefit of
making a change worth the cost and risk of making it?”

* RORI includes not only dollars and assets but the time and participation of your teams.
The Mission

Figure panels: Mean Shift, Variation Reduction, Both

Your mission, which you have chosen to accept, is to reduce cycle time, reduce the
error rate, reduce costs, reduce investment, improve service level, improve throughput,
reduce lead time, increase productivity… change the output metric of some process,
etc…

In statistical terms, this translates to the need to move the process Mean and/or reduce
the process Standard Deviation.

You’ll be making decisions about how to adjust key process input variables based on
sample data, not population data - that means you are taking some risks.

How will you know your key process output variable really changed, and is not just an
unlikely sample? The Central Limit Theorem helps us understand the risk we are
taking and is the basis for using sampling to estimate population parameters.
A Distribution of Sample Means

Imagine you have some population. The individual values of this
population form some distribution.

Take a sample of some of the individual values and calculate the
sample Mean.

Keep taking samples and calculating sample Means.

Plot a new distribution of these sample Means.

The Central Limit Theorem says that as the sample size becomes
large, this new distribution (the sample Mean distribution) approaches
a Normal Distribution, no matter what the shape of the population
distribution of individuals.
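The steps above can be sketched in Python as a rough illustration. The exponential population, the sample sizes, the number of samples and the seed are all assumptions chosen for the sketch, not values from the course material. Skewness near zero is used here as a simple indicator that the shape is becoming more symmetric, as a Normal Distribution is.

```python
import random
import statistics


def skewness(values):
    """Skewness computed as the mean of cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def sample_means(population, n, num_samples, rng):
    """Draw num_samples random samples of size n and return their Means."""
    return [statistics.fmean(rng.choices(population, k=n))
            for _ in range(num_samples)]


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical, strongly right-skewed population (exponential, mean about 1).
population = [rng.expovariate(1.0) for _ in range(20_000)]

means_n5 = sample_means(population, n=5, num_samples=2_000, rng=rng)
means_n30 = sample_means(population, n=30, num_samples=2_000, rng=rng)

for label, data in [("individuals", population),
                    ("means, n=5", means_n5),
                    ("means, n=30", means_n30)]:
    print(f"{label:12s} mean={statistics.fmean(data):.3f} "
          f"stdev={statistics.pstdev(data):.3f} skew={skewness(data):.3f}")
```

The printed Means should stay close to one another while the Standard Deviation and skewness shrink as the sample size grows.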

Sampling Distributions—The Foundation of Statistics

Population
3, 5, 2, 12, 10, 1, 6, 12, 5, 6, 12, 14, 3, 6, 11, 9, 10, 10, 12

• Samples from the population, each with five observations:

        Sample 1   Sample 2   Sample 3
            1          9          2
           12          8          3
            9          5          6
            7         14         11
            8         10         10
Mean      7.4        9.2        6.4

• In this example, we have taken three samples out of the
population, each with five observations in it. We computed a
mean for each sample. Note that the means are not the
same!
• Why not?
• What would happen if we kept taking more samples?
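As a quick check of the arithmetic in the example above, the three sample Means can be recomputed with a few lines of Python. The variable names are illustrative only.

```python
from statistics import mean

# The three samples of five observations from the example above.
samples = {
    "Sample 1": [1, 12, 9, 7, 8],
    "Sample 2": [9, 8, 5, 14, 10],
    "Sample 3": [2, 3, 6, 11, 10],
}

sample_means = {name: mean(values) for name, values in samples.items()}
for name, m in sample_means.items():
    print(f"{name}: mean = {m}")

# The spread between the sample Means is the sampling error at work.
spread = max(sample_means.values()) - min(sample_means.values())
print(f"Range in sample means: {spread:.1f}")
```

The Means come out as 7.4, 9.2 and 6.4, a range of 2.8, even though all three samples were drawn from the same population.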

Constructing Sampling Distributions

Calc > Random Data > Integer…

Sampling Distributions

To draw random samples from the population, follow the command
shown below and repeat 4 more times for the other columns.
Calc > Random Data > Sample From Columns…

Sampling Error

Calculate the Mean and Standard Deviation for each column and
compare the sample statistics to the population.
Stat > Basic Statistics > Display Descriptive Statistics…
Select all 6 columns into the “Variables” window

Range in Mean 3.0 Range in StDev 1.386


Sampling Error

Create 5 more columns of data sampling 10 observations from the
population.
Calc > Random Data > Sample From Columns…

Sampling Error - Reduced

Calculate the Mean and Standard Deviation for each column and
compare the sample statistics to the population.
“Stat > Basic Statistics > Display Descriptive Statistics…”

Range in Mean 1.5 Range in StDev 0.867

With 10 observations, the differences between samples are now much
smaller.
Sampling Error - Reduced

Variable     N    Mean    StDev
Sample 11    30   3.733   1.818
Sample 12    30   3.800   1.562
Sample 13    30   3.400   1.868
Sample 14    30   3.667   1.768
Sample 15    30   3.167   1.487

Range in Mean 0.63    Range in StDev 0.381
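The two ranges reported for the N = 30 samples can be reproduced from the table values. The short Python sketch below does only that arithmetic; the dictionary layout and names are illustrative.

```python
# Mean and StDev for the five samples of N = 30 listed in the table above.
table = {
    "Sample 11": (3.733, 1.818),
    "Sample 12": (3.800, 1.562),
    "Sample 13": (3.400, 1.868),
    "Sample 14": (3.667, 1.768),
    "Sample 15": (3.167, 1.487),
}

means = [m for m, _ in table.values()]
stdevs = [s for _, s in table.values()]

# Range = largest sample statistic minus smallest sample statistic.
range_in_mean = max(means) - min(means)
range_in_stdev = max(stdevs) - min(stdevs)

print(f"Range in Mean  = {range_in_mean:.3f}")
print(f"Range in StDev = {range_in_stdev:.3f}")
```

This gives 0.633 and 0.381, compared with the ranges of 1.5 and 0.867 reported earlier for samples of 10 observations.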
Sampling Distributions

In theory, if we kept taking samples of size n=5 and n=10 and
calculated the sample Means, we could see how the sample
Means are distributed.

Feeling lucky…?

Simulate this in MINITAB™ by creating ten columns of 1000
rolls of a die:
Sampling Distributions

For each row, calculate the Mean of five columns.

Repeat this command to calculate the Mean of C1-C10, and store
result in Mean10.
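For readers without MINITAB, the same dice experiment can be approximated in Python. This is a sketch under assumptions: the seed is arbitrary, and the "five columns" are taken to be C1-C5, which the slide does not state explicitly.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable
ROWS, COLS = 1000, 10

# Ten columns (C1..C10) of 1000 die rolls, stored row by row.
rolls = [[rng.randint(1, 6) for _ in range(COLS)] for _ in range(ROWS)]

c1 = [row[0] for row in rolls]                         # individual rolls
mean5 = [statistics.fmean(row[:5]) for row in rolls]   # row Mean of C1-C5 (assumed)
mean10 = [statistics.fmean(row) for row in rolls]      # row Mean of C1-C10

for label, data in [("C1", c1), ("Mean5", mean5), ("Mean10", mean10)]:
    print(f"{label:7s} mean={statistics.fmean(data):.3f} "
          f"stdev={statistics.stdev(data):.3f}")
```

In theory a fair die has a Mean of 3.5 and a Standard Deviation of about 1.708, so Mean5 and Mean10 should show Standard Deviations near 0.764 and 0.540 while all three Means stay near 3.5.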

Sampling Distributions

Create a Histogram of C1, Mean5 and Mean10.
Graph > Histogram > Simple…
Multiple Graph…On separate graphs…Same X, including same bins

Select “Same X, including same bins” to facilitate comparison

Different Distributions

Histograms shown: Individuals and Sample Means.

1. What is different about the three distributions?

2. What happens as the number of dice increases?
Observations

As the sample size (number of dice) increases from 1 to 5 to 10, there
are three points to note:
1. The Center remains the same.
2. The variation decreases.
3. The shape of the distribution changes - it tends to become
Normal.

The Mean of the sample Mean distribution: $\mu_{\bar{x}} = \mu$
Good news: the Mean of the sample Mean distribution is the Mean of the
population.

The Standard Deviation of the sample Mean distribution, also known as
the Standard Error: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
Better news: I can reduce my uncertainty about the population Mean by
increasing my sample size n.

Central Limit Theorem

If all possible random samples, each of size n, are taken from any
population with a Mean μ and Standard Deviation σ, the distribution
of sample Means will:

have a Mean $\mu_{\bar{x}} = \mu$

have a Std Dev $\sigma_{\bar{x}} = \sigma / \sqrt{n}$

and be Normally Distributed when the parent population is
Normally Distributed or will be approximately Normal for samples
of size 30 or more when the parent population is not Normally
Distributed.

This improves with samples of larger size.

Bigger is Better!
So What?

So how does this theorem help me understand the risk I am taking
when I use sample data, instead of population data?

Recall that about 95% of Normally Distributed data is within ± 2
Standard Deviations from the Mean. Therefore, the probability is about
95% that my sample Means are within 2 standard errors of the true
population Mean.

A Practical Example

Let’s say your project is to reduce the setup time for a large casting:
– Based on a sample of 20 setups, you learn that your baseline
average is 45 minutes, with a Standard Deviation of 10 minutes.
– Because this is just a sample, the 45 minute average is just an
estimate of the true average.
– Using the Central Limit Theorem, we can be about 95% confident that
the true average is somewhere between 40.5 and 49.5 minutes
(45 ± 2 standard errors). This range is also referred to as a
Confidence Interval.
– Therefore, don’t get too excited if you made a process change that
resulted in a reduction of only 2 minutes; you may just be “bouncing
within the Confidence Interval” of the population.
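The interval in this example can be reproduced with a short Python calculation. It is a sketch that assumes the slide's rule of thumb of 2 standard errors for roughly 95% coverage; the variable names are illustrative.

```python
import math

n = 20            # setups sampled
sample_mean = 45  # minutes
sample_sd = 10    # minutes
multiplier = 2    # "about 2 standard errors" for roughly 95% coverage

standard_error = sample_sd / math.sqrt(n)
margin = multiplier * standard_error
lower, upper = sample_mean - margin, sample_mean + margin

print(f"Standard error: {standard_error:.2f} min")
print(f"Approx. 95% interval: {lower:.1f} to {upper:.1f} min")

# A 2-minute reduction is smaller than the margin of about 4.5 minutes.
observed_reduction = 2
print("Reduction exceeds margin:", observed_reduction > margin)
```

This prints a standard error of 2.24 minutes and an interval of 40.5 to 49.5 minutes. An interval based on the t-distribution with 19 degrees of freedom would use a multiplier of about 2.09 and be slightly wider.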

Sample Size and the Mean

When taking a sample we have only estimated the true Mean:
– All we know is that the true Mean lies somewhere within the theoretical
distribution of sample Means, or the t-distribution, which is analyzed
using t-tests.
– t-tests measure the significance of differences between Means.

Figure curves: theoretical distribution of sample Means for n = 2;
theoretical distribution of sample Means for n = 10; distribution of
individuals in the population.

Standard Error of the Mean

The Standard Deviation for the distribution of Means is called the
standard error of the Mean and is defined as:
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$
– This formula shows that the Mean is more stable than a single
observation by a factor of the square root of the sample size.
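For readers who want to see where the square root comes from, the short derivation below may help. It assumes the n observations are independent and each has Standard Deviation σ; the notation is the usual textbook one rather than anything specific to this course.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\operatorname{Var}(\bar{x}) &= \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(x_i)
  = \frac{n\,\sigma^{2}}{n^{2}} = \frac{\sigma^{2}}{n} \\
\sigma_{\bar{x}} &= \sqrt{\operatorname{Var}(\bar{x})} = \frac{\sigma}{\sqrt{n}}
\end{aligned}
```

The second line relies on independence: the variance of a sum of independent observations is the sum of their variances.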

Standard Error

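The Standard Error falls in proportion to 1/√n, so the curve flattens
as the sample size grows: beyond a sample size of about 30, each
additional observation reduces it only slightly.

Figure: Standard Error (vertical axis) versus Sample Size (horizontal
axis, 0 to 30).

This is why a sample size of 30 is often recommended when generating
summary statistics such as the Mean and Standard Deviation.

This is also the point at which the t and Z distributions become
nearly equivalent.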
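The flattening of the Standard Error curve can be seen by tabulating σ/√n for a few sample sizes. In the Python sketch below, σ = 1 is a hypothetical value chosen for illustration, and the helper name `standard_error` is not from the course material.

```python
import math


def standard_error(sigma, n):
    """Standard Error of the Mean for population Standard Deviation sigma."""
    return sigma / math.sqrt(n)


sigma = 1.0  # hypothetical population Standard Deviation
sizes = [1, 5, 10, 20, 30, 50, 100]

previous = None
for n in sizes:
    se = standard_error(sigma, n)
    # Show how much the Standard Error dropped since the previous size.
    change = "" if previous is None else f"  change: {se - previous:+.3f}"
    print(f"n={n:3d}  standard error={se:.3f}{change}")
    previous = se
```

Going from n = 1 to n = 5 cuts the Standard Error by more than half, while going from n = 20 to n = 30 lowers it by only about 0.04 for σ = 1.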
Summary

At this point, you should be able to:

• Explain the term “Inferential Statistics”

• Explain the Central Limit Theorem

• Describe what impact sample size has on your estimates of
population parameters

• Explain Standard Error
