Day 1 Agenda
Welcome and Introductions
Course Structure
Group Expectations
Introduction to Six Sigma Applications
Red Bead Experiment
Introduction to Probability Distributions
Common Probability Distributions and Their Uses
Correlation Analysis
The National Graduate School of Quality Management v7 2
Day 2 Agenda
Team Report Outs on Day 1 Material
Central Limit Theorem
Process Capability
Multi-Vari Analysis
Sample Size Considerations
Day 3 Agenda
Team Report Outs on Day 2 Material
Confidence Intervals
Control Charts
Hypothesis Testing
ANOVA (Analysis of Variance)
Contingency Tables
Day 4 Agenda
Team Report Outs on Practicum Application
Design of Experiments
Wrap Up - Positives and Deltas
Class Guidelines
Q&A as we go Breaks Hourly Homework Readings
As assigned in Syllabus
Grading
Class Preparation 30%
Team Classroom Exercises 30%
Team Presentations 40%
10 minute daily presentation (Day 2 and 3) on application of the previous day's work
20 minute final Practicum application (last day)
Copy on floppy as well as hard copy
PowerPoint preferred
Rotate presenters
Q&A from the class
Learning Objectives
Have a broad understanding of statistical concepts and tools. Understand how statistical concepts can be used to improve business processes. Understand the relationship between the curriculum and the four step six sigma problem solving process (Measure, Analyze, Improve and Control).
Customer Critical To Quality (CTQ) Criteria Breakthrough Improvements Fact-driven, Measurement-based, Statistically Analyzed Prioritization Controlling the Input & Process Variations Yields a Predictable Product
A Quality Level
6σ = 3.4 Defects per Million Opportunities
A Structured Problem-Solving Approach
A Program
Dedicated, Trained BlackBelts Prioritized Projects Teams - Process Participants & Owners
Sweet Fruit
Design for Manufacturability
Process Entitlement
Bulk of Fruit
Process Characterization and Optimization
Ground Fruit
Logic and Intuition
PROCESSES WHICH PROVIDE PRODUCT VALUE IN THE CUSTOMERS EYES FEATURES OR CHARACTERISTICS THE CUSTOMER WOULD PAY FOR.
WASTE SCATTERED THROUGHOUT THE VALUE STREAM EXCESS INVENTORY REWORK WAIT TIME EXCESS HANDLING EXCESS TRAVEL DISTANCES TEST AND INSPECTION
Waste is a significant cost driver and has a major impact on the bottom line...
Y = f(x)
INSPECTION EXERCISE The necessity of training farm hands for first class farms in the fatherly handling of farm livestock is foremost in the minds of farm owners. Since the forefathers of the farm owners trained the farm hands for first class farms in the fatherly handling of farm livestock, the farm owners feel they should carry on with the family tradition of training farm hands of first class farms in the fatherly handling of farm livestock because they believe it is the basis of good fundamental farm management.
How many f's can you identify in 1 minute of inspection?
INSPECTION EXERCISE The necessity of* training f*arm hands f*or f*irst class f*arms in the f*atherly handling of* f*arm livestock is f*oremost in the minds of* f*arm owners. Since the f*oref*athers of* the f*arm owners trained the f*arm hands f*or f*irst class f*arms in the f*atherly handling of* f*arm livestock, the f*arm owners f*eel they should carry on with the f*amily tradition of* training f*arm hands of* f*irst class f*arms in the f*atherly handling of* f*arm livestock because they believe it is the basis of* good f*undamental f*arm management.
How many f's can you identify in 1 minute of inspection? 36 total are available.
Traditional
Focus on Firefighting High cost/low throughput Reliance on Test and Inspection Processes based on Random Probability Reactive High Failure Rates Focus on Short Term Wasteful Manage by Seat of the pants
SIX SIGMA TAKES US FROM FIXING PRODUCTS SO THEY ARE EXCELLENT, TO FIXING PROCESSES SO THEY PRODUCE EXCELLENT PRODUCTS
Dr. George Sarney, President, Siebe Control Systems
IMPROVEMENT ROADMAP
Objective
Breakthrough Strategy
Characterization
Phase 1: Measurement. Define the problem and verify the primary and secondary measurement systems.
Phase 2: Analysis. Identify the few factors which are directly influencing the problem.
Optimization
Phase 3: Improvement. Determine values for the few contributing factors which resolve the problem.
Phase 4: Control. Determine long term control measures which will ensure that the contributing factors remain controlled.
WHY STATISTICS?
IF WE DON'T HAVE DATA, WE DON'T KNOW
IF WE DON'T KNOW, WE CAN NOT ACT
IF WE CAN NOT ACT, THE RISK IS HIGH
IF WE DO KNOW AND ACT, THE RISK IS MANAGED
IF WE DO KNOW AND DO NOT ACT, WE DESERVE THE LOSS.
[Figure: distribution with lower and upper specification limits (LSL, USL).]
TO GET DATA WE MUST MEASURE
DATA MUST BE CONVERTED TO INFORMATION
INFORMATION IS DERIVED FROM DATA THROUGH STATISTICS
WHY STATISTICS?
Ignorance is not bliss, it is the food of failure and the breeding ground for loss.
Dr. Mikel J. Harry
Years ago a statistician might have claimed that statistics dealt with the processing of data. Today's statistician will be more likely to say that statistics is concerned with decision making in the face of uncertainty.
Bartlett
Learning Objectives
Have an understanding of the difference between random variation and a statistically significant event. Understand the difference between attempting to manage an outcome (Y) as opposed to managing upstream effects (x's). Understand how the concept of statistical significance can be used to improve business processes.
HIRING NEEDS
BEADS ARE OUR BUSINESS
STANDING ORDERS
Follow the process exactly. Do not improvise or vary from the documented process. Your performance will be based solely on your ability to produce white beads. No questions will be allowed after the initial training period. Your defect quota is no more than 5 off color beads allowed per paddle.
WHITE BEAD MANUFACTURING PROCESS
PROCEDURES
The operator will take the bead paddle in the right hand.
Insert the bead paddle at a 45 degree angle into the bead bowl.
Agitate the bead paddle gently in the bead bowl until all spaces are filled.
Gently withdraw the bead paddle from the bowl at a 45 degree angle and allow the free beads to run off.
Without touching the beads, show the paddle to inspector #1 and wait until the off color beads are tallied.
Move to inspector #2 and wait until the off color beads are tallied.
Inspector #1 and #2 show their tallies to the inspection supervisor. If they agree, the inspection supervisor announces the count and the tally keeper will record the result. If they do not agree, the inspection supervisor will direct the inspectors to recount the paddle.
When the count is complete, the operator will return all the beads to the bowl and hand the paddle to the next operator.
INCENTIVE PROGRAM
Low bead counts will be rewarded with a bonus. High bead counts will be punished with a reprimand. Your performance will be based solely on your ability to produce white beads. Your defect quota is no more than 7 off color beads allowed per paddle.
PLANT RESTRUCTURE
Defect counts remain too high for the plant to be profitable. The two best performing production workers will be retained and the two worst performing production workers will be laid off. Your performance will be based solely on your ability to produce white beads. Your defect quota is no more than 10 off color beads allowed per paddle.
OBSERVATIONS.
Y = f(x)
Learning Objectives
Have a broad understanding of what probability distributions are and why they are important. Understand the role that probability distributions play in determining whether an event is a random occurrence or significantly different. Understand the common measures used to characterize a population central tendency and dispersion. Understand the concept of Shift & Drift. Understand the concept of significance testing.
Why do we Care?
An understanding of Probability Distributions is necessary to:
Understand the concept and use of statistical tools.
Understand the significance of random variation in everyday measures.
Understand the impact of significance on the successful resolution of a project.
IMPROVEMENT ROADMAP
Uses of Probability Distributions
Project Uses
Breakthrough Strategy
Characterization
Phase 1: Measurement. Establish baseline data characteristics.
Phase 2: Analysis. Identify and isolate sources of variation.
Optimization
Phase 3: Improvement. Demonstrate before and after results are not random chance.
Phase 4: Control. Use the concept of shift & drift to establish project expectations.
KEYS TO SUCCESS
Focus on understanding the concepts
Visualize the concept
Don't get lost in the math.
Types of Measures
Measures where the metric is a classification into one of two (or more) categories are called Attribute data. This data is usually presented as a count or percent.
Measures where the metric is a number which indicates a precise value are called Variable data.
Time Miles/Hr
Results from 10,000 people doing a coin toss 200 times: Cumulative Count
[Figure: frequency and cumulative frequency (cumulative count up to 10,000) plotted against "Head Count" from 70 to 130.]
Cumulative count is simply the total frequency count accumulated as you move from left to right until we account for the total population of 10,000 people. Since we know how many people were in this population (i.e. 10,000), we can divide each of the cumulative counts by 10,000 to give us a curve with the cumulative percent of population.
Results from 10,000 people doing a coin toss 200 times: Cumulative Percent
[Figure: cumulative percent (0 to 100) plotted against "Head Count".]
This means that we can now predict the chance that certain values can occur based on these percentages. Note here that 50% of the values are less than our expected value of 100.
This means that in a future experiment set up the same way, we would expect 50% of the values to be less than 100.
We can now equate a probability to the occurrence of specific values or groups of values. For example, we can see that the occurrence of a Head count of less than 74 or greater than 124 out of 200 tosses is so rare that a single occurrence was not registered out of 10,000 tries. On the other hand, we can see that the chance of getting a count near (or at) 100 is much higher. With the data that we now have, we can actually predict each of these values.
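The probabilities read off the histogram can also be worked out directly. The following is a minimal sketch, assuming a fair coin (p = 0.5) and using the exact binomial distribution in place of the 10,000-person experiment; the function names are illustrative and do not come from the course material.

```python
from math import comb, sqrt

N_TOSSES = 200  # tosses per person, as in the slide
P_HEAD = 0.5    # assumption: a fair coin


def head_count_pmf(k, n=N_TOSSES, p=P_HEAD):
    """Probability of seeing exactly k heads in n tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_between(lo, hi, n=N_TOSSES, p=P_HEAD):
    """Probability that the head count falls in lo..hi inclusive."""
    return sum(head_count_pmf(k, n, p) for k in range(lo, hi + 1))


expected = N_TOSSES * P_HEAD                      # center of the distribution
sigma = sqrt(N_TOSSES * P_HEAD * (1 - P_HEAD))    # scatter around the center
p_below_100 = prob_between(0, 99)                 # strictly fewer than 100 heads
p_extreme = prob_between(0, 73) + prob_between(125, N_TOSSES)

print(f"expected head count: {expected:.0f}")
print(f"sigma: {sigma:.2f}")
print(f"P(head count < 100): {p_below_100:.4f}")
print(f"P(head count < 74 or > 124): {p_extreme:.6f}")
```

Because head counts are whole numbers, the chance of strictly fewer than 100 heads comes out slightly under 50%; the remainder is the chance of exactly 100 heads.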
Results from 10,000 people doing a coin toss 200 times: Cumulative Percent
[Figure: frequency and cumulative percent plotted against "Head Count".]
If we know where we are in the population we can equate that to a probability value. This is the purpose of the sigma value (normal data).
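SIGMA (σ) IS A MEASURE OF SCATTER FROM THE EXPECTED VALUE THAT CAN BE USED TO CALCULATE A PROBABILITY OF OCCURRENCE
[Figure: frequency versus head count, with a sigma scale and cumulative percent beneath the axis.]
Sigma    Head Count    Cumulative Percent
-6       -             -
-5       -             -
-4       -             .003
-3       79            .135
-2       86            2.275
-1       93            15.87
 0       100           50.0
 1       107           84.1
 2       114           97.7
 3       121           99.86
 4       128           99.997
 5       135           -
 6       142           -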
What are the chances that this "just happened"? If they are small, chances are that an external influence is at work that can be used to our benefit.
Since the sample does not contain all the possible values, there is some uncertainty about the population. Hence any statistics, such as mean and standard deviation, are just estimates of the true population parameters.
Descriptive Statistics
Descriptive Statistics is the branch of statistics with which most people are familiar. It characterizes and summarizes the most prominent features of a given set of data (means, medians, standard deviations, percentiles, graphs, tables and charts). Descriptive statistics describe the elements of a population as a whole, or describe data that represent just a sample of elements from the entire population.
Inferential Statistics
Inferential Statistics is the branch of statistics that deals with drawing conclusions about a population based on information obtained from a sample drawn from that population. While descriptive statistics has been taught for centuries, inferential statistics is a relatively new phenomenon having its roots in the 20th century. We infer something about a population when only information from a sample is known. Probability is the link between Descriptive and Inferential Statistics
And the first 50 trials showed Head Counts greater than 130?
[Figure: frequency versus head count with a sigma scale; the center of the distribution is marked Common Occurrence and both tails are marked Rare Occurrence.]
What are some of the ways that we can easily indicate the centering characteristic of the population? Three measures have historically been used; the mean, the median and the mode.
Mean
$\bar{x} = \frac{\sum x_i}{n} = \frac{-2}{12} = -0.17 \quad (n = 12)$
Median
[Figure: the data set with the Median Value marked.]
MEAN ($\bar{X}$)
$\bar{X} = \frac{\sum X_i}{n} = \frac{-2}{12} = -0.17$
MEDIAN
(50th percentile data point) Here n = 12, so the median falls between the 6th and 7th ordered values (n/2 = 6). Both of these values are zero, and therefore the median is zero. If the values were, say, 2 and 3 instead, the median would be 2.5.
MODE
(Most common value in the data set) The mode in this case is 0, with 5 occurrences within this data.
Mode = 0
MEDIAN AND MODE: Only use if you cannot use the mean.
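The three measures of central tendency can be checked with a few lines of code. This is a minimal sketch using Python's standard `statistics` module; the list below is the 12-value data set as read from the slides, and the variable names are illustrative.

```python
from statistics import mean, median, mode

# The 12-value data set used on the slides (n = 12, sum = -2).
data = [-5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4]

data_mean = mean(data)      # sum of the values divided by n
data_median = median(data)  # average of the 6th and 7th ordered values
data_mode = mode(data)      # most common value

print(f"n = {len(data)}")
print(f"mean = {data_mean:.2f}")
print(f"median = {data_median}")
print(f"mode = {data_mode}")
```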
What are some of the ways that we can easily indicate the dispersion (spread) characteristic of the population? Three measures have historically been used; the range, the standard deviation and the variance.
Range, variance and standard deviation (n = 12)
$\bar{X} = \frac{\sum X_i}{n} = \frac{-2}{12} = -0.17$
$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{61.67}{12 - 1} = 5.6$
MEASURES OF DISPERSION
ORDERED DATA SET: -5 -3 -1 -1 0 0 0 0 0 1 3 4
Min = -5, Max = 4

RANGE (R)
$R = X_{max} - X_{min} = 4 - (-5) = 9$

VARIANCE ($s^2$)
$\bar{X} = \frac{\sum X_i}{n} = \frac{-2}{12} = -0.17$

X_i    X_i - X̄               (X_i - X̄)^2
-5     -5-(-.17) = -4.83     (-4.83)^2 = 23.32
-3     -3-(-.17) = -2.83     (-2.83)^2 = 8.01
-1     -1-(-.17) = -.83      (-.83)^2 = .69
-1     -1-(-.17) = -.83      (-.83)^2 = .69
 0      0-(-.17) = .17       (.17)^2 = .03
 0      0-(-.17) = .17       (.17)^2 = .03
 0      0-(-.17) = .17       (.17)^2 = .03
 0      0-(-.17) = .17       (.17)^2 = .03
 0      0-(-.17) = .17       (.17)^2 = .03
 1      1-(-.17) = 1.17      (1.17)^2 = 1.37
 3      3-(-.17) = 3.17      (3.17)^2 = 10.05
 4      4-(-.17) = 4.17      (4.17)^2 = 17.39
Total                        61.67

$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{61.67}{12 - 1} = 5.6$

STANDARD DEVIATION (s)
$s = \sqrt{s^2} = \sqrt{5.6} = 2.37$
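The dispersion measures can be reproduced the same way. A minimal sketch, assuming the same 12-value data set (minimum -5, maximum 4) and the sample (n - 1) form of the variance; it uses Python's standard `statistics` module, and the variable names are illustrative.

```python
from statistics import mean, stdev, variance

# The 12-value data set used on the slides.
data = [-5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4]

data_range = max(data) - min(data)  # X_max - X_min = 4 - (-5)
x_bar = mean(data)

# Sum of squared deviations from the mean, the numerator of s^2.
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)

sample_variance = variance(data)  # divides by n - 1
sample_std = stdev(data)          # square root of the sample variance

print(f"range = {data_range}")
print(f"sum of squared deviations = {sum_sq_dev:.2f}")
print(f"s^2 = {sample_variance:.2f}")
print(f"s = {sample_std:.2f}")
```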
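$\hat{\mu} = \bar{X} = \frac{\sum X_i}{N}$
$\hat{\sigma}^2 = s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$

i      1    2    3    4    5    6    7    8    9    10
X_i    10   15   12   14   10   9    11   12   10   12

Columns to complete: $X_i - \bar{X}$, $(X_i - \bar{X})^2$, $s^2$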
The project is progressing well and you wrap it up. 6 months later you are surprised to find that the population has taken a shift.
SO WHAT HAPPENED?
All of our work was focused in a narrow time frame. Over time, other long term influences come and go which move the population and change some of its characteristics. This is called shift and drift.
[Figure: the population distribution shifting over Time relative to the Original Study.]
Historically, this shift and drift primarily impacts the position of the mean and shifts it 1.5σ from its original position.
VARIATION FAMILIES
Sources of Variation
Within Individual Sample: Variation is present upon repeat measurements within the same sample.
Piece to Piece: Variation is present upon measurements of different samples collected within a short time frame.
Time to Time: Variation is present upon measurements collected with a significant amount of time between samples.
PPM ST     Cpk    PPM LT (+1.5σ)
500,000    0.0    933,193
460,172    0.0    919,243
420,740    0.1    903,199
382,089    0.1    884,930
344,578    0.1    864,334
308,538    0.2    841,345
274,253    0.2    815,940
241,964    0.2    788,145
211,855    0.3    758,036
184,060    0.3    725,747
158,655    0.3    691,462
135,666    0.4    655,422
115,070    0.4    617,911
96,801     0.4    579,260
80,757     0.5    539,828
66,807     0.5    500,000
54,799     0.5    460,172
44,565     0.6    420,740
Here, you can see that the impact of this concept is potentially very significant. In the short term, we have driven the defect rate down to 54,800 ppm and can expect to see occasional long term ppm to be as bad as 460,000 ppm.
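The short term and long term PPM columns can be regenerated from the normal distribution. The sketch below assumes a one-sided defect tail and that each table row corresponds to a short term Z value stepping by 0.1 from 0 (the Z column itself is not shown in the table, so that mapping is an assumption); the function names are illustrative.

```python
from math import erfc, sqrt

SHIFT = 1.5  # long term shift of the mean, in sigma units, per the slide


def upper_tail(z):
    """Area of the standard normal curve beyond z (one tail)."""
    return 0.5 * erfc(z / sqrt(2))


def ppm_short_term(z):
    """Defects per million when the spec limit sits z sigma from the mean."""
    return 1_000_000 * upper_tail(z)


def ppm_long_term(z, shift=SHIFT):
    """Defects per million after the mean drifts `shift` sigma toward the limit."""
    return 1_000_000 * upper_tail(z - shift)


for z in (0.0, 0.5, 1.0, 1.6):
    print(f"Z = {z:.1f}  PPM ST = {ppm_short_term(z):,.0f}  "
          f"PPM LT = {ppm_long_term(z):,.0f}")
```

Under these assumptions the Z = 1.6 row reproduces the 54,799 and 460,172 figures quoted for the example.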
Learning Objectives
Have a broad understanding of how probability distributions are used in improvement projects. Review the origin and use of common probability distributions.
Why do we Care?
Probability distributions are necessary to:
determine whether an event is significant or due to random chance.
predict the probability of specific performance given historical characteristics.
IMPROVEMENT ROADMAP
Uses of Probability Distributions
Common Uses
Phase 1: Measurement Characterization Phase 2: Analysis Breakthrough Strategy Phase 3: Improvement Optimization Phase 4: Control
Baselining Processes
Verifying Improvements
KEYS TO SUCCESS
Focus on understanding the use of the distributions Practice with examples wherever possible Focus on the use and context of the tool
Data points vary, but as the data accumulates, it forms a distribution which occurs naturally.
Location
Spread
Shape
[Figure: Subgroup Average charts.]
Normal Distribution
Distribution
Name (symbol): Statistic Meaning. Common Uses
Alpha (α): Significance level. Hypothesis Testing, DOE
Chi Square (χ²): Probability Distribution. Confidence Intervals, Contingency Tables, Hypothesis Testing
Sum (Σ): Sum of Individual values. Variance Calculations
t, Student t (t): Probability Distribution. Hypothesis Testing, Confidence Interval of the Mean
Sample Size (n): Total size of the Sample Taken. Nearly all Functions
Nu (ν): Degree of Freedom. Probability Distributions, Hypothesis Testing, DOE
Beta (β): Beta Risk. Sample Size Determination
Delta (δ): Difference between population means. Sample Size Determination
Sigma Value (Z): Number of Standard Deviations a value Exists from the Mean. Probability Distributions, Process Capability, Sample Size Determinations
$t_{CALC} = \frac{\bar{X} - \mu}{s / \sqrt{n}}$
$F_{CALC} = \frac{s_1^2}{s_2^2}$
$\chi^2_{\alpha,df} = \sum \frac{(f_e - f_a)^2}{f_e}$
Note that in each case, a limit has been established to determine what is random chance versus significant difference. This point is called the critical value. If the calculated value exceeds this critical value, there is very low probability (P<.05) that this is due to random chance.
Z TRANSFORM
[Figure: normal curve marked at ±1σ, ±2σ and ±3σ.]
±1σ = 68.26% (approximately 68%)
±2σ = 95.46% (approximately 95%)
±3σ = 99.73% (approximately 99.7%)
Common Test Values
Z(1.6) = 5.5% (1 tail α = .05)
Z(2.0) = 2.5% (2 tail α = .05)
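The areas quoted for the Z transform can be recomputed from the standard normal curve. A minimal sketch using only the `math` module; the function names are illustrative, and the computed values can differ slightly from the rounded figures on the slide.

```python
from math import erf, erfc, sqrt


def area_within(k):
    """Fraction of a normal population within +/- k sigma of the mean."""
    return erf(k / sqrt(2))


def tail_beyond(z):
    """Fraction of a normal population more than z sigma above the mean."""
    return 0.5 * erfc(z / sqrt(2))


for k in (1, 2, 3):
    print(f"+/-{k} sigma: {100 * area_within(k):.2f}%")

for z in (1.6, 2.0):
    print(f"tail beyond Z = {z}: {100 * tail_beyond(z):.2f}%")
```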
Attempting to manage results (Y) only causes increased costs due to rework, test and inspection Understanding and controlling the causative factors (x) is the real key to high quality at low cost...
Z Stat ( ,n>30) Z Stat (p) t Stat ( ,n<30) 2 Stat ( ,n<10) 2 Stat (Cp) Z Stat (n>30) Z Stat (p) t Stat (n<30) Stat (n<10)
These interactions form a new population which can now be used to predict future performance.
Means Add
Population means interact in a simple intuitive manner.
$\mu_{new} = \mu_1 + \mu_2$
Variations Add
$\sigma_{new}^2 = \sigma_1^2 + \sigma_2^2$
Means Subtract
Population means interact in a simple intuitive manner.
$\mu_{new} = \mu_1 - \mu_2$
Variations Add
$\sigma_{new}^2 = \sigma_1^2 + \sigma_2^2$
TRANSACTIONAL EXAMPLE
Orders are coming in with the following characteristics:
X̄ = $53,000/week, s = $8,000
Shipments are going out with the following characteristics:
X̄ = $60,000/week, s = $5,000
Assuming nothing changes, what percent of the time will shipments exceed orders?
TRANSACTIONAL EXAMPLE
Orders Shipments
To solve this problem, we must create a new distribution to model the situation posed in the problem. Since we are looking for shipments to exceed orders, the resulting distribution is created as follows:
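$\bar{X}_{new} = \bar{X}_{shipments} - \bar{X}_{orders} = \$60{,}000 - \$53{,}000 = \$7{,}000$
$s_{new} = \sqrt{s_{shipments}^2 + s_{orders}^2} = \sqrt{(5000)^2 + (8000)^2} = \$9{,}434$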
The new distribution looks like this with a mean of $7000 and a standard deviation of $9434. This distribution represents the occurrences of shipments exceeding orders. To answer the original question (shipments>orders) we look for $0 on this new distribution. Any occurrence to the right of this point will represent shipments > orders. So, we need to calculate the percent of the curve that exists to the right of $0.
$s_{new} = \sqrt{s_{shipments}^2 + s_{orders}^2}$
To calculate the percent of the curve to the right of $0 we need to convert the difference between the $0 point and $7000 into sigma intervals. Since we know every $9434 interval from the mean is one sigma, we can calculate this position as follows:
$Z = \frac{0 - 7000}{9434} = -0.74$
The $0 point therefore sits 0.74σ below the mean. Look up 0.74σ in the normal table and you will find .77. Therefore, the answer to the original question is that 77% of the time, shipments will exceed orders. Now, as a classroom exercise, what percent of the time will shipments exceed orders by $10,000?
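The same calculation can be sketched in code. This assumes, as the example does implicitly, that weekly orders and shipments are independent and normally distributed; the function `prob_difference_exceeds` and its parameter names are illustrative and not part of the course material.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def prob_difference_exceeds(threshold, mean_a, sd_a, mean_b, sd_b):
    """P(A - B > threshold) for independent normal A and B.

    Means subtract and variances add to form the new distribution.
    """
    mean_diff = mean_a - mean_b
    sd_diff = sqrt(sd_a**2 + sd_b**2)
    z = (threshold - mean_diff) / sd_diff  # position of the threshold in sigma
    return 1 - normal_cdf(z)


# Shipments: mean $60,000, s $5,000. Orders: mean $53,000, s $8,000.
p_ship_gt_orders = prob_difference_exceeds(0, 60_000, 5_000, 53_000, 8_000)
print(f"P(shipments > orders) = {p_ship_gt_orders:.2f}")
```

Passing a different `threshold` to the same function is one way to check an answer to the classroom exercise.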
MANUFACTURING EXAMPLE
Two blocks are being assembled end to end and significant variation has been found in the overall assembly length. The blocks have the following dimensions:
CORRELATION ANALYSIS
Learning Objectives
Understand how correlation can be used to demonstrate a relationship between two factors. Know how to perform a correlation analysis and calculate the coefficient of linear correlation (r). Understand how a correlation analysis can be used in an improvement project.
Why do we Care?
Correlation Analysis is necessary to:
show a relationship between two variables. This also sets the stage for potential cause and effect.
IMPROVEMENT ROADMAP
Uses of Correlation Analysis
Common Uses
Phase 1: Measurement Characterization Phase 2: Analysis Breakthrough Strategy Phase 3: Improvement Optimization Phase 4: Control
Determine and quantify the relationship between factors (x) and output characteristics (Y).
KEYS TO SUCCESS
Always plot the data Remember: Correlation does not always imply cause & effect Use correlation as a follow up to the Fishbone Diagram Keep it simple and do not let the tool take on a life of its own
WHAT IS CORRELATION?
Output or y variable (dependent)
Correlation
Y = f(x)
As the input variable changes, there is an influence or bias on the output variable.
Input or x variable (independent)
WHAT IS CORRELATION?
A measurable relationship between two variable data characteristics.
Not necessarily Cause & Effect (Y=f(x)).
Correlation requires paired data sets (i.e. (Y1,x1), (Y2,x2), etc.).
The input variable is called the independent variable (x or KPIV) since it is independent of any other constraints.
The output variable is called the dependent variable (Y or KPOV) since it is (theoretically) dependent on the value of x.
The coefficient of linear correlation r is the measure of the strength of the relationship.
The square of r is the fraction of the variation in the response (Y) which is related to the input (x).
TYPES OF CORRELATION
[Figure: scatter plots of Y=f(x) against x showing Strong, Weak and None correlation, for Positive and Negative relationships.]
CALCULATING r
Coefficient of Linear Correlation
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
Calculate the sample covariance ($s_{xy}$).
Calculate $s_x$ and $s_y$ for each data set.
Use the calculated values to compute $r_{CALC}$.
Add a + for positive correlation and a - for negative correlation.
$r_{CALC} = \frac{s_{xy}}{s_x s_y}$
While this is the most precise method to calculate Pearson's r, there is an easier way to come up with a fairly close approximation...
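The precise calculation of r can be sketched as follows: sample covariance divided by the product of the two sample standard deviations. The paired data below are made up purely for illustration, and the function names are not from the course material.

```python
from statistics import mean, stdev


def sample_covariance(xs, ys):
    """s_xy: sum of paired deviation products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson_r(xs, ys):
    """Coefficient of linear correlation r = s_xy / (s_x * s_y)."""
    if len(xs) != len(ys):
        raise ValueError("correlation requires paired data sets")
    return sample_covariance(xs, ys) / (stdev(xs) * stdev(ys))


# Hypothetical paired data (x_i, Y_i), for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]

r = pearson_r(x, y)
print(f"r = {r:+.3f}")       # the sign shows the direction of the slope
print(f"r^2 = {r * r:.3f}")  # share of the variation in Y related to x
```

The sign falls out of the covariance, so no separate step is needed to add a + or - in this form.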
APPROXIMATING r
Coefficient of Linear Correlation
Plot the data on orthogonal axes
Draw an Oval around the data
Measure the length and width of the Oval
Calculate the coefficient of linear correlation (r) based on the formulas below
[Figure: scatter plot of Y=f(x) with an oval drawn around the data and measured against a ruler; oval width W = 6.7, length L = 12.6.]
+ = positive slope
- = negative slope
Learning Objectives
Understand how correlation can be used to demonstrate a relationship between two factors. Know how to perform a correlation analysis and calculate the coefficient of linear correlation (r). Understand how a correlation analysis can be used in a blackbelt story.
Learning Objectives
Understand the concept of the Central Limit Theorem. Understand the application of the Central Limit Theorem to increase the accuracy of measurements.
Why do we Care?
The Central Limit Theorem is:
the key theoretical link between the normal distribution and sampling distributions.
the means by which almost any sampling distribution, no matter how irregular, can be approximated by a normal distribution if the sample size is large enough.
IMPROVEMENT ROADMAP
Uses of the Central Limit Theorem
Common Uses
Phase 1: Measurement Characterization Phase 2: Analysis Breakthrough Strategy Phase 3: Improvement Optimization Phase 4: Control
The Central Limit Theorem underlies all statistical techniques which rely on normality as a fundamental assumption
KEYS TO SUCCESS
Normal
Why do we Care?
What this means is that no matter what kind of distribution we sample, if the sample size is big enough, the distribution for the mean is approximately normal.
This is the key link that allows us to use much of the inferential statistics we have been working with so far.
This is the reason that only a few probability distributions (Z, t and χ²) have such broad application.
"If a random event happens a great many times, the average results are likely to be predictable." Jacob Bernoulli
[Figure: distribution of the Parent Population and of the sample mean for n=2, n=5 and n=10.]
$s_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}}$
This formula is for the standard error of the mean. What that means in layman's terms is that this formula is the prime driver of the error term of the mean. Reducing this error term has a direct impact on improving the precision of our estimate of the mean.
The practical aspect of all this is that if you want to improve the precision of any test, increase the sample size. So, if you want to reduce measurement error (for example) to determine a better estimate of a true value, increase the sample size. The resulting error will be reduced by a factor of $1/\sqrt{n}$. The same goes for any significance testing. Increasing the sample size will reduce the error in a similar manner.
DICE EXERCISE
Break into 3 teams Team one will be using 2 dice Team two will be using 4 dice Team three will be using 6 dice Each team will conduct 100 throws of their dice and record the average of each throw. Plot a histogram of the resulting data. Each team presents the results in a 10 min report out.
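For teams that want to check their results afterwards, the exercise can be approximated in software. This is only a sketch: the function names, the fixed seed and the 0.5-wide histogram bins are assumptions made for illustration, and real dice should still be used in class.

```python
import random
import statistics

def dice_averages(num_dice, throws=100, seed=None):
    """Simulate `throws` throws of `num_dice` dice; return the average of each throw."""
    rng = random.Random(seed)
    return [
        statistics.mean([rng.randint(1, 6) for _ in range(num_dice)])
        for _ in range(throws)
    ]

def text_histogram(averages, bin_width=0.5):
    """Return one text line per occupied bin, e.g. ' 3.5 #####'."""
    counts = {}
    for avg in averages:
        bin_start = 1 + int((avg - 1) / bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return [f"{start:4.1f} {'#' * counts[start]}" for start in sorted(counts)]

for num_dice in (2, 4, 6):
    averages = dice_averages(num_dice, seed=1)
    print(f"{num_dice} dice: mean={statistics.mean(averages):.2f}, "
          f"stdev={statistics.stdev(averages):.2f}")
    print("\n".join(text_histogram(averages)))
```

With more dice per throw, the histogram of averages should look narrower and more bell shaped, which is the behavior the Central Limit Theorem describes.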
Learning Objectives
Understand the concept of the Central Limit Theorem. Understand the application of the Central Limit Theorem to increase the accuracy of measurements.
Learning Objectives
Understand the role that process capability analysis plays in the successful completion of an improvement project. Know how to perform a process capability analysis.
Why do we Care?
Process Capability Analysis is necessary to:
Determine the area of focus which will ensure successful resolution of the project.
Benchmark a process to enable demonstrated levels of improvement after successful resolution of the project.
Demonstrate improvement after successful resolution of the project.
IMPROVEMENT ROADMAP
Uses of Process Capability Analysis
Common Uses
Phase 1: Measurement Characterization Phase 2: Analysis Breakthrough Strategy Phase 3: Improvement Optimization Phase 4: Control
Baselining a process primary metric (Y) prior to starting a project. Characterizing the capability of causative factors (x). Characterizing a process primary metric after changes have been implemented to demonstrate the level of improvement.
KEYS TO SUCCESS
Must have specification limits - use process targets if no specs are available
Don't get lost in the math
Relate to Z for comparisons (Cpk x 3 = Z)
For attribute data, use PPM conversion to Cpk and Z
[Figure: High Cpk vs. Poor Cpk. Each panel shows a probability distribution against the lower and upper spec limits, with the in-spec and out-of-spec regions marked.]
[Figure: Process Center ($|\mu - \text{spec}|$) and Process Spread, with the in-spec and out-of-spec regions marked.]
Distance between the population mean and the nearest spec limit ($|\mu - USL|$). This distance divided by $3\sigma$ is Cpk.
$$C_{pk} = \frac{\min(\mu - LSL,\ USL - \mu)}{3\sigma}$$
Calculation Values:
Upper Spec value = $200,000 maximum
No Lower Spec
$\mu$ = historical average = $250,000
s = $20,000
Calculation:
$$C_{pk} = \frac{\min(\mu - LSL,\ USL - \mu)}{3\sigma} = \frac{\$200{,}000 - \$250{,}000}{3 \times \$20{,}000} = -0.83$$
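The same calculation as a short sketch. The function name `cpk` and its keyword arguments are hypothetical; the numbers are the ones from the example above, which has an upper spec limit only.

```python
def cpk(mean, sigma, lsl=None, usl=None):
    """Cpk = distance from the mean to the nearest spec limit, divided by 3 sigma.

    Either limit may be omitted for a one-sided specification.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    distances = []
    if lsl is not None:
        distances.append(mean - lsl)
    if usl is not None:
        distances.append(usl - mean)
    if not distances:
        raise ValueError("at least one spec limit is required")
    return min(distances) / (3 * sigma)

# Example from the slide: upper spec only, mean above the limit
print(round(cpk(mean=250_000, sigma=20_000, usl=200_000), 2))  # -0.83
```

A negative Cpk simply means the process center already sits outside the spec limit.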
Z     PPM ST     Cpk    PPM LT (+1.5 $\sigma$)
0.0   500,000    0.0    933,193
0.1   460,172    0.0    919,243
0.2   420,740    0.1    903,199
0.3   382,089    0.1    884,930
0.4   344,578    0.1    864,334
0.5   308,538    0.2    841,345
0.6   274,253    0.2    815,940
0.7   241,964    0.2    788,145
0.8   211,855    0.3    758,036
0.9   184,060    0.3    725,747
1.0   158,655    0.3    691,462
1.1   135,666    0.4    655,422
1.2   115,070    0.4    617,911
1.3   96,801     0.4    579,260
1.4   80,757     0.5    539,828
1.5   66,807     0.5    500,000
1.6   54,799     0.5    460,172
1.7   44,565     0.6    420,740
1.8   35,930     0.6    382,089
1.9   28,716     0.6    344,578
2.0   22,750     0.7    308,538
2.1   17,864     0.7    274,253
2.2   13,903     0.7    241,964
2.3   10,724     0.8    211,855
2.4   8,198      0.8    184,060
2.5   6,210      0.8    158,655
2.6   4,661      0.9    135,666
2.7   3,467      0.9    115,070
2.8   2,555      0.9    96,801
2.9   1,866      1.0    80,757
3.0   1,350      1.0    66,807
3.1   968        1.0    54,799
3.2   687        1.1    44,565
3.3   483        1.1    35,930
3.4   337        1.1    28,716
3.5   233        1.2    22,750
3.6   159        1.2    17,864
3.7   108        1.2    13,903
3.8   72.4       1.3    10,724
3.9   48.1       1.3    8,198
4.0   31.7       1.3    6,210
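$$C_{pk} = \frac{\min(\mu - LSL,\ USL - \mu)}{3\sigma}$$
We find that it bears a striking resemblance to the equation for Z, which is:
$$Z_{CALC} = \frac{\mu - \mu_0}{\sigma}$$
with the value $\mu - \mu_0$ substituted for $\min(\mu - LSL,\ USL - \mu)$. Making this substitution, we get:
$$C_{pk} = \frac{Z_{\min(\mu - LSL,\ USL - \mu)}}{3}$$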
We can now use a table similar to the one on the left to transform either Z or the associated PPM to an equivalent Cpk value.
So, if we have a process which has a short term PPM=135,666 we find that the equivalent Z=1.1 and Cpk=0.4 from the table.
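The table lookup can also be approximated with the standard normal distribution. This sketch assumes the table was built from the one-sided normal tail area and a 1.5 sigma long-term shift, which matches the tabulated values; the function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def ppm_to_z(ppm_short_term):
    """Z value whose upper tail area equals the short-term PPM defect rate."""
    return _STD_NORMAL.inv_cdf(1 - ppm_short_term / 1_000_000)

def z_to_cpk(z):
    """Cpk x 3 = Z, so Cpk = Z / 3."""
    return z / 3

def long_term_ppm(z_short_term, shift=1.5):
    """Long-term PPM, assuming the mean drifts by `shift` sigma."""
    return (1 - _STD_NORMAL.cdf(z_short_term - shift)) * 1_000_000

z = ppm_to_z(135_666)
print(f"Z={z:.1f}  Cpk={z_to_cpk(z):.1f}  PPM LT={long_term_ppm(z):,.0f}")
```

The table rounds Cpk to one decimal place, so Z = 1.1 appears as Cpk = 0.4 rather than 0.37.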
Learning Objectives
Understand the role that process capability analysis plays in the successful completion of an improvement project. Know how to perform a process capability analysis.
MULTI-VARI ANALYSIS
Learning Objectives
Understand how to use multi-vari charts in completing an improvement project. Know how to properly gather data to construct multi-vari charts. Know how to construct a multi-vari chart. Know how to interpret a multi-vari chart.
Why do we Care?
Multi-Vari charts are a:
Simple, yet powerful way to significantly reduce the number of potential factors which could be impacting your primary metric.
Quick and efficient method to significantly reduce the time and resources required to determine the primary components of variation.
IMPROVEMENT ROADMAP
Uses of Multi-Vari Charts
Common Uses
Phase 1: Measurement Characterization Phase 2: Analysis Breakthrough Strategy Phase 3: Improvement Optimization Phase 4: Control
KEYS TO SUCCESS
Careful planning before you start Gather data by systematically sampling the existing process Perform ad hoc training on the tool for the team prior to use Ensure your sampling plan is complete prior to gathering data Have team members (or yourself) do the sampling to avoid bias
supplier
tooling
humidity technique
The Goal of the logical search is to narrow down to 5-6 key variables !
I'm thinking of a word in this book. Can you figure out what it is?
Is it a Noun?
Webster's Dictionary
How many guesses do you think it will take to find a single word in the text book?
Statistically it should take no more than 17 guesses:
$$2^{17} = 131{,}072$$
Most unabridged dictionaries have 127,000 words.
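The 17-guess figure follows from halving the remaining candidates with each yes/no question. A small sketch of that arithmetic (the function name is an assumption):

```python
def max_guesses(candidates: int) -> int:
    """Smallest k such that 2**k >= candidates (one yes/no question halves the field)."""
    if candidates < 1:
        raise ValueError("need at least one candidate")
    return (candidates - 1).bit_length()

print(max_guesses(127_000))  # 17
```

The same logic is what makes a systematic, family-by-family search for sources of variation so much faster than testing factors one at a time.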
Determine the possible families of variation. Determine how you will take the samples. Take a stratified sample (in order of creation). DO NOT take random samples. Take a minimum of 3 samples per group and 3 groups. The samples must represent the full range of the process. Does one sample or do just a few samples stand out? There could be a main effect or an interaction at the cause.
(Machining)
Transactional
(Order Rate)
Piece to Piece Customer Differences Order Editor Sales Office Sales Rep
Time to Time Seasonal Variation Management Changes Economic Shifts Interest Rate
Step 2
Range between two sample averages
Sample 1
Plot the first sample range with a point for the maximum reading obtained, and a point for the minimum reading. Connect the points and plot a third point at the average of the within sample readings
Step 3
Sample 2
Sample 3
Time 1
Time 2 Time 3
Plot the sample ranges for the remaining piece to piece data. Connect the averages of the within sample readings.
Within Piece Characterized by large variation in readings taken of the same single sample, often from different positions within the sample.
Piece to Piece Characterized by large variation in readings taken between samples taken within a short time frame.
Time to Time Characterized by large variation in readings taken between samples taken in groups with a significant amount of time elapsed between groups.
MULTI-VARI EXERCISE
We have a part dimension which is considered to be impossible to manufacture. A capability study seems to confirm that the process is operating with a Cpk = 0 (500,000 ppm). You and your team decide to use a Multi-Vari chart to localize the potential sources of variation. You have gathered the following data. Construct a multi-vari chart of the data and interpret the results.
Sample   Day/Time   Beginning of Part   Middle of Part   End of Part
1        1/0900     .015                .017             .018
2        1/0905     .010                .012             .015
3        1/0910     .013                .015             .016
4        2/1250     .014                .015             .018
5        2/1255     .009                .012             .017
6        2/1300     .012                .014             .016
7        3/1600     .013                .014             .017
8        3/1605     .010                .013             .015
9        3/1610     .011                .014             .017
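As a cross-check on a hand-drawn chart, the three families of variation can be summarized numerically. This is a sketch under assumptions: the readings are taken row by row (beginning, middle, end) from the exercise data, and the simple range summaries below are an illustrative shortcut, not the charting procedure taught in the course.

```python
from statistics import mean

# (day, [beginning, middle, end]) for samples 1-9 of the exercise
readings = [
    (1, [0.015, 0.017, 0.018]), (1, [0.010, 0.012, 0.015]), (1, [0.013, 0.015, 0.016]),
    (2, [0.014, 0.015, 0.018]), (2, [0.009, 0.012, 0.017]), (2, [0.012, 0.014, 0.016]),
    (3, [0.013, 0.014, 0.017]), (3, [0.010, 0.013, 0.015]), (3, [0.011, 0.014, 0.017]),
]

def value_range(values):
    return max(values) - min(values)

def multi_vari_summary(data):
    """Return (within-piece, piece-to-piece, time-to-time) range summaries."""
    # Within piece: average range of the readings inside each sample
    within = mean([value_range(r) for _, r in data])
    # Piece to piece: average range of the sample averages inside each day
    by_day = {}
    for day, r in data:
        by_day.setdefault(day, []).append(mean(r))
    piece = mean([value_range(avgs) for avgs in by_day.values()])
    # Time to time: range of the daily averages
    time = value_range([mean(avgs) for avgs in by_day.values()])
    return within, piece, time

within, piece, time = multi_vari_summary(readings)
print(f"within piece={within:.4f}  piece to piece={piece:.4f}  time to time={time:.4f}")
```

The largest of the three numbers points to the family of variation worth investigating first.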
Learning Objectives
Understand how to use multi-vari charts in completing an improvement project. Know how to properly gather data to construct multi-vari charts. Know how to construct a multi-vari chart. Know how to interpret a multi-vari chart.
Learning Objectives
Understand the critical role having the right sample size has on an analysis or study. Know how to determine the correct sample size for a specific study. Understand the limitations of different data types on sample size.
Why do we Care?
The correct sample size is necessary to:
Ensure any tests you design have a high probability of success.
Properly utilize the type of data you have chosen or are limited to working with.
IMPROVEMENT ROADMAP
Uses of Sample Size Considerations
Common Uses
Phase 1: Measurement Characterization Phase 2: Analysis Breakthrough Strategy Phase 3: Improvement Optimization Phase 4: Control
Sample Size considerations are used in any situation where a sample is being used to infer a population characteristic.
KEYS TO SUCCESS
Use variable data wherever possible
Generally, more samples are better in any study
When there is any doubt, calculate the needed sample size
Use the provided Excel spreadsheet to ease sample size calculations
CONFIDENCE INTERVALS
The possibility of error exists in almost every system. This goes for point values as well. While we report a specific value, that value only represents our best estimate from the data at hand. The best way to think about this is to use the form: true value = point estimate +/- error. The error around the point value follows one of several common probability distributions. As you have seen so far, the way we increase our confidence is to go further and further out on the tails of this distribution. This error band which exists around the point estimate is called the confidence interval.
Point Value
                                      Reality
Test Decision         Not different (Ho)         Different (H1)
Not different (Ho)    Correct Conclusion         Type II Error ($\beta$ risk, Consumer Risk)
Different (H1)        Type I Error ($\alpha$ risk)   Correct Conclusion
[Figure: distributions for Test and Reality = Different, showing the Decision Point and the $\alpha$ and $\beta$ risk regions]
How confident do you want to be that you have made the right decision?
A person does not feel well and checks into a hospital for tests.
                                      Reality
Test Decision         Not different (Ho)         Different (H1)
Not different (Ho)    Correct Conclusion         Type II Error ($\beta$ risk, Consumer Risk)
Different (H1)        Type I Error ($\alpha$ risk)   Correct Conclusion
Error Impact Type I Error = Treating a patient who is not sick Type II Error = Not treating a sick patient
A change is made to the sales force to save costs. Did it adversely impact the order receipt rate?
                                      Reality
Test Decision         Not different (Ho)         Different (H1)
Not different (Ho)    Correct Conclusion         Type II Error ($\beta$ risk, Consumer Risk)
Different (H1)        Type I Error ($\alpha$ risk)   Correct Conclusion
Error Impact Type I Error = Unnecessary costs Type II Error = Long term loss of sales
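Mean
$$\bar{X} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
Standard Deviation
$$s\sqrt{\frac{n-1}{\chi^2_{1-\alpha/2}}} \le \sigma \le s\sqrt{\frac{n-1}{\chi^2_{\alpha/2}}}$$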
These individual formulas are not critical at this point, but notice that the only opportunity for decreasing the error band (confidence interval) without decreasing the confidence factor is to increase the sample size.
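Mean
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{\mu - \bar{X}}\right)^2$$
Allowable error = $\mu - \bar{X}$
Standard Deviation
$$n = \chi^2_{\alpha/2}\left(\frac{\sigma}{s}\right)^2 + 1$$
Allowable error = $\sigma / s$
Percent Defective
$$n = p(1-p)\left(\frac{Z_{\alpha/2}}{E}\right)^2$$
Allowable error = E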
Calculation Values:
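Average tells you to use the mean formula
Significance: $\alpha$ = 5% (95% confident)
$Z_{\alpha/2} = Z_{.025} = 1.96$
s = 10 pounds
$\mu - \bar{X}$ = error allowed = 2 pounds
Calculation:
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{\mu - \bar{X}}\right)^2 = \left(\frac{1.96 \times 10}{2}\right)^2 = 96.04, \text{ rounded up to } 97$$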
Calculation Values:
Percent defective tells you to use the percent defect formula
Significance: $\alpha$ = 5% (95% confident)
$Z_{\alpha/2} = Z_{.025} = 1.96$
p = 10% = .1
E = 1% = .01
Calculation:
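Both sample size formulas can be sketched in a few lines. The function names are hypothetical, and rounding up to the next whole sample is an assumption consistent with the mean example (96.04 becomes 97).

```python
import math

def n_for_mean(z, sigma, allowable_error):
    """n = (Z * sigma / allowable error)^2, rounded up."""
    return math.ceil((z * sigma / allowable_error) ** 2)

def n_for_percent_defective(z, p, allowable_error):
    """n = p(1 - p) * (Z / E)^2, rounded up."""
    return math.ceil(p * (1 - p) * (z / allowable_error) ** 2)

# Mean example: Z = 1.96, s = 10 pounds, allowable error = 2 pounds
print(n_for_mean(1.96, 10, 2))                     # 97
# Percent defective example: Z = 1.96, p = .1, E = .01
print(n_for_percent_defective(1.96, 0.10, 0.01))   # 3458
```

The contrast between the two results illustrates why variable data is preferred: attribute data needs far more samples for a comparable level of precision.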
Learning Objectives
Understand the critical role having the right sample size has on an analysis or study. Know how to determine the correct sample size for a specific study. Understand the limitations of different data types on sample size.
CONFIDENCE INTERVALS
Learning Objectives
Understand the concept of the confidence interval and how it impacts an analysis or study. Know how to determine the confidence interval for a specific point value. Know how to use the confidence interval to test future point values for significant change.
Why do we Care?
Understanding the confidence interval is key to:
Understanding the limitations of quotes in point estimate data.
Being able to quickly and efficiently screen a series of point estimate data for significance.
IMPROVEMENT ROADMAP
Uses of Confidence Intervals
Common Uses
Phase 1: Measurement Characterization Phase 2: Analysis Breakthrough Strategy Phase 3: Improvement Optimization Phase 4: Control
KEYS TO SUCCESS
Use variable data wherever possible
Generally, more samples are better (limited only by cost)
Recalculate confidence intervals frequently
Use an Excel spreadsheet to ease calculations
All of our work was focused in a narrow time frame. Over time, other long term influences come and go which move the population and change some of its characteristics.
Time
Original Study
Confidence Intervals give us the tool to allow us to be able to sort the significant changes from the insignificant.
[Figure: point estimates plotted over TIME (periods 2 through 7). Significant Change?]
WHAT KIND OF PROBLEM DO YOU HAVE?
Analysis for a significant change asks the question "What happened to make this significantly different from the rest?"
Analysis for a series of random events focuses on the process and asks the question "What is designed into this process which causes it to have this characteristic?"
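Mean
$$\bar{X} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
Standard Deviation
$$s\sqrt{\frac{n-1}{\chi^2_{1-\alpha/2}}} \le \sigma \le s\sqrt{\frac{n-1}{\chi^2_{\alpha/2}}}$$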
These individual formulas enable us to calculate the confidence interval for many of the most common metrics.
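Calculation:
$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
$$.014 \pm 1.96\sqrt{\frac{.014(1-.014)}{n}} \quad\Rightarrow\quad .012 \le p \le .016$$
Answer: Yes, .023 is significantly outside of the expected 95% confidence interval of .012 to .016.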
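A sketch of the percent defective confidence interval and the screening test it supports. The sample size is not recoverable from the slide, so `n=13_000` below is purely a hypothetical value chosen so that the interval roughly matches .012 to .016; the function names are assumptions as well.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # 1.96 for 95%
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def is_significant_change(new_value, interval):
    """A new point estimate outside the interval is flagged as a significant change."""
    low, high = interval
    return new_value < low or new_value > high

interval = proportion_ci(0.014, n=13_000)   # n is a hypothetical sample size
print(f"{interval[0]:.3f} to {interval[1]:.3f}")
print(is_significant_change(0.023, interval))
```

Recalculating the interval whenever new baseline data arrives keeps the screening limits current.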
Learning Objectives
Understand the concept of the confidence interval and how it impacts an analysis or study. Know how to determine the confidence interval for a specific point value. Know how to use the confidence interval to test future point values for significant change.
CONTROL CHARTS
Learning Objectives
Understand how to select the correct control chart for an application. Know how to fill out and maintain a control chart. Know how to interpret a control chart to determine the occurrence of special causes of variation.
Why do we Care?
Control charts are useful to:
Determine the occurrence of "special cause" situations.
Utilize the opportunities presented by "special cause" situations to identify and correct the occurrence of the "special causes".
IMPROVEMENT ROADMAP
Uses of Control Charts
Common Uses
Phase 1: Measurement Characterization Phase 2: Analysis Breakthrough Strategy Phase 3: Improvement Optimization Phase 4: Control
Control charts can be effectively used to determine special cause situations in the Measurement and Analysis phases
KEYS TO SUCCESS
Use control charts on only a few critical output characteristics Ensure that you have the means to investigate any special cause
[Figure: control chart selection tree (yes/no decisions) leading to the X-s Chart, X-R Chart, np Chart, p Chart, c Chart and u Chart]
X-bar chart limits:
$$UCL = \bar{\bar{X}} + A_2\bar{R} \qquad LCL = \bar{\bar{X}} - A_2\bar{R}$$
R chart limits:
$$UCL = D_4\bar{R} \qquad LCL = D_3\bar{R}$$

n     D4     D3     A2
2     3.27   0      1.88
3     2.57   0      1.02
4     2.28   0      0.73
5     2.11   0      0.58
6     2.00   0      0.48
7     1.92   0.08   0.42
8     1.86   0.14   0.37
9     1.82   0.18   0.34

$\bar{R}$ = average of the subgroup range values
$A_2$ = a constant function of subgroup size (n)
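A minimal sketch of the X-bar and R chart limit calculation using the constants from the table above. The function name, the dictionary layout and the sample subgroups are illustrative assumptions; the sketch also assumes every subgroup has the same size.

```python
from statistics import mean

# Subgroup size n -> (A2, D3, D4), from the constants table
CHART_CONSTANTS = {
    2: (1.88, 0, 3.27), 3: (1.02, 0, 2.57), 4: (0.73, 0, 2.28), 5: (0.58, 0, 2.11),
    6: (0.48, 0, 2.00), 7: (0.42, 0.08, 1.92), 8: (0.37, 0.14, 1.86), 9: (0.34, 0.18, 1.82),
}

def xbar_r_limits(subgroups):
    """Return control limits for the X-bar chart and the R chart."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must have the same size")
    a2, d3, d4 = CHART_CONSTANTS[n]
    grand_mean = mean([mean(g) for g in subgroups])      # centerline of the X-bar chart
    r_bar = mean([max(g) - min(g) for g in subgroups])   # centerline of the R chart
    return {
        "xbar": (grand_mean - a2 * r_bar, grand_mean, grand_mean + a2 * r_bar),
        "range": (d3 * r_bar, r_bar, d4 * r_bar),
    }

# Hypothetical data: three subgroups of five readings each
limits = xbar_r_limits([[10, 12, 11, 13, 9], [11, 10, 12, 14, 10], [9, 11, 10, 12, 13]])
print(limits)
```

Each tuple is (LCL, centerline, UCL). In practice the limits would be based on many more subgroups than the three shown here.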
Note: P charts have an individually calculated control limit for each point plotted
p = number of rejects in the subgroup / number inspected in subgroup
$\bar{p}$ = total number of rejects / total number inspected
n = number inspected in subgroup
Note: U charts have an individually calculated control limit for each point plotted
$\bar{c}$ = total number of nonconformities / total number of subgroups
$\bar{u}$ = total number of nonconformities / total units evaluated
n = number evaluated in subgroup
One or more points fall outside of the control limits
When you divide the chart into zones as shown and:
2 out of 3 points on the same side of the centerline in Zone A
4 out of 5 points on the same side of the centerline in Zone A or B
9 successive points on one side of the centerline
6 successive points successively increasing or decreasing
14 points successively alternating up and down
15 points in a row within Zone C (above and/or below centerline)
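Two of the rules above are easy to automate. The sketch below checks only "points outside the control limits" and "9 successive points on one side of the centerline"; the function names and the sample data are hypothetical, and the zone-based rules are left out for brevity.

```python
def points_outside_limits(values, lcl, ucl):
    """Indexes of points that fall outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

def runs_on_one_side(values, centerline, run_length=9):
    """Indexes at which `run_length` or more successive points sit on one side."""
    hits = []
    side, run = 0, 0
    for i, v in enumerate(values):
        current = (v > centerline) - (v < centerline)   # +1 above, -1 below, 0 on the line
        if current != 0 and current == side:
            run += 1
        else:
            side = current
            run = 1 if current != 0 else 0
        if run >= run_length:
            hits.append(i)
    return hits

# Hypothetical chart data with centerline 10, LCL 7 and UCL 13
data = [10.5, 9.8, 13.4, 10.2, 10.1, 10.6, 10.3, 10.9, 10.4, 10.2, 10.7, 10.1]
print(points_outside_limits(data, lcl=7, ucl=13))   # [2]
print(runs_on_one_side(data, centerline=10))        # [10, 11]
```

Any index returned by either check marks a point where a special cause should be investigated.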
Learning Objectives
Understand how to select the correct control chart for an application. Know how to fill out and maintain a control chart. Know how to interpret a control chart to determine out of control situations.
HYPOTHESIS TESTING
Learning Objectives
Understand the role that hypothesis testing plays in an improvement project. Know how to perform a two sample hypothesis test. Know how to perform a hypothesis test to compare a sample statistic to a target value. Know how to interpret a hypothesis test.
Why do we Care?
Hypothesis testing is necessary to:
Determine when there is a significant difference between two sample populations.
Determine whether there is a significant difference between a sample population and a target value.
IMPROVEMENT ROADMAP
Uses of Hypothesis Testing
Common Uses
Phase 1: Measurement Characterization Phase 2: Analysis Breakthrough Strategy Phase 3: Improvement Optimization Phase 4: Control
Confirm sources of variation to determine causative factors (x). Demonstrate a statistically significant difference between baseline data and data taken after improvements were implemented.
KEYS TO SUCCESS
Use hypothesis testing to explore the data
Use existing data wherever possible
Use the team's experience to direct the testing
Trust but verify... hypothesis testing is the verify
If there's any doubt, find a way to hypothesis test it
Statistic Value
We determine a critical value from a probability table for the statistic. This value is compared with the calculated value we get from our data. If the calculated value exceeds the critical value, the probability of this occurrence happening due to random variation is less than our test $\alpha$.
Hypothesis     Symbol   What it means
Null           Ho       Data does not support conclusion that there is a significant difference
Alternative    H1       Data supports conclusion that there is a significant difference

Test Used: Z Stat (n>30), Z Stat (p), t Stat (n<30), $\tau$ Stat (n<10)
[Figure: one-tail and two-tail tests. The Common Occurrence region has probability $1-\alpha$; the Rare Occurrence region has probability $\alpha$ (one tail) or $\alpha/2$ in each tail (two tail).]
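$n_1 = n_2 \le 20$: 2 Sample Tau
$$\tau_{dCALC} = \frac{2\,|\bar{X}_1 - \bar{X}_2|}{R_1 + R_2}$$
$n_1 + n_2 \le 30$: 2 Sample t (DF: $n_1 + n_2 - 2$)
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}}$$
$n_1 + n_2 > 30$: 2 Sample Z
$$Z_{CALC} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$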
Use these formulas to calculate the actual statistic for comparison with the critical (table) statistic. Note that the only major determinant here is the sample sizes. It should make sense to utilize the simpler tests (2 Sample Tau) wherever possible unless you have a statistical software package available or enjoy the challenge.
Statistic Summary:
$n_1 = n_2 = 5$
Significance: $\alpha/2$ = .025 (2 tail)
$\tau_{CRIT}$ = .613 (from the table for $\alpha$ = .025 and n = 5)
Calculation:
Receipts 1   Receipts 2
3067         3200
2730         2777
2840         2623
2913         3044
2789         2834
$R_1 = 337$, $R_2 = 577$, $\bar{X}_1 = 2868$, $\bar{X}_2 = 2896$
$$\tau_{dCALC} = \frac{2\,|\bar{X}_1 - \bar{X}_2|}{R_1 + R_2} = \frac{2\,|2868 - 2896|}{337 + 577} = .06$$
Test:
Ho: $\tau_{CALC} < \tau_{CRIT}$
Ho: .06 < .613 = true? (yes, therefore we will fail to reject the null hypothesis).
Conclusion:
Fail to reject the null hypothesis (i.e., the data does not support the conclusion that there is a significant difference).
Statistic Summary:
$n_1 = n_2 = 5$
DF = $n_1 + n_2 - 2$ = 8
Significance: $\alpha/2$ = .025 (2 tail)
$t_{CRIT}$ = 2.306 (from the table for $\alpha$ = .025 and 8 DF)
Calculation:
Receipts 1   Receipts 2
3067         3200
2730         2777
2840         2623
2913         3044
2789         2834
$s_1 = 130$, $s_2 = 227$, $\bar{X}_1 = 2868$, $\bar{X}_2 = 2896$
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}} = \frac{2868 - 2896}{.63 \times 185}, \qquad |t_{CALC}| = .24$$
Test:
Ho: $t_{CALC} < t_{CRIT}$
Ho: .24 < 2.306 = true? (yes, therefore we will fail to reject the null hypothesis).
Conclusion:
Fail to reject the null hypothesis (i.e., the data does not support the conclusion that there is a significant difference).
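Both worked examples on the receipts data (2 Sample Tau and 2 Sample t) can be reproduced with a short sketch. The function names are hypothetical; the critical values .613 and 2.306 are the table values quoted in the examples, not computed here.

```python
import math
from statistics import mean, variance

receipts_1 = [3067, 2730, 2840, 2913, 2789]
receipts_2 = [3200, 2777, 2623, 3044, 2834]

def two_sample_tau(a, b):
    """tau = 2 * |mean difference| / (sum of the two sample ranges)."""
    range_a, range_b = max(a) - min(a), max(b) - min(b)
    return 2 * abs(mean(a) - mean(b)) / (range_a + range_b)

def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic with n1 + n2 - 2 degrees of freedom."""
    n1, n2 = len(a), len(b)
    pooled_variance = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_variance * (1 / n1 + 1 / n2))

tau_calc = two_sample_tau(receipts_1, receipts_2)
t_calc = abs(two_sample_t(receipts_1, receipts_2))
print(f"tau = {tau_calc:.2f}  (critical .613)  reject Ho: {tau_calc > 0.613}")
print(f"t   = {t_calc:.2f}  (critical 2.306) reject Ho: {t_calc > 2.306}")
```

The code uses unrounded means and standard deviations, so the statistics can differ from the hand calculation in the third decimal place; the conclusion is the same for both tests.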
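$n \le 20$: 1 Sample Tau
$$\tau_{1CALC} = \frac{|\bar{X} - \mu_0|}{R}$$
$n \le 30$: 1 Sample t (DF: n-1)
$$t_{CALC} = \frac{\bar{X} - \mu_0}{\sqrt{\frac{s^2}{n}}}$$
$n > 30$: 1 Sample Z
$$Z_{CALC} = \frac{\bar{X} - \mu_0}{\sqrt{\frac{s^2}{n}}}$$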
Use these formulas to calculate the actual statistic for comparison with the critical (table) statistic. Note that the only major determinant again here is the sample size. Here again, it should make sense to utilize the simpler test (1 Sample Tau) wherever possible unless you have a statistical software package available (Minitab) or enjoy the pain.
Range Test
Use these formulas to calculate the actual statistic for comparison with the critical (table) statistic. Note that the only major determinant again here is the sample size. Here again, it should make sense to utilize the simpler test (Range Test) wherever possible unless you have a statistical software package available.
Calculation:
Receipts 1   Receipts 2
3067         3200
2730         2777
2840         2623
2913         3044
2789         2834
$R_1 = 337$, $R_2 = 577$
$F_{CALC} = 577/337 = 1.7$
Test:
Ho: $F_{CALC} < F_{CRIT}$
Ho: 1.7 < 3.25 = true? (yes, therefore we will fail to reject the null hypothesis).
Conclusion:
Fail to reject the null hypothesis (i.e., can't say there is a significant difference).
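$n_1 > 30$, $n_2 > 30$
Compare to target ($p_0$):
$$Z_{CALC} = \frac{p_1 - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$$
Compare two populations ($p_1$ & $p_2$):
$$Z_{CALC} = \frac{p_1 - p_2}{\sqrt{\left(\frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}\right)\left(1 - \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$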
Use these formulas to calculate the actual statistic for comparison with the critical (table) statistic. Note that in both cases the individual samples should be greater than 30.
Learning Objectives
Understand the role that hypothesis testing plays in an improvement project. Know how to perform a two sample hypothesis test. Know how to perform a hypothesis test to compare a sample statistic to a target value. Know how to interpret a hypothesis test.
Learning Objectives
Understand the role that ANOVA plays in problem solving tools & methodology. Understand the fundamental assumptions of ANOVA. Know how to perform an ANOVA to identify sources of variation & assess their significance. Know how to interpret an ANOVA.
Why do we Care?
ANOVA is a powerful method for analyzing process variation:
Used when comparing two or more process means.
Estimate the relative effect of the input variables on the output variable.
IMPROVEMENT ROADMAP
Common Uses
Phase 1: Measurement Characterization Phase 2: Analysis Breakthrough Strategy Phase 3: Improvement Optimization Phase 4: Control
Hypothesis Testing
KEYS TO SUCCESS
Don't be afraid of the math
Make sure the assumptions of the method are met
Use validated data & your process experts to identify key variables
Carefully plan data collection and experiments
Use the team's experience to direct the testing
[Figure: Liters Per Hr By Formulation. Group means are indicated by lines and the overall mean is marked. 3 Groups = 3 Treatments. Axes: Formulation (x), Liters Per Hr (y).]
Let's say we run a study where we have three groups which we are evaluating for a significant difference in the means. Each one of these groups is called a "treatment" and represents one unique set of experimental conditions. Within each treatment, we have seven values which are called repetitions.
[Figure: Liters Per Hr By Formulation. Group means are indicated by lines and the overall mean is marked. 3 Groups = 3 Treatments.]
The basic concept of ANOVA is to compare the variation between the treatments with the variation within each of the treatments. If the variation between the treatments is statistically significant when compared with the "noise" of the variation within the treatments, we can reject the null hypothesis.
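$$\sum_{i=1}^{a}\sum_{j=1}^{n}\left(y_{ij} - \bar{y}\right)^2 = n\sum_{i=1}^{a}\left(\bar{y}_i - \bar{y}\right)^2 + \sum_{i=1}^{a}\sum_{j=1}^{n}\left(y_{ij} - \bar{y}_i\right)^2$$
SSTotal = SSTreatments (Between Treatments) + SSError (Within Treatments)
SSTreatments is the variation between treatments; SSError is the variation of noise.
Note: a = # of treatments (i), n = # of repetitions within each treatment (j)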
$$F_{Calculated} = \frac{MS_{Between}}{MS_{Within}}$$
$$F_{Critical} = F_{\alpha,\ DF_{Between},\ DF_{Within}}$$
DETERMINING SIGNIFICANCE
F Critical Value
Ho: All $\tau$'s = 0; all $\mu$'s equal (Null hypothesis): StatisticCALC < StatisticCRIT
Ha: At least one $\tau$ not = 0; at least one $\mu$ is different (Alternate hypothesis): StatisticCALC > StatisticCRIT
Statistic Value
We determine a critical value from a probability table. This value is compared with the calculated value. If the calculated value exceeds the critical value, we will reject the null hypothesis (concluding that a significant difference exists between 2 or more of the treatment means).
Treatment 1 19 18 21 16 18 20 14
Treatment 2 20 15 21 19 17 22 19
Treatment 3 23 20 25 22 18 24 22
Here we have an example of three different fuel formulations that are expected to show a significant difference in average fuel consumption. Is there a significant difference? Let's use ANOVA to test our hypothesis.
Class Example -- Evaluate 3 Fuel Formulations Is there a difference? Step 1: Calculating the Mean Squares Between
Step #1 = Calculate the Mean Squares Between Value
Treatment 1: 19 18 21 16 18 20 14
Treatment 2: 20 15 21 19 17 22 19
Treatment 3: 23 20 25 22 18 24 22
Sum each of the columns
Find the average of each column (Ac)
Calculate the overall average (Ao)
Find the difference between the average and overall average, squared: $(Ac - Ao)^2$
Find the Sum of Squares Between (SSb) by adding up the differences squared and multiplying by the number of samples (replicates) within each treatment
Calculate the Degrees of Freedom Between (DFb = # treatments - 1)
Calculate the Mean Squares Between (SSb/DFb)
Overall average (Ao) = 19.7
The first step is to come up with the Mean Squares Between. To accomplish this we: find the average of each treatment and the overall average find the difference between the two values for each treatment and square it sum the squares and multiply by the number of replicates in each treatment divide this resulting value by the degrees of freedom
Class Example -- Evaluate 3 Fuel Formulations Is there a difference? Step 2: Calculating the Mean Squares Within
Step #2 = Calculate the Mean Squares Within Value
Treatment #   Data (Xi)   Treatment Average (Ac)   Difference (Xi-Ac)   Difference Squared $(Xi-Ac)^2$
1             19          18.0                     1.0                  1.0
1             18          18.0                     0.0                  0.0
1             21          18.0                     3.0                  9.0
1             16          18.0                     -2.0                 4.0
1             18          18.0                     0.0                  0.0
1             20          18.0                     2.0                  4.0
1             14          18.0                     -4.0                 16.0
2             20          19.0                     1.0                  1.0
2             15          19.0                     -4.0                 16.0
2             21          19.0                     2.0                  4.0
2             19          19.0                     0.0                  0.0
2             17          19.0                     -2.0                 4.0
2             22          19.0                     3.0                  9.0
2             19          19.0                     0.0                  0.0
3             23          22.0                     1.0                  1.0
3             20          22.0                     -2.0                 4.0
3             25          22.0                     3.0                  9.0
3             22          22.0                     0.0                  0.0
3             18          22.0                     -4.0                 16.0
3             24          22.0                     2.0                  4.0
3             22          22.0                     0.0                  0.0
Sum of Squares Within (SSw) = 102.0
Degrees of Freedom Within (DFw = (# of treatments)(samples - 1)) = 18
Mean Squares Within (MSw = SSw/DFw) = 5.7
The next step is to come up with the Mean Squares Within. To accomplish this we: square the difference between each individual within a treatment and the average of that treatment sum the squares for each treatment divide this resulting value by the degrees of freedom
4.56
TRUE
The remaining steps to complete the analysis are: find the calculated F value by dividing the Mean Squares Between by the Mean Squares Within Determine the critical value from the F table Compare the calculated F value with the critical value determined from the table and draw our conclusions.
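The full class example can be checked with a short sketch that follows the same steps: Mean Squares Between, Mean Squares Within, then the F ratio. The function name and the returned dictionary are assumptions, equal replicates per treatment are assumed, and the value 4.56 is taken from the slide on the assumption that it is the critical F value from the table.

```python
from statistics import mean

treatments = [
    [19, 18, 21, 16, 18, 20, 14],   # Treatment 1
    [20, 15, 21, 19, 17, 22, 19],   # Treatment 2
    [23, 20, 25, 22, 18, 24, 22],   # Treatment 3
]

def one_way_anova(groups):
    """One-way ANOVA for groups with the same number of replicates."""
    a = len(groups)        # number of treatments
    n = len(groups[0])     # replicates per treatment
    overall = mean([x for g in groups for x in g])
    ss_between = n * sum((mean(g) - overall) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = a - 1, a * (n - 1)
    ms_between, ms_within = ss_between / df_between, ss_within / df_within
    return {
        "SSb": ss_between, "DFb": df_between, "MSb": ms_between,
        "SSw": ss_within, "DFw": df_within, "MSw": ms_within,
        "F": ms_between / ms_within,
    }

result = one_way_anova(treatments)
F_CRITICAL = 4.56   # value shown on the slide; assumed to be the table F for this test
print({k: round(v, 2) for k, v in result.items()})
print("Reject Ho:", result["F"] > F_CRITICAL)
```

The within-treatment figures match Step 2 (SSw = 102, DFw = 18, MSw about 5.7), which is a useful check before trusting the F ratio.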
Learning Objectives
Understand the role that ANOVA plays in problem solving tools & methodology. Understand the fundamental assumptions of ANOVA. Know how to perform an ANOVA to identify sources of variation & assess their significance. Know how to interpret an ANOVA.
Learning Objectives
Understand how to use a contingency table to support an improvement project. Understand the enabling conditions that determine when to use a contingency table. Understand how to construct a contingency table. Understand how to interpret a contingency table.
Why do we Care?
Contingency tables are helpful to:
Perform statistical significance testing on count or attribute data.
Allow comparison of more than one subset of data to help localize KPIV factors.
IMPROVEMENT ROADMAP
Uses of Contingency Tables
Common Uses
Phase 1: Measurement Characterization Phase 2: Analysis Breakthrough Strategy Phase 3: Improvement Optimization Phase 4: Control
Confirm sources of variation to determine causative factors (x). Demonstrate a statistically significant difference between baseline data and data taken after improvements were implemented.
KEYS TO SUCCESS
Conduct ad hoc training for your team prior to using the tool. Gather data, use the tool to test and then move on. Use historical data wherever possible. Keep it simple. Ensure that there are a minimum of 5 units in each cell.
Ho: No significant differences
We ensure that we include both good and bad situations so that the sum of each column is the total opportunities which have been grouped into good and bad. We then gather enough data to ensure that each cell will have a count of at least 5. Fill in the data.
                  Vendor A   Vendor B   Vendor C   Total
Orders On Time       25         58         12        95
Orders Late           7          9          5        21
Total                32         67         17       116
Since Bar #3 does not meet the 5 or more criteria, we do not have enough data to evaluate that particular cell for Bar #3. This means that we must combine the data with that of another bar to ensure that we have significance. This is referred to as collapsing the table. The resulting collapsed table (Bar #3 collapsed into Bar #2) looks like the following:
              Bar #1   Bar #2&3
Won Money        5         9
Lost Money       7        13
We can now proceed to evaluate our hypothesis. Note that the data between Bar #2 and Bar #3 will be aliased and therefore cannot be evaluated separately.
The row percentage times the column total gives us the expected occurrence for each cell. For example, for the Orders On Time row, 0.82 x 32 = 26 for the first cell and 0.82 x 67 = 55 for the second cell.
Actual Occurrences
                  Vendor A   Vendor B   Vendor C   Total   Portion
Orders On Time       25         58         12        95     0.82
Orders Late           7          9          5        21     0.18
Total                32         67         17       116     1.00

Expected Occurrences
                  Vendor A   Vendor B   Vendor C
Orders On Time       26         55
Orders Late
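The expected occurrences can be filled in programmatically. The sketch below assumes the vendor order counts from the table and uses the unrounded row portion (95/116 rather than 0.82), so results are rounded only when printed. The function name `expected_occurrences` is an assumption for illustration.

```python
# Actual counts: rows are outcomes, columns are vendors.
actual = {
    "Orders On Time": {"Vendor A": 25, "Vendor B": 58, "Vendor C": 12},
    "Orders Late": {"Vendor A": 7, "Vendor B": 9, "Vendor C": 5},
}


def expected_occurrences(table):
    """Expected count per cell = row portion x column total."""
    columns = list(next(iter(table.values())))
    column_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    grand_total = sum(column_totals.values())
    expected = {}
    for label, row in table.items():
        portion = sum(row.values()) / grand_total  # e.g. 95 / 116 = 0.82
        expected[label] = {c: portion * column_totals[c] for c in columns}
    return expected


expected = expected_occurrences(actual)
print(round(expected["Orders On Time"]["Vendor A"]))  # 26
print(round(expected["Orders On Time"]["Vendor B"]))  # 55
```

Within each column, the expected counts add back up to that column's total, which is a quick sanity check on the arithmetic.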
Now we need to calculate the $\chi^2$ value for the data. This is done using the formula $(a-e)^2/e$ (where a is the actual count and e is the expected count) for each cell. So, the $\chi^2$ value for the first cell would be $(25-26)^2/26 = 0.04$. Filling in the remaining $\chi^2$ values we get:
Actual Occurrences
                  Vendor A   Vendor B   Vendor C
Orders On Time       25         58         12
Orders Late           7          9          5
Column Total         32         67         17

Calculations ($\chi^2 = (a-e)^2/e$)
                  Vendor A          Vendor B   Vendor C
Orders On Time    $(25-26)^2/26$
Orders Late       $(7-6)^2/6$

Calculated Values
                  Vendor A   Vendor B   Vendor C
Orders On Time      0.04       0.18       0.27
Orders Late         0.25       0.81       1.20
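A short sketch of the per-cell calculation follows. It assumes unrounded expected counts throughout, so the first cell comes out near 0.06 rather than the 0.04 obtained with the rounded expected value of 26; the other cells match the calculated values to two decimals. The function name `chi_square_cells` is hypothetical.

```python
def chi_square_cells(actual_rows):
    """Return the (a - e)^2 / e value for every cell of a contingency table."""
    row_totals = [sum(row) for row in actual_rows]
    column_totals = [sum(col) for col in zip(*actual_rows)]
    grand_total = sum(row_totals)
    cells = []
    for row, row_total in zip(actual_rows, row_totals):
        values = []
        for a, column_total in zip(row, column_totals):
            e = row_total * column_total / grand_total  # expected count
            values.append((a - e) ** 2 / e)
        cells.append(values)
    return cells


# Rows: Orders On Time, Orders Late. Columns: Vendor A, B, C.
cells = chi_square_cells([[25, 58, 12], [7, 9, 5]])
chi_square_total = sum(sum(row) for row in cells)
for row in cells:
    print([round(v, 2) for v in row])
print(round(chi_square_total, 2))
```

The sum of all cell values is the calculated $\chi^2$ statistic that is compared against the table value.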
Actual Occurrences
Machine 1   Machine 2   Machine 3
   100         350         900
    15          18          23

Actual Occurrences
Shift 1   Shift 2   Shift 3
  500       400       450
   20        19        17
Learning Objectives
Understand how to use a contingency table to support an improvement project. Understand the enabling conditions that determine when to use a contingency table. Understand how to construct a contingency table. Understand how to interpret a contingency table.
Learning Objectives
Have a broad understanding of the role that design of experiments (DOE) plays in the successful completion of an improvement project. Understand how to construct a design of experiments. Understand how to analyze a design of experiments. Understand how to interpret the results of a design of experiments.
Why do we Care?
Design of Experiments is particularly useful to: evaluate interactions between 2 or more KPIVs and their impact on one or more KPOVs; optimize values for KPIVs to determine the optimum output from a process.
IMPROVEMENT ROADMAP
Uses of Design of Experiments
Phase 1: Measurement Characterization Phase 2: Analysis Breakthrough Strategy Phase 3: Improvement Optimization Phase 4: Control
Verify the relationship between KPIVs and KPOVs by manipulating the KPIV and observing the corresponding KPOV change. Determine the best KPIV settings to optimize the KPOV output.
KEYS TO SUCCESS
Keep it simple until you become comfortable with the tool. Statistical software helps tremendously with the calculations. Measurement system analysis should be completed on KPIV/KPOV(s). Even the most clever analysis will not rescue a poorly planned experiment. Don't be afraid to ask for help until you are comfortable with the tool. Ensure a detailed test plan is written and followed.
A design of experiment introduces purposeful changes in KPIVs, so that we can methodically observe the corresponding response in the associated KPOVs.
Variables
Input, Controllable (KPIV)
Input, Uncontrollable (Noise)
Output, Controllable (KPOV)
How do you know how much a suspected KPIV actually influences a KPOV? You test it!
Design notation: 4 = 4 factors evaluated; IV = Resolution IV
Mathematical objects are sometimes as peculiar as the most exotic beast or bird, and the time spent in examining them may be well employed. H. Steinhaus
Main Effects - Factors (KPIV) which directly impact output.
Interactions - Multiple factors which together have more impact on process output than any factor individually.
Factors - Individual Key Process Input Variables (KPIV).
Levels - Multiple conditions which a factor is set at for experimental purposes.
Aliasing - Degree to which an output cannot be clearly associated with an input condition due to test design.
Resolution - Degree of aliasing in an experimental design.
$2^3 = 8$
$2^2 = 4$
$2^1 = 2$
$2^0 = 1$
One aspect which is critical to the design is that it be balanced. A balanced design has an equal number of levels represented for each KPIV. We can confirm this in the design on the right by adding up the number of + and - marks in each column. We see that in each case there are 4 + values and 4 - values; therefore the design is balanced.
Yates algorithm is a quick and easy way (honest, trust me) to ensure that we get a balanced design whenever we are building a full factorial DOE. Notice that the number of treatments (unique test mixes of KPIVs) is equal to $2^3$ or 8. Notice that in the A factor column, we have 4 + in a row and then 4 - in a row. This is equal to a group of $2^2$ or 4. Also notice that the grouping in the next column is $2^1$ or 2 + values and 2 - values repeated until all 8 treatments are accounted for. Repeat this pattern for the remaining factors.
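The grouping pattern described above can be sketched as a short routine. This is a minimal illustration of the slide's pattern for a 2-level full factorial, where the first factor uses groups of $2^{k-1}$ signs and each following factor halves the group size. The function name `yates_design` is an assumption for this example.

```python
def yates_design(factor_names):
    """Build a 2-level full factorial sign table using the grouping pattern."""
    k = len(factor_names)
    treatments = 2 ** k
    design = []
    for t in range(treatments):
        row = {}
        for i, name in enumerate(factor_names):
            group = 2 ** (k - 1 - i)  # run length of identical signs
            row[name] = "+" if (t // group) % 2 == 0 else "-"
        design.append(row)
    return design


design = yates_design(["A", "B", "C"])
for name in ["A", "B", "C"]:
    column = "".join(row[name] for row in design)
    # A balanced column has as many + marks as - marks.
    balanced = column.count("+") == column.count("-")
    print(name, column, "balanced" if balanced else "NOT balanced")
```

For three factors this yields 8 treatments with the A column in groups of 4, the B column in groups of 2, and the C column alternating.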
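Factorial design with interaction columns

Treatment   A   B   C   AB   AC   BC   ABC
1           +   +   +   +    +    +    +
2           +   +   -   +    -    -    -
3           +   -   +   -    +    -    -
4           +   -   -   -    -    +    +
5           -   +   +   -    -    +    -
6           -   +   -   -    +    -    +
7           -   -   +   +    -    -    +
8           -   -   -   +    +    +    -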
Now we can add the columns that reflect the interactions. Remember that the interactions are the main reason we use a DOE over a simple hypothesis test. The DOE is the best tool for studying mix-type problems.
You can see from the example above we have added additional columns for each of the ways that we can mix the 3 factors which are under study. These are our interactions. The sign that goes into the various treatment boxes for these interactions is the algebraic product of the main effects treatments. For example, treatment 7 for interaction AB is (- x - = +), so we put a plus in the box. So, in these calculations, the following apply:
minus (-) times minus (-) = plus (+)
minus (-) times plus (+) = minus (-)
plus (+) times plus (+) = plus (+)
plus (+) times minus (-) = minus (-)
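The sign rules above amount to multiplying +1 and -1 values, which the following sketch makes explicit. It assumes the three main effect columns of the 3 factor design coded as +1 and -1. The function name `interaction_columns` is hypothetical.

```python
from itertools import combinations
from math import prod

# Main effect columns for treatments 1..8, coded +1 for "+" and -1 for "-".
main_effects = {
    "A": [+1, +1, +1, +1, -1, -1, -1, -1],
    "B": [+1, +1, -1, -1, +1, +1, -1, -1],
    "C": [+1, -1, +1, -1, +1, -1, +1, -1],
}


def interaction_columns(main):
    """Each interaction sign is the algebraic product of its main effect signs."""
    names = list(main)
    result = {}
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            rows = zip(*(main[n] for n in combo))
            result["".join(combo)] = [prod(signs) for signs in rows]
    return result


interactions = interaction_columns(main_effects)
print(interactions["AB"][6])  # treatment 7: (-1) x (-1) = 1, i.e. "+"
for name, column in interactions.items():
    print(name, "".join("+" if s > 0 else "-" for s in column))
```

Coding the levels as numbers means no separate sign lookup table is needed, and the same routine extends to designs with more factors.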
Use the Yates algorithm to design the experiment.
Analysis of a DOE
Calculate the average output for each treatment. Place the average for each treatment after the sign (+ or -) in each cell.
Treatment   A     B   C   AB   AC   BC   ABC   AVG
1           +18   +   +   +    +    +    +     18
2           +12   +   -   +    -    -    -     12
3           +6    -   +   -    +    -    -      6
4           +9    -   -   -    -    +    +      9
5           -3    +   +   -    -    +    -      3
6           -3    +   -   -    +    -    +      3
7           -4    -   +   +    -    -    +      4
8           -8    -   -   +    +    +    -      8

Run columns: RUN1, RUN2, RUN3 (values shown: 18 12 6 9 3 3 4 8)
Analysis of a DOE
Add up the values in each column and put the result under the appropriate column. This is the total estimated effect of the factor or combination of factors. Divide the total estimated effect of each column by 1/2 the total number of treatments. This is the average estimated effect.
Treatment   A      B      C      AB     AC     BC     ABC    AVG
1           +18    +18    +18    +18    +18    +18    +18    18
2           +12    +12    -12    +12    -12    -12    -12    12
3           +6     -6     +6     -6     +6     -6     -6      6
4           +9     -9     -9     -9     -9     +9     +9      9
5           -3     +3     +3     -3     -3     +3     -3      3
6           -3     +3     -3     -3     +3     -3     +3      3
7           -4     -4     +4     +4     -4     -4     +4      4
8           -8     -8     -8     +8     +8     +8     -8      8
SUM         27     9      -1     21     7      13     5      63
AVG         6.75   2.25   -0.25  5.25   1.75   3.25   1.25
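The column sums and average estimated effects can be reproduced with a few lines of code. The sketch below assumes the treatment averages and the sign columns of the 3 factor design. The function name `estimated_effects` is an assumption for illustration.

```python
# Average output per treatment (treatments 1..8).
averages = [18, 12, 6, 9, 3, 3, 4, 8]

# Sign column for each factor and interaction, one character per treatment.
columns = {
    "A": "++++----",
    "B": "++--++--",
    "C": "+-+-+-+-",
    "AB": "++----++",
    "AC": "+-+--+-+",
    "BC": "+--++--+",
    "ABC": "+--+-++-",
}


def estimated_effects(signs, outputs):
    """Return {name: (total effect, average effect)} for each column."""
    half = len(outputs) / 2  # divide by 1/2 the number of treatments
    effects = {}
    for name, column in signs.items():
        total = sum(y if s == "+" else -y for s, y in zip(column, outputs))
        effects[name] = (total, total / half)
    return effects


effects = estimated_effects(columns, averages)
for name, (total, average) in effects.items():
    print(name, total, average)

# Rank by magnitude, ignoring the sign.
ranked = sorted(effects, key=lambda n: abs(effects[n][1]), reverse=True)
print(ranked)  # ['A', 'AB', 'BC', 'B', 'AC', 'ABC', 'C']
```

Sorting by absolute value matches the instruction to ignore the sign when judging which factors have the greatest impact.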
Analysis of a DOE
These averages represent the average difference between the factor levels represented by the column. So, in the case of factor A, the average difference in the result output between the + level and the - level is 6.75. We can now determine the factors (or combination of factors) which have the greatest impact on the output by looking at the magnitude of the respective averages (i.e., ignore the sign).
This means that the impact is in the following order: A (6.75), AB (5.25), BC (3.25), B (2.25), AC (1.75), ABC (1.25), C (-0.25)
Analysis of a DOE
We can see the impact, but how do we know if these results are significant or just random variation?
Ranked Degree of Impact: A (6.75), AB (5.25), BC (3.25), B (2.25), AC (1.75), ABC (1.25), C (-0.25)
What tool do you think would be good to use in this situation?
Confidence Interval = Effect +/- Error
Some of these factors do not seem to have much impact. We can use them to estimate our error. We can be relatively safe using the ABC and the C factors since they offer the greatest chance of being insignificant.
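$$\text{Confidence} = t_{\alpha/2,\,DF}\,\sqrt{\frac{ABC^2 + C^2}{DF}}$$

DF = # of groups used. In this case we are using 2 groups (ABC and C), so our DF = 2.

$$\text{Confidence} = t_{\alpha/2,\,2}\,\sqrt{\frac{(1.25)^2 + (-0.25)^2}{2}}$$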
6
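The error estimate can be worked through numerically as below. This sketch assumes a two-sided 95% confidence level, for which the t table gives 4.303 at DF = 2. The confidence level is an assumption, and the helper name `error_estimate` is hypothetical.

```python
from math import sqrt

# Average estimated effects from the analysis table.
effects = {
    "A": 6.75, "B": 2.25, "C": -0.25,
    "AB": 5.25, "AC": 1.75, "BC": 3.25, "ABC": 1.25,
}


def error_estimate(effects, error_terms, t_value):
    """Error = t * sqrt(sum of squared effects of the error terms / DF)."""
    df = len(error_terms)  # DF = number of groups used to estimate error
    return t_value * sqrt(sum(effects[n] ** 2 for n in error_terms) / df)


T_VALUE = 4.303  # assumption: two-sided 95% t table value for DF = 2
error = error_estimate(effects, ["ABC", "C"], T_VALUE)

# An effect is judged significant when its magnitude exceeds the error.
significant = [n for n, e in effects.items() if abs(e) > error]
print(round(error, 2))  # 3.88
print(significant)      # ['A', 'AB']
```

Under these assumptions only A and AB exceed the error band; a different confidence level would change the t value and could change that conclusion.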
GUTTER!
How do I know what works for me...... Lane conditions? Ball type? Wristband?
It looks like the lanes are in good condition today, Mark... Tim has brought three different bowling balls with him but I don't think he will need them all today. You know he seems to have improved his game ever since he started bowling with that wristband...
A 3 factor, 2 level full factorial DOE would have $2^3 = 8$ experimental treatments!
The Ball Type depends on the Lane Condition.... This is called an Interaction!
With Wristband and
When lane is:   Dry         Oily
use:            Hard Ball   Soft Ball
Learning Objectives
Have a broad understanding of the role that design of experiments (DOE) plays in the successful completion of an improvement project. Understand how to construct a design of experiments. Understand how to analyze a design of experiments. Understand how to interpret the results of a design of experiments.