03 Measure Phase

5/30/2021
Note: These handouts are used for GB Training of Henry Harwin Management Academy and for the purpose of GB course reference.
Proprietary/Gopalakrishna/Nirmala
Pharma Consultancy Services
PROCESS MAP
What is a Process Map?
A picture of the process showing the process steps, the inputs and the outputs.
It is always produced as the first step of a Six Sigma project.
Levels of Mapping Detail
High Level Maps
○ Scope: entire process
○ Process steps: grouped into major activities
○ Inputs & Outputs: stated in generalized terms
○ Measurement: major inspection points noted
Detailed Maps
○ Scope: critical process steps
○ Process steps: identified by individual task/activity; shows alternate paths and activities
○ Inputs & Outputs: stated in terms of attributes, characteristics or variables
○ Measurement: all data collection, process measurement, test and inspection steps noted
Start at a high level and move to detailed maps as required.
Process Mapping Steps
1. Identify the process, the inputs and the outputs (customer requirement)
2. Identify all process steps
3. List vital output variables at each step
4. List vital input variables and classify process inputs as controlled [C] or uncontrolled [U]
5. Add process specifications for input variables
6. Start an initial assessment of the control plan
Do not forget to walk the process.
Example: Process Operations
Process: Making Bread
Inputs: Flour, Yeast, Water, Energy, Equipment, Personnel, Other Ingredients
Outputs: Good Flavor, Right Texture, Color, Correct Weight
Identify Major Process Steps
Include all process steps, including verification & rework. (1000 meter view)
Making Bread
Mixing
• Assess demand
• Measure flour, yeast, milk, butter, sugar, salt & water
• Prepare yeast mix
• Mix sugar, salt & butter
• Combine with flour
• Beat well
Kneading
• Turn out on pastry board
• Knead center out until stiffens
• Knead in until smooth & satiny
• Place in greased bowl
Rising
• Turn twice
• Let rise until doubled
• Punch down & rise
• Shape loaves & place on greased trays
Baking
• Pre-heating
• Racking
• Time/temperature cycle
• Deliver loaves
60X,s
Standard Process Mapping Symbols
Cross Functional Flow Charts – Swim Lanes
PRIORITIZATION MATRIX
Prioritization Matrix: Identify Vital Few from Many
A matrix that details the importance of the outputs and the relationship between the inputs and those outputs.
The tool produces a list of inputs that the team considers to be the most important.
It starts the funnelling process.
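A prioritization matrix can be sketched in a few lines of Python. The sketch below assumes one common scoring convention (each input's score is the sum of output importance times relationship rating); the handout does not state a formula, and every weight and rating shown is a made-up illustration value for the bread example, not course data.

```python
# Hypothetical prioritization matrix for the bread example.
# All importance weights and relationship ratings are invented for illustration.

# Importance of each output to the customer (assumed 1-10 scale).
output_importance = {
    "Good flavor": 9,
    "Right texture": 8,
    "Color": 5,
    "Correct weight": 7,
}

# Strength of the relationship between each input and each output
# (assumed 0 / 1 / 3 / 9 scale: none / weak / moderate / strong).
relationship = {
    "Flour":  {"Good flavor": 3, "Right texture": 9, "Color": 1, "Correct weight": 9},
    "Yeast":  {"Good flavor": 3, "Right texture": 9, "Color": 0, "Correct weight": 1},
    "Water":  {"Good flavor": 1, "Right texture": 3, "Color": 0, "Correct weight": 3},
    "Energy": {"Good flavor": 1, "Right texture": 3, "Color": 9, "Correct weight": 1},
}


def input_scores(importance, ratings):
    """Score each input as the sum of (output importance x relationship rating)."""
    return {
        name: sum(importance[out] * rating for out, rating in row.items())
        for name, row in ratings.items()
    }


scores = input_scores(output_importance, relationship)
ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
for name in ranked:
    print(f"{name:7s} {scores[name]}")
```

The inputs at the top of the ranked list are the candidates the team would carry forward through the funnel.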
Process Map & Prioritization Matrix Exercise
Define the process step and the inputs / outputs for the process.
You have 10 minutes.
WHY DATA ???
Data Collection Plan
Define a Metric (CTQ)
Define Operational Definition of CTQ
Define How & by Whom Measurements will be done
Collection of Data
Graphical Representation of the Data
Operational Definition - Baseline Data Collection
TYPES OF DATA
Exercise - Data
Sampling of Data from Population
Sampling works when...

● Each member of the population has an equal chance of being selected (unbiased)
● Selecting one member doesn’t influence the likelihood of another member being selected or not (independent)
● There aren’t any significant differences between those selected and those that weren’t (representative)
● You have a large enough sample to find what you’re looking for. If it’s a rare event or you want to be very precise, you’ll need a large sample (big enough)
Sampling Plan
● A good sampling plan will capture all relevant sources of noise variability, i.e. it will capture the process going wrong
○ Lot-to-lot, batch-to-batch
○ Different shifts, operators, machines or processes
● Sample size rule of thumb: 30
● Input variables do not always have to be measured for each sample
○ Example:
■ Samples are drawn for an output variable measurement every hour
■ Ambient humidity (input variable) is assumed to be constant over a 4 hour period
Sampling Strategies
● You can use one or more of the following sampling designs:
○ Simple Random Sample
○ Stratified Random Sampling
○ Cluster Sampling
○ Systematic Sampling
○ Subgroup Sampling
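Three of the sampling designs listed above can be sketched with the Python standard library. This is a minimal illustration, not a full sampling plan: the population of 120 unit IDs, the three shifts and the helper function names are all hypothetical.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Simple random sample: every member has an equal chance of selection."""
    return random.Random(seed).sample(population, n)


def systematic_sample(population, step, start=0):
    """Systematic sample: take every `step`-th member, beginning at index `start`."""
    return population[start::step]


def stratified_sample(strata, n_per_stratum, seed=None):
    """Stratified random sample: an equal-sized random draw from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


# Hypothetical population: 120 unit IDs produced over three shifts.
units = list(range(1, 121))
shifts = {"shift A": units[:40], "shift B": units[40:80], "shift C": units[80:]}

srs = simple_random_sample(units, 30, seed=1)
systematic = systematic_sample(units, step=4)
stratified = stratified_sample(shifts, 10, seed=1)

print(len(srs), len(systematic), {k: len(v) for k, v in stratified.items()})
```

Stratifying by shift is one way to make sure the shift-to-shift source of variation named in the sampling plan is represented in the sample.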
BASIC STATISTICS
INFERENTIAL STATISTICS
Three Things to Know about DATA
When you have a collection of data, there are three things to know:
Where is the middle? (location / central tendency)
How spread out is the data? (dispersion)
How are the data distributed? (shape of the distribution)
Three Measures of Central Tendency
Mean
Median
Mode
Mean
“Mean” is the statistical term for what most people call “average”. It is usually the best indicator of where the center is.
Example data: 5, 3, 6, 2
Mean = (5 + 3 + 6 + 2) / 4 = 4
We also use this notation: X1 = 5, X2 = 3, X3 = 6, X4 = 2
Mean
μ (mu) is the symbol we use for the true population mean.
Xbar is the symbol we use for the sample mean. It is an estimate of the population mean, based on a sample.
Median
Set of Data: 8, 13, 7, 10, 12, 11, 10, 9
Sorted Data: 7, 8, 9, 10, 10, 11, 12, 13
Median is the middle value; there are as many samples above as there are below. Here the two middle values of the sorted data are both 10, so the median is 10.
The median is the physical middle. The mean is the statistical mean.
Mode
Mode is the most common value, the
peak of the distribution.
It is a very weak indicator of where
the center is.
Mean vs. Median
The mean is usually the best indicator of where the center of the data is.
However, there are times when the median is better.
Mean vs. Median
Salaries of randomly selected employees, case 1:
Worker 1 $30,000
Worker 2 $35,000
Worker 3 $40,000
Worker 4 $45,000
Worker 5 $50,000
Salaries of randomly selected employees, case 2:
Worker 1 $30,000
Worker 2 $35,000
Worker 3 $40,000
Worker 4 $45,000
President $400,000
Find the mean and the median for both cases.
What effect does a single large value have on the mean? On the median?
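The two salary cases can be checked with Python's standard `statistics` module. This is a small sketch of the comparison, using the salary figures from the slide; the variable names are illustrative.

```python
from statistics import mean, median

case_1 = [30_000, 35_000, 40_000, 45_000, 50_000]
case_2 = [30_000, 35_000, 40_000, 45_000, 400_000]  # President replaces Worker 5

for label, salaries in (("case 1", case_1), ("case 2", case_2)):
    print(label, "mean =", mean(salaries), "median =", median(salaries))
```

Running it shows how far one large value moves the mean while the median stays where it was.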
Three Things to Know
How spread out is the data? (dispersion)
Measures of Dispersion (Spread)
Range
Variance
Standard deviation
Range
Set of Data: 8, 13, 7, 10, 12, 11, 10, 9
Largest = 13, Smallest = 7
Definition: Range = Largest minus the smallest = (13 - 7) = 6
For groups of 2-10 items, range is about as sophisticated as you need to get.
Easy to calculate.
Easily distorted by one unusually large or small datum (outlier).
Variance
Set of Data: 8, 13, 7, 10, 12, 11, 10, 9
Calculate the Mean: Xbar = (8 + 13 + 7 + 10 + 12 + 11 + 10 + 9) / 8 = 80 / 8 = 10
Plot the data points

Plot the data in order, relative to the mean.
[Figure: the data points plotted on a vertical scale from 7 to 13, relative to the mean Xbar]
In English we say “X bar”.
What does (Xi - Xbar) mean?
Mathematics
Calculate the deviation (Xi - Xbar). Fill the others in:
Xi           8   13    7   10   12   11   10    9
Xi - Xbar   -2    3
Mathematics
Square the differences, and fill in the (Xi - Xbar)^2 row. Fill the others in:
Xi               8   13    7   10   12   11   10    9
Xi - Xbar       -2    3   -3    0    2    1    0   -1
(Xi - Xbar)^2    4    9
Mathematics
Sum the (Xi - Xbar)^2 row. We write it as Σ(Xi - Xbar)^2.
This is the “sum of the squares”, a measure of total variation.
Xi               8   13    7   10   12   11   10    9
Xi - Xbar       -2    3   -3    0    2    1    0   -1
(Xi - Xbar)^2    4    9    9    0    4    1    0    1   = 28
Formulae
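The formula for the mean is:
$\bar{X} = \frac{\sum X_i}{n}$
The formula for the variance of a sample is:
$s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$
Dividing by n-1 instead of n is a correction factor. The deviations are measured from the sample mean rather than the true population mean, so dividing by n would underestimate the variance, most noticeably for a small number of data points.
Can you see that variance is just a fancy “average deviation”?
The variance of our set of sample data =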
Standard Deviation
The formula for the variance of a sample is:
$s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$
The formula for the standard deviation of a sample is:
$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}$
The standard deviation is just the square root of the variance.
The standard deviation of our set of sample data =
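The hand calculation for the sample data set can be reproduced step by step in Python. This sketch follows the same sequence as the slides (mean, deviations, sum of squares, divide by n-1, square root); the variable names are illustrative.

```python
from math import sqrt

data = [8, 13, 7, 10, 12, 11, 10, 9]

n = len(data)
x_bar = sum(data) / n                              # mean, Xbar
deviations = [x - x_bar for x in data]             # Xi - Xbar
sum_of_squares = sum(d ** 2 for d in deviations)   # sum of (Xi - Xbar)^2
variance = sum_of_squares / (n - 1)                # sample variance, s^2
std_dev = sqrt(variance)                           # sample standard deviation, s
data_range = max(data) - min(data)                 # largest minus smallest

print(x_bar, sum_of_squares, variance, std_dev, data_range)
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n-1 divisor, so they can be used to cross-check the result.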
Standard Deviation
The standard deviation is the most common measure of spread for collections of data larger than 10 items. It also works fine for collections as small as n=2.
It is less distorted by a single outlier than the range is, although an extreme value still inflates it.
The symbol for the population standard deviation is σ (we say “sigma”), and the symbol for an estimate of standard deviation, based on a sample, is s.
Symbols
Roman letters are used for estimates based on a sample. Greek letters are used for true population parameters.
Measure: Roman (sample) / Greek (population)
Mean: Xbar / μ
Range: R
Standard deviation: s / σ
Variance: s^2 / σ^2
Summary of Measures
Mean is usually the best measure of where the middle of the data is.
Median is another measure of where the middle is, and it is best when the data contains outliers, or is known to be non-normal.
Range is the maximum minus the minimum. It is easy to compute, and gives a good measure of “spread” for small groups of data.
Standard deviation and variance are very sophisticated measures of “spread”. They are less distorted by a single outlier than the range is, although an extreme value still inflates them.
Mean, median, range, standard deviation, and variance apply regardless of how your data are distributed.
Exercise
For each data set, choose which measure of “middle” and “spread” is best.
Data Set                                          Middle    Spread
2, 5, 3
3, 4, 6, 1, 4, 5, 7, 2, 4, 1000, 1, 5, 7, 3
3, 4, 6, 1, 4, 5, 7, 2, 4, 1, 5, 7, 3, 3, 7, 4
2, 4, 6, 651
Three Things to Know

(dispersion)
Distributions
A bag of marbles is sorted according to size.
[Figure: marbles stacked by size in bins at -2 mm, -1 mm, 1 cm, +1 mm and +2 mm]
Dotplot
This gives us a natural graph of the number of cases (frequency) vs. diameter of the marble. This is a dotplot.
[Figure: dotplot. Vertical axis: number of cases. Horizontal axis: size of marble, from -2 mm to +2 mm around 1 cm]
Histogram
If we make a bar chart, with bars the length of the stacks of marbles, we have made a histogram.
[Figure: histogram. Vertical axis: number of cases. Horizontal axis: size of marble, from -2 mm to +2 mm around 1 cm]
Distribution Curve
If we do an infinite number of measurements, and make our increments of size infinitesimal, we get a continuous distribution curve.
[Figure: continuous distribution curve. Vertical axis: number of cases. Horizontal axis: size of marble, from -2 mm to +2 mm around 1 cm]
Distribution Curve
The proportion of cases that fall between any two points on the horizontal axis is approximately the area under the distribution curve between those two points.
[Figure: distribution curve with the area between Point 1 and Point 2 marked. Vertical axis: number of cases. Horizontal axis: size of marble, from -2 mm to +2 mm around 1 cm]
There are Many Distributions

● Normal, Gaussian, or “bell curve”
● F distribution
● T distribution
● Chi-square distribution
● Uniform distribution
● Weibull distribution
These are all mathematical models. If your data fits one of these
models, you can use the model to represent your data.
Normal Distribution
The Normal Distribution often occurs in nature.
It is a very useful model.
Properties of the Normal Distribution
The Normal Distribution is symmetrical. The left half is the exact mirror image of the right half.
The Mean, the Median, and the Mode all occur exactly in the middle of the curve.
Once you specify the mean and the standard deviation of the normal curve, the curve is completely known.
The total area under the curve is 1, and we calculate the area within +/- a given number of standard deviations of the mean. In sample size calculations we use 1.96, which is close to 2, to cover 95% of the data.
About 68% of all cases occur within +/- 1 Standard Deviation of the Mean.
About 95% of all cases occur within +/- 2 Standard Deviations of the Mean.
About 99.7% of all cases occur within +/- 3 Standard Deviations of the Mean.
Summary: Normal Distribution
● For a Normal Distribution:
68% of the data is within +/- 1 standard deviation
95% of the data is within +/- 2 standard deviations
99.73% of the data is within +/- 3 standard deviations
Testing for a Normal Distribution
● We can test whether a given data set can be described as “normal” with a Normal Probability Plot
● If a distribution is close to normal, the Normal Probability Plot will be close to a straight line
● Minitab makes the normal probability plot easy
Standard Normal Distribution – Z distribution
[Figure: three normal curves with different means μ1, μ2 and μ3; on the Z scale each mean maps to zero (Z1 = 0, Z2 = 0, Z3 = 0)]
THE NORMAL DISTRIBUTION
The Area Bounded By Std. Deviations Can Be Used To Estimate The Cumulative Probability Of A Certain “Event” Occurring
[Figure: normal curve. Vertical axis: probability of sample value. Horizontal axis: μ - 3σ to μ + 3σ. 68.26% of the area lies within μ ± σ, 95.44% within μ ± 2σ and 99.73% within μ ± 3σ]
THE NORMAL DISTRIBUTION

Important Because:
● Many Natural Phenomena Seem To Follow It
● Provides The Basis For Statistical Inference Because Of Its Relationship To The Central Limit Theorem (SPC Will Explain More)
Properties Of Normal Distribution:
● Symmetrical In Appearance (One Side Mirror Image of Another)
● Measures Of Central Tendency Are All Identical (Mean, Median, Mode)
● +/-1 Sigma – 68.26%
● +/-2 Sigma – 95.44%
● +/-3 Sigma – 99.73%
THE NORMAL CURVE / BELL-SHAPED CURVE
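Standard Normal Distribution
• Average (Mean) = 0
• Standard Deviation = 1
[Figure: standard normal curve from -∞ to +∞ with the Z scale marked at -3, -2, -1, 0, 1, 2, 3 standard deviations (σ); 68.26%, 95.44% and 99.73% of the area lie within +/- 1, 2 and 3 standard deviations]
Characteristics
• 68.26% of data lie within +/- 1 standard deviation
• For a 6 sigma process, 99.99966% of data lie within specification (3.4 defects per million, the Six Sigma convention that allows for a 1.5 standard deviation shift of the mean). With no shift, about 99.9999998% of data lie within +/- 6 standard deviations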
Z VALUE - SCALE OF MEASURE

Z = A unit of measure, equivalent to the number of standard deviations that a value is away from the target value.
[Figure: a 6 Sigma process. The process mean sits at Z = 0, the Lower Specification Limit (LSL) at Z = -6.0 and the Upper Specification Limit (USL) at Z = +6.0, each 6.0σ from the mean]
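The Z value can be sketched as a one-line calculation. The example below assumes the reference point is the process mean and uses a hypothetical process with mean 70 and standard deviation 10; `z_value` is an illustrative helper name, not a function from any package.

```python
from statistics import NormalDist


def z_value(x, mean, std_dev):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mean) / std_dev


# Hypothetical process: mean 70, standard deviation 10.
z = z_value(85, mean=70, std_dev=10)

# Fraction of a normally distributed population expected at or below that value.
fraction_below = NormalDist().cdf(z)

print(z, round(fraction_below, 4))
```

Converting to Z puts any normally distributed measurement on the same standard normal scale, which is why one table or function serves every process.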
STANDARD NORMAL DISTRIBUTION
[Figure: standard normal curve in standard deviation units from -3σ to +3σ. Area in each band on either side of the mean: 34.1% between 0 and 1σ, 13.6% between 1σ and 2σ, 2.2% between 2σ and 3σ, 0.135% beyond 3σ]
INTRODUCTION TO MINITAB
You will see a series of windows:
Session Window: This reports the results of your calculations.
Project Manager: Gives you control over items in your project.
Data Window: This is where data is entered. It can be typed, pasted from other applications or generated internally by Minitab.
STAT > BASIC STATISTICS > GRAPHICAL SUMMARY
GRAPHS: histogram, box plot
NORMALITY TEST
+/- 1 SIGMA = 68%
+/- 2 SIGMA = 95%
Box Plot
EXAMPLE
In a factory, the reactor cycle time is measured every day for 365 days in each of the years 2002, 2003 and 2004.
1) Calculate the average cycle time for each year
2) Calculate the median cycle time for each year
3) Calculate the range for each year
4) Calculate the standard deviation for each year
5) What % of the data is above a cycle time of 22 hrs?
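The five calculations in this example can also be sketched outside Minitab. The handout does not include the cycle time data, so the readings below are short placeholder lists invented purely for illustration; `summarize` is a hypothetical helper name.

```python
from statistics import mean, median, stdev


def summarize(cycle_times, limit=22):
    """Return mean, median, range, sample standard deviation and % above `limit`."""
    above = sum(1 for t in cycle_times if t > limit)
    return {
        "mean": mean(cycle_times),
        "median": median(cycle_times),
        "range": max(cycle_times) - min(cycle_times),
        "stdev": stdev(cycle_times),
        "pct_above": 100 * above / len(cycle_times),
    }


# Placeholder readings in hours, invented for illustration only.
# Substitute the 365 daily values recorded for each year.
cycle_times_by_year = {
    2002: [20, 21, 23, 19, 22, 24, 21, 20],
    2003: [21, 22, 22, 23, 20, 25, 24, 23],
}

for year, readings in cycle_times_by_year.items():
    print(year, summarize(readings))
```

Counting readings above 22 hours gives the observed percentage directly, without assuming the data are normally distributed.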

THE NORMAL PROBABILITY DENSITY FUNCTION
● Minitab can help us verify the properties of the normal distribution.
● Go to Calc>Probability Distributions>Normal.
● You have 3 choices here
1. Probability Density (gives the height of the curve at any value)
2. Cumulative Probability (area under the curve from -∞ to a given value)
3. Inverse Cumulative Probability (the value at which the area under the curve from -∞ equals a given probability)
● For a Normal Distribution you need to specify the Mean and Standard Deviation
● In Input Constant, give the value for which you want the calculations done
● If calculations need to be done for more than one value, put the values in a column in the worksheet and use the Input Column option
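For readers without Minitab, the same three calculations can be sketched with `NormalDist` from the Python standard library. This is an equivalent calculation, not Minitab's own interface, and the mean of 70 and standard deviation of 10 are example values.

```python
from statistics import NormalDist

dist = NormalDist(mu=70, sigma=10)

density = dist.pdf(70)        # 1. Probability density: height of the curve at x
cumulative = dist.cdf(80)     # 2. Cumulative probability: area from -inf up to x
value = dist.inv_cdf(0.975)   # 3. Inverse cumulative probability: x for a given area

# Equivalent of the Input Column option: evaluate several values at once.
column = [60, 70, 80]
cumulative_column = [dist.cdf(x) for x in column]

print(density, cumulative, value, cumulative_column)
```

The inverse calculation at an area of 0.975 lands about 1.96 standard deviations above the mean, the figure used for 95% coverage in sample size calculations.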

● Let us now verify the properties of the normal distribution.
● Consider 2 sets of normally distributed data
1. μ = 0, σ = 1
2. μ = 70, σ = 10
● Find out what fraction of data points lie left of the mean in Case 1.
● Use Calc>Probability Distributions>Normal.
● Choose Cumulative Probability
● Specify Mean = 0 and Standard Deviation = 1
● Choose Input Constant and specify value = 0.
● Click OK
● Repeat the calculation for Case 2. What do the results tell us?
Cumulative Distribution Function
Normal with mean = 0 and standard deviation = 1
x P( X <= x )
0 0.5
Normal with mean = 70 and standard deviation = 10
x P( X <= x )
70 0.5
● Now find out what fraction of data points fall between μ-σ and μ+σ in Case 1.
Use Calc>Probability Distributions>Normal.
■ Choose Cumulative Probability
■ Specify Mean = 0 and Standard Deviation = 1
■ Choose Input Constant and specify value = -1.
■ Click OK
Use Calc>Probability Distributions>Normal.
■ Choose Cumulative Probability
■ Specify Mean = 0 and Standard Deviation = 1
■ Choose Input Constant and specify value = 1.
■ Click OK
x P( X <= x )
-1 0.158655
x P( X <= x )
1 0.841345
Total fraction within -1 and 1 = 0.84 - 0.16 = 0.68 approx.
Does this match with our knowledge?
What are the results for Case 2?
● Find out the fraction of data points within μ±2σ in both cases
1.
2.
● What about the fraction of data points within μ±3σ in both cases?
1.
2.
● Why are the results always the same in both cases?
● Does this match our knowledge of the Normal Distribution?
● What are the Practical Limits of a Normally Distributed Process?
● These Practical Limits represent what fraction of data points?
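The cumulative probability subtraction used in this exercise can be sketched for both cases at once with the Python standard library. The two distributions are the ones defined in the exercise; `fraction_within` is an illustrative helper name.

```python
from statistics import NormalDist

cases = {
    "Case 1": NormalDist(mu=0, sigma=1),
    "Case 2": NormalDist(mu=70, sigma=10),
}


def fraction_within(dist, k):
    """Area under the normal curve between mean - k*sigma and mean + k*sigma."""
    upper = dist.cdf(dist.mean + k * dist.stdev)
    lower = dist.cdf(dist.mean - k * dist.stdev)
    return upper - lower


for name, dist in cases.items():
    for k in (1, 2, 3):
        print(f"{name}: within +/-{k} sigma = {fraction_within(dist, k):.4f}")
```

Both cases print the same fractions because the limits are expressed in standard deviations from the mean, which is the same as working on the Z scale.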
Non Normal Distribution
Although the normal distribution takes center stage in statistics, many processes follow a non normal distribution.
This can be due to the data naturally following a specific type of non normal distribution (for example, bacteria growth naturally follows an exponential distribution).
In other cases, your data collection methods or other methodologies may be at fault.
Types of Non Normal Distribution
[Figure: Beta distribution with different parameter values]
1) Beta Distribution.
2) Exponential Distribution.
3) Gamma Distribution.
4) Inverse Gamma Distribution.
5) Log Normal Distribution.
6) Logistic Distribution.
7) Maxwell-Boltzmann Distribution.
8) Poisson Distribution.
9) Skewed Distribution.
10) Symmetric Distribution.
11) Uniform Distribution.
12) Unimodal Distribution.
13) Weibull Distribution.
Reasons for the Non Normal Distribution

Many data sets naturally fit a non normal model. For example, the number of accidents tends to fit a Poisson distribution and lifetimes of products usually fit a Weibull distribution.
However, there may be times when your data is supposed to fit a normal distribution, but doesn’t. If this is the case, it’s time to take a close look at your data.
Outliers can cause your data to become skewed. The mean is especially sensitive to outliers. Try removing any extreme high or low values and testing your data again.
Insufficient data can cause a normal distribution to look completely scattered. For example, classroom test results are usually normally distributed. An extreme example: if you choose three random students and plot the results on a graph, you won’t get a normal distribution. You might get a uniform distribution (e.g. 62 62 63) or you might get a skewed distribution (80 92 99). If you are in doubt about whether you have a sufficient sample size, collect more data.
Dealing with Non Normal Distributions
You may still be able to run tests that assume normality on non-normal data if your sample size is large enough (usually over 20 items).
You can also choose to transform the data with a function, forcing it to fit a normal model.
However, if you have a very small sample, a sample that is skewed or one that naturally fits another distribution type, you may want to run a non parametric test.
A non parametric test is one that doesn’t assume the data fits a specific distribution type. Non parametric tests include the Wilcoxon signed rank test, the Mann-Whitney U Test and the Kruskal-Wallis test.
Chebyshev’s Rule
● Applies to any data set regardless of the frequency distribution of the data
● The rules are:
○ +/- 1 standard deviation, no useful information
○ +/- 2 standard deviations, at least 75% of the data
○ +/- 3 standard deviations, at least 88.9% of the data
○ +/- 4 standard deviations, at least 93.75% of the data
This is applicable for all kinds of distribution.
So, for any data set, at least about 89% of the data will fall within +/- 3 StDev of the mean.
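Chebyshev's rule comes from a single formula: at least 1 - 1/k^2 of any data set lies within +/- k standard deviations of the mean, for k greater than 1. The short sketch below evaluates that bound; `chebyshev_bound` is an illustrative helper name.

```python
def chebyshev_bound(k):
    """Minimum fraction of any data set within +/- k standard deviations."""
    if k <= 1:
        return 0.0  # for k <= 1 the bound gives no useful information
    return 1 - 1 / k ** 2


for k in (1, 2, 3, 4):
    print(f"+/- {k} standard deviations: at least {chebyshev_bound(k):.2%}")
```

These are guaranteed minimums for any distribution, so they are lower than the percentages for mound-shaped, symmetrical data.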
The Empirical Rule ... Fat pencil test

● This applies to data sets with frequency distributions that are mound-shaped and symmetrical
● The rules are:
+/- 1 standard deviation, approximately 68% of the data
+/- 2 standard deviations, approximately 95% of the data
+/- 3 standard deviations, approximately 99.7% of the data
Tip
Use Chebyshev’s rule for any data set
Use the Empirical rule for symmetrical and mound-shaped data sets
Use the properties of the normal distribution for normal data
We can apply the pencil test to the normality plot: mark +/- 1 and +/- 2 standard deviations and find the % of data inside each. If about 68% of the data lies within +/- 1 standard deviation and about 95% within +/- 2 standard deviations, we can treat the data set as normal.
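The percentage check described here can be sketched as a count of how much of a data set falls within +/- k sample standard deviations of its mean. The example reuses the eight-value data set from the variance slides, which is far too small to judge normality and serves only to show the mechanics; `fraction_of_data_within` is an illustrative helper name.

```python
from statistics import mean, stdev


def fraction_of_data_within(data, k):
    """Fraction of the data within +/- k sample standard deviations of the mean."""
    centre, spread = mean(data), stdev(data)
    inside = [x for x in data if abs(x - centre) <= k * spread]
    return len(inside) / len(data)


data = [8, 13, 7, 10, 12, 11, 10, 9]  # data set from the variance slides
for k in (1, 2, 3):
    print(f"within +/- {k} standard deviations: {fraction_of_data_within(data, k):.0%}")
```

Comparing the observed fractions with 68%, 95% and 99.7% is a rough screen only; a normal probability plot remains the more informative check.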
