Lesson 3—Measure
Process mapping refers to creating a workflow diagram that gives a clear understanding of a process or a series of parallel processes.

Features of process mapping:
● Gives a wider perspective of the problems and improvement opportunities.
● Serves as the first step in understanding and improving the process.
! Process mapping can be done by using flowcharts, written procedures, or detailed work instructions.
The X-Y diagram is a Six Sigma tool that helps in correlating Inputs (X) and Outputs (Y). It can be used
to identify what inputs are more valuable and impactful when there are multiple inputs and outputs
in a project.
Steps to Create X-Y Diagram
1. Capture all the input and output variables.
2. List down each of the input variables.
3. Insert an impact or correlation factor.
Statistics refers to the science of collection, analysis, interpretation, and presentation of data. There
are two major types of statistics—Descriptive statistics and Inferential statistics.
The main objective of statistical inference or analytical statistics is to draw conclusions on population
characteristics based on the information available in the sample. A sample from the population is
collected. An assessment about the population parameter is made from the sample.
Q The management team of a cricket council wants to know if the team’s performance has improved after
recruiting a new coach. Is there a way the improvement can be proven statistically?
Central Limit Theorem (CLT) states that for a sample size greater than 30, the sample mean is very
close to the population mean.
● When the sample size is greater than 30, the distribution of the sample mean approaches a normal distribution.
● In such cases, the Standard Error of the Mean (SEM), which represents the variability between sample means, is small.
! Selecting a sample size also depends on the concept called Power of the Test.
● CLT aids in making inferences from the sample statistics about the population parameters
irrespective of the distribution of the population.
! CLT becomes the basis for calculating the confidence interval for a hypothesis test, as it allows the use of a standard normal table.
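The behavior CLT describes can be checked with a quick simulation. The sketch below uses only Python's standard library; the exponential population and the sample size of 36 are arbitrary choices for illustration, not from the source.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Draw 1,000 samples of size n = 36 from a skewed (exponential) population
# and record each sample's mean.
n = 36
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(1000)
]

# The population mean of Expo(1) is 1.0; the sample means cluster tightly around it,
# and their spread (the SEM) is roughly sigma / sqrt(n) = 1/6, even though the
# underlying population is far from normal.
print(statistics.mean(sample_means))   # close to 1.0
print(statistics.stdev(sample_means))  # close to 0.167
```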
Copyright 2014, Simplilearn, All rights reserved.
Measure
Topic 3—Collecting and Summarizing Data
Data is a collection of facts from which conclusions can be drawn. The two types of data are attribute (discrete) data and variable (continuous) data.
The first step in the measure phase is to determine the type of data required based on the following
considerations:
● What variables have been identified for the process? Critical to Quality parameters (CTQs), Key Process Output Variables (KPOVs), and Key Process Input Variables (KPIVs).
● Why should the data type be identified? It enables collecting, analyzing, and drawing inferences from the right set of data.
! It is difficult to convert attribute data to variable data in the absence of assumptions or additional
information, which can include retesting all units.
Simple Random Sampling vs. Stratified Sampling
The differences between simple random sampling and stratified sampling are given here.
A measure of central tendency is a single value that indicates the central point in a set of data. The
three most common measures of central tendency are the mean, the median, and the mode.
The position of the median in an ordered data set is:

Median position = (n + 1) / 2
For the following data set, the mean, median, and mode are calculated:
1, 2, 3, 4, 5, 5, 6, 7, 8
Mean = (1 + 2 + 3 + 4 + 5 + 5 + 6 + 7 + 8) / 9 = 4.56
Median = 5
Mode = 5
The dataset is modified to include a new value. The new dataset is given:
1, 2, 3, 4, 5, 6, 7, 8, 100
! When the dataset has outliers, median is preferred over mean as a measure of central tendency.
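The effect of the outlier can be verified with Python's `statistics` module; a minimal sketch:

```python
import statistics

original = [1, 2, 3, 4, 5, 5, 6, 7, 8]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 100]

# Central tendency of the original data set.
print(round(statistics.mean(original), 2))   # 4.56
print(statistics.median(original))           # 5
print(statistics.mode(original))             # 5

# A single outlier (100) drags the mean far away from the bulk of the data,
# while the median barely moves.
print(round(statistics.mean(with_outlier), 2))  # 15.11
print(statistics.median(with_outlier))          # 5
```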
Measures of dispersion describe the spread of values: the higher the variation of the data points, the wider the spread of the data. The three main measures of dispersion are as follows:
● Range
● Variance
● Standard Deviation
Range is defined as the difference between the largest and smallest values of data.
For the data set given here,
4, 8, 1, 6, 6, 2, 9, 3, 6, 9
Range = 9 − 1 = 8
Variance is defined as the average of squared mean differences and shows the variation in a data set.
Variance = σ² = Σ(xᵢ − x̄)² / (n − 1)
Consider the data set given here:
4, 8, 1, 6, 6, 2, 9, 3, 6, 9
Sample variance can be calculated in an Excel sheet using the formula =VAR.S() (or the legacy =VAR()). Population variance can be calculated using =VAR.P() (or the legacy =VARP()). Here,
Sample variance = 8.04
Population variance = 7.24
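The same results can be reproduced outside Excel; a sketch using Python's `statistics` module, where `variance` divides by n − 1 and `pvariance` divides by n:

```python
import statistics

data = [4, 8, 1, 6, 6, 2, 9, 3, 6, 9]

# Sample variance divides the sum of squared deviations by n - 1.
print(round(statistics.variance(data), 2))   # 8.04
# Population variance divides by n.
print(round(statistics.pvariance(data), 2))  # 7.24
```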
! Variance is a measure of variation, not the variation in the data set itself. Population variance is preferred over sample variance when the entire population is measured, as it is then an accurate indicator of variation.
Frequency distribution is the grouping of data into mutually exclusive categories showing the number
of observations in each class. To create a frequency distribution table:
1. Make a table with separate columns for the interval numbers, the tallied results, and the frequency of results in each interval.
2. Record the number of observations in each interval with a tally mark.
3. Add the number of tally marks in each interval and record them in the Frequency column.

For example:

Interval  Tally   Frequency
0         IIII    4
1         IIII I  6
2         IIII    5
3         III     3
4         II      2
A cumulative frequency distribution table is more detailed than a frequency distribution table.
1
To the frequency distribution table, add three more columns for the cumulative frequency,
percentage, and cumulative percentage.
2
In the cumulative frequency column, the cumulative frequency of the previous row(s) is
added to the current row.
3
The percentage is calculated by dividing the frequency by the total number of results and
multiplying by 100.
4
The cumulative percentage is calculated similarly to the cumulative frequency.
For the following dataset, the cumulative frequency distribution table is given:
Lower Value  Upper Value  Frequency  Cumulative Frequency  Percentage  Cumulative Percentage
35           44           1          1                     10          10
45           54           2          3                     20          30
55           64           2          5                     20          50
65           74           2          7                     20          70
75           84           2          9                     20          90
85           94           1          10                    10          100
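The cumulative columns in the table above can be derived from the frequency column alone; a minimal sketch:

```python
# Frequencies taken from the table above (10 observations in total).
freqs = [1, 2, 2, 2, 2, 1]
total = sum(freqs)

# Cumulative frequency: a running total of the frequencies.
cumulative = []
running = 0
for f in freqs:
    running += f
    cumulative.append(running)

# Percentage and cumulative percentage, as described in the steps above.
percentage = [f / total * 100 for f in freqs]
cum_percentage = [c / total * 100 for c in cumulative]

print(cumulative)      # [1, 3, 5, 7, 9, 10]
print(percentage)      # [10.0, 20.0, 20.0, 20.0, 20.0, 10.0]
print(cum_percentage)  # [10.0, 30.0, 50.0, 70.0, 90.0, 100.0]
```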
A stem and leaf plot is used to present data in a graphical format to enable visualizing the shape of a
distribution. For example, following are the temperatures for the month of May in Fahrenheit.
78, 81, 82, 68, 65, 59, 62, 58, 51, 62, 62, 71, 69, 64, 67, 71, 62, 65, 65, 74, 76, 87, 82, 82, 83, 79, 79, 71, 82, 77, 81
To create the plot, all the tens digits are entered in the Stem column and all the units digits against
each tens digit are entered in the Leaf column.
Stem Leaf
5 189
6 22224555789
7 111467899
8 11222237
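The stem and leaf plot above can be generated from the raw temperatures; a sketch:

```python
from collections import defaultdict

temps = [78, 81, 82, 68, 65, 59, 62, 58, 51, 62, 62, 71, 69, 64, 67,
         71, 62, 65, 65, 74, 76, 87, 82, 82, 83, 79, 79, 71, 82, 77, 81]

# Sort the data, then group the units digits (leaves) under the tens digits (stems).
leaves = defaultdict(list)
for t in sorted(temps):
    leaves[t // 10].append(t % 10)

for stem in sorted(leaves):
    print(stem, "".join(str(d) for d in leaves[stem]))
# 5 189
# 6 22224555789
# 7 111467899
# 8 11222237
```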
Graphical Methods—Box and Whisker Plots
A box and whisker graph, based on medians or quartiles, is used to view the data distribution easily.
Example: The lengths of 13 fish caught in a lake are measured and recorded.
Step 3: Find the lower and upper quartiles.
Step 5: Locate the median, 12, using a vertical line. Locate the lower and upper quartiles (8.5 and 14) and join them with the median by drawing boxes.
Step 6: Extend whiskers from either end of the boxes to the smallest (5) and largest (20) numbers in the data set.
In perfect positive correlation, as the value of X increases, the value of Y also increases proportionally.
In moderate positive correlation, as the value of X increases, the value of Y also increases, but not in
the same proportion.
Example: Correlation between monthly salary and monthly savings
Salary (in thousands) (X) Savings (in thousands) (Y)
45 6
48 6.2
52 8
55 8.2
57 8.5
58 8.6
60 10
65 12
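The strength of this relationship can be quantified with the Pearson correlation coefficient; a sketch computing it from first principles for the salary data above (a value near +1 indicates strong positive correlation):

```python
import math

salary = [45, 48, 52, 55, 57, 58, 60, 65]     # X, in thousands
savings = [6, 6.2, 8, 8.2, 8.5, 8.6, 10, 12]  # Y, in thousands

n = len(salary)
mean_x = sum(salary) / n
mean_y = sum(savings) / n

# Pearson r = covariance / (spread of X * spread of Y)
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(salary, savings))
var_x = sum((x - mean_x) ** 2 for x in salary)
var_y = sum((y - mean_y) ** 2 for y in savings)
r = cov / math.sqrt(var_x * var_y)

print(round(r, 2))  # about 0.97 -- a strong positive correlation
```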
When a change in one variable has no impact on the other, there is no correlation between them.
Example: Relation between number of fresh graduates and job openings in a city
In moderate negative correlation, as the value of X increases, the value of Y decreases, but not in the
same proportion.
Example: Correlation between the price of a product and the number of units sold
Unit Price of Product (in thousands) (X)    Units Sold (Y)
30 1000
32 980
33 970
35 965
38 950
40 920
42 910
A histogram is similar to a bar graph, except that the data in a histogram is grouped into intervals. A
histogram is best suited for continuous data.
Example: Number of hours spent by 15 team members on a special project in a week
1.5, 1.5, 2, 3, 3, 3, 25, 3, 5, 4, 4, 4, 4.5, 5, 6, 9.5, 10
Normal probability plots are used to identify if a dataset is normally distributed. A normally
distributed dataset forms a straight line in a normal probability plot.
Example: The following data sample is of diameters from a drilling operation:
.127, .125, .123, .123, .120, .124, .126, .122, .123, .125, .121, .123, .122, .125, .124, .122, .123, .123, .126, .121,
.124, .121, .124, .122, .126, .125, .123
Step 1: Construct a cumulative frequency distribution table and calculate the mean rank probability
estimate using the formula:
Mean rank probability estimate = (Cumulative frequency / (n + 1)) ∗ 100
After performing Step 1, mean rank probability estimations are calculated. The table below lists
them:
X      Frequency  Cumulative Frequency  (Cumulative Frequency)/(n+1)  Mean Rank (%)
0.120  1          1                     1/28                          4
0.121  3          4                     4/28                          14
0.122  4          8                     8/28                          29
0.123  7          15                    15/28                         54
0.124  4          19                    19/28                         68
0.125  4          23                    23/28                         82
0.126  3          26                    26/28                         93
0.127  1          27                    27/28                         96
n = 27
Step 2: Plot the graph on log paper or using Minitab, a statistical software used in Six Sigma.
Conclusion: From this graph, it can be observed that the random sample forms a straight line, and
therefore, the data is taken from a normally distributed population.
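The Step 1 table can be reproduced with a short script; a sketch:

```python
from collections import Counter

diameters = [.127, .125, .123, .123, .120, .124, .126, .122, .123, .125,
             .121, .123, .122, .125, .124, .122, .123, .123, .126, .121,
             .124, .121, .124, .122, .126, .125, .123]

n = len(diameters)  # 27
counts = Counter(diameters)

running = 0
for x in sorted(counts):
    running += counts[x]
    # Mean rank probability estimate = cumulative frequency / (n + 1) * 100
    mean_rank = running / (n + 1) * 100
    print(f"{x:.3f}  freq={counts[x]}  cum={running}  mean rank={mean_rank:.0f}%")
```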
Measure
Topic 4—Measurement System Analysis
The Measurement System’s (MS) output is used throughout the DMAIC process. An error-prone MS
leads to incorrect conclusions. Measurement System Analysis (MSA) is a technique that identifies
measurement error (variation) and its sources to reduce variation.
In MSA, the system’s capability is calculated, analyzed, and interpreted using Gage Repeatability and
Reproducibility (GRR) to determine:
● measurement correlation;
● bias;
● linearity;
● percent agreement; and
● precision/tolerance (P/T).
! Variation in the measurement system has to be resolved to ensure correct baselines for the project
objectives.
Precision and Accuracy
In statistical measurements, Precision and Accuracy are the two important factors to be considered when taking data measurements.
Precision
● The ability to replicate measurements time after time, that is, consistent measurements.
● It refers to the tightness of the cluster of data.
● Measurement issues related to precision can be addressed through Measurement Systems Analysis.

Accuracy
● Clustering of data around a known target.
● It is also known as unbiased measurement.
● To have a stable measurement system, focus on accuracy first by addressing measurement issues, and get accurate results.
Good precision and accuracy are equally important for a stable measurement system.
Precision
In any measurement system, precision is the degree to which repeated measurements under unchanged conditions show the same results (repeatability).
Example: Hitting a target precisely means all the hits are closely spaced, even if they are very far from the center of the target.

Accuracy
In any measurement system, accuracy is the degree of conformity of a measured or calculated value to its actual (true) value.
Example: Accurately hitting the target means you are close to the center of the target, even if all of the marks are on different sides of the center.
Examples of the four combinations of accuracy and precision are shown here:
Bias, Linearity, and Stability are the three aspects of a measurement system that help in analyzing how good the measurements are.
• While performing the MSA, it is important to evaluate these along with precision and accuracy.
• Bias, linearity, and stability help you understand what is causing a mismatch, if any, or resulting in inaccurate data.
Bias is a measure of the distance between the measured value and the True or Actual value. It could
be either on the positive side or the negative side.
Example: An Analog Bathroom Weighing Scale provides an adjustment screw or a dial to set it to zero
prior to weighing.
Linearity is a measure of the consistency of bias over the range of measurement, from smaller numbers to higher numbers and vice versa.
Example: If a bathroom scale is showing 2 pounds less when measuring a 100 pound person, and 5
pounds less when measuring a 150 pound person, the scale bias is said to be non-linear. The degree
of bias changes between the lower end and high end (Linearity issue).
Stability refers to the ability of a measurement system to show the same values over time when
measuring the same repeatedly.
Example: Suppose the weighing scale shows one reading in the morning and another in the afternoon for the same item; the measurement system is then said to be unstable.
Measurement Systems Analysis (MSA) should be done for both Variable and Attribute data.
Repeatability: the variation in measurements obtained when one operator uses the same gage for measuring identical characteristics of the same part repeatedly.
Reproducibility: the variation in the average of measurements made by different operators using the same gage when measuring identical characteristics of the same part.
● Bias: distance between the sample mean value and the sample true value; also called accuracy.
● Linearity: consistency of bias over the range of the gage.
● Precision: degree of repeatability or closeness of data; smaller dispersion results in better precision.
Measurement resolution is the smallest detectable increment that an instrument will measure or
display. The number of increments in the measurement system should extend over the full range for a
given parameter.
Repeatability or Equipment Variation (EV) occurs when the same operator repeatedly measures the
same part or process, under identical conditions, with the same measurement system.
Example: A 36 km/hr pace mechanism is timed by a single operator over a distance of 100 meters on
a stop watch. Three readings are taken:
● Trial 1 = 9 seconds
● Trial 2 = 10 seconds
● Trial 3 = 11 seconds
Assuming there is no operator error, the variation in the three readings is known as Repeatability or
Equipment Variation (EV).
Reproducibility or Appraiser Variation (AV) occurs when different operators measure the same part or
process, under identical conditions, with the same measurement system.
Example: A 36 km/hour pace mechanism is timed by two operators over a distance of 100 meters on a
stop watch. Three readings are taken by each:
Trial Operator 1 Reading Operator 2 Reading
1 9s 12s
2 10s 13s
3 11s 14s
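A rough way to see the two components in this example is sketched below. This is only an illustration of the idea; a formal GRR study estimates these components with the ANOVA or range method.

```python
import statistics

operator1 = [9, 10, 11]   # readings from the example above, in seconds
operator2 = [12, 13, 14]

# Repeatability (EV): spread of repeated readings within each operator.
ev1 = statistics.stdev(operator1)
ev2 = statistics.stdev(operator2)
print(ev1, ev2)   # 1.0 1.0 -- each operator is equally consistent

# Reproducibility (AV): difference between the operators' average readings.
av = statistics.mean(operator2) - statistics.mean(operator1)
print(av)         # 3 -- a systematic offset between the two operators
```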
! It is important to resolve EV before resolving AV, as the other way round is counter-productive.
ANOVA is considered the best method for analyzing GRR studies due to the following reasons:
● ANOVA separates equipment and operator variation, and also provides insight on the combined
effect of the two.
● ANOVA uses standard deviation instead of range as a measure of variation and therefore gives a
better estimate of the measurement system variation.
The primary concerns in using ANOVA are those of time, resources required, and cost.
The results page of the data entered in the template is displayed here:
1
Check the value of %GRR. If %GRR < 30, Gage Variation is acceptable, and thus the gage is
acceptable. If %GRR > 30, the gage is not acceptable.
2
Check EV first. If EV = 0, the MS is reliable and the variation in the gage is contributed by
different operators. If AV = 0, the MS is precise.
3
If EV = 0, resolve AV by providing operators with training.
! The interaction between operators and parts can also be studied under GRR using Part Variation. The trueness and precision cannot be determined in a GRR if only one gage or measurement method is evaluated, as it may have an inherent bias.
Measure
Topic 5—Process Capability
Process Capability is how well the process can potentially run, if the
sources of variation are controlled and the process runs on target.
• The Business judges its process by looking at the Process Capability,
which is a metric that reflects only the common cause variation,
assuming special causes are controlled.
• There are two types of limits, Natural Process Limits and
Specification Limits. The USL and LSL will be as provided by the user.
The comparison between natural process limits and specification limits is presented here:
! If the natural process limits lie within the specification limits, the process is under control. Conversely, if the specification limits lie within the natural process limits, the process will not meet customer requirements.
! The difference between USL and LSL is also called the Specification width or Tolerance.
The process capability index (Cpk) was developed to objectively measure the degree to which a process meets or does not meet customer requirements.
To calculate Cpk, the first step is to determine if the process mean is closer to the LSL or the USL.
● If the process mean is closer to the LSL, Cpkl is determined:
Cpkl = (X − LSL) / (3 ∗ Sigma), where X is the process average and Sigma represents the standard deviation.
● If the process mean is closer to the USL, CpkU is determined:
CpkU = (USL − X) / (3 ∗ Sigma)
! If the process mean is equidistant from both limits, either specification limit can be chosen. Cpk takes the value of CpkU or Cpkl, whichever is lower.
Q A batch process produces high fructose corn syrup with a specification for the Dextrose Equivalent (DE) to
be between 6.00 and 6.15. The DEs are normally distributed, and a control chart shows the process is
stable. The standard deviation of the process is 0.035. The DEs from a random sample of 30 batches have a
sample mean of 6.05. Determine Cp and Cpk.
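One way to work the numbers in this example, assuming the standard formulas Cp = (USL − LSL)/(6σ) and Cpk = min(CpkU, Cpkl):

```python
usl, lsl = 6.15, 6.00   # DE specification limits
sigma = 0.035           # process standard deviation
mean = 6.05             # sample mean of the 30 batches

cp = (usl - lsl) / (6 * sigma)
cpk_l = (mean - lsl) / (3 * sigma)   # the mean is closer to the LSL
cpk_u = (usl - mean) / (3 * sigma)
cpk = min(cpk_l, cpk_u)

print(round(cp, 2))   # 0.71
print(round(cpk, 2))  # 0.48
```

Both indices are well below 1, so the process, although stable, is not capable of meeting the specification.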
Process capability is the actual variation of the process relative to its specification. The steps in a process capability study are:
1. Plan for data collection
2. Collect data
3. Plot and analyze the results
Getting the appropriate sampling plan for the process capability studies depends on the purpose and
whether there are customer or standards requirements for the study.
For new processes, a pilot run may be used to estimate process capability.
The objective of a process capability study is to establish a state of control over a manufacturing
process and then to maintain control over a time period.
To select a characteristic for a process capability study, it should meet the following requirements:
● The characteristic should indicate a key factor in the quality of the product or process.
● It should be possible to influence the value of the characteristic through process adjustments.
● The operating conditions that affect the characteristic should be defined and controlled.
For attribute or discrete data, process capability is determined by the mean rate of non-conformity
and DPMO is the measure used. For this, the mean and standard deviation have to be defined.
Defectives
● p is used for checking process capability for constant and variable sample sizes; its standard deviation is sqrt(p(1 − p)/n).

Defects
● c is used when the sample size is constant; its standard deviation is sqrt(c).
● u is used when the sample size is variable; its standard deviation is sqrt(u/n).

! These standard deviations are the equivalent of the standard deviation for continuous data.
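As an illustration of the defectives case, the sketch below applies the standard formula sqrt(p(1 − p)/n); the inspection counts are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical inspection results, for illustration only.
defectives = 12
inspected = 400
p_bar = defectives / inspected       # average proportion defective = 0.03

# Standard deviation of the proportion defective.
sigma_p = math.sqrt(p_bar * (1 - p_bar) / inspected)
print(round(sigma_p, 4))   # 0.0085

# For defects with a constant sample size, sigma_c = sqrt(c_bar).
c_bar = 4.0
print(math.sqrt(c_bar))    # 2.0
```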
The activities carried out in the measure phase are MSA, collection of data, statistical calculations,
and checking for accuracy and validity.
This is followed by a test for stability as changes cannot be made to an unstable process.
A A process becomes unstable due to special causes of variation. Multiple special causes of variation lead to
instability. A single special cause leads to an out-of-control condition.
Common causes of variation:
● Include the many sources of variation within a process
● Have a stable and repeatable distribution over a period
● Contribute to a state of statistical control where the output is predictable

Special causes of variation:
● Include factors external to, and not always acting on, the process
● Are sporadic in nature
● Contribute to instability in the process output
● May result in defects and have to be eliminated
● If indicated by Run charts, point to the need for root cause analysis
A run chart of the sample data can be created in Minitab using Stat -> Quality Tools -> Run Charts. If the p-value for any of the four tests provided in the chart (clustering, mixtures, trends, and oscillation) is less than 0.05, the process has special causes of variation, and the chances of the process going unstable are high.

For the sample data shown, the run chart reports:
Number of runs about median: 4 (expected: 4.0); longest run about median: 2
Number of runs up or down: 3 (expected: 3.7); longest run up or down: 3
Approx P-Value for Clustering: 0.500; for Mixtures: 0.500
Approx P-Value for Trends: 0.220; for Oscillation: 0.780
The causes of variation existing in a process are used to verify its normality or stability.
● If special causes of variation are present in a process, the process distribution changes and the output is not stable. The process is not in control.
● If only common causes of variation are present in a process, the output is stable and the process is
in control.
For a stable process, the control chart data can be used to calculate the process capability indices.
Monitoring techniques refer to how well we can monitor the process capabilities. Some of the monitoring techniques are as follows:
● Statistical Process Control techniques;
● Control Charts for monitoring both process capability and stability; and
● Appropriate charts depending on the data type (attribute/discrete or variable/continuous).
Answer: b.
Explanation: KPOV stands for Key Process Output Variables.
a. Accuracy
b. Precision
c. Linearity
d. Stability
Answer: a.
Explanation: Accuracy demonstrates the degree of conformity of measured value to its true
value.
a. Accuracy
b. Precision
c. Linearity
d. Stability
Answer: b.
Explanation: The degree to which the repeated measurements under unchanged conditions
show the same results is called Precision.
a. Bias
b. Precision
c. Linearity
d. Stability
Answer: a.
Explanation: It is the Measurement Bias that is consistently higher or lower than the
expected value with the same magnitude.
a. Measurement errors
b. Linear Scale
c. Positive bias
d. Consistency of bias
Answer: d.
Explanation: Measurements performed at smaller levels and measurements at higher levels
have consistent bias over the range of measurements.
a. variance
b. standard deviation
c. one
d. mean deviation
Answer: a.
Explanation: Variance, denoted by σ², is given by Σ(x − μ)² / n, that is, the sum of the squared deviations of a group of measurements from their mean, divided by the number of measurements.
Answer: a.
Explanation: Repeatability is determined by examining the variation in the repeated measurement readings taken by an individual inspector.
a. cannot be determined.
b. is determined by the control limits on the applicable attribute chart.
c. is defined as the average proportion of nonconforming product.
d. is measured by counting the average nonconforming units in 25 or more
samples.
Answer: c.
Explanation: The average proportion may be reported on a defects or defectives per million scale by multiplying the average (p, c, or u) by 1,000,000.
a. 1/1
b. 1+1
c. 1:1
d. 0
Answer: c.
Explanation: Perfect correlation, either positive or negative, is when a dependent variable
changes equally with a change in the independent variable, and is represented by 1:1.
a. Gage Repeatability
b. Gage Reproducibility
c. Gage Repeatability and Reproducibility
d. Gage Variation
Answer: b.
Explanation: Gage reproducibility is the variation in measurement when different operators
use the same gage to measure identical characteristics of the same part.
a. Equipment Variation
b. Appraiser Variation
c. Process Variation
d. Product Variation
Answer: a.
Explanation: Repeatability or Equipment Variation or EV occurs when the same operator
repeatedly measures the same part or same process, under the same conditions, with the
same measurement system.
Answer: c.
Explanation: Natural process limits are derived from real-time values. Specification limits
are defined by the customer.
Answer: c.
Explanation: If the process limits are within the specification limits, it means the process is
in control, and no action is required.
a. Stability
b. Bias
c. Linearity
d. Process capability
Answer: a.
Explanation: Stability refers to the ability of a measurement system to show the same
values over time when measuring the same repeatedly.
a. Central Limit
b. USL and LSL
c. Natural Specification Limit
d. Specification Limits
Answer: b.
Explanation: The USL and LSL will be as provided by the user.
Answer: a.
Explanation: In moderate positive correlation, as the value of X increases, the value of Y also
increases, but not in the same proportion.
a. Standard distribution
b. Frequency distribution
c. Cumulative frequency distribution
d. Normal distribution
Answer: b.
Explanation: Frequency distribution is the grouping of data into mutually exclusive
categories showing the number of observations in each class.
Answer: d.
Explanation: The key objective of the measure phase is to gather as much information as
possible on the current processes.
Here is a quick recap of what we have learned in this lesson:
● Process definition helps in defining the process and capturing its inputs and outputs in the X-Y diagram.
● Statistics refers to the science of collection, analysis, interpretation, and presentation of data. The major types are Descriptive Statistics and Inferential Statistics.
● Precision refers to getting repeatable measurements, and accuracy refers to getting measurements closer to the actual value.
● Process Capability is how well the process can potentially run, if the sources of variation are controlled and the process runs on target.
● Measures of central tendency, dispersion, and graphical methods are used to analyze sample data.
● MSA is used to calculate, analyze, and interpret a measurement system's capability using Gage Repeatability and Reproducibility.
● Variation in a process can be because of common causes and special causes, which determine the bias, linearity, stability, capability, distribution, and defects of a process.