Your supplier has sent you the process capability charts you requested. The supplier has
produced some very nice charts, obviously generated with some fancy software package – and, of
course, with all those accompanying statistics. You know, things like Cpk, Ppk, sigma level, ppm out
of spec and so on. Very pretty charts. Looks like your supplier is really performing for you. You
note one capability chart that has a Ppk = 1.14 and a Cpk = 2.07. Why are those different? Well, it
doesn’t matter. The Cpk is above 1.33, which is what you asked the supplier for. Time to work on
something else.
You just missed a very important piece of information about your supplier’s performance. Know
what it is?
Cpk and Ppk are two commonly used measures of process
capability – how well your process is meeting your customer specifications. Software today makes
it easy to plug the data in and generate the results. But, far too often, we simply take the results and
move forward without thinking about what they mean. In this month’s SPC Knowledge Base
publication we take an in-depth look at Cpk and Ppk. What are they? What are they
measuring? What do the values mean? Which one should you rely on? Some of the answers may
well surprise you.
In this publication:
Introduction
Process Capability Review
Cpk and Ppk Review
Within Subgroup Variation vs Overall Variation
That Little Issue of Statistical Control!
Two Processes – Same Data, Same Ppk
Two Processes – Same Data, Different Cpk
Summary: So, Who Wins: Cpk or Ppk?
INTRODUCTION
We did not mention Ppk in either publication. Time to change that in this publication.
PROCESS CAPABILITY REVIEW
Process capability analysis answers the question of how well your process meets specifications –
either those set by your customer or your internal specifications. To calculate process capability,
you need three things:
This is true for both Cpk and Ppk. We will assume that our data are normally distributed.
Process capability indices represent a ratio of how far a specification limit is from the average to the
natural variation in the process. The natural variation in the process is taken as being 3 times the
process standard deviation. Figure 1 shows the general set up for determining a process capability
index based on the upper specification limit (USL) with “s” being a measure of the process variation.
CPK AND PPK REVIEW
Both Cpk and Ppk are the minimum of two process indices. The equations for Cpk and Ppk are
shown in Table 1.
The X with double bar over it is the overall average. In the Cpk equations, σ is used to estimate the
process variation. σ is the estimated standard deviation obtained from a range control chart. In the Ppk
equations, s is used to estimate the process variation. s is the calculated standard deviation using all the
data.
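Table 1 is not reproduced in this text. As a sketch in standard notation, assuming the usual definitions that match the description above (σ from the range chart for Cpk, s from all the data for Ppk), the equations can be written as follows. The labels Cpu, Cpl, Ppu and Ppl for the upper and lower indices are conventional names and are an assumption here.

```latex
\begin{aligned}
C_{pk} &= \min(C_{pu},\, C_{pl}), &
C_{pu} &= \frac{USL - \bar{\bar{X}}}{3\sigma}, &
C_{pl} &= \frac{\bar{\bar{X}} - LSL}{3\sigma} \\
P_{pk} &= \min(P_{pu},\, P_{pl}), &
P_{pu} &= \frac{USL - \bar{\bar{X}}}{3s}, &
P_{pl} &= \frac{\bar{\bar{X}} - LSL}{3s}
\end{aligned}
```

The two rows differ only in the denominator, which is the point made in the next paragraph.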
Thus, the major difference between Cpk and Ppk is the way the process variation is estimated. So
what is the difference between these two?
WITHIN SUBGROUP VARIATION VS OVERALL VARIATION
The question of Cpk vs Ppk is really a question of within subgroup variation, σ, vs overall variation,
s. Let’s start with s or the calculated standard deviation, which is given by the equation below.
$$s = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{\bar{X}})^2}{N - 1}}$$
N is the total number of data points. Look at the summation term under the square root sign. This
term is squaring how far each individual data point is from the overall average, as shown in Figure
2.
According to the equation, you add up the squares of those deviations, divide by the total
number of points minus 1 and take the square root. You can view the calculated standard deviation
as the average distance each individual data point is from the overall average. Note that you use all
the data in the calculation. This is why this standard deviation is sometimes called the overall
variation. It accounts for all the variation in the data.
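The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the software used for the publication; the function name `overall_std` and the sample values are hypothetical.

```python
import math


def overall_std(values):
    """Calculated (overall) standard deviation using all N points."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    grand_mean = sum(values) / n
    # Sum of squared deviations of each point from the overall average.
    squared_deviations = sum((x - grand_mean) ** 2 for x in values)
    # Divide by N - 1 and take the square root.
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical measurements, not the data from this publication.
sample = [98.0, 102.0, 95.0, 105.0, 100.0]
s = overall_std(sample)
```

Note that the order of the values does not matter here: shuffling `sample` gives the same s.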
The within subgroup variation, σ, is estimated from the range chart as σ = R̄/d2, where R̄ is the
average range and d2 is a control chart constant that depends on subgroup
size. So, σ accounts for the variation within the subgroup. It may or may not account for all the
variation as we will see below.
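A minimal sketch of the range-based estimate follows, assuming equal-size subgroups. The function name `within_sigma` and the example subgroups are hypothetical; the d2 values are the standard published constants for subgroup sizes 2 through 5.

```python
# d2 control chart constants by subgroup size (standard published values).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def within_sigma(subgroups):
    """Estimate sigma from the average subgroup range: R-bar / d2."""
    n = len(subgroups[0])
    if any(len(group) != n for group in subgroups):
        raise ValueError("all subgroups must have the same size")
    if n not in D2:
        raise ValueError("no d2 constant stored for this subgroup size")
    ranges = [max(group) - min(group) for group in subgroups]
    r_bar = sum(ranges) / len(ranges)
    return r_bar / D2[n]


# Hypothetical subgroups of four, not the data from this publication.
example = [[98, 102, 95, 105], [100, 101, 99, 104], [97, 103, 96, 100]]
sigma = within_sigma(example)
```

Only the spread inside each subgroup enters the calculation. Differences between subgroup averages are invisible to this estimate.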
THAT LITTLE ISSUE OF STATISTICAL CONTROL!
All our publications on process capability have stressed the need for the process to be in statistical
control. How often is this just ignored? Last month we gave the process capability checklist
developed by Dr. Don Wheeler to paint a true picture of your process capability. That checklist had
five items:
1. Plot your data using a control chart to determine if the process is in statistical control
(consistent and predictable)
2. For a process that is in statistical control, construct a histogram with the specifications
added
3. For a process that is in statistical control, calculate the natural variation in the process
data
4. For a process that is in statistical control, calculate Cp and Cpk
5. Combine these four items together and present them all when talking about process
capability
See how often “for a process that is in statistical control” occurs? The point is
that Cpk (and Ppk) have no meaning unless your process is in statistical control. And for the kicker:
if your process is in statistical control, Cpk and Ppk will be very close to being equal. In fact, if you
compare Cpk and Ppk values for a given process, you will find the following to be true:
In addition, if the process is not in statistical control, Cpk and Ppk have no meaning. You cannot be
sure of getting similar values in the future because the process is not consistent and predictable. We
will explore this further in the following example for two processes with the same data – just in a
different order.
TWO PROCESSES – SAME DATA, SAME PPK
We will use two processes that have the same data (the data from last month’s
publication). Suppose you are taking four samples per hour and forming a subgroup. You want to
determine if your process is capable of meeting specifications (LSL = 65 and USL = 145). The data
for the 30 subgroups for Process 1 are shown in Table 2.
[Table 2: Process 1 data; columns Day, X1, X2, X3, X4]
[Table 3: Process 2 data; columns Day, X1, X2, X3, X4]
Since they are the same, the data in Tables 2 and 3 have the same average and the same standard
deviation.
Average = 98.98
Now draw a histogram for the data in Table 2 and a histogram for the data in Table 3. Throw in the
specifications: LSL = 65 and USL = 145. If you do this, you will discover that the histograms are
exactly the same – just what you expect since the data are the same. Figure 4 shows the histogram.
This process looks very good – definitely within the specifications. You are one happy person. You
can calculate Ppk for Process 1 and Process 2. Since the average and standard deviation are the
same, Ppk will be the same for both processes. The calculations are:
So, Ppk = 1.14 for Process 1 and Process 2.
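The Ppk arithmetic can be sketched as below. The average (98.98) and the specifications (65 and 145) come from the text. The standard deviation is not given in this text, so s = 9.94 is an assumed value, chosen only because it is consistent with the reported Ppk of 1.14. The function name `ppk` is hypothetical.

```python
def ppk(mean, s, lsl, usl):
    """Return (Ppk, Ppu, Ppl) for a given average and overall std deviation."""
    ppu = (usl - mean) / (3 * s)  # index based on the upper specification
    ppl = (mean - lsl) / (3 * s)  # index based on the lower specification
    return min(ppu, ppl), ppu, ppl


# s = 9.94 is an ASSUMPTION, back-calculated from the reported Ppk = 1.14.
value, ppu, ppl = ppk(mean=98.98, s=9.94, lsl=65, usl=145)
```

Because the average is closer to the LSL than to the USL, the lower index is the smaller of the two and sets Ppk.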
TWO PROCESSES – SAME DATA, DIFFERENT CPK
But wait – what is the value for Cpk for these two processes? Are they the same? No, they are not the
same. Remember Cpk is based on the within subgroup variation. And although the data are the
same in both processes, they are in different order – which changes the within subgroup
variation. To calculate the Cpk values, you need to estimate the standard deviation (σ) from the
range chart and the overall process average from the X̅ chart. This is a very important difference
between the Ppk approach and Cpk approach. Ppk simply uses calculations; Cpk uses control charts
to estimate the average and the process variation. Control charts are how you predict what your
process will do in the future.
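The effect of order can be illustrated with a small simulation. This is a sketch with synthetic data, not the data in Tables 2 and 3: Process 1 is 120 random values in random order, and Process 2 is the same 120 values sorted, which is a deliberately exaggerated way of making the subgroup averages drift. All names are hypothetical.

```python
import random
import statistics

D2_N4 = 2.059        # d2 for subgroups of size 4
LSL, USL = 65, 145   # specifications from the example


def subgroups_of_four(values):
    """Split the values, in order, into consecutive subgroups of 4."""
    return [values[i:i + 4] for i in range(0, len(values), 4)]


def ppk(values):
    """Ppk: uses the standard deviation calculated from all the data."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return min(USL - mean, mean - LSL) / (3 * s)


def cpk(values):
    """Cpk: uses sigma estimated from the average subgroup range."""
    mean = statistics.mean(values)
    ranges = [max(g) - min(g) for g in subgroups_of_four(values)]
    sigma = statistics.mean(ranges) / D2_N4
    return min(USL - mean, mean - LSL) / (3 * sigma)


rng = random.Random(1)  # fixed seed so the sketch is repeatable
process_1 = [round(rng.gauss(100, 10), 1) for _ in range(120)]
process_2 = sorted(process_1)  # same data, different order

print("Ppk:", ppk(process_1), ppk(process_2))
print("Cpk:", cpk(process_1), cpk(process_2))
```

The two Ppk values agree because the same 120 numbers go into the same formula. The Cpk for the sorted data is much larger because each subgroup now contains four neighboring values, so the average range, and with it σ, shrinks.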
The control charts for Process 1 are shown below. The range chart is shown in Figure 5.
Note that since the range chart is in statistical control, the within subgroup variation is consistent
and predictable. The value for the process standard deviation is “valid.” The process that generated
it is consistent and predictable and will remain so as long as the process stays the same.
To calculate Cpk, you need an estimate of the average. That comes from the X̅ control chart. Figure 6
shows this chart.
The X̅ chart is in statistical control. This means that you have a good ("valid") estimate of the process
average. You can now use that average, along with σ, to determine the Cpk values.
Now compare the results for Ppk and Cpk for Process 1:
When a process is in statistical control, the within subgroup variation is a good estimate of the
overall process variation, i.e., σ ≈ s.
Cpk is essentially the same as Ppk in this case. They are giving you the same information.
Let’s move on to Process 2. The range chart for Process 2 is shown in Figure 7.
Again you need an estimate of the average to determine Cpk. This comes from the X̅ chart, which is
shown in Figure 8. The X̅ chart is not in statistical control.
This means that you do not have a good estimate of the process average. It is moving around. What
will the next subgroup average be? You have no idea where it will be. The process is not consistent
and predictable. You can’t really calculate the Cpk value.
Many times folks just simply ignore this fact and move full steam ahead with calculating Cpk. After
all, the calculated average is 98.98. The Cpk calculations are as follows:
So, Cpk for Process 2 is 2.07. Now compare the results for Process 1 and Process 2.
SUMMARY: SO, WHO WINS: CPK OR PPK?
The reality is that Cpk is a better estimate of the potential of your process. It represents the best
your process can do and that is when the within subgroup variation is essentially the same as the
between subgroup variation. This is what it means to be in statistical control. And if the process is
in statistical control, Cpk is essentially the same as Ppk. So, you really don’t need Ppk in this case.
And if your process is not in statistical control, you have something to work on – Cpk and Ppk are
pretty well meaningless – except for the fact that values of Cpk and Ppk that are widely different are
indications that the process is not in statistical control.
But you know that already because you are following the process capability checklist from Dr.
Wheeler. Always start by looking at the data in control chart format.
Thanks so much for reading our publication. We hope you find it informative and useful. Happy
charting and may the data always support your position.
Sincerely,
Comments (35)
Anonymous
Hi! I am a regular reader of all the articles posted on your website, and they are really very
informative as well as useful. Thanks a lot for posting. While going through this article, I feel it
needs corrections in two places:
1. We are differentiating the standard deviation between Cpk and Ppk with the help of the signs s and
sigma. These signs are used in the formulas, but in the explanation "s" is used as the symbol for
standard deviation in both cases. It creates a little confusion in understanding the difference.
2. You mentioned the checklist of five items advised by Dr. Don. The first item clearly says that we
need to construct the control chart to see if our data are in statistical control. The limits, or we
can say the natural variation, are already calculated in this first item, so why are we asked to do
the same in item 3, "For a process that is in statistical control, calculate the natural variation in
the process data"?
Kindly clear these doubts. Thanks again for posting such wonderful posts.
Regards,
Ashok Pershad
Bill McNeese
Hello. Thanks for your comment. Yes, different books/articles/people handle s and
sigma differently - or call them both s as you said. There is no consistency in the
approach. It would be better to use the terms "within" and "overall" to describe
which one you are talking about. I typically use "s" for the overall and "σ" for the within.
The natural variation is not the same as the control limits. The natural variation is
6σ. The control limits are based on what you are plotting, i.e., the subgroup averages in
the examples in this article.
Best Regards,
Bill
Dnyandeo
Thanks, really helpful. It is simple and plain and hence causes no confusion. I have been a regular
reader of your articles.
palash
Hello Bill. Could you please explain how you calculate the UCL and LCL in the X-bar chart shown in
your article? Generally UCL = X-bar + 3*sigma and LCL = X-bar - 3*sigma. Please explain this!
Oct 04, 2016
Bill McNeese
The control limits are referred to as three sigma limits, but it is three sigma limits of what
is being plotted. In this case, that is subgroup averages. Plus the value of sigma is
estimated from the average range. We have a two part series on Xbar-R control charts in
the SPC Knowledge Base. The first part is here:
https://www.spcforexcel.com/knowledge/variable-control-charts/xbar-r-cha...
We also have an article that explains where control limits come from:
https://www.spcforexcel.com/knowledge/control-chart-basics/control-limits
Let me know if these do not answer your questions.
Bill
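As a rough sketch of the reply above: the limits on the X-bar chart are three sigma limits for subgroup averages, so the sigma estimated from the average range is divided by the square root of the subgroup size. The function name and the example subgroups below are hypothetical, and the linked articles remain the reference for the method.

```python
import math

# d2 control chart constants by subgroup size (standard published values).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def xbar_limits(subgroups):
    """Return (LCL, center line, UCL) for the subgroup averages."""
    n = len(subgroups[0])
    averages = [sum(group) / n for group in subgroups]
    ranges = [max(group) - min(group) for group in subgroups]
    grand_average = sum(averages) / len(averages)
    r_bar = sum(ranges) / len(ranges)
    sigma = r_bar / D2[n]              # sigma of the individual values
    sigma_xbar = sigma / math.sqrt(n)  # sigma of the subgroup averages
    return (grand_average - 3 * sigma_xbar,
            grand_average,
            grand_average + 3 * sigma_xbar)


# Hypothetical subgroups of four, not the data from this publication.
example = [[98, 102, 95, 105], [100, 101, 99, 104], [97, 103, 96, 100]]
lcl, center, ucl = xbar_limits(example)
```

For n = 4 the half-width works out to about 0.729 times the average range, which is the A2 constant found in control chart tables.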
Henry
What do you suggest is more valuable when the subgroup size of the control chart is 1? Is there any
value in estimating sigma from a range chart? Or is the overall variation better, and consequently
Ppk more valuable? What are your thoughts? Thank you.
Bill McNeese
When individual values are used, the moving range chart is used to estimate sigma. The
moving range chart uses the range between consecutive points. So, sigma estimated
from the average moving range still looks at the variation in individual values. I don't
think it makes a difference if individual values are used or subgroups are used. I still
find Cpk more valuable because it says what the process is capable of doing in the short
term. Of course, if in control, Cpk and Ppk will be the same essentially.
Samir
Thank you very much. Good explanation, need just 5 minutes to understand this concept.
Jul 01, 2017
Ashok
Hi Bill, You have said that the X bar chart for the process is not stable and the points are not
within the control limits. Then how can we rely on the Cpk value? For an inconsistent process it
shows a value higher than 2. So how can you conclude that Cpk is better than Ppk? I don't know
whether my understanding is wrong. Please explain. Thanks.
Bill McNeese
Hello. The point I am trying to make is that many people just calculate the Cpk value
without considering whether or not the process is in statistical control. This one is
not. So the Cpk value has no meaning - nor does the Ppk value. Since the process is not in
control, you have no idea of what the results will be in the future. To have meaning, the
process has to be in control. If it is, then Cpk and Ppk will be very close.
Brown-China
Hi Bill, First, thanks for your information; it's really useful. I have two questions.
1. Many people think Ppk is an index that already considers special causes and common causes, so
they also think there's no need to consider whether the process is in statistical control before
calculating Ppk. I saw you said that if the process is not in statistical control, Cpk and Ppk are
both meaningless. How do you understand the difference?
2. Whether the process is in development or after mass production, should we always calculate Cpk
first?
Bill McNeese
If your process does not show some degree of consistency (being in statistical control), it
is impossible to know what the near term future looks like. You don't know where the
process will be, so calculating anything on that process (average, Cpk, Ppk, etc.) doesn't
give you any real information because you won't get similar results in the future. If you
have lots of data, the impact of special causes can be less when calculating the standard
deviation, but not when estimating it from a range control chart. I would always calculate
Cpk, but you can calculate both. If they are similar, the process is probably in control.
bala
Bill McNeese
If there is a large difference between the two, it usually means that the process is not in
statistical control.
Steve-o
Hi Bill, In the equation below figure 5 and again in the equation below figure 7 you use 2.059 for
d2. But there are 30 observations in the sub group for the averages. Why use the d2 for a subset of
4? Was that arbitrary?
Bill McNeese
d2 is a constant based on subgroup size, in this example, 4 since there are four samples
per subgroup. Yes, my choice of 4 was arbitrary for this example.
SD
Excellent material
Hello Bill, Really great article about SPC! I have one question: the only purpose of calculating Ppk
seems to be to compare it with Cpk in order to see if the process is in statistical control or not.
Ppk looks quite meaningless, doesn't it? There are different articles/opinions saying that Cpk/Ppk
represent the short/long term capability of the process. What do you think? Thanks!
Bill McNeese
Thanks. Short and long term. Yes, I have read those. Usually Cpk is short term and Ppk is
long term. It is a matter of how quickly your process changes, I imagine. The only use for Ppk is
if you can't ever get your process under control. But in that case you never know what it
will be next time. So, quite meaningless actually, as you say.
Adriana Cortes
I haven't seen tables with d2 for a subgroup of 1, but using your logic about the difference between
Cpk and Ppk when the values are shuffled, I would think that the value will be the same for both?
How do you calculate Cpk for a subgroup of 1?
Bill McNeese
If there are individual values, the average range is the average of the range between
consecutive samples. d2 is 1.128 in that case.
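A minimal sketch of that calculation for individual values follows. The function name and the sample values are hypothetical; 1.128 is the d2 constant quoted in the reply.

```python
D2_MOVING_RANGE = 1.128  # d2 for a moving range of two consecutive points


def sigma_from_moving_range(values):
    """Estimate sigma from the average moving range of individual values."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # Range between each pair of consecutive samples, in time order.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return mr_bar / D2_MOVING_RANGE


# Hypothetical individual measurements in time order.
individuals = [98.0, 102.0, 95.0, 105.0, 100.0]
sigma = sigma_from_moving_range(individuals)
```

Unlike the calculated standard deviation, this estimate depends on the order of the values, so shuffling the data will generally change it.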
Bana
Hi, My question is: how did you get the value of 2.059 for d2? Could you explain?
Bill McNeese
d2 is a control chart constant that depends on subgroup size. For n = 4, the value of d2 is
2.059. For more information, please see this
link: https://www.spcforexcel.com/knowledge/control-chart-basics/control-limits
David203
The example throws me off. I get it that the goal is to be consistent, but in all things process related
error closer to zero is good - or in this case a greater Cpk is better. In the second data set the limits
pull in naturally because the data shows higher consistency. While that does produce control charts that
show greater variance from the norm based on the small sample, it still exceeds the process
requirements. If a process control chart results in a Cpk increase (bigger is better), why would this
mean the process is out of control? The x-bar chart in data set 2 shows out of range based on the small
set, but the ultimate goal of exceeding expectations is being met. The process should not be
compromised because a subset performed well and had some outliers that still fell into the greater
range. Did I miss something?
Hello David,
It is all about consistency. Unless your process is in control you can't predict what it will
make in the future. So even though an "out of control" process is within specification, it is
not good - for you or your customer probably. Bringing it into control will reduce the
variation and make the process even better. Cpk increasing does not mean that the
process is out of control. Cpk has no meaning if the process is out of control because you
don't know the average or the variation.
David203
When you say Cpk has no meaning if the process is out of control because
you don't know the average or the variation, I disagree based on your
example. If your Ppk is less than your Cpk you are closer to, not further from,
your average. And your variation is better than, not worse than, your
established benchmark. This would indicate your process is performing in
control, not out of control. It would indicate you could improve your process
and the data is telling you that you could do better, but that would be a
business decision. It would make no sense at all to start looking for ways to
decrease your Cpk to bring it closer to your Ppk in your process because it is
becoming more consistent. If your example was indicating Cpk dropping
consistently lower than Ppk then I would agree with this example, but this is
not the case - your example shows Cpk significantly better than Ppk - which
is good and in control.
Bill McNeese
Pavan
Hi Sir, Based on the example given, the data isn't verified for normal distribution before calculating
Cpk. The data provided results in a p value of 0.039 (using the Anderson-Darling test for normality;
it must be greater than 0.05), which denotes that the data does not follow a normal distribution
(considering the 120 data points from the example). Can Cpk be calculated for data that does not
follow a normal distribution without transforming the data? Please clarify or correct me. Thanks in
advance.
Bill McNeese
I didn't worry about checking normality because the histogram looks close enough to
me. Also remember that the Anderson-Darling test will give wrong indications for large
data sets - which 120 probably is. If you take the first sample from each subgroup and
run the normal probability plot with those 30 points, the p value is .79 - which says it is
normally distributed. For large data sets, rely on the histogram - not the normal
probability plot - to decide about normality - and of course your knowledge of the
process.
Ezhilarasan
Hi. Can you tell me whether my observation is right or wrong?
1. The between subgroup variation is larger, from one subgroup to another, because of: (a) material
batch variation, (b) operator variation, (c) possibly measurement variation.
2. The within subgroup variation is smaller when the machine gives output with the same range
(standard deviation).
Kindly reply whether I am right or wrong.
Bill McNeese
I am not sure I understand what you are asking. If the between subgroup variation is
much larger than the within subgroup variation, the control limits will be very tight and
you should look at using a Xbar-mR-R chart.
Mark Anderson
In the automotive manufacturing industry, the standard for when to use Ppk and Cpk differs some.
Maybe you could validate or explain the reasoning for this. In the automotive industry, Ppk is used for
initial process studies and is based on a single run. Cpk is used to determine capability over multiple
runs. My understanding is that this is because Ppk is a measure of process performance, and Cpk is a
measure of process capability. And until you introduce all the different sources of variation such as
component lot to lot, operator, changeover, etc., you cannot say the process is stable or capable. And
this needs to be done over multiple runs. From a single run you can only analyze the current
performance. And that is why for initial process studies with a single run, Ppk is used to evaluate the
performance of the process and determine whether it meets the expectations.
Bill McNeese
Thanks for the insights. I agree with what you say. A true process capability study has to
have the potential sources of variation present.
Copyright © 2019 BPI Consulting, LLC. All Rights Reserved.