Atlas Copco - 4. Statistical Analysis, PG PDF

Pocket guide to statistical analysis techniques for use
with tightening tools
Chapter
1. Introduction
2. Basic statistics
2.1 Variation
2.2 Distribution
2.3 Histogram
2.4 Mean value
2.5 Standard deviation
2.6 Estimation of a normal distribution
Sample mean and standard deviation
3. Accuracy requirements
3.1 Mean shift and combined scatter
Example
4. Understanding processes
5. Capability
5.1 Cp
5.2 Cpk
5.3 When is a process capable?
5.4 Machine capability indices
5.5 What else is there to think about?
6. Control charts
6.1 X-bar charts
6.2 The subgroup
6.3 Alarms
6.4 Range charts
6.5 Control charts conclusion
Summary
Appendix
A1. Example of statistics calculation
A2. Example of capability calculation
A3. Example of control chart calculation
A4. Analysis of assembly tool performance – ISO 5393 Calculation
P O C K E T G U I D E T O S TAT I S T I C S 3
1. Introduction
The purpose of this guide is to explain the basics of statistics
and how statistics can be used in production. You will learn
that with the aid of statistics we can compare tools with each
other, we can tell whether a tool is good enough for a specific
application, and by using Statistical Process Control (SPC) we
can see how a production process develops over time. Our hope
is that you, after reading this guide, will have a general
knowledge and understanding of the potential of using
statistics as a tool in production.
2. Basic statistics
2.1 Variation
Understanding statistics is largely about understanding
variation. Variation is present everywhere, in nature as well
as in industrial processes. In industrial processes, even a
slight deviation from the target value, a dimension for
instance, may have a strong influence on the functionality of
the finished product. This means it is important to understand,
and in some cases control, variation.
There are two different kinds of variation. Random variations
are predictable, always present and have many contributing
causes. Examples of random variations are small variations in
hole diameter, inconsistent friction, operator influence and
variations in air pressure. It is hard to isolate one of these
causes. The variations are tackled by improvement of the
process. Random variations are natural and depend on the
process and its environment. They are also called common
causes.

Figure 1. Variations in air pressure and operator influence are
examples of random variations.
Systematic variations are sporadic and isolated. They are not
predictable but it is often easy to pinpoint the cause. They
are tackled by controlling the process. Systematic variation
has a determined cause and can often be identified and
eliminated. Examples are machine adjustments, wear of tools and
human error. They are also called special causes.

Figure 2. Human errors like missing washers and using wrong
screws are examples of systematic variations.

A great deal of importance has been placed on the use of
statistical analysis techniques to control the quality of the
assembly process. The traditional method of using these
techniques is to analyze what has already occurred and, when a
problem is identified, to adjust the process accordingly. It is
now becoming increasingly common to use statistical techniques
to predict how the process will perform in the future and to
identify systematic variations and adjust the process before we
end up with faulty products.
2.2 Distribution
Consider a tightening process where we measure the torque
applied to a bolt. As you know, we would not achieve the
same readings for all tightenings. Suppose we collect enough
readings to create a plot of the frequency (the number of
times a particular reading occurred) against the actual torque
readings. The result would be a plot similar to the one in
figure 3 below. In statistical analysis this curve is known as a
“distribution”. There are many different types of distribution,
but the one that best describes this example (and others like
it) is called the Normal or Gaussian distribution.
A normal distribution is always symmetrical and determined by
the mean and the standard deviation. A normal distribution only
occurs when random variations affect the result.
2.3 Histogram
A histogram is built by dividing the results into categories
(for example all results between 20 and 21 Nm) and counting the
number of results in every category. The counts are then drawn
as a bar diagram, one bar per category. By doing this it is
possible to visualize the distribution with a fairly limited
number of results.
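
The binning described above can be sketched in a few lines of Python. This is only an illustration: the readings are invented, and the function name `histogram_counts` and the 1 Nm category width are assumptions, not something taken from this guide.

```python
import math
from collections import Counter


def histogram_counts(readings, bin_width=1.0):
    """Count how many readings fall in each category [edge, edge + bin_width)."""
    counts = Counter()
    for value in readings:
        # Lower edge of the category that contains this reading.
        lower_edge = math.floor(value / bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


# Hypothetical torque readings in Nm (illustrative values only).
readings = [19.6, 20.2, 20.4, 20.7, 20.9, 21.1, 21.3, 21.8, 22.4]

# Print a simple text histogram, one row per category.
for lower_edge, count in histogram_counts(readings).items():
    print(f"{lower_edge:4.0f} - {lower_edge + 1:2.0f} Nm: {'#' * count}")
```

Each row of the printout corresponds to one bar of the histogram.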
2.4 Mean value

Figure 3. Histogram.

A normal distribution can be found everywhere, both in nature
and in industrial processes. If we have a big sample of
measurements, e.g. we have made 1000 tightenings with one tool,
we can make a histogram. The more tightenings we have, the
better the curve we get. If we were to measure the height of
all Swedish men, we would achieve an average (mean value) of
1.80 m. The mean value is the most common value in a normal
distribution. There are not that many men that are really tall
or really short. Another example could be when you cut off a
stick. The target value is 20.00 cm and this would probably
also be the mean value. However, some parts become only
19.90 cm and others 20.10 cm, which is due to the natural
variation of the process and is normal.

Figure 4. Normal distributions can be found everywhere. The
height of people is one example. Another example can be if you
try to cut off sticks to the same length.
2.5 Standard deviation
If a tool is used for a very large number of tightenings at a
set torque of e.g. 30 Nm, it is unlikely that every single
tightening will reach this torque value exactly. This will be
the case even if the tool is run on the same screw joint, a
test fixture. Random factors, such as material wear and
different handling of the tool, may cause the applied torque to
exceed or fall below the intended torque. The readings are said
to deviate from the mean and we measure this with what is known
as standard deviation.
It is not essential to fully understand the formula, which is
presented later. But it is helpful if you know how to calculate
it, and it is crucial that you understand what it is! The
standard deviation is a measure of how far a typical reading
deviates from the average.

What is the practical use of standard deviation? We have
already said that the mean tells us the average value of the
distribution (all different tightenings) and the standard
deviation indicates the scatter. We can use it to estimate how
many of our values will come within a certain range. The
standard deviation may be more accurately described as a
calculation of how far a known percentage of the distribution
lies from the mean.
σ (sigma) is a letter in the Greek alphabet and it is used to
symbolize the standard deviation of any distribution. For a
business or manufacturing process, the σ value indicates how
well that process is performing. A low σ value indicates that
most of the values are close to the mean (and, if the process
is centered, close to the target). A high σ value indicates
that the spread is big and that the values deviate more from
the target value.
If you have 20 values of a population, you are able to group
them as shown in the figure. We make the assumption that they
belong to a normal distribution. This is in fact the “area”
within which you will get the next tightening. There is a 100%
probability of getting inside the entire range. It is
mathematically proven that, for a normal distribution,
approximately
• 68% of all values lie within +/– 1σ of the mean,
• 95% of all values lie within +/– 2σ of the mean, and
• 99.7% of all values lie within +/– 3σ of the mean.
It is an important characteristic of the normal distribution
that the standard deviation is symmetrical around the mean, and
always covers the same percentage of the distribution. This is
a mathematical law.

Figure 5. We always know how many percent of our values we will
have within a certain range.
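
The percentages quoted for the σ ranges can be checked numerically. The sketch below uses the standard relation between the normal distribution and the error function; the function name `share_within` is an assumption made for this illustration.

```python
import math


def share_within(k):
    """Share of a normal distribution lying within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"+/- {k} sigma: {100 * share_within(k):.2f} %")
```

The result does not depend on the mean or on the size of σ, which is why the same percentages apply to every normally distributed process.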
This now brings us to something very useful. Now that we know
the percentage of the values that will end up within a certain
σ boundary, we can predict how the process will behave in the
future. Do you remember the discussion about random and
systematic variation? We said that for a normal distribution
all systematic variations are eliminated and only random
variation is present. We now also know that 99.7% of all values
are within 6σ (or +/– 3σ). This enables us to make an important
assumption: even though 0.3% of all tightenings will fall
outside the 6σ limits for a normal distribution, we assume that
all tightenings outside these limits happen because of
systematic variations in the process. This means that something
new has entered the process – it is not under control any more.
To make things clearer, we assume that as long as we have
tightenings within the 6σ limits, the process is only affected
by random variations and is under control. When we have
tightenings outside the 6σ limits, the process is affected by
systematic variation and is not under control. When this
happens, it means that something new and strange has started to
affect the tightening process and we need to find the reason
for this and eliminate it. The following graphs show a
comparison of two different normal distributions.
Figure 6. The first picture shows two curves with the same
average, but different deviation. The second picture shows two
curves with the same deviation but different averages.
2.6 Estimation of a normal distribution
When we talk about measurements or readings on an application,
we can calculate an average and a standard deviation. If we
were to measure an infinite number of tightenings, we would
know for sure that we have the true value of the mean and the
standard deviation. This is the population mean and the
population standard deviation. But in reality this is not
possible and we have to rely on a limited number of
tightenings. In statistics we talk about a sample; in the
tightening business we talk about a subgroup or a batch. This
means that we cannot really know for sure that our calculations
(mean and standard deviation) are correct, since they are only
based on a limited number of tightenings. In fact, what we have
is an estimate of the real values. The more tightenings we have
on which to base our calculations, the more sure we can be that
we are close to the population mean and standard deviation.

Figure 7. It is impossible to measure the entire population. We
have to rely on a limited number of values, a sample or a
batch.
We say that the average value of the distribution is the
population mean (µ) and the scatter is represented by the
population standard deviation (σ). The population mean (µ) is
calculated by:

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$

that is, the sum of all tightenings divided by the total number
of tightenings (n).
The population standard deviation (σ) is calculated by:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{n}}$$

Where:
$x_i$ is the value of each individual occurrence, the ith
measurement of variable x
$n$ is the total number of occurrences in the population
$\sum_{i=1}^{n} x_i$ is the value of all occurrences added
together (the sum)
$\sum_{i=1}^{n}(x_i - \mu)^2$ is the sum of all values of
$(x_i - \mu)^2$
We take the value of each individual occurrence minus µ, the
mean, and square this new value. Then we add all these squared
values together. We now divide this by the number of
tightenings. Finally, we need to take the root of this total
value, as we have (Nm)² and need Nm, and we get the population
standard deviation. The square is there so that positive and
negative deviations from the mean do not cancel each other out,
and the root brings the result back to the original unit.
However, in practice it is very rare that we can measure every
occurrence of the data. In fact, n would then have to be
infinite, which of course is impossible. Instead we use a
representative sample to predict the mean and standard
deviation of the population.
Sample mean and standard deviation

We calculate the sample mean (x̄) in the same way as the
population mean (µ):

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
The calculation for the sample standard deviation (s) differs
slightly from that for the population standard deviation (σ):

$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$$

Where:
$x_i$ is the value of each individual occurrence in the sample
$n$ is the total number of occurrences in the sample
$\sum_{i=1}^{n} x_i$ is the sum of the values of all
occurrences in the sample
$\sum_{i=1}^{n}(x_i - \bar{x})^2$ is the sum of all values of
$(x_i - \bar{x})^2$
The use of (n – 1) instead of (n) gives a more accurate
estimate of the population standard deviation, σ, and is very
important when small sample sizes are used. So remember that we
can never use the total population in our calculations; that is
impossible. We have to use smaller samples and calculate
estimates of the real average and the real standard deviation.

Thus, the sample mean (x̄) is an estimation of the population
mean (µ).
The sample standard deviation (s) is an estimation of the
population standard deviation (σ).
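
To see the effect of dividing by (n – 1) instead of n, the two calculations can be compared on a small sample. The sketch below uses Python's standard `statistics` module; the six readings are invented for illustration.

```python
import statistics

# Hypothetical sample of torque readings in Nm (illustrative values only).
sample = [29.4, 30.1, 30.6, 29.8, 30.3, 29.9]

mean = statistics.mean(sample)
s = statistics.stdev(sample)         # sample formula, divides by (n - 1)
sigma_n = statistics.pstdev(sample)  # population formula, divides by n

print(f"mean           = {mean:.2f} Nm")
print(f"s (n - 1)      = {s:.3f} Nm")
print(f"sigma (n)      = {sigma_n:.3f} Nm")
```

For any sample with scatter, s comes out slightly larger than the value obtained with n in the denominator, and the difference shrinks as the sample grows.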
3. Accuracy requirements
In a tightening application there are often accuracy
requirements on the tools. Accuracy requirements are written as
a target torque +/– a maximal acceptable deviation from the
target, for example +/– 10%. The accuracy of a tool is often
calculated as 50% of the natural variation (that is, 3σ, half
of 6σ) divided by the target value. This makes it possible to
compare different tools at a certain target value, without
relating them to a certain application (tolerances). As you
will notice in the next chapters, the accuracy calculations are
similar to some capability calculations: in accuracy
calculations we compare the natural variation to the mean
value, in capability calculations we compare the natural
variation to the tolerance demands of the application.
If the accuracy requirements are 40 Nm +/– 10%, we have to
check that 3σ is within 10% of the mean, i.e. that
100 * 3σ/Ave is less than 10%. Assume that we test the tool and
achieve a mean value of 40 Nm and a standard deviation of
1.2 Nm. Then we calculate the accuracy:
100 * (3 * 1.2 / 40) = 9%. Since 9% is less than 10%, the tool
is accurate enough to do the job.
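
The accuracy check in this example can be written as a small helper. This is a sketch only; the function name `accuracy_percent` is an assumption made here, and the numbers are those of the 40 Nm example.

```python
def accuracy_percent(mean, sigma):
    """Accuracy as 100 * 3*sigma / mean, expressed in percent."""
    return 100 * 3 * sigma / mean


requirement = 10.0  # allowed deviation in percent (+/- 10 %)
acc = accuracy_percent(mean=40.0, sigma=1.2)

print(f"accuracy = {acc:.1f} %")
print("tool meets the requirement:", acc < requirement)
```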
3.1 Mean shift and combined scatter

Mean shift is what occurs when you run a tool on both hard and
soft joints. You will most probably get two different
distributions with two different mean values, normally a higher
value for the hard joint. The difference between these two mean
values is the mean shift. We want to find the limits
(comparable to the normal distribution) within which 99.7% of
the torque values fall, whether on the hard or the soft joint.
This is the combined scatter and corresponds to 6σ on the
normal distribution. Once we have the combined scatter we can
relate this to the combined average. This gives us something
that is often referred to as the “accuracy”.

Figure 8. The mean shift is the difference between the mean
values of the hard and the soft joints.
Written as a formula it will look like this:

$$\text{Accuracy} = 100 \times 0.5 \times \frac{(\text{Ave}_{hard} + 3\sigma_{hard}) - (\text{Ave}_{soft} - 3\sigma_{soft})}{\text{Ave}}$$

where $\text{Ave} = (\text{Ave}_{soft} + \text{Ave}_{hard})/2$
(the combined average).

Figure 9. Combined average and combined scatter.
This is normally true, but we cannot know for sure that the
distribution will look like this. We can, for example, have a
negative mean shift. We need to check which of the limits are
the outermost.
Adjusted, the formula would look like this:

$$\text{Accuracy} = 100 \times 0.5 \times \frac{\text{Deviation}}{\text{Ave}}$$

where

$$\text{Deviation} = \max(\text{Ave}_{hard} + 3\sigma_{hard},\ \text{Ave}_{soft} + 3\sigma_{soft}) - \min(\text{Ave}_{soft} - 3\sigma_{soft},\ \text{Ave}_{hard} - 3\sigma_{hard})$$

$$\text{Ave} = (\text{Ave}_{soft} + \text{Ave}_{hard})/2 \quad \text{(the combined average)}$$
Example:
Tests on a hard joint (30 degrees) and a soft joint (800
degrees) produced the following data.
Hard joint: Ave = 61 Nm and σ = 1.2 Nm
Soft joint: Ave = 60.2 Nm and σ = 1.0 Nm
Deviation = max (61+3*1.2, 60.2+3*1.0) – min (61-3*1.2,
60.2-3*1.0) = 64.6 – 57.2 = 7.4 Nm
Ave = (61+60.2)/2 = 60.6 Nm
Accuracy = 100*0.5*7.4/60.6 = 6.1%
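
The same calculation can be expressed as a short Python sketch that follows the adjusted formula term by term. The function name `combined_accuracy` is an assumption made for this illustration; the input values are those of the hard and soft joint example.

```python
def combined_accuracy(ave_hard, sigma_hard, ave_soft, sigma_soft):
    """Return (deviation, combined average, accuracy in percent)."""
    # Outermost upper and lower 3-sigma limits of the two distributions.
    upper = max(ave_hard + 3 * sigma_hard, ave_soft + 3 * sigma_soft)
    lower = min(ave_hard - 3 * sigma_hard, ave_soft - 3 * sigma_soft)
    deviation = upper - lower              # the combined scatter
    ave = (ave_hard + ave_soft) / 2        # the combined average
    return deviation, ave, 100 * 0.5 * deviation / ave


deviation, ave, accuracy = combined_accuracy(61.0, 1.2, 60.2, 1.0)
print(f"Deviation = {deviation:.1f} Nm")
print(f"Ave       = {ave:.1f} Nm")
print(f"Accuracy  = {accuracy:.1f} %")
```

Because the limits are chosen with max and min, the function gives the same result if the hard and soft joint data are swapped, which covers the case of a negative mean shift.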
It is hard to give a single estimate of the accuracy of a tool
because the accuracy differs:
• between hard, soft and combined applications.
• depending on whether the tool is used high up in its torque
range or in the lower part.
4. Understanding processes
Every organization produces something, whether it be products
or activities, and this is done in many different ways. But
what all organizations have in common is that the way they work
can be described as methods and activities.
A process is simply a structured set of activities designed to
produce a specified output for a particular customer or market.
It has a beginning, an end, and clearly identified inputs and
outputs. A process is therefore a structure for action, for how
work is done. Within the quality area, the process concept is
defined as “a set of activities, which are repeated in time,
for the purpose of creating value for a customer”. As you now
understand, the process approach implies adopting the customer
point of view. Processes also have performance dimensions, such
as cost, time, output quality and customer satisfaction. Bear
in mind that all of these dimensions can be measured and
improved.
Figure 10. A process is a set of activities designed to produce
an output for a customer or a market.
In a modern car plant, the production line is a typical
“operative process”, i.e. it creates value for the person
buying the car. Along the line, the cars are assembled with
different kinds of nutrunners, all with different
functionality, performance and reliability. In the assembly
process there are a lot of things that affect the outcome of
the tightening. The operators, the screws, the holes and many
other things affect the tightenings. All this contributes to
the total process variation for each application. Remember the
discussion about variation in section 2.1.
The dimensions with which we measure the performance of the
nutrunners are torque and sometimes angle. By using statistics,
we can analyze the performance of the process (tightenings) and
we can monitor, control and improve the assembly process. This
means, in the long run, more accurate tightenings, better and
safer cars and better value for the customers.

Figure 11. Industrial production is an operative process. A lot
of things contribute to the process variation.
5. Capability
Earlier in this pocket guide we talked about statistics and
accuracy. The accuracy of a tool tells us something about the
performance, but this is not enough. The important aspect for
our customers is how the tool performs in an application, on
the production line. So, somehow we have to relate the accuracy
of the tool to the application. Every joint has a target value,
but also some tolerance that is acceptable for the customer. By
relating the mean and the standard deviation to the target
value and the tolerance limits of an application we can tell
how a tool is performing where it really matters, in its
application. This is possible thanks to different capability
indices.
There are many different capability indices, some of them quite
simple and some of them more intricate. This pocket guide deals
with the most commonly used ones, the ones our customers use.
We know from before that a normal distribution is defined by
its mean and its standard deviation. We also remember our
assumption that all values, when the process is under control,
are within the 6σ limits, although only 99.7% really are. This
is called the process natural variation.
5.1 Cp
The first, and most commonly used capability index, is called
Cp. The formula for the Cp is:

$$C_p = \frac{\text{Tolerance interval}}{6\sigma} = \frac{HI - LO}{6\sigma}$$
If you look at the formula, you can see that it simply relates
the tolerance interval (HI-LO) to the process natural
variation! If we have a tool with a big spread, and an
application with very high demands (narrow tolerance limits),
we get a low Cp value. Conversely, if we have a tool with very
small spread (small σ), but very wide tolerance limits, we get
a high Cp. Of course this is what we want, because the smaller
the variation in relation to the tolerance limits, the lower
the risk of tightenings outside the tolerances. The Cp
requirements vary. The most common is that Cp has to be greater
than 1.33. This indicates that 6 times the standard deviation
covers no more than 75% of the tolerance interval.

Figure 12. When calculating Cp, the tolerance interval is
related to the 6σ.
But is this enough for us to tell if the tool is good or bad
for a specific application? Do we need something more? Yes. The
Cp does not consider whether the mean of the distribution is
close to the target value or not. This index does not guarantee
that the distribution lies in the middle of the tolerance
interval. In the picture below you can see the same tool on the
same application, but before and after torque adjustment. In
both cases we would have the same Cp. If we are off target, it
is possible that the tightenings are outside one of the
tolerance limits, even if the scatter is small in relation to
the tolerance interval (high Cp). So we need something more
that also relates the distribution to the target value.

Figure 13. High Cp does not guarantee that we are close to the
target value.
5.2 Cpk
The Cpk also relates the mean of the distribution to the target
value of the application. The way to do this is to divide the
distribution and the application into two different parts and
make one calculation for each side. The formula looks like
this:
Cpk = min [(HI – AVE) / 3σ , (AVE – LO) / 3σ]
First we relate the difference between the upper tolerance
limit and the average to half the natural variation (3σ). Then
we make another calculation, relating the difference between
the average and the lower tolerance limit to 3σ. We now have
two potentially different values, and the LOWER of the two is
the Cpk. If you think this is difficult, just take a few
minutes to think about this. If the average is higher than the
target value, then the difference between the upper tolerance
limit and the average is smaller than the difference between
the average and the lower tolerance limit. If this is the case,
the “upper calculation” will give us the Cpk, because we are
closer to the upper tolerance limit.

Figure 14. When calculating Cpk, also the target value is
considered.
What happens to the Cpk if we are right on target? Well, in
this case we are as close to the upper tolerance limit as to
the lower, and both calculations will give us the same result.
In this case, we can also see that the Cpk has the same value
as the Cp.
              Cp bad                     Cp good
Cpk bad       Process not capable.       Process capable but
              Change tool or adjust      average needs to be
              for good accuracy.         adjusted.
Cpk good      Not possible.              Process capable and
                                         well adjusted.

Figure 15. The relation between Cp and Cpk.
Now we have introduced the Cp and the Cpk. By studying the
formulas it is easy to see that Cp only relates the tolerance
interval to the process 6σ. Cpk also considers the target
value. We want both Cp and Cpk to be higher than 1.33. If our
average is right on target, the Cp and Cpk are the same. The
more off target we are, the bigger the difference between Cp
and Cpk. Obviously Cpk can never be higher than Cp.
5.3 When is a process capable?

The question of “how good is capable?” has still not been
definitively answered. Since Cp was first used, a Cp value of
1.33 has become the most commonly acceptable criterion as
a lower boundary. The Cpk requirements vary. The most
common is that Cpk has to be greater than 1.33. A process
that has a Cpk lower than 1.00 is never capable.
It is very important that you understand why we use both the Cp
and the Cpk. If we only use the Cp, we do not know whether we
are on target or not. If we only use the Cpk, we cannot know
whether a good or bad Cpk value is because of the centering of
the process or because of the spread. So we have to use both.
Together they can give us a very good indication of how well a
specific tool is performing in a specific application. They are
also the perfect way to compare different tools.
Look at the following dartboards:

Figure 16.
Dartboard 1: High Cp and low Cpk.
Dartboard 2: Low Cp and low Cpk.
Dartboard 3: High Cp and high Cpk.

The first dartboard shows a poorly centered process, but with a
low spread (high accuracy). In this case the Cp is high and the
Cpk low. On the second dartboard, the darts are spread randomly
around the bull’s eye, but the spread is quite large related to
the tolerances. Cp is probably not so good, but if the “mean
value” is on target, the Cpk has the same value as the Cp. The
third dartboard shows a well centered process, with high
accuracy. This means that both the Cp and Cpk are high; the
process is capable.
An example:
A joint should be tightened at 70 Nm ± 10 %, which gives the
tolerance limits LO = 63 Nm and HI = 77 Nm. A tool is tested
and we get an average of 71 Nm and a σ of 1.2 Nm.
Cp = (77-63) / (6*1.2) = 1.94
Cpk = min [ (77-71) / (3*1.2) , (71-63) / (3*1.2) ] =
min [ 1.67, 2.22 ] = 1.67
Both the Cp and Cpk values are greater than 1.33 and the
process is capable and does not need to be adjusted.
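
The two indices can be computed with a few lines of Python. The sketch below simply implements the Cp and Cpk formulas given in sections 5.1 and 5.2 and applies them to the 70 Nm example; the function names `cp` and `cpk` and the variable names are assumptions made for this illustration.

```python
def cp(hi, lo, sigma):
    """Tolerance interval divided by the natural variation (6 sigma)."""
    return (hi - lo) / (6 * sigma)


def cpk(hi, lo, ave, sigma):
    """The lower of the upper-side and lower-side calculations."""
    return min((hi - ave) / (3 * sigma), (ave - lo) / (3 * sigma))


# Values from the example: 70 Nm +/- 10 % gives LO = 63 Nm and HI = 77 Nm.
hi, lo = 77.0, 63.0
ave, sigma = 71.0, 1.2

cp_value = cp(hi, lo, sigma)
cpk_value = cpk(hi, lo, ave, sigma)
capable = cp_value > 1.33 and cpk_value > 1.33

print(f"Cp  = {cp_value:.2f}")
print(f"Cpk = {cpk_value:.2f}")
print("process capable:", capable)
```

Calling `cpk` with the average exactly on target (70 Nm) returns the same value as `cp`, which matches the statement that Cp and Cpk coincide for a centered process.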
5.4 Machine capability indices
As you now know, Cp and Cpk are process capability indices.
Everything that affects the process affects these indices. But
if we take away all variation affecting the assembly process,
except the variation in the tool itself, we get what are called
Machine Capability indices. This must be done under very
controlled circumstances, preferably in a tool crib. The tests
should be carried out on the same joint and by the same
operator (or even better, place the tool in a fixture in order
to get rid of all the operator influence). The calculations are
the same for Cm as for Cp, and the same for Cmk as for Cpk.

So remember, Cp and Cpk determine whether the process is
capable. The Cm and Cmk determine whether the machine (tool) is
capable.
5.5 What else is there to think about?

When you analyze the capability of a tool, the sample size is
of great importance in order to obtain reliable mean and
standard deviation calculations. A sample size of at least 25
is strongly recommended.

And remember that if someone says something like “I have a tool
that can always live up to a Cpk demand of 2.0”, there are two
alternatives:
1. He does not know what he is talking about, because it is
meaningless to talk about capability indices without relating
the tool performance to an application with customer demands
(tolerance limits)!
2. He knows what he is talking about and is trying to make the
tool look better than it really is.
6. Control charts
We have talked about statistics and accuracy, about processes
and capability. Now we are going to learn about control charts.
Statistics, tool performance and a production environment
(process variation) are important elements in understanding
this.
The control chart is an important tool within Statistical
Process Control. The idea is to repeatedly collect a number of
observations (samples) with a certain interval from the
process. With help from these observations (measurements) we
want to calculate some kind of quality indicator and plot it in
a diagram. The indicator normally used in the tightening
industry is subgroup mean and/or subgroup range.
Do you remember the difference between special and random
variation? If not, do go back and read the section again,
because this is very important. If the plotted quality
indicator is within the 6σ limits, we say that the process is
under statistical control; only random variation affects the
tightenings. When we use these limits in control charts, they
are called control limits. We also have an “ideal level”, a
target value marked between the control limits, and of course
it should be the same as our target value in the assembly
process. If some special variation enters the process, it can
affect the tightenings in two different ways; it can affect the
average of the tightenings, the spread or both.
We have the following requirements on a control chart:
• It should be possible to quickly detect systematic changes
in the process, enabling us to find sources of variation.
• It should be easy to use.
• The chance of getting a “false alarm” should be very
small (if we use the 6σ limits as control limits, the chance
is 0.3 %).
• It should be possible to know when the change started to
affect the process.
• It should prove that the process has been under control.
• It should be motivating and constantly bring attention to
variations in the process and to quality related issues.
6.1 X-bar charts
First we introduce a control chart for controlling the average
level of a certain unit. It can be the diameter of a bolt, or
the torque applied to a joint. It is called an x̄-chart (X-bar
chart), and when using it we plot the average of the
observations (measurements) in the diagram. At pre-defined
intervals we collect a number of measurements, a subgroup, from
the process. We then calculate the mean for each subgroup and
use this value as our quality indicator.
We know that the tightening applications can be described as a
normal distribution. We know that the mean and the standard
deviation help us to do that. We also know that all processes
vary over time, due to different kinds of variation, e.g.
material differences, operator influence etc. The 6σ limits
make it possible to tell whether the process variation is due
to random or special causes, so the control limits are normally
based on the 6σ limits, the natural variation of the process.
The procedure for plotting these charts is straightforward: the
relevant variable (in our case torque or angle) is measured at
regular intervals (maybe once every hour or once a day), and
typically a group of 5 consecutive readings is taken each time.
When the control limits are set, the x̄-values from each group
of readings can be plotted on the charts. When the assembly
process is under control (only random variation affects the
tightenings), the subgroup averages will spread randomly around
the overall mean.

Figure 17. We collect a number of measurements, a subgroup,
from the process, and plot the averages into the diagram.
6.2 The subgroup
Assume that the quality variable (in our case the tightenings)
we want to control has the average µ and standard deviation
σ when the process is under control. Remember that our
quality indicator is the subgroup mean, x̄. Ideally the individual
measurements and the subgroup averages have the same
mean value (see picture). But we can also see that the spread
between the individual measurements (σ) is bigger than
between the subgroup averages, which in fact is σ/√n, where n
is the number of measurements in each subgroup. So the
chance of detecting a deviation from µ is greater when we
study subgroups instead of individual measurements. So, in
fact, the control limits are normally set to (the subgroup
6σ-limits):
UCL = µ + 3σ/√n
LCL = µ – 3σ/√n
Estimated by:
UCL = x̿ + 3s/√n
LCL = x̿ – 3s/√n
Figure 18. The spread between individual measurements is
bigger than between subgroup averages.
But how big does the subgroup need to be? If you look at the
picture below you see that as we increase the size of the sub-
groups (n), the standard deviation does not decrease so much
when we go over 4 or 5. This explains why 4, 5 or 6 are very
common choices of subgroup sizes. Historically, a subgroup
of 5 is a very common choice.
Figure 19. Using a subgroup size of 5 is very common in the industry.
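The diminishing return from larger subgroups can be illustrated numerically. The sketch below tabulates σ/√n for subgroup sizes 1 to 10. The value σ = 1.0 is an arbitrary illustration, not a figure from the guide, and the function name is hypothetical.

```python
import math


def subgroup_mean_spread(sigma: float, n: int) -> float:
    """Standard deviation of the subgroup average for subgroup size n."""
    return sigma / math.sqrt(n)


sigma = 1.0  # illustrative process standard deviation (assumption)
spreads = {n: subgroup_mean_spread(sigma, n) for n in range(1, 11)}

for n, spread in spreads.items():
    print(f"n = {n:2d}   sigma/sqrt(n) = {spread:.3f}")
```

Going from n = 1 to n = 5 removes more than half of the spread, while going from n = 5 to n = 10 removes far less, which is in line with the common choice of 4, 5 or 6.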
6.3 Alarms
Now to the good stuff: what happens if something non-random
starts affecting the tightenings? What if the quality of
the screws suddenly deteriorates? Well, maybe it will affect
the mean of the subgroups. Maybe it will affect the spread
within the subgroups. Maybe the torque applied to the joints
will slowly decrease. All this can now be detected. The
beauty of control charts is that the quality engineer, or quite
often the operator, can pick up potential problems at an early
stage, before we get tightenings outside the tolerances and
before faulty assemblies are made.
The easiest way to detect that something non-random has
started to affect the process is when we get values outside the
control limits. This is an ALARM and we have to find out
what has happened immediately, before we get tightenings
outside the tolerance limits!
In Figure 20 you can see what a control chart CAN look like
when special variation starts affecting the assembly process.
The first two cases show “trend alarms”. Production can
continue during investigation. The fourth case is when the
overall mean (x̿) starts to deviate from the target value. We
have to find out why this has happened, but maybe an
adjustment of the tool is enough; this depends on the reason
for the change.
Figure 20. Examples of what control charts can look like
when systematic variation has entered the process.
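The simplest alarm described above, a subgroup average outside the control limits, can be sketched as follows. The limits and the list of subgroup averages are hypothetical illustration values, and the function name is not taken from the guide. Trend alarms would need additional rules that this guide does not specify, so they are left out.

```python
def out_of_control_points(subgroup_means, lcl, ucl):
    """Return (index, value) pairs for subgroup averages outside the control limits."""
    return [(i, m) for i, m in enumerate(subgroup_means) if m < lcl or m > ucl]


# Hypothetical subgroup averages; the last one lies above the upper limit.
means = [15.32, 15.22, 15.36, 15.20, 15.58]
alarms = out_of_control_points(means, lcl=15.05, ucl=15.50)
print(alarms)  # [(4, 15.58)]
```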
6.4 Range charts
To control the spread in the process we can use either the
standard deviation or the range within the subgroups. The
range (R) is the difference between the biggest and the
smallest value of each subgroup. The standard deviation is of
course based on all values within the subgroup, whereas the
range is only based on two. This means that the s-chart is
more reliable and gives us more information about the
spread. However, the range is easier to calculate and even
though we now have very good tools, which calculate
everything for us, the R-chart is still the most popular chart to use.
The range R helps us to estimate the spread of the subgroup.
This can be done with the aid of different divisors, which
can be found in manuals for statistical process control. If you
want the centerline to be R̄ (the average of the subgroup
ranges), the control limits for the control chart will be:
UCL = D4 * R̄
LCL = D3 * R̄
The R-chart indicates how the spread within the subgroups
develops. It makes it possible to detect when a systematic
change in the process affects the subgroup spread.
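As a rough illustration of the R-chart limits, the sketch below uses the four subgroups of five readings listed in Appendix A3. The constants D3 = 0 and D4 = 2.114 are the values commonly tabulated for subgroup size 5 in SPC manuals. They are not given in this guide, so treat them as an assumption and check them against your own reference table.

```python
# Constants for subgroup size n = 5 as commonly tabulated in SPC manuals (assumption).
D3_N5 = 0.0
D4_N5 = 2.114

# Torque readings (Nm) from the four subgroups in Appendix A3.
subgroups = [
    [15.4, 15.6, 15.4, 15.1, 15.1],
    [15.5, 15.0, 15.3, 15.2, 15.1],
    [15.5, 15.3, 15.4, 15.3, 15.3],
    [15.1, 15.2, 15.4, 15.1, 15.2],
]

# Range of each subgroup: biggest value minus smallest value.
ranges = [max(g) - min(g) for g in subgroups]

r_bar = sum(ranges) / len(ranges)  # centerline of the R-chart
ucl_r = D4_N5 * r_bar              # upper control limit
lcl_r = D3_N5 * r_bar              # lower control limit

print(f"R-bar = {r_bar:.3f}, UCL = {ucl_r:.3f}, LCL = {lcl_r:.3f}")
```

With only four subgroups the limits are far from reliable; the data are used here only to show the arithmetic.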
6.5 Control chart conclusion

The control limits should be based on a large and reliable
number of tightenings and they should be re-calculated, using
the actual production results, at regular intervals in order to
obtain reliable charts.
This chapter is only intended as an introduction to Process
Control charts and does not cover all aspects of these charts.
Summary
This guide explains the basics of statistics such as the
distribution, mean value and standard deviation. It also describes
how this can be related to an application by capability
calculations. The process can be monitored and controlled by
using SPC, and this is also described and explained.
This pocket guide does not explain all aspects or the full
potential of statistics. This is an introduction to the subject,
and if there is a need for further studies we recommend you
refer to specialist literature.
The different product offerings that Atlas Copco can supply
to help customers utilise the potential of statistics in
production are not explained in this guide. If you need to discuss
Atlas Copco’s product range, please contact your local Atlas
Copco sales representative.
Appendix
A1. Example of basic statistics calculation

The following example will help you to understand the basics
of statistics. In this example we compare the torque levels of
two different tools. You then might obtain the torque values
shown below (in Nm). Target torque is 10 Nm.
Atlas Copco tool    Other tool
10                  10
10.1                11
10.2                9
9.7                 8
10.0                12
10.2                10
10.1                9
9.7                 12
9.8                 8
10.2                11
Which one of these tools is the most accurate? To answer
this, we first calculate the mean value of the two series. The
mean value gives us an average of all values received from
the different tightenings and we use the symbol x̄. The mean
value is calculated by adding all tightening data, xi, and
dividing by the number of tightenings, n:
x̄ = (x1 + x2 + ... + xn) / n = Σ xi / n
                 Atlas Copco tool    Other tool
Mean value, x̄    10                  10
Both tools have a mean value of 10. If one tool were to have
a mean value of 15, we would know that that tool is not as
good as the one hitting the target torque. Do both tools have
the same accuracy? Accuracy tells us how well a tool hits
the target. It is the degree to which an indicated value
matches the actual value of a measured variable.
How do we now see the difference? Let us look at the range
of the values of the two tools. The range, R, tells us between
which values we have received our tightenings, and is
calculated as the difference between the highest and the lowest
value in the series:
R = xmax – xmin
                 Atlas Copco tool    Other tool
Range, R         0.5                 4
With the Atlas Copco tool, our tightening values differ by 0.5
Nm between highest and lowest value, while the other tool
has a range of 4 Nm. But if you perform 1000 tightenings
with the Atlas Copco tool and get one value totally out of the
range, e.g. 5, the range for the Atlas Copco tool becomes
about 5 Nm (10.2 – 5 = 5.2 if the highest value is still 10.2).
Then the Atlas Copco tool becomes the bad one. We have to
find a function that reduces the influence of that one
tightening.
The standard deviation is a statistical index of variability,
which describes the deviation and tells us the average
difference between the value of a specific variable and some
desired value, usually a process set point. Let us calculate the
deviation for each value received and sum them up.

Atlas Copco tool            Other tool
Torque    xi – x̄           Torque    xi – x̄
10        0                 10        0
10.1      0.1               11        1
10.2      0.2               9         -1
9.7       -0.3              8         -2
10.0      0                 12        2
10.2      0.2               10        0
10.1      0.1               9         -1
9.7       -0.3              12        2
9.8       -0.2              8         -2
10.2      0.2               11        1
x̄ = 10   Σ(xi – x̄) = 0     x̄ = 10   Σ(xi – x̄) = 0
The result is 0 for both tools. What is it that causes a problem
in this case? We have both positive and negative values. We
need to take away the minus, to get the absolute values of
each deviation. To mathematically take away the minus, we
can square each value.
Atlas Copco tool                      Other tool
Torque   xi – x̄   (xi – x̄)²          Torque   xi – x̄   (xi – x̄)²
10       0         0                  10       0         0
10.1     0.1       0.01               11       1         1
10.2     0.2       0.04               9        -1        1
9.7      -0.3      0.09               8        -2        4
10.0     0         0                  12       2         4
10.2     0.2       0.04               10       0         0
10.1     0.1       0.01               9        -1        1
9.7      -0.3      0.09               12       2         4
9.8      -0.2      0.04               8        -2        4
10.2     0.2       0.04               11       1         1
Atlas Copco tool: x̄ = 10, Σ(xi – x̄) = 0, Σ(xi – x̄)² = 0.36
Other tool:       x̄ = 10, Σ(xi – x̄) = 0, Σ(xi – x̄)² = 20
Now we have a value in Nm² to compare with. But what
does this value tell us? It tells us something about deviation.
This value depends on the number of tightenings. What we
do is to divide this sum by the number of tightenings minus 1
(n – 1) to get an average. We then take the square root of the
result to get the value back to Nm:
s = √( Σ(xi – x̄)² / (n – 1) )
Atlas Copco tool: s = √(0.36 / 9) = 0.2 Nm
Other tool:       s = √(20 / 9) = 1.5 Nm
What we now have done is to calculate the sample standard
deviation. Standard deviation is a way of measuring how well
the tool performs, how close we are to the expected value.
Now we can see a clear difference. The Atlas Copco tool has
a standard deviation of 0.2 Nm from the target, while the
other tool has a standard deviation of 1.5 Nm.
So what this example tells us is that although both tools have
the same mean value, the first tool is more accurate. The
different tightenings are closer to the target value and the
standard deviation is a way for us to prove this.
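The calculations in this example can be reproduced with Python's standard library. The sketch below is only an illustration: the variable and function names are made up, and `statistics.stdev` is used because it divides by n – 1, which matches the sample standard deviation described above.

```python
from statistics import mean, stdev

# Torque readings (Nm) for the two tools in this example.
atlas_copco_tool = [10.0, 10.1, 10.2, 9.7, 10.0, 10.2, 10.1, 9.7, 9.8, 10.2]
other_tool = [10, 11, 9, 8, 12, 10, 9, 12, 8, 11]


def summarize(torques):
    """Return mean, range and sample standard deviation of a series."""
    return {
        "mean": mean(torques),
        "range": max(torques) - min(torques),
        "stdev": stdev(torques),  # sample standard deviation, divides by n - 1
    }


atlas = summarize(atlas_copco_tool)
other = summarize(other_tool)

for name, result in (("Atlas Copco tool", atlas), ("Other tool", other)):
    print(f"{name}: mean = {result['mean']:.2f}, "
          f"range = {result['range']:.2f}, s = {result['stdev']:.2f}")
```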
A2. Example of capability calculation
We know that the capability of a tool is how the tool is
performing in a specific application. So what we do when
calculating capability indices is to relate the tool accuracy (mean
value and standard deviation) to the demands on the
application (target value and tolerance limits).
Let us assume that we have an application with target value
15 Nm, and tolerances +/– 8%. This means that the upper
tolerance limit is 16.2 Nm and the lower limit is 13.8 Nm.
We have collected 20 tightening results from one tool, on the
production line:
15.4
15.6
15.4
15.1
15.1
15.5
15.0
15.3
15.2
15.1
15.5
15.3
15.4
15.3
15.3
15.1
15.2
15.4
15.1
15.2
It is now easy to calculate the mean value and standard
deviation:
x̄ = Σ xi / n = 15.275 Nm
s = √( Σ(xi – x̄)² / (n – 1) ) = 0.165 Nm
It is now easy to calculate Cp and Cpk:
Cp = (HI – LO) / 6σ = (16.2 – 13.8) / (6*0.165) = 2.42
Cpk = min [(HI – AVE) / 3σ , (AVE – LO) / 3σ] =
min [(16.2 – 15.275) / (3*0.165) , (15.275 – 13.8) / (3*0.165)] =
min [1.87 , 2.98] = 1.87
Both the Cp and Cpk values are greater than 1.33, so the
process is capable and does not necessarily need to be
adjusted, even though the average is slightly off target.
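The capability calculation above can be sketched in Python as follows. The variable names are illustrative, and the sample standard deviation of the 20 readings is used as the estimate of σ, as in the example.

```python
from statistics import mean, stdev

# The 20 tightening results (Nm) collected from the production line.
torques = [
    15.4, 15.6, 15.4, 15.1, 15.1, 15.5, 15.0, 15.3, 15.2, 15.1,
    15.5, 15.3, 15.4, 15.3, 15.3, 15.1, 15.2, 15.4, 15.1, 15.2,
]

target = 15.0     # Nm
tolerance = 0.08  # +/- 8 %

hi = target * (1 + tolerance)  # upper tolerance limit, 16.2 Nm
lo = target * (1 - tolerance)  # lower tolerance limit, 13.8 Nm

ave = mean(torques)
s = stdev(torques)  # sample standard deviation, used as estimate of sigma

cp = (hi - lo) / (6 * s)
cpk = min((hi - ave) / (3 * s), (ave - lo) / (3 * s))

print(f"AVE = {ave:.3f} Nm, s = {s:.3f} Nm, Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```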
A3. Example of control chart calculation

Now we want to create a control chart from the same
tightenings as in the previous example. Let us assume that we are
starting up a production process after it has been stopped for
some time. Then we do not really know the mean value µ and
the standard deviation σ. In order to calculate the control
limits for the control chart, the calculations must be based on
a reliable number of tightenings. A good rule of thumb is to
collect at least 20-25 subgroups before calculating the control
limits for a control chart. The reason for this is that at least
20 subgroups are needed for us to be able to say whether the
process is under control or not. However, in this example we
have simplified things and collected only 4 subgroups.
Let us assume that we have collected these results on 4
different occasions. We have set the subgroup size to 5, so we
have collected 5 results on each occasion:
Day 1    15.4   15.6   15.4   15.1   15.1
Day 2    15.5   15.0   15.3   15.2   15.1
Day 3    15.5   15.3   15.4   15.3   15.3
Day 4    15.1   15.2   15.4   15.1   15.2
The first thing we do is to calculate the average for every
subgroup:
Day 1: x̄ = 15.32
Day 2: x̄ = 15.22
Day 3: x̄ = 15.36
Day 4: x̄ = 15.20
When the production process is under control, the target
value is the same as the overall average value. It is easy to
calculate the overall average (x̿) = 15.275. We know from
before that the control limits are based on the natural
variation between the average values of the subgroups.
UCL = x̿ + 3s / √n = 15.275 + (3*0.165 / √5) = 15.275 + 0.22 = 15.50
LCL = x̿ – 3s / √n = 15.275 – (3*0.165 / √5) = 15.275 – 0.22 = 15.05
Now we can create our control chart. We use the overall
average as centerline and also mark the control limits in the
chart. Now we can plot the subgroup averages in the chart.
As we can see they are all within the control limits and the
production is under control (even though some individual
tightening values are outside the limits). Remember that the
limits are based on the variation between the subgroup
averages, not the individual tightenings.
From now on it is easy to plot a new subgroup average into
the chart every day. As long as the plotted values are spread
randomly around the centerline, the process is under control.
Figure 21. The process is under control when the subgroup
averages spread randomly around the overall mean.
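The control chart calculation in this example can be sketched as below. It follows the simplified approach used here, where s is the sample standard deviation of all 20 readings. The variable names are illustrative only.

```python
import math
from statistics import mean, stdev

# Torque readings (Nm) for the four subgroups in this example.
subgroups = {
    "Day 1": [15.4, 15.6, 15.4, 15.1, 15.1],
    "Day 2": [15.5, 15.0, 15.3, 15.2, 15.1],
    "Day 3": [15.5, 15.3, 15.4, 15.3, 15.3],
    "Day 4": [15.1, 15.2, 15.4, 15.1, 15.2],
}
n = 5  # subgroup size

subgroup_means = {day: mean(values) for day, values in subgroups.items()}

all_values = [x for values in subgroups.values() for x in values]
grand_mean = mean(all_values)  # overall average, the centerline
s = stdev(all_values)          # sample standard deviation of all readings

half_width = 3 * s / math.sqrt(n)
ucl = grand_mean + half_width
lcl = grand_mean - half_width

# The process counts as under control when every subgroup average is inside the limits.
in_control = all(lcl <= m <= ucl for m in subgroup_means.values())

print(f"Centerline = {grand_mean:.3f}, UCL = {ucl:.2f}, LCL = {lcl:.2f}")
print("Under control:", in_control)
```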
A4. Analysis of assembly tool performance –
ISO 5393 Calculation
To allow us to assess the performance of different tools and
to compare one tool with another, there is an international
standard (ISO 5393), which sets out a basic test procedure
and analysis of results. Based on this, many motor vehicle
manufacturers have developed their own certification
standards.
As an example we will assume we have tested a tool
according to the procedure stated in ISO 5393.
On a hard joint with the tool set at the highest torque setting
the following results were obtained (in Nm).
31.5 33.2 32.6 33.7 31.4 32.5 33.1 31.2 33.5 32.6 33.1
31.0 32.3 33.2 32.4 31.5 33.5 33.3 31.5 32.6 31.3 33.7
33.0 31.8 33.0
We now calculate the values required to analyze the tool’s
tightening accuracy as described in ISO 5393, using the data
for the hard joint at the highest torque setting.
Mean torque (x̄)
= (31.5 + 33.2 + 32.6 + 33.7 + .... + 33.0) / 25
= 32.5 Nm
Range
= 33.7 – 31.0
= 2.7 Nm
Standard deviation (s)
= 0.863 Nm
6 sigma (6s) torque scatter
= 6 x 0.863 = 5.18 Nm
6s scatter as a percentage of mean torque
= (5.18 / 32.5) x 100
= 15.93 %
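These single-condition figures can be reproduced with a short script. This is a sketch of the arithmetic shown above, not an implementation of the ISO 5393 procedure itself. The variable names are illustrative.

```python
from statistics import mean, stdev

# The 25 results (Nm) for the hard joint at the highest torque setting.
hard_joint_high = [
    31.5, 33.2, 32.6, 33.7, 31.4, 32.5, 33.1, 31.2, 33.5, 32.6, 33.1,
    31.0, 32.3, 33.2, 32.4, 31.5, 33.5, 33.3, 31.5, 32.6, 31.3, 33.7,
    33.0, 31.8, 33.0,
]

x_bar = mean(hard_joint_high)                            # mean torque
torque_range = max(hard_joint_high) - min(hard_joint_high)
s = stdev(hard_joint_high)                               # sample standard deviation
six_s = 6 * s                                            # 6s torque scatter
six_s_percent = six_s / x_bar * 100                      # scatter as % of mean torque

print(f"mean = {x_bar:.1f} Nm, range = {torque_range:.1f} Nm, s = {s:.3f} Nm")
print(f"6s scatter = {six_s:.2f} Nm ({six_s_percent:.2f} % of mean torque)")
```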
Now let us assume that for the same tool we calculated the
following values for the data collected at the other torque
settings and joint conditions described in ISO 5393.
For the higher torque setting on the soft joint:
A mean of 31.95 Nm and a standard deviation of 0.795 Nm.
For the lower torque setting on the hard joint:
A mean of 23.72 Nm and a standard deviation of 0.892 Nm.
For the lower torque setting on the soft joint:
A mean of 22.87 Nm and a standard deviation of 0.801 Nm.
We can now make the following calculations for
the higher torque setting:
a = Mean hard joint + 3s hard joint
b = Mean soft joint + 3s soft joint
c = Mean hard joint – 3s hard joint
d = Mean soft joint – 3s soft joint
a = 32.50 + (3 x 0.863) = 35.09
b = 31.95 + (3 x 0.795) = 34.34
c = 32.50 – (3 x 0.863) = 29.91
d = 31.95 – (3 x 0.795) = 29.56
Combined mean torque
(35.09 + 29.56) / 2 = 32.33 Nm
Mean shift
32.50 – 31.95 = 0.55 Nm
Combined torque scatter
35.09 – 29.56 = 5.53 Nm
Combined torque scatter as a % of combined mean
(5.53 / 32.33) x 100 = 17.1 %
Lower torque setting
a = Mean hard joint + 3s hard joint
b = Mean soft joint + 3s soft joint
c = Mean hard joint – 3s hard joint
d = Mean soft joint – 3s soft joint
a = 23.72 + (3 x 0.892) = 26.40
b = 22.87 + (3 x 0.801) = 25.27
c = 23.72 – (3 x 0.892) = 21.04
d = 22.87 – (3 x 0.801) = 20.47
Combined mean torque
(26.40 + 20.47) / 2 = 23.44 Nm
Mean shift
23.72 – 22.87 = 0.85 Nm
Combined torque scatter
26.40 – 20.47 = 5.93 Nm
Combined torque scatter as a % of combined mean
(5.93 / 23.44) x 100 = 25.3 %
Tool Capability is 25.3 %, as the greatest torque scatter was
at the lower torque setting.
This particular tool will tighten 99.7 % of all practical joints
to within ± 13 % of its pre-set torque value (half of the
25.3 % combined scatter, i.e. 99.7 % of results will fall
within ± 3s of the mean).
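The combined scatter calculation can be sketched as a small function. It takes the highest of the two upper 3s limits and the lowest of the two lower 3s limits, which in this example are a and d. The function and key names are illustrative and not taken from ISO 5393. Intermediate values are not rounded, so results can differ in the last digit from the hand calculation.

```python
def combined_scatter(mean_hard, s_hard, mean_soft, s_soft):
    """Combine hard- and soft-joint results for one torque setting."""
    upper = max(mean_hard + 3 * s_hard, mean_soft + 3 * s_soft)  # highest of a, b
    lower = min(mean_hard - 3 * s_hard, mean_soft - 3 * s_soft)  # lowest of c, d
    combined_mean = (upper + lower) / 2
    scatter = upper - lower
    return {
        "combined_mean": combined_mean,
        "mean_shift": mean_hard - mean_soft,
        "scatter": scatter,
        "scatter_percent": scatter / combined_mean * 100,
    }


high = combined_scatter(32.50, 0.863, 31.95, 0.795)  # higher torque setting
low = combined_scatter(23.72, 0.892, 22.87, 0.801)   # lower torque setting

# The tool capability is the larger of the two combined scatter percentages.
tool_capability = max(high["scatter_percent"], low["scatter_percent"])

print(f"Higher setting: {high['scatter_percent']:.1f} %")
print(f"Lower setting:  {low['scatter_percent']:.1f} %")
print(f"Tool capability: {tool_capability:.1f} %")
```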
Atlas Copco Pocket Guides
Title Ordering No.
Air line distribution 9833 1266 01
Air motors 9833 9067 01
Drilling with hand-held machines 9833 8554 01
Grinding 9833 8641 01
Percussive tools 9833 1003 01
Pulse tools 9833 1225 01
Riveting technique 9833 1124 01
Screwdriving 9833 1007 01
Statistical analysis technique 9833 8637 01
The art of ergonomics 9833 8587 01
Tightening technique 9833 8648 01
Vibrations in grinders 9833 9017 01
www.atlascopco.com
9833 8637 01 Recyclable paper. Jetlag 2003:1. Printed in Sweden
