Measurements
T.A. Spedding
Nanyang Technological University, Singapore, and
P.L. Rawlings
Renishaw Metrology Limited, Gloucestershire, UK
Received March 1993
Revised July 1993
Introduction
In recent years the introduction of Statistical Process Control (SPC) in
manufacturing has provided a major boost to ensuring the quality of the final
product, by monitoring the variability of the process through each stage of
production. Several computerized SPC packages have made this task easier,
allowing data to be rapidly entered into the computer so that they may be
stored and analysed and the results output to the inspector. One of the underlying
assumptions of SPC is the use of the Gaussian distribution. Such assumptions
are implicit in the construction of control charts and process capability studies.
It has long been realized that the variability associated with many
engineering processes does not have a Gaussian distribution. In continuous
batch manufacture the normality assumption is often justified, but the
distribution of the process variation is more critical when considering the
sample sizes associated with small batch manufacture. Several SPC packages
provide indications of non-normality such as skewness, kurtosis and χ²
statistics, but none offers an interpretation of the results or suggests an
alternative approach if the process is found to be non-Gaussian.
This article addresses the problem of non-normality by using a general
system of distributions, so that once the distribution of the variability of the
engineering process has been identified it may be fitted to the data.
Use is made of the Johnson System of Distributions, which provides the
capability of characterizing a wide range of engineering processes of differing
distributions and also facilitates transformations to and from normality. This
means that traditional statistical techniques based on the Gaussian distribution
may be applied and then converted, using translation techniques, to the
appropriate Johnson distribution for the specified characterization. Thus
percentage points, which would otherwise have required a more complex analysis,
may easily be generated by using the appropriate computer algorithm.
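The translation idea can be illustrated with a minimal sketch for one member of the Johnson system, the unbounded SU family, in which z = γ + δ sinh⁻¹((x – ξ)/λ) is a standard Gaussian variable. The parameter values below are hypothetical and are not fitted to any data in this article; fitting the parameters from the sample moments is a separate step that is not shown, and the bounded (SB) and lognormal (SL) families use different translation functions.

```python
import math
from statistics import NormalDist


def su_to_normal(x, gamma, delta, xi, lam):
    """Johnson SU translation: map an observation x to a standard normal deviate z."""
    return gamma + delta * math.asinh((x - xi) / lam)


def normal_to_su(z, gamma, delta, xi, lam):
    """Inverse translation: map a standard normal deviate z back to the SU scale."""
    return xi + lam * math.sinh((z - gamma) / delta)


def su_percentage_point(p, gamma, delta, xi, lam):
    """Percentage point of the SU distribution obtained via the Gaussian quantile."""
    z = NormalDist().inv_cdf(p)
    return normal_to_su(z, gamma, delta, xi, lam)


# Hypothetical SU parameters, chosen only for illustration.
params = dict(gamma=-1.0, delta=2.0, xi=0.0, lam=1.0)

# Points corresponding to -3 and +3 standard deviations of a Gaussian variable.
lower = su_percentage_point(NormalDist().cdf(-3.0), **params)
upper = su_percentage_point(NormalDist().cdf(3.0), **params)
print(f"lower = {lower:.3f}, upper = {upper:.3f}")
```

Because the Gaussian quantile is computed first and then translated, any percentage point of the fitted Johnson curve costs one call to the inverse transformation.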
Figure 1. Skewness vs Sample Size
[Plot: skewness (vertical axis, 0 to 1.4) against sample size (horizontal axis, 0 to 50) for populations with skewness, kurtosis of 0.9, 2.5; 1, 3; 1.25, 4; 2, 7; and 2.5, 10, with a reference line for the upper 1 per cent point for skewness]
Non-normality in Statistical Process Control
Figure 2. Kurtosis vs Sample Size
[Plot: kurtosis (vertical axis, 2.5 to 5.5) against sample size (horizontal axis, 0 to 50) for populations with skewness, kurtosis of 0.9, 2.5; 1, 3; 1.25, 4; 2, 7; and 2.5, 10, with a reference line for the upper 1 per cent point for kurtosis]
Table I. Population

Population   Mean   Standard deviation   Skewness   Kurtosis
1            0      1                    0.9        2.5
2            0      1                    1.0        3.0
3            0      1                    1.25       4.0
4            0      1                    2.0        7.0
5            0      1                    2.5        10.0
Sample Size
A Gaussian distribution has zero skewness and a kurtosis of three, hence if the
properties of the Central Limit Theorem are to be used to obtain a Gaussian
distribution of the sample means, the shape statistics should quickly assume
their Gaussian values with increasing sample size. The graphs presented in
Figures 1 and 2 show the tendency for the sample skewness and kurtosis to move
towards 0 and 3 respectively with increasing sample size for the populations shown in Table I,
although the picture is rather confused by the sampling variation
associated with each statistic. A clear trend can be identified for the
relationship between skewness and sample size. For reference, a line
representing the upper one per cent point is shown for the distribution of
sample skewness for samples of increasing size from a Gaussian population.
This was obtained from Table 34B of Biometrika Tables for Statisticians,
Volume 1[10] which presents the percentage points as a test for departure
from normality.
IJQRM 11,6
There is no clearly discernible trend for the graph of kurtosis against sample
size, Figure 2, except for the population distributions with the highest value of
kurtosis (i.e. distributions 4 and 5). Again for reference a line representing the
upper one per cent point for the distribution of sample kurtosis for samples
from a Gaussian population is shown. This was obtained from Table 34C of
Biometrika Tables for Statisticians, Volume 1[10]. The distributions can be seen
to move towards normality at sample sizes of around 15. However, the
distributions do not assume a stable level of kurtosis after this point, merely
varying around three.
This variability can be explained if the method of computation for the
kurtosis is considered, i.e. kurtosis is calculated as the fourth central moment of
a distribution normalized by the fourth power of the standard deviation. Hence any
non-normality in the sample distribution will be shown as a measurable deviation in the
sample kurtosis. While this indicates that kurtosis is a sensitive measure of
non-normality, it also shows that it is very sensitive to outliers and to inaccuracies
in the estimated shape of the distribution due to insufficient data points. In addition,
it should be noted that kurtosis is a biased estimator and that the bias decreases with
increased sample size, so that a trend in the data could be masked by this effect.
Skewness will suffer from the same sources of variability as kurtosis, but not
to the same extent, as it is calculated as a third-order statistic. For this reason
the method used to assess the efficiency of the Central Limit Theorem in these
experiments was based upon the tendency of the distribution’s skewness to
move towards zero.
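As an illustration of the computation described above, the following minimal sketch evaluates the moment-based skewness and kurtosis of a sample and shows how a single outlier moves both statistics. The function name and the data values are hypothetical; the divisor n (rather than n – 1) is assumed, which gives the simple biased moment estimators.

```python
def shape_statistics(data):
    """Return (skewness, kurtosis) from the second, third and fourth central moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # variance (divisor n)
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5   # normalized by the cube of the standard deviation
    kurtosis = m4 / m2 ** 2     # normalized by the fourth power of the standard deviation
    return skewness, kurtosis


# Hypothetical symmetric sample, then the same sample with one outlier added.
clean = [-2.0, -1.0, 0.0, 1.0, 2.0]
with_outlier = clean + [10.0]

skew_clean, kurt_clean = shape_statistics(clean)
skew_out, kurt_out = shape_statistics(with_outlier)
print(f"clean:        skewness={skew_clean:.3f}, kurtosis={kurt_clean:.3f}")
print(f"with outlier: skewness={skew_out:.3f}, kurtosis={kurt_out:.3f}")
```

The deviations are raised to the fourth power for kurtosis but only to the third for skewness, which is why a single extreme value disturbs the kurtosis more strongly.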
Figure 1 illustrates a clear tendency for the distributions to move towards
normality as the sample size increases. A set of curves was fitted through the
data points using a simple exponential regression model and this has resulted
in the family of curves shown in Figure 3. It is evident from these curves
that a sample size of 30 is required to be sure of achieving a Gaussian sampling
distribution for reasonably well behaved population distributions such as 1 and
2; otherwise sample sizes of between 30 and 50 are required.
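The tendency of the distribution of sample means to lose its skewness can be reproduced with a small simulation. This is a sketch under stated assumptions and not the authors' experiment: a gamma population with shape 4 (population skewness 1) stands in for the Johnson-defined populations of Table I, the number of trials and the seed are arbitrary, and the exponential regression used for Figure 3 is not reproduced. For independent observations the skewness of the mean of n values equals the population skewness divided by √n, which is printed for comparison.

```python
import random


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def skewness_of_sample_means(n, trials=5000, shape=4.0, seed=1):
    """Skewness of the means of `trials` samples of size n from a gamma population."""
    rng = random.Random(seed)
    means = [
        sum(rng.gammavariate(shape, 1.0) for _ in range(n)) / n
        for _ in range(trials)
    ]
    return skewness(means)


# A gamma population with shape 4 has skewness 2 / sqrt(4) = 1.
results = {n: skewness_of_sample_means(n) for n in (1, 4, 10, 30, 50)}
for n, simulated in results.items():
    theoretical = 1.0 / n ** 0.5  # skewness of the mean of n independent values
    print(f"n={n:2d}  simulated={simulated:.3f}  theoretical={theoretical:.3f}")
```

The simulated values scatter around the theoretical ones because the sample skewness is itself subject to sampling variation, which is the effect that confuses the picture at small sample sizes.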
It is possible to quantify the error associated with assuming normality by
application of the Pearson system of distributions. Taking the example of
distribution 2, i.e. skewness = 1, kurtosis = 3, the sampling distribution takes
on the approximate shape with skewness = 0.2, kurtosis = 3.2 for a sample size
of 30. The upper and lower 99 percentiles for this distribution are 2.81 and –2.44
respectively, compared to the symmetrical 99 percentiles of ±2.58 for the
Gaussian distribution. Thus it can be seen that the assumption of normality
will tend to increase the average run length for the lower process limit (i.e. the
process mean can move 0.14 before there is any possibility of detecting the shift)
and increase the number of false out-of-control signals above the upper limit.
By interpolation within the table of Pearson distributions[2] the increased rate
of false out-of-control signals can be shown to be of the order of 0.5 per cent.
This error would be negligible in the practical situation.
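The arithmetic behind the quoted shift can be checked with a short sketch. It assumes that the "99 percentiles" are the limits enclosing the central 99 per cent of the distribution (0.5 per cent in each tail), which is consistent with the Gaussian value of ±2.58; the lower limits used are the figures quoted in the text for sample sizes of 30 and four, and the function name is hypothetical.

```python
from statistics import NormalDist

# Gaussian limit leaving 0.5 per cent in the upper tail, rounded as in the text.
gaussian_limit = round(NormalDist().inv_cdf(0.995), 2)


def undetected_shift(true_lower_limit):
    """Distance between the assumed Gaussian lower limit and the true lower limit."""
    return round(gaussian_limit - abs(true_lower_limit), 2)


print(gaussian_limit)            # Gaussian limit used for comparison
print(undetected_shift(-2.44))   # sample size of 30
print(undetected_shift(-1.95))   # sample size of four
```

The same subtraction applied to the upper limits shows how far the true upper limit lies beyond the assumed Gaussian one, which is the region that generates the additional false out-of-control signals.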
[Plot: skewness (vertical axis); legend "Skewness and kurtosis": 2.5, 10; 2, 7; 1.25, 4; 1, 3]
The sampling distribution for a sample size of four, i.e. skewness = 0.6, kurtosis =
3.4, has upper and lower 99 percentiles of 3.09 and –1.95 respectively, resulting
in an increased rate of false out-of-control signals of the order of 1 per cent and a shift
in the process mean of 0.63.
The above calculations are examples of the likely errors associated with
assuming normality of the distribution of sample means. It can be seen that the
errors are relatively small, but for situations where tight control is required the
quality engineer should be aware of their possible magnitude.
It can be seen that the true limits for the distribution of sample standard
deviations are generally far wider than those set using the χ² distribution, and
the sampling distribution of means still exhibits significant skewness for a
sample size of 25. Therefore, the statistical process control practitioner should
be aware of this when analysing the process control charts.
It is interesting to note that if sufficient computing power were available it
would be possible to characterize the moments of sampling distributions for a
wide range of non-Gaussian processes using the above techniques. Thus, if a
good estimate of the population distribution were possible, limits could be
prescribed for the sampling distribution of the means and standard deviation.
The main problem of estimating the distribution of the process from the
sample statistics such as skewness and kurtosis is the sampling variation
associated with these parameters. It is very difficult to obtain reliable estimates
of the process skewness and kurtosis from small samples. In these cases
significant errors could occur. As a better system cannot be suggested, it is
recommended that the standard charts be used, but when an out-of-control
condition is registered, a second sample should be taken immediately after the
deviant sample to determine whether a shift in the process has really occurred.
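The second-sample recommendation can be sketched as a simple decision rule. The rule below is one possible reading and is an assumption: a shift is treated as confirmed only when the immediate follow-up sample also falls outside the control limits. The function names and the limits in the usage lines are hypothetical.

```python
def outside_limits(sample_mean, lcl, ucl):
    """True when a sample mean falls outside the lower or upper control limit."""
    return sample_mean < lcl or sample_mean > ucl


def shift_confirmed(first_mean, second_mean, lcl, ucl):
    """Confirm a shift only if the follow-up sample is also out of control."""
    return outside_limits(first_mean, lcl, ucl) and outside_limits(second_mean, lcl, ucl)


def joint_false_alarm_rate(alpha):
    """Chance that two independent in-control samples both signal, each with rate alpha."""
    return alpha ** 2


# Hypothetical standardized limits and sample means.
print(shift_confirmed(3.1, 0.2, lcl=-2.58, ucl=2.58))   # isolated signal, not confirmed
print(shift_confirmed(3.1, 2.9, lcl=-2.58, ucl=2.58))   # signal repeated, confirmed
print(joint_false_alarm_rate(0.01))
```

Requiring a repeat signal lowers the false alarm rate sharply when the samples are independent, at the cost of a one-sample delay in reacting to a genuine shift.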
Process Capability
Even though the Johnson system is impracticable for process control it has
great potential for the determination of process capability. This is because
process capability is calculated using the standard deviation of the individuals
rather than the sample averages and is based upon, typically, sample sizes of
100. A problem with using the skewness and kurtosis statistics is that they are
not independent. The relationship between sample skewness and kurtosis has
been investigated[11,12] by producing sets of bivariate plots. Inspection of these
plots shows that the greatest density lies within a well defined central area, so
the sample statistic is most likely to be representative of the population. Hence,
the Johnson distribution is more likely than the Gaussian distribution to describe
the distribution accurately. The following method, based on the Ford Guide to
the use of Control Charts[9], is proposed as a revision to statistical process
control practice:
(1) Calculate the process mean, standard deviation, skewness and kurtosis
based upon at least 100 individuals.
(2) Calculate $Z_{USL} = (USL - \bar{\bar{X}})/s$ and $Z_{LSL} = (LSL - \bar{\bar{X}})/s$, where USL and LSL
are the process upper and lower bilateral tolerances.
(3) Calculate the upper and lower Johnson distribution limits $A_1$ and $A_2$ for
±3s using the values from (1).
(4) Calculate the upper and lower Cpk indices, $Z_{USL}/A_1$ and $Z_{LSL}/A_2$, and
quote the smaller of the two.
The above procedure has been used to calculate the Cpk indices for the parent
distribution, i.e. mean 0, standard deviation 1, skewness 2 and kurtosis 7, used in
Table II. The following calculations present a comparison of the Cpk values
obtained first by assuming normality and second by using the Johnson system
of transformations.
The upper and lower bilateral tolerances have been set at ±1.6 for the
purposes of comparison, so that under the assumption of normality:
Cpk = 0.533.
Using the Johnson System of Transformations, the limits at ±3s are 4.56 and
–0.801, hence:
Cpk (USL) = 0.351, and Cpk (LSL) = 1.998.
It can be seen from the above calculations that significant errors can be made
by assuming normality, while the Johnson system enables a more accurate
calculation of process capability without the need for the practitioner to make,
or test, any assumptions of normality.
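The worked comparison can be reproduced with a short sketch. The tolerances (±1.6), the process mean and standard deviation (0 and 1) and the Johnson limits at ±3s (4.56 and –0.801) are the values quoted in the text; the function names are hypothetical, and the fitting of the Johnson curve that produces the two limits is not shown.

```python
def cpk_normal(usl, lsl, mean, s):
    """Cpk under the Gaussian assumption, where the limits at +/-3s are +/-3."""
    z_usl = (usl - mean) / s
    z_lsl = (lsl - mean) / s
    return min(z_usl / 3.0, z_lsl / -3.0)


def cpk_johnson(usl, lsl, mean, s, a1, a2):
    """Upper and lower Cpk using Johnson limits a1 and a2 in place of +3 and -3."""
    z_usl = (usl - mean) / s
    z_lsl = (lsl - mean) / s
    upper = z_usl / a1
    lower = z_lsl / a2
    return upper, lower, min(upper, lower)


normal_cpk = cpk_normal(usl=1.6, lsl=-1.6, mean=0.0, s=1.0)
upper_cpk, lower_cpk, quoted_cpk = cpk_johnson(
    usl=1.6, lsl=-1.6, mean=0.0, s=1.0, a1=4.56, a2=-0.801
)
print(f"Gaussian assumption: Cpk = {normal_cpk:.3f}")
print(f"Johnson: Cpk (USL) = {upper_cpk:.3f}, Cpk (LSL) = {lower_cpk:.3f}")
print(f"Quoted (smaller) Johnson Cpk = {quoted_cpk:.3f}")
```

Because the smaller of the two indices is quoted, the long upper tail of this skewed parent distribution determines the reported capability.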
Discussion
The use of the Johnson System of Translatory Distributions and a general
approach to characterizing the variation of engineering processes have been
illustrated. The use of such techniques for the construction of control charts for
processes of known distribution is a more accurate and reliable method than
traditional methods. For small batch manufacture where the distribution is
unknown, there could be significant errors due to the inherent sampling
variation associated with statistics such as skewness and kurtosis which are
needed to estimate Johnson distribution curves. In these cases it is
recommended that standard practices are adopted. It has been shown that
significant errors occur when estimating the process capability of non-Gaussian
processes by assuming the Gaussian distribution. In these cases the use of a
generalized system of distributions such as the Johnson System has significant
advantages over the traditional methods in terms of accuracy and reliability.
The Pearson System of Distributions has been adopted in several recent
software packages for SPC. The Johnson System of Distributions is
computationally more efficient because of the facility to translate to and from
the Gaussian distribution. This also has the advantage of being able to apply
standard Gaussian techniques such as the calculation of percentage points
without reference to other forms of analysis.
Conclusion
This article has highlighted many of the limitations inherent in traditional
statistical process control theory. It has been shown that the assumption of a
Gaussian distribution of sample means for samples of size four, from non-Gaussian
populations, is valid for reasonably behaved population distributions. But the
characterization of the distribution of sample standard deviation from
non-Gaussian populations using the χ² distribution is less accurate. Under these
circumstances a second sample should be taken immediately after an
out-of-control signal is generated to confirm that the process variance has altered.
Significantly, for non-Gaussian distributions the statistical process control
practitioner should be aware that control limits can only be a guide, and for close
tolerance work 100 per cent inspection is required to ensure conformance.
An alternative method of assessing process capability has been developed,
based on the Johnson translatory system of distributions. Its use leads to a more
accurate assessment of process capability. The method has been computerized
and is easy to use.
It is hoped that this article has contributed to the awareness of the problem of
non-normality in statistical process control, and illustrated the errors associated
with non-normality, so that the quality practitioner has a greater understanding
of the significance of his results. A second article by the authors will address the
problem of correlation in statistical process control measurements.
References
1. Elderton, W.P. and Johnson, N.L., Systems of Frequency Curves, Cambridge University
Press, Cambridge, 1969.
2. Pearson, E.S. and Hartley, H.O. (Eds), Biometrika Tables for Statisticians, Vol. 2, Charles
Griffin and Co., London, 1976.
3. Johnson, N.L., “Systems of Frequency Curves Generated by Methods of Translation”,
Biometrika, Vol. 36, 1949, pp. 149-76.
4. Hill, I.D., Hill, R. and Holder, R.L., “Fitting Johnson Curves by Moments”, Algorithm AS99,
Applied Statistics, Vol. 25, 1976, pp. 180-9.
5. Hill, I.D., “Normal-Johnson and Johnson-Normal Transformations”, Algorithm AS100,
Applied Statistics, Vol. 25, 1976, pp. 190-2.
6. Shewhart, W.A., Economic Control of Quality of Manufactured Product, Van Nostrand,
London, 1931.
7. British Standard (BS) 2564, Control Chart Technique when Manufacturing to a
Specification with Special Reference to Articles Machined to Dimensional Tolerances,
British Standard Institution, 1988.
8. Chatfield, C., Statistics for Technology, Chapman & Hall, London, 1976.
9. Continuing Process Control and Process Capability Improvement. A Guide to the Use of Control
Charts for Improving Quality and Productivity for Company, Supplier and Dealer Activities,
The Statistical Methods Office Operations Support Staffs, Ford Motor Company, 1987.
10. Pearson, E.S. and Hartley, H.O. (Eds), Biometrika Tables for Statisticians, Vol. 1, Charles
Griffin and Co., London, 1976.
11. Shenton, L.R. and Bowman, K.O., “A Bivariate Model for the Distribution of √b1 and √b2”,
Journal American Statistical Association, Vol. 72 No. 357, 1977, pp. 206-11.
12. Spedding, T.A., “The Machined Surface – Statistics and Characterization”, PhD thesis,
Coventry Polytechnic, 1983.