
Non-normality in Statistical Process Control Measurements
T.A. Spedding
Nanyang Technological University, Singapore, and
P.L. Rawlings
Renishaw Metrology Limited, Gloucestershire, UK
Received March 1993
Revised July 1993

Introduction
In recent years the introduction of Statistical Process Control (SPC) in
manufacturing has provided a major boost to ensuring the quality of the final
product, by monitoring the variability of the process through each stage of
production. Several computerized SPC packages have made this task easier,
allowing data to be rapidly entered into the computer so that they may be
stored and analysed, and the results output to the inspector. One of the underlying
assumptions of SPC is the use of the Gaussian distribution. Such assumptions
are implicit in the construction of control charts and process capability studies.
It has long been realized that the variability associated with many
engineering processes does not have a Gaussian distribution. In continuous
batch manufacture the normality assumption is often justified, but the
distribution of the process variation is more critical when considering the
sample sizes associated with small batch manufacture. Several SPC packages
provide indications of non-normality, such as skewness, kurtosis and χ²
statistics, but none offers an interpretation of the results or suggests an
alternative approach if the process is found to be non-Gaussian.
This article addresses the problem of non-normality by using a general
system of distributions, so that once the distribution of the variability of the
engineering process has been identified it may be fitted to the data.
Use is made of the Johnson System of Distributions, which provides the
capability of characterizing a wide range of engineering processes of differing
distributions and also facilitates transformations to and from normality. This
means that traditional statistical techniques based on the Gaussian distribution
may be applied and then converted, using translation techniques, to the
appropriate Johnson distribution for the specified characterization. Thus
percentage points which otherwise would have meant a more complex analysis
may easily be generated by using the appropriate computer algorithm.

International Journal of Quality & Reliability Management, Vol. 11 No. 6, 1994, pp. 27-37, © MCB University Press, 0265-671X

Systems of Distributions
Distributions such as the Gaussian distribution are defined as one particular
shape. In terms of skewness and kurtosis they have particular set values.
A system of distributions may be developed to cover a whole range of
skewness and kurtosis values and so a whole range of distribution shapes.
The Pearson System of Distributions was developed by Karl Pearson in the
early 1900s[1] and covers a wide range of skewness and kurtosis values
encompassing many standard types of distributions such as Gaussian, Beta,
Gamma etc. The particular distribution from the system may be identified and
fitted using the first four sample moments (“method of moments”), that is, the
mean, standard deviation, and the moment coefficients of skewness and
kurtosis. Once a particular type of Pearson distribution has been identified, the
standard characteristics of that particular distribution may be used to model
the data. Thus the 99 and 99.5 percentiles may be obtained for any distribution
of data given the first four sample moments. These percentage points are
tabulated in Biometrika Tables for Statisticians, Volume 2[2], for a wide range of
skewness and kurtosis values. The Pearson distribution has been used in some
SPC software packages recently as an alternative approach to process
capability analysis.
An alternative, translatory system of distributions was suggested by N.L.
Johnson in 1949[3]. The particular Johnson distribution is identified and fitted
from the data values using the “method of moments” in a similar way to the
Pearson distribution. Once identified a particular Johnson distribution may be
used to characterize the data. The Johnson system however also facilitates a
transformation to and from normality. This has the advantage that a particular
Johnson distribution may be fitted to a set of data points. Standard statistical
methods using, for example, the mean and standard deviation, and based on the
Gaussian distribution may be transformed to the particular distribution.
Using this approach it is unnecessary to rework standard statistical methods
based on the Gaussian distribution for non-Gaussian populations. For example,
to fit 99 per cent confidence limits to the data the following sequence could be
adopted:
(1) Identify the particular Johnson distribution based on the sample
moments of the data.
(2) Fit the appropriate Johnson distribution and so obtain the parameters for
that Johnson distribution.
(3) Determine the confidence limits based on the standard Gaussian
distribution. For example, for the 99 per cent confidence limits these are
±2.58sd.
(4) Transform these values to the fitted Johnson distribution using the
parameters obtained in (2) and hence obtain the 99 per cent confidence
limits for the Johnson distribution.
This procedure may be easily computerized. A computer program has been
developed using algorithms presented by Hill et al.[4,5] which identify and fit
the Johnson distributions. This could be developed into a fully automatic
approach for an online SPC application. Note that when using this technique
it is unnecessary to determine whether or not the engineering process under
consideration is non-Gaussian. If the process is identified and found to be
exactly Gaussian then no transformation is necessary; this is accommodated in
the transformation system. The effect of the transformation on distributions
which are identified as being close to normality will be small, so the confidence
limits will be close to those of the Gaussian distribution anyway. In cases where
a computer is not available it is possible to use the tables presented in
Biometrika by using the nearest values for the sample skewness and kurtosis
values which are tabulated.
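
The translation step of the sequence described above can be sketched as follows. This is a minimal illustration, not the authors' program and not a transcription of Algorithms AS99/AS100: it assumes the usual Johnson parameterization (gamma, delta, xi, lam) and that the family and parameters have already been obtained by a separate fitting step. The function names and the parameter values in the usage lines are hypothetical.

```python
import math


def johnson_from_normal(z, family, gamma, delta, xi=0.0, lam=1.0):
    """Translate a standard Gaussian deviate z to the Johnson variate x."""
    u = (z - gamma) / delta
    if family == "SU":  # unbounded system
        return xi + lam * math.sinh(u)
    if family == "SB":  # bounded system, xi < x < xi + lam
        return xi + lam / (1.0 + math.exp(-u))
    if family == "SL":  # lognormal system
        return xi + lam * math.exp(u)
    if family == "SN":  # Gaussian: only a linear rescaling is needed
        return xi + lam * u
    raise ValueError("unknown Johnson family: " + family)


def johnson_to_normal(x, family, gamma, delta, xi=0.0, lam=1.0):
    """Translate a Johnson variate x back to a standard Gaussian deviate."""
    y = (x - xi) / lam
    if family == "SU":
        return gamma + delta * math.asinh(y)
    if family == "SB":
        return gamma + delta * math.log(y / (1.0 - y))
    if family == "SL":
        return gamma + delta * math.log(y)
    if family == "SN":
        return gamma + delta * y
    raise ValueError("unknown Johnson family: " + family)


# Hypothetical fitted parameters for a right-skewed process (illustration only).
params = dict(family="SU", gamma=-1.0, delta=1.5, xi=0.0, lam=1.0)
lower = johnson_from_normal(-2.58, **params)  # Gaussian lower 99 per cent limit
upper = johnson_from_normal(2.58, **params)   # Gaussian upper 99 per cent limit
print(lower, upper)
```

Because the Gaussian limits are symmetric and the translation is monotonic, the translated limits are asymmetric whenever the fitted distribution is skewed.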

An Investigation into the Effect of Non-normality for Control Charts
It was noted by Shewhart[6] that the distribution of many individual
engineering measurements is non-Gaussian, although the distribution of
sample means, of sample size four, will, in many cases, follow the Gaussian
distribution, as predicted by the central limit theorem. Shewhart arrived at this
conclusion on the basis of a set of experiments involving random drawings
from populations with rectangular and right-triangular distributions. However,
these distributions do not represent very significant departures from normality
compared to those found in many engineering processes. More extreme
population distributions may require larger sample sizes to achieve a Gaussian
distribution of sample means. When a quality engineer formulates a statistical
process control plan he needs to know this information to arrive at an equitable
balance of inspection cost and statistical accuracy.
The setting of control limits for process dispersion is even less well defined
than for location, i.e. sample variances are distributed according to the chi-
squared distribution when the population distribution is Gaussian, but for
every non-Gaussian population distribution the sampling distribution of
variance must be characterized individually. However, the British Standard[7]
and many other sources, advocate setting control limits based upon the chi-
squared distribution even when the population is non-Gaussian. Again the
quality engineer requires guidance on the likely errors associated with this
approximation. Hence, a systematic investigation of the relationship between
sample size, sampling distribution and population distribution is required.
To investigate the effects of sampling from non-Gaussian distributions, a
non-Gaussian random number generator is required. One method of achieving
this is to use a Gaussian random number generator and then to transform the
deviates to the required non-Gaussian distribution, as characterized by the
sample moments, using the Johnson system of transformations.
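
A generator of this kind might be sketched as below. It is only an illustration under stated assumptions: the unbounded (SU) member of the Johnson system is used, the parameters gamma, delta, xi and lam are taken as already known, and the function name and the values in the usage lines are hypothetical rather than taken from the article.

```python
import math
import random


def johnson_su_deviates(n, gamma, delta, xi=0.0, lam=1.0, seed=None):
    """Return n non-Gaussian deviates by translating Gaussian deviates.

    Each standard Gaussian deviate z is mapped through the Johnson SU
    translation x = xi + lam * sinh((z - gamma) / delta).
    """
    rng = random.Random(seed)
    return [
        xi + lam * math.sinh((rng.gauss(0.0, 1.0) - gamma) / delta)
        for _ in range(n)
    ]


# Hypothetical parameters giving a right-skewed population.
deviates = johnson_su_deviates(20000, gamma=-1.0, delta=1.5, seed=42)
mean = sum(deviates) / len(deviates)
median = sorted(deviates)[len(deviates) // 2]
print(mean, median)  # the mean lies above the median for a right-skewed sample
```

Fixing the seed makes a simulation run repeatable, which is convenient when comparing sampling distributions across sample sizes.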
A Gaussian distribution has zero skew and kurtosis of three, hence if the
properties of the central limit theorem are to be used to obtain a Gaussian
distribution of the sample means or the χ² distribution for the standard
deviation then the shape statistics should quickly assume their Gaussian values
with increasing sample size.

The Sampling Distribution of the Mean


Sampling distributions are characterized by their location, scale and shape, the
most common statistics used to quantify them being the mean, standard
deviation, skewness and kurtosis. Statistical process control theory is based on
the distribution of sample means following the standardized Gaussian
distribution (i.e. characterized as having a mean of zero, standard deviation
one, skewness zero and kurtosis three), as stated by the Central Limit Theorem.
However, for non-Gaussian processes the distribution of sample means
asymptotically approaches normality with increased sample size. Statistical
textbooks [e.g. 8] suggest that the Central Limit Theorem should not be relied
on for sample sizes of less than 30 from non-Gaussian populations. Usually
however, no advice is given concerning the accuracy of the Central Limit
Theorem with respect to the sample size and the degree of non-normality.
Shewhart contended that a sample size of four is adequate for normalizing the
data, on the basis of sampling experiments from rectangular and
right-triangular distributions, and this sample size has subsequently been endorsed
by the British Standards Institution[7] and the Ford Guide to Quality Control[9].
To determine the validity of this contention a series of computerized
statistical simulations was performed for various non-Gaussian populations
using various sample sizes. The results of the simulations are shown in Figures
1 and 2.

Figure 1. Skewness vs Sample Size
[Plot not reproduced: skewness (vertical axis, 0 to 1.4) against sample size (horizontal axis, 0 to 50) for populations with (skewness, kurtosis) of (0.9, 2.5), (1, 3), (1.25, 4), (2, 7) and (2.5, 10), with a reference line for the upper 1 per cent point for skewness.]

Figure 2. Kurtosis vs Sample Size
[Plot not reproduced: kurtosis (vertical axis, 2.5 to 5.5) against sample size (horizontal axis, 0 to 50) for populations with (skewness, kurtosis) of (0.9, 2.5), (1, 3), (1.25, 4), (2, 7) and (2.5, 10), with a reference line for the upper 1 per cent point for kurtosis.]

No.   Mean   SD   Skew   Kurt

1     0      1    0.9    2.5
2     0      1    1.0    3.0
3     0      1    1.25   4.0
4     0      1    2.0    7.0
5     0      1    2.5    10.0

Table I. Population Sample Size

A Gaussian distribution has zero skewness and a kurtosis of three, hence if the
properties of the Central Limit Theorem are to be used to obtain a Gaussian
distribution of the sample means, the shape statistics should quickly assume
their Gaussian values with increasing sample size. The graphs presented in
Figures 1 and 2 show the tendency for the sample shape statistics to move
towards 0, 3 with increasing sample size for the populations shown in Table I,
although the picture is rather confused, due to the sampling variation
associated with each statistic. A clear trend can be identified for the
relationship between skewness and sample size. For reference, a line
representing the upper one per cent point is shown for the distribution of
sample skewness for samples of increasing size from a Gaussian population.
This was obtained from Table 34B of Biometrika Tables for Statisticians,
Volume 1[10] which presents the percentage points as a test for departure
from normality.
There is no clearly discernible trend for the graph of kurtosis against sample
size, Figure 2, except for the population distributions with the highest value of
kurtosis (i.e. distributions 4 and 5). Again for reference a line representing the
upper one per cent point for the distribution of sample kurtosis for samples
from a Gaussian population is shown. This was obtained from Table 34C of
Biometrika Tables for Statisticians, Volume 1[10]. The distributions can be seen
to move towards normality at sample sizes of around 15. However, the
distributions do not assume a stable level of kurtosis after this point, merely
varying around three.
This variability can be explained if the method of computation for the
kurtosis is considered, i.e. kurtosis is calculated as the fourth central moment of
a distribution normalized by the fourth power of the standard deviation. Hence any non-normality
in the sample distribution will be shown as a measurable deviation in the
sample kurtosis. While this indicates that kurtosis is a sensitive measure of
non-normality, it also shows that it is very sensitive to outliers and inaccuracies
in the shape distribution due to insufficient data points. In addition, it should be
noted that kurtosis is a biased estimator and that the bias decreases with
increased sample size so that a trend in the data could be masked by this effect.
Skewness will suffer from the same sources of variability as kurtosis, but not
to the same extent, as it is calculated as a third order statistic. For this reason
the method used to assess the efficiency of the Central Limit Theorem in these
experiments was based upon the tendency of the distribution’s skewness to
move towards zero.
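
The moment coefficients discussed above can be computed as in the following sketch. It assumes the plain moment definitions with divisor n (skewness as the third central moment over the cube of the standard deviation, kurtosis as the fourth central moment over its fourth power); the article does not state whether any small-sample corrections were applied, and the function name is hypothetical.

```python
import math


def shape_statistics(data):
    """Return (mean, sd, skewness, kurtosis) from the first four sample moments."""
    n = len(data)
    mean = sum(data) / n
    # Central moments with divisor n (moment definitions, no bias correction).
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return mean, math.sqrt(m2), skewness, kurtosis


# The fourth power makes kurtosis react strongly to a single outlier.
print(shape_statistics([1, 2, 3, 4, 5]))
print(shape_statistics([1, 2, 3, 4, 5, 50]))
```

Raising deviations to the fourth power rather than the third is what makes kurtosis the more volatile of the two shape statistics in small samples.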
Figure 1 illustrates a clear tendency for the distributions to move towards
normality as the sample size increases. A set of curves was fitted through the
data points using a simple exponential regression model and this has resulted
in the family of curves shown in Figure 3. It is evident from these sets of curves
that a sample size of 30 is required to be sure of achieving a Gaussian sampling
distribution for reasonably well behaved population distributions such as 1 and
2, otherwise sample sizes of between 30 and 50 are required.
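
The article does not give the form of the exponential regression model. One common choice, assumed here purely for illustration, is skewness = a·exp(b·n), fitted by ordinary least squares on the logarithm of the skewness. The function name and the data in the usage lines are hypothetical and are not read from Figure 1.

```python
import math


def fit_exponential(sizes, values):
    """Fit values ~ a * exp(b * size) by least squares on log(values).

    All values must be positive for the logarithm to exist.
    """
    logs = [math.log(v) for v in values]
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, logs))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Hypothetical skewness values of the sampling distribution at each sample size.
sizes = [4, 10, 20, 30, 40, 50]
skews = [0.60, 0.42, 0.28, 0.20, 0.12, 0.09]
a, b = fit_exponential(sizes, skews)
print(a, b)  # b is negative when skewness decays with sample size
```

Fitting on the log scale weights relative rather than absolute errors, which suits data that shrink towards zero.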
It is possible to quantify the error associated in assuming normality by
application of the Pearson system of distributions. Taking the example of
distribution 2, i.e. skewness = 1, kurtosis = 3, the sampling distribution takes
on the approximate shape with skewness = 0.2, kurtosis = 3.2 for a sample size
of 30. The upper and lower 99 percentiles for this distribution are 2.81 and -2.44
respectively, compared to the symmetrical 99 percentiles of ±2.58 for the
Gaussian distribution. Thus it can be seen that the assumption of normality
will tend to increase the average run length for the lower process limit (i.e. the
process mean can move 0.14 before there is any possibility of detecting the shift)
and increase the number of false out-of-control signals above the upper limit.
By interpolation within the table of Pearson distributions[2] the increased rate
of false out-of-control signals can be shown to be of the order of 0.5 per cent.
This error would be negligible in the practical situation.

Figure 3. Skewness against Sample Size
[Plot not reproduced: fitted curves of skewness (vertical axis, 0.1 to 1.3) against sample size (horizontal axis, 0 to 40 and beyond) for populations with (skewness, kurtosis) of (2.5, 10), (2, 7), (1.25, 4), (1, 3), (0.9, 2.5) and (0.495, 3.27).]


The sampling distribution for a sample size of four, i.e. skewness = 0.6, kurtosis =
3.4, has upper and lower 99 percentiles of 3.09 and –1.95 respectively, resulting
in an increased rate of false out-of-control signals of the order of 1 per cent and a shift
in the process mean of 0.63.
The above calculations are examples of the likely errors associated with
assuming normality of the distribution of sample means. It can be seen that the
errors are relatively small, but for situations where tight control is required the
quality engineer should be aware of their possible magnitude.

Process Limits for Non-Gaussian Processes –


To illustrate the effects of non-normality on the process control limits for X̄ and
sd charts a series of simulations was performed, similar to those presented in
the section above, but extended to include the sampling distribution of the
standard deviation.
The investigation is based on the simulated sample statistics for a population
with a mean of 0, standard deviation 1, skewness 2, and kurtosis 7 for various
sample sizes using a trial size of 10,000. This distribution was chosen because
it represents the distribution of individuals produced when 30 diameters were
turned on a computer numerically controlled (CNC) lathe.
The process control limits are shown in Table II for both the averages and
standard deviation charts using the Gaussian and χ2 distributions respectively
and then using the relevant Johnson distribution for comparison.

              Traditional chart                    Johnson chart
     Sample   X̄                 s(χ²)              X̄                 s
No.  size     UCL     LCL       UCL     LCL        UCL     LCL       UCL     LCL

1    4        1.542   –1.521    1.623   –          2.048   –0.798    2.317   0.028
2    5        1.332   –1.314    1.576   –          1.701   –0.746    2.179   0.052
3    6        1.229   –1.216    1.578   0.024      1.507   –0.735    2.081   0.063
4    7        1.148   –1.142    1.542   0.097      1.518   –0.733    2.037   0.074
5    8        1.049   –1.039    1.585   0.162      1.287   –0.692    1.884   0.140
6    9        0.999   –0.999    1.506   0.204      1.229   –0.663    1.917   0.112
7    10       0.948   –0.945    1.499   0.248      1.169   –0.644    1.884   0.140
8    15       0.778   –0.776    1.434   0.390      0.960   –0.624    1.750   0.180
9    20       0.669   –0.676    1.394   0.477      0.821   –0.548    1.669   0.244
10   25       0.600   –0.600    1.361   0.536      0.714   –0.501    1.611   0.288
11   30       0.550   –0.544    1.331   0.588      0.640   –0.465    1.634   0.292
12   35       0.507   –0.502    1.314   0.620      0.583   –0.434    1.593   0.347
13   40       0.476   –0.473    1.293   0.644      0.516   –0.401    1.567   0.400
14   45       0.441   –0.444    1.277   0.663      0.509   –0.384    1.528   0.438
15   50       0.431   –0.426    1.271   0.684      0.501   –0.368    1.506   0.461

Note: UCL = Upper control limit; LCL = Lower control limit

Table II. Process Control Limits

It can be seen that the true limits for the distribution of sample standard
deviations are generally far wider than those set using the χ2 distribution, and
the sampling distribution of means still exhibits significant skewness for a
sample size of 25. Therefore, the statistical process control practitioner should
be aware of this when analysing the process control charts.
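
A simulation of this kind can be sketched as below. It is not the authors' program. Two assumptions are made and should be read as such: the population is a standardized exponential distribution (mean 0, standard deviation 1, skewness 2, but kurtosis 9 rather than 7), used as a convenient stand-in for the Johnson-generated population, and the limits are taken at the 0.1 per cent tail points, since the article does not state the probability level behind Table II. The function name is hypothetical.

```python
import math
import random
import statistics


def simulate_limits(sample_size, trials=10_000, tail=0.001, seed=1):
    """Empirical control limits for sample means and standard deviations.

    Population: standardized exponential (a stand-in assumption, see above).
    Limits are read off the sorted simulated statistics at the given tail.
    """
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(trials):
        sample = [rng.expovariate(1.0) - 1.0 for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
        sds.append(statistics.stdev(sample))
    means.sort()
    sds.sort()
    lo = int(tail * trials)   # index of the lower tail point
    hi = trials - 1 - lo      # index of the upper tail point
    return {"mean": (means[lo], means[hi]), "sd": (sds[lo], sds[hi])}


n = 4
limits = simulate_limits(n)
gaussian_limit = 3.09 / math.sqrt(n)  # Gaussian 0.1 per cent point for the mean
print(limits["mean"], (-gaussian_limit, gaussian_limit))
print(limits["sd"])
```

For a right-skewed population the simulated upper limit for the mean lies outside the symmetric Gaussian limit and the lower limit lies inside it, the same pattern as the Johnson chart columns of Table II.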
It is interesting to note that if sufficient computing power were available it
would be possible to characterize the moments of sampling distributions for a
wide range of non-Gaussian processes using the above techniques. Thus, if a
good estimate of the population distribution were possible, limits could be
prescribed for the sampling distribution of the means and standard deviation.
The main problem of estimating the distribution of the process from the
sample statistics such as skewness and kurtosis is the sampling variation
associated with these parameters. It is very difficult to obtain reliable estimates
of the process skewness and kurtosis from small samples. In these cases
significant errors could occur. As a better system cannot be suggested it is
recommended that the standard charts be used, but when an out-of-control
condition is registered, a second sample should be taken immediately after the
deviant sample to determine whether a shift in the process has really occurred.
Process Capability
Even though the Johnson system is impracticable for process control it has
great potential for the determination of process capability. This is because
process capability is calculated using the standard deviation of the individuals
rather than the sample averages and is based upon, typically, sample sizes of
100. A problem with using the skewness and kurtosis statistics is that they are
not independent. The relationship between sample skewness and kurtosis has
been investigated[11,12] by producing sets of bivariate plots. Inspection of these
plots shows that the greatest density lies within a well defined central area; thus
the sample statistic is most likely to be representative of the population. Hence,
the Johnson distribution is more likely to describe accurately the distribution
than the Gaussian distribution, and the following method is proposed as a
revision to statistical process control practice, and is based on the Ford Guide to
the use of Control Charts[9]:
(1) Calculate the process mean, standard deviation, skewness and kurtosis
based upon at least 100 individuals.
(2) Calculate $Z_{USL} = (USL - \bar{\bar{X}})/s$ and $Z_{LSL} = (LSL - \bar{\bar{X}})/s$, where USL and LSL
are the process upper and lower bilateral tolerances.
(3) Calculate the upper and lower Johnson distribution limits $A_1$ and $A_2$ for
±3s using the values from (1).
(4) Calculate the upper and lower $C_{pk}$ indices, $Z_{USL}/A_1$ and $Z_{LSL}/A_2$, and
quote the smaller of the two.
The above procedure has been used to calculate the Cpk indices for the parent
distribution, i.e. mean 0, standard deviation 1, skewness 2 and kurtosis 7, used in
Table II. The following calculations present a comparison of the Cpk values
obtained first by assuming normality and second by using the Johnson system
of transformations.
The upper and lower bilateral tolerances have been set at ±1.6 for the
purposes of comparison, so that under the assumption of normality:
Cpk = 0.533.
Using the Johnson System of Transformations, the limits at ±3s are 4.56 and
–0.801, hence:
Cpk (USL) = 0.351, and Cpk (LSL) = 1.998.
It can be seen from the above calculations that significant errors can be made,
by assuming normality, while the Johnson system enables a more accurate
calculation of process capability without the need for the practitioner to make,
or test, any assumptions of normality.
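
The arithmetic of the comparison above can be checked with a short sketch. The tolerances (±1.6), the parent mean and standard deviation (0 and 1) and the Johnson limits at ±3s (4.56 and –0.801) are the values quoted in the text; the function names are hypothetical, and the Johnson limits are taken as given rather than computed.

```python
def cpk_gaussian(usl, lsl, mean, sd):
    """Cpk under the assumption of normality: nearest tolerance over 3 sd."""
    z_usl = (usl - mean) / sd
    z_lsl = (lsl - mean) / sd
    return min(z_usl, -z_lsl) / 3.0


def cpk_johnson(usl, lsl, mean, sd, a1, a2):
    """Upper and lower Cpk using Johnson limits a1 (upper) and a2 (lower).

    a1 and a2 are the translated +3s and -3s limits in standardized units,
    so a2 is negative and z_lsl / a2 is positive.
    """
    z_usl = (usl - mean) / sd
    z_lsl = (lsl - mean) / sd
    upper = z_usl / a1
    lower = z_lsl / a2
    return upper, lower, min(upper, lower)


print(round(cpk_gaussian(1.6, -1.6, 0.0, 1.0), 3))
upper, lower, quoted = cpk_johnson(1.6, -1.6, 0.0, 1.0, a1=4.56, a2=-0.801)
print(round(upper, 3), round(lower, 3), round(quoted, 3))
```

The quoted index is the upper one, reflecting the long right-hand tail of the parent distribution.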

Discussion
The use of the Johnson System of Translatory Distributions and a general
approach to characterizing the variation of engineering processes have been
illustrated. The use of such techniques for the construction of control charts for
processes of known distribution is a more accurate and reliable method than
traditional methods. For small batch manufacture where the distribution is
unknown, there could be significant errors due to the inherent sampling
variation associated with statistics such as skewness and kurtosis which are
needed to estimate Johnson distribution curves. In these cases it is
recommended that standard practices are adopted. It has been shown that
significant errors occur when estimating the process capability of non-Gaussian
processes by assuming the Gaussian distribution. In these cases the use of a
generalized system of distributions such as the Johnson System has significant
advantages over the traditional methods in terms of accuracy and reliability.
The Pearson System of Distributions has been adopted in several recent
software packages for SPC. The Johnson System of Distributions is
computationally more efficient because of the facility to translate to and from
the Gaussian distribution. This also has the advantage of being able to apply
standard Gaussian techniques such as the calculation of percentage points
without reference to other forms of analysis.

Conclusion
This article has highlighted many of the limitations inherent in traditional
statistical process control theory. It has been shown that the assumption of a
Gaussian distribution of sample means of sample size four, from non-Gaussian
populations, is valid for reasonably behaved population distributions. But the
characterization of the distribution of sample standard deviation from non-
Gaussian populations, using the χ2 distribution is less accurate. Under these
circumstances a second sample should be taken immediately after an out-of-
control signal is generated to ensure the process variance has altered.
Significantly, for non-Gaussian distributions the statistical process control
practitioner should be aware that control limits can only be a guide, and for close
tolerance work 100 per cent inspection is required to ensure conformance.
An alternative method of assessing process capability has been developed,
based on the Johnson translatory system of distributions. Its use leads to a more
accurate assessment of process capability. The method has been computerized
and is easy to use.
It is hoped that this article has contributed to the awareness of the problem of
non-normality in statistical process control, and illustrated the errors associated
with non-normality, so that the quality practitioner has a greater understanding
of the significance of his results. A second article by the authors will address the
problem of correlation in statistical process control measurements.

References
1. Elderton, W.P. and Johnson, N.L., Systems of Frequency Curves, Cambridge University
Press, Cambridge, 1969.
2. Pearson, E.S. and Hartley, H.O. (Eds), Biometrika Tables for Statisticians, Vol. 2, Charles
Griffin and Co., London, 1976.
3. Johnson, N.L., “Systems of Frequency Curves Generated by Methods of Translation”,
Biometrika, Vol. 36, 1949, pp. 149-76.
4. Hill, I.D., Hill, R. and Holder, R.L., “Fitting Johnson Curves by Moments”, Algorithm AS99,
Applied Statistics, Vol. 25, 1976, pp. 180-9.
5. Hill, I.D., “Normal-Johnson and Johnson-Normal Transformations”, Algorithm AS100,
Applied Statistics, Vol. 25, 1976, pp. 190-2.
6. Shewhart, W.A., Economic Control of Quality of Manufactured Product, Van Nostrand,
London, 1931.
7. British Standard(BS) 2564, Control Chart Technique when Manufacturing to a
Specification with Special Reference to Articles Machined to Dimensional Tolerances,
British Standard Institution, 1988.
8. Chatfield, C., Statistics for Technology, Chapman & Hall, London, 1976.
9. Continuing Process Control and Process Capability Improvement. A Guide to the Use of Control
Charts for Improving Quality and Productivity for Company, Supplier and Dealer Activities,
The Statistical Methods Office Operations Support Staffs, Ford Motor Company, 1987.
10. Pearson, E.S. and Hartley, H.O., (Eds), Biometrika Tables for Statisticians, Vol. 1, Charles
Griffin and Co., London, 1976.
11. Shenton, L.R. and Bowman, K.O., “A Bivariate Model for the Distribution of √b1 and b2”,
Journal of the American Statistical Association, Vol. 72 No. 357, 1977, pp. 206-11.
12. Spedding, T.A., “The Machined Surface – Statistics and Characterization”, PhD thesis,
Coventry Polytechnic, 1983.
