Liu R Y 1995 Control Charts For Multivariate Processes
American Statistical Association is collaborating with JSTOR to digitize, preserve and extend access to Journal
of the American Statistical Association.
Control Charts for Multivariate Processes
Regina Y. Liu
This article uses the concept of data depth to introduce several new control charts for monitoring processes of multivariate quality measurements. For any dimension of the measurements, these charts are in the form of two-dimensional graphs that can be visualized and interpreted just as easily as the well-known univariate X, $\bar{X}$, and CUSUM charts. Moreover, they have several significant advantages. First, they can detect simultaneously the location shift and scale increase of the process, unlike the existing methods, which can detect only the location shift. Second, their construction is completely nonparametric; in particular, it does not require the assumption of normality for the quality distribution, which is needed in standard approaches such as the $\chi^2$ and Hotelling's $T^2$ charts. Thus these new charts generalize the principle of control charts to multivariate settings and apply to a much broader class of quality distributions.
KEY WORDS: Control charts; Q chart; Quality control; r chart; S chart; Statistical process control.
Figure 1. r Chart.

[...] the prescribed $G(\cdot)$ in a certain sense. Thus we need to compare $F$ with $G$. The statistics that we use to characterize certain aspects of the difference between $G$ and $F$ are based on the notion of data depth, so we begin by describing some concepts of data depth.

For any point $y$ in $R^k$, the simplicial depth (Liu 1990) of $y$ with respect to $G$ is defined to be

$$SD_G(y) = P_G\{y \in s[Y_1, \ldots, Y_{k+1}]\}, \tag{1}$$

where $s[Y_1, \ldots, Y_{k+1}]$ is the simplex whose vertices $Y_1, \ldots, Y_{k+1}$ are $k + 1$ random observations from $G$. The sample version, based on a sample $\{Y_1, \ldots, Y_m\}$ from $G$, is

$$SD_{G_m}(y) = \binom{m}{k+1}^{-1} \sum_{(*)} I\big(y \in s[Y_{i_1}, \ldots, Y_{i_{k+1}}]\big), \tag{2}$$

which measures how deep $y$ is within the data cloud $\{Y_1, \ldots, Y_m\}$. Here $I(\cdot)$ is the indicator function; that is, $I(A) = 1$ if $A$ occurs and $I(A) = 0$ otherwise. The function $G_m(\cdot)$ denotes the empirical distribution of $\{Y_1, \ldots, Y_m\}$, and $(*)$ runs over all possible subsets of $\{Y_1, \ldots, Y_m\}$ of size $(k + 1)$. A fuller motivation together with the basic properties of $SD_G(\cdot)$ can be found in an earlier work (Liu 1990), where it was shown in particular that $SD_G(\cdot)$ is affine invariant and that $SD_{G_m}(\cdot)$ converges uniformly and strongly to $SD_G(\cdot)$. The affine invariance will ensure that our proposed control charts are coordinate free, and the convergence of $SD_{G_m}$ to $SD_G$ will allow us to approximate $SD_G(\cdot)$ by $SD_{G_m}(\cdot)$ when $G$ is not specified.

The Mahalanobis depth (Mahalanobis 1936) of $y$ with respect to $G$ is defined to be

$$MD_G(y) = 1/[1 + (y - \mu_G)' \Sigma_G^{-1} (y - \mu_G)], \tag{3}$$

where $\mu_G$ and $\Sigma_G$ denote the mean and the covariance matrix of $G$, "$'$" denotes the transpose of a $(k \times 1)$ vector, and "$-1$" denotes the inverse of a matrix. The empirical version of $MD_G(y)$ is

$$MD_{G_m}(y) = 1/[1 + (y - \bar{Y})' S^{-1} (y - \bar{Y})], \tag{4}$$

where $\bar{Y}$ is the sample mean of $Y_1, \ldots, Y_m$ and $S$ is the sample covariance matrix. We observe that $MD_G(\cdot)$ is also affine invariant.

There are several other affine-invariant notions of data depth, including Tukey's depth (Tukey 1975) and the majority depth of Singh (Liu and Singh 1993). As a matter of fact, all control charts proposed herein are also valid for these two depths. (See Liu and Singh 1993 for a fuller discussion of various notions of data depth.) The simplicial depth and the Mahalanobis depth suffice for our purposes, because they illustrate well the contrasting properties of probabilistic geometry and metric distances. Henceforth we use the same notation $D_G(\cdot)$ to denote either notion of depth, unless indicated otherwise. We also assume that $G$ and $F$ are two absolutely continuous distributions.

Clearly, a data depth induces a center-outward ordering of the sample points if depth values for all points are computed and compared. More specifically, if we arrange all $D_G(Y_i)$'s in an ascending order and use $Y_{[j]}$ to denote the sample point associated with the $j$th smallest depth value, then $Y_{[1]}, Y_{[2]}, \ldots, Y_{[m]}$ are the order statistics of the $Y_i$'s, with [...]

Figure 2. Q Chart (n = 4).
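As an illustration of the two depth notions used in this section, the following is a minimal Python sketch of the empirical Mahalanobis depth in (4) and of a brute-force sample simplicial depth in the plane ($k = 2$). The function names are hypothetical, and the sketch assumes NumPy, closed triangles, and data without ties or collinear triples. It enumerates all triples of sample points, so it is meant only for small samples and is not the efficient algorithm used for the example in this article.

```python
import itertools
import numpy as np

def mahalanobis_depth(y, sample):
    """Empirical Mahalanobis depth of y, following equation (4)."""
    sample = np.asarray(sample, dtype=float)
    mean = sample.mean(axis=0)                          # sample mean
    cov = np.atleast_2d(np.cov(sample, rowvar=False))   # sample covariance S
    diff = np.asarray(y, dtype=float) - mean
    return 1.0 / (1.0 + diff @ np.linalg.solve(cov, diff))

def _in_triangle(p, a, b, c):
    """True if p lies in the closed triangle with vertices a, b, c."""
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])
    d = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
    # p is inside (or on the boundary) when the three signs do not conflict
    return not (min(d) < 0 and max(d) > 0)

def simplicial_depth_2d(y, sample):
    """Fraction of triangles formed by sample points that contain y (k = 2)."""
    sample = np.asarray(sample, dtype=float)
    triples = list(itertools.combinations(range(len(sample)), 3))
    hits = sum(_in_triangle(y, sample[i], sample[j], sample[k])
               for i, j, k in triples)
    return hits / len(triples)
```

For instance, with the four corners of a square as the reference sample, a point near the center receives a larger simplicial depth than a point far outside the square, whose depth is zero.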
$$r_G(y) = P\{D_G(Y) \le D_G(y) \mid Y \sim G\} \tag{5}$$

and

$$r_{G_m}(y) = \#\{Y_j \mid D_{G_m}(Y_j) \le D_{G_m}(y),\ j = 1, \ldots, m\}/m. \tag{6}$$

Let $F_n(\cdot)$ denote the empirical distribution of the sample $\{X_1, \ldots, X_n\}$. We can now define

$$Q(G, F) = P\{D_G(Y) \le D_G(X) \mid Y \sim G, X \sim F\} \quad (= E_F[r_G(X)]), \tag{7}$$

$$Q(G, F_n) = \frac{1}{n} \sum_{i=1}^{n} r_G(X_i), \tag{8}$$

and

$$Q(G_m, F_n) = \frac{1}{n} \sum_{i=1}^{n} r_{G_m}(X_i). \tag{9}$$

Figure 4. S Chart.

Figure 5. S* Chart.

In this example, UCL = CL + $z_{\alpha/2}\sigma$, LCL = CL $-$ $z_{\alpha/2}\sigma$, and CL = $\mu$ if $\mu$ is known and = $\bar{Y}$ otherwise. Here $z_\alpha$ indicates the upper $\alpha$ critical value of the standard normal distribution; that is, $\alpha = P(Z > z_\alpha)$, where $Z \sim N(0, 1)$. The [...] is lost in the plot. Furthermore, when the dimension $k$ goes beyond 3, it does not seem possible to follow the same idea to construct charts that are easy to visualize.

Our r chart is constructed as follows. Compute $\{r_G(X_1), r_G(X_2), \ldots\}$ (or $r_{G_m}(X_1), r_{G_m}(X_2), \ldots$ if only $Y_1, \ldots, Y_m$ are available, but not $G$), following (5) (or (6)). The r chart is the plot of the $r_G(X_i)$'s (or $r_{G_m}(X_i)$'s) against time $i$, with CL = .5 and the control limit $\alpha$. The process is declared out-of-control if $r_G(\cdot)$ falls below $\alpha$. Recall that $\alpha$ is the false alarm rate, which generally is close to zero, so the r chart only has LCL = $\alpha$ but no UCL. The motivation and justification of the r chart as a control chart are given next.

The expression (6) shows that $r_{G_m}(X)$ is an indication of how outlying $X$ is with respect to the data cloud of the $Y_i$'s. A very small value of $r_{G_m}(X)$ means that only a very small proportion of the $Y_i$'s are more outlying than $X$. Thus $X$ is at the "outskirt" and is not conforming to most of the central part of the good data set. Assuming that $X \sim F$, a small value of $r_{G_m}(X)$ then suggests a possible deviation from $G$ to $F$. Since $r_{G_m}(\cdot)$ is defined according to data depth, the possible deviation here can be a shift in "center" and/or an increase in scale. (A detailed mathematical justification of this interpretation can be derived from Liu and Singh 1993, sec. 3.) Thus the r chart with LCL = $\alpha$ corresponds to an $\alpha$-level test of the following hypotheses:

$$H_0\colon F = G \quad \text{vs.} \quad H_a\colon \text{there is a location shift and/or a scale increase from } G \text{ to } F. \tag{10}$$

We observe that the alternative hypothesis is particularly suitable for detecting quality deterioration in quality control, as it presents a loss of accuracy and/or a loss of precision. This also justifies viewing the process as out-of-control [...]

Proposition 3.1. [...]
a. $r_G(X) \sim U[0, 1]$, and
b. as $m \to \infty$, $r_{G_m}(X) \to_{\mathcal{L}} U[0, 1]$ along almost all $\{Y_1, \ldots, Y_m\}$ sequences, provided that $D_{G_m}(\cdot)$ converges to $D_G(\cdot)$ uniformly as $m \to \infty$.

Remark 3.1. The uniform convergence of $D_{G_m}(\cdot)$ holds for the simplicial depth if $G$ is absolutely continuous, and for the Mahalanobis depth if $G$ has a bounded second absolute moment.

Under $H_0\colon F = G$, Proposition 3.1 implies that the expected value of $r_G(X)$ is .5 and that of $r_{G_m}(X)$ is .5 almost surely for all sequences $\{Y_1, \ldots, Y_m\}$ for large $m$. This justifies choosing .5 to be CL of the r chart. When $r_G(X)$ (or $r_{G_m}(X)$) is much smaller than .5, there is doubt for $H_0$ and evidence to support $H_a$, signaling a possible quality deterioration. When $r_G(X)$ (or $r_{G_m}(X)$) is larger than .5, there is indication of a decrease in scale with perhaps a negligible location shift. This is seen as an improvement in quality, termed a gain in precision, and thus the process should not be viewed as out-of-control. Therefore, there is only an LCL in the r chart. The uniform distribution of $r_G(X)$ (or $r_{G_m}(X)$) implies clearly that LCL should be $\alpha$.
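The r statistic of (6) and the signaling rule of the r chart (declare out-of-control when the statistic falls below LCL = $\alpha$) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the article's implementation: the function names are hypothetical, NumPy is assumed, and the depth values of the reference sample and of the new observations are taken as already computed with respect to the same reference sample.

```python
import numpy as np

def r_statistic(depth_ref, depth_new):
    """Equation (6): fraction of reference depths <= the depth of each new point."""
    ref = np.sort(np.asarray(depth_ref, dtype=float))
    new = np.atleast_1d(np.asarray(depth_new, dtype=float))
    # side="right" counts reference depths that are <= the new depth
    return np.searchsorted(ref, new, side="right") / ref.size

def r_chart_signals(r_values, alpha=0.025):
    """Time indices (0-based) at which the r statistic falls below LCL = alpha."""
    r_values = np.asarray(r_values, dtype=float)
    return np.flatnonzero(r_values < alpha)
```

Sorting the reference depths once and using a binary search keeps the cost of ranking each new observation logarithmic in $m$; only the lower limit is checked, in line with the one-sided alternative in (10).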
Table 1. Simplicial Depth Values and Ranks

  i   D(X_i)   r(X_i)      i   D(X_i)   r(X_i)
  1   .0028    .082       41   0        .022
  2   .2263    .948       42   0        .022
  3   .1794    .840       43   0        .022
  4   .0196    .256       44   .0107    .194
  5   .1144    .670       45   0        .022
  6   .0025    .074       46   .0041    .100
  7   .0115    .196       47   0        .022
  8   .0443    .392       48   0        .022
  9   .0389    .358       49   .0111    .194
 10   .0268    .296       50   .0261    .290
 11   0        .022       51   0        .022
 12   .1962    .888       52   0        .022
 13   .1651    .812       53   0        .022
 14   .1835    .852       54   0        .022
 15   .0249    .280       55   0        .022
 16   .0583    .446       56   0        .022
 17   .1106    .658       57   0        .022
 18   .0022    .068       58   0        .022
 19   .2315    .962       59   0        .022
 20   .0366    .348       60   0        .022
 21   .0711    .502       61   .0932    .588
 22   .0645    .472       62   0        .022
 23   .0103    .186       63   0        .022
 24   .0797    .542       64   0        .022
 25   .0870    .566       65   0        .022
 26   .0051    .114       66   0        .022
 27   .0518    .424       67   0        .022
 28   0        .022       68   .0123    .202
 29   .0044    .102       69   0        .022
 30   .0903    .576       70   .1984    .896
 31   .1900    .866       71   .0250    .280
 32   .1621    .800       72   .0087    .160
 33   .1499    .768       73   0        .022
 34   .0757    .528       74   0        .022
 35   .0514    .420       75   0        .022
 36   .0581    .444       76   0        .022
 37   .1096    .656       77   0        .022
 38   .0570    .436       78   0        .022
 39   .2082    .920       79   0        .022
 40   .1927    .876       80   0        .022

Remark 3.2. Even though the r chart does not have the UCL to make its CL the center line of the in-control region, the CL here does serve as a reference point to allow us to observe whether a pattern or trend is developing in a sequence of samples.

3.2 The Q Charts

The idea behind the Q chart is similar to that of the univariate $\bar{X}$ chart. When $X_1, X_2, \ldots$ are univariate and $G$ is normal, the $\bar{X}$ chart plots the averages of consecutive subsets of the $X_i$'s. The $\bar{X}$ chart may prevent a false alarm when the process is actually in control but some individual sample point falls outside the control limits merely due to random fluctuations. This is an advantage over the X chart. In the multivariate setting we propose to plot the averages of subsets of the $r_G(X_i)$'s (or $r_{G_m}(X_i)$'s). Assume that each subset has size $n$. In the notation of (8) and (9), the averages of the $r_G(X_i)$'s and $r_{G_m}(X_i)$'s are given by $Q(G, F_n^j)$ and $Q(G_m, F_n^j)$. Here $F_n^j$ is the empirical distribution of the $X_i$'s in the $j$th subset, $j = 1, 2, \ldots$. The Q chart plots

$$\{Q(G, F_n^1), Q(G, F_n^2), \ldots\}$$

or

$$\{Q(G_m, F_n^1), Q(G_m, F_n^2), \ldots\}$$

if only $Y_1, \ldots, Y_m$ are available.

The main issue now is to set the correct values for CL and LCL in this Q chart. This depends on the choice of $n$. We shall see that when $n$ is large, in view of the approximations described in Proposition 3.2, CL should be .5, whereas LCL should be $(.5 - z_\alpha (12n)^{-1/2})$ for plotting the $\{Q(G, F_n^j)\}$'s and $\{.5 - z_\alpha \sqrt{(1/12)[(1/m) + (1/n)]}\}$ for plotting the $\{Q(G_m, F_n^j)\}$'s (cf. Fig. 3). This approximation seems to be quite reasonable even when $n$ is as small as 5. In practice, however, $n$ can be even smaller, say 3 or 4. In this case, we may use the exact distributions for $Q(G, F_n)$ given in Proposition 3.3. It turns out that for a small $\alpha$ value the Q chart should have CL = .5 and LCL = $(n!\alpha)^{1/n}/n$.

First we describe the large $n$ asymptotics. The Q chart corresponds to the $\alpha$-level test based on $Q(G, F_n)$ (or $Q(G_m, F_n)$) for testing the same set of hypotheses in (10). These are actually two of the several multivariate rank tests studied by Liu (1992) and Liu and Singh (1993). Their main asymptotic properties are as follows.

Proposition 3.2. Assume that the conditions in Proposition 3.1 hold. Then
a. as $n \to \infty$, $[Q(G, F_n) - \tfrac{1}{2}] \to_{\mathcal{L}} N(0, 1/(12n))$; and
b. as $\min(m, n) \to \infty$, $[Q(G_m, F_n) - \tfrac{1}{2}] \to_{\mathcal{L}} N\{0, [(1/m) + (1/n)]/12\}$, under the following additional conditions: if $MD(\cdot)$ is used to define $Q(\cdot, \cdot)$, $G$ has a bounded fourth absolute moment; if $SD(\cdot)$ is used to define $Q(\cdot, \cdot)$, $G$ is a one-dimensional distribution whose density is bounded above and below in a neighborhood of the median (or center).

The statement (a) is a straightforward application of the central limit theorem, because $Q(G, F_n)$ is just the average of $n$ iid uniform random variables. The statement (b) has been established by Liu and Singh (1993). Although (b) has been proven only for $R^1$ in the case of SD, it was conjectured by Liu and Singh (1993) with the support of simulation results that it actually holds for any $k$-dimensional $G$. It is now evident that CL and LCL should be set to the values indicated earlier when $n$ is large.

When $n$ is small, the foregoing asymptotic results may not be applicable. Since LCL in this case is the $\alpha$th quantile of the distribution of $Q(G, F_n) = (1/n) \sum_{i=1}^{n} r_G(X_i)$, we need the distribution of the average of uniform random variables (cf. Prop. 3.1). This follows directly from the formula for the distribution of the sum of uniform random variables provided in Proposition 3.3.

Proposition 3.3. Let $\{U_1, \ldots, U_n\}$ be an iid sample from $U[0, 1]$, and let $H_n(t)$ be the distribution function of $\sum_{i=1}^{n} U_i$; that is, $H_n(t) = P\{\sum_{i=1}^{n} U_i \le t\}$. Then for each $n = 1, 2, \ldots$, $H_n(t) = 0$ for $t \le 0$ and
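$$H_n(t) = \frac{1}{n!} \sum_{k=0}^{n} (-1)^k \binom{n}{k} (t - k)_+^n, \tag{11}$$

where

$$(x)_+^n = \begin{cases} 0, & \text{if } x < 0; \\ x^n, & \text{if } x \ge 0. \end{cases}$$

This formula has been derived by Feller (1971). The expression (11) shows that $H_n(\cdot)$ is a piecewise polynomial. For our purpose, the most relevant part of the polynomial is

$$H_n(t) = t^n/n!, \qquad 0 \le t \le 1. \tag{12}$$

Solving $H_n(n w_\alpha) = \alpha$ with (12), which is valid when $n!\alpha \le 1$, gives the $\alpha$th quantile of $Q(G, F_n)$ as $w_\alpha = (n!\alpha)^{1/n}/n$. This justifies our choice of LCL [...]

[...] defined by

$$S_n(G) = \sum_{i=1}^{n} \left[ r_G(X_i) - \tfrac{1}{2} \right] \tag{13}$$

and

$$S_n(G_m) = \sum_{i=1}^{n} \left[ r_{G_m}(X_i) - \tfrac{1}{2} \right]. \tag{14}$$

[...] chart based on $S_n(G_m)$ is $-\{z_\alpha \sqrt{n^2[(1/m) + (1/n)]/12}\}$.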
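The distribution function in (11) and the two Q chart control limits can be checked numerically. The following is a small Python sketch using only the standard library; the function names are hypothetical, and the guard requiring $n!\alpha \le 1$ is an assumption made so that the small-$n$ limit stays on the first polynomial piece of $H_n$.

```python
from math import comb, factorial, floor, sqrt
from statistics import NormalDist

def irwin_hall_cdf(t, n):
    """H_n(t) of equation (11): P(U_1 + ... + U_n <= t) for iid U[0, 1]."""
    if t <= 0:
        return 0.0
    if t >= n:
        return 1.0
    # terms with k > t vanish because (t - k)_+ = 0
    total = sum((-1) ** k * comb(n, k) * (t - k) ** n
                for k in range(floor(t) + 1))
    return total / factorial(n)

def q_lcl_exact(n, alpha):
    """Small-n limit (n! * alpha)^(1/n) / n for the Q chart."""
    if factorial(n) * alpha > 1:
        raise ValueError("requires n! * alpha <= 1")
    return (factorial(n) * alpha) ** (1.0 / n) / n

def q_lcl_asymptotic(n, alpha, m=None):
    """Large-n limit: .5 - z_alpha * sd, with sd taken from Proposition 3.2."""
    z = NormalDist().inv_cdf(1 - alpha)      # upper alpha critical value
    var = 1 / (12 * n) if m is None else (1 / m + 1 / n) / 12
    return 0.5 - z * sqrt(var)
```

With $\alpha = .025$, `q_lcl_exact(4, 0.025)` is about .220 and `q_lcl_asymptotic(10, 0.025, m=500)` is about .3193, and `irwin_hall_cdf(4 * q_lcl_exact(4, 0.025), 4)` returns the nominal level .025 up to rounding.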
Table 4. S-values

-.418 .030 .370 .126 .296 -.130 -.434 -.542 -.684 -.888
-1.366 -.978 -.666 -.314 -.534 -.588 -.430 -.862 -.400 -.552
-.550 -.578 -.892 -.850 -.784 -1.170 -1.246 -1.724 -2.122 -2.046
-1.680 -1.380 -1.112 -1.084 -1.164 -1.220 -1.064 -1.128 -.708 -.332
-.810 -1.288 -1.766 -2.072 -2.550 -2.950 -3.428 -3.906 -4.212 -4.422
-4.900 -5.378 -5.856 -6.334 -6.812 -7.290 -7.768 -8.246 -8.724 -9.202
-9.114 -9.592 -10.070 -10.548 -11.026 -11.504 -11.982 -12.28 -12.758 -12.362
-12.582 -12.922 -13.400 -13.878 -14.356 -14.834 -15.312 -15.79 -16.268 -16.746
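The S values are running sums of the centered r statistics, and the standardized values divide each sum by $\sqrt{n^2[(1/m) + (1/n)]/12}$. A minimal Python sketch of that computation follows; the function name is hypothetical, NumPy is assumed, and the standardization is taken from the LCL expression for the chart based on $S_n(G_m)$.

```python
import numpy as np
from statistics import NormalDist

def s_chart(r_values, m, alpha=0.025):
    """Return (S_n, standardized S_n, LCL for unstandardized S_n), n = 1, 2, ..."""
    r = np.asarray(r_values, dtype=float)
    n = np.arange(1, r.size + 1)
    s = np.cumsum(r - 0.5)                          # running sum of r - 1/2
    scale = np.sqrt(n ** 2 * (1 / m + 1 / n) / 12)  # sd used for standardizing
    z = NormalDist().inv_cdf(1 - alpha)             # upper alpha critical value
    return s, s / scale, -z * scale
```

Applied to the first five r values of Table 1 (.082, .948, .840, .256, .670) with $m = 500$, the sketch gives S values of −.418, .030, .370, .126, .296 and standardized values of about −1.447, .073, .738, .217, .456.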
[...] (1992). This algorithm is highly efficient, because it requires only $O(m \log m)$ steps in computing the simplicial depths for $m$ data points, instead of $O(m^4)$ steps as required by direct computation based on solving systems of linear equations. The simplicial depth values of the $X_i$'s are recorded in the first column of Table 1. Based on these values we can compute all $r_{G_m}(X_i)$ using (6), and record them in the second column of Table 1. Figure 1 gives the plot of the $r_{G_m}(X_i)$'s with CL = .5 and LCL = .025, which is the $\alpha$ value that we choose for all five charts. It clearly shows that the process is out-of-control in the second half, with most of the $r_{G_m}(X_i)$'s falling below LCL. The few false alarms in the first half of the $X_i$'s should be attributed to random fluctuations in the same manner that false alarms are characterized in a univariate X chart.

Figures 2 and 3 show the Q charts with the group size $n = 4$ and $n = 10$. The $\{Q(G_m, F_n^j), j = 1, 2, \ldots\}$ are computed according to the definition (9) and are recorded in Tables 2 and 3. For Figure 2, the CL has been set to .5 and the LCL has been set to .220, following Proposition 3.3. In Figure 3, the results in Proposition 3.2 lead to the choice of CL = .5 and LCL = $\{.5 - z_\alpha \sqrt{(1/12)[(1/m) + (1/n)]}\}$, which turns out to be .3193 when $\alpha = .025$. Both plots clearly show that the process is out-of-control in the second half. We also observe that the averaging of the $r_{G_m}(\cdot)$'s in Q has eliminated the random fluctuations appearing in the first half of the r chart in Figure 1. In principle, because the underlying distribution here is specified, we can use for example the computing package Mathematica to compute the exact values of the $D_G(\cdot)$'s and hence $Q(G, F_n^j)$, $j = 1, 2, \ldots$, and give the corresponding Q chart. The difference between this chart and our Figure 2 appears to be negligible.

Figure 4 illustrates the S chart of the $S_n(G_m)$ values in Table 4. Since the S values are not standardized here, the LCL is $-z_\alpha \sqrt{(n^2/12)[(1/m) + (1/n)]}$. To keep the chart within standard paper size, we need to adopt a much smaller scale for the S axis. By contrast, in Figure 5, the S values have been standardized, and hence no severe rescaling is needed. The standardized S values are recorded in Table 5, labeled as S*. The control limit LCL is a straight line $-z_\alpha$, which is $-1.96$ in this case. For both figures, CL equals zero.

In the simulation here, we have chosen $m = 500$. Clearly, larger values of $m$ give better approximations to the limiting distributions stated in Propositions 3.1, 3.2, and 3.4 and to the LCL's for the r, Q, and S charts. Our experience shows that the approximation results are reasonable when $m$ is as small as 50 in the bivariate case. We would recommend larger values for higher-dimensional observations.

Table 5. S*-values

-1.447 .073 .738 .217 .456 -.183 -.564 -.659 -.783 -.963
-1.411 -.966 -.632 -.287 -.471 -.501 -.355 -.691 -.312 -.419
-.407 -.418 -.630 -.587 -.530 -.775 -.809 -1.098 -1.327 -1.257
-1.014 -.819 -.649 -.623 -.659 -.680 -.585 -.611 -.378 -.175
-.421 -.661 -.895 -1.037 -1.261 -1.442 -1.656 -1.866 -1.989 -2.066
-2.264 -2.459 -2.650 -2.837 -3.020 -3.200 -3.377 -3.550 -3.721 -3.889
-3.816 -3.980 -4.142 -4.300 -4.457 -4.610 -4.762 -4.840 -4.987 -4.794
-4.840 -4.932 -5.075 -5.216 -5.355 -5.492 -5.627 -5.760 -5.892 -6.022

5. CONCLUDING REMARKS

In addition to the X, $\bar{X}$, and CUSUM charts, there are more complicated control charts for monitoring a univariate process mean change, such as the moving average chart, the EWMA chart, and the CUSUM chart with a V mask (cf. Wetherill 1977). It would be interesting to develop our charts further along these lines. For example, a moving average chart based on the $r(\cdot)$ values in (5) or (6) can be readily constructed. To obtain proper control limits for this chart, one may apply the moving blocks bootstrap techniques of Liu and Singh (1992) to develop the distributions of the moving averages.

As discussed by Alt and Smith (1988), the classical multivariate control charts based on the $\chi^2$ or Hotelling's $T^2$ statistics (Hotelling 1949) are valid only when the process follows a normal distribution and can be used to detect a mean shift only. When the process is bivariate, a control ellipse may be used instead of the foregoing two charts. The control ellipse approach also requires the normality assumption for the underlying process, and it loses the chronological order of the plotted observations. In a different direction, one may use separate X charts for individual component variables and then apply Bonferroni's inequality to provide a bound for the level of the combined test. As pointed out by Alt (1982), this inequality is not sharp enough to give an accurate level unless the component variables are independent. More precisely, this approach tends to overestimate the probability for asserting that the process is in control.

Since the sample Mahalanobis depth defined in (4) and Hotelling's $T^2$ are both measuring the quadratic distance of a point to its mean, one may attempt to equate Hotelling's $T^2$ chart to our r or Q charts when Mahalanobis depth is used. Note that in our approach, Mahalanobis depth serves only as a stepping stone to reduce the observations to "ranks." What we chart here are the "ranks" but not the Mahalanobis depth values themselves. The determination of the control limit in Hotelling's $T^2$ plot requires the exact sampling distribution of Hotelling's $T^2$ statistic, whereas this is not needed in our charts due to the further transformation of statistics into ranks. Consequently, our charts based on Mahalanobis depth are different from the Hotelling $T^2$ plots. Regarding the choice of data depth for our charts, we note that if the underlying distribution is close to elliptical, then it is more efficient to use Mahalanobis depth. Otherwise, the more geometric types of depth, such as majority depth, simplicial depth, and Tukey's depth, may be more desirable, because they do not require moment conditions.

[Received September 1993. Revised January 1995.]

REFERENCES

Alt, F. (1982), "Multivariate Quality Control: State of the Art," ASQC Annual Quality Congress Transactions, pp. 886-893.
Alt, F., and Smith, N. (1988), "Multivariate Process Control," in Handbook of Statistics, 7, eds. P. R. Krishnaiah and C. R. Rao, Amsterdam: Elsevier, pp. 333-351.
Banks, J. (1989), Principles of Quality Control, New York: John Wiley.
Feller, W. (1971), Introduction to Probability Theory and Its Applications (2nd ed.), New York: John Wiley.
Hotelling, H. (1949), "Multivariate Quality Control," in Techniques in Statistical Analysis, eds. C. Eisenhart, M. W. Hastay, and W. A. Wallis, New York: McGraw-Hill.
Liu, R. (1990), "On a Notion of Data Depth Based on Random Simplices," The Annals of Statistics, 18, 405-414.
Liu, R. (1992), "Data Depth and Multivariate Rank Tests," in L1-Statistical Analysis and Related Methods, ed. Y. Dodge, Amsterdam: Elsevier, pp. 279-294.
Liu, R., and Singh, K. (1992), "Moving Blocks Bootstrap and Jackknife Capture Weak Dependence," in Exploring the Limits of Bootstrap, eds. R. LePage and L. Billard, New York: John Wiley, pp. 225-248.
Liu, R., and Singh, K. (1993), "A Quality Index Based on Data Depth and Multivariate Rank Tests," Journal of the American Statistical Association, 88, 252-260.
Mahalanobis, P. C. (1936), "On the Generalized Distance in Statistics," Proceedings of the National Academy India, 12, 49-55.
Rousseeuw, P. J., and Ruts, I. (1992), "Bivariate Simplicial Depth," technical report, University of Antwerp, Dept. of Mathematics and Computer Science.
Tukey, J. W. (1975), "Mathematics and Picturing Data," Proceedings of the 1975 International Congress of Mathematics, 2, 523-531.
Wadsworth, H., Stephen, K. S., and Godfrey, A. B. (1986), Modern Methods for Quality Control and Improvement, New York: John Wiley.
Wetherill, G. B. (1977), Sampling Inspection and Quality Control (2nd ed.), New York: Chapman and Hall.