Editorial Board
Robert N. Rodriguez, SAS Institute, Inc., Editor-in-Chief
Gary C. McDonald, General Motors R&D Center

John C. Young
McNeese State University, Lake Charles, Louisiana
InControl Technologies, Inc., Houston, Texas

Society for Industrial and Applied Mathematics, Philadelphia, Pennsylvania
American Statistical Association, Alexandria, Virginia
Copyright © 2002 by the American Statistical Association and the Society for Industrial
and Applied Mathematics.
10 9 8 7 6 5 4 3 2 1
All rights reserved. Printed in the United States of America. No part of this book may be
reproduced, stored, or transmitted in any manner without the written permission of the
publisher. For information, write to the Society for Industrial and Applied Mathematics,
3600 University City Science Center, Philadelphia, PA 19104-2688.
The materials on the CD-ROM are for demonstration only and expire after 90 days
of use. These materials are subject to the same copyright restrictions as hardcopy
publications. No warranties, expressed or implied, are made by the publisher, authors,
and their employers that the materials contained on the CD-ROM are free of error.
You are responsible for reading, understanding, and adhering to the licensing terms
and conditions for each software program contained on the CD-ROM. By using this
CD-ROM, you agree not to hold any vendor or SIAM responsible, or liable, for any
problems that arise from use of a vendor's software.
Contents
Preface xi
Bibliography 253
Index 259
Preface
Industry continually faces many challenges. Chief among these is the requirement to
improve product quality while lowering production costs. In response to this need,
much effort has been given to finding new technological tools. One particularly im-
portant development has been the advances made in multivariate statistical process
control (SPC). Although univariate control procedures are widely used in industry
and are likely to be part of a basic industrial training program, they are inadequate
when used to control processes that are inherently multivariate. What is needed
is a methodology that allows one to monitor the relationships existing among and
between the process variables. The T2 statistic provides such a procedure.
Unfortunately, the area of multivariate SPC can be confusing and complicated
for the practitioner who is unfamiliar with multivariate statistical techniques. Lim-
ited help comes from journal articles on the subject, as they usually include only
theoretical developments and a limited number of data examples. Thus, the prac-
titioner is not well prepared to face the problems encountered when applying a
multivariate procedure to a real process situation. These problems are further
compounded by the lack of adequate computer software to do the required complex
computations.
The motivation for this book came from facing these problems in our data con-
sulting and finding only a limited array of solutions. We soon decided that there
was a strong need for an applied text on the practical development and application
of multivariate control techniques. We also felt that limiting discussions to strate-
gies based on Hotelling's T2 statistic would be of most benefit to practitioners. In
accomplishing this goal, we decided to minimize the theoretical results associated
with the T2 statistic, as well as the distributional properties that describe its be-
havior. These results can be found in the many excellent texts that exist on the
theory of multivariate analysis and in the numerous published papers pertaining
to multivariate SPC. Instead, our major intent is to present to the practitioner
a modern and comprehensive overview on how to establish and operate an ap-
plied multivariate control procedure based on our conceptual view of Hotelling's
T2 statistic.
The intended audience for this book comprises professionals and students involved with
multivariate quality control. We have assumed the reader is knowledgeable about
univariate statistical estimation and control procedures (such as Shewhart charts)
and is familiar with certain probability functions, such as the normal, chi-square,
t, and F distributions. Some exposure to regression analysis also would be helpful.
Robert L. Mason
John C. Young
Chapter 1
Introduction to the T2 Statistic
retrieved from the data net for both a good-run time period and the upset time
period. He states that the operations staff is demanding that the source of the
problem be identified. You immediately empathize with them. Having lived
through your share of unit upsets, you know no one associated with the unit
will be happy until production is restored and the problem is resolved. There
is an entire megabyte of data stored on the diskette, and you must decide how
to analyze it to solve this problem.
What are your options? You import the data file to your favorite spreadsheet
and observe that there are 10,000 observations on 35 variables. These
variables include characteristics of the feedstock, as well as observations on
the process, production, and quality variables. The electronic data collector
has definitely done its job.
You remember a previous upset condition on the unit that was caused by a
significant change in the feedstock. Could this be the problem? You scan the
10,000 observations, but there are too many numbers and variables to see any
patterns. You cannot decipher anything.
The thought strikes you that a picture might be worth 1,000 observations.
Thus, you begin constructing graphs of the observations on each variable plot-
ted against time. Is this the answer? Changes in the observations on a variable
should be evident in its time-sequence graph. With 35 variables and 10,000
observations, this may involve a considerable time investment, but it should
be worthwhile. You readily recall that your college statistics professor used to
emphasize that graphical procedures were an excellent technique for gaining
data insight.
You initially construct graphs of the feedstock characteristics. Success
eludes you, however, and nothing is noted in the examination of these plots.
All the input components are consistent over the entire data set, including
over both the prior good-run period and the upset period. From this analysis,
you conclude that the problem must be associated with the 35 process vari-
ables. However, the new advanced process control (APC) system was working
well when you left the unit. The multivariable system keeps all operational
variables within their prescribed operational range. If a variable exceeded the
range, an alarm would have signaled this and the operator would have taken
corrective action. How could the problem be associated with the process when
all the variables are within their operational ranges?
Having no other options, you decide to go ahead and examine the process
variables. You recall from working with the control engineers in the instal-
lation of the APC system that they had been concerned with how the process
variables vary together. They had emphasized studying and understanding the
correlation structure of these variables, and they had noted that the variables
did not move independently of one another, but as a group. You decide to
examine scatter plots of the variables as well as time-sequence plots. Again,
you recall the emphasis placed on graphical techniques by that old statistics
professor. What was his name?
You begin the laborious task, soon realizing the enormity of the job. From
experience, it is easy to identify the most important control variables and the
1.1 Introduction
The problem confronting the young engineer in the above situation is common in
industry. Many dollars have been invested in electronic data collectors because
of the realization that the answer to most industrial problems is contained in the
observations. More money has been spent on multivariable control or APC sys-
tems. These units are developed and installed to ensure the containment of process
variables within prescribed operational ranges. They do an excellent job in reduc-
ing overall system variation, as they restrict the operational range of the variables.
However, an APC system does not guarantee that a process will satisfy a set of
baseline conditions, and it cannot be used to determine causes of system upsets.
As our young engineer will soon realize, a multivariate SPC procedure is needed
to work in unison with the electronic data collector and the APC system. Such a
procedure will signal process upsets and, in many cases, can be used to pinpoint
precursors of the upset condition before control is lost. When signals are identified,
the procedure allows for the decomposition of the signal in terms of the variables
that contributed to it. Such a system is the main subject of this book.
In addition, many of the variables follow certain mathematical relationships and form
a highly correlated set.
The correlation among the variables of a multivariate system may be due to
either association or causation. Correlation due to association in a production unit
often occurs because of the effects of some unobservable variable. For example, the
blades of a gas or steam turbine will become contaminated (dirty) from use over
time. Although the accumulation of dirt is not measurable, megawatt production
will show a negative correlation with the length of time from the last cleaning of
the turbine. The correlation between megawatt production and length of time since
last cleaning is one of association.
An example of a correlation due to causation is the relationship between tem-
perature and pressure since an increase in the temperature will produce a pressure
change. Such correlation inhibits examining each variable by univariate procedures
unless we take into account the influence of the other variable.
Multivariate process control is a methodology, based on control charts, that is
used to monitor the stability of a multivariate process. Stability is achieved when
the means, variances, and covariances of the process variables remain stable over
rational subgroups of the observations.
The analysis involved in the development of multivariate control procedures
requires one to examine the variables relative to the relationships that exist among
them. To understand how this is done, consider the following example. Suppose
we are analyzing data consisting of four sets of temperature and pressure readings.
The coordinates of the points are given as
where the first coordinate value is the temperature and the second value is the
pressure. These four data points, as well as the mean point of (175, 75), are plotted
in the scatter plot given in Figure 1.3. There also is a line fitted through the points
and two circles of varying sizes about the mean point.
If the mean point is considered to be typical of the sample data, one form of
analysis consists of calculating the distance each point is from the mean point. The
distance, say D, between any two points, (a1, a2) and (b1, b2), is given by the
formula

D = √[(a1 − b1)² + (a2 − b2)²].
From these calculations, it is seen that points 1 and 4 are located an equal distance
from the mean point on a circle centered at the mean point and having a radius of
3.16. Similarly, points 2 and 3 are located at an equal distance from the mean but
on a larger circle with a radius of 7.07.
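As a quick numerical check of these radii, the Euclidean distances can be computed directly. The four (temperature, pressure) points below are our own illustrative choice, since the original data listing is not reproduced here; they are chosen to be consistent with the stated mean point (175, 75) and the stated radii of 3.16 and 7.07:

```python
import math

# Hypothetical (temperature, pressure) readings -- chosen to be consistent
# with the text's mean point (175, 75) and radii 3.16 and 7.07; the book's
# actual data listing is not reproduced here.
points = [(172, 74), (170, 70), (180, 80), (178, 76)]
mean = (175, 75)

# Euclidean distance D between each point and the mean point
distances = [math.hypot(t - mean[0], p - mean[1]) for (t, p) in points]
for i, d in enumerate(distances, start=1):
    print(f"point {i}: D = {d:.2f}")
```

Points 1 and 4 fall on the circle of radius √10 ≈ 3.16, and points 2 and 3 on the circle of radius √50 ≈ 7.07.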
There are two major criticisms of this analysis. First, the variation in the
two variables has been completely ignored. From Figure 1.3, it appears that the
temperature readings contain more variation than the pressure readings, but this
could be due to the difference in scale between the two variables. However, in this
particular case the temperature readings do contain more variation.
The second criticism of this analysis is that the covariation between tempera-
ture and pressure has been ignored. It is generally expected that as temperature
increases, the pressure will increase. The straight line given in Figure 1.3 depicts
this relationship. Observe that as the temperature increases along the horizontal
axis, the corresponding value of the pressure increases along the vertical axis. This
poses an interesting question. Can a measure of the distance between two points
be devised that accounts for the presence of a linear relationship between the corre-
sponding variables and the difference in the variation of the variables? The answer
is yes; however, the distance is statistical rather than Euclidean and is not as easy
to compute.
To calculate statistical distance (SD), a measure of the correlation between the
variables of interest must be obtained. This is generally expressed in terms of the
covariance between the variables, as covariance provides a measure of how variables
vary together. For our example data, the sample covariance between temperature
and pressure, denoted as s12, is computed using the formula

s12 = Σ (x1i − x̄1)(x2i − x̄2)/(n − 1),

where x̄1 and x̄2 denote the sample means of temperature and pressure.
The sample variances for temperature and pressure as determined from the example
data are 22.67 and 17.33, respectively.
Using the value of the covariance, and the values of the sample variances and the
sample means of the variables, the squared statistical distance, (SD)2, is computed
using the formula

(SD)² = [s22(x1 − x̄1)² − 2s12(x1 − x̄1)(x2 − x̄2) + s11(x2 − x̄2)²]/(s11s22 − s12²),

where s11 and s22 denote the sample variances of temperature and pressure.
From this analysis it is concluded that our four data points are the same statistical
distance from the mean point. This result is illustrated graphically in Figure 1.4.
All four points satisfy the equation of the ellipse superimposed on the plot.
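The equal-statistical-distance result can be verified numerically. The four (temperature, pressure) points below are an illustrative reconstruction consistent with the quoted mean point (175, 75) and sample variances 22.67 and 17.33, since the original data listing is not reproduced here:

```python
import numpy as np

# Illustrative (temperature, pressure) data consistent with the summary
# statistics quoted in the text; the book's actual listing is not reproduced.
X = np.array([[172, 74], [170, 70], [180, 80], [178, 76]], dtype=float)
mean = X.mean(axis=0)                       # the mean point (175, 75)
S = np.cov(X, rowvar=False)                 # 2x2 sample covariance matrix
S_inv = np.linalg.inv(S)

# Squared statistical distance of each point from the mean point
sd2 = np.array([(x - mean) @ S_inv @ (x - mean) for x in X])
print(np.round(sd2, 4))                     # all four values coincide
```

Despite their different Euclidean distances, the four squared statistical distances are identical, which is exactly the result pictured by the ellipse.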
From a visual perspective, this result appears to be unreasonable. It is obvious
that points 1 and 4 are closer to the mean point in Euclidean distance than points
2 and 3. However, when the differences in the variation of the variables and the
relationships between the variables are considered, the statistical distances are the
same. The multivariate control procedures presented in this book are developed
using methods based on the above concept of statistical distance.
Although many different multivariate control procedures exist, it is our belief that
a control procedure built on the T2 statistic possesses all the above characteristics.
Like many multivariate charting statistics, the T2 is a univariate statistic. This is
true regardless of the number of process variables used in computing it. However,
because of its similarity to a univariate Shewhart chart, the T2 control chart is
sometimes referred to as a multivariate Shewhart chart. This relationship to common
univariate charting procedures facilitates the understanding of this charting
method.
Signal interpretation requires a procedure for isolating the contribution of each
variable and/or a particular group of variables. As with univariate control, out-
of-control situations can be attributed to individual variables being outside their
allowable operational range; e.g., the temperature is too high. A second cause of
a multivariate signal may be attributed to a fouled relationship between two or
more variables; e.g., the pressure is not where it should be for a given temperature
reading.
The signal interpretation procedure covered in this text is capable of separating
a T2 value into independent components. One type of component determines the
contribution of the individual variables to a signaling observation, while the other
components check the relationships among groups of variables. This procedure is
global in nature and not isolated to a particular data set or type of industry.
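For the bivariate case, this separation can be sketched as follows. The data are simulated and the observation is hypothetical; the conditional term is obtained here by subtraction, using the fact that the full T2 splits exactly into an unconditional term for one variable plus a conditional term for the other:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated bivariate "historical" data (illustrative only)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=100)
xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)

def t2(x, mean, cov):
    # Squared statistical distance (x - mean)' cov^{-1} (x - mean)
    d = np.atleast_1d(x) - np.atleast_1d(mean)
    return float(d @ np.linalg.inv(np.atleast_2d(cov)) @ d)

x_new = np.array([1.0, -1.0])          # hypothetical signaling observation

t2_full = t2(x_new, xbar, S)           # overall T^2 on both variables
t2_1 = t2(x_new[0], xbar[0], S[0, 0])  # unconditional term for variable 1
t2_2g1 = t2_full - t2_1                # conditional term: variable 2 given 1

print(round(t2_full, 3), round(t2_1, 3), round(t2_2g1, 3))
```

A large unconditional term points at a variable that is outside its own range, while a large conditional term points at a broken relationship between the variables.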
The T2 statistic is one of the more flexible multivariate statistics. It gives ex-
cellent performance when used to monitor independent observations from a steady-
state continuous process. It also can be based on either a single observation or the
mean of a subgroup of n observations. Minor adjustments in the statistic and its
distribution allow the movement from one form to the other.
Many industrial processes produce observations containing a time dependency.
For example, process units with a decaying cycle often produce observations that
can be modeled by some type of time-series function. The T2 statistic can be readily
adapted to these situations and can be used to produce a time-adjusted statistic.
The T2 statistic also is applicable to situations where the time correlation behaves
as a step function.
We have experienced no problems in applying the T2 statistic to batch or semi-
batch processes with targets specified or unspecified. In the case of target specifica-
tion, the T2 statistic measures the statistical distance the observed value is from the
specified target. In cases where rework is possible, such as blending, components
of the T2 decomposition can be used in determining the blending process.
Sensitivity to small process change is achieved with univariate control proce-
dures, such as Shewhart charts, through applications of zonal charts with run rules.
Small, consistent process changes in a T2 chart can be detected by using certain
components of the decomposition of a T2 statistic. This is achieved by monitoring
the residual error inherent to these terms. The detection of small process shifts is
so important that a whole chapter of the text is devoted to this procedure.
An added benefit of the T2 charting procedure is the potential to do on-line
experimentation that can lead to local optimization.
1.5 Summary
Industrial process control generally involves monitoring a set of correlated variables.
Such correlation confounds the interpretation of univariate procedures run on indi-
vidual variables. One method of overcoming this problem is to use a Hotelling's T2
statistic. As demonstrated in our discussion, this statistic is based on the concept
of statistical distance. It consolidates the information contained in a multivariate
observation to a single value, namely, the statistical distance the observation is from
the mean point.
Desirable characteristics for a multivariate control chart include ease of applica-
tion, adequate signal interpretation, flexibility, sensitivity to small process changes,
and available software to use it. One multivariate charting procedure that possesses
all these characteristics is the method based on the T2 statistic. In the following
chapters of this book, we explore the various properties of the T2 charting procedure
and demonstrate its value.
Chapter 2
Basic Concepts about the
T2 Statistic
2.1 Introduction
Some fundamental concepts about the T2 statistic must be presented before we
can discuss its usage in constructing a multivariate control chart. We begin with
a discussion of statistical distance and how it is related to the T2 statistic. How
statistical distance differs from straight-line or Euclidean distance is an important
part of the coverage. Included also is a discussion of the relationship between
the univariate Student t statistic and its multivariate analogue, the T2 statistic.
The results lead naturally to the understanding of the probability functions used
to describe the T2 statistic under a variety of different circumstances. Having
knowledge of these distributions aids in determining the UCL value for a T2 chart,
as well as the corresponding false alarm rate.
The reader unfamiliar with vectors and matrices may find the
definitions and details given in this chapter's appendix (section 2.8) to be helpful
in understanding these results. Suppose we denote a multivariate observation on p
variables in vector form as X' = (x1, x2, ..., xp). Our main concern is in processing
the information available on these p variables. One approach is to use graphical
techniques, which are usually excellent for this task, but plotting points in a p-
dimensional space (p > 3) is severely limited. This restriction inhibits overall
viewing of the multivariate situation. Another method for examining information
provided in a p-dimensional observation is to reduce the multivariate data vector
to a single univariate statistic. If the resulting statistic contains information on
all p variables, it can be interpreted and used in making decisions as to the status
of a process. There are numerous procedures for achieving this result, and we
demonstrate two of them below.
Suppose a process generates uncorrelated bivariate observations, (x1, x2), and
it is desired to represent them graphically. It is common to construct a two-
dimensional scatter plot of the points. Also, suppose there is interest in determining
the distance a particular point is from the mean point. The distance between two
points is always measured as a single number or value. This is true regardless of
how many dimensions (variables) are involved in the problem.
The usual straight-line (Euclidean) distance measures the distance between two
points by the number of units that separate them. The squared straight-line dis-
tance, say D, between a point (£1,22) and the population mean point (//i,/^) is
defined as
Note that we have taken the bivariate observation, (x1, x2), and converted it
to a single number D, the distance the observation is from the mean point. If this
distance, D, is fixed, all points that are the same distance from the mean point can
be represented as a circle with center at the mean point and a radius of D (i.e., see
Figure 2.1). Also, any point located inside the circle has a distance to the mean
point less than D.
Unfortunately, the Euclidean distance measure is unsatisfactory for most statistical
work (e.g., see Johnson and Wichern (1998)). Although each coordinate of an
observation contributes equally to determining the straight-line distance, no consideration
is given to differences in the variation of the two variables as measured
by their variances, σ1² and σ2², respectively. To correct this deficiency, consider the
standardized values (x1 − μ1)/σ1 and (x2 − μ2)/σ2, which lead to the squared
statistical distance

(SD)² = (x1 − μ1)²/σ1² + (x2 − μ2)²/σ2². (2.1)
The value SD, the square root of (SD)2 in (2.1), is known as statistical distance.
For a fixed value of SD, all points satisfying (2.1) are the same statistical distance
from the mean point. The graph of such a group of points forms an ellipse, as is
illustrated in the example given in Figure 2.2. Any point inside the ellipse will have
a statistical distance less than SD, while any point located outside the ellipse will
have a statistical distance greater than SD.
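For the uncorrelated case, (SD)² is just the sum of the squared standardized coordinates. A minimal sketch, with means and standard deviations of our own choosing, shows why the contour is an ellipse: two points at the same Euclidean distance from the mean can have very different statistical distances:

```python
import math

# Assumed population parameters (illustrative only)
mu1, mu2 = 100.0, 50.0
sigma1, sigma2 = 10.0, 2.0

def statistical_distance(x1, x2):
    # Uncorrelated case: SD^2 = ((x1-mu1)/sigma1)^2 + ((x2-mu2)/sigma2)^2
    z1 = (x1 - mu1) / sigma1
    z2 = (x2 - mu2) / sigma2
    return math.sqrt(z1 ** 2 + z2 ** 2)

# Both points lie 4 Euclidean units from the mean point (100, 50):
d_a = statistical_distance(104, 50)   # shift in the high-variance variable
d_b = statistical_distance(100, 54)   # shift in the low-variance variable
print(d_a, d_b)
```

The 4-unit shift in the low-variance variable yields SD = 2.0, five times the SD = 0.4 produced by the same shift in the high-variance variable.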
In comparing statistical distance to straight-line distance, there are some major
differences to be noted. First, since standardized variables are utilized, the statis-
tical distance is dimensionless. This is a useful property in a multivariate process
since many of the variables may be measured in different units. Second, any two
points on the ellipse in Figure 2.2 have the same SD but could have possibly dif-
ferent Euclidean distances from the mean point. If the two variables have equal
variances and are uncorrelated, the statistical and Euclidean distance, apart from
a constant multiplier, will be the same; otherwise, they will differ.
The major difference between statistical and Euclidean distance in Figure 2.2
is that the two variables used in statistical distance are weighted inversely by their
standard deviations, while both variables are equally weighted in the straight-line
distance. Thus, a change in a variable with a small standard deviation will con-
tribute more to statistical distance than a change in a variable with a large standard
deviation. In other words, statistical distance is a weighted straight-line distance
where more importance is placed on the variable with the smaller standard devia-
tion to compensate for its size relative to its mean.
It was assumed that the two variables in the above discussion are uncorrelated.
Suppose this is not the case and that the two variables are correlated. A scatter
plot of two positively correlated variables is presented in Figure 2.3. To construct
a statistical distance measure to the mean of these data requires a generalization
of (2.1).
a11(x1 − μ1)² + a12(x1 − μ1)(x2 − μ2) + a22(x2 − μ2)² = c, (2.2)

where the aij are specified constants satisfying the relationship (a12² − 4a11a22) < 0,
and c is a fixed value. By properly choosing the aij in (2.2), we can rotate the
ellipse while keeping the scatter of the two variables fixed, until a proper alignment
is obtained. For example, the ellipse given in Figure 2.4 is centered at the mean of
the two variables yet rotated to reflect the correlation between them.
The bivariate normal probability function underlying such data is given by

f(x1, x2) = [2πσ1σ2√(1 − ρ²)]⁻¹ exp[−(SD)²/2],

where −∞ < xi < ∞ for i = 1, 2, and σi > 0 represents the standard deviation of
xi. The value of (SD)² is given by

(SD)² = [1/(1 − ρ²)][(x1 − μ1)²/σ1² − 2ρ(x1 − μ1)(x2 − μ2)/(σ1σ2) + (x2 − μ2)²/σ2²], (2.4)

where ρ represents the correlation between the two variables, with −1 < ρ < 1.
The cross-product term between x1 and x2 in (2.4) accounts for the fact that
the two variables vary together and are dependent. When x1 and x2 are correlated,
the major and minor axes of the resulting ellipse differ from that of the variable
space (x1, x2). If the correlation is positive, the ellipse will tilt upward to the right,
and if the correlation is negative, the ellipse will tilt downward to the right.
In matrix notation, the squared statistical distance in (2.4) can be written as

(SD)² = (X − μ)′Σ⁻¹(X − μ), (2.5)

where X' = (x1, x2), μ' = (μ1, μ2), and Σ⁻¹ is the inverse of the matrix

Σ = [ σ1²  σ12 ]
    [ σ21  σ2² ],

where σ12 = σ21 = ρσ1σ2 is the covariance between x1 and x2. The matrix Σ is
referred to as the covariance matrix of x1 and x2. The expression in (2.5) is a form
of Hotelling's T2 statistic.
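The agreement between the matrix form in (2.5) and the expanded form in (2.4) can be sketched numerically; the parameter values here are our own illustrative assumptions:

```python
import numpy as np

# Assumed illustrative parameters
mu = np.array([100.0, 50.0])
sigma1, sigma2, rho = 10.0, 2.0, 0.8

# Covariance matrix with off-diagonal sigma12 = rho * sigma1 * sigma2
Sigma = np.array([[sigma1 ** 2,            rho * sigma1 * sigma2],
                  [rho * sigma1 * sigma2,  sigma2 ** 2]])

x = np.array([108.0, 52.0])

# Matrix form: (X - mu)' Sigma^{-1} (X - mu)
sd2_matrix = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)

# Expanded bivariate form with the cross-product (correlation) term
z1 = (x[0] - mu[0]) / sigma1
z2 = (x[1] - mu[1]) / sigma2
sd2_expanded = (z1 ** 2 - 2 * rho * z1 * z2 + z2 ** 2) / (1 - rho ** 2)

print(sd2_matrix, sd2_expanded)   # the two forms agree
```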
Equations for the contours of a bivariate normal density are obtained by fixing
the value of SD in (2.4). This can be seen geometrically by examining the bivariate
normal probability function presented in Figure 2.6. The locus, or path, of the
point X' = (x1, x2) traveling around the probability function at a constant height
is an ellipse. Ellipses of constant density are referred to as contours and can be
determined mathematically to contain a fixed amount of probability. For example,
the 75% and 95% contours for the bivariate normal function illustrated in Figure
2.6 are presented in Figure 2.7. The elliptical contours represent all points having
the same statistical distance or T2 statistic value.
Figure 2.7: Bivariate normal contours containing 75% and 95% of the probability.
This result can be generalized to the situation where X' = (x1, x2, ..., xp) is
described by the p-variate normal (multivariate normal (MVN)) probability function
given by

f(X) = (2π)^(−p/2) |Σ|^(−1/2) exp[−(1/2)(X − μ)′Σ⁻¹(X − μ)], (2.6)

where −∞ < xi < ∞ for i = 1, 2, ..., p. The mean vector of X' is given by
μ' = (μ1, μ2, ..., μp), and the covariance matrix is given by the p × p matrix Σ = (σij).
A diagonal element, σii, of the matrix Σ represents the variance of the ith
variable, and an off-diagonal element, σij, represents the covariance between the
ith and jth variables. Note that Σ is a nonsingular, symmetric, and positive definite
matrix. In this setting, the equation for an ellipsoidal contour of the MVN
distribution in (2.6) is given by

(X − μ)′Σ⁻¹(X − μ) = c², (2.7)

for a fixed constant c².
2.4 Student t versus Hotelling's T2

The squared form of the familiar Student t statistic, based on a sample of n
observations from a normal population, is

t² = (x̄ − μ)²/(s²/n), (2.9)

and its value is defined as the squared statistical distance between the sample mean
and the population mean.
The numerator of (2.9) is the squared Euclidean distance between x̄ and μ.
Thus, it is a measure of the closeness of the sample mean to the population mean.
As x̄ gets closer to μ, the value of t² approaches zero. Division of the squared
Euclidean distance by the estimated variance of x̄ (i.e., by s²/n) produces the squared
statistical distance. Hotelling (1931) extended the univariate t statistic to the
multivariate case using a form of the T2 statistic based on sample estimates (rather
than known values) of the covariance matrix. His derivation is described as follows.
Consider a sample of n observations X1, X2, ..., Xn, where Xi′ = (xi1, xi2, ...,
xip), i = 1, 2, ..., n, is taken from a p-variate normal distribution having a mean
vector μ and a covariance matrix Σ. A multivariate generalization of the t² statistic
is given by

T² = n(X̄ − μ)′S⁻¹(X̄ − μ), (2.10)

where X̄ is the vector of sample means and

S = (sij), i, j = 1, 2, ..., p, (2.11)

where sii is the sample variance of the ith variable and sij is the sample covariance
between the ith and jth variables.
between the ith and jth variables. The matrix S has many special properties. Those
properties that pertain to our use of the T2 as a control statistic for multivariate
processes are discussed in later sections.
In terms of probability distributions, the square of the t statistic in (2.9) has
the form

t² = (normal random variable) × (chi-square random variable/df)⁻¹ × (normal random variable),

where df represents the n − 1 degrees of freedom of the chi-square variate,
(n − 1)s²/σ², and the normal random variable is given by √n(x̄ − μ)/σ. In this
representation, the random variable x̄ and the random variable s² are statistically
independent. Similarly, the T2 statistic in (2.10) may be expressed as

T² = (multivariate normal vector)′ × (Wishart random matrix/df)⁻¹ × (multivariate normal vector), (2.12)

where the multivariate normal vector is given by √n(X̄ − μ). The random vector X̄ and
the random matrix S are statistically independent. The Wishart distribution (see
section 2.8.6 for details) in (2.12) is the multivariate generalization of the univariate
chi-square distribution.
Using the two forms presented in (2.10) and (2.12), it is possible to extend
Hotelling's T2 statistic to represent the squared statistical distance between many
different combinations of p-dimensional points. For example, one can use the T2
statistic to find the statistical distance between an individual observation vector
X and either its known population mean μ or its population mean estimate X̄.
Hotelling's T2 also can be computed between a sample mean, X̄i, of a subgroup
and the overall mean, X̄, of all the subgroups.
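A small simulation sketch (with data of our own making) shows the individual-observation form of this computation; the check at the end uses the identity that T2 values computed against the sample mean and S always sum to p(n − 1):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated historical data set (illustrative): n observations on p variables
n, p = 50, 3
X = rng.normal(size=(n, p))

Xbar = X.mean(axis=0)            # estimated mean vector
S = np.cov(X, rowvar=False)      # common covariance estimator S
S_inv = np.linalg.inv(S)

# T^2 value for each individual observation relative to the estimated mean
t2 = np.array([(x - Xbar) @ S_inv @ (x - Xbar) for x in X])

# Check: these T^2 values always sum to p(n - 1)
print(round(t2.sum(), 3), p * (n - 1))
```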
The beta distribution is confined to the unit interval (0,1). However, within this interval, the distribution can take
on many familiar shapes, such as those associated with the normal, chi-square, and
F distributions. Examples of the beta distribution for various parameter values are
depicted in Figure 2.10, and percentage points of the beta distribution are given in
Table A.5 in the appendix.
It was stated earlier that the distribution used in describing a T2 statistic when
the parameters of the underlying normal distribution are unknown is some form of
an F distribution. However, in (2.15) we have used the beta distribution to describe
the T2 statistic. The T2, for this case, can be expressed as an F statistic by using
a relationship that exists between the F and beta probability functions. The result
is given by
where
In practice, we generally choose to use the beta distribution rather than the F dis-
tribution in (2.16). Although this is done to emphasize that the observation vector
X is not independent of the estimates obtained from the HDS, either distribution
is acceptable.
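As a sketch of how the beta form is used in practice, an upper control limit (UCL) for the individual-observation case can be computed from a beta percentage point. The scale factor (n − 1)²/n and the parameters (p/2, (n − p − 1)/2) are the standard choices for an observation included in the HDS; since the displayed equations are not reproduced here, treat them as assumptions:

```python
from scipy.stats import beta

# Assumed standard beta form for an observation included in the HDS:
#   T^2 ~ [(n-1)^2 / n] * B(p/2, (n-p-1)/2)
n, p, alpha = 50, 3, 0.05

ucl = ((n - 1) ** 2 / n) * beta.ppf(1 - alpha, p / 2, (n - p - 1) / 2)
print(round(ucl, 3))
```

Any T2 value from the HDS exceeding this UCL would be flagged as a potential outlier at the chosen false alarm rate alpha.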
Since each T2 value obtained from the HDS depends on the same values of X̄
and S, a weak interdependence among the T2 values is produced. The correlation
between any two T2 values computed from an HDS is given as −1/(n − 1) (see
Kshirsagar and Young (1971)). It is easily seen that even for modest values of
n, this correlation rapidly approaches zero. Although this is not justification for
assuming independence, it has been shown (see David (1970) and Hawkins (1981))
that, as n becomes large, the set of T2 values behaves like a set of independent
observations. This fact becomes important when subjecting the T2 values of an
HDS to other statistical procedures.
Process control in certain situations is based on the monitoring of the mean of
a sample (i.e., subgroup) of m observations taken at each of k sampling intervals.
The distribution that describes the statistical distance between the sample mean
of the ith observation vector Xi and the HDS mean X is given by
where Si represents the sample covariance estimate for the data taken during the
ith sampling period, i = 1, 2, ..., k. With this estimate, the form and distribution
of the T2 statistic becomes
where
is based on a result given in Scholz and Tosch (1994). Note that the formula in
(2.21) contains a correction to the expression given in the Scholz and Tosch article. In selected
situations, the statistic in (2.21) serves as an alternative to the common T2 in
detecting step and ramp shifts in the mean vector.
Other covariance estimators have been constructed in a similar fashion by parti-
tioning the sample in different ways. For example, Wierda (1994) suggested forming
a covariance estimator by partitioning the data into independent, nonoverlapping
groups of size 2. Consider a sample of size n, where n is even. Suppose group 1
= {X1, X2}, group 2 = {X3, X4}, ..., group (n/2) = {Xn-1, Xn}. The estimated
covariance matrix for each group, Ci, i = 1, 2, ..., (n/2), is
process is in-control, does not represent random fluctuation about a constant mean
vector. As we will see later, a multivariate control procedure can accommodate
such systematic variations in the mean vector. The common estimator S captures
the total variation, including the systematic variation of the mean, whereas many
of the above-mentioned alternative estimators estimate only random variation, i.e.,
stationary variation. Thus, these alternative estimators require additional modeling
of the systematic variation to be effective. With autocorrelation, the T2 charts
based on S are less likely to signal, either when the data are out-of-control or
in-control, unless similar additional modeling is performed (i.e., see Chapter 10).
When data are collected from an MVN distribution, the common covariance
estimator has many interesting properties. For example, S is an unbiased estimator
and is the maximum likelihood estimator. The probability function that describes
S is known, and this is important in deriving the distribution of the T2 statistic.
Another important observation on S that is useful for later discussion is that its
value is invariant to a permutation of the data. Thus, the value of the estimator is
the same regardless of which one of the many possible arrangements of the data,
X1, X2, ..., Xn, is used in its computation.
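The permutation invariance of S is easy to confirm empirically. A minimal NumPy sketch (the data here are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # 20 observations on p = 3 variables

S1 = np.cov(X, rowvar=False)          # common covariance estimator S
S2 = np.cov(X[rng.permutation(20)], rowvar=False)   # same rows, shuffled

print(np.allclose(S1, S2))            # True: S ignores the ordering
```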
2.7 Summary
In this chapter, we have demonstrated the relationship between the univariate t
statistic and Hotelling's T2 statistic. Both are shown to be a measure of the statis-
tical distance between an observed sample mean and its corresponding population
mean. This concept of statistical distance, using the T2 statistic, was expanded to
include the distance a single observation is from the population mean (or its sample
estimate) and the distance a subgroup mean is from an overall mean.
With the assumption of multivariate normality, we presented several probability
functions used to describe the T2 statistic. This was done for control procedures
based on the monitoring of a single observation or the mean of a subgroup of obser-
vations. Since there are many occasions in which the T2 statistic is slightly modified
to accommodate a specific purpose, we will continue to introduce appropriate forms
of the T2 statistic and its accompanying distribution. The ability to construct a
Hotelling's T2 for these different situations adds to its versatility as a useful tool in
the development of a multivariate control procedure.
where the ai are constant values. Vectors are matrices with either one row (a row
vector) or one column (a column vector).
Consider a multivariate process involving p process variables. We denote the
first process variable as x1, the second as x2, ..., and the pth process variable as xp.
A simple way of denoting an observation (at a given point in time) on all process
variables is by using a (p x 1) column vector X, where
Another important form of the data matrix is achieved by subtracting the mean
vector from each observation vector. This form is given as
The inverse matrix exists only if the determinant of A is nonzero. This implies
that the matrix A must be nonsingular. If the inverse does not exist, the matrix
A is singular. Sophisticated computer algorithms exist to compute accurately the
inverses of large matrices and are contained in most computer packages used in the
analysis of multivariate data.
The definition dictates that the matrix A must be a square matrix, so that the
number of rows equals the number of columns. It also implies that the off-diagonal
elements of the matrix A, denoted by aij, i ≠ j, are equal, i.e., that aij = aji.
Performing the matrix and vector multiplication produces the following univariate
expression:
With the assumption that A is a symmetric matrix, so that a12 = a21, the above
expression can be written as
In this form, it is easy to see the relationship between the algebraic expression and
the matrix notation for a quadratic form. The matrix A of the quadratic form is
defined to be positive definite if the quadratic expression is larger than zero for all
nonzero values of the vector X.
To demonstrate this procedure, consider the quadratic form in three variables
given by
This expression is written in matrix notation as X'AX, where X' = (x1, x2, x3)
and
With a little manipulation, the above T2 can be written as the quadratic form
where X' = (x1, x2) and μ' = (μ1, μ2). The inverse of the matrix Σ is given by
where σ12 = σ21 = ρσ1σ2 is the covariance between x1 and x2, ρ is the corresponding
correlation coefficient, and σi is the square root of σii, i = 1, 2.
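Since the displayed formulas are not reproduced in this extract, the sketch below verifies the p = 2 algebra numerically: the matrix form of the quadratic agrees with its expanded expression in terms of ρ, σ1, and σ2 (all numeric values are arbitrary illustrations):

```python
import numpy as np

mu = np.array([1.0, 2.0])
s1, s2, rho = 2.0, 3.0, 0.8
Sigma = np.array([[s1 ** 2, rho * s1 * s2],
                  [rho * s1 * s2, s2 ** 2]])

x = np.array([2.5, 4.0])
d = x - mu

# Matrix form of the quadratic (statistical distance)
t2_matrix = d @ np.linalg.inv(Sigma) @ d

# Expanded algebraic form for p = 2
z1, z2 = d[0] / s1, d[1] / s2
t2_alg = (z1 ** 2 - 2 * rho * z1 * z2 + z2 ** 2) / (1 - rho ** 2)

print(np.isclose(t2_matrix, t2_alg))   # True
```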
is labeled a Wishart matrix. This name comes from Wishart (1928), who generalized
the joint distribution of the p(p + 1)/2 unique elements of this matrix. This result
is predicated on the assumption that the original random sample of n observation
vectors is obtained from an Np(μ, Σ), so that each Xi ~ Np(μ, Σ) for i = 1, 2, ..., n.
It is also assumed that the matrix S is a symmetric matrix.
The Wishart probability function is given as
The matrix S is positive definite and Γ(·) is the gamma function (e.g., see Anderson
(1984)). Unlike the MVN distribution, the Wishart density function has very little
use other than in theoretical derivations (e.g., see Johnson and Wichern (1998)).
For the case of p = 1, it can be shown that the Wishart distribution reduces to a
constant multiple of a univariate chi-square distribution. It is for this reason that
the distribution is thought of as being the multivariate analogue of the chi-square
distribution.
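The p = 1 reduction can be checked by simulation. The sketch below (an illustration of ours, not part of the text) compares the simulated mean of (n - 1)S/σ2 with the mean of a chi-square with n - 1 degrees of freedom:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n, sigma = 10, 2.0

# 20,000 replications of the 1x1 "Wishart" matrix (n - 1) * S
w = np.array([(n - 1) * np.var(rng.normal(0.0, sigma, n), ddof=1)
              for _ in range(20_000)])

# For p = 1, (n - 1)S / sigma^2 should be chi-square with n - 1 df:
print(np.mean(w / sigma ** 2), chi2(df=n - 1).mean())   # both near n - 1 = 9
```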
Chapter 3
Checking Assumptions for Using a T2 Statistic
3.1 Introduction
As was indicated in Chapter 2, the distributions of various forms of the T2 statistic
are well known when the set of p-dimensional variables being sampled follows an
MVN distribution. The MVN assumption for the observation vectors guarantees the
satisfaction of certain conditions that lead to the known T2 distributions. However,
validating this assumption for the observation vectors is not an easy task. An
alternative approach for use with nonnormal distributions is to approximate the
sampling distribution of the T2. In this chapter, we take this latter approach by
seeking to validate only the univariate distribution of the T2 statistic, rather than
the MVN distribution of the observation vectors.
There are a number of other basic assumptions that must be made and re-
quirements that must be met in order to use the T2 as a control statistic. These
conditions include: (1) selecting a sample of independent (random) observations,
(2) determining the UCL to use in signal detection, (3) collecting a sufficient sample
size, and (4) obtaining a consistent estimator of the covariance matrix for the vari-
ables. In this and later chapters, we discuss these assumptions and requirements
and show how they relate to the T2 statistic. We also demonstrate techniques
for checking their validity and offer alternative procedures when these assumptions
cannot be satisfied.
where the T2 statistic is based on the formula given in (2.15). Since the T2 in
(3.1) is a univariate statistic with a univariate distribution, we propose performing
a goodness-of-fit test on its values to determine if the beta is the appropriate dis-
tribution, rather than on the values of X to determine if the MVN is the correct
distribution.
Although observations taken from an MVN can be transformed to a T2 statistic
having a beta distribution, it is unknown whether other multivariate distributions
possess this same property. The mathematics for the nonnormal situations quickly
becomes intractable. However, we do know that the beta distribution obtained
under MVN theory provides a good approximation for some nonnormal situations.
We illustrate this phenomenon below using a bivariate example.
We begin by generating 1,000 standardized bivariate normal observations having
a correlation of 0.80. A scatter plot of these observations is presented in Figure
3.1. For discussion purposes, three bivariate normal contours (i.e., equal altitudes
on the surface of the distribution) at fixed T2 values are superimposed on the data.
These also are illustrated on the graph.
Note the symmetrical dispersion of the points between the concentric T2 ellipses.
The concentration of points diminishes from the center outward. Using 31 contours
to describe the density of the 1,000 observations, and summing the number of
observations between these contours, we obtain the histogram of the T2 values
presented in Figure 3.2. The shape of this histogram corresponds to that of a beta
distribution.
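This experiment is easy to reproduce. The sketch below generates the same kind of data and checks the T2 values against the scaled beta distribution with a goodness-of-fit test (a Kolmogorov-Smirnov test stands in here for whatever test the reader prefers):

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(2)
n, p, rho = 1000, 2, 0.80
Sigma = np.array([[1.0, rho], [rho, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)

xbar = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
d = X - xbar
t2 = np.einsum('ij,jk,ik->i', d, S_inv, d)   # T2 for each observation

# Scale T2 back to (0,1); it should behave like Beta(p/2, (n-p-1)/2)
u = t2 * n / (n - 1) ** 2
ks = kstest(u, 'beta', args=(p / 2, (n - p - 1) / 2))
print(ks.pvalue)   # typically large: no evidence against the beta fit
```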
In contrast to the above example, a typical scatter plot of a nonnormal bivariate
distribution is presented in Figure 3.3. Note the shape of the plot. The observa-
tions in Figure 3.3 are generated from the observations in Figure 3.1 by truncating
variable x1 at the value of 1. Distributions such as this occur regularly in industries
where run limits are imposed on operational variables. Truncation, which produces
long-tailed distributions, also occurs with the use of certain lab data. This can be
due to the detection limit imposed by the inability of certain types of equipment
to make determinations below (or above) a certain value.
In Figure 3.4, three bivariate normal contours at fixed T2 values, computed
using the mean vector and covariance matrix of the truncated distribution, are su-
perimposed on the nonnormal data. A major distinction between the scatter plots
in Figures 3.1 and 3.4 is the dispersion of points between the elliptical contours.
For the bivariate normal data in Figure 3.1, the points are symmetrically dispersed
between the contours. This is not the case for the nonnormal data in Figure 3.4.
For example, note the absence of points in the lower left area between the two
outer contours. Nevertheless, possibly due to this particular pattern of points, or
due to the size of the sample, the corresponding T2 histogram given in Figure 3.5
for this distribution has a strong resemblance to the histogram given in Figure 3.2.
Agreement of this empirical distribution to a beta distribution can be determined
by performing a univariate goodness-of-fit test.
role, consider first the kurtosis, denoted by α4, for a univariate distribution with a
known mean μ and a known standard deviation σ. The kurtosis is usually defined
as being the expected value of the fourth standardized moment, i.e.,
When X follows an MVN distribution, Np(μ, Σ), the kurtosis value in (3.3) reduces
to
where T2 is based on the formula given in (2.15). The relationship in (3.5) indicates
that large T2 values directly influence the magnitude of the kurtosis measure.
We can use the above results to relate the kurtosis value of a multivariate non-
normal distribution to that of an MVN distribution. As an example, consider two
uncorrelated variables, (x1, x2), having a joint uniform distribution represented by
a unit cube. Both marginal distributions are uniform and have a kurtosis value
of 1.8. Thus, these are "very flat" distributions relative to a univariate normal.
This "flatness" carries over to the joint distribution of the two variables. Using
(3.4), the kurtosis of the bivariate normal, with known parameters, is given by
p(p + 2) = 2(4) = 8. The kurtosis value for the joint uniform is found by evaluating
(3.3), where μ' = (0.5, 0.5) and Σ is a diagonal matrix with entries of 1/12 on
the diagonal. The value is calculated as 5.6. This implies that a bivariate uniform
distribution is considerably "flatter" than a bivariate normal distribution.
In contrast to the above, there are many combinations of distributions of p in-
dependent nonnormal variables that can produce a multivariate distribution that
has the same kurtosis value as a p-variate normal. For example, consider a mul-
tivariate distribution composed of two independent variables: x\ distributed as a
uniform (0,1) and x2 distributed as an exponential (i.e., f(x) = e^(-x) for x > 0 and
zero elsewhere). Using (3.3), the kurtosis of this bivariate nonnormal distribution
is 12.8. In comparison, the kurtosis of a bivariate normal distribution is 8. Thus,
this distribution is heavier in the tails than a bivariate normal. However, suppose
we keep adding another independent uniform variate to the above nonnormal dis-
tribution and observe the change in the kurtosis value. The results are provided in
Table 3.1.
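Table 3.1 can be reproduced from a closed form for independent variables: assuming the kurtosis measure in (3.3) is Mardia's, it equals the sum of the marginal kurtoses plus p(p - 1). The sketch below tabulates the mixed uniform-exponential values against the MVN benchmark p(p + 2):

```python
# Kurtosis of p independent standardized variables: sum of the marginal
# kurtoses plus p(p - 1); for an MVN it is p(p + 2).  (Assumes (3.3) is
# Mardia's multivariate kurtosis measure.)
KURT_UNIFORM, KURT_EXPONENTIAL = 1.8, 9.0

def kurt_mixed(num_uniform):
    """num_uniform independent uniforms plus one exponential (p = u + 1)."""
    p = num_uniform + 1
    return num_uniform * KURT_UNIFORM + KURT_EXPONENTIAL + p * (p - 1)

for u in range(1, 7):
    p = u + 1
    print(p, round(kurt_mixed(u), 1), p * (p + 2))
# p = 2 gives 12.8 vs. 8; equality (48 vs. 48) occurs at p = 6,
# i.e., five uniforms plus one exponential, as in the text.
```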
As the number of uniform variables increases in the multivariate nonnormal
distribution, the corresponding kurtosis value of the MVN distribution approaches
and then exceeds the kurtosis value of the nonnormal distribution. Equivalence
of the two kurtosis values occurs at the combination of five uniform variables and
one exponential variable. For this combination, the tails of the joint nonnormal
distribution are similar in shape to the tails of the corresponding normal. The
result also implies that the T2 statistic based on this particular joint nonnormal
distribution will have the same variance as the T2 statistic based on the MVN
distribution.
This example indicates that there do exist combinations of many (independent)
univariate nonnormal distributions having the same kurtosis value as that achieved
under an MVN assumption. For these cases, the mean and variance of the T2
statistic based on the nonnormal data are the same as for the T2 statistic based on
the corresponding normal data. This result does not guarantee a perfect fit of the
T2 sampling distribution to a beta (or chi-square or F) distribution, as this would
require that all (higher) moments of the sampling distribution of the T2 statistic
be identical to those of the corresponding distribution. However, such agreement of
the lower moments suggests that, in data analysis using a multivariate nonnormal
distribution, it may be beneficial to determine if the sampling distribution of the
T2 statistic fits a beta (or chi-square or F) distribution. If such a fit is obtained,
the data can then be analyzed as if the MVN assumption were true.
sampling distributions are also equal. Under the MVN assumption, the T2 statistic
follows a chi-square distribution with p degrees of freedom. We now illustrate the
appropriateness of the same chi-square distribution for the T2 statistic generated
from the nonnormal distribution. Two hundred observations for five independent
uniform variables and one independent exponential are generated. The T2 chart
for the 200 observations is presented in Figure 3.9, where UCL = 16.81 is based
on α = 0.01.
The corresponding Q-Q plot using chi-square quantiles is presented in Figure
3.10. Our interest lies in the tail of the distribution. For α = 0.01, we would expect
two values greater than the chi-square value of 16.81. Although two T2 values in
Figure 3.9 are near the UCL, the T2 chart indicates no signaling T2 values. These
two large T2 values are located in the extreme right-hand corner of the Q-Q plot in
Figure 3.10. The Q-Q plot follows a linear trend and displays little deviation from
Figure 3.10: Q-Q plot based on simulated data from nonnormal distribution.
it. Thus, the T2 values appear to follow a chi-square distribution despite the fact
that the underlying multivariate distribution is nonnormal.
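The sketch below mirrors this experiment with known parameters (the diagonal covariance makes the T2 a sum of squared standardized components); the UCL of 16.81 is the 0.99 chi-square quantile with 6 degrees of freedom:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
n = 200

U = rng.uniform(size=(n, 5))           # five uniforms: mean 0.5, var 1/12
E = rng.exponential(size=(n, 1))       # one exponential: mean 1, var 1
X = np.hstack([U, E])

mu = np.array([0.5] * 5 + [1.0])       # known mean vector
var = np.array([1 / 12] * 5 + [1.0])   # known (diagonal) covariance

d = X - mu
t2 = np.sum(d * d / var, axis=1)       # T2 with known, diagonal Sigma

ucl = chi2.ppf(0.99, df=6)             # the 16.81 UCL used in the text
print(round(ucl, 2), int(np.sum(t2 > ucl)))   # about 2 exceedances expected
```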
In contrast, consider a bivariate nonnormal distribution based on one indepen-
dent uniform and one independent exponential variable. From Table 3.1, it is noted
that the kurtosis of this distribution is larger than that of a bivariate normal. This
implies a heavier tail (i.e., more large T2 values) than would be expected under
normality. We generate 200 observations from this nonnormal distribution and
construct the T2 chart presented in Figure 3.11 using UCL = 9.21 and α = 0.01.
With this α, we would expect to observe two signals in the chart. However, there
are five signaling T2 values, indicating the heavy-tailed distribution expected from
the high kurtosis value.
The corresponding Q-Q plot for these data is presented in Figure 3.12. Note the
severe deviation from the trend line in the upper tail of the plot of the T2 values.
Figure 3.12: Q-Q plot based on simulated data from nonnormal distribution.
The conclusion is that a chi-square distribution does not provide a good fit to
these data.
To illustrate the appropriateness of the use of the beta distribution to describe
the sampling distribution of the T2 statistic, we consider 104 bivariate observations
taken from an actual industrial process. A scatter plot of the observations on the
variables x1 and x2 is presented in Figure 3.13. Observe the elongated elliptical
shape of the data swarm. This is a characteristic of the correlation, r = —0.45,
between the two variables and not of the form of their joint distribution. Observe
also the presence of one obvious outlier that does not follow the pattern established
by the bulk of the data.
The presence of outliers poses a problem in assessing the distribution of the T2
statistic. Thus, we must remove the three outliers and recompute the T2 values
of the remaining data. These remaining values are plotted in the T2 chart given
in Figure 3.14. There remain some large T2 values associated with some of the
observations, but none are signals of out-of-control points. Observations of this
type (potential outliers) are not removed in this example, although they could
possibly affect the fit of the corresponding beta distribution.
The corresponding Q-Q plot for these data is presented in Figure 3.15. Since
p = 2 and n = 103, the beta distribution fit to the data, using (3.1), is B(1, 50).
From inspection, the Q-Q plot exhibits a very strong linear trend that closely follows
the 45° line imposed on the plot. This indicates an excellent fit between the T2
sampling distribution and the appropriate beta distribution.
A question of interest is whether the above beta distribution is describing the
process data because the actual observations follow a bivariate normal or because
the fit provides a good approximation to the sampling distribution. To address
this question, we examine estimates of the marginal distributions of the individual
variables x1 and x2. If the joint distribution is bivariate normal, then each marginal
where k is a chosen constant such that k > 1 and where μ and σ2 are the mean and
variance, respectively, of x. For example, the probability that a random variable
x will take on a value within k = 3.5 standard deviations of its mean is at least
1 − 1/k2 = 1 − 1/(3.5)2 = 0.918. Conversely, the probability that x would take on
a value outside this interval is no greater than 1/k2 = 1 − 0.918 = 0.082.
To use the Chebyshev procedure in a T2 control chart, calculate the mean, T,
and the standard deviation, sT, of the T2 values obtained from the HDS. Using these
as estimates of the parameters μT and σT of the T2 distribution, an approximate
UCL is given as
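The displayed formula itself is not reproduced in this extract; the sketch below implements one plausible form (mean plus k standard deviations, with k = 1/√α from the two-sided inequality), so treat it as an illustration rather than the book's exact equation:

```python
import numpy as np

def chebyshev_ucl(t2_values, alpha=0.05):
    """Distribution-free approximate UCL for a set of T2 values.
    Illustrative form only: two-sided Chebyshev gives
    P(|T2 - mu| >= k*sigma) <= 1/k**2, so k = 1/sqrt(alpha) bounds
    the exceedance probability by alpha."""
    t2_values = np.asarray(t2_values, dtype=float)
    k = 1.0 / np.sqrt(alpha)
    return t2_values.mean() + k * t2_values.std(ddof=1)

# The worked numbers from the text: coverage within k = 3.5 standard
# deviations is at least
print(1 - 1 / 3.5 ** 2)   # 0.9183673469387755
```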
replication, one would also need multiple observations at various loads (megawatt
production) for the different temperatures.
An alternative solution to choosing a large sample size is to seek to reduce
the dimensionality of the multivariate problem. This can be achieved by reducing
the number of parameters that need to be estimated. One useful solution to this
problem involves applying principal component analysis (Jackson (1991)).
3.10 Summary
Fundamental to the use of any statistic as a decision-making tool is the probability
function describing its behavior. For the T2 statistic, this is either the chi-square,
the beta, or the F distribution. Multivariate normal observation vectors are the
basis for these distributions. Since multivariate normality is not easily validated,
and s − r is a minimum. For large n, one may approximate r and s by the two
values
where z(γ/2) is the upper γ/2 quantile of the standard normal distribution.
The CIs obtained from (A3.1) and (A3.2) are generally very similar when n is
large and γ ≈ 0.95. We choose to use the inequality in (A3.1) to obtain r and s.
From (A3.1), one can be at least 100γ% sure that the UCL is somewhere between
T2(r) and T2(s). Since there are infinitely many values between T2(r) and T2(s),
there are infinitely many choices for the UCL. For convenience, we choose the
midpoint of the interval as an approximate value for the UCL. It is given by
This page intentionally left blank
Chapter 4
Construction of Historical Data Set
The statistic used to make the needed comparison is a Hotelling's T2. You
twice read the section explaining how this is done. The theory is complex,
but from an intuitive point of view, you now understand how multivariate
control procedures work. A T2 statistic, the multivariate analogue of a com-
mon t-statistic, can assess all 35 variables at the same time. It is written as
a quadratic form in matrix notation. You never appreciated that course in
matrix algebra until now. It is all very amazing.
Suddenly, you realize you still have a most serious problem. How is all
of this computing to be done? You can't do it with your favorite spreadsheet
without spending days writing macros. How was it done in the text? What
software did they use in all of their data examples? A quick search provides the
answer, QualStat™, a product of InControl Technologies, Inc. You note that
a CD-ROM containing a demonstration version of this program is included
with the book. (This chicken salad sandwich is good. You must remember
to tell the cafeteria staff that their new recipe is excellent.) Following the
instructions on your computer screen, you quickly upload the software. Now,
you are ready to work on a Phase I operation and create an HDS.
4.1 Introduction
An in-control set of process data is a necessity in multivariate control procedures.
Such a data set, often labeled historical, baseline, or reference, provides the basis
for establishing the initial control limits and estimating any unknown parameters.
However, the construction of a multivariate HDS is complicated and involves prob-
lem areas that do not occur in a univariate situation. It is the purpose of this
chapter to explore in detail some of these problem areas and offer possible solutions.
The development of the HDS is referred to as a Phase I operation. Using it as
a baseline to determine if new observations conform to its structure is termed a
Phase II operation. Since there is only one variable to consider, univariate Phase I
procedures are easy to apply. Upon deciding which variable to chart, one collects
a sample of independent observations (preliminary data) on this variable from the
in-control process. The resulting data provide initial estimates of the parameters
that characterize the distribution of the variable of interest.
The parameter estimates are used to construct a preliminary control procedure
whose major purpose is to purge the original data set of any observations that do not
conform to the structure of the HDS. These nonconforming or atypical observations
are labeled outliers. After the outliers are removed from the preliminary data set,
new estimates of the parameters are obtained and the purging process is repeated.
This is done as many times as necessary to obtain a homogeneous data set as
defined by the control procedure. After all outliers are removed, the remaining
data is referred to as the HDS.
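A univariate Phase I purge of this kind can be sketched in a few lines; the 3-sigma rule below is an illustrative choice of ours, not a prescription from the text:

```python
import numpy as np

def purge(data, k=3.0):
    """Iteratively remove points outside mean +/- k*sd until the
    remaining data are homogeneous (a univariate Phase I sketch;
    the k = 3 rule here is illustrative only)."""
    data = np.asarray(data, dtype=float)
    while True:
        m, s = data.mean(), data.std(ddof=1)
        keep = np.abs(data - m) <= k * s
        if keep.all():
            return data          # no outliers left: this is the HDS
        data = data[keep]        # drop outliers, re-estimate, repeat

rng = np.random.default_rng(4)
sample = np.concatenate([rng.normal(10.0, 1.0, 100), [25.0, -5.0]])
hds = purge(sample)
print(len(sample), len(hds))     # the two planted outliers are purged
```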
The role of a multivariate HDS is the same as in the univariate situation. It pro-
vides a baseline for the control procedure by characterizing the in-control process.
However, construction of a historical data set becomes more complicated when us-
ing multivariate systems. For example, we must decide which variables to include
and their proper functional forms. This determination may require in-depth process
(Figure: overview of Phase I construction of an HDS — planning (establish goals, study and map process, define good process operations); variable form (theoretical and empirical relationships, transformations); data collection procedures (human and electronic errors); missing data (estimation, deletion); collinearity and autocorrelation effects (detection and removal); outlier detection and the purging process; alternative covariance estimators.)
• The production of caustic soda and chlorine gas is a major industry in the
United States. One method of production is through the electrolysis of brine (i.e.,
salt water). This work is done in an electrolyzer that is composed of one or more
cells. A cell is the basic unit where the conversion takes place. The major purpose
of a control procedure is to locate cells whose conversion efficiency has dropped so
that they can be treated to restore their efficiency.
• The brine (feed stock) for an electrolyzer must be treated to remove impurities.
This operation takes place in a brine treatment facility. The primary purpose
of a control procedure on this unit is to maintain the quality of the feed stock
for the electrolyzer. "Bad" brine has the potential of destroying the cells and
contaminating the caustic soda being produced.
• From the electrolyzer, caustic soda is produced in a water solution. The water
is removed through evaporation in an evaporating unit. A control procedure on this
unit maintains maximum production for a given set of run conditions, maintains
the desired caustic strength, and helps locate sources of problems.
• One method of transporting the finished caustic product is by railroad tank
cars. Overweight tank cars present a number of major problems. Control proce-
dures on the loading of the tank cars can ensure that no car will be loaded above
its weight limit.
• Control procedures on steam and gas turbines, used in the production of
electricity for the electrolysis of the brine, detect deviations in the efficiency of the
turbines. Also, they are used to locate sources of problems that occur in operations.
Boilers, used in steam production for the evaporation of water, are controlled in a
similar fashion.
• Control procedures on large pumps and compressors are used for maintenance
control to detect any deviation from a set of "ideal" run conditions. It is less
expensive to replace worn parts than to replace a blown compressor.
• Control procedures on various reactors are used to maintain maximum effi-
ciency for a given set of run conditions and to locate the source of the problem
when upsets occur.
To understand these concepts, consider the data given in Table 4.1. It consists
of a sample of 30 observations taken on eight variables, (X1, X2, ..., X8), measured
on a chemical process.
It is assumed at this stage of data investigation that a decision has been made as
to the purposes and types of control procedures required for this process. Suppose
it is desired to construct a control procedure using only the observations on the
first seven variables presented in Table 4.1. Further, suppose it is most important
to maintain variable X4 above a critical value of 5. Any drifting or changes in
relationships of the other process variables from values that help maintain X4 above
its critical value need to be detected so that corrective action can be taken.
Initially, the data must be filtered to obtain a preliminary data set from which
the HDS can be constructed. There are 17 observations with X4 above its critical
value of 5. The obvious action is to sort the data on X4 and remove all observations
in which X4 has a value below its critical value.
of data with the desired characteristics.
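With the data in a matrix, this filtering step is a one-liner. The sketch below uses randomly generated numbers as a hypothetical stand-in for Table 4.1:

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical stand-in for Table 4.1: 30 runs on 8 process variables
data = rng.normal(5.0, 1.0, size=(30, 8))

x4 = data[:, 3]                  # variable X4 (0-based column 3)
in_spec = data[x4 > 5.0]         # keep runs with X4 above its critical value
out_spec = data[x4 <= 5.0]

# Compare group means variable by variable, as in Table 4.2
print(in_spec.mean(axis=0).round(2))
print(out_spec.mean(axis=0).round(2))
```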
The filtering of data can provide valuable process information. For example,
suppose the out-of-specification runs (i.e., X4 < 5) are labeled Group 1 and the in-
specification runs (i.e., X4 > 5) are labeled Group 2. The means of the variables of
the two groups are presented in Table 4.2. Valuable process information is obtained
by closely examining the data. A mean difference on variable 4 is to be expected
since it was used to form the groups. However, large percentage differences in the
means are observed on variables 1 and 5, and a moderate difference is observed on
variable 3. Further investigation is needed in determining how these variables are
influencing variable 4.
A preliminary data set should be thoroughly examined using both statistical
procedures and graphical tools. For example, consider the graph presented in Figure
4.1 of fuel consumption, labeled Fuel, and megawatt-hours production, labeled
Megawatts (or MW), of a steam turbine over time (in days of operation). Close
examination of the graph produces interesting results. Note the valleys and peaks
in the MW trace. These indicate load changes on the unit, whereas the plateaus
reflect production at a constant load. When the load is reduced, the MW usage
curve follows the fuel graph downward. Similarly, the MW graph follows the fuel
graph upwards when the load is increased. This trend indicates there is a lag,
during a load change, in the response time of the turbine to the amount of fuel
being supplied. This is very similar to the operation of a car, since accelerating or
decelerating it does not produce an instantaneous response.
Lags in only part of the data, as seen in the example in Figure 4.1, often can
be easily recognized by graphical inspection. Other methods, however, must be
used to detect a lag time extending across an entire processing unit. Observations
across a processing unit are made at a single point in time. Before one can use the
observations in this form, there must be some guarantee that the output observa-
tions match the input observations. Otherwise, the lag time must be determined
and the appropriate parts of the observation vector shifted to match the lag. Some
processes, such as strippers used to remove unwanted chemical compounds, work
instantaneously from input to output. Other processes, such as silica production,
have a long retention time from input to output. Consultation with the operators
and process engineers can be most helpful in determining the correct lag time of
the process.
A helpful method for determining if lag relationships exist between two vari-
ables is to compute and compare their pairwise correlation with the correlation be-
tween one variable and the lag of the other variable. For example, consider hourly
observations taken (at the same sampling period) on two process variables. A
sample of size 27 is presented in Table 4.3, where the variable x is a feedstock
characteristic and the variable y is an output quality characteristic. We begin by
calculating the correlation between x and y. The value is given as 0.148. Next
we lag the y values one time period and reconstruct the data set, as presented in
Table 4.3. The resulting variable, labeled ylagl, has a correlation of 0.447 when
compared to the x variable. We note an increase in the correlation.
The observations on the quality variable y could be successively shifted down-
ward until the maximum correlation with x is obtained. The correlations for three
consecutive lags are presented in Table 4.4. Maximum correlation is obtained by
lagging the quality characteristic two time periods. Note the decrease in the cor-
relation for three lags of the quality characteristic. Thus, we estimate the time
through the system as being two hours.
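The lag-correlation comparison described above can be sketched in Python. The series here are simulated (the Table 4.3 data are not reproduced in this excerpt), with the output constructed to lag the input by two sampling periods:

```python
import numpy as np

def lag_correlations(x, y, max_lag=3):
    """Correlate x[t] with y[t + lag] for lag = 0, 1, ..., max_lag.

    Lagging y by k periods pairs each input value with the output
    observed k sampling periods later."""
    out = {}
    for k in range(max_lag + 1):
        if k == 0:
            xs, ys = x, y
        else:
            xs, ys = x[:-k], y[k:]
        out[k] = np.corrcoef(xs, ys)[0, 1]
    return out

# Illustrative series: y echoes x two periods later, plus noise.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = np.concatenate([rng.normal(size=2), x[:-2]]) + 0.3 * rng.normal(size=200)

corrs = lag_correlations(x, y)
best = max(corrs, key=lambda k: corrs[k])   # lag with maximum correlation
```

As in the text's example, the correlation is small at lag zero, peaks at the true lag, and falls off again for longer lags; the lag at the maximum estimates the time through the system.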
One needs to examine closely how each observation is determined and the recycle
time of each piece of equipment.
Another problem area in data collection involves incorrect observations on com-
ponents. These may result from faulty equipment such as transmitters and temper-
ature probes. Problems of this type can be identified using various forms of data
plots or data verification techniques (usually included in the purging process).
and the estimated value of Cl2 is computed as 98.58. This is in close agreement
with the actual value of 98.41 given in Table 4.6.
The T2 value, with this estimate of the missing component, is 5.94 as compared
to the actual value of 5.92. Thus, substituting the missing value has little influence
on the T2 statistic. Similarly, there is negligible change in the mean of x5. The
mean of the observations without the missing value is 98.21 versus a mean of 98.23
when the estimated value is included. A comparison of the correlations between
x5 and the other five variables, with and without the predicted value, is given in
Table 4.7. There appears to be little difference between these correlations.
The fill-in-the-value approach presented above is a simple and quick method for
estimating missing values in an HDS. However, among its limitations are the fact
that the estimated value is only as good as the prediction equation that produced
it, and the fact that estimation may affect the variance estimates. Many other
solutions exist (e.g., see Little and Rubin (1987)), and these can be used when
better estimation techniques are preferred.
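The fill-in-the-value idea can be sketched as a regression imputation: fit a prediction equation for the incomplete variable on the complete rows, then substitute the fitted value. The data and the helper name below are illustrative assumptions, not from the text:

```python
import numpy as np

def regression_impute(X, col):
    """Fill missing entries (NaN) in column `col` of data matrix X by
    ordinary least squares on the remaining columns, fit on complete rows.
    A minimal sketch of the fill-in-the-value idea; see Little and
    Rubin (1987) for methods that also respect the variance structure."""
    X = X.astype(float).copy()
    miss = np.isnan(X[:, col])
    others = [j for j in range(X.shape[1]) if j != col]
    A = np.column_stack([np.ones((~miss).sum()), X[~miss][:, others]])
    coef, *_ = np.linalg.lstsq(A, X[~miss, col], rcond=None)
    B = np.column_stack([np.ones(miss.sum()), X[miss][:, others]])
    X[miss, col] = B @ coef
    return X

# Example: column 2 depends linearly on columns 0 and 1.
rng = np.random.default_rng(0)
Z = rng.normal(size=(30, 2))
X = np.column_stack([Z, 2.0 * Z[:, 0] - Z[:, 1] + 98.0])
X[5, 2] = np.nan
filled = regression_impute(X, 2)
```

As the text cautions, the substituted value is only as good as the prediction equation, and repeated use of fitted values tends to understate the variance of the imputed variable.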
A brief summary of PCA and its relationship to the eigenvalue problem is contained
in subsection 4.11.2 of this chapter's appendix. The reader unfamiliar with these
concepts is encouraged to review this appendix before continuing with this section.
A more detailed discussion of PCA is provided by Jackson (1991).
The effects of a near-singular covariance matrix on the performance of a T2
statistic will be demonstrated in the following example. We begin by expressing
the inverse of the sample covariance matrix in spectral form as

S^(-1) = Σ_{j=1}^{p} (1/λj) Uj Uj',

where λ1 > λ2 > ... > λp are the eigenvalues of S and Uj, j = 1, 2, ..., p, are
the corresponding eigenvectors. If λp is close to zero, the ratio (1/λp) becomes
very large and can have a disproportionate effect on the calculation of the inverse
matrix. This distorts any statistic, such as the T2, that uses the inverse matrix in
its calculation.
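A small numeric sketch of this effect: for a hypothetical 2 × 2 near-singular covariance matrix, the term (1/λ2)U2U2' dominates the inverse, and a deviation lying along the last eigenvector produces an inflated T2:

```python
import numpy as np

# Inverse of a covariance matrix from its eigendecomposition:
# S^(-1) = sum_j (1/lambda_j) U_j U_j'.  A tiny smallest eigenvalue
# makes its term dominate and distorts any statistic built on S^(-1).
S = np.array([[1.0, 0.999],
              [0.999, 1.0]])          # near-singular: lambda_min = 0.001
lam, U = np.linalg.eigh(S)            # eigenvalues in ascending order
S_inv = sum((1.0 / lam[j]) * np.outer(U[:, j], U[:, j]) for j in range(2))

x = np.array([0.1, -0.1])             # small deviation *against* the correlation
t2 = x @ S_inv @ x                    # inflated by 1/lambda_min
```

Here the deviation (0.1, -0.1) lies along the eigenvector with eigenvalue 0.001, so t2 = 0.02/0.001 = 20, whereas the same-sized deviation (0.1, 0.1) along the large eigenvalue would give only about 0.01.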
To demonstrate how to examine the eigenstructure of a matrix, we consider
the correlation matrix of a chlorine (Cl2)/caustic (NaOH) production unit. A
unit schematic is presented in Figure 4.4. This particular process is based on
the electrolysis of brine (salt). A current is passed through a concentration of brine
solution where the anode and cathode are separated by a porous diaphragm. The
chlorine is displaced as a gas and the remaining water/brine solution contains the
caustic. The unit performing this work is referred to as a cell, and several of these
are housed together (as a unit) to form an electrolyzer. Overall performance of
the cell is measured by the percentage of the available power being used in the
conversion process. This percentage is a computed variable and is referred to as
conversion efficiency (CE). High values of this variable are very desirable.
Many variables other than CE are used as indicators of cell performance. Mea-
sured variables are the days of life of the cell (DOL), cell gases including chlorine
4.7. Detecting Collinearities 67
and oxygen (Cl2 and O2), caustic (NaOH), salt (NaCl), and impurities production
(I1 and I2). The levels of impurities are important since their production indicates
a waste of electrical power, and they contaminate the caustic.
Table 4.8 is the correlation matrix for an HDS (n = 416) based on seven of these
variables. Its eigenstructure will be examined in order to determine if a severe
collinearity exists among the computed variables. Inspection of the correlation
matrix reveals some very large pairwise correlations. For example, the correlation
between the two measured gases, Cl2 and O2, has a value of -0.956. Also, the
computed CE variable, which contains both Cl2 and O2, has a correlation of 0.956
with Cl2 and -0.999 with O2.
Using a PCA, the seven eigenvalues and eigenvectors for the correlation matrix
are presented in Table 4.9. Also included is the proportion of the correlation varia-
tion explained by the corresponding eigenvectors as well as the cumulative percent-
age of variation explained. A recommended guideline for identifying a near-singular
matrix is based on the size of the square root of the ratio of the maximum eigen-
value to each of the other eigenvalues. These ratios are labeled as condition indices.
A condition index greater than 30 implies that a severe collinearity is present. The
value of the largest index, labeled the condition number, for the data in Table 4.9
far exceeds this guideline,
which clearly indicates the presence of a severe collinearity among the variables.
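The condition-index calculation can be sketched as follows; the 3 × 3 correlation matrix is hypothetical (the Table 4.8 values are not reproduced here), built with one near-exact linear dependency:

```python
import numpy as np

def condition_indices(R):
    """Square roots of (largest eigenvalue / each eigenvalue) of a
    correlation matrix.  An index above roughly 30 is the guideline
    in the text for a severe collinearity."""
    lam = np.linalg.eigvalsh(R)[::-1]        # eigenvalues, descending
    return np.sqrt(lam[0] / lam)

# Hypothetical correlation matrix: variables 1 and 3 nearly collinear.
R = np.array([[1.0, 0.2, -0.999],
              [0.2, 1.0, -0.2],
              [-0.999, -0.2, 1.0]])
idx = condition_indices(R)
severe = idx[-1] > 30                        # condition number check
```

The first index is always 1; the last is the condition number, and for this matrix it is far above 30, flagging the same kind of severe collinearity seen in the chlorine/caustic data.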
A severe collinearity in the correlation matrix translates into the presence of a
severe collinearity in the associated covariance matrix. Since it is not possible or
advisable to use a T2 control statistic when the covariance matrix is singular or near
singular, several alternatives are suggested. The first, and simplest to implement, is
to remove one of the variables involved in the collinearity. This is especially useful
when one of the collinear variables is computed from several others, since deletion
of one of these variables will not remove any process information.
To determine which variables are involved in a severe collinearity, one need
only examine the linear combination of variables provided by the eigenvector cor-
responding to the smallest eigenvalue. From Table 4.9, this linear combination
corresponding to the smallest eigenvalue of 0.0003 is given by
Ignoring the variables with small coefficients (i.e., small loadings) gives the linear
relationship between the two variables that is producing the collinearity problem.
This relationship is given as
This relationship confirms the large negative correlation, -0.999, found between
CE and O2 in Table 4.8.
The information contained in the computed variable CE is redundant with that
contained in the measured variable O2. This relationship is producing a near
singularity in the correlation matrix. Since CE is a computed variable that can be
removed with no loss of additional information, one means of correcting this data
deficiency is to compute the T2 statistic using only the remaining six variables.
The revised correlation matrix for these six variables is obtained from the correla-
tion matrix presented in Table 4.8 by deleting the row and column corresponding
to CE.
Another method for removing a collinearity from a covariance matrix is to re-
construct the matrix by excluding the eigenvectors corresponding to the near-zero
eigenvalues. The contribution of the smallest eigenvalues would be removed and
S^(-1) would be computed using only the larger ones; i.e.,

S^(-1) ≈ Σ_{j=1}^{k} (1/λj) Uj Uj',

where only the k largest eigenvalues and their eigenvectors are retained.
This approach should be used with caution since, in reducing the number of prin-
cipal components, one may lose the ability to identify shifts in some directions in
terms of the full set of the original variables.
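A sketch of this reconstruction, assuming the k largest eigenvalues are retained:

```python
import numpy as np

def truncated_inverse(S, k):
    """Approximate S^(-1) from the k largest eigenvalues only,
    dropping near-zero eigenvalues: sum_{j<=k} (1/lambda_j) U_j U_j'.
    Caution (as in the text): shifts along the discarded eigenvectors
    become invisible to a T2 built from this inverse."""
    lam, U = np.linalg.eigh(S)
    order = np.argsort(lam)[::-1]            # sort eigenvalues descending
    lam, U = lam[order], U[:, order]
    return sum((1.0 / lam[j]) * np.outer(U[:, j], U[:, j]) for j in range(k))

# Near-singular 2x2 example: keep only the dominant eigenvalue.
S = np.array([[1.0, 0.999],
              [0.999, 1.0]])
S_trunc = truncated_inverse(S, k=1)          # drop the near-zero eigenvalue
```

With k = 1, a deviation along the discarded eigenvector contributes nothing to the quadratic form, which is exactly the loss of detection ability the text warns about.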
The second form of autocorrelated data is labeled stage decay (e.g., see Mason,
Tracy, and Young (1996)). This occurs when the time change in the variable is in-
consistent over shorter time periods, but occurs in a stepwise fashion over extended
periods of time. This can occur in certain types of processes where change with
time occurs very slowly. The time relationship comes from the performance in one
stage being dependent on the process performance in the previous stage(s). The
graph of a process variable having two stages of decay is presented in Figure 4.6.
If autocorrelation is undetected or ignored, it can create serious problems with
control procedures that do not adjust for it. The major problem is similar to the one
that occurs when using univariate control procedures on variables of a multivariate
process. Univariate procedures ignore relationships between variables. Thus, the
effect of one variable is confounded with the effects of other correlated variables. A
similar situation occurs with autocorrelated data when the time dependencies are
not removed. Adjustment is necessary in order to obtain an undistorted observation
on process performance at a given point in time.
Control procedures for autocorrelated data in a univariate setting often make
these adjustments by modeling the time dependency and plotting the resultant
residuals. Under proper assumptions, these residual errors, or adjusted values (i.e.,
effect of the time dependency removed), can be shown to be independent and
normally distributed. Hence, they can be used as the charting statistic for the
time-adjusted process. It is also useful to look at forecasts of charting statistics
since processes with in-control residuals can drift far from the target values (e.g.,
see Montgomery (1997)).
We offer a somewhat similar solution for autocorrelated data from multivari-
ate processes. However, the problem becomes more complicated. We must be
concerned not only with autocorrelated data on some of the variables, but also
with how the time variables relate to the other process variables. Autocorrelation
4.8. Detecting Autocorrelation 71
does not eliminate these relationships, but instead confounds them and thus must
be removed for clear signal interpretation. How this is done is a major focus of
Chapter 10.
One simple method of detecting autocorrelation in univariate processes is ac-
complished by plotting the variable against time. Depending on the nature of the
autocorrelation, the points in a graph of the process variable versus time will either
move up or down or oscillate back and forth. Subsequent data analysis is used
to verify the presence of autocorrelation, determine lag times, and fit appropriate
autoregressive models.
Observations from a multivariate process are p-dimensional and the components
are usually correlated. The simple method of plotting graphs of individual compo-
nents against time can be inefficient when there are a large number of variables.
Also, these time-sequence plots may be influenced by other correlated variables,
resulting in incorrect interpretations. For example, considering the cyclic nature
over time of the variable depicted in Figure 4.7, one might suspect that some form
of autocorrelation is present. However, this effect is due to the temperature of the
coolant, which has a seasonal trend. Nevertheless, even with this drawback, we
have found that graphing each variable over time is useful.
To augment the above graphical method and to reduce the number of individual
graphs for study, one could introduce a time-sequence variable in the data set and
examine how the individual variables relate to it. If a process variable correlates
with the time-sequence variable, it is highly probable that the process variable
correlates with itself in time. Using this method, one can locate potential variables
that are autocorrelated. Detailed analysis, including the graphing of the variable
over time, will either confirm or deny the assertion for individual variables.
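The time-sequence-variable screen can be sketched as follows; the 0.5 flagging threshold and the simulated data are illustrative assumptions, not from the text:

```python
import numpy as np

def time_correlation_screen(X, threshold=0.5):
    """Correlate each column of X (rows in time order) with the time
    index t = 0, 1, 2, ...; columns exceeding the threshold are flagged
    as candidates for autocorrelation, to be confirmed by detailed
    plots and model fitting.  The 0.5 cutoff is illustrative."""
    n, p = X.shape
    t = np.arange(n, dtype=float)
    flags = []
    for j in range(p):
        r = np.corrcoef(t, X[:, j])[0, 1]
        if abs(r) > threshold:
            flags.append((j, r))
    return flags

# Illustrative data: column 0 drifts with time, column 1 is white noise.
rng = np.random.default_rng(7)
n = 150
X = np.column_stack([0.05 * np.arange(n) + rng.normal(size=n),
                     rng.normal(size=n)])
flagged = time_correlation_screen(X)
```

Only the drifting column is flagged, which mirrors the text's point: the screen narrows attention to probable cases, and the individual time plots then confirm or deny each one.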
Source        df      SS        MS        F       p-value
Regression     1    506.304   506.304   109.711    0.000
Residual      21     96.912     4.614
Total         22    603.217
where b0 and b1 are the estimated coefficients of the model relating the heat trans-
fer variable yt to its lagged value yt-1. The small p-value for the F statistic in the
table implies that there is strong evidence that the immediate past heat transfer
coefficient is an important predictor of the current heat transfer coefficient.
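The lag-one regression behind the ANOVA table can be sketched with ordinary least squares; the simulated autocorrelated series below stands in for the heat transfer data, which are not reproduced in this excerpt:

```python
import numpy as np

def lag_one_regression(y):
    """OLS fit of y_t = b0 + b1 * y_{t-1} + e_t, returning the ANOVA
    F statistic for the regression (regression df = 1, residual
    df = n - 2, where n is the number of (y_t, y_{t-1}) pairs)."""
    yt, ylag = y[1:], y[:-1]
    n = len(yt)
    A = np.column_stack([np.ones(n), ylag])
    (b0, b1), *_ = np.linalg.lstsq(A, yt, rcond=None)
    fit = b0 + b1 * ylag
    ss_reg = np.sum((fit - yt.mean()) ** 2)   # regression sum of squares
    ss_res = np.sum((yt - fit) ** 2)          # residual sum of squares
    F = ss_reg / (ss_res / (n - 2))
    return b0, b1, F

# Strongly autocorrelated series: a large F is expected.
rng = np.random.default_rng(3)
y = np.zeros(100)
for t in range(1, 100):
    y[t] = 0.8 * y[t - 1] + rng.normal()
b0, b1, F = lag_one_regression(y)
```

A large F (small p-value), as in the table above, says the immediate past value is an important predictor of the current value, i.e., the series is autocorrelated at lag one.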
As another example, consider the techniques necessary for detecting autocorre-
lation in data collected from a reactor used to convert ethylene (C2H4) to ethylene
dichloride (EDC). EDC is the basic building block for much of the vinyl product
industry. Feedstock for the reactor includes hydrochloric acid gas (HCl), ethylene,
and oxygen (O2). Conversion of the feedstock to EDC occurs under high
temperature in the reactor. The conversion process is labeled oxyhydrochlorination
(OHC).
There are many different types of OHC reactors available to perform the conver-
sion of ethylene and HCl to EDC. One type, a fixed-life or fixed-bed reactor, must
have critical components replaced at the end of each run cycle. The components
are slowly depleted during operation and performance of the reactor follows the
depletion of the critical components. The best performance of the reactor is at the
beginning of the cycle, as the reactor gradually becomes less efficient during the
remainder of the cycle. While other variables have influence on the performance of
the reactor, this inherent decay of the reactor produces a time dependency in many
of the process and quality variables.
We have chosen seven variables to demonstrate how to detect and adjust for
autocorrelated data in this type of process. These are presented in Table 4.12. The
first variable, RP1, is a measure of feed rate. The next four, Temperature, L1, L2,
and L3, are process variables, and the last two are output variables. Variable P1
is an indication of the amount of production for the reactor and variable C1 is an
4.9. Example of Autocorrelation Detection Techniques 75
undesirable by-product of the production system. All variables, with the exception
of feed rate, show some type of time dependency.
Temperature measurements are available from many different locations on a
reactor. All of them are important elements in the performance and control of
the reactor and increase over the life cycle. To demonstrate the time decay of the
measured temperatures, we present in Figure 4.10 a graph of the average tempera-
ture over a good production run. The graph indicates the average temperature of
the reactor is initially stable, but then it gradually increases over the life cycle of
the unit.
The time-sequence graphs in Figures 4.11 and 4.12 of the two process variables,
L3 and L1, present two contrasting patterns. In Figure 4.11, L3 increases linearly
with time and has the appearance of a first-order lag relationship. This is confirmed
by the fact that r1 = 0.7533. However, the graph of L1 in Figure 4.12 is depicted as
quadratic or exponential across time, but can still be approximated by a first-order
lag relationship. In this case, r2 = 0.9331.
The graph of C1 versus time is presented in Figure 4.13. The time trend in
this graph differs somewhat from the previous graphs. There appear to be separate
stages in the plot: one at the beginning, another in the middle, and the third stage
at the end.
Of the remaining three variables, none show strong time dependencies. As an
example, consider the time-sequence plot for RP1 given in Figure 4.14. Across
time, the plot of the data is nearly horizontal and shows no trends or patterns.
4.10 Summary
Control procedures are designed to detect and help in determining the cause of
unusual process events. The point of reference for "unusual events" is the historical
data set. This is the baseline of any control procedure and must be constructed with
great care. The first step in its construction is to acquire an understanding of the
process. This knowledge can be obtained from the operators and process engineers.
A study of the overall system will reveal problem areas where the applications of a
control procedure would be most helpful. This is necessary to determine the type
and purpose of the control procedure.
With the selection of an appropriate area for application of a control procedure,
we can obtain a preliminary data set. However, the data must be carefully filtered
of its impurities so that the resulting data set is clean. Graphical tools can be
a great aid in this process, as they can be used to identify obvious outliers and,
in some cases, determine useful functional relationships among the variables. In
addition, data collection and data verification procedures must be examined and
any missing data replaced or estimated, or else one must remove the associated
observation vector or process variable.
After filtering the preliminary data, we strongly recommend checking on the
singularity of the covariance matrix. The problems of a singular covariance matrix,
or of collinearity among the variables, can be quite critical. Collinearity often
occurs when there are many variables to consider or when some of the variables
are computed from measured ones. These situations can be detected using the
eigenvalues of the covariance or correlation matrices. Principal component analysis
can be a useful tool in this determination, as can consultation with the process
engineers. Since a severe collinearity can inflate the T2 statistic, appropriate action
must be taken to remove this problem.
Steady-state control procedures do not work well on autocorrelated processes.
Thus, one must investigate for the presence of autocorrelation in the preliminary
data set. We offer two procedures for detecting the presence of autocorrelated data
in a multivariate system. The first is based on plotting a variable over time and
looking for trends or patterns in the plot. The second is based on plotting the sample
autocorrelations between observations separated by a specified lag time versus time
and examining the observed trends. Large autocorrelations will pinpoint probable
cases for further study. In Chapter 10, we discuss procedures for removing the
effects of these time dependencies on the T2 statistic.
4.11 Appendix
4.11.1 Eigenvalues and Eigenvectors
Consider a square (p × p) matrix A. We seek to find scalar (constant) values λi,
i = 1, 2, ..., p, and the corresponding (p × 1) vectors Ui, i = 1, 2, ..., p, such that
the matrix equation

A Ui = λi Ui                                                (A4.1)

is satisfied. Nontrivial solutions for the Ui exist only when λ satisfies the charac-
teristic equation |A − λI| = 0,
where |A − λI| represents the determinant of the matrix (A − λI). The corresponding
eigenvectors Ui are then obtained by solving the homogeneous system of equations
given in (A4.1).
The eigenvalues (λ1, λ2, ..., λp) are unique to the matrix A; however, the cor-
responding eigenvectors (U1, U2, ..., Up) are not unique. In statistical analysis the
eigenvectors are often scaled to unity, or normalized, so that

Ui'Ui = 1,  i = 1, 2, ..., p.

Note that the eigenvalues of A^(-1) are the reciprocals of the eigenvalues of A. The
corresponding eigenvectors of A^(-1) are the same as those of A.
Covariance matrices, such as S, that are associated with a T2 statistic are
symmetric, positive definite matrices. For symmetric matrices, the correspond-
ing eigenvalues must be real numbers. For positive definite symmetric matrices,
the eigenvalues must be greater than zero. Also, with symmetric matrices the
eigenvectors associated with distinct eigenvalues are orthogonal, so that Ui'Uj = 0
for i ≠ j.
Near-singular conditions (i.e., collinearities) exist when one or more eigenvalues are
close to zero. Closeness is judged by the size of an eigenvalue relative to the largest
eigenvalue. The square root of the ratio of the largest eigenvalue (λ1) to any other
eigenvalue (λi) of a matrix A is known as a condition index and is given by

(λ1/λi)^(1/2).
This equation produces the collinear relationship that exists between xj and the
other variables of the system.
The theoretical development of PCA is covered in the many texts on multivariate
analysis, e.g., Morrison (1990), Seber (1984), and Johnson and Wichern (1998). An
especially helpful reference on the applications of PCA is given in Jackson (1991).
Chapter 5

Charting the T2 Statistic in Phase I
5.1 Introduction
In this chapter we discuss methods, based on the T2 statistic, for identifying atypical
observations in an HDS. We also include some examples of detection schemes based
on distribution-free methods. When attempting to detect such observations, it is
assumed that good preliminary data are available and that all other potential data
problems have been investigated and resolved.
The statistical purging of unusual observations in a Phase I operation is essen-
tially the same as an outlier detection problem. An outlier is an atypical observation
located at an extreme distance from the main part of the sample data. Several use-
ful statistical tests have been presented for identifying these observations, and these
techniques have been described in numerous articles and books (e.g., see Barnett
and Lewis (1994), Hawkins (1980), and Gnanadesikan (1977)).
Although the T2 statistic is not necessarily the optimal method for identifying
outliers, particularly when used repeatedly as in a control chart, it is a simple pro-
cedure to apply and can be very helpful in locating individual outlying observations.
Further, as shown in Chapter 7, the T2 statistic has the additional advantage of
being capable of determining the process variables causing an observation to signal.
For these reasons, we will concentrate only on the T2 statistic.
The observations plotted as black circles on the graph will bias the estimates of
the variance of the two variables and/or the estimates of the correlation between
these two variables. For example, the inclusion of the Group A data will increase
the variation in both variables but will have little effect on their pairwise correla-
tion. In contrast, including the Group C data will distort the correlation between
the two variables, though it will increase the variation of mainly the x1 variable.
Why do atypical observations, similar to those presented above, occur in an
initial sample from an in-control multivariate process? There are many reasons,
such as a faulty transmitter sending wrong signals, human error in transcribing a log
entry, or units operating under abnormal conditions. Most atypical information can
be identified using graphs and scatter plots of the variables or by consulting with
the process engineer. For example, several of the observations in Groups B and C
of Figure 5.1, such as points Bl and Cl, are obvious outliers; however, others may
not be as evident. It is for this reason that a good purging procedure is needed.
Detecting atypical observations is not as straightforward in multivariate systems
as in univariate ones. A nonconforming observation vector in the multivariate sense
is one that does not conform to the group. The purging procedure must be able to
identify both the components of the observation vectors that are out of tolerance
as well as those that have atypical relationships with other components.
purging it of outliers. Any observation in the data set that is beyond the control
limits of the chart is removed from further consideration.
Suppose that the Shewhart upper and lower control limits, denoted UCL and
LCL, for this data set are those depicted in Figure 5.2. We assume that any outlier is
an observation that does not come from this distribution, but from another normal
distribution, N(μ + d, σ²), having the same standard deviation, but with the mean
shifted d units to the right. Both distributions are depicted in Figure 5.3.
Detecting an outlier in this setting is equivalent to the testing of a statistical
null hypothesis. To decide if the given observation is taken from the shifted normal
distribution, and thus declared an outlier, we test the null hypothesis
that all observations arise from the normal distribution N(μ, σ²) against the alter-
native hypothesis
that all observations arise from the shifted normal distribution N(μ + d, σ²). If the
null hypothesis is rejected, we declare the observation to be an outlier and remove
it from the preliminary data set.
In the above hypothesis test, the distribution under the null hypothesis is re-
ferred to as the null distribution and the distribution under the alternative hypoth-
esis is labeled the nonnull distribution. The power of the hypothesis test is denoted
in Figure 5.3 by the area of the shaded region under the nonnull distribution,
which is the distribution shifted to the right. This is the probability of detecting
an observation as an outlier when it indeed comes from the shifted distribution.
Comparisons are made among different outlier detection schemes by comparing the
power function of the procedures across all values of the mean shift, denoted by d.
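Under the normal model, this power has a closed form. The sketch below considers only the upper Shewhart limit at μ + 3σ (a one-sided simplification for illustration; for positive shifts d the lower-tail term of the two-sided chart is negligible):

```python
import math

def shewhart_power(d, L=3.0):
    """Probability that an observation from N(mu + d*sigma, sigma^2)
    falls above the upper Shewhart limit mu + L*sigma -- the shaded
    area under the shifted (nonnull) distribution, computed as
    P(Z > L - d) for standard normal Z."""
    return 0.5 * math.erfc((L - d) / math.sqrt(2.0))

power_at_0 = shewhart_power(0.0)   # d = 0: the false-alarm rate, ~0.00135
power_at_3 = shewhart_power(3.0)   # shift equal to the limit: power 0.5
```

Evaluating this function over a grid of d values traces out the power function used to compare detection schemes.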
Many analysts use univariate control chart limits of individual variables to re-
move outlying observations. In this procedure, all observation vectors that contain
84 Chapter 5. Charting the T2 Statistic in Phase I
an observation on a variable outside the 3σ range are excluded. This is equivalent
to using the univariate Shewhart limits for individual variables to detect outlying
observations. A comparison of this procedure with the T2 procedure is illustrated
in Figure 5.4 for the case of two variables.
5.4. Multivariate Outlier Detection 85
The shaded box in the graph is defined by the univariate Shewhart chart for each
variable. For moderate-to-strong correlations between the two variables, the T2
control ellipse usually extends beyond the box. This indicates that the operational
range of the variables of a multivariate correlated system can be larger than the
control chart limits of independent variables.
Use of the univariate control chart limits of a set of variables ignores the con-
tribution of their correlations and in most cases restricts the operational range of
the individual variables. This restriction produces a conservative control region for
the control procedure, which in turn generates an increased number of false signals.
This is one of the main reasons for not using univariate control procedures to detect
outliers in a multivariate system.
and where B[α; p/2, (n-p-1)/2] is the upper αth quantile of the beta distribution
B[p/2, (n-p-1)/2]. If an observation vector has a value greater than the UCL, it
is to be purged from the preliminary data. With the remaining observations, we
calculate new estimates of the mean vector and covariance matrix. A second pass
through the data is now made. Again, we remove all detected outliers and repeat
the process until a homogeneous set of observations is obtained. The final set of
data is the HDS.
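The purging cycle described above can be sketched as follows. The α value and the simulated data are illustrative, and the scaling ((n-1)²/n) on the beta quantile is the standard Phase I form assumed here, since the displayed UCL equation is not reproduced in this excerpt:

```python
import numpy as np
from scipy.stats import beta

def purge_outliers(X, alpha=0.001):
    """Iterative Phase I purge: compute T2 for each row against the
    current mean/covariance, drop rows whose T2 exceeds the beta-based
    UCL ((n-1)^2/n) * B(1-alpha; p/2, (n-p-1)/2), re-estimate, and
    repeat until no row signals.  The surviving rows form the HDS."""
    X = np.asarray(X, dtype=float)
    while True:
        n, p = X.shape
        xbar = X.mean(axis=0)
        S_inv = np.linalg.inv(np.cov(X, rowvar=False))
        d = X - xbar
        t2 = np.einsum("ij,jk,ik->i", d, S_inv, d)   # T2 per observation
        ucl = ((n - 1) ** 2 / n) * beta.ppf(1 - alpha, p / 2, (n - p - 1) / 2)
        keep = t2 <= ucl
        if keep.all():
            return X
        X = X[keep]

# Illustrative data: 50 in-control observations plus one gross outlier.
rng = np.random.default_rng(5)
clean = rng.normal(size=(50, 3))
data = np.vstack([clean, [[12.0, -12.0, 12.0]]])
hds = purge_outliers(data)
```

On each pass the mean vector and covariance matrix are re-estimated from the surviving rows, exactly as in the text's repeated passes through the data.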
When process control is to be based on monitoring the subgroup means of k
samples of observations, the actual purging process of the preliminary data set is
the same as for individual observations. The data are recorded in samples of size
mi, i = 1, 2, ..., k, yielding a total sample size of n = Σ_{i=1}^{k} mi. Since each
individual observation vector comes from the same MVN distribution, we can
disregard the k subgroups and treat the observations as one group. With the
overall group, we obtain the estimates X̄ and S, and proceed as before. When the
process is in control,
this approach produces the most efficient estimator of the covariance matrix (e.g.,
see Wierda (1994) or Chou, Mason, and Young (1999)).
New estimates of the mean vector and covariance matrix are computed from
the remaining 24 observations, and the purging process is repeated. The new
correlation matrix is presented in Table 5.4. Comparing the correlation matrices of
the purged data and the unpurged data, we find a definite change. For example,
with the removal of observation 9 the correlation between t1 and t3 increases from
0.584 to 0.807. This illustrates the effect a single outlier can have on a correlation
coefficient when there is a small sample size (n = 25). Note that such a significant
change in the correlation matrix implies a similar change in the covariance matrix.
The new T2 values are presented in Table 5.5. Since the new UCL for the
reduced set of 24 observations is 17.00, the second pass through the data produces
above temperature data, with p = 2 and n = 25, the beta distribution for the T2
statistic using the formula given in (2.15) is
A Q-Q plot of the T2 values, converted to beta values by dividing them by 0.922, is
presented in Figure 5.5. Several of the plotted points do not fall on the given line
through the data. This is especially true for the few points located in the upper
right corner of the graph. This is supported by the T2 values given in Table 5.3,
where four points, 1, 4, 9, and 21, have T2 values larger than 10. Observation 9,
located at the upper end of the line of plotted values, is somewhat removed from
the others. Given this result, the point should be investigated as a potential outlier.
Figure 5.9: Q-Q plot for transformer data after outlier removal.
The UCL is recalculated as 44.528. Observe that the system appears to be very
consistent and all observations have T2 values below the UCL. The corresponding
Q-Q plot of the T2 values is presented in Figure 5.9. Observe the strong linear
trend exhibited in the plot and the absence of observations far off the trend line.
For a given α level, the UCL for the purging process is determined using

UCL = χ²(α; p),

where χ²(α; p) is the upper αth quantile of a chi-square distribution having p degrees
of freedom.
To illustrate this procedure and contrast it to the case where the parameters
are unknown, assume the sample mean vector and covariance matrix of the data
in Table 5.1 are the true population values. Using α = 0.001, the UCL is
χ²(0.001; p) = 26.125. Comparing the observed T2 values in Table 5.3 to this value,
we find that no observation is declared an outlier. Thus, observation 9 would not
be deleted.
A major difference between the T2 statistics in (5.1) and (5.3) is due to how we
determine the corresponding UCL. When the mean vector and covariance matrix are
estimated, as in (5.1), the beta distribution is applicable, but when these parameters
are known, as in (5.4), the chi-square distribution should be used. It can be shown
that for large n, the UCL as calculated under the beta distribution (denoted as
with k = 4.472 and α < 0.10. The estimated UCL for this first pass is 19.617,
and it is used to remove 13 outliers. The estimation of the UCL and the resultant
purging process are repeated until a homogeneous data set of T2 values is obtained.
Five passes are required and 28 observations are deleted. The results of each pass
of this procedure are presented in Table 5.7.
5.7. Unknown T2 Distribution 93
If one assumes that the T2 values are described by a beta distribution and
calculates the UCL using (5.2) with α = 0.01, the same 28 observations are
removed. However, the order of removal is not the same, and only four passes
are required. These results are presented in Table 5.8. In this case, the major
difference between the two procedures is that the probability of a Type I error is
fixed at α = 0.01 for the beta distribution, whereas the error rate for the Chebyshev
approach is only bounded by α < 0.10.
To demonstrate the procedure based on the quantile technique, the 491 T2
values are arranged in descending order and an approximate UCL is calculated
using α = 0.01 and the formula in (A3.3) from Chapter 3, i.e.,
The estimated UCL for this first pass is 22.952, and it is used to remove five
outliers. The estimation of the UCL and the purging process are repeated until a
homogeneous data set of T2 values is obtained.
In this procedure, there will always be at least one T2 value exceeding the
estimated UCL in each pass. Thus, one must stop at the step where only a single
outlier is encountered. Since this occurs at step 6 for our data example, only five
passes are required and 27 observations are deleted. The results of each pass of this
procedure are presented in Table 5.9.
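The quantile-based purging can be sketched as follows. The book's approximate-quantile formula (A3.3) is not reproduced in this excerpt, so a simple empirical quantile with linear interpolation is used as a stand-in; the stopping rule follows the text, halting at the step where only a single point exceeds the estimated UCL.

```python
def quantile_ucl(t2_values, alpha=0.01):
    """Approximate (1 - alpha) quantile of the T2 values.

    Stand-in for the book's formula (A3.3): an empirical quantile with
    linear interpolation between adjacent order statistics.
    """
    s = sorted(t2_values)
    n = len(s)
    pos = (1.0 - alpha) * (n - 1)        # 0-based fractional rank
    lo = int(pos)
    frac = pos - lo
    hi = min(lo + 1, n - 1)
    return s[lo] + frac * (s[hi] - s[lo])

def quantile_purge(t2_values, alpha=0.01):
    """Repeat UCL estimation and purging; stop when only one point exceeds."""
    kept = list(t2_values)
    while True:
        ucl = quantile_ucl(kept, alpha)
        outliers = [t for t in kept if t > ucl]
        if len(outliers) <= 1:           # stop at the single-outlier step
            return kept, ucl
        kept = [t for t in kept if t <= ucl]
```

Because an empirical quantile of the retained data always leaves at least one point in the tail, the single-outlier stopping rule is what terminates the loop.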
The third method for obtaining an appropriate UCL is to fit a distribution to the
T2 statistic using the kernel smoothing technique. The UCL can be approximated
using the (1 — a)th quantile of the fitted kernel distribution function of the T2. We
begin by using the preliminary data of n observations to obtain the estimates X
and S of the parameters μ and Σ. Using these estimates, we compute the T2 values.
These n values provide the empirical distribution of the T2 statistic for a Phase I
operation. As previously noted, we are assuming that the intercorrelation common
to the T2 values has little effect on the application of these statistical procedures.
We apply the kernel smoothing procedure described by Polansky and Baker
(2000) to obtain FK(t), the kernel estimate of the distribution of T2, or simply the
Table 5.10: Results of the purging process using the kernel smoothing technique.

Pass                      1        2        3        4        5        6        7
UCL                    23.568   22.531   19.418   18.388   17.276   16.626   16.045
# of Outliers Removed     5        4        4        4        5        4        3
FK(UCL) = 1 − α,   i.e.,   UCL is the (1 − α)th quantile of FK.        (5.7)
Since FK is generally a skewed distribution, the UCL can be large for small α
values, such as 0.01 and 0.001. The (1 − α)th sample quantile of the T2(j), j =
1, ..., n, can be used as the initial value for the UCL in (5.7).
Since the kernel distribution tends to fit the data well, for a moderate α value
between 0 and 1, approximately nα (rounded to the nearest integer) of the T2 values
are beyond the UCL, or the upper 100αth percentile of the kernel distribution. For
n = 491 and α = 0.01, nα = 4.91 ≈ 5, and one may expect that four to five values
of the T2 are above the UCL.
After these outliers are removed in each pass, there are always some points
above the newly calculated UCL. This seems to be inevitable unless n or α is very
small, so that nα is around 0. Because the kernel method is based solely on data,
one way of determining the UCL for the final stage is to compare the UCLs and
the kernel distribution curves for successive passes. If the UCLs for two consecutive
passes are very different, this implies that the kernel distribution curves also differ
significantly after outliers are removed. However, if the UCLs and the curves for
two consecutive passes are nearly the same, this implies that the UCL is the desired
UCL for the final stage.
For the data in the example, the UCLs for Passes 7 and 8 are 16.045 and 15.829,
respectively. The difference between the bandwidths of these kernel estimates is
only 0.004. Therefore, the three points in Pass 7 cannot be viewed as outliers and
should be kept in the HDS. After six passes, 26 observations are removed and the
remaining 465 observations form the HDS. The UCL for the T2 chart should be set
at 16.045, as Pass 7 is the final pass. Table 5.10 presents the results of the entire
purging process.
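The kernel-based UCL can be sketched as follows. The Gaussian-kernel CDF estimate and a bisection search for its (1 − α)th quantile are standard; the bandwidth below is a normal-reference rule of thumb, not the Polansky and Baker (2000) plug-in bandwidth used in the book, so the numbers it produces are illustrative only.

```python
from statistics import NormalDist, stdev

def kernel_cdf(t, data, h):
    """Gaussian-kernel estimate FK(t) of the T2 distribution function."""
    phi = NormalDist().cdf
    return sum(phi((t - x) / h) for x in data) / len(data)

def kernel_ucl(data, alpha=0.01):
    """UCL as the (1 - alpha)th quantile of FK, found by bisection.

    Assumption: a rule-of-thumb bandwidth, in place of the plug-in
    bandwidth of Polansky and Baker (2000).
    """
    h = 1.06 * stdev(data) * len(data) ** (-0.2)
    lo, hi = min(data) - 5 * h, max(data) + 5 * h
    for _ in range(60):                  # FK is monotone, so bisect
        mid = (lo + hi) / 2
        if kernel_cdf(mid, data, h) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

In the purging loop of Table 5.10, `kernel_ucl` would be re-estimated on the retained T2 values at each pass.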
5.8 Summary
Historical data sets are very important in process control as they provide a baseline
for comparison. Any process deviation from this reference data set is considered
out of control, even in situations where the system improves. It is therefore critical
that the process be in control and on target when the HDS observations are selected.
Inclusion of atypical observations will inflate the variation and distort the
correlations among the variables; a single outlying observation is enough to do this.
It is for these reasons that a good outlier-purging procedure, such as
one based on the T2 statistic, is needed.
A common misconception is that the operational range of a variable in a mul-
tivariate process is the same as the operational range of a variable in a univariate
process. This is only true for independent variables. For correlated variables, the
operational range of the variables is increased. Correct outlier purging procedures
will determine the appropriate operational ranges on the variables.
Several different forms of the T2 can be used in detecting outliers in a Phase I op-
eration. These include situations where the population parameters are both known
and unknown and where observations are either individually charted or charted
as means. In addition, three alternative procedures are available for use when
the assumption of multivariate normality is invalid. These include the Chebyshev
approach, the quantile method, and the kernel smoothing approach.
Chapter 6
Charting the T2 Statistic
in Phase II
6.1 Introduction
A number of items need to be considered when choosing the appropriate T2 chart-
ing procedure for a Phase II operation. These include computing the appropriate
charting statistic, selecting a Type I error probability, and determining the UCL.
For example, if we monitor a steady-state process that produces independent obser-
vations, a T2 charting procedure will suffice. However, if the observations exhibit
a time dependency, such as that which is inherent to decay processes, some adjustment
for the time dependency must be made to the T2 statistic (see Chapter 10).
The charting of the T2 statistic in a Phase II operation is very similar to the
approach used in charting the statistic for a Phase I operation. The major difference
is in the probability functions used in determining the control region. Two cases
exist. When the mean vector and covariance structure are known, a chi-square
distribution is used to describe the behavior of the statistic and determine the
upper control limit. When the mean and covariance parameters are unknown and
must be estimated from the historical data, an F distribution is used to describe
the statistic and locate the upper control limit.
In this chapter, we address several different charting procedures for the T2
statistic and examine the advantages and disadvantages of each. We initially
discuss monitoring a process using a T2 statistic when only a single observation
vector is collected at each time point. This is later extended to the situation
where the process is monitored using the mean of a subgroup of observations taken
at each time point. Other topics discussed include the choice of the probabil-
ity of a Type I error, procedures for calculating the average run length to de-
tect a given mean shift, and charts for the probability of detecting a shift in the
mean vector. Any nonrandom pattern displayed in a T2 chart can imply process
change. For this reason, we include a section on detecting systematic patterns
in T2 charts.
decision, since a value of α can be chosen such that the T2 value of an observation
exceeds the UCL, even though the observation really contains no signal. Note, also,
that the size of the control region is 1 − α. This is the probability of concluding
that the process is in control when in fact control is being maintained on all process
variables.
The size of α cannot be considered without discussion of β, the probability of a
Type II error. This is the error of concluding there is no signal when in fact a signal
is present. Type I and Type II errors are interrelated in that an increase in the
probability of one will produce a decrease in the probability of the other. Careful
consideration must be given to the consequences produced by both types of error.
For example, suppose a chemical process is producing a product that becomes
hazardous when a particular component increases above a given level. Assume that
this component, along with several other correlated components, is observed on a
regular basis. A T2 control procedure is used to check the relationships among the
components as well as to determine if each is in its desired operational range. If a
Type I error is made, needless rework of the product is required since the process is
in control. If a Type II error is made, dangerous conditions immediately exist. Since
dangerous conditions override the loss of revenue, a very small β is desirable. Given
this preference for a small β, a large α would be acceptable.
The value of α chosen for a T2 chart in a Phase II operation does not have
to agree with the value used in constructing the Phase I chart. Instances do exist
where making a Type I error in a Phase II operation is not so crucial. For example,
suppose change is not initiated in the actual control of a process until more than one
signal is observed. This reduces the risk of overcontrolling the process. Situations
such as these require a larger α in the Phase II operation. In contrast, a large α
for Phase I can produce a conservative estimate of both the mean vector and the
covariance matrix, so some balance is necessary.
The choice of a for a univariate charting procedure pertains only to the false
alarm rate for the specified variable being monitored. For example, the control
limits of a Shewhart chart are frequently located at plus or minus three standard
deviations from the center line of the charted statistic. This choice fixes the false
alarm rate α at a value of 0.0027 and fixes the size of the control region at (1 − α),
or 0.9973. This translates to a false alarm rate of about 3 observations per 1,000.
The choice of α in monitoring a multivariate process is more complex, as it
reflects the simultaneous risk associated with an entire set of variables (e.g., Timm
(1996)). Establishing a control procedure for each component of the observation
vector would lead to an inappropriate control region for the variables as a group,
as individual control does not consider relationships existing among the variables.
This is illustrated with the following example.
Suppose X′ = (x1, x2) is a bivariate normal observation on a process that is to
be monitored by a joint control region defined by using a 3-sigma Shewhart procedure
for each individual variable. The shaded box given in Figure 6.1 illustrates the
control region. A major problem with this approach is that it ignores the relation-
ship that exists between the process variables and treats them independently. The
true joint control procedure, if the two variables were correlated, would be similar
to the ellipse, which is superimposed on the box in Figure 6.1.
where α is the false alarm rate for each individual variable. Thus the value of α_s
increases as p, the number of variables, increases. For example, if α = 0.0027, the
simultaneous false alarm rate for p = 2 is 1 − (1 − 0.0027)² = 0.0054, but for p = 4, the
rate increases to 0.0108. This example produces exact probabilities for a process
with independent variables. In reality, a process usually consists of a group of
correlated variables. Such situations tend to increase the true false alarm rate even
beyond (6.1).
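The simultaneous rate in (6.1), α_s = 1 − (1 − α)^p for p independently charted variables, is easy to compute directly; a minimal sketch:

```python
def simultaneous_alpha(alpha, p):
    """Simultaneous false alarm rate (6.1) for p independently
    charted variables, each with individual false alarm rate alpha."""
    return 1.0 - (1.0 - alpha) ** p

# For the Shewhart choice alpha = 0.0027, the rate grows with p:
# p = 2 gives about 0.0054 and p = 4 about 0.0108.
```

For correlated variables this is only a reference point, since, as noted above, correlation tends to push the true rate beyond (6.1).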
where the common estimates X and S are obtained from the HDS following the
procedures described in Chapter 4. In this Phase II setting, the T2 statistic in (6.2)
follows the F distribution given in (2.14). For a given α, the UCL is computed as

UCL = [p(n + 1)(n − 1) / (n(n − p))] F(α; p, n − p),

where n is the size of the HDS and F(α; p, n − p) is the αth quantile of F(p, n − p).
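This UCL computation can be sketched as follows. The multiplier p(n + 1)(n − 1)/(n(n − p)) is the standard constant for a single new observation with estimated parameters; treat it as an assumption to be verified against (2.14). The F quantile is passed in rather than computed, since it comes from tables or a statistics library.

```python
def phase2_ucl(n, p, f_quantile):
    """UCL for the Phase II T2 based on a single new observation,
    with parameters estimated from an HDS of size n.

    f_quantile is the required quantile of F(p, n - p), e.g. from a
    table or from scipy.stats.f.ppf(1 - alpha, p, n - p).
    """
    return p * (n + 1) * (n - 1) / (n * (n - p)) * f_quantile
```

Note that, unlike the known-parameter chi-square limit, this UCL depends on the HDS size n, and it approaches the chi-square limit as n grows.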
and

      [  5.25E07    2.76E07   -11749.5     3313.42    -408.673   -169.391  ]
      [  2.76E07    1.91E07   -8112.06     2302.14    -203.795   -115.969  ]
  S = [ -11749.5   -8112.06       8.6918     -0.93735    0.152381   0.032143 ]
      [  3313.42    2302.14      -0.93735    0.285332   -0.02312   -0.0134   ]
      [ -408.673   -203.795       0.152381   -0.02312    0.043598   0.003757 ]
      [ -169.391   -115.969       0.032143   -0.0134     0.003757   0.002474 ]
Considering the data in Table 6.1 as the HDS, T2 values for 16 new incoming
observations are computed. Table 6.2 contains these observations along with their
corresponding T2 values. The T2 values are computed using (6.2) and the parameter
estimates obtained from the historical data in Table 6.1. For example, the T2 value
for observation 1 is
where
Reaction to a Signal
Any observation that produces a T2 value falling outside its control region is a
signal. This implies that conditions have changed from the historical situation.
the critical components, and a risk analysis to determine the consequences of the
actions to be taken.
The probability function used to describe this T2 statistic is the chi-square distri-
bution with p degrees of freedom as given in (2.13). This is the same distribution
used to purge outliers when constructing the HDS. For a given value of α, the UCL
is determined as

UCL = χ²(α, p),

where χ²(α, p) is the upper αth quantile of the chi-square distribution with p degrees
of freedom. In this case, the control limit is independent of the size of the HDS.
To illustrate the procedure for signal detection in the known parameter case,
consider a bivariate industrial process. A sample of 11 new observations and their
mean corrected values are presented in Table 6.3. The purpose of the control
procedure is to maintain the relationship between the two variables (x1, x2) and to
guarantee that the two variables stay within their operational ranges. The mean
vector and covariance matrix are given as
and
The T2 chart for the data in Table 6.3 is presented in Figure 6.5. Letting
α = 0.05, observations 1 and 10 produce signals since their T2 values are above the
UCL = χ²(0.05, 2) = 5.99.
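The known-parameter signal check just illustrated can be sketched for the bivariate case. Since the mean vector and covariance matrix of Table 6.3's process are not reproduced in this excerpt, the values used below are hypothetical; the code shows only the mechanics of T2 = (x − μ)′Σ⁻¹(x − μ) with an explicit 2x2 inverse.

```python
def t2_known(x, mu, sigma):
    """T2 = (x - mu)' Sigma^{-1} (x - mu) for a bivariate observation,
    using the explicit 2x2 inverse [[c, -b], [-b, a]] / det."""
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    a, b, c = sigma[0][0], sigma[0][1], sigma[1][1]
    det = a * c - b * b
    return (c * d0 * d0 - 2.0 * b * d0 * d1 + a * d1 * d1) / det

# Signal rule for alpha = 0.05, p = 2:
# t2_known(obs, mu, sigma) > 5.99 implies a signal.
```

With p = 2 the quadratic form expands by hand as shown; for larger p a proper matrix inverse (or Cholesky solve) would be used instead.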
The occurrence of situations where the distribution parameters are known is rare
in industry. Processing units, especially in the chemical industry, are in a constant
state of flux. A change in the operational range of a single variable of a process
can produce a ripple effect throughout the system. Many times these changes
are initiated through pilot-plant studies, research center studies under controlled
conditions, or from data obtained through other types of experiments. This requires
constant updating of the baseline conditions, which in turn demands the use of new
estimates of the parameters. Managers, performance engineers, process engineers,
and the operators are constantly striving to improve the performance of the unit.
There is no status quo.
p-variate normal distribution Np(μ, Σ), we are assured that the mean vector X of
a sample of m observations is distributed as a p-variate normal Np(μ, Σ/m) with
the same mean vector μ as an individual observation, but with a covariance matrix
given by Σ/m. If the individual observation vectors are not multivariate normally
distributed, we are assured by the central limit theorem that as m increases in size,
the distribution of X becomes more like that of an Np(μ, Σ/m). This produces the
following changes in the charting procedure.
When the parameters of the underlying MVN distribution are known, the T2
statistic for the ith subgroup mean Xi is computed by
and the UCL for a given a is determined by using (6.5). The control limit is
independent of the sample size of either the subgroup or the HDS.
When the parameters of the underlying MVN distribution are unknown, the T2
statistic for the ith sample mean Xi is computed as
where X and S are the common estimates of μ and Σ obtained from the HDS. The
distribution of a T2 statistic based on the mean of a subgroup of m observation
vectors is given in (2.17). For a given a, the UCL for use with the statistic given
in (6.7) is computed as
The T2 values for the average of the four cells for each electrolyzer are listed in
Table 6.5. When compared to UCL = 10.28, we conclude that electrolyzers 573
and 963 are to be removed from service and refurbished.
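The subgroup-mean statistic used for the electrolyzers can be sketched as follows, for the known-parameter form T2 = m(X̄ − μ)′Σ⁻¹(X̄ − μ). This is a hypothetical illustration: the inverse covariance matrix is supplied directly, and m would be the subgroup size, e.g. the four cells averaged per electrolyzer.

```python
def t2_subgroup_mean(xbar, mu, sigma_inv, m):
    """T2 for a subgroup mean of size m:
    m * (xbar - mu)' Sigma^{-1} (xbar - mu)."""
    p = len(mu)
    d = [xbar[i] - mu[i] for i in range(p)]
    return m * sum(d[i] * sigma_inv[i][j] * d[j]
                   for i in range(p) for j in range(p))
```

Compared with the individual-observation statistic, the factor m tightens the control region, reflecting the smaller covariance Σ/m of a subgroup mean.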
points distributed randomly about the center line. This occurs because, for a
normal distribution, approximately 68% of the observations are contained within
one standard deviation of the mean (centerline). Does something similar occur in
a T2 chart? The answer is no, but it is not emphatic.
In some types of industries, T2 charts are often unique and can be used to
characterize the behavior of the process. Close study of the plotted statistic can
produce valuable insight on process performance. Upset conditions, after they
occur, become obvious to those involved. However, process conditions leading to
upsets are not as obvious. If they were, there would be few upsets. If the precursor
conditions can be identified by examining the T2 plot, it is sometimes possible to
avoid the upset.
Figure 6.6 is the plotted T2 statistic for a control procedure on a mercury cell (Hg
cell), which is another type of processing unit used to produce chlorine gas and caus-
tic soda. Seven process variables are observed simultaneously in monitoring the per-
formance of the cell. The plotted T2 statistics over the given time period illustrated
in Figure 6.6 indicate a very steady-state, in-control process relative to the baseline
data set. There is very little change in the pattern of the T2 statistic. The UCL, as
determined from the HDS, has a value of 18.393; however, the values of the plotted
T2 statistic are consistently located at a substantial distance below this value.
Any erratic or consistent movement of the observed T2 values from the estab-
lished pattern of Figure 6.6 would indicate a process change. Figure 6.7 illustrates
such a condition, where the T2 values are increasing towards the control limit. Had
process intervention been initiated around observation 1000, it may have been pos-
sible to prevent the upset conditions that actually occurred at the end of the chart.
Of course, one needs tools to determine what variable or group of variables is the
precursor of the upset conditions. These will be discussed in Chapter 7.
Figures 6.6 and 6.7 present another interesting T2 pattern. Notice the running
U pattern contained in both charts. This is due to the fluctuation in the ambient
conditions from night to day over a 24-hour period, and it represents a source of
extraneous variation in the T2 charts. Such variation can distort the true rela-
tionships between the variables, and can increase the overall variation of the T2
statistic. For example, the cluster of points between observations 800 and 900 in
Figure 6.7 is U-shaped. Does it represent a process change or a change in ambient
conditions? Removing the effect of ambient conditions would produce a clearer
picture of the process.
Another example of a T2 chart is presented in Figure 6.8. Given are the T2
values for data collected on 45 process variables measured in the monitoring of a
furnace used in glass production. Observe the steady-state operating conditions of
the process from the beginning of the charting to around observation 500. This
pattern reflects a constant trend with minimal variation. After observation 500,
the T2 values exhibit a slow trend toward the UCL with upset conditions occurring
around observations 500, 550, 600, and 650. Corrections were made to the process,
and control was regained at about observation 650. However, note the increase in
variation of the T2 values and some signals between observations 650 and 1350.
The T2 plot flattens out beyond this point and the steady-state pattern returns.
These examples illustrate another important use of the T2 control chart. Af-
ter the trends in the T2 chart have been established and studied for a process,
any deviation from the established pattern indicates some type of process change.
Sometimes the change is for the better, and valuable process knowledge is gained.
Other times, the change is for the worse, and upset conditions occur. In either case,
we recommend the investigation of any change in the plotted T2 values. Using this
approach, expensive upsets that lead to chaotic conditions can be avoided.
where p represents the probability of being outside the control region. For a process
that is in control, this probability is equal to α, the probability of a Type I error
(see section 6.2). The ARL has a number of uses in both univariate and multivariate
control procedures. For example, it can be used to calculate the number
of observations that one would expect to observe, on average, before a false alarm
occurs. This is given by

ARL = 1/α.
Another use of the ARL is to compute the number of observations one would
expect to observe before detecting a given shift in the process. Consider the two
univariate normal distributions presented in Figure 6.9. One is located at the
center line (CL) and the other is shifted to the right and located at the UCL. The
probability of detecting the shift (i.e., the probability of being in the shaded region
in Figure 6.9) equals (1 − β), where β is the probability of a Type II error (see
section 6.2). Given the shift, this probability can be determined using standard
statistical formulas (e.g., see Montgomery (2001)). The ARL for detecting the shift
is given by

ARL = 1/(1 − β).
From Chapter 5, we recognize that the probability (1 − β) represents the power
of the test of a statistical hypothesis that the mean has shifted. This result produces
another major use of the ARL, which consists of comparing one control procedure
to another. This is done by comparing the ARLs of the two procedures for a given
process shift.
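The univariate ARL relations, ARL = 1/α for a false alarm and ARL = 1/(1 − β) for detecting a shift, can be sketched for the Shewhart case of Figure 6.9. The normal CDF comes from the standard library; the shift delta is assumed to be expressed in standard-deviation units of the charted statistic.

```python
from statistics import NormalDist

def arl_false_alarm(alpha):
    """Expected observations before a false alarm: ARL = 1 / alpha."""
    return 1.0 / alpha

def arl_shift_shewhart(delta, L=3.0):
    """ARL to detect a mean shift of delta (in sigma units) on a
    Shewhart chart with limits at +/- L sigma: ARL = 1 / (1 - beta),
    where 1 - beta is the probability a point falls beyond a limit."""
    z = NormalDist()
    detect = (1.0 - z.cdf(L - delta)) + z.cdf(-L - delta)
    return 1.0 / detect
```

For example, with 3-sigma limits and no shift, the ARL is about 370 observations between false alarms, while a one-sigma shift is detected in roughly 44 observations on average.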
Shifts and the probability of detection, (1 − β), are easy to compute in the uni-
variate case. However, it is more difficult to do these calculations in the multivariate
case. Consider a bivariate control region and a mean shift of the process as repre-
sented in Figure 6.10. We assume that the covariance matrix has not changed and
remains constant for the multivariate distributions. Hence, the orientation of the
control region for the shifted distribution is the same as that of the control region
for the in-control process. The area of the shifted distribution that corresponds
to the shaded region in Figure 6.10 equals the probability (1 − β) of detecting the
shift. This probability can be computed analytically for p = 2, but becomes very
difficult for higher dimensions. However, using additional statistical theory and the
nonnull distribution of the T2 statistic, the problem can be simplified.
Suppose the parameters, μx and Σ, of the MVN distribution are known. The
T2 control region for an in-control observation vector X is described by a chi-
square distribution (see section 6.4) and can be compared to the UCL based on
that distribution; i.e.,
but it cannot be described by the central chi-square distribution. This is because the
MVN that describes the vector (Y − μx) has a mean different from zero. However,
we can determine the mean of the normal vector (Y − μx) in terms of μx and μy.
Consider
where δ = (μy − μx) represents the mean shift. With this result, the distribution
of T2 is given by
Recall from Chapter 2 that the curve T2 = UCL establishes an elliptical control
region. An example of such a control region, where x1 and x2 are positively
correlated, is illustrated in Figure 6.13.
A number of observations can be made about the control region represented in
Figure 6.13. It is referenced by three different coordinate systems. The first is the
variable space, represented by (x1, x2). This is obtained by expanding (6.10) as a
function of x1 and x2 and constructing the graph.
If we standardize x1 and x2, using the values
we obtain the translated axes (y1, y2) located at the center of the ellipse in Figure
6.13. The T2 statistic in terms of y1 and y2 takes the form
where Z′ = (z1, z2) and the matrix Λ is a diagonal matrix with the eigenvalues of
P along the diagonal.
The above rotation of the (x1, x2) space to the (z1, z2) space removes the dependency
(correlation) between x1 and x2. In the (z1, z2) space, the elliptical control
region is not tilted, since z1 and z2 are independent. Further, the z1 and z2 values
are expressed as linear combinations of y1 and y2 and, hence, ultimately as linear
combinations of x1 and x2. As such, these variables are the principal components
of the correlation matrix for x1 and x2. If x1 and x2 are not standardized, the
z1 and z2 variables are the principal components of the corresponding covariance
matrix Σ and will have a different representation. For purposes of plotting, it is
usually best to use the correlation matrix.
The control region for the T2 statistic can be written in terms of z1 and z2 by
expanding the matrix multiplication of (6.10) to obtain
where ρ is the population pairwise correlation between x1 and x2. Note that the
eigenvalues of the correlation matrix for x1 and x2 are (1 + ρ) and (1 − ρ), so the
T2 statistic is now expressed in terms of these variables.
If the equation in (6.13) is set equal to the UCL, it forms a bivariate elliptical
control region that can be used as a charting procedure in the principal component
space, i.e., in terms of z1 and z2. For example, given a bivariate observation (x1, x2),
we can standardize the observations using (6.11). The principal components z1
and z2 are computed using (6.12) and plotted in the principal component space.
Observations plotting outside the elliptical region are out of control, as they do
not conform to the HDS. The point A in the principal component control region
presented in Figure 6.14 illustrates this.
The method of principal component translation can be generalized to the p-
dimensional case. The control region for the T2 statistic can be expressed in terms
of the p principal components of the correlation matrix as
where λ1 > λ2 > ⋯ > λp are the eigenvalues of the correlation matrix. Each zi is
computed as
The graph of (6.14), when set equal to the control limit, is a hyperellipsoid in a p-
dimensional space. However, plotting this region for p > 3 is not currently possible.
As an alternative, one can plot any combination of the principal components in a
subspace of three or fewer dimensions. This procedure also has a major drawback.
Any point (zi, zj, zk) that plots outside the region defined by (6.14) will produce
a signal, but there is no guarantee that a point plotting inside the region does not
contain a signal on another principal component.
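For the bivariate case, the principal-component form of the T2 in (6.13) can be evaluated directly. A minimal sketch, assuming the means, standard deviations, and correlation r come from the HDS: the rotation z1 = (y1 + y2)/√2, z2 = (y1 − y2)/√2 corresponds to the eigenvalues (1 + r) and (1 − r) of the 2x2 correlation matrix.

```python
from math import sqrt

def pc_t2(x1, x2, mean1, mean2, s1, s2, r):
    """T2 via principal components of the correlation matrix (p = 2).

    Standardize to (y1, y2), rotate to z1 = (y1 + y2)/sqrt(2) and
    z2 = (y1 - y2)/sqrt(2), then weight each z by the reciprocal of
    its eigenvalue, 1 + r and 1 - r."""
    y1 = (x1 - mean1) / s1
    y2 = (x2 - mean2) / s2
    z1 = (y1 + y2) / sqrt(2.0)
    z2 = (y1 - y2) / sqrt(2.0)
    return z1 * z1 / (1.0 + r) + z2 * z2 / (1.0 - r)
```

Expanding this weighted sum recovers the standardized quadratic form (y1² − 2r·y1·y2 + y2²)/(1 − r²), confirming that the rotation leaves the statistic unchanged.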
6.9 Summary
Signal detection is an important part of any control procedure. In this chapter, we
have discussed charting the T2 statistic when monitoring a process in a Phase II op-
eration. This includes the charting of the T2 statistic based on a single observation
and the T2 statistic based on the mean of a subgroup of observations. Included are
procedures to follow when the parameters of the underlying MVN distribution are
known and when they are unknown. In both cases it is assumed that the covariance
structure is nonsingular.
Also contained in this chapter is a discussion on determining the ARL for a T2
chart. To calculate an ARL for a given mean shift in a multivariate distribution
involves the introduction of noncentral distributions and the evaluation of some
complicated integrals. For the reader who has a deeper interest in this area, there
are many excellent texts in multivariate analysis on this subject, e.g., Johnson and
Wichern (1998), Fuchs and Kenett (1998), and Wierda (1994).
An optional section on the plotting of the T2 statistic in a principal component
space was presented. As pointed out in the discussion, the procedure has both
advantages and disadvantages. A major advantage is that one can plot and observe
signals on particular principal components in a subspace of the principal compo-
nent space. However, a major disadvantage is that each principal component is a
linear combination of all the process variables. This often inhibits a straightforward
interpretation procedure in terms of the process variables.
Chapter 7
Interpretation of T2 Signals for
Two Variables
7.1 Introduction
Univariate process control usually involves monitoring control charts for location
and variation. For example, one might choose to monitor mean shifts with an X
chart and variation shifts with an R chart, as both procedures are capable of de-
tecting deviations from the historical baseline. In this setting, signal interpretation
is simplified, as only one variable needs to be examined. A signal indicates that
the process mean has shifted, the process variation has changed, or both.
In multivariate SPC, the situation becomes more complicated. Nonconformity
to a given baseline data set can be monitored using the T2 statistic. If the observed
T2 value falls outside the control region, a signal is detected. The simplicity of the
monitoring scheme, however, stops with signal detection, as a variety of variable
relationships can produce a signal.
For example, an observation may be identified as being out of control because its
value for an individual variable is outside the bounds of process variation established
by the HDS. Another cause of a signal is when values on two or more variables do
not adhere to the linear correlation structure established by the historical data. The
worst case is when the signal is a combination of the above, with some variables
being out of control and others being countercorrelated.
Several solutions have been posed for the problem of interpreting a multivariate
signal. For example, Doganaksoy, Faltin, and Tucker (1991) proposed ranking the
components of an observation vector according to their relative contribution to a
signal using a univariate t statistic as the criterion. Hawkins (1991, 1993) and Wade
and Woodall (1993) separately used regression adjustments for individual variables
to improve the diagnostic power of the T2 after signal detection. Runger, Alt, and
Montgomery (1996) proposed using a different distance metric, and Timm (1996)
used a stepdown procedure for signal location and interpretation. An overview
of several of these multivariate process control procedures, including additional
ones by Kourti and MacGregor (1996) and Wierda (1994), can be found in Mason,
Champ, Tracy, Wierda, and Young (1997). Also, several comparisons are given in
Fuchs and Kenett (1998).
In this chapter, we present a method of signal interpretation that is based on
the orthogonal decomposition of the T2 statistic. The independent decomposition
components, each similar to an individual T2 variate, are used to isolate the source
of a signal and simplify its interpretation. The discussion is limited to a two-variable
problem, as it is the easiest to geometrically visualize. The more general p-variable
case is presented in Chapter 8.
where c is an appropriately chosen constant that specifies the size of the control
region (see section 6.2). A typical control region is illustrated by the interior of
the ellipse given in Figure 7.1. Any sample point located on the ellipse would be
located the same statistical distance from the sample mean as any other point on
the ellipse.
There are two additive components to the T2 statistic given in (7.1), and these
provide a natural decomposition of the corresponding statistic. The components,
in fact, are independent due to the independence of the two original x variables.
This property is what causes the ellipse not to be tilted. Since the components are
unequally weighted due to their unequal variances, we will transform the variables
to a form that will provide equal weights. Doing so will produce a circular region
and make the statistical distance, represented by the square root of the T2 in (7.1),
equivalent to the corresponding Euclidean, or straight-line, distance. This is the
view we need in order to interpret the T2 value.
Let

    y1 = (x1 - x̄1)/s1  and  y2 = (x2 - x̄2)/s2    (7.3)

represent the standardized values of x1 and x2, respectively. Using this transformation,
we can re-express the T2 value in (7.1) as

    T2 = y1^2 + y2^2.
The T2 value is again separated into two independent components as in (7.1), but
now the components have equal weight. The first component, y1^2, measures the
contribution of x1 to the overall T2 value, and the second component, y2^2, measures
the contribution of x2 to the overall T2 value. Careful examination of the magnitude
of these components will isolate the cause of a signal.
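The equal-weight split just described can be sketched numerically. The following Python fragment is a minimal illustration only; the data, seed, and names (`hds`, `t2_components`) are hypothetical and not taken from the book's examples:

```python
import numpy as np

# Illustrative historical data set (HDS) for two independent variables.
rng = np.random.default_rng(7)
hds = np.column_stack([rng.normal(50.0, 2.0, 200),
                       rng.normal(30.0, 5.0, 200)])

xbar = hds.mean(axis=0)       # sample means (x1-bar, x2-bar)
s = hds.std(axis=0, ddof=1)   # sample standard deviations (s1, s2)

def t2_components(x):
    """Equally weighted components (y1^2, y2^2) of T2, as in (7.3)."""
    y = (np.asarray(x) - xbar) / s
    return y ** 2

x_new = [54.0, 22.0]
comps = t2_components(x_new)
t2 = comps.sum()              # T2 = y1^2 + y2^2
print(comps, t2)
```

Each component can then be inspected separately to see which variable drives a large T2 value.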
The control region using the transformation given in (7.3) is illustrated by the
interior of the circle depicted in Figure 7.2. A T2 value in this orthogonally
transformed space is the same as the squared Euclidean distance of the point
(y1, y2) from the origin (0, 0), and it can be represented by the squared
hypotenuse of the enclosed right triangle depicted in Figure 7.2. This is
equivalent to the statistical distance of the point (x1, x2) from the mean vector
(x̄1, x̄2). Thus, all points with the same statistical distance are located on the
circle in Figure 7.2, as well as on the ellipse in Figure 7.1.

Figure 7.2: Bivariate orthogonal control region with equal variances. SD refers to
statistical distance.
Consider a situation where the variables of X' = (x1, x2) are not independent.
Since the pairwise correlation r between x1 and x2 is nonzero, the T2 value would
be given as

    T2 = [1/(1 - r^2)] [(x1 - x̄1)^2/s1^2 - 2r(x1 - x̄1)(x2 - x̄2)/(s1 s2) + (x2 - x̄2)^2/s2^2],    (7.5)

and the corresponding elliptical control region would be tilted. This is illustrated
in Figure 7.3. Again letting y1 and y2 represent the standardized values of x1 and
x2, the T2 value in (7.5) can be written as

    T2 = (y1^2 - 2r y1 y2 + y2^2)/(1 - r^2).    (7.6)
For the components of the decomposition to be independent, they must correspond
to the axes of the ellipse. This is not done in either Figure 7.3 or Figure 7.4. For
example, the axes of the ellipse in Figure 7.4 do not correspond to the axes of the
(y1, y2) space. The axes can be aligned only through an orthogonal transformation.
In this nonindependent case, the transformation in (7.6) is incomplete, as it does
not provide the axis rotation that is needed.
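The agreement between the quadratic-form T2 and the standardized expression in (7.6) can be checked numerically. This sketch uses illustrative data; the names (`hds`, `t2_y`) are assumptions, not the book's notation:

```python
import numpy as np

# Hypothetical correlated bivariate HDS.
rng = np.random.default_rng(11)
z = rng.normal(size=(300, 2))
hds = z @ np.array([[2.0, 0.0], [1.5, 1.0]]) + np.array([50.0, 30.0])

xbar = hds.mean(axis=0)
S = np.cov(hds, rowvar=False)                 # estimated covariance matrix
r = np.corrcoef(hds, rowvar=False)[0, 1]      # estimated correlation
s = np.sqrt(np.diag(S))

x = np.array([53.0, 27.0])

# Quadratic-form T2 based on the estimated covariance matrix.
t2 = (x - xbar) @ np.linalg.inv(S) @ (x - xbar)

# Standardized form (7.6): (y1^2 - 2 r y1 y2 + y2^2) / (1 - r^2).
y1, y2 = (x - xbar) / s
t2_y = (y1**2 - 2 * r * y1 * y2 + y2**2) / (1 - r**2)

print(t2, t2_y)   # the two forms agree
```

The cross-product term in (7.6) is exactly what tilts the elliptical control region when r is nonzero.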
The T2 statistic in (7.6) can be separated into two additive components using
the orthogonal transformation

    z1 = (y1 + y2)/sqrt(2)

and

    z2 = (y1 - y2)/sqrt(2).

Figure 7.5: Bivariate principal component control region with unequal variances.
As was shown in Chapter 6, the values z1 and z2 are the first and second principal
components of the correlation matrix for x1 and x2 (see also the appendix to this
chapter, section 7.10). Using this transformation, we can decompose the T2 value
in (7.6) as

    T2 = z1^2/(1 + r) + z2^2/(1 - r)

and, with w1 = z1/sqrt(1 + r) and w2 = z2/sqrt(1 - r), as

    T2 = w1^2 + w2^2.    (7.10)
Figure 7.6: Bivariate principal component control region with equal variances.
The resultant control region, presented in Figure 7.6, is now circular, and the
(squared) statistical distance is represented by the hypotenuse of a right triangle.
The transformation given in (7.10) provides an orthogonal decomposition of the
T2 value. Thus, it will successfully separate a bivariate T2 value into two additive
and orthogonal components. However, each w1 and w2 component in (7.10) is a
linear combination of both x1 and x2. Since each component consists of both variables,
clear interpretation of the source of the signal in terms of the individual
process variables is hampered. This problem becomes more severe as the number
of variables increases. What is needed instead is a methodology that will pro-
vide both an orthogonal decomposition and a means of interpreting the individual
components. One such procedure is given by the MYT (Mason-Young-Tracy) de-
composition, which was first introduced by Mason, Tracy, and Young (1995).

7.3 The MYT Decomposition

As noted above, the components of a principal component decomposition can be
difficult to interpret, as they are linear combinations of the p variables of the
observation vector. The components of the MYT decomposition of the T2 statistic,
in contrast, have global meaning. This is one of the most desirable characteristics
of the method.
We will demonstrate the MYT procedure for a bivariate observation vector X' =
(x1, x2), where x1 and x2 are correlated. Details on the more general p-variable
case can be found in Chapter 8. The MYT decomposition uses an orthogonal
transformation to express the T2 value as two orthogonal and equally weighted
terms. One such decomposition is given by

    T2 = (x1 - x̄1)^2/s1^2 + (x2 - x̄2.1)^2/s2.1^2,    (7.11)

where

    x̄2.1 = x̄2 + b(x1 - x̄1)

and

    s2.1^2 = s2^2 (1 - r^2).

In this formulation, x̄2.1 is the estimator of the conditional mean of x2 for a given
value of x1, and s2.1^2 is the corresponding estimator of the conditional variance of
x2 for a given value of x1. Details on these estimators are given in the last section
of this chapter.
The first term of the MYT decomposition in (7.11), T1^2, is referred to as an
unconditional term, as it depends only on x1. The second term of the orthogonal
decomposition, written as T2.1^2, is referred to as a conditional term, as it is
conditioned on the value of x1. Using the above notation, we can write (7.11) as

    T2 = T1^2 + T2.1^2.

The square root of this T2 value can be plotted and viewed in the (T1, T2.1) space.
This is illustrated in Figure 7.7.
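The bivariate MYT decomposition can be sketched in code. The data and names below (`myt`, `x2bar_1`) are illustrative assumptions; the check at the end confirms that the two terms reproduce the full quadratic-form T2:

```python
import numpy as np

# Hypothetical correlated HDS for (x1, x2).
rng = np.random.default_rng(3)
x1h = rng.normal(100.0, 4.0, 250)
x2h = 0.8 * x1h + rng.normal(0.0, 2.0, 250)
hds = np.column_stack([x1h, x2h])

xbar = hds.mean(axis=0)
S = np.cov(hds, rowvar=False)
r = np.corrcoef(hds, rowvar=False)[0, 1]
s1, s2 = np.sqrt(np.diag(S))
b = S[0, 1] / S[0, 0]                  # regression coefficient of x2 on x1

def myt(x):
    """Bivariate MYT decomposition: T2 = T1^2 + T2.1^2 as in (7.11)."""
    x1, x2 = x
    t1_sq = (x1 - xbar[0])**2 / s1**2           # unconditional term
    x2bar_1 = xbar[1] + b * (x1 - xbar[0])      # estimated conditional mean
    s2_1_sq = s2**2 * (1 - r**2)                # estimated conditional variance
    t21_sq = (x2 - x2bar_1)**2 / s2_1_sq        # conditional term
    return t1_sq, t21_sq

x = np.array([104.0, 74.0])
t1_sq, t21_sq = myt(x)
t2_full = (x - xbar) @ np.linalg.inv(S) @ (x - xbar)
print(t1_sq + t21_sq, t2_full)         # the two terms sum to the full T2
```

The decomposition is an exact algebraic identity, so the sum of the terms matches the quadratic form to machine precision.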
The orthogonal decomposition given in (7.11) is one of two possible MYT
decompositions of a T2 value for p = 2. The other decomposition is given as

    T2 = T2^2 + T1.2^2,    (7.14)

where

    T2^2 = (x2 - x̄2)^2/s2^2

and

    T1.2^2 = (x1 - x̄1.2)^2/s1.2^2.
(x1, x2) presented in Figure 7.8. As the value of x1 changes, so does the value of
x̄2.1. Consider a fixed value of x1, say x1 = a. This is represented by the vertical
line drawn upward from the x1 axis. For process control to be maintained at this
value of x1, the corresponding value of x2 must come from the shaded interval along
the x2 axis. This means the value of x2 must be contained in this portion of the
conditional density; otherwise, a signal will be obtained on the T2.1^2 component.
A similar discussion can be used to illustrate a signal on the T1.2^2 term. Suppose
the value of x2 is fixed at a point b and we examine the restricted (conditional)
interval that must contain the observation on x1. This is depicted in Figure 7.9. If
the value of x1 is not contained in the shaded interval, a signal will be obtained on
the T1.2^2 component.
A large value on a conditional term implies that the observed value of one
variable is not where it should be relative to the observed value of the other variable.
Observations on the variables (x1, x2) that produce a signal of this type are said to
be countercorrelated, as something is astray with the relationship between x1 and
x2. Countercorrelations are a frequent cause of a multivariate signal.
7.5 Regression Perspective

Consider the estimated mean of xi adjusted for xj, i.e., x̄i.j. This is given as

    x̄i.j = x̄i + b(xj - x̄j),    (7.19)

where x̄i and x̄j are the sample means of xi and xj obtained from the historical
data, and b is the estimated regression coefficient relating xi to xj in this data set.
The left-hand side of (7.19) contains x̄i.j, which is the predicted value of xi based
on the corresponding value of xj (i.e., xi is the dependent variable and xj is the
predictor variable). Thus, the numerator of (7.18) is a regression residual; i.e.,

    ri.j = xi - x̄i.j.

Writing the denominator of (7.18) as si.j^2 = si^2 (1 - Ri.j^2), where Ri.j^2 is the
squared multiple correlation between xi and xj, and substituting ri.j for (xi - x̄i.j),
we can re-express Ti.j^2 as

    Ti.j^2 = ri.j^2 / [si^2 (1 - Ri.j^2)],

or as the square of the standardized residual ri.j / (si sqrt(1 - Ri.j^2)). We use this
notation for consistency with the formula used in the p-dimensional case discussed
in Chapter 8.
For the two decompositions in (7.11) and (7.14), the conditional terms become

    T2.1^2 = r2.1^2 / [s2^2 (1 - r^2)]    (7.21)

and

    T1.2^2 = r1.2^2 / [s1^2 (1 - r^2)],    (7.22)

where r2.1 = (x2 - x̄2.1) and r1.2 = (x1 - x̄1.2) are residuals from the respective
regression fits of x2 on x1 and x1 on x2. These residuals are illustrated in Figures
7.10 and 7.11.
Notice that the two conditional values in (7.21) and (7.22), apart from the
Ri.j^2 term, are actually standardized residuals having the form ri.j/si. When the
residuals (after standardizing) in Figures 7.10 and 7.11 are large, the conditional T2
terms signal. This would occur only when the observed value of x1 differs from the
value predicted by x2, or the observed value of x2 differs from the value predicted
by x1, where prediction is derived from the HDS.
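The regression view of a conditional term reduces, in code, to a squared standardized residual. The numbers below are hypothetical; the names (`resid`, `t21_sq`) are assumptions used only for illustration:

```python
import numpy as np

# Hypothetical HDS for (x1, x2) with a strong linear relationship.
rng = np.random.default_rng(5)
x1h = rng.normal(10.0, 1.0, 150)
x2h = 2.0 * x1h + rng.normal(0.0, 0.5, 150)

x1bar, x2bar = x1h.mean(), x2h.mean()
s1, s2 = x1h.std(ddof=1), x2h.std(ddof=1)
r = np.corrcoef(x1h, x2h)[0, 1]
b = r * s2 / s1                          # slope of the fit of x2 on x1

x1, x2 = 11.0, 20.0                      # a new observation
resid = x2 - (x2bar + b * (x1 - x1bar))  # r2.1 = x2 - x2.1-bar
t21_sq = resid**2 / (s2**2 * (1 - r**2)) # conditional term, as in (7.21)
print(resid, t21_sq)
```

A large `t21_sq` flags an observation whose x2 value disagrees with the value predicted from x1 by the historical regression fit.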
7.6 Distribution of the T2 Components

The unconditional terms of the decomposition are distributed as

    Tj^2 ~ [(n + 1)/n] F(1, n - 1)    (7.23)

for j = 1, 2. Similarly, the conditional terms, Ti.j^2, used in checking the linear
relationships between the variables are distributed as

    Ti.j^2 ~ [(n + 1)(n - 1)/(n(n - 2))] F(1, n - 2),    (7.24)

a result given in Mason, Tracy, and Young (1995). Thus, one can use the F distribution
to determine when an individual unconditional or conditional term of the
decomposition is significantly large and makes a contribution to the signal.
The procedure for making this determination is as follows. For a specified α
level and an HDS sample of size n, obtain F(α, 1, n-k-1) from the appropriate F
table. Compute the UCL for individual terms using (7.23) and (7.24). Any
unconditional term greater than its UCL would imply that the corresponding xj is
contributing to the signal. Likewise, any conditional term greater than its UCL
would imply that the relationship between the corresponding variables is
contributing to the signal.
does not require the assumption of multivariate normality. However, without this
assumption, or some distributional assumption, it is difficult to detect a signal.
We would like the parallelogram and ellipse in Figure 7.12 to be similar in size.
The size of the ellipse is determined by the choice of the overall probability, labeled
α1, of making a Type I error, i.e., of saying the process is out of control when in fact
control is being maintained. The size of the parallelogram is controlled by the choice
of the specific probability, labeled α2, used for testing the "largeness" of individual
terms. Thus, α2 represents the probability of saying a component is part of the
signal when in fact it is not. The two α's are not formally related in this situation.
However, ambiguities can be reduced by making the two regions agree in size.
We use the F distributions in (7.23) and (7.24) to locate large values among the
T2 decomposition terms. This is done because, given that an overall signal exists,
the most likely candidates among the unique terms of a total MYT decomposition
are the components with large values that occur with small a priori probabilities.
Our interest in locating the signaling terms of the MYT decomposition is due to
the ease of interpretation for these terms.
Consider an example to illustrate the methodology of this section. For p = 2,
the control region is illustrated in Figure 7.13. Signals are indicated by points A,
B, C, and D. Note that the box encompassing the control region represents the
tolerance on variables x1 and x2 for α = 0.05, as specified by (7.26). The tolerance
regions are defined by the Shewhart control limits of the individual variables for
the appropriate α level.
Figure 7.16: Time-sequence chart for first few points of boiler data.
dominated by moderate-to-high values of the two variables, making the low values
farther from the mean vector. Since the T2 statistic is a squared number, values
far from the mean are large and positive.
A Q-Q plot of the 500 ordered T2 values versus the corresponding beta values
for the boiler historical data is presented in Figure 7.17. The upper four points in
the graph correspond to the four T2 values of Figure 7.14 that are greater than
or equal to a value of 8. Although the linear trend for these four points is not
consistent with the linear trend of the other 496 points, the deviation is not severe
enough to disqualify the use of the T2 statistic for detecting signals in the Phase
II operation.
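The Q-Q construction behind a plot like Figure 7.17 can be sketched as follows. The data are simulated, not the boiler HDS; the scaled-beta form of the Phase I T2 distribution used in the comment is a standard result that is treated here as an assumption:

```python
import numpy as np
from scipy import stats

# Simulated Phase I HDS with p = 2 variables.
rng = np.random.default_rng(17)
n, p = 500, 2
hds = rng.multivariate_normal([0, 0], [[1.0, 0.7], [0.7, 1.0]], size=n)

xbar = hds.mean(axis=0)
S_inv = np.linalg.inv(np.cov(hds, rowvar=False))
t2 = np.einsum('ij,jk,ik->i', hds - xbar, S_inv, hds - xbar)

# Assumed Phase I result: T2 ~ ((n-1)^2 / n) * Beta(p/2, (n-p-1)/2).
probs = (np.arange(1, n + 1) - 0.5) / n
beta_q = ((n - 1)**2 / n) * stats.beta.ppf(probs, p / 2, (n - p - 1) / 2)

# Pair sorted T2 values with beta quantiles for a Q-Q plot.
qq = np.column_stack([beta_q, np.sort(t2)])
print(qq[-1])    # largest T2 value vs. largest beta quantile
```

A roughly linear scatter of the paired columns supports using the HDS for Phase II signal detection, as argued above for the boiler data.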
Summary statistics for the boiler HDS are presented in Table 7.2. The minimum
and maximum values of the variables give a good indication of the operational
ranges on the two variables. For example, fuel usage in the HDS ranges from a low
of 186.46 units to a high of 524.70 units.
With statistical understanding of the boiler system through examination of the
HDS, we are ready to move to signal interpretation. The control region is pre-
sented in Figure 7.18 in the variable space of fuel usage and steam flow. Also
included are three signaling points, designated as points 1, 2, and 3. Examination
of the signals in this graphical setting provides insight as to how the terms of the
MYT decomposition identify the source of the signal and how the signals are to be
interpreted.
For example, note the (Euclidean) distance point 1 is from the body of the data
(HDS). Also, note the "closeness" of points 2 and 3 to the control region, especially
point 2. This leads one to think that the signal for point 1 is more severe than
the signals for the other two points. However, this is not the case. Observe the
T2 values presented in Table 7.3 for the three signaling points. Point 3 clearly
has the largest T2 value, while point 2 has the smallest of the three signaling T2
values. To understand why this occurs, we need to examine the values of the MYT
decomposition terms that are presented in Table 7.4.
Since there are only two variables, the three signaling points can be plotted in
either the (T1, T2.1) space or the (T2, T1.2) space. A representation in the (T1,
T2.1) space is presented in Figure 7.19. Geometrically, the circle in Figure 7.19
represents a rotation of the elliptical control region given in Figure 7.18. In the
transformed space, the T2 statistic is represented by the square of the length of the
arrows designated in the plot. The UCL of 9.33 defines the square of the radius
of the circular control region. The coordinates of point 1 in Figure 7.19 are (3.27,
1.57). The sum of squares of these values equals the T2 value of point 1, i.e.,
The coordinates of point 2 are (1.3, 5.05), and those of point 3 are (2.04, 20.01).
Scalewise, point 3 would be located off the graph of Figure 7.19. However, this
Figure 7.19: Control region in (Ti, T2.i) space for boiler data.
The corresponding observed fuel value of 400.00 is too large for this value of steam.
Likewise, the difference between the actual steam value and the predicted steam
Figure 7.20: Time-sequence graph of three signaling points.
value for this point is too large. The predicted steam value is 286.26 units. When
compared to the actual value of 300.00, the residual of 13.74 steam units is
too large to be attributed to random fluctuation.
Point 3 also has T2 signals on both the conditional terms. This indicates that
the linear relationship between the two variables is astray. The reason for this can
best be seen by examining the time-sequence graph presented in Figure 7.20 for the
three signaling points. These were derived from the observation plot of the HDS
in Figure 7.16, where it was established that fuel must be above the corresponding
value of steam. For point 3 in Figure 7.20, the relationship is reversed as the value
of steam is above the corresponding fuel value. This counterrelationship produces
large signaling values on the two conditional T2 terms.
through the joint density at the fixed value of y, i.e., at y = b. This is illustrated
in Figure 7.22 for various values of the constant b.
For the MVN distribution with p = 2, the conditional density of x given y is
normal, with conditional mean

    μx|y = μx + ρ(σx/σy)(y - μy)    (7.28)

and conditional variance

    σx|y^2 = σx^2 (1 - ρ^2).    (7.29)

Examination of (7.28) reveals that μx|y depends on the specified value of y. For
example, the conditional mean of the distribution of x for y = b is given as

    μx|y=b = μx + ρ(σx/σy)(b - μy).
For various values of the constant b (i.e., for values of y), it can be proven that the
line connecting the conditional means (as illustrated in Figure 7.23) is the regression
line of x on y. This can also be seen in (7.28) by noting that the regression coefficient
β (of x on y) is given by

    β = ρ(σx/σy).

Thus, another form of the conditional mean (of the line connecting the means of
the conditional densities) is given by

    μx|y = μx + β(y - μy).
In contrast to the conditional mean, the conditional variance in (7.29) does not
depend on the particular value of y. However, it does depend on the strength of
the correlation, ρ, existing between the two variables.
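The behavior of (7.28) and (7.29) is easy to verify directly. The parameter values below are illustrative assumptions:

```python
import numpy as np

# Illustrative population parameters of a bivariate normal (x, y).
mu_x, mu_y = 5.0, 2.0
sigma_x, sigma_y, rho = 2.0, 1.5, 0.8

def conditional_of_x_given_y(y):
    """Mean and variance of x | y, per (7.28) and (7.29)."""
    mean = mu_x + rho * (sigma_x / sigma_y) * (y - mu_y)
    var = sigma_x**2 * (1 - rho**2)
    return mean, var

# The conditional mean traces the regression line of x on y;
# the conditional variance does not change with y.
for b in (0.0, 2.0, 4.0):
    m, v = conditional_of_x_given_y(b)
    print(b, m, v)
```

Running the loop shows the mean shifting linearly with y while the variance stays fixed, which is exactly the point made in the text.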
7.9 Summary
In this chapter, we have discussed the essentials for using the MYT decomposition in
the interpretation of signals for a bivariate process. We have shown that a signaling
T2 value for a bivariate observation vector has two possible MYT decompositions.

7.10 Appendix: Principal Component Form of T2

The T2 statistic can be expressed in terms of the principal components of the
estimated covariance matrix S as

    T2 = (X - X̄)' S^(-1) (X - X̄)

or

    T2 = z1^2/λ1 + z2^2/λ2 + ... + zp^2/λp,    (A7.1)
where λ1 > λ2 > ... > λp are the eigenvalues of the estimated covariance matrix S
and the zi, i = 1, ..., p, are the corresponding principal components. A principal
component is obtained by multiplying the vector quantity (X - X̄) by the transpose
of the normalized eigenvector ui of S corresponding to λi; i.e.,

    zi = ui'(X - X̄).    (A7.2)
Note that the T2 statistic also can be expressed in terms of the estimated
correlation matrix R as

    T2 = Y' R^(-1) Y,

where Y' = (y1, y2, ..., yp) is the standardized form of the observation vector.
The matrix R (obtained from S) is a positive definite symmetric matrix and can
be represented in terms of its eigenvalues and eigenvectors. Using a transformation
similar to (A7.2), the above T2 can be written as

    T2 = w1^2/γ1 + w2^2/γ2 + ... + wp^2/γp,    (A7.4)
where w1, w2, ..., wp are the principal components of the correlation matrix R
and the γi are the eigenvalues of R. The principal component values are given by
wi = vi'(X - X̄), where the vi are the normalized eigenvectors of R.
Equation (A7.4) is not to be confused with (A7.1). The first equation is written
in terms of the eigenvalues and eigenvectors of the covariance matrix, and the second
is in terms of the eigenvalues and eigenvectors of the estimated correlation matrix.
These are two very different forms of the same Hotelling's T2, as the mathematical
transformations are not equivalent. Similarly, (A7.4) should not be confused with
(6.13). The equation in (6.13) refers to a situation where the correlation matrix is
known, while (A7.4) is for the case where the correlation matrix is estimated.
The principal component representation of the T2 plays a number of roles in
multivariate SPC. For example, it can be used to show that the control region is
elliptical in shape. Consider a control region defined by a UCL. The observations
contained in the HDS have T2 values less than the UCL; i.e., for each Xi,

    (Xi - X̄)' S^(-1) (Xi - X̄) < UCL,

and thus, by (A7.4),

    w1^2/γ1 + w2^2/γ2 + ... + wp^2/γp < UCL.

In the principal component space of the estimated correlation matrix, this reduces
to

    w1^2/γ1 + w2^2/γ2 = UCL,    (A7.5)

which gives the equation of the control ellipse. The length of the major axis of the
ellipse in (A7.5) is given by γ1, and the length of the minor axis is given by γ2.
The axes of this space are the principal components, w1 and w2. The absence of
a product term in this representation indicates the independence between w1 and
w2. This is a characteristic of principal components, since they are transformed to
be independent.
Assuming that the estimated correlation r is positive, it can be shown that
γ1 = (1 + r) and γ2 = (1 - r). For negative correlations, the γi values are reversed.
One can also show that the principal components can be expressed as

    w1 = (y1 + y2)/sqrt(2)  and  w2 = (y1 - y2)/sqrt(2).

From these equations, one can obtain the principal components as functions of the
original variables.
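The principal component form (A7.4) can be checked numerically. In this sketch the transformation is applied to the standardized observation vector; the data, the use of `numpy.linalg.eigh`, and the sorting step are implementation details assumed here, not the book's notation:

```python
import numpy as np

# Hypothetical bivariate HDS.
rng = np.random.default_rng(29)
hds = rng.multivariate_normal([0, 0], [[4.0, 2.4], [2.4, 4.0]], size=400)

xbar = hds.mean(axis=0)
s = hds.std(axis=0, ddof=1)
R = np.corrcoef(hds, rowvar=False)        # estimated correlation matrix
r = R[0, 1]

gammas, V = np.linalg.eigh(R)             # eigenvalues/eigenvectors of R
order = np.argsort(gammas)[::-1]          # sort so gamma1 >= gamma2
gammas, V = gammas[order], V[:, order]

x = np.array([3.0, -1.0])
y = (x - xbar) / s                        # standardized observation
w = V.T @ y                               # principal component scores

t2_pc = np.sum(w**2 / gammas)             # T2 in the form (A7.4)
t2_dir = y @ np.linalg.inv(R) @ y         # T2 as Y' R^(-1) Y
print(t2_pc, t2_dir)                      # gamma1 ~ 1 + r, gamma2 ~ 1 - r
```

For a 2 x 2 correlation matrix the eigenvalues come out as 1 + r and 1 - r, matching the appendix result.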
Chapter 8
Interpretation of T2 Signals for
the General Case
8.1 Introduction
In this chapter, we extend the interpretation of signals from a T2 chart to the
setting where there are more than two process variables. The MYT decomposition
is the primary tool used in this effort, and we examine many interesting properties
associated with it. For example, we show that the decomposition terms contain
information on the residuals generated by all possible linear regressions of one
variable on any subset of the other variables. In addition to being an excellent
aid in locating the source of a signal in terms of individual variables or subsets of
variables, this property has two other major functions. First, it can be used to
increase the sensitivity of the T2 statistic in the area of small process shifts (see
Chapter 9). Second, the property is very useful in the development of a control
procedure for autocorrelated observations (see Chapter 10).
elements of the mean vector. Suppose we similarly partition the matrix S so that

    S = [ Sxx    sxx  ]
        [ sxx'   sp^2 ],

where Sxx is the (p - 1) x (p - 1) covariance matrix for the first (p - 1) variables,
sp^2 is the variance of xp, and sxx is a (p - 1)-dimensional vector containing the
covariances between xp and the remaining (p - 1) variables.
The T2 statistic in (8.1) can be partitioned into two independent parts (see
Rencher (1993)). These components are given by

    T2 = T2(X(p-1)) + Tp.1,2,...,p-1^2,    (8.3)

where the first term,

    T2(X(p-1)) = (X(p-1) - X̄(p-1))' Sxx^(-1) (X(p-1) - X̄(p-1)),

uses the first (p - 1) variables and is itself a T2 statistic. The last term in (8.3)
can be shown (see Mason, Tracy, and Young (1995)) to be the square of the pth
component of the vector X adjusted by the estimates of the mean and standard
deviation of the conditional distribution of xp given (x1, x2, ..., xp-1). It is given
as

    Tp.1,2,...,p-1^2 = (xp - x̄p.1,2,...,p-1)^2 / sp.1,2,...,p-1^2,

where

    x̄p.1,2,...,p-1 = x̄p + sxx' Sxx^(-1) (X(p-1) - X̄(p-1))

and

    sp.1,2,...,p-1^2 = sp^2 - sxx' Sxx^(-1) sxx.
Since the first term of (8.3) is a T2 statistic, it too can be separated into two
orthogonal parts. Continuing this partitioning yields

    T2 = T1^2 + T2.1^2 + T3.1,2^2 + ... + Tp.1,2,...,p-1^2.    (8.6)

The T1^2 term in (8.6) is the square of the univariate t statistic for the first variable
of the vector X and is given as

    T1^2 = (x1 - x̄1)^2 / s1^2.    (8.7)
Note this term is not a conditional term, as its value does not depend on a condi-
tional distribution. In contrast, all other terms of the expansion in (8.6) are condi-
tional terms, since they represent the value of a variable adjusted by the mean and
standard deviation from the appropriate conditional distribution. We will repre-
sent these terms with the standard dot notation used in multivariate analysis (e.g.,
see Johnson and Wichern (1999)) to denote conditional distributions. Thus, Ti.j,k^2
corresponds to the conditional T2 associated with the distribution of xi adjusted
for, or conditioned on, the variables xj and xk.
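A general conditional term can be computed directly from the partitioned covariance matrix. The sketch below uses hypothetical data; the function name `conditional_term` and its interface are assumptions, and the final check confirms that one complete MYT decomposition reproduces the full T2:

```python
import numpy as np

# Hypothetical HDS with p = 3 variables.
rng = np.random.default_rng(41)
n, p = 100, 3
A = np.array([[1.0, 0.0, 0.0],
              [0.6, 0.8, 0.0],
              [0.5, 0.3, 0.7]])
hds = rng.normal(size=(n, p)) @ A.T + np.array([10.0, 20.0, 30.0])

xbar = hds.mean(axis=0)
S = np.cov(hds, rowvar=False)

def conditional_term(x, j, cond):
    """T2 term for variable j conditioned on the variables in `cond`,
    built from the partitioned covariance matrix (sketch, assumed names)."""
    x = np.asarray(x)
    cond = list(cond)
    Sxx = S[np.ix_(cond, cond)]          # covariances among conditioning vars
    sxj = S[np.ix_(cond, [j])].ravel()   # covariances with variable j
    coef = np.linalg.solve(Sxx, sxj)     # regression coefficients
    mean_j = xbar[j] + coef @ (x[cond] - xbar[cond])   # conditional mean
    var_j = S[j, j] - coef @ sxj                       # conditional variance
    return (x[j] - mean_j)**2 / var_j

x = np.array([10.5, 21.0, 29.0])
t2_full = (x - xbar) @ np.linalg.inv(S) @ (x - xbar)
# One complete MYT decomposition: T1^2 + T2.1^2 + T3.1,2^2.
t1 = (x[0] - xbar[0])**2 / S[0, 0]
decomp = t1 + conditional_term(x, 1, [0]) + conditional_term(x, 2, [0, 1])
print(decomp, t2_full)                   # the terms sum to the full T2
```

Reordering the variables in the calls produces any of the other possible decompositions of the same T2 value.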
Continuing in this fashion, we can compute the T2 values for all subvectors of the
original vector X. The last subvector, consisting of the first component X(1) = (x1),
is used to compute the unconditional T2 term given in (8.7). All the T2 values,
T2(x1, x2, ..., xp), T2(x1, x2, ..., xp-1), ..., T2(x1), are computed using the
corresponding partitions of the estimated mean vector and covariance matrix
obtained from the HDS.
To illustrate this method for computing the conditional and unconditional terms
of a MYT decomposition, consider an industrial situation characterized by three
process variables. The in-control HDS is represented by 23 observations, and the
estimates of the covariance matrix and mean vector are given by
To obtain T2(X(2)), we partition the original estimates of the mean vector and
covariance structure to obtain the mean vector and covariance matrix of the subvector
X(2)' = (x1, x2). The corresponding partitions are given as
and the smallness of the first two terms, T1^2 and T2.1^2, implies that the signal is
contained in the third term, T3.1,2^2.
Only one possible MYT decomposition was chosen above to illustrate a computing
technique for the decomposition terms. Had we chosen another MYT decomposition,
such as

    T2 = T2^2 + T3.2^2 + T1.2,3^2,

other terms of the decomposition would have had large values. With a signaling
overall T2 value, we are guaranteed that at least one term of any particular
decomposition will be large. We illustrate this important point in later sections.
Table 8.1: Regressions and corresponding conditional T2 terms for p = 3.

    Regression          Conditional T2
    x1 on x2            T1.2^2
    x1 on x3            T1.3^2
    x1 on x2, x3        T1.2,3^2
    x2 on x1            T2.1^2
    x2 on x3            T2.3^2
    x2 on x1, x3        T2.1,3^2
    x3 on x1            T3.1^2
    x3 on x2            T3.2^2
    x3 on x1, x2        T3.1,2^2
The smallness of this latter T2 value suggests that there are no problems with
the observations on variables x1 and x2. Thus, all T2 terms, both conditional and
unconditional, involving only x1 and x2 will have small values. Our calculations
confirm this result. From this type of analysis one can conclude that the signal is
caused by the observed value on x3.
Another important property of the T2 statistic is the fact that the p(2^(p-1) - 1)
unique conditional terms of a MYT decomposition contain the residuals from all
possible linear regressions of each variable on all subsets of the other variables. For
example, for p = 3, a list of the nine (i.e., 3(2^(3-1) - 1)) linear regressions of each
variable on all possible subgroups of the other variables is presented in Table 8.1
along with the corresponding conditional T2 terms. It will be shown in Chapter
9 that this property of the T2 statistic provides a procedure for increasing the
sensitivity of the T2 statistic to process shifts.
8.5 Locating Signaling Variables

One method for locating the variables contributing to the signal is to develop a
forward-iterative scheme. This is accomplished by finding the subset of variables
that do not contribute to the signal.
Recall from (8.3) and (8.5) that a T2 statistic can be constructed on any subset of
the variables x1, x2, ..., xp. Construct the T2 statistic for each individual variable
xj, j = 1, 2, ..., p, so that

    Tj^2 = (xj - x̄j)^2 / sj^2,

where x̄j and sj^2 are the corresponding mean and variance estimates as determined
from the HDS. Compare these individual T2 values to their UCL, where the UCL
is computed for an appropriate α level and for a value of p = 1. Exclude from the
original set of variables all xj for which Tj^2 exceeds the UCL, since observations
on this subset of variables are definitely contributing to the signal.
From the set of variables not contributing to the signal, compute the T2 statistic
for all possible pairs of variables. For example, for all (xi, xj) with i ≠ j, compute
T2(xi, xj) and compare it to
for conditional terms, where k equals the number of conditioned variables. For
k = 0, the distribution in (8.13) reduces to the distribution in (8.12). Using these
distributions, critical values (CVs) for a specified α level and an HDS sample of
size n are obtained for both conditional and unconditional terms as

    CV = [(n + 1)(n - 1)/(n(n - k - 1))] F(α, 1, n - k - 1).
We can compare each individual term of the decomposition to its critical value and
make the appropriate decision.
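The CV computation and term-by-term comparison can be sketched with scipy. The values of alpha, n, and the term entries are illustrative; the multipliers follow the F-distribution results quoted above and should be treated as assumptions to be checked against (8.12)-(8.13):

```python
from scipy import stats

# Illustrative settings; n = 23 mirrors the size of an HDS like the one above.
alpha, n = 0.05, 23

def cv(k):
    """CV for a term conditioned on k variables (k = 0: unconditional)."""
    return ((n + 1) * (n - 1)) / (n * (n - k - 1)) * \
        stats.f.ppf(1 - alpha, 1, n - k - 1)

# Hypothetical decomposition term values, keyed by (k, value).
terms = {"T1^2": (0, 1.2), "T2.1^2": (1, 0.8), "T3.1,2^2": (2, 28.2)}
for name, (k, value) in terms.items():
    flag = "signals" if value > cv(k) else "does not signal"
    print(name, round(value, 4), flag, "(CV =", round(cv(k), 4), ")")
```

Note that for k = 0 the multiplier reduces algebraically to (n + 1)/n, so a single function covers both the unconditional and conditional cases.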
To illustrate the above discussion, recall the T2 value of 79.9441 for the observation
vector X' = (533, 514, 528) taken from the example described in section 8.3.
Table 8.2 contains the 12 unique terms and their values for a total decomposition
of this T2 value. A large component is determined by comparing the value of each
term to the appropriate critical value. The T2 values with asterisks designate those
terms that contribute to the overall T2 signal, e.g., T3^2, T1.3^2, T2.3^2, T3.1^2,
T3.2^2, T1.2,3^2, T2.1,3^2, and T3.1,2^2. All such terms contain the observation
on x3. This was the same variable designated by the exact method for detecting
signaling variables. Thus, one could conclude that a problem must exist in this
variable. However, a strong argument also could be made that the problem is due
to the other two variables, since four of the signaling terms contain x1 and four
terms contain x2. To address this issue, more understanding of what produces a
signal in terms of the decomposition is needed.
The ellipsoid in Figure 8.3 represents the control region for the overall T2 statistic.
Various signaling points are also included for discussion below.
If an observation vector plots outside the box, the signaling univariate Tj^2 values
identify the out-of-control variables, since the observation on the particular variable
is varying beyond what is allowed (as determined by the HDS). This is illustrated in
Figure 8.3 by point A. Thus, when an unconditional term produces a signal, the
implication is that the observation on the particular variable is outside its allowable
range of variation. The point labeled C also lies outside the box region, but the
overall T2 value for this point would not have signaled, since the point is inside the
elliptical control region.
This part of the T2 signal analysis is equivalent to ranking the individual t values
of the components of the observation vector (see Doganaksoy, Faltin, and Tucker
(1991)). While these components are a part of the T2 decomposition, they represent
only the p unconditional terms. Additional insight into the location and cause of a
signal comes from examination of the conditional terms of the decomposition.
Consider the form of a general conditional term given as

    Tj.1,2,...,j-1^2 = (xj - x̄j.1,2,...,j-1)^2 / sj.1,2,...,j-1^2.    (8.17)

For this term to be small, its numerator must be small, as the denominator of these
terms is fixed by the historical data. This implies that component xj from the
observation vector X' = (x1, x2, ..., xj, ..., xp) is contained in the conditional
distribution of xj given x1, x2, ..., xj-1 and falls in the elliptical control region.
A signal occurs on the term in (8.17) when xj is not contained in the conditional
distribution of xj given x1, x2, ..., xj-1, i.e., when the term exceeds its critical
value. This implies that something is wrong with the relationship existing between
and among the variables x1, x2, ..., xj. For example, a signal on Tj.1,2,...,j-1^2
implies that the observation on xj is not where it should be relative to the values
of x1, x2, ..., xj-1. The relationship between xj and the other variables is counter
to the relationship observed in the historical data.
To illustrate a countercorrelation, consider the trace of the control region of
Figure 8.3 in the (x1, x3) space, as presented in Figure 8.4. The signaling point
B of Figure 8.3 is located in the upper right-hand corner of Figure 8.4, inside the
operational ranges of x1 and x3 but outside the T2 control region. Thus, neither
the T1^2 nor the T3^2 term would signal, but both the T1.3^2 and T3.1^2 terms would.
Conditional distributions are established by the correlation structure among
the variables, and the conditional terms of an MYT decomposition depend on the
relationship among the variables contained in each term. All of these variables
would need to be examined to identify a possible cause.
More information pertaining to the HDS given in section 8.3 is needed in order to
expand our previous example to include interpretation of the signaling components
for the observation vector X' = (533, 514, 528). This information is summarized in
Tables 8.3a and 8.3b.
Consider from Table 8.2 the value of T1.3^2 = 28.2305, which is declared large
since it exceeds the CV = 8.2956. The size of T1.3^2 implies that something is wrong
with the relationship between the observed values on variables x1 and x3. Note
from Table 8.3 that, as established by the HDS, the correlation between these two
variables is 0.725. This implies that the two variables vary together in a positive
direction. However, for our observation vector X' = (533, 514, 528), the value
of x1 = 533 is somewhat above its mean value of 525.435, whereas the value of
x3 = 528 is well below its mean value of 539.913. This contradictory result is an
example of the observations on x1 and x3 being countercorrelated. To reestablish
control of the process, either x1 must be lowered, if possible, or the value of x3
must be increased.
To determine which variable to move requires one to be familiar with the process
and the process variables. This includes knowing which variable is easiest to control.
If x1 is controllable and x3 is not, then x1 should be lowered. If x3 is controllable
and x1 is not, then x3 should be increased. If both are controllable, then one might
consider the large size of the unconditional term T3^2 in Table 8.2.
A large value on an unconditional term implies that the observation on that variable
is outside the Shewhart box. This is the case for the observed value of x3 = 528,
as it is considerably less than the minimum value of 532 listed in the HDS. Hence,
to restore control in this situation, one would adjust variable x3 upward.
where x̄j is the sample mean of xj obtained from the historical data. The subvector
X(j-1) is composed of the observations on (x1, x2, ..., xj-1), and X̄(j-1) is the
corresponding estimated mean vector obtained from the historical data. The vector
of estimated regression coefficients Bj is obtained from partitioning the submatrix
Sjj, the covariance matrix of the first j components of the vector X. To obtain
Sjj, partition S as follows:
Then
Since the left-hand side of (8.18) contains x̄j.1,2,...,j-1, the predicted value of xj
from the given values of x1, x2, ..., xj-1, the numerator of (8.17) is a regression
residual represented by

    rj.1,2,...,j-1 = xj - x̄j.1,2,...,j-1

(see, e.g., Rencher (1993)). Substituting rj.1,2,...,j-1 for (xj - x̄j.1,2,...,j-1), we
can re-express Tj.1,2,...,j-1^2 as a squared standardized residual having the form

    Tj.1,2,...,j-1^2 = rj.1,2,...,j-1^2 / [sj^2 (1 - Rj.1,2,...,j-1^2)].    (8.19)
The conditional term in this form measures how well a future observation on a
particular variable agrees with the value predicted by a set of the other
variate values of the vector, using the covariance matrix constructed from the HDS.
Unless the denominator in (8.19) is very small, as occurs when R2 is near 1,
the "largeness" of the conditional T2 term will be due to the numerator, which is a
function of the agreement between the observed and predicted values of xj. Even
when the denominator is small, as occurs with large values of R2, we would expect
very close agreement between the observed and predicted xj values. A significant
deviation between these values will produce a large T2 term.
When the conditional T2 term in (8.19) involves many variables, its size is
directly related to the magnitude of the standardized residual resulting from the
prediction of xj using x1, x2, . . . , xj−1 and the HDS. When the standardized residual
is large, the conditional T2 signals.
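As a concrete sketch, the conditional term can be computed directly as a squared standardized residual from a least-squares fit to the historical data. This illustrates the idea rather than reproducing the book's covariance-partition computation (the two differ only in finite-sample scaling), and the historical data below are simulated.

```python
import numpy as np

def conditional_t2(x_new, X_hist, j):
    """T2_{j.1,...,j-1}: squared standardized residual of x_j predicted
    from x_1, ..., x_{j-1}, with the regression fit on historical data."""
    y = X_hist[:, j - 1]                                  # response x_j
    Z = np.column_stack([np.ones(len(y)), X_hist[:, :j - 1]])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)             # fitted coefficients
    resid = y - Z @ b
    s = np.sqrt(resid @ resid / (len(y) - j))             # residual std. error
    pred = np.concatenate(([1.0], x_new[:j - 1])) @ b     # predicted x_j
    return float(((x_new[j - 1] - pred) / s) ** 2)

rng = np.random.default_rng(0)
x1 = rng.normal(50.0, 5.0, 200)
x2 = 0.8 * x1 + rng.normal(0.0, 1.0, 200)   # historically, x2 tracks x1
X_hist = np.column_stack([x1, x2])

# An observation that contradicts the historical x1-x2 relationship
# yields a large conditional term (a countercorrelated pair), while a
# conforming observation yields a small one.
t2_counter = conditional_t2(np.array([55.0, 35.0]), X_hist, j=2)
t2_conform = conditional_t2(np.array([55.0, 44.0]), X_hist, j=2)
```

The countercorrelated point signals even though both of its coordinates are individually unremarkable, which is exactly the behavior the conditional terms are designed to detect.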
The above results indicate that a T2 signal may occur if something goes astray
with the relationships between subsets of the various variables. This situation
can be determined by examination of the conditional T2 terms. A signaling value
indicates that a contradiction with the historical relationship between the variables
has occurred either (1) due to a standardized component value that is significantly
larger or smaller than that predicted by a subset of the remaining variables, or (2)
due to a standardized component value that is marginally smaller or larger than
that predicted by a subset of the remaining variables when there is a very severe
collinearity (i.e., a large R2 value) among the variables. Thus, a signal results when
an observation on a particular variable, or set of variables, is out of control and/or
when observations on a set of variables are counter to the relationship established
by the historical data.
Variable    x1      x2      x3      x4      x5
T2          7.61*   0.67    0.76    0.58    3.87

* Denotes significance at the 0.05 level, based on
one-sided UCL = 4.28.
variables that contribute to rejection. These new values conform not only to the
HDS, but also to the observations on the variables of the data vector that did not
contribute to the signal.
The HDS on the seven (coded) quality variables is characterized by the sum-
mary statistics and the correlation matrix presented in Tables 8.6a and 8.6b. As
demonstrated later, these statistics play an important role in signal interpretation
for the individual components of the decomposition.
Individual observation vectors are not presented for proprietary reasons;
however, an understanding of the process variation for this product can be gained
by observing a graph of the T2 values for the HDS. This graph is presented in Figure
8.5. The time ordering of the T2 values corresponds to the order of lot production
and acceptance by the customer. Between any two T2 values, other products could
have been produced as well as other rejected lots.
The seven process variables represent certain chemical compositions contained in
the product. Not only do the observations on these variables have to be maintained
in strict operation ranges, but they also must conform to relationships specified by
the correlation matrix of the HDS. This situation is representative of a typical
multivariate system.
Consider an observation vector X' = (89.0, 7.3, 3.2, 0.33, 0.05, 0.86, 1.08) for a
rejected lot. The lot was rejected because the T2 value of the observation vector
was greater than its UCL; i.e.,
A relatively large value of the Type I error rate (i.e., α = 0.05) is used to protect the
customer from receiving an out-of-control lot. The risk analysis used in assessing
the value of the Type I error rate deemed it more acceptable to reject an in-control
lot than to ship an out-of-control lot to the customer.
The T2 value of this signaling observation vector is decomposed using the com-
puting scheme described in section 8.8. Table 8.7 contains the individual T2 values
for the seven unconditional terms. Significance is determined by comparing the
individual unconditional T2 values to a critical value computed from (8.14). Using
n = 85 and α = 0.05, we obtain
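The arithmetic behind such a critical value can be sketched as follows. This assumes (8.14) takes the usual scaled-F form for an unconditional MYT term, ((n+1)/n)·F(α; 1, n−1); the exact constant should be checked against the book's equation.

```python
from scipy.stats import f

def unconditional_cv(n, alpha):
    # Assumed form of (8.14): ((n+1)(n-1) / (n(n-1))) * F_{1-alpha}(1, n-1),
    # which simplifies to ((n+1)/n) * F_{1-alpha}(1, n-1).
    return (n + 1) / n * f.ppf(1.0 - alpha, 1, n - 1)

cv = unconditional_cv(n=85, alpha=0.05)  # roughly 4.0 under this assumption
```

Each of the seven unconditional terms in Table 8.7 would then be compared against this common cutoff.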
Note that x4 is not in the equation, as it was not significant. The predicted value
of x1 using this equation is 87.16. The T2 value using this predicted value of x1
and the observed values of the remaining variables is
This small value indicates that, if the value x1 = 87.16 is attainable in the re-work
process, the lot will be acceptable to the customer. This judgment is made because
the T2 value is insignificant when compared to a critical value of 16.2411.
A second observation vector of a rejected lot is given as X' = (87.2, 7.2, 3.1,
0.36, 0.05, 0.86, 1.08). Again, its T2 statistic is greater than the critical value; i.e.,
8.10 Summary
The interpretation of a signaling observation vector in terms of the variables of a
multivariate process is a challenging problem. Whether it involves many variables
or few, the problem remains the same: how to locate the signaling variable(s). In
this chapter, we have shown that the MYT decomposition of the T2 statistic is a
solution to this problem. This orthogonal decomposition is a powerful tool, as it
allows examination of a signaling T2 from numerous perspectives. For example,
the signaling of unconditional components readily locates the cause in terms of an
individual variable or group of variables, while signaling of conditional terms locates
countercorrelated relationships between variables as the cause.
As we will see in Chapter 9, the monitoring of the regression residuals contained
in the individual conditional T2 terms allows the detection of both large and small
process shifts. By enhancing the models that are inherent to all conditional terms
of the decomposition, the sensitivity of the overall T2 can be increased.
Unfortunately, the T2 statistic with the MYT decomposition is not the solution
to all process problems. For example, although this technique will identify the
variable or set of variables causing a signal, it does not distinguish between mean
shifts and shifts in the variability of these variables. Nevertheless, Hotelling's T2
with the MYT decomposition has been found to be very flexible and versatile in
industrial applications requiring multivariate SPC.
Chapter 9
Improving the Sensitivity of the
T2 Statistic
As you wait for the return call, your anxiety level begins to rise. Unlike this
morning, there is no edge of panic. Only the unanswered question of whether
this is the solution to the problem. As the minutes slowly tick away, the
boss appears with his customary question, "Have we made any progress?" At
that moment the telephone rings. Without answering him, you reach for the
telephone. As the lead operator reports the findings, you slowly turn to the
boss and remark, "Old Blue is back on line and running fine." You can see
the surprise and elation as the boss leaves for his urgent meeting. He yells
over his shoulder, "You need to tell me later how you found the problem so
quickly."
9.1 Introduction
In Chapter 8, a number of properties of the MYT decomposition were explored
for use in the interpretation of a signaling T2 statistic. For example, through
the decomposition, we were able to locate the variable or group of variables that
contributed to the signal. The major goal of this chapter is to further investigate
ways of using the decomposition for improving the sensitivity of the T2 in signal
detection.
Previously, we showed the T2 statistic to be a function of all possible regres-
sions existing among a set of process variables. Furthermore, we showed that the
residuals of the estimated regression models are contained in the conditional terms
of the MYT decomposition. Large residuals produce large T2 components for the
conditional terms and are interpreted as indicators of counterrelationships among
the variables. However, a large residual also could imply an incorrectly specified
model. This result suggests that it may be possible to improve the performance of
the T2 statistic by more carefully describing the functional relationships existing
among the process variables. Minimizing the effects of model misspecification on
the signaling ability of the T2 should improve its performance in detecting abrupt
process shifts (see Mason and Young (1999)).
When compared to other multivariate control procedures, the T2 lacks sensitivity
in detecting small process shifts. In this chapter, we show that this problem
can be overcome by monitoring the error residuals of the regressions contained in
the conditional terms of the MYT decomposition of the T2 statistic. Furthermore,
we show that such monitoring can be helpful in certain types of on-line experimen-
tation within a processing unit.
This is the square of the jth variable of the observation vector adjusted by the
estimates of the mean and variance of the conditional distribution of xj given x1,
x2, . . . , xj−1.
This was achieved by noting that x̂j.1,2,...,j−1 can be obtained from the regression
of xj on x1, x2, . . . , xj−1; i.e.,
where the bj are the estimated regression coefficients. Since x̂j.1,2,...,j−1 is the predicted
value of xj, the numerator of (9.1) is the raw regression residual
given in (9.2).
Another form of the conditional term in (9.1) is obtained by substituting the
following quantity for the conditional variance contained in (9.2); i.e., by substituting
where R2_j.1,2,...,j−1 is the squared multiple correlation between xj and x1, x2, . . . , xj−1.
This yields
using both approaches to improve model specification, and thereby increase the
sensitivity of the T2 statistic.
should lead to smaller residuals in (9.5). This is demonstrated in the following data
examples.
problem may occur in the boiler, the turbine-generator, or the condenser. Knowing
where to look for the source of a problem is very important in large systems such
as these.
The overall process is monitored by a T2 statistic on observations taken on the
following key variables:
(1) F = fuel to the boiler,
(2) S = steam produced to the turbine,
(3) ST = steam temperature,
(4) W or MW = megawatts of electricity produced,
(5) P = absolute pressure (vacuum) associated with the condenser, and
(6) RT = temperature of the river water.
fuel is required to produce a megawatt. The reverse occurs in the summer months,
when the river water temperature is high, as more fuel is required to produce a
megawatt of electricity.
Typical HDSs for a steam turbine are too large to present in this book. However,
we can present graphs of the individual variables over an extended period of time.
For example, Figure 9.2 presents a graph of megawatt production for the same time
period as is given in Figure 9.1. The irregular movement of megawatt production in
this plot indicates the numerous load changes made on the unit in the time period
of operation. This is not uncommon in a large industrial facility. For example,
sometimes it is less expensive to buy electricity from another supplier than it is
to generate it. If this is the situation, one or more of the units, usually the most
expensive to operate, will take the load reduction.
Fuel usage, for the same time period as in Figures 9.1 and 9.2, is presented in
Figure 9.3. Perhaps a more realistic representation is contained in Figure 9.4, where
a fuel usage plot is superimposed on an enlarged section of the graph of megawatt
production.
For a constant load, the fuel supplied to the boiler remains constant. However, to
increase the load on the generator (i.e., increase megawatt production), additional
fuel must be supplied to the boiler. Thus, the megawatt production curve moves
upward as fuel usage increases. We must use more fuel to increase the load than
would be required to sustain a given load. The opposite occurs when the load is
reduced. To decrease the load, the fuel supply is reduced. The generator continues
to produce megawatts, and the load curve follows the fuel graph downwards until a
sustained load is reached. In other words, we recoup the additional cost to increase
a load when we reduce the load.
178 Chapter 9. Improving the Sensitivity of the T2 Statistic
Steam production over the given time period is presented in Figure 9.5. Ex-
amination of this graph and the megawatt graph of Figure 9.2 shows the expected
relationship between the two variables: megawatt production follows steam pro-
duction. Again, the large shifts in the graphs are due to load changes.
Steam temperature over the given time period is presented in Figure 9.6. Note
the consistency of the steam temperature values. This is to be expected, since
steam temperature does not vary with megawatt production or the amount of steam
produced.
F = α0 + α1W,    (9.6)
where the αi are the unknown regression coefficients. For example, the correlation
coefficient for the data plotted in Figure 9.8 is 0.989, indicating a very strong linear
relationship between the two variables.
The theory on steam turbines, however, indicates that a second-order polynomial
relationship exists between F and W. This is described by an I/O curve defined by
F = β0 + β1W + β2W²,    (9.7)
where the βj are the unknown coefficients. Without knowledge of this theory, the
power engineer might have used a control procedure based only on the simple linear
model given in (9.6).
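To make the comparison concrete, here is a sketch of fitting both a linear model of the form in (9.6) and a quadratic I/O model of the form in (9.7) to simulated (W, F) data. The coefficients and noise level are illustrative, not the turbine's actual values.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.uniform(40.0, 160.0, 300)                 # megawatt load
# Hypothetical I/O behavior: fuel grows slightly faster than linearly.
F = 1200.0 + 55.0 * W + 0.08 * W**2 + rng.normal(0.0, 120.0, 300)

lin = np.polyfit(W, F, 1)    # F = a0 + a1*W, the form of (9.6)
quad = np.polyfit(W, F, 2)   # F = b0 + b1*W + b2*W^2, the form of (9.7)

def r2(y, yhat):
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

r2_lin = r2(F, np.polyval(lin, W))
r2_quad = r2(F, np.polyval(quad, W))
# Both R2 values are close to 1, yet the fitted curves separate near the
# middle and the endpoints of the W range, as described in the text.
```

Because the models are nested and fit by least squares, the quadratic R2 can never fall below the linear R2; the practical question is whether the small gap matters for signal detection, which the discussion below answers in the affirmative.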
To demonstrate how correct model specification can be used to increase the
sensitivity of the T2 statistic, suppose we compare these two models. Treating
Similarly, the following equation was obtained using the I/O curve given in (9.7):
A comparison of the graphs of these two functions is presented in Figure 9.9. The
use of the linear model given in (9.6) implies that fuel usage changes at a constant
rate as the load increases; i.e., the incremental fuel requirement per megawatt would
remain the same regardless of the power level. We wish this were the case. Unfortunately,
in operating steam turbines, each additional megawatt of load requires more fuel
than the last. Only the quadratic I/O curve describes this type of relationship.
In Figure 9.9, both functions provide a near-perfect fit to the data. The linear
equation has an R2 value of 0.9775, while the quadratic equation has an R2 value
of 0.9782. Although the difference in these R2 values is extremely small, the two
curves do slightly deviate from one another near the middle and at the endpoints
of the range for W. In particular, the linear equation predicts less fuel usage at
the ends and more in the middle than the quadratic equation. In these areas,
there could exist a set of run conditions acceptable using the linear model in the
T2 statistic but unacceptable using the quadratic I/O model. Since the quadratic
model is theoretically correct, use of it should improve the sensitivity of the T2
statistic to signal detection in the described region.
The corresponding T2 statistic based on only the two variables (F, W) has a value
of 18.89. This is insignificant (for α = 0.001) when compared to a critical value of
19.56 and indicates that there is no problem with the observation.
This is in disagreement with the result using the three variables x1 = F, x2 = W,
and x3 = W2. The resulting T2 value of 25.74 is significantly larger than the
critical value of 22.64 (for α = 0.001). Investigation of this signal using the T2
decomposition indicates that T2_1.3 = 18.54 and T2_1.2,3 = 14.34 are large. The large
conditional T2_1.3 term indicates that there is a problem in the relationship between
F and W2. It would appear that the value F = 9675 is smaller than the predicted
value based on a model using W2 and the HDS. The large conditional T2_1.2,3 term
indicates something is wrong with the fit to the I/O model in (9.8). It appears
again that the fuel value is too low relative to the predicted value.
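The mechanics of the two computations can be sketched as follows, using a simulated HDS in place of the proprietary one; the point is how T2 on (F, W) compares with T2 on (F, W, W2), not the specific values quoted in the text.

```python
import numpy as np

def t2(x, X_hist):
    """Hotelling's T2 of observation x against historical data X_hist."""
    mean = X_hist.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X_hist, rowvar=False))
    d = x - mean
    return float(d @ S_inv @ d)

rng = np.random.default_rng(2)
W = rng.uniform(40.0, 160.0, 300)
F = 1200.0 + 55.0 * W + 0.08 * W**2 + rng.normal(0.0, 120.0, 300)

hds2 = np.column_stack([F, W])          # (F, W)
hds3 = np.column_stack([F, W, W**2])    # (F, W, W^2): deliberate collinearity

w_new = 150.0
f_new = 1200.0 + 55.0 * w_new + 0.08 * w_new**2 - 350.0  # fuel running low
t2_two = t2(np.array([f_new, w_new]), hds2)
t2_three = t2(np.array([f_new, w_new, w_new**2]), hds3)
# By the MYT decomposition, adding a variable can only add a nonnegative
# conditional term, so t2_three >= t2_two; the W^2 column sharpens the
# statistic's view of the fuel-versus-load relationship.
```

With the fuel value deliberately low, the three-variable statistic exceeds the two-variable one by roughly the size of the conditional term, which is the mechanism behind the quadratic model's extra sensitivity.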
The ability of the quadratic model to detect a signal, when the linear model
failed and when both models had excellent fits, is perplexing. When comparing the
two models in Figure 9.9, the curves were almost identical except in the tails. This
result occurred because the correlation between W and W2 was extremely high (i.e.,
R2 = 0.997), indicating that these two variables were basically redundant in the
HDS. If initial screening tools had been used in analyzing this data set, the severe
collinearity would have been detected and the redundant squared megawatt variable
probably would have been deleted. However, because of theoretical knowledge
about the process, we found that the I/O model needed to be quadratic in the
megawatt variable. Thus, the collinearity was an inherent part of the process and
could not be excluded. This information helped improve model specification, which
reduced the regression residuals and ultimately enhanced the sensitivity of the T2
statistic to a small process shift.
As an additional note, the (I/O) models created for a steam turbine control
procedure can play another important role in certain situations. Consider a number
of units operating in parallel, each doing its individual part to achieve a common
goal. Examples of this would be a powerhouse consisting of a number of steam
turbines used in the generation of a fixed load, a number of pumps in service to
meet the demand of a specific flow, and a number of processing units that must
process a fixed amount of feedstock. For a system with more than one unit, the
proper division of the load is an efficiency problem. Improper load division may
appreciably decrease the efficiency of the overall system.
One solution to the problem of improper load division is to use "equal-incremental-
rate formulation." The required output of such a system can be achieved in many
ways. For example, suppose we need a total of 100 megawatts from two steam
turbines. Ideally, we might expect each turbine to produce 50 megawatts, but re-
alistically this might not be the most efficient way to obtain the power. For a fixed
system output, equal-incremental-rate formulation divides the load among the in-
dividual units in the most economic way; i.e., it minimizes the amount of input.
This requires the construction of I/O models that accurately describe all involved
units. It can be shown that the incremental solution to the load division problem
is given at the point where the slopes of the I/O curves are equal.
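A small worked example of the equal-incremental-rate idea, using two hypothetical quadratic I/O curves (the coefficients below are made up for illustration): setting the marginal fuel rates equal gives the most economic split of a 100-megawatt demand.

```python
# Hypothetical I/O curves: F_i(W) = a_i + b_i*W + c_i*W^2 (illustrative).
a1, b1, c1 = 1000.0, 50.0, 0.10
a2, b2, c2 = 1100.0, 60.0, 0.05
TOTAL = 100.0  # required combined output, in megawatts

def fuel(a, b, c, w):
    return a + b * w + c * w * w

# Equal-incremental-rate condition: F1'(W1) = F2'(TOTAL - W1), i.e.
#   b1 + 2*c1*W1 = b2 + 2*c2*(TOTAL - W1)
W1 = (b2 - b1 + 2.0 * c2 * TOTAL) / (2.0 * (c1 + c2))
W2 = TOTAL - W1

best = fuel(a1, b1, c1, W1) + fuel(a2, b2, c2, W2)
even = fuel(a1, b1, c1, 50.0) + fuel(a2, b2, c2, 50.0)
# The equal-incremental split uses less total fuel than a naive
# 50/50 division of the load.
```

With these coefficients the flatter curve carries the larger share of the load, and the slopes of the two I/O curves are exactly equal at the solution, as the text states.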
the vapor to the condensing unit. However, if the turbine allows hot steam to pass
(i.e., running too hot), less vacuum is needed to move the vapor.
The vacuum is a function of the temperature of the coolant and the amount
of coolant available in the condensing unit. A lower temperature for the coolant,
which is river water, increases the tendency of the warm vapor to move to the
condenser. The amount of the coolant available depends on the cleanness of the
unit. For example, if the tubes that the coolant passes through become clogged
or dirty, inhibiting the flow, less coolant is available to help draw the warm steam
vapor through the unit; hence, a change in the vacuum is created.
Without data exploration, one might control the system using the three variables
vacuum (V), coolant temperature (T), and megawatt load (W). The conditional
term T2_V.T,W would contain the regression of vacuum on temperature and
megawatt load. In equation form, this is given as
where the bi are estimated constants. The sensitivity of the T2 statistic for the
condensing unit will be improved if the regression of vacuum on temperature and
megawatt load is improved, as T2_V.T,W is an important term in the decomposition
of this statistic. The theoretical functional form of this relationship is unknown,
but it can be approximated using data exploration techniques.
The HDS for this unit contains hundreds of observations on the three process
variables taken in time sequence over a time period of one year. For discussion
purposes, a partial HDS consisting of 30 points is presented in Table 9.1. However,
our analyses are based on the overall data set.
Results for the fitting of the regression model given in (9.9) for the overall data
set are presented in Table 9.2. The R2 value of 0.9517 in Table 9.2 indicates a good
fit to this data. However, the standard error of prediction has a value of 0.1658,
which is more than 5% of the average vacuum. This is considered somewhat large
for this type of data and, if possible, needs to be reduced.
A graph of the standardized residuals for the model in (9.9) is presented in
Figure 9.10. There is a definite cyclical pattern in the plot. This needs to be
dampened or removed.
Figure 9.11 contains a graph of the megawatt load on the generator over this
same time period. From the graph, it appears that the generator is oscillating over
its operation range throughout the year, with somewhat lower loads occurring at
the end of the year. A quick comparison of this graph to that of the standardized
residuals in Figure 9.10 gives no indication of a connection to the cyclic nature of
the errors.
An inspection of the graph of the vacuum over time, given in Figure 9.12,
indicates a strong relationship between vacuum and time. This can be explained
by noting the seasonal variation of the coolant temperature displayed in Figure
9.13. The similar shapes of the graphs in Figures 9.12 and 9.13 also indicate that
a strong relationship exists between the vacuum and the temperature of the coolant.
This is confirmed by the correlation of 0.90 between these two variables.
The curvature in the above series of plots suggests the need for squared terms
in temperature and load in the vacuum model. We also will add a cross-product
term between T and W, since this will help compensate for the two
variables varying together. In functional form, the model in (9.9) is respecified to
obtain the prediction equation
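A respecified model of this kind, with squared terms in T and W and a T×W cross-product, can be fit by ordinary least squares. The sketch below uses simulated data with illustrative coefficients; the nested comparison shows why the extra terms can only reduce the residual sum of squares.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
T = 10.0 + 15.0 * (1.0 + np.sin(np.linspace(0.0, 2.0 * np.pi, n)))  # seasonal temp
W = rng.uniform(80.0, 160.0, n)                                      # load
# Hypothetical vacuum response with curvature and a T*W interaction:
V = (28.0 - 0.12 * T - 0.02 * W + 0.004 * T**2
     + 0.0006 * T * W + rng.normal(0.0, 0.1, n))

def fit_sse(Z, y):
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ b
    return float(r @ r)

ones = np.ones(n)
sse_linear = fit_sse(np.column_stack([ones, T, W]), V)   # the form of (9.9)
sse_full = fit_sse(np.column_stack([ones, T, W, T**2, W**2, T * W]), V)
# The respecified model cannot fit worse than the linear one; here the
# curvature and interaction terms absorb the systematic error that would
# otherwise show up as a cyclical residual pattern.
```

Smaller residuals from the improved fit translate directly into a tighter, more sensitive conditional term for the vacuum variable.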
The upper and lower control limits for these plots are given by
where MSE is the mean squared error from the regression fit of xj on x1, x2, . . . , xj−1.
With this form it is easy to see that, apart from the constant term in the denomi-
nator, the square root of the conditional T2 term is simply a standardized residual
from the above regression fit. Hence, rather than plot the conditional term, we can
simply plot the standardized residuals
As a note of interest, HDSs for a steam turbine contain a year of data taken
over the normal operational range (megawatt production) of the unit and have
these outliers (peaks) removed. The data in Figure 9.15 are very indicative of the
performance of a unit operating at maximum efficiency.
Monitoring the performance of a steam turbine can be accomplished by examining
incoming observations (F, W, W2) and computing and charting the overall
T2 statistic. This statistic indicates when an abrupt change occurs in the system
and can be used as a diagnostic tool for determining the source of the signal. In
addition, the standardized residuals from the fit of F to W and W2 can be plot-
ted in a Shewhart chart and monitored to determine if small process changes have
occurred in the fuel consumption.
The residual plot in Figure 9.16, with its accompanying control limits at ±3,
represents a time period when a small process change occurred in the operation
of the steam turbine. Residual values plotted above the zero line indicate that
fuel usage exceeded the amount established in the HDS, while those plotted below
the zero line indicate that the opposite is true. Thus, a run of positive residuals
indicates that the unit is less efficient in operation than that established in the
baseline period. In contrast, a run of negative residuals indicates that the unit is
using less fuel than it did in the baseline period.
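The run-of-residuals behavior described here can be screened automatically. A minimal sketch on simulated residuals follows; the 0.8-sigma upward shift is an illustrative upset, deliberately too small to breach the ±3 limits.

```python
import numpy as np

def longest_positive_run(resid):
    """Length of the longest consecutive run of residuals above zero."""
    best = cur = 0
    for r in resid:
        cur = cur + 1 if r > 0 else 0
        best = max(best, cur)
    return best

rng = np.random.default_rng(4)
before = rng.normal(0.0, 1.0, 100)   # baseline: residuals centered at zero
after = rng.normal(0.8, 1.0, 100)    # upset: small upward shift, inside +/-3
resid = np.concatenate([before, after])

# A sustained run above zero flags reduced efficiency long before any
# single residual crosses the +/-3 control limits.
run_after_upset = longest_positive_run(after)
```

A simple run count like this is only a screening aid; as noted below, rigorous run rules for these plots are not yet available, so trends should still be confirmed by inspection.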
The trends in the graph in Figure 9.16 indicate that the unit became less efficient
around the time period labeled "upset." At that point, the residuals moved above
the zero line, implying fuel usage was greater than that predicted by the model
given in (9.7). Notice that, while this pattern of positive residuals is consistent,
the residuals themselves are well within the control limits of the chart. The only
exceptions are the spikes in the plot, which occur with radical load changes.
Another example of using trends from plots of the conditional terms of the T2
to detect small process shifts is given in Figure 9.17. This is a residual plot using
the regression model given in (9.10) for the vacuum on a condensing unit. The
upset condition, indicated with a label on the graph, occurred when a technician
inadvertently adjusted the barometric gauge used in the calculation of the absolute
Figure 9.17: Standardized residual plot of vacuum model with an upset condition.
pressure. After that point, the residual values shift upward, although they remain
within the control limits of the standardized residuals.
Plots of standardized residuals, such as those in Figures 9.15-9.17, provide a
useful tool for detecting small process shifts. However, they should be used with
an overall T2 chart in order to avoid the risk of extrapolating values outside the
operational range of the variable. While we seek to identify systematic patterns in
these plots, a set of rigorous run rules is not yet available. Thus, we recommend
that a residual pattern be observed for an extensive period of time before taking
action.
The proposed technique using the standardized residuals of the fitted models
associated with a specific conditional T2 term is not a control procedure per se.
Rather, it is a tool to monitor process performance in situations where standard
control limits would be too wide. Any consistent change in the process is of interest,
and not just signaling values. While there is some subjectivity involved in the
determination of a trend in these plots, process data often is received continuously,
so that visual inspection of runs of residuals above or below zero is readily available.
9.8 Summary
The T2 statistic can be enhanced by improving its ability to detect (1) abrupt
process changes as well as (2) gradual process shifts. Abrupt changes can be better
identified by correctly modeling in Phase I operations the functional relationships
existing among the variables. One means of doing this is by examining the square
root of the conditional terms in the T2 decomposition or the corresponding related
regression residual plots. These values represent the corresponding standardized
residuals obtained from fitting a regression model. Gradual process shifts can be
"We cannot ignore input from experts in the scientific discipline in-
volved. Statistical procedures are vehicles that lead us to conclusions;
but scientific logic paves the road along the way.... [F]or these reasons,
a proper marriage must exist between the experienced statistician and
the learned expert in the discipline involved."
Chapter 10
Autocorrelation in T2 Control Charts
10.1 Introduction
Development and use of the T2 as a control statistic for a multivariate process has
required the assumption of independent observations. Certain types of processing
units may not meet this assumption. For example, many units produce time-
dependent or autocorrelated observations. This may be due to factors such as
equipment degradation, depletion of critical process components, environmental
and industrial contamination, or the effect of an unmeasured "lurking" variable.
The use of the T2 as a control statistic, without proper adjustment for a time
dependency, can lead to incorrect signals (e.g., see Alt, Deutsch, and Walker (1977)
or Montgomery and Mastrangelo (1991)).
In Chapter 4 we discussed detection procedures for autocorrelated data. These
included examination of trends in time-sequence plots of individual variables and
the determination of the pairwise correlation between process variables and a cate-
gorical time-sequence variable. In this chapter, we add a third procedure. We show
that special patterns occurring in the graph of a T2 chart can be used to indicate
the presence of autocorrelation in the process data. We also demonstrate that if
autocorrelation is detected and ignored, one runs the risk of weakening the overall
T2 control procedure. This happens because the main effect of the autocorrelated
variable is confounded with the time dependency. Furthermore, relationships with
other variables may be masked by the time dependency.
When autocorrelation is present, an adjustment procedure is needed in order
to obtain a true picture of process performance. In a univariate setting, one such
adjustment involves modeling the time dependency with an appropriate autore-
gressive model and examining the resulting regression residuals. The residuals are
free of the time dependency and, under proper assumptions, can be shown to be
independent and normally distributed. The resulting control procedure is based on
these autoregressive residuals (e.g., see Montgomery (2001)).
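The univariate adjustment just described can be sketched as follows: fit an AR(1) model to the series and chart the residuals, which are free of the lag-1 dependency. The data are simulated, and the AR coefficient 0.8 is illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n, phi, mu = 500, 0.8, 10.0
x = np.empty(n)
x[0] = mu
for t in range(1, n):                    # simulate an AR(1) process
    x[t] = mu + phi * (x[t - 1] - mu) + rng.normal(0.0, 1.0)

z = x - x.mean()
phi_hat = (z[1:] @ z[:-1]) / (z[:-1] @ z[:-1])   # least-squares AR(1) fit
resid = z[1:] - phi_hat * z[:-1]                 # autoregressive residuals

# The residuals are approximately independent, so standard control
# limits apply to them even though they do not apply to x itself.
lag1 = (resid[1:] @ resid[:-1]) / (resid @ resid)
```

The lag-1 autocorrelation of the residuals is near zero, while that of the raw series is near 0.8; charting the residuals therefore restores the independence assumption that the control procedure requires.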
corresponding mean line, are presented in Figures 10.4 and 10.5. Note the upward
trend in the plot of the points in both graphs.
To investigate the effects of these two autocorrelated variables on the behavior
of the T2 statistic, we examine in Figure 10.6 a graph of the corresponding T2
statistic. Observe the very slight, U-shaped curvature in the graph of the statistic
over the operational range of the two variables. Note also the large variation in
the T2 values and the absence of numerous values close to zero. This is in direct
contrast to the trends seen in Figure 10.3 for the variables with no time dependency.
Since the T2 statistic should exhibit only random fluctuation in its graph, further
examination is required in order to determine the reason for this systematic pattern.
The plots in Figures 10.4 and 10.5 of the correlated data indicate the presence of
large deviations from the respective mean values of both variables at the beginning
and end of their sampling periods. Since the T2 is a squared statistic, such a
trend produces large T2 values. For example, while deviations below the mean are
negative, squaring them produces large positive values. As the variables approach
their mean values (in time), the value of the T2 declines to smaller values. The
curved U-shaped pattern in Figure 10.6 is thus due to the linear time dependency
inherent in the observations. This provides a third method for detecting process
data with a time dependency.
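The U-shaped pattern can be reproduced with a small simulation: give two variables a common linear time trend, compute the T2 series against the overall mean and covariance, and the statistic comes out large at both ends of the record and small in the middle. The trend slopes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
t_idx = np.arange(n)
# Two hypothetical process variables sharing a linear time trend:
x1 = 0.05 * t_idx + rng.normal(0.0, 1.0, n)
x2 = 0.04 * t_idx + rng.normal(0.0, 1.0, n)
X = np.column_stack([x1, x2])

mean = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
D = X - mean
t2 = np.einsum('ij,jk,ik->i', D, S_inv, D)   # T2 value at each time point

ends = 0.5 * (t2[:20].mean() + t2[-20:].mean())
middle = t2[90:110].mean()
# ends >> middle: the U-shaped signature of a linear time dependency.
```

Because the trend inflates the sample covariance, the middle T2 values are also deflated, illustrating the excess, nonrandom variation that the text attributes to unadjusted autocorrelation.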
Autocorrelation of the form described in the previous example reflects a cause-and-effect
relationship between the process variables and time: the observation on the process
variable is proportional to the value of the variable at some prior time. In other cases,
the time relationship may be only empirical and not due to a cause-and-effect relationship.
In this situation, the current observed value is not determined by a prior value, but
only associated with it. The association is usually due to a "lurking" variable.
Figure 10.7: Time-sequence plot of process variable with cyclical time effect.
Consider the cyclic nature of the process variable depicted in Figure 10.7. The
cyclical or seasonal variation is due to the rise and fall of the ambient temperature
for the corresponding time period. This is illustrated in Figure 10.8. Cyclical or
seasonal variation over time is assumed to be based on systematic causes; i.e., the
variation does not occur at random, but reflects the influence of "lurking" variables.
Variables with a seasonal effect will have a very regular cycle, whereas variables with
a cyclical trend may have a somewhat irregular cycle. Such trends will be reflected
in the T2 chart, and the curved U-shaped pattern seen previously in other T2 charts
may have short cycles.
A T2 chart including the cyclical process variable in Figure 10.7 with no adjust-
ment for the seasonal trend is presented in Figure 10.9. Close examination of the
run chart reveals a cyclic pattern due to the seasonal variation of the ambient tem-
perature. As the temperature approaches its maximum and minimum values, the
T2 statistic moves upward. When the temperature approaches its average value,
the T2 moves toward zero. Also notice the excess variation due to the temperature
swings in the T2 values.
The above examples illustrate a number of the problems occurring with autocor-
related data and the T2 statistic. Autocorrelation produces some type of systematic
pattern over time in the observations on the variables. If not corrected, the pat-
terns are transformed to nonrandom patterns in the T2 charts. As illustrated in
the following sections, these patterns can greatly affect signals. The presence of
autocorrelation also increases variation in the T2 statistic. This increased variation
can smother the detection of process movement and hamper the sensitivity of the
T2 statistic to small but consistent process shifts. As in other statistical procedures,
nonrandom variation of this form is explainable, but it can and should be removed.
10.3. Control Procedure for Uniform Decay
To accurately assess the T2 statistic, this time effect must be separated and removed
from the random error.
As an example, reconsider the data for x1 given in Figure 10.4. The time
dependency can be explained by a first-order autoregressive model,

x_{1,t} = β0 + β1 x_{1,t−1} + ε_t,    (10.2)

where β0 and β1 are the unknown regression coefficients, x_{1,t} is the current
observation, and x_{1,t−1} is the immediate prior observation. The mean of x1,
conditioned on time, is then given by

E(x_{1,t} | x_{1,t−1}) = β0 + β1 x_{1,t−1}.
The above relationship suggests a method for computing the T2 statistic for an
observation vector with some observations exhibiting a time dependency. This is
achieved using the formula

T_t^2 = (X − X̄_t)' S_t^{-1} (X − X̄_t),    (10.3)

where X̄_t is the time-adjusted mean vector. For a variable with no time
dependency, the corresponding component of X̄_t is equal to the unadjusted mean,
x̄_j. However, for those variables with a first-order time dependency, x̄_{j|t}
would be obtained using a regression equation based on the model
in (10.2) or some similar autoregressive function. Thus, for an AR(1) process,

x̄_{j|t} = b0 + b1 x_{j,t−1}    (10.4)

when a time dependency is present, where b0 and b1 are the estimated regression
coefficients.
The common estimator of S for a sample of size n is usually given as

S = [1/(n−1)] Σ (X_i − X̄)(X_i − X̄)',

where X̄ is the overall sample mean. However, if some of the components of the
observation vector X have a time dependency, S also must be corrected. This is
achieved by taking deviations from X̄_t; i.e.,

S_t = [1/(n−1)] Σ (X_i − X̄_t)(X_i − X̄_t)'.

The variance terms of S_t will be denoted as s²_{j|t} to indicate a time adjustment
has been made, while the general conditional terms will be designated as
s_{j·1,2,...,p−1|t}.
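The time-adjusted covariance matrix can be sketched the same way: deviations are taken from the fitted AR(1) means rather than from the sample mean. Simulated data and the helper name are ours, not the book's:

```python
import numpy as np

def ar1_adjusted_mean(x):
    """Fit x_t = b0 + b1*x_{t-1} by least squares and return the
    fitted (time-adjusted) means for t = 1, ..., n-1."""
    A = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    b0, b1 = np.linalg.lstsq(A, x[1:], rcond=None)[0]
    return b0 + b1 * x[:-1]

rng = np.random.default_rng(3)
n = 300
x1 = np.empty(n)
x2 = np.empty(n)
x1[0] = 0.0
x2[0] = 0.0
for i in range(1, n):
    x1[i] = 0.9 * x1[i - 1] + rng.normal(0, 1.0)
    x2[i] = 0.7 * x2[i - 1] + rng.normal(0, 1.0)

# Deviations from the time-adjusted means X_t rather than X-bar:
D = np.column_stack([x1[1:] - ar1_adjusted_mean(x1),
                     x2[1:] - ar1_adjusted_mean(x2)])
S_t = D.T @ D / (len(D) - 1)
```

The time-adjusted variances in `S_t` come out smaller than the raw sample variances, since the autoregressive drift no longer inflates them.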
Decomposition of T2t
Suppose the components of an observation vector with a time dependency have been
determined using the methods of section 10.1, and the appropriate autoregressive
functions fitted. We assume that Xt and St have been computed from the HDS.
To calculate the T2 value for a new incoming observation, we compute (10.3).
The general form of the MYT decomposition of the T2 value associated with a
signaling p-dimensional data vector X' = (x1, ..., xp) is given in Chapter 8. The
decomposition of T2 follows a similar procedure but uses time adjustments similar
to (10.4).
If a signal is observed, we decompose the T2 statistic, adjusted for time effects,
as follows:

T_t^2 = T_{1|t}^2 + T_{2·1|t}^2 + ··· + T_{p·1,2,...,p−1|t}^2.

Close examination reveals how the time effect is removed. Consider the conditional
term T_{2·1|t}^2. Observations on both x1 and x2 are corrected for the time
dependency by subtracting the appropriate x̄_t term. The standard deviation is
time corrected in a similar manner.
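For p = 2 the MYT decomposition can be verified numerically. The sketch below checks the identity T2 = T1^2 + T_{2·1}^2 on simulated data; for brevity it uses the unadjusted mean and covariance, but X̄_t and S_t would replace them for time-adjusted observations:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(10, 2, n)
x2 = 3 + 0.5 * x1 + rng.normal(0, 1, n)  # correlated second variable
X = np.column_stack([x1, x2])

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)

x = X[0]                  # decompose the first observation
d = x - xbar
t2_full = d @ np.linalg.inv(S) @ d

# Unconditional term for x1:
t2_1 = d[0] ** 2 / S[0, 0]

# Conditional term for x2 given x1 (regression-adjusted mean and variance):
x2_given_1 = xbar[1] + (S[0, 1] / S[0, 0]) * d[0]
s2_given_1 = S[1, 1] - S[0, 1] ** 2 / S[0, 0]
t2_21 = (x[1] - x2_given_1) ** 2 / s2_given_1

assert np.isclose(t2_full, t2_1 + t2_21)  # the MYT identity for p = 2
```

The identity holds exactly for any data set; it is an algebraic property of the partitioned covariance matrix, not a statistical approximation.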
cycle and thus would contain more variation than a steady-state variable, which
would remain relatively constant. If we fail to consider the decay in the process,
any efficiency value between 85% and 98% would be acceptable, even 85% at the
beginning of a cycle.
As discussed in Chapter 8, a deviation beyond its operational range (established
using in-control historical data) for a process variable can be detected using the
corresponding unconditional T2 term of the MYT decomposition. In addition,
incorrect movement of the variable within its range because of improper linear
relationships with other process variables can be detected using the conditional T2
terms. However, this approach does not account for the effects of movement due
to time dependencies.
df SS MS F Significance of F
Regression 1 2077.81 2077.81 99.81 < 0.0001
Residual 77 1602.93 20.82
Total 78 3680.74
This nonrandom trend coupled with the moderate value of 0.565 for the R2 statistic
supports our belief.
Process variable x\ shows a strong upward linear time trend in its time-sequence
plot given in Figure 10.12. This is confirmed by its high correlation (0.880) with
10.4. Example of a Uniform Decay Process 205
df SS MS F Significance of F
Regression 1 11.69 11.69 518.64 < 0.0001
Residual 77 1.73 0.02
Total 78 13.42
time given in Table 10.1. The analysis-of-variance table from the regression analysis
for an AR(1) model for this variable is presented in Table 10.4. The fit is highly
significant (p < 0.0001) and indicates that there is a linear relationship between
x\ and its immediate past value. Summary statistics for this fit are presented in
Table 10.5. The large R2 value, 0.871, in addition to the small residuals, given in
the residual plot in Figure 10.15, indicates a good fit over most of the data. The
increase in variation at the end of the plot is due to decreasing unit efficiency as
the unit life increases.
The third variable to show a time dependency is the average reactor temperature
(Temp). As noted in Figure 10.10, reactor temperature has a nonlinear (i.e., curved)
relationship with time. Thus, an AR(2) model of the form
where the (3j are the unknown regression coefficients, might result in decreasing the
error seen at the end of the cycle in Figure 10.10. However, for simplicity, we will
use the AR(1) model.
Although the pairwise correlation between this variable and the time-sequence
variable is only 0.691 in Table 10.1, this is mainly a result of the flatness of the plot
at the earlier time points. The analysis-of-variance table for the AR(1) model fit to
the average temperature is presented in Table 10.6. The fit is significant (p < 0.0001)
and indicates that there is a linear relationship between average temperature and
its immediate past value. Summary statistics for the AR(1) fit are presented in
Table 10.7. The R2 value of 0.565 is moderate, and the larger residuals in the
residual plot in Figure 10.16 confirm this result.
For the three variables exhibiting some form of autocorrelation, the simplest au-
toregressive function was fit. This is to simplify the discussion of the next section.
The fitted AR(1) models depend only on the first-order lag of the data. A substan-
tial amount of lack of fit was noted in the discussion of the residual plots. These
models could possibly be improved by the addition of different lag terms. The use
of a correlogram, which displays the lag correlations as a function of the lag value
(see section 4.8), can be a useful tool in making this decision. The correlogram for
the three variables x1, x2, and Temp is presented in tabular form for the first three
lags in Table 10.8. In the case of variable x1, the correlogram suggests using all
three lags, as the lag correlations remain near 1 for all three time points.
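Lag correlations of this kind can be computed directly from the sample. A sketch on a simulated AR(1) series (the coefficient 0.9 is an invented value, not one of the reactor variables):

```python
import numpy as np

def lag_correlations(x, max_lag=3):
    """Sample correlation between x_t and x_{t-k} for k = 1..max_lag,
    i.e., the first few points of a correlogram."""
    return [np.corrcoef(x[k:], x[:-k])[0, 1] for k in range(1, max_lag + 1)]

rng = np.random.default_rng(11)
n = 500
x = np.empty(n)
x[0] = 0.0
for i in range(1, n):
    x[i] = 0.9 * x[i - 1] + rng.normal(0, 1.0)

r1, r2, r3 = lag_correlations(x)
# For an AR(1) process the lag-k correlation decays roughly as 0.9**k,
# so the correlogram tails off geometrically rather than staying near 1.
```

A slow geometric decay like this points to a single-lag model, whereas correlations that remain near 1 across several lags (as for x1 in Table 10.8) suggest including additional lag terms.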
df SS MS F Significance of F
Regression 1 4861.44 4861.44 108.63 < 0.0001
Residual 77 3445.93 44.75
Total 78 8307.37
10.4.3 Estimates
Using the results of section 10.3, we can construct time-adjusted estimates of the
mean vector and covariance matrix for our reactor data. For notational purposes,
the four variables are denoted by x1 (for process variable x1), x2 (for process variable
x2), x3 (for Temp), and x4 (for Feed). The estimate of the time-adjusted mean
vector is given as

X̄_t' = (x̄_{1|t}, x̄_{2|t}, x̄_{3|t}, x̄_4),

where the first three components are obtained from the fitted autoregressive
equations.
Since Feed has no time dependency, no time adjustments are needed, and the
average of the Feed data is used.
Removing the time dependency from the original data produces some interesting
results. For example, consider the correlation matrix of the 79 observations with
the time dependency removed. This is calculated by computing the time-adjusted
covariance estimate S_t, taking deviations from X̄_t, and converting the covariance
estimate to a correlation matrix. The resulting estimated correlation matrix is
presented in Table 10.9. In contrast to the unadjusted
correlation matrix given in Table 10.1, there now is only a weak correlation between
the time-sequence variable and each of the four process variables. Other correlations
not directly involving the time-sequence variable were also affected. For example,
the original correlation between temperature and x1 was 0.737. Corrected for time,
the value is now −0.026. Thus, these two variables were correlated only due to
the time effect. Also, observe the correlation between x1 and x2. Originally, a
correlation of 0.795 was observed in Table 10.1, but correcting for time decreases
this value to 0.586.
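The effect is easy to demonstrate: two independent noise series that share a linear trend are strongly correlated, but the correlation of their AR(1)-adjusted deviations is near zero. The slopes and noise levels below are invented, and the helper name is ours:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 79
t = np.arange(n, dtype=float)

# Two otherwise unrelated variables that share a time trend
# (a stand-in for the temperature/x1 situation described above).
u = 0.3 * t + rng.normal(0, 1.5, n)
v = 0.2 * t + rng.normal(0, 1.5, n)

raw_corr = np.corrcoef(u, v)[0, 1]  # large, driven by the shared trend

def ar1_residuals(x):
    """Deviations of x_t from its fitted first-order lag model."""
    A = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    coef = np.linalg.lstsq(A, x[1:], rcond=None)[0]
    return x[1:] - A @ coef

# Correlation after removing the lag-1 time dependency from each variable:
adj_corr = np.corrcoef(ar1_residuals(u), ar1_residuals(v))[0, 1]
```

Here `raw_corr` is close to 1 while `adj_corr` is near zero, mirroring the drop from 0.737 to −0.026 seen for temperature and x1.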
The T2 values of the preliminary data are plotted in Figure 10.17. These values
are computed without any time adjustment. Close inspection of this graph reveals
the U-shaped trend in the data that is common to autocorrelated processes. The
upward trend prevails more at the end of the cycle than at its beginning. This is
mainly due to the instability of the reactor as it nears the end of its life cycle.
Consider the T2 graph for the time-adjusted data. This is presented in Figure
10.18. In comparison to Figure 10.17, there is no curvature in the plotted points.
However, the expanded variation at the end of the life cycle is still present, as it
is not due to the time variable. Note also that this plot identifies eight outliers
in the data set as compared to only five outliers being detected in the plot of the
uncorrected data.
where the symbol (**) denotes that the unconditional T2 term for temperature
10.5. Control Procedure for Stage Decay Processes 211
produces a signal as it exceeds the critical value of 7.559. The usual interpretation
for a signal on an unconditional term is that the observation on the variable is
outside the operational range. However, for time-adjusted variables, the implication
is different. In this example, the observed temperature value, 526, is not where it
should be relative to its lag value of 506. For observation 10, the increase in
temperature from the value observed for observation 9 was much more than that
predicted using the historical data.
Removing the Temp variable from observation 10 and examining its subvector
(Feed, x_{1,t}, x_{2,t}) produced a T2 value of 15.910. When compared to a critical value
of 13.38, a signal was still present. Further decomposition of the T2 value on this
subvector produced the following two-way conditional T2 terms:
where the symbol (**) denotes the term that exceeds the critical value of 7.675.
There are two signaling conditional terms, and these imply that the relationship
between the operational variables x2 and x4 (Feed), after adjustment for time, does
not agree with the historical situation.
These results indicate the need to remove the process variables x2 and x4. With
their removal, the only variable left to be examined is x\. However, the small value
of the unconditional T2 term, T2 = 1.3124, indicates that no signal is present in
observation 10 on this variable.
10.6 Summary
The charting of autocorrelated multivariate data in a control procedure presents a
number of serious challenges. A user must not only examine the linear relationships
existing between the process variables to determine if any are unusual, but also
adjust the control procedure for the effects of the time dependencies existing among
these variables. This chapter presents one possible solution to problems associated
with constructing multivariate control procedures for processes experiencing either
uniform decay or stage decay.
Autocorrelated observations are common in many industrial processes. This is
due to the inherent nature of the processes, especially any type of decay process.
Because of the potentially serious effects of autocorrelation on control charts, it
is important to be able to detect its presence. We have offered two methods of
detection. The first involves examining the correlations between each variable and
a constructed time-sequence variable. A large correlation implies some type of time
dependency. Graphical techniques are a second aid in detecting time dependencies.
Trends in the plot of an individual variable versus time will give insight to the type
of autocorrelation that is present. Correlogram plots for individual variables also
can be helpful in locating the lag associated with the autocorrelation.
For uniform decay data that can be fit to an autoregressive model, the cur-
rent value of an autocorrelated variable is corrected for its time dependency. The
proposed control procedure is based on using the T2 value of the time-adjusted
observation and decomposing it into components that lead to an interpretation of
the time-adjusted signal. The resulting decomposition terms can be used to moni-
tor relationships with the other variables and to determine if they are in agreement
with those found in the HDS. This property is also helpful in examining stage-
decay processes as the decay occurs sequentially and thus lends itself to analysis by
repeated decompositions of the T2 statistic obtained at each stage.
Chapter 11
The T2 Statistic and Batch Processes
11.1 Introduction
Our development of a multivariate control procedure has been limited to applica-
tions to continuous processes. These are processes with continuous input, continu-
ous processing, and continuous output. We conclude the text with a description of
a T2 control procedure for batch processes. These are processes that use batches
as input (e.g., see Fuchs and Kenett (1998)).
There are several similarities between the T2 procedures for batch processes
and for continuous processes. Phase I still consists of constructing an HDS, and
Phase II continues to be reserved for monitoring new (future) observations. Also,
control procedures for batch processes can be constructed for the overall process,
or for individual components of the processing unit. In some settings, multiple
observations on the controlled component may be treated as a subgroup with control
based on the sample mean. In other situations, a single observation may be used,
such as monitoring the quality of the batch or grade being produced.
Despite these similarities, differences do exist when monitoring batch processes.
For example, the estimators of the covariance matrix and the overall mean vector
may vary. Changes also may occur in the form of the T2 statistic and the probability
function used to describe its behavior. A detailed discussion of the application of
the T2 statistic to batch processes can be found in Mason, Chou, and Young (2001).
where X̄_i represents the mean of the ith batch. The total sample size N is obtained
as the sum of the batch sizes, i.e., N = n1 + n2 + ··· + nk. The estimate of the
covariance matrix is computed as
The quantity SS_T in (11.2) is referred to as the total sum of squares of variation.
It can be separated into two components. One part, referred to as the
within-variation, represents the variation within the batches. The other part,
labeled the between-variation, is the variation between the batches. We write this
as

SS_T = SS_W + SS_B.    (11.3)
The component SS_B represents the between-batch variation and, when signifi-
cant, can distort the common estimator S. However, for Category 1 processes, it is
assumed that the between-batch, as well as the within-batch, variation is minimal
and due to random variation. Therefore, for a Category 1 situation, we estimate the
overall mean using (11.1) and the covariance matrix using (11.2). We emphasize
that these are the appropriate estimates only if we adhere to the basic assumption
that a single multivariate distribution can describe the process.
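A sketch of the Category 1 estimates. The batch sizes 99, 100, and 100 echo the example later in this chapter, but the mean and covariance values are invented, and the divisor N − 1 in the common estimator is our assumption about the form of (11.2):

```python
import numpy as np

rng = np.random.default_rng(4)
# Three batches drawn from ONE bivariate distribution
# (Category 1: no between-batch differences).
mean = np.array([100.0, 200.0])
cov = np.array([[4.0, 3.0],
                [3.0, 9.0]])
batches = [rng.multivariate_normal(mean, cov, n) for n in (99, 100, 100)]

# (11.1)-style overall mean: the mean of all N = n1 + n2 + n3
# pooled observations.
X_all = np.vstack(batches)
N = len(X_all)
xbar = X_all.mean(axis=0)

# (11.2)-style common estimator: deviations from the overall mean.
D = X_all - xbar
S = D.T @ D / (N - 1)
```

Because all batches share one distribution, `S` recovers the true covariance; the next section shows how this breaks down when the batch means differ.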
For a Category 2 process, we have multiple distributions describing the process.
Multiplicity comes from the possibility that the mean vectors of the various batches
may differ, i.e., that μ_i ≠ μ_j for some i and j. For this case, the overall mean is still
estimated using (11.1). However, the covariance matrix estimator in (11.2) is no
longer applicable due to the effects of the between-batch variation.
As an illustration of the effects of between-batch variation, consider the plot
given in Figure 11.7 for two variables, x1 and x2, and two batches of data. The
orientation of the two sets of data implies that x1 and x2 have the same correlation
in each batch, but the batch separation implies that the batches have different
means. If the batch classification is ignored, the overall sample covariance matrix,
S, will be based on deviations taken from the overall mean, indicated by the center
of the ellipse, and will contain any between-group variation.
For a Category 2 process, the covariance matrix is estimated as

S_W = SS_W/(N − k) = [(n1 − 1)S1 + (n2 − 1)S2 + ··· + (nk − 1)Sk]/(N − k),    (11.4)

where S_i is the covariance matrix estimate for the ith batch and SS_W represents
the within-batch variation as defined in (11.3). Close inspection of (11.4) reveals
the estimator Sw to be a weighted average (weighted on the degrees of freedom)
of the within-batch covariance matrix estimators. With mean differences between
the batches, the common estimator obtained by considering the observations from
all batches as one group would be contaminated with the between-batch variation,
represented by SS_B. Using only the within-batch variation to construct the
estimator of the common covariance matrix produces a true estimate of the
relationships among the process variables. We demonstrate this in a later example
(see section 11.7).
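The contamination of S by between-batch variation can be illustrated numerically. The batch means below are the ones quoted in the section 11.7 example; the common covariance matrix and batch sizes of 100 are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(8)
cov = np.array([[2.0, 1.2],
                [1.2, 2.0]])
# Category 2: same covariance, different batch means.
means = [np.array([133.36, 200.78]),
         np.array([149.08, 202.59]),
         np.array([147.6, 190.60])]
batches = [rng.multivariate_normal(m, cov, 100) for m in means]

# S_W as in (11.4): within-batch sums of squares pooled over batches,
# i.e., a degrees-of-freedom-weighted average of the batch estimates.
k = len(batches)
N = sum(len(b) for b in batches)
SS_W = sum((b - b.mean(axis=0)).T @ (b - b.mean(axis=0)) for b in batches)
S_W = SS_W / (N - k)

# Ignoring batch membership contaminates S with between-batch variation:
X_all = np.vstack(batches)
S = np.cov(X_all, rowvar=False)
```

Here `S_W` stays close to the true covariance, while the diagonal of `S` is badly inflated by the separation of the batch means.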
11.4. Outlier Removal for Category 1 Batch Processes 219
where X̄ and S are the common estimators obtained from (11.1) and (11.2), respec-
tively, and B(p/2, (N−p−1)/2) represents the beta distribution with parameters (p/2)
and ((N − p − 1)/2), where N is the total sample size (all batches combined). The
UCL, used for outlier detection in a Phase I operation, is given as

UCL = [(N − 1)²/N] B[α; p/2, (N−p−1)/2],

where B[α; p/2, (N−p−1)/2] is the upper αth quantile of B[p/2, (N−p−1)/2]. For this
category, the distribution of the T2 statistic and the purging procedure for outlier
removal are the same as those used for a continuous process.
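For p = 2 the beta quantile has a closed form, so the UCL can be computed without statistical tables. The sketch assumes the Phase I UCL is the ((N−1)²/N)-scaled upper-α beta quantile, which is the standard result for the T2 of a single observation:

```python
def phase1_ucl_p2(alpha, N):
    """Phase I UCL for T2 with p = 2 variables, assuming
    UCL = ((N-1)^2/N) * B[alpha; 1, (N-3)/2].
    For p = 2 the beta distribution is B(1, q) with q = (N-3)/2,
    whose CDF is 1 - (1-x)**q, so the upper-alpha quantile is
    1 - alpha**(1/q) in closed form."""
    q = (N - 3) / 2
    return (N - 1) ** 2 / N * (1 - alpha ** (1 / q))

ucl = phase1_ucl_p2(0.05, 299)  # e.g., N = 299 pooled observations
```

Smaller α gives a larger (more conservative) limit, and for large N the limit approaches the corresponding chi-squared quantile.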
We emphasize that the statistic in (11.5) can only be used when there is no
between-batch variation. All observations from individual batches must come from
the same multivariate distribution. This assumption is so critical that it is strongly
recommended that a test of hypothesis be performed to determine if the batch
means are equal. This creates the dilemma of whether to remove outliers first or
to test the equality of the batch means, since mean differences could be due to
individual batches containing atypical observations.
As an example, reconsider the two-variable control region and data set illus-
trated in Figure 11.7. If one treats the two batches of data as one overall group,
the asterisk in the middle of the ellipse in the graph represents the location of the
overall mean vector. Since deviations used in computing the common covariance
matrix S are taken from this overall mean, the variances of x1 and x2 would be
considerably larger when using S instead of S_W. Also, the estimated correlation
between the two variables would be distorted, as the orientation of the ellipse is not
the same as the orientation of the two separate batches. Finally, note that the two
potential outliers in the ellipse would not be detected using the common estimator
S as these points are closer to the overall mean than any of the other points of the
two separate batches.
The solution to the problem of outliers in batches is provided in Mason, Chou,
and Young (2001). These authors recommend the following procedure for this
situation.
Step 1. Center all the individual batch data by subtracting the particular batch
mean from the batch observation; i.e., compute

X*_{ij} = X_{ij} − X̄_i,  j = 1, ..., n_i,

and remove outliers following the established procedures (see Chapter 5).
Step 4. After outlier removal, S_W and X̄ must be recalculated using only
the retained observations. To test the hypothesis of equal batch means, apply the
outlier removal procedure to the batch mean vectors. The T2 statistic for this
procedure is given as

T_i^2 = (X̄_i − X̄)' S_W^{-1} (X̄_i − X̄),    (11.9)
where S_W is the sample covariance matrix computed using (11.7) and the translated
data with the individual outliers removed. The distribution of the statistic in (11.9),
under the assumption of a true null hypothesis, is that of an F variable (e.g., see
Wierda (1994)), and is given by
11.5. Example: Category 1 Batch Process 221
where F(p, nk−k−p+1) represents the F distribution with parameters (p) and (nk −
k − p + 1). For a given α level, the UCL for the T2 statistic is computed as
Scatter plots of the data for the three separate batches are presented in Figures
11.8, 11.9, and 11.10, respectively. Observe the same general shape of the data
swarm for each of the different batches. Note also the potential outliers in each
batch: for example, the observation located at the extreme left end of the data
swarm of Figure 11.8 and the cluster of points located in the extreme right-hand
corner of Figure 11.9.
Summary statistics for the three separate batches are presented in Table 11.1.
Observe the similarities among the statistics for the three batches, especially for the
pairwise correlations between x1 and x2. This is also true for the separate standard
deviations for each variable.
Centering of the data for the three separate batches is achieved by subtracting
the respective mean. For example, the translated vector based on centering the
observations of Batch 1 is obtained by subtracting the Batch 1 mean vector from
each of its observation vectors.
The summary statistics for the combined translated data are given in the last column
of Table 11.1, and a graph of the translated data is presented in Figure 11.11.
Observe the similarities between the standard deviations of the individual variables
in the three separate batches with the standard deviation of the variables in the
overall translated batch. Likewise, the same is true for the pairwise correlation
between the variables in the separate and combined batches.
Only one observation appears as a potential outlier in the scatter plot presented
in Figure 11.11. This is observation 46 of Batch 1, and it is located at the (extreme)
left end of the data swarm. This observation also was noted as a potential outlier
in a similar scatter plot of the Batch 1 data presented in Figure 11.8. A T2 chart
based on (11.8) and the combined translated data is presented in Figure 11.12. The
first 99 T2 values correspond to Batch 1; the second 100 values correspond to Batch
2; and the last 100 values refer to Batch 3. The results confirm that the T2 value
of observation 46 exceeds the UCL. Thus, it is removed from the data set.
A revised T2 value chart, based on 298 observations, is given in Figure 11.13.
The one large T2 value in the plot corresponds to observation 181 from Batch 2.
However, the change in N from 299 to 298, by excluding observation 46 in Batch
1, does not reduce the UCL sufficiently to warrant further deletion. Thus, the
previous removal of the one outlier is adequate to produce a homogeneous data set.
The distribution of the T2 statistic is verified by examining a Q-Q plot of the
HDS. This plot is presented in Figure 11.14. The plot has a strong linear trend,
and no serious deviations from it are noted other than the few points located in
the upper right-hand corner of the plot. The extreme value is observation 181 from
Figure 11.13. It appears that the beta distribution can be used in Phase I analyses,
and the corresponding F distribution should be appropriate for Phase II operations.
Using the combined translated data with the single outlier removed, estimates
of S_W and X̄ are obtained and the mean test given in Step 4 of section 11.4 is
performed. Summary statistics for the HDS are presented in Table 11.2. Very close
agreement is observed when the overall mean and standard deviation are compared
to the individual batch means and standard deviations of Table 11.1.
The T2 values for the three individual batch means are computed using (11.9)
and are presented in Table 11.3. All three values are extremely small due to the
closeness of the group means to the overall mean (see Tables 11.1 and 11.2). When
compared to the UCL value of 0.0622 as computed using (11.11) with p = 2, k = 3,
and n = 100, none are significantly different from the others. From these results,
we conclude that all three batches are acceptable for use in the HDS.
where B(k/2, (k−p−1)/2) represents the beta distribution with parameters (k/2) and
((k−p−1)/2), S_B = SS_B/k is the covariance estimate defined in (11.3), and X̄ is
the overall mean computed using (11.1). The corresponding UCL is given by
and considerably lower than those for Batches 2 and 3. This is due to the mean
separation of the individual batches. Even though all three batches represent an
in-control process, combining them into one data set can mask the true correlation
between the variables or create a false correlation. Note also the difference between
the standard deviations of the variables within each batch and the standard
deviation of the variables for the overall batch. The latter is larger for x1 and
nearly twice as large for x2, and it does not reflect the true variation for an
individual batch.
We begin the outlier detection procedure by translating the three sets of batch
data to a common group that is centered at the origin. This is achieved by subtract-
ing the respective batch mean from each observation vector within the batch. For
an observation vector from Batch 1, we use (x11 − 133.36, x12 − 200.78); for Batch
2, we compute (x21 − 149.08, x22 − 202.59); and for Batch 3, we use (x31 − 147.6,
x32 − 190.60). A scatter plot of the combined data set after the translation is
presented in Figure 11.16, and summary statistics are presented in the last row of
Table 11.4. Comparing the summary statistics of the within-batches to those of
the translated batch presents a more agreeable picture. The standard deviations
of the variables of the overall translated batch compare favorably to the standard
deviation of any individual batch. Likewise, the correlation of the overall translated
group is more representative of the true linear relationship between the two process
variables.
Translation of the data in Figure 11.16 presents a different perspective. Overall,
the scatter plot of the data does not indicate obvious outliers. Three observations
from Batch 3 lie at the extremes of the scatter but do not appear to be obvious
outliers. Using α = 0.05, the T2 statistic based on the common overall batch
11.7. Example: Category 2 Batch Process 229
was used to detect observations located a significant distance from the mean of
(0,0). These results are presented in the T2 chart given in Figure 11.17, with Batch
1 data first, then Batch 2 data, followed by Batch 3 data. No observation has a T2
value larger than the UCL of 5.818. Also, the eighth observation of Batch 1 has
the largest T2 value, though this was not obvious when examining the scatter plot
given in Figure 11.15.
If there is an indication of a changing covariance matrix among the different
data runs or batches, a test of the hypothesis of equality of covariance matrices may
be performed (e.g., see Anderson (1984)). Rejection of the null hypothesis of equal
group covariance matrices would imply that different MVN distributions are needed
to describe the different runs or batches, and that the data cannot be pooled or
translated to a common group. From a practical point of view, this would imply a
very unstable process with no repeatability, and each run or batch would need to
be treated separately.
T2 = (X − X̄)' S^{-1} (X − X̄) ∼ [p(N+1)(N−1)/(N(N−p))] F(p, N−p),    (11.14)

where N is the total sample size of the HDS. The common covariance matrix
estimate S and the target mean vector estimate X̄ are obtained using the HDS,
and F(p, N−p) denotes the F distribution with p and N − p degrees of freedom. For
a given value of α and the appropriate values of N and p, we compute the UCL
using

UCL = [p(N+1)(N−1)/(N(N−p))] F(α; p, N−p),    (11.15)

where F(α; p, N−p) is the upper αth quantile of F(p, N−p). If, for a given observation
X, the T2 value does not exceed the UCL, it is concluded that control of the process
is being maintained; otherwise, a signal is declared. The T2 distribution in (11.14)
and the UCL in (11.15) would be appropriate for monitoring a Phase II operation
for a Category 1 batch process.
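The UCL in (11.15) can be wrapped in a small helper. The sketch below assumes the classical Phase II form for a single new observation with X̄ and S estimated from an HDS of size N; it is an illustration, not the book's code:

```python
from scipy.stats import f

def phase2_ucl(alpha, p, N):
    """Assumed Phase II UCL for a single new observation:
    [p(N+1)(N-1)/(N(N-p))] * F(alpha; p, N-p), the scaled upper-alpha
    F quantile with p and N-p degrees of freedom."""
    c = p * (N + 1) * (N - 1) / (N * (N - p))
    return c * f.ppf(1 - alpha, p, N - p)

ucl = phase2_ucl(0.05, 2, 299)  # illustrative alpha, p, and N
```

A new observation whose T2 value exceeds `ucl` would be declared a signal; otherwise control is judged to be maintained.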
Table 11.5: Phase II formulas for batch processes.

Subgroup size m = 1, target mean known (μ_T), covariance estimator S:
  T2 statistic:     (X − μ_T)' S^{-1} (X − μ_T)
  T2 distribution:  [p(N−1)/(N−p)] F(p, N−p)
  UCL:              [p(N−1)/(N−p)] F(α; p, N−p)

Subgroup size m = 1, target mean unknown (X̄), covariance estimator S_W:
  T2 statistic:     (X − X̄)' S_W^{-1} (X − X̄)
  T2 distribution:  [p(N−k)(N+1)/(N(N−k−p+1))] F(p, N−k−p+1)
  UCL:              [p(N−k)(N+1)/(N(N−k−p+1))] F(α; p, N−k−p+1)
The changes in the distribution of the T2 statistic when using the estimator S_W
are given by

T2 ∼ [p(N−k)(N+1)/(N(N−k−p+1))] F(p, N−k−p+1)

and

UCL = [p(N−k)(N+1)/(N(N−k−p+1))] F(α; p, N−k−p+1).    (11.16)
The T2 distribution and the UCL given in (11.16) would be appropriate for
monitoring a Phase II operation for a Category 2 batch process. This statistic
can also be used to monitor a Category 1 batch process, but it produces a more
conservative control region.
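The S_W-based limit from Table 11.5 can be sketched the same way; the constants below are the ones shown in the table, but the helper itself is ours:

```python
from scipy.stats import f

def phase2_ucl_sw(alpha, p, N, k):
    """Assumed Phase II UCL when the within-batch estimator S_W is used
    (constants as in Table 11.5):
    [p(N-k)(N+1)/(N(N-k-p+1))] * F(alpha; p, N-k-p+1)."""
    c = p * (N - k) * (N + 1) / (N * (N - k - p + 1))
    return c * f.ppf(1 - alpha, p, N - k - p + 1)

# With k = 1 (no batch structure) this reduces to the usual
# [p(N-1)(N+1)/(N(N-p))] F(alpha; p, N-p) control limit, and
# increasing k costs degrees of freedom, widening the limit slightly.
ucl = phase2_ucl_sw(0.05, 2, 300, 3)  # illustrative values
```

The loss of k − 1 degrees of freedom is why this limit is somewhat more conservative than the Category 1 version.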
The changes that occur in the two statistics when a target mean vector μ_T is
specified are given as
where S is again obtained from the HDS. For a given value of a, the UCL is
computed using
When the target mean vector is specified but the estimator Sw is used, the T2
statistic and its distribution are given as
Since the T2 values of all observations are well below the UCL of 9.4175, the
process appears to be in control.
However, closer inspection of the chart reveals a definite linear trend in the T2
values. An inspection of the data leads to the cause for this pattern. Consider
the scatter plot presented in Figure 11.20. When compared to the scatter plot of
the HDS given in Figure 11.11 or to the scatter plots of the individual batch data
given in Figures 11.8-11.10, the reason becomes obvious. The process is operating
in only a portion (i.e., the lower left-hand corner of the plot in Figure 11.20) of the
variable range specified in the HDS.
Further investigation confirms this conclusion and also gives a strong indication,
as does the T2 chart, that the entire process is moving beyond the operational region
of the variables. This is exhibited in the time-sequence plots of the individual
variables that are presented in Figures 11.21 and 11.22. From this analysis, it is
concluded that the process must be immediately recentered; otherwise the noted
drift in both variables will lead to upset conditions.
11.10 Summary
When monitoring batch processes, the problems of outlier detection, covariance
estimation, and batch mean differences are interrelated. To identify outliers and
estimate the covariance matrix, we recommend translating the data from the
different batches to the origin prior to analysis. This is achieved by subtracting the
individual batch mean from the batch observations. With this translation, outliers
can be removed using the procedures identified in Chapter 5. To detect batches
with atypical means, we recommend testing for mean batch differences following
the procedures described in this chapter.
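The translate-then-pool recipe above can be sketched directly; the two small batches below are synthetic and purely illustrative:

```python
# Estimate S_W by translating each batch to the origin (subtracting its own
# mean vector) and pooling the sums of squares over N - k degrees of freedom.
batches = [
    [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)],   # batch 1 (synthetic data)
    [(0.0, 0.0), (2.0, 2.0)],               # batch 2 (synthetic data)
]
p = 2                                        # number of variables
N = sum(len(b) for b in batches)             # total observations
k = len(batches)                             # number of batches

ss = [[0.0] * p for _ in range(p)]           # pooled sums of squares
for batch in batches:
    mean = [sum(x[j] for x in batch) / len(batch) for j in range(p)]
    for x in batch:
        d = [x[j] - mean[j] for j in range(p)]   # translated observation
        for r in range(p):
            for c in range(p):
                ss[r][c] += d[r] * d[c]

S_W = [[ss[r][c] / (N - k) for c in range(p)] for r in range(p)]
print(S_W)
```

In practice the outlier screening of Chapter 5 would be applied to the translated observations before the pooling step.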
Old Blue: Epilogue
As you walk out of your office with your boss, you explain how you used
multivariate statistical process control to locate the cause of the increased fuel usage
on Old Blue. You add that this would be an excellent tool to use in real-time
applications within the unit. You also ask permission to make a presentation
at the upcoming staff meeting on what you've learned from reading this new
book on multivariate statistical process control.
The boss notices the book in your hand and asks who wrote it. You glance
at the names of the authors, and comment: Mason and Young. Then it all
connects. That old statistics professor wasn't named Dr. Old . . . his name
was Dr. Young.
Appendix
Distribution Tables
Table A.1: Cumulative probabilities of the standard normal distribution.*
z Value 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
3.5 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998
3.6 0.9998 0.9998 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
3.7 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
3.8 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
3.9 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
* Entries in the table are the probability that a standard normal variate is less than
or equal to the given z value.
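Entries of this table can be reproduced from the error function available in any standard math library; a quick spot-check of two entries (Python standard library only):

```python
import math

def phi(z):
    """Cumulative standard normal probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Spot-check against the table: z = 1.00 -> 0.8413 and z = 1.96 -> 0.9750.
print(round(phi(1.00), 4), round(phi(1.96), 4))
```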
Table A.2: Percentage points of the t distribution.
Alpha
DF 0.2 0.15 0.1 0.05 0.025 0.01 0.005 0.001 0.0005
1 1.376 1.963 3.078 6.314 12.706 31.821 63.656 318.29 636.58
2 1.061 1.386 1.886 2.920 4.303 6.965 9.925 22.328 31.600
3 0.978 1.250 1.638 2.353 3.182 4.541 5.841 10.214 12.924
4 0.941 1.190 1.533 2.132 2.776 3.747 4.604 7.173 8.610
5 0.920 1.156 1.476 2.015 2.571 3.365 4.032 5.894 6.869
Table A.3: Percentage points of the chi-square distribution.*
Alpha
DF 0.001 0.005 0.01 0.025 0.05 0.1 0.9 0.95 0.975 0.99 0.995 0.999
1 0.00 0.00 0.00 0.00 0.00 0.02 2.71 3.84 5.02 6.63 7.88 10.83
2 0.00 0.01 0.02 0.05 0.10 0.21 4.61 5.99 7.38 9.21 10.60 13.82
3 0.02 0.07 0.11 0.22 0.35 0.58 6.25 7.81 9.35 11.34 12.84 16.27
4 0.09 0.21 0.30 0.48 0.71 1.06 7.78 9.49 11.14 13.28 14.86 18.47
5 0.21 0.41 0.55 0.83 1.15 1.61 9.24 11.07 12.83 15.09 16.75 20.51
6 0.38 0.68 0.87 1.24 1.64 2.20 10.64 12.59 14.45 16.81 18.55 22.46
7 0.60 0.99 1.24 1.69 2.17 2.83 12.02 14.07 16.01 18.48 20.28 24.32
8 0.86 1.34 1.65 2.18 2.73 3.49 13.36 15.51 17.53 20.09 21.95 26.12
9 1.15 1.73 2.09 2.70 3.33 4.17 14.68 16.92 19.02 21.67 23.59 27.88
10 1.48 2.16 2.56 3.25 3.94 4.87 15.99 18.31 20.48 23.21 25.19 29.59
11 1.83 2.60 3.05 3.82 4.57 5.58 17.28 19.68 21.92 24.73 26.76 31.26
12 2.21 3.07 3.57 4.40 5.23 6.30 18.55 21.03 23.34 26.22 28.30 32.91
13 2.62 3.57 4.11 5.01 5.89 7.04 19.81 22.36 24.74 27.69 29.82 34.53
14 3.04 4.07 4.66 5.63 6.57 7.79 21.06 23.68 26.12 29.14 31.32 36.12
15 3.48 4.60 5.23 6.26 7.26 8.55 22.31 25.00 27.49 30.58 32.80 37.70
16 3.94 5.14 5.81 6.91 7.96 9.31 23.54 26.30 28.85 32.00 34.27 39.25
17 4.42 5.70 6.41 7.56 8.67 10.09 24.77 27.59 30.19 33.41 35.72 40.79
18 4.90 6.26 7.01 8.23 9.39 10.86 25.99 28.87 31.53 34.81 37.16 42.31
19 5.41 6.84 7.63 8.91 10.12 11.65 27.20 30.14 32.85 36.19 38.58 43.82
20 5.92 7.43 8.26 9.59 10.85 12.44 28.41 31.41 34.17 37.57 40.00 45.31
21 6.45 8.03 8.90 10.28 11.59 13.24 29.62 32.67 35.48 38.93 41.40 46.80
22 6.98 8.64 9.54 10.98 12.34 14.04 30.81 33.92 36.78 40.29 42.80 48.27
23 7.53 9.26 10.20 11.69 13.09 14.85 32.01 35.17 38.08 41.64 44.18 49.73
24 8.08 9.89 10.86 12.40 13.85 15.66 33.20 36.42 39.36 42.98 45.56 51.18
25 8.65 10.52 11.52 13.12 14.61 16.47 34.38 37.65 40.65 44.31 46.93 52.62
26 9.22 11.16 12.20 13.84 15.38 17.29 35.56 38.89 41.92 45.64 48.29 54.05
27 9.80 11.81 12.88 14.57 16.15 18.11 36.74 40.11 43.19 46.96 49.65 55.48
28 10.39 12.46 13.56 15.31 16.93 18.94 37.92 41.34 44.46 48.28 50.99 56.89
29 10.99 13.12 14.26 16.05 17.71 19.77 39.09 42.56 45.72 49.59 52.34 58.30
30 11.59 13.79 14.95 16.79 18.49 20.60 40.26 43.77 46.98 50.89 53.67 59.70
40 17.92 20.71 22.16 24.43 26.51 29.05 51.81 55.76 59.34 63.69 66.77 73.40
50 24.67 27.99 29.71 32.36 34.76 37.69 63.17 67.50 71.42 76.15 79.49 86.66
60 31.74 35.53 37.48 40.48 43.19 46.46 74.40 79.08 83.30 88.38 91.95 99.61
70 39.04 43.28 45.44 48.76 51.74 55.33 85.53 90.53 95.02 100.43 104.21 112.32
80 46.52 51.17 53.54 57.15 60.39 64.28 96.58 101.88 106.63 112.33 116.32 124.84
90 54.16 59.20 61.75 65.65 69.13 73.29 107.57 113.15 118.14 124.12 128.30 137.21
100 61.92 67.33 70.06 74.22 77.93 82.36 118.50 124.34 129.56 135.81 140.17 149.45
150 102.11 109.14 112.67 117.98 122.69 128.28 172.58 179.58 185.80 193.21 198.36 209.27
200 143.84 152.24 156.43 162.73 168.28 174.84 226.02 233.99 241.06 249.45 255.26 267.54
250 186.55 196.16 200.94 208.10 214.39 221.81 279.05 287.88 295.69 304.94 311.35 324.83
500 407.95 422.30 429.39 439.94 449.15 459.93 540.93 553.13 563.85 576.49 585.21 603.45
* Entries in the table are the chi-square values for which the cumulative probability equals the
column heading (Alpha) for the given degrees of freedom (DF); the upper-tail area beyond each
entry is 1 - Alpha.
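For degrees of freedom not covered by the table, the Wilson-Hilferty approximation reproduces these quantiles closely; a sketch using only the Python standard library, checked against two table entries:

```python
import math
from statistics import NormalDist

def chi2_quantile_wh(q, df):
    """Wilson-Hilferty approximation to the chi-square quantile at
    cumulative probability q with df degrees of freedom."""
    z = NormalDist().inv_cdf(q)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * math.sqrt(h)) ** 3

# Compare with the table: DF = 10 at Alpha = 0.95 -> 18.31,
# DF = 30 at Alpha = 0.95 -> 43.77 (the approximation agrees to ~0.05 here).
print(chi2_quantile_wh(0.95, 10), chi2_quantile_wh(0.95, 30))
```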
Table A.4a: Percentage points of the F distribution at Alpha = 0.05.*
Denominator Numerator DF
DF 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 60 90 120 150 200 250 500 oo
1 161.4 199.5 215.7 224.6 230.2 234.0 236.8 238.9 240.5 241.9 243.9 245.9 248.0 249.0 250.1 252.2 252.9 253.2 253.5 253.7 253.8 254.1 254.3
2 18.51 19.00 19.16 19.25 19.30 19.33 19.35 19.37 19.38 19.40 19.41 19.43 19.45 19.45 19.46 19.48 19.48 19.49 19.49 19.49 19.49 19.49 19.50
3 10.13 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79 8.74 8.70 8.66 8.64 8.62 8.57 8.56 8.55 8.54 8.54 8.54 8.53 8.53
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96 5.91 5.86 5.80 5.77 5.75 5.69 5.67 5.66 5.65 5.65 5.64 5.64 5.63
5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74 4.68 4.62 4.56 4.53 4.50 4.43 4.41 4.40 4.39 4.39 4.38 4.37 4.36
6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06 4.00 3.94 3.87 3.84 3.81 3.74 3.72 3.70 3.70 3.69 3.69 3.68 3.67
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64 3.57 3.51 3.44 3.41 3.38 3.30 3.28 3.27 3.26 3.25 3.25 3.24 3.23
8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35 3.28 3.22 3.15 3.12 3.08 3.01 2.98 2.97 2.96 2.95 2.95 2.94 2.93
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14 3.07 3.01 2.94 2.90 2.86 2.79 2.76 2.75 2.74 2.73 2.73 2.72 2.71
10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98 2.91 2.85 2.77 2.74 2.70 2.62 2.59 2.58 2.57 2.56 2.56 2.55 2.54
12 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75 2.69 2.62 2.54 2.51 2.47 2.38 2.36 2.34 2.33 2.32 2.32 2.31 2.30
15 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54 2.48 2.40 2.33 2.29 2.25 2.16 2.13 2.11 2.10 2.10 2.09 2.08 2.07
20 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.35 2.28 2.20 2.12 2.08 2.04 1.95 1.91 1.90 1.89 1.88 1.87 1.86 1.84
24 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 2.30 2.25 2.18 2.11 2.03 1.98 1.94 1.84 1.81 1.79 1.78 1.77 1.76 1.75 1.73
30 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 2.16 2.09 2.01 1.93 1.89 1.84 1.74 1.70 1.68 1.67 1.66 1.65 1.64 1.62
60 4.00 3.15 2.76 2.53 2.37 2.25 2.17 2.10 2.04 1.99 1.92 1.84 1.75 1.70 1.65 1.53 1.49 1.47 1.45 1.44 1.43 1.41 1.39
90 3.95 3.10 2.71 2.47 2.32 2.20 2.11 2.04 1.99 1.94 1.86 1.78 1.69 1.64 1.59 1.46 1.42 1.39 1.38 1.36 1.35 1.33 1.30
120 3.92 3.07 2.68 2.45 2.29 2.18 2.09 2.02 1.96 1.91 1.83 1.75 1.66 1.61 1.55 1.43 1.38 1.35 1.33 1.32 1.30 1.28 1.25
150 3.90 3.06 2.66 2.43 2.27 2.16 2.07 2.00 1.94 1.89 1.82 1.73 1.64 1.59 1.54 1.41 1.36 1.33 1.31 1.29 1.28 1.25 1.22
200 3.89 3.04 2.65 2.42 2.26 2.14 2.06 1.98 1.93 1.88 1.80 1.72 1.62 1.57 1.52 1.39 1.33 1.30 1.28 1.26 1.25 1.22 1.19
250 3.88 3.03 2.64 2.41 2.25 2.13 2.05 1.98 1.92 1.87 1.79 1.71 1.61 1.56 1.50 1.37 1.32 1.29 1.27 1.25 1.23 1.20 1.17
500 3.86 3.01 2.62 2.39 2.23 2.12 2.03 1.96 1.90 1.85 1.77 1.69 1.59 1.54 1.48 1.35 1.29 1.26 1.23 1.21 1.19 1.16 1.11
oo 3.84 3.00 2.60 2.37 2.21 2.10 2.01 1.94 1.88 1.83 1.75 1.67 1.57 1.52 1.46 1.32 1.26 1.22 1.20 1.17 1.15 1.11 1.00
* Entries in the table are the F values for an area (Alpha probability) in the upper tail of the F distribution for the given denominator and numerator
degrees of freedom (DF).
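The last (oo) row of each F table is not independent data: as the denominator DF grows, the F percentage point tends to the corresponding chi-square quantile divided by the numerator DF, so those rows can be checked against the chi-square table above. A quick check with two entries quoted from the tables (the 0.95 chi-square column corresponds to an upper-tail area of 0.05):

```python
# Limiting relation: F(alpha; k, infinity) = chi2(alpha; k) / k.
chi2_table = {5: 11.07, 10: 18.31}   # cumulative 0.95 entries quoted above

f_infinity = {df: round(value / df, 2) for df, value in chi2_table.items()}
print(f_infinity)
```

These reproduce the 2.21 and 1.83 entries of the oo row of the Alpha = 0.05 F table.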
Table A.4b: Percentage points of the F distribution at Alpha = 0.025.*
Denominator Numerator DF
DF 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 60 90 120 150 200 250 500 oo
1 647.8 799.5 864.2 899.6 921.8 937.1 948.2 956.6 963.3 968.3 976.2 984.9 993.1 997.3 1001 1010 1013 1014 1015 1016 1016 1017 1018
2 38.51 39.00 39.17 39.25 39.30 39.33 39.36 39.37 39.39 39.40 39.41 39.43 39.45 39.46 39.46 39.48 39.49 39.49 39.49 39.49 39.49 39.50 39.50
3 17.44 16.04 15.44 15.10 14.88 14.73 14.62 14.54 14.47 14.42 14.34 14.25 14.17 14.12 14.08 13.99 13.96 13.95 13.94 13.93 13.92 13.91 13.90
4 12.22 10.65 9.98 9.60 9.36 9.20 9.07 8.98 8.90 8.84 8.75 8.66 8.56 8.51 8.46 8.36 8.33 8.31 8.30 8.29 8.28 8.27 8.26
5 10.01 8.43 7.76 7.39 7.15 6.98 6.85 6.76 6.68 6.62 6.52 6.43 6.33 6.28 6.23 6.12 6.09 6.07 6.06 6.05 6.04 6.03 6.02
6 8.81 7.26 6.60 6.23 5.99 5.82 5.70 5.60 5.52 5.46 5.37 5.27 5.17 5.12 5.07 4.96 4.92 4.90 4.89 4.88 4.88 4.86 4.85
7 8.07 6.54 5.89 5.52 5.29 5.12 4.99 4.90 4.82 4.76 4.67 4.57 4.47 4.41 4.36 4.25 4.22 4.20 4.19 4.18 4.17 4.16 4.14
8 7.57 6.06 5.42 5.05 4.82 4.65 4.53 4.43 4.36 4.30 4.20 4.10 4.00 3.95 3.89 3.78 3.75 3.73 3.72 3.70 3.70 3.68 3.67
9 7.21 5.71 5.08 4.72 4.48 4.32 4.20 4.10 4.03 3.96 3.87 3.77 3.67 3.61 3.56 3.45 3.41 3.39 3.38 3.37 3.36 3.35 3.33
10 6.94 5.46 4.83 4.47 4.24 4.07 3.95 3.85 3.78 3.72 3.62 3.52 3.42 3.37 3.31 3.20 3.16 3.14 3.13 3.12 3.11 3.09 3.08
12 6.55 5.10 4.47 4.12 3.89 3.73 3.61 3.51 3.44 3.37 3.28 3.18 3.07 3.02 2.96 2.85 2.81 2.79 2.78 2.76 2.76 2.74 2.72
15 6.20 4.77 4.15 3.80 3.58 3.41 3.29 3.20 3.12 3.06 2.96 2.86 2.76 2.70 2.64 2.52 2.48 2.46 2.45 2.44 2.43 2.41 2.40
20 5.87 4.46 3.86 3.51 3.29 3.13 3.01 2.91 2.84 2.77 2.68 2.57 2.46 2.41 2.35 2.22 2.18 2.16 2.14 2.13 2.12 2.10 2.09
24 5.72 4.32 3.72 3.38 3.15 2.99 2.87 2.78 2.70 2.64 2.54 2.44 2.33 2.27 2.21 2.08 2.03 2.01 2.00 1.98 1.97 1.95 1.94
30 5.57 4.18 3.59 3.25 3.03 2.87 2.75 2.65 2.57 2.51 2.41 2.31 2.20 2.14 2.07 1.94 1.89 1.87 1.85 1.84 1.83 1.81 1.79
60 5.29 3.93 3.34 3.01 2.79 2.63 2.51 2.41 2.33 2.27 2.17 2.06 1.94 1.88 1.82 1.67 1.61 1.58 1.56 1.54 1.53 1.51 1.48
90 5.20 3.84 3.26 2.93 2.71 2.55 2.43 2.34 2.26 2.19 2.09 1.98 1.86 1.80 1.73 1.58 1.52 1.48 1.46 1.44 1.43 1.40 1.37
120 5.15 3.80 3.23 2.89 2.67 2.52 2.39 2.30 2.22 2.16 2.05 1.94 1.82 1.76 1.69 1.53 1.47 1.43 1.41 1.39 1.37 1.34 1.31
150 5.13 3.78 3.20 2.87 2.65 2.49 2.37 2.28 2.20 2.13 2.03 1.92 1.80 1.74 1.67 1.50 1.44 1.40 1.38 1.35 1.34 1.31 1.27
200 5.10 3.76 3.18 2.85 2.63 2.47 2.35 2.26 2.18 2.11 2.01 1.90 1.78 1.71 1.64 1.47 1.41 1.37 1.35 1.32 1.30 1.27 1.23
250 5.08 3.74 3.17 2.84 2.62 2.46 2.34 2.24 2.16 2.10 2.00 1.89 1.76 1.70 1.63 1.46 1.39 1.35 1.33 1.30 1.28 1.24 1.20
500 5.05 3.72 3.14 2.81 2.59 2.43 2.31 2.22 2.14 2.07 1.97 1.86 1.74 1.67 1.60 1.42 1.35 1.31 1.28 1.25 1.24 1.19 1.14
oo 5.02 3.69 3.12 2.79 2.57 2.41 2.29 2.19 2.11 2.05 1.94 1.83 1.71 1.64 1.57 1.39 1.31 1.27 1.24 1.21 1.18 1.13 1.00
* Entries in the table are the F values for an area (Alpha probability) in the upper tail of the F distribution for the given denominator and numerator
degrees of freedom (DF).
Table A.4c: Percentage points of the F distribution at Alpha = 0.01.*
Denominator Numerator DF
DF 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 60 90 120 150 200 250 500 oo
1 4052 4999 5403 5624 5764 5859 5928 5981 6022 6056 6107 6157 6209 6234 6260 6313 6331 6339 6345 6350 6353 6359 6366
2 98.50 99.00 99.16 99.25 99.30 99.33 99.36 99.38 99.39 99.40 99.42 99.43 99.45 99.46 99.47 99.48 99.49 99.49 99.49 99.49 99.50 99.50 99.50
3 34.12 30.82 29.46 28.71 28.24 27.91 27.67 27.49 27.34 27.23 27.05 26.87 26.69 26.60 26.50 26.32 26.25 26.22 26.20 26.18 26.17 26.15 26.13
4 21.20 18.00 16.69 15.98 15.52 15.21 14.98 14.80 14.66 14.55 14.37 14.20 14.02 13.93 13.84 13.65 13.59 13.56 13.54 13.52 13.51 13.49 13.46
5 16.26 13.27 12.06 11.39 10.97 10.67 10.46 10.29 10.16 10.05 9.89 9.72 9.55 9.47 9.38 9.20 9.14 9.11 9.09 9.08 9.06 9.04 9.02
6 13.75 10.92 9.78 9.15 8.75 8.47 8.26 8.10 7.98 7.87 7.72 7.56 7.40 7.31 7.23 7.06 7.00 6.97 6.95 6.93 6.92 6.90 6.88
7 12.25 9.55 8.45 7.85 7.46 7.19 6.99 6.84 6.72 6.62 6.47 6.31 6.16 6.07 5.99 5.82 5.77 5.74 5.72 5.70 5.69 5.67 5.65
8 11.26 8.65 7.59 7.01 6.63 6.37 6.18 6.03 5.91 5.81 5.67 5.52 5.36 5.28 5.20 5.03 4.97 4.95 4.93 4.91 4.90 4.88 4.86
9 10.56 8.02 6.99 6.42 6.06 5.80 5.61 5.47 5.35 5.26 5.11 4.96 4.81 4.73 4.65 4.48 4.43 4.40 4.38 4.36 4.35 4.33 4.31
10 10.04 7.56 6.55 5.99 5.64 5.39 5.20 5.06 4.94 4.85 4.71 4.56 4.41 4.33 4.25 4.08 4.03 4.00 3.98 3.96 3.95 3.93 3.91
12 9.33 6.93 5.95 5.41 5.06 4.82 4.64 4.50 4.39 4.30 4.16 4.01 3.86 3.78 3.70 3.54 3.48 3.45 3.43 3.41 3.40 3.38 3.36
15 8.68 6.36 5.42 4.89 4.56 4.32 4.14 4.00 3.89 3.80 3.67 3.52 3.37 3.29 3.21 3.05 2.99 2.96 2.94 2.92 2.91 2.89 2.87
20 8.10 5.85 4.94 4.43 4.10 3.87 3.70 3.56 3.46 3.37 3.23 3.09 2.94 2.86 2.78 2.61 2.55 2.52 2.50 2.48 2.47 2.44 2.42
24 7.82 5.61 4.72 4.22 3.90 3.67 3.50 3.36 3.26 3.17 3.03 2.89 2.74 2.66 2.58 2.40 2.34 2.31 2.29 2.27 2.26 2.24 2.21
30 7.56 5.39 4.51 4.02 3.70 3.47 3.30 3.17 3.07 2.98 2.84 2.70 2.55 2.47 2.39 2.21 2.14 2.11 2.09 2.07 2.06 2.03 2.01
60 7.08 4.98 4.13 3.65 3.34 3.12 2.95 2.82 2.72 2.63 2.50 2.35 2.20 2.12 2.03 1.84 1.76 1.73 1.70 1.68 1.66 1.63 1.60
90 6.93 4.85 4.01 3.53 3.23 3.01 2.84 2.72 2.61 2.52 2.39 2.24 2.09 2.00 1.92 1.72 1.64 1.60 1.57 1.55 1.53 1.49 1.46
120 6.85 4.79 3.95 3.48 3.17 2.96 2.79 2.66 2.56 2.47 2.34 2.19 2.03 1.95 1.86 1.66 1.58 1.53 1.51 1.48 1.46 1.42 1.38
150 6.81 4.75 3.91 3.45 3.14 2.92 2.76 2.63 2.53 2.44 2.31 2.16 2.00 1.92 1.83 1.62 1.54 1.49 1.46 1.43 1.42 1.38 1.33
200 6.76 4.71 3.88 3.41 3.11 2.89 2.73 2.60 2.50 2.41 2.27 2.13 1.97 1.89 1.79 1.58 1.50 1.45 1.42 1.39 1.37 1.33 1.28
250 6.74 4.69 3.86 3.40 3.09 2.87 2.71 2.58 2.48 2.39 2.26 2.11 1.95 1.87 1.77 1.56 1.48 1.43 1.40 1.36 1.34 1.30 1.24
500 6.69 4.65 3.82 3.36 3.05 2.84 2.68 2.55 2.44 2.36 2.22 2.07 1.92 1.83 1.74 1.52 1.43 1.38 1.34 1.31 1.28 1.23 1.16
oo 6.63 4.61 3.78 3.32 3.02 2.80 2.64 2.51 2.41 2.32 2.18 2.04 1.88 1.79 1.70 1.47 1.38 1.32 1.29 1.25 1.22 1.15 1.00
* Entries in the table are the F values for an area (Alpha probability) in the upper tail of the F distribution for the given denominator and numerator
degrees of freedom (DF).
Table A.4d: Percentage points of the F distribution at Alpha = 0.005.*
Denominator Numerator DF
DF 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 60 90 120 150 200 250 500 oo
1 ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++
2 198.5 199.0 199.2 199.2 199.3 199.3 199.4 199.4 199.4 199.4 199.4 199.4 199.4 199.4 199.5 199.5 199.5 199.5 199.5 199.5 199.5 199.5 199.5
3 55.55 49.80 47.47 46.20 45.39 44.84 44.43 44.13 43.88 43.68 43.39 43.08 42.78 42.62 42.47 42.15 42.04 41.99 41.96 41.92 41.91 41.87 41.83
4 31.33 26.28 24.26 23.15 22.46 21.98 21.62 21.35 21.14 20.97 20.70 20.44 20.17 20.03 19.89 19.61 19.52 19.47 19.44 19.41 19.39 19.36 19.32
5 22.78 18.31 16.53 15.56 14.94 14.51 14.20 13.96 13.77 13.62 13.38 13.15 12.90 12.78 12.66 12.40 12.32 12.27 12.25 12.22 12.21 12.17 12.14
6 18.63 14.54 12.92 12.03 11.46 11.07 10.79 10.57 10.39 10.25 10.03 9.81 9.59 9.47 9.36 9.12 9.04 9.00 8.98 8.95 8.94 8.91 8.88
7 16.24 12.40 10.88 10.05 9.52 9.16 8.89 8.68 8.51 8.38 8.18 7.97 7.75 7.64 7.53 7.31 7.23 7.19 7.17 7.15 7.13 7.10 7.08
8 14.69 11.04 9.60 8.81 8.30 7.95 7.69 7.50 7.34 7.21 7.01 6.81 6.61 6.50 6.40 6.18 6.10 6.06 6.04 6.02 6.01 5.98 5.95
9 13.61 10.11 8.72 7.96 7.47 7.13 6.88 6.69 6.54 6.42 6.23 6.03 5.83 5.73 5.62 5.41 5.34 5.30 5.28 5.26 5.24 5.21 5.19
10 12.83 9.43 8.08 7.34 6.87 6.54 6.30 6.12 5.97 5.85 5.66 5.47 5.27 5.17 5.07 4.86 4.79 4.75 4.73 4.71 4.69 4.67 4.64
12 11.75 8.51 7.23 6.52 6.07 5.76 5.52 5.35 5.20 5.09 4.91 4.72 4.53 4.43 4.33 4.12 4.05 4.01 3.99 3.97 3.96 3.93 3.90
15 10.80 7.70 6.48 5.80 5.37 5.07 4.85 4.67 4.54 4.42 4.25 4.07 3.88 3.79 3.69 3.48 3.41 3.37 3.35 3.33 3.31 3.29 3.26
20 9.94 6.99 5.82 5.17 4.76 4.47 4.26 4.09 3.96 3.85 3.68 3.50 3.32 3.22 3.12 2.92 2.84 2.81 2.78 2.76 2.75 2.72 2.69
24 9.55 6.66 5.52 4.89 4.49 4.20 3.99 3.83 3.69 3.59 3.42 3.25 3.06 2.97 2.87 2.66 2.58 2.55 2.52 2.50 2.49 2.46 2.43
30 9.18 6.35 5.24 4.62 4.23 3.95 3.74 3.58 3.45 3.34 3.18 3.01 2.82 2.73 2.63 2.42 2.34 2.30 2.28 2.25 2.24 2.21 2.18
60 8.49 5.79 4.73 4.14 3.76 3.49 3.29 3.13 3.01 2.90 2.74 2.57 2.39 2.29 2.19 1.96 1.88 1.83 1.81 1.78 1.76 1.73 1.69
90 8.28 5.62 4.57 3.99 3.62 3.35 3.15 3.00 2.87 2.77 2.61 2.44 2.25 2.15 2.05 1.82 1.73 1.68 1.65 1.62 1.60 1.56 1.52
120 8.18 5.54 4.50 3.92 3.55 3.28 3.09 2.93 2.81 2.71 2.54 2.37 2.19 2.09 1.98 1.75 1.66 1.61 1.57 1.54 1.52 1.48 1.43
150 8.12 5.49 4.45 3.88 3.51 3.25 3.05 2.89 2.77 2.67 2.51 2.33 2.15 2.05 1.94 1.70 1.61 1.56 1.53 1.49 1.47 1.42 1.37
200 8.06 5.44 4.41 3.84 3.47 3.21 3.01 2.86 2.73 2.63 2.47 2.30 2.11 2.01 1.91 1.66 1.56 1.51 1.48 1.44 1.42 1.37 1.31
250 8.02 5.41 4.38 3.81 3.44 3.18 2.99 2.83 2.71 2.61 2.45 2.27 2.09 1.99 1.88 1.64 1.54 1.48 1.45 1.41 1.39 1.33 1.27
500 7.95 5.35 4.33 3.76 3.40 3.14 2.94 2.79 2.66 2.56 2.40 2.23 2.04 1.94 1.84 1.58 1.48 1.42 1.39 1.35 1.32 1.26 1.18
oo 7.88 5.30 4.28 3.72 3.35 3.09 2.90 2.74 2.62 2.52 2.36 2.19 2.00 1.90 1.79 1.53 1.43 1.36 1.32 1.28 1.25 1.17 1.00
* Entries in the table are the F values for an area (Alpha probability) in the upper tail of the F distribution for the given denominator and numerator degrees of
freedom (DF).
++ F values exceed 16,000.
Table A.4e: Percentage points of the F distribution at Alpha = 0.001.*
Denominator Numerator DF
DF 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 60 90 120 150 200 250 500 oo
1 ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++
2 998.4 998.8 999.3 999.3 999.3 999.3 999.3 999.3 999.3 999.3 999.3 999.3 999.3 999.3 999.3 999.3 999.3 999.3 999.3 999.3 999.3 999.3 999.5
3 167.1 148.5 141.1 137.1 134.6 132.8 131.6 130.6 129.9 129.2 128.3 127.4 126.4 125.9 125.4 124.4 124.2 124.0 123.9 123.7 123.7 123.6 123.5
4 74.13 61.25 56.17 53.43 51.72 50.52 49.65 49.00 48.47 48.05 47.41 46.76 46.10 45.77 45.43 44.75 44.51 44.40 44.33 44.27 44.22 44.14 44.05
5 47.18 37.12 33.20 31.08 29.75 28.83 28.17 27.65 27.24 26.91 26.42 25.91 25.39 25.13 24.87 24.33 24.15 24.06 24.00 23.95 23.92 23.85 23.78
6 35.51 27.00 23.71 21.92 20.80 20.03 19.46 19.03 18.69 18.41 17.99 17.56 17.12 16.90 16.67 16.21 16.06 15.98 15.93 15.89 15.86 15.80 15.75
7 29.25 21.69 18.77 17.20 16.21 15.52 15.02 14.63 14.33 14.08 13.71 13.32 12.93 12.73 12.53 12.12 11.98 11.91 11.87 11.82 11.80 11.75 11.70
8 25.41 18.49 15.83 14.39 13.48 12.86 12.40 12.05 11.77 11.54 11.19 10.84 10.48 10.30 10.11 9.73 9.60 9.53 9.49 9.45 9.43 9.38 9.33
9 22.86 16.39 13.90 12.56 11.71 11.13 10.70 10.37 10.11 9.89 9.57 9.24 8.90 8.72 8.55 8.19 8.06 8.00 7.96 7.93 7.90 7.86 7.81
10 21.04 14.90 12.55 11.28 10.48 9.93 9.52 9.20 8.96 8.75 8.45 8.13 7.80 7.64 7.47 7.12 7.00 6.94 6.91 6.87 6.85 6.81 6.76
12 18.64 12.97 10.80 9.63 8.89 8.38 8.00 7.71 7.48 7.29 7.00 6.71 6.40 6.25 6.09 5.76 5.65 5.59 5.56 5.52 5.50 5.46 5.42
15 16.59 11.34 9.34 8.25 7.57 7.09 6.74 6.47 6.26 6.08 5.81 5.54 5.25 5.10 4.95 4.64 4.53 4.48 4.44 4.41 4.39 4.35 4.31
20 14.82 9.95 8.10 7.10 6.46 6.02 5.69 5.44 5.24 5.08 4.82 4.56 4.29 4.15 4.00 3.70 3.60 3.54 3.51 3.48 3.46 3.42 3.38
24 14.03 9.34 7.55 6.59 5.98 5.55 5.24 4.99 4.80 4.64 4.39 4.14 3.87 3.74 3.59 3.29 3.19 3.14 3.10 3.07 3.05 3.01 2.97
30 13.29 8.77 7.05 6.12 5.53 5.12 4.82 4.58 4.39 4.24 4.00 3.75 3.49 3.36 3.22 2.92 2.81 2.76 2.73 2.69 2.67 2.63 2.59
60 11.97 7.77 6.17 5.31 4.76 4.37 4.09 3.86 3.69 3.54 3.32 3.08 2.83 2.69 2.55 2.25 2.14 2.08 2.05 2.01 1.99 1.94 1.89
90 11.57 7.47 5.91 5.06 4.53 4.15 3.87 3.65 3.48 3.34 3.11 2.88 2.63 2.50 2.36 2.05 1.93 1.87 1.83 1.79 1.77 1.72 1.66
120 11.38 7.32 5.78 4.95 4.42 4.04 3.77 3.55 3.38 3.24 3.02 2.78 2.53 2.40 2.26 1.95 1.83 1.77 1.73 1.68 1.66 1.60 1.54
150 11.27 7.24 5.71 4.88 4.35 3.98 3.71 3.49 3.32 3.18 2.96 2.73 2.48 2.35 2.21 1.89 1.77 1.70 1.66 1.62 1.59 1.53 1.47
200 11.15 7.15 5.63 4.81 4.29 3.92 3.65 3.43 3.26 3.12 2.90 2.67 2.42 2.29 2.15 1.83 1.71 1.64 1.60 1.55 1.52 1.46 1.39
250 11.09 7.10 5.59 4.77 4.25 3.88 3.61 3.40 3.23 3.09 2.87 2.64 2.39 2.26 2.12 1.80 1.67 1.60 1.56 1.51 1.48 1.42 1.34
500 10.96 7.00 5.51 4.69 4.18 3.81 3.54 3.33 3.16 3.02 2.81 2.58 2.33 2.20 2.05 1.73 1.60 1.53 1.48 1.43 1.39 1.32 1.23
oo 10.83 6.91 5.42 4.62 4.10 3.74 3.47 3.27 3.10 2.96 2.74 2.51 2.27 2.13 1.99 1.66 1.52 1.45 1.40 1.34 1.30 1.21 1.00
* Entries in the table are the F values for an area (Alpha probability) in the upper tail of the F distribution for the given denominator and numerator degrees of
freedom (DF).
++ F values exceed 400,000.
Table A.5: Percentage points of the beta distribution.*
(n-p-1)/2
p/2 Alpha 5 6 7 8 9 10 20 30 40 50 60 70 80 90 120 150 200 250 500
0.5 0.999 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.75 0.0106 0.0088 0.0075 0.0065 0.0058 0.0052 0.0026 0.0017 0.0013 0.0010 0.0008 0.0007 0.0006 0.0006 0.0004 0.0003 0.0003 0.0002 0.0001
0.10 0.2473 0.2093 0.1814 0.1600 0.1431 0.1295 0.0662 0.0445 0.0335 0.0268 0.0224 0.0192 0.0168 0.0150 0.0112 0.0090 0.0067 0.0054 0.0027
0.05 0.3318 0.2835 0.2473 0.2193 0.1969 0.1787 0.0927 0.0625 0.0472 0.0379 0.0316 0.0272 0.0238 0.0212 0.0159 0.0127 0.0096 0.0077 0.0038
0.025 0.4096 0.3532 0.3103 0.2765 0.2493 0.2269 0.1194 0.0810 0.0612 0.0492 0.0412 0.0354 0.0310 0.0276 0.0208 0.0166 0.0125 0.0100 0.0050
0.01 0.5011 0.4374 0.3876 0.3478 0.3152 0.2882 0.1546 0.1055 0.0801 0.0645 0.0540 0.0464 0.0407 0.0363 0.0273 0.0219 0.0165 0.0132 0.0066
0.005 0.5619 0.4948 0.4413 0.3979 0.3621 0.3321 0.1808 0.1240 0.0944 0.0761 0.0638 0.0549 0.0482 0.0429 0.0324 0.0260 0.0195 0.0157 0.0079
0.001 0.6778 0.6084 0.5505 0.5019 0.4608 0.4256 0.2397 0.1664 0.1273 0.1031 0.0866 0.0747 0.0656 0.0585 0.0442 0.0355 0.0267 0.0214 0.0108
1.0 0.999 0.0002 0.0002 0.0001 0.0001 0.0001 0.0001 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.75 0.0559 0.0468 0.0403 0.0353 0.0315 0.0284 0.0143 0.0095 0.0072 0.0057 0.0048 0.0041 0.0036 0.0032 0.0024 0.0019 0.0014 0.0012 0.0006
0.10 0.3690 0.3187 0.2803 0.2501 0.2257 0.2057 0.1087 0.0739 0.0559 0.0450 0.0376 0.0324 0.0284 0.0253 0.0190 0.0152 0.0114 0.0092 0.0046
0.05 0.4507 0.3930 0.3482 0.3123 0.2831 0.2589 0.1391 0.0950 0.0722 0.0582 0.0487 0.0419 0.0368 0.0327 0.0247 0.0198 0.0149 0.0119 0.0060
0.025 0.5218 0.4593 0.4096 0.3694 0.3363 0.3085 0.1684 0.1157 0.0881 0.0711 0.0596 0.0513 0.0451 0.0402 0.0303 0.0243 0.0183 0.0146 0.0074
0.01 0.6019 0.5358 0.4821 0.4377 0.4005 0.3690 0.2057 0.1423 0.1087 0.0880 0.0739 0.0637 0.0559 0.0499 0.0376 0.0302 0.0228 0.0183 0.0092
0.005 0.6534 0.5865 0.5309 0.4843 0.4450 0.4113 0.2327 0.1619 0.1241 0.1005 0.0845 0.0729 0.0641 0.0572 0.0432 0.0347 0.0261 0.0210 0.0105
0.001 0.7488 0.6838 0.6272 0.5783 0.5358 0.4988 0.2921 0.2057 0.1586 0.1290 0.1087 0.0940 0.0827 0.0739 0.0559 0.0450 0.0339 0.0273 0.0137
1.5 0.999 0.0023 0.0019 0.0017 0.0015 0.0013 0.0012 0.0006 0.0004 0.0003 0.0002 0.0002 0.0002 0.0002 0.0001 0.0001 0.0001 0.0001 0.0000 0.0000
0.75 0.1093 0.0926 0.0803 0.0709 0.0635 0.0575 0.0295 0.0198 0.0150 0.0120 0.0100 0.0086 0.0075 0.0067 0.0050 0.0040 0.0030 0.0024 0.0012
0.10 0.4500 0.3944 0.3509 0.3158 0.2871 0.2631 0.1431 0.0982 0.0747 0.0603 0.0506 0.0435 0.0382 0.0340 0.0257 0.0206 0.0155 0.0124 0.0062
0.05 0.5266 0.4660 0.4174 0.3778 0.3450 0.3173 0.1755 0.1212 0.0925 0.0748 0.0628 0.0541 0.0475 0.0424 0.0320 0.0257 0.0193 0.0155 0.0078
0.025 0.5915 0.5280 0.4761 0.4332 0.3972 0.3666 0.2062 0.1432 0.1096 0.0888 0.0747 0.0644 0.0566 0.0505 0.0381 0.0306 0.0231 0.0185 0.0093
0.01 0.6628 0.5981 0.5438 0.4981 0.4591 0.4255 0.2444 0.1710 0.1315 0.1068 0.0899 0.0776 0.0682 0.0609 0.0461 0.0370 0.0279 0.0224 0.0113
0.005 0.7080 0.6437 0.5887 0.5417 0.5012 0.4660 0.2718 0.1912 0.1474 0.1199 0.1011 0.0873 0.0769 0.0687 0.0520 0.0418 0.0315 0.0253 0.0127
0.001 0.7902 0.7298 0.6758 0.6281 0.5858 0.5485 0.3309 0.2358 0.1830 0.1494 0.1263 0.1093 0.0964 0.0862 0.0654 0.0527 0.0398 0.0320 0.0161
2.0 0.999 0.0083 0.0070 0.0060 0.0053 0.0048 0.0043 0.0022 0.0015 0.0011 0.0009 0.0008 0.0006 0.0006 0.0005 0.0004 0.0003 0.0002 0.0002 0.0001
0.75 0.1612 0.1380 0.1206 0.1072 0.0964 0.0876 0.0458 0.0310 0.0235 0.0189 0.0158 0.0135 0.0119 0.0106 0.0079 0.0064 0.0048 0.0038 0.0019
0.10 0.5103 0.4526 0.4062 0.3684 0.3368 0.3102 0.1729 0.1198 0.0916 0.0741 0.0623 0.0537 0.0472 0.0421 0.0318 0.0255 0.0192 0.0154 0.0077
0.05 0.5818 0.5207 0.4707 0.4291 0.3942 0.3644 0.2067 0.1441 0.1106 0.0897 0.0754 0.0651 0.0572 0.0511 0.0386 0.0310 0.0234 0.0188 0.0094
0.025 0.6412 0.5787 0.5265 0.4825 0.4450 0.4128 0.2382 0.1670 0.1286 0.1045 0.0880 0.0760 0.0669 0.0597 0.0452 0.0363 0.0274 0.0220 0.0111
0.01 0.7057 0.6434 0.5899 0.5440 0.5044 0.4698 0.2768 0.1957 0.1512 0.1232 0.1039 0.0899 0.0792 0.0707 0.0536 0.0432 0.0326 0.0262 0.0132
0.005 0.7460 0.6849 0.6315 0.5850 0.5443 0.5086 0.3043 0.2163 0.1677 0.1368 0.1156 0.1000 0.0882 0.0788 0.0598 0.0482 0.0364 0.0292 0.0147
0.001 0.8186 0.7625 0.7113 0.6651 0.6237 0.5866 0.3630 0.2613 0.2039 0.1671 0.1416 0.1228 0.1084 0.0970 0.0738 0.0595 0.0450 0.0362 0.0183
2.5 0.999 0.0182 0.0155 0.0135 0.0120 0.0107 0.0097 0.0051 0.0034 0.0026 0.0021 0.0017 0.0015 0.0013 0.0012 0.0009 0.0007 0.0005 0.0004 0.0002
0.75 0.2092 0.1808 0.1592 0.1422 0.1286 0.1173 0.0625 0.0426 0.0323 0.0260 0.0218 0.0187 0.0164 0.0146 0.0110 0.0088 0.0066 0.0053 0.0027
0.1 0.5577 0.4994 0.4517 0.4122 0.3789 0.3505 0.1997 0.1395 0.1072 0.0870 0.0732 0.0632 0.0556 0.0496 0.0375 0.0302 0.0227 0.0182 0.0092
0.05 0.6245 0.5641 0.5137 0.4713 0.4351 0.4040 0.2344 0.1648 0.1271 0.1034 0.0871 0.0753 0.0663 0.0592 0.0448 0.0361 0.0272 0.0218 0.0110
0.025 0.6793 0.6185 0.5668 0.5225 0.4844 0.4512 0.2663 0.1884 0.1457 0.1188 0.1002 0.0867 0.0764 0.0683 0.0518 0.0417 0.0315 0.0253 0.0127
0.01 0.7381 0.6785 0.6264 0.5810 0.5413 0.5063 0.3052 0.2177 0.1690 0.1381 0.1168 0.1011 0.0892 0.0798 0.0606 0.0488 0.0369 0.0296 0.0150
0.005 0.7746 0.7167 0.6652 0.6196 0.5792 0.5435 0.3326 0.2386 0.1858 0.1522 0.1288 0.1117 0.0985 0.0882 0.0670 0.0540 0.0409 0.0328 0.0166
0.001 0.8398 0.7875 0.7389 0.6944 0.6541 0.6176 0.3906 0.2839 0.2226 0.1831 0.1554 0.1350 0.1193 0.1069 0.0814 0.0658 0.0498 0.0401 0.0203
3.0 0.999 0.0316 0.0270 0.0237 0.0210 0.0189 0.0172 0.0090 0.0061 0.0046 0.0037 0.0031 0.0027 0.0023 0.0021 0.0016 0.0013 0.0009 0.0008 0.0004
0.75 0.2531 0.2206 0.1955 0.1756 0.1593 0.1459 0.0790 0.0542 0.0413 0.0333 0.0279 0.0240 0.0211 0.0188 0.0142 0.0114 0.0086 0.0069 0.0034
0.1 0.5962 0.5382 0.4901 0.4496 0.4152 0.3855 0.2242 0.1579 0.1218 0.0991 0.0836 0.0722 0.0636 0.0568 0.0430 0.0346 0.0261 0.0210 0.0106
0.05 0.6587 0.5997 0.5496 0.5069 0.4701 0.4381 0.2595 0.1839 0.1424 0.1162 0.0981 0.0849 0.0748 0.0669 0.0507 0.0408 0.0308 0.0248 0.0125
0.025 0.7096 0.6509 0.6001 0.5561 0.5178 0.4841 0.2916 0.2081 0.1616 0.1321 0.1117 0.0968 0.0853 0.0763 0.0580 0.0467 0.0353 0.0284 0.0143
0.01 0.7637 0.7068 0.6563 0.6117 0.5723 0.5373 0.3305 0.2377 0.1855 0.1520 0.1288 0.1117 0.0986 0.0882 0.0671 0.0541 0.0410 0.0329 0.0166
0.005 0.7970 0.7422 0.6926 0.6482 0.6085 0.5730 0.3577 0.2588 0.2026 0.1663 0.1411 0.1225 0.1082 0.0969 0.0738 0.0596 0.0451 0.0363 0.0183
0.001 0.8562 0.8073 0.7612 0.7185 0.6793 0.6436 0.4151 0.3042 0.2397 0.1977 0.1682 0.1463 0.1295 0.1161 0.0886 0.0717 0.0543 0.0438 0.0222
3.5 0.999 0.0473 0.0408 0.0359 0.0320 0.0289 0.0264 0.0140 0.0095 0.0072 0.0058 0.0049 0.0042 0.0037 0.0033 0.0025 0.0020 0.0015 0.0012 0.0006
0.75 0.2929 0.2572 0.2294 0.2070 0.1886 0.1732 0.0954 0.0659 0.0503 0.0407 0.0341 0.0294 0.0258 0.0230 0.0174 0.0140 0.0105 0.0084 0.0042
0.1 0.6282 0.5711 0.5230 0.4821 0.4470 0.4165 0.2468 0.1751 0.1356 0.1107 0.0935 0.0809 0.0713 0.0637 0.0484 0.0389 0.0294 0.0236 0.0119
0.05 0.6870 0.6296 0.5802 0.5376 0.5005 0.4681 0.2824 0.2018 0.1569 0.1283 0.1085 0.0940 0.0829 0.0742 0.0564 0.0454 0.0343 0.0276 0.0139
0.025 0.7344 0.6778 0.6282 0.5848 0.5466 0.5128 0.3147 0.2263 0.1765 0.1447 0.1226 0.1063 0.0939 0.0840 0.0639 0.0516 0.0390 0.0314 0.0158
0.01 0.7845 0.7302 0.6814 0.6379 0.5990 0.5642 0.3534 0.2562 0.2008 0.1650 0.1400 0.1216 0.1075 0.0963 0.0734 0.0593 0.0449 0.0361 0.0183
0.005 0.8152 0.7632 0.7156 0.6724 0.6335 0.5984 0.3804 0.2774 0.2181 0.1796 0.1526 0.1327 0.1173 0.1052 0.0802 0.0648 0.0491 0.0396 0.0200
0.001 0.8695 0.8235 0.7797 0.7387 0.7007 0.6658 0.4370 0.3228 0.2556 0.2114 0.1802 0.1570 0.1390 0.1248 0.0954 0.0773 0.0586 0.0473 0.0240
4.0 0.999 0.0648 0.0562 0.0496 0.0444 0.0402 0.0368 0.0198 0.0135 0.0103 0.0083 0.0069 0.0060 0.0052 0.0047 0.0035 0.0028 0.0021 0.0017 0.0009
0.75 0.3291 0.2910 0.2609 0.2364 0.2162 0.1991 0.1114 0.0774 0.0593 0.0481 0.0404 0.0348 0.0306 0.0273 0.0207 0.0166 0.0125 0.0100 0.0050
0.1 0.6554 0.5994 0.5517 0.5108 0.4753 0.4443 0.2678 0.1914 0.1488 0.1217 0.1030 0.0892 0.0787 0.0704 0.0535 0.0431 0.0326 0.0262 0.0132
0.05 0.7108 0.6551 0.6066 0.5644 0.5273 0.4946 0.3036 0.2185 0.1706 0.1398 0.1185 0.1028 0.0908 0.0813 0.0618 0.0499 0.0378 0.0304 0.0153
0.025 0.7551 0.7007 0.6525 0.6097 0.5719 0.5381 0.3359 0.2433 0.1906 0.1566 0.1329 0.1154 0.1020 0.0914 0.0696 0.0562 0.0426 0.0343 0.0173
0.01 0.8018 0.7500 0.7029 0.6604 0.6222 0.5878 0.3745 0.2735 0.2152 0.1773 0.1508 0.1311 0.1160 0.1040 0.0794 0.0642 0.0486 0.0392 0.0198
0.005 0.8303 0.7809 0.7351 0.6933 0.6552 0.6206 0.4012 0.2947 0.2327 0.1921 0.1636 0.1424 0.1261 0.1131 0.0864 0.0699 0.0530 0.0427 0.0217
0.001 0.8804 0.8371 0.7954 0.7560 0.7192 0.6851 0.4569 0.3401 0.2703 0.2242 0.1915 0.1670 0.1481 0.1331 0.1019 0.0826 0.0628 0.0506 0.0257
*Entries in the table are the beta values B(α, p/2, (n-p-1)/2) for an area (Alpha probability) in the upper tail of the beta distribution for the given parameter values
of p/2 and (n-p-1)/2.
Table A.5 (continued): Percentage points of the beta distribution.*
(n-p-1)/2
p/2 Alpha 5 6 7 8 9 10 20 30 40 50 60 70 80 90 120 150 200 250 500
4.5 0.999 0.0834 0.0727 0.0644 0.0579 0.0526 0.0481 0.0262 0.0180 0.0137 0.0111 0.0093 0.0080 0.0070 0.0063 0.0047 0.0038 0.0029 0.0023 0.0011
0.75 0.3620 0.3220 0.2901 0.2640 0.2422 0.2238 0.1271 0.0888 0.0683 0.0554 0.0467 0.0403 0.0354 0.0316 0.0239 0.0192 0.0145 0.0116 0.0059
0.1 0.6787 0.6241 0.5770 0.5362 0.5006 0.4693 0.2874 0.2068 0.1614 0.1324 0.1122 0.0973 0.0859 0.0769 0.0585 0.0472 0.0357 0.0287 0.0145
0.05 0.7311 0.6771 0.6297 0.5880 0.5512 0.5185 0.3234 0.2343 0.1836 0.1509 0.1281 0.1113 0.0983 0.0881 0.0671 0.0542 0.0411 0.0330 0.0167
0.025 0.7728 0.7204 0.6735 0.6317 0.5942 0.5607 0.3555 0.2593 0.2040 0.1680 0.1428 0.1242 0.1099 0.0985 0.0752 0.0608 0.0461 0.0371 0.0188
0.01 0.8165 0.7669 0.7215 0.6802 0.6427 0.6087 0.3938 0.2897 0.2288 0.1890 0.1610 0.1402 0.1242 0.1114 0.0851 0.0689 0.0523 0.0421 0.0214
0.005 0.8430 0.7960 0.7520 0.7115 0.6743 0.6403 0.4203 0.3109 0.2464 0.2040 0.1740 0.1517 0.1344 0.1207 0.0923 0.0748 0.0568 0.0458 0.0232
0.001 0.8896 0.8487 0.8089 0.7710 0.7354 0.7022 0.4752 0.3561 0.2842 0.2363 0.2022 0.1767 0.1568 0.1410 0.1082 0.0878 0.0668 0.0539 0.0274
5.0 0.999 0.0898 0.0799 0.0721 0.0656 0.0602 0.0331 0.0229 0.0175 0.0141 0.0119 0.0102 0.0090 0.0080 0.0060 0.0049 0.0037 0.0029 0.0015
0.75 0.3507 0.3173 0.2898 0.2668 0.2471 0.1424 0.1001 0.0771 0.0628 0.0529 0.0457 0.0403 0.0360 0.0272 0.0219 0.0165 0.0133 0.0067
0.1 0.6458 0.5995 0.5590 0.5234 0.4920 0.3059 0.2215 0.1735 0.1426 0.1210 0.1051 0.0929 0.0832 0.0634 0.0512 0.0388 0.0312 0.0158
0.05 0.6965 0.6502 0.6091 0.5726 0.5400 0.3418 0.2493 0.1961 0.1615 0.1373 0.1194 0.1057 0.0947 0.0723 0.0584 0.0443 0.0357 0.0181
0.025 0.7376 0.6921 0.6511 0.6143 0.5810 0.3738 0.2745 0.2167 0.1789 0.1524 0.1327 0.1175 0.1054 0.0805 0.0652 0.0494 0.0398 0.0202
0.01 0.7817 0.7378 0.6976 0.6609 0.6274 0.4118 0.3049 0.2418 0.2002 0.1708 0.1489 0.1320 0.1185 0.0908 0.0735 0.0558 0.0450 0.0229
0.005 0.8091 0.7668 0.7275 0.6913 0.6579 0.4379 0.3262 0.2595 0.2153 0.1840 0.1606 0.1424 0.1280 0.0981 0.0795 0.0604 0.0488 0.0248
0.001 0.8587 0.8206 0.7841 0.7496 0.7173 0.4920 0.3712 0.2974 0.2479 0.2124 0.1859 0.1652 0.1486 0.1142 0.0928 0.0706 0.0570 0.0290
6.0 0.999 0.1120 0.1016 0.0929 0.0857 0.0482 0.0335 0.0257 0.0209 0.0176 0.0152 0.0133 0.0119 0.0090 0.0072 0.0055 0.0044 0.0022
0.75 0.3663 0.3368 0.3117 0.2902 0.1717 0.1220 0.0946 0.0773 0.0653 0.0566 0.0499 0.0446 0.0339 0.0273 0.0206 0.0166 0.0084
0.1 0.6377 0.5982 0.5631 0.5317 0.3397 0.2490 0.1964 0.1621 0.1380 0.1202 0.1064 0.0954 0.0729 0.0590 0.0448 0.0361 0.0183
0.05 0.6848 0.6452 0.6096 0.5774 0.3754 0.2772 0.2195 0.1817 0.1550 0.1351 0.1197 0.1075 0.0823 0.0666 0.0506 0.0408 0.0207
0.025 0.7233 0.6842 0.6486 0.6162 0.4070 0.3026 0.2405 0.1995 0.1705 0.1488 0.1320 0.1186 0.0909 0.0737 0.0560 0.0452 0.0230
0.01 0.7651 0.7271 0.6920 0.6597 0.4443 0.3330 0.2659 0.2213 0.1894 0.1655 0.1470 0.1322 0.1015 0.0824 0.0627 0.0506 0.0257
0.005 0.7915 0.7546 0.7201 0.6882 0.4698 0.3542 0.2838 0.2366 0.2028 0.1774 0.1577 0.1419 0.1091 0.0886 0.0675 0.0545 0.0278
0.001 0.8401 0.8062 0.7738 0.7432 0.5222 0.3987 0.3217 0.2695 0.2317 0.2032 0.1809 0.1630 0.1257 0.1023 0.0781 0.0631 0.0322
7.0 0.999 0.1315 0.1209 0.1119 0.0642 0.0451 0.0348 0.0283 0.0239 0.0206 0.0182 0.0162 0.0123 0.0099 0.0075 0.0060 0.0030
0.75 0.3782 0.3518 0.3289 0.1994 0.1431 0.1117 0.0915 0.0776 0.0673 0.0594 0.0532 0.0405 0.0327 0.0247 0.0199 0.0101
0.1 0.6309 0.5965 0.5654 0.3700 0.2742 0.2177 0.1805 0.1541 0.1345 0.1192 0.1071 0.0821 0.0665 0.0506 0.0408 0.0207
0.05 0.6750 0.6404 0.6090 0.4054 0.3027 0.2413 0.2006 0.1716 0.1499 0.1331 0.1196 0.0918 0.0745 0.0567 0.0457 0.0233
0.025 0.7114 0.6771 0.6457 0.4365 0.3281 0.2626 0.2188 0.1874 0.1640 0.1457 0.1311 0.1008 0.0818 0.0623 0.0503 0.0256
0.01 0.7512 0.7177 0.6866 0.4729 0.3584 0.2882 0.2408 0.2067 0.1811 0.1611 0.1451 0.1118 0.0909 0.0693 0.0560 0.0286
0.005 0.7766 0.7439 0.7132 0.4977 0.3794 0.3060 0.2563 0.2204 0.1933 0.1721 0.1551 0.1196 0.0973 0.0743 0.0600 0.0307
0.001 0.8241 0.7936 0.7645 0.5485 0.4234 0.3439 0.2894 0.2496 0.2194 0.1957 0.1766 0.1366 0.1114 0.0851 0.0689 0.0353
*Entries in the table are the beta values B(α, p/2, (n-p-1)/2) for an area (Alpha probability) in the upper tail of the beta distribution for the given parameter values
of p/2 and (n-p-1)/2.
Table A.5 (continued): Percentage points of the beta distribution.*
(n-p-1)/2
p/2 Alpha 9 10 20 30 40 50 60 70 80 90 120 150 200 250 500
8.0 0.999 0.1487 0.1381 0.0809 0.0573 0.0444 0.0362 0.0306 0.0265 0.0233 0.0209 0.0158 0.0128 0.0096 0.0077 0.0039
0.75 0.3877 0.3638 0.2255 0.1635 0.1282 0.1055 0.0896 0.0779 0.0689 0.0617 0.0471 0.0381 0.0288 0.0232 0.0118
0.1 0.6250 0.5945 0.3974 0.2976 0.2377 0.1979 0.1694 0.1481 0.1316 0.1184 0.0909 0.0738 0.0562 0.0454 0.0231
0.05 0.6666 0.6360 0.4323 0.3262 0.2616 0.2183 0.1873 0.1640 0.1458 0.1313 0.1010 0.0821 0.0626 0.0505 0.0258
0.025 0.7012 0.6708 0.4628 0.3516 0.2831 0.2368 0.2035 0.1784 0.1588 0.1430 0.1103 0.0897 0.0684 0.0553 0.0282
0.01 0.7393 0.7094 0.4984 0.3817 0.3087 0.2591 0.2231 0.1959 0.1745 0.1574 0.1216 0.0990 0.0756 0.0612 0.0313
0.005 0.7638 0.7344 0.5226 0.4025 0.3266 0.2747 0.2369 0.2082 0.1857 0.1676 0.1296 0.1056 0.0808 0.0654 0.0335
0.001 0.8101 0.7824 0.5718 0.4458 0.3644 0.3078 0.2663 0.2347 0.2097 0.1895 0.1470 0.1201 0.0920 0.0745 0.0382
9.0 0.999 0.1639 0.0978 0.0698 0.0543 0.0445 0.0376 0.0326 0.0288 0.0258 0.0196 0.0158 0.0120 0.0096 0.0049
0.75 0.3954 0.2500 0.1830 0.1443 0.1191 0.1015 0.0883 0.0782 0.0702 0.0537 0.0434 0.0330 0.0266 0.0135
0.1 0.6198 0.4224 0.3194 0.2566 0.2144 0.1841 0.1613 0.1435 0.1292 0.0995 0.0809 0.0617 0.0499 0.0255
0.05 0.6594 0.4567 0.3479 0.2807 0.2351 0.2023 0.1775 0.1581 0.1425 0.1099 0.0895 0.0683 0.0553 0.0282
0.025 0.6924 0.4867 0.3732 0.3022 0.2538 0.2187 0.1921 0.1713 0.1545 0.1194 0.0973 0.0744 0.0602 0.0308
0.01 0.7290 0.5214 0.4031 0.3279 0.2762 0.2385 0.2099 0.1873 0.1692 0.1310 0.1069 0.0818 0.0662 0.0339
0.005 0.7526 0.5449 0.4236 0.3458 0.2919 0.2525 0.2224 0.1987 0.1795 0.1392 0.1137 0.0871 0.0705 0.0362
0.001 0.7977 0.5925 0.4663 0.3833 0.3251 0.2821 0.2491 0.2229 0.2018 0.1570 0.1284 0.0985 0.0799 0.0411
10 0.999 0.1148 0.0826 0.0645 0.0530 0.0449 0.0390 0.0345 0.0309 0.0235 0.0190 0.0144 0.0116 0.0059
0.75 0.2732 0.2017 0.1599 0.1325 0.1131 0.0986 0.0875 0.0786 0.0602 0.0488 0.0371 0.0299 0.0152
0.1 0.4452 0.3397 0.2744 0.2301 0.1981 0.1739 0.1549 0.1397 0.1079 0.0879 0.0671 0.0543 0.0278
0.05 0.4790 0.3682 0.2986 0.2511 0.2166 0.1904 0.1698 0.1533 0.1186 0.0967 0.0739 0.0599 0.0307
0.025 0.5083 0.3933 0.3202 0.2699 0.2332 0.2053 0.1833 0.1656 0.1283 0.1047 0.0802 0.0649 0.0333
0.01 0.5422 0.4228 0.3459 0.2925 0.2532 0.2232 0.1996 0.1805 0.1401 0.1145 0.0878 0.0712 0.0365
0.005 0.5651 0.4431 0.3637 0.3082 0.2672 0.2359 0.2111 0.1910 0.1485 0.1215 0.0932 0.0756 0.0389
0.001 0.6113 0.4851 0.4010 0.3413 0.2970 0.2627 0.2356 0.2134 0.1665 0.1365 0.1049 0.0852 0.0439
15 0.999 0.1963 0.1462 0.1166 0.0970 0.0831 0.0726 0.0645 0.0581 0.0446 0.0363 0.0276 0.0223 0.0114
0.75 0.3712 0.2846 0.2308 0.1941 0.1675 0.1473 0.1315 0.1187 0.0920 0.0750 0.0574 0.0465 0.0239
0.1 0.5361 0.4245 0.3511 0.2991 0.2605 0.2307 0.2071 0.1878 0.1467 0.1204 0.0927 0.0754 0.0389
0.05 0.5668 0.4519 0.3753 0.3206 0.2798 0.2482 0.2230 0.2024 0.1585 0.1302 0.1004 0.0817 0.0423
0.025 0.5930 0.4758 0.3965 0.3397 0.2970 0.2638 0.2372 0.2155 0.1691 0.1391 0.1073 0.0874 0.0453
0.01 0.6230 0.5035 0.4216 0.3622 0.3174 0.2824 0.2543 0.2313 0.1818 0.1498 0.1157 0.0943 0.0490
0.005 0.6430 0.5224 0.4387 0.3778 0.3315 0.2953 0.2662 0.2422 0.1907 0.1573 0.1217 0.0992 0.0516
0.001 0.6830 0.5609 0.4743 0.4103 0.3612 0.3226 0.2913 0.2655 0.2098 0.1733 0.1344 0.1097 0.0572
*Entries in the table are the beta values B(α, p/2, (n-p-1)/2) for an area (Alpha probability) in the upper tail of the beta distribution
for the given parameter values of p/2 and (n-p-1)/2.
Table A.5 (continued): Percentage points of the beta distribution.*
(n-p-1)/2
p/2 Alpha 30 40 50 60 70 80 90 120 150 200 250 500
20 0.999 0.2057 0.1670 0.1406 0.1215 0.1069 0.0955 0.0863 0.0669 0.0547 0.0419 0.0339 0.0174
0.75 0.3524 0.2913 0.2482 0.2163 0.1916 0.1720 0.1560 0.1221 0.1003 0.0772 0.0628 0.0325
0.1 0.4893 0.4122 0.3560 0.3131 0.2795 0.2523 0.2300 0.1816 0.1501 0.1164 0.0950 0.0496
0.05 0.5152 0.4358 0.3774 0.3326 0.2973 0.2688 0.2452 0.1941 0.1606 0.1247 0.1019 0.0533
0.025 0.5376 0.4564 0.3962 0.3498 0.3131 0.2834 0.2587 0.2052 0.1700 0.1322 0.1081 0.0566
0.01 0.5634 0.4804 0.4182 0.3701 0.3319 0.3007 0.2749 0.2185 0.1813 0.1411 0.1156 0.0606
0.005 0.5809 0.4967 0.4334 0.3841 0.3448 0.3127 0.2861 0.2278 0.1891 0.1474 0.1208 0.0634
0.001 0.6162 0.5303 0.4647 0.4132 0.3719 0.3380 0.3097 0.2474 0.2059 0.1609 0.1320 0.0695
25 0.999 0.2595 0.2139 0.1821 0.1586 0.1404 0.1260 0.1143 0.0894 0.0734 0.0566 0.0460 0.0238
0.75 0.4089 0.3432 0.2958 0.2599 0.2318 0.2092 0.1906 0.1505 0.1243 0.0964 0.0787 0.0411
0.1 0.5407 0.4625 0.4038 0.3583 0.3219 0.2922 0.2676 0.2134 0.1775 0.1386 0.1137 0.0598
0.05 0.5651 0.4852 0.4248 0.3777 0.3399 0.3089 0.2831 0.2263 0.1885 0.1474 0.1210 0.0638
0.025 0.5860 0.5049 0.4432 0.3947 0.3557 0.3236 0.2969 0.2378 0.1982 0.1552 0.1275 0.0674
0.01 0.6100 0.5278 0.4646 0.4146 0.3743 0.3410 0.3131 0.2514 0.2099 0.1646 0.1354 0.0717
0.005 0.6262 0.5433 0.4792 0.4283 0.3871 0.3530 0.3244 0.2608 0.2180 0.1712 0.1409 0.0747
0.001 0.6587 0.5750 0.5093 0.4567 0.4137 0.3781 0.3480 0.2808 0.2352 0.1851 0.1526 0.0812
*Entries in the table are the beta values B(α, p/2, (n-p-1)/2) for an area (Alpha probability) in the upper tail of the beta
distribution for the given parameter values of p/2 and (n-p-1)/2.
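The tabled percentage points can be checked numerically. The sketch below is illustrative code, not from the book: it assumes only the Python standard library, evaluates the regularized incomplete beta function (the beta CDF) by composite Simpson's rule, and inverts it by bisection to obtain the upper-tail point B(α, p/2, (n-p-1)/2). Function names are chosen here for illustration.

```python
from math import exp, lgamma, log

def beta_pdf(x, a, b):
    """Beta(a, b) density, computed on the log scale for stability."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1.0) * log(x) + (b - 1.0) * log(1.0 - x))

def beta_cdf(x, a, b, n=4000):
    """Regularized incomplete beta I_x(a, b) via composite Simpson's rule.

    Accurate for a, b > 1 (the density vanishes at both endpoints),
    which covers every parameter pair in Table A.5.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / n  # n must be even for Simpson's rule
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * beta_pdf(i * h, a, b)
    return total * h / 3.0

def beta_upper_point(alpha, a, b):
    """Point with upper-tail area alpha, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Tabled entry for p/2 = 10, (n-p-1)/2 = 20, Alpha = 0.05 is 0.4790;
# the computed point should agree to about three decimal places.
print(beta_upper_point(0.05, 10.0, 20.0))
```

Any table cell can be reproduced the same way by passing Alpha, p/2, and (n-p-1)/2 as the three arguments.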
Bibliography
Agnew, J.L., and Knapp, R.C. (1995). Linear Algebra with Applications,
Brooks/Cole, Pacific Grove, CA.
Alt, F.B. (1982). "Multivariate Quality Control: State of the Art," Quality Congress
Transactions, American Society for Quality, Milwaukee, WI, pp. 886-893.
Alt, F.B., Deutch, S.J., and Walker, J.W. (1977). "Control Charts for Multivariate,
Correlated Observations," Quality Congress Transactions, American Society for
Quality, Milwaukee, WI, pp. 360-369.
Anderson, D.R., Sweeney, D.J., and Williams, T.A. (1994). Introduction to Statis-
tics Concepts and Applications, West Publishing Company, New York.
Barnett, V., and Lewis, T. (1994). Outliers in Statistical Data, 3rd ed., Wiley, New
York.
Belsley, D.A., Kuh, E., and Welsch, R.E. (1980). Regression Diagnostics: Identify-
ing Influential Data and Sources of Collinearity, Wiley, New York.
Box, G.E.P., and Jenkins, G.M. (1976). Time Series Analysis: Forecasting and
Control, Holden-Day, San Francisco, CA.
Chatterjee, S., and Price, B. (1999). Regression Analysis by Example, 3rd ed., Wiley,
New York.
Chou, Y.M., Mason, R.L., and Young, J.C. (1999). "Power Comparisons for a
Hotelling's T2 Statistic," Commun. Statist. Simulation Comput., 28, pp. 1031-
1050.
Chou, Y.M., Mason, R.L., and Young, J.C. (2001). "The Control Chart For Individ-
ual Observations from a Multivariate Non-Normal Distribution," Comm. Statist.,
30, pp. 1937-1949.
Chou, Y.M., Polansky, A.M., and Mason, R.L. (1998). "Transforming Non-Normal
Data to Normality in Statistical Process Control," J. Quality Technology, 30, pp.
133-141.
Conover, W. J. (2000). Practical Nonparametric Statistics, 3rd ed., Wiley, New York.
David, H.A. (1970). Order Statistics, Wiley, New York.
Doganaksoy, N., Faltin, F.W., and Tucker, W.T. (1991). "Identification of Out-
of-Control Quality Characteristics in a Multivariate Manufacturing Environment,"
Comm. Statist. Theory Methods, 20, pp. 2775-2790.
Dudewicz, E.J., and Mishra, S.N. (1988). Modern Mathematical Statistics, Wiley,
New York.
Duncan, A.J. (1986). Quality Control and Industrial Statistics, 5th ed., Richard D.
Irwin, Homewood, IL.
Fuchs, C., and Kenett, R.S. (1998). Multivariate Quality Control, Dekker, New
York.
Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate
Observations, Wiley, New York.
Hawkins, D.M. (1980). Identification of Outliers, Chapman and Hall, New York.
Hawkins, D.M. (1981). "A New Test for Multivariate Normality and Homoscedas-
ticity," Technometrics, 23, pp. 105-110.
Hawkins, D. M. (1991). "Multivariate Quality Control Based on Regression-
Adjusted Variables," Technometrics, 33, pp. 61-75.
Hawkins, D.M. (1993). "Regression Adjustment for Variables in Multivariate Qual-
ity Control," J. Quality Technology, 25, pp. 170-182.
Holmes, D.S., and Mergen, A.E. (1993). "Improving the Performance of the T2
Control Chart," Quality Engrg., 5, pp. 619-625.
Hotelling, H. (1931). "The Generalization of Student's Ratio," Ann. Math. Statist.,
2, pp. 360-378.
Kourti, T., and MacGregor, J.F. (1996). "Multivariate SPC Methods for Process
and Product Monitoring," J. Quality Technology, 28, pp. 409-428.
Kshirsagar, A.M., and Young, J.C. (1971). "Correlation Between Two Hotelling's
T2," Technical Report, Department of Statistics, Southern Methodist University,
Dallas, TX.
Langley, M.P., Young, J.C., Tracy, N.D., and Mason, R.L. (1995). "A Computer
Program for Monitoring Multivariate Process Control," in Proceedings of the Sec-
tion on Quality and Productivity, American Statistical Association, Alexandria, VA,
pp. 122-123.
Little, R.J.A., and Rubin, D.B. (1987). Statistical Analysis with Missing Data,
Wiley, New York.
Looney, S.W. (1995). "How to Use Tests for Univariate Normality to Assess Mul-
tivariate Normality," Amer. Statist., 49, pp. 64-70.
Mahalanobis, P.C. (1930). "On Tests and Measures of Group Divergence." J. Proc.
Asiatic Soc. Bengal, 26, pp. 541-588.
Mardia, K.V., Kent, J.T., and Bibby, J.M. (1979). Multivariate Analysis, Academic
Press, New York.
Mason, R.L., Champ, C.W., Tracy, N.D., Wierda, S.J., and Young, J.C. (1997).
"Assessment of Multivariate Process Control Techniques," J. Quality Technology,
29, pp. 140-143.
Mason, R.L., Chou, Y.M., and Young, J. C. (2001). "Applying Hotelling's T2 Statis-
tic to Batch Processes," J. Quality Technology, 33, pp. 466-479.
Mason, R.L., Tracy, N.D., and Young, J.C. (1995). "Decomposition of T2 for Mul-
tivariate Control Chart Interpretation," J. Quality Technology, 27, pp. 99-108.
Mason, R.L., Tracy, N.D., and Young, J.C. (1996). "Monitoring a Multivariate Step
Process," J. Quality Technology, 28, pp. 39-50.
Mason, R.L., Tracy, N.D., and Young, J.C. (1997). "A Practical Approach for
Interpreting Multivariate T2 Control Chart Signals," J. Quality Technology, 29,
pp. 396-406.
Mason, R.L., and Young, J.C. (1997). "A Control Procedure for Autocorrelated
Multivariate Process," in Proceedings of the Section on Quality and Productivity,
American Statistical Association, Alexandria, VA, pp. 143-145.
Mason, R.L., and Young, J.C. (1999). "Improving the Sensitivity of the T2 Statistic
in Multivariate Process Control," J. Quality Technology, 31, pp. 155-165.
Mason, R.L., and Young, J.C. (2000). "Autocorrelation in Multivariate Processes,"
in Statistical Monitoring and Optimization for Process Control, edited by S. Park
and G. Vining, Marcel Dekker, New York, pp. 223-240.
Montgomery, D.C., and Mastrangelo, C.M. (1991). "Some Statistical Process Con-
trol Methods for Autocorrelated Data (with Discussion)," J. Quality Technology,
23, pp. 179-204.
Montgomery, D.C. (2001). Introduction to Statistical Quality Control, 5th ed., Wi-
ley, New York.
Morrison, D.F. (1990). Multivariate Statistical Methods, 3rd ed., McGraw-Hill, New
York.
Myers, R.H. (1990). Classical and Modern Regression with Applications, 2nd ed.,
Duxbury Press, Boston, MA.
Myers, R.H., and Milton, J. (1991). A First Course in the Theory of Linear Statis-
tical Models, PWS-Kent, Boston, MA.
Polansky, A.M., and Baker, E.R. (2000). "Multistage Plug-In Bandwidth Selection
for Kernel Distribution Function Estimates," J. Statist. Comput. Simulation, 65,
pp. 63-80.
Rencher, A.C. (1993). "The Contribution of Individual Variables to Hotelling's T2,
Wilks' Λ, and R2," Biometrics, 49, pp. 479-489.
Runger, G.C., Alt, F.B., and Montgomery, D.C. (1996). "Contributors to a Multi-
variate Statistical Process Control Chart Signal," Comm. Statist. Theory Methods,
25, pp. 2203-2213.
Ryan, T.P. (2000). Statistical Methods for Quality Improvement, 2nd ed., Wiley,
New York.
Scholz, F.W., and Tosch, T.J. (1994). "Small Sample Uni- and Multivariate Control
Charts for Means," in Proceedings of the American Statistical Association, Quality
and Productivity Section, American Statistical Association, Alexandria, VA, pp.
17-22.
Seber, G.A.F. (1984). Multivariate Observations, Wiley, New York.
Sharma, S. (1995). Applied Multivariate Techniques, Wiley, New York.
Sullivan, J.H., and Woodall, W.H. (1996). "A Comparison of Multivariate Control
Charts for Individual Observations," J. Quality Technology, 28, pp. 398-408.
Sullivan, J.H., and Woodall, W.H. (2000). "Change-Point Detection of Mean Vec-
tor or Covariance Matrix Shifts Using Multivariate Individual Observations," IIE
Trans., 32, pp. 537-549.
Timm, N.H. (1996). "Multivariate Quality Control Using Finite Intersection Tests,"
J. Quality Technology, 28, pp. 233-243.
Tracy, N.D., Young, J.C., and Mason, R.L. (1992). "Multivariate Control Charts
for Individual Observations," J. Quality Technology, 24, pp. 88-95.