ELSEVIER Chemometrics and Intelligent Laboratory Systems 30 (1995) 97-108

Multi-way partial least squares in monitoring batch processes
Paul Nomikos *, John F. MacGregor
Department of Chemical Engineering, McMaster University, Hamilton, Ontario, Canada L8S 4L7

Received 21 December 1994; accepted 10 May 1995


Multivariate statistical procedures for monitoring the progress of batch processes are developed. Multi-way partial least squares (MPLS) is used to extract the information from the process measurement variable trajectories that is more relevant to the final quality variables of the product. The only information needed is a historical database of past successful batches. New batches can be monitored through simple monitoring charts which are consistent with the philosophy of statistical process control. These charts monitor the batch operation and provide on-line predictions of the final product qualities. Approximate confidence intervals for the predictions from PLS models are developed. The approach is illustrated using a simulation study of a styrene-butadiene batch reactor.

Keywords: Multivariate analysis; Partial least squares; Batch processes; Statistical process control

1. Introduction

Batch and semi-batch processes play an important role in the chemical industry due to their low volume-high value products. Examples include reactors, crystallization, distillation, and injection molding processes; the manufacturing of polymers, herbicides, insecticides, pharmaceuticals, and biochemicals. Batch processes are characterized by a prescribed processing of materials for a finite duration. Successful batch operation means tracking this prescribed recipe with high degree of reproducibility. Feedrates, temperature and pressure profiles are implemented with servo-controllers, and precise sequencing operations are produced with tools such as programmable logic controllers. Monitoring these batch processes is very important to ensure their safe operation and to assure that they produce consistently high quality products.

Batch processes suffer a lack of reproducibility from batch to batch variations due to disturbances and the absence of on-line quality measurements. These variations may be difficult for an operator to discern, but could have an adverse effect on the final product quality. Often, a disturbance or an operational problem can be undetected and the poor product quality may remain undetected until significant expense has been incurred. A system that monitors the time evolution of a batch and can detect variations from normal operation in the familiar manner of a statistical process control (SPC) chart might allow for corrective action early in the batch, the quick disposition of batches not salvaged, and the diagnosis of assignable causes that can be eliminated from future batches.

* Corresponding author. Present address: DuPont Canada Inc., Research and Development, 461 Front Rd., Kingston, Ontario, K7L 5A5, Canada. email:

0169-7439/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved
SSDI 0169-7439(95)00043-7

The main characteristics of batch processes, flexibility, finite duration, and unsteady-state nonlinear behavior, are related both to their success and their incompatibility with the conventional mathematical or empirical modeling for monitoring and controlling continuous processes. For on-line monitoring of batch processes there are two general methodologies. One is based on fundamental mathematical models (Kalman filters, e.g., [1]), and the second on knowledge based models (expert systems, artificial intelligence methods, e.g., [2]). These methods are reviewed by Nomikos and MacGregor [3] and contrasted with statistical approaches based on multi-way principal component analysis (MPCA).

SPC methods in batch processes usually are limited only to end product quality measurements [4,5], or to a single variable measured throughout the batch [6]. Most batch and semi-batch processes operate in open loop with respect to product quality variables, simply because few, if any, on-line sensors exist for tracking these variables. Upon completion of the batch a range of quality measurements are usually made on a sample of the product in the quality control laboratory. The number of measurements (feedrates, temperatures, pressures, etc.) being made every few seconds over several hours of the batch duration creates a data overload. This calls for methods which utilize directly the information of the on-line measurements and systematically and scientifically recognize significant deviations from the normal operating behavior of the process through simple SPC charts.

The use of statistical methods such as principal component analysis (PCA) and partial least squares (PLS) to the analysis of multivariate continuous chemical process data has been well documented [7-9]. Nomikos and MacGregor [3,10] extended these ideas to the nonlinear and finite duration batch processes. They proposed SPC schemes for batch processes based on MPCA. The behavior of the process is characterized using an empirical model based on the MPCA analysis of data obtained when the process is operating well and is in a state of statistical control. Subsequently, future unusual events are detected by referencing the measured process behavior against this 'in-control' model and its statistical properties.

The MPCA in the proposed SPC schemes [3,10] only makes use of the process variable trajectory measurements (X) taken throughout the duration of the batch. Measurements on product quality variables (Y) taken at the end of each batch were used only to help classify a batch as 'good' or 'bad'. However, such product quality data can be used in a much more direct fashion. Multi-way partial least squares (MPLS) can be performed using both the process data (X) and the product quality data (Y). Rather than focusing only on the variance of X, MPLS focuses more on the variance of X that is more predictive for the product quality Y.

The same multivariate SPC monitoring ideas that were developed using MPCA can be extended directly using MPLS when product quality data (Y) are available. The additional information that one can get from MPLS is an on-line inference of the final quality of the product. This paper describes the monitoring scheme for batch processes based on MPLS, which is analogous to that based on MPCA, and focuses on some statistical properties of PLS for predictions and on the development of control charts for on-line predictions of the final product qualities. A simulation of a styrene-butadiene batch reactor is used to illustrate the ideas of the proposed SPC method.

2. MPLS analysis of batch data

MPLS [11] is an extension of PLS [12] to handle data in three-dimensional arrays. The aim is to build an empirical model based on the measurements of a reference batch database, which will describe the normal operation of the process (X) when it produces good quality product (Y). This empirical model will be used to monitor the evolution of future batch runs. A historical dataset of batch trajectory data consists of i = 1, 2, ..., I batch runs where each of them has the on-line measurements of j = 1, 2, ..., J variables over k = 1, 2, ..., K time intervals throughout the batch. Not only is the relationship among all the variables at any one time important, but so is the entire past history of the trajectories of all these variables. Some of the variables measured may be redundant and most of them are highly correlated with one another. This vast amount of data is organized into a three-way array $\underline{X}$ (I × J × K) as it is shown in Fig. 1.

Different batch runs are organized along the vertical side, the measurement variables along the horizontal side, and their time evolution occupies the third dimension. The final quality variables (m = 1, 2, ..., M) for each batch are summarized in a (I × M) matrix Y. The relation between MPLS and PLS is that MPLS is equivalent to performing ordinary PLS on a large two dimensional matrix X formed by unfolding the three way array $\underline{X}$ in one of the six possible ways (two are degenerate cases). For analyzing and monitoring batch processes [3,10], the most meaningful way of unfolding the array $\underline{X}$ is to put each of its vertical slices (I × J) side by side to the right to create the matrix X (I × JK), starting with the one corresponding to the first time interval (Fig. 1). Each of the vertical slices of $\underline{X}$ is a (I × J) matrix representing the values of all the process variables for all the batches at a common time interval k. After the unfolding of $\underline{X}$, both X and Y are mean centered and scaled to unit variance prior to performing the ordinary PLS analysis. MPLS in this framework explains the variation of a process variable about its average trajectory at each point of time, which is most closely related to the end quality of the product. This subtraction of the average trajectories removes the major nonlinear behavior of the process, leading to a more linear and stationary problem which is suitable for statistical analysis and inference [3,10]. Although other multi-way methods [13-16] have been proposed for decomposing such three-way arrays, we focus on MPLS exclusively because of its simplicity and ease of interpretation.

Fig. 1. Arrangement of batch data in MPLS. I is the number of batches, J is the number of process measurement variables, K is the number of time intervals, and M is the number of product quality variables. The three-way array $\underline{X}$ (I × J × K) unfolds into a matrix X (I × JK), and a normal PLS can be performed between the X and Y matrices.

PLS decomposes the X (I × JK) and Y (I × M) matrices into a summation of R score vectors (t (I × 1)) and loading vectors (p (JK × 1), q (M × 1)), plus some residual matrices (E (I × JK), F (I × M)):

$$X = \sum_{r=1}^{R} t_r p_r' + E, \qquad Y = \sum_{r=1}^{R} t_r q_r' + F \tag{1}$$

or if we combine the t, p, and q vectors into T (I × R), P (JK × R), and Q (M × R) matrices

$$X = TP' + E, \qquad Y = TQ' + F \tag{2}$$

where T is given by:

$$T = XW(P'W)^{-1} \tag{3}$$

The w vectors are orthonormal, the t vectors are orthogonal, and the matrix (P'W) is upper triangular with ones as diagonal elements [17]. This decomposition summarizes and compresses the data with respect to both x and y variables and time into low dimensional spaces that describe the operation of the process which is most relevant to final product quality. Each row of the T matrix corresponds to a single batch and depicts the overall variability of this batch with respect to the other batches in the database. For batch data, each column of the P and W matrices summarizes the time variation of the measurement variables about their average trajectories, and their elements give the weights applied to each variable at each time interval within a batch to give the t-scores for that batch. The Q matrix relates the variability of the process measurements to the final product qualities.

MPCA has proven very useful in the post analysis of batch runs and has shown its abilities, both in simulated examples and in real industrial data [3,10,18], to discriminate between normal and abnormal batches and detect abnormalities which are difficult to detect by visual inspection only of the measurement trajectories. MPLS shares these benefits. The power of both MPCA and MPLS results from using the covariance matrix of the variable trajectories. By doing this, both methods utilize not just the magnitude of the deviations of each process variable from its mean trajectory but also both their simultaneous and temporal correlations. As in MPCA, batches with unusual operation will appear in MPLS either as batches with large t-scores, or with large residuals in the x-space ($Q_x = \sum_{c=1}^{KJ} E(i,c)^2$), or with both. Additionally in MPLS, if the residuals for a batch in the y-space ($Q_y = \sum_{c=1}^{M} F(i,c)^2$) are large, it means that its final product qualities are not well predicted by its process measurements.

The assumptions behind these methods, as in all inferential methods, are that one has 'comparable' runs and 'observable' events of interest. The first assumption states that future batches will operate in a similar way with those in the reference database. If something changes in the operation of the batch (e.g., amount or type of catalyst), as this has been defined from the reference normal database, it will make all subsequent batches operationally different from the previous ones. In this case one has to build a new database which incorporates the change and re-apply the method. The second assumption requires the measurements to contain some information about an abnormality, in order for the method to be able to detect it. If there is no information in the data about a fault, then no method can detect it.

3. SBR example

The application of MPLS in batch data is illustrated here with a detailed mechanistic model for semi-batch emulsion polymerization of styrene-butadiene rubber (SBR) [19]. Using typical variations in the initial charge of materials and impurities, and in the process operations, a number of batches were simulated. Details of the simulations can be found in [3,20], and the data used in this article are available from the authors upon request. Fifty batches which gave final latex with quality properties within an acceptable region were selected to provide a reference data array. On-line measurements were assumed to be available on nine variables: the feedrates of styrene and butadiene monomers, the temperature of the feed stream, the reactor contents, the cooling water, and the reactor jacket, the latex density, the total conversion, and the instantaneous heat release from an energy balance. Using 200 time increments over the duration of the batch, the reference dataset $\underline{X}$ was a (50 × 9 × 200) array. The resulting latex and polymer properties of the product were summarized in five quality variables (Y (50 × 5)): composition (% styrene), particle size (Å), branching (branches/reacted monomer units), crosslinking (crosslinks/reacted monomer units), and polydispersity.
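The unfolding and scaling steps of Section 2 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function names `unfold_batchwise` and `autoscale` are hypothetical, and the random array only mimics the (50 × 9 × 200) dimensions of the SBR example, not its data.

```python
import numpy as np

def unfold_batchwise(X3):
    """Unfold an (I, J, K) array into (I, J*K): the (I x J) slice of the first
    time interval comes first, then the second, and so on."""
    I, J, K = X3.shape
    # Put time before variables so that column index = k*J + j.
    return X3.transpose(0, 2, 1).reshape(I, K * J)

def autoscale(X):
    """Mean-centre each column and scale it to unit variance.
    Centring each column removes the average trajectory of each variable."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0          # guard against constant columns
    return (X - mean) / std, mean, std

# Placeholder data with the dimensions of the SBR reference set (assumption).
rng = np.random.default_rng(0)
X3 = rng.normal(size=(50, 9, 200))
X = unfold_batchwise(X3)         # shape (50, 1800)
Xs, mu, sd = autoscale(X)
```

The stored `mu` and `sd` would be reused to scale any new batch before it is projected onto the model.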

The ability of MPLS to discriminate between batches with acceptable product and 'bad' batches was tested through a post analysis of the 50 'good' batches plus some 'bad' ones with product quality barely outside the acceptable region. MPLS was able to detect clearly these 'bad' batches by placing them in the reduced space (t-plots) away from the main central cluster formed by the 50 'good' batches. Having established the observability of faults, an MPLS model was built from the 50 'good' batches, which summarizes the information contained in them about the normal operation of the process. This model will be used as the statistical reference to classify new batches as 'good' or 'bad' and give on-line predictions of their final qualities.

Two PLS components were needed, as determined by cross-validation [21], to capture the variation of the process variables about their average trajectories which is most predictive for the final product qualities. The cumulative percentage sum of squares explained (%SS) by the two principal components of the X and Y matrices and of each quality variable separately, which shows how well the x-data account for the variation in each y-variable, is given in Table 1. To rely only on the percentage of explained Y will be misleading because of the large number of predictor x-variables (9 × 200 = 1800 in the SBR example). Any regression model could have accounted for a large portion of the variability in the Y. One should always use cross-validation to determine the number of components in the PLS model and to assert its predictability. The last row in Table 1 is the regression statistic mean sum of squares due to regression (MSR) over the mean squared error (MSE) (see Section 5) with its 95% critical value. These F-tests provide another way from a regression point of view to check how well each of the y-variables is explained by the MPLS model. As it can be seen from Table 1, quality variables 3 and 4 are explained very well from the MPLS model and only quality variable 2 (particle size) is poorly explained by the process measurements. This arises because the particle size is determined largely by the variation in the number of seeded particles charged initially in the reactor, and it is not influenced much by process conditions.

Table 1. Cumulative percentage sum of squares explained by the two principal components of the X and Y matrices. Columns: %SS for X, Y, Y1, Y2, Y3, Y4, Y5. Rows: PC1, PC2, and MSR/MSE (95% critical value F = 3.20).

Plots of the latent vectors $t_r$ versus $u_r$ ($u_r = Y_{r-1} q_r / (q_r' q_r)$, where $Y_{r-1}$ is the residual matrix of Y after extracting r − 1 components) are shown in Fig. 2. The linear nature of these plots suggests that nonlinear PLS [22] would probably not be needed. Indeed, performing such a nonlinear PLS gave essentially identical results to the linear analysis. The particular unfolding of $\underline{X}$ that has been used and the subtraction of the average trajectories from the process measurements have apparently eliminated most nonlinear effects in the data.

4. On-line monitoring

The central idea in on-line monitoring is as follows. An MPLS model is built using a database of 'good' batches which yielded acceptable product and did not exhibit any operational problems. Subsequent batches are then referenced against this 'in-control' model. The W, P, and Q matrices from such an analysis contain all the structural information about how the process measurements deviate from their mean values at each time interval and how these are related to the final quality variables. The predicted t-scores ($\hat{t}$ (1 × R)), the predicted quality variables ($\hat{y}$ (1 × M)), and the residuals (e (1 × KJ), f (1 × M)) for a new batch $X_{new}$ (K × J) are given by:

unfold and scale $X_{new}$ (K × J) to $x_{new}$ (1 × KJ),

$$\hat{t} = x_{new} W (P'W)^{-1}, \qquad \hat{y} = \hat{t} Q', \qquad e = x_{new} - \hat{t} P', \qquad f = y - \hat{y} \tag{4}$$
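To make Eqs. (3) and (4) concrete, the sketch below fits a small PLS model with a textbook NIPALS iteration and then projects one batch onto it. It is an illustration under assumptions: the source does not specify the PLS algorithm used, `nipals_pls` and `project_batch` are hypothetical names, and the data are random placeholders.

```python
import numpy as np

def nipals_pls(X, Y, R, tol=1e-12, max_iter=1000):
    """Textbook NIPALS PLS on centred/scaled X (I x N) and Y (I x M).
    Returns weights W, loadings P and Q, and scores T for R components."""
    X = np.array(X, dtype=float)   # working copies, deflated in place
    Y = np.array(Y, dtype=float)
    W = np.zeros((X.shape[1], R)); P = np.zeros((X.shape[1], R))
    Q = np.zeros((Y.shape[1], R)); T = np.zeros((X.shape[0], R))
    for r in range(R):
        u = Y[:, np.argmax(Y.var(axis=0))].copy()
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)            # orthonormal weight vector
            t = X @ w
            q = Y.T @ t / (t @ t)
            u_new = Y @ q / (q @ q)
            converged = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = X.T @ t / (t @ t)
        X -= np.outer(t, p)                   # deflate X and Y
        Y -= np.outer(t, q)
        W[:, r], P[:, r], Q[:, r], T[:, r] = w, p, q, t
    return W, P, Q, T

def project_batch(x_new, W, P, Q):
    """Eq. (4): scores, predicted qualities and x-residuals of one scaled batch."""
    t_hat = x_new @ W @ np.linalg.inv(P.T @ W)
    y_hat = t_hat @ Q.T
    e = x_new - t_hat @ P.T
    return t_hat, y_hat, e

# Placeholder data (assumption): 50 batches, 60 unfolded columns, 2 qualities.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 60)); X -= X.mean(axis=0)
Y = X[:, :3] @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(50, 2))
Y -= Y.mean(axis=0)
W, P, Q, T = nipals_pls(X, Y, R=2)
t_hat, y_hat, e = project_batch(X[0], W, P, Q)
```

For a batch taken from the training set, `t_hat` coincides with the corresponding row of T, which is a quick way to check an implementation against Eq. (3).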

Fig. 2. t versus u plots, for both PLS components. Each point represents one of the fifty batches in the reference database. The t and u observations fall close to the diagonal line of the graph. This indicates that the PLS latent variables in the x- and y-space (t, u) are well correlated.

The problem which arises in the on-line application of the above equations is that the $X_{new}$ matrix is not complete until the end of the batch operation. At time interval k, the matrix $X_{new}$ has only its first k rows complete and it is missing all the future observations (K − k rows). Several approaches have been studied to overcome this problem [10]. The approach shown here uses the ability of PLS to handle missing data [23]. PLS, essentially, predicts these missing values by restricting them to be consistent with the already known values, and with the correlation structure of the process variables as defined by the W and P matrices. PLS does this by projecting the already known observations up to time interval k ($x_{new,k}$ (1 × kJ)) into the reduced space defined by the W and P matrices in a sequential manner as following:

at each time interval k
  for r = 1 to R
    $\hat{t}(1,r) = x_{new,k}\, W(1{:}kJ, r) \,/\, \left( W(1{:}kJ, r)'\, W(1{:}kJ, r) \right)$
    $x_{new,k} = x_{new,k} - \hat{t}(1,r)\, P(1{:}kJ, r)'$    (5)
  end

where the symbol (1:kJ, r) indicates the elements of the rth column from the first row up and to the kJth row. This approach gives t-scores very close to their final values as the $X_{new}$ matrix becomes complete, but during the first few time intervals may give poor estimates of the t-scores since there is so little information to work with. However, in our experience with MPCA and MPLS on this and other examples [3,10,24], this method works well by the time one has about 10% of the batch history. The reason for this is that one is not building a PLS model based on large amounts of missing data, but using an already well established PLS model to predict the future behavior of a new batch. Furthermore, the early data are complete for all the measurement variables up to the current time interval, and are very good for predicting the future trajectory deviations which arise from variations in the initial batch charge conditions (i.e., impurities, particle concentrations, etc.).

Now, for a new batch, one can calculate at each time interval the predicted t-scores, the predicted final quality variables, and the residuals. Note that there are no residuals (f) in the y-space during the on-line monitoring since the actual values of the quality variables will be known only at the end of the batch. If a new batch is still operating in the same way as the batches in the normal database, but has a larger than normal variation in its measurements, this will show up in the t-scores which will place the new batch away from the origin of the reduced x-space (t-plots).
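The sequential projection of Eq. (5) can be sketched as follows. The function name `online_scores` is hypothetical, and the random W and P below are stand-ins for the matrices of a fitted MPLS model; only the first kJ unfolded columns of the new batch are used.

```python
import numpy as np

def online_scores(x_known, W, P, J, k):
    """Eq. (5): estimate the t-scores of a new batch from its first k time
    intervals, using only the first k*J rows of W and P."""
    n = k * J                               # unfolded columns observed so far
    x = np.array(x_known[:n], dtype=float)  # working copy, deflated below
    R = W.shape[1]
    t_hat = np.zeros(R)
    for r in range(R):
        w = W[:n, r]
        t_hat[r] = x @ w / (w @ w)          # projection on the truncated weight
        x = x - t_hat[r] * P[:n, r]         # remove what component r explains
    return t_hat, x                         # x is the residual of the known part

# Stand-in model matrices (assumption): J = 3 variables, K = 4 intervals.
rng = np.random.default_rng(2)
J, K, R = 3, 4, 2
W = rng.normal(size=(J * K, R))
P = rng.normal(size=(J * K, R))
x_full = rng.normal(size=J * K)
t_half, resid = online_scores(x_full, W, P, J, k=2)   # half-way through
```

The predicted final qualities at interval k would then follow from the estimated scores as in Eq. (4), that is `t_half @ Q.T` for a model loading matrix Q.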

Fig. 3. Monitoring charts for the SPE and t1-scores with their 95% and 99% control limits (dashed and solid lines) for the new 'good' batch (left hand side plots) and for the 'bad' batch with the problem half-way through its operation (right hand side plots). The abnormality in the 'bad' batch is clearly flagged in the SPE chart after time interval 105.

In the case where a new type of variation occurs, the new batch data will move away from the reduced x-space defined by the MPLS model, and its residuals will be large. In this case, the squared prediction error (SPE) associated with the latest on-line measurements at time interval k ($\mathrm{SPE}_k = \sum_{c=(k-1)J+1}^{kJ} e(1,c)^2$), which represents the perpendicular distance of the instantaneous batch process measurements from the reduced x-space, will indicate the particular instant that something behaves abnormally [10]. Thus, one has to monitor the t-scores and the SPE for a new batch by using SPC charts. These charts are easily implemented and easily interpreted. Multivariate Hotelling statistics can be used to monitor the overall performance of the t-scores and the predicted final qualities [10,24,25].
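As a small worked illustration of the SPE definition, the sketch below sums the squared residuals of the J variables belonging to each time interval. The function names are hypothetical, and the residual vector is assumed to be ordered interval by interval, as produced by the unfolding of Section 2.

```python
import numpy as np

def spe_per_interval(e, J):
    """SPE_k for every interval: sum of the squared residuals of the
    J variables measured at that interval (columns (k-1)J+1 .. kJ)."""
    e = np.asarray(e, dtype=float)
    K = e.size // J
    return (e.reshape(K, J) ** 2).sum(axis=1)

def spe_latest(e_known, J):
    """SPE of the most recent interval, as used during on-line monitoring."""
    e_known = np.asarray(e_known, dtype=float)
    return float((e_known[-J:] ** 2).sum())

# Toy residual vector: J = 2 variables over K = 3 intervals.
e = np.arange(6.0)                 # [0, 1 | 2, 3 | 4, 5]
spe = spe_per_interval(e, J=2)     # 0+1, 4+9, 16+25
```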

The control limits for the monitoring charts are derived from the statistical properties of the historical reference distribution of past normal batches [10]. Each of the fifty 'good' batches in the reference database was passed through the on-line monitoring algorithm given above, and their t-scores for each PLS component and SPE were collected at each time interval. This provided fifty observations at each time interval for the t-scores and SPE which were used to construct the control limits for future observations.

Two additional batches were simulated to provide examples for on-line monitoring. One was a 'good' batch, and the other a 'bad' one in which the level of organic impurities in the butadiene monomer feed to the reactor increased by 50% halfway through the batch operation (at time interval 100). This latter fault is an incipient one, typical in industry, where the abnormal operation develops slowly. The final product from this 'bad' batch was slightly out of the acceptable quality region. Fig. 3 shows the on-line monitoring charts, with their 95% and 99% control limits, for the two new batches. The new 'good' batch shows no abnormality in any of these charts. The 'bad' batch with the problem half-way through its operation is clearly flagged as abnormal in the SPE chart around time interval 105. After this time interval the observations from this batch move away from the reduced x-space. The MPLS model is not any longer valid, and one should treat the predicted t-scores with caution.

Although the SPC charts based on MPLS for monitoring a batch process can be constructed similar to those based on MPCA, MPLS additionally provides predictions for the final product qualities. If an abnormal situation is detected by either of these charts, one can diagnose the fault by interrogating the underlying MPLS model to find which process variables are primarily responsible for the detected deviations. This diagnostic information can be found by checking the contribution of each process variable to the deviations observed in the t-scores and residuals [20,24,26,27].

5. Confidence intervals for the final quality variables

MPLS gives, at each time interval, predictions of the final quality variables (Y) of the product. These predictions do not have anything to do with the actual values of the quality variables at the given time interval. They only refer to the values which the product quality variables will have upon completion of the batch. In this section we develop approximate confidence intervals for these predictions which can be easily calculated. We treat the problem in its general form and the prediction confidence intervals are applicable to any PLS study.

A major problem in the statistical analysis of PLS is the nonlinear extraction of the PLS components. PLS does not only look at the conditional distribution of the y-variables given the x-observations, but treats both x and y as random variables connected through the latent variables t. The statistical properties which we shall derive in this section are based on the work of Searle [28] for regression based on generalized inverses. In the following we shall treat only the case with univariate y. For the multivariate case (Y) one has to treat each of the y-variables separately. The X and Y matrices are assumed to be mean centered.

The regression problem y = Xβ can always get a solution in the following form:

$$b = Gy \tag{6}$$

where G is a generalized inverse of X, but the b and G are not defined uniquely. PLS gives a right weak generalized inverse G of the PLS approximation of the X matrix, $\hat{X} = TP'$ ($\hat{X}G\hat{X} = \hat{X}$, $G\hat{X}G = G$, $\hat{X}G = (\hat{X}G)'$), which is given by:

$$G = W(P'W)^{-1}(T'T)^{-1}T' \tag{7}$$

Rao and Mitra [29], and Boullion and Odell [30] show that a right weak generalized inverse of $\hat{X}$, for the problem $y = \hat{X}\beta$, gives the least squares solution b in which:

$$\lVert \hat{X}b - y \rVert \le \lVert \hat{X}\beta - y \rVert, \quad \forall \beta \tag{8}$$

PLS gives the above least squares solution which has unique minimum $\lVert \hat{X}b - y \rVert$. The property of invariance of generalized inverses guarantees that the predicted $\hat{y}$ has a unique value ($\hat{X}b$) no matter what right weak generalized inverse of $\hat{X}$ one chooses to use. Although PLS does not provide the minimum norm solution (min |b|, which is unique), its solution is close to this, since the matrix $W(P'W)^{-1}P'$ is generally close to being symmetric and thus G would have been also a left weak generalized inverse of $\hat{X}$ ($\hat{X}G\hat{X} = \hat{X}$, $G\hat{X}G = G$, $G\hat{X} = (G\hat{X})'$). Of course, PLS provides the Moore-Penrose generalized inverse of the original X for the full rank decomposition of X, and in the case where X has full column rank, PLS provides the ordinary least squares solution $b = (X'X)^{-1}X'y$.

To proceed with Searle's analysis [28], one needs a generalized inverse of $\hat{X}'\hat{X}$. PLS gives the following reflexive generalized inverse Z for $\hat{X}'\hat{X}$ ($\hat{X}'\hat{X} Z \hat{X}'\hat{X} = \hat{X}'\hat{X}$, $Z \hat{X}'\hat{X} Z = Z$):

$$Z = W(P'W)^{-1}(T'T)^{-1}(W'P)^{-1}W' \tag{9}$$

Define the idempotent matrix $H = G\hat{X} = Z\hat{X}'\hat{X} = W(P'W)^{-1}P'$ ($H^2 = H$), which has rank equal to the number of PLS components we have extracted (rank(H) = R). Under the assumption that the y-variable is distributed normally as $N(\hat{X}\beta, \sigma^2 I)$, and that Z is independent of the y-variable, we get the following statistical analysis:

$$b = Z\hat{X}'y + (H - I)g \ \text{ for arbitrary } g \ \text{(from now on, assume } g = 0\text{)} \tag{10}$$

$$E(b) = H\beta \quad (b \text{ is a biased estimator of } \beta) \tag{11}$$

$$\mathrm{var}(b) = Z\sigma^2 \tag{12}$$

The β is not an estimable function, since b is not invariant to the generalized inverse of $\hat{X}'\hat{X}$ that is used for Z (b has an infinite number of descriptions). The only estimable function is any quantity c'β, in which c'H = c'. This c'β has c'b as its best linear unbiased estimator which is distributed normally as:

$$c'b \sim N(c'\beta,\ c'Zc\,\sigma^2) \tag{13}$$

This shows that we are not able in general to test for the significance of each coefficient separately, but only certain linear combinations of them.

The statistical test which we can derive from the above results, in the usual regression notation, is summarized in Table 2. The statistic MSR/MSE is distributed as an F distribution with R and I − R − 1 degrees of freedom [28]. It tests for the null hypothesis ($\hat{X}\beta$) = 0. The only conclusion we can derive, if this test is significant, is that the PLS model accounts for a significant portion of the variation in the y-variable. The problem is that this statistic does not check for significant regression (β ≠ 0).

Table 2. Statistical test
Source       Sum of squares                         d.f.         Mean square
Regression   SSR = $\hat{y}'\hat{y}$                R            MSR = SSR/R
Error        SSE = $(y - \hat{y})'(y - \hat{y})$    I − R − 1    MSE = SSE/(I − R − 1) = $\hat{\sigma}^2$

PLS provides a way to derive confidence intervals for the predicted y-variable since a new set of observations $x_{new}$ can be decomposed as $\hat{x}_{new} = \hat{t}P'$. The quantity $\hat{x}_{new}\beta$ is an estimable function ($\hat{x}_{new}H = \hat{x}_{new}$), and $\hat{y} = \hat{x}_{new}b = \hat{t}Q'$. The confidence intervals at significance level α for an individual y-response are given by:

$$\hat{y} \pm t_{I-R-1,\,\alpha/2}\,(\mathrm{MSE})^{1/2}\left(1 + \hat{t}\,(T'T)^{-1}\hat{t}'\right)^{1/2} \tag{14}$$

where T and MSE are the t-score matrix and mean squared error of the PLS analysis of the data upon which the PLS model was built, and $t_{I-R-1,\alpha/2}$ is the critical value of the Studentized variable with I − R − 1 degrees of freedom at significance level α/2. If one drops the (1) in the $(1 + \hat{t}(T'T)^{-1}\hat{t}')$ term, one gets the equation for confidence intervals of the expected value of a y-response around a set of observations ($\hat{x}_{new}$). Eq. (14) is general for any PLS study.

The motivation behind the above analysis was to develop a simple expression for approximate confidence intervals for the PLS predictions. In spite of this, the assumption that Z is not a function of the y-variable is incorrect. Phatak et al. [31] recognized this, and for the case of univariate y did a first order linear approximation of b to get improved confidence intervals for $\hat{y}$. Although this approach is more accurate than the zero order approximation used here, it is computationally much more time consuming.

Fig. 4 shows the on-line predictions with their 95% and 99% confidence intervals, for three of the five quality variables for the two new batches. The predictions for the 'normal' batch match well the actual final quality values of the product. For the 'bad' batch, the predictions capture its problem and stretch their values towards their actual final qualities. Quality variable two, which was poorly explained in the MPLS model, has the poorest predictions for the 'bad' batch.
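The quantities of Table 2 and the interval of Eq. (14) reduce to a few matrix operations. The sketch below is a minimal illustration with hypothetical function names and a hand-made toy score matrix; the Student t critical value is passed in by the caller (4.303 is the tabulated value for 2 degrees of freedom at α/2 = 0.025) so that only NumPy is needed.

```python
import numpy as np

def pls_anova(y, y_hat, R):
    """Table 2: MSR, MSE and the statistic MSR/MSE for a univariate,
    mean-centred y and its PLS fit y_hat with R components."""
    I = y.size
    ssr = float(y_hat @ y_hat)
    sse = float((y - y_hat) @ (y - y_hat))
    msr = ssr / R
    mse = sse / (I - R - 1)
    return msr, mse, msr / mse

def prediction_interval(y_hat_new, t_hat, T, mse, t_crit):
    """Eq. (14): approximate interval for an individual y-response."""
    leverage = t_hat @ np.linalg.inv(T.T @ T) @ t_hat
    half = t_crit * np.sqrt(mse * (1.0 + leverage))
    return y_hat_new - half, y_hat_new + half

# Toy model (assumption): I = 5 batches, R = 2 orthogonal score vectors.
T = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [0.0, 0.0]])
y = np.array([1.1, -0.9, 2.0, -2.1, -0.1])
q = np.linalg.inv(T.T @ T) @ T.T @ y          # regression of y on the scores
y_fit = T @ q
msr, mse, f_stat = pls_anova(y, y_fit, R=2)

t_new = np.array([1.0, 0.0])                  # scores of a new batch
lo, hi = prediction_interval(t_new @ q, t_new, T, mse, t_crit=4.303)
```

Dropping the `1.0 +` term in `prediction_interval` would give the interval for the expected value of the response instead of an individual response, as noted after Eq. (14).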

Fig. 4. On-line predictions, with their 95% and 99% confidence intervals (dashed and solid lines), for three of the five final product quality variables for the 'good' batch (left hand side plots) and for the 'bad' batch with the problem half-way through its operation (right hand side plots). The actual final product qualities are indicated by diamond marks. The PLS model for the 'bad' batch is not valid after time interval 105 (absence of confidence intervals) and its predictions are not generally trustworthy.

On the other hand, quality variable three, which was very well explained in the MPLS model, has very good predictions, and its final prediction is very close to the actual one for the 'bad' batch. Since the MPLS model is no longer valid for the 'bad' batch after time interval 105 (the SPE exceeds its control limits in Fig. 3), the predictions after this time interval are not trustworthy. This is another advantage of PLS with respect to other regression methods: it models both the x- and y-spaces to give good predictions, and it also provides a measure, through its residuals in the x-space, of how well the PLS model can be trusted. The confidence limits given by Eq. (14) will no longer be valid, and they have not been plotted beyond this time interval in Fig. 4. Although the predictions may not be accurate, the directions that the quality variables will take can be trusted in general, and this can help considerably in diagnosing the source of the abnormality.

6. MPCA or MPLS

The question that arises in monitoring batch processes is whether to use MPCA or MPLS. MPCA uses only the information about the process operational behavior (x-data), and its model describes how the on-line process measurements deviate from their average trajectories when the process operates in an 'in-control' state. As a consequence, it will flag any abnormality in the process measurements even though it may be irrelevant to the quality of the product. As an example, a batch-run may have a slightly different agitator power profile because of a deterioration in its agitator mechanism. This event will cause an alarm in the MPCA monitoring. If the agitator power is not correlated with the final product qualities, the MPLS monitoring may not detect this deterioration in the agitator. Therefore, which approach one uses will depend upon whether one is primarily interested in events that will probably affect product quality or in any type of abnormal behavior. In general, it may be beneficial to try to detect all process deteriorations and correct them before they lead to permanent malfunctions.

A difficulty with using MPLS to analyze and monitor batch processes is having a sufficient number of quality variables which describe adequately the product quality. Batch quality variables typically are measurements of physical properties of the product, or variables which indicate if the product will have acceptable operation in the next stage, and sometimes variables from customer feedback. For an effective PLS, the Y matrix should have as columns quality variables which are closely connected with the batch process, so as to be well correlated with the process measurements. Also, these quality measurements should span a wide range of product properties, because it is hard to believe that the whole batch operation can be reflected in a single quality measurement, or that one quality measurement can capture all the quality aspects of the final product. Another difficulty with the batch quality measurements is that they are usually susceptible to a significant amount of measurement error. In such cases, the uncertainty in the quality measurements can make the use of MPLS inappropriate.

When additional information about the initial conditions and set-up of the batch process is available, one may use a multi-block method based on MPCA or MPLS [20,24,27,32] to incorporate this information into the monitoring scheme. Such prior information can be organized in a new matrix which may have variables like feed quality measurements, amounts of initiator or emulsifier, preprocessing conditions such as preheat duration, initial position of the batch in the cleaning cycle, operator on duty, etc.

7. Conclusions

Batch monitoring methods based on MPCA have been extended by using MPLS. This extension allows one to utilize not only the historical data on the measured process variable trajectories, but also the final quality measurements at the end of each batch. In addition to monitoring the process variable space, MPLS gives on-line predictions for the final product qualities. Approximate confidence intervals have been developed for these predictions. The proposed monitoring schemes have shown, via a polymerization simulation, that they are able to detect an abnormality clearly and quickly. The methodology presented here is generic in that it is easily applied to almost any batch or semi-batch process, and provides a base for continuous improvement.

Sanchez and B. Chemom. ganaksoy. IFAC. 1311 A. 1971. J.W.W. In addition.J.F. its Applications. Technometrics.F.. Appl.R. [21] S. 14 (1992) 71. Piovoso. 41. (1993) 495. Syst. J.5. Intell. P. Process Contr. New [8] B. Birky and T.K. J. [3] P. 1971.L. New York. 30 (1991) 39.A. Stat. Maryland.R. [5] S. C. MacGregor. Odell. [2] G. Kowalski. Mason. P. [12] P. A. Ricker. Chem.R. 1982.. C.E. Qual. Searle. J. Smilde. Chemom. in that they can be [13] i: Geladi.. York. Hamielec and J. Syst.F. N. MacGregor. 1995. Chemom. Wold. Can. MacGregor. lserman. Do.T. 277 [ll] S. Thesis. Taylor and J. Geladi and B. MacGregor. ISA Trans. Lab... W.F. [27] J.. Lab. Koutoudi.S. 14 (1992) 341. Makro- quality product. Cockrum. Technol. control limits and work towards a more consistent [19] T. easily displayed and diagnose a fault. [20] P. Dahl.E. M. problem in their implementation.E. Suppl.F. submitted. Anal. 1 (1990) 41 1301 T. Chemom. Broadhead. Wold. Technometrics. Canada. ‘94. Kowalski. 14 (1990) [24] T. 1 (1987) 41. Smilde and D.. Department of Chemical Engineer- ing. Kowalski. Syst. Eng. [7] J. 40 (1994) 1361. Generalized Inverse of Matrices and [9] B. P..K. R.F. 1994. 34 (1992) 286. Nomikos.A. Chim. [lo] P. Ohman. and eliminate their cause and thereby shrink the 1.L. Chem. Faltin and N. Mitra. 5 (1991) 345. Skagerberg. Chem.. Rao and S. Nelson. AIChE J. Lab. MacGregor monitoring procedure is to detect faults. data reduction and the light computational require. 4 (1990) 29.E. J. Swanson and C.A. submitted. Tucker. Chemom. F. Veltkamp and B. Wangen and B.M. Nomikos and J.L. Multivariate Statistical Process Control of Batch Processes..J. [29] C. Chemom. [23] P. Kresta. Nomikos and J. McMaster University. Chim.. Hahn and M. . New York. Wiley. Doombos. MacGregor. Esbensen and J. Intell. MacGregor and C. 1251 N.F. 14 (1987) 35. Generalized Inverse Matrices. 37 (1995) Wiley-Interscience.M. J. Kowalski. [15] A. Acta. 713. MacGregor and T. [6] C. Jaeckle. MacGregor. J. 15 (1992) 143.A. 
24 (1992) 88 .. Tucker. Wold. Eng. Hoskuldsson. [4] G..R. MacGregor/Chemometrics and Intelligent Laboratory Systems 30 (1995) 97-108 ment. Lab.E. June 29-July them. Process Contr. Chemom. 7 (1989) 11. 20 (1984) 387. Nomikos.. diagnose and P.D. Baltimore. Marlin.F. J. [l] R. 185 (1986) dance with the SPC requirements. the [14] A. Boullion and P. Chemom.L.J. J. J. Matrix Algebra Useful for Statistics.R. 1181 K..F. lntell. Reilly and A. Lab. Young and R. Penlidis.O. D. Marsh and T. lntell. ]32] L. J.K. Acta. Tracy. Chemom. Comput. Chemom. mol. 20 (1978) 397. 40 (1994) 826.. Phatak. The proposed monitoring charts are in accor.C. Kiparissides and M. Miller. [28] S. Geladi.F. Control Conf. 69 (1991) 3. J. Heckler. Qual. Nomikos. Anal. Qua]. Syst.B. K.. Syst.108 P. Vander Wiel.R. Am. P.. Ph.D. AfChE J. Kourti. 10 (1985) 105. References [22] S..F. Technometrics. Automatica. Nomikos and J. Wise.. ments of the proposed methods do not impose any [16] E. 3 (1988) 3. 2 (1988) 211. McAvoy. Technol. Kiparissides. J. submitted. J. Wiley. Kosanovich.J. The objective of the [17] A. J. K. [26] P.E. lntell.