
IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING, VOL. 23, NO. 2, MAY 2010

Introducing a Unified PCA Algorithm for Model Size Reduction


Richard P. Good, Daniel Kost, and Gregory A. Cherry
Abstract—Principal component analysis (PCA) is a technique commonly used for fault detection and classification (FDC) in highly automated manufacturing. Because PCA model building and adaptation rely on eigenvalue decomposition of parameter covariance matrices, the computational effort scales cubically with the number of input variables. As PCA-based FDC applications monitor systems with more variables, or trace data with faster sampling rates, the size of the PCA problems can grow faster than the FDC system infrastructure will allow. This paper introduces an algorithm that greatly reduces the overall size of the PCA problem by breaking the analysis of a large number of variables into multiple analyses of smaller uncorrelated blocks of variables. Summary statistics from these subanalyses are then combined into results that are comparable to what is generated from the complete PCA of all variables together.

Index Terms—Combined index, computation time, fault detection, large scale systems, multivariate statistical process control (MSPC), principal component analysis (PCA), recursive PCA.

I. INTRODUCTION

Multivariate fault detection and classification (FDC) has been widely applied in semiconductor manufacturing to quickly identify when a process is behaving abnormally. These abnormalities often result from faulty measurements, misprocessed wafers, process drifts or trends, tool aging, and tool failures. Studies have been reported on the analysis of optical emission spectroscopy data [1], site-level metrology data [2], trace-level process equipment data [2]–[6], and end-of-line electrical test data [7]–[9].

Several trends in the semiconductor manufacturing industry have made the use of FDC increasingly critical. First, the growing size of wafers increases the cost of missing a process fault. A missed fault during the processing of a 300 mm wafer puts roughly twice as many die in jeopardy compared to a 200 mm wafer. This will become even more critical as the industry shifts to 450 mm wafers or reduces the size of each die. Second, process sampling is fast becoming a more common practice [10]–[12]. As fewer wafers are measured, it becomes more critical to capture disturbances at the tool level. By waiting until product wafers are measured, it is possible to have misprocessed hundreds of wafers. In addition, if FDC is
Manuscript received May 31, 2009; revised November 30, 2009. Current version published May 05, 2010. This work was supported by the Advanced Process Control Group, GLOBALFOUNDRIES, Austin, TX and Dresden, Germany. R. P. Good and G. A. Cherry are with GLOBALFOUNDRIES, Austin, TX, USA (e-mail: rick.good@globalfoundries.com; gregory.cherry@globalfoundries.com). D. Kost is with GLOBALFOUNDRIES, Dresden, Germany (e-mail: daniel.kost@globalfoundries.com). Digital Object Identifier 10.1109/TSM.2010.2041263

used on metrology tools, then waiting for univariate trends to signal a process drift puts too many wafers at risk. Multivariate FDC enables quicker identification of process faults by better utilizing the available metrology data. The third trend is the ever-increasing cost of manufacturing equipment. The most expensive process equipment currently costs upwards of $40 million [13]. To ensure a return on such a large investment, maximizing equipment uptime is essential. Relying on multivariate FDC to identify abnormal processing allows for less frequent preventative maintenance and a lower likelihood of catastrophic failure, which results in greater equipment utilization.

Principal component analysis (PCA) is a technique commonly used to perform FDC in semiconductor manufacturing. The PCA algorithm is well-suited for semiconductor process data because of its robustness to collinearity. Furthermore, PCA uses the existing correlation structure to identify process faults and reduce false alarms. A model is first built to characterize the correlation of the data. New measurements are then compared to the model, and, if a new measurement is significantly different from the historical data, the measurement is classified as a fault. As we will see in Section II, PCA loadings are calculated from the correlation matrix, the size of which scales quadratically with the number of variables, $m$. When the requirement exists to adjust to changes in correlation for new measurements, the overall size of the stored model scales with $m^2$, and the computation effort for loading updates scales with $m^3$ when using the rank-one modification approach [14]. As PCA-based FDC applications monitor systems with more and more variables, or trace data with faster sampling rates, the FDC system infrastructure is faced with a growing storage and computational burden that can be difficult to overcome.
This paper introduces an algorithm for breaking the PCA problem into multiple smaller problems, which greatly reduces the size of models and the computation time for model generation and adaptation. The idea of dividing a large PCA problem into two or more smaller ones has been investigated in the past. Wold et al. introduced the method of consensus PCA (CPCA), in which the complete set of process variables is split into blocks [15]. Upper-level modelling is performed to capture the relationships between blocks to generate super scores, while the relationships between variables of the same block are captured at the lower level. Hierarchical PCA (HPCA) has also been presented, which differs from CPCA only with respect to normalization [16], but it has been shown to have problems with convergence [17]. Subsequent research proved the equivalence between CPCA and regular PCA, such that block scores and loadings can be derived from a single PCA model constructed from all variables [17], [18].



A commonality between the multiblock methods of CPCA and HPCA is that the estimation of both levels of model parameters must be performed simultaneously, which is useful for capturing correlation that exists from one block to another. However, for applications in which block-to-block correlation is minimal, little is gained by modelling the crosstalk across blocks. In such cases, it is more efficient to deploy models independently for each block of variables, which can be done without compromising fault detection accuracy. The independent modeling approach has seen significant use for batch process monitoring, in which trace data can be split using either engineering knowledge [19], [20] or phase identification algorithms [21], [22]. Yet one thing that is missing in these approaches is the ability to combine the results from the individual models to provide monitoring statistics at the overall process level. The method presented in this paper allows for the efficient use of smaller independent models while providing a mechanism for rolling up results to higher levels on demand.

In the rest of the paper, Section II provides a review of the recursive PCA algorithm, and the unified PCA algorithm (UPCA) is introduced in Section III. In Section IV, we demonstrate the UPCA algorithm on electrical test data, and follow with some concluding remarks in Section V.

II. RECURSIVE PCA

In this section, we will briefly review the recursive PCA algorithm as well as three commonly used statistics for quantifying the severity of a fault.

A. PCA

With PCA, a zero mean and unit variance data matrix $X$ of $n$ samples (rows) and $m$ variables (columns) is first transformed into a new orthogonal basis that is aligned along the directions of largest variation. This defines the eigenvalue problem according to the sample covariance matrix

$$R = \frac{1}{n-1}X^T X \qquad (1)$$

$$R = V \Lambda V^T \qquad (2)$$

where the columns of $V$ are the eigenvectors (loadings) of the covariance matrix. The scores $T$, which are the orthogonal projections of $X$ in the new basis $V$, can be obtained by

$$T = XV \qquad (3)$$

Data compression and dimension reduction are obtained by decomposing $X$ into

$$X = \hat{X} + \tilde{X} \qquad (4)$$

where $\hat{X}$ and $\tilde{X}$ are the modeled and residual components, respectively. Furthermore, $\hat{X}$ and $\tilde{X}$ can be written as

$$\hat{X} = TP^T \qquad (5)$$

$$\tilde{X} = \tilde{T}\tilde{P}^T \qquad (6)$$

where $P$ contains the first $l$ eigenvectors (corresponding to the $l$ largest eigenvalues) of the correlation matrix, $R$, and $\tilde{P}$ contains the last $m-l$ eigenvectors of $R$. The matrices $P$ and $T = XP$ are the principal loading and score matrices, respectively.
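As a concrete illustration of (1)–(6), the following NumPy sketch builds the decomposition for a small synthetic data set; the function and variable names are ours, not the paper's.

```python
import numpy as np

def pca_model(X, l):
    """PCA of a zero-mean, unit-variance data matrix X (Eqs. (1)-(6)).

    Returns the principal loadings P, residual loadings P_res, the
    eigenvalues (descending), and the modeled/residual parts of X.
    """
    n = X.shape[0]
    R = X.T @ X / (n - 1)                    # sample correlation matrix, Eq. (1)
    eigvals, eigvecs = np.linalg.eigh(R)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    lam, V = eigvals[order], eigvecs[:, order]
    P, P_res = V[:, :l], V[:, l:]            # split modeled/residual basis
    T = X @ P                                # principal scores, Eq. (3)
    X_hat = T @ P.T                          # modeled component, Eq. (5)
    X_res = X - X_hat                        # residual component, Eq. (4)
    return P, P_res, lam, X_hat, X_res

# two correlated variables, one principal component (as in Fig. 1)
rng = np.random.default_rng(0)
t = rng.standard_normal(200)
X = np.column_stack([t, t + 0.1 * rng.standard_normal(200)])
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
P, P_res, lam, X_hat, X_res = pca_model(X, l=1)
```

Note that `eigh` exploits the symmetry of the correlation matrix; the descending reorder reproduces the usual convention of listing the largest eigenvalue first.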

Fig. 1. Decomposition of the example data matrix, $X$, into its modeled ($\hat{X}$) and unmodeled ($\tilde{X}$) components.

Likewise, the matrices $\tilde{P}$ and $\tilde{T} = X\tilde{P}$ are the residual loading and score matrices. Because the columns of both $P$ and $\tilde{P}$ are orthogonal,

$$P^T\tilde{P} = 0 \qquad (7)$$

$$\hat{X}^T\tilde{X} = 0 \qquad (8)$$

It is worth noting that the choice of the appropriate number of principal components, $l$, is critical when building a PCA model. Please refer to [23] and [24] for detailed discussions of criteria and methods for choosing the appropriate number of principal components. The main computational effort goes to the solution of the eigenvalue problem and the selection of the appropriate number of principal components, which increases with the number of variables monitored.

An example of PCA decomposition is provided in Fig. 1, where two correlated variables, $x_1$ and $x_2$, are plotted against each other. After scaling each of them to zero mean and unit variance, they are assembled into the data matrix, $X$, prior to the application of PCA. The eigenvector corresponding to the largest eigenvalue provides the first principal direction and is shown in Fig. 1 as a dashed line. We have chosen to use only this one principal component, while the second principal component is relegated to the residual space. The modeled and residual components can now be calculated for every observation in Fig. 1. The modeled component is the distance from the origin along the principal eigenvector. Likewise, the residual component is the distance from the origin along the residual eigenvector.

B. Recursive PCA

Two main aspects lead to the introduction of recursive PCA. First, the number of samples used for building models is not always sufficient for a representative estimation of the correlation structure between variables. When that is the case, it may be useful to deploy an immature model to provide immediate monitoring as soon as possible and then adapt that model as more data become available.
Second, due to time-varying behavior of some processes (such as equipment aging, sensor and process drifts, etc.), newer data are often more representative of the normal behavior of a process than older data [14].

In these cases, it is appropriate to adapt PCA models because these normal drifts and trends may otherwise be inaccurately identified as process faults. The PCA model can be adapted to a new unscaled data sample, $\bar{x}_k$, by updating the estimate of the correlation matrix and scaling parameters according to

$$R_k = \lambda R_{k-1} + (1-\lambda)\, x_k x_k^T \qquad (9)$$

where

$$x_k = \Sigma^{-1}(\bar{x}_k - b) \qquad (10)$$

is the new scaled measurement, $b$ is the vector of means, $\Sigma$ is a diagonal matrix containing the standard deviations of the variables, and $\lambda$ is a tuning parameter that governs the rate at which the correlation matrix is updated [2]. The closer $\lambda$ is to zero, the more quickly the model will adapt. Conversely, when $\lambda$ is close to unity, the model will adapt gradually. Subsequent to the correlation matrix update, the loadings would be regenerated by performing a singular value decomposition on the result.

C. PCA Performance Indices

When a PCA model is applied to a new observation, $x$, the two performance statistics that are normally considered are the squared prediction error (SPE) and Hotelling's $T^2$. The SPE indicates how much an observation deviates from the model and is defined as

$$\mathrm{SPE} = \tilde{x}^T\tilde{x} = x^T\left(I - PP^T\right)x \qquad (11)$$

Alternatively, Hotelling's $T^2$ indicates how much an observation deviates inside the model, and is calculated by

$$T^2 = x^T P \Lambda_l^{-1} P^T x \qquad (12)$$

where $\Lambda_l$ is a diagonal matrix containing the $l$ principal eigenvalues used in the PCA model. A process is considered normal if, as referenced in [25] and [4], respectively, both the SPE and $T^2$ statistics satisfy

$$\mathrm{SPE} \le \delta^2 \qquad (13)$$

and

$$T^2 \le \tau^2 = \chi^2_{l,\alpha} \qquad (14)$$

where $\chi^2_{l,\alpha}$ is the inverse of the chi-squared distribution function with $l$ degrees of freedom and a confidence interval of $\alpha$, and where the SPE limit $\delta^2$ can be computed from the residual eigenvalue sums $\theta_1 = \sum_{j=l+1}^{m}\lambda_j$ and $\theta_2 = \sum_{j=l+1}^{m}\lambda_j^2$ as $\delta^2 = (\theta_2/\theta_1)\,\chi^2_{\theta_1^2/\theta_2,\,\alpha}$ [4]. The SPE and $T^2$ limits are shown for the example data set in Fig. 2. Here we see that $T^2$ would identify deviations in the modeled direction only, while the SPE would identify deviations in the residual direction only. Taken collectively, the $T^2$ and SPE limits would form a box around the raw data.

Fig. 2. $T^2$ and SPE limits for the example data set.

As an alternative to looking at the SPE and $T^2$ separately, Yue and Qin [4] introduce a combined metric, $\varphi$, which is a sum of the SPE and $T^2$ metrics weighted against their control limits

$$\varphi = \frac{\mathrm{SPE}}{\delta^2} + \frac{T^2}{\tau^2} = x^T \Phi x \qquad (15)$$

where

$$\Phi = \frac{I - PP^T}{\delta^2} + \frac{P\Lambda_l^{-1}P^T}{\tau^2} \qquad (16)$$

Furthermore, Yue and Qin show that the distribution of $\varphi$ is approximately proportional to the $\chi^2$ distribution, yielding the limit

$$\varphi \le \zeta^2 = g\,\chi^2_{h,\alpha} \qquad (17)$$

where

$$g = \frac{\mathrm{tr}\!\left[(R\Phi)^2\right]}{\mathrm{tr}(R\Phi)} \qquad (18)$$

and

$$h = \frac{\left[\mathrm{tr}(R\Phi)\right]^2}{\mathrm{tr}\!\left[(R\Phi)^2\right]} \qquad (19)$$

The $\varphi$ limit is shown for the example data set in Fig. 3.

D. Recursive PCA Model Sizes

Although PCA is in essence a data compression algorithm, when using recursive PCA one must keep a record of the entire correlation matrix. This implies that the PCA model size increases quadratically with the number of variables. Although this is not a concern with small models, several trends in semiconductor manufacturing are creating a situation in which the PCA model sizes are becoming too large for FDC system infrastructures. First, process equipment data are unfolded to allow FDC to be applied on the trace-level sensor trajectories [26]. Because data are unfolded in the time direction, an increase in data collection frequency directly increases the number of variables in the PCA model. Second, we are seeing an increase in the number of variables being monitored. From a microprocessor
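The recursive update (9)–(10) and the indices (11), (12), and (15) can be sketched as follows, assuming the scaling parameters are held fixed for the single step shown; the function names and toy numbers are ours.

```python
import numpy as np
from scipy.stats import chi2

def recursive_update(R, b, sigma, x_raw, forget=0.99):
    """One step of the recursive correlation update, Eqs. (9)-(10).

    `forget` close to unity adapts gradually; close to zero, quickly.
    Adaptation of the mean b and scale sigma is omitted for brevity.
    """
    x = (x_raw - b) / sigma                            # scaled sample, Eq. (10)
    return forget * R + (1 - forget) * np.outer(x, x)  # Eq. (9)

def monitor(x, P, lam_pc, delta2, tau2):
    """SPE, T^2, and combined index phi for one scaled sample x."""
    t = P.T @ x                         # scores of the new sample
    x_res = x - P @ t                   # residual part of x
    spe = float(x_res @ x_res)          # Eq. (11)
    t2 = float(t @ (t / lam_pc))        # Eq. (12); Lambda_l is diagonal
    phi = spe / delta2 + t2 / tau2      # Eq. (15)
    return spe, t2, phi

# toy 2-variable model with one principal component
P = np.array([[1.0], [0.0]])
tau2 = chi2.ppf(0.99, df=1)                          # Eq. (14)
spe, t2, phi = monitor(np.array([0.5, 0.2]), P,
                       np.array([1.5]), delta2=1.0, tau2=tau2)
R_new = recursive_update(np.eye(2), np.zeros(2), np.ones(2),
                         np.array([1.0, 1.0]), forget=0.9)
```

A full implementation would also refresh the loadings after each update (the paper regenerates them by singular value decomposition of the updated correlation matrix).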


Fig. 3. $\varphi$ limits for the example data set.

Fig. 4. Correlation coefficients and blocking for the electrical test data.

manufacturing perspective, the industry has seen the introduction of multiple cores and additional levels of cache memory on the devices. This has dramatically increased the number of electrical parameters to monitor. In just the past couple of years, the number of parameters has increased approximately ten-fold, thus increasing the PCA model size by a factor of 100. Finally, we have seen an increase in the number of processes being monitored by PCA models. Although this does not affect the size of the models, the increase in the number of applications compounded with the increase in model sizes can drive the FDC system infrastructure to its limits.

III. A UNIFIED PCA ALGORITHM

When investigating sample covariance matrices, we often see that data are segregated into nearly uncorrelated blocks. Consider, for example, the correlation matrix of some electrical test data in Fig. 4. Here we see several blocks of variables with very little correlation between the blocks. Let us assume for the moment that the correlation between the blocks is negligible, such that the correlation matrix, $R$, can be written in block-diagonal form

$$R = \begin{bmatrix} R_1 & 0 & \cdots & 0 \\ 0 & R_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & R_B \end{bmatrix} \qquad (20)$$

It is widely known that the eigenvalues of this block-diagonal matrix are the union of the eigenvalues of the individual blocks $R_i$, for $i = 1, \ldots, B$. Furthermore, the corresponding eigenvectors of $R$ are the individual eigenvectors of the $R_i$ filled with zeros at the positions of the other blocks (see, for example, [27], [28]). It follows that, if we abandon the practice of ordering the eigenvectors by the magnitude of the eigenvalues, the loadings and scores matrices, $P$ and $T$, can be rewritten as

$$P = \begin{bmatrix} P_1 & 0 & \cdots & 0 \\ 0 & P_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & P_B \end{bmatrix} \qquad (21)$$

$$T = \begin{bmatrix} T_1 & T_2 & \cdots & T_B \end{bmatrix} \qquad (22)$$

Therefore, if the data are composed of uncorrelated blocks, and if we assign the appropriate number of principal components to each block, then it is possible to reduce the greater PCA model to a series of smaller PCA models. We term this method unified PCA (UPCA). One can think of two special cases. On one extreme, only one block is used, such that UPCA is identical to PCA. In this case, there is no performance loss and no model size reduction. On the other extreme, by assigning each variable to its own block, the number of nonzero elements in the covariance matrix is reduced from $m^2$ to $m$. Assuming that only nonzero values are stored in the model, the model size would be at a minimum, and the FDC approach would be reduced to a $\chi^2$ test on $m$ independent variables. In this sense, the proposed method can be considered a generalization of the PCA and $\chi^2$ approaches.

Calculating the overall fault detection indices and their limits is straightforward, since we need only to operate on, at most, five block scalar values (SPE, $T^2$, $l$, $\theta_1$, and $\theta_2$). Since $R$ is now block diagonal, SPE and $T^2$ can be calculated by summing the contributions from the individual blocks

$$\mathrm{SPE} = \sum_{i=1}^{B}\mathrm{SPE}_i \qquad (23)$$

$$T^2 = \sum_{i=1}^{B} T_i^2 \qquad (24)$$

Next, when calculating the SPE limit, we see that we are operating on two scalar values, $\theta_1 = \sum_{j=l+1}^{m}\lambda_j$ and $\theta_2 = \sum_{j=l+1}^{m}\lambda_j^2$. Again, using the block-diagonal form of $R$, $\theta_1$ and $\theta_2$ can be calculated by summing over the individual blocks

$$\theta_1 = \sum_{i=1}^{B}\theta_{1,i} \qquad (25)$$

$$\theta_2 = \sum_{i=1}^{B}\theta_{2,i} \qquad (26)$$

The SPE limit, $\delta^2$, then follows from (13)

$$\delta^2 = \frac{\theta_2}{\theta_1}\,\chi^2_{\theta_1^2/\theta_2,\,\alpha} \qquad (27)$$
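The block-diagonal properties behind (20)–(23) are easy to verify numerically. A small sketch with two hypothetical 2-variable blocks (the data and names are ours):

```python
import numpy as np

# Per Eqs. (20)-(22), the spectrum of a block-diagonal correlation matrix
# is the union of the block spectra, so SPE computed from the full model
# equals the sum of the per-block SPEs (Eq. (23)).
R1 = np.array([[1.0, 0.8], [0.8, 1.0]])
R2 = np.array([[1.0, -0.5], [-0.5, 1.0]])
R = np.block([[R1, np.zeros((2, 2))], [np.zeros((2, 2)), R2]])

ev_union = np.sort(np.concatenate([np.linalg.eigvalsh(R1),
                                   np.linalg.eigvalsh(R2)]))
ev_full = np.sort(np.linalg.eigvalsh(R))

def top_loading(Rb):
    """Eigenvector of the largest eigenvalue (one PC per block)."""
    _, V = np.linalg.eigh(Rb)          # eigh sorts eigenvalues ascending
    return V[:, [-1]]

P1, P2 = top_loading(R1), top_loading(R2)
P = np.block([[P1, np.zeros((2, 1))],
              [np.zeros((2, 1)), P2]])  # block-diagonal loadings, Eq. (21)

x = np.array([1.0, -0.2, 0.5, 0.3])    # one scaled observation

def spe(xb, Pb):
    """SPE for orthonormal loadings: ||x||^2 - ||P^T x||^2."""
    t = Pb.T @ xb
    return float(xb @ xb - t @ t)

spe_full = spe(x, P)
spe_blocks = spe(x[:2], P1) + spe(x[2:], P2)  # Eq. (23)
```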


TABLE I UPCA ROLL-UP OF SUMMARY STATISTICS FROM SMALLER PCA BLOCKS TO FORM A LARGER PCA MODEL

and the $T^2$ limit, $\tau^2$, can be calculated by summing the number of principal components used in each block, $l_i$,

$$\tau^2 = \chi^2_{l,\alpha}, \qquad l = \sum_{i=1}^{B} l_i \qquad (28)$$

If we wish to use the combined index $\varphi$, then SPE, $T^2$, $\delta^2$, and $\tau^2$ can be used directly with (15) through (19), such that the fault limit is

$$\zeta^2 = g\,\chi^2_{h,\alpha} \qquad (29)$$

An example of the UPCA algorithm is shown in Table I. Here we see the summary statistics for a UPCA model with six uncorrelated blocks. Each block has its own PCA model, and the summary statistics ($l_i$, $\theta_{1,i}$, and $\theta_{2,i}$) are shown for each model. Also shown in Table I are the performance metrics (SPE and $T^2$) for an observation for each of the six blocks. We operate on these five summary statistics to calculate the SPE limits, the $T^2$ limits, and the combined metric $\varphi$ and its limit for each of the blocks. Then, to combine the blocks into a single PCA model (again, assuming correlation between blocks does not exist or can be neglected), the same operations are applied to the sum of the five summary statistics.

A. Fault Diagnosis With UPCA

If a fault is observed, it is important to quickly determine the root cause of the fault so that engineering intervention can return the tool to its normal operating state. Specifically, we are interested in rapidly identifying the variable(s) that contribute to the fault [2]. The UPCA algorithm lends itself well to such a drill-down approach. In the example in Table I, we see that the combined statistic $\varphi$ is greater than its limit, indicating that a fault has likely occurred. After establishing that a fault has been observed, the next step would be to look at the contributions of the individual blocks. In this case, we see that Block 6 has the largest contribution. At this point the engineer would look at the contributions of the individual variables within the block. The contribution to the SPE from a single variable is

$$\mathrm{SPE}_j = \tilde{x}_j^2 \qquad (30)$$

The contribution to the $T^2$ statistic according to Qin et al. [18] is

$$T_j^2 = \left(\xi_j^T D^{1/2} x\right)^2, \qquad D = P\Lambda_l^{-1}P^T \qquad (31)$$

and the contribution to the combined index is

$$\varphi_j = \left(\xi_j^T \Phi^{1/2} x\right)^2 \qquad (32)$$

where

$$\Phi = \frac{I - PP^T}{\delta^2} + \frac{D}{\tau^2} \qquad (33)$$

In the above equations, $\xi_j$ denotes the $j$th column of the identity matrix, and $x_j$ and $\tilde{x}_j$ denote the $j$th elements of $x$ and $\tilde{x}$, respectively. Limits for the variable contributions could be determined using (27)–(29), by considering each block to be a single variable.

B. Process Characterization With UPCA

We discussed in the previous section how it is often desirable to drill down to find the root cause of a process fault. However, often the opposite is also true, namely, that we wish to roll up process data to characterize larger groups of data. For example, we may wish to group multiple wafers into lots, multiple lots into products, multiple chambers into tools, or multiple tools into tool groups. In this section, we discuss using the UPCA algorithm to generate performance indices for groups of process runs. One of the fundamental assumptions of the PCA algorithm is that there is no autocorrelation between process runs [29]. If this assumption holds, then one can apply the UPCA algorithm to combine recursive PCA models (or UPCA models) from multiple process runs. One should contrast this with the prospect of unfolding data in the wafer direction to build a lot-level PCA model. In this case, the PCA model for a 25-wafer lot would be 625 times as large. Because it may be unrealistic to unfold data and build a CPCA or HPCA model on an entire lot's worth of variables, it is reasonable with UPCA to combine the results of PCA as applied to single wafers to generate performance metrics for a lot. Lots can then be combined to characterize a product. Using such an approach, a process engineer would first start by looking at a trend chart of lot-level SPE, $T^2$, or $\varphi$. If a lot's performance index exceeds its limit, the engineer would drill down to the wafer level. The wafer-level contributions would then be used to identify the faulty wafer. This pattern is continued to the PCA block and then to the parameter level until the root cause is identified.

C. Some Practical Considerations With UPCA

The use of UPCA has advantages and disadvantages. The obvious disadvantage is that the assumption of interblock orthogonality is only an approximation. Correlation between the blocks will cause the $T^2$ (and therefore $\varphi$) limits to be smaller than what we would expect from a PCA model that captures the full covariance. Likewise, if this correlation structure is ignored, disruptions in the correlation structure between blocks are not captured by the SPE index, so it is possible to miss process faults. More interblock correlation causes the UPCA approximation to


Fig. 5. Fault detection for PCA and UPCA after removing the correlation between blocks.

Fig. 6. Fault detection for PCA and UPCA when correlation exists between blocks.

be less accurate. It follows that an FDC engineer must consider a tradeoff between model size and model performance. UPCA also has the advantage of using process knowledge to declare that blocks are orthogonal. Because the PCA models are built with a finite data set, the models often identify relationships that do not actually exist. UPCA allows process knowledge to disallow such false relationships. Finally, UPCA models can be constructed from the PCA blocks in situ. If engineers observe that certain PCA blocks are consistently causing faults in the performance indices without a real impact to the product, it is possible to simply remove these blocks from UPCA without having to rebuild a PCA model. Furthermore, it is possible to rebuild a single PCA block without having to rebuild the entire model.

IV. SEMICONDUCTOR CASE STUDY

In this section, we apply the UPCA algorithm to wafer electrical test (WET) data. In this case, process engineers have grouped 484 variables into fifteen blocks where process knowledge has led them to believe that there should be little or no interaction between the blocks. It is worth noting here that one can imagine a method to automatically group blocks directly from the correlation structure. Although the grouping may not be optimal in the sense of creating orthogonal blocks, we have chosen to use product engineering's blocks because they capture the most physical meaning when doing root cause analysis. The correlation matrix and the blocking for this case study are shown in Fig. 4. The case study looks at a total of 800 wafers, where the first 500 wafers are used for model building and the final 300 wafers are used to test the model.

A. Comparing PCA and UPCA

We first consider the scenario in which there is truly no correlation between the blocks. To achieve this, we have artificially set the off-block-diagonal components of the correlation matrix to zero.
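The automatic grouping imagined above could be sketched as a connected-components pass over a thresholded correlation graph. This is purely illustrative: the paper does not endorse a specific grouping algorithm, and the threshold value is our assumption.

```python
import numpy as np

def correlation_blocks(R, threshold=0.3):
    """Group variables into nearly uncorrelated blocks (illustrative only).

    Variables i and j join the same block when |R[i, j]| exceeds the
    threshold; blocks are then the connected components of that graph.
    """
    m = R.shape[0]
    adj = np.abs(R) > threshold
    seen, blocks = set(), []
    for start in range(m):
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:                      # depth-first search from `start`
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            comp.append(v)
            stack.extend(j for j in range(m) if adj[v, j] and j not in seen)
        blocks.append(sorted(comp))
    return blocks
```

In practice such a grouping would compete with the engineering-knowledge blocking used in the case study, which the authors prefer for its physical interpretability.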
The PCA and UPCA algorithms are applied to this correlation matrix, and the $\varphi$ performance index is plotted in Fig. 5.

The results demonstrate that, in the absence of interblock correlation, PCA and UPCA are identical. However, by using UPCA, the model size has been reduced by 89.4%, and the model was built (including eigenvalue decomposition, cross validation, and missing value reconstruction) 26 times faster. Returning now to the correlation matrix illustrated in Fig. 4, we see that a small amount of correlation exists between the blocks. As such, the UPCA algorithm is only an approximation. Fig. 6 compares the UPCA and PCA algorithms when correlation exists between the blocks. We see that, even when correlation exists between the blocks, UPCA is a good approximation of PCA. Both methods identify most of the same wafers as being faulty, including a large disturbance at Wafer 250. There are only a select few smaller disturbances that cross the control limit for UPCA and not PCA. These can be considered false alarms for UPCA, and could be avoided through a slight adjustment to the confidence interval, $\alpha$. As before, the UPCA model size is reduced by 89.4% and the model was built 26 times faster.

As mentioned in Section III, one extreme of UPCA is when every parameter is assigned to its own block. In this case, UPCA assumes that there is no correlation between variables, and the approach is identical to simply applying a $\chi^2$ test to the data. The drawback of this approach is that ignoring the correlation causes additional false alarms in the model space and misses faults in the residual SPE space. This is illustrated in Fig. 7. The observations inside the PCA limits but outside the $\chi^2$ limits would be false alarms. Observations inside the $\chi^2$ limits but outside the PCA limits would be missed faults. Returning now to the case study, we can compare the performance of PCA with this application of UPCA. This comparison is shown in Fig. 8.
Here we see that, although the model size is reduced by 99.7%, by completely ignoring all correlation we greatly increase the number of false alarms and we miss a fault at Wafer 180. Clearly such an approach is inappropriate for monitoring a process, but it illustrates that a key contribution of the UPCA algorithm is that an FDC engineer can make a tradeoff between model performance and model size.
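The $\chi^2$-test extreme can be sketched as follows; the data are synthetic and the confidence level is an assumption, not the paper's setting.

```python
import numpy as np
from scipy.stats import chi2

# Extreme case of UPCA: every standardized variable is its own block, so
# all correlation is ignored and T^2 collapses to sum(x_j^2), a plain
# chi-squared test with m degrees of freedom.
rng = np.random.default_rng(7)
m, alpha = 484, 0.01                   # 484 variables, as in the case study
limit = chi2.ppf(1 - alpha, df=m)      # chi-squared control limit

x = rng.standard_normal(m)             # an in-control scaled wafer
t2_total = float(x @ x)                # per-variable T^2 values, summed

x_fault = x.copy()
x_fault[:10] += 8.0                    # a gross shift on ten parameters
t2_fault = float(x_fault @ x_fault)    # exceeds the limit for this fault
```

A shift that only disturbs the correlation structure, rather than the individual variances, would not move `t2_fault` at all, which is exactly the missed-fault mode shown in Fig. 7.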


Fig. 7. Comparison of the $\varphi$ limits for PCA and the $\chi^2$ limits for UPCA when correlation exists between two variables.

Fig. 10. Fault diagnosis using contributions from the individual variables in Block 10.

TABLE II UPCA ROLL-UP OF SUMMARY STATISTICS FROM SEVERAL LOTS TO GET OVERALL STATISTICS FOR THE PRODUCT

Fig. 8. Fault detection for PCA and UPCA when correlation exists between blocks, with UPCA results based on the $\chi^2$ test.

B. Root Cause Analysis

Both PCA and UPCA show a sizeable fault at Wafer 250. We start to identify the root cause of the fault by looking at the contributions of the 15 individual PCA models. This is shown in Fig. 9, where each of the contributions is scaled by its limit. We see a clear signal at Block 10, where the block has exceeded its limit by a factor of nine. After identifying this block, we can now drill deeper into Block 10 to see the contributions from the individual variables. This is shown in Fig. 10. Here we see that ten individual variables in the block contribute to the fault. These data are then used by engineers to determine the severity of the fault and whether the wafer should be scrapped to prevent further unnecessary processing.

Fig. 9. Fault diagnosis using contributions from the fifteen PCA models.


C. Lot-Level Summaries

The data used to train and test the model were processed in 25-wafer lots. We can now use the UPCA algorithm to combine the individual 25 wafers' results (25 × 484 variables) into lots to mimic the results as if a single PCA model with 12 100 variables were applied to all of the wafers of the lot simultaneously. Returning to our case study, the 300 test wafers are divided into 12 lots. The necessary summary statistics are shown in Table II, which shows that two lots are flagged as being over the $\varphi$ limit. Drilling down into Lot 10, we would see the contributions for each of the wafers in the lot (including Wafer 250, with a known fault). The investigation would continue all the way to the parameter level. Also shown in Table II is a summary for the twelve lots. This value could be used to summarize the health of an entire product for the time period in question. In this case, we see that, even though we have a handful of faulty wafers, the product as a whole has not exceeded the fault criterion.

V. CONCLUSION

In this paper, we introduce a new algorithm, termed unified PCA, that is used to combine multiple PCA models into a larger model. We show that if the variables in the individual model blocks are uncorrelated, then UPCA provides performance identical to PCA but with dramatically smaller model sizes. In practice, correlation between the blocks exists, and therefore UPCA is only an approximation of PCA. As such, a process engineer can tune UPCA to strike a balance between model size and fault detection accuracy. Although PCA has the ability to drill down for fast root cause identification, UPCA introduces the capability of rolling up summary statistics. If PCA (or UPCA) models are created at the wafer level, then UPCA can be used to combine the wafers into lots. Likewise, lots can be combined to monitor entire products.
Finally, several additional advantages of the algorithm are discussed, including the ability to create and modify models in situ, and the ability to use process knowledge to disallow insignificant correlation between unrelated variables. As a final note, UPCA could greatly benefit from a method to automatically group variables. The authors have investigated a handful of methods but, to date, have found no satisfactory approach. As such, this remains an open topic and warrants further investigation.

ACKNOWLEDGMENT

The authors gratefully acknowledge D. Kadosh, K. Chamness, and B. Harris for implementing the Test Parameter Analysis application in GLOBALFOUNDRIES Fab 1 and Spansion's Fab 25.

REFERENCES
[1] H. Yue, S. Qin, R. Markle, C. Nauert, and M. Gatto, "Fault detection of plasma etchers using optical emission spectra," IEEE Trans. Semicond. Manuf., vol. 13, no. 3, pp. 374–385, 2000.
[2] G. Cherry and S. Qin, "Multiblock principal component analysis based on a combined index for semiconductor fault detection and diagnosis," IEEE Trans. Semicond. Manuf., vol. 19, no. 2, pp. 159–172, 2006.

[3] B. Wise, N. Gallagher, S. Butler, J. D. White, and G. Barna, "A comparison of principal component analysis, multiway principal component analysis, trilinear decomposition and parallel factor analysis for fault detection in a semiconductor etch process," J. Chemometr., vol. 13, pp. 379–396, 1999.
[4] H. Yue and S. Qin, "Reconstruction based fault detection using a combined index," Ind. Eng. Chem. Res., vol. 40, no. 20, pp. 4403–4414, 2001.
[5] H. Yue and M. Tomoyasu, "Weighted principal component analysis and its applications to improve FDC performance," in Proc. 43rd IEEE Conf. Decision Contr., Atlantis, Paradise Island, Bahamas, 2004, pp. 4262–4367.
[6] Q. He and J. Wang, "Fault detection using the k-nearest neighbor rule for semiconductor manufacturing processes," IEEE Trans. Semicond. Manuf., vol. 20, no. 4, pp. 345–354, 2007.
[7] L. Yan, "A PCA-based PCM data analyzing method for diagnosing process failures," IEEE Trans. Semicond. Manuf., vol. 19, no. 4, pp. 404–410, 2006.
[8] K. Skinner, D. Montgomery, G. Runger, J. Fowler, D. McCarville, T. Rhoads, and J. Stanley, "Multivariate statistical methods for modeling and analysis of wafer probe test data," IEEE Trans. Semicond. Manuf., vol. 15, no. 4, pp. 523–530, 2002.
[9] K. Chamness, "Multivariate Fault Detection and Visualization in the Semiconductor Industry," Ph.D. dissertation, Univ. Texas, Austin, 2006.
[10] A. Holfeld, R. Barlovic, and R. Good, "A fab-wide APC sampling application," IEEE Trans. Semicond. Manuf., vol. 20, no. 4, pp. 393–399, 2007.
[11] R. Good and M. Purdy, "An MILP approach to wafer sampling and selection," IEEE Trans. Semicond. Manuf., vol. 20, no. 4, pp. 400–407, 2007.
[12] M. Purdy, K. Lensing, and C. Nicksic, "Method for efficiently handling metrology queues," in Proc. Int. Symp. Semicond. Manuf., San Jose, CA, 2005, pp. 71–74.
[13] M. LaPedus, "Lithography vendors prep for the next round: Rival immersion scanners roll for 45-nm node," EE Times, Jul. [Online]. Available: http://www.eetimes.com/showArticle.jhtml?articleID=190300855
[14] W. Li, H. Yue, S. Valle, and J. Qin, "Recursive PCA for adaptive process monitoring," J. Proc. Cont., vol. 10, pp. 471–486, 2000.
[15] S. Wold, S. Hellberg, T. Lundstedt, M. Sjöström, and H. Wold, in Proc. Symp. PLS Model Building: Theory and Applications, 1987.
[16] S. Wold, N. Kettaneh, and K. Tjessem, "Hierarchical multiblock PLS and PC models for easier model interpretation and as an alternative to variable selection," J. Chemometr., vol. 10, pp. 463–482, 1996.
[17] J. A. Westerhuis, T. Kourti, and J. MacGregor, "Analysis of multiblock and hierarchical PCA and PLS models," J. Chemometr., vol. 12, pp. 301–321, 1998.
[18] S. Qin, S. Valle, and M. Piovoso, "On unifying multi-block analysis with applications to decentralized process monitoring," J. Chemometr., vol. 15, pp. 715–742, 2001.
[19] K. Kosanovich, K. Dahl, and M. Piovoso, "Improved process understanding using multiway principal component analysis," Ind. Eng. Chem. Res., vol. 35, pp. 138138, 1996.
[20] C. Undey and A. Cinar, "Statistical monitoring of multistage, multiphase batch processes," IEEE Control Syst. Mag., vol. 22, no. 5, pp. 40–52, 2002.
[21] N. Lu, F. Gao, and F. Wang, "Sub-PCA modeling and on-line monitoring strategy for batch processes," AIChE J., vol. 50, no. 1, pp. 255–259, 2004.
[22] J. Camacho and J. Picó, "Multi-phase principal component analysis for batch processes modelling," Chemometr. Intell. Lab. Syst., vol. 81, pp. 127–136, 2006.
[23] S. Wold, "Cross validatory estimation of the number of components in factor and principal component analysis," Technometrics, vol. 20, no. 4, pp. 397–406, Nov. 1978.
[24] S. Valle, W. Li, and S. Qin, "Selection of the number of principal components: A new criterion with comparison to existing methods," Ind. Eng. Chem. Res., vol. 38, no. 11, pp. 4389–4401, 1999.
[25] J. Jackson and G. Mudholkar, "Control procedures for residuals associated with principal component analysis," Technometrics, vol. 21, no. 3, pp. 341–349, Aug. 1979.
[26] S. Wold, P. Geladi, K. Esbensen, and J. Ohman, "Multi-way principal components and PLS analysis," J. Chemometr., vol. 1, pp. 41–56, 1987.
[27] U. von Luxburg, "A tutorial on spectral clustering," Statist. Comput., vol. 17, no. 4, pp. 395–416, 2007.
[28] L. Hogben, "Linear algebra," in Ser. Discrete Mathematics and Its Applications. London, U.K.: Chapman and Hall, 2006.
[29] S. Wold, K. Esbensen, and P. Geladi, "Principal component analysis," Chemometr. Intell. Lab. Syst., vol. 2, pp. 37–52, 1987.


Richard P. Good received the B.S. degree in chemical engineering from the Georgia Institute of Technology, Atlanta, in 2000, and the M.S. and Ph.D. degrees in chemical engineering from the University of Texas at Austin in 2002 and 2004, respectively. He is currently a Member of the Technical Staff in the Advanced Process Control group, GLOBALFOUNDRIES, Austin, TX. His research interests include multivariate run-to-run control and fault detection, wafer sampling and selection, supervisory electrical parameter control, and yield prediction.

Gregory A. Cherry received the B.S. degree in chemical engineering from the University of Maryland, College Park, in 2000, and the M.S. and Ph.D. degrees in chemical engineering from the University of Texas, Austin, in 2002 and 2006, respectively. He is currently a Member of the Technical Staff with GLOBALFOUNDRIES, Austin, TX. His research interests include fault detection and diagnosis, as applied to semiconductor processes. Prior to joining GLOBALFOUNDRIES, he performed internships with Degussa-Hüls Corporation, Mobile, AL, and the Army Research Laboratory, Aberdeen Proving Ground, MD.

Daniel Kost received the diploma and Ph.D. degrees in physics from T.U. Dresden, Germany, in 2003 and 2007, respectively. He is currently a Software and Application Engineer in the Advanced Process Control group, GLOBALFOUNDRIES, Dresden, Germany. Prior to joining GLOBALFOUNDRIES, he performed internships with the James R. McDonald Laboratory and Kansas State University. He also worked as a scientist with the Dresden-Rossendorf Research Center, Dresden, in the fields of ion-solid interaction, plasma physics, and highly charged ion physics. His research interests include multivariate fault detection, yield prediction, yield-loss classification, and multivariate fault detection on wafer electrical test data, in addition to physics-related topics.