
IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 7, NO. 3, JULY 2022

Convolutional Autoencoder and Transfer Learning for Automatic Virtual Metrology

Yu-Ming Hsieh, Member, IEEE, Tan-Ju Wang, Chin-Yi Lin, Yueh-Feng Tsai, and Fan-Tien Cheng, Fellow, IEEE

Abstract—To ensure stable processing and high-yield production, high-tech factories (e.g., semiconductor, TFT-LCD) demand total inspection of product quality. Generally speaking, sampling inspection measures only a few samples and comes with metrology delay; thus, it usually cannot achieve the goal of real-time and online total inspection. Automatic Virtual Metrology (AVM) was developed to tackle this problem: it collects data from the process tools and applies a prediction model to conjecture virtual metrology (VM) values, thereby realizing online and real-time total inspection. With the advancement of technology, processes become more and more precise, and better accuracy of VM value prediction is demanded. The CNN-based AVM scheme (denoted AVMCNN) can not only enhance the accuracy of the original AVM prediction but also perform better on extreme values. Nevertheless, two advanced capabilities need to be addressed for its practical application: 1) an effective initial-model-creation approach with insufficient metrology data; and 2) an intelligent self-learning capability for online model refreshing. To possess these two advanced capabilities, the Advanced AVMCNN System based on the convolutional autoencoder (CAE) and transfer learning (TL) is proposed in this work. It is verified that the Advanced AVMCNN System is more feasible for onsite applications on actual production lines.

Index Terms—Automatic virtual metrology (AVM), convolutional neural network (CNN), convolutional autoencoder (CAE), transfer learning (TL).

I. INTRODUCTION

STABLE processing and high-yield production are the continuously pursued goals of the manufacturing industries, and offline sampling inspection is one of the most commonly adopted methods for achieving them. However, this method can only assess the quality of the sampled workpieces, and there is a waiting time for acquiring the metrology values after the manufacturing processing is completed. The metrology delay of the offline sampling method leads to failure to monitor product quality in real time, and there will be a loss once production-process shifts or manufacturing-tool drifts occur while waiting for metrology. A novel idea to solve this problem is to adopt virtual metrology (VM) [1], which can convert sampling inspections with metrology delay into real-time and online total inspections.

Hirai and Kano [2] proposed locally weighted partial least squares (LW-PLS) for VM. Their experimental results indicated that the method achieves higher prediction accuracy than a sequentially updated model and an artificial neural network in the dry etching process of semiconductor manufacturing. Wan and McLoone [3] proposed the use of Gaussian process regression (GPR) models in VM-enabled run-to-run control. Their experimental results showed that, when applied to the chemical mechanical polishing process, the models deliver better control performance than models that do not take prediction reliability into account. Cheng et al. proposed Automatic Virtual Metrology (AVM) [4], which comes with a data quality index, a reliance index, a global similarity index, and the so-called Dual-Phase algorithm for online self-learning capability; together these enable AVM's practical application on production lines. This paper focuses on augmenting the AVM method to address the issues of insufficient metrology data and online self-learning.

However, with the advancement of manufacturing technologies (such as in semiconductors), manufacturing processes become more precise, and the demand for VM accuracy gets higher. Back-propagation neural networks (BPNN), the current AVM prediction algorithm, may not fulfill this requirement. In view of this, Hsieh et al. proposed the AVM server based on convolutional neural networks (CNN) [5], i.e., AVMCNN, as shown in Fig. 1. By adjusting the CNN structure and developing the Automated Data Alignment Scheme (ADAS), the application of CNN to VM becomes feasible. The experimental results presented in [5] indicate that the overall prediction accuracy is improved by 15% and that the performance on extreme values, which are far from the average but may still be within spec, improves as well; see the 13th sample in Fig. 6 for an extreme-value example. In this way, broader applications of VM technology to ever more precise manufacturing processes can be realized. Yet two issues need to be addressed prior to the practical application of this approach.

1) Insufficient Metrology Data for Model Creation: Generally speaking, deep-learning-based algorithms require a large amount of data to learn the comprehensive domain characteristics. However, it is very time-consuming to collect a huge amount of paired samples (with both the metrology data and their associated process data). Therefore, developing a real-time prediction scheme that requires only a small amount of paired samples for initial modeling is crucial for deciding whether AVMCNN can be applied to production lines.

Manuscript received 23 February 2022; accepted 19 June 2022. Date of publication 30 June 2022; date of current version 12 July 2022. This letter was recommended for publication by Associate Editor W. Guo and Editor J. Yi upon evaluation of the reviewers' comments. This work was supported in part by the "Intelligent Manufacturing Research Center" (iMRC) from The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan and in part by the Ministry of Science and Technology of Taiwan, R.O.C., under Contracts MOST 111-2221-E-006-200, 110-2218-E-006-027, and 110-2923-E-006-010-MY3. (Corresponding author: Fan-Tien Cheng.)

The authors are with the Institute of Manufacturing Information and Systems, National Cheng Kung University, Tainan 701, Taiwan (e-mail: johnniewalk@imrc.ncku.edu.tw; drwang@imrc.ncku.edu.tw; b10021001@gmail.com; tsaiyeifeng@imrc.ncku.edu.tw; chengft@mail.ncku.edu.tw).

Digital Object Identifier 10.1109/LRA.2022.3187617

2377-3766 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: National Cheng Kung Univ.. Downloaded on July 14,2022 at 00:06:59 UTC from IEEE Xplore. Restrictions apply.

Fig. 1. Advanced AVMCNN System Adopting CAE and TL.

Fig. 2. Networks of Convolutional Autoencoder (CAE).

2) Online Self-Learning Capability: The original AVM system with BPNN (denoted AVMBPNN) conducts online tuning/re-training by utilizing the Dual-Phase algorithm [4] to sustain prediction accuracy under varying process status. Therefore, AVMCNN, which adopts CNN, should also possess the online tuning/re-training capability to secure prediction accuracy.

The technologies of the autoencoder (AE) [6] can extract the characteristics of the process data, establish a pre-trained model, and then fine-tune the model with a small amount of paired samples. Transfer learning (TL) [7] can be applied to reduce the amount of training data and shorten the training time of the CNN model. Therefore, the two issues mentioned above may be resolved by applying AE and TL. A survey of AE and TL is presented below.

Choi and Jeong [6] used AE and clipping fusion regularization for feature extraction, and then used methods such as the least absolute shrinkage and selection operator, support vector machine, and decision tree to perform VM for the critical dimension (CD) in the etching process. It was shown that AE can indeed improve the accuracy of the prediction model in comparison to other feature extraction methods. Yu and Liu [8] proposed a novel deep neural network based on two-dimensional principal component analysis and the convolutional autoencoder for wafer map pattern recognition, which is superior to other well-known CNNs such as GoogLeNet and AlexNet. Li et al. [9] and Chiu et al. [10] successively proposed a new fiber Bragg grating adjustment system, which first utilizes AE to pre-train on a large number of unlabeled data and is then connected to a CNN architecture for fine-tuning with a small amount of paired samples, smoothly solving the problem of spectral overlap.

Wang et al. [11] used data visualization to convert one-dimensional flow data into two-dimensional images, and utilized a residual convolutional neural network to adopt TL for abnormality detection in industrial control systems. Compared to other algorithms such as SVM or the generative adversarial network, CNN shortens the training time and increases the accuracy more efficiently after adopting TL. Zhu et al. [12] applied maximum mean discrepancy to a CNN TL architecture for fault diagnosis of vibration signals, where it could accurately distinguish the fault types. Imoto et al. [13] used a TL-based CNN for automated fault classification on limited wafer defect samples with extremely high accuracy. Liu et al. [14] proposed a two-stage multi-error prediction model based on the temporal convolutional network and domain-adversarial training of neural networks TL, which was applied to the ion mill etching process to accurately predict the failure time. By fine-tuning the pre-trained model, the accuracy and stability of fault detection on other, less frequent failure modes can be enhanced.

As mentioned above, CAE can extract the characteristics of the process data, establish a pre-trained model, and then fine-tune the model with a small amount of paired samples. Also, TL-based CNN applications can not only effectively shorten the model training time but also improve the prediction accuracy.


However, the adoption of CAE and TL technologies in AVMCNN still requires further research. The main contribution of this paper is the so-called Advanced AVMCNN System, which replaces AVMCNN [5] in the conjecture model for better performance by adopting CAE and TL, as shown in Fig. 1. The Advanced AVMCNN System takes practical online manufacturing scenarios into consideration, and two learning strategies are designed to resolve the issues of how and when to transfer. CAE adopts a few paired samples for model creation. The two learning strategies are applied interactively in response to different situations in order to achieve better accuracy and shorter learning time while conducting online and real-time conjecture.

The remainder of this paper is organized as follows. Section II briefs AVMBPNN and AVMCNN. Section III introduces CAE and TL. Section IV proposes the Advanced AVMCNN System. Section V presents the illustrative examples. Finally, the summary and conclusions are stated in Section VI.

II. AVMBPNN AND AVMCNN

A. Introducing AVMBPNN

The major contribution of the AVM system shown in Fig. 1 is that it can convert sampling inspection with metrology delay into real-time and online total inspection. Three novelties of AVM facilitate this contribution. First, VM values are provided along with RIs and global similarity indexes (GSIs) [16], so users know whether the VM values are reliable or not; second, the Dual-Phase algorithm [15] is added into the conjecture module to achieve promptness and accuracy simultaneously; and finally, the process data evaluation index (DQIX) [17] and metrology data evaluation index (DQIy) [17] are designed to perform online and real-time quality evaluation of the collected process and metrology data, respectively.

Moreover, the modules in the AVM framework shown in Fig. 1 are pluggable and interchangeable. As shown in the top-right corner of Fig. 1, various prediction algorithms may be adopted and plugged into the conjecture model module. For example, BPNN is the main algorithm of the conjecture model of the original AVM system in [4] for outputting VM values; therefore, the original AVM system is denoted AVMBPNN.

B. Introducing AVMCNN

The more precise processes resulting from advanced technology increase the demand for VM accuracy. To enhance VM accuracy, Hsieh et al. [5] proposed the AVMCNN System based on CNN, which not only replaces BPNN with CNN in the conjecture model but also develops the automated data alignment scheme (ADAS) for data preprocessing, as outlined with the red square and red ellipse in Fig. 1. In view of the CNN structure, assuming there are p sensors and ŷ is the VM output, a CNN structure called Structure C is developed in [5]. Structure C executes the CNN operation on each sensor individually, in succession of convolution, max pooling, flatten, dropout, and fully connected (FC) layers, to derive the intermediate outputs ŷ1, ŷ2, …, ŷp. Then, Structure C conducts an ensemble on an extra FC layer, with the intermediate outputs as its input, to generate the final VM output ŷ. As stated in [5], the prediction accuracy of AVMCNN with Structure C is proven better than that of the original AVMBPNN.

III. AUTOENCODER AND TRANSFER LEARNING

A. Introducing Autoencoder

The autoencoder (AE) is comprised of three layers, i.e., an input layer, a hidden layer, and a reconstruction layer (or output layer). Encoding extracts the feature vector of the input sample x into the hidden layer through the Encoder; the Decoder then maps the features back to the original dimension to obtain the reconstructed sample x′. The purpose of AE is to find a set of parameter weights that minimizes the reconstruction error and learns the abstract features of the samples, which is often applied to abnormality detection.

AE has several types of extension architectures, such as the stacked autoencoder [18], denoising autoencoder [19], sparse autoencoder [18], and convolutional autoencoder (CAE) [12]. This paper utilizes CAE, based on the networks proposed in [5], to solve the issue of insufficient paired samples. A general AE adopts neural networks for the Encoder and Decoder; CAE, as depicted in Fig. 2, instead applies the excellent feature extraction and representation of convolutional networks to maximize the accuracy of the model.

Take the data of a semiconductor manufacturing process as the example and observe Fig. 2. After acquiring the process data of Sensor 1 in the Encoder for a period of time (such as t1), the process data of Sensor 1′ can be reconstructed and output by the Decoder through the AE model. The trained model then has a powerful feature extraction function for this sensor, which is often applied to fault detection and classification [12]. In other words, when the same process data (Sensor 1) appear in different time periods (such as t2), the trained model can be re-used to reconstruct and output the process data (Sensor 1′). If the reconstruction is successful, the data are judged "non-abnormal"; otherwise, they are determined to be "abnormal".

Since the AVMCNN algorithm proposed in [5] requires a large amount of paired samples for model creation, it is not feasible to apply it in the current (such as semiconductor) industry. Hence, this paper proposes a modeling process based on the CAE algorithm to solve the issue of insufficient paired samples.

B. Introducing Transfer Learning

TL [7] can be applied to reduce the amount of training data and shorten the training time of the CNN model. TL addresses and fixes three main problems of traditional machine learning: (1) insufficient labeled data, (2) incompatible computation power, and (3) distribution mismatch. TL has been successfully applied to image processing [20], speech recognition [21], and natural language processing (NLP) [22].

Given a source domain DS with source learning task TS and a target domain DT with target learning task TT, the techniques of TL aim to improve the learning of the target predictive function fT(·) in DT by using the knowledge in DS and TS, where DS ≠ DT or TS ≠ TT [23].

In the above definition, a domain is a pair D = {X, P(X)}, where X is the feature space and P(X) a marginal probability distribution, with X = {x1, …, xn}. Thus, the condition DS ≠ DT implies that either XS ≠ XT or P(XS) ≠ P(XT). For example, there are generally multiple tools in the dry etching shallow trench isolation (STI) process of semiconductor manufacturing. They have the same process parameters, i.e., XS = XT; yet, as tools vary from each other, their performance will differ, i.e., P(XS) ≠ P(XT).


Fig. 3. Different Fine-Tuning Strategies.

A task can be denoted as T = {Y, f(·)}, where Y is a label space and f(·) is an objective prediction function. f(·) can be learned from training data consisting of pairs {xi, yi}, where xi ∈ X and yi ∈ Y. From a probabilistic viewpoint, f(·) can be defined as P(Y | X); therefore, a task can be defined as a pair T = {Y, P(Y | X)}. And TS ≠ TT implies that either YS ≠ YT or P(YS | XS) ≠ P(YT | XT). Take the same dry etching STI process of semiconductor manufacturing as the example: the critical dimension (CD) values on all the wafers are assessed after machining finishes (YS = YT), and the different performance of different tools will result in variations of the CD values (P(YS | XS) ≠ P(YT | XT)).

As for TL, there are three main research issues: 1) what to transfer, 2) how to transfer, and 3) when to transfer [23]. Most research focuses on what and how to transfer, yet it is also crucial to know the timing of transferring. Therefore, the Dual-Phase algorithm based on TL for the Advanced AVMCNN System is proposed not only to consider what and how to transfer but also to deal with the issue of when to transfer.

IV. ADVANCED AVMCNN SYSTEM

For the purpose of possessing the advanced capabilities of 1) an effective initial-model-creation approach with insufficient metrology data and 2) an intelligent self-learning capability for online model re-learning, the Advanced AVMCNN System based on CAE and TL is proposed to enhance the conjecture model in [5], as illustrated below.

A. Initial-Model-Creation Approach With Insufficient Metrology Data

Three CNN network structures were presented in [5], and among them Structure C performs the best. Therefore, Structure C of [5] is adopted for the Advanced AVMCNN here. Two different fine-tuning strategies of the Advanced AVMCNN structure are depicted in Fig. 3. When applying VM on production lines with insufficient model-creation samples, historical data from different sources of the same type of tools are collected to build the initial model. However, because different characteristics exist among tools (i.e., XS = XT but P(XS) ≠ P(XT)), the prediction accuracy may be poor when fanning out this initial model to the other tools of the same type without further fine-tuning.

Two different fine-tuning strategies are developed to manage the issues of what and how to transfer, as depicted in Fig. 3. Strategy I in Fig. 3(a) refreshes all the weights in the CNN network using the target tool's own data to make sure that the model can effectively explain this target tool. Strategy II in Fig. 3(b) freezes all the weights of the convolutional layers and merely refreshes the weights of the FC layer. As such, the execution time of Strategy II is much less than that of Strategy I. There are various learning requirements on actual production lines, such as handling status changes, stabilizing production, or applying model creation to Tool B with the data from Tool A. Hence, the re-learning mechanism should cover all the related issues, and judiciously selecting Strategy I or Strategy II can fulfill these individual requirements. Strategy I is applied to 1) initialize CNN model creation or 2) re-train the CNN model when a tool status change occurs (such as a maintenance operation); Strategy II is utilized once the initial model optimization has been completed in the online and real-time prediction environment.

CNN requires a large amount of paired samples to learn the comprehensive domain characteristics. However, it is often hard to collect a great amount of metrology data for preparing the paired samples. In this work, an unsupervised CAE pre-training strategy is proposed to solve this problem. In accordance with the AVMCNN structure [5], this paper applies CAE as shown in Fig. 2 to obtain the pre-trained model. The Encoder that adopts the CAE algorithm performs feature extraction on the process data (X). Then, only the pre-trained Encoder part of the model is retained, while the Decoder part is replaced with an architecture that conforms to AVMCNN, i.e., a fully-connected neural network. The resulting AVMCNN structure modeling based on CAE is depicted in Fig. 4. The AVMCNN modeling based on CAE is then applied to create the initial model as explained below.


Fig. 4. AVMCNN Structure Modeling based on CAE.

Step 1: Collect both available historical process data and metrology data.
Step 2: Adopt the ADAS scheme for data pre-processing and quality inspection.
Step 3: Divide all the historical data into paired samples (metrology data plus their corresponding process data) and unpaired process data.
Step 4: Use the unpaired process data to conduct CAE model creation to find the pre-trained model.
Step 5: Use the CAE model built in Step 4 as the pre-trained model; then, apply Strategy I of Fig. 3 to refresh the whole AVMCNN network with the paired samples to complete the initial model creation of AVMCNN.
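The steps above can be condensed into a runnable miniature. Below, a linear PCA encoder stands in for the pre-trained convolutional Encoder and a single linear head stands in for the fully-connected network; the data, sizes, and names are hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Steps 1-3 (stand-ins): historical process data, of which only a few
# samples also have metrology values (the paired samples).
X_unpaired = rng.normal(size=(300, 8))      # process data only
X_paired = rng.normal(size=(25, 8))         # process data ...
y_paired = X_paired @ rng.normal(size=8)    # ... with metrology values

# Step 4: pre-train on the unpaired process data. A linear autoencoder's
# optimum is PCA, so the top-4 principal directions stand in for the
# pre-trained Encoder.
W_enc = np.linalg.svd(X_unpaired, full_matrices=False)[2][:4].T  # 8 -> 4 features

# Step 5: keep the Encoder, replace the Decoder with a regression head,
# and refresh ALL weights with the paired samples (Strategy I).
# Strategy II would skip the W_enc update, freezing the feature extractor.
W_fc = np.zeros(4)
lr = 0.02
for _ in range(5000):
    feats = X_paired @ W_enc
    err = feats @ W_fc - y_paired
    W_fc -= lr * feats.T @ err / len(err)                      # refresh the FC head
    W_enc -= lr * np.outer(X_paired.T @ err, W_fc) / len(err)  # Strategy I: refresh the Encoder too

mae = float(np.mean(np.abs(X_paired @ W_enc @ W_fc - y_paired)))
print(round(mae, 4))
```

Fine-tuning drives the training MAE toward zero here because the toy target is exactly linear; the point is only the flow: unlabeled pre-training, head replacement, then a Strategy-I refresh of all weights.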
B. Dual-Phase Algorithm Based on Transfer Learning

As mentioned previously, a manufacturing tool is a time-varying system. The Dual-Phase algorithm [15] in the original AVMBPNN System is comprised of Phase I and Phase II. Phase I emphasizes promptness, generating the Phase I VM results to monitor the processing quality in real time; Phase II ensures the freshness and accuracy of the model by tuning or re-learning the VM, RI/GSI, and DQI models once a real metrology sample is collected. However, the original Dual-Phase algorithm [15] was designed for BPNN and cannot be applied to deep learning algorithms. In addition, the AVMCNN System, based on the CNN deep learning algorithm, may not effectively re-learn the AVMCNN model with only one or a small number of metrology samples. Moreover, in response to insufficient model-creation data on the production line, whether data are collected from different tools for model fan-out or CAE is utilized for model creation, the re-learning strategy for AVMCNN should be changed.

Therefore, the Dual-Phase algorithm of AVMCNN based on TL, designed by applying the two different fine-tuning strategies, is proposed in this paper, as depicted in Fig. 5.

Fig. 5. Dual-Phase Algorithm based on Transfer Learning.

As mentioned previously, the original Dual-Phase algorithm [15] can neither be applied to deep learning algorithms nor accommodate the various learning strategies; a re-design of the Dual-Phase algorithm is necessary. The yellow blocks with italic words in Fig. 5, as well as the descriptions in italics in this paragraph, are the modifications to the original Dual-Phase algorithm [15]. As shown in the right-hand portion of Fig. 5, the Phase I algorithm starts to collect the process data of each processing workpiece after the conjecture model is built. Conduct the process data quality check, and issue an alarm once an abnormality is detected. Then execute ADAS [5]. The workpiece's Phase I VM value (VMI), along with its corresponding RI and GSI values, is computed once both the data quality check and ADAS finish.

Observing the left-hand portion of the Phase II flow in Fig. 5, the Phase II algorithm starts by defining the size of the virtual cassette [4], K, to be the amount of paired samples required for fine-tuning. Then reset N to zero and collect metrology data. The correlation between the metrology data and the process data is checked via the workpiece ID once a complete set of metrology data is collected. If the correlation check is successful, conduct the data quality check; otherwise, re-collect metrology data. Send an alarm when an abnormality is confirmed. If there is a status change, reset N; otherwise, N = N + 1. If N = K, execute CNN model re-learning; otherwise, re-collect metrology data. Before performing CNN model re-learning, it is necessary to first determine which learning strategy will be adopted. Strategy I should be conducted for scenarios such as just after a status change, the initial learning of a new model, or manual activation, so that the model can effectively explain the target tool. If the tool is under stable production, only Strategy II is required to keep the model fresh. The conjecture and RI/GSI models will be updated after CNN model re-learning is completed. Finally, the Phase II VM value (VMII) and its accompanying RI/GSI values of each workpiece in the entire cassette are recomputed. Then reset N.

Fig. 6. Prediction Accuracy Comparison of Various Fine-tuning Strategies.
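The Phase II bookkeeping of Section IV-B (a virtual cassette of size K, a counter N, a reset on status change, and strategy selection) can be read as a small event loop. The sketch below is one interpretation of Fig. 5, not the authors' code; the function name and its inputs are hypothetical.

```python
def phase2_events(samples, K):
    """samples: iterable of (sample_id, status_change) pairs in metrology
    order. Returns (sample_id, strategy) for each CNN re-learning event."""
    events = []
    N = 0
    after_status_change = False
    for sid, status_change in samples:
        if status_change:
            N = 0                       # status change: reset N
            after_status_change = True  # next re-learning must retrain fully
        else:
            N += 1                      # one more paired sample in the cassette
        if N == K:                      # a full virtual cassette is collected
            events.append((sid, "I" if after_status_change else "II"))
            after_status_change = False
            N = 0                       # reset N after re-learning
    return events

# 56 samples with a status change at Sample 13, as in Section V-B (K = 12).
samples = [(i, i == 13) for i in range(1, 57)]
print(phase2_events(samples, K=12))
# → [(12, 'II'), (25, 'I'), (37, 'II'), (49, 'II')]
```

With K = 12 and a status change at Sample 13, this reproduces the schedule quoted later in Section V-B: tuning at Sample 12, retraining (Strategy I) at Sample 25, and tuning (Strategy II) at Samples 37 and 49.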

V. ILLUSTRATIVE EXAMPLES

Two illustrative examples are presented in this section. Example 1 illustrates the prediction accuracy comparison of various fine-tuning strategies. Example 2 then demonstrates the accuracy comparison of AVMBPNN, AVMCNN, the Advanced AVMCNN, and other deep learning algorithms (such as VGG16 [24]) encountering a tool status change. The comparison index, the mean absolute error (MAE) expressed in (1), is utilized here to assess the prediction accuracy:

MAE = (1/n) Σ_{i=1}^{n} |ŷ_i − y_i|    (1)

where
ŷ_i: ith predictive value,
y_i: ith real metrology value,
n: sample size.

A. Comparison of Different Fine-Tuning Strategies

The dry etching STI process of semiconductor manufacturing is adopted as the example to compare the prediction results of different fine-tuning strategies; there is no status change within this time period. The inspection of CD is chosen for demonstration. According to the physical properties of the etching process tools, 35 key process parameters are identified, such as electrostatic chuck (ESC) leakage current, chuck backside pressure, throttle valve position, radio-frequency (RF) power, RF voltage, and chamber temperature. A total of 300 unpaired process data and 105 sets of paired samples are collected in this example. The data dimension of each sensor is 1×56, and there are 35 sensors, which means p = 35. The initial AVMCNN model based on CAE is created following the steps illustrated in Section IV-A: the first 300 unpaired process data are utilized to create the pre-trained CAE model, the subsequent 25 paired samples are used for fine-tuning by TL, and the last 80 paired samples are used for testing.

The size of the virtual cassette [4] is set to 25 (K = 25). Hence, as indicated in Fig. 6, N = K (25) is marked on Samples 25, 50, and 75. Observing Fig. 5, when N reaches K, retraining with Strategy I or tuning with Strategy II will be activated. Three testing cases are conducted and explained below.

A1) Special AVMCNN, always Strategy I: After creating the initial model following Steps 1 to 4 of Section IV-A, Strategy I is adopted for AVMCNN modeling by TL at Step 5. Then, manual activation is set for running the Dual-Phase algorithm based on TL. As a result, Strategy I will be executed for fine-tuning the CNN model.

A2) Special AVMCNN, always Strategy II: After creating the initial model following Steps 1 to 4 of Section IV-A, Strategy II is applied for AVMCNN modeling by TL at Step 5. Then, manual activation is disabled and no auto-retraining is allowed. Thus, Strategy II will be executed for fine-tuning the CNN model.

A3) Advanced AVMCNN: Re-learn the model online and in real time normally, following the complete steps of Section IV-A and Fig. 5.

The prediction results of the different fine-tuning strategies are depicted in Fig. 6, which indicates that always using Strategy II (A2) yields the worst result. The reason is that the initial model extracts the feature values using the CAE algorithm with process data only; thus, the correlation between the metrology values and the feature values is not well established. As such, the model cannot acquire better prediction results if Strategy II (which freezes the weights of the convolutional layers) is applied to fine-tune the AVMCNN networks at Step 5 of Section IV-A.

As for the A1 Case, it adopts Strategy I for initial modeling by TL, and the model fine-tuning is then still conducted with Strategy I. This deteriorates the prediction accuracy, as the refreshed model of the convolutional and max pooling layers is slightly disturbed by the small amount (K = 25) of samples.

Finally, the A3 Case adopts the normal and complete steps of both Section IV-A and Fig. 5, so the model can be well established. As a result, there are 6.18% and 15.03% accuracy enhancements compared to A1 and A2, respectively.
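The MAE index of (1), used for every comparison in this section, is straightforward to compute; a minimal sketch:

```python
def mae(y_hat, y):
    """Mean absolute error, eq. (1): sum over i of |ŷ_i − y_i|, divided by n."""
    if len(y_hat) != len(y) or len(y) == 0:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum(abs(p - t) for p, t in zip(y_hat, y)) / len(y)

# Example with hypothetical predicted vs. measured CD values:
print(mae([2.0, 3.5, 4.0], [2.5, 3.5, 3.0]))  # → 0.5
```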


TABLE I
PERFORMANCE OF ADVANCED AVMCNN IN DIFFERENT DATA SETS

Fig. 7. Accuracy Comparison of Various AVMBPNN , AVMCNN , VGG16, and Advanced AVMCNN Encountering a Tool Status Change.

the performance of the Advanced AVMCNN system. A total of 356 sets of paired samples are collected in this example. The data dimension of each sensor is 1×56, and there are 35 sensors, which means p = 35. The first 300 paired samples are adopted for model creation of the AVMBPNN [4] and the original AVMCNN [5] for the free-run and dual-phase experiments; the last 56 paired samples are utilized for testing. In addition, a state-of-the-art deep-learning algorithm, VGG16 [24], is adopted for comparison. It has been demonstrated in [25] that AVMBPNN has better performance than Simple Recurrent Neural Networks and the statistical-regression-based Multiple Regression algorithms; moreover, the fact that AVMCNN has better performance than AVMBPNN has been validated in [5]. Thus, statistical-regression-based algorithms are not included for comparison in this paper.

In the experiments, the training data are further split into a training set and a validation set with a ratio of 8:2, and the hyperparameters of AVMCNN and VGG16 are then acquired through cross validation (CV). The hyperparameters of AVMCNN are as follows: the activation function of the convolutional layer is SeLU, the filter number of the convolutional layer is 4, the filter size of the convolutional layer is 3, the max-pooling size of the pooling layer is 2, the dropout rate of the flatten layer is 0.5, the learning rate of the FC layer is 0.003, and the activation function of the FC layer is SeLU. The hyperparameters of VGG16 are set to the same values as in [24]; the only difference is that the learning rate obtained from CV is 0.00008, smaller than that in [24].

As for the Advanced AVMCNN case, the first 275 sets of process data are applied to create the pre-trained CAE model, the subsequent 25 paired samples are used for fine-tuning by Strategy I to create the initial model, and the last 56 paired samples are used for testing. The size of the virtual cassette [4] is set to 12 (K = 12) in this example.

The accuracy comparison of the various AVMBPNN, AVMCNN, VGG16, and Advanced AVMCNN schemes encountering a tool status change is shown in Fig. 7. Observing Fig. 7, six cases are compared over the 56 testing samples: AVMBPNN free-run, AVMBPNN dual-phase, original AVMCNN free-run, original AVMCNN dual-phase, VGG16, and Advanced AVMCNN. During the testing period, a tool status (CapPosition) change occurs from Sample 13 through Sample 56. Note that, with K = 12 and the status change beginning at Sample 13, according to Fig. 5, the Dual-Phase algorithm of the Advanced AVMCNN performs tuning at Sample 12, re-sets the N value to 0 at Sample 13, and then conducts retraining at Sample 25, tuning at Sample 37, and finally tuning at Sample 49.

As shown in Fig. 7, the CD prediction results of the AVMBPNN free-run case deviate from the real metrology values during the period of status-change occurrence.
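The tuning/retraining schedule described above can be reproduced with a small helper. This is a toy sketch: the function name and event labels are my own, and the rule (one event per virtual cassette of K samples, with retraining on the first full cassette collected after a status change) is inferred from the behavior stated in the text, not taken from the AVM implementation.

```python
def dual_phase_events(n_samples, K, change_at):
    """Sketch of the Dual-Phase event schedule: a tuning event closes each
    virtual cassette of K samples; a tool status change resets the counter N,
    and the first full cassette collected after the change triggers
    retraining instead of tuning."""
    events = {K: "tune"}           # the first virtual cassette completes at sample K
    events[change_at] = "reset N"  # status change detected: reset the sample counter
    sample, first_after_change = change_at + K, True
    while sample <= n_samples:
        events[sample] = "retrain" if first_after_change else "tune"
        first_after_change = False
        sample += K
    return events

schedule = dual_phase_events(n_samples=56, K=12, change_at=13)
# yields tuning at 12, counter reset at 13, retraining at 25, tuning at 37 and 49
```

With K = 12 and the change beginning at Sample 13, this reproduces the event sequence stated above.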

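The 8:2 training/validation split and the CV-based hyperparameter search described earlier can be sketched as a simple hold-out search. The toy linear model and the candidate learning rates below are illustrative assumptions for demonstration only; the paper's actual search is over the CNN hyperparameters listed in the text.

```python
import random

def holdout_split(samples, train_ratio=0.8, seed=0):
    """Shuffle and split the samples 8:2 into training and validation sets."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(len(samples) * train_ratio)
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]

def train_linear(data, lr, epochs=200):
    """Fit y ≈ w*x by plain stochastic gradient descent on squared error."""
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            w += lr * (y - w * x) * x
    return w

def mae(w, data):
    """Mean absolute error of the fitted model on a data set."""
    return sum(abs(y - w * x) for x, y in data) / len(data)

# Toy data following y = 2x; pick the learning rate with the lowest validation MAE.
data = [(0.1 * k, 0.2 * k) for k in range(1, 41)]
train, val = holdout_split(data)
best_lr = min([0.0001, 0.001, 0.01], key=lambda lr: mae(train_linear(train, lr), val))
```

The same hold-out pattern extends to any hyperparameter grid: train one candidate model per setting and keep the one with the lowest validation error.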

This phenomenon is improved by applying the Dual-Phase algorithm of the AVMBPNN. As for the cases of the original AVMCNN free-run and Dual-Phase, their prediction accuracies are better than that of the AVMBPNN Dual-Phase. Finally, observing Fig. 6, the prediction accuracy of the Advanced AVMCNN is the best among all the cases, with MAE enhancements of 276.36%, 39.23%, 23.92%, 8.18%, and 17.88% over the other five cases, respectively. This indicates that the prediction accuracy achieved by the Advanced AVMCNN, built with 275 unpaired process data and a small amount (25) of paired samples, is superior to that of the other prediction combinations, which adopt a large amount (300) of paired samples.

The execution time required for model re-learning is also an important indicator on the production line, as it determines the feasibility of the algorithm. Under the hardware specifications of CPU: Intel(R) Core(TM) i7-10700 @ 2.90 GHz, RAM: 128 GB, and GPU: NVIDIA GeForce RTX 2080 Ti GAMING OC 11G, the original AVMCNN Dual-Phase takes 148 seconds to complete model re-learning, while the Advanced AVMCNN applying TL needs only 68 seconds to do the same, a 2.17-times speed-up.

Considering generality, in addition to the data sets of the dry-etching STI process, several different data sets of semiconductor manufacturing processes, such as the side-wall angle and depth of the dry etching process as well as the thickness of the plasma-enhanced chemical vapor deposition (PECVD) process, are also adopted to examine the performance of the proposed Advanced AVMCNN System. As indicated in Table I, the average MAE improvements over those four data sets relative to VGG16 and the AVMCNN Dual-Phase are 23.00% and 6.76%, with average execution-time speed-ups of 2.52 and 2.08 times, respectively, which supports the fact that the proposed Advanced AVMCNN outperforms the previously mentioned methods as well as VGG16. In conclusion, the experimental results prove that the Advanced AVMCNN is superior to the original AVMCNN in terms of both prediction accuracy and model re-learning speed.

VI. SUMMARY AND CONCLUSION

As processes become more complicated and the industry's demand for VM accuracy grows, dealing with the issues of 1) developing an effective initial-model-creation approach with insufficient metrology data and 2) establishing an intelligent self-learning capability for on-line model re-learning becomes a major challenge for VM applications on production lines. The Advanced AVMCNN System is proposed in this paper to enhance the conjecture model in [5] for solving the above two issues. To begin with, two fine-tuning strategies are designed with the TL technology. Then, the technologies of CAE and TL are adopted to design the initial-model-creation steps for the Advanced AVMCNN System. Finally, the Dual-Phase algorithm based on TL is proposed as well. The experimental results reveal that the Advanced AVMCNN System performs better concerning prediction accuracy and execution time than the various AVMBPNN and AVMCNN Systems, as well as VGG16. Moreover, the conjecture model of the Advanced AVMCNN System shown in the right-bottom corner of Fig. 1 can be applied to various VM conjecture models as well.

REFERENCES

[1] A. Weber, "Virtual metrology and your technology watch list: Ten things you should know about this emerging technology," Future Fabr. Int., vol. 22, no. 4, pp. 52–54, Jan. 2007.
[2] T. Hirai and M. Kano, "Adaptive virtual metrology design for semiconductor dry etching process through locally weighted partial least squares," IEEE Trans. Semicond. Manuf., vol. 28, no. 2, pp. 137–144, May 2015.
[3] J. Wan and S. McLoone, "Gaussian process regression for virtual metrology-enabled run-to-run control in semiconductor manufacturing," IEEE Trans. Semicond. Manuf., vol. 31, no. 1, pp. 12–21, Feb. 2018.
[4] F.-T. Cheng, C.-A. Kao, C.-F. Chen, and W.-H. Tsai, "Tutorial on applying the VM technology for TFT-LCD manufacturing," IEEE Trans. Semicond. Manuf., vol. 28, no. 1, pp. 55–69, Feb. 2015.
[5] Y.-M. Hsieh, T.-J. Wang, C.-Y. Lin, L.-H. Peng, F.-T. Cheng, and S.-Y. Shang, "Convolutional neural networks for automatic virtual metrology," IEEE Robot. Automat. Lett., vol. 6, no. 3, pp. 5720–5727, Jul. 2021.
[6] J. Choi and M. K. Jeong, "Deep autoencoder with clipping fusion regularization on multistep process signals for virtual metrology," IEEE Sensors Lett., vol. 3, no. 1, Jan. 2019, Art. no. 7101804.
[7] S. Niu, Y. Liu, J. Wang, and H. Song, "A decade survey of transfer learning (2010–2020)," IEEE Trans. Artif. Intell., vol. 1, no. 2, pp. 151–166, Oct. 2020.
[8] J. Yu and J. Liu, "Two-dimensional principal component analysis-based convolutional autoencoder for wafer map defect detection," IEEE Trans. Ind. Electron., vol. 68, no. 9, pp. 8789–8797, Sep. 2021.
[9] Y. Li et al., "Image reconstruction using pre-trained autoencoder on multimode fiber imaging system," IEEE Photon. Technol. Lett., vol. 32, no. 13, pp. 779–782, Jul. 2020.
[10] P.-H. Chiu, Y.-S. Lin, Y. C. Manie, J.-W. Li, J.-H. Lin, and P.-C. Peng, "Intensity and wavelength-division multiplexing fiber sensor interrogation using a combination of autoencoder pre-trained convolution neural network and differential evolution algorithm," IEEE Photon. J., vol. 13, no. 1, Feb. 2021, Art. no. 6600709.
[11] W. Wang et al., "Anomaly detection of industrial control systems based on transfer learning," Tsinghua Sci. Technol., vol. 26, no. 6, pp. 821–832, Dec. 2021.
[12] J. Zhu, N. Chen, and C. Shen, "A new deep transfer learning method for bearing fault diagnosis under different working conditions," IEEE Sensors J., vol. 20, no. 15, pp. 8394–8402, Aug. 2020.
[13] K. Imoto, T. Nakai, T. Ike, K. Haruki, and Y. Sato, "A CNN-based transfer learning method for defect classification in semiconductor manufacturing," IEEE Trans. Semicond. Manuf., vol. 32, no. 4, pp. 455–459, Nov. 2019.
[14] C. Liu, L. Zhang, J. Li, J. Zheng, and C. Wu, "Two-stage transfer learning for fault prognosis of ion mill etching process," IEEE Trans. Semicond. Manuf., vol. 34, no. 2, pp. 185–193, May 2021.
[15] F.-T. Cheng, H.-C. Huang, and C.-A. Kao, "Dual-phase virtual metrology scheme," IEEE Trans. Semicond. Manuf., vol. 20, no. 4, pp. 566–571, Nov. 2007.
[16] F.-T. Cheng, Y.-T. Chen, Y.-C. Su, and D.-L. Zeng, "Evaluating reliance level of a virtual metrology system," IEEE Trans. Semicond. Manuf., vol. 21, no. 1, pp. 92–103, Feb. 2008.
[17] Y.-T. Huang and F.-T. Cheng, "Automatic data quality evaluation for the AVM system," IEEE Trans. Semicond. Manuf., vol. 24, no. 3, pp. 445–454, Aug. 2011.
[18] C. Shi, B. Luo, S. He, K. Li, H. Liu, and B. Li, "Tool wear prediction via multidimensional stacked sparse autoencoders with feature fusion," IEEE Trans. Ind. Inform., vol. 16, no. 8, pp. 5150–5159, Aug. 2020.
[19] X. Wu, G. Jiang, X. Wang, P. Xie, and X. Li, "A multi-level-denoising autoencoder approach for wind turbine fault detection," IEEE Access, vol. 7, pp. 59376–59387, 2019.
[20] B. Q. Huynh, H. Li, and M. L. Giger, "Digital mammographic tumor classification using transfer learning from deep convolutional neural networks," J. Med. Imag., vol. 3, no. 3, 2016, Art. no. 034501.
[21] W. Zhang and P. Song, "Transfer sparse discriminant subspace learning for cross-corpus speech emotion recognition," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 28, pp. 307–318, 2020.
[22] S. Singh and A. Mahmood, "The NLP cookbook: Modern recipes for transformer based deep learning architectures," IEEE Access, vol. 9, pp. 68675–68702, 2021.
[23] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[24] Z.-P. Jiang, Y.-Y. Liu, Z.-E. Shao, and K.-W. Huang, "An improved VGG16 model for pneumonia image classification," Appl. Sci., vol. 11, no. 23, Nov. 2021, Art. no. 11185.
[25] Y.-C. Su, T.-H. Lin, F.-T. Cheng, and W.-M. Wu, "Accuracy and real-time considerations for implementing various virtual metrology algorithms," IEEE Trans. Semicond. Manuf., vol. 21, no. 3, pp. 426–434, Aug. 2008.

