
Chemometrics and Intelligent Laboratory Systems 231 (2022) 104711

A review on autoencoder based representation learning for fault detection and diagnosis in industrial processes

Jinchuan Qian a, Zhihuan Song a, Yuan Yao b, Zheren Zhu a, Xinmin Zhang a, *

a State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, China
b Department of Chemical Engineering, National Tsing Hua University, Hsinchu, 30013, Taiwan

ARTICLE INFO

Keywords: Process monitoring; Deep learning; Autoencoder; Representation learning

ABSTRACT

Process monitoring technologies play a key role in maintaining the steady state of industrial processes. However, with the increasing complexity of modern industrial processes, traditional monitoring methods cannot provide satisfactory performance. In the past decades, deep learning models have developed rapidly in industrial data analysis. The autoencoder (AE) in particular has been widely used to deal with various challenges of process monitoring, and a number of related works have been proposed. This paper presents a comprehensive review of AE-based industrial applications in two parts, AE-based representation learning and monitoring strategies, which together illustrate the entire design process of AE-based monitoring methods. First, AE, AE variants, and the encoder-decoder framework are briefly introduced. Second, AE-based representation learning is comprehensively reviewed from the perspective of industrial data characteristics. Then, state-of-the-art studies of monitoring strategies, including fault detection strategies and fault diagnosis strategies, are reviewed and discussed. Finally, some prospects for future research are explored.

1. Introduction

In modern industry, advances in technologies such as the Internet of Things, cloud computing, big data analysis, and artificial intelligence have promoted the development of intelligent manufacturing. In this era of smart manufacturing, production needs are characterized by flexibility and diversity. To meet these demands, industrial processes are becoming increasingly complex, which also places higher demands on the safety and reliability of the industrial process.

Process monitoring is a technology to maintain the steady operation of industrial processes, and is the key to realizing safe production and ensuring product quality. Especially in large-scale complex industrial processes, a minor fault may lead to a serious production loss. Process monitoring is mainly achieved by fault detection and fault diagnosis (FDD), and these two steps are generally performed in turn, i.e., from whether the fault happens (fault detection) to the specific faulty variable and fault type (fault diagnosis). However, due to the complexity of modern industrial processes, it is difficult to use expert knowledge and build accurate mechanistic models to identify fault information directly. In recent years, data mining technology has developed rapidly, which makes it possible to extract industrial information and detect fault information from the collected industrial data. In addition, due to the development of sensors and data storage technologies, sufficient data can be sampled and stored. These data provide solid support for training data mining models, and further contribute to the development of data mining technology in the industrial field.

Data-driven methods are developed based on these data mining models and can provide an effective solution for process information and fault information extraction in the absence of expert experience and mechanism knowledge. Therefore, the data-driven method has become a mainstream strategy for monitoring tasks and a hot research topic in recent years [1–3]. Fig. 1 illustrates the general procedure of process monitoring and the basic steps of establishing data-driven monitoring methods to achieve the corresponding monitoring tasks.

As shown in Fig. 1, data-driven process monitoring methods can be achieved by feature engineering and monitoring strategy. Feature engineering aims to extract the feature representation that can reflect the characteristics of processes. Although feature representation can be directly constructed manually, it relies on expert experience and is difficult to realize in some complex industrial processes. Some statistical learning models, such as principal component analysis (PCA) and partial least squares (PLS) [4,5], can map original data to the latent feature

* Corresponding author.
E-mail addresses: qianjinchuan@zju.edu.cn (J. Qian), songzhihuan@zju.edu.cn (Z. Song), yyao@mx.nthu.edu.tw (Y. Yao), xinminzhang@zju.edu.cn (X. Zhang).

https://doi.org/10.1016/j.chemolab.2022.104711
Received 7 September 2022; Received in revised form 25 October 2022; Accepted 4 November 2022
Available online 10 November 2022
0169-7439/© 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

space, which contains the principal process information and significantly reduces the burden of manual feature designing. However, traditional statistical learning models usually require the data to meet certain conditions such as Gaussian distribution, linear correlation, etc. In practice, the industrial process is usually more complicated, and industrial data often show various characteristics such as dynamic correlation, nonlinearity, multimode, etc., making it difficult for these traditional models to show good application results. The deep learning model can effectively solve this problem through multi-layer forward mapping and has been widely used in industrial processes [6–9].

Deep learning models are mainly developed based on neural network models. Compared with traditional statistical learning models, deep learning models have a stronger capability of extracting the underlying information from data and can accomplish feature engineering automatically by representation learning [10,11]. Representation learning aims to learn a mapping function that maps the raw data to a feature space where the characteristics of the process can be better described [12,13]. Therefore, this process can also be regarded as a feature extraction process, and the depth of extracted features usually plays an essential role in improving the performance of the developed process monitoring methods.

In addition, deep learning-based feature extraction is less constrained by data characteristics. The structure of the deep learning model is mainly formed by multi-layer neural networks, which can be flexibly changed to meet different practical needs. In the process of data forward mapping, features are continuously refined and abstracted, which can effectively improve the performance of downstream process monitoring tasks. After the network structure of a deep learning model is defined, the loss function needs to be further designed to realize network training and feature extraction. The loss function is usually determined by reconstruction error or the loss of the downstream task. If there are special feature extraction requirements, the network structure or loss function constraints can be modified accordingly. Nowadays, with the development of deep learning frameworks and the improvement of GPU acceleration strategies, the training and application of deep neural networks are becoming more and more convenient in practice.

Autoencoder (AE) is a typical deep learning model designed for learning feature representation of data in an unsupervised manner. Because AE has the advantages of strong representation learning ability, simple structure, and easy training, it has been widely used in the field of FDD. Furthermore, to overcome different data characteristics or to force the learned feature representations to exhibit different useful properties, a number of AE-based variants have been developed and applied to the FDD field. In addition, a more general AE framework, the encoder-decoder framework, has recently been developed and attracted more attention in academic and engineering fields. In the encoder-decoder framework, the structures of the encoder part and decoder part can be replaced with other deep learning modules, such as recurrent neural network (RNN) and convolutional neural network (CNN), to extract complex data features [14,15].

A simple experiment based on the benchmark data is presented here to show the effectiveness of AE in fault detection applications [16]. The fault detection rates (FDRs) of PCA, AE, denoising AE (DAE), and contractive AE (CAE) are compared here, and Table 1 lists the FDRs for 21 faults. Note that all the false alarm rates (FARs) are controlled under 5%, and the indexes are designed based on the reconstruction error and the deviation of the obtained features. As can be seen from Table 1, compared with the traditional monitoring model PCA, AE and its variants have improved detection performance for most faults. The improvement is especially significant for faults 10, 16, 19, and 20.

In the past decades, a large number of data-driven process monitoring methods have been proposed. Several researchers have conducted reviews of data-driven process monitoring methods in terms of model types and practical applications. For example, Ge et al. [2] reviewed the statistical learning-based process monitoring methods in plant-wide industrial processes. Qin et al. [3] systematically reviewed dynamic latent variable analytics-based process monitoring methods. Rezaeianjouybari et al. [12] and Khan et al. [13] presented detailed reviews about deep learning-based prognostics and health management (PHM). Yang et al. [17] reviewed AE-based fault diagnosis methods in mechanical equipment. Different from the above literature, this paper presents a comprehensive review of AE-based representation learning and its FDD applications in industrial processes. In addition, it presents detailed procedures for designing an appropriate monitoring strategy. Compared with the existing reviews, this paper has the following contributions:

Fig. 1. Flowchart of process monitoring and data-driven monitoring method.


Table 1
FDRs for 21 faults in benchmark data.
Fault no. PCA(SPE) PCA(T2) AE(SPE) AE(T2) DAE(SPE) DAE(T2) CAE(SPE) CAE(T2)
IDV 1 0.9988 0.9913 0.9988 0.9888 1.0000 0.9913 0.9988 0.9913
IDV 2 0.9613 0.9838 0.9838 0.9800 0.9838 0.9825 0.9838 0.9838
IDV 3 0.0413 0.0113 0.0450 0.0113 0.0538 0.0075 0.0463 0.0050
IDV 4 1.0000 0.2288 1.0000 0.2850 1.0000 0.2400 1.0000 0.3188
IDV 5 0.2575 0.2388 0.3138 0.1388 0.3200 0.2138 0.3238 0.2075
IDV 6 1.0000 0.9913 1.0000 0.9813 1.0000 0.9838 1.0000 0.9875
IDV 7 1.0000 1.0000 1.0000 0.9425 1.0000 1.0000 1.0000 0.9938
IDV 8 0.8838 0.9688 0.9800 0.8350 0.9788 0.9425 0.9788 0.9513
IDV 9 0.0238 0.0113 0.0400 0.0088 0.0488 0.0125 0.0400 0.0100
IDV10 0.3525 0.2800 0.5200 0.0275 0.5325 0.1013 0.5013 0.1338
IDV11 0.7863 0.4175 0.7675 0.4488 0.7975 0.3863 0.7600 0.4763
IDV12 0.9113 0.9838 0.9913 0.8000 0.9900 0.9550 0.9913 0.9600
IDV13 0.9525 0.9363 0.9525 0.7963 0.9538 0.9238 0.9525 0.9213
IDV14 1.0000 0.9925 1.0000 0.9325 1.0000 0.9738 1.0000 0.9738
IDV15 0.0425 0.0125 0.0525 0.0025 0.0600 0.0063 0.0500 0.0075
IDV16 0.3638 0.1250 0.4175 0.0138 0.4525 0.0250 0.4438 0.0575
IDV17 0.9638 0.7625 0.9600 0.6763 0.9625 0.7263 0.9625 0.7388
IDV18 0.9038 0.8925 0.9013 0.8875 0.9038 0.8888 0.9050 0.8900
IDV19 0.2025 0.1063 0.2113 0.1175 0.3100 0.1125 0.2375 0.1013
IDV20 0.5375 0.3225 0.5838 0.0738 0.6075 0.2400 0.5888 0.2200
IDV21 0.4888 0.4150 0.5050 0.3138 0.5200 0.3763 0.5288 0.3100
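
The FDR and FAR figures discussed around Table 1 can be illustrated with a minimal sketch. It assumes a reconstruction-error index (SPE) and an empirical control limit taken as a quantile of the index on normal data; the function names, the quantile rule, and the toy scores are illustrative assumptions, not the procedure used to produce Table 1.

```python
import math


def spe(x, x_hat):
    """Squared prediction error between a sample and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def empirical_threshold(normal_scores, far=0.05):
    """Control limit: empirical (1 - far) quantile of scores from normal data."""
    ordered = sorted(normal_scores)
    # Choose the score that leaves at most a fraction `far` of samples above it.
    k = math.ceil((1.0 - far) * len(ordered)) - 1
    return ordered[max(k, 0)]


def alarm_rate(scores, threshold):
    """Fraction of samples whose score exceeds the control limit."""
    return sum(s > threshold for s in scores) / len(scores)


# Toy scores standing in for SPE values of normal and faulty samples (assumed data).
normal_scores = [float(i) for i in range(1, 21)]
faulty_scores = [25.0, 30.0, 5.0, 40.0]

limit = empirical_threshold(normal_scores, far=0.05)
far = alarm_rate(normal_scores, limit)  # false alarm rate on normal data
fdr = alarm_rate(faulty_scores, limit)  # fault detection rate on faulty data
```

The same two rates can be computed for any other index, such as a T2-type statistic on the learned features, by swapping the score function.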

(1) Different AE-based representation learning strategies, such as dynamic feature representation, multimode feature representation, robust feature representation, target-dependent feature representation, neighborhood sample representation learning, etc., are summarized and classified according to the intrinsic features of process data.
(2) Different AE-based fault detection strategies are summarized and classified, including general fault detection strategies, multimodal fault detection strategies, distributed fault detection strategies, quality-related fault detection strategies, and combined fault detection strategies.
(3) Different AE-based fault diagnosis strategies are summarized and classified from the perspectives of fault classification and fault identification. Fault classification is further classified from the perspective of model training strategies, including pre-training and fine-tuning, simultaneous training, and semi-supervised training. Fault identification is further divided into contribution plot, variable selection, and interpretability.
(4) Challenges and some interesting future topics are analyzed and presented, including multimodal and heterogeneous data processing, joint modeling with mechanism knowledge and data, and integrated system with process recovery.
(5) This review can provide detailed technical guidance for new researchers and algorithm developers in the field of process monitoring.

The organization of this paper is as follows. In Section 2, the basic AE, AE variants, and encoder-decoder framework are briefly introduced; Section 3 introduces the implementation methods of AE in the process of representation learning; Section 4 and Section 5 respectively introduce the strategies of detection and diagnosis based on the extracted features; Section 6 gives some research prospects; finally, conclusions are drawn.

2. Preliminaries

Autoencoder (AE) is a neural network model for unsupervised feature extraction [18]. It presents a reconstruction-based learning paradigm for feature representation learning. It should be noted that linear AE (an AE that has one hidden layer with linear activation units and no biases) has exactly the same structure as PCA. Some researchers have deeply explored the relationship between linear AE and PCA in model parameters. Assuming that $\Sigma_{XX}$ denotes the covariance matrix associated with the training data, it is known that the loading matrix of PCA is formed by the first p eigenvectors $U_p$ of $\Sigma_{XX}$. Baldi et al. [19] proved that when the linear AE is trained to a unique local and global minimum, the total forward mapping weight matrix is an orthogonal projection onto the space spanned by $U_p$, and the corresponding weights in the encoder and decoder can be seen as the multiplication of an invertible matrix and $U_p$. Daniel et al. [20] further proved that with L2 regularization, the left singular matrix of the decoder weights in a well-trained linear AE equals the left singular matrix of the data matrix. This means that PCA can be established by performing singular value decomposition (SVD) on the decoder weights, and the dimension of the principal components can be adjusted by the strength of L2 regularization.

PCA is one of the most commonly used monitoring models, and a lot of work has illustrated the effectiveness of PCA in the field of process monitoring. As discussed above, linear AE is closely related to PCA in model structure and parameters, so the two models have similar feature extraction capabilities. However, a linear AE alone can hardly deal with complex data mining tasks, so AE is generally used in combination with nonlinear activation functions.

In this section, basic AE, AE variants, and a general framework called encoder-decoder will be introduced. Fig. 2 illustrates a brief description of these models.

2.1. Autoencoder and its variants

As shown in Fig. 3, the structure of AE includes an encoder and a decoder. The encoder aims to map the original data to the feature space, while the decoder reconstructs the input from the feature representation.

Assuming x denotes the input, the encoder and decoder functions can be described as follows:

$$t = \sigma(W_e x + b_e) \tag{1}$$

$$\hat{x} = \hat{\sigma}(W_d t + b_d) \tag{2}$$

where t denotes the output of the encoder, which is the feature to be obtained. $W_e$ and $b_e$ denote the weight and bias of the encoder function, respectively. $W_d$ and $b_d$ denote the weight and bias of the decoder function, respectively. $\sigma(\ast)$ and $\hat{\sigma}(\ast)$ denote activation functions, which are the key for a neural network to capture nonlinear features. Table 2 tabulates the descriptions of some commonly used activation


Fig. 2. A brief description of AE, AE variants, and encoder-decoder framework.

Table 2
Descriptions of some commonly used activation functions.
Activation function    Equation                                      Output range
Sigmoid                f(x) = 1 / (1 + e^{-x})                       (0, 1)
Tanh                   f(x) = (e^{x} - e^{-x}) / (e^{x} + e^{-x})    (-1, 1)
ReLU                   f(x) = max(0, x)                              [0, ∞)
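
As a minimal sketch, the activation functions of Table 2 and the encoder, decoder, and reconstruction loss of equations (1), (2), and (4) can be written in plain Python as below. The list-of-rows weight layout, the identity output activation, and all function names are assumptions made for illustration only.

```python
import math


def sigmoid(v):
    """Numerically stable logistic function, output in (0, 1)."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def tanh(v):
    """Hyperbolic tangent, output in (-1, 1)."""
    return math.tanh(v)


def relu(v):
    """Rectified linear unit, output in [0, inf)."""
    return max(0.0, v)


def affine(W, b, x):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def encode(x, W_e, b_e, act=sigmoid):
    """Equation (1): t = sigma(W_e x + b_e)."""
    return [act(v) for v in affine(W_e, b_e, x)]


def decode(t, W_d, b_d, act=lambda v: v):
    """Equation (2): x_hat = sigma_hat(W_d t + b_d); identity output assumed."""
    return [act(v) for v in affine(W_d, b_d, t)]


def reconstruction_loss(X, W_e, b_e, W_d, b_d):
    """Equation (4): mean squared reconstruction error over the N samples in X."""
    total = 0.0
    for x in X:
        x_hat = decode(encode(x, W_e, b_e), W_d, b_d)
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total / len(X)
```

In practice the weights would be fitted by a gradient-based optimizer in a deep learning framework; the sketch only evaluates the mappings and the loss for given parameters.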

Fig. 3. Structure of AE.

functions, including sigmoid, Tanh, and ReLU. It can be seen that both sigmoid and Tanh have saturation areas where the gradients are close to zero, which will cause the problem of gradient vanishing. ReLU can provide stable gradients in the range of [0, ∞), so gradient vanishing is avoided. Moreover, ReLU has the merits of sparsity and simple calculation, which make ReLU a widely used activation function today, especially in some deep neural network models. However, sigmoid and Tanh are needed when the feature outputs are required to be within a certain range.

The reconstruction mapping function can be obtained by combining encoder and decoder functions, as shown in equation (3). The parameters in AE are trained by minimizing reconstruction error, and the loss function is given in equation (4).

$$r(x) = \hat{\sigma}(W_d\,\sigma(W_e x + b_e) + b_d) \tag{3}$$

$$L_{AE} = \frac{1}{N}\sum_{n=1}^{N} \|x_n - r(x_n)\|^2 \tag{4}$$

where N is the number of training samples. The model training can be realized by the gradient descent algorithm, where the gradient is calculated by the back-propagation (BP) algorithm. In addition, other optimization algorithms such as Adam and RMSprop can also be selected to optimize the network training process [21]. After training, the feature mapping t can be used as a feature representation. To adapt the traditional AE to different requirements of representation learning, several modified strategies have been proposed.

Traditional AE only utilizes a single-layer encoder for feature extraction. Therefore, it is difficult for this basic model to extract deep features. Deepening the neural network structure can be an effective improvement strategy. Following the layer-wise learning strategy, a number of basic AEs can be stacked together to form a stacked AE (SAE) to extract complex features in the data [22]. Fig. 4 shows the structure and learning strategy of SAE, where $t^{(i)}$ denotes the hidden output of the ith AE. Every single AE in it is trained in turn, and the final feature representation is obtained by mapping the input data to the deepest hidden layer.

In addition to deepening the network structure, changing the training loss function is another commonly used improvement strategy. Both sparse AE and contractive AE follow this idea [23–25]. Sparse AE adds a regularization term to constrain the activation of the hidden layer output. More specifically, the sparse regularization equation is as follows:


Fig. 4. Structure and training strategy of SAE.

$$KL(\rho \,\|\, \rho_j) = \rho \log\frac{\rho}{\rho_j} + (1-\rho)\log\frac{1-\rho}{1-\rho_j} \tag{5}$$

$$L_{SparseAE} = L_{AE} + \beta \sum_{j=1}^{n_h} KL(\rho \,\|\, \rho_j) \tag{6}$$

where $\rho_j$ denotes the activation of the output of the jth hidden layer node, $n_h$ denotes the number of hidden layer nodes, and β denotes an adjustable parameter. The regularizing term in sparse AE can be seen as the KL divergence between the activation of hidden outputs and the preset probability ρ.
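
The sparse regularization of equations (5) and (6) can be sketched as follows. The sketch assumes hidden outputs in (0, 1), for example from a sigmoid encoder, and takes $\rho_j$ as the mean activation of node j over a batch; the clipping constant `eps` and the function names are illustrative assumptions.

```python
import math


def kl_bernoulli(rho, rho_j):
    """Equation (5): KL divergence between Bernoulli(rho) and Bernoulli(rho_j)."""
    return (rho * math.log(rho / rho_j)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_j)))


def sparse_penalty(hidden_outputs, rho=0.05, eps=1e-8):
    """Sum over hidden nodes of KL(rho || rho_j).

    hidden_outputs is a list of hidden vectors, one per sample; rho_j is
    assumed here to be the batch-mean activation of node j.
    """
    n = len(hidden_outputs)
    n_h = len(hidden_outputs[0])
    penalty = 0.0
    for j in range(n_h):
        rho_j = sum(t[j] for t in hidden_outputs) / n
        rho_j = min(max(rho_j, eps), 1.0 - eps)  # keep both logarithms finite
        penalty += kl_bernoulli(rho, rho_j)
    return penalty


def sparse_ae_loss(l_ae, hidden_outputs, beta=1.0, rho=0.05):
    """Equation (6): L_SparseAE = L_AE + beta * sum_j KL(rho || rho_j)."""
    return l_ae + beta * sparse_penalty(hidden_outputs, rho)
```

The penalty vanishes when every node's mean activation equals ρ and grows as the activations move away from it, which is what pushes most hidden nodes toward inactivity for a small ρ.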
Contractive AE (CAE) aims to extract robust features, which is also
achieved by adding a regularization term. The training loss function is as
follows:

$$L_{CAE} = L_{AE} + \lambda \|J_t(x)\|_F^2 \tag{7}$$

$$\|J_t(x)\|_F^2 = \sum_{i,j}\left(\frac{\partial t_j}{\partial x_i}\right)^2 \tag{8}$$

where the regularization term $\|J_t(x)\|_F^2$ denotes the squared Frobenius norm of the Jacobian matrix. This regularization term constrains the derivatives of the encoder function, which makes the feature output less sensitive to the changes in the input, and thus effectively reduces the effect of noise in the input.

Fig. 5. Training strategy of DAE.
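
For the contractive penalty of equation (8), a closed form is available when the encoder uses a sigmoid activation, since then $\partial t_j / \partial x_i = t_j (1 - t_j) W_{e,ji}$. The sketch below assumes such a sigmoid encoder and checks the closed form against a finite-difference estimate; the function names and the step size are illustrative assumptions.

```python
import math


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def encode(x, W_e, b_e):
    """Sigmoid encoder t = sigma(W_e x + b_e), with W_e as a list of rows."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W_e, b_e)]


def contractive_penalty(x, W_e, b_e):
    """Equation (8) for a sigmoid encoder: sum_j (t_j (1 - t_j))^2 * sum_i W_ji^2."""
    t = encode(x, W_e, b_e)
    return sum((tj * (1.0 - tj)) ** 2 * sum(w * w for w in row)
               for tj, row in zip(t, W_e))


def numerical_penalty(x, W_e, b_e, h=1e-6):
    """Central finite-difference estimate of the same norm, as a sanity check."""
    total = 0.0
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        t_plus = encode(x_plus, W_e, b_e)
        t_minus = encode(x_minus, W_e, b_e)
        total += sum(((a - b) / (2.0 * h)) ** 2 for a, b in zip(t_plus, t_minus))
    return total
```

The factor $t_j(1 - t_j)$ is small when a hidden node saturates, so the penalty favors features that respond weakly to small input perturbations.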
Denoising AE (DAE) is also designed for robust feature extraction. However, unlike CAE, DAE introduces a denoising training strategy for model training [26,27]. As shown in Fig. 5, in the training stage of DAE, the input samples are first corrupted by artificial noise, and then DAE is trained to reconstruct the uncorrupted samples. The loss function is given by:

$$L_{DAE} = \frac{1}{N}\sum_{n=1}^{N} \|r(x_n + \varepsilon) - x_n\|^2 \tag{9}$$

where ε denotes the noise for corruption, which commonly follows a zero-mean Gaussian distribution. It can be seen that DAE encourages the reconstruction mapping to eliminate noise in the input, so the learned


DAE is more robust to noisy inputs.

Moreover, in order to achieve the deep extraction of sparse and robust features, AE and its variants can be effectively combined through the layer-wise training strategy to form models like stacked denoising AE (SDAE), sparse stacked AE (sparse SAE), and so on.

2.2. Variational Autoencoder

Variational Autoencoder (VAE) is a generative model, which consists of an inference network and a generative network [28], as shown in Fig. 6. Unlike basic AE, the features extracted by VAE are described by a certain form of a probability distribution, and the optimization computation follows the idea of variational Bayes inference.

The optimization objective of VAE is developed from the likelihood function, which is defined as:

$$L = \sum_{x} \log p(x) \tag{10}$$

where p(x) denotes the distribution of the original data, and it can be transformed by the following equations.

$$\begin{aligned}
\log p(x) &= \int_z q(z|x)\log p(x)\,dz \\
&= \int_z q(z|x)\log\frac{p(x,z)\,q(z|x)}{p(z|x)\,q(z|x)}\,dz \\
&\ge \int_z q(z|x)\log\frac{p(x|z)\,p(z)}{q(z|x)}\,dz \\
&= \int_z q(z|x)\log p(x|z)\,dz + \int_z q(z|x)\log\frac{p(z)}{q(z|x)}\,dz
\end{aligned} \tag{11}$$

where $\int_z q(z|x)\log\frac{p(x|z)\,p(z)}{q(z|x)}\,dz$ is named the evidence lower bound (ELBO), and the likelihood is maximized by maximizing the ELBO. VAE provides an effective solution for solving this optimization problem, where q(z|x) and p(x|z) are estimated by neural networks, namely, the inference network and the generative network. The parameters in these two parts are denoted as θ and φ, respectively. Therefore, q(z|x) and p(x|z) are further denoted as $q_\theta(z|x)$ and $p_\varphi(x|z)$, and the loss function is given as follows:

$$\begin{aligned}
L_{VAE} &= -\int_z q_\theta(z|x)\log p_\varphi(x|z)\,dz - \int_z q_\theta(z|x)\log\frac{p(z)}{q_\theta(z|x)}\,dz \\
&= -E_{q_\theta(z|x)}[\log p_\varphi(x|z)] + KL(q_\theta(z|x)\,\|\,p(z))
\end{aligned} \tag{12}$$

where the first part of equation (12) can be seen as the reconstruction error and the second part denotes the KL divergence between $q_\theta(z|x)$ and the prior distribution p(z). In general, the feature output z in VAE is assumed to follow the Gaussian distribution. Thus, the output of the inference network contains two parts: mean μ and standard deviation σ, and p(z) is set to N(0, I). In addition, to realize the back-propagation of the gradient, a reparameterization trick is further developed in VAE.

2.3. Encoder-decoder framework

It can be seen that the aforementioned basic models are achieved by similar model structures and learning strategies. Thus, these models can be generalized as a general encoder-decoder framework.

The encoder-decoder framework divides the model into an encoder part and a decoder part, and presents a flexible model construction strategy for feature extraction. Based on this framework, the network structures in the encoder and decoder are not limited to the fully connected layer (FC layer), but can also use modules such as recurrent neural network (RNN) and convolutional neural network (CNN) [29–33]. Moreover, this framework can be used not only for end-to-end training, such as the machine translation model Seq2Seq and CNN-based AE using the deconvolution layer, but also for unsupervised feature extraction via reconstruction loss [34–36]. Fig. 7 illustrates a schematic of the encoder-decoder framework.

In practical industrial processes, complex industrial information such as dynamic and spatial features commonly exists, which will seriously affect the performance of traditional models. Based on the general encoder-decoder framework, the modules in the encoder part and decoder part can be selected and combined to effectively deal with these complex feature extraction tasks. Table 3 briefly lists the application of different network modules.

Fig. 6. Structure of VAE.
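
The VAE loss of equation (12) can be sketched for the Gaussian case described in Section 2.2, where the inference network outputs a mean μ and a standard deviation σ and the prior is N(0, I). The sketch assumes a diagonal Gaussian posterior and a unit-variance Gaussian decoder, so that the reconstruction term reduces to half the squared error up to a constant; these modeling choices and the function names are assumptions for illustration.

```python
import math
import random


def gaussian_kl(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(s * s + m * m - 1.0 - math.log(s * s)
                     for m, s in zip(mu, sigma))


def reparameterize(mu, sigma, rng=random):
    """Reparameterization: z = mu + sigma * eps with eps ~ N(0, I).

    The randomness is isolated in eps, so z is a deterministic function of
    mu and sigma once eps is drawn.
    """
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def vae_loss(x, x_hat, mu, sigma):
    """Single-sample estimate of equation (12).

    x_hat is the decoder output for one draw of z; a unit-variance Gaussian
    decoder is assumed, so -log p(x|z) equals 0.5 * ||x - x_hat||^2 up to a
    constant that is dropped here.
    """
    recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + gaussian_kl(mu, sigma)
```

A seeded generator such as `random.Random(0)` can be passed as `rng` to make the sampling repeatable.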


Fig. 7. Encoder-decoder framework.
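
The idea that encoder and decoder modules can be swapped inside one framework can be sketched as below. The `EncoderDecoder` class, the fixed-kernel one-dimensional convolution encoder, and the transposed-convolution decoder are hypothetical illustrations with untrained, hand-set kernels; they are not taken from any of the cited models.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class EncoderDecoder:
    """Pairs an arbitrary encoder module with an arbitrary decoder module."""
    encoder: Callable[[Sequence[float]], Vector]
    decoder: Callable[[Sequence[float]], Vector]

    def reconstruct(self, x: Sequence[float]) -> Vector:
        return self.decoder(self.encoder(x))

    def reconstruction_error(self, x: Sequence[float]) -> float:
        x_hat = self.reconstruct(x)
        return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def conv1d_encoder(kernel: Sequence[float]) -> Callable[[Sequence[float]], Vector]:
    """Hypothetical convolutional encoder: valid 1-D cross-correlation."""
    k = len(kernel)

    def encode(x: Sequence[float]) -> Vector:
        return [sum(kernel[j] * x[i + j] for j in range(k))
                for i in range(len(x) - k + 1)]

    return encode


def deconv1d_decoder(kernel: Sequence[float]) -> Callable[[Sequence[float]], Vector]:
    """Hypothetical transposed-convolution decoder restoring the input length."""
    k = len(kernel)

    def decode(t: Sequence[float]) -> Vector:
        out = [0.0] * (len(t) + k - 1)
        for i, ti in enumerate(t):
            for j in range(k):
                out[i + j] += kernel[j] * ti
        return out

    return decode


# The same wrapper accepts any pair of modules with compatible shapes.
model = EncoderDecoder(conv1d_encoder([0.5, 0.5]), deconv1d_decoder([1.0, 1.0]))
features = model.encoder([1.0, 2.0, 3.0, 4.0])
```

Replacing the two callables with fully connected or recurrent mappings leaves `reconstruct` and `reconstruction_error` unchanged, which is the flexibility the framework is meant to provide.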

Table 3
Module and its application.
Module      Application
FC layer    nonlinear feature extraction
CNN         spatial/temporal information extraction
RNN         dynamic representation learning

3. Representation learning

Representation learning aims to learn the feature representation that can reflect the intrinsic characteristics of data through a model mapping. In the industrial application, the feature obtained by representation learning can be seen as an abstract representation of industrial knowledge. The extracted features are hard to interpret, but compared with the original variables, these features can present a more accurate description of the changes in process information. The accurate capture of these changes can effectively guide the downstream tasks, e.g., fault detection can be achieved through detecting abnormal deviation, and fault classification can be achieved by extracting differences in different operation conditions. Therefore, representation learning is an important step for developing subsequent monitoring strategies.

As a common unsupervised feature extraction model, AE has a strong nonlinear feature extraction ability and has been widely used for extracting complex industrial features. However, in practice, industrial data generally show various characteristics such as nonlinearity, multimode, and dynamics. Therefore, in order to deal with various characteristics in industrial data, ensure the completeness of the obtained features, and provide effective feature representation for monitoring tasks, the basic AE needs to be further improved. The commonly used improvement ideas mainly include two parts: the improvement of training strategy and the improvement of model structure. This section will introduce and summarize the research status of AE-based representation learning in the industrial process according to the different requirements of feature extraction. Fig. 8 illustrates the framework of AE-based representation learning for industrial processes.

3.1. Dynamic feature representation

Process dynamics are widely present in industrial processes and are mainly caused by process mechanisms and control loops. This process characteristic is mainly manifested in the autocorrelation between samples. Therefore, traditional static feature extraction methods cannot obtain valid dynamic features. Some researchers combined AE or its variants with traditional dynamic models such as autoregressive model [37], Kalman filter [38], and slow feature analysis (SFA) [39] to achieve the representation learning in the dynamic process.

Another commonly used solution to this problem is serializing the original samples to sequences and training the model using these sequences. Jiang et al. [40] flattened the transformed sequences and used sparse SAE to extract dynamic features. Yin et al. [41] utilized mutual information to select the samples most relevant to the current sample, and eliminated irrelevant variables in these selected samples, which effectively optimized the sequence construction and reduced the redundancy. Following a similar idea, Agarwal et al. [42] utilized layer-wise relevance propagation between the original variables and hidden features to simplify the model structure and improve the interpretation of the model. Li et al. [43] concatenated the current sample with the weighted sum of its temporal neighborhoods to perform the subsequent dynamic feature extraction.

In addition, based on the encoder-decoder framework, modifying the module in each part can also enable AE to realize dynamic representation learning. Replacing the encoder and decoder parts with RNN or its variants is a widely used strategy. Ren et al. [44] utilized LSTM to construct the encoder-decoder model to achieve feature extraction in the dynamic process. Cheng et al. [45] utilized VAE for further feature extraction on the features obtained by Seq2Seq. Moreover, time series can also be viewed as a two-dimensional image, where the second dimension denotes time, so CNN can also be used as an improvement for dynamic feature extraction, where the decoder part is constructed by a deconvolution operation. Based on this idea, Kanno et al. [46] proposed a CNN-AE structure and added a DAE for deep extraction of process features. Liu et al. [47] combined the attention mechanism to integrate the relevant information into the sequential data to modify the structure of the CNN-based AE. Maggipinto et al. [48] combined variable selection


Fig. 8. Framework of AE-based representation learning for industrial processes.

to select the most efficient features obtained by AE and reduce the model redundancy. One-dimensional CNN (1DCNN) is a special form of CNN designed for dealing with time series. Based on the framework of encoder-decoder, various studies have been proposed using the 1DCNN module [49–52].

3.2. Multimode feature representation

In some industrial processes, switching between multiple operating conditions is required, which makes the process data show the multimodal characteristic. From the perspective of data distribution, the multimode data are concentrated in several local areas. Traditional multimode feature extraction methods generally cluster the data into several categories in the first stage, and then process the data in each category separately. The same strategy can be achieved by combining AE. For example, Gao et al. [53] proposed a hierarchical mode identification method to separate the multimode industrial data, and used SDAE to extract the features of the data of a single mode. In addition, AE has a number of hidden nodes, and each node can be selectively activated according to the types of input data. Therefore, it is achievable to use different nodes of AE or its variants to describe different operation modes. Zhou et al. [54] visualized the feature outputs of the sparse SDAE when using a set of multimode data as training data, and the results show that some of the hidden features can clearly reflect the operation mode information.

To make AE perform a more effective and stable multimode representation learning, the improvements in model structure and training strategy should be made for guiding the network to explore the information about different operation modes. The most commonly used strategy is to incorporate sparse regularization into the loss function, as it can restrict the activation degree of nodes during the multimode training so that hidden layer nodes can be selectively activated [54,55]. Besides, Lu et al. [56] designed a regularization for mode elimination to reduce the influence of mode change, and they further utilized the Fisher criterion to increase the sensitivity of features to fault information. Wang et al. [57] designed a network structure based on two SAEs for extracting mode-common features and mode-specific features. In addition, combining with traditional clustering models such as the Gaussian mixture model (GMM) can also effectively improve the multimode representation learning ability of AE and its variants. Zhou et al. [58] proposed a joint training strategy by combining the reconstruction loss with the optimization objective of GMM. Tang et al. [59] utilized GMM as a prior distribution of the hidden output, which improved the ability of VAE to deal with multimode data.

3.3. Robust feature representation

Data pollution will hide the process information underlying the data, making it challenging to achieve valid representation learning. Moreover, as mentioned above, the extracted feature representation can effectively capture the changes in the process operation. However, when the collected data are seriously polluted, the detected changes might

be caused by defects in the input data rather than by fault information, which will seriously affect the FDD performance. Data pollution is common in practice, so it should be considered when designing a representation learning strategy.

Industrial data are easily affected by the noise in the process. In addition, when the performance of sensors and data acquisition systems degrades, the quality of the collected data also degrades, resulting in the problems of outliers and missing data. Thus, robust feature representation learning is necessary for solving these problems.

DAE is trained by a corrupted reconstruction task, which makes DAE more robust to noise corruption, and thus various studies have been proposed to deal with process noise based on DAE [60–65]. Besides, Hu et al. [66] proposed a low-rank reconstruction strategy for AE training, aiming to deal with the problems of noise pollution and outliers. As for the problem of missing data, Choudhury et al. [67] utilized K-means for missing data imputation, and then used these imputed data to complete feature extraction. McCoy et al. [68] proposed a VAE-based iterative algorithm to impute the missing data, where the reconstruction error is used for measuring the accuracy of the imputation. Ba-Alawi et al. [69] further combined VAE with ResNet to improve the imputation accuracy and help the model extract more complex features underlying the industrial data.

3.4. Target relevant representation learning

Target relevant features refer to the features that are highly related to target variables, such as fault labels, key performance indicators, etc., which are more effective for the downstream tasks and can help to construct fault detection methods with special requirements. The representation learning for target relevant features aims to add target information to the feature extraction process, which can be achieved through target variable-based supervised learning. However, in some processes, the target variables are difficult to measure or have much lower sampling rates, so semi-supervised target relevant feature representation learning strategies are also considered.

During the representation learning process, the information in the fault data or fault labels can supervise the network to extract some fault-relevant features. One of the commonly used solutions is to add a structure for label prediction, like the softmax layer, and add a prediction error part to the loss function [70–73]. Wang et al. [71] added a fault label prediction part to the layer-wise training of SAE, which makes every single AE in it contain fault information. Yan et al. [73] augmented the fault samples with fault labels to increase the discrimination between different fault features. Besides, some studies further combined feature selection or variable selection strategies to establish the relationship between features and fault information. Pan et al. [74] separated all the hidden nodes into several groups to deal with different faults. Yin et al. [75] utilized mutual information to select the fault-related variables to better extract fault-related features. Zhang et al. [76] used the Wasserstein distance to minimize the difference between the feature output distributions of labeled and unlabeled samples, which enables the integration of fault information into the extracted features.

The representation learning for key performance index relevant features can also be realized by similar ideas. Yan et al. [77–79] combined AE with a traditional quality prediction model to extract the quality-relevant features. Wang et al. [80] proposed a VAE-based semi-supervised model for quality-relevant feature extraction. Tang et al. [81] modified the loss function of VAE to maximize the mutual information between the hidden outputs and the quality variables. Zhu et al. [82] separated the features obtained by VAE into a quality-relevant part and a quality-irrelevant part, which can be further used for quality-related fault detection tasks.

3.5. Learning from neighborhood samples

Neighborhood samples can be obtained by similarity measurement, and these samples usually contain similar features and attributes. Representation learning with these neighborhood samples can help models explore the intrinsic geometrical information and local information in the industrial data.

Some manifold learning methods can be used to extract information from neighborhood samples, and one of the strategies is to perform further feature extraction on the features obtained by AE. He et al. [83] combined discrimination locality preserving projections and sparse AE for global and local information extraction. Lu et al. [84] performed t-distributed stochastic neighbor embedding (t-SNE) on the features obtained by AE to achieve better feature visualization. Another solution is to construct a neighborhood sample-related regularization and add it to the loss function. Wang et al. [85] constructed a locality preserving projections-based regularization for reducing the distance between neighborhood samples to extract local information. Yu et al. [86] constructed a manifold regularization to guide AE to extract the information from the neighborhood samples. Zhang et al. [87] further utilized manifold regularization in the CNN-based AE to capture intrinsic data information.

4. Fault detection

The aim of fault detection is to extract the fault information and establish an index to indicate the occurrence of a fault. As discussed above, after a valid representation learning, the features obtained by AE can effectively describe the process information and evaluate the operation state of the process, so fault detection can be achieved by analyzing these obtained features.

A general strategy for AE-based fault detection is to measure the deviation of extracted features between the current operation state and the normal state. The magnitude of reconstruction error can also be used as a fault detection index. Moreover, based on this general strategy, a number of improved methods have been proposed to meet different requirements of fault detection. In this section, a detailed review of AE-based fault detection is given, including the general strategy and some modified strategies for fault detection. The procedure of industrial fault detection and the framework of AE-based fault detection strategies are briefly illustrated in Fig. 9.

4.1. General fault detection strategies

The most general strategy for the fault detection task is to construct a fault index and set the control limit. If the fault index of the online sample exceeds the control limit, a fault is considered to have occurred; otherwise, the process is in a normal state. The fault index used here is a measure of the difference between the current operation state and the normal state. The AE used for fault detection is generally trained by a set of normal state data, so when a fault sample is used as input, the AE cannot reconstruct it effectively, resulting in a large reconstruction error. In addition, the features will also deviate from those in a normal state. Therefore, the fault information can be reflected by the reconstruction error and the deviation of feature outputs.

According to the above analysis, Yan et al. [65] constructed two fault indexes, SPE and H^2, based on AE and its variants, where SPE is used to indicate the reconstruction error and H^2 is used to measure the magnitude of the feature outputs. Jiang et al. [64] constructed SPE and T^2 fault indexes, which are similar to PCA-based fault indexes. The analysis results show that SPE is more sensitive to fault information when DAE is used. As mentioned in the previous section, AE can be modified to meet different requirements of representation learning, but the structures of its modifications generally contain two parts: feature mapping and reconstruction. Therefore, the same idea of fault index

Fig. 9. Framework of AE-based fault detection.
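
The general strategy of Section 4.1 (a fault index compared against a control limit) can be sketched in a few lines of Python. This is a minimal illustration rather than the method of any cited work: the `encode`/`decode` pair is a hypothetical fixed linear map standing in for a trained AE, the toy two-variable data and the 0.99 confidence level are assumptions, and the control limit is taken as a quantile of a Gaussian kernel density estimate with a rule-of-thumb bandwidth.

```python
import math
import random


def spe(x, x_hat):
    """Residual-space index: squared reconstruction error of one sample."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def h2(features):
    """Feature-space index: squared magnitude of the feature outputs."""
    return sum(h * h for h in features)


def kde_control_limit(values, confidence=0.99):
    """Return the `confidence` quantile of a Gaussian KDE fitted to `values`."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    # Rule-of-thumb bandwidth (an assumption; any bandwidth selector could be used).
    bandwidth = max(1.06 * std * n ** (-0.2), 1e-12)

    def cdf(t):
        # The CDF of a Gaussian KDE is the average of the kernel CDFs.
        scale = bandwidth * math.sqrt(2.0)
        return sum(0.5 * (1.0 + math.erf((t - v) / scale)) for v in values) / n

    lo = min(values) - 10.0 * bandwidth
    hi = max(values) + 10.0 * bandwidth
    for _ in range(60):  # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if cdf(mid) < confidence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical stand-in for a trained AE: a tied-weight linear map onto one feature.
W = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))


def encode(x):
    return [W[0] * x[0] + W[1] * x[1]]


def decode(h):
    return [W[0] * h[0], W[1] * h[0]]


# Toy normal-state data: two strongly correlated variables plus small noise.
rng = random.Random(0)
normal_data = []
for _ in range(500):
    t = rng.gauss(0.0, 1.0)
    normal_data.append([t + rng.gauss(0.0, 0.1), t + rng.gauss(0.0, 0.1)])

# Control limits are fitted on index values from normal-state data only.
spe_limit = kde_control_limit([spe(x, decode(encode(x))) for x in normal_data])
h2_limit = kde_control_limit([h2(encode(x)) for x in normal_data])


def detect(x):
    """Flag a fault when either index exceeds its control limit."""
    h = encode(x)
    return spe(x, decode(h)) > spe_limit or h2(h) > h2_limit
```

In this toy setting, a sample that breaks the correlation between the two variables is caught in the residual space by SPE, while a sample that keeps the correlation but lies far from the normal operating range is caught in the feature space by H^2.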

construction can be used in those modified AE-based variants [88–94]. In addition, in the encoder-decoder framework, the modules in each part can be changed to RNN or CNN, and the fault indexes based on these models can also be constructed through the general strategy [95–99]. Yu et al. [95] proposed a ConvLSTM module to form the encoder part and decoder part, and constructed two indexes to measure the deviation of the features and the reconstruction error. Kanno et al. [96] constructed T^2 and SPE based on CNN-based AE for fault detection tasks in a dynamic process. Toikka et al. [97] improved AE using LSTM, in which SPE is constructed for fault detection.

From the literature and analysis mentioned above, the fault indexes are mainly used to analyze the fault information in the feature space and the residual space. The fault index of residual space in each work is relatively simple, because the reconstruction error is sufficient to reflect the fault information [64,100–103]. In the feature space, the fault information can be reflected in the magnitude of the feature outputs, such as H^2. However, a more common strategy to measure the deviation in feature space is to calculate the distance to the normal feature outputs. For example, Euclidean distance [88,104] and Mahalanobis distance [39,47,50,66,87,103,105] are utilized to indicate the fault information. In addition, combining the idea of KNN is also a suitable strategy for analyzing the fault information. After mapping the data into these two spaces, the distance to the neighborhood samples in the training dataset becomes larger when a fault happens, so the KNN distance can be used for fault detection [106–108].

The hidden outputs of VAE are different from those of traditional AE, so the construction of the fault indexes should be changed accordingly. Zhang et al. [109] utilized VAE to obtain a set of Gaussian distributed features and combined them with the variance of normal state data to construct an index for analyzing the fault information in feature space. Wang et al. [110] utilized the mean and standard deviation parameters obtained by VAE to calculate the KL divergence from the standard Gaussian distribution to measure the deviation in the feature space, while in the residual space, they further used the obtained standard deviation to deal with the uncertainty in the industrial process. Bi et al. [111] directly utilized the loss function of VAE as the fault detection index, because the loss function of VAE mainly includes a reconstruction error and a KL divergence between the feature distribution and the standard Gaussian distribution. This index can be regarded as an integrated index to analyze the fault information in both feature space and residual space.

In practical applications, the setting of the control limit is another key step. When using traditional linear models, the control limit can be determined by using the probability density functions (PDFs) of some common probability distributions like the Gaussian distribution, chi-square distribution, and F distribution. However, after the feature mapping and reconstruction of AE, some assumptions will be violated and those commonly used distributions cannot be used to determine the control limit. Kernel density estimation (KDE) can fit the PDF of the index values under normal conditions without assuming the data distribution, so this method can be used to solve this problem [112,113].

In addition, one-class methods such as support vector data description (SVDD) and one-class neural network [114–116], as well as isolation forest [117,118], can also be used as fault detection strategies. The merit of these methods is that they do not need to construct specific fault detection indexes, and only need to map the data to the feature space and the residual space using the trained AE and its variants. Moreover, if faulty samples are included in the training dataset, using AE to train a two-class classification network is also a feasible solution [119–121].

4.2. Multimode fault detection strategies

The multimode process usually includes various operation states, and when the multimode features are fully extracted by AE, the general strategy can be used directly for fault detection. To further improve the performance of fault detection, some work has been developed according to the characteristics of the multimode process. Wu et al. [122] utilized local adaptive standardization for data preprocessing and proposed a VAE-BiLSTM model for multimode feature extraction, where the fault index is based on the loss function of VAE. Tang et al. [59] modified VAE by replacing the prior distribution with a Gaussian mixture distribution, and integrated the features of each mode to construct the index in the feature space. In the residual space, the reconstruction error is also used as a fault index.

In addition, it is also a feasible solution to construct fault indexes for a single operation mode. Gao et al. [53] established an SDAE-based fault detection model, in which the hierarchical mode identification method was first used to identify the transient state and steady state of the process, and then the SPE index was used for fault detection in each state. Wang et al. [57] constructed fault indexes for mode-specific features and mode-common features and an SPE-based global index for multimode process fault detection.

4.3. Distributed fault detection strategies

The distributed strategy is established to solve problems in large-scale industrial processes. Data sampled from the large-scale industrial
process are usually high dimensional, which increases the time consumption of model training and online fault detection. Moreover, modern industrial processes have the characteristics of modularization, and the construction of models and detection indexes for a single module can help to determine the location of faults.

A commonly used distributed strategy is to divide all the variables into several blocks, and perform model training and fault detection for each block separately. Li et al. [123] performed the variable separation according to the practical operation units and trained SAE for each block. They further constructed fault indexes in each SAE to indicate the fault information in each unit and a global index for the fault detection of the whole process.

The coupling between operation units should also be considered when modeling the process and establishing distributed fault detection strategies. When the units are strongly related to each other, the fault information will be propagated among the units, affecting the detection performance. Therefore, the influence between units needs to be extracted to solve this problem. Chen et al. [124] proposed a novel index for fault detection in a single unit, and the influence from the neighboring units is also considered. Huang et al. [125] modified VAE to extract the impact of neighboring units and constructed a Euclidean distance-based fault detection index.

4.4. Quality-related fault detection strategies

The quality variable is directly related to product quality, so it is always the variable of greatest concern in industrial processes. However, the sampling rate of the quality variable is much lower than that of the process variable measurements, so directly monitoring the fluctuation of the quality variable cannot detect quality-related faults in time. Quality-related information is also contained in the process variable measurements, and the aforementioned target-related representation learning can be used to extract this part of the information. To effectively use these quality-related features and achieve accurate fault detection, a number of AE-based quality-related fault detection methods have been proposed. Yan et al. [77] designed a supervised dual SAE for quality-related feature extraction, and the obtained features are further divided into a quality-related part and a quality-unrelated part using principal component regression (PCR). Then, T^2-based fault detection indexes are constructed in these two parts. Tang et al. [81] combined VAE and DeepVIB algorithms to separate the obtained features into a quality-related part and a quality-unrelated part, and constructed T^2-based indexes for quality-related fault detection. Yan et al. [78] constructed a detection index named Ty in the proposed quality-driven AE to describe the fault information in the quality-related features. Wang et al. [80] proposed a semi-supervised quality-related VAE and constructed a fault detection index based on the quality prediction error.

4.5. Combined fault detection strategies

AE has a powerful capability of unsupervised feature extraction, but the features obtained by AE have weak interpretability. When facing detection tasks with special requirements, it is difficult to select suitable features for constructing a fault detection strategy. Traditional models like PCA, PLS, and SFA can simply and quickly realize feature extraction with special requirements such as orthogonality, the decomposition of dynamic and static features, etc. However, if the industrial process is complex, these methods may not be able to give a good performance. Therefore, some researchers developed improved AE-based fault detection methods by combining AE with traditional models [126–132].

Agarwal et al. [126] combined Sparse AE with multiway PLS and multiway PCA for the fault detection task in the batch process. Wang et al. [127] proposed a novel dynamic VAE and performed PCA on the concatenated result of raw variables and extracted features for fault detection. Yu et al. [129] first utilized Kernel SFA to extract static features and dynamic features, and then trained SAE and RNN-LSTM using these two types of features, respectively. Finally, fault detection is achieved by a Bayesian information criterion (BIC)-based integrated index.

From the perspective of representation learning, this combination can further optimize the features obtained by AE. Moreover, the detection strategy used in the traditional models can also be used here, and only the method of determining the control limit needs to be changed.

5. Fault diagnosis

Fault diagnosis is performed after fault detection, and is used to extract more specific fault information such as the fault label, the fault location, etc. Fault classification and fault identification are two types of commonly used fault diagnosis approaches, and they output different fault information. Fault classification outputs the fault type, and fault identification is used to identify the faulty variable.

Since neural networks are more suitable for end-to-end training, a number of deep learning-based fault diagnosis works have been designed for fault classification tasks. Among them, AE plays an important role in the development of the semi-supervised fault classification methods. In addition, by combining the idea of contribution index and variable selection, some AE-based fault identification methods have also been developed for identifying the faulty variable in complex industrial processes. Fig. 10 briefly illustrates the framework and categories of AE-based fault diagnosis.

5.1. Fault classification

In general, the network structure of fault classification mainly includes two parts: a feature mapping part and a label prediction part. The label prediction part is mainly realized by a softmax layer, which is calculated as

$$\mathrm{softmax}(h_j) = \frac{e^{h_j}}{\sum_i e^{h_i}} \qquad (13)$$

where $h_i$ denotes the output of the ith node, and the softmax layer converts the input value $h_j$ into the probability of belonging to the jth category.

The procedures of AE-based fault classification are shown in Fig. 11, and two strategies can be used to achieve it. The first one is training AE for unsupervised feature extraction, and then the weights obtained at this stage are used as initialization for further supervised fine-tuning. The other is training these two parts simultaneously. This strategy can guide AE to extract fault-related features, and the prediction part can be directly used for classification tasks after training.

Based on the training strategies shown in Fig. 11, a number of fault classification works have been developed. For example, Chopra et al. [133] utilized a sparse AE for unsupervised feature extraction and then applied the fine-tuning strategy to implement the fault classification task. In Ref. [134], DAE is first employed to extract robust features of the data, and then the fault classification model is fine-tuned. Qiu et al. [135] first utilized SAE to extract the complex features lying in the industrial data, and then fine-tuning was performed to train a fault classification model for a chemical process. Wang et al. [71] utilized labeled data for fault-related feature extraction, and further fine-tuned the prediction part by supervised training. He et al. [136] proposed a deep VAE-based diagnosis model, where the two parts are trained simultaneously, and they further developed an unseen fault detection method based on this model.

For extracting dynamic features in the industrial process, based on the encoder-decoder framework, the diagnosis model is further modified by CNN and RNN. Li et al. [137] proposed a CNN-DAE based fault diagnosis model for the fault diagnosis task in the distillation process.

Fig. 10. Framework of AE-based fault diagnosis.
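
For the fault identification branch of this framework, the SPE-based contribution plot of Eq. (14) in Section 5.2 amounts to splitting the squared reconstruction error into per-variable terms. The sketch below is a minimal illustration with hypothetical function names; normalizing the contributions so that they sum to one is an assumption about how a contribution rate might be formed, not the definition used in any cited work.

```python
def spe_contributions(x, x_hat):
    """Eq. (14): per-variable terms (x_i - x_hat_i)^2 whose sum is the SPE."""
    return [(a - b) ** 2 for a, b in zip(x, x_hat)]


def contribution_rates(x, x_hat):
    """Contributions scaled to sum to one (assumed normalization)."""
    contributions = spe_contributions(x, x_hat)
    total = sum(contributions)
    if total == 0.0:
        return [0.0] * len(contributions)
    return [c / total for c in contributions]


def most_suspect_variable(x, x_hat):
    """Index of the variable with the largest contribution."""
    contributions = spe_contributions(x, x_hat)
    return max(range(len(contributions)), key=contributions.__getitem__)
```

Here `x` is an input sample and `x_hat` its reconstruction by the AE. Because each term depends on the reconstruction of the whole sample, the ranking can still be distorted by the smearing effect discussed in Section 5.2.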

Fig. 11. Training strategies for AE-based classification method.
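
As a rough illustration of the label prediction part and of the second (simultaneous) training strategy, the sketch below implements the softmax of Eq. (13) and a joint objective that adds a cross-entropy prediction error to the reconstruction error. The function names, the max-subtraction used for numerical stability, and the `weight` trade-off parameter are assumptions made for illustration and are not taken from the cited works.

```python
import math


def softmax(h):
    """Eq. (13): map node outputs h to class probabilities."""
    m = max(h)  # subtracting the maximum leaves the result unchanged but avoids overflow
    exps = [math.exp(v - m) for v in h]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Prediction error for a sample whose true class index is `label`."""
    return -math.log(max(probs[label], 1e-300))


def joint_loss(x, x_hat, logits, label, weight=1.0):
    """Reconstruction error plus weighted label prediction error (hypothetical weighting)."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return reconstruction + weight * cross_entropy(softmax(logits), label)


def predict(logits):
    """Predicted fault class: the index with the largest softmax probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)
```

Under the first strategy in Fig. 11, the same `cross_entropy` term would instead be minimized during fine-tuning, after the AE weights have been pretrained on the reconstruction error alone.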

Chen et al. [49] utilized 1DCNN-based AE for unsupervised feature extraction, and further developed a diagnosis model by supervised fine-tuning. Yu et al. [72] utilized labeled data for the training of CNN-based AE, which can help to extract fault-relevant features, and the classifier structure can be directly used for fault classification. Moreover, many strategies, such as adding a specially designed regularization [138] and combining neighborhood samples [139] for AE training, combining the ideas of ensemble learning to achieve the fusion of the features obtained by multiple AEs [140], and using transfer learning to modify AE-based unsupervised learning [141,142], can also be used for improving the performance of AE-based fault classification.

The training of weights in the feature mapping part and the label prediction part can be performed independently, so the AE-based classification methods can be easily extended to semi-supervised forms, which is meaningful because fault labels are usually rare in practice. Jiang et al. [40] proposed a dynamic sparse SAE for semi-supervised fault classification, where both unlabeled and labeled data are used for unsupervised training, and only the labeled data are used for supervised fine-tuning. Zhang et al. [143] proposed an LSTM ladder AE for semi-supervised chemical process fault diagnosis. Zhang et al. [76] improved the performance of AE-based semi-supervised fault diagnosis by using the Sinkhorn distance-based regularization.

5.2. Fault identification

The faulty variable diagnosed by the fault identification methods can be seen as more specific information about the fault location, and fault identification is therefore also known as fault isolation. The identification results are shown by a set of contribution indexes that indicate the relevance between the process variables and the fault that has occurred.

Since the structures of AE and PCA are similar, PCA-based fault identification methods like contribution plot and reconstruction-based contribution (RBC) can be utilized in AE for fault identification tasks. The basic idea of the contribution plot is to divide the fault detection index into various parts, each of which is only related to a single variable, where the faulty variable is considered to have a higher contribution value [144]. Taking SPE as an example, the following equation shows the basic strategy for constructing an AE-based contribution plot.

$$\|x - \hat{x}\|^2 = \sum_{i=1}^{m} (x_i - \hat{x}_i)^2 \qquad (14)$$

where $x_i$ and $\hat{x}_i$ denote the ith variable of the input sample $x$ and its
reconstructed value $\hat{x}$, and each part $(x_i - \hat{x}_i)^2$ can be seen as the contribution of the ith variable to the value of the fault detection index.

Following the idea of the contribution plot shown above, a number of AE-based fault identification methods have been proposed. For example, Hallgrímsson et al. [145] constructed a contribution plot on the sparse AE. Cheng et al. [94] established an SAE-based contribution rate, which is a normalized contribution plot that is used in a chiller system. To deal with the dynamic process, Toikka et al. [97] established a contribution plot based on the LSTM-AE. Jiang et al. [132] further established a contribution plot on the proposed two-dimensional deep correlated model, which combines SAE and canonical correlation analysis (CCA) for batch process monitoring.

The contribution plot is an efficient method for fault identification and is easy to construct. However, the diagnosis result of the contribution plot is easily affected by the smearing effect, which is mainly caused by the forward propagation of fault information in the model. Alcala et al. [146] analyzed the diagnosability of the contribution plot and proposed RBC to address this problem. The contribution index of RBC is optimized by fault reconstruction, which can effectively suppress the smearing effect. Due to the similarity in the structures of AE and PCA, the smearing effect will also affect the performance of the AE-based contribution plot. Therefore, based on the idea of RBC, several fault identification works have been developed. Qian et al. [147] established an RBC-based contribution index based on AE and a local linearization strategy, and they further proposed a modified method by using an adaptive correlation weight and a hidden output estimation [148]. Tang et al. [149] proposed a novel RBC-based method, which is constructed on VAE and uses the branch and bound strategy to accelerate the diagnosis process.

In addition, some studies utilized the variable selection method to realize faulty variable identification. Dong et al. [150] utilized LASSO to identify the fault-relevant variable. Yu et al. [60] proposed an elastic network-based method to improve the identification performance. Moreover, fault identification can be seen as a procedure of interpreting the key causes of the detected fault, so interpretability methods can also be used to develop fault identification methods. Peng et al. [120] proposed a smooth integrated gradients-based method to diagnose faulty variables. Kanno et al. [46] utilized Grad-CAM to describe the importance of variables to the detected fault type.

6. Prospects

The above sections have reviewed the application of AE in representation learning, fault detection, and fault diagnosis in modern industrial processes. However, with the rapid upgrading of industrial production demands, the industrial mechanism is becoming more and more complex, which brings great challenges to data-driven process modeling and fault information extraction. AE has constructed a general framework for feature extraction of process data with different characteristics, and AE-based monitoring strategies have great prospects in dealing with the challenges of modern industrial applications. In this section, some prospects for future research directions are discussed.

6.1. Multimodal and heterogeneous data processing

Nowadays, various kinds of sensors are installed for monitoring the process operation state. In addition to traditional process measurements like flow, temperature, pressure, etc., other data types like images, vibration signals, and acoustic data are also recorded, which makes the industrial dataset multimodal and heterogeneous. All of these data contain useful process information. For example, in the task of subsurface defect detection, the image data of the surface picture are more helpful [151–153]. However, in order to develop a higher-performance FDD method, the various data need to be fully utilized, which requires different data descriptions, and it is difficult to manage the information extraction using only a single model.

Thanks to the encoder-decoder framework, AE-based feature extraction methods can be generalized to handle different data types. For example, traditional AE can deal with process measurements, and after combining CNN, 1DCNN, and RNN, AE can perform feature extraction using images, vibration signals, and acoustic data [151–155]. Therefore, when facing multimodal and heterogeneous data, AE-based methods are feasible in data representation learning tasks. However, how to achieve the fusion of all these extracted features and how to develop the monitoring strategy based on these features are the key problems that need to be solved. Moreover, the development of an integrated AE-based feature extraction framework for multimodal and heterogeneous data is also expected.

Up to now, little work has been proposed for this problem, but advances in this field can significantly improve the performance of the monitoring methods.

6.2. Joint modeling with mechanism knowledge and data

Deep learning models essentially learn feature representations from the given training data, so if the training data do not contain sufficient process information or the collected industrial data do not cover all the operation regions, the trained model will show a lack of generalization. Mechanism knowledge, which can be obtained through expert experience and process prior knowledge, is a generalized interpretation of process information and plays an important role in process modeling and fault diagnosis. However, mechanism knowledge is rarely considered in the development of traditional data-driven fault diagnosis methods, especially in the deep learning-based methods. As mentioned above, the features obtained by a neural network like AE are hard to interpret, which makes the performance unstable in some applications. The mechanism knowledge can provide a robust description of the process, so with its help, deep learning-based fault diagnosis methods can give a more stable performance without complex parameter adjustment. In addition, the features extracted by the deep learning model are usually abstract, and it is difficult for them to give interpretable information about the process, while the mechanism knowledge can effectively explain the operation state of the process, so it can also be used as supervision to guide the network to extract more interpretable features.

On the other hand, the powerful feature extraction ability of the deep learning model indicates that it has the potential to mine information about the internal mechanism of the industrial process. If the abstract features can be described in an interpretable way, this can be regarded as an effective solution to extend and refine process knowledge.

6.3. Integrated system with the process recovery

The development of process monitoring methods has mainly focused on fault detection and fault diagnosis tasks, while process recovery is rarely considered. Process recovery is generally performed after the fault detection and fault diagnosis stages, and the purpose is to recover the faulty part of the process under the guidance of the obtained fault information. For process recovery, some specific manipulations generally need to be performed. With the improvement of automation in modern industrial processes, manipulations can be easily achieved automatically, which makes it possible to realize an integrated monitoring system with automatic process recovery. The key to realizing it is to develop a matching mechanism between fault information and the corresponding manipulation.

When the fault types that occurred previously have been stored in the fault knowledge base, the fault recovery manipulation can be determined based on the predicted fault label. However, if a novel fault type occurs, the same strategy will not be effective. In this situation, a good data representation of the fault is expected, which can indicate more specific fault attributes such as fault location, fault unit, etc.

As can be seen from the AE-based pre-training in the fault diagnosis task, AE has the ability to extract fault information lying in the fault data.
However, when the extracted features are used to match the recovery manipulation, they should reflect both the commonalities and the unique characteristics of different faults. At the same time, the features are also expected to be discriminative, so that the recovery manipulation can be determined more easily.

Additionally, each recovery manipulation can be represented by a sparse vector, such as a one-hot vector. These sparse vectors can be used to guide the network to extract the required fault information, and when the network is well trained, the specific manipulation can be obtained through a data mapping. AE and the encoder-decoder framework have the potential to realize this.

7. Conclusion

A detailed review of AE-based representation learning, fault detection, and fault diagnosis is presented in this paper. As a popular deep learning model, AE can be used for unsupervised feature extraction without the need to manually construct features. To improve the performance of feature extraction, the basic AE has been modified into several variants such as SAE, DAE, Sparse AE, etc. Moreover, AE can be further extended to an encoder-decoder framework, where the modules in the encoder and decoder parts can be changed. This framework enables AE to deal with different data types and meet specific feature extraction requirements.

In addition, this paper reviews AE-based representation learning from the aspects of process characteristics and feature extraction strategies, which are also the key considerations for the development of process monitoring models. Effective data representation is the prerequisite for improving monitoring performance, but to complete the final monitoring tasks, effective monitoring strategies are also needed. Therefore, according to the different monitoring tasks and model structures, this paper further summarizes the state-of-the-art monitoring strategies in fault detection and fault diagnosis applications. Finally, some prospects are given for future research. It is believed that AE-based models have great potential to help modern monitoring systems become more automatic and intelligent.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

No data was used for the research described in the article.

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China (Grant No. 61933013, 61833014) and in part by the Natural Science Foundation of Zhejiang Province (Grant No. LQ21F030018).

References

[1] S.J. Qin, Survey on data-driven industrial process monitoring and diagnosis, Annu. Rev. Control 36 (2) (2012) 220–234.
[2] Z. Ge, Review on data-driven modeling and monitoring for plant-wide industrial processes, Chemometr. Intell. Lab. Syst. 171 (2017) 16–25.
[3] S.J. Qin, Y. Dong, Q. Zhu, et al., Bridging systems theory and data science: a unifying review of dynamic latent variable analytics and process monitoring, Annu. Rev. Control 50 (2020) 29–48.
[4] B. De Ketelaere, M. Hubert, E. Schmitt, Overview of PCA-based statistical process-monitoring methods for time-dependent, high-dimensional data, J. Qual. Technol. 47 (4) (2015) 318–335.
[5] J. Dong, K. Zhang, Y. Huang, et al., Adaptive total PLS based quality-relevant process monitoring with application to the Tennessee Eastman process, Neurocomputing 154 (2015) 77–85.
[6] S. Gao, Y. Dai, Y. Li, et al., Augmented flame image soft sensor for combustion oxygen content prediction, Measurement Science and Technology 34 (1) (2023) 015401, https://doi.org/10.1088/1361-6501/ac95b5.
[7] L. Yao, Z. Ge, Cooperative deep dynamic feature extraction and variable time-delay estimation for industrial quality prediction, IEEE Trans. Ind. Inf. 17 (6) (2020) 3782–3792.
[8] Y. Liu, C. Yang, Z. Gao, et al., Ensemble deep kernel learning with application to quality prediction in industrial polymerization processes, Chemometr. Intell. Lab. Syst. 174 (2018) 15–21.
[9] Y. Liu, Y. Fan, J. Chen, Flame images for oxygen content prediction of combustion systems using DBN, Energy Fuel. 31 (8) (2017) 8776–8783.
[10] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444.
[11] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507.
[12] B. Rezaeianjouybari, Y. Shang, Deep learning for prognostics and health management: state of the art, challenges, and opportunities, Measurement 163 (2020), 107929.
[13] S. Khan, T. Yairi, A review on the application of deep learning in system health management, Mech. Syst. Signal Process. 107 (2018) 241–265.
[14] T. Mikolov, M. Karafiát, L. Burget, et al., Recurrent neural network based language model, Interspeech 2 (3) (2010) 1045–1048.
[15] J. Gu, Z. Wang, J. Kuen, et al., Recent advances in convolutional neural networks, Pattern Recogn. 77 (2018) 354–377.
[16] S. Yin, S.X. Ding, A. Haghani, et al., A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process, J. Process Control 22 (9) (2012) 1567–1581.
[17] Z. Yang, B. Xu, W. Luo, et al., Autoencoder-based representation learning and its application in intelligent fault diagnosis: a review, Measurement (2021), 110460.
[18] Y. Bengio, A. Courville, P. Vincent, Representation learning: a review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell. 35 (8) (2013) 1798–1828.
[19] P. Baldi, K. Hornik, Neural networks and principal component analysis: learning from examples without local minima, Neural Netw. 2 (1) (1989) 53–58.
[20] D. Kunin, J. Bloom, A. Goeva, et al., Loss Landscapes of Regularized Linear autoencoders, in: International Conference on Machine Learning, PMLR, 2019, pp. 3560–3569.
[21] F. Zou, L. Shen, Z. Jie, et al., A Sufficient Condition for Convergences of Adam and RMSProp, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 11127–11135.
[22] Y. Bengio, P. Lamblin, D. Popovici, et al., Greedy layer-wise training of deep networks, Adv. Neural Inf. Process. Syst. 19 (2006).
[23] J. Xu, L. Xiang, Q. Liu, et al., Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images, IEEE Trans. Med. Imag. 35 (1) (2015) 119–130.
[24] S. Rifai, P. Vincent, X. Muller, et al., Contractive Auto-Encoders: Explicit Invariance during Feature extraction, in: ICML, 2011.
[25] G. Alain, Y. Bengio, What regularized auto-encoders learn from the data-generating distribution, J. Mach. Learn. Res. 15 (1) (2014) 3563–3593.
[26] P. Vincent, H. Larochelle, Y. Bengio, et al., Extracting and Composing Robust Features with Denoising autoencoders, in: Proceedings of the 25th International Conference on Machine Learning, 2008, pp. 1096–1103.
[27] Y. Bengio, L. Yao, G. Alain, et al., Generalized denoising auto-encoders as generative models, Adv. Neural Inf. Process. Syst. 26 (2013).
[28] D.P. Kingma, M. Welling, Auto-encoding variational bayes, arXiv preprint arXiv:1312.6114 (2013), https://doi.org/10.48550/arXiv.1312.6114.
[29] T. Mikolov, M. Karafiát, L. Burget, et al., Recurrent neural network based language model, Interspeech 2 (3) (2010) 1045–1048.
[30] M. Sundermeyer, R. Schlüter, H. Ney, LSTM Neural Networks for Language modeling, in: Thirteenth Annual Conference of the International Speech Communication Association, 2012.
[31] J. Chung, C. Gulcehre, K.H. Cho, et al., Empirical evaluation of gated recurrent neural networks on sequence modeling, arXiv preprint arXiv:1412.3555 (2014), https://doi.org/10.48550/arXiv.1412.3555.
[32] J. Gu, Z. Wang, J. Kuen, et al., Recent advances in convolutional neural networks, Pattern Recogn. 77 (2018) 354–377.
[33] S. Kiranyaz, O. Avci, O. Abdeljaber, et al., 1D convolutional neural networks and applications: a survey, Mech. Syst. Signal Process. 151 (2021), 107398.
[34] I. Sutskever, O. Vinyals, Q.V. Le, Sequence to sequence learning with neural networks, Adv. Neural Inf. Process. Syst. 27 (2014).
[35] H. Huang, X. Hu, Y. Zhao, et al., Modeling task fMRI data via deep convolutional autoencoder, IEEE Trans. Med. Imag. 37 (7) (2017) 1551–1561.
[36] C. Zhang, D. Song, Y. Chen, et al., A deep neural network for unsupervised anomaly detection and diagnosis in multivariate time series data, Proc. AAAI Conf. Artif. Intell. 33 (01) (2019) 1409–1416.
[37] S. Zhang, T. Qiu, A dynamic-inner convolutional autoencoder for process monitoring, Comput. Chem. Eng. 158 (2022), 107654.
[38] Z. Zhang, J. Zhu, Z. Ge, Industrial Process Modeling and Fault Detection with Recurrent Kalman Variational Autoencoder, in: 2020 IEEE 9th Data Driven Control and Learning Systems Conference (DDCLS), IEEE, 2020, pp. 1370–1376.
[39] P. Song, C. Zhao, B. Huang, SFNet: a slow feature extraction network for parallel linear and nonlinear dynamic process monitoring, Neurocomputing 488 (2022) 359–380.
[40] L. Jiang, Z. Ge, Z. Song, Semi-supervised fault classification based on dynamic Sparse Stacked auto-encoders model, Chemometr. Intell. Lab. Syst. 168 (2017) 72–83.
[41] J. Yin, X. Yan, Mutual information–dynamic stacked sparse autoencoders for fault detection, Ind. Eng. Chem. Res. 58 (47) (2019) 21614–21624.
[42] P. Agarwal, M. Tamer, H. Budman, Explainability: relevance based dynamic deep learning algorithm for fault detection and diagnosis in chemical processes, Comput. Chem. Eng. 154 (2021), 107467.
[43] N. Li, H. Shi, B. Song, et al., Temporal-spatial neighborhood enhanced sparse autoencoder for nonlinear dynamic process monitoring, Processes 8 (9) (2020) 1079.
[44] J. Ren, D. Ni, A batch-wise LSTM-encoder decoder network for batch process monitoring, Chem. Eng. Res. Des. 164 (2020) 102–112.
[45] F. Cheng, Q.P. He, J. Zhao, A novel process monitoring approach based on variational recurrent autoencoder, Comput. Chem. Eng. 129 (2019), 106515.
[46] Y. Kanno, H. Kaneko, Deep Convolutional Neural Network with Deconvolution and a Deep Autoencoder for Fault Detection and Diagnosis, ACS omega, 2022.
[47] X. Liu, J. Yu, L. Ye, Residual attention convolutional autoencoder for feature learning and fault detection in nonlinear industrial processes, Neural Comput. Appl. 33 (19) (2021) 12737–12753.
[48] M. Maggipinto, A. Beghi, G.A. Susto, A Deep Convolutional Autoencoder-Based Approach for Anomaly Detection with Industrial, Non-images, 2-Dimensional Data: A Semiconductor Manufacturing Case Study, IEEE Transactions on Automation Science and Engineering, 2022.
[49] S. Chen, J. Yu, S. Wang, One-dimensional convolutional auto-encoder-based feature learning for fault diagnosis of multivariate processes, J. Process Control 87 (2020) 54–67.
[50] J. Yu, X. Liu, One-dimensional residual convolutional auto-encoder for fault detection in complex industrial processes, Int. J. Prod. Res. (2021) 1–20.
[51] J. Yu, C. Zhang, S. Wang, Multichannel one-dimensional convolutional neural network-based feature learning for fault diagnosis of industrial processes, Neural Comput. Appl. 33 (8) (2021) 3085–3104.
[52] J. Yu, C. Zhang, S. Wang, Sparse one-dimensional convolutional neural network-based feature learning for fault detection and diagnosis in multivariable manufacturing processes, Neural Comput. Appl. 34 (6) (2022) 4343–4366.
[53] H. Gao, C. Wei, W. Huang, et al., Multimode process monitoring based on hierarchical mode identification and stacked denoising autoencoder, Chem. Eng. Sci. 253 (2022), 117556.
[54] Y. Zhou, X. Ren, S. Li, Nonlinear non-Gaussian and multimode process monitoring-based multi-subspace vine copula and deep neural network, Ind. Eng. Chem. Res. 59 (32) (2020) 14385–14397.
[55] F. Lv, X. Fan, C. Wen, et al., Stacked Sparse Auto Encoder Network Based Multimode Process monitoring, in: 2018 International Conference on Control, Automation and Information Sciences (ICCAIS), IEEE, 2018, pp. 227–232.
[56] W. Lu, X. Yan, Deep model based on mode elimination and Fisher criterion combined with self-organizing map for visual multimodal chemical process monitoring, Inf. Sci. 562 (2021) 13–27.
[57] K. Wang, Z. Guo, Y. Wang, et al., Common and specific deep feature representation for multimode process monitoring using a novel variable-wise weighted parallel network, Eng. Appl. Artif. Intell. 104 (2021), 104381.
[58] Y.C. Zhou, M.Q. Li, L.B. Ji, Denoising deep autoencoder Gaussian mixture model and its application for robust nonlinear industrial process monitoring, in: 2021 International Conference on Computer Information Science and Artificial Intelligence (CISAI), IEEE, 2021, pp. 67–73.
[59] P. Tang, K. Peng, J. Dong, et al., Monitoring of nonlinear processes with multiple operating modes through a novel Gaussian mixture variational autoencoder model, IEEE Access 8 (2020) 114487–114500.
[60] W. Yu, C. Zhao, Robust monitoring and fault isolation of nonlinear industrial processes using denoising autoencoder and elastic net, IEEE Trans. Control Syst. Technol. 28 (3) (2019) 1083–1091.
[61] J. Liu, L. Xu, Y. Xie, et al., Toward Robust Fault Identification of Complex Industrial Processes Using Stacked Sparse-Denoising Autoencoder with Softmax Classifier, IEEE Transactions on Cybernetics, 2021.
[62] S. Chen, Q. Jiang, Distributed robust process monitoring based on optimized denoising autoencoder with reinforcement learning, IEEE Trans. Instrum. Meas. 71 (2022) 1–11.
[63] H. Lee, Y. Kim, C.O. Kim, A deep learning model for robust wafer fault monitoring with sensor measurement noise, IEEE Trans. Semicond. Manuf. 30 (1) (2016) 23–31.
[64] L. Jiang, Z. Song, Z. Ge, et al., Robust self-supervised model and its application for fault detection, Ind. Eng. Chem. Res. 56 (26) (2017) 7503–7515.
[65] W. Yan, P. Guo, Z. Li, Nonlinear and robust statistical process monitoring based on variant autoencoders, Chemometr. Intell. Lab. Syst. 158 (2016) 31–40.
[66] Z. Hu, H. Zhao, J. Peng, Low-rank reconstruction-based autoencoder for robust fault detection, Control Eng. Pract. 123 (2022), 105156.
[67] S.J. Choudhury, N.R. Pal, Imputation of missing data with neural networks for classification, Knowl. Base Syst. 182 (2019), 104838.
[68] J.T. McCoy, S. Kroon, L. Auret, Variational autoencoders for missing data imputation with application to a simulated milling circuit, IFAC-PapersOnLine 51 (21) (2018) 141–146.
[69] A.H. Ba-Alawi, J. Loy-Benitez, S.Y. Kim, et al., Missing data imputation and sensor self-validation towards a sustainable operation of wastewater treatment plants via deep variational residual autoencoders, Chemosphere 288 (2022), 132647.
[70] K. Jiang, Z. Jiang, Y. Xie, et al., Ironmaking process based on stacked dynamic target-driven denoising autoencoders, IEEE Trans. Ind. Inf. 18 (3) (2021) 1854–1863.
[71] Y. Wang, H. Yang, X. Yuan, et al., Deep learning for fault-relevant feature extraction and fault classification with stacked supervised auto-encoder, J. Process Control 92 (2020) 79–89.
[72] F. Yu, J. Liu, D. Liu, et al., Supervised convolutional autoencoder-based fault-relevant feature learning for fault diagnosis in industrial processes, J. Taiwan Inst. Chem. Eng. 132 (2022), 104200.
[73] S. Yan, X. Yan, Using labeled autoencoder to supervise neural network combined with k-nearest neighbor for visual industrial process monitoring, Ind. Eng. Chem. Res. 58 (23) (2019) 9952–9958.
[74] Z. Pan, Y. Wang, X. Yuan, et al., A classification-driven neuron-grouped SAE for feature representation and its application to fault classification in chemical processes, Knowl. Base Syst. 230 (2021), 107350.
[75] J. Yin, X. Yan, Stacked sparse autoencoders monitoring model based on fault-related variable selection, Soft Comput. 25 (5) (2021) 3531–3543.
[76] X. Zhang, H. Zhang, Z. Song, Feature-aligned Stacked Autoencoder: A Novel Semi-supervised Deep Learning Model for Pattern Classification of Industrial Faults, IEEE Transactions on Artificial Intelligence, 2021.
[77] S. Yan, X. Yan, Design teacher and supervised dual stacked auto-encoders for quality-relevant fault detection in industrial process, Appl. Soft Comput. 81 (2019), 105526.
[78] S. Yan, X. Yan, Quality-driven autoencoder for nonlinear quality-related and process-related fault detection based on least-squares regularization and enhanced statistics, Ind. Eng. Chem. Res. 59 (26) (2020) 12136–12143.
[79] S. Yan, X. Yan, Quality-relevant fault detection based on adversarial learning and distinguished contribution of latent variables to quality, J. Manuf. Syst. 61 (2021) 536–545.
[80] K. Wang, X. Yuan, J. Chen, et al., Supervised and semi-supervised probabilistic learning with deep neural networks for concurrent process-quality monitoring, Neural Netw. 136 (2021) 54–62.
[81] P. Tang, K. Peng, J. Dong, Y variational information bottleneck and variational autoencoder, ISA Trans. 114 (2021) 444–454.
[82] J. Zhu, H. Shi, B. Song, et al., Information concentrated variational auto-encoder for quality-related nonlinear process monitoring, J. Process Control 94 (2020) 12–25.
[83] Y.L. He, K. Li, N. Zhang, et al., fault diagnosis using improved discrimination locality preserving projections integrated with sparse autoencoder, IEEE Trans. Instrum. Meas. 70 (2021) 1–8.
[84] W. Lu, X. Yan, Industrial Process Data Visualization Based on a Deep Enhanced T-Distributed Stochastic Neighbor Embedding Neural network, Assembly Automation, 2022.
[85] Y. Wang, C. Liu, X. Yuan, Stacked locality preserving autoencoder for feature extraction and its application for industrial process data modeling, Chemometr. Intell. Lab. Syst. 203 (2020), 104086.
[86] J. Yu, C. Zhang, Manifold regularized stacked autoencoders-based feature learning for fault detection in industrial processes, J. Process Control 92 (2020) 119–136.
[87] C. Zhang, J. Yu, L. Ye, Sparsity and manifold regularized convolutional auto-encoders-based feature learning for fault detection of multivariate processes, Control Eng. Pract. 111 (2021), 104811.
[88] J. Yu, X. Yan, A new deep model based on the stacked autoencoder with intensified iterative learning style for industrial fault detection, Process Saf. Environ. Protect. 153 (2021) 47–59.
[89] F. Cheng, J. Zhao, A novel process monitoring approach based on Feature Points Distance Dynamic Autoencoder, in: Comput. Aid. Chem. Eng., vol. 46, Elsevier, 2019, pp. 757–762.
[90] D. Kong, X. Yan, Adaptive parameter tuning stacked autoencoders for process monitoring, Soft Comput. 24 (17) (2020) 12937–12951.
[91] G.S. Chadha, A. Rabbani, A. Schwung, Comparison of semi-supervised deep neural networks for anomaly detection in industrial processes, in: 2019 IEEE 17th International Conference on Industrial Informatics (INDIN), vol. 1, IEEE, 2019, pp. 214–219.
[92] P. Park, P.D. Marco, H. Shin, et al., Fault detection and diagnosis using combined autoencoder and long short-term memory network, Sensors 19 (21) (2019) 4612.
[93] T. Zhang, W. Wang, H. Ye, et al., Fault Detection for Ironmaking Process Based on Stacked Denoising autoencoders, in: 2016 American Control Conference (ACC), IEEE, 2016, pp. 3261–3267.
[94] T. Mao, Y. Zhang, H. Zhou, et al., Data Driven Injection Molding Process Monitoring Using Sparse Auto Encoder technique, in: 2015 IEEE International Conference on Advanced Intelligent Mechatronics (AIM), IEEE, 2015, pp. 524–528.
[95] J. Yu, X. Liu, L. Ye, Convolutional long short-term memory autoencoder-based feature learning for fault detection in industrial processes, IEEE Trans. Instrum. Meas. 70 (2020) 1–15.
[96] Y. Kanno, H. Kaneko, Deep Convolutional Neural Network with Deconvolution and a Deep Autoencoder for Fault Detection and Diagnosis, ACS omega, 2022.
[97] T. Toikka, J. Laitinen, K.T. Koskinen, Failure Detection and Isolation by LSTM Autoencoder, in: World Congress on Engineering Asset Management, Springer, Cham, 2021, pp. 390–399.
[98] J. Yu, X. Yan, Deep unLSTM network: features with memory information extracted from unlabeled data and their application on industrial unsupervised industrial fault detection, Appl. Soft Comput. 108 (2021), 107382.
[99] M. Dix, A. Chouhan, S. Ganguly, et al., Anomaly detection in the time-series data of industrial plants using neural network architectures, in: 2021 IEEE Seventh International Conference on Big Data Computing Service and Applications (BigDataService), IEEE, 2021, pp. 222–228.
[100] A.H. Ba-Alawi, P. Vilela, J. Loy-Benitez, et al., Intelligent sensor validation for sustainable influent quality monitoring in wastewater treatment plants using stacked denoising autoencoders, J. Water Proc. Eng. 43 (2021), 102206.
[101] Y. Choi, S. Yoon, Autoencoder-driven fault detection and diagnosis in building automation systems: residual-based and latent space-based approaches, Build. Environ. 203 (2021), 108066.
[102] F. Cheng, W. Cai, H. Liao, et al., fault detection and isolation for chiller system based on deep autoencoder, in: 2021 IEEE 16th Conference on Industrial Electronics and Applications (ICIEA), IEEE, 2021, pp. 1702–1706.
[103] J. Zhu, H. Shi, B. Song, et al., Nonlinear process monitoring based on load weighted denoising autoencoder, Measurement 171 (2021), 108782.
[104] D. Kong, X. Yan, Industrial process deep feature representation by regularization strategy autoencoders for process monitoring, Meas. Sci. Technol. 31 (2) (2019), 025104.
[105] J. Fan, W. Wang, H. Zhang, AutoEncoder Based High-Dimensional Data Fault Detection system, in: 2017 IEEE 15th International Conference on Industrial Informatics (INDIN), IEEE, 2017, pp. 1001–1006.
[106] H. Zhang, P. Wang, X. Gao, et al., Automated Fault Detection Using Convolutional Auto Encoder and K Nearest Neighbor Rule for Semiconductor Manufacturing Processes, in: 2020 3rd International Conference on Intelligent Autonomous Systems (ICoIAS), IEEE, 2020, pp. 83–87.
[107] Z. Zhang, T. Jiang, S. Li, et al., Automated feature learning for nonlinear process monitoring–An approach using stacked denoising autoencoder and k-nearest neighbor rule, J. Process Control 64 (2018) 49–61.
[108] J. Jang, B.W. Min, C.O. Kim, Denoised residual trace analysis for monitoring semiconductor process faults, IEEE Trans. Semicond. Manuf. 32 (3) (2019) 293–301.
[109] Z. Zhang, T. Jiang, C. Zhan, et al., Gaussian feature learning based on variational autoencoder for improving nonlinear process monitoring, J. Process Control 75 (2019) 136–155.
[110] K. Wang, M.G. Forbes, B. Gopaluni, et al., Systematic development of a new variational autoencoder model based on uncertain data for monitoring nonlinear processes, IEEE Access 7 (2019) 22554–22565.
[111] X. Bi, J. Zhao, A novel orthogonal self-attentive variational autoencoder method for interpretable chemical process fault detection and identification, Process Saf. Environ. Protect. 156 (2021) 581–597.
[112] G.R. Terrell, D.W. Scott, Variable Kernel Density estimation, The Annals of Statistics, 1992, pp. 1236–1265.
[113] Y.C. Chen, A tutorial on kernel density estimation and recent advances, Biostat. Epidemiol. 1 (1) (2017) 161–187.
[114] Q. Jiang, X. Yan, B. Huang, Deep discriminative representation learning for nonlinear process fault detection, IEEE Trans. Autom. Sci. Eng. 17 (3) (2019) 1410–1419.
[115] K. Qiu, W. Song, P. Wang, Abnormal Data Detection for Industrial Processes Using Adversarial Autoencoders Support Vector Data Description Data description, Measurement Science and Technology, 2022.
[116] M.A. Chao, B.T. Adey, O. Fink, Implicit supervision for fault detection and segmentation of emerging fault types with Deep Variational Autoencoders, Neurocomputing 454 (2021) 324–338.
[117] R. Arunthavanathan, F. Khan, S. Ahmed, et al., Fault detection and diagnosis in process system using artificial intelligence-based cognitive technique, Comput. Chem. Eng. 134 (2020), 106697.
[118] G. Zhang, Z. Tang, J. Zhang, et al., Convolutional autoencoder-based flaw detection for steel wire ropes, Sensors 20 (22) (2020) 6612.
[119] G.S. Chadha, A. Schwung, Comparison of Deep Neural Network Architectures for Fault Detection in Tennessee Eastman process, in: 2017 22nd IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), IEEE, 2017, pp. 1–8.
[120] P. Peng, Y. Zhang, H. Wang, et al., Towards Robust and Understandable Fault Detection and Diagnosis Using Denoising Sparse Autoencoder and Smooth Integrated gradients, ISA transactions, 2021.
[121] M. Qian, Y.F. Li, T. Han, Positive-unlabeled learning based hybrid deep network for intelligent fault detection, IEEE Trans. Ind. Inf. 18 (7) (2021) 4510–4519.
[122] H. Wu, J. Zhao, Self-adaptive deep learning for multimode process monitoring, Comput. Chem. Eng. 141 (2020), 107024.
[123] Z. Li, L. Tian, Q. Jiang, et al., Distributed-ensemble stacked autoencoder model for non-linear process monitoring, Inf. Sci. 542 (2021) 302–316.
[124] S. Chen, Q. Jiang, Distributed robust process monitoring based on optimized denoising autoencoder with reinforcement learning, IEEE Trans. Instrum. Meas. 71 (2022) 1–11.
[125] C. Huang, Y. Chai, Z. Zhu, et al., A Novel Distributed Fault Detection Approach Based on the Variational Autoencoder Model, ACS omega 7 (3) (2022) 2996–3006.
[126] P. Agarwal, M. Aghaee, M. Tamer, et al., A novel unsupervised approach for batch process monitoring using deep learning, Comput. Chem. Eng. 159 (2022), 107694.
[127] K. Wang, J. Chen, Z. Song, et al., Deep Neural Network-Embedded Stochastic Nonlinear State-Space Models and Their Applications to Process monitoring, IEEE Transactions on Neural Networks and Learning Systems, 2021.
[128] J. Li, X. Yan, Process monitoring using principal component analysis and stacked autoencoder for linear and nonlinear coexisting industrial processes, J. Taiwan Inst. Chem. Eng. 112 (2020) 322–329.
[129] J. Yu, X. Yan, Data-Feature-Driven Nonlinear Process Monitoring Based on Joint Deep Learning Models with Dual-Scale, Information Sciences, 2022.
[130] S. Li, J. Luo, Y. Hu, Towards Interpretable Process Monitoring: Slow Feature Analysis Aided Autoencoder for Spatiotemporal Process Feature Learning, IEEE Transactions on Instrumentation and Measurement, 2021.
[131] Z. Ren, W. Zhang, Z. Zhang, A deep nonnegative matrix factorization approach via autoencoder for nonlinear fault detection, IEEE Trans. Ind. Inf. 16 (8) (2019) 5042–5052.
[132] Q. Jiang, S. Yan, X. Yan, et al., Data-driven two-dimensional deep correlated representation learning for nonlinear batch process monitoring, IEEE Trans. Ind. Inf. 16 (4) (2019) 2839–2848.
[133] P. Chopra, S.K. Yadav, Fault detection and classification by unsupervised feature extraction and dimensionality reduction, Complex Intell. Syst. 1 (1) (2015) 25–33.
[134] R. Thirukovalluru, S. Dixit, R.K. Sevakula, et al., Generating Feature Sets for Fault Diagnosis Using Denoising Stacked auto-encoder, in: 2016 IEEE International Conference on Prognostics and Health Management (ICPHM), IEEE, 2016, pp. 1–7.
[135] Y. Qiu, Y. Dai, A stacked auto-encoder based fault diagnosis model for chemical process, in: Comput. Aid. Chem. Eng., vol. 46, Elsevier, 2019, pp. 1303–1308.
[136] A. He, X. Jin, Deep variational autoencoder classifier for intelligent fault diagnosis adaptive to unseen fault categories, IEEE Trans. Reliab. 70 (4) (2021) 1581–1595.
[137] C. Li, D. Zhao, S. Mu, et al., Fault diagnosis for distillation process based on CNN–DAE, Chin. J. Chem. Eng. 27 (3) (2019) 598–604.
[138] X. Luo, X. Li, Z. Wang, et al., Discriminant autoencoder for feature extraction in fault diagnosis, Chemometr. Intell. Lab. Syst. 192 (2019), 103814.
[139] X. Zhao, M. Jia, M. Lin, Deep Laplacian Auto-encoder and its application into imbalanced fault diagnosis of rotating machinery, Measurement 152 (2020), 107320.
[140] J. Yang, G. Xie, Y. Yang, An improved ensemble fusion autoencoder model for fault diagnosis from imbalanced and incomplete data, Control Eng. Pract. 98 (2020), 104358.
[141] Z. Deng, Z. Wang, Z. Tang, et al., A deep transfer learning method based on stacked autoencoder for cross-domain fault diagnosis, Appl. Math. Comput. 408 (2021), 126318.
[142] L. Wen, L. Gao, X. Li, A new deep transfer learning based on sparse auto-encoder for fault diagnosis, IEEE Trans. Syst. Man Cybern.: Syst. 49 (1) (2017) 136–144.
[143] S. Zhang, T. Qiu, Semi-supervised LSTM ladder autoencoder for chemical process fault diagnosis and localization, Chem. Eng. Sci. 251 (2022), 117467.
[144] J.A. Westerhuis, S.P. Gurden, A.K. Smilde, Generalized contribution plots in multivariate statistical process monitoring, Chemometr. Intell. Lab. Syst. 51 (1) (2000) 95–114.
[145] Á.D. Hallgrímsson, H.H. Niemann, M. Lind, Improved process diagnosis using fault contribution plots from sparse autoencoders, IFAC-PapersOnLine 53 (2) (2020) 730–737.
[146] C.F. Alcala, S.J. Qin, Reconstruction-based contribution for process monitoring, Automatica 45 (7) (2009) 1593–1600.
[147] J. Qian, L. Jiang, Z. Song, Locally linear back-propagation based contribution for nonlinear process fault diagnosis, IEEE/CAA J. Autom. Sinica 7 (3) (2020) 764–775.
[148] J. Qian, C. Wei, Q. Zhang, et al., Adaptive positive semidefinite matrix-based contribution for nonlinear process diagnosis, Ind. Eng. Chem. Res. 60 (21) (2021) 7868–7882.
[149] P. Tang, K. Peng, R. Jiao, A process monitoring and fault isolation framework based on variational autoencoders and branch and bound method, J. Franklin Inst. 359 (2) (2022) 1667–1691.
[150] J. Dong, R. Sun, K. Peng, et al., Quality monitoring and root cause diagnosis for industrial processes based on Lasso-SAE-CCA, IEEE Access 7 (2019) 90230–90242.
[151] K. Liu, M. Zheng, Y. Liu, et al., Deep autoencoder thermography for defect detection of carbon fiber composites, IEEE Trans. Ind. Inf. (2022).
[152] K. Liu, Q. Yu, Y. Liu, et al., Convolutional graph thermography for subsurface defect detection in polymer composites, IEEE Trans. Instrum. Meas. 71 (2022) 1–11.
[153] K. Liu, Y. Li, J. Yang, et al., Generative principal component thermography for enhanced defect detection and analysis, IEEE Trans. Instrum. Meas. 69 (10) (2020) 8261–8269.
[154] T.B. Duman, B. Bayram, G. İnce, Acoustic Anomaly Detection Using Convolutional Autoencoders in Industrial processes, in: International Workshop on Soft Computing Models in Industrial and Environmental Applications, Springer, Cham, 2019, pp. 432–442.
[155] J. Yu, X. Zheng, J. Liu, Stacked convolutional sparse denoising auto-encoder for identification of defect patterns in semiconductor wafer map, Comput. Ind. 109 (2019) 121–133.