Computers and Chemical Engineering 115 (2018) 185–197

Deep convolutional neural network model based chemical process fault diagnosis

Hao Wu a, Jinsong Zhao a,b,∗

a State Key Laboratory of Chemical Engineering, Department of Chemical Engineering, Tsinghua University, Beijing, China
b Beijing Key Laboratory of Industrial Big Data System and Application, Tsinghua University, Beijing, China
∗ Corresponding author. E-mail address: jinsongzhao@tsinghua.edu.cn (J. Zhao).

Article history: Received 17 November 2017; Revised 4 April 2018; Accepted 9 April 2018; Available online 11 April 2018

Keywords: Fault diagnosis; Deep convolutional neural network; Alarm management; Tennessee Eastman process

Abstract: Numerous accidents in chemical processes have caused emergency shutdowns, property losses, casualties and/or environmental disruptions in the chemical process industry. Fault detection and diagnosis (FDD) can help operators detect and diagnose abnormal situations in a timely manner and take the right actions to avoid adverse consequences. However, FDD is still far from wide practical application. Over the past few years, the deep convolutional neural network (DCNN) has shown excellent performance on machine-learning tasks. In this paper, a fault diagnosis method based on a DCNN model consisting of convolutional layers, pooling layers, dropout and fully connected layers is proposed for chemical process fault diagnosis. The benchmark Tennessee Eastman (TE) process is utilized to verify the outstanding performance of the fault diagnosis method.

© 2018 Elsevier Ltd. All rights reserved.

1. Introduction

With the wide application of distributed control systems (DCS) during the past three decades, chemical processes have become more and more automated. However, there have been many tragic chemical process accidents, which have resulted in fatalities as well as asset and environmental damage. It seems that the process safety issue has not been tackled head on: one critical protection layer is still missing. This layer currently relies upon operators, who need to become aware of abnormal situations in a timely manner and make corrective decisions. Doing so, however, requires safety intelligence, and different operators have different levels of it; for an operator who lacks sufficient safety intelligence, it is almost impossible to play this critical role. Therefore, there has been an industrial need to develop an intelligent fault detection and diagnosis (FDD) system to assist operators in handling abnormal situations.

Various FDD methods have been proposed in the literature so far. Generally, FDD methods can be classified into three categories: knowledge based, model based and data based methods. The model based methods can be further classified into qualitative model based and quantitative model based methods (Eslamloueyan et al., 2003; Venkatasubramanian et al., 2003a, 2003b, 2003c). Due to the insurmountable drawbacks of the knowledge based and model based methods, data based methods usually stand out among the three categories, especially in the era of the internet of things.

Data based FDD methods can be classified into statistical methods, shallow learning methods and deep learning methods. Statistical methods include principal component analysis (PCA) (Wise et al., 1990; Russell et al., 2000; Cho et al., 2005; Rato et al., 2016), independent component analysis (ICA) (Kano et al., 2003; Lee et al., 2007; Hsu et al., 2010; Ge et al., 2012; Fan and Wang, 2014), partial least squares (PLS) (MacGregor et al., 1994; Plovoso and Kosanovich, 1994; Zhang and Hu, 2011), Fisher discriminant analysis (FDA) (Chiang et al., 2000; Zhu and Song, 2011), qualitative trend analysis (QTA) (Maurya et al., 2005, 2007, 2010) and their derivative methods. Based on the industrial benchmark of the Tennessee Eastman (TE) process, a comparison study of these statistical methods was conducted for process monitoring and fault diagnosis (Yin et al., 2012).

Shallow learning methods include the support vector machine (SVM) (Chiang et al., 2004; Kulkarni et al., 2005; Zhang, 2008; Mahadevan and Shah, 2009; Yélamos et al., 2009), artificial immune system (AIS) (Dai and Zhao, 2011; Ghosh and Srinivasan, 2011; Shu and Zhao, 2016), k-nearest neighbor (KNN) (He and Wang, 2007), Gaussian mixture model (GMM) (Choi et al., 2004; Yu and Qin, 2008) and artificial neural network (ANN) (Venkatasubramanian and Chan, 1989; Watanabe et al., 1989; Fan et al., 1993). Shallow learning methods have been successfully utilized to treat fault diagnosis as a classification problem. Furthermore, some neural networks derived from the ANN were developed for FDD, such as the hierarchical ANN (HANN) (Watanabe et al., 1994; Eslamloueyan et al., 2003), the Duty-Oriented HANN (DOHANN) (Eslamloueyan, 2011) and the supervised local multilayer perceptron (SLMLP) (Ayubi and Yazdanpanah, 2015).
Despite the advantages of the above two categories of data driven methods, FDD is still far from wide practical application due to two major obstacles. One is that these two categories of FDD methods often require a considerable amount of domain expertise to determine the fault features in both the spatial and temporal domains. The other is that the fault diagnosis rate is still not high enough.

Over the past few years, deep learning has become an outstanding technology and has shown better performance than the aforementioned methods in many fields. A deep learning architecture is a multilayer stack of simple but non-linear layers, such as restricted Boltzmann machines (RBM) or convolutional and pooling layers. The difference between "shallow" and "deep" is that deep learning emphasizes the importance of feature extraction layer by layer. Each layer transforms the input representation at a low level into a representation at a higher and more abstract level (LeCun et al., 2015). With the composition of enough such transformations, complex functions can be learned, and the output representation of the last layer is easier to use for pattern recognition tasks. With the rapid development of deep learning, some deep learning based methods have been proposed for chemical process fault diagnosis. A hierarchical deep neural network (HDNN) was proposed for diagnosing the faults of the TE process (Xie and Li, 2015); the average correct classification rate reached 80.5% (excluding faults 03, 09 and 15), which was higher than that of DOHANN. Lv et al. utilized stacked sparse autoencoder neural networks and a softmax classifier for FDD (Lv et al., 2016). In the latest study, an extensible deep belief network (DBN) based fault diagnosis model was proposed by the research group of the corresponding author (Zhang and Zhao, 2017). Features of the fault data in the spatial and temporal domains were extracted by DBN sub-networks, and a global back-propagation network was then used for fault classification. Even though the average fault diagnosis rate for all the 20 fault types in the TE process reached 82.1%, an all-time high record, it is still far from real applications. Therefore, further research is still needed.

The above three deep learning based methods are all based on the DBN, which is a stack of RBMs. The DBN was first proposed in 2006 (Hinton and Salakhutdinov, 2006a; Hinton et al., 2006b), which is widely regarded as the start of deep learning. However, in the famous ImageNet Large Scale Visual Recognition Challenge (ILSVRC) competition in 2012, AlexNet (Krizhevsky et al., 2012), the first model that used a deep convolutional neural network (DCNN), won the championship. Since then, the DCNN has brought about a revolution in computer vision and pattern recognition, and it is now the dominant approach for almost all recognition and detection tasks (LeCun et al., 2015). However, DCNN model based fault diagnosis methods for chemical processes have received much less attention.

The basic architectures of typical convolutional neural networks mainly contain two types of layers for feature extraction: convolutional layers and pooling layers. In addition, other layers, such as dropout and fully connected (FC) layers, are also significant for the DCNN. Compared with the DBN, the DCNN has three advantages: first, convolutional and pooling layers are locally connected with filters, which helps extract local patterns or features better; second, a DCNN with similarly sized layers has fewer parameters and requires less computation time; third, "overfitting" can be reduced by using dropout and pooling layers.

This paper presents a DCNN model based fault diagnosis method for complex chemical processes. The rest of the paper is organized as follows: Section 2 introduces the basic theory of the DCNN, including convolutional layers, pooling layers, dropout and FC layers. Section 3 presents the DCNN based chemical process fault diagnosis method in detail. Section 4 shows the experiment results of the fault diagnosis on the TE process. Finally, Section 5 summarizes this paper.

2. Deep convolutional neural network

The convolutional neural network (CNN) was first proposed in the late 1980s to process data that comes in the form of multiple arrays (LeCun et al., 1989). Since then, it has achieved great success in object detection and recognition in the computer-vision domain. Over the past few years, several outstanding DCNN architectures have been reported, including AlexNet (Krizhevsky et al., 2012), Network in Network (Lin et al., 2013), VGG (Simonyan and Zisserman, 2014), GoogLeNet (Szegedy et al., 2014), ResNet (He et al., 2016), etc. The general function of a DCNN includes feature extraction and classification. For feature extraction, convolutional layers and pooling layers are stacked to transform the raw data into a representation at a higher level. Fully connected layers are then utilized to classify the transformed representation into a certain class. Labeled data are required for the training phase, which is based on the backpropagation procedure. At the beginning of the training phase, the data samples are divided into mini-batches according to a parameter named "batch size". The basic method for training neural networks is stochastic gradient descent (SGD). The idea of SGD is to update the weights and biases of the neurons after each batch computation, with the aim of minimizing the training error. The training of the multilayer architecture is a supervised learning process.

2.1. Convolutional layer

In a convolutional layer, the output representation is composed of feature maps, within which each unit is connected to a local patch in the input feature maps through a filter composed of a set of weights. All units in an output feature map share the same filter (shared weights). Different feature maps in a layer use different filters. A typical convolutional layer is shown in Fig. 1. There are two reasons for using convolutional layers. First, in array data such as process data, local groups of values are often highly correlated, and local patterns can be formed to make pattern detection or recognition easier. Second, the distinctive local patterns can appear anywhere in the input feature maps, hence units at different locations sharing the same weights can help detect the same patterns regardless of their locations (LeCun et al., 2015).

In a convolutional layer, assume that there are M feature maps as the input and N filters. Generally, we can use Eq. (1) to calculate the output feature maps of the lth layer (Bouvrie, 2006):

x_j^l = f( Σ_{i=1}^{M} x_i^{l−1} ∗ k_{ij}^l + b_j^l ), j = 1, . . . , N, (1)

where k_{ij}^l represents the kernel of the jth filter connected to the ith input map, x_i^{l−1} represents the ith input map, x_j^l represents the jth output map, b_j^l represents the bias corresponding to the jth filter, f represents the activation function, and ∗ represents the convolutional operation. In this way, we can obtain N feature maps as the output. Assuming that the kernel size is s × s, we can use Eq. (2) to compute the number of parameters of a convolutional layer:

P = N × (s × s × M + 1) (2)

The convolutional operation is shown in Fig. 2, where the size of the input map is 4 × 4, the kernel size is 2 × 2, and the stride is 1. After the convolutional operation and the addition of the corresponding bias, an activation function is applied to compute the output feature maps.
Fig. 1. Convolutional layer.

Fig. 2. Convolutional operation.


The common activation functions for neural networks include the logistic function, the hyperbolic tangent function and the rectified linear unit (ReLU) function, as shown in Eqs. (3)–(5):

f(x) = (1 + e^{−x})^{−1} (3)

f(x) = tanh(x) (4)

f(x) = max(0, x) (5)

However, it has been shown that DCNNs with ReLUs can be trained several times faster than their equivalents with the other activation functions (Krizhevsky et al., 2012). Since then, ReLU has become the first choice when designing a DCNN architecture.
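To make Eqs. (1), (2) and (5) concrete, the following NumPy sketch implements a single "valid" convolutional layer with a ReLU activation and checks the parameter count. The function names and the sizes in the example are ours, chosen only for illustration; this is not code from the paper.

import numpy as np

def relu(x):
    # Eq. (5): rectified linear unit
    return np.maximum(0.0, x)

def conv_layer(x, kernels, biases):
    # Eq. (1): "valid" convolution of M input maps with N filters.
    # x: (M, H, W); kernels: (N, M, s, s); biases: (N,)
    N, M, s, _ = kernels.shape
    H_out, W_out = x.shape[1] - s + 1, x.shape[2] - s + 1
    out = np.zeros((N, H_out, W_out))
    for j in range(N):                       # one output map per filter
        for r in range(H_out):
            for c in range(W_out):
                patch = x[:, r:r + s, c:c + s]
                out[j, r, c] = np.sum(patch * kernels[j]) + biases[j]
    return relu(out)

M, N, s = 3, 8, 3                            # illustrative sizes
x = np.random.randn(M, 4, 4)
y = conv_layer(x, np.random.randn(N, M, s, s), np.zeros(N))
print(y.shape)                               # (8, 2, 2)
print(N * (s * s * M + 1))                   # Eq. (2): 224 parameters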
2.2. Pooling layer

A pooling layer, also known as a sub-sampling layer, follows a convolutional layer and produces down-sampled versions of the input feature maps. The objective of a pooling layer is to merge similar local features into one. There are three advantages to the use of pooling layers. Firstly, because the relative positions of the features forming a local pattern may vary slightly, detection is more reliable when similar features at nearby positions are merged (LeCun et al., 2015). Secondly, pooling layers usually have no parameters and reduce the dimension of the feature representation; their use can also greatly reduce the computation time and the number of parameters of the whole network. Thirdly, pooling layers are beneficial for preventing "overfitting" (see Section 2.3).

There are two types of pooling operations: max pooling and average pooling. A max pooling unit computes the maximum of a local patch of units in a feature map, and an average pooling unit computes the average. The computation procedure in a pooling layer is similar to that in a convolutional layer. In a pooling layer, assuming that there are M feature maps as the input, there must be M output feature maps. Generally, we can use Eq. (6) to calculate the output feature maps of the lth layer (Bouvrie, 2006):

x_j^l = f( β_j^l down(x_j^{l−1}) + b_j^l ), j = 1, . . . , M, (6)

where x_j^{l−1} represents the jth input map, x_j^l represents the jth output map, b_j^l and β_j^l represent the additive bias and the multiplicative bias corresponding to the jth filter respectively, f represents the activation function, and down represents the sub-sampling function. Generally, β_j^l and b_j^l are negligible, and there are no parameters in pooling layers. In this way, we can obtain M feature maps as the output. The max pooling and average pooling operations are shown in Fig. 3, where the size of the input map is 4 × 4, the kernel size is 2 × 2, and the stride is 2.

Fig. 3. Max pooling and average pooling operations.
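As a complement to Eq. (6) and Fig. 3, here is a minimal NumPy sketch of non-overlapping max and average pooling, with the biases neglected as noted above. The helper name is ours and purely illustrative.

import numpy as np

def pool(x, ph=2, pw=2, mode="max"):
    # Non-overlapping pooling: kernel (ph, pw), stride equal to the kernel.
    # x: (M, H, W) with H, W divisible by ph, pw; returns (M, H/ph, W/pw).
    M, H, W = x.shape
    blocks = x.reshape(M, H // ph, ph, W // pw, pw)
    if mode == "max":
        return blocks.max(axis=(2, 4))
    return blocks.mean(axis=(2, 4))          # average pooling

x = np.arange(16.0).reshape(1, 4, 4)         # a 4 x 4 map, as in Fig. 3
print(pool(x, 2, 2, "max"))                  # [[[ 5.  7.] [13. 15.]]]
print(pool(x, 2, 2, "avg"))                  # [[[ 2.5  4.5] [10.5 12.5]]]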
2.3. Dropout

If a large neural network is trained on a small training dataset, it may perform poorly on the testing dataset. This phenomenon is called "overfitting". Generally, over the whole training phase, the training error decreases as the number of iterations increases. The testing error also decreases at the beginning of the training phase, but it increases later. Fig. 4(a) shows the phenomenon of "overfitting". By using the dropout mechanism, the trend of the testing error becomes similar to the trend of the training error (see Fig. 4(b)), which is very helpful for overcoming the "overfitting" problem.

Fig. 4. The training error and the testing error.

Dropout avoids "overfitting" by randomly omitting some feature detectors in each iteration of the training stage (Hinton et al., 2012; Srivastava et al., 2014). Generally, in dropout, we need to set the retaining probability p to a certain value (for example, p = 0.5) (see Fig. 5). The output of each neuron is then set to zero with probability 0.5 (Krizhevsky et al., 2012). This method can destroy complex co-adaptations of hidden neurons and prevent a neuron from being useful only in the context of several other specific neurons. The omitted half of the neurons cannot contribute to the forward computation or the back propagation.

Fig. 5. Dropout neural net model.
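The dropout mechanism itself is only a few lines. The sketch below uses the common "inverted" formulation, which scales the kept activations by 1/p during training so that nothing needs to be rescaled at test time; this scaling detail is an implementation convention of ours, not something stated in the paper.

import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_retain=0.5, training=True):
    # Keep each neuron's output with probability p_retain, zero it otherwise.
    if not training:
        return activations                   # dropout is disabled at test time
    mask = rng.random(activations.shape) < p_retain
    return activations * mask / p_retain     # "inverted" scaling (assumption)

h = np.ones(8)
print(dropout(h))   # roughly half of the eight outputs are zeroed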

2.4. Fully connected layer

Convolutional layers and pooling layers constitute the feature extractor of the whole DCNN. Following the feature extractor, the objective of the FC layers is to classify the features extracted from the raw data. FC layers are essentially backpropagation neural networks, and their input must be a one-dimensional vector. Assume that the lengths of the input and output vectors are M and N respectively. Each value of the input vector is connected to each value of the output vector through one neuron (see Fig. 6). Then we can use Eq. (7) to calculate the output vector of the lth layer (Bouvrie, 2006):

x_j^l = f( Σ_{i=1}^{M} x_i^{l−1} × w_{ij}^l + b_j^l ), j = 1, . . . , N, (7)

where w_{ij}^l represents the weight connecting the jth output value to the ith input value, x_i^{l−1} represents the ith input value, x_j^l represents the jth output value, b_j^l represents the bias corresponding to the jth output value, and f represents the activation function. The number of parameters of a FC layer is computed as follows:

P = M × N + N (8)

Fig. 6. Fully connected layer.

A typical DCNN architecture contains several FC layers at the end. Due to the design of full connectivity, the parameters of FC layers are redundant compared with other types of layers; the parameters of the FC layers constitute more than 80% of the parameters of the whole DCNN.
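The claim that the FC layers dominate the parameter budget can be checked directly from Eqs. (2) and (8). The numbers below use the layer sizes of Model 7 from Section 4.2 purely as a worked example; the helper names are ours.

def conv_params(n_filters, s, n_input_maps):
    return n_filters * (s * s * n_input_maps + 1)   # Eq. (2)

def fc_params(n_in, n_out):
    return n_in * n_out + n_out                     # Eq. (8)

conv_total = (conv_params(64, 3, 1)      # 640
              + conv_params(64, 3, 64)   # 36,928
              + conv_params(128, 3, 64)) # 73,856
fc_total = fc_params(8064, 300) + fc_params(300, 21)
print(conv_total, fc_total)                # 111424 2425821
print(fc_total / (conv_total + fc_total))  # about 0.96: FC layers dominate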

Fig. 7. The framework of the DCNN based fault diagnosis method.

3. DCNN based fault diagnosis method

Faults in chemical processes are essentially states in which process variables deviate from their normal states. Data from different state deviations can be used to diagnose the fault types. Most of the published FDD methods have only studied the features of fault data in the spatial domain, while the time-varying features in the temporal domain are relatively less studied. It has been shown that the time-varying features of fault data are also critical for distinguishing fault types (Maurya et al., 2005, 2007, 2010; Zhang and Zhao, 2017), which is also true even for human experts performing various diagnosis tasks. Although the DCNN is mainly used for images in computer vision, it can be used for spectrograms in speech recognition as well. The DCNN extracts features from local patches; therefore, a simple data preprocessing step is applied in the proposed method to place the time series of the process variables of each piece of equipment next to each other. In this way, a spatial relationship exists along the variable dimension. Similar to spectrograms, which include time and frequency domains, the data preprocessing also transforms the process data into two-dimensional matrices (temporal and spatial domains) of size m × n, where m represents the length of a certain period of time (sample time length) and n represents the number of variables (a short preprocessing sketch is given after the step lists below). For example, with n = 50 and m = 20 (sampling period set to 3 min), the data of the 50 variables from time t − 1 h to time t is used as a sample matrix for diagnosing the status at time t.

The framework of the DCNN based fault diagnosis method is shown in Fig. 7. Its diagnosis procedure, comprising offline and online stages, is described as follows:

Offline stage:

Step 1: Historical data is collected and preprocessed from the chemical process.
Step 2: Through the data preprocessing, historical data is transformed into sample matrices of size m × n and labelled with their corresponding classes, including "normal" and the fault types (see Fig. 8(a)).
Step 3: The sample matrices, including their corresponding labels, are divided into the training set and the testing set.
Step 4: The DCNN model is designed for the chemical process (see Fig. 8(b)).
Step 5: The DCNN model is trained.
Step 6: The DCNN model is tested.
Step 7: The fault diagnosis result is output and visualized (see Fig. 8(c)).
Step 8: If the fault diagnosis rate in testing is satisfactory, the model will be used for online fault diagnosis; if unsatisfactory, the DCNN model needs to be redesigned (Step 4).

Online stage:

Step 1: Online data is collected and preprocessed from the chemical process.
Step 2: Through the same preprocessing, online data is transformed into sample matrices of size m × n.
Step 3: Online sample matrices are input to the DCNN model. The model gives a predicted diagnosis result for each sample matrix. The diagnosis result is either "normal" or one specific fault type.
Step 4: If there is a discrepancy between the predicted diagnosis result and the judgment of the human experts, the DCNN model needs to be retrained with the new data.
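As referenced in the preprocessing discussion above, a minimal sketch of the offline Step 2 windowing is given here. The function name and the use of fully overlapping windows are our assumptions, since the paper does not spell out this implementation detail.

import numpy as np

def make_sample_matrices(data, m=20, label=0):
    # data: (T, n) array, one row per sampling instant, with the time series
    # of related variables placed in adjacent columns (spatial dimension).
    # Returns (T - m + 1) sample matrices of size m x n; the window ending
    # at time t is the matrix used to diagnose the status at time t.
    T, n = data.shape
    windows = np.stack([data[t - m:t] for t in range(m, T + 1)])
    labels = np.full(len(windows), label)
    return windows, labels

# Example: 500 h of normal data at a 3-min sampling period (20 samples/h).
history = np.random.randn(500 * 20, 50)     # stand-in for DCS history
X, y = make_sample_matrices(history, m=20, label=0)
print(X.shape)                              # (9981, 20, 50)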
Fig. 8. The DCNN model for fault diagnosis.

4. Experiment result

4.1. Tennessee Eastman process

The benchmark Tennessee Eastman (TE) process (Downs and Vogel, 1993) is used to show the advantages of the proposed DCNN model. The simulator is based on the revised version (see Fig. 9), which is available at http://depts.washington.edu/control/LARRY/TE/download.html (Bathelt et al., 2015). The process variables include 12 manipulated variables, 22 continuous process measurements and 19 component analysis measurements. Even though there are 28 process disturbances (fault types) in the revised version, we select only IDV(1)–IDV(20) for comparison with other algorithms. Normal data and fault data of the TE process are collected from simulations in MATLAB 2016a (see Table 1 and Table 2). The simulation method follows Zhang's research (Zhang and Zhao, 2017). The sampling period is set to 3 min (20 samples/h). The simulator runs for 500 h in the normal state; 10,000 normal sample matrices are then collected, each containing one hour of data. In each simulation of the 20 faults, the simulator runs for 10 h in the normal state at the beginning. Then the corresponding fault disturbance is introduced and the simulator continues to run for 40 h. In this way, 40 h of fault data (800 fault sample matrices) are collected in each simulation.
Fig. 9. P&ID of TE process (Bathelt et al., 2015).

It must be noted that the simulations of fault 06 shut down after 7 h in the fault state, hence each simulation of fault 06 only yields 7 h of fault data. The simulation of each fault type is repeated 10 times with 10 different initial states. We randomly select 80% of the normal sample matrices (8000) as training normal sample matrices and the remaining normal sample matrices (2000) as testing normal sample matrices. In the same way, for the faults, we randomly select the 122,720 training fault sample matrices from eight simulations and the 30,680 testing fault sample matrices from the other two simulations.

Table 1
The fault sample matrices collected from the TE process simulator.

Status index | Training time length/h | Number of training sample matrices | Testing time length/h | Number of testing sample matrices
IDV 01–05, 07–20 | 40 × 19 × 8 | 800 × 19 × 8 | 40 × 19 × 2 | 800 × 19 × 2
IDV 06 | 7 × 8 | 140 × 8 | 7 × 2 | 140 × 2
All faults (total) | 6136 | 122,720 | 1534 | 30,680

Table 2
The normal sample matrices collected from the TE process simulator.

Status index | Training time length/h | Number of training sample matrices | Testing time length/h | Number of testing sample matrices
Normal | 400 | 8000 | 100 | 2000

In order to extract the features of the process data in both the spatial and temporal domains, the data samples are transformed into two-dimensional m × n matrices. In the TE process, XMV(5) (compressor recycle valve), XMV(9) (stripper steam valve) and XMV(12) (agitator speed) are constant during the simulations. These three variables are excluded, and therefore each sample matrix includes the information of the remaining 50 variables for 1 h (m = 20, n = 50).

4.2. DCNN model for the TE process

It is a common issue that there is no scientific guidance for designing an optimal DCNN architecture. In order to find a proper model, we tried several DCNN models, the architectures of which are shown in Table 3. The architecture with the most outstanding fault diagnosis performance was then selected from Table 3. The parameters of the selected architecture were studied through experiments; they mainly comprise the number of filters of each convolutional layer and the output length of the first FC layer.

Table 3
DCNN model candidates for fault diagnosis of the TE process.

Model | Architecture
Model 1 | Conv(128)-Pool-FC(300)∗-FC(21)
Model 2 | Conv(128)-Conv(128)-Pool-FC(300)∗-FC(21)
Model 3 | Conv(128)-Conv(128)-Conv(128)-Pool-FC(300)∗-FC(21)
Model 4 | Conv(128)-Conv(128)-Conv(128)-Conv(128)-Pool-FC(300)∗-FC(21)
Model 5 | Conv(64)-Pool-Conv(128)-Pool-FC(300)∗-FC(21)
Model 6 | Conv(64)-Conv(64)-Pool-Conv(128)-Pool-FC(300)∗-FC(21)
Model 7 | Conv(64)-Conv(64)-Pool-Conv(128)-Pool(2 × 1)-FC(300)∗-FC(21)
Model 8 | Conv(64)-Conv(64)-Pool-Conv(128)-Pool(1 × 2)-FC(300)∗-FC(21)
Model 9 | Conv(64)-Pool-Conv(128)-Conv(128)-Pool-FC(300)∗-FC(21)
Model 10 | Conv(64)-Conv(64)-Pool-Conv(128)-Conv(128)-Pool-FC(300)∗-FC(21)
Model 11 | Conv(64)-Conv(64)-Conv(64)-Pool-Conv(128)-Conv(128)-Pool(2 × 1)-FC(300)∗-FC(21)
Model 12 | Conv(64)-Conv(64)-Pool-Conv(128)-Conv(128)-Conv(128)-Pool(2 × 1)-FC(300)∗-FC(21)

FC∗: Dropout (p = 0.5) is utilized for this FC layer.
Pool: The default kernel size in pooling layers is 2 × 2 and all strides are set to 2.
In the following, Model 7 is explained as an example. The input size of one sample matrix is 20 × 50, where "20" represents the sample time length and "50" represents the number of process variables. Here we use 3 convolutional layers, 2 max pooling layers and 2 FC layers. In the three convolutional layers, the kernel sizes are all set to 3 × 3 and the strides are set to 1. The first and second convolutional layers both contain 64 filters, and the third contains 128 filters. The two max pooling layers are placed behind the second and third convolutional layers; the kernel size is set to 2 × 2 in the first pooling layer and to 2 × 1 in the second pooling layer. Here the output for one sample matrix is a three-dimensional array (3 × 21 × 128). As mentioned in Section 2.4, the input of FC layers must be a one-dimensional vector, hence we utilize a "Flatten" layer to reshape the three-dimensional arrays into one-dimensional vectors of size 8064 (3 × 21 × 128). The output length of the first FC layer is set to 300, and "Dropout" is used for this layer. The last FC layer outputs the classes of the sample matrices with a "softmax" function. The softmax function, also named the normalized exponential function, is a generalization of the logistic function that transforms a K-dimensional vector Z of arbitrary real values into a K-dimensional vector σ(Z) of real values in the range (0, 1) that add up to 1:

σ(Z)_j = e^{Z_j} / Σ_{k=1}^{K} e^{Z_k}, j = 1, 2, . . . , K. (9)

After the transformation of the softmax function, the model outputs a vector of length 21, each value of which represents the probability of the corresponding class. For a testing sample matrix, the predicted diagnosis result is the class with the highest probability.
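For concreteness, Model 7 as described above can be written as the following Keras-style sketch. This is one plausible reconstruction rather than the authors' released code: the "valid" padding, the ReLU activations on the convolutional and first FC layers, and the SGD optimizer are our assumptions, chosen so that the stated sizes (3 × 21 × 128 before flattening, an 8064-unit vector, a 21-way softmax) are reproduced.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(20, 50, 1)),             # m x n sample matrix
    layers.Conv2D(64, 3, activation="relu"),    # 3x3 kernels, stride 1 -> 18x48x64
    layers.Conv2D(64, 3, activation="relu"),    # -> 16x46x64
    layers.MaxPooling2D(pool_size=(2, 2)),      # -> 8x23x64
    layers.Conv2D(128, 3, activation="relu"),   # -> 6x21x128
    layers.MaxPooling2D(pool_size=(2, 1)),      # -> 3x21x128
    layers.Flatten(),                           # -> 8064
    layers.Dense(300, activation="relu"),
    layers.Dropout(0.5),                        # dropout on the first FC layer
    layers.Dense(21, activation="softmax"),     # Eq. (9): normal + 20 faults
])
model.compile(optimizer="sgd",                  # SGD as in Section 2 (assumption)
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, batch_size=128, epochs=50)  # settings of Section 4.3.1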

4.3. Fault diagnosis result

4.3.1. FDR and FPR

After the data collection and the DCNN model construction are completed, the training and testing procedures of classification are implemented on a server computer. In order to present the experiment results of the fault diagnosis, the confusion matrix, the fault diagnosis rate (FDR) and the false positive rate (FPR) of the ith class are defined in Table 4, Eq. (10) and Eq. (11), respectively:

Table 4
Confusion matrix for the ith class.

 | ith class (Predicted) | Other classes (Predicted)
ith class (Actual) | p | b
Other classes (Actual) | q | d

FDR = p / (p + b) (10)

FPR = q / (q + d) (11)

Here the batch size for training is set to 128 and the number of epochs is set to 50. "Batch size" means the number of sample matrices in one forward/backward pass of each iteration. "One epoch" means that all the training sample matrices are passed forward and backward through the network exactly once. The fault diagnosis testing is performed on one sample matrix at a time; each sample matrix contains the time-series data of the 50 variables from time t − 1 h to time t to diagnose the status at time t. The testing average FDR and the training/testing time of the models in Table 3 are listed in Table 5. Model 3 has the highest testing average FDR (88.4%) and takes 56 s × 50 = 46.7 min for training. With only a small decrease of the testing average FDR (88.2%), Model 7 takes about half the training time of Model 3. In the following discussion, Model 7 is therefore chosen as the best architecture. Through extensive experiments, the parameters of Model 7 are set to {conv(64), conv(64), conv(128), FC(300)}.

Fig. 10 illustrates the curves of the accuracy at the training and testing phases. The DCNN based fault diagnosis results are listed in Table 6. Among the 122,720 training sample matrices, the average FDR on the training dataset is 98.6% and the corresponding average FPR is 0.1%. On the testing dataset of 30,680 sample matrices, the average FDR reaches 88.2% and the corresponding average FPR is 0.5%. These results show that, except for "Fault 09" (D feed temperature in stream 2 – random variation), "Fault 15" (condenser cooling water valve – sticking) and "Fault 16" (unknown), the fault types can be diagnosed with more than 91% FDRs by the DCNN model. The details of the testing result for fault diagnosis are illustrated in Fig. 11, which shows the confusion matrix of all 21 classes; "size" and "color" both represent the value of the confusion matrix.

Fig. 10. The training accuracy and testing accuracy over the iteration process.

Fig. 11. The details of the testing result for fault diagnosis.

In previous research on TE process fault diagnosis, the best reported results were achieved by the DBN based model (Zhang and Zhao, 2017). In order to show the performance of the proposed method, the results of the DBN based model and the DCNN based model are compared in Table 6. For the training dataset, the DBN based model cannot diagnose the three fault types "Fault 09", "Fault 15" and "Fault 16"; the FDRs of "Fault 15" and "Fault 16" are even 0%. By contrast, the DCNN based model shows excellent performance on all the fault types in the training dataset, with a 98.6% average FDR. For the testing dataset, the FDRs of "Fault 05", "Fault 08", "Fault 11", "Fault 12", "Fault 13" and "Fault 14" are less than 88% in the DBN based model.
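The FDR and FPR of Eqs. (10) and (11) can be computed directly from the predicted and actual labels. The following small sketch is our helper (not code from the paper), with an illustrative three-class example.

import numpy as np

def fdr_fpr(y_true, y_pred, i):
    # Counts of Table 4 for class i, from integer class labels.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    p = np.sum((y_true == i) & (y_pred == i))   # class i, predicted i
    b = np.sum((y_true == i) & (y_pred != i))   # class i, predicted other
    q = np.sum((y_true != i) & (y_pred == i))   # other, predicted i
    d = np.sum((y_true != i) & (y_pred != i))   # other, predicted other
    return p / (p + b), q / (q + d)             # Eq. (10), Eq. (11)

y_true = [1, 1, 1, 2, 2, 0]
y_pred = [1, 1, 2, 2, 2, 1]
print(fdr_fpr(y_true, y_pred, 1))   # FDR = 2/3, FPR = 1/3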

Table 5
The testing average FDR and the training and testing time.

Model | Testing average FDR (%) | Training time per epoch (s) | Testing time per sample matrix (ms)
Model 1 | 85.0 | 60 | 1.9
Model 2 | 87.9 | 57 | 1.8
Model 3 | 88.4 | 56 | 1.8
Model 4 | 88.0 | 56 | 1.8
Model 5 | 86.4 | 24 | 1.5
Model 6 | 87.5 | 24 | 1.5
Model 7 | 88.2 | 30 | 1.5
Model 8 | 87.5 | 31 | 1.5
Model 9 | 86.5 | 24 | 1.5
Model 10 | 87.5 | 24 | 1.5
Model 11 | 87.5 | 31 | 1.5
Model 12 | 87.8 | 27 | 1.5

Table 6
The comparison of fault diagnosis results between the DBN based model (Zhang and Zhao, 2017) and the DCNN based model.

Status index | FDR-Train (DBN) | FDR-Train (DCNN) | FPR-Train (DCNN) | FDR-Test (DBN) | FDR-Test (DCNN) | FPR-Test (DCNN)
Normal | – | 0.916 | 0.001 | – | 0.978 | 0.015
Fault 01 | 1.00 | 0.998 | 0.000 | 1.00 | 0.986 | 0.003
Fault 02 | 1.00 | 0.996 | 0.000 | 0.99 | 0.985 | 0.000
Fault 03 | 0.99 | 0.996 | 0.000 | 0.95 | 0.917 | 0.008
Fault 04 | 0.98 | 0.999 | 0.000 | 0.98 | 0.976 | 0.000
Fault 05 | 0.90 | 0.998 | 0.000 | 0.86 | 0.915 | 0.006
Fault 06 | 1.00 | 0.998 | 0.000 | 1.00 | 0.975 | 0.000
Fault 07 | 1.00 | 0.999 | 0.000 | 1.00 | 0.999 | 0.000
Fault 08 | 0.96 | 0.985 | 0.001 | 0.78 | 0.922 | 0.001
Fault 09 | 0.655 | 0.973 | 0.000 | 0.57 | 0.584 | 0.020
Fault 10 | 0.975 | 0.977 | 0.000 | 0.98 | 0.964 | 0.000
Fault 11 | 0.975 | 0.995 | 0.000 | 0.87 | 0.984 | 0.001
Fault 12 | 0.855 | 0.992 | 0.000 | 0.85 | 0.956 | 0.001
Fault 13 | 0.965 | 0.978 | 0.001 | 0.88 | 0.957 | 0.000
Fault 14 | 0.96 | 0.998 | 0.000 | 0.87 | 0.987 | 0.000
Fault 15 | 0 | 0.997 | 0.001 | 0 | 0.28 | 0.028
Fault 16 | 0 | 0.912 | 0.009 | 0 | 0.442 | 0.038
Fault 17 | 1.00 | 0.988 | 0.004 | 1.00 | 0.945 | 0.000
Fault 18 | 1.00 | 0.970 | 0.002 | 0.98 | 0.939 | 0.001
Fault 19 | 0.97 | 0.996 | 0.000 | 0.93 | 0.986 | 0.000
Fault 20 | 0.987 | 0.971 | 0.001 | 0.93 | 0.933 | 0.000
Average∗ | 0.859 | 0.986 | 0.001 | 0.821 | 0.882 | 0.005

∗ The average excludes the FDR of the normal status because it is not available in the research on the DBN based model.

By contrast, the proposed DCNN model achieves more than 91% FDRs on these six fault types. Additionally, the DCNN based model can partially diagnose "Fault 15" and "Fault 16", although their FDRs are still low. Overall, the average of the testing FDRs of the DCNN based model is 88.2%, which is 6.1% higher than that of the DBN based model.

4.3.2. Hierarchical feature learning visualization

To facilitate the understanding of the DCNN model, it is vital to examine its hierarchical feature learning process. However, it is difficult to visualize the output representation of each layer because the learned features are high-dimensional. To solve this problem, we adopt the t-distributed stochastic neighbor embedding (t-SNE) method (Maaten and Hinton, 2008) to visualize the hierarchical feature learning process of the DCNN model.

The t-SNE method is a variation of stochastic neighbor embedding (SNE) (Hinton and Roweis, 2002), and is better at revealing the distribution of high-dimensional data. t-SNE can embed the high-dimensional features of each layer into a space of two or three dimensions, which can be visualized in a scatter map. With the 2D or 3D map corresponding to each layer, we can easily visualize the feature learning process. Through the experiment, we found that 3D maps are not suitable for visualizing the DCNN based fault diagnosis model. Therefore, the high-dimensional output features of each layer are embedded into 2D maps, which are plotted in the subfigures of Fig. 12.

800 sample matrices of the 21 classes (one normal class and 20 fault classes) are randomly selected from the testing set for visualization. The size of the input data is 800 × 20 × 50. The t-SNE method is used to transform these 20 × 50 sample matrices into 800 vectors of length 2. In Fig. 12(a), each point represents a sample matrix, plotted with the first value of each vector on the horizontal axis and the second value on the vertical axis. The points are marked with their actual class labels: "Normal" is labelled with "0", "Fault 01" with "1", etc. Additionally, in order to distinguish the clusters visually, different colors are also used to represent the class labels. Similarly, the output of each layer is transformed by the t-SNE method into a vector of length 2 so that it can be visualized in Fig. 12(b)–(h). It should be noted that the outputs of the "Dropout" and "Flatten" layers are not visualized because these layers are not useful for feature learning.

As illustrated in Fig. 12(a), the raw process data samples of all the classes are mixed up. Through the three convolutional layers and two max pooling layers, the samples are gradually clustered by class label in the t-SNE maps (see Fig. 12(b)–(f)). Theoretically, clearer clusters mean better classification performance. Finally, Fig. 12(g) and (h) are the t-SNE maps of the last two fully connected layers. The last t-SNE map illustrates the result of the classification and shows rather clear clustering of the samples. These subfigures strongly indicate that the DCNN model is effective for the fault diagnosis task. Additionally, in Fig. 12(h), we can also see that the points with labels "9", "15" and "16" are mixed up in the t-SNE map.
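A sketch of how such layer-wise 2D embeddings can be produced with scikit-learn's t-SNE follows. The paper does not name its t-SNE implementation, so the library choice, the helper name and the layer index are our assumptions; `model` is the trained network sketched in Section 4.2.

from sklearn.manifold import TSNE
from tensorflow import keras

def embed_layer_outputs(model, X, layer_index):
    # Build a sub-model that stops at the chosen layer, then flatten its output.
    sub = keras.Model(model.input, model.layers[layer_index].output)
    feats = sub.predict(X).reshape(len(X), -1)
    return TSNE(n_components=2).fit_transform(feats)

# X_vis: 800 testing sample matrices with shape (800, 20, 50, 1)
# pts = embed_layer_outputs(model, X_vis, layer_index=3)  # e.g. after a pooling layer
# A scatter plot of pts, colored by the true class label, gives one panel of Fig. 12.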

Fig. 12. DCNN model visualization using t-SNE.

It is well known that the FDRs of these three fault types in previously reported research were also quite low (Zhang and Zhao, 2017; Rato and Reis, 2017). Since the fluctuation ranges of the root cause variables of these three faults are unknown among the 50 variables of the TE simulator (Bathelt et al., 2015), it is hard to explain why their FDRs are low.

Model 7 is retrained using the data of the normal state and the other 17 fault types (i.e., without "Fault 09", "Fault 15" and "Fault 16"). The result is listed in Table 7.
Table 7
The fault diagnosis result without "Fault 09", "Fault 15" and "Fault 16".

Status index | FDR (Test) | FPR (Test)
Normal | 0.996 | 0.022
Fault 01 | 0.995 | 0.002
Fault 02 | 0.990 | 0.000
Fault 03 | 0.961 | 0.001
Fault 04 | 0.999 | 0.001
Fault 05 | 0.951 | 0.001
Fault 06 | 0.989 | 0.000
Fault 07 | 1.000 | 0.000
Fault 08 | 0.938 | 0.002
Fault 10 | 0.959 | 0.000
Fault 11 | 0.984 | 0.000
Fault 12 | 0.968 | 0.002
Fault 13 | 0.939 | 0.000
Fault 14 | 0.992 | 0.000
Fault 17 | 0.950 | 0.000
Fault 18 | 0.946 | 0.000
Fault 19 | 0.985 | 0.000
Fault 20 | 0.940 | 0.000
Average | 0.970 | 0.001

Table 8
The result of tuning the model output.

n0 | False alarm rate | Average FDR
0.5 | 0.004 | 0.970
0.4 | 0.003 | 0.969
0.3 | 0.002 | 0.969
0.2 | 0.002 | 0.968
0.1 | 0.002 | 0.967
0.005 | 0.001 | 0.961
0.004 | 0.0004 | 0.961

Table 9
The average FDR during different time periods after the fault introduction.

Time period | 0∼1 h | 1∼2 h | 2∼3 h | 3∼4 h | 4∼5 h | 5∼6 h | 6∼7 h | 7∼8 h
Average FDR | 0.432 | 0.766 | 0.941 | 1.000 | 0.985 | 0.999 | 1.000 | 1.000

Table 10
The diagnosis time (min) in different simulation conditions (A∗: sampling period; B∗: fault simulating time; C∗: sample time length).

A∗ | 3 min | 3 min | 3 min | 3 min | 1 min | 1 min | 1 min | 1 min | 15 s | 15 s
B∗ | 40 h | 40 h | 20 h | 20 h | 20 h | 20 h | 10 h | 10 h | 20 h | 20 h
C∗ | 20 | 10 | 20 | 10 | 20 | 10 | 20 | 10 | 20 | 10
Fault 01 | 28 | 36 | 30 | 31 | 16 | 18 | 18 | 18 | 15 | 13
Fault 02 | 39 | 50 | 60 | 47 | 17 | 19 | 17 | 15 | 17 | 18
Fault 03 | 35 | 26 | 34 | 26 | 14 | 18 | 19 | 17 | 9 | 7
Fault 04 | 15 | 34 | 16 | 38 | 9 | 9 | 9 | 10 | 8 | 8
Fault 05 | 33 | 12 | 28 | 13 | 10 | 9 | 10 | 9 | 26 | 22
Fault 06 | 18 | 25 | 23 | 27 | 11 | 12 | 11 | 12 | 9 | 8
Fault 07 | 15 | 15 | 15 | 15 | 9 | 9 | 9 | 9 | 7 | 7
Fault 08 | 94 | 89 | 97 | 92 | 44 | 38 | 13 | 44 | 81 | 71
Fault 10 | 82 | 57 | 91 | 61 | 31 | 39 | 40 | 39 | 24 | 52
Fault 11 | 45 | 45 | 56 | 43 | 12 | 17 | 16 | 16 | 19 | 18
Fault 12 | 52 | 33 | 66 | 44 | 13 | 18 | 27 | 20 | 13 | 9
Fault 13 | 100 | 82 | 107 | 101 | 131 | 105 | 77 | 86 | 72 | 62
Fault 14 | 24 | 24 | 27 | 23 | 15 | 16 | 15 | 17 | 12 | 12
Fault 17 | 124 | 123 | 123 | 115 | 27 | 34 | 35 | 42 | 43 | 41
Fault 18 | 45 | 56 | 107 | 96 | 36 | 72 | 43 | 28 | 19 | 20
Fault 19 | 64 | 53 | 59 | 51 | 17 | 18 | 17 | 18 | 17 | 15
Fault 20 | 146 | 163 | 134 | 154 | 79 | 60 | 66 | 29 | 44 | 62
Average | 56 | 54 | 63 | 57 | 29 | 30 | 26 | 25 | 26 | 26

4.3.3. Model tuning

Table 7 shows that the FDR for normal operation is 99.6%. This means that 0.4% of the normal data is misdiagnosed as fault data, which is usually regarded as a "false alarm". If the false alarm rate is large, operators will be bombarded with false alarms during normal operation. In order to minimize the false alarm rate, the output of the DCNN model can be tuned. After the softmax function, the model outputs a vector of length 18, {r_i | i = 0∼8, 10∼14, 17∼20}, each value of which represents the probability of the corresponding class. The diagnosis result R is the class with the highest probability, R = argmax_i(r_i). To reduce the amount of normal data misdiagnosed as fault data, the following procedure is used for diagnosing the class of a sample, where r_0 is the probability of the normal class and n_0 is a tunable threshold:

set n_0
if r_0 > n_0:
    return R = 0
else:
    return R = argmax_i(r_i)

Table 8 lists the false alarm rates of the normal state and the average FDRs of all the fault types for several values of n_0. With the decrease of n_0, the false alarm rate decreases, but the average FDR of the faults decreases as well. This brings out the trade-off between the false alarm rate and the fault diagnosis rate.

4.3.4. Diagnostic performance

Fault development is a dynamic process. Therefore, it is necessary to explore the diagnostic performance of the proposed method as time progresses after the fault introduction. Table 9 lists the average FDR during different time periods after the fault introduction. It can be found that the average FDR increases as time progresses.

Furthermore, the diagnosis time is explored to show when a fault can be diagnosed correctly after the fault occurrence. In the aforementioned discussion, the sampling period (A∗) is set to 3 min, the fault simulating time (B∗) is set to 40 h, and the sample time length (C∗) is set to 20 (20 × 3 min = 1 h). For comparison, the diagnosis times in different simulation conditions are listed in Table 10. If a fault is diagnosed consecutively for 5 (when A∗ = 3 min), 9 (when A∗ = 1 min) or 30 (when A∗ = 15 s) evaluations, a diagnostic result is indicated; a sketch of this rule is given below. It is important to note that before the diagnosis time, one or several faults among the 17 fault types may be misdiagnosed. For example, in the condition of (A∗ = 3 min, B∗ = 40 h, C∗ = 20), 16 fault types can be correctly diagnosed when the first fault is indicated; however, "Fault 13" is misdiagnosed as "Fault 02" at 85 min after the fault occurrence. From Table 10, it can be found that as the sampling period decreases (from 3 min to 15 s), the faults can be correctly diagnosed earlier.
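The consecutive-evaluation rule can be stated compactly in code. The sketch below is our rendering of the rule described above, with class 0 standing for "normal"; the function name is illustrative.

def diagnosis_time(predictions, k):
    # Return the index of the evaluation that ends the first run of k
    # consecutive identical non-normal diagnoses (k = 5, 9 or 30,
    # depending on the sampling period), or None if no fault is indicated.
    run_class, run_len = None, 0
    for t, c in enumerate(predictions):
        if c != 0 and c == run_class:
            run_len += 1
        else:
            run_class, run_len = c, 1
        if c != 0 and run_len == k:
            return t
    return None

# Per-sample diagnoses 0, 0, 13, 2, 2, 2, 2, 2: indicated at the fifth "2".
print(diagnosis_time([0, 0, 13, 2, 2, 2, 2, 2], k=5))   # 7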

5. Conclusion

In this paper, a DCNN model based chemical process fault diagnosis method is proposed. In order to extract the features in both the spatial and temporal domains, a DCNN model is built with convolutional layers, pooling layers, dropout and FC layers. Raw process data is transformed into an m × n dimensional feature map, where m represents the length of a certain period of time (sample time length) and n represents the number of variables.

The experiment results show that the proposed DCNN based fault diagnosis method has excellent performance on the benchmark TE process. The average FDR over all 20 fault types reaches 88.2%, which is higher than the other fault diagnosis methods published in the literature. Except for three fault types ("Fault 09", "Fault 15" and "Fault 16"), which are notoriously difficult to diagnose, the FDRs of the other fault types are all over 91%. Additionally, the t-SNE method is utilized to visualize the hierarchical feature learning process; most data sample matrices are clearly and correctly clustered by the DCNN in the t-SNE map. The model is then tuned to reduce false alarms, which brings out the trade-off between the false alarm rate and the fault diagnosis rate. Finally, the dynamic diagnostic performance and the diagnosis time are explored.

This method is promising for industrial applications due to its outstanding fault diagnosis rates and false positive rates. However, since the model still relies on historical fault data samples, it is not applicable to the diagnosis of faults without historical data or with little historical data. For a super complex chemical process, which generally has more than thousands of variables, the input data dimension n will become a large number while the other dimension m will remain relatively very small. In this situation, how to design a DCNN architecture will become a serious problem. Future research work will be focused on designing a DCNN model for super complex chemical processes with thousands of process variables.

Acknowledgement

The authors gratefully acknowledge support from the National Natural Science Foundation of China (No. 61433001).

References

Ayubi Rad, M.A., Yazdanpanah, M.J., 2015. Designing supervised local neural network classifiers based on EM clustering for fault diagnosis of Tennessee Eastman process. Chemom. Intell. Lab. Syst. 146, 149–157. doi:10.1016/j.chemolab.2015.05.013.
Bathelt, A., Ricker, N.L., Jelali, M., 2015. Revision of the Tennessee Eastman process model. IFAC-PapersOnLine, 309–314. doi:10.1016/j.ifacol.2015.08.199.
Bouvrie, J., 2006. Notes on convolutional neural networks.
Chiang, L.H., Russell, E.L., Braatz, R.D., 2000. Fault diagnosis in chemical processes using Fisher discriminant analysis, discriminant partial least squares, and principal component analysis. Chemom. Intell. Lab. Syst. 50, 243–252. doi:10.1016/S0169-7439(99)00061-1.
Chiang, L.H., Kotanchek, M.E., Kordon, A.K., 2004. Fault diagnosis based on Fisher discriminant analysis and support vector machines. Comput. Chem. Eng. 28, 1389–1401. doi:10.1016/j.compchemeng.2003.10.002.
Cho, J.-H., Lee, J.-M., Choi, S.W., Lee, D., Lee, I.B., 2005. Fault identification for process monitoring using kernel principal component analysis. Chem. Eng. Sci. 60, 279–288. doi:10.1016/j.ces.2004.08.007.
Choi, S.W., Park, J.H., Lee, I.B., 2004. Process monitoring using a Gaussian mixture model via principal component analysis and discriminant analysis. Comput. Chem. Eng. 28, 1377–1387. doi:10.1016/j.compchemeng.2003.09.031.
Dai, Y., Zhao, J., 2011. Fault diagnosis of batch chemical processes using a dynamic time warping (DTW)-based artificial immune system. Ind. Eng. Chem. Res. 50, 4534–4544. doi:10.1021/ie101465b.
Downs, J.J., Vogel, E.F., 1993. A plant-wide industrial process control problem. Comput. Chem. Eng. 17, 245–255. doi:10.1016/0098-1354(93)80018-I.
Eslamloueyan, R., Shahrokhi, M., Bozorgmehri, R., 2003. Multiple simultaneous fault diagnosis via hierarchical and single artificial neural networks. Sci. Iran. 10.
Eslamloueyan, R., 2011. Designing a hierarchical neural network based on fuzzy clustering for fault diagnosis of the Tennessee–Eastman process. Appl. Soft Comput. 11, 1407–1415. doi:10.1016/j.asoc.2010.04.012.
Fan, J.Y., Nikolaou, M., White, R.E., 1993. An approach to fault diagnosis of chemical processes via neural networks. AIChE J. 39, 82–88. doi:10.1002/aic.690390109.
Fan, J., Wang, Y., 2014. Fault detection and diagnosis of non-linear non-Gaussian dynamic processes using kernel dynamic independent component analysis. Inf. Sci. 259, 369–379. doi:10.1016/j.ins.2013.06.021.
Ge, Z., Xie, L., Kruger, U., Song, Z., 2012. Local ICA for multivariate statistical fault diagnosis in systems with unknown signal and error distributions. AIChE J. 58, 2357–2372. doi:10.1002/aic.12760.
Ghosh, K., Srinivasan, R., 2011. Immune-system-inspired approach to process monitoring and fault diagnosis. Ind. Eng. Chem. Res. 50, 1637–1651. doi:10.1021/ie100767c.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778. doi:10.1109/CVPR.2016.90.
He, Q.P., Wang, J., 2007. Fault detection using the k-nearest neighbor rule for semiconductor manufacturing processes. IEEE Trans. Semicond. Manuf. 20, 345–354. doi:10.1109/TSM.2007.907607.
Hinton, G.E., Roweis, S.T., 2002. Stochastic neighbor embedding. Adv. Neural Inf. Process. Syst., 833–840.
Hinton, G.E., Salakhutdinov, R., 2006a. Reducing the dimensionality of data with neural networks. Science 313, 504–507.
Hinton, G.E., Osindero, S., Teh, Y.W., 2006b. A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554. doi:10.1162/neco.2006.18.7.1527.
Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R., 2012. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
Hsu, C.C., Chen, M.C., Chen, L.S., 2010. A novel process monitoring approach with dynamic independent component analysis. Control Eng. Pract. 18, 242–253. doi:10.1016/j.conengprac.2009.11.002.
Kano, M., Tanaka, S., Hasebe, S., Hashimoto, I., Ohno, H., 2003. Monitoring independent components for fault detection. AIChE J. 49, 969–976. doi:10.1002/aic.690490414.
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst., 1–9.
Kulkarni, A., Jayaraman, V.K., Kulkarni, B.D., 2005. Knowledge incorporated support vector machines to detect faults in Tennessee Eastman process. Comput. Chem. Eng. 29, 2128–2133. doi:10.1016/j.compchemeng.2005.06.006.
LeCun, Y., Jackel, L.D., Boser, B., Denker, J.S., Graf, H.P., Guyon, I., Henderson, D., Howard, R.E., Hubbard, W., 1989. Handwritten digit recognition: applications of neural network chips and automatic learning. IEEE Commun. Mag. 27, 41–46. doi:10.1109/35.41400.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521, 436–444. doi:10.1038/nature14539.
Lee, J.M., Qin, S.J., Lee, I.B., 2007. Fault detection of non-linear processes using kernel independent component analysis. Can. J. Chem. Eng. 85, 526–536.
Lin, M., Chen, Q., Yan, S., 2013. Network in network. arXiv preprint arXiv:1312.4400.
Lv, F., Wen, C., Bao, Z., Liu, M., 2016. Fault diagnosis based on deep learning. In: 2016 American Control Conference (ACC), pp. 6851–6856. doi:10.1109/ACC.2016.7526751.
van der Maaten, L., Hinton, G., 2008. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605.
MacGregor, J.F., Jaeckle, C., Kiparissides, C., Koutoudi, M., 1994. Process monitoring and diagnosis by multiblock PLS methods. AIChE J. 40, 826–838. doi:10.1002/aic.690400509.
Mahadevan, S., Shah, S.L., 2009. Fault detection and diagnosis in process data using one-class support vector machines. J. Process Control 19, 1627–1639. doi:10.1016/j.jprocont.2009.07.011.
Maurya, M.R., Rengaswamy, R., Venkatasubramanian, V., 2005. Fault diagnosis by qualitative trend analysis of the principal components. Chem. Eng. Res. Des. 83, 1122–1132. doi:10.1205/cherd.04280.
Maurya, M.R., Rengaswamy, R., Venkatasubramanian, V., 2007. A signed directed graph and qualitative trend analysis-based framework for incipient fault diagnosis. Chem. Eng. Res. Des. 85, 1407–1422. doi:10.1016/S0263-8762(07)73181-7.
Maurya, M.R., Paritosh, P.K., Rengaswamy, R., Venkatasubramanian, V., 2010. A framework for on-line trend extraction and fault diagnosis. Eng. Appl. Artif. Intell. 23, 950–960. doi:10.1016/j.engappai.2010.01.027.
Plovoso, M.J., Kosanovich, K.A., 1994. Applications of multivariate statistical methods to process monitoring and controller design. Int. J. Control 59, 743–765. doi:10.1080/00207179408923103.
Rato, T., Reis, M., Schmitt, E., Hubert, M., De Ketelaere, B., 2016. A systematic comparison of PCA-based statistical process monitoring methods for high-dimensional, time-dependent processes. AIChE J. 62, 1478–1493. doi:10.1002/aic.15062.
Rato, T.J., Reis, M.S., 2017. Markovian and non-Markovian sensitivity enhancing transformations for process monitoring. Chem. Eng. Sci. 163, 223–233. doi:10.1016/j.ces.2017.01.047.
Russell, E.L., Chiang, L.H., Braatz, R.D., 2000. Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis. Chemom. Intell. Lab. Syst. 51, 81–93. doi:10.1016/S0169-7439(00)00058-7.
Shu, Y., Zhao, J., 2016. Fault diagnosis of chemical processes using artificial immune system with vaccine transplant. Ind. Eng. Chem. Res. 55, 3360–3371. doi:10.1021/acs.iecr.5b02646.
Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R., 2014. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., 2014. Going deeper with convolutions. arXiv preprint arXiv:1409.4842. doi:10.1109/CVPR.2015.7298594.
Venkatasubramanian, V., Chan, K., 1989. A neural network methodology for process fault diagnosis. AIChE J. 35, 1993–2002. doi:10.1002/aic.690351210.
Venkatasubramanian, V., Rengaswamy, R., Kavuri, S.N., 2003a. A review of process fault detection and diagnosis part II: qualitative models and search strategies. Comput. Chem. Eng. 27, 313–326. doi:10.1016/S0098-1354(02)00161-8.
Venkatasubramanian, V., Rengaswamy, R., Kavuri, S.N., Yin, K., 2003b. A review of process fault detection and diagnosis part III: process history based methods. Comput. Chem. Eng. doi:10.1016/S0098-1354(02)00162-X.
Venkatasubramanian, V., Rengaswamy, R., Kavuri, S.N., Yin, K., 2003c. A review of process fault detection and diagnosis part I: quantitative model based methods. Comput. Chem. Eng. 27, 293–311. doi:10.1016/S0098-1354(02)00160-6.
Watanabe, K., Matsuura, I., Abe, M., Kubota, M., Himmelblau, D.M., 1989. Incipient fault diagnosis of chemical processes via artificial neural networks. AIChE J. 35, 1803–1812. doi:10.1002/aic.690351106.
Watanabe, K., Hirota, S., Hou, L., Himmelblau, D.M., 1994. Diagnosis of multiple simultaneous fault via hierarchical artificial neural networks. AIChE J. 40, 839–848. doi:10.1002/aic.690400510.
Wise, B.M., Ricker, N.L., Veltkamp, D.F., Kowalski, B.R., 1990. A theoretical basis for the use of principal component models for monitoring multivariate processes. Process Control Qual. 1, 41–51.
Xie, D., Li, B., 2015. A hierarchical deep neural network for fault diagnosis on Tennessee–Eastman process. In: 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA). doi:10.1109/ICMLA.2015.208.
Yélamos, I., Escudero, G., Graells, M., Puigjaner, L., 2009. Performance assessment of a novel fault diagnosis system based on support vector machines. Comput. Chem. Eng. 33, 244–255. doi:10.1016/j.compchemeng.2008.08.008.
Yin, S., Ding, S.X., Haghani, A., Hao, H., Zhang, P., 2012. A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process. J. Process Control 22, 1567–1581. doi:10.1016/j.jprocont.2012.06.009.
Yu, J., Qin, S.J., 2008. Multimode process monitoring with Bayesian inference-based finite Gaussian mixture models. AIChE J. 54, 1811–1829. doi:10.1002/aic.11515.
Zhang, Y.W., 2008. Fault detection and diagnosis of nonlinear processes using improved kernel independent component analysis (KICA) and support vector machine (SVM). Ind. Eng. Chem. Res. 47, 6961–6971. doi:10.1021/ie071496x.
Zhang, Y., Hu, Z., 2011. Multivariate process monitoring and analysis based on multi-scale KPLS. Chem. Eng. Res. Des. 89, 2667–2678. doi:10.1016/j.cherd.2011.05.005.
Zhang, Z., Zhao, J., 2017. A deep belief network based fault diagnosis model for complex chemical processes. Comput. Chem. Eng. doi:10.1016/j.compchemeng.2017.02.041.
Zhu, Z.B., Song, Z.H., 2011. A novel fault diagnosis system using pattern classification on kernel FDA subspace. Expert Syst. Appl. 38, 6895–6905. doi:10.1016/j.eswa.2010.12.034.
