A Multisource Domain Adaptation Network For Process Fault Diagnosis Under Different Working Conditions

6272 IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 70, NO. 6, JUNE 2023
Shijin Li and Jianbo Yu, Member, IEEE
Abstract—Transfer learning-based process fault diagnosis has received intensive attention from researchers. However, a practical scenario of process fault diagnosis (i.e., multisource domain adaptation) has not been well solved under various working conditions. It is challenging because distribution differences coexist between different source domains and across source and target domains. In this article, a novel transfer learning model, feature-level and class-level based multisource domain adaptation (FC-MSDA), is proposed for process fault diagnosis under varying working conditions. A common feature extractor is proposed to learn both global and local features from process signals. A feature selection module is developed to reduce negative transfer caused by irrelevant information in multiple source domains. A domain-specific feature generator is developed for each source-target domain pair to learn domain-specific features. Moreover, a class-level distribution alignment loss is proposed for each domain pair to settle the negative transfer caused by the inconsistent label space between domains from different working conditions of the process. An information fusion strategy is performed to ensemble multiple predictions. The experimental results on three industrial cases demonstrate the effectiveness of FC-MSDA in process fault diagnosis (i.e., FC-MSDA obtains an average accuracy of 99.17% on five transfer tasks in the three phase process).

Index Terms—Multisource domain adaptation, negative transfer, process fault diagnosis, transfer learning.

Manuscript received 23 March 2022; revised 28 May 2022 and 2 July 2022; accepted 20 July 2022. Date of publication 3 August 2022; date of current version 23 January 2023. This work was supported by the National Natural Science Foundation of China under Grant 92167107 and Grant 71771173. (Corresponding author: Jianbo Yu.)
The authors are with the School of Mechanical Engineering, Tongji University, Shanghai 201804, China (e-mail: 2111049@tongji.edu.cn; jbyu@tongji.edu.cn).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TIE.2022.3194654.
Digital Object Identifier 10.1109/TIE.2022.3194654

I. INTRODUCTION

With the expansion of the production scale of modern industrial processes, process fault diagnosis has been considered a key technology to ensure production efficiency, resource utilization, and production yield. Fault diagnosis can be divided into model-based, knowledge-based, and data-driven-based methods [1], [2]. Data-driven fault diagnosis approaches have recently received increasing attention owing to the advancements in sensor technology as well as data storage and analysis techniques. Recently, deep learning (DL) based methods have been successfully applied to data-driven fault diagnosis. Li et al. [3] proposed a graph convolutional network with multiple receptive fields for machinery fault diagnosis, which converted data into graphs to investigate the relationship of data samples in non-Euclidean space. Liu et al. [4] proposed a sparse-denoising autoencoders-based network for fault identification in industrial processes. Yu and Zhao [5] designed a broad convolutional neural network (CNN) to capture the fault information and nonlinear structure of the process signals for fault diagnosis.

The limitation of these DL-based methods is that they require the training and testing datasets to follow the same data distribution. However, this requirement is hard to satisfy in practical implementation owing to the inevitable changes in the industrial production process (i.e., the changes in operating settings, working environment, and materials). A distribution discrepancy generally exists between datasets collected at different periods. Thus, the model constructed on the training data may not achieve satisfactory results on the testing data. Recently, transfer learning (TL) technology has become the focus of academia and industry to effectively cope with the domain shift caused by changing working conditions. It can transfer knowledge learned from one domain to another related but different domain. The two mainstream families of TL methods are the moment matching-based method and the adversarial learning-based method. They are widely applied to machinery fault diagnosis under different operating conditions. Moment matching-based methods adopt statistical distances (e.g., correlation alignment (CORAL) [6] and maximum mean discrepancy (MMD) [7]) to narrow the domain gap. Zhao et al. [8] adopted multikernel MMD to reduce the distribution discrepancy under different working conditions. Li et al. [9] proposed a CNN with CORAL alignment and transfer component analysis for bearing fault diagnosis. Adversarial learning-based methods utilize adversarial learning to achieve distribution alignment [10]. Liu et al. [11] minimized the domain distribution distance through adversarial training. Li et al. [12] proposed a deep network with two separated feature generators for machinery fault diagnosis, whose classifiers are trained based on the domain adversarial training loss and the MMD loss, respectively.
0278-0046 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Although the abovementioned TL methods have achieved excellent results, most of the pioneering works are confined to a single-source scenario, which cannot ensure diagnostic precision when the source samples come from various source domains with distinct data distributions caused by different working conditions. It is challenging to perform process fault diagnosis in multisource scenarios, because distribution discrepancies exist not only between source and target domains but also among the source domains. Recently, a small number of studies have addressed multisource domain fault diagnosis. Rezaeianjouybari and Yi [13] developed global and local feature generators to learn effective features from multiple domains. To implement multisource domain adaptation, Wang et al. [14] proposed a domain-specific distribution alignment module that adopted intradomain alignment to narrow the domain gap between source-target domain pairs. Tian et al. [15] proposed a multibranch network for multisource fault diagnosis, where local MMD is adopted to align the subdomain distributions between domains. Zhu et al. [16] utilized a multiadversarial learning strategy to obtain feature representations that are both domain-invariant and discriminative. The performance of the abovementioned methods depends on the assumption that the source domain and target domain share the same label space. Generally, process data obtained in different working conditions have distinct label spaces. The target label space is sometimes a subset of the source label space when knowledge of multiple source domains is transferred to a single target domain. The source-only classes that do not exist in the target domain may cause feature mismatch when aligning distributions, thus resulting in negative transfer. This issue is often addressed by using partial domain adaptation-based methods in single-source fault diagnosis [17]. However, few studies have been done to mitigate the negative impact of inconsistent label space in multisource domain fault diagnosis. Chai et al. [18] reduced the category-wise refined discrepancies of each source and target domain pair through multiple adversarial training and complemented the multiple classification results by the similarity obtained from the refined adaptation module. Huang et al. [19] designed a joint loss for distribution alignment of both feature and label information to alleviate the negative effect of the inconsistent label space. Although these multisource domain adaptation methods have been successfully applied to machinery fault diagnosis, some limitations remain: 1) They are rarely applied to process fault diagnosis under varying operating conditions, and most of them assume that the source domains and the target domain have the same label space, which is difficult to satisfy in real practice; 2) The transferability of multiple source domain features is not fully considered. It is an interesting issue to perform multisource domain transfer learning in process fault diagnosis under varying operating conditions.

In this article, a new multisource domain adaptation model based on feature-level and class-level distribution alignment (FC-MSDA) is proposed for process fault diagnosis across varying working conditions. The main contributions of this article are as follows.
1) A new transfer learning model is proposed in this article, which focuses on solving the process fault diagnosis issue in multisource scenarios and assumes that the label spaces of the target domain and the source domains are different.
2) A novel domain consistency loss is employed to guide the common feature selection to preserve transferable features for distribution alignment, and a class-level alignment loss is proposed to align the inconsistent label space of each domain pair.
3) An information fusion module based on the similarity of the source and target common features is proposed to ensemble the multiple prediction results of domain-specific classifiers. The experimental results on three process cases prove the superiority and effectiveness of FC-MSDA for process fault diagnosis.

The rest of this article is organized as follows. The details of FC-MSDA are presented in Section II. Section III provides the experimental details and results analysis. Finally, Section IV concludes this article.

II. METHODOLOGY

In this section, a novel feature-level and class-level-based multisource domain adaptation model, i.e., FC-MSDA, is proposed for process fault diagnosis across various working conditions. The framework of FC-MSDA is depicted in Fig. 1. FC-MSDA mainly consists of three phases when implementing process fault diagnosis: data preparation, feature extraction, and multisource domain adaptation. The feature extractor includes two parts, i.e., a common feature extractor and several domain-specific feature generators. A feature selection module is developed to filter out the unrelated information of the multiple source domains. A class-level distribution alignment module is proposed to alleviate the negative impact of the inconsistent label space between each source-target domain pair due to the varying working conditions of the industrial process. The outputs of the domain-specific classifiers are ensembled through an information fusion module by measuring the similarity of source common features and target common features.

A. Common Feature Extractor

The industrial manufacturing process is highly nonlinear and dynamic. Thus, it is vital to preserve both local and global information of the process data. The local structure of process data can well reflect the local changes of process faults. The global structure can be learned to represent the overall changes of the process variables [20]. As shown in Fig. 2, a new common feature generator, i.e., multidilation convolutional gated recurrent unit (MDC-GRU), is proposed to capture both local and global feature information through a local branch (multidilation convolutional neural network, MDCNN) and a global branch (GRU), respectively. The network structure of MDC-GRU is presented in Table I.

1) Local Branch: Dilated convolution with varying dilation rates is adopted to generate features with different spatial receptive fields. These features are fused as follows:

$$ y = f\left( \sum_{i=0}^{n_1} \omega_i^1 * x + \sum_{i=0}^{n_2} \omega_i^2 * x + \sum_{i=0}^{n_3} \omega_i^3 * x + b_j \right) \tag{1} $$
where $x$ denotes the input feature, $\omega_i^1$, $\omega_i^2$, and $\omega_i^3$ represent the convolutional kernels, $n_1$, $n_2$, and $n_3$ are the numbers of convolution kernels, $b_j$ is the bias, and $y$ represents the feature after the fusion of the multiple dilation rate convolutional features. In particular, the fused features are first processed by a convolution operation. The output feature map is then delivered to the attention module for further modeling the importance of each local feature point.

Fig. 1. Network structure of FC-MSDA.

Fig. 2. Network structure of MDC-GRU.

TABLE I
NETWORK STRUCTURE OF MDC-GRU

2) Global Branch: The process signal is characterized as a time series since it is collected during the production process. Thus, the global branch adopts a GRU [21] to learn time-series feature representations. The GRU network can learn dependencies across large time step distances well by adjusting the information flow. The feature learning process of the GRU can be defined as

$$ z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z) \tag{2} $$
$$ r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r) \tag{3} $$
$$ \tilde{h}_t = \tanh(W_h x_t + U_h (h_{t-1} \odot r_t) + b_h) \tag{4} $$
$$ h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \tag{5} $$

where $\sigma$ means the activation function, $b$ means the bias, $U$ denotes the weight matrix of the recurrent unit, $W$ means the weight matrix, $h$ stands for the hidden state, $z_t$ means the output of the update gate, $r_t$ denotes the output of the reset gate, and $\odot$ is the Hadamard product.

The input $x$ is fed into the local branch and the global branch concurrently to learn local and global feature representations. Then, the extracted features are concatenated as follows:

$$ CF = \mathrm{con}(\mathrm{MDCNN}(x), \mathrm{GRU}(x)) \tag{6} $$

where $CF$ stands for the extracted common features, MDCNN() denotes the operation of the dilated convolution [see (1)], GRU() represents the feature extraction process of the GRU [see (2)–(5)], and con() denotes the concatenation operation.

TABLE II
NETWORK STRUCTURE OF DOMAIN SPECIFIC FEATURE GENERATOR

Fig. 3. Procedures of feature-level distribution alignment.

B. Feature-Level Distribution Alignment

In real implementation, target features may not cover all the information of the multiple source domains. The source information that does not occur in the target domain would probably result in negative transfer in distribution alignment.

In this article, feature-level distribution alignment is proposed to address this issue, as shown in Fig. 3. First, a feature selection module (FSM) is used to select transferable common source features at the feature level. FSM adopts an auxiliary domain discriminator $D_a$ to measure the transferability of the source features to the target feature [22]. It is worth highlighting that the auxiliary domain classifier is not involved in adversarial learning. When the auxiliary domain classifier has well converged, the output of the domain classifier reflects the probability that the sample belongs to the source domain. If $D_a(x) \approx 0$, the feature may come from the shared feature space of the source and target domains, and its contribution is supposed to be emphasized. Thus, these features should be given a larger weight to avoid negative transfer. The transferability weight $w_f$ of the source features can be described as

$$ w_f = 1 - D_a(x) = \frac{1}{\frac{p_s(x)}{p_t(x)} + 1} \tag{7} $$

where $p_s(x)$ and $p_t(x)$ denote the data distributions of the source domain and the target domain, respectively.

A new domain consistency loss is proposed to train $D_a$. The domain consistency loss is computed as the distance between the weighted source common feature and the target common feature that have the same labels. This is done for each source-target domain pair. The domain consistency loss is measured as

$$ L_{con} = \sum_{k=1}^{K} \frac{1}{N} \sum_{n=1}^{N} \left\| f_{ws}^{k} - f^{t} \right\| \tag{8} $$

where $K$ denotes the number of source domains, $N$ means the number of feature samples, and $f_{ws}^{k}$ denotes the $n$th weighted source common feature.

Then, the selected common features can be defined as

$$ SF_k^s = w_f^k \cdot CF_k^s \tag{9} $$

where $w_f^k$ is the transferability weight of the $k$th source domain, and $CF_k^s$ and $SF_k^s$ denote the common feature and the selected common feature of the $k$th source domain, respectively.

After obtaining the selected source common features and the target common features, $K$ domain-specific feature generators are set up to correspond to the $K$ source domains. The network structure of the domain-specific feature generators is presented in Table II. For each source-target domain pair, the feature extracting process of the $k$th domain-specific feature extractor is as follows:

$$ DF_k^s = f(w_k SF_k^s + b_k) \tag{10} $$
$$ DF^t = f(w_k CF^t + b_k) \tag{11} $$

where $w_k$ is the weight matrix, $b_k$ is the bias, $SF_k^s$ is the selected common features of the $k$th source domain, $CF^t$ is the target common features, $DF_k^s$ and $DF^t$ are the outputs of the $k$th domain-specific feature generator, and $f()$ stands for the activation function.

C. Class-Level Distribution Alignment

Since multiple source domains exist, the domain shift and class shift between source and target domains need to be addressed. In this article, for each source-target pair, two distribution alignment losses (i.e., a domain-level alignment loss and a class-level distribution alignment loss) are proposed to align the distribution discrepancy. Then, the extracted features are fed into the domain-specific classifier to generate the diagnosis results. It can help the feature generator learn discriminative features. An information fusion module is constructed to ensemble the prediction results of each source-target domain pair based on the class-level distribution alignment shown in Fig. 4.

Fig. 4. Network structure of class-level distribution alignment.

1) Domain Adversarial Loss: FC-MSDA utilizes a domain adversarial training strategy for learning domain-invariant features. Domain adversarial training plays a crucial role in domain adaptation. During adversarial training, it is hoped that the domain discriminator cannot distinguish whether the input feature is a source domain feature or a target domain feature. The loss function of the domain discriminator can be defined as follows:

$$ L_d^k = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(D(G(x_i))) + (1 - y_i) \log(1 - D(G(x_i))) \right] \tag{12} $$

where $y_i$ stands for the domain label, $n$ means the number of samples in the batch, and $x_i$ denotes the input data.
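As a rough illustration of the domain discriminator loss in (12), the sketch below computes the batch-averaged binary cross-entropy in plain Python. The function name `domain_discriminator_loss`, the convention that label 1 marks a source sample and 0 a target sample, and the clipping constant `eps` are assumptions made for this example; they are not taken from the FC-MSDA implementation.

```python
import math


def domain_discriminator_loss(domain_labels, probs, eps=1e-12):
    """Batch-averaged binary cross-entropy, following the form of (12).

    domain_labels: 0/1 domain labels y_i (assumed: 1 = source, 0 = target).
    probs: discriminator outputs D(G(x_i)), expected to lie in (0, 1).
    eps: hypothetical clipping constant that keeps log() finite.
    """
    if len(domain_labels) != len(probs) or not probs:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(domain_labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(probs)


# Toy batch: two source samples followed by two target samples.
labels = [1, 1, 0, 0]
confused = domain_discriminator_loss(labels, [0.5, 0.5, 0.5, 0.5])
confident = domain_discriminator_loss(labels, [0.9, 0.9, 0.1, 0.1])
print(confused, confident)
```

A discriminator that outputs 0.5 for every sample gives a loss of ln 2, which is the value obtained when it cannot separate source features from target features; a discriminator that separates the two domains well gives a smaller loss.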
2) Class-Level Distribution Alignment Loss: In practical industrial processes, normal data are generally easier to collect. This leads to distinct label spaces for the source and target domains. The label space of the target domain is sometimes a subset of the source label space when information is transferred from multiple source domains to a single target domain. Aligning the knowledge that does not occur in the target domain would result in negative transfer. In this article, a class-level distribution alignment module for each source-target pair is proposed to cope with the negative transfer issue. In each class-level distribution alignment module, the number of domain discriminators $D_c^m$ equals the number of label categories $M$. The total class-level distribution alignment loss of each source-target domain pair is expressed as

$$ L_{cd}^k = \frac{1}{n_s + n_t} \sum_{m=1}^{M} \sum_{x_i \in D_S \cup D_t} L\left( D_c^m\left( \hat{y}_i^m G_f(x_i) \right), d_i \right) \tag{13} $$

where $L$ denotes the cross-entropy loss, $\hat{y}_i^m$ means the predicted result of $D_c^m$, and $d_i$ stands for the domain label of $x_i$.

3) Classification Loss: In this article, a domain-specific classifier is designed for each source-target pair to facilitate MDC-GRU to extract discriminative features. For each classifier $C_k$, the output can be expressed as follows:

$$ y = \frac{\exp((w_m)^T f + b_m)}{\sum_{i=1}^{M} \exp((w_i)^T f + b_i)} \tag{14} $$

where $w_i$ means the weight matrix, $b_i$ is the bias, $f$ is the output of the fully-connected layer, and $M$ denotes the number of class labels. The cross-entropy loss of each source-target domain pair can be calculated as

$$ L_c^k = -\frac{1}{n_k^s} \sum_{x_{k,i}^s \in D_S^k} \sum_{j=1}^{M} 1\{y_{k,i}^s = j\} \left[ \log(G_f(x_{k,i}^s)) \right] - \frac{1}{n_t} \sum_{x_i^t \in D_T} \sum_{j=1}^{M} 1\{y_i^t = j\} \left[ \log(G_f(x_i^t)) \right] \tag{15} $$

where $1\{\cdot\}$ is an indicator function, $D_S^k$ and $D_T$ denote the $k$th source domain dataset and the target domain dataset, $x_{k,i}^s$ is the $i$th sample from the $k$th source domain dataset, $x_i^t$ is the $i$th sample of the target domain, $y_{k,i}^s$ and $y_i^t$ mean the labels of $x_{k,i}^s$ and $x_i^t$, respectively, and $n_k^s$ and $n_t$ stand for the numbers of source and target domain samples, respectively.

4) Information Fusion Module: In this article, an information fusion module is proposed to ensemble the prediction results by assigning different weights to the common features of different source domains. The weight of the $k$th source domain $\omega_k$ can be measured based on the similarity between the common features of the $k$th source and target domains. The similarity $d_k$ between $CF_k^s$ and $CF^t$ is calculated as follows:

$$ d_k = \frac{1}{M} \sum_{m=1}^{M} \left\| \frac{1}{n_t} \sum_{i=1}^{n_t} CF_{i,m}^{t} - \frac{1}{n_k^s} \sum_{i=1}^{n_k^s} CF_{i,m}^{s_k} \right\|^2 \tag{16} $$

Then, the weight $\omega_k$ can be obtained through a monotonically increasing function [23] as follows:

$$ \omega_k = \frac{1}{K - 1} \left( \sum_{j=1}^{K} \frac{\exp(d_j)}{1 + \exp(d_j)} - \frac{\exp(d_k)}{1 + \exp(d_k)} \right) \tag{17} $$

The total loss of FC-MSDA is composed of four parts: 1) the domain adversarial training loss, 2) the class-level distribution alignment loss, 3) the classification loss, and 4) the domain consistency loss. It can be defined as

$$ L = L_d + \beta_1 L_{cd} + \beta_2 L_c + \beta_3 L_{con} = \sum_{k=1}^{K} \omega_k L_d^k + \beta_1 \sum_{k=1}^{K} \omega_k L_{cd}^k + \beta_2 \sum_{k=1}^{K} \omega_k L_c^k + \beta_3 L_{con} \tag{18} $$

where $\beta_1$, $\beta_2$, and $\beta_3$ are non-negative parameters.

In summary, the main differences between FC-MSDA and the existing multisource fault diagnosis methods [14]–[16], [18], [19] are as follows.
1) FC-MSDA assumes that the label space of the target domain is a subset of that in multiple source domains.
2) The domain consistency loss and the class-level distribution alignment loss are proposed to alleviate the negative transfer at the feature level and the class level simultaneously.
3) FC-MSDA ensembles the multiple prediction results based on the similarity of the source and target common features.

D. Parameter Optimization and Process Fault Diagnosis

In FC-MSDA, let $\theta_g$, $\theta_g^k$, $\theta_c^k$, $\theta_{cd}^k$, $\theta_{fs}$, and $\theta_d$ denote the parameters of the common feature extractor, the domain-specific feature generator, the domain-specific classifier, the class-level distribution alignment module, the feature selection module, and the domain discriminator, respectively. The diagnosis procedures of FC-MSDA are exhibited in Fig. 5. In the offline training phase, the multiple source datasets are collected under varying working conditions. The target dataset is obtained from another
different but related operating condition. After normalization, the collected datasets are fed into FC-MSDA for feature learning and multisource domain adaptation. The details of the training algorithm for FC-MSDA are provided in Algorithm I. The well-trained FC-MSDA is obtained by updating the model parameters. In the online testing phase, the testing dataset is fed into the well-trained model to output the diagnosis results.

Fig. 5. Diagnosis procedure of FC-MSDA.

III. EXPERIMENT AND RESULT ANALYSIS

In this article, three industrial process cases are considered to validate the effectiveness of FC-MSDA, i.e., two simulation cases [i.e., continuous stirred tank reactor (CSTR) process and fed-batch fermentation penicillin process (FBFP)] and a practical industrial process (i.e., three phase process). Table III lists the information about the parameters. It is worth mentioning that the parameter settings are different under different cases.

To validate the effectiveness and the superiority of FC-MSDA, a DNN (i.e., Source only), four single-source domain adaptation methods (i.e., DANN [10], DSAN [24], CORAL, and DCTLN [25]), and three multisource domain adaptation methods (i.e., DTLFD [26], MFSAN [27], and MDAN [28]) are considered to compare with FC-MSDA. The information of these methods is presented in Table IV. Specifically, the comparison is performed in three different scenarios as follows.
1) Single best: Experiments are performed on each source-target domain pair. The one with the best diagnosis result is recorded.
2) Source combine: The multiple source domains are united into one domain to perform process fault diagnosis in a single source-target manner.
3) Multisource: The comparison experiments are conducted by using different multisource domain adaptation methods.

Each test is repeated five times to get the average diagnosis precision and standard deviation. All the methods are implemented on the PyTorch platform. The training and testing times of FC-MSDA on these three cases are listed in Table III. Taking the CSTR process as an example, the training time of 150 epochs is 420.80 s, and the testing time for the target dataset is 3.03 s. Thus, FC-MSDA can meet the demand of online process monitoring.

TABLE III
PARAMETER SETUP OF FC-MSDA

TABLE IV
DETAILED INFORMATION OF SOURCE ONLY, DANN, DSAN, CORAL, DCTLN, DTLFD, MFSAN, MDAN, AND FC-MSDA

A. Case 1: CSTR Process

TABLE V
DETAILS OF VARIOUS WORKING CONDITIONS, FAULT TYPES, AND PROCESS FAULT DIAGNOSIS TASKS OF CSTR SYSTEM

In this article, the CSTR process [29] is adopted to validate the effectiveness of FC-MSDA for process fault diagnosis. Process data can be obtained by simulating a first-order irreversible chemical reaction under diverse working conditions. The detailed information of the measured process samples can be found in [29]. In this section, various working conditions are considered by adjusting $T$ and $c_A$. Table V presents the details of five operating conditions and three fault types of the CSTR process, in which S1 and T1 denote the source domain dataset and target domain dataset collected under condition 1, respectively. The transfer tasks are presented in the form of
S→T. For instance, S2, S3, S4→T1 denotes that the model is trained to implement the diagnosis task on T1 by using the prior information in S2, S3, and S4. Each working condition contains 1200 samples. As shown in Table V, one fault type is randomly excluded from the target domain to simulate different label spaces between domains.

1) Result Analysis: The training process of FC-MSDA is shown in Fig. 6. As shown in Fig. 6(a), when the epoch reaches 100, almost all the samples are diagnosed accurately. Fig. 6(b)–(d) show the classification loss, domain adversarial loss, and class-level distribution alignment loss, respectively. As the training epoch increases, these three losses decline rapidly. This indicates that the domain adversarial training and class-level distribution alignment enhance the feature learning ability and domain alignment performance of FC-MSDA.

Fig. 6. Training process of FC-MSDA. (a) Diagnosis accuracy. (b) Classification loss. (c) Domain adversarial loss. (d) Class-level distribution alignment loss.

2) Performance Comparison and Discussion: Table VI shows the performance comparisons between FC-MSDA and other related methods. It can be observed that the source-only method obtains the worst diagnosis precision. Without domain adaptation techniques, the distribution discrepancy that exists in the feature space and label space across domains degrades the diagnosis performance of the deep network. This demonstrates the effectiveness of domain adaptation. As shown in Table VI, multisource-based methods outperform the single-source domain adaptation methods by exploring the complementary information of different sources. Moreover, the performance of FC-MSDA is superior to source-combine domain adaptations, which indicates that combining multiple source domains into one source domain may degrade the generalization ability of the model, because domain shifts also exist among different source domains. Furthermore, FC-MSDA outperforms the other four multisource domain adaptation methods. This illustrates that aligning the distribution discrepancy at both the feature level and the class level can well address the negative transfer caused by unrelated information that exists in multiple source domains and by the inconsistent label space of each source-target domain pair. Moreover, the difficulty of knowledge transfer varies from task to task, as the domain gap between the target domain and the source domains of each task is different. FC-MSDA obtains the best and worst performances on D1 and D2, respectively. For task D2, the source domains do not contain prior knowledge about the concentration of reactant A during production. Thus, it is quite challenging to obtain satisfactory fault diagnosis performance when transferring the knowledge of the source domains S3, S4, and S5 to the target domain T2.

TABLE VI
DIAGNOSIS RESULTS (%) OF VARIOUS FAULT DIAGNOSIS METHODS FOR CSTR PROCESS

Fig. 7. Comparison of different hyperparameters of FC-MSDA on task D1. (a) Different batch sizes and epochs. (b) Different learning rates and dilation rates.

3) Parameter Sensitivity Analysis: The batch size and epoch of FC-MSDA are the main factors that influence the performance of the model. In this section, the sensitivity analysis is carried out to illustrate the impact of batch size and epoch on the performance of process fault diagnosis. The effects of these two factors on diagnosis accuracy are exhibited in Fig. 7(a). It can be observed that the diagnosis precision of FC-MSDA varies with different epochs and batch sizes. When the batch size and epoch are set as 60 and 150, respectively, FC-MSDA can achieve the best diagnosis results on the training dataset. Thus, the batch size and epoch of FC-MSDA are 60 and 150, respectively. Dilated convolution tends to have a larger receptive field than regular convolution. However, the deep network would suffer from the gridding effect when dilated convolutions with the same dilation rate are stacked. Thus, it is crucial to set the optimal dilation rate for the dilated convolutional layers. In this article, empirical
experiment is performed on task D1 to present the impact of different settings of the convolution layers and the learning rate. The learning rate is selected from the set {0.0001, 0.001, 0.005, 0.01, 0.02}. The results in Fig. 7(b) show that FC-MSDA tends to have better performance when a smaller learning rate is adopted. It can also be observed that using different dilated convolutions can enhance the diagnosis accuracy of FC-MSDA compared with adopting a normal convolution layer. The best choices of the dilation rate of MDC-GRU and the learning rate of FC-MSDA are {1, 3, 7} and 0.0001.

4) Ablation Study: An ablation experiment is carried out to illustrate the contribution of MDC-GRU, the class-level distribution alignment module (CDA), and the information fusion block (IFB) to the final diagnosis accuracy. It is worth noting that one component is removed at a time. The experimental results of FC-MSDA and its three variants are listed in Table VII. The diagnosis precision decreases when any part is removed from FC-MSDA, which illustrates that each of the abovementioned three components contributes to the final accuracy.

TABLE VII
RESULTS OF FC-MSDA WITH DIFFERENT COMBINATIONS OF MODULES (%)

5) Impact of the Number of Source Domains: An empirical analysis is conducted to analyze the impact of the number of source domains on the diagnosis accuracy, and the results are listed in Table VIII. It can be concluded that: 1) The domain adaptation methods generally perform better when more source domains are adopted. This demonstrates the necessity of transferring knowledge from multisource domains; 2) For MDAN, the performance on tasks {S2, S3→T5}, {S2, S4→T5}, {S3, S4→T5}, and {S2, S3, S4→T5} is better than that on task {S1, S2, S3, S4→T5}, which may result from the negative transfer caused by the unrelated information that exists in S1. This further indicates the importance of settling the negative transfer issue in multisource domain adaptation.

TABLE VIII
EMPIRICAL ANALYSIS OF THE NUMBER OF THE SOURCE DOMAIN

B. Case 2: Three Phase Process

In this section, FC-MSDA is applied to the three phase process to evaluate its diagnosis performance. The three phase process [30] is a real industrial process with multiple working conditions formed by the multiphase flow facility at Cranfield University. In this process, water, oil, and air are provided by individual pipelines, as shown in Fig. 8. This process can operate under various faulty occasions by manually seeding faults. In total, 24 process variables are collected. In this article, 13 of these variables are selected to conduct experiments. The relevant descriptions of these variables and four kinds of faults are presented in Table IX. The information about working conditions of this case is listed in Table IX. Although both the water flow rate and the air flow rate of C1 and C2 are varying, the data in each condition have distinct distributions since they are collected in different time periods. In C3, F1 and F4 are collected when the air flow rate is 0.042, and F2 and F4 are collected when the air flow rate is 0.028. Each working condition contains 2000 samples. To simulate the distinct label spaces of the source domain and target domain, the fault types of the target domain are set as a subset of those in the source domains. The detailed information of the fault diagnosis tasks and target classes is presented in Table X.

Fig. 8. Schematic of three phase process.

TABLE IX
VARIABLES AND FAULT TYPES IN THREE PHASE PROCESS

TABLE X
FAULT PATTERNS AND TASK DESCRIPTION OF THREE PHASE PROCESS

1) Performance Comparison and Result Analysis: The recognition results of FC-MSDA and other state-of-the-art methods are listed in Table XI. FC-MSDA presents better performance in diagnosis accuracy and model stability than the other three
TABLE XI
DIAGNOSIS RESULTS (%) OF VARIOUS FAULT DIAGNOSIS METHODS ON THREE PHASE PROCESS
Fig. 9. Confusion matrix analysis on different tasks of FC-MSDA. (a) P1. (b) P2. (c) P3. (d) P4.
on different source domains (i.e., S2, S3, and S4) have different
prediction results on T1, because the datasets in S2, S3, and S4
are collected under different operating conditions. Classifier 1
trained on S2 misclassifies the F2 into F4, and Classifier 2 trained
ON S3 achieves 100% recognition accuracy on Normal-type and
F1. In FC-MSDA, the information fusion module ensembles
the prediction results of different domain-specific classifiers to
obtain the final result, as shown in Fig. 9(a).
Fig. 10. Prediction results of domain-specific classifiers. (a) Classfier1. 1) Feature Visualization: To exhibit the feature learning
(b) Classfier2. (c) Classfier3. performance of FC-MSDA, the learned features of different lay-
ers on task P2 are visualized. Only one source-target domain pair
(i.e., S1 and T2) is selected for the sake of better visualization.
multisource domain adaptation methods. The comparison re- As presented in Fig. 11, the features of the same class in S1
sults between FC-MSDA and single-source domain adaptation and T2 are gathered and the different fault types are well sepa-
methods indicate the necessity of multisource domain adaption rated. This indicates that features extracted by MDC-GRU are
in multisource scenarios. Although the methods of single best both domain-invariant and discriminative. Compared with the
sometimes perform better than multisource domain adaptation features presented in Fig. 11(b) and (c), it can be concluded that
methods, they are less computational efficient since they need to the feature selection module has well separated the outlier class
train the model three times. Additionally, the confusion matrix type that is originally fused with T0, T1, and T2. This further
of FC-MSDA relating to different diagnosis tasks is exhibited illustrates the effectiveness of the feature selection module.
in Fig. 9. It can be observed that FC-MSDA misclassifies fault
3 and fault 4 in tasks P1, P2, and P3. Moreover, FC-MSDA
can always recognize the normal samples well on almost all C. Case 3: FBFP Process
the tasks. Take task P1 as an example, although fault 4 does In this section, fed-batch fermentation penicillin process
not exist in the target domain, there are still some samples to be (FBFP) [31] is further used to verify the diagnosis performance
misclassified as fault 4. Since the irrelevant information in source of FC-MSDA. FBFP is an industrial benchmark process that has
domains would degrade the recognition of the classifiers. This been widely used as a validation case for process fault diagnosis.
further demonstrates that it is necessary to reduce the negative FBFP simulates a nonlinear industrial process of cells from
impact caused by the inconsistent label space between domains. growth to autolysis. FBFP process is complex since the process
Among the four transfer tasks, the best and worst diagnostic status is sensitive to the process variables, i.e., temperature, Ph,
accuracy are achieved on task P2 and P3, respectively. In task substrate concentration, and other factors. The information of
P3, some samples of Fault 4 are misclassified into Fault 1. the considered 12 process variables can be referenced in [31].
Fig. 10 presents the prediction results of each domain-specific Table XII lists the different operating conditions and fault types
classifier on Task P1. It can be seen that the classifiers trained of FBFP. Each working condition contains 2100 samples.
Fig. 11. Feature visualization of FC-MSDA on task P2. (a) Raw data. (b) Common features. (c) Selected common features.
TABLE XII
FAULT TYPES AND VARYING WORKING CONDITIONS OF FBFP PROCESS
Fig. 12. Diagnosis results of various fault diagnosis methods on FBFP process.
TABLE XIII
DIAGNOSIS RESULTS OF THESE METHODS WITH INCREASING NUMBERS OF FAULT TYPES IN THE TARGET DOMAIN
To verify the effectiveness of FC-MSDA, the experiments are performed when all the target data are available during the training stage on tasks B1–B4. The diagnosis results of FC-MSDA and the other five multisource domain adaptation methods (i.e., DTLFD [26], MFSAN [27], MDAN [28], ADACL [32], and WDAN [33]) are presented in Fig. 12. ADACL is a multisource fault diagnosis method that adopts multiadversarial training and classifier alignment for domain adaptation. WDAN applies different weights to different source domains by measuring the MMD distance between each source and target domain pair. It can be observed that FC-MSDA presents better performance compared with the other five methods. As shown in Fig. 12, FC-MSDA and WDAN perform better than the other four methods, since the latter give equal weights to the prediction results of different classifiers. This illustrates the necessity of assigning different weights to different source domains in multisource scenarios.
In practical diagnosis scenarios, it is difficult to collect the data for all fault types in the target domain. Sometimes only healthy samples are available in the target domain during training. In this section, testing is further performed under the assumption that only healthy samples in the target domain are available in the training stage on task S1, S2, S4→T3. The testing dataset of each test contains three fault types, i.e., N, F1, and F2. Table XIII presents the diagnosis results of six different methods with increasing numbers of fault types in the target domain. The second column of Table XIII lists the diagnosis results when only the healthy samples are available in the target domain. As shown in columns 3–5 of Table XIII, experiments are also performed when the faults (i.e., F1 and F2) are added to the target domain in a sequential manner during training to show the impact of fault types on the diagnosis performance of FC-MSDA. It can be observed that the absence of fault samples in the target domain in the training stage degrades the diagnosis performance of FC-MSDA. However, FC-MSDA always maintains the highest diagnosis accuracy among the six methods. As the fault types increase in the target domain during training, the diagnosis accuracy of FC-MSDA increases accordingly. This demonstrates that it is quite challenging to achieve satisfactory diagnosis performance when only healthy samples are available in the training stage.

D. Influence of Working Condition Change
In practice, changes in working conditions lead to changes in data distributions and then degrade the diagnostic performance of the model. In this section, the influence of different working conditions is discussed by analyzing the best and worst-performing tasks of FC-MSDA on the three cases (i.e., CSTR, three phase process, and FBFP). The MMD term is adopted to calculate the distribution discrepancy between
source and target domains of the best-performing task and worst-performing task. The sum of the data distribution discrepancies of all source-target domain pairs represents the overall data distribution discrepancy of each task. The results are shown in Table XIV. It can be observed that the distribution discrepancy of the worst-performing task is larger than that of the best-performing task. This indicates that a bigger domain gap between the source domains and the target domain makes the knowledge transfer more difficult and degrades the diagnosis performance of FC-MSDA. Thus, it is necessary to implement domain adaptation for process fault diagnosis under varying operating conditions. Taking the best-performing task (i.e., P2) and the worst-performing task (i.e., P3) in the three phase process as an example, Fig. 13 shows the probability density distributions of the original dataset and the extracted features after domain adaptation. It can be seen that the domain gap between the target domain dataset and the source domain datasets has been narrowed after domain adaptation. Compared with the average accuracy of source-only, the proportions of improvements by FC-MSDA in the three cases are 25.47%, 29.10%, and 20.08%, respectively. These proportions indicate the impact of working conditions on the diagnosis performance of FC-MSDA. Different cases show different proportions, as the variation of working conditions differs across cases and diagnosis tasks.
TABLE XIV
DISTRIBUTION DISCREPANCY BETWEEN THE SOURCE AND TARGET DOMAIN DATASETS IN DIFFERENT CASES
Fig. 13. Probability density distributions of different domains in three phase process. (a) Original datasets. (b) Extracted features after domain adaptation.

IV. CONCLUSION
In this article, a novel transfer learning model, i.e., FC-MSDA, was proposed for process fault diagnosis in multisource scenarios, which focused on alleviating the negative transfer caused by the unrelated information in multiple source domains and the inconsistent label space between domains under varying working conditions. The advantages of FC-MSDA for process fault diagnosis can be summarized as follows.
1) It is an effective multisource domain adaptation method for process fault diagnosis under varying working conditions by alleviating the negative effect of the inconsistent label space of the source domain and target domain.
2) A class-level distribution alignment module is proposed to reduce the negative transfer between different domains.
3) An information fusion module is developed to ensemble the prediction results of different domain-specific classifiers for process diagnosis.
The results on the three cases demonstrate that FC-MSDA is effective for process fault diagnosis across various operating conditions. Although the testing results are promising, there are some limitations of the FC-MSDA-based process fault diagnosis method in real applications. First, the implementation of FC-MSDA is supported by the assumption that fault samples of the target domain are available in the training phase. However, this assumption cannot always be met in real practice. It will be interesting to perform process fault diagnosis in the situation that only healthy samples are available in different operating conditions for the target domain. Moreover, it is still difficult for FC-MSDA to detect new faults that could occur in the process. Future study will concentrate on adaptive process fault diagnosis by updating the learning scheme of FC-MSDA.

REFERENCES
[1] L. Yao and Z. Ge, “Industrial big data modeling and monitoring framework for plant-wide processes,” IEEE Trans. Ind. Inform., vol. 17, no. 9, pp. 6399–6408, Sep. 2021.
[2] A. Hs, A. Mf, and B. Ac, “A survey and classification of incipient fault diagnosis approaches,” J. Process Control, vol. 97, pp. 1–16, 2021.
[3] T. Li et al., “Multireceptive field graph convolutional networks for machine fault diagnosis,” IEEE Trans. Ind. Electron., vol. 68, no. 12, pp. 12739–12749, Dec. 2021.
[4] J. Liu et al., “Toward robust fault identification of complex industrial processes using stacked sparse-denoising autoencoder with softmax classifier,” IEEE Trans. Cybern., to be published, doi: 10.1109/TCYB.2021.3109618.
[5] W. Yu and C. Zhao, “Broad convolutional neural network based industrial process fault diagnosis with incremental learning capability,” IEEE Trans. Ind. Electron., vol. 67, no. 6, pp. 5081–5091, Jun. 2020.
[6] B. Sun and K. Saenko, Deep Coral: Correlation alignment for Deep Domain Adaptation. New York, NY, USA: Springer, 2016, pp. 443–450.
[7] M. Ghifary, W. B. Kleijn, and M. Zhang, “Domain adaptive neural networks for object recognition,” in Proc. Pacific Rim Int. Conf. Artif. Intell., New York, NY, USA, Springer, 2014, pp. 898–904.
[8] K. Zhao, H. Jiang, and Z. Wu, “Transfer residual convolutional neural network for rotating machine fault diagnosis under different working conditions,” in Proc. 12th Int. Conf. Mech. Aerosp. Eng., 2021, pp. 477–483.
[9] X. Li et al., “A new semi-supervised fault diagnosis method via deep CORAL and transfer component analysis,” IEEE Trans. Emerg. Topics Comput. Intell., vol. 6, no. 3, pp. 690–699, Jun. 2022.
[10] H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, and M. Marchand, “Domain-adversarial neural networks,” Statistics, vol. 1050, pp. 1–9, Dec. 2014.
[11] Z. H. Liu, B. L. Lu, and H. L. Wei, “Deep adversarial domain adaptation model for bearing fault diagnosis,” IEEE Trans. Syst., Man, Cybern.: Syst., vol. 51, no. 7, pp. 4217–4226, Jul. 2021.
[12] Y. Li, Y. Song, L. Jia, S. Gao, Q. Li, and M. Qiu, “Intelligent fault diagnosis by fusing domain adversarial training and maximum mean discrepancy via ensemble learning,” IEEE Trans. Ind. Inform., vol. 17, no. 4, pp. 2833–2841, Apr. 2021.
[13] B. Rezaeianjouybari and S. Yi, “A novel deep multi-source domain adaptation framework for bearing fault diagnosis based on feature-level and task-specific distribution alignment,” Measurement, vol. 178, no. 3, 2021, Art. no. 109359.
[14] R. Wang et al., “Multisource domain feature adaptation network for bearing fault diagnosis under time-varying working conditions,” IEEE Trans. Instrum. Meas., vol. 71, Apr. 2022, Art. no. 3511010, doi: 10.1109/TIM.2022.3168903.
[15] J. Tian, D. Han, and M. Li, “A multi-source information transfer learning method with subdomain adaptation for cross-domain fault diagnosis,” Knowl.-Based Syst., vol. 243, 2022, Art. no. 108466.
[16] J. Zhu, N. Chen, and C. Shen, “A new multiple source domain adaptation fault diagnosis method between different rotating machines,” IEEE Trans. Ind. Inform., vol. 17, no. 7, pp. 4788–4797, Jul. 2021.
[17] X. Li and W. Zhang, “Deep learning-based partial domain adaptation method on intelligent machinery fault diagnostics,” IEEE Trans. Ind. Electron., vol. 68, no. 5, pp. 4351–4361, May 2021.
[18] Z. Chai, C. Zhao, and B. Huang, “Multisource-refined transfer network for industrial fault diagnosis under domain and category inconsistencies,” IEEE Trans. Cybern., to be published, doi: 10.1109/TCYB.2021.3067786.
[19] Z. Huang et al., “A multi-source dense adaptation adversarial network for fault diagnosis of machinery,” IEEE Trans. Ind. Electron., vol. 69, no. 6, pp. 6298–6307, Jun. 2022.
[20] B. Sla, B. Jla, and B. Yha, “Nonlinear process modeling via unidimensional convolutional neural networks with self-attention on global and local inter-variable structures and its application to process monitoring,” ISA Trans., vol. 121, pp. 105–118, 2021.
[21] K. Cho et al., “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” in Proc. Conf. Empirical Methods Natural Lang. Process., Jul. 2014, pp. 1724–1734.
[22] J. Zhang, Z. Ding, W. Li, and P. Ogunbona, “Importance weighted adversarial nets for partial domain adaptation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Dec. 2018, pp. 8156–8164.
[23] Y. Yao, X. Li, Y. Zhang, and Y. Ye, “Multisource heterogeneous domain adaptation with conditional weighting adversarial network,” IEEE Trans. Neural Netw. Learn. Syst., to be published, doi: 10.1109/TNNLS.2021.3105868.
[24] Y. Zhu et al., “Deep subdomain adaptation network for image classification,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 4, pp. 1713–1722, Apr. 2021.
[25] L. Guo, Y. Lei, and S. Xing, “Deep convolutional transfer learning network: A new method for intelligent fault diagnosis of machines with unlabeled data,” IEEE Trans. Ind. Electron., vol. 66, no. 66, pp. 7316–7325, Sep. 2019.
[26] X. Li, W. Zhang, Q. Ding, and X. Li, “Diagnosing rotating machines with weakly supervised data using deep transfer learning,” IEEE Trans. Ind. Inform., vol. 16, no. 3, pp. 1688–1697, Mar. 2020.
[27] Y. Zhu, F. Zhuang, and D. Wang, “Aligning domain-specific distribution and classifier for cross-domain classification from multiple sources,” in Proc. AAAI Conf. Artif. Intell., 2019, vol. 33, pp. 5989–5996.
[28] Z. Han, S. Zhang, and G. Wu, “Multiple source domain adaptation with adversarial learning,” in Proc. Int. Conf. Learn. Representations, Feb. 2018, pp. 1–24.
[29] J. Yu, X. Liu, and L. Ye, “Convolutional long short-term memory autoencoder-based feature learning for fault detection in industrial processes,” IEEE Trans. Instrum. Meas., vol. 70, 2021, Art. no. 3505615, doi: 10.1109/TIM.2020.303964.
[30] C. Ruiz-Cárcel, Y. Cao, and D. Mba, “Statistical process monitoring of a multiphase flow facility,” Control Eng. Pract., vol. 42, pp. 74–78, 2015.
[31] C. Zhang, J. Yu, and S. Wang, “Fault detection and recognition of multivariate process based on feature learning of one-dimensional convolutional neural network and stacked denoised autoencoder,” Int. J. Prod. Res., vol. 59, pp. 2426–2449, 2021.
[32] Y. Zhang, Z. Ren, and S. Zhou, “Adversarial domain adaptation with classifier alignment for cross-domain intelligent fault diagnosis of multiple source domains,” Meas. Sci. Technol., vol. 32, no. 3, 2020, Art. no. 035102.
[33] A. Dw, B. Th, and C. Fc, “Weighted domain adaptation networks for machinery fault diagnosis,” Mech. Syst. Signal Process., vol. 158, 2021, Art. no. 107744.

Shijin Li received the B.Eng. degree in industrial engineering from the School of Mining Engineering, China University of Mining and Technology, Jiangsu, China, in 2021. She is currently working toward the Ph.D. degree in industrial engineering with the Department of Industrial Engineering, Tongji University, Shanghai, China.
Her current research interests include process fault diagnosis and machine learning.

Jianbo Yu (Member, IEEE) received the B.Eng. degree in industrial engineering from the Department of Industrial Engineering, Zhejiang University of Technology, Zhejiang, China, in 2002, the M.Eng. degree in mechanical manufacturing and automation from the Department of Mechanical Automation Engineering, Shanghai University, Shanghai, China, in 2005, and the Ph.D. degree in industrial engineering from the Department of Industrial Engineering, Shanghai Jiaotong University, Shanghai, China, in 2009.
From 2009 to 2013, he was an Associate Professor with Shanghai University, Shanghai, China. Since 2016, he has been a Professor with the School of Mechanical Engineering, Tongji University, Shanghai, China. His research interests include machine maintenance and machine learning.
Dr. Yu is an Associate Editor for IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT.
