
7316 IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 66, NO. 9, SEPTEMBER 2019

Deep Convolutional Transfer Learning Network: A New Method for Intelligent Fault Diagnosis of Machines With Unlabeled Data

Liang Guo, Yaguo Lei, Member, IEEE, Saibo Xing, Tao Yan, and Naipeng Li

Abstract—The success of intelligent fault diagnosis of machines relies on the following two conditions: 1) labeled data with fault information are available; and 2) the training and testing data are drawn from the same probability distribution. However, for some machines, it is difficult to obtain massive labeled data. Moreover, even though labeled data can be obtained from some machines, the intelligent fault diagnosis method trained with such labeled data possibly fails in classifying unlabeled data acquired from other machines due to data distribution discrepancy. These problems limit the successful applications of intelligent fault diagnosis of machines with unlabeled data. As a potential tool, transfer learning adapts a model trained in a source domain to its application in a target domain. Based on transfer learning, we propose a new intelligent method named the deep convolutional transfer learning network (DCTLN). A DCTLN consists of two modules: condition recognition and domain adaptation. The condition recognition module is constructed by a one-dimensional (1-D) convolutional neural network (CNN) to automatically learn features and recognize the health conditions of machines. The domain adaptation module facilitates the 1-D CNN in learning domain-invariant features by maximizing the domain recognition error and minimizing the probability distribution distance. The effectiveness of the proposed method is verified using six transfer fault diagnosis experiments.

Index Terms—Bearing, convolutional neural network (CNN), intelligent fault diagnosis, transfer learning.

I. INTRODUCTION

INTELLIGENT fault diagnosis is able to handle massive monitoring data and distinguish the health conditions of machines [1]–[3]. In recent years, various intelligent fault diagnosis methods have been studied. Cerrada et al. [4] proposed a random forest classifier to recognize the health conditions of a gearbox. The proposed method was validated by vibration signals acquired from a gearbox experiment platform, which simulated six kinds of gear faults. Rauber et al. [5] used an intelligent model to distinguish the health conditions of rotating machines. The performance of the method was validated using a labeled bearing dataset from Case Western Reserve University (CWRU). Lee et al. [6] built a convolutional neural network (CNN) to recognize the health conditions of machines. An experiment dataset including six kinds of health conditions of chemical vapor deposition machines was applied to demonstrate the effectiveness of the proposed method. Su et al. [7] proposed a least-square support-vector-machine-based multifault diagnosis method. The proposed method was verified via experimental data acquired from a roller bearing fault experiment rig.

Through the literature review, it can be seen that the datasets applied to validate the effectiveness of various intelligent fault diagnosis methods satisfy the following two conditions: 1) labeled data with fault information are available; and 2) the training and testing data are drawn from the same probability distribution. As a matter of fact, it is demonstrated in [8]–[10] that the success of intelligent fault diagnosis of machines relies on those two conditions. However, for some machines, it is difficult to satisfy those two conditions due to the following problems.

1) Labeled fault data are difficult to obtain from some machines [11]. Specifically, there are two main reasons for the lack of labeled fault data. First, machines may not be allowed to run to failure, since an unexpected fault usually leads to the breakdown of machines or even catastrophic accidents. In such cases, fault data are impossible to obtain. Second, machines generally go through a long degradation process from a healthy condition to failure, which means that it is time consuming and expensive to obtain fault data of machines [12].

2) An intelligent fault diagnosis method trained with labeled data acquired from one machine possibly fails in classifying unlabeled data acquired from other machines. Although massive labeled data are difficult to obtain for some machines, they can still be obtained from different but related machines. For example, labeled data of railway locomotive bearings are difficult to obtain, while labeled data of motor bearings are relatively easier to obtain. However, probability

Manuscript received February 26, 2018; revised May 26, 2018 and August 25, 2018; accepted September 30, 2018. Date of publication October 26, 2018; date of current version April 30, 2019. This work was supported in part by the National Natural Science Foundation of China under Grant U1709208 and Grant 61673311, and in part by the Young Talent Support Plan of the Central Organization Department. (Corresponding author: Yaguo Lei.)

The authors are with the Key Laboratory of Education Ministry for Modern Design and Rotor-Bearing System, Xi'an Jiaotong University, Xi'an 710049, China (e-mail: guoliang@mail.xjtu.edu.cn; yaguolei@mail.xjtu.edu.cn; xingsaibo@stu.xjtu.edu.cn; yantao_0421@163.com; li3112001096@stu.xjtu.edu.cn).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIE.2018.2877090

0278-0046 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


distributions of data acquired from different machines are different [13]. Therefore, the classification performance of intelligent fault diagnosis methods degenerates when the training and testing datasets are acquired from different machines.

The aforementioned two problems limit the successful applications of intelligent fault diagnosis of machines where no labeled data are available. To promote the successful applications of intelligent fault diagnosis of machines with unlabeled data, a new intelligent method is in urgent need. For such a method, the model trained with labeled data acquired from one machine must generalize to the unlabeled data acquired from other machines. Transfer learning is able to use the knowledge learned from a source domain to solve a new but related task in a target domain [14], [15]. It is expected to solve the problem that there are not sufficient labeled data to train a reliable intelligent model. An intuitive and commonly used idea of transfer learning is to obtain a feature representation in which the different domains are close to each other while keeping good classification performance on the source data [16]–[18]. Deep learning is able to learn deep hierarchical representations of the data, which are considered to provide cross-domain-invariant features for transfer learning. Transfer learning based on these cross-domain-invariant features can effectively reduce the discrepancy between the source and target domains. Based on this idea, deep-transfer-learning-based methods have been applied successfully to various tasks [19]–[22].

Currently, several transfer-learning-based intelligent fault diagnosis methods have been proposed [13], [23], [24]. Lu et al. [13] presented a deep-model-based domain adaptation method for machine fault diagnosis. A gearbox dataset collected under different operation conditions was used to test the performance of the proposed method. Wen et al. [23] set up a new deep transfer learning method for fault diagnosis. The validation dataset was acquired from a bearing testbed operating under different loading conditions. Xie et al. [24] proposed a transfer-analysis-based gearbox fault diagnosis method. The performance of the presented method was verified on a gearbox dataset obtained under various operation conditions.

It can be observed from those works that the existing transfer-learning-based intelligent fault diagnosis methods mainly focus on the transfer between different operation conditions. Those studies demonstrate that transfer learning allows intelligent fault diagnosis methods to be applicable across datasets acquired from a machine under different operation conditions. However, in real-world applications, massive labeled data are difficult to obtain from some machines. Therefore, transfer fault diagnosis between machines is essential and crucial. Only if transfer fault diagnosis between machines works well can the health conditions of machines with unlabeled data be recognized by an intelligent method trained with labeled data acquired from other machines.

Therefore, in order to promote the successful applications of intelligent fault diagnosis of machines with unlabeled data, we propose a new deep transfer learning method named the deep convolutional transfer learning network (DCTLN). A DCTLN consists of two modules: condition recognition and domain adaptation. The condition recognition module seeks to automatically learn features and accurately recognize the health conditions of machines. The domain adaptation module includes a domain classifier and a domain distribution discrepancy metric term, which makes the learned features domain invariant. With those two modules, the DCTLN trained with labeled data acquired from one machine is expected to effectively classify the unlabeled data acquired from other machines. Using three bearing datasets acquired from different machines, six transfer fault diagnosis experiments are conducted to demonstrate the effectiveness of the proposed method. The results indicate that the proposed method improves the recognition accuracies of bearing health conditions by about 32.1% compared with traditional methods without transfer learning.

The main insights and contributions of this paper are summarized as follows.

1) A new deep transfer learning method is proposed. The proposed method includes a condition recognition module and a domain adaptation module. In the condition recognition module, a one-dimensional (1-D) CNN is constructed to learn features from the raw vibration data, and then the health condition classifier classifies the samples based on these features. In the domain adaptation module, a domain classifier and a distribution discrepancy metric are built to help learn domain-invariant features.

2) Intelligent fault diagnosis for machines with unlabeled data is explored. Generally, the training and testing datasets should be acquired from one machine for an intelligent fault diagnosis method. In this paper, we explore a new scenario of intelligent fault diagnosis, where the training and testing datasets are acquired from different machines and the data from the monitored machines are unlabeled. This exploration would promote the practical application of intelligent fault diagnosis.

The rest of this paper is organized as follows. Section II details the transfer learning problem. Section III presents the proposed method. In Section IV, the transfer fault diagnosis experiments between three bearing datasets are conducted. Finally, conclusions are drawn in Section V.

II. TRANSFER LEARNING PROBLEM

In order to clearly state the problem to be solved, a basic notion in transfer learning, the domain, is first introduced. Let $\chi$ be a feature space, $X$ be a particular sample, and $P(X)$ be a marginal probability distribution. Then, a domain is defined by $D = \{\chi, P(X)\}$, where $X = \{x_1, x_2, \ldots, x_n\} \in \chi$, and $x_i$ is the $i$th feature term [14]. As shown in Fig. 1, traditional intelligent methods are trained using labeled data and tested on future data with the same feature space and probability distribution. This indicates that the training and testing data come from the same domain. Transfer learning, however, allows the distributions of the training and testing data to be different [14]. Generally, given a source domain $D_S$ with labeled training data and a target domain $D_T$ with unlabeled data, transfer learning aims to improve the capacity of the target predictive function in


the domain $D_T$ using the knowledge learned from the domain $D_S$.

In this paper, an intelligent fault diagnosis method trained with labeled data acquired from one machine is expected to recognize the health conditions of other machines with unlabeled data. In other words, the intelligent method is able to accomplish transfer fault diagnosis between machines. More specifically, we denote the labeled data acquired from one machine as the source domain, which is referred to as $D_S = \{(x_{Si}, y_{Si})\}$, where $x_{Si} \in \chi_S$ is a data sample and $y_{Si} \in Y_S$ is its corresponding label. Similarly, we denote the unlabeled data acquired from the other machines as the target domain, which is $D_T = \{(x_{Ti})\}$, where a data sample $x_{Ti}$ is in $\chi_T$. Note that, in this paper, it is assumed that the source and target domains share the same label space. In other words, the source and target domains differ only in their respective data probability distributions. In order to obtain the desired result of this paper, the proposed intelligent method should be able to learn domain-invariant features. Learning domain-invariant features means that the features should be subject to the same or almost the same distribution, no matter whether they are learned from the source domain data or the target domain data. If the features are domain invariant, then the health condition classifier trained with the source domain data is able to effectively classify the features learned from the target domain data. Therefore, learning domain-invariant features is a critical procedure for accomplishing transfer fault diagnosis between machines.

Fig. 1. Intelligent learning systems. (a) Traditional intelligent method. (b) Transfer-learning-based intelligent method.

III. PROPOSED METHOD

In this section, we detail the architecture and training process of the proposed method.

A. Deep Convolutional Transfer Learning Network (DCTLN)

As shown in Fig. 2, the proposed DCTLN consists of two modules: condition recognition and domain adaptation. Condition recognition is achieved by a 1-D CNN. The 1-D CNN includes a feature extractor and a health condition classifier. The feature extractor seeks to automatically learn features, and the health condition classifier recognizes health conditions based on the extracted features. Domain adaptation is completed by a domain classifier and a distribution discrepancy metric. The domain adaptation module is connected to the feature extractor to help the 1-D CNN learn domain-invariant features.

1) Condition Recognition: Condition recognition is achieved by a 1-D CNN with 16 layers, including one input layer, six convolutional layers, six pooling layers, two fully connected layers, and one output layer. The detailed information of the 1-D CNN can be found in Table I. Among those 16 layers, the first 15 layers can be thought of as a feature extractor, and the last layer is seen as a health condition classifier.

The input layer is built from input vibration signals with a length of $L$. The next one is the convolutional layer, in which the convolution kernel convolves with the input data to learn features. Since vibration signals are 1-D, the convolution is designed as a 1-D operation. Concretely, the 1-D convolution takes a dot product between a kernel $w_c \in \mathbb{R}^m$ and the $j$th segmented signal $s^i_{j-m+1:j} \in \mathbb{R}^m$ to obtain convolution features

$$c_j = \mathrm{Relu}\left(\sum_{i=1}^{n} w_c * s^i_{j-m+1:j} + b_c\right) \qquad (1)$$

where $*$ is a 1-D convolution operator, $w_c$ is referred to as the convolution kernel, $b_c$ is the corresponding bias, $n$ is the number of kernels, and $c_j$ is the $j$th output point of the convolutional layer. $\mathrm{Relu}(\cdot)$ is an activation function.

Each convolutional layer is connected with a pooling layer, in which a pooling operation is conducted to reduce the dimension of the convolution features and to make the learned convolution features shift invariant. The max pooling function is utilized in this paper, which returns the maximum value within a certain subregion as follows:

$$p_j = \max\{c_{j\times k:(j+1)\times k}\} \qquad (2)$$

where $k$ is the pooling length, and $p_j$ is the pooling output of the $j$th point.

After six convolution and pooling operations, the input vibration signals are mapped into features in layer P6. Then, the first fully connected layer FC1 is flattened from the output of layer P6. The second fully connected layer FC2 is calculated as follows:

$$f = \sigma\left((w^f)^T s_m + b^f\right) \qquad (3)$$

where $w^f$ is the weight matrix connecting the two fully connected layers, $b^f$ is the corresponding bias vector, and $s_m$ is the input data. Based on the output $f_2$ of layer FC2, the health conditions of machines are estimated in the output layer FO of the health recognition module through softmax regression as

$$y = \frac{1}{\sum_{i=1}^{K} e^{(w_i)^T f_2 + b}} \begin{bmatrix} e^{(w_1)^T f_2 + b} \\ e^{(w_2)^T f_2 + b} \\ \vdots \\ e^{(w_K)^T f_2 + b} \end{bmatrix} \qquad (4)$$


Fig. 2. Structure illustration of the proposed method.
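As a structural sketch of Fig. 2, the wiring of the shared feature extractor into the three heads can be written as follows. The exact layer sizes from Table I are not reproduced here, and `dctln_forward` with its stand-in callables is a hypothetical name, not the authors' code.

```python
def dctln_forward(x_source, x_target, feature_extractor,
                  condition_classifier, domain_classifier, mmd):
    """Structural sketch of the DCTLN forward pass (cf. Fig. 2):
    one shared feature extractor feeds three heads."""
    f_s = feature_extractor(x_source)            # FC2 features, source
    f_t = feature_extractor(x_target)            # FC2 features, target
    return {
        "health_logits": condition_classifier(f_s),   # objective: minimize
        "domain_out": (domain_classifier(f_s),        # objective: maximize
                       domain_classifier(f_t)),       #   its error
        "mmd": mmd(f_s, f_t),                         # objective: minimize
    }

# trivial stand-ins just to show the wiring
out = dctln_forward([1.0], [2.0],
                    feature_extractor=lambda x: x,
                    condition_classifier=lambda f: f,
                    domain_classifier=lambda f: f,
                    mmd=lambda a, b: abs(a[0] - b[0]))
print(sorted(out))   # ['domain_out', 'health_logits', 'mmd']
```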

TABLE I
ARCHITECTURE OF THE 1-D CNN

where $w_i$ is the weight matrix connecting to the $i$th output neuron, $b$ is the corresponding bias vector, and $K$ denotes the number of health condition categories.

2) Domain Adaptation: The domain adaptation module includes a domain classifier and a domain distribution discrepancy metric.

As shown in Fig. 2, the domain classifier includes two layers: a fully connected layer FC3 and a domain discrimination output layer DO. The FC3 layer is calculated by (3), where the input data $s_m$ is the output of layer FC2. The domain discrimination output layer DO is a binary classifier set up with logistic regression

$$d = \frac{1}{1 + e^{-\left((w_d)^T f_3 + b_d\right)}} \qquad (5)$$

where $w_d$ is the weight matrix in the domain adaptation module, $b_d$ is the corresponding bias vector, and $f_3$ is the output of layer FC3.

Let $P(f_2^{(S)})$ and $Q(f_2^{(T)})$ be the probability distributions of FC2S and FC2T, where FC2S and FC2T are the outputs of layer FC2 with the source domain and target domain data, respectively. The distance between $P(f_2^{(S)})$ and $Q(f_2^{(T)})$ is calculated by the maximum mean discrepancy (MMD) as follows:

$$D = \left\| \frac{1}{n_s}\sum_{i=1}^{n_s} f_{2i}^{(S)} - \frac{1}{n_t}\sum_{j=1}^{n_t} f_{2j}^{(T)} \right\|_H^2 \qquad (6)$$

where $n_s$ is the number of training samples from the source domain, $n_t$ is the number of training samples from the target domain, and $\|\cdot\|_H$ denotes the norm in a reproducing kernel Hilbert space.

B. Optimization Objective

The proposed DCTLN has the following three optimization objectives:
1) minimize the health condition classification error on the source domain dataset;
2) maximize the domain classification error on the source and target domain datasets;
3) minimize the MMD distance between the source and target domain datasets.

1) Objective 1: In order to accomplish transfer fault diagnosis, the DCTLN should be able to recognize health conditions and learn domain-invariant features. Specifically, the condition recognition module is designed to recognize the health conditions of machines. Therefore, the first optimization objective of the DCTLN is to minimize the health condition classification error on the source domain data. For a dataset with $k$ health condition categories, the desired objective function can be defined as a standard softmax regression loss

$$L_c = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k} I[y_i = j] \log \frac{e^{(w_j)^T f_2 + b}}{\sum_{l=1}^{k} e^{(w_l)^T f_2 + b}} \qquad (7)$$


where $m$ is the batch size of the training samples, $k$ is the number of fault categories, and $I[\cdot]$ is an indicator function.

2) Objective 2: The domain adaptation module is designed to learn domain-invariant features. It includes a domain classifier and a distribution discrepancy metric. As shown in Fig. 2, the domain classifier is connected with the feature extractor. Based on the theoretical results on domain adaptation [15], if a domain classifier cannot discriminate features between the source and target domains, the features are domain invariant. Therefore, the second optimization objective of the DCTLN is to maximize the domain classification error on the source and target domain data. The domain classification loss is defined as

$$L_d = \frac{1}{m}\sum_{i=1}^{m}\left(g_i \log d(x_i) + (1-g_i)\log(1-d(x_i))\right) \qquad (8)$$

where $g_i$ is the ground-truth domain label, and $d(x_i)$ denotes the output domain for the $i$th sample, which indicates whether $x_i$ comes from the source domain or the target domain. In the training stage, the training dataset is constructed from $n_s$ source domain data samples and $n_t$ target domain data samples. Therefore, (8) can be rewritten as follows:

$$L_d = \frac{1}{n_s}\sum_{i=1}^{n_s} L_d\left(f_{2i}^{(S)}\right) + \frac{1}{n_t}\sum_{j=1}^{n_t} L_d\left(f_{2j}^{(T)}\right) \qquad (9)$$

where $f_{2i}^{(S)}$ and $f_{2j}^{(T)}$ are the high-level features learned from the source domain data and the target domain data, respectively.

3) Objective 3: In the proposed DCTLN, the output of layer FC2 is the high-level features that connect to the health condition classifier. In other words, the high-level features directly influence the effectiveness of transfer fault diagnosis. In order to reduce the distribution discrepancy between features learned from different domains, the distance between FC2S and FC2T is directly measured. Therefore, the third optimization objective of the DCTLN is to minimize the distribution discrepancy distance between the source and target domain data. To calculate the distribution distance of high-level learned features between different domains, the practical computation of the MMD is written as

$$\hat{D} = \frac{1}{n_s^2}\sum_{i=1}^{n_s}\sum_{j=1}^{n_s} k\left(f_{2i}^{(S)}, f_{2j}^{(S)}\right) + \frac{1}{n_t^2}\sum_{i=1}^{n_t}\sum_{j=1}^{n_t} k\left(f_{2i}^{(T)}, f_{2j}^{(T)}\right) - \frac{2}{n_s n_t}\sum_{i=1}^{n_s}\sum_{j=1}^{n_t} k\left(f_{2i}^{(S)}, f_{2j}^{(T)}\right) \qquad (10)$$

where $\hat{D}$ is an empirical estimate of $D(P, Q)$, and $k(\cdot,\cdot)$ is a kernel function. According to [25], the Gaussian radial basis function (RBF), i.e., $k(x, y) = \exp\left(-\|x-y\|^2/2\sigma^2\right)$, maps the original features to a higher dimensional space and is widely used for the MMD calculation. Therefore, in this paper, an RBF kernel is chosen to estimate the MMD between domains.

Combining those three optimization objectives, the final optimization objective can be written as

$$L = L_c - \lambda L_d + \mu\hat{D} \qquad (11)$$

where the hyperparameters $\lambda$ and $\mu$ determine how strong the domain adaptation is.

Once the optimization objective of the proposed method is built, it is convenient to train the proposed method by the stochastic gradient descent (SGD) algorithm. As shown in Fig. 2, let $\theta_f$, $\theta_c$, and $\theta_d$ be the parameters of the feature extractor, health condition classifier, and domain classifier, respectively. The loss function (11) is rewritten as follows:

$$L(\theta_f^*, \theta_c^*, \theta_d^*) = \min_{\theta_f,\theta_c,\theta_d} L_c(\theta_f, \theta_c) - \lambda L_d(\theta_f, \theta_d) + \mu\hat{D}(\theta_f). \qquad (12)$$

Based on the aforementioned equation and the SGD algorithm, the parameters $\theta_f$, $\theta_c$, and $\theta_d$ are updated as follows:

$$\theta_f \leftarrow \theta_f - \varepsilon\left(\frac{\partial L_c}{\partial\theta_f} - \lambda\frac{\partial L_d}{\partial\theta_f} + \mu\frac{\partial\hat{D}}{\partial\theta_f}\right) \qquad (13)$$

$$\theta_c \leftarrow \theta_c - \varepsilon\frac{\partial L_c}{\partial\theta_c} \qquad (14)$$

$$\theta_d \leftarrow \theta_d - \varepsilon\frac{\partial L_d}{\partial\theta_d} \qquad (15)$$

where $\varepsilon$ is the learning rate.

When the training process is completed, the health condition classifier is able to correctly classify unlabeled samples in the target domain if the learned features are subject to ambiguous domain categories and small domain discrepancy. For the testing process of the DCTLN, the input is unlabeled data from the target domain. The DCTLN first learns domain-invariant features from these data. Then, the health condition classifier predicts the health condition according to the learned domain-invariant features.

IV. EXPERIMENT RESULTS AND COMPARISONS

A. Dataset

As discussed in Section I, in order to promote the successful applications of intelligent fault diagnosis of machines with unlabeled data, the intelligent fault diagnosis method trained with labeled data acquired from one machine is expected to effectively classify unlabeled data acquired from the other machines. Therefore, in this section, three datasets acquired from three different but related machines are used to conduct six transfer fault diagnosis experiments.

1) A: CWRU Bearing Dataset: The CWRU bearing dataset was collected from an experiment platform provided by CWRU [26]. On this platform, experiments were conducted using an electric motor, and vibration data were measured from the motor bearings. Faults were introduced separately at the inner raceway, rolling element, and outer raceway of the bearings. The CWRU bearing dataset is composed of vibration signals acquired under the aforementioned health conditions. Each health condition, i.e., normal condition (NC), inner race fault (IF), outer
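The RBF-kernel MMD estimate (10) and the combined objective (11) can be sketched as follows. The feature matrices, σ = 1.0, and the magnitude of the shift are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)),
    evaluated between every row of x and every row of y."""
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd_rbf(f_s, f_t, sigma=1.0):
    """Empirical squared MMD as in (10): mean within-source kernel
    plus mean within-target kernel minus twice the cross-kernel mean."""
    return (rbf_kernel(f_s, f_s, sigma).mean()
            + rbf_kernel(f_t, f_t, sigma).mean()
            - 2.0 * rbf_kernel(f_s, f_t, sigma).mean())

def total_loss(l_c, l_d, mmd, lam, mu):
    """Combined objective (11): L = Lc - lambda * Ld + mu * D_hat."""
    return l_c - lam * l_d + mu * mmd

rng = np.random.default_rng(1)
f_s = rng.normal(0.0, 1.0, size=(50, 4))      # stand-in source features
f_t = rng.normal(1.0, 1.0, size=(50, 4))      # shifted target features
print(mmd_rbf(f_s, f_s) == 0.0, mmd_rbf(f_s, f_t) > 0.0)  # True True
```

Identical feature sets give a zero estimate, and a distribution shift gives a positive one; minimizing this term therefore pulls the two feature distributions together, exactly the role of μD̂ in (11).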


Fig. 3. Six transfer fault diagnosis experiments.

TABLE II
VARIOUS BEARING DATASETS

race fault (OF), and ball fault (BF), has 1000 data samples. Thus, this dataset consists of 4000 data samples, and the length of each sample is 1200.

2) B: Intelligent Maintenance System (IMS) Bearing Dataset: The IMS bearing dataset was generated by the Center for Intelligent Maintenance Systems and was also collected from a bearing experiment platform [27]. Four double-row bearings were installed on a shaft. On the bearing housings, accelerometers were installed to acquire vibration signals. Through several run-to-failure experiments, outer race failures, inner race failures, and ball failures occurred in three bearings, respectively. We choose the data collected under failure conditions to construct the IMS bearing dataset. Therefore, this dataset includes three types of failure conditions and one type of normal condition. A total of 4000 data samples are in this dataset, and the length of each sample is also 1200.

3) C: Railway Locomotive (RL) Bearing Dataset: The RL bearing dataset was acquired from real rolling element bearings [1]. The RL bearing was installed on a test bench. An accelerometer was mounted on the outer race of the test bearing to measure its vibration. The health condition types and numbers of data samples are the same as those of the CWRU and IMS bearing datasets.

The vibration signals of those three datasets are shown in Fig. 3, and their detailed information is displayed in Table II. All three datasets are bearing vibration signals, while they are acquired from different machines under different operation conditions.

B. Transfer Fault Diagnosis of the DCTLN

As shown in Fig. 3, we evaluate the proposed DCTLN on six transfer fault diagnosis experiments, i.e., A→B, B→A, A→C, C→A, B→C, and C→B. In each transfer fault diagnosis experiment, the part before the arrow represents the source domain, and that after the arrow refers to the target domain. For example, for the transfer fault diagnosis experiment A→B, the CWRU bearing dataset is the source domain, and the IMS bearing dataset is the target domain. We follow the standard evaluation protocol for unsupervised transfer learning tasks. In each experiment, the training dataset includes all the labeled data samples from the source domain and half of the unlabeled data samples from the target domain. The other half of the data samples from the target domain are used for testing.
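The evaluation protocol described above can be sketched as follows; `make_transfer_split` is a hypothetical helper, the shuffling seed is an arbitrary choice, and the 4000-sample count follows the dataset descriptions above.

```python
import random

def make_transfer_split(source_ids, target_ids, seed=42):
    """Build one transfer experiment, e.g. A->B: train on all labeled
    source samples plus half of the (unlabeled) target samples; test
    on the other half of the target samples."""
    target_ids = list(target_ids)
    random.Random(seed).shuffle(target_ids)
    half = len(target_ids) // 2
    train = {"source_labeled": list(source_ids),
             "target_unlabeled": target_ids[:half]}
    test = target_ids[half:]
    return train, test

# e.g. experiment A->B with 4000 samples per dataset
train, test = make_transfer_split(range(4000), range(4000))
print(len(train["source_labeled"]), len(train["target_unlabeled"]), len(test))
# 4000 2000 2000
```

The target halves are disjoint, so no unlabeled sample used during domain adaptation is reused for testing.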


TABLE III
RECOGNITION RESULTS OF VARIOUS METHODS

TABLE IV
VARIOUS TRANSFER LEARNING METHODS

C. Comparison Results
In order to further demonstrate the effectiveness of the pro-
Fig. 4. Penalty parameter and training loss of the proposed DCTLN. posed DCTLN, five methods are used for comparison on the six
(a) Penalty parameter. (b) Training loss.
transfer fault diagnosis experiments. As shown in Table IV, the
five comparison methods are CNN trained only by the source
data, transfer component analysis (TCA) [29], deep domain
The detailed parameters for each experiment are set as fol- adaptation neural-network-based fault diagnosis (DAFD) [30],
lows. In the condition recognition module, the sizes of convolu- deep domain confusion (DDC) [25], and domain adversarial
tion kernel and pooling kernel are set to be 5 and 2, respectively. training of neural networks (DANN) [28]. Based on different
In the domain adaptation module, the RBF with the bandwidth comparison purposes, those five methods are classified into three
σ of median pairwise distances on the training data is used to types.
calculate the MMD distance between high-level learned features 1) Comparison to the method with no transfer learning. The
from the source and target domains. As shown in Fig. 4(a), the penalty parameters λ and μ are gradually changed from 0 to 1 using the formula 2/(1 + exp(−10 × p)) − 1, where p is the training progress that changes from 0 to 1. The method is trained using SGD with a learning rate of 0.001/(1 + 10 × p)^0.75 [28]. The training set and testing set of each experiment are listed in Fig. 3. The batch size is set as 256: half of each batch is populated by data samples from the source domain, and the rest is constituted by samples from the target domain. The number of training steps is set to 3000. Taking the transfer fault diagnosis task A→B as an example, the loss function of the DCTLN is plotted in Fig. 4(b). As shown in Fig. 4(b), the training loss of the proposed DCTLN converges after about 2000 training epochs.

With those parameters, each transfer fault diagnosis experiment is repeated ten times. The results of the six transfer fault diagnosis experiments are shown in Table III. In each experiment, all of the transfer fault diagnosis accuracies are over 82%, and some are even over 89%. This means that the proposed DCTLN is able to effectively recognize the health conditions of bearings where no labeled data are available.

1) Comparison to the CNN trained only by the source data. The first type is designed to illustrate the improvement of transfer-learning-based methods for the transfer fault diagnosis task where data from the target domain are unlabeled. The comparison method, the CNN trained only by the source data, is a CNN trained just with the data from a source domain.
2) Comparison to handcrafted-feature-based transfer learning methods. The second type is designed to illustrate the effects of learned features for transfer fault diagnosis tasks. Two handcrafted-feature-based methods, TCA and DAFD, are applied for comparison with the DCTLN. TCA is a conventional transfer learning method using MMD-regularized subspace learning, which is built on handcrafted features. In the TCA method, 14 handcrafted features are used, i.e., the mean, RMS, kurtosis, variance, crest factor, wave factor, and eight energy ratios of the wavelet packet transform. DAFD is a deep domain adaptation fault diagnosis model whose input is the frequency spectrum of vibration signals. Note that the DAFD is also able to learn features.
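For reference, the annealing schedules used during training (the shared ramp for the penalty parameters λ and μ, and the decaying SGD learning rate from [28]) can be sketched as follows; the function names and the loop skeleton are illustrative, not taken from the authors' implementation:

```python
import math

def penalty_weight(p: float) -> float:
    """Ramp the penalty weights (lambda, mu) from 0 to 1: 2/(1+exp(-10p)) - 1."""
    return 2.0 / (1.0 + math.exp(-10.0 * p)) - 1.0

def learning_rate(p: float, lr0: float = 0.001) -> float:
    """Anneal the SGD learning rate: lr0 / (1 + 10p)^0.75."""
    return lr0 / (1.0 + 10.0 * p) ** 0.75

total_steps = 3000
for step in range(total_steps):
    p = step / total_steps           # training progress in [0, 1)
    lam = mu = penalty_weight(p)     # both penalty parameters share the schedule
    lr = learning_rate(p)
    # ... assemble a 256-sample batch (half source, half target) and take an SGD step
```

Both schedules depend only on the training progress p, so the penalty terms are effectively switched off early in training and reach full strength as the classifier stabilizes.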

Authorized licensed use limited to: Cornell University Library. Downloaded on September 02,2020 at 13:07:55 UTC from IEEE Xplore. Restrictions apply.
GUO et al.: DCTLN: A NEW METHOD FOR INTELLIGENT FAULT DIAGNOSIS OF MACHINES WITH UNLABELED DATA 7323

Fig. 5. t-SNE visualization of features. (a) Handcrafted features. (b) Features learned from the frequency spectrum. (c) Features learned from the raw signals. (d) TCA. (e) DAFD. (f) DDC. (g) DANN. (h) The proposed method.
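A projection like the one in Fig. 5 can be reproduced with any t-SNE implementation; the paper cites the tree-based accelerated variant [31], which scikit-learn exposes as the Barnes–Hut method. In the sketch below, the feature arrays are random placeholders standing in for the learned or handcrafted features:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Placeholder high-dimensional features; in the paper these would be the
# features extracted from the source-domain and target-domain samples.
source_features = rng.normal(0.0, 1.0, size=(200, 256))
target_features = rng.normal(0.5, 1.0, size=(200, 256))

features = np.vstack([source_features, target_features])
domain = np.array([0] * 200 + [1] * 200)  # 0 = source, 1 = target

# Map the high-dimensional features into a 2-D space (Barnes-Hut t-SNE).
embedding = TSNE(n_components=2, perplexity=30, method="barnes_hut",
                 random_state=0).fit_transform(features)
print(embedding.shape)  # (400, 2)
```

Scatter-plotting `embedding` colored by `domain` (and by health-condition label) yields pictures analogous to Fig. 5: well-adapted features appear as mixed domains with well-separated condition clusters.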

However, it learns features from the frequency spectrum of vibration signals rather than from the raw vibration signals. Therefore, we consider it a method based on handcrafted features.
3) Comparison to state-of-the-art transfer learning methods. The third type is designed to compare two widely used transfer learning methods, i.e., DDC and DANN, with the DCTLN. DDC adds an adaptation layer and a distribution-matching MMD term to a CNN architecture to ensure that the learned features are domain invariant. DANN adds a domain-discriminative component to a deep neural network. The method is expected to learn representation features that are predictive for the source-domain data samples but uninformative about the domain of the input.

The classification accuracies and standard errors of the six transfer fault diagnosis experiments are shown in Table III. It can be seen that the proposed DCTLN outperforms all of the comparison methods on the six transfer fault diagnosis experiments. More specifically, through the comparison results, we obtain the following three observations.
1) The transfer-learning-based method outperforms the classical method without transfer learning for the transfer fault diagnosis tasks where no labeled data are available in the target domain. The only difference between the DCTLN and the CNN trained only by source data is that a domain adaptation module is added into the DCTLN, and the results show that the DCTLN obtains higher classification accuracies than the CNN trained only by source data. This means that transfer learning may be a promising tool to promote the successful application of intelligent fault diagnosis of machines with unlabeled data.
2) The learned features outperform the handcrafted features for the transfer fault diagnosis tasks. Among the comparison methods, TCA uses 14 classical handcrafted features and DAFD extracts features from the frequency spectrum of vibration signals. The other methods, i.e., DDC, DANN, and the proposed DCTLN, directly learn features from the vibration signals. From the results, it can be seen that the three learned-feature-based methods outperform the two handcrafted-feature-based methods. A possible reason is that the handcrafted features may discard some useful condition-representation information compared with the features learned from raw signals. Additionally, it also means that deep learning models trained with raw signals may be able to reduce the distribution discrepancy of the learned features between domains.
3) Compared with the two widely used transfer learning methods, i.e., DDC and DANN, the proposed DCTLN obtains higher recognition accuracies of health conditions in the six transfer fault diagnosis experiments. This validates that the DCTLN reduces the distribution discrepancy between domain data more effectively than DDC and DANN. Additionally, the relatively high recognition accuracies confirm the practicability of the proposed DCTLN.

In order to provide visual insight into the effects of transfer learning on the distribution discrepancy of features from the source and target domains, we use the t-distributed stochastic neighbor embedding (t-SNE) [31] technique to map the high-dimensional features into a 2-D space. The first transfer fault diagnosis experiment, A→B, is taken as an example, and the results are shown in Fig. 5. Fig. 5(a)–(c) plots features without


transfer learning processing. It can be seen that the distributions of the learned features from the source and target domains are closer than those of the handcrafted features. This confirms the current practice that deep neural networks learn features that can reduce the domain distribution discrepancy. Fig. 5(d)–(h) plots features with transfer learning processing, namely, transferred features, where Fig. 5(d), (e), and (f)–(h) correspond to Fig. 5(a), (b), and (c), respectively. For example, Fig. 5(d) plots the transferred features whose corresponding features without transfer learning processing are shown in Fig. 5(a). From the visualization of the features with and without transfer learning processing, we can find that the distributions of the transferred features from the source and target domains are closer than those of the features without transfer learning processing. This validates that transfer learning processing is able to reduce the distribution discrepancy of data acquired from different machines. Additionally, we can find that the features learned by the DCTLN exhibit tighter health-condition class clusters, and better mixing of the feature distributions between domains, than the features shown in Fig. 5(a)–(g).

V. CONCLUSION

In this paper, we introduced transfer learning into the field of intelligent fault diagnosis of machines with unlabeled data and proposed a new intelligent method, the DCTLN, for transfer fault diagnosis tasks. We demonstrated the ability of the proposed method through six transfer fault diagnosis experiments in which no labeled data are available in the target domain. The following three conclusions were drawn from the experimental results.
1) The transfer-learning-based intelligent fault diagnosis methods obtain higher recognition accuracies of bearing health conditions compared with the traditional method without transfer learning processing.
2) The features learned from the raw signals may reduce the distribution discrepancy between different domain datasets.
3) The proposed DCTLN outperforms the two widely used transfer learning methods, i.e., DDC and DANN, in the six transfer fault diagnosis experiments on bearings.
These conclusions indicate that the DCTLN trained with labeled data acquired from one machine is able to effectively classify unlabeled data acquired from other machines. Therefore, the DCTLN is able to promote the successful application of intelligent fault diagnosis of machines with unlabeled data.

REFERENCES

[1] Y. Lei, Intelligent Fault Diagnosis and Remaining Useful Life Prediction of Rotating Machinery. Oxford, U.K.: Butterworth-Heinemann, 2016.
[2] L. Guo, N. Li, F. Jia, Y. Lei, and J. Lin, "A recurrent neural network based health indicator for remaining useful life prediction of bearings," Neurocomputing, vol. 240, pp. 98–109, 2017.
[3] F. Jia, Y. Lei, L. Guo, J. Lin, and S. Xing, "A neural network constructed by deep learning technique and its application to intelligent fault diagnosis of machines," Neurocomputing, vol. 107, pp. 241–265, 2018.
[4] M. Cerrada, G. Zurita, D. Cabrera, R.-V. Sánchez, M. Artés, and C. Li, "Fault diagnosis in spur gears based on genetic algorithm and random forest," Mech. Syst. Signal Process., vol. 70, pp. 87–103, 2016.
[5] T. W. Rauber, F. de Assis Boldt, and F. M. Varejão, "Heterogeneous feature models and feature selection applied to bearing fault diagnosis," IEEE Trans. Ind. Electron., vol. 62, no. 1, pp. 637–646, Jan. 2015.
[6] K. B. Lee, S. Cheon, and C. O. Kim, "A convolutional neural network for fault classification and diagnosis in semiconductor manufacturing processes," IEEE Trans. Semicond. Manuf., vol. 30, no. 2, pp. 135–142, May 2017.
[7] Z. Su, B. Tang, Z. Liu, and Y. Qin, "Multi-fault diagnosis for rotating machinery based on orthogonal supervised linear local tangent space alignment and least square support vector machine," Neurocomputing, vol. 157, pp. 208–222, 2015.
[8] R. Zhang, H. Tao, L. Wu, and Y. Guan, "Transfer learning with neural networks for bearing fault diagnosis in changing working conditions," IEEE Access, vol. 5, pp. 14347–14357, 2017.
[9] W. Qiao and D. Lu, "A survey on wind turbine condition monitoring and fault diagnosis—Part II: Signals and signal processing methods," IEEE Trans. Ind. Electron., vol. 62, no. 10, pp. 6546–6557, Oct. 2015.
[10] Y. Lei, F. Jia, J. Lin, S. Xing, and S. X. Ding, "An intelligent fault diagnosis method using unsupervised feature learning towards mechanical big data," IEEE Trans. Ind. Electron., vol. 63, no. 5, pp. 3137–3147, May 2016.
[11] I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining: Practical Machine Learning Tools and Techniques. San Mateo, CA, USA: Morgan Kaufmann, 2016.
[12] Y. Lei, N. Li, L. Guo, N. Li, T. Yan, and J. Lin, "Machinery health prognostics: A systematic review from data acquisition to RUL prediction," Mech. Syst. Signal Process., vol. 104, pp. 799–834, 2018.
[13] W. Lu, B. Liang, Y. Cheng, D. Meng, J. Yang, and T. Zhang, "Deep model based domain adaptation for fault diagnosis," IEEE Trans. Ind. Electron., vol. 64, no. 3, pp. 2296–2305, Mar. 2017.
[14] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[15] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan, "A theory of learning from different domains," Mach. Learn., vol. 79, no. 1–2, pp. 151–175, 2010.
[16] C. Persello and L. Bruzzone, "Kernel-based domain-invariant feature selection in hyperspectral images for transfer learning," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 5, pp. 2615–2626, May 2016.
[17] M. Long, Y. Cao, J. Wang, and M. Jordan, "Learning transferable features with deep adaptation networks," in Proc. Int. Conf. Mach. Learn., 2015, pp. 97–105.
[18] K. D. Feuz and D. J. Cook, "Transfer learning across feature-rich heterogeneous feature spaces via feature-space remapping (FSR)," ACM Trans. Intell. Syst. Technol., vol. 6, no. 1, pp. 1–27, 2015.
[19] K. Yan and D. Zhang, "Correcting instrumental variation and time-varying drift: A transfer learning approach with autoencoders," IEEE Trans. Instrum. Meas., vol. 65, no. 9, pp. 2012–2022, Sep. 2016.
[20] M. Oquab, L. Bottou, I. Laptev, and J. Sivic, "Learning and transferring mid-level image representations using convolutional neural networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2014, pp. 1717–1724.
[21] V. Jayaram, M. Alamgir, Y. Altun, B. Scholkopf, and M. Grosse-Wentrup, "Transfer learning in brain-computer interfaces," IEEE Comput. Intell. Mag., vol. 11, no. 1, pp. 20–31, Feb. 2016.
[22] C. Devin, A. Gupta, T. Darrell, P. Abbeel, and S. Levine, "Learning modular neural network policies for multi-task and multi-robot transfer," in Proc. IEEE Int. Conf. Robot. Automat., 2017, pp. 2169–2176.
[23] L. Wen, L. Gao, and X. Li, "A new deep transfer learning based on sparse auto-encoder for fault diagnosis," IEEE Trans. Syst., Man, Cybern., Syst., to be published, doi: 10.1109/TSMC.2017.2754287.
[24] J. Xie, L. Zhang, L. Duan, and J. Wang, "On cross-domain feature fusion in gearbox fault diagnosis under various operating conditions based on transfer component analysis," in Proc. IEEE Int. Conf. Prognostics Health Manage., 2016, pp. 1–6.
[25] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell, "Deep domain confusion: Maximizing for domain invariance," 2014, arXiv:1412.3474.
[26] Case Western Reserve University Bearing Data Center Website, 2000. [Online]. Available: http://csegroups.case.edu/bearingdatacenter/home
[27] H. Qiu, J. Lee, J. Lin, and G. Yu, "Wavelet filter-based weak signature detection method and its application on rolling element bearing prognostics," J. Sound Vib., vol. 289, no. 4, pp. 1066–1090, 2006.
[28] Y. Ganin et al., "Domain-adversarial training of neural networks," J. Mach. Learn. Res., vol. 17, no. 59, pp. 1–35, 2016.
[29] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, "Domain adaptation via transfer component analysis," IEEE Trans. Neural Netw., vol. 22, no. 2, pp. 199–210, Feb. 2011.
[30] W. Lu, B. Liang, Y. Cheng, D. Meng, J. Yang, and T. Zhang, "Deep model based domain adaptation for fault diagnosis," IEEE Trans. Ind. Electron., vol. 64, no. 3, pp. 2296–2305, Mar. 2017.
[31] L. Van Der Maaten, "Accelerating t-SNE using tree-based algorithms," J. Mach. Learn. Res., vol. 15, no. 1, pp. 3221–3245, 2014.


Liang Guo was born in ChangDe, China. He received the B.S. and Ph.D. degrees in mechanical engineering from Southwest Jiaotong University, Chengdu, China, in 2011 and 2016, respectively.
He is currently an Assistant Professor with Southwest Jiaotong University. Prior to joining Southwest Jiaotong University in 2018, he was a Postdoctoral Research Fellow with Xi'an Jiaotong University, Xi'an, China. His research interests include machinery intelligent fault diagnostics and remaining useful life prediction.

Yaguo Lei (M'15) received the B.S. and Ph.D. degrees in mechanical engineering from Xi'an Jiaotong University, Xi'an, China, in 2002 and 2007, respectively.
He is currently a Full Professor of Mechanical Engineering with Xi'an Jiaotong University. Prior to joining Xi'an Jiaotong University in 2010, he was a Postdoctoral Research Fellow with the University of Alberta, Edmonton, AB, Canada. He was also an Alexander von Humboldt Fellow with the University of Duisburg-Essen, Duisburg, Germany.
His research interests include machinery condition monitoring and fault diagnosis, mechanical signal processing, intelligent fault diagnostics, and remaining useful life prediction.
Dr. Lei is a member of the editorial boards of more than ten journals, including Mechanical Systems and Signal Processing and Neural Computing and Applications. He is also a member of the American Society of Mechanical Engineers. He has pioneered many signal processing techniques, intelligent diagnosis methods, and remaining useful life prediction models for machinery.

Saibo Xing was born in Nantong, China. He received the B.S. degree in material science and engineering in 2015 from Xi'an Jiaotong University, Xi'an, China, where he is currently working toward the Ph.D. degree in mechanical engineering.
His research interests include intelligent fault diagnostics of rotating machinery.

Tao Yan received the B.S. degree in mechanical engineering from Central South University, Changsha, China, in 2016. He is currently working toward the Ph.D. degree in mechanical engineering with the Key Laboratory of Education Ministry for Modern Design and Rotor-Bearing System, Xi'an Jiaotong University, Xi'an, China.
His research interests include machinery condition monitoring, intelligent fault diagnostics, and remaining useful life prediction of rotating machinery.

Naipeng Li received the B.S. degree in mechanical engineering from Shandong Agricultural University, Tai'an, China, in 2012. He is currently working toward the Ph.D. degree in mechanical engineering with the Key Laboratory of Education Ministry for Modern Design and Rotor-Bearing System, Xi'an Jiaotong University, Xi'an, China.
His research interests include machinery condition monitoring, intelligent fault diagnostics, and remaining useful life prediction of rotating machinery.
