
Available online at www.sciencedirect.com

ScienceDirect

IFAC PapersOnLine 52-24 (2019) 272–277
A novel feature extraction method based on discriminative graph regularized autoencoder for fault diagnosis

Yanxia Li ∗  Yi Chai ∗∗  Han Zhou ∗∗∗  Hongpeng Yin ∗∗∗∗
∗ College of Automation, Chongqing University, Chongqing 400044, China (e-mail: liyanxia106@gmail.com)
∗∗ Key Laboratory of Complex System Safety and Control, Ministry of Education, Chongqing 400044, China (e-mail: chaiyi@cqu.edu.cn)
∗∗∗ College of Automation, Chongqing University, Chongqing 400044, China (e-mail: zhouhan1515@foxmail.com)
∗∗∗∗ Key Laboratory of Complex System Safety and Control, Ministry of Education, Chongqing 400044, China (e-mail: yinhongpeng@gmail.com)
Abstract: Autoencoder has been popularly used as an effective feature extraction method in fault diagnosis. However, the autoencoder algorithms neglect the local structure and class information that is available in the training set. To address this problem, a novel feature extraction approach based on discriminative graph regularized autoencoder is proposed for the fault diagnosis task. A single-layer autoencoder with nonlinear layers is adopted to extract nonlinear features automatically from input signals. The locality relationship of the original data is propagated to the feature extraction stage via a graph to learn internal representations that go beyond reconstruction and on to locality preservation. To better exploit the discriminative information, the label information of training samples is embedded into the graph to improve the fault diagnosis performance. A real industrial process is used to compare the performance with commonly used diagnosis methods; the promising experimental results validate the superiority of the proposed method.

© 2019, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Keywords: fault diagnosis, feature extraction, autoencoder, local geometric structure, label information
1. INTRODUCTION

Fault diagnosis is an important task to identify defective states and conditions within industrial systems. It ensures safe operations, prevents equipment damage and maintains normal production by detecting faults and diagnosing the root causes of faults (Gajjar et al., 2018). Primary fault diagnosis techniques can be categorized into the classes of model-based, signal-based, and knowledge-based methods (Gao et al., 2015). In the last several decades, owing to the wide application of modern measurement techniques and the quick development of data analysis methods, knowledge-based techniques, also known as data-driven techniques, are finding more and more applications in fault diagnosis.

Traditionally, the framework of intelligent data-driven fault diagnosis is generally decomposed into two stages, namely a first preprocessing stage, denoted feature extraction, followed by the actual fault classification (Lei et al., 2016). Feature extraction aims to extract representative features from the collected signals based on signal processing techniques. An effective feature extraction procedure greatly facilitates the fault classification process, reducing its computational demand and ultimately improving the fault classification performance, which may be adversely affected by irrelevant and redundant features. Representative features associated with the conditions of industrial components should be extracted by using appropriate signal processing and calculating approaches. It can be found that various techniques, including Fourier transform (FT), envelope analysis, wavelet analysis, Empirical Mode Decomposition (EMD) and time-frequency distributions, have been applied to feature extraction for fault diagnosis and achieved good results (Li and He, 2012). However, these feature extraction techniques cost much human labor and make the methods less automatic, since they largely depend on prior knowledge about signal processing techniques and diagnostic expertise.

Compared to the above conventional fault feature extraction approaches, autoencoder, as a new feature extraction method, has recently been an active research topic. It is a promising method to overcome the difficulties of feature extraction by learning high-level feature representations automatically. Up to now, many feature extraction methods based on autoencoder have been proposed in

⋆ This work was supported by National Natural Science Foundation of China (NSFC) under Grants 6177308, 61633005 and Scientific Reserve Talent Programs of Chongqing University under Grant cqu2018CDHB1B04.
2405-8963 © 2019, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2019.12.420
the field of fault diagnosis. Ma et al. (Ma et al., 2018) constructed a coupling autoencoder to integrate feature extraction of multimodal data seamlessly into data fusion for fault diagnosis. Shao et al. (Shao et al., 2017) employed a deep autoencoder feature learning method to capture the useful fault features from the measured vibration signals. Xia et al. (Xia et al., 2017) used a deep neural network based on stacked denoising autoencoder to extract representative features from massive unlabelled data on the system condition and achieve high performance in fault diagnosis. In (Lu et al., 2017), an effective and reliable stacked denoising autoencoder was developed for fault diagnosis of typical dynamic systems, which incorporates feature extraction and fault recognition into a general-purpose learning procedure. Jia et al. (Jia et al., 2018) adopted a local connection network constructed by normalized sparse autoencoder to produce shift-invariant features and recognize mechanical health conditions effectively.

Although autoencoder has better interpretability than manually designed features, it inherits a disadvantage: it only preserves the global Euclidean structure of data (via the standard autoencoder cost function) but totally neglects the local data structure (i.e., local neighborhood relations among data points). Because the local neighborhood structure is also a very important aspect of signal features (Wei et al., 2013; Zheng et al., 2011), neglecting such important data information inevitably degrades the performance of fault diagnosis. Moreover, autoencoder is an unsupervised feature extraction method, and thus fails to encode discriminative information into the feature learning stage. However, in scenarios where labeled data is available and can be used for training, the class information is essential for promoting the performance of the fault diagnosis task.

To address the problems mentioned above and achieve automated key feature extraction, a novel feature extraction method based on discriminative graph regularized autoencoder (DGAE) is proposed for fault diagnosis in the present paper. AE, an advanced neural network structure, is utilized to map the process data to the feature space. Compared to the feature extraction methods mentioned above, the representation of AE is entirely learned from data, which not only avoids the problem of designing features manually, but also ensures that the learned features truly reflect the data characteristics. In addition, a graph which reflects the locality relationship of the original data is incorporated into the AE model to learn internal representations that go beyond reconstruction and on to locality preservation. To better exploit the discriminative information, the label information of training samples is embedded into the graph to improve the classification performance. In order to detect faults, the Nearest Neighbor Classifier is employed due to its simplicity, and classification accuracy is calculated as the evaluation metric for fault diagnosis efficiency. The main contributions of this paper can be summarized as follows:

(1) A novel feature extraction method which is based on an autoencoder and discriminative graph regularization is proposed. The proposed method inherits the property of automatically extracting discriminative features that preserve the locality and label information of the original data, which in turn results in higher classification accuracy.

(2) The proposed feature extraction method DGAE is utilized for intelligent fault diagnosis. It can be used to directly learn features from raw monitoring signals and recognize different faults under various conditions.

The rest of the paper is organized as follows. Section 2 briefly describes the details of the method and the optimization steps. In Section 3, the effectiveness and advantages of the proposed method are illustrated by a case study on the Tennessee Eastman (TE) process. Finally, the conclusion is drawn in Section 4.

2. THE PROPOSED DGAE METHOD

In this section, the proposed feature extraction method, which is called discriminative graph regularized autoencoder (DGAE), is introduced in detail. Specifically, the framework of the proposed method is provided initially. Then, the objective function of the proposed method is reported. Later, the detailed optimization steps of the proposed method are also explained, followed by the proposed fault diagnosis method.

2.1 The framework of the DGAE method

The framework of the proposed method is illustrated in Fig. 1. As Fig. 1 demonstrates, the proposed method consists of three components. First, it projects the training data to a subspace parameterized by a single-layer autoencoder. Then, a local constraint, denoted by graph Laplacian regularization, is considered to explore the intrinsic local structure of the training data and increase the discrimination capability of the extracted features. In the testing stage, the underlying and succinct features of testing data can be represented by the learned nonlinear transformations. Then the label of the testing data can be predicted via a simple classifier.

2.2 The objective function of the DGAE method

Given the set of N training samples X = [X1, · · · , XC] = [x1, · · · , xN] ∈ R^(D×N), C is the class number of the training samples, Xi is a matrix composed of all training samples of the ith class, and each sample is vectorized to be a vector x with D dimensions. Thus, the label matrix of the training samples X can be defined as Y = [Y1, · · · , YC] = [y1, · · · , yN] ∈ R^(C×N), yi = [0, · · · , 1, · · · , 0]^T ∈ R^C, where only the jth entry of yi is nonzero, which indicates that training sample xi comes from the jth class. Specifically, an autoencoder is a self-supervised neural network where the input and the output are the same. In this paper, a single-layer autoencoder is utilized for feature extraction due to its simple structure and ease of implementation. The workflow of an autoencoder consists of two steps: encoding and decoding. To be more specific, in the encoding step, it projects the original data xi ∈ R^D to a d-dimensional hidden layer via a nonlinear function h(xi; W1, b1). The hidden layer can be recognized as the feature layer. This is represented as follows:

    hi = h(xi; W1, b1) = σ(W1 · xi + b1)    (1)

where W1 is a d × D weight matrix, b1 ∈ R^d is a bias vector, and σ(·) is the sigmoid activation function, σ(x) = 1/(1 + exp(−x)).
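The encoding step of Eq. (1), together with the decoding and reconstruction objective of Eqs. (2)-(3) that follow, can be sketched in NumPy as below. The layer sizes and random data are illustrative assumptions only, not values from the paper:

```python
import numpy as np

def sigmoid(z):
    # sigma(x) = 1 / (1 + exp(-x)), the activation of Eq. (1)
    return 1.0 / (1.0 + np.exp(-z))

def encode(X, W1, b1):
    # Eq. (1): h_i = sigma(W1 x_i + b1); X is D x N, features H are d x N
    return sigmoid(W1 @ X + b1)

def decode(H, W2, b2):
    # Eq. (2): o_i = sigma(W2 h_i + b2); reconstruction O is D x N
    return sigmoid(W2 @ H + b2)

def reconstruction_loss(X, W1, b1, W2, b2):
    # Eq. (3): J1 = (1/2N) ||O - X||_F^2
    N = X.shape[1]
    O = decode(encode(X, W1, b1), W2, b2)
    return 0.5 / N * np.sum((O - X) ** 2)

# toy usage: D = 4 input dimensions, d = 2 hidden units, N = 5 samples
rng = np.random.default_rng(0)
X = rng.random((4, 5))
W1, b1 = rng.standard_normal((2, 4)), np.zeros((2, 1))
W2, b2 = rng.standard_normal((4, 2)), np.zeros((4, 1))
loss = reconstruction_loss(X, W1, b1, W2, b2)
```

Because σ saturates in (0, 1), the hidden features are bounded, which is why the paper can treat the hidden layer directly as the feature layer.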
[Figure 1 graphic: a Training Stage panel, in which training data from Class 1 through Class n, their labels, and the local structures L feed the feature extraction network, and a Testing Stage panel, in which testing data pass through the learned feature extraction to a classifier that outputs a label.]
Fig. 1. The framework of the proposed DGAE method.
The second step maps the outputs of the hidden units back to the original input space via a decoder function o(hi; W2, b2) as follows:

    oi = o(hi; W2, b2) = σ(W2 · hi + b2)    (2)

where W2 ∈ R^(D×d) and b2 ∈ R^D are the corresponding weight matrix and bias vector, respectively.

During training, the encoder and decoder functions are learnt by minimizing the difference between the input data and the reconstructed data (oi ≈ xi). This is expressed as follows:

    min_Ω J1 = (1/2N) ||O(Ω) − X||_F^2    (3)

where Ω = [W1, W2, b1, b2], O(Ω) = [o1(Ω), · · · , oN(Ω)] and oi(Ω) = σ(W2 · σ(W1 · xi + b1) + b2) for i = 1, · · · , N.

According to formulation (3), one can note that it pays much attention to representation and reconstruction while totally ignoring the label information and the relationship between local samples. As described previously, local geometric structures of the original data often contain discriminative information of neighboring data point pairs. Motivated by the intuition that nearby data points have similar geometric properties, we construct a graph Laplacian to model the local neighborhood relationships of data points.

Given a sample xi ∈ X, the neighbors of xi are the k nearest data samples. Here, k is a pre-defined positive constant. In order to characterize the local structure, a graph G is constructed on the data space. Specifically, the edge weight between two connected data points is determined by the similarity between those two points as follows:

    s(xi, xj) = exp(−||xi − xj||^2 / t)    (4)

where t adjusts the weight decay speed. Let S denote the weight matrix, with entries

    Sij = { s(xi, xj),  if xi ∈ N(xj) or xj ∈ N(xi)
          { 0,          otherwise    (5)

However, (5) is unsupervised. In consideration of the label information of the original data, the label is further embedded into the graph as follows:

    Sij = { s(xi, xj),   if xi ∈ N(xj) or xj ∈ N(xi) and y(xi) = y(xj)
          { −s(xi, xj),  if xi ∈ N(xj) or xj ∈ N(xi) and y(xi) ≠ y(xj)
          { 0,           otherwise    (6)

Then the Laplacian matrix of graph G can be calculated as:

    L = D − S    (7)

where D is a diagonal matrix whose elements are Dii = Σ_{j=1}^N Sij. Hence, the local geometrical structure and label information can be considered by the discriminative graph as follows:

    min_Ω J2 = (1/2) Σ_{i=1}^N Σ_{j=1}^N ||h(Ω)_i − h(Ω)_j||^2 Sij = Tr(H(Ω) L H(Ω)^T)    (8)

where H(Ω) = [h1(Ω), · · · , hN(Ω)] and hi(Ω) = σ(W1 · xi + b1) for i = 1, · · · , N, and Tr(·) denotes the trace operator. By this means, if two data points xi and xj are close in the original space, their corresponding features Hi and Hj are also close to each other, while nearby data pairs which belong to different classes in the original space are pushed far apart in the feature space. This encourages the nearby points with the same label to be close to each other and the nearby points with different labels to be far apart in their respective spaces.

With the two objective function terms defined above, the objective function of the DGAE algorithm is defined by combining them as:

    min_Ω J = J1 + λ1 Σ_{i=1}^2 ||Wi||_F^2 + λ2 J2
            = (1/2N) ||O(Ω) − X||_F^2 + λ1 Σ_{i=1}^2 ||Wi||_F^2 + λ2 Tr(H(Ω) L H(Ω)^T)    (9)
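The discriminative graph of Eqs. (4)-(8) can be sketched in NumPy as follows. The toy two-cluster data, the choice of k, and the heat-kernel width t are illustrative assumptions, not settings from the paper:

```python
import numpy as np

def discriminative_graph(X, y, k=3, t=1.0):
    """Build the signed weight matrix S of Eqs. (4)-(6) and the
    Laplacian L = D - S of Eq. (7). X is D x N; y holds class labels."""
    N = X.shape[1]
    # pairwise squared distances ||x_i - x_j||^2
    sq = np.sum(X ** 2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2 * X.T @ X
    np.fill_diagonal(dist2, np.inf)          # a point is not its own neighbor
    # k-nearest-neighbor sets N(x_i), symmetrized as in Eqs. (5)-(6)
    knn = np.argsort(dist2, axis=1)[:, :k]
    neighbor = np.zeros((N, N), dtype=bool)
    neighbor[np.repeat(np.arange(N), k), knn.ravel()] = True
    neighbor |= neighbor.T                   # x_i in N(x_j) or x_j in N(x_i)
    # Eq. (4): heat-kernel similarity; Eq. (6): sign flips across classes
    heat = np.exp(-np.where(np.isinf(dist2), 0.0, dist2) / t)
    same = (y[:, None] == y[None, :])
    S = np.where(neighbor, np.where(same, heat, -heat), 0.0)
    Dmat = np.diag(S.sum(axis=1))            # D_ii = sum_j S_ij
    return S, Dmat - S                       # Eq. (7): L = D - S

def graph_penalty(H, L):
    # Eq. (8): J2 = Tr(H L H^T) for the feature matrix H (d x N)
    return np.trace(H @ L @ H.T)

# toy usage: two well-separated clusters with labels 0 and 1
X = np.array([[0.0, 0.1, 0.2, 5.0, 5.1, 5.2],
              [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]])
y = np.array([0, 0, 0, 1, 1, 1])
S, L = discriminative_graph(X, y, k=2, t=10.0)
```

The identity in Eq. (8), (1/2) Σij ||hi − hj||² Sij = Tr(HLH^T), holds for any symmetric S with L = D − S, which the test below checks against the explicit double sum.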
In (9), the first term ||O(Ω) − X||_F^2 is an autoencoder term which minimizes the reconstruction error between O(Ω) and X. The second term is a regularization term to decrease the magnitude of the weights and help prevent the overfitting problem. The third term represents the locality information with label embedding; it enables the proposed method to preserve the desirable local structure among training samples and enhance class discrimination at the same time. λ1 and λ2 are positive regularization parameters that balance the relative importance of the corresponding terms in the objective function J.

2.3 Optimization of the DGAE method

The objective function of DGAE shown in (9) does not have a closed-form solution. By following (Le et al., 2011), the back-propagation algorithm is employed in conjunction with the batch gradient descent method to update the parameters Ω. The key operation of this step is computing the partial derivatives ∂J/∂Wi and ∂J/∂bi. The partial derivatives of J with respect to Wi and bi (i = 1, 2) are given as:

    ∂J/∂W1 = (1/N) ∆1 X^T + 2λ1 W1 + 2λ2 [(HL) ⊙ H ⊙ (1 − H)] X^T    (10)
    ∂J/∂W2 = (1/N) ∆2 H^T + 2λ1 W2    (11)
    ∂J/∂b1 = (1/N) Σ_{i=1}^N ∆1^(i)    (12)
    ∂J/∂b2 = (1/N) Σ_{i=1}^N ∆2^(i)    (13)

where ⊙ denotes the element-wise product, ∆^(i) denotes the ith column of ∆, and ∆1 and ∆2 are computed by:

    ∆1 = (W2^T ∆2) ⊙ H ⊙ (1 − H)    (14)
    ∆2 = (O − X) ⊙ O ⊙ (1 − O)    (15)

Then Ω can be updated as follows until convergence:

    Wi = Wi − η ∂J/∂Wi,  bi = bi − η ∂J/∂bi,  i = 1, 2    (16)

To simplify notation, H and O are adopted to represent H(Ω) and O(Ω), respectively. η is the learning rate controlling the convergence speed of the objective function J. Accordingly, the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm and the minFunc toolbox are utilized for the optimization, so that η can be adaptively adjusted during the iteration process.

2.4 Diagnosis Method

Once the nonlinear mapping h(xi; W1, b1), which is characterized by Ω, is learned, the classification task can be performed as follows. For a test sample x_i^test, it can be coded as A_i^test via (17). Then its class label can be predicted with a simple classifier, such as the Nearest Neighbor Classifier.

    A_i^test = σ(W1 · x_i^test + b1)    (17)

3. EXPERIMENTS

In this section, a series of experiments are conducted on real-world benchmark datasets to show the performance of the proposed algorithm. All experiments are performed in a Windows 7 environment on a machine with a Core i5 processor and 8 GB RAM.

3.1 Data description

The proposed method is tested on the TE process. The process has 41 measurements and 12 manipulated variables. The measurements include 22 continuous process measurements and 19 sampled process measurements. In total, 21 different faults have been designed, and the data set used in this paper is given in (Chiang et al., 2000) and is widely accepted for process monitoring and fault diagnosis. The data set includes 22 training sets and 22 testing sets. One testing set was obtained under the normal operational condition; the other 21 sets were collected under 21 different faulty conditions for 48 operation hours, and 960 samples are obtained for each testing set. For each of the 21 faulty testing sets, the fault was introduced at the 8th operation hour. Similarly, the training sets also consist of one normal operational set and 21 faulty operational sets. In the experiment, 10080 data samples are randomly selected for training and the remaining 20160 data samples are used for testing.

3.2 Comparison Algorithms

To show the effectiveness of the proposed method, it is compared with the traditional Nearest Neighbor (NN) Classifier, PCA+NN, and AE+NN, respectively. PCA+NN means that PCA is first used to extract features from the training data, and the Nearest Neighbor Classifier is then used to classify faults. Similarly, the method AE+NN utilizes AE and the Nearest Neighbor Classifier for feature extraction and fault recognition, respectively.

3.3 Parameter Settings

There are two important parameters λ1 and λ2 to be decided in advance in the proposed algorithm. These parameters are selected and determined by the grid search method on the training datasets. In detail, suitable λ1 and λ2 are investigated by assigning λ1 ∈ [1e−2, 3e−2, 4e−2, 5e−2, 1e−1] and λ2 ∈ [1e−3, 3e−3, 4e−3, 5e−3, 1e−2]. The classification accuracy on Fault1 is reported in Fig. 2. It can be noted that λ1 = 5e−2 and λ2 = 4e−3 reach the highest point, as shown in Fig. 2. Therefore, λ1 and λ2 are set as 0.05 and 0.004, respectively.

Then, the effect of the hidden layer size on the classification performance is evaluated. Six different hidden layer sizes are discussed, including 15, 20, 25, 30, 35 and 40. Fig. 3 shows the classification accuracy of the proposed method versus different hidden layer sizes on the dataset. It can be noted that the proposed method achieves stable performance when the hidden layer size reaches around 25.

Lastly, different values of k for the k nearest data samples, from 3 to 11, are investigated with respect to the classification performance. Fig. 4 shows the classification accuracy of the proposed method versus k on Fault1. It can be noted that the proposed method
achieves stable performance when k reaches around 5. Therefore, the hidden layer size and k are set to 25 and 5 in the experiment.

Fig. 2. Effects of parameter selection of λ1 and λ2 on the classification accuracy of Fault1.

Fig. 3. Classification results versus different hidden layer sizes on Fault1.

Fig. 4. Classification results versus different nearest numbers k on Fault1.

3.4 Diagnosis results

To show the effectiveness of the proposed method, it is compared with the other feature extraction and fault diagnosis methods. Table 1 tabulates the average classification rates of these methods. In Method 1, the test data is directly classified by the Nearest Neighbor (NN) Classifier. Comparing Method 1 with the proposed method illustrates that the features automatically extracted by DGAE help to obtain higher diagnosis accuracy. In Method 2, principal component analysis is used for learning features and NN is used for classification. As shown in Table 1, the proposed DGAE algorithm has better diagnosis performance than Method 2. The main reason is that the features extracted by PCA are linear, so they have limited ability to characterize the faults. In Method 3, AE and NN constitute a fault diagnosis framework, whose diagnosis accuracy is slightly lower than that of the proposed method. This indicates that DGAE is able to learn better features than plain autoencoders. Besides, since the DGAE algorithm extracts nonlinear features with locality and label preservation to develop the discriminant model, it improves the diagnosis performance, especially on Faults 9, 10, 11, 14, and 16, which cannot be effectively classified by the other methods. In general, among these four methods, the proposed DGAE algorithm provides the best diagnosis results for the twenty-one faults.

Table 1. Classification accuracy of all the 21 faults in TE process with different methods (%)

         NN     PCA    AE     DGAE
Fault1   75.64  78.83  81.14  83.95
Fault2   72.47  75.27  78.92  81.76
Fault3   0      9.87   10.18  14.98
Fault4   2.29   18.64  21.67  24.47
Fault5   10.20  15.73  18.95  21.76
Fault6   78.41  80.08  81.67  84.47
Fault7   37.81  6.14   75.10  77.90
Fault8   2.81   1.35   16.87  19.68
Fault9   0      4.72   12.08  14.89
Fault10  0      18.95  20.20  23.12
Fault11  0      2.54   11.25  14.05
Fault12  0      21.87  23.64  26.44
Fault13  20.67  21.56  23.33  26.13
Fault14  0      18.85  22.50  25.30
Fault15  0      11.29  12.60  15.42
Fault16  0      11.45  17.91  20.71
Fault17  23.12  21.77  23.43  26.24
Fault18  62.18  67.18  67.39  70.20
Fault19  0      45.06  47.60  50.40
Fault20  32.60  34.68  39.16  41.96
Fault21  3.85   4.58   6.25   9.05

3.5 Computational Time

Lastly, the computational time of the different methods is reported. Table 2 shows the computational time of each method on the Matlab platform. From Table 2, one can clearly see that the computational time of the proposed method in the training stage is generally larger than that of the PCA and NN methods, since it takes a lot of time to update the neural network parameters in the optimization process. In comparison to the classical methods, although the proposed algorithm takes up more
time in the training stage, it can achieve better classification accuracy. Furthermore, in the testing stage, the proposed method takes relatively less time. This is because, once the projection matrix and classifiers are obtained, the classification of testing data can be conducted simply. In brief, the proposed method achieves promising performance and guarantees the processing speed at the testing stage.

Table 2. The training time and testing time of different methods (s)

          NN    PCA   AE    DGAE
Training  -     0.05  0.59  5.61
Testing   2.47  1.29  2.48  2.47
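The training-stage cost discussed above comes from repeatedly evaluating the gradients (10)–(15) and applying the update (16). The following NumPy sketch illustrates one such step; it is our illustration, not the authors' code: sigmoid activations, a precomputed graph Laplacian L, and all function and variable names are assumptions, and the paper itself optimizes with L-BFGS via minFunc rather than the plain gradient step shown here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dgae_gradients(X, W1, b1, W2, b2, L, lam1, lam2):
    """Gradients of J following Eqs. (10)-(15).
    X is the d x N data matrix; L is the (assumed precomputed) N x N graph Laplacian."""
    N = X.shape[1]
    H = sigmoid(W1 @ X + b1)                         # hidden codes H(Omega)
    O = sigmoid(W2 @ H + b2)                         # reconstructions O(Omega)
    Delta2 = (O - X) * O * (1 - O)                   # Eq. (15)
    Delta1 = (W2.T @ Delta2) * H * (1 - H)           # Eq. (14)
    gW1 = (Delta1 @ X.T) / N + 2 * lam1 * W1 \
        + 2 * lam2 * ((H @ L) * H * (1 - H)) @ X.T   # Eq. (10)
    gW2 = (Delta2 @ H.T) / N + 2 * lam1 * W2         # Eq. (11)
    gb1 = Delta1.sum(axis=1, keepdims=True) / N      # Eq. (12)
    gb2 = Delta2.sum(axis=1, keepdims=True) / N      # Eq. (13)
    return gW1, gW2, gb1, gb2

def train_step(X, params, L, lam1, lam2, eta):
    """One plain gradient-descent update, Eq. (16)."""
    W1, b1, W2, b2 = params
    gW1, gW2, gb1, gb2 = dgae_gradients(X, W1, b1, W2, b2, L, lam1, lam2)
    return (W1 - eta * gW1, b1 - eta * gb1, W2 - eta * gW2, b2 - eta * gb2)
```

Iterating `train_step` until convergence corresponds to the update loop of Section 2.3, which is what dominates the training time reported in Table 2.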
4. CONCLUSION

A novel feature extraction method based on a discriminative graph regularized autoencoder for fault diagnosis is studied. The proposed method can automatically extract discriminative features that preserve the locality and label information of the original data. Experimental results on the TE process dataset demonstrate that the proposed method always achieves better performance than the related feature extraction and fault diagnosis methods. However, the proposed method can still be improved, since it is based on the underlying assumption of balanced data distributions among the different faults, or of equal misclassification costs. Therefore, future work will focus on handling imbalanced data distributions in order to obtain better classification accuracy.
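For reference, the diagnosis rule of Section 2.4 — code a test sample via (17), then assign the label of the nearest training code — can be sketched as follows. This is a hypothetical minimal version with our own variable names, assuming Euclidean distance and a 1-nearest-neighbor rule.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def diagnose(x_test, W1, b1, A_train, y_train):
    """Code a test sample via Eq. (17), then label it by its nearest training code.
    A_train holds the training codes as columns; y_train holds their fault labels."""
    a = sigmoid(W1 @ x_test + b1)                 # Eq. (17): A_test = sigma(W1 x + b1)
    dists = np.linalg.norm(A_train - a, axis=0)   # Euclidean distance to every code
    return y_train[np.argmin(dists)]              # 1-nearest-neighbor label
```

Because this step is only a matrix-vector product followed by a nearest-neighbor lookup, it matches the observation in Section 3.5 that testing is cheap once the projection is learned.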