Kang 2016

IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 63, NO. 5, MAY 2016 3299
A Hybrid Feature Selection Scheme for Reducing Diagnostic Performance Deterioration Caused by Outliers in Data-Driven Diagnostics
Myeongsu Kang, Md. Rashedul Islam, Jaeyoung Kim, Jong-Myon Kim, Member, IEEE,
and Michael Pecht, Fellow, IEEE
Manuscript received April 24, 2015; revised July 31, 2015, September 10, 2015, and December 13, 2015; accepted January 9, 2016. Date of publication February 11, 2016; date of current version April 8, 2016. This work was supported in part by the National Research Foundation of Korea funded by the Ministry of Education, Science, and Technology of Korea under Grant NRF-2013R1A2A2A05004566 and Grant NRF-2015K2A1A2070866, in part by the over 100 CALCE members of the CALCE Consortium, and in part by the National Natural Science Foundation of China under Grant 71420107023. (Corresponding author: Jong-Myon Kim.)

M. Kang was with the Department of Electrical Engineering, University of Ulsan, Ulsan 44610, Korea. He is now with the Center for Advanced Life Cycle Engineering, The University of Maryland, College Park, MD 20742 USA (e-mail: mskang@calce.umd.edu).

Md. R. Islam and J. Kim are with the School of Electrical, Electronic, and Computer Engineering, University of Ulsan, Ulsan 44610, Korea (e-mail: rashedcse@mail.ulsan.ac.kr; kim7097@mail.ulsan.ac.kr).

J.-M. Kim is with the Department of IT Convergence, University of Ulsan, Ulsan 44610, Korea (e-mail: jmkim07@ulsan.ac.kr).

M. Pecht is with the Center for Advanced Life Cycle Engineering, The University of Maryland, College Park, MD 20742 USA (e-mail: pecht@calce.umd.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIE.2016.2527623

0278-0046 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Abstract: In practice, outliers can seriously degrade diagnostic performance. Outliers are defined here as data points that lie far from the other agglomerated data points in the same class. To reduce the diagnostic performance deterioration they cause in data-driven diagnostics, an outlier-insensitive hybrid feature selection (OIHFS) methodology is developed to assess feature subset quality. In addition, a new feature evaluation metric is created as the ratio of the intraclass compactness to the interclass separability. Both quantities are estimated from the relationship between data points and outliers. The efficacy of the developed methodology is verified with a fault diagnosis application that identifies defect-free and defective rolling element bearings under various conditions.

Index Terms: Data-driven diagnostics, outlier detection, outlier-insensitive hybrid feature selection (OIHFS), rolling element bearings.

NOMENCLATURE

B_d: Roller diameter.
C_i: Intraclass compactness of the ith class.
C_overall: Overall compactness estimated from per-class compactness.
cdp_i: Centroid data point of the ith class.
D: A set of data points in a class.
D_centroid: Centroid data point.
D_i: ith data point in a class.
dp_{j,k}: kth data point in class j.
Edist_l: Euclidean distance between the centroid data point and the lth outlier.
E_metric: Feature evaluation metric.
F_shaft: Shaft speed in hertz.
Ledist, Redist: Two Euclidean distance values specified by the membership level.
L_i: Cumulative distance from D_i to all the other data points, calculated using a norm metric.
Mdist_i: Maximum Euclidean distance between the ith data point and its neighboring data points (written dist_i in Section III-B).
membership_i: Membership degree of the ith data point.
membership_level: Membership level used to determine outlier candidates.
N_analdata: Number of data points per bearing condition in the analysis dataset.
N_classes: Number of bearing conditions.
N_FN: Number of data points in class i that are not correctly classified as class i.
N_features: Number of fault features (or signatures).
N_iterations: Number of iterations for both the filter and wrapper methods.
N_outliers: Number of outliers in each class.
N_rollers: Number of rollers in a rolling element bearing.
N_TP: Number of data points in class i that are correctly classified as class i.
N_tfreqbins: Total number of frequency bins.
N_totaldata: Total number of data points used to test the k-nearest neighbor (k-NN) classifier.
N_trdata: Total number of data points used to train the k-NN classifier.
N_tsamples: Total number of data samples in five-second acoustic emission data sampled at 250 kHz, x(n).
n: Number of data points (n = N_analdata in this study).
outlier_{i,l}: lth outlier in class i.
P_d: Pitch diameter.
RV_order: Order of random variation in the theoretical bearing characteristic frequencies.
S(f): Magnitude response of the fast Fourier transform of x(n).
S_i: Interclass separability of the ith class.
S_overall: Overall separability estimated from per-class separability.
vD_i: Two-dimensional (2-D) vector used to determine whether a data point D_i is an outlier.
α: Contact angle.
ε: Constant value that controls the membership level.
x̄, σ: Mean and standard deviation of x(n).

I. INTRODUCTION

During the past several decades, model-based fault detection and diagnosis (FDD) techniques have been used extensively to enhance the reliability and availability of industrial systems subject to faults [1]–[3]. However, industrial processes have become more complicated in recent years, and there is less tolerance for both performance deterioration and productivity decreases. Hence, conventional model-based schemes are impractical for complicated industrial processes, because they demand deep knowledge of process models derived from first principles [4], [5]. Fortunately, data acquisition, data mining, and machine-learning techniques have developed rapidly. In large-scale industries, these techniques now facilitate three tasks: collecting and storing massive amounts of data (e.g., current, vibration, and acoustic emission), extracting useful inherent information, and classifying fault types. Accordingly, data-driven methodologies are alternatives that can be used for efficient process monitoring and fault diagnosis [6]–[8].

Today's FDD approaches use high-dimensional feature vectors to avoid the risk of missing information needed for reliable FDD. However, some fault features are redundant or irrelevant to the predictive models (i.e., supervised and unsupervised learning). Such features can be a primary cause of diagnostic performance degradation. To address this issue, discriminatory fault feature (or signature) selection has become an indispensable part of reliable FDD.

Feature selection essentially involves two procedures. First, a number of feature subsets (or subsets of fault signatures) are formed; this is the configuration of feature subsets. Second, these subsets are evaluated; this is the evaluation of feature subset quality. Based on the evaluation process, feature selection schemes are categorized into filters and wrappers. Filter approaches employ an evaluation strategy that is independent of any classification scheme. Wrapper methods, in contrast, use accuracy estimates for specific classifiers when assessing feature subset quality [9]. Accordingly, wrapper methodologies theoretically offer better diagnostic performance than filter methods for predefined classifiers. However, filter approaches are computationally efficient because they avoid the accuracy estimation process for a particular classifier.

To achieve high computational efficiency and diagnostic performance concurrently, recent intelligent FDD approaches have adopted hybrid feature selection (HFS) schemes that exploit the advantages of both filter and wrapper methods [10], [11]. Liu et al. presented an HFS approach for effectively identifying various failures in a direct-drive wind turbine [10]. Their HFS method has two parts. A global geometric similarity scheme yields promising feature subsets. A predefined classifier (e.g., a support vector machine or a general regression neural network) then predicts the diagnostic performance (or classification accuracy) of these feature subsets. In [11], Yang et al. proposed a method to improve diagnostic performance by introducing an HFS framework, which is an unsupervised learning model. This method is effective for bearing fault diagnosis with fewer fault signatures that are closely related to single and multiple-combined bearing defects.

Beyond the negative effects of high-dimensional feature vectors, current FDD approaches are also subject to diagnostic performance deterioration caused by outliers. Outliers are defined as data points that are distant from the other agglomerated data points in the same class. To address this issue, an outlier-insensitive hybrid feature selection (OIHFS) methodology is developed in this study. The method consists mainly of three parts:
1) sequential forward floating search (SFFS) [12], [13];
2) precise assessment of the quality of feature subsets;
3) accuracy estimation of a classifier.

In OIHFS, SFFS is first used to produce a series of feature subsets. Then, a discriminatory feature subset candidate that is insensitive to outliers is determined among them. Identifying this candidate therefore depends on a proper assessment of feature subset quality. Accordingly, this study introduces a new feature subset evaluation metric, defined as the ratio of the intraclass compactness to the interclass separability. Both the compactness and the separability are estimated from the relationship between the data points and the outliers. This study further investigates a technique that detects outliers from two attributes: 1) a large distance from the agglomerated data points in the same class and 2) a low membership degree in the same class.

Once a discriminatory feature subset candidate is determined, it is used to predict the classification accuracy of a k-nearest neighbors (k-NN) algorithm, which is a nonparametric classification method. The OIHFS methodology uses a filter method to determine each discriminatory feature subset candidate and a wrapper method to estimate the accuracy obtained with that candidate. Both methods are carried out multiple times via k-fold cross validation (k-cv). This iterative process is effective for reducing the variability of the feature subsets. However, it also yields several discriminatory feature subset candidates. The most discriminatory subset, which is eventually used for data-driven diagnostics, must therefore be selected from among them. To this end, a decision rule based on both the accuracy estimates and the frequency of the feature subsets is employed. In this study, the efficacy of the OIHFS methodology is validated with a low-speed rolling element-bearing fault-diagnosis application.

This paper is organized as follows. Section II describes the data acquisition environment for bearing fault diagnosis. Section III presents the OIHFS scheme, and Section IV validates its efficacy. Conclusions and suggestions for future work are provided in Section V.
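SFFS is taken from [12], [13] and is not restated in this paper. The following Python sketch illustrates the standard floating search. It assumes that the subset criterion is minimized, as a compactness-to-separability ratio would be. The function name `sffs`, the `score` callback, and `max_size` are hypothetical names introduced for illustration only.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Subset = FrozenSet[int]


def sffs(n_features: int,
         score: Callable[[Subset], float],
         max_size: int) -> List[Tuple[Subset, float]]:
    """Sequential forward floating search (assumption: lower score = better).

    Returns the best subset found for each size 1..max_size, i.e. a
    "series of feature subsets" from which a candidate can be chosen.
    """
    max_size = min(max_size, n_features)
    best: Dict[int, Tuple[Subset, float]] = {}
    current: Subset = frozenset()
    while len(current) < max_size:
        # Inclusion step: add the feature whose addition gives the lowest score.
        current, value = min(
            ((current | {f}, score(current | {f}))
             for f in range(n_features) if f not in current),
            key=lambda item: item[1])
        size = len(current)
        if size not in best or value < best[size][1]:
            best[size] = (current, value)
        # Conditional exclusion: drop features while doing so strictly beats
        # the best subset already recorded for the smaller size.
        while len(current) > 2:
            reduced, reduced_value = min(
                ((current - {f}, score(current - {f})) for f in current),
                key=lambda item: item[1])
            if reduced_value < best[len(reduced)][1]:
                current = reduced
                best[len(current)] = (current, reduced_value)
            else:
                break
    return [best[size] for size in sorted(best)]
```

The backward ("floating") steps are only taken when they strictly improve on a previously recorded subset of the same size. Each such step is therefore a strict improvement over a finite set of subsets, so the search terminates.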
Fig. 1. Example of single and multiple-combined seeded bearing failures with crack length, width, and depth of 3, 0.35, and 0.3 mm, respectively: (a) BCI, (b) BCO, (c) BCR, (d) BCIO, (e) BCIR, (f) BCOR, and (g) BCIOR. The number of rollers is 13, the contact angle is 0°, and the roller and pitch diameters are 9 and 46.5 mm, respectively.
Fig. 2. (a) Machinery fault simulator. (b) Data acquisition system.

II. ACOUSTIC EMISSION DATA ACQUISITION FOR ROLLING ELEMENT-BEARING FAULT DIAGNOSIS

As mentioned above, a bearing fault-diagnosis application is used to determine whether the OIHFS methodology is useful for data-driven FDD approaches. This study uses one defect-free and seven defective cylindrical roller bearings (FAG NJ206-E-TVP2). Each defective bearing has either a single crack or multiple-combined cracks, in the following locations (see Fig. 1):
- the inner raceway (BCI);
- the outer raceway (BCO);
- a roller (BCR);
- the inner and outer raceways (BCIO);
- the inner raceway and a roller (BCIR);
- the outer raceway and a roller (BCOR);
- the inner raceway, outer raceway, and a roller (BCIOR).

According to [14], an acoustic emission (AE) sensor can capture intrinsic information about both defect-free bearings (DFBs) and defective bearings. This study therefore records AE data sampled at 250 kHz. For this purpose, a PCI-2-based system [15] connected to a general-purpose wideband frequency AE sensor (WSα from Physical Acoustics Corporation) [16] is used, as shown in Fig. 2. The AE sensor is attached at the top of the nondrive end-bearing housing, 21.48 mm from the nondrive end cylindrical roller bearing. To validate the OIHFS method, this study uses various datasets, presented in Table I. Each dataset contains 90 five-second AE data samples for each bearing condition (i.e., one defect-free bearing and seven defective bearings).

A. Load Condition and Defect Severity of Defective Bearings

In 2006, Al-Ghamd et al. demonstrated the effectiveness of the root-mean-square (RMS) value for measuring load conditions [17]. Accordingly, in this study, RMS values are calculated for five-second AE data obtained from DFBs rotating at 500 RPM under no-load and load conditions. As illustrated in Fig. 3, the RMS value for a DFB under a slight load condition is 0.2714. This is approximately three times higher than the RMS value of 0.0838 for a DFB without load.

Al-Ghamd et al. also conducted experiments to evaluate bearing defect severity using kurtosis and RMS values. They concluded that RMS is suitable for measuring the degree of bearing defect severity [17]. Thus, this study employs the same method to show the defect severity of the defective bearings under the slight load condition. As shown in Fig. 4, the RMS value increases with bearing defect severity. More specifically, as severity worsens, the RMS values increase on average 1.81-fold, 1.97-fold, 2.56-fold, 5.39-fold, 2.53-fold, and 7.1-fold for the BCI, BCO, BCR, BCIO, BCIR, BCOR, and BCIOR conditions, respectively.

III. OUTLIER-INSENSITIVE HYBRID FEATURE SELECTION

As illustrated in Fig. 5, each dataset (see Table I) is divided into two subdatasets: an analysis dataset and an evaluation dataset. Keeping the analysis dataset isolated from the evaluation dataset makes the performance evaluation results reliable. In this study, 30 of the 90 five-second AE data samples for each bearing condition are used to configure the analysis dataset. The remaining 60 samples per bearing condition are reserved as the evaluation dataset. The analysis dataset is used to determine the most discriminatory feature subset for bearing fault diagnosis, whereas the evaluation dataset is used to verify the efficacy of the OIHFS methodology.

A. Fault Signature Pool Configuration

In Fig. 5, the analysis dataset is used to configure a fault-signature pool and to determine the most discriminatory feature subset. According to [18], statistical parameters from the time and frequency domains have proven effective in intelligent fault-diagnosis schemes. Thus, this study uses them as fault signatures for identifying various single and multiple-combined bearing defects. Tables II and III define these statistical parameters for a given five-second AE data sample, x(n). The parameters are calculated in the time and frequency domains. They comprise the root-mean-square (RMS, f1), square root of the amplitude (SRA, f2), kurtosis (f3), skewness (f4), peak-to-peak (PP, f5), crest factor (CF, f6), impulse factor (IF, f7), margin factor (MF, f8), shape factor (SF, f9), kurtosis factor (KF, f10), frequency center (FC, f11), RMS frequency (RMSF, f12), and root variance frequency (RVF, f13).
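Tables II and III are not reproduced in this text, so the exact formulas behind f1–f13 are not available here. The sketch below computes the thirteen parameters with commonly used definitions. Examples include KF as kurtosis divided by RMS^4, and FC, RMSF, and RVF as moments of the FFT magnitude S(f). Treat these definitions, the population standard deviation, and the function names as assumptions; they may differ in detail from Tables II and III.

```python
import numpy as np


def time_domain_features(x):
    """f1-f10 under commonly used definitions (assumed, not taken from Table II)."""
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std()          # population standard deviation (assumption)
    abs_x = np.abs(x)
    rms = np.sqrt(np.mean(x ** 2))
    sra = np.mean(np.sqrt(abs_x)) ** 2
    kurtosis = np.mean((x - mean) ** 4) / std ** 4
    peak = abs_x.max()
    return {
        "RMS": rms,                                        # f1
        "SRA": sra,                                        # f2
        "kurtosis": kurtosis,                              # f3
        "skewness": np.mean((x - mean) ** 3) / std ** 3,   # f4
        "PP": x.max() - x.min(),                           # f5
        "CF": peak / rms,                                  # f6
        "IF": peak / abs_x.mean(),                         # f7
        "MF": peak / sra,                                  # f8
        "SF": rms / abs_x.mean(),                          # f9
        "KF": kurtosis / rms ** 4,                         # f10
    }


def frequency_domain_features(x, fs):
    """f11-f13 as spectral moments of the FFT magnitude S(f) (assumed forms)."""
    s = np.abs(np.fft.rfft(np.asarray(x, dtype=float)))   # S(f)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)               # bin frequencies in Hz
    fc = np.sum(f * s) / np.sum(s)
    return {
        "FC": fc,                                                  # f11
        "RMSF": np.sqrt(np.sum(f ** 2 * s) / np.sum(s)),           # f12
        "RVF": np.sqrt(np.sum((f - fc) ** 2 * s) / np.sum(s)),     # f13
    }
```

Consider one five-second record sampled at 250 kHz. Here x would hold N_tsamples = 1,250,000 samples, and the frequency-domain call would use fs = 250e3.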
TABLE I
DETAILED DESCRIPTION OF VARIOUS DATASETS USED TO EVALUATE THE EFFICACY OF THE OIHFS METHODOLOGY
Fig. 3. RMS values of five-second AE data obtained from DFBs rotating at a speed of 500 RPM under (a) no-load and (b) slight-load conditions.

Fig. 4. Defect severity of defective bearings considered in this study.

For bearing failures, there are characteristic (or defect) frequencies at which fault symptoms should be observable. This motivates an additional set of fault signatures: statistical values computed around harmonics of these characteristic frequencies in an envelope power spectrum. A detailed description of the method for obtaining an envelope power spectrum from the given AE data is provided in [19].

These fault symptoms can appear at one of the following four defect frequencies as the rollers strike a local failure in the bearing [20]:

$$
\begin{aligned}
\mathrm{BPFI} &= \frac{N_{\mathrm{rollers}} \cdot F_{\mathrm{shaft}}}{2}\left(1 + \frac{B_d}{P_d}\cos\alpha\right)\\
\mathrm{BPFO} &= \frac{N_{\mathrm{rollers}} \cdot F_{\mathrm{shaft}}}{2}\left(1 - \frac{B_d}{P_d}\cos\alpha\right)\\
\mathrm{BSF} &= \frac{P_d \cdot F_{\mathrm{shaft}}}{2 \cdot B_d}\left(1 - \left(\frac{B_d}{P_d}\cos\alpha\right)^{2}\right)\\
\mathrm{FTF} &= \frac{F_{\mathrm{shaft}}}{2}\left(1 - \frac{B_d}{P_d}\cos\alpha\right)
\end{aligned}
\tag{1}
$$

where BPFI is the ball pass frequency of the inner raceway, BPFO is the ball pass frequency of the outer raceway, BSF is the ball spin frequency, and FTF is the fundamental train frequency. In (1), these characteristic frequencies depend on the following parameters: the number of rollers (N_rollers), the shaft speed in hertz (F_shaft), the contact angle (α), the roller diameter (B_d), and the pitch diameter (P_d).

Narrow-band RMSF values (see Table III) are used as additional fault signatures. They are computed in an envelope power spectrum obtained from a five-second AE data sample, around BPFI, BPFO, and 2 × BSF and up to the third harmonic of each of these defect frequencies. Moreover, this study assumes that a bearing may contain all of the possible bearing defects (i.e., cracks on the inner raceway, the outer raceway, and a roller). Accordingly, nine RMSF values in total are used as additional fault signatures: RMSF_BPFI1 (f14), RMSF_BPFI2 (f15), RMSF_BPFI3 (f16), RMSF_BPFO1 (f17), RMSF_BPFO2 (f18), RMSF_BPFO3 (f19), RMSF_2×BSF1 (f20), RMSF_2×BSF2 (f21), and RMSF_2×BSF3 (f22).

In a bearing, the radial load greatly influences the force of the impact caused by rolling over a defect. The outer raceway is a stationary component of the bearing. A defect on the outer raceway is therefore subjected to the same force at each roll. In contrast, a fault on the inner raceway is subjected to varying forces because it rotates with the shaft. Consequently, all the harmonics of the BPFI are amplitude-modulated by the shaft rotation frequency (i.e., F_shaft). Similarly, 2 × BSF, which is caused by a roller defect, is amplitude-modulated by the cage rotation frequency (i.e., FTF). In theory, amplitude modulation from inner-raceway or roller-related defects produces sidebands. These sidebands are centered about the BPFI or 2 × BSF and spaced by the modulation frequency (i.e., F_shaft or FTF). Furthermore, the calculated bearing-defect frequencies commonly deviate from the observed ones by a random variation on the order of 1%–2% [21]. Thus, in this study, narrow-band RMSF values are calculated in frequency ranges that allow for a random variation of 2%. Namely, RMSF_BPFIi, RMSF_BPFOi, and RMSF_2×BSFi are computed in the ranges

$$
\begin{aligned}
&\left(1 - \tfrac{RV_{\mathrm{order}}}{2}\right)\left(\mathrm{BPFI}_i - 2F_{\mathrm{shaft}}\right) \;\text{to}\; \left(1 + \tfrac{RV_{\mathrm{order}}}{2}\right)\left(\mathrm{BPFI}_i + 2F_{\mathrm{shaft}}\right),\\
&\left(1 - \tfrac{RV_{\mathrm{order}}}{2}\right)\mathrm{BPFO}_i \;\text{to}\; \left(1 + \tfrac{RV_{\mathrm{order}}}{2}\right)\mathrm{BPFO}_i,\\
&\left(1 - \tfrac{RV_{\mathrm{order}}}{2}\right)\left(2\times\mathrm{BSF}_i - 2\,\mathrm{FTF}\right) \;\text{to}\; \left(1 + \tfrac{RV_{\mathrm{order}}}{2}\right)\left(2\times\mathrm{BSF}_i + 2\,\mathrm{FTF}\right),
\end{aligned}
$$

respectively. Here, RV_order is the order of random variation of the theoretical characteristic frequencies (RV_order = 2% in this study), and i is the harmonic index of BPFI, BPFO, and 2 × BSF (i = 1, 2, and 3 in this study).

Fig. 5. Overall flow diagram of bearing fault diagnosis.

TABLE II
DEFINITION OF TIME-DOMAIN STATISTICAL PARAMETERS USED IN THIS STUDY
N_tsamples is the total number of data samples in five-second AE data sampled at 250 kHz, x(n). x̄ and σ are the mean and the standard deviation of x(n), respectively.

TABLE III
DEFINITION OF FREQUENCY-DOMAIN STATISTICAL PARAMETERS USED IN THIS STUDY
S(f) is the magnitude response of the fast Fourier transform of x(n). N_tfreqbins is the total number of frequency bins.

In summary, the fault-signature pool used in the feature selection process has dimensionality N_features × N_analdata × N_classes. Here, N_features is the number of fault signatures (N_features = 22 in this study), N_analdata is the number of data points per bearing condition in the analysis dataset (N_analdata = 30 in this study), and N_classes is the number of bearing conditions to be discriminated (N_classes = 8 in this study).

B. OIHFS Scheme

As shown in Fig. 6, k-cv [22] is used to randomly divide the N_features × N_analdata × N_classes-dimensional fault-signature pool into k fault-signature subpools. Each subpool has dimensionality N_features × (N_analdata/k) × N_classes. Accordingly, the filter method in OIHFS yields k discriminatory feature subset candidates. These feature subsets are then used in the wrapper method for accuracy estimation of k-NN. As briefly mentioned in Section I, both the filter and wrapper methods are performed N_iterations times (N_iterations = 20 in this study). This iterative process with cross validation reduces the variability of the most discriminatory fault-signature subset, which is used for reliable FDD. More details about the OIHFS approach are given below.

1) A Metric to Assess the Quality of Feature Subsets in OIHFS: SFFS is used to yield a series of feature subsets. Selecting a promising feature subset candidate from this series requires a metric that precisely assesses feature subset quality. Recently, Kang et al. presented an efficient multivariate feature evaluation criterion that measures the intraclass compactness and interclass separability with average values of pair-wise Euclidean distances [23]. With this criterion, they greatly improved diagnostic performance. In practice, the criterion is effective for determining a feature subset that minimizes the intraclass compactness and maximizes the interclass separability. However, averages of pair-wise Euclidean distances do not account for the impact of outliers. Compactness and separability estimated this way can therefore still lead to diagnostic performance deterioration.
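Equation (1) and the narrow-band ranges given in Section III-A translate into a few lines of Python. The sketch below is illustrative only. The bearing geometry (13 rollers, α = 0°, B_d = 9 mm, P_d = 46.5 mm) comes from the Fig. 1 caption and the 500-RPM speed from Section II-A. The ±RV_order/2 widening and the sidebands (±2·F_shaft for BPFI, ±2·FTF for 2 × BSF) follow the frequency ranges stated above. `band_rmsf` assumes that a band RMSF is the RMS frequency of an already computed envelope power spectrum restricted to one band; the envelope method of [19] is not reproduced. All function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def characteristic_frequencies(n_rollers: int, f_shaft: float, bd: float,
                               pd: float, alpha_rad: float = 0.0) -> Dict[str, float]:
    """Bearing defect frequencies of (1); in Hz when f_shaft is in Hz."""
    r = bd / pd * math.cos(alpha_rad)
    return {
        "BPFI": n_rollers * f_shaft / 2 * (1 + r),
        "BPFO": n_rollers * f_shaft / 2 * (1 - r),
        "BSF": pd * f_shaft / (2 * bd) * (1 - r ** 2),
        "FTF": f_shaft / 2 * (1 - r),
    }


def rmsf_bands(cf: Dict[str, float], f_shaft: float, rv_order: float = 0.02,
               n_harmonics: int = 3) -> Dict[str, Tuple[float, float]]:
    """Frequency ranges for f14-f22, in the order BPFI1..3, BPFO1..3, 2xBSF1..3."""
    lo, hi = 1 - rv_order / 2, 1 + rv_order / 2
    families = (("BPFI", cf["BPFI"], 2 * f_shaft),      # shaft-rate sidebands
                ("BPFO", cf["BPFO"], 0.0),              # no modulation sidebands
                ("2xBSF", 2 * cf["BSF"], 2 * cf["FTF"]))  # cage-rate sidebands
    bands = {}
    for name, base, side in families:
        for i in range(1, n_harmonics + 1):
            bands[f"{name}{i}"] = (lo * (i * base - side), hi * (i * base + side))
    return bands


def band_rmsf(freqs: Sequence[float], power: Sequence[float],
              band: Tuple[float, float]) -> float:
    """RMS frequency of a spectrum restricted to one band (assumed form)."""
    lo, hi = band
    num = sum(f * f * p for f, p in zip(freqs, power) if lo <= f <= hi)
    den = sum(p for f, p in zip(freqs, power) if lo <= f <= hi)
    return math.sqrt(num / den) if den > 0 else 0.0


cf = characteristic_frequencies(13, 500 / 60, 9.0, 46.5)
bands = rmsf_bands(cf, 500 / 60)
```

With these inputs, BPFI, BPFO, BSF, and FTF come out at roughly 64.65, 43.68, 20.72, and 3.36 Hz, respectively.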
Fig. 6. Overall process of the OIHFS methodology.
Fig. 7. Example of a 2-D fault-signature distribution. The dotted circle (left) indicates the degree of intraclass compactness of a class. The dotted line (right) indicates the pair-wise Euclidean distance between two data samples in different classes, used to estimate the interclass separability of a class.

Fig. 8. Example of a data point configuration used in the OIHFS scheme to yield the most discriminatory feature subset.

The example in Fig. 7 shows why outliers must be considered when estimating both the intraclass compactness and the interclass separability. In Fig. 7, three outliers are clearly observed in class 1, and these outliers are likely to be misclassified. Even so, the ratio of the intraclass compactness to the interclass separability may be low when computed from average values of pair-wise Euclidean distances. Such a low ratio suggests that these features identify class 1 effectively. This is because most of the data points in class 1 are closely agglomerated and well separated from the data points of other classes. Hence, this study investigates a way to measure both the compactness and the separability that takes outliers into account.

In the developed OIHFS methodology, the intraclass compactness of a class is defined as the maximum Euclidean distance between the centroid data point and the outliers. Thus, the centroid data point and the outliers must be detected in each class before the intraclass compactness can be estimated. Let D = {D_1, D_2, D_3, ..., D_n} be a set of data points in a class, where n is the total number of data points (i.e., n = N_analdata in this study). Each data point in D is a vector of the fault signatures specified by SFFS. Fig. 8 depicts an example of a data point configuration in a class.

For every data point D_i, the Euclidean norm is used to compute the cumulative distance to all other data points:

$$
L_i = \sum_{j=1}^{n} \left\lVert D_i - D_j \right\rVert_2, \quad i = 1, 2, \ldots, n. \tag{2}
$$

The centroid data point D_centroid is the data point yielding the minimum cumulative distance; that is, D_centroid = D_i with i = arg min_i {L_i}. In practice, a data point can be categorized as an outlier if it has both of the following attributes: a large distance from the agglomerated data points and a low membership degree. The membership degree of a data point can be interpreted as the degree to which the point has affinity with the class to which it belongs.

To detect outliers effectively, a 2-D vector involving these two attributes, vD_i = {dist_i, membership_i}, is constructed for each data point, i = 1, 2, ..., n. As depicted in Fig. 9, dist_i is the first outlier detection criterion. It is the maximum Euclidean distance between the ith data point and its neighboring data points, and the number of neighboring data points is set to 3 in this study. In addition, a membership degree, membership_i, is assigned to the ith data point by a probability density function (pdf) of the maximal Euclidean distance. This
membership degree is used as the second outlier detection criterion. In this study, the membership level for determining outlier candidates, membership_level, is defined as

$$
\mathrm{membership}_{\mathrm{level}} = \min_i\{\mathrm{pdf}(\mathrm{dist}_i)\} + \left(\max_i\{\mathrm{pdf}(\mathrm{dist}_i)\} - \min_i\{\mathrm{pdf}(\mathrm{dist}_i)\}\right)\cdot \varepsilon, \quad i = 1, 2, \ldots, n \tag{3}
$$

where ε, which controls the membership level, is a constant between 0 and 1. It was determined experimentally and set to 0.25 in this study. Once membership_level is decided, at most two Euclidean distance values can be obtained at that membership level. They are denoted as Ledist and Redist in Fig. 9.

Outliers lie far from the agglomerated data points in the same class. Accordingly, the ith data point is identified as an outlier if it satisfies dist_i ≥ Redist and membership_i ≤ membership_level, i = 1, 2, ..., n. As illustrated in Fig. 10, the intraclass compactness of class i, C_i, is then computed as

$$
C_i = \max_l\{\mathrm{Edist}_l\}, \quad l = 1, 2, \ldots, N_{\mathrm{outliers}} \tag{4}
$$

where Edist_l is the Euclidean distance between the centroid data point and the lth outlier, and N_outliers is the total number of outliers in the class. Based on the per-class compactness, the overall compactness C_overall is estimated as

$$
C_{\mathrm{overall}} = \max_i\{C_i\}, \quad i = 1, 2, \ldots, N_{\mathrm{classes}}. \tag{5}
$$

Fig. 11 illustrates the process of estimating the interclass separability of class i, S_i, which is calculated as

$$
S_i = \min\left\{\mathrm{mindist}_{\text{cdp-to-dp}},\ \mathrm{mindist}_{\text{outlier-to-dp}}\right\} \tag{6}
$$

where

$$
\mathrm{mindist}_{\text{cdp-to-dp}} = \min_{j \neq i,\;k} \left\lVert \mathrm{cdp}_i - \mathrm{dp}_{j,k} \right\rVert_2, \qquad
\mathrm{mindist}_{\text{outlier-to-dp}} = \min_{j \neq i,\;k,\;l} \left\lVert \mathrm{outlier}_{i,l} - \mathrm{dp}_{j,k} \right\rVert_2
$$

with i, j = 1, 2, ..., N_classes, k = 1, 2, ..., N_analdata, and l = 1, 2, ..., N_outliers. Here, cdp_i is the centroid data point of the ith class, dp_{j,k} is the kth data point in class j, and outlier_{i,l} is the lth outlier in class i. The overall separability, S_overall, is taken as the minimum of the per-class separability estimates:

$$
S_{\mathrm{overall}} = \min_i\{S_i\}, \quad i = 1, 2, \ldots, N_{\mathrm{classes}}. \tag{7}
$$

Based on C_overall and S_overall, the metric E_metric for assessing the quality of any feature subset produced by SFFS is defined as

$$
E_{\mathrm{metric}} = \frac{C_{\mathrm{overall}}}{S_{\mathrm{overall}}}. \tag{8}
$$

A smaller E_metric therefore indicates classes that are tight relative to their separation from one another. In summary, the discriminatory feature subset candidates are determined with these outlier-aware estimates of intraclass compactness and interclass separability. This helps reduce the diagnostic performance degradation caused by outliers in data-driven diagnostics.

Fig. 9. 2-D vectors used for outlier detection.

Fig. 10. Example showing the process used to compute the intraclass compactness of a class.

Fig. 11. Example showing how to calculate the interclass separability of a class.

TABLE IV
DIFFERENCE IN MEANING OF THE SYMBOL k WHEN USED FOR k-FOLD CROSS VALIDATION AND k-NEAREST NEIGHBORS

Fig. 12. Predictive classification accuracy of the k-NN classifier using 40 discriminatory feature subset candidates.

TABLE V
SUMMARY OF FEATURE SUBSETS DETERMINED BY THE OIHFS METHODOLOGY

TABLE VI
SUMMARY OF DISCRIMINATORY FEATURE SUBSETS YIELDED BY THE FEATURE EVALUATION METRIC IN [23]

TABLE VII
AVERAGE CLASSIFICATION ACCURACIES AND SENSITIVITIES FOR IDENTIFYING DEFECT-FREE AND DEFECTIVE BEARINGS VIA 20 TIMES k-CV (UNIT: %)

2) Accuracy Estimation of the k-NN Classifier in OIHFS: In Fig. 6, two-fold cross validation generates two randomly partitioned fault-signature subpools, denoted subpool1 and subpool2. Each has dimensionality N_features × (N_analdata/2) × N_classes. For accuracy estimation of the k-NN classifier, the subpool that yielded a discriminatory feature subset candidate is used as the training dataset, and the other subpool is used as the test dataset. The subpools then swap roles, so that each serves once as the training dataset and once as the test dataset.

In OIHFS, this accuracy estimation process is repeated N_iterations times. A decision rule is therefore required to select the most discriminatory feature subset from the N_iterations × k feature subset candidates. The decision rule considers both the predictive classification accuracy (or diagnostic performance) of the k-NN classifier and the frequency of the feature subsets. In this decision rule, the predictive classification accuracy has priority in the
determination of which feature subset to use in the performance

evaluation process (see Fig. 5). This study further considers
the frequency of feature subset candidates in the decision rule.
This is because some feature subset candidates may yield the
same high level of classification accuracy even though they are
different from each other.
In this study, the predictive classification accuracy (CA) is defined as follows:

$$\mathrm{CA} = \frac{\sum_{i=1}^{N_{\mathrm{classes}}} N_{\mathrm{TP}}}{N_{\mathrm{totaldata}}} \times 100\ (\%) \qquad (9)$$

where Ntotaldata is the total number of data points used to test the k-NN classifier and NTP is the number of data points in class i that are correctly classified as class i, i = 1, 2, . . . , Nclasses.

C. Summary of k Values for Both k-Fold Cross Validation and k-Nearest Neighbors

Although using the same symbol k for both k-cv and k-NN may reduce readability, this symbol is widely used in the literature to denote both the number of randomly divided folds in k-cv and the number of neighbors used to measure the degree of affinity with classes in k-NN. To enhance readability, Table IV clarifies the meaning of the symbol "k" when it is used for k-cv and for k-NN.

Fig. 13. 2-D representation of discriminatory feature subsets yielded by the OIHFS approach for (a) dataset 1 and (b) dataset 10.

IV. EXPERIMENTAL RESULTS

A. Most Discriminatory Fault-Signature Subset Determined by the OIHFS Methodology

As mentioned in Section III-B, Niterations × k discriminatory feature subset candidates are eventually created (Niterations = 20 and k = 2 for k-cv). This study then examines the predictive classification accuracy of the k-NN classifier using these 40 discriminatory feature subset candidates. As depicted in Fig. 12, the classification performance depends strongly on which of these feature subsets is used. Accordingly, this study evaluates each candidate by employing the decision rule described in Section III-B. Table V lists the most discriminatory feature subset for each dataset (see Table I), and each of these subsets is used to identify various bearing conditions in the performance evaluation process.
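The accuracy in (9), used to score each of the 40 candidates, and the per-class sensitivity defined in (10) of Section IV-B can both be computed from the true and predicted labels of the test points. The minimal Python sketch below illustrates this under assumptions; it is not the authors' MATLAB code, the function names are hypothetical, and the class labels in the example ("DF", "BCO", "BCIO") are illustrative only.

```python
from collections import Counter
from typing import Dict, Sequence


def classification_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """CA in (9): correctly classified test points / all test points, in %."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    # Summing N_TP over all classes equals counting test points whose
    # predicted label matches the true label.
    n_tp_total = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * n_tp_total / len(y_true)


def sensitivity(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Per-class sensitivity in (10): N_TP / (N_TP + N_FN), in %."""
    n_in_class = Counter(y_true)  # N_TP + N_FN for each true class
    n_tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: 100.0 * n_tp[c] / n for c, n in n_in_class.items()}


# Illustrative labels only: one BCO point is misclassified as BCIO.
y_true = ["DF", "DF", "BCO", "BCO", "BCIO", "BCIO"]
y_pred = ["DF", "DF", "BCO", "BCIO", "BCIO", "BCIO"]
print(classification_accuracy(y_true, y_pred))
print(sensitivity(y_true, y_pred))  # {'DF': 100.0, 'BCO': 50.0, 'BCIO': 100.0}
```

In this toy example, CA is 5/6 × 100, while the sensitivity exposes that the loss comes entirely from the BCO class, which is the kind of per-condition view the sensitivity metric is meant to provide.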
B. Efficacy Verification of the OIHFS Methodology for a Bearing Fault-Diagnosis Application

The key difference between the OIHFS and other conventional HFS methodologies is the manner of assessing the quality of feature subsets. Hence, this section validates the effectiveness of the feature subset evaluation metric (under the OIHFS scheme) in comparison with a metric based on average values of pairwise Euclidean distances [23]. Table VI summarizes the most discriminatory feature subset for each dataset yielded by the feature subset evaluation metric in [23].

k-cv is also employed to estimate the generalized classification accuracy in the performance evaluation process. That is, an evaluation dataset (see Fig. 5) is randomly divided into k mutually exclusive folds, denoted as F1, F2, . . . , Fk. At the ith iteration of k-cv, fold Fi is reserved as a training dataset for the k-NN classifier, while the remaining folds are used to test it; thus, the k-NN classifier is tested k times. In addition, this study performs the performance evaluation process iteratively to increase the reliability of the diagnostic results. Hence, the final diagnostic performance is defined as the average of the Niterations × k classification accuracies (ACA).

Table VII presents the diagnostic performance for each dataset (see Table I) in terms of the ACA, which is helpful for understanding the overall diagnostic performance. Table VII also provides the sensitivity, a useful metric for evaluating the diagnostic performance for each bearing condition, which is defined as

$$\mathrm{Sensitivity} = \frac{N_{\mathrm{TP}}}{N_{\mathrm{TP}} + N_{\mathrm{FN}}} \times 100\ (\%) \qquad (10)$$

where NFN is the number of data points in class i that are not correctly classified as class i.

As shown in Table VII, the OIHFS method improves the ACA by 0.44% and 10% compared with the ACAs obtained when using all the fault signatures and when using the feature subsets specified by the method in [23], respectively. Consequently, the OIHFS methodology can effectively alleviate the diagnostic performance degradation caused by outliers.

An interesting observation in Table VII is that the diagnostic performance decreases for bearings with small cracks at low rotational speeds. To analyze this phenomenon, this study employs a 2-D representation of discriminatory feature subsets yielded by the OIHFS approach, as depicted in Fig. 13. For bearings with small cracks at low rotational speeds (e.g., dataset 1), fault signatures in the same class are not as closely agglomerated as those obtained from bearings with large cracks at high rotational speeds (e.g., dataset 10). In addition, fault signatures belonging to BCO and BCIO are not clearly separated, which eventually degrades the diagnostic performance.

In this study, the OIHFS methodology efficiently reduces the dimensionality of the feature vector by eliminating irrelevant and redundant fault signatures for reliable FDD. This implies that the computational burden of configuring a feature vector in real FDD applications can be alleviated. Moreover, low-dimensional feature vectors reduce the time needed to compute the classification accuracy of a classifier (e.g., k-NN in this study). Table VIII shows the speedups obtained by using the low-dimensional feature vector in the developed method; all the experiments were performed with MATLAB 2008a on an Intel Core i3-2120 CPU operating at 3.30 GHz. As shown in Table VIII, the average time required to execute a given task within the bearing fault-diagnosis application depends largely on the time required to calculate fault signatures, whereas the average execution time for predicting classification accuracy is almost negligible. In summary, the developed method achieves 2.15- to 29.7-fold speedups, mainly by reducing the execution time of fault-signature calculations.

TABLE VIII
AVERAGE EXECUTION TIMES FOR BOTH FAULT SIGNATURE CALCULATIONS AND ACCURACY ESTIMATIONS

V. CONCLUSION AND FUTURE WORK

The OIHFS scheme was developed to reduce diagnostic performance deterioration caused by outliers in data-driven diagnostics. Its key contribution is the assessment of the quality of feature subsets. The developed feature subset evaluation metric, defined as the ratio of the intraclass compactness to the interclass separability estimated by understanding the relationship between data points and outliers, is capable of determining the most discriminatory feature subset and can minimize the negative impacts of outliers. The experimental results indicated that feature subsets specified by the developed metric are more effective for alleviating diagnostic performance degradation
than feature subsets determined by the metric in [23]. The developed approach achieved diagnostic performance improvements of up to 30.0% over the conventional method in terms of ACA. In addition, the OIHFS method reduced the execution time of the bearing fault-diagnosis application by efficiently reducing the dimensionality of the originally produced feature vector.

This study presented a method of detecting outliers based on the following attributes: 1) large distances from the agglomerated data points in the same class and 2) low membership degrees. In this method, the membership level was experimentally set to a constant value so that outliers could be identified. The downside of this approach is that a fixed membership level may lead to misdetection of outliers. Thus, future research will explore a technique in which the membership level is set adaptively so that outliers can be detected more accurately in real data-driven FDD applications.

REFERENCES

[1] J. Seshadrinath, B. Singh, and B. K. Panigrahi, "Vibration analysis based interturn fault diagnosis in induction motors," IEEE Trans. Ind. Informat., vol. 10, no. 1, pp. 340–350, Feb. 2014.
[2] Y. Gritli, L. Zarri, C. Rossi, F. Filippetti, G.-A. Capolino, and D. Casadei, "Advanced diagnosis of electrical faults in wound-rotor induction machines," IEEE Trans. Ind. Electron., vol. 60, no. 9, pp. 4012–4024, Sep. 2013.
[3] S. Huang, K. K. Tan, and T. H. Lee, "Fault diagnosis and fault-tolerant control in linear drives using the Kalman filter," IEEE Trans. Ind. Electron., vol. 59, no. 11, pp. 4285–4292, Nov. 2012.
[4] X. Dai and Z. Gao, "From model, signal to knowledge: A data-driven perspective of fault detection and diagnosis," IEEE Trans. Ind. Informat., vol. 9, no. 4, pp. 2226–2238, Nov. 2013.
[5] S. Yin, X. Li, H. Gao, and O. Kaynak, "Data-based techniques focused on modern industry: An overview," IEEE Trans. Ind. Electron., vol. 62, no. 1, pp. 657–667, Jan. 2015.
[6] S. Yin, X. Zhu, and O. Kaynak, "Improved PLS focused on key-performance-indicator-related fault diagnosis," IEEE Trans. Ind. Electron., vol. 62, no. 3, pp. 1651–1658, Mar. 2015.
[7] S. Yin and Z. Huang, "Performance monitoring for vehicle suspension system via fuzzy positivistic C-means clustering based on accelerometer measurements," IEEE/ASME Trans. Mechatronics, vol. 20, no. 5, pp. 2613–2620, Oct. 2015.
[8] A. Soualhi, G. Clerc, and H. Razik, "Detection and diagnosis of faults in induction motor using an improved artificial ant clustering technique," IEEE Trans. Ind. Electron., vol. 60, no. 9, pp. 4053–4062, Sep. 2013.
[9] B. Li, P.-L. Zhang, H. Tian, S.-S. Mi, D.-S. Liu, and G.-Q. Ren, "A new feature extraction and selection scheme for hybrid fault diagnosis of gearbox," Expert Syst. Appl., vol. 38, pp. 10000–10009, 2011.
[10] C. Liu, D. Jiang, and W. Yang, "Global geometric similarity scheme for feature selection in fault diagnosis," Expert Syst. Appl., vol. 41, pp. 3585–3595, 2014.
[11] Y. Yang, Y. Liao, G. Meng, and J. Lee, "A hybrid feature selection scheme for unsupervised learning and its application in bearing fault diagnosis," Expert Syst. Appl., vol. 38, pp. 11311–11320, 2011.
[12] T. W. Rauber, F. A. Boldt, and F. M. Varejao, "Heterogeneous feature models and feature selection applied to bearing fault diagnosis," IEEE Trans. Ind. Electron., vol. 62, no. 1, pp. 637–646, Jan. 2015.
[13] L. Lu, J. Yan, and C. W. de Silva, "Dominant feature selection for the fault diagnosis of rotary machines using modified genetic algorithm and empirical mode decomposition," J. Sound Vib., vol. 344, pp. 464–483, 2015.
[14] A.-B. Ming, W. Zhang, Z.-Y. Qin, and F.-L. Chu, "Dual-impulse response model for the acoustic emission produced by a spall and the size evaluation in rolling element bearings," IEEE Trans. Ind. Electron., vol. 62, no. 10, pp. 6606–6615, Oct. 2015.
[15] PCI-2 Based AE System User's Manual [Online]. Available: http://www.physicalacoustics.com/content/literature/multichannel_systems/PCI-2_Product_Bulletin.pdf
[16] WSα Sensor [Online]. Available: http://www.pacndt.com/downloads/Sensors/Alpha/WS_Alpha.pdf
[17] A. M. Al-Ghamd and D. Mba, "A comparative experimental study on the use of acoustic emission and vibration analysis for bearing defect identification and estimation of defect size," Mech. Syst. Signal Process., vol. 20, pp. 1537–1571, 2006.
[18] Z. Xia, S. Xia, L. Wan, and S. Cai, "Spectral regression based fault feature extraction for bearing accelerometer sensor signals," Sensors, vol. 12, pp. 13694–13719, 2012.
[19] M. Kang, J. Kim, and J.-M. Kim, "High-performance and energy-efficient fault diagnosis using effective envelope analysis and denoising on a general-purpose graphics processing unit," IEEE Trans. Power Electron., vol. 30, no. 5, pp. 2763–2776, May 2015.
[20] I. Bediaga, X. Mendizabal, A. Arnaiz, and J. Munoa, "Ball bearing damage detection using traditional signal processing algorithms," IEEE Instrum. Meas. Mag., vol. 16, no. 12, pp. 20–25, Apr. 2013.
[21] R. B. Randall and J. Antoni, "Rolling element bearing diagnostics—A tutorial," Mech. Syst. Signal Process., vol. 25, pp. 485–520, 2011.
[22] J. D. Rodriguez, A. Perez, and J. A. Lozano, "Sensitivity analysis of k-fold cross validation in prediction error estimation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 3, pp. 569–575, Mar. 2010.
[23] M. Kang, J. Kim, and J.-M. Kim, "Reliable fault diagnosis for incipient low-speed bearings using fault feature analysis based on a binary bat algorithm," Inf. Sci., vol. 294, pp. 423–438, 2015.
[24] A. Hajnayeb, A. Ghasemloonia, S. E. Khadem, and M. H. Moradi, "Application and comparison of an ANN-based feature selection method and the algorithm in gearbox fault diagnosis," Expert Syst. Appl., vol. 38, pp. 10205–10209, 2011.

Myeongsu Kang received the B.E. and M.S. degrees in computer engineering and information technology and the Ph.D. degree in electrical, electronic, and computer engineering from the University of Ulsan, Ulsan, Korea, in 2008, 2010, and 2015, respectively.
He is currently a Research Associate with the Center for Advanced Life Cycle Engineering, The University of Maryland, College Park, MD, USA. His research interests include data-driven prognostics and health management using signal processing, data mining, and machine learning techniques, and high-performance computing.

Md. Rashedul Islam received the B.S. degree in computer science and engineering from the University of Rajshahi, Rajshahi, Bangladesh, in 2006, and the M.S. degree in informatics from the Högskolan i Borås (University of Boras), Boras, Sweden, in 2011. He is currently working toward the Ph.D. degree in electrical, electronic, and computer engineering at the University of Ulsan, Ulsan, Korea.
He is an Assistant Professor (on study leave) with the Department of Computer Science and Engineering, University of Asia Pacific (UAP), Dhaka, Bangladesh. His research interests include signal processing, machine learning, data-driven diagnostics and prognostics, parallel processing, and GPS.

Jaeyoung Kim received the B.S. and M.S. degrees in electrical, electronic, and computer engineering from the University of Ulsan, Ulsan, Korea, in 2012 and 2015, respectively, where he is currently working toward the Ph.D. degree in electrical, electronic, and computer engineering.
His research interests include artificial intelligence, data-driven prognostics and health management, and high-performance computing.

Jong-Myon Kim (M'05) received the B.S. degree in electrical engineering from Myongji University, Yongin, Korea, in 1995, the M.S. degree in electrical and computer engineering from the University of Florida, Gainesville, FL, USA, in 2000, and the Ph.D. degree in electrical and computer engineering from Georgia Institute of Technology, Atlanta, GA, USA, in 2005.
He is currently a Professor with the Department of IT Convergence and also a Vice President of the Foundation for Industry Cooperation, University of Ulsan, Ulsan, Korea. His research interests include multimedia-specific processor architecture, fault diagnosis and condition monitoring, parallel processing, and embedded systems.

Michael Pecht (M'83–SM'90–F'92) received the dual M.S. degree in electrical engineering and engineering mechanics and the Ph.D. degree in engineering mechanics from the University of Wisconsin–Madison, Madison, WI, USA, in 1978, 1979, and 1982, respectively.
He is the Founder of the Center for Advanced Life Cycle Engineering, The University of Maryland, College Park, MD, USA, which is funded by more than 150 of the world's leading electronics companies at more than U.S. $6 million/year. He is also the George E. Dieter Professor of Mechanical Engineering and a Professor of Applied Mathematics with The University of Maryland. He has authored/coauthored more than 20 books on electronic product development, use, and supply chain management and more than 500 technical articles.