Abstract—The sensitivity of various features that are characteristic of a machine defect may vary considerably under different operating conditions. Hence it is critical to devise a systematic feature selection scheme that provides guidance on choosing the most representative features for defect classification. This paper presents a feature selection scheme based on the principal component analysis (PCA) method. The effectiveness of the scheme was verified experimentally on a bearing test bed, using both supervised and unsupervised defect classification approaches. The objective of the study was to identify the severity level of bearing defects, where no a priori knowledge of the defect conditions was available. The proposed scheme has been shown to provide more accurate defect classification with fewer feature inputs than using all features initially considered relevant. The result confirms its utility as an effective tool for machine health assessment.

Index Terms—Defect classification, feature selection, neural networks, principal component analysis (PCA).

I. INTRODUCTION

[…] such as autoregressive modeling have also been applied to analyzing the spectra of defective bearings for classification purposes [5]. It was found that feedforward neural networks were generally more accurate than linear autoregressive models and radial basis function networks in classifying healthy and defective bearings. More recently, wavelet transformation has been demonstrated to provide feature inputs to a neural network for the same goal [6].

In view of predicting the remaining service life of a machine, once a defect is identified, it is further important for the monitoring system to provide an estimate of the defect's severity level. Such an ability is of great relevance to the industry. To achieve this goal, both feed-forward and recurrent networks have been investigated for artificially induced defects [7]. Knowledge about the defect severity level is needed beforehand in order to develop an appropriate neural network model. However, since bearings are generally installed in a closed environment and thus not readily accessible for defect condition inspection, […]
III. FEATURE SELECTION METHODOLOGY
The PCA-based feature selection scheme for machine condition monitoring was based on the understanding that the amplitude of vibration signals of defective machine components increases as the severity of the defect increases [16]. The issue of feature selection from a contending feature set arises because of the stochastic nature of defect propagation in machinery [20]. Generally, as the defect severity increases, an overall increasing vibration trend is superimposed by local variations of smaller magnitudes [9]. The goal of feature selection is therefore to select features that allow for an accurate description of the defect condition, and subsequently, reliable defect classification, diagnosis, and prognosis. The PCA approach was developed to reduce the dimensionality of the input features (13 initially) for both supervised and unsupervised classification purposes. This is based on the consideration that a large number of inputs, while increasing the computational load, do not necessarily contribute to improving the effectiveness of defect classification. A preliminary study has shown [21] that some features may even provide contradictory information and thus reduce the quality of data analysis.

In general, the PCA technique transforms vectors x from a d-dimensional space to vectors y in a new, m-dimensional space (m <= d) as [22]

    y = A x

where the rows of A are the eigenvectors u_i of the scatter matrix S. Minimizing the error of this reduced representation introduces a constraint term in which \lambda is a scaling factor (a Lagrange multiplier). This leads to

    S u_i = \lambda_i u_i                                           (6)

This equation can be recognized as an eigenvalue problem, with nontrivial solutions only when the \lambda_i are the eigenvalues of the scatter matrix S. Thus, the associated vectors u_i (i = 1 to m) are the eigenvectors. If the condition m < d is satisfied, then the above representation also reduces the dimensionality of the vectors. The error in the representation of the original data set due to the reduction in the number of dimensions from d to m is given by [23]

    E = \sum_{i=m+1}^{d} \lambda_i                                  (7)

where the \lambda_i are the eigenvalues of the scatter matrix corresponding to the eigenvectors u_i. It is seen from (7) that using the eigenvectors corresponding to the largest eigenvalues gives the smallest representation error. Thus, the variance is maximum in the direction of those eigenvectors. Also, the variance in the directions of the eigenvectors decreases in the same order when

    \lambda_1 >= \lambda_2 >= ... >= \lambda_d                      (8)
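The eigenvalue formulation of (6)–(8) can be sketched numerically. The snippet below is a minimal illustration (not the authors' code), assuming the notation above: S is the scatter matrix of the centered samples, u_i and lambda_i are its eigenpairs, and the representation error of (7) is the sum of the discarded eigenvalues; the test data are arbitrary.

```python
import numpy as np

def pca_reduce(X, m):
    """Project d-dimensional samples (rows of X) onto the m leading
    eigenvectors of the scatter matrix S, per (6)-(8)."""
    Xc = X - X.mean(axis=0)          # center the data
    S = Xc.T @ Xc                    # scatter matrix (d x d)
    lam, U = np.linalg.eigh(S)       # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]    # reorder so lam_1 >= ... >= lam_d, as in (8)
    lam, U = lam[order], U[:, order]
    Y = Xc @ U[:, :m]                # m-dimensional representation
    err = lam[m:].sum()              # representation error of (7)
    return Y, lam, err

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
Y, lam, err = pca_reduce(X, m=2)

# (7) should equal the scatter not captured by the m retained components:
recon_err = np.sum((X - X.mean(0)) ** 2) - np.sum(Y ** 2)
print(np.isclose(err, recon_err))    # True: the two error measures agree
```

This confirms numerically that discarding the eigenvectors of the smallest eigenvalues minimizes the representation error, which is the basis of the selection scheme.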
[…] of magnitudes, as shown in Fig. 3(a)–(d). A total of 100 samples were considered for each of the four features; hence each feature is a 100 x 1 vector. The four features were simulated to have random variations from the same mean for each of the four clusters. This is similar in principle to the variation of a vibration data feature for four different defect sizes. Each of the four clusters for each feature contained 25 data points. The four features become less clearly differentiated from F1 to F4, as the overlap between the clusters increases. It is evident from Fig. 3 that a suitable feature selection scheme should be able to rank F1, F2, F3, and F4 in the same order.

To derive the principal components for the simulated data set, the four normalized features were arranged in a 4 x 100 matrix

    X = [F1 F2 F3 F4]^T                                             (9)

The eigenvalues lambda_i and the eigenvectors u_i were calculated for the scatter matrix S. The matrix of eigenvectors can be represented as U = [u_ij], with i = 1 to 4 and j = 1 to 4. The eigenvector u_4 consists of the four components from the fourth column of the matrix U; a similar arrangement applies to u_1, u_2, and u_3. The matrix U is a 4 x 4 square matrix because of the presence of the four features F1–F4. The eigenvector corresponding to the eigenvalue with the largest magnitude was chosen. As shown in Table II, one of the four eigenvalues of the data set is much larger than the other three, indicating that most of the variance is concentrated in one direction.

Table III lists the component magnitudes for the eigenvector corresponding to the largest eigenvalue. Since this corresponds to lambda_1, the feature that is responsible for the maximum variance in the data is readily identified. Subsequently, the magnitudes of the four components of u_1 were examined, as shown in Table III. This result can be interpreted in terms of the directionality of the eigenvector u_1 in the original feature space. If the unit vectors for the original feature space are represented as e_1, e_2, e_3, and e_4 (where e_1 = [1 0 0 0]^T, etc.), then a higher magnitude of a component of u_1 denotes a greater similarity in direction between the eigenvector and the corresponding unit vector, as compared with the other unit vectors forming the basis of the original feature space. For the presented simulated data, the component along e_1 had the largest magnitude, followed by those along e_2, e_3, and e_4. Thus, the feature represented along e_1 was the most sensitive, followed by those along e_2, e_3, and e_4. As a result, the presented selection scheme was able to rank the four features F1–F4 as desired. Based on this study, the PCA approach was applied to selecting the most representative features extracted from the vibration data measured on defective bearings, to improve the effectiveness of defect identification.

IV. EXPERIMENTAL SETUP

To investigate the effectiveness of the proposed feature selection scheme, experimental data were collected from both healthy and defective bearings that contained seeded, localized defects. Three cases were studied.

Case I: Defects were seeded in a deep groove ball bearing (DGBB) of type 6220 [24], in the following three configurations:
• inner ring defect (0.25 mm diameter);
• outer ring defect (0.25 mm diameter);
• defect in both the inner (3 mm diameter) and outer ring (0.25 mm diameter).

The 13 features described before were extracted from the vibration signals obtained from an accelerometer structurally […]
MALHI AND GAO: PCA-BASED FEATURE SELECTION SCHEME FOR MACHINE DEFECT CLASSIFICATION 1521
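The simulated-data study above (Fig. 3, Tables II and III) can be reconstructed in outline. The snippet below is a hypothetical sketch, not the authors' simulation: the cluster-separation values are assumed for illustration, with four features of four clusters (25 points each) whose overlap grows from F1 to F4, ranked by the component magnitudes of the leading eigenvector of the scatter matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed cluster separations (largest for F1, smallest for F4); unit noise.
separations = [9.0, 3.0, 1.0, 0.3]
features = []
for sep in separations:
    # Four clusters of 25 samples each, centered at sep * {0, 1, 2, 3}
    f = np.concatenate([sep * m + rng.normal(0.0, 1.0, 25) for m in range(4)])
    features.append(f)

X = np.array(features)                    # 4 x 100 matrix, as in (9)
Xc = X - X.mean(axis=1, keepdims=True)    # normalize each feature to zero mean
S = Xc @ Xc.T                             # 4 x 4 scatter matrix
lam, U = np.linalg.eigh(S)
u1 = U[:, np.argmax(lam)]                 # eigenvector of the largest eigenvalue

# Rank features by the magnitude of their component in u1
ranking = np.argsort(-np.abs(u1))
print(ranking)                            # F1 (index 0) should rank most sensitive
```

With well-separated clusters dominating the variance, the component of u1 along each feature axis tracks that feature's cluster separation, so the ranking recovers the intended order F1 through F4.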
V. DEFECT CLASSIFICATION

Defect classification refers to the identification of defective components. To identify the most sensitive features from the data collected in Case I, the overall classification task was decomposed into individual subtasks. If the features required to differentiate defective inner and outer rings as well as the healthy condition were identified separately by the PCA, they were then used as inputs to a subsequent classification tool. Fig. 4 illustrates such an approach of dividing the classification problem into separate two-class problems. The three subtasks were to differentiate bearing pairs designated as "Inner Ring/Healthy," "Outer Ring/Healthy," and "Inner/Outer Ring." For each subtask, only the data corresponding to the two defect conditions were used.

Fig. 4. Defect classification subtasks.

The procedure followed for the PCA technique was the same as described for the simulated data before. The eigenvectors corresponding to the largest eigenvalues, in decreasing order of magnitude, were found for each of the three subtasks. Then the magnitudes of the eigenvector components for u_1 were compared in each of the three subtasks. The two features selected for each subtask, due to their similarity in direction with u_1, are shown in Table IV. The three identified features were […].

TABLE IV. PCA IDENTIFIED FEATURES FOR DEFECTIVE BEARING DATA (CASE I).

To investigate the effectiveness of these features identified by the principal components, a classification scheme using an FFNN with one hidden layer was devised. The output layer contained three neurons, corresponding to the conditions of a fault on the inner ring (IRF), a fault on the outer ring (ORF), and no fault (NF). In the case when both an IRF and an ORF were present, both corresponding neurons gave an output of one, while the third neuron gave an output of zero. Instead of Sigmoid neurons, Softmax neurons were used in the output layer (Fig. 5). This was done to enable higher-level decision making, as the outputs of such neurons are representative of the a posteriori probabilities of the particular output. The Softmax neuron is similar to the Sigmoid neuron, except that the outputs are scaled by the total activation at the output layer [23]. This ensures that the total sum of the outputs is one, thus establishing outputs that are representative of the probability of occurrence. It also prevents a skew toward one particular output in cases where the network does not have one output as a clear winner. This property is especially useful for bearing defect classification, as the network outputs the probability of occurrence of a fault based on the training data.

To test the generalization ability of such a classification scheme, 30% of the data was used for training while the remaining 70% was used for testing and validation in equal proportions.
1522 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 53, NO. 6, DECEMBER 2004
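The Softmax output scaling described above can be sketched as follows. This is a generic illustration, not the paper's network: the three raw activation values are hypothetical, and the outputs are taken to correspond to IRF, ORF, and NF.

```python
import numpy as np

def softmax(a):
    """Scale raw output activations by the total activation so that the
    outputs sum to one and can be read as class probabilities."""
    e = np.exp(a - a.max())   # subtract the max for numerical stability
    return e / e.sum()

activations = np.array([2.1, 0.3, -1.0])   # hypothetical outputs: IRF, ORF, NF
p = softmax(activations)

print(np.isclose(p.sum(), 1.0))   # True: the outputs form a probability vector
print(p.argmax())                 # 0: IRF is reported as the most probable fault
```

Because the outputs always sum to one, no single neuron can dominate unless its activation is genuinely larger, which matches the paper's motivation for avoiding a skew toward one output.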
[…] activation wins and gives an output, while the others remain unchanged [27]. The weights were initialized with the values obtained at the end of k-means clustering. This ensured that the weights were in the vicinity of the cluster means as approximated by the k-means procedure. When a new data point was input to the network with the weights learnt by k-means, it resulted in an output from the neuron whose weights were nearest to the new data point. This is most commonly implemented in the WTA network by the criterion [27]

    || x - w_c || = min_j || x - w_j ||                             (11)
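The k-means initialization and the WTA assignment rule of (11) can be sketched as below. This is a generic illustration on synthetic two-dimensional data, not the authors' implementation; the cluster locations and counts are assumed.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: returns the k cluster means used as initial WTA weights."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), k, replace=False)]  # initial means from the data
    for _ in range(iters):
        # assign each point to its nearest mean, then recompute the means
        labels = np.argmin(((X[:, None] - W[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                W[j] = X[labels == j].mean(axis=0)
    return W

def wta(x, W):
    """Winner-take-all criterion of (11): the neuron whose weight vector
    is nearest to the input x wins."""
    return int(np.argmin(np.linalg.norm(W - x, axis=1)))

# Two well-separated synthetic clusters (hypothetical data)
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.2, (30, 2)), rng.normal(5.0, 0.2, (30, 2))])
W = kmeans(X, k=2)

# Points near different clusters should activate different winning neurons
print(wta(np.array([5.1, 4.9]), W) != wta(np.array([0.1, -0.1]), W))  # True
```

Initializing the WTA weights at the k-means cluster means, as the text describes, places each neuron near one cluster so that new samples immediately activate the neuron of the nearest cluster.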
TABLE VI. SUMMARY: CLUSTERING FOR SEVERITY CLASSIFICATION.