

PCA-Based Feature Selection Scheme for Machine Defect Classification

Article  in  IEEE Transactions on Instrumentation and Measurement · January 2005


DOI: 10.1109/TIM.2004.834070 · Source: IEEE Xplore



IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 53, NO. 6, DECEMBER 2004 1517

PCA-Based Feature Selection Scheme for Machine Defect Classification
Arnaz Malhi, Student Member, IEEE, and Robert X. Gao, Senior Member, IEEE

Abstract—The sensitivity of various features that are characteristic of a machine defect may vary considerably under different operating conditions. Hence it is critical to devise a systematic feature selection scheme that provides guidance on choosing the most representative features for defect classification. This paper presents a feature selection scheme based on the principal component analysis (PCA) method. The effectiveness of the scheme was verified experimentally on a bearing test bed, using both supervised and unsupervised defect classification approaches. The objective of the study was to identify the severity level of bearing defects, where no a priori knowledge on the defect conditions was available. The proposed scheme has been shown to provide more accurate defect classification with fewer feature inputs than using all features initially considered relevant. The result confirms its utility as an effective tool for machine health assessment.

Index Terms—Defect classification, feature selection, neural networks, principal component analysis (PCA).

Manuscript received June 15, 2003; revised May 31, 2004. This work was supported by the National Science Foundation under Grants DMI-9624353 and 0218161.
The authors are with the Department of Mechanical and Industrial Engineering, University of Massachusetts, Amherst, MA 01003 USA (e-mail: amalhi@partners.org; gao@ecs.umass.edu).
Digital Object Identifier 10.1109/TIM.2004.834070
0018-9456/04$20.00 © 2004 IEEE

I. INTRODUCTION

SENSOR-BASED machine condition monitoring has gained increasing attention from the research community worldwide [1], [2]. The goal of machine condition monitoring is to obtain the operational status of the machines and use the information to 1) identify potential machine faults and failure before they occur, thus reducing unexpected and costly machine downtime, and 2) better control the quality of products, which is closely related to the condition of the machine. The information gathered from the monitoring sensors ultimately provides insight into the manufacturing process itself, enabling effective high-level decision-making for quality production at a lower cost.

The application of neural networks to machine condition monitoring has long been demonstrated. The development is rooted in the need for automated "learning" capabilities of the monitoring system to adapt to the machine or process being monitored in-situ, in order to develop accurate functional relationships between the available inputs and desired output data sets. In the area of condition monitoring of rolling bearings, early studies have applied neural networks to the identification of defect locations (e.g., on the inner or outer ring of the bearing) [3]–[6]. While successful in demonstrating the classificatory ability of the neural networks, they did not fully utilize their adaptive learning capabilities [3], [4]. Other techniques such as autoregressive modeling have also been applied to analyzing the spectra of defective bearings for classification purposes [5]. It was found that feedforward neural networks were generally more accurate than linear autoregressive models and radial basis function networks in classifying healthy and defective bearings. More recently, wavelet transformation has been demonstrated to provide feature inputs to a neural network for the same goal [6].

In view of predicting the remaining service life of a machine, once a defect is identified, it is further important for the monitoring system to provide an estimate of the defect's severity level. Such ability is of great relevance to the industry. To achieve this goal, both feedforward and recurrent networks have been investigated for artificially induced defects [7]. Knowledge about the defect severity level is needed beforehand in order to develop an appropriate neural network model. However, since bearings are generally installed in closed environments and thus not readily accessible for defect condition inspection, it is generally difficult to implement such models in an industry setting.

Selecting the most representative features as inputs to the machine condition monitoring system presents another challenge. Application of recurrent neural networks has been shown for predicting future values of some time and frequency domain parameters in monitoring a gearbox with defective bearings [8], [9]. An adaptive technique without the use of neural networks was developed, in which parameters of a defect propagation model were updated based on the trend of the root mean square (rms) of the vibration data [10]. In these studies, the accuracy of the diagnostic and, subsequently, prognostic models depended on the sensitivity of the features used to estimate the condition and propagation of the defects. Therefore, it is critical to devise a systematic scheme that is capable of selecting the most representative features to maximize the accuracy of classification schemes for defect severity evaluation. The principal component analysis (PCA) technique, also known as the Karhunen–Loeve transform, has been investigated before by researchers for signal and image processing [11], [12]. Because of its ability to discriminate directions with the largest variance in a data set, the suitability of PCA for identifying the most representative features as inputs to a defect classification scheme is investigated in this paper. Two scenarios were considered, with the objective to develop a systematic approach to defect severity evaluation.

1) Supervised training, where the applicability of PCA to select suitable features as inputs to feedforward neural networks (FFNNs) and radial basis function (RBF) networks for defect classification was investigated.
2) Unsupervised training, where the most sensitive features computed from the vibration signals of defective bearings were identified without a priori knowledge of the defect conditions. These features were then used as inputs for an unsupervised competitive learning scheme to classify defective bearings according to their defect sizes.

II. CONTENDING FEATURES


The merit of a machine condition monitoring system rests upon its ability to detect and classify a defect, estimate its severity, and estimate the remaining service life of the machine based on the current operating conditions (Fig. 1). As an example, rolling bearings have been widely used in the industry to support rotating machines. Identifying the existence and severity of a defective condition at the early stage will provide valuable input to adjusting the maintenance schedule and minimizing machine downtime. Techniques aimed at improving the efficiency of data acquisition and processing to enable online bearing condition monitoring have been investigated [13]. The concept of feature selection for accurate defect classification is a critical step toward realizing such an online bearing condition monitoring platform.

Fig. 1. Bearing health diagnosis: typical system configuration.

Major time-domain statistical features considered for bearing defect diagnosis and prognosis have included the rms, kurtosis, rectified skew, and crest factor [9], [10], [14], [15]. Other features investigated included the characteristic frequencies related to the rotation of bearing components, e.g., the ball-pass frequency of outer ring (BPFO), ball-pass frequency of inner ring (BPFI), and ball-spin frequency (BSF) [16]. These features relate closely to localized defects through the amplitudes at specific frequencies, and thus do not provide a holistic measure of the bearing health status. Furthermore, it has been found that the time- and frequency-domain features, when used alone, are of limited effectiveness for detecting faults at the incipient stage, due to their generally weak amplitudes and short duration [17]. In comparison, features extracted by the wavelet transform have been shown to be more indicative of the defect status. Prior work has successfully identified an inner ring defect of 0.25 mm diameter using a combined wavelet and Fourier analysis [17]. As shown in Fig. 2(a), conventional spectral analysis could not identify the existence of the defect frequency (BPFI, at 58.6 Hz). By taking the wavelet transform of the bearing signals and postprocessing it with the Fourier transform, the defect frequency could be successfully detected, as shown in Fig. 2(b). Since bearing defect identification is essentially based on monitoring the amplitudes at characteristic frequencies, such a combined approach enhances the performance of the defect classification scheme.

Fig. 2. Spectral and spectral-wavelet analysis for inner race defect detection.

Leveraging the effectiveness of wavelet-based analysis, this paper takes a multidomain approach by including the time-, frequency-, and wavelet-domain features as inputs to a bearing defect classification scheme. An initial set of 13 features (Table I) was compiled, based on the published studies by various researchers [3], [9], [10], [14]–[17]. It should be noted that other features such as the peak amplitude in the frequency domain [3] and the sum of amplitudes of the sidebands [8] have also been investigated previously. The technique presented in this paper, while based on the 13 initial features, is not limited to them only. Intuitively, it is difficult to estimate which features are more sensitive to defect development and propagation in a machine system, as various factors affect the effectiveness of the features, e.g., location of the sensors, signal-to-noise ratios (SNRs) of the data acquisition system, etc. Therefore, a systematic approach to feature selection based on available data provides valuable guidance to the success of defect identification and severity evaluation. The utility of vibration data to classify faults and their severity levels in other machinery components such as gears and cutting tools has been reported in [18] and [19]. Thus, the feature selection scheme presented in this paper should be applicable to the monitoring of these components as well, because of their similarity to problems associated with bearing condition monitoring.
MALHI AND GAO: PCA-BASED FEATURE SELECTION SCHEME FOR MACHINE DEFECT CLASSIFICATION 1519

TABLE I
CONTENDING FEATURES FOR BEARING CONDITION MONITORING

III. FEATURE SELECTION METHODOLOGY

The PCA-based feature selection scheme for machine condition monitoring was based on the understanding that the amplitude of vibration signals of defective machine components increases as the severity of the defect increases [16]. The issue of feature selection from a contending feature set arises because of the stochastic nature of the defect propagation in machinery [20]. Generally, as the defect severity increases, an overall increasing vibration trend is superimposed by local variations of smaller magnitudes [9]. The goal of feature selection is therefore to select features that allow for an accurate description of the defect condition, and subsequently, reliable defect classification, diagnosis, and prognosis. The PCA approach was developed to reduce the dimensionality of the input features (13 initially) for both supervised and unsupervised classification purposes. This is based on the consideration that a large number of inputs, while increasing the computational load, do not necessarily contribute to improving the effectiveness of defect classification. A preliminary study has shown [21] that some features may even provide contradictory information and thus reduce the quality of data analysis.

In general, the PCA technique transforms $n$ vectors $(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)$ from a $d$-dimensional space to $n$ vectors $(\mathbf{x}'_1, \mathbf{x}'_2, \ldots, \mathbf{x}'_n)$ in a new, $d'$-dimensional space as [22]

$$\mathbf{x}'_i = \sum_{k=1}^{d'} a_{k,i}\,\mathbf{e}_k, \qquad d' \le d \tag{1}$$

where $\mathbf{e}_k$ are the eigenvectors corresponding to the $d'$ largest eigenvalues for the scatter matrix $S$ and $a_{k,i}$ are the projections of the original vectors $\mathbf{x}_i$ on the eigenvectors $\mathbf{e}_k$. These projections are called the principal components of the original data set. Both $d$ and $d'$ are positive integers, and the dimension $d'$ cannot be greater than $d$. The $d$-by-$d$ scatter matrix $S$ for the original data set is defined as

$$S = E\left[\mathbf{x}_i \mathbf{x}_i^{T}\right] \quad \text{for } i = 1 \text{ to } n \tag{2}$$

where $E$ is the statistical expectation operator applied on the outer product of $\mathbf{x}_i$ and its transpose. The representation shown in (1) minimizes the error between the original and transformed vectors. This is illustrated by considering the variance of the principal components given by [23]

$$\sigma^{2}(\mathbf{e}) = \mathbf{e}^{T} S\,\mathbf{e} \tag{3}$$

where $\mathbf{e}$ represents a $d$-by-1 vector. It is evident that the variance of the principal components is a function of the magnitude of the components of the vectors $\mathbf{e}$. At the local maxima and minima for the variance function in (3), the following relationship exists:

$$(\delta\mathbf{e})^{T} S\,\mathbf{e} = 0 \tag{4}$$

Equation (4) is satisfied [23] when

$$(\delta\mathbf{e})^{T} S\,\mathbf{e} - \lambda\,(\delta\mathbf{e})^{T}\mathbf{e} = 0 \tag{5}$$

where $\lambda$ is a scaling factor. This leads to

$$S\,\mathbf{e} = \lambda\,\mathbf{e} \tag{6}$$

This equation can be recognized as an eigenvalue problem with nontrivial solutions only when $\lambda$ are the eigenvalues of the scatter matrix $S$. Thus, the associated vectors $\mathbf{e}_k$ ($k = 1$ to $d$) are the eigenvectors. If the condition $d' < d$ is satisfied, then the above representation also reduces the dimensionality of the vectors. The error in representation of the original data set due to the reduction in number of dimensions to $d'$ is given by [23]

$$\varepsilon = \sum_{k=d'+1}^{d} \lambda_k \tag{7}$$

where $\lambda_k$ are the eigenvalues of the scatter matrix corresponding to the eigenvectors $\mathbf{e}_k$. It is seen from (7) that using the eigenvectors corresponding to the largest eigenvalues would give the smallest error in representation. Thus, the variance is maximum in the direction of the eigenvectors. Also, the variance in the directions of the eigenvectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_d$ decreases in the same order when

$$\lambda_1 > \lambda_2 > \cdots > \lambda_d \tag{8}$$

Hence the features that have the largest variance due to a changing defect condition can be identified by examining their directionality. This property of principal components has been exploited for the presented feature selection study. Given that the features transformed by the principal components are not directly connected to the physical nature of the defect, the defect classification scheme presented in this study was based on the original features themselves. The eigenvectors for the transformed data were only used to choose the most sensitive features from the original feature set.

The effectiveness of the developed feature selection scheme was investigated by means of numerical simulation. Four normalized feature vectors $f_1$, $f_2$, $f_3$, and $f_4$ were constructed with each of them forming clusters around four distinct levels
of magnitudes, as shown in Fig. 3(a)–(d). A total of 100 samples were considered for each of the four features; hence each feature is a 100 × 1 vector. The four features were simulated to have random variations from the same mean for each of the four clusters. This is similar in principle to the variation of a vibration data feature for four different defect sizes. Each of the four clusters for each feature contained 25 data points. The four features become less clearly differentiated from $f_1$ to $f_4$, as overlap between the clusters increases. It is evident from Fig. 3 that a suitable feature selection scheme should be able to rank $f_1$, $f_2$, $f_3$, and $f_4$ in the same order.

Fig. 3. Simulated data for selecting features ($f_1$, $f_2$, $f_3$, $f_4$).

To derive the principal components for the simulated data set, the four normalized features were arranged in a 4 × 100 matrix

$$\mathbf{X} = \left[\, f_1 \;\; f_2 \;\; f_3 \;\; f_4 \,\right]^{T} \tag{9}$$

The eigenvalues and the eigenvectors were calculated for the scatter matrix $S$. The matrix of eigenvectors can be represented as $E_{ij}$, where $i = 1$ to 4 and $j = 1$ to 4. The eigenvector $\mathbf{e}_4$ consists of the four components from the fourth column of the matrix $E$. A similar arrangement applies to $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$. The matrix $E$ is a 4 × 4 square matrix because of the presence of the four features $f_1$–$f_4$. The eigenvector corresponding to the eigenvalue with the largest magnitude was chosen. As shown in Table II, one of the four eigenvalues of the data set is much larger than the other three, indicating that most of the variance is concentrated in one direction.

Table III lists the component magnitudes for the eigenvector corresponding to the largest eigenvalue. Since this eigenvector corresponds to the direction of maximum variance, the feature that is responsible for the maximum variance in the data is readily identified. Subsequently, the magnitudes of the four components of this eigenvector were examined. As shown in Table III, the component magnitudes decrease from the first component to the fourth. This result can be interpreted in terms of the directionality of the eigenvector in the original feature space. If the original feature space is spanned by four unit vectors, one along each feature axis (e.g., $[1\;0\;0\;0]^{T}$ for $f_1$), then a higher magnitude of the first eigenvector component denotes the similarity in direction of the eigenvector with the unit vector of $f_1$ as compared to the other unit vectors forming the basis for the original feature space. For the presented simulated data, the first component had the largest magnitude, followed by the second, third, and fourth. Thus, the feature represented along the first unit vector was the most sensitive, followed by those along the second, third, and fourth. As a result, the presented selection scheme was able to rank the four features $f_1$–$f_4$ as desired. Based on this study, the PCA approach was applied to selecting the most representative features extracted from the vibration data measured in defective bearings, to improve the effectiveness of defect identification.

IV. EXPERIMENTAL SETUP

To investigate the effectiveness of the proposed feature selection scheme, experimental data were collected from both healthy and defective bearings that contained seeded, localized defects. Three cases were studied.

Case I: Defects were seeded in a deep groove ball bearing (DGBB) of type 6220 [24], in the following three configurations:
• inner ring defect (0.25 mm diameter);
• outer ring defect (0.25 mm diameter);
• defect in both the inner (3 mm diameter) and outer ring (0.25 mm diameter).
The 13 features described before were extracted from the vibration signals obtained from an accelerometer structurally
embedded within the outer ring of the bearing. A total of 224 vibration data files were collected, under varying load and speed conditions as follows:
• radial load: 0–25 kN;
• axial load: 0–11 kN;
• rotational shaft speed: 0–1500 rpm.

Case II: Three cylindrical roller bearings of type N205 ECP [24] were seeded with outer ring defects of 0.025, 0.1, and 1 mm diameter, respectively. A "healthy" bearing was used to provide a reference baseline. Since the bearings used in Cases I and II are substantially different both in design (ball bearing versus roller bearing) and in dimensions (180 versus 52 mm outer diameter), the data collected cover a broad range to validate the effectiveness of the proposed feature extraction technique under vastly different conditions.

Case III: A deep groove ball bearing, type 1100 KR [25] of 52 mm outer diameter with a 0.27 mm scratch introduced across the outer raceway, was continually tested under a rotational speed of 2000 rpm, for approximately 2.7 million revolutions. Upon reaching this stage, the defect has propagated throughout the entire raceway and rendered the bearing practically nonfunctional. This case study was designed to investigate the effect of continuous degradation of the defect, whereas Cases I and II were concerned with the effect of discrete defects. For Case III, vibration data were taken at an interval of about every 7 min. A total of 314 data files were collected, representing the process of defect propagation.

TABLE II
EIGENVALUES FOR SIMULATED DATA

TABLE III
EIGENVECTOR COMPONENT MAGNITUDES FOR SIMULATED DATA
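The simulated feature-ranking study of Section III, whose results Tables II and III summarize, can be sketched as follows. This is a minimal illustration under stated assumptions rather than a reproduction of the authors' simulation: the cluster levels, the noise standard deviations, the random seed, the function name `rank_features_by_pca`, and the zero-mean, unit-variance normalization are all hypothetical choices, since the text does not specify them. The ranking obtained depends on how the features are normalized.

```python
import numpy as np


def rank_features_by_pca(X):
    """Rank features (rows of X) by their weight in the dominant eigenvector.

    X has shape (n_features, n_samples). Returns the feature indices ordered
    from most to least similar in direction to the dominant eigenvector, the
    eigenvalues in decreasing order, and the dominant eigenvector itself.
    """
    # Assumed normalization: zero mean and unit variance for every feature.
    Xc = X - X.mean(axis=1, keepdims=True)
    Xn = Xc / Xc.std(axis=1, keepdims=True)
    # Scatter matrix as the sample estimate of E[x x^T].
    S = Xn @ Xn.T / Xn.shape[1]
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(S)
    dominant = eigvecs[:, -1]
    ranking = np.argsort(-np.abs(dominant))
    return ranking, eigvals[::-1], dominant


# Hypothetical data: four severity levels, 25 samples each, 100 samples total.
rng = np.random.default_rng(0)
levels = np.repeat([1.0, 2.0, 3.0, 4.0], 25)
noise_std = [0.05, 0.3, 0.6, 1.0]   # cluster overlap grows from f1 to f4
X = np.vstack([levels + rng.normal(0.0, s, size=levels.size) for s in noise_std])

ranking, eigvals, dominant = rank_features_by_pca(X)
print("eigenvalues (largest first):", eigvals)
print("dominant eigenvector magnitudes:", np.abs(dominant))
print("feature ranking (0 = f1):", ranking)
```

With these settings one eigenvalue is expected to dominate, and the cleanest feature should carry a larger component magnitude than the noisiest one. The order of features whose noise levels are close may vary with the random draw.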

V. DEFECT CLASSIFICATION
Defect classification refers to the identification of defective components. To identify the most sensitive features from the data collected in Case I, the overall classification task was decomposed into individual subtasks. The features required to differentiate defective inner and outer rings as well as the healthy condition were identified separately by the PCA, and were then used as inputs to a subsequent classification tool. Fig. 4 illustrates such an approach of dividing the classification problem into separate two-class problems. The three subtasks were to differentiate bearing pairs designated as "Inner Ring/Healthy," "Outer Ring/Healthy," and "Inner/Outer Ring." For each subtask, only the data corresponding to the two defect conditions were used.

Fig. 4. Defect classification subtasks.

The procedure followed for the PCA technique was the same as described for the simulated data before. The eigenvectors corresponding to the largest eigenvalues, in decreasing order of magnitude, were found for each of the three subtasks. Then the magnitudes of the eigenvector components for the dominant eigenvector $\mathbf{e}_1$ were compared in each of the three subtasks. The two features selected due to the similarity in direction with $\mathbf{e}_1$ for each subtask are shown in Table IV. Across the three subtasks, three distinct features were identified.

TABLE IV
PCA IDENTIFIED FEATURES FOR DEFECTIVE BEARING DATA (CASE I)

To investigate the effectiveness of these features identified by the principal components, a classification scheme using an FFNN with one hidden layer was devised. The output layer contained three neurons, corresponding to the conditions of fault on the inner ring (IRF), fault on the outer ring (ORF), and no fault (NF). In the case when both an IRF and ORF defect were present, both neurons gave an output of one, while the third neuron gave an output of zero. Instead of using Sigmoid neurons in the output layer, Softmax neurons were used (Fig. 5). This was done to enable higher level decision-making, as the outputs of such neurons are representative of the a posteriori probabilities of the particular output. The Softmax neuron is similar to the Sigmoid neuron, except that the outputs are scaled by the total activation at the output layer [23]. This ensures that the total sum of the outputs is one, thus establishing outputs that are representative of the probability of occurrence. This prevents a skew toward one particular output in cases where the network does not have one output as a clear winner. This feature is especially useful for bearing defect classification, as the network outputs the probability of occurrence of a fault based on the training data.

To test the generalization ability of such a classification scheme, 30% of the data was used for training while the remaining 70% was used for testing and validation in equal
proportion [26]. This was aimed at examining the performance of the network dealing with data sets for which it was not sufficiently trained. For the purpose of comparison, a similar classification scheme with all the 13 input features initially considered was studied, and the results are shown in the first two rows of Table V. It was seen that the three features selected by the PCA technique gave better results (about 1% error) as compared to the use of all the 13 original features as inputs (9% error). This demonstrates the advantage of PCA for reducing the dimensionality of the input space and for ascertaining the usefulness of particular features for a particular classification task. Such an approach will be useful when historical bearing defect data for a certain machine being monitored are available for network training purposes. Furthermore, the performance of the RBF networks with the three PCA-identified features and all 13 original features was also compared. Neurons in the hidden layer of RBF networks computed the Euclidean norm (distance) between the input and the weight vector of that neuron [23]. Compared to the feedforward neural network, the RBF network showed slightly higher errors in testing for the same classification task. On the other hand, the advantage in using PCA-identified features (about 2% error) over all 13 original features (about 12% error) was significant. This demonstrated that the proposed PCA technique is effective in classifying bearing faults with higher accuracy and a lower number of feature inputs as compared to using all the original feature inputs.

Fig. 5. Softmax output neurons for defect classification.

TABLE V
SUPERVISED DEFECT CLASSIFICATION USING PCA IDENTIFIED FEATURES (CASE I)

VI. SEVERITY CLASSIFICATION

Severity classification refers to the differentiation of defective bearings on the basis of defect size. The defect size can be estimated by the magnitude of a representative feature. Since no guidelines are generally available as to which feature is more sensitive for a particular defect condition, developing an effective feature selection scheme is essential to improving the accuracy of defect severity classification.

Case I: For this case, two bearings with inner ring defects (0.25 and 3 mm diameter, respectively) were used to establish a feature selection scheme for inner ring severity classification. One of the bearings also contained a 0.25 mm diameter outer ring defect. A third, healthy bearing was used to serve as the reference base. Fig. 6 shows a PCA-based two-dimensional representation of the data corresponding to these three classes. The eigenvectors corresponding to the largest eigenvalues and the sensitive features were identified by the methodology described before. Since PCA determines the directions with the largest variance for a given data set, and the data corresponded to increasing severity levels, it was expected that the largest variation would be along the direction of increasing severity level. This was found to be true, as the data points corresponding to the three severity levels were differentiated along the eigenvector $\mathbf{e}_1$. On the other hand, variation was seen among the data points, as represented by the eigenvector $\mathbf{e}_2$. This can be attributed to the different operating conditions. Compared to the result in $\mathbf{e}_1$, very little increase in severity can be seen in the direction of $\mathbf{e}_2$. Since a single feature was found to be the most similar in direction to $\mathbf{e}_1$, it can hence be gauged as the most suitable feature for bearing defect severity classification. It should be noted that since inner ring defects were monitored, it could have been expected that the features measuring the amplitude at the BPFI would have been the most suitable, as they directly relate to the amplitude of the corresponding defect frequency. However, the presence of an outer ring defect of 0.25 mm along with the inner ring defect of 3 mm in one bearing made the power feature more suitable, because the amplitudes at all frequencies in the defect range were taken into account for that feature. The proposed method was able to take this into account without any prior knowledge of the defect conditions. The feature most similar in direction to $\mathbf{e}_2$ was the peak value.

Unlike defect classification, where supervised training was used to identify the defective component in a bearing, classification of severity based on the defect sizes was conducted using unsupervised training. This type of training is applicable when no prior knowledge about the defect condition of the bearing is available. Unsupervised training refers to updating the network weights without the use of any desired output. The updates were made on the basis of the input presented to the network. For the presented study, the inputs were features identified by PCA. A two-dimensional representation of the input data is shown in Fig. 7. The two features chosen were the power from the wavelet domain and the amplitude at the BPFI from the frequency domain. The "physical" transformation taking place in choosing these features can be seen by examining Figs. 6 and 7. In Fig. 6, the differentiation of defect severity clusters along $\mathbf{e}_1$ is shown. The two features most similar in direction to this eigenvector are plotted in Fig. 7, on two separate axes, thus retaining the differentiation in the severity clusters.

To identify the means of the three clusters corresponding to the two defect conditions and the healthy condition, the method of $k$-means clustering was performed on the data set (Fig. 7). This is a method to approximate the mean of a cluster, with the number of clusters being specified by the user [23].

These cluster centers were used as an approximate starting point for online clustering, which was used for classifying inputs to their appropriate severity clusters. Also called competitive
learning, this type of clustering consists of a single layer topology that tunes particular neurons to various areas of the input space. All the information is extracted from the input patterns alone, with no need for a desired response. The Instar training rule was applied, which is characterized as [27]

$$w_{ij}(n+1) = w_{ij}(n) + \eta\, y_j(n)\left[x_i(n) - w_{ij}(n)\right] \tag{10}$$

where $w_{ij}$ is the weight connecting the $i$th input $x_i$ to the $j$th output $y_j$, $\eta$ is the learning rate, and $n$ is the number of training patterns. As illustrated in (10), a weight update is only made when an output is seen at a particular neuron. The neuron has a threshold activation function, with an output being either zero or one. Since weights are attracted toward the input position (as the weight update is directly related to the difference of the input and the weight), if the inputs are normalized, the weights automatically become normalized. For multiple neurons, the winner-take-all (WTA) network is commonly used at the output stage of the network to ensure that the neuron that wins the competition is updated. The purpose of the WTA network is that the neuron with the highest activation wins, and gives an output while the others remain unchanged [27]. The weights were initialized with the values at the end of $k$-means clustering. This ensured that the weights were in the vicinity of the mean of the cluster as approximated by the $k$-means procedure. When a new data point was input to the network with the weights learnt by $k$-means, it resulted in an output from the neuron whose weights were nearest to the new data point. This is most commonly implemented by the criterion in the WTA network [27]

$$j^{*} = \arg\min_{j} \left\lVert \mathbf{x} - \mathbf{w}_j \right\rVert \tag{11}$$

where $\mathbf{x}$ is the input vector and $\mathbf{w}_j$ is the weight vector. Hence the weight nearest to the input was identified and updated. Identifying the weight is equivalent to identifying the cluster that the input belongs to. If the wrong weight was updated, then the data point will be misclassified. This method was applied to examine the effectiveness of the feature selection scheme by calculating the misclassification percentage for the two features identified by PCA. Approximately 30% of the data was used for unsupervised training by the $k$-means method, whereas the remaining data were used for online clustering to check for misclassification. A 0% misclassification was achieved with the use of the two PCA-identified features as inputs. To further examine the effectiveness of the developed feature selection scheme, cross validation was introduced, which refers to the degradation in classification performance as new features are added to the selected feature set. For this purpose, each of the remaining 11 features was added to the input feature set one after another, and the misclassification percentage was computed, with the results shown in Fig. 8. Except for rms, the addition of all the other features increased the misclassification percentage. Since the addition of features to the feature set has led to an increase in the misclassification rate, it can be concluded that the two-feature set is a good selection, thus validating the PCA approach for selecting features.

Fig. 6. Inner ring defect severity classification using principal components (Case I).

Fig. 7. $k$-means clustering using PCA identified features (Case I).

Case II: The defect severity data in this case corresponded to outer ring defects. Fig. 9 shows the application of PCA to data collected from this second set of data. An analysis similar to that of Case I showed the expected increase of severity along the direction of eigenvector $\mathbf{e}_1$. As shown in Fig. 9, data points corresponding to the 1 mm defect were well separated from the rest of the data points. However, separation of the data points representing healthy, 0.025 and 0.1 mm defect sizes was not as clear. This indicates that even the best feature selected for this case has had similar amplitudes for the small defect sizes. Thus, severity classification for defect sizes below 0.1 mm was difficult to perform. The most sensitive feature found from comparing the eigenvector components was the amplitude at BPFO in the frequency domain. The misclassification percentage following the $k$-means clustering and competitive learning approach was found to be higher (9%) than in Case I (1%).

Case III: For this study, a ball bearing with a small initial outer ring defect was continually run until it failed completely (rotation seized up). Since no prior information was available regarding the defect severity at the various stages
during defect progression, a supervised topology such as a feedforward network could not be applied. Instead, four data sections of equal length were used as inputs to the $k$-means clustering and competitive learning analysis, with each section consisting of 44 data files. Such a division of the data files formed four distinct data clusters to which online clustering could be applied. The two most representative features identified by the principal components were the peak value and rms. The overall misclassification rate was less than 3%. Fig. 10 shows the increase in misclassification percentage due to the addition of more features in the feature set, thus validating the merit of the features selected by PCA.

Table VI summarizes the results from all three cases presented above. The fourth column in the table lists the errors in classification when the PCA-selected features were used for clustering. The fifth column lists the average increase in the misclassification rate when the remaining 11 features (not chosen by PCA) were added one by one to the PCA-selected feature set. In all three cases, the increase in error percentage was substantial. This result confirmed that the features identified by PCA ensure effective severity classification at higher accuracy than the non-PCA-based approach.

Fig. 8. Feature selection validation: misclassification in online clustering (Case I).

Fig. 9. Outer ring defect severity classification using principal components (Case II).

Fig. 10. Feature selection validation: misclassification in online clustering (Case III).

VII. CONCLUSION

A PCA-based approach to selecting the most representative features for the classification of defective components and defect severity in three types of rolling bearings was developed. The PCA-selected features have been shown to improve the accuracy of the classification scheme, both for supervised classification using a feedforward neural network and a radial basis function network, and for unsupervised competitive learning. The performance of the PCA approach was evaluated using three different bearing defect configurations. The unsupervised classification scheme was investigated for its applicability to real-world scenarios where no a priori knowledge about the defect severity is available. The effectiveness of the feature selection scheme was confirmed by cross validation, wherein the addition of other features not selected by PCA to the input feature set led to an increase in the misclassification rate of 5.1%, 17.2%, and 4.7% for the three cases, respectively. These results validated the suitability of the PCA-based feature selection scheme. The method presented is generic in nature, and hence applicable to a wide range of problems typically seen in manufacturing equipment and process condition monitoring, health diagnosis, and remaining life prognosis.
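The feature-addition check used to validate the selection (each remaining feature is added to the PCA-selected set in turn, and the change in misclassification is recorded and averaged) can be sketched as follows. This is an illustrative sketch only: the data are synthetic, the 11 extra features are pure noise by construction, class means stand in for the cluster weights that $k$-means and competitive learning would produce, and every name in the block is hypothetical.

```python
import numpy as np


def misclassification_pct(X_train, y_train, X_test, y_test):
    """Percent of test samples whose nearest prototype carries the wrong label.

    Assumption: class means replace the weights learned by clustering.
    """
    classes = np.unique(y_train)
    prototypes = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dist = np.linalg.norm(X_test[:, None, :] - prototypes[None, :, :], axis=2)
    predicted = classes[dist.argmin(axis=1)]
    return 100.0 * np.mean(predicted != y_test)


def feature_addition_check(X, y, selected, train_fraction=0.3, seed=0):
    """Add each non-selected feature to the selected set, one at a time.

    Returns the baseline error (selected features only) and a dict mapping
    each added feature index to the resulting increase in error.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_train = int(train_fraction * len(X))
    train, test = order[:n_train], order[n_train:]

    def error(cols):
        return misclassification_pct(
            X[np.ix_(train, cols)], y[train], X[np.ix_(test, cols)], y[test]
        )

    baseline = error(list(selected))
    remaining = [c for c in range(X.shape[1]) if c not in selected]
    increase = {c: error(list(selected) + [c]) - baseline for c in remaining}
    return baseline, increase


def make_synthetic(seed=0, n_per_class=100, n_noise=11):
    """Two informative features (columns 0 and 1) plus uninformative noise features."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
    y = np.repeat(np.arange(4), n_per_class)
    informative = centers[y] + 0.15 * rng.standard_normal((len(y), 2))
    noise = 10.0 * rng.standard_normal((len(y), n_noise))
    return np.hstack([informative, noise]), y


if __name__ == "__main__":
    X, y = make_synthetic()
    baseline, increase = feature_addition_check(X, y, selected=[0, 1])
    average = sum(increase.values()) / len(increase)
    print(f"baseline error: {baseline:.1f}%, average increase: {average:.1f}%")
```

The average of the per-feature increases plays the role of the quantity reported in the fifth column of Table VI; the numbers this sketch produces depend entirely on the synthetic data and are not comparable to the paper's results.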
TABLE VI
SUMMARY: CLUSTERING FOR SEVERITY CLASSIFICATION

ACKNOWLEDGMENT

The authors gratefully acknowledge experimental support provided by SKF and Timken Corporations.

REFERENCES

[1] K. Ng, “Overview of machine diagnostics and prognostics,” in Symp. Quantitative Nondestructive Evaluation, Dallas, TX, 1997.
[2] S. Billington, Y. Li, T. Kurfess, S. Liang, and S. Danyluk, “Roller bearing defect detection with multiple sensors,” in Proc. 1997 ASME Int. Mechanical Engineering Congr. Exposition—Tribology Division, vol. 7, 1997, pp. 31–36.
[3] T. Liu and J. Mengel, “Intelligent monitoring of ball bearings,” Mech. Syst. Signal Process., vol. 6, no. 5, pp. 419–431, 1992.
[4] I. Alguindigue, A. Loskiewicz-Buczak, and R. Uhrig, “Monitoring and diagnosis of rolling element bearings using artificial neural networks,” IEEE Trans. Ind. Electron., vol. 40, no. 2, pp. 209–216, 1993.
[5] D. C. Baillie and J. Mathew, “A comparison of autoregressive modeling techniques for fault diagnosis of rolling element bearings,” Mech. Syst. Signal Process., vol. 10, pp. 1–17, 1996.
[6] P. Xu and A. Chan, “Fast and robust neural network based wheel bearing fault detection with optimal wavelet features,” in Proc. 2002 Int. Joint Conf. Neural Networks (IJCNN ’02), vol. 3, 2002, pp. 2076–2080.
[7] C. Li and Y. Fan, “Recurrent neural networks for fault diagnosis and severity assessment of a screw compressor,” ASME J. Dynamic Syst. Contr., vol. 121, pp. 724–728, 1999.
[8] P. Tse and D. Atherton, “Prediction of machine deterioration using vibration based fault trends and recurrent neural networks,” ASME J. Vibration Acoust., vol. 121, pp. 355–362, 1999.
[9] Y. Shao and K. Nezu, “Prognosis of remaining bearing life using neural networks,” Proc. Inst. Mech. Eng.—J. Syst. Contr. Eng., vol. 214, no. 3, pp. 217–230, 2000.
[10] Y. Li, S. Billington, C. Zhang, T. Kurfess, S. Danyluk, and S. Liang, “Adaptive prognostics for rolling element bearing condition,” Mech. Syst. Signal Process., vol. 13, no. 1, pp. 103–113, 1999.
[11] V. Algazi, K. Brown, and M. Ready, “Transform representation of the spectra of acoustic speech segments with applications, part I: General approach and application to speech recognition,” IEEE Trans. Speech Audio Process., vol. 1, pp. 180–195, 1993.
[12] L. Sirovich and L. Keefe, “Low dimensional procedure for characterization of human faces,” J. Opt. Soc. Amer., vol. 4, pp. 519–524, 1987.
[13] C. Wang and R. Gao, “A virtual instrumentation system for integrated bearing condition monitoring,” IEEE Trans. Instrum. Meas., vol. 49, no. 2, pp. 325–332, 2000.
[14] F. Honarvar and H. Martin, “New statistical moments for diagnostics of rolling element bearings,” ASME J. Manufact. Sci. Eng., vol. 119, pp. 425–432, 1997.
[15] S. Braun, “The signature analysis of sonic bearing vibration,” IEEE Trans. Sonics Ultrason., vol. SU-27, no. 6, 1980.
[16] T. Harris, Rolling Bearing Analysis, 3rd ed. New York: Wiley, 1991.
[17] C. Wang and R. Gao, “Wavelet transform with spectral post-processing for enhanced feature extraction,” in Proc. IEEE Instrumentation Measurement Technology Conf., vol. 1, 2002, pp. 315–320.
[18] S. Li and M. Elbestawi, “Tool condition monitoring in machining by fuzzy neural networks,” ASME J. Dynamic Syst., Meas. Contr., vol. 118, pp. 665–672, 1996.
[19] G. Dalpiaz, A. Rivola, and R. Rubini, “Effectiveness and sensitivity of vibration processing techniques for local fault detection in gears,” Mech. Syst. Signal Process., vol. 14, no. 3, pp. 387–412, 2000.
[20] M. Kotzalas and T. Harris, “Fatigue failure progression in ball bearings,” ASME J. Tribol., vol. 123, pp. 238–242, 2001.
[21] A. Malhi and R. Gao, “Feature selection for defect classification in machine condition monitoring,” in Proc. 20th IEEE Instrumentation Measurement Technology Conf., vol. 1, Vail, CO, May 2003, pp. 36–41.
[22] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. Wiley-Interscience, 2001.
[23] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 1998.
[24] SKF USA, Inc. SKF Interactive Bearing Catalog. [Online]. Available: http://www.skf.com/portal/skf_us/home/products?contentId=056001&lang=en
[25] Timken. Torrington Bearing Catalog. [Online]. Available: http://www.timken.com/industries/torrington/catalog/pdf/fafnir/wide_inn.pdf
[26] L. Jack and A. Nandi, “Support vector machines for detection and characterization of rolling element bearing faults,” Proc. Inst. Mech. Eng., pt. C, vol. 215, pp. 1065–1074, 2001.
[27] J. Principe, N. Euliano, and W. Lefebvre, Neural and Adaptive Systems: Fundamentals Through Simulations. New York: Wiley, 1999.

Arnaz Malhi (S’04) received the B.E. degree in mechanical engineering from the Thapar Institute of Engineering and Technology, India, in 2000. He is currently pursuing the M.S. degree at the Department of Mechanical and Industrial Engineering, University of Massachusetts, Amherst.
He has worked as a Research Assistant at the Department of Mechanical and Industrial Engineering, University of Massachusetts. His research interests include signal processing, sensors, machine condition monitoring, and health diagnosis.

Robert X. Gao (M’91–SM’00) received the M.S. and Ph.D. degrees from the Technical University Berlin, Germany, in 1985 and 1991, respectively.
He is currently an Associate Professor with the Department of Mechanical and Industrial Engineering, University of Massachusetts, Amherst, where he conducts research in the areas of integrated sensing and sensor networks, system miniaturization, structural dynamics and vibration measurement, multidomain signal processing, and wireless communication.
Dr. Gao received the 1996 National Science Foundation CAREER Award and the 1999 University of Massachusetts Outstanding Engineering Junior Faculty Award. He is an Associate Editor for the IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT and Chair of the Technical Committee on Built-in-Test and Self-Test of the IEEE Instrumentation and Measurement Society.
