
ARTICLE IN PRESS

Mechanical Systems and Signal Processing 18 (2004) 1273–1282
www.elsevier.com/locate/jnlabr/ymssp

Letter to the editor

Artificial neural networks and genetic algorithms for gear fault detection

1. Introduction

Condition monitoring is gaining importance in industry because of the need to increase machine availability. The use of vibration and acoustic emission (AE) signals is quite common in the field of condition monitoring of rotating machinery [1–5], with potential applications of artificial neural networks (ANNs) in automated detection and diagnosis [2,4,5]. Multi-layer perceptrons (MLPs) and radial basis functions (RBFs) are the most commonly used ANNs [6,7], though interest in probabilistic neural networks (PNNs) is increasing both in general [7,8] and in the area of machine condition monitoring [9,10]. Genetic algorithms (GAs) have been used to make the classification process faster and more accurate by using a minimum number of features that primarily characterise the system conditions, together with an optimised structure or parameters of the ANNs [10,11]. In a recent work [12], results of MLPs with GAs were presented for fault detection of gears using only time-domain features of vibration signals. In that approach, the features were extracted from finite segments of two signals: one with normal and the other with defective gears. In the present work, comparisons are made between the performance of three different types of ANNs, both without and with automatic selection of features and classifier parameters, for the dataset of [12]. The roles of different vibration signals, obtained under both normal and light loads and at low and high sampling rates, are investigated. The results show the effectiveness of the extracted features from the acquired and preprocessed signals in diagnosing the machine condition. The procedure is illustrated using the vibration data of an experimental setup with normal and defective gears [13].

2. Vibration data and feature extraction

In [13], vibration signals measured by seven accelerometers on a pump set driven by an electric motor through a two-stage gear reduction unit were presented. The sensors were placed near the driving shaft and the bearings supporting the gear shafts. Four sets of measurements were obtained, with two levels of load (maximum and minimum) and two sampling rates (3200 and 128,000 samples/s). The sampling rates were selected to be well above the gear mesh frequency. The number of samples collected for each channel was 77,824, covering a sufficient number of cycles.

0888-3270/$ - see front matter © 2004 Published by Elsevier Ltd.


doi:10.1016/j.ymssp.2003.11.003

The samples were divided into 38 segments of 2048 samples each, which were further processed [12] to extract nine features (1–9): mean, root mean square, variance, skewness, kurtosis and normalised fifth to ninth central moments. The effect of segment size on the variation of features between segments was studied and the present segment size (2048 data points) was chosen. Similar features were extracted from the derivative and integral of the signals (10–27) and from low- and high-pass filtered signals (28–45) [12]. The procedure of feature extraction was repeated for two load conditions, two sampling rates (high and low) and two gear conditions (normal and defective), giving a total feature set of size 45 × 266 × 2 × 2 × 2. Each of the features was normalised by dividing each row by its absolute maximum value, keeping the inputs within ±1 for better speed and success of the network training. However, a scheme of statistical normalisation with zero mean and a standard deviation of 1 for each feature set was also attempted. The results comparing the effectiveness of the two normalisation schemes (magnitude and statistical) are discussed in Section 5.4.
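As an illustration, the feature computation and the two normalisation schemes described above can be sketched as follows. This is a minimal Python/NumPy sketch, not the original processing (which was done in Matlab); the exact grouping of the features follows [12], and the function names are the present author's.

```python
import numpy as np

def time_domain_features(segment):
    """Time-domain features of one signal segment: mean, RMS, variance and
    the normalised third to ninth central moments (skewness, kurtosis and
    the fifth to ninth moments), as listed in the text."""
    x = np.asarray(segment, dtype=float)
    mu = x.mean()
    rms = np.sqrt(np.mean(x ** 2))
    var = x.var()
    sigma = np.sqrt(var)
    # normalised n-th central moment: E[(x - mu)^n] / sigma^n
    moments = [np.mean((x - mu) ** n) / sigma ** n for n in range(3, 10)]
    return np.array([mu, rms, var] + moments)

def magnitude_normalise(F):
    """Divide each feature row by its absolute maximum -> values within +/-1."""
    F = np.asarray(F, dtype=float)
    return F / np.abs(F).max(axis=1, keepdims=True)

def statistical_normalise(F):
    """Zero mean and unit standard deviation for each feature row."""
    F = np.asarray(F, dtype=float)
    return (F - F.mean(axis=1, keepdims=True)) / F.std(axis=1, keepdims=True)
```

Each column of the resulting feature matrix corresponds to one 2048-sample segment, and each row to one feature, so the normalisation is applied row-wise as in the text.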

3. Artificial neural networks

There are numerous applications of ANNs in data analysis, pattern recognition and control [6,7]. Among the different types of ANNs, three, namely the MLP, RBF and PNN, are considered in this work. Here a brief introduction to ANNs is given; readers are referred to the texts [6,7] for details.

3.1. Multi-layer perceptron

The feed-forward MLP neural network used in this work consisted of three layers: input, hidden and output. The input layer had nodes representing the normalised features extracted from the measured vibration signals. The number of input nodes was varied from 3 to 45 and the number of output nodes was 2. The number of hidden nodes was varied between 10 and 30, similar to [12].
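A three-layer feed-forward MLP of this kind can be sketched in NumPy as below. The paper trained its networks in Matlab and does not state the activation functions or training algorithm, so the tanh hidden units, sigmoid outputs and plain batch gradient descent on mean-squared error used here are assumptions for illustration only.

```python
import numpy as np

def init_mlp(n_in, n_hidden, n_out, rng):
    """Random initial weights for an input-hidden-output network."""
    return {"W1": rng.normal(0, 0.5, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0, 0.5, (n_hidden, n_out)), "b2": np.zeros(n_out)}

def forward(net, X):
    """Hidden activations (tanh) and sigmoid outputs for inputs X."""
    H = np.tanh(X @ net["W1"] + net["b1"])
    out = 1.0 / (1.0 + np.exp(-(H @ net["W2"] + net["b2"])))
    return H, out

def train(net, X, Y, lr=0.5, epochs=2000):
    """Batch gradient descent on mean-squared error via backpropagation."""
    for _ in range(epochs):
        H, out = forward(net, X)
        d2 = (out - Y) * out * (1 - out)        # output-layer delta (sigmoid)
        d1 = (d2 @ net["W2"].T) * (1 - H ** 2)  # hidden-layer delta (tanh)
        net["W2"] -= lr * H.T @ d2 / len(X)
        net["b2"] -= lr * d2.mean(axis=0)
        net["W1"] -= lr * X.T @ d1 / len(X)
        net["b1"] -= lr * d1.mean(axis=0)
    return net
```

With two output nodes, the targets can be coded (1, 0) for the normal class and (0, 1) for the faulty class, matching the two-node output scheme described in Section 5.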

3.2. Radial basis function networks

The structure of an RBF network is similar to that of an MLP. The activation function of the hidden layer is a Gaussian spheroid function:

y(x) = exp(−‖x − c‖² / 2σ²).  (1)

The output of the hidden neuron gives a measure of the distance between the input vector x and the centroid c of the data cluster. The parameter σ represents the radius of the hypersphere. This parameter is generally determined using an iterative process, selecting an optimum width on the basis of the full data sets. However, in the present work, the width is selected along with the relevant input features using the GA-based approach. The RBFs were created, trained and tested using Matlab through a simple iterative algorithm of adding neurons to the hidden layer until the performance goal is reached.

3.3. Probabilistic neural networks

The structure of a PNN is similar to that of an RBF, both having a Gaussian spheroid activation function in the first of the two layers. The linear output layer of the RBF is replaced with a competitive layer in the PNN, which allows only one neuron to fire while all others in the layer return zero. The major drawback of PNNs used to be the computational cost of the potentially large hidden layer, whose size can equal the number of training examples. The PNN can act as a Bayesian classifier, approximating the probability density function (PDF) of a class using Parzen windows [6]. The generalised expression for the Parzen-approximated PDF at a given point x in feature space is

f_A(x) = [1 / ((2π)^{p/2} σ^p N_A)] Σ_{i=1}^{N_A} exp(−‖x − c_i‖² / 2σ²),  (2)

where p is the dimensionality of the feature vector and N_A is the number of examples of class A used for training the network. The parameter σ represents the spread of the Gaussian function and has a significant effect on the generalisation of a PNN.

One of the problems with the PNN is handling skewed training data, where the data from one class significantly outnumber those from the other. Skewed data are likely in a real environment, as the amount of data for the normal machine condition would, in general, be much larger than for the faulty condition. A basic assumption in the PNN approach concerns the so-called prior probabilities: the proportional representation of classes in the training data should match, to some degree, the actual representation in the population being modelled [6,8]. If the prior probability differs from the level of representation in the training cases, the accuracy of classification is reduced. To compensate for this mismatch, the a priori probabilities can be given as input to the network and the class weightings adjusted accordingly at the binary output nodes of the PNN [6,8]. If the a priori probabilities are not known, the training data set should be large enough for the PDF estimators to asymptotically approach the underlying probability density. The skewed training set problem also affects MLPs.

In the present work, the data sets have equal numbers of samples from normal and faulty gear conditions. The PNNs were created, trained and tested using Matlab. The width parameter is generally determined using an iterative process, selecting an optimum value on the basis of the full data sets. However, in the present work the width is selected along with the relevant input features using the GA-based approach, as in the case of the RBFs.
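Eq. (2) and the competitive output layer can be sketched as below; a minimal Python/NumPy illustration with hypothetical function names, not the Matlab implementation used in the paper. The optional prior weights correspond to the a priori probability correction discussed above.

```python
import numpy as np

def parzen_pdf(x, class_examples, sigma):
    """Parzen-window PDF estimate of Eq. (2) at point x for one class."""
    c = np.asarray(class_examples, dtype=float)
    p = c.shape[1]                                   # feature dimensionality
    d2 = ((c - x) ** 2).sum(axis=1)                  # ||x - c_i||^2
    norm = (2.0 * np.pi) ** (p / 2.0) * sigma ** p * len(c)
    return np.exp(-d2 / (2.0 * sigma ** 2)).sum() / norm

def pnn_classify(x, classes, sigma, priors=None):
    """Competitive output layer: the class with the largest (prior-weighted)
    Parzen density wins; all other outputs are zero."""
    if priors is None:
        priors = [1.0 / len(classes)] * len(classes)
    scores = [w * parzen_pdf(x, ex, sigma) for w, ex in zip(priors, classes)]
    return int(np.argmax(scores))
```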

4. Genetic algorithms

GAs have been considered with increasing interest in a wide variety of applications. These algorithms search the solution space through simulated evolution of 'survival of the fittest', solving linear and non-linear problems through mutation, crossover and selection operations applied to individuals in the population [14]. The basic issues of GAs, in the context of the present work, are briefly discussed in this section.

A population size of 10 individuals was used, starting with randomly generated genomes. The GA was used to select the most suitable features and one variable parameter related to the particular classifier: the number of neurons in the hidden layer for the MLP and the Gaussian kernel width (σ) for the RBF and PNN. For a training run needing N different inputs to be selected from a set of Q possible inputs, the genome string consists of N + 1 real numbers. The first N numbers (x_i, i = 1, …, N) in the genome are constrained to be in the range 1 ≤ x_i ≤ Q:

X = {x_1, x_2, …, x_N, x_{N+1}}^T.  (3)

The last number x_{N+1} has to be within the range S_min ≤ x_{N+1} ≤ S_max. The parameters S_min and S_max represent, respectively, the lower and upper bounds on the classifier parameter. A probabilistic selection function, namely normalised geometric ranking [15], was used such that the better individuals, based on the fitness criterion in the evaluation function, have a higher chance of being selected. A non-uniform mutation function [14] was adopted, using a random number for mutation based on, among other parameters, the current generation and the maximum generation number. Heuristic crossover [14], producing a linear extrapolation of two individuals based on the fitness information, was chosen. The maximum number of generations was adopted as the termination criterion for the solution process. The classification success for the test data was used as the fitness criterion in the evaluation function.
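A much-simplified sketch of the genome of Eq. (3) and the evolution loop is given below. It is not the procedure of [14,15]: truncation selection, uniform crossover and single-gene mutation stand in for normalised geometric ranking, heuristic crossover and non-uniform mutation, and a nearest-centroid classifier stands in for the trained ANN in the fitness evaluation (the classifier parameter x_{N+1} is carried in the genome but ignored by this stand-in). Feature indices are 0-based here rather than the 1 ≤ x_i ≤ Q of the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(genome, Xtr, ytr, Xte, yte):
    """Test-set classification success of a stand-in nearest-centroid
    classifier on the genome's selected feature columns."""
    feats = genome[:-1].astype(int)                  # 0-based feature indices
    c0 = Xtr[ytr == 0][:, feats].mean(axis=0)
    c1 = Xtr[ytr == 1][:, feats].mean(axis=0)
    T = Xte[:, feats]
    pred = (np.linalg.norm(T - c1, axis=1) <
            np.linalg.norm(T - c0, axis=1)).astype(int)
    return (pred == yte).mean()

def ga_select(Xtr, ytr, Xte, yte, n_feats=3, pop_size=10, gens=20,
              s_min=0.1, s_max=1.0):
    """Evolve genomes {x_1, ..., x_N, x_{N+1}}: N feature indices plus one
    classifier parameter bounded by [s_min, s_max]."""
    Q = Xtr.shape[1]
    def random_genome():
        g = np.empty(n_feats + 1)
        g[:-1] = rng.integers(0, Q, size=n_feats)
        g[-1] = rng.uniform(s_min, s_max)
        return g
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(gens):
        scores = [fitness(g, Xtr, ytr, Xte, yte) for g in pop]
        order = np.argsort(scores)[::-1]
        elite = [pop[i] for i in order[:pop_size // 2]]  # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(len(elite), size=2, replace=False)
            child = elite[a].copy()
            mask = rng.random(n_feats + 1) < 0.5         # uniform crossover
            child[mask] = elite[b][mask]
            if rng.random() < 0.3:                       # mutate one index
                child[rng.integers(0, n_feats)] = rng.integers(0, Q)
            children.append(child)
        pop = elite + children
    scores = [fitness(g, Xtr, ytr, Xte, yte) for g in pop]
    best = int(np.argmax(scores))
    return pop[best], scores[best]
```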

5. Results and discussion

The data set was split into training and test sets of size 45 × 140 × 2 × 2 × 2 and 45 × 126 × 2 × 2 × 2, respectively. No separate validation set was used because of the limited size of the available data. The number of output nodes was two for the MLPs and RBFs, and one for the PNNs. One output node for all would have been enough; however, the classification success was not satisfactory with one output node in the case of MLPs and RBFs for the present data sets with the particular choice of network structure and activation functions. The target value of the first output node was set to 1 and 0 for normal and faulty gears, respectively, and the values were interchanged (0 and 1) for the second output node. For the PNNs, the target values were specified as 0 and 1, respectively, representing normal and faulty conditions. Results are presented to show the effects of accelerometer location and signal processing on the diagnosis of machine condition using ANNs without and with GA-based feature selection. The training success in each case was 100%.

5.1. Performance comparison of ANNs without and with feature selection

In this section, classification results are presented for ANNs without and with GA-based feature selection. For each case of a straight ANN, the number of neurons in the hidden layer was 24, and for straight RBFs and PNNs the widths (σ) were kept constant at 0.50 and 0.10, respectively. These values were found on the basis of several trials of training the ANNs. In the GA-based approach, only three features were selected from the corresponding range of input features. In the case of MLPs, the number of neurons in the hidden layer was selected in the range 10–30, whereas for RBFs and PNNs the Gaussian spread (σ) was selected in the range 0.1–1.0 with a step size of 0.1.

5.1.1. Effect of sensor location


Table 1 shows the classification results for each of the sensor locations. The first nine input features (1–9) were used in the straight ANNs. The test success improved substantially in each case with feature selection. The poor performance of the straight ANNs may be attributed to an insufficient feature set, with not enough discrimination between normal and faulty conditions.

5.1.2. Effect of signal pre-processing


Table 2 shows the effects of signal processing on the classification results with the first four signals (1–4) and features from the corresponding ranges. Test success improved with the GA, in some cases to 100%.

5.1.3. Effect of load and sampling rate


Table 3 shows the effects of load and signal sampling rate on the classification results using the full feature range (1–45) for the first four signals (1–4). For most cases with the GA, test success improved to 100%.

Table 1
Performance comparison of classifiers without and with feature selection for different sensor locations

Data set   Test success (%)
           MLP                RBF                PNN
           Straight  With GA  Straight  With GA  Straight  With GA
Signal 1   88.89     100      50.00     100      91.67     100
Signal 2   95.83     100      50.00     100      83.33     100
Signal 3   100       100      55.56     100      94.44     100
Signal 4   87.50     94.44    91.67     100      80.56     94.44
Signal 5   48.61     100      63.89     100      55.56     100
Signal 6   56.94     98.61    91.67     94.44    63.89     98.61
Signal 7   87.50     100      55.56     100      77.78     100

Table 2
Performance comparison of classifiers without and with feature selection for different signal preprocessing

Data set                   Test success (%)
                           MLP                RBF                PNN
                           Straight  With GA  Straight  With GA  Straight  With GA
Signals 1–4                96.53     100      97.92     99.31    98.61     98.61
Derivative/integral        97.92     100      97.92     95.14    97.92     97.92
High-/low-pass filtering   94.44     100      97.92     100      99.31     100

Table 3
Performance comparison of classifiers without and with feature selection for different loads and sampling rates

Load  Sampling rate   Test success (%)
                      MLP                RBF                PNN
                      Straight  With GA  Straight  With GA  Straight  With GA
Max   Low             100       100      97.92     97.92    99.31     100
Min   Low             96.88     100      94.44     100      100       100
Max   High            80.21     100      86.81     99.31    96.53     100
Min   High            97.57     100      97.92     100      99.31     100

Table 4
ANN performance for different numbers of selected features

Number of selected features   Test success (%)
                              MLP     RBF     PNN
3                             100     100     100
4                             99.65   100     100
5                             100     100     100
6                             100     100     100

5.2. Effect of number of selected features

Table 4 shows the test classification success for the ANNs with the number of selected features varying from 3 to 6 out of 45, for the first four signals at low load and low sampling rate. The test success was almost 100% for all three classifiers.

5.3. Performance of PNNs with selection of 6 features

The performance of PNNs with six features selected from the corresponding ranges was studied. The test success was 100% for all cases except one. The computation time (on a PC with a 533 MHz Pentium III processor and 64 MB RAM) for training the PNNs was also noted for each case. These values (39.397–61.468 s) were not much different from those for PNNs with three features (36.983–56.872 s), but higher than for the straight cases (0.250–1.272 s). These values were substantially lower than for RBFs and MLPs; however, a direct comparison is not made among the ANNs because of differences in code efficiency.

5.4. Results using statistical normalisation

The data sets discussed in the previous sections were normalised in magnitude to keep the features within ±1. The procedure of GA-based selection of Section 5.2 was repeated for PNNs using the statistically normalised features. The test classification success was almost 100% for both normalisation schemes. Training time increased somewhat with a higher number of features, but not in direct proportion.

5.5. Separability of data sets

To investigate the separability of the data sets with and without gear fault, the three features selected by the GA for the MLP, RBF and PNN are shown in Figs. 1(a)–(c), respectively, for magnitude-normalised features. In all three cases, the data clusters are quite well separated with only a small amount of overlap; the separation is best for the PNN. This explains the 100% classification success even with only three features for all three classifiers. Fig. 2 shows the scatter plot of the three statistically normalised features selected by the GA for the PNN. This also shows good separation of the data clusters, explaining the 100% classification success.

Fig. 1. Scatter plots of GA selected features with magnitude normalisation: (a) MLP, (b) RBF, (c) PNN.

Fig. 2. Scatter plot of GA selected features with statistical normalisation for PNN.

5.6. Comparison with other techniques

Principal component analysis (PCA) is used to reduce the dimensionality of data by forming a new set of variables, known as principal components (PCs), representing the maximal variability in the data [16]. Figs. 3(a) and (b) show the plots of the first three PCs for the magnitude- and statistically normalised feature sets, respectively. These PCs account for more than 60% of the variability of the feature sets. The separation between the data clusters for the two classes is not very prominent. The classification success using the first three to eight PCs of the magnitude-normalised data is presented in Table 5. The results were very unsatisfactory: 54.86–67.71% for MLPs, 50.00–77.78% for RBFs and 65.97–85.42% for PNNs, compared with the GA-based feature selection procedure, which gave almost 100% classification success for all three classifiers. This shows the superiority of the present approach of GA-based feature selection over using the PCs.
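The PC scores compared here can be computed by the standard SVD route, sketched below; this is an illustration of the textbook method [16], not the authors' code, and the function name is the present author's.

```python
import numpy as np

def principal_components(F, k=3):
    """Scores of the first k principal components of the (samples x features)
    matrix F, via SVD of the mean-centred data, together with the fraction of
    total variance each retained PC accounts for."""
    Fc = F - F.mean(axis=0, keepdims=True)          # centre each feature
    U, S, Vt = np.linalg.svd(Fc, full_matrices=False)
    scores = Fc @ Vt[:k].T                          # projections onto first k PCs
    explained = S[:k] ** 2 / (S ** 2).sum()         # variance fractions
    return scores, explained
```

Summing `explained` over the first three PCs gives the "more than 60% of variability" figure quoted above for the present feature sets.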
Fig. 3. Scatter plots of principal components for features with different normalisation schemes: (a) magnitude, (b) statistical.

Table 5
ANN performance with principal components (PCs)

Number of first PCs   Test success (%)
                      MLP      RBF      PNN
3                     65.28    63.89    65.97
4                     67.71    77.78    74.31
5                     54.51    57.64    75.69
6                     64.24    53.47    83.33
8                     54.86    50.00    85.42

To summarise, the classification success of MLPs and PNNs without feature selection was comparable, and better than that of RBFs for most of the cases considered, but almost all were substantially below 100%. The use of GAs with only three features gave almost 100% classification for most of the test cases, even with different load conditions and sampling rates, for all ANNs. The use of six selected features with PNNs gave almost 100% test success for the cases considered. The training time with feature selection was reasonable for PNNs and substantially lower than for RBFs and MLPs, both with and without feature selection. The classification success of the ANNs compares well with that of support vector machines for the same data and feature sets [12] and with other work on GA-based MLPs [10]. However, it should be mentioned that the ANN-based approach has the inherent limitation that, for a changed machine condition with a different load, retraining of the ANNs may be required [5].

Acknowledgements

The data set was acquired in the Delft "Machine diagnostics by neural network" project with help from TechnoFysica B.V., The Netherlands, and can be downloaded freely at http://www.ph.tn.tudelft.nl/~ypma/mechanical.html. The author thanks Dr. Alexander Ypma of Delft Technical University for making the dataset available and providing useful clarifications.

References

[1] J. Shiroishi, Y. Li, S. Liang, T. Kurfess, S. Danyluk, Bearing condition diagnostics via vibration and acoustic
emission measurements, Mechanical Systems and Signal Processing 11 (1997) 693–705.
[2] R.B. Randall (Guest Ed.), Special issue on gear and bearing diagnostics, Mechanical Systems and Signal
Processing 15 (2001) 827–993.
[3] K.R. Al-Balushi, B. Samanta, Gear fault diagnosis using energy-based features of acoustic emission signals,
Proceedings of IMechE, Part I: Journal of Systems and Control Engineering 216 (2002) 249–263.
[4] A.C. McCormick, A.K. Nandi, Classification of the rotating machine condition using artificial neural networks,
Proceedings of IMechE, Part C: Journal of Mechanical Engineering Science 211 (1997) 439–450.
[5] B. Samanta, K.R. Al-Balushi, Artificial neural network based fault diagnostics of rolling element bearings using
time-domain features, Mechanical Systems and Signal Processing 17 (2003) 317–328.
[6] P.D. Wasserman, Advanced Methods in Neural Computing, Van Nostrand Reinhold, New York, USA, 1995.
[7] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd Edition, Prentice Hall, NJ, USA, 1999.
[8] D.F. Specht, Probabilistic neural networks, Neural Networks 3 (1990) 109–118.
[9] L.B. Jack, A.K. Nandi, A.C. McCormick, Diagnosis of rolling element bearing faults using radial basis functions,
Applied Signal Processing 6 (1999) 25–32.
[10] L.B. Jack, Applications of artificial intelligence in machine condition monitoring, Ph.D. Thesis, Department of
Electrical Engineering and Electronics, the University of Liverpool, 2000.
[11] L.B. Jack, A.K. Nandi, Genetic algorithms for feature extraction in machine condition monitoring with vibration
signals, IEE Proceedings Vision and Image Signal Processing 147 (2000) 205–212.
[12] B. Samanta, Gear fault detection using artificial neural networks and support vector machines with genetic
algorithms, Mechanical Systems and Signal Processing 18 (2004) 625–644.
[13] A. Ypma, R. Ligteringen, R.P.W. Duin, E.E.E. Frietman, Pump vibration data sets, Pattern recognition group,
Delft University of Technology, 1999.
[14] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer, New York, USA, 1999.
[15] C.R. Houk, J. Joines, M.A. Kay, A genetic algorithm for function optimisation: a matlab implementation, North
Carolina State University, Report No: NCSU IE TR 95 09, 1995.
[16] I.T. Jolliffe, Principal Component Analysis, 2nd Edition, Springer, Berlin, 2002.

B. Samanta
Department of Mechanical and Industrial Engineering, College of Engineering,
Sultan Qaboos University, P.O. Box 33, PC 123, Muscat, Sultanate of Oman
E-mail address: samantab@squ.edu.om
