
Mechanical Systems and Signal Processing (1992) 6(4), 309-316

FAULT DETECTION AND DIAGNOSIS IN LOW SPEED
ROLLING ELEMENT BEARINGS
PART II: THE USE OF NEAREST NEIGHBOUR CLASSIFICATION

C. K. MECHEFSKE AND J. MATHEW


Centre for Machine Condition Monitoring, Monash University, Melbourne, Victoria, Australia

(Received 12 February 1991, accepted 18 March 1991)

An effective procedure for automatic fault diagnosis in low speed (≤100 RPM) rolling
element bearings is described. The procedure involves the calculation of a statistical
distance measure between vibration signals. The distance measure is then automatically
used to distinguish between different fault conditions. A new trending index, based on
the statistical distance measure, which may be used for fault detection, is also described.

1. INTRODUCTION
Vibration based fault diagnosis has been referred to as somewhat of a "black art". Finding
conclusive evidence of a machine fault in a frequency spectrum can be difficult even
when one knows what to look for. For this reason, experience with particular machine
elements and with the use of frequency spectra for condition monitoring is often a
sought-after skill combination in maintenance personnel.
The scarcity of skilled and experienced personnel and the inherently complicated nature
of the various information display formats has led to increased interest in simple
monitoring tools. The use of single-number indices to simplify fault detection is commonplace
in industry. The same simple yes/no approach would facilitate dependable preliminary
fault diagnosis without the use of highly trained and experienced personnel.
This paper describes an automatic fault diagnosis technique which is shown to work
well with relatively short vibration signals collected from a low speed rolling element
bearing rig. The procedure makes use of a nearest neighbour classification scheme based
on the Kullback-Leibler information number. A new trending index which shows excellent
fault detection potential is also described. Although this application was validated using
a low speed rig, the technique can be applied to a broad range of rotating machinery
without speed restrictions.

2. NEAREST NEIGHBOUR CLASSIFICATION


Making use of an automatic classification scheme to group vibration signals into
different fault categories will remove the need for detailed analysis of the frequency
spectrum by an analyst.
When full knowledge of the underlying probabilities of a class of samples is available,
Bayesian theory gives the optimal classification of new samples. Where this information
is not available, many algorithms instead use the similarity among samples as a means
of classification. The nearest neighbour decision rule has often been used in these pattern
recognition problems.
0888-3270/92/040309+08 $03.00/0 © 1992 Academic Press Limited
The nearest neighbour decision rule was first formulated and analysed by Fix and
Hodges [1, 2]. The rule assigns to an unclassified sample the classification of the nearest
of a set of previously classified samples. Briefly, the procedure uses the characteristics of
a sample (such as mean and standard deviation) to classify the sample into a category
of samples with similar characteristics. The difference between the samples is evaluated
and minimising this difference is considered to be optimising the classification.
Cover and Hart [3] describe how, instead of the single nearest neighbour, the nearest
k neighbours (k-NN) may be used, depending on the situation. They show that, for any
number of categories, the probability of error of the nearest neighbour rule is bounded
above by twice the Bayesian optimum probability of error. They also show that, for any
number of samples, the single-NN rule has a lower probability of error than any other
k-NN rule.
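The k-NN variant just described amounts to a majority vote among the k closest labelled samples, with k = 1 recovering the single-NN rule. The following is an illustrative sketch only, not the authors' implementation; the function and variable names are assumptions.

```python
from collections import Counter

def knn_classify(neighbours, k):
    """Classify an unlabelled sample by majority vote among its k nearest
    labelled samples (k = 1 gives the single nearest neighbour rule).
    neighbours: list of (distance, label) pairs, one per labelled sample."""
    nearest = sorted(neighbours, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For example, with labelled samples at distances [(0.2, "ORF"), (0.9, "NOF"), (0.3, "ORF"), (0.8, "REF")], both k = 1 and k = 3 return "ORF".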
Devroye and Wagner [4] reviewed the single nearest neighbour rule for the infinite
sample case and concluded that two factors should be noted. First, all the data must be
stored and searched for each sample classification. Second, the nearest neighbour rule
performance deteriorates when large amounts of data are being considered. To reduce
these effects they suggest condensing or editing the data before the nearest neighbour
rule is applied. They point out an algorithm by Ritter et al. [5] which will reduce
computations and improve performance.
Csiszar [6] relates the quantity used to distinguish classes in the nearest neighbour
procedure to the I-divergence criterion, also known as the Kullback-Leibler (KL) number,
as described by Kullback [7]. He noted that certain similarities exist between the nearest
neighbour rule and Euclidean geometry, where I-divergence plays the role of squared
Euclidean distance. Classifying a sample into the category for which the I-divergence
is minimised is equivalent to grouping plotted points in Euclidean space by minimising
the squared distance from a sample point to the different groups of points.
The k-nearest neighbour decision rule has found application in a number of physical
science and biomedical situations. White and Fong [8], Paliwal and Rao [9], and Itakura
[10] have all looked at machine recognition of voice patterns. Gersch et al. [11] used the
nearest neighbour classification rules to classify electroencephalograms (EEGs) taken
from people under the influence of various amounts of anaesthesia.
Gersch et al. [12] have shown that a nearest neighbour classification scheme may be
used to automatically diagnose faults in rapidly rotating machinery. Wider acceptance
and use of this technique, particularly in applications involving low speed rotating
machinery, may be facilitated by the development of a new trending parameter which
provides good results in these situations. Such an index may be available in the trending
of the probability of misclassification value which results as a by-product of the nearest
neighbour classification calculations.

3. PROBABILITY OF MISCLASSIFICATION TRENDING INDEX


The probability of misclassification is defined as the exponential of the negative of
the distance measure used in the nearest neighbour classification scheme [13]; see
equation (8). The number represents the probability that a time series in a labelled
category may be misclassified as belonging to another category. The closer two distinct
categories are, the higher the chance of a sample being misclassified.
In order to make use of this parameter to identify the gradual development of potential
fault conditions or the presence of fault conditions that have not previously been
categorised, a slightly different view of the parameter must be taken. In this case,
monitoring the probability that a signal is normal will be more useful than comparing
signals that have been previously categorised. No literature has been found which
investigates this idea, but Braun [13] has recommended it as an avenue of research.

4. MATHEMATICAL DEVELOPMENT
In order to make use of the nearest neighbour classification scheme, a signal segment
is digitised and treated as a time series of length n. The term $x^{(j)} = \{x^{(j)}(1), \ldots, x^{(j)}(n)\}$
denotes an arbitrary n-duration time series with $j = 0, 1, \ldots, L$. Let L be the number of
alternative categories and $\theta^{(j)}$ denote the category of the j-th time series. Previously
labelled time series are represented as $x^{(m)}$, where $m = 1, 2, \ldots, L$. The term $x^{(0)}$ denotes
a new, presently unclassified time series and $x^{(m')}$ is the nearest neighbour time series to
$x^{(0)}$.
Let $d(x^{(0)}, x^{(m)})$ denote a distance measure between the time series $x^{(0)}$ and $x^{(m)}$ for
$m = 1, 2, \ldots, L$. The nearest neighbour time series to $x^{(0)}$ is the one with the smallest
distance $d(x^{(0)}, x^{(m)})$. The time series $x^{(0)}$ is then classified into the category of its nearest
neighbour. That is:

if $d(x^{(0)}, x^{(m')}) \le d(x^{(0)}, x^{(m)})$; $m = 1, 2, \ldots, L$
then $\theta^{(0)} = \theta^{(m')}$

The topic of obtaining a satisfactory distance measure, $d(x^{(0)}, x^{(m)})$, between the different
time series to allow classification is considered in the next section.
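The decision rule above reduces to a single pass over the catalogue of labelled time series, keeping the category with the smallest distance. A minimal sketch follows; the names are assumptions, and any distance measure d (such as the one developed in the next section) may be supplied.

```python
def classify_nearest_neighbour(d, x0, catalogue):
    """Assign x0 the category of the labelled time series x(m) that
    minimises d(x0, x(m)), following the nearest neighbour rule.
    catalogue: dict mapping category label -> labelled time series."""
    return min(catalogue, key=lambda category: d(x0, catalogue[category]))
```

With a toy distance based on signal means, a sample whose mean is near the "ORF" catalogue entry is classified as "ORF".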

4.1. A DISSIMILARITY MEASURE


Kullback [7] defined the I-divergence as

$$I(f_0, f_m) = \int f_0(x) \log \frac{f_0(x)}{f_m(x)} \, dx \qquad (1)$$

where $f_0$ and $f_m$ are the probability density functions of the two different random variables.
The I-divergence or KL information number, when considered in relation to Euclidean
geometry, plays the role of the squared Euclidean distance between two points. Csiszar [6]
points out that analogies exist between the properties of probability density functions
and Euclidean geometry. These analogies may allow the I-divergence numbers to be used
as a distance measure between two random variables with differing probability density
functions.
In the case where $x_0$ and $x_m$ are n-dimensional, normally distributed random variables
with n-component means $\mu_0$ and $\mu_m$, and $n \times n$ covariance matrices $\Sigma_0$ and $\Sigma_m$, then [7]

$$2I(f_0, f_m) = \log \frac{|\Sigma_m|}{|\Sigma_0|} + \mathrm{tr}\{\Sigma_m^{-1}\Sigma_0\} + \mathrm{tr}\{\Sigma_m^{-1}(\mu_0 - \mu_m)(\mu_0 - \mu_m)'\} - n \qquad (2)$$

where $|A|$ denotes the determinant of matrix A; $\mathrm{tr}(A)$ denotes the trace of matrix A;
$A^{-1}$ denotes the inverse of matrix A; $A'$ denotes the transpose of matrix A.
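Equation (2) translates directly into a few lines of linear algebra. The sketch below is an illustration rather than the authors' code (the names are assumptions); it uses the identity $\mathrm{tr}\{\Sigma_m^{-1} v v'\} = v' \Sigma_m^{-1} v$ to evaluate the mean term.

```python
import numpy as np

def kl_gaussian(mu0, sig0, mum, sigm):
    """I(f0, fm) for two n-variate Gaussians, term by term as in
    equation (2): log-determinant ratio + trace term + mean term - n,
    all divided by two."""
    n = len(mu0)
    sigm_inv = np.linalg.inv(sigm)
    diff = mu0 - mum
    two_I = (np.log(np.linalg.det(sigm) / np.linalg.det(sig0))
             + np.trace(sigm_inv @ sig0)
             + diff @ sigm_inv @ diff   # tr{Sm^-1 (mu0-mum)(mu0-mum)'}
             - n)
    return 0.5 * two_I
```

As a sanity check, the divergence of a distribution from itself is zero, and shifting one mean away from the other makes it strictly positive.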
When considering two sampled time series, $x^{(0)}$ and $x^{(m)}$, the dissimilarity measure,
$d(x^{(0)}, x^{(m)})$, may be calculated to mimic equation (2) [12]. If $\hat\mu_j$ represents the sample
mean and $\hat\Sigma_j$, for $j = 0$ or $m$, represents the sample covariance matrix of $x^{(j)}$, then

$$2d(x^{(0)}, x^{(m)}) = \log \frac{|\hat\Sigma_m|}{|\hat\Sigma_0|} + \mathrm{tr}\{\hat\Sigma_m^{-1}\hat\Sigma_0\} + \mathrm{tr}\{\hat\Sigma_m^{-1}(\hat\mu_0 - \hat\mu_m)(\hat\mu_0 - \hat\mu_m)'\} - n \qquad (3)$$

Use of this form of equation to determine a dissimilarity measure depends on the time
series having Gaussian statistics. If the probability distribution is not known, it must be
assumed Gaussian.

4.2. COMPUTATION OF THE DISSIMILARITY MEASURE

From equation (2) it can be shown that the KL number between two Gaussian time series
may be obtained from the following equation [12]:

$$2I(f_0, f_m) = \log \frac{\sigma_m^2}{\sigma_0^2} + \frac{1}{\sigma_m^2} \sum_{i=0}^{\infty} \sum_{k=0}^{\infty} \alpha_0(i)\,\alpha_m(k)\,\gamma_0(k - i) - 1 \qquad (4)$$

where $\sigma_j^2$ is the variance; $\alpha_j$ denotes an autoregressive (AR) operator; $\gamma_j$ denotes the
covariance function.
Equation (4) requires the determination of an autoregressive representation of the
sampled time series. Given that each sampled time series has a corresponding AR model,
the dissimilarity number may be determined using equation (5):

$$2d(x^{(0)}, x^{(m)}) = \log \frac{\hat\sigma_m^2}{\hat\sigma_0^2} + \frac{1}{\hat\sigma_m^2} \sum_{i=0}^{p_0} \sum_{k=0}^{p_m} a_0(i)\,a_m(k)\,C_0(k - i) - 1 \qquad (5)$$

where $\hat\sigma_j^2$ is the sample variance; $a_j$ denotes an AR model parameter; $p_j$ denotes the AR
model order; $C_0$ is the estimated covariance function.
The estimated covariance function may be calculated using the following formulae:

$$C_j(k) = \frac{1}{n} \sum_{t=1}^{n-k} (x^{(j)}(t + k) - \bar{x})(x^{(j)}(t) - \bar{x}) \qquad (6)$$

$$\bar{x} = \frac{1}{n} \sum_{t=1}^{n} x^{(j)}(t); \qquad k = 0, 1, \ldots, p_j \qquad (7)$$
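Equations (5)-(7) reduce to a short routine once the AR models of the two signals are in hand. The sketch below is an interpretation, not the authors' code: it assumes the AR convention $a(0) = 1$ and uses the symmetry $C_0(-l) = C_0(l)$ of the covariance function, neither of which the paper states explicitly.

```python
import numpy as np

def est_cov(x, p):
    """Estimated covariance function C(k), k = 0..p, per equations (6)-(7):
    C(k) = (1/n) * sum over t of (x(t+k) - xbar)(x(t) - xbar)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    return np.array([np.dot(x[k:] - xbar, x[:n - k] - xbar) / n
                     for k in range(p + 1)])

def dissimilarity(a0, am, var0, varm, C0):
    """d(x0, xm) per equation (5). a0, am: AR parameter vectors with
    a(0) = 1; var0, varm: sample variances; C0: estimated covariance
    function of x0, indexed by lag and assumed symmetric."""
    s = sum(a0[i] * am[k] * C0[abs(k - i)]
            for i in range(len(a0)) for k in range(len(am)))
    return 0.5 * (np.log(varm / var0) + s / varm - 1.0)
```

For two identical white-noise models ($p_0 = p_m = 0$, equal variances), the double sum collapses to $C_0(0)/\hat\sigma_m^2 = 1$ and the distance is zero, as expected for indistinguishable signals.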

4.3. COMPUTATION OF THE PROBABILITY OF MISCLASSIFICATION MEASURE

Being able to calculate a measure of the dissimilarity between sampled signals allows
the signals to be classified using a data base of signals representing known faults.
However, if a comprehensive data base is not available the only classification possible
is either normal or not normal. In this case, a measure is needed to track the
progression of a given machine's deterioration over time. Using the equations described
in the preceding section, a new trending index based on the probability of misclassification
($P_e$) [13] is proposed:

$$P_e = \exp\{-d(x^{(0)}, x^{(m)})\} \qquad (8)$$

This $P_e$ number represents the probability that a time series in a labelled category may
be misclassified as belonging to another category. By modifying the probability of
misclassification, a new index, the probability of fault existence ($P_{fe}$), was formulated
[see equation (9)]. Noting the change in this number, calculated from successive signals
taken at regular time intervals, provides another method of trending the deterioration of
a machine element.

$$P_{fe} = 100 \times (1 - P_e) \qquad (9)$$
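Equations (8) and (9) are a one-line transformation of the distance measure. A minimal sketch (function names are assumptions, not from the paper):

```python
import math

def probability_of_misclassification(d):
    """Pe = exp(-d), equation (8): near 1 for nearly identical signals,
    falling towards 0 as the distance d grows."""
    return math.exp(-d)

def probability_of_fault_existence(d):
    """Pfe = 100 * (1 - Pe), equation (9): 0 for an identical pair of
    signals, approaching 100 as the signal departs from normal."""
    return 100.0 * (1.0 - probability_of_misclassification(d))
```

So a distance of zero gives $P_{fe} = 0$, while large distances drive $P_{fe}$ towards 100, which is the behaviour the trending index exploits.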

5. EXPERIMENTAL RESULTS
Vibration signals from low speed bearings in good condition and with three different
faults were collected using the apparatus and instrumentation described by Mechefske
and Mathew [14]. Data analysis prior to applying the classification scheme involved
amplitude demodulation of the vibration signal [15, 16] and the calculation of AR models
[14] based on short-length vibration signals. The statistical distance measure was then
calculated using equation (5) with the variances, covariances and AR model parameters
of the two signals being compared.
Figure 1 shows a typical raw vibration signal taken from a damaged bearing. The figure
shows short-duration impact vibrations superimposed on an otherwise predominantly
tonal signal. The impact vibrations are caused by the bearing rollers contacting a fault
which was artificially induced, in this case, in the outer race of the bearing. Figure 2
shows the same signal after amplitude demodulation. A vibration signal of just over 5 s
in length was used in the classification and AR modelling routines to represent each
bearing condition.
In Tables 1 and 2 the signals in the left-most column represent a catalogue of signals
representing known bearing conditions: no-fault (NOF), outer race fault (ORF), rolling
element fault (REF) and inner race fault (IRF). Table 1 shows the statistical distance between
signals representing known faults and various sample signals. The distance measure clearly
reaches a minimum when the test sample and the known sample represent the same

[Figure: amplitude versus time (msec), 0-800 msec]
Figure 1. Vibration signal (outer race fault).

[Figure: amplitude versus time (msec), 0-800 msec]
Figure 2. Amplitude demodulated vibration signal.


TABLE 1
Statistical distance measure between signals

                          Samples
Known faults      NOF       ORF       REF       IRF
NOF             0.0115    0.7374    0.5366    1.5494
ORF             0.3590    0.11059   1.2104    0.8434
REF             1.0450    1.3321    0.0042    0.1598
IRF             0.3062    0.2722    0.2782    0.01165

TABLE 2
Probability of fault existence (Pfe) measure between signals

                          Samples
Known faults      NOF       ORF       REF       IRF
NOF               1.2      52.2      41.5      78.8
ORF              30.2       0.6      70.2      57.0
REF              64.8      73.6       0.4      18.0
IRF              26.4      23.8      31.5       0.7

TABLE 3
Probability of fault existence (Pfe)

                      Sample signals
Known condition   NOF       ORF       REF       IRF
NOF1              1.2      52.2      41.5      78.8
NOF2              6.5      54.2      74.3      78.0
NOF3              4.9      62.6      52.6      70.1

condition. The non-zero diagonal and lack of symmetry in Table 1 are attributable to the
finite model order of the AR models. Table 2 shows the probability of fault existence
index when calculated using known faults as normal signals. Again the minima occur
where the known fault signals and sample signals represent the same condition.
Table 3 shows the performance of the probability of fault existence trending index
(Pfe) for three separate trials where NOF1, NOF2 and NOF3 are signals which represent
no-fault conditions but were recorded at different times. When a data base of signals
representing known fault conditions is not available, such an index may be used to alert
monitoring personnel to the existence of fault conditions. It can be seen that a significant
and repeatable difference exists between the good bearing signal and each of the faulty
bearing signals.

6. DISCUSSION
After fault detection has been accomplished the more difficult task of fault diagnosis
remains. In many cases, unless the services of personnel skilled and experienced in the
art of fault diagnosis are available, this task is not completed satisfactorily. The procedure
described in this paper offers a way of obtaining preliminary diagnostic results without
the need for highly trained technicians. The only requirement of the procedure is that a
catalogue or data base of previously experienced faults, for the machinery being
considered, be available. Even without a comprehensive data base the technique may still
be used as an accurate indicator of a machine's deviation from normal operating behaviour.
The results, as shown in Table 2, clearly indicate the diagnostic capabilities of the
probability of fault existence (Pfe) parameter. When the sample signals are compared to
signals representing known faults, the sample signals are correctly classified. In each case
a low value of the Pfe parameter indicates the most likely category into which the sample
signal should be classified (see the diagonal values in Table 2). This type of easily
interpretable presentation will make preliminary diagnosis of machine faults possible for
all condition monitoring practitioners regardless of training.
As mentioned previously, if a data base of known fault signals is not present, the Pfe
parameter still provides effective fault detection capabilities. Table 3 shows a set of sample
signals compared to several signals known to represent no-fault conditions. The no-fault
sample signal is correctly classified in each case. All the other signals show a much larger
probability of fault existence.
In both tables the Pfe parameter represents the probability that the sample signal is
distinct from the known signal to which it is being compared. A low value means a correct
classification. Simply looking for the lowest match in a comparison table makes fault
diagnosis a straightforward task.

7. CONCLUDING COMMENTS
This paper has investigated the possibility of automatically diagnosing faults in rolling
element bearings when using relatively short vibration signal lengths. The procedure, as
described, has been shown to provide conclusive diagnostic results without the necessity
of a trained and experienced analyst reviewing the frequency spectra. A new trending
index, known as the probability of fault existence (Pfe), has also been described and has
excellent potential for detecting fault development. Future work is planned which will
evaluate the diagnostic procedure and trending index when using signals that represent
gradual bearing deterioration.

ACKNOWLEDGEMENTS
The authors wish to acknowledge the support provided by Comalco Smelting Ltd., and
the Centre for Machine Condition Monitoring at Monash University, Melbourne,
Australia.

REFERENCES
1. E. FIX and J. L. HODGES JR. 1951 USAF School of Aviation Medicine, Randolph Field, Texas,
Project 21-49-004, Report 4, Contract AF41(128)-31. Discriminatory analysis: nonparametric
discrimination.
2. E. FIX and J. L. HODGES JR. 1952 USAF School of Aviation Medicine, Randolph Field, Texas,
Project 21-49-008, Report 11, Contract AF41(128)-31. Discriminatory analysis: small sample
performance.
3. T. M. COVER and P. E. HART 1967 IEEE Transactions on Information Theory IT-13, 21-27.
Nearest neighbour pattern classification.
4. L. DEVROYE and T. J. WAGNER 1982 Handbook of Statistics 2 (ed. P. R. Krishnaiah and
L. N. Kanal). Amsterdam: North-Holland. Nearest neighbour methods in discrimination.
5. G. L. RITTER, H. B. WOODRUFF, S. R. LOWRY and T. L. ISENHOUR 1975 IEEE Transactions
on Information Theory IT-21, 665-669. An algorithm for a selective nearest neighbour rule.
6. I. CSISZAR 1975 The Annals of Probability 3, 146-158. I-divergence geometry of probability
distributions and minimisation problems.
7. S. KULLBACK 1958 Information Theory and Statistics. New York: John Wiley.
8. G. M. WHITE and P. J. FONG 1975 IEEE Transactions on Systems, Man, and Cybernetics
SMC-8, 389. k-nearest neighbour decision rule performance in a speech recognition scheme.
9. K. K. PALIWAL and P. V. S. RAO 1982 IEEE Transactions on Pattern Analysis and Machine
Intelligence PAMI-5, 229-231. Application of k-nearest neighbour decision rule to vowel
recognition.
10. F. ITAKURA 1975 IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-23,
67-72. Minimum prediction residual principle applied to speech recognition.
11. W. GERSCH, F. M. J. YONEMOTO, M. D. LOW and J. A. McEWAN 1979 Science 205, 193-195.
Automatic classification of electroencephalograms: Kullback-Leibler nearest neighbour rules.
12. W. GERSCH, T. BROTHERTON and S. BRAUN 1983 Transactions of the ASME 105, 178-184.
Nearest neighbour time series analysis classification of faults in rotating machinery.
13. S. BRAUN 1986 Mechanical Signature Analysis: Theory and Applications. New York: Academic
Press.
14. C. K. MECHEFSKE and J. MATHEW 1990 2nd International Machine Monitoring and Diagnostics
Conference, Los Angeles, CA, pp. 108-114. Parametric spectral estimation to detect and
diagnose faults in low speed rolling element bearings: preliminary investigations.
15. J. MATHEW, A. SZCZEPANIK, B. T. KUHNELL and J. S. STECKI 1987 The Institution of
Engineers, Australia, International Tribology Conference, Melbourne, Australia, pp. 366-369.
Incipient damage detection in low speed bearings using the demodulated resonance technique.
16. M. LUO and J. MATHEW 1989 Proceedings of the Asia Vibration Conference, Shenzhen, China,
pp. 712-719. Demodulation resonance analysis: a theoretical analysis.
