The use of MD-CUMSUM and NARX neural network for anticipating the remaining useful life of bearings
PII: S0263-2241(17)30464-5
DOI: http://dx.doi.org/10.1016/j.measurement.2017.07.030
Reference: MEASUR 4870
Please cite this article as: A. Rai, S.H. Upadhyay, The use of MD-CUMSUM and NARX neural network for
anticipating the remaining useful life of bearings, Measurement (2017), doi: http://dx.doi.org/10.1016/j.measurement.2017.07.030
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers
we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and
review of the resulting proof before it is published in its final form. Please note that during the production process
errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Akhand Rai, S H Upadhyay
Department of Mechanical & Industrial Engineering
Indian Institute of Technology, Roorkee.
Email: raiakhand@gmail.com, shumefme@iitr.ac.in
Abstract:
The accurate determination of remaining useful life (RUL) of bearings is of immense
importance in the condition-based maintenance of any rotating machinery. In this paper, a
data-driven prognostic approach based on a nonlinear autoregressive neural network with
exogenous inputs (NARX-NN), in combination with a wavelet-filter technique, is applied to the RUL
estimation of bearings. Firstly, the vibration signals generated in an experimental test rig are
processed with the proposed wavelet-filter to augment the impulsive characteristics of bearing
signals and improve the quality of fault feature extraction. Secondly, a variety of time-domain
features are extracted from the processed bearing signals. However, these features exhibit a
highly non-monotonic behaviour as the bearing condition degrades. To overcome this drawback,
a new health indicator (HI) based on Mahalanobis distance (MD) criterion and cumulative sum
(CUMSUM) chart is proposed in this paper. Thirdly, the NARX-NN is first designed as a time
delay neural network (TDNN). Then, the derived HI and the age of the bearing are used as inputs
with life percentage of the bearing as output in order to train the TDNN model, which unlike the
usual artificial neural networks (ANNs) performs a one-step ahead prediction of the bearing
RUL. The results suggest that the proposed method can effectively predict the RUL of bearings
with an acceptable degree of accuracy, and outperforms the use of a self-organizing map-based
indicator and the traditional feed-forward neural networks (FFNNs) for RUL inference.
Keywords: Rolling element bearings, Prognostics, Health indicator, NARX neural network,
Remaining useful life.
1. Introduction
Rolling element bearings (REBs) constitute the most critical components of any rotating
machinery. According to the literature [1], 40–50% of all motor failures occur due to the
malfunctioning of bearings. As a consequence, catastrophic breakdown of the machinery
may follow. This in turn increases the machine downtime and overhauling costs, causing huge
economic losses to the industry. The development of an efficient maintenance strategy therefore
becomes necessary to guarantee the proper functioning of the rotating machinery. Recently,
condition-based maintenance (CBM) has evolved as a competent maintenance technique and
is being widely utilized by industry. In CBM, maintenance events are planned before the
failure of the machinery, thereby reducing the risk of severe breakdowns and
avoiding unnecessary maintenance expenses [2]. The three key steps for an effective CBM are
data acquisition, data processing and decision-making [3], as depicted in Figure 1.
Diagnostics is an assessment about the existing (and past) health of a system based on
observed symptoms. It deals with fault detection, isolation and identification when fault occurs.
Prognostics, in contrast, deals with the evaluation of the future health of a machine and the time left
before the occurrence of failure, provided the current machine age and the past operation profile
are known. The time left before observing a failure is called the remaining useful life (RUL).
Although a massive amount of literature is available on the diagnostics of REBs, prognostics is an
emerging research area that has gained momentum in recent years. The precise
estimation of RUL is one of the challenging tasks in the prognostics of REBs.
The literature [4] cites three different approaches for prognostics: physics-based
prognostics, data-driven prognostics and integrated approaches. The majority of the approaches
applied to the RUL estimation of REBs are data-driven approaches that depend upon the
availability of run-to-failure bearing data for training the machine-learning models. The most
familiar machine-learning techniques are those based on neural networks. A good number of
articles are available in the literature utilizing the neural network approach for predicting the
remnant life of bearings [5-9]. For instance, Shao and Nezu [5] utilized the feed forward neural
network (FFNN) to perform single-step and multi-step ahead prediction of the bearing health
state by using a progression based prediction model of the bearing damage process. Gebraeel et
al. [6] utilized single bearing and clustered bearing FFNN models in association with weight
application techniques for anticipating the RUL of bearings. Built upon the same principle,
Huang et al. [7] exploited minimum quantization error derived from self-organizing maps (SOM-
MQE) to train the FFNNs and estimate the bearing RUL. Although the techniques discussed in
these papers are able to forecast the RUL of bearings, there are some limitations too. The weight
application techniques adopted in papers [6,7] necessitate the training of several FFNNs which is
quite time consuming. Besides, the RUL prediction methods proposed in the papers [6,7] require
the setting up of failure thresholds that needs a lot of practical experience and is human intuitive
in nature. This difficulty is addressed well in the literature [8,9] where life percentage of the
bearing was employed to build the RUL prediction models and setting up of failure threshold
was not required. The literature [8] used the Weibull universal failure rate function (WUFRF) to
fit the defect features and further supplied them as inputs to artificial neural network (ANN) for
predicting the RUL of pump bearings. Ali et al. [9] again utilized the WUFRF fitted fault
features to train the simplified fuzzy adaptive resonance theory map (SFAM) neural network and
realize prognostics of bearing RUL. WUFRF was deployed to eliminate the noise in the fault
features and attain monotonic indicators which help to minimize the ANN training and testing
inaccuracies. Unlike the previous works [5-7], the studies [8,9] emphasized the use of
monotonic features in the determination of bearing RUL. However, the use of Weibull-fitted
measurements for building the prediction model is a tedious task and difficult to implement in
a real-time environment.
A further challenge in bearing prognostics is the extraction of effective fault features from
the bearing signals. The available literature offers a variety of features for diagnosis and
prognosis of bearing defects. Among them, the time-domain features are frequently used [10-12]
and are therefore utilized in this paper. Before feature extraction, the raw bearing signals are
processed by a wavelet filter to enhance the weak periodic impulses generated by the
bearing defects and thus increase the sensitivity of the time-domain features to defective bearing
signals. In the past, wavelet-based filters have been used extensively for condition monitoring
of bearings due to their excellent multiresolution capability in time-frequency analysis [13-15].
Hong et al. [13] utilized Lempel-Ziv complexity measure based on continuous wavelet transform
(CWT) filtering to assess the severity of faults in bearings. Verma et al. [14] proposed a Morlet
wavelet filter to improve the signal-to-noise ratio of bearing signals before utilizing them for the
prognosis of bearings. Qiu et al. [15] performed a wavelet denoising of the signals using an
optimal Morlet wavelet filter and later proposed SOM-MQE to assess the bearing performance
degradation. However, the defect features extracted from the wavelet-filtered signals still cannot
be used directly to estimate the RUL of bearings, for several reasons: (i) the large
dimensionality of the feature space may introduce a lot of noise in the training process; (ii) some
features may respond to a certain defect of certain severity while others may not [15], so
a single HI is desired to comprehensively reflect the bearing condition; (iii) the
extracted features show severe fluctuations due to the unstable nature of damage propagation in
bearings and finally lose their monotonicity towards the end of the bearing lifetime. Many
recently developed HIs [16-18] suffer from this deficiency. To address these shortcomings, an HI
based on MD-CUMSUM is proposed in this paper. The MD criterion is exploited to fuse the
time-domain features, and CUMSUM is used to extract a monotonically increasing behavior of
the HI as the bearing progresses to its failure.
In this paper, a reduced structure of the NARX-NN known as the time-delay neural network
(TDNN) is utilized to forecast the RUL. Both the NARX-NN and the TDNN are primarily types of
recurrent neural networks (RNNs) [19]. RNNs have already been applied successfully to
bearing prognostics in some previous works [5,20]. The RNN architectures suggested in
these papers utilize a time window of past values of a given time-series for one-step and multi-step
ahead predictions of the same time-series. These works were, however, limited to forecasting
the bearing condition rather than estimating the RUL of bearings. On the other hand, a NARX-NN
or TDNN deals with the one-step and multi-step ahead predictions of a given time-series
with the help of a different time-series. The input time-series that assists in the prediction of
the output time-series is called the exogenous input [21]. Thus, the constructed HI serves as an
exogenous input and the output is set to the life percentage of the bearings in order to train the TDNN
and predict the bearing RUL. Although a TDNN topology is used in this paper, it will be
referred to as NARX-NN hereafter to emphasize the use of exogenous inputs. In addition,
the traditional FFNNs have no memory and exhibit forgetting behavior once the current input is
mapped to the current output. Consequently, in this study a NARX-NN is used due to its ability
to take into account the past observations that are critical in anticipating the RUL of bearings.
Thus, this paper proposes a novel approach based on a fusion of wavelet filtering, MD-CUMSUM
and NARX-NN to estimate the RUL of bearings. The remainder of this paper is
organized as follows. Section 2 discusses the wavelet-based filtering approach and the technical
background of the NARX-NN. Section 3 introduces the proposed framework for
obtaining the advocated HI and assessing the RUL. The experimental apparatus is described in
Section 4 along with the results and discussions. Finally, the work is concluded in Section 5.
2. Theoretical Background
2.1 Signature enhancement using wavelet-filter approach
A wavelet is constructed from a single function δ(t) by translation and dilation:
$$\delta_{p,q}(t) = \frac{1}{\sqrt{p}}\,\delta\!\left(\frac{t-q}{p}\right) \qquad (1)$$
where p is the scaling parameter, q is the time localization parameter and δ(t) is called the
‘mother wavelet’.
The continuous wavelet transform (CWT) of a signal x(t) with finite energy is the
convolution of x(t) with the complex conjugate of the scaled and shifted version of the mother
wavelet δ(t). Mathematically, the CWT is represented as follows:
$$\mathrm{CWT}(p,q) = \frac{1}{\sqrt{p}} \int_{-\infty}^{\infty} x(t)\,\delta^{*}\!\left(\frac{t-q}{p}\right) dt \qquad (2)$$
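where $\hat{\delta}(\omega)$ is the Fourier transform of δ(t). Eq. (3) can be utilized to obtain the inversion
formula in order to reconstruct the original signal as follows:
$$x(t) = \frac{1}{C_{\delta}} \int_{0}^{\infty}\!\int_{-\infty}^{\infty} \mathrm{CWT}(p,q)\,\delta_{p,q}(t)\,\frac{1}{p^{2}}\,dq\,dp \qquad (4)$$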
The wavelet signal processing technique offers a variety of wavelet families to process
the vibration signals. As an example, Abbasion et al. [23] used the discrete Meyer wavelet to
denoise the raw signals for effective diagnosis of the bearings. The articles [24,25] used the
Daubechies family of wavelets for signal processing and analysis. Singh et
al. [26] utilized the Morlet and complex Morlet wavelets for bearing fault detection in induction
motors. Khanam et al. [27] executed a wavelet decomposition of the bearing vibration signal
using the Symlet-5 wavelet in order to estimate the defect size on the outer race of a ball bearing.
Further information on the wavelet families can be found in reference [28].
In this paper, three mother wavelets, namely the Daubechies-10, Morlet and discrete Meyer
wavelets, are considered for analysis based on their frequent usage in earlier works.
The optimal wavelet filter is constructed in the following steps:
Step 1 The vibration signal is first decomposed through CWT into different scales using the
Daubechies-10 wavelet. Thus, a set of continuous wavelet coefficients (CWCs) are
generated at the various scales with more resolution in time and frequency domain.
Step 2 In the second step, the Shannon entropy of the CWCs is evaluated by using the relation
[29,30]:
$$Sh_{CWT}(p) = -\sum_{i=1}^{N} p(s_i)\,\log_2 p(s_i) \qquad (5)$$
where s_i denotes the ith scale and p(s_i) is the energy probability distribution computed
as:
$$p(s_i) = \frac{E(s_i)}{\sum_{i=1}^{N} E(s_i)}, \qquad E(s_i) = \sum_{j=1}^{n} \left| C(s_i,j) \right|^{2} \qquad (6)$$
where C(s_i, j) represents the CWCs at scale s_i, E(s_i) is the energy of the CWCs at scale s_i, N is
the total number of decomposed CWT scales and n is the number of CWCs at each scale.
Step 3 The steps (1)-(2) are repeated for the remaining two analyzing wavelets. In this manner,
the Shannon wavelet entropy corresponding to each of the three mother wavelets is
obtained.
Step 4 The wavelet basis with the minimum Shannon wavelet entropy value produces the
sparsest set of wavelet coefficients with maximum periodicity and energy concentration
[31, 32] and is therefore selected for further analysis.
Step 5 Once the appropriate analyzing wavelet has been achieved, the next step is to choose the
best scale. Since the objective of wavelet filtering is to retrieve the impulsive content of
the signal, the kurtosis values of the wavelet coefficients at various scales are estimated
and the scale with the highest kurtosis value is retained. The kurtosis is often used as a
measure of impulsiveness in the signal but at the same time, it is sensitive to random
impulses produced by noise [33]. Thus, the use of kurtosis only may result in a false
choice of the desired scale. The CWT decomposes the original signal into a number of
frequency components represented by various scales. Thus, the wavelet coefficients at
scales containing the bearing defect frequencies will show a higher energy content than
those of the remaining ones [34]. Hence, in this paper, the product of energy and kurtosis
(PEK) is maximized to obtain the optimal scale for fault feature extraction.
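Steps 2 to 5 can be sketched in Python as follows. This is a minimal sketch, assuming the CWT coefficients have already been computed and stored as a NumPy array of shape (N scales, n samples); the CWT itself is left out. The function names, the Pearson (non-excess) kurtosis convention and the zero-variance guard are illustrative assumptions, not details taken from the manuscript.

```python
import numpy as np

def scale_energies(coeffs):
    """Energy E(s_i) of the wavelet coefficients at each scale, Eq. (6)."""
    return np.sum(np.abs(coeffs) ** 2, axis=1)

def shannon_wavelet_entropy(coeffs):
    """Shannon entropy of the energy distribution across scales, Eq. (5)."""
    energy = scale_energies(coeffs)
    prob = energy / energy.sum()
    prob = prob[prob > 0]                 # treat 0 * log2(0) as 0
    return float(-np.sum(prob * np.log2(prob)))

def kurtosis(x):
    """Pearson kurtosis (assumed convention); returns 0 for a constant row."""
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    if variance == 0:
        return 0.0
    return float(np.mean(centered ** 4) / variance ** 2)

def best_scale_by_pek(coeffs):
    """Index of the scale that maximizes the product of energy and kurtosis."""
    energy = scale_energies(coeffs)
    kurt = np.array([kurtosis(row) for row in coeffs])
    return int(np.argmax(energy * kurt))

def select_wavelet(coeffs_by_wavelet):
    """Name of the wavelet whose coefficients give the minimum entropy (Step 4)."""
    return min(coeffs_by_wavelet,
               key=lambda name: shannon_wavelet_entropy(coeffs_by_wavelet[name]))
```

With such helpers, `select_wavelet` would be called once with a dictionary holding the coefficient arrays of the three candidate wavelets, and `best_scale_by_pek` once on the coefficients of the selected wavelet.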
$$y(n+1) = f_o\!\left[\, b_o + \sum_{h=1}^{N_h} w_{ho}\, f_h\!\left( b_h + \sum_{i=0}^{d_x} w_{ih}\, x(n-i) \right) \right] \qquad (10)$$
It should be noted that the TDNN is driven by the past values of the exogenous (external)
input x(n), in contrast to the previously used RNNs, which apply the past values of the same
time-series that is to be predicted. There is no existing standard for representing a TDNN and
therefore, in this paper, a TDNN is referred to as a NARX-NN only to distinguish it from
the conventional FFNN arrangements. Fig. 2(b) represents the structure of a TDNN deduced
from the NARX-NN by eliminating the tapped delay lines for the output time-series.
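The forward pass of Eq. (10) can be illustrated with a few lines of NumPy. This is only a sketch for a single exogenous input series: the function name, the tanh hidden activation and the linear output activation are assumptions made for illustration, and the weights would in practice come from training.

```python
import numpy as np

def tdnn_predict(x_window, w_ih, b_h, w_ho, b_o, f_h=np.tanh, f_o=lambda v: v):
    """One-step-ahead TDNN output y(n+1) following Eq. (10).

    x_window : array of length d_x + 1 holding x(n), x(n-1), ..., x(n-d_x)
    w_ih     : array (N_h, d_x + 1), weights from delayed inputs to hidden units
    b_h      : array (N_h,), hidden biases
    w_ho     : array (N_h,), weights from hidden units to the output
    b_o      : float, output bias
    f_h, f_o : hidden and output activations (tanh and identity are assumed)
    """
    hidden = f_h(b_h + w_ih @ x_window)      # inner sum over the input delays
    return float(f_o(b_o + w_ho @ hidden))   # outer sum over the hidden units

# Example with d_x = 2 delays and N_h = 1 hidden neuron (arbitrary weights)
x_window = np.array([0.5, 0.4, 0.3])
y_next = tdnn_predict(x_window,
                      w_ih=np.array([[1.0, 0.0, 0.0]]),
                      b_h=np.array([0.0]),
                      w_ho=np.array([2.0]),
                      b_o=0.1)
```

Note that no past output values enter the computation, which is what distinguishes this reduced structure from a full NARX-NN with output feedback.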
3. Proposed framework
The proposed method employed for the RUL prediction of bearings is portrayed in Figure
3. The various steps are summarized as follows:
Step 1: The vibration signals are collected through machine condition monitoring and the
acquired raw signals are denoised using the wavelet based filter proposed in section 2.1
Step 2: In this step, the eight time domain features listed in Table 1 are extracted from the
wavelet filtered signals. The computational formulas for the features are also included in
Table 1. Here $\bar{C}(s_i,j)$ and $\sigma(s_i,j)$ are the mean and standard deviation, respectively, of the
filtered signal data, calculated as:
$$\bar{C}(s_i,j) = \frac{1}{n}\sum_{j=1}^{n} C(s_i,j) \qquad (11)$$
$$\sigma(s_i,j) = \sqrt{\frac{\sum_{j=1}^{n}\left[ C(s_i,j) - \bar{C}(s_i,j) \right]^{2}}{n-1}} \qquad (12)$$
Step 3: The third step involves the fusion of the extracted time domain features by applying the MD
criterion. MD is a commonly used indicator for detecting the degradation process in
bearings [41-44]. It offers the advantages of scale-invariance and correlation
identification between different features [43]. Let the constructed feature set be denoted
by $F_{p\times q} = [f_{ij}]_{p\times q}$, i = 1,2,…,p; j = 1,2,…,q, where f_ij represents the ith observation of
the jth feature, p is the total number of observations and q is the total number of features.
To make sure that each variable contributes equally to the MD measure, the observations
in F are normalized as:
$$z_{ij} = \frac{f_{ij} - \bar{f}_j}{\sigma_j} \qquad (13)$$
$$\bar{f}_j = \frac{1}{m}\sum_{i=1}^{m} f_{ij} \qquad (14)$$
$$\sigma_j = \sqrt{\frac{\sum_{i=1}^{m}\left( f_{ij} - \bar{f}_j \right)^{2}}{m-1}} \qquad (15)$$
where m is the number of healthy observations in the dataset.
Finally, the MD for a normalized test feature vector z_ij (the q normalized features of the ith observation) is calculated using the following relation:
$$MD(i) = \frac{1}{q}\, z_{ij}\, C^{-1} z_{ij}^{T} \qquad (16)$$
where C is the correlation matrix of the normalized features.
Step 4: After the calculation of MD, the CUMSUM method is initiated to extract the required HI.
The CUMSUM chart is a well-established methodology to detect an out-of-control process
[45,46]. However, here the authors are more interested in the ability of CUMSUM charts
to provide a monotonically growing curve once the degradation in the bearing
commences. The CUMSUM calculates the upward and downward deviations from the
target value as follows [47]:
$$CS_i^{+} = \max\left[0,\; MD_i - (\mu_0 + k) + CS_{i-1}^{+}\right]$$
$$CS_i^{-} = \max\left[0,\; (\mu_0 - k) - MD_i + CS_{i-1}^{-}\right] \qquad (17)$$
where $CS_i^{+}$ and $CS_i^{-}$ are the upward and downward CUMSUM, μ0 is the target value
and k is the slack value equal to half of the process shift which is to be detected. The HI
proposed in this paper is then given as:
$$HI = CS_i^{+} \qquad (18)$$
[Figure 3: flowchart of the proposed RUL prediction framework. Recoverable blocks: machine condition monitoring; feature extraction from the filtered signal (training and testing branches); HI and age of the bearing set as NARX-NN inputs with bearing life percentage as output; delays varied and the best model selected by the lowest MSE on the 20% test data; HI and age of the bearing supplied to the optimal NARX-NN to predict bearing life percentages; RUL prediction.]
Step 5: Finally, the developed HI is used to train the NARX-NN. The age of the bearing,
being crucial to RUL determination [8,9], is also added to the input dataset. The output is
set to the life percentage of the bearing. However, before feeding to the neural network, the
features are normalized in the range 0.1-0.9 by using the relation given below [48]:
$$x_n = 0.8\,\frac{x - x_{min}}{x_{max} - x_{min}} + 0.1 \qquad (19)$$
Step 6: An extensive number of iterations is carried out to arrive at the network giving the best
performance. The optimality of the NARX-NN is decided by two parameters, i.e. the number
of hidden neurons and the number of input delays. The selection rules for the best
NARX-NN are discussed in the upcoming sections. The optimal network is then utilized to
forecast the bearing RUL.
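The construction of the HI in Steps 3 and 4 can be sketched as below. This is a minimal illustration under stated assumptions: the healthy observations are taken to be the first `n_healthy` rows of the feature matrix, the correlation matrix C is estimated from those healthy rows, the CUMSUM is started from zero, and the target value and slack are supplied by the user. The function names are hypothetical.

```python
import numpy as np

def mahalanobis_series(features, n_healthy):
    """MD of every observation relative to the healthy rows, Eqs. (13)-(16).

    features  : array (p observations, q features)
    n_healthy : number m of initial rows assumed to describe the healthy state
    """
    features = np.asarray(features, dtype=float)
    healthy = features[:n_healthy]
    mean = healthy.mean(axis=0)                       # Eq. (14)
    std = healthy.std(axis=0, ddof=1)                 # Eq. (15), m - 1 denominator
    z = (features - mean) / std                       # Eq. (13)
    corr = np.corrcoef(z[:n_healthy], rowvar=False)   # assumed: C from healthy data
    corr_inv = np.linalg.inv(corr)
    q = features.shape[1]
    # Row-wise quadratic form z_i C^-1 z_i^T, scaled by 1/q as in Eq. (16)
    return np.einsum("ij,jk,ik->i", z, corr_inv, z) / q

def upward_cusum(md, target, slack):
    """Upward CUMSUM of the MD series, Eq. (17); used as the HI in Eq. (18)."""
    hi = np.zeros(len(md))
    running = 0.0                                     # assumed starting value
    for i, value in enumerate(md):
        running = max(0.0, value - (target + slack) + running)
        hi[i] = running
    return hi
```

The upward sum stays at zero while the MD remains below the target plus slack and grows once the MD exceeds that level persistently, which is the behaviour the HI relies on.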
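Table 1 (time-domain features and their computational formulas)
RMS: $\sqrt{\frac{1}{n}\sum_{j=1}^{n} C(s_i,j)^{2}}$
Kurtosis: $\frac{1}{n}\sum_{j=1}^{n} \frac{\left[C(s_i,j)-\bar{C}(s_i,j)\right]^{4}}{\sigma(s_i,j)^{4}}$
Skewness: $\frac{1}{n}\sum_{j=1}^{n} \frac{\left[C(s_i,j)-\bar{C}(s_i,j)\right]^{3}}{\sigma(s_i,j)^{3}}$
Crest factor: $\max\left|C(s_i,j)\right| \,/\, \mathrm{RMS}$
Impulse factor: $\max\left|C(s_i,j)\right| \,/\, \frac{1}{n}\sum_{j=1}^{n}\left|C(s_i,j)\right|$
Shape factor: $\mathrm{RMS} \,/\, \frac{1}{n}\sum_{j=1}^{n}\left|C(s_i,j)\right|$
Margin factor: $\max\left|C(s_i,j)\right| \,/\, \left(\frac{1}{n}\sum_{j=1}^{n}\sqrt{\left|C(s_i,j)\right|}\right)^{2}$
Peak to Peak: $\max C(s_i,j) - \min C(s_i,j)$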
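As an illustration, the eight features named in Table 1 can be computed from one filtered signal record as sketched below. The code uses the commonly used definitions of these features; the function name, the dictionary keys and the use of absolute values in the peak-based factors are assumptions made for this sketch.

```python
import numpy as np

def time_domain_features(c):
    """Eight time-domain features of a filtered signal c (1-D array)."""
    c = np.asarray(c, dtype=float)
    mean = c.mean()                      # Eq. (11)
    std = c.std(ddof=1)                  # Eq. (12), n - 1 in the denominator
    abs_c = np.abs(c)
    rms = np.sqrt(np.mean(c ** 2))
    peak = abs_c.max()
    return {
        "rms": rms,
        "kurtosis": np.mean((c - mean) ** 4) / std ** 4,
        "skewness": np.mean((c - mean) ** 3) / std ** 3,
        "crest_factor": peak / rms,
        "impulse_factor": peak / abs_c.mean(),
        "shape_factor": rms / abs_c.mean(),
        "margin_factor": peak / np.mean(np.sqrt(abs_c)) ** 2,
        "peak_to_peak": c.max() - c.min(),
    }
```

Applying this function to every record collected over the bearing life gives one row of the feature matrix F per inspection point.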
screw handle in clockwise and anticlockwise directions. Run-to-failure tests are carried out for
collecting the vibration data over the whole life of the bearings. The bearing failure is inferred by
an abrupt increase in RMS values. If a bearing in one of the pedestals fails, the test is terminated.
The next test proceeds with new bearings in both the housings E and G. The failure time of each
bearing is recorded starting from the time of its installation.
The data acquisition arrangement consists of PCB 608A11 piezoelectric ICP
accelerometers mounted on bearing housings and a National Instruments compact Data
Acquisition (NI cDAQ-9174) system programmed with NI LabVIEW software. The vibration
data is collected for a duration of 0.1 seconds every 10 minutes. The sampling frequency is
set to 20.48 kHz, so each 0.1-second vibration record contains 2048 points.
Table 2 provides the details of the failure time and type of defect for the damaged bearings
considered in the paper.
Fig. 4(b). Bearing test rig
(A) AC motor, (B) Speed control unit, (C) Tachometer, (D) Flexible Coupling , (E) Bearing housing 1,
(F) Accelerometer 1 (G) Bearing housing 2, (H) Accelerometer 2, (I) Load disc, (J) Power screw
arrangement, (K) Data acquisition unit, (L) Computer
where $y_p^i$ and $y_a^i$ are the predicted and actual values at inspection point i, respectively, and N is the
number of inspection points.
During the training phase, the feature vectors are divided into three subsets: the training
set, the validation set and the test set. The training data is used to adjust the weights and biases,
while the validation data is used to stop the training when the network generalization stops
improving. The test data does not affect the training process and judges the network performance
in an independent manner. In the early stages of the training process, the MSE for both the training
set and the validation set decreases, but after a certain point the MSE for the validation set starts
increasing. This implies that the network has started overfitting the training data and the training
should be stopped [50]. This process of generalization is called early stopping. Here,
overfitting means that the network performs well with the validation dataset but gives poor
performance with the test dataset. As such, the optimal network in this paper is the one that
produces the lowest test MSE.
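The early stopping rule described above can be sketched in plain Python. This is an illustrative sketch only: the `patience` parameter (the number of consecutive non-improving epochs tolerated before stopping) and both function names are assumptions, since the text does not state the exact stopping criterion that was used.

```python
def mse(predicted, actual):
    """Mean squared error between predicted and actual values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def early_stopping_epoch(val_mse, patience=6):
    """Index of the epoch whose weights would be kept under early stopping.

    val_mse  : validation MSE recorded after each training epoch
    patience : assumed number of consecutive epochs without improvement
               after which training is stopped
    """
    best_epoch, best_value, waited = 0, float("inf"), 0
    for epoch, value in enumerate(val_mse):
        if value < best_value:
            best_epoch, best_value, waited = epoch, value, 0
        else:
            waited += 1
            if waited >= patience:
                break                    # validation error keeps rising: stop
    return best_epoch
```

Among several networks trained this way, the one with the lowest MSE on the independent test set would then be retained.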
two cases have been considered in the paper: Case I utilizes the bearings B2-B5 for training the
NARX-NN and bearing B1 is employed for testing. Case II tests the bearing B3 against the
NARX-NN trained by the remaining bearings. The input data is normalized between 0.1 and 0.9
using Eq. (19) so as to improve the efficiency of the NARX-NN. The test bearing data is also
normalized using the maximum and minimum values of training data. Test values less than the
minimum value are set to 0.1 and greater than the maximum value are set to 0.9. The data is
initially separated in a random manner into 80% for constituting the training and validation sets
and the remaining 20% for forming the test set. Keeping the test set fixed at 20%, the validation
set is varied in the range 10-20-30% and the division that yields the minimum test MSE is
retained. Based on iterations, the data is finally divided as 60% for training, 20% for validation
and 20% for testing respectively. After a considerable number of iterations and retraining, the
parameters for the optimal network with lowest 20% test-set MSE are attained and are provided
in Table 3. The MSEs between the actual and predicted life percentages over the complete
lifetime of bearings for cases I and II are also provided in Table 3.
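The normalization and data division described above can be sketched as follows. This is a minimal sketch: the function names and the fixed random seed are illustrative assumptions, and the 60/20/20 fractions are the final division reported in the text.

```python
import numpy as np

def fit_min_max(train):
    """Minimum and maximum of the training data, per input column."""
    train = np.asarray(train, dtype=float)
    return train.min(axis=0), train.max(axis=0)

def normalize(x, x_min, x_max):
    """Scale to [0.1, 0.9] with Eq. (19); out-of-range test values are clipped."""
    scaled = 0.8 * (np.asarray(x, dtype=float) - x_min) / (x_max - x_min) + 0.1
    return np.clip(scaled, 0.1, 0.9)

def random_split(n_samples, fractions=(0.6, 0.2, 0.2), seed=0):
    """Random index split into training, validation and test subsets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

# Test-bearing values are scaled with the training minimum and maximum
x_min, x_max = fit_min_max([0.0, 10.0])
scaled_test = normalize([0.0, 5.0, 10.0, -3.0, 20.0], x_min, x_max)
```

Clipping reproduces the rule that test values below the training minimum are set to 0.1 and values above the training maximum are set to 0.9.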
Fig. 5. (a) Raw defective signal (b) Shannon wavelet entropies for the three mother wavelets (c)
Shape of the Morlet wavelet (d) CWCs at scales 1:64 using Morlet wavelet (e) PEK variation
with scales (f) Filtered signal.
Fig. 6. Features extracted over the complete lifetime of bearing B2
Fig. 7. (a) MD and (b) HI for the entire failure history of bearing B2
Fig. 8. Proposed HI for bearings B1, B3, B4 and B5
The training results for the optimal NARX network for Case I are shown in Figs. 9(a) -9(c)
respectively. Fig. 9(a) plots the training response of the network. Fig. 9(b) depicts the learning
curves for the optimal network. Fig. 9(c) further indicates that a regression value close
to one is obtained for each of the training, validation and independent test datasets. Figs. 10(a)
and 10(b) compare the predicted and actual life percentages for the two tested bearings B1 and
B3. From the Figs. 10(a) and (b), a simple relation can be established to predict the RUL of
bearings which is given by:
$$RUL = \left(\frac{T_{100}}{T_c}\right) t_c - t_c \qquad (21)$$
where T_100 and T_c are the total life percentage and the current life percentage, respectively, and t_c is
the time for which the bearing has run, i.e. the current age of the bearing.
Table 3. Training and testing results of the optimal NARX–NN prediction model
Results                                            Case I            Case II
Training bearings                                  B2, B3, B4, B5    B1, B2, B4, B5
Testing bearings                                   B1                B3
Optimal no. of hidden neurons                      6                 20
Optimal no. of input delays                        10                8
Training MSE                                       0.0118            0.0103
Validation MSE                                     0.0119            0.0097
Independent (20%) test data MSE                    0.0113            0.0123
Overall prediction MSEs for bearings B1 and B3     0.0055            0.0099
Fig. 9. Training results for the best NARX-NN in Case I (a) Output of the network (b) Performance
of the network and (c) Regression plots for the 60% training, 20% validation and 20% test data.
As an example, suppose the examiner wishes to calculate the RUL of bearing B1
after 11,000 minutes of operation, i.e. the current age of the bearing is 11,000 minutes. From
Fig. 10(a), the current life percentage at 11,000 minutes is 0.579 × 100 = 57.9%, and the total
life percentage always remains equal to 100%. Using relation
(21), the predicted RUL of the bearing is 7,998 minutes and the predicted failure time would be
11,000 + 7,998 = 18,998 minutes. The actual RUL of the bearing is, however, equal to 8,650
minutes. The following expression can measure the prediction accuracy:
$$Accuracy = 1 - \frac{\left| RUL_{predicted} - RUL_{actual} \right|}{RUL_{actual}} \qquad (22)$$
Thus, the proposed approach predicts the RUL of bearing B1 at the inspected
time-step with a satisfactory accuracy of 92.46%. The RUL values for the
bearings B1 and B3 at various time-steps are calculated in a similar manner and are shown in
Figs. 11(a) and (b), respectively.
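The worked example for bearing B1 can be reproduced with a short Python sketch of Eqs. (21) and (22). The function names are illustrative, and the RUL is rounded to whole minutes before the accuracy is evaluated, which is an assumption about how the quoted figures were obtained.

```python
def predict_rul(current_life_pct, current_age, total_life_pct=100.0):
    """RUL from the predicted life percentage, Eq. (21)."""
    return (total_life_pct / current_life_pct) * current_age - current_age

def prediction_accuracy(rul_predicted, rul_actual):
    """Prediction accuracy, Eq. (22)."""
    return 1.0 - abs(rul_predicted - rul_actual) / rul_actual

# Bearing B1 inspected after 11,000 minutes with a predicted life percentage of 57.9%
rul_minutes = round(predict_rul(57.9, 11_000))
failure_time = 11_000 + rul_minutes
accuracy_pct = round(100 * prediction_accuracy(rul_minutes, 8_650), 2)
```

With these inputs the sketch gives a predicted RUL of 7,998 minutes, a predicted failure time of 18,998 minutes and an accuracy of 92.46%, matching the values quoted in the text.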
Fig. 10. Prediction results for Cases I and II (a) Actual vs. predicted life percentage for bearing
B1 (b) Actual vs. predicted life percentage for bearing B3
Fig. 11. Prediction results for Cases I and II (a) Actual vs. predicted RUL for bearing B1 (b)
Actual vs. predicted RUL for bearing B3
It is apparent that the damage during the later stages of bearing life grows at a much
faster rate than in the early stages, and therefore an accurate determination of RUL
during the end phases becomes more essential in order to prevent the sudden failure of bearings.
From Figs. 11(a) and (b), it is seen that in the beginning the predicted RUL differs significantly
from the actual RUL. However, as the age of the bearing advances, the predicted RUL shifts
closer to the actual RUL and the accuracy of prediction increases. Figs. 11(a) and (b) also
show that the RUL values predicted at the majority of the inspection points are below the actual
RUL values, which is desirable in bearing prognostics: an underestimated RUL value
provides an early warning of the bearing failure and offers enough time to
implement the maintenance actions. At the same time, Table 3 indicates that the prediction MSEs
achieved for the test bearings B1 and B3 (0.0055 and 0.0099, respectively) are small enough for the
purpose of RUL evaluation. The obtained results demonstrate the effectiveness of the advocated technique
in assessing the bearing RUL.
4.4 Comparisons
The proposed method differs from the previous works in two aspects: (i) it uses a new
monotonic index based on MD criterion and CUMSUM (ii) it applies a different ANN structure
i.e. NARX-NN as compared to the usual FFNN models utilized earlier. Considering the two
perspectives, a comparison with other works is established in the subsequent paragraphs.
To confirm the effectiveness of the suggested HI, we compared it with the SOM-MQE
used in the references [7, 15]. The first 500 normalized time-domain feature vectors were used to
build the SOM model and the SOM-MQE was calculated for each of the bearings B1-B5. Then,
the SOM-MQEs and the age of the bearing were supplied as inputs to train the NARX-NN and
predict the bearing RUL. The corresponding results for Case I are plotted in Figs. 12(a) and (b).
As discussed before, the prediction accuracy in the late life of the bearings is more necessary
than in the beginning phases. Therefore, to judge the superiority of the recommended HI in the
final period, the prediction MSEs over the last 50 time-steps are also calculated. Table 4 provides
the overall MSE and the MSE obtained over the last 50-time steps using the MD-CUMSUM HI
and the SOM-MQE. It is revealed that the overall prediction and the prediction for the last 50
time-steps using the proposed HI are 10% and 99.6% more accurate, respectively, than those using the SOM-MQE.
Thus, the use of the MD-CUMSUM based HI is clearly advantageous over the SOM-MQE. It is
worth mentioning that the CUMSUM can also be applied to the prevailing degradation indicators
such as SOM-MQE to improve their performance. However, the task
of validating the efficacy of CUMSUM with other HIs is beyond the scope of this paper and is
left to interested researchers.
The methodology proposed in this paper finds similarity to the literature [8] in the sense
that an ANN technique is adopted for building the prediction model and life percentage of the
bearing is used to calculate the RUL. Although the present work is inspired from the literature
[8], there exist several dissimilarities: (i) Instead of traditional RMS and kurtosis features, a more
robust measure i.e. MD is utilized in this work (ii) As an alternative to the WFRF fitted features,
CUMSUM is applied to extract the monotonicity from the MD. Consequently, the time spent in
fitting the features is eradicated. Moreover, the fitting of test feature samples being collected in
P a g e | 19
an online manner i.e. one sample at a time may give rise to errors in terms of best approximating
WFRF function (iii) The ANN used in the ref. [8] utilizes the measurement values at the current
and previous inspection points for training purpose. However, here a NARX-NN is utilized
which takes the measurement values at the past inspection points for training and outputs a one-
step ahead predicted value of the life-percentage or RUL. (iv) The selection of optimal prediction
model in the earlier work is based on the training MSE and the validation set is constructed using
the actual measurements from all the available failure histories. However, in this work, a data
division approach is employed to build the validation set and further an independent test set is
exploited to choose the best model thereby eliminating the possibility of errors arising due to
overfitting in the ANN. Let us compare the effectiveness of proposed NARX-NN structure with
the previously used ANN structure, which consists of two hidden layers with three neurons in the
first hidden layer and two neurons in the second. The ANN in [8] uses the HI and the age of the
bearings at the current and previous inspection points as inputs. The idea is to utilize the
current measurement values as well as the change in measurement values between these points.
In our case, the NARX-NN utilizes the HI and the age values at the previous inspection points
only. Here, the motive is to use the previous measurement values to obtain a one-step ahead
predicted value of the life percentage. To ensure a fair comparison, the structure of the NARX-NN
is taken to be identical to that of the ANN in [8]. The NARX-NN is constructed using two hidden
layers in which the number of neurons is set to three and two, respectively. The number of input
delays in the NARX-NN is taken as two so that the number of input nodes is identical to that of
the ANN in [8]. As suggested in [8], the training sets for the NARX-NN and the ANN are composed
of the derived HI measurements and the validation sets are composed of the actual MD
measurements. The optimal NARX-NN is again selected based on the training MSE, as for the
earlier ANN. Here, the only factor of interest is the supply of inputs at different inspection
points, i.e. at the current and previous time-steps for the ANN and at the previous time-steps
only for the NARX-NN. Both the ANN and the NARX-NN are trained several times, and the networks
giving the minimum training MSEs are retained for RUL prediction. The associated prediction
MSEs for Case I are tabulated in Table 5. Table 5 shows that, although the overall prediction
MSEs of the two NNs remain almost the same, the NARX-NN achieves a 58% lower MSE than the ANN
used before over the last 50 time-steps. Also, the proposed NARX-NN structure imposes no
restriction on the number of delays, which can be optimized to achieve still lower prediction
MSEs.
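The difference in input timing between the two networks can be illustrated with a minimal Python sketch. The function names, the column ordering and the toy data are assumptions made for illustration; they are not taken from [8] or from the authors' implementation. Only the exogenous input side of the NARX-NN is shown, and any feedback of past outputs is omitted.

```python
import numpy as np

def ann_inputs(age, hi):
    """Inputs in the style described for the ANN of [8] (assumed ordering):
    age and HI at the previous (k-1) and current (k) inspection points."""
    age, hi = np.asarray(age, float), np.asarray(hi, float)
    # Row for inspection point k = 1..n-1: [age[k-1], age[k], hi[k-1], hi[k]]
    return np.column_stack([age[:-1], age[1:], hi[:-1], hi[1:]])

def narx_inputs(age, hi, delays=2):
    """Inputs in the style described for the NARX-NN (assumed ordering):
    age and HI at the `delays` previous inspection points only."""
    age, hi = np.asarray(age, float), np.asarray(hi, float)
    # Row for inspection point k = delays..n-1 uses samples k-delays .. k-1,
    # so the network output for point k is a one-step ahead prediction.
    rows = [np.concatenate([age[k - delays:k], hi[k - delays:k]])
            for k in range(delays, len(age))]
    return np.array(rows)

# Toy data (hypothetical): five inspection points
age = [0, 10, 20, 30, 40]
hi = [0.0, 0.1, 0.3, 0.6, 1.0]
print(ann_inputs(age, hi).shape)   # four input nodes per sample
print(narx_inputs(age, hi).shape)  # four input nodes per sample with two delays
```

With two delays, both constructions yield four input nodes, which is consistent with the equal-input-node setting used in the comparison. The same four values that the ANN pairs with the life percentage at the current point are paired by the NARX-NN with the life percentage at the next point.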
We further extended the comparison to a simple FFNN. The structure of the FFNN is taken
to be similar to that of the NARX-NN, i.e. it consists of an input layer, a hidden layer and an
output layer. The FFNN, however, utilizes only the HI and the bearing age at the current
inspection point as inputs. The output of the network remains the same, i.e. the bearing life
percentage. Here, the optimal FFNN is determined by the number of hidden neurons only, which is
chosen in the same manner as discussed in this paper for the NARX-NN. The number of hidden
neurons is finally set to 10 and the network is run eight times to obtain the best model. The
prediction results of the FFNN are shown in Table 5. Compared with the traditional FFNN model,
the NARX-NN reduces the overall prediction MSE by 8% and the MSE over the final 50 time-steps
by 50.33%.
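The quoted improvements are consistent with relative reductions in MSE computed from the Table 5 entries. The short check below assumes that this is how the percentages were obtained; the helper name is hypothetical.

```python
def relative_reduction(baseline_mse, proposed_mse):
    """Percentage by which proposed_mse is lower than baseline_mse."""
    return 100.0 * (baseline_mse - proposed_mse) / baseline_mse

# MSE values copied from Table 5 (Case I, test bearing B1)
narx_overall, ann_overall, ffnn_overall = 0.0059, 0.0061, 0.0064
narx_last50, ann_last50, ffnn_last50 = 1.49e-04, 3.58e-04, 3.00e-04

print(round(relative_reduction(ann_last50, narx_last50), 2))    # 58.38
print(round(relative_reduction(ffnn_overall, narx_overall), 2)) # 7.81
print(round(relative_reduction(ffnn_last50, narx_last50), 2))   # 50.33
```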
The authors would like to mention that these comparisons of NN models should be regarded as
valid only for the given settings of the network parameters. A more rigorous comparison would
require the use of optimization methods such as genetic algorithms or particle swarm
optimization to make a more accurate choice of the number of hidden neurons, or of both the
hidden neurons and the input delays, depending upon the network architecture. This will be
studied in future work. However, whichever NN model is utilized, the HI based on MD-CUMSUM
gives lower prediction errors in all the cases and can be used effectively to predict the
bearing RUL. In addition, the one-step ahead output value predicted by the NARX-NN provides an
early indication of the RUL, which is particularly valuable when the bearing is close to failure.
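As a rough illustration of how a predicted life percentage translates into an RUL, the sketch below assumes that the life percentage is defined as the current age divided by the failure time. This definition and the function name are assumptions made for illustration and are not quoted from the paper.

```python
def rul_from_life_fraction(age, life_fraction):
    """RUL implied by a predicted life fraction (assumed: age / failure time)."""
    if not 0.0 < life_fraction <= 1.0:
        raise ValueError("life_fraction must lie in (0, 1]")
    predicted_failure_time = age / life_fraction
    return predicted_failure_time - age

# Hypothetical example: the bearing is at age 750 (arbitrary time units) and
# the network predicts that 75% of its life has elapsed.
print(rul_from_life_fraction(750.0, 0.75))  # 250.0
```

Under this assumed definition, an error in the predicted life fraction has a smaller absolute effect on the RUL as the fraction approaches one, which fits the emphasis on accuracy near failure.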
Table 4. Comparison results for the HI suggested in the paper and the SOM-MQE for Case I
Results                                       SOM-MQE           Suggested HI
Training bearings                             B2, B3, B4, B5    B2, B3, B4, B5
Testing bearings                              B1                B1
Prediction MSE over the full lifetime         0.0061            0.0055
Prediction MSE over the last 50 time-steps    0.0042            1.65e-05
Table 5. Comparison results of the NARX-NN with ANN used in [8] and the traditional FFNN
Results                            NARX-NN           ANN [8]           FFNN
Training bearings                  B2, B3, B4, B5    B2, B3, B4, B5    B2, B3, B4, B5
Testing bearings                   B1                B1                B1
Overall prediction MSE             0.0059            0.0061            0.0064
MSE over the last 50 time-steps    1.49e-04          3.58e-04          3.00e-04
Fig. 12. Prediction results for Case I using SOM-MQE (a) Actual vs. predicted life percentage
for bearing B1 (b) Actual vs. predicted RUL for bearing B1
5. Conclusions
This paper proposes a neural network approach in combination with a wavelet-based
denoising method to assess the RUL of a bearing. The raw signals are treated with the wavelet
filter before the feature extraction process. The extracted features are then fused using the MD
criterion. The derived MD is subjected to the CUMSUM process to obtain a monotonically
increasing trend towards the end of the bearing's life. The method omits the Weibull fitting
phase suggested in some of the research articles discussed in the introduction. Finally, the HI
based on MD-CUMSUM is utilized to train the NARX-NN and estimate the bearing RUL. The
experimental results indicate that the predictions achieved by the proposed method become
more accurate in the later stages of the bearing's life. The comparison results demonstrate that
the suggested HI outperforms the commonly used indicator SOM-MQE and, further, that the
adopted NN configuration provides better prediction results than the conventional FFNN. The
efficiency and accuracy of the proposed method can be enhanced by training the NARX model with
a group of bearings working under different load and speed combinations, which will be a
subject of future study. Future work also aims to combine the method with optimization
techniques such as genetic algorithms and particle swarm optimization, so that the optimal
network can be selected systematically instead of by the trial-and-error method used in this paper.
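The fusion and trending steps summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the MD is computed relative to a set of healthy-state feature vectors, and a plain running sum stands in for the CUMSUM step. The paper's exact CUMSUM formulation is not reproduced here, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def mahalanobis_distance(features, healthy):
    """MD of each feature vector from the healthy-state distribution."""
    features = np.asarray(features, float)
    healthy = np.asarray(healthy, float)
    mean = healthy.mean(axis=0)
    cov = np.atleast_2d(np.cov(healthy, rowvar=False))
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular covariance
    diff = features - mean
    squared = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(squared, 0.0))

def md_cumsum_hi(md):
    """Running sum of the MD; non-decreasing because the MD is non-negative."""
    return np.cumsum(np.asarray(md, float))

# Synthetic example (assumed data): three features drifting away from baseline
rng = np.random.default_rng(0)
healthy = rng.normal(size=(50, 3))
degrading = rng.normal(size=(20, 3)) + np.linspace(0.0, 5.0, 20)[:, None]

md = mahalanobis_distance(degrading, healthy)
hi = md_cumsum_hi(md)
print(bool(np.all(np.diff(hi) >= 0)))  # True
```

Because each MD value is non-negative, the running sum cannot decrease, which is the property that removes the non-monotonic fluctuations of the individual features.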
Acknowledgements
The authors gratefully acknowledge the financial support provided by the Aeronautics Research
and Development Board (ARDB), Government of India (Project Grant Number DRD-826-MID).
References
[1] Nandi, S., Toliyat, H., & Li, X. (2005). Condition monitoring and fault diagnosis of
electrical motors-a review. Energy Conversion, IEEE Transactions on, 20(4), 719-729.
[2] Shin, J. H., & Jun, H. B. (2015). On condition based maintenance policy. Journal of
Computational Design and Engineering, 2(2), 119-127.
[3] Jardine, A. K., Lin, D., & Banjevic, D. (2006). A review on machinery diagnostics and
prognostics implementing condition-based maintenance. Mechanical systems and signal
processing, 20(7), 1483-1510.
[4] Heng, A., Zhang, S., Tan, A. C., & Mathew, J. (2009). Rotating machinery prognostics:
State of the art, challenges and opportunities. Mechanical Systems and Signal
Processing, 23(3), 724-739.
[5] Shao, Y., & Nezu, K. (2000). Prognosis of remaining bearing life using neural
networks. Proceedings of the Institution of Mechanical Engineers, Part I: Journal of
Systems and Control Engineering, 214(3), 217-230.
[6] Gebraeel, N., Lawley, M., Liu, R., & Parmeshwaran, V. (2004). Residual life predictions
from vibration-based degradation signals: a neural network approach. Industrial
Electronics, IEEE Transactions on, 51(3), 694-700.
[7] Huang, R., Xi, L., Li, X., Liu, C. R., Qiu, H., & Lee, J. (2007). Residual life predictions
for ball bearings based on self-organizing map and back propagation neural network
methods. Mechanical Systems and Signal Processing, 21(1), 193-207.
[8] Tian, Z. (2012). An artificial neural network method for remaining useful life prediction
of equipment subject to condition monitoring. Journal of Intelligent
Manufacturing, 23(2), 227-237.
[9] Ali, J. B., Chebel-Morello, B., Saidi, L., Malinowski, S., & Fnaiech, F. (2015). Accurate
bearing remaining useful life prediction based on Weibull distribution and artificial
neural network. Mechanical Systems and Signal Processing, 56, 150-172.
[10] Yang, Y., Liao, Y., Meng, G., & Lee, J. (2011). A hybrid feature selection scheme for
unsupervised learning and its application in bearing fault diagnosis. Expert Systems with
Applications, 38(9), 11311-11320.
[11] Shen, C., Wang, D., Kong, F., & Tse, P. W. (2013). Fault diagnosis of rotating
machinery based on the statistical parameters of wavelet packet paving and a generic
support vector regressive classifier. Measurement, 46(4), 1551-1564.
[12] Ali, J. B., Fnaiech, N., Saidi, L., Chebel-Morello, B., & Fnaiech, F. (2015). Application
of empirical mode decomposition and artificial neural network for automatic bearing
fault diagnosis based on vibration signals. Applied Acoustics, 89, 16-27.
[13] Hong, H., & Liang, M. (2009). Fault severity assessment for rolling element bearings
using the Lempel–Ziv complexity and continuous wavelet transform. Journal of sound
and vibration, 320(1), 452-468.
[14] Verma, A. K., Sreejith, B., & Srividya, A. (2010). Roller Bearing Defect Prognosis using
Likelihood Parameters and Proportional Hazards Model. International Journal of
Performability Engineering, 6(5), 425.
[15] Qiu, H., Lee, J., Lin, J., & Yu, G. (2003). Robust performance degradation assessment
methods for enhanced rolling element bearing prognostics. Advanced Engineering
Informatics, 17(3), 127-140.
[16] Pan, Y., Chen, J., & Guo, L. (2009). Robust bearing performance degradation assessment
method based on improved wavelet packet–support vector data description. Mechanical
Systems and Signal Processing, 23(3), 669-681.
[17] Pan, Y., Chen, J., & Li, X. (2010). Bearing performance degradation assessment based on
lifting wavelet packet decomposition and fuzzy c-means. Mechanical Systems and Signal
Processing, 24(2), 559-566.
[18] Yu, J. (2011). Bearing performance degradation assessment using locality preserving
projections and Gaussian mixture models. Mechanical Systems and Signal
Processing, 25(7), 2573-2588.
[19] Sheremetov, L., Cosultchi, A., Martínez-Muñoz, J., Gonzalez-Sánchez, A., & Jiménez-
Aquino, M. A. (2014). Data-driven forecasting of naturally fractured reservoirs based on
nonlinear autoregressive neural networks with exogenous input. Journal of Petroleum
Science and Engineering, 123, 106-119.
[20] Tse, P. W., & Atherton, D. P. (1999). Prediction of machine deterioration using vibration
based fault trends and recurrent neural networks. Journal of vibration and
acoustics, 121(3), 355-362.
[21] Andalib, A., & Atry, F. (2009). Multi-step ahead forecasts for electricity prices using
NARX: a new approach, a critical analysis of one-step ahead forecasts. Energy
Conversion and Management, 50(3), 739-747.
[22] Tse, P. W., Peng, Y. H., & Yam, R. (2001). Wavelet analysis and envelope detection
for rolling element bearing fault diagnosis—their effectiveness and flexibilities. Journal
of vibration and acoustics, 123(3), 303-310.
[23] Abbasion, S., Rafsanjani, A., Farshidianfar, A., & Irani, N. (2007). Rolling element
bearings multi-fault classification based on the wavelet denoising and support vector
machine. Mechanical Systems and Signal Processing, 21(7), 2933-2945.
[24] Lou, X., & Loparo, K. A. (2004). Bearing fault diagnosis based on wavelet transform and
fuzzy inference. Mechanical systems and signal processing, 18(5), 1077-1095.
[25] Samanta, B., & Al-Balushi, K. R. (2003). Artificial neural network based fault
diagnostics of rolling element bearings using time-domain features. Mechanical systems
and signal processing, 17(2), 317-328.
[26] Singh, S., Kumar, A., & Kumar, N. (2014). Motor Current Signature Analysis for
Bearing Fault Detection in Mechanical Systems. Procedia Materials Science, 6, 171-177.
[27] Khanam, S., Tandon, N., & Dutt, J. K. (2014). Fault size estimation in the outer race of
ball bearing using discrete wavelet transform of the vibration signal. Procedia
Technology, 14, 12-19.
[28] Misiti, M., Misiti, Y., Oppenheim, G., & Poggi, J. M. (Eds.). (2013). Wavelets and their
Applications. John Wiley & Sons.
[29] Godwin, J. L., Matthews, P., & Watson, C. (2014, June). Robust multivariate statistical
ensembles for bearing fault detection and identification. In Prognostics and Health
Management (PHM), 2014 IEEE Conference on (pp. 1-11). IEEE.
[30] Ren, W. X., & Sun, Z. S. (2008). Structural damage identification by using wavelet
entropy. Engineering Structures, 30(10), 2840-2849.
[31] Jiang, Y., Tang, B., Qin, Y., & Liu, W. (2011). Feature extraction method of wind turbine
based on adaptive Morlet wavelet and SVD. Renewable energy, 36(8), 2146-2153.
[32] Hemmati, F., Orfali, W., & Gadala, M. S. (2016). Roller bearing acoustic signature
extraction by wavelet packet transform, applications in fault detection and size
estimation. Applied Acoustics, 104, 101-118.
[33] Bozchalooi, I. S., & Liang, M. (2008). A joint resonance frequency estimation and in-
band noise reduction method for enhancing the detectability of bearing fault
signals. Mechanical Systems and Signal Processing, 22(4), 915-933.
[34] Yan, R., & Gao, R. X. (2009). Energy-based feature extraction for defect diagnosis in
rotary machines. IEEE Transactions on Instrumentation and Measurement, 58(9), 3130-
3139.
[35] Yassin, I. M., Taib, M. N., Hassan, H. A., Zabidi, A., & Tahir, N. M. (2010, December).
Heat exchanger modeling using NARX model with binary PSO-based structure selection
method. In Computer Applications and Industrial Electronics (ICCAIE), 2010
International Conference on (pp. 368-373). IEEE.
[36] Jain, V., Sambi, S., Kumar, S., Kumar, B., & Kumar, S. (2015). Modeling of a UASB
Reactor by NARX Networks for Biogas Production. Chemical Product and Process
Modeling, 10(2), 113-121.
[37] Asgari, H., Chen, X., Morini, M., Pinelli, M., Sainudiin, R., Spina, P. R., & Venturini, M.
(2015). NARX models for simulation of the start-up operation of a single-shaft gas
turbine. Applied Thermal Engineering.
[38] Çoruh, S., Geyikçi, F., Kılıç, E., & Çoruh, U. (2014). The use of NARX neural network
for modeling of adsorption of zinc ions using activated almond shell as a potential
biosorbent. Bioresource technology, 151, 406-410.
[39] Ardalani-Farsa, M., & Zolfaghari, S. (2010). Chaotic time series prediction with residual
analysis method using hybrid Elman–NARX neural networks. Neurocomputing, 73(13),
2540-2553.
[40] Xie, H., Tang, H., & Liao, Y. H. (2009, July). Time series prediction based on NARX
neural networks: An advanced approach. In 2009 International Conference on Machine
Learning and Cybernetics (Vol. 3, pp. 1275-1279). IEEE.
[41] Junsheng, C., Dejie, Y., & Yu, Y. (2006). A fault diagnosis approach for roller bearings
based on EMD method and AR model. Mechanical Systems and Signal
Processing, 20(2), 350-362.
[42] Soylemezoglu, A., Jagannathan, S., & Saygin, C. (2010). Mahalanobis Taguchi system
(MTS) as a prognostics tool for rolling element bearing failures. Journal of
Manufacturing Science and Engineering, 132(5), 051014.
[43] Lin, J., & Chen, Q. (2013). Fault diagnosis of rolling bearings based on multifractal
detrended fluctuation analysis and Mahalanobis distance criterion. Mechanical Systems
and Signal Processing, 38(2), 515-533.
[44] Hu, J., Zhang, L., & Liang, W. (2013). Dynamic degradation observer for bearing fault
by MTS–SOM system. Mechanical Systems and Signal Processing, 36(2), 385-400.
[45] Castagliola, P., & Maravelakis, P. E. (2011). A CUSUM control chart for monitoring
the variance when parameters are estimated. Journal of Statistical Planning and
Inference, 141(4), 1463-1478.
[46] Abbasi, S. A., Riaz, M., & Miller, A. (2012). Enhancing the performance of CUSUM
scale chart. Computers & Industrial Engineering, 63(2), 400-409.
[47] Montgomery, D. C. (2009). Introduction to statistical quality control. John Wiley &
Sons.
[48] Singh, G. K., & Al Kazzaz, S. A. A. S. (2009). Isolation and identification of dry bearing
faults in induction machine using wavelet transform. Tribology international, 42(6), 849-
861.
[49] Wu, S. J., Gebraeel, N., Lawley, M., & Yih, Y. (2007). A neural network integrated
decision support system for condition-based optimal predictive maintenance
policy. Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions
on, 37(2), 226-236.
[50] Tian, Z., Wong, L., & Safaei, N. (2010). A neural network approach for remaining useful
life prediction utilizing both failure and suspension histories. Mechanical Systems and
Signal Processing, 24(5), 1542-1555.
Research Highlights
A novel method for remaining useful life (RUL) prediction of bearings is proposed.
Wavelet filtering approach is employed to denoise the bearing signals.
A new HI based on time-domain feature extraction, Mahalanobis distance (MD) and
cumulative sum (CUMSUM) is proposed.
A neural network based approach is used to build the RUL prediction model.