
Accepted Manuscript

The use of MD-CUMSUM and NARX neural network for anticipating the remaining useful life of
bearings

Akhand Rai, S.H. Upadhyay

PII: S0263-2241(17)30464-5
DOI: http://dx.doi.org/10.1016/j.measurement.2017.07.030
Reference: MEASUR 4870

To appear in: Measurement

Received Date: 1 December 2015


Revised Date: 26 April 2017
Accepted Date: 17 July 2017

Please cite this article as: A. Rai, S.H. Upadhyay, The use of MD-CUMSUM and NARX neural network for
anticipating the remaining useful life of bearings, Measurement (2017),
doi: http://dx.doi.org/10.1016/j.measurement.2017.07.030

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers
we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and
review of the resulting proof before it is published in its final form. Please note that during the production process
errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
The use of MD-CUMSUM and NARX neural network for anticipating the remaining useful
life of bearings
Akhand Rai, S H Upadhyay
Department of Mechanical & Industrial Engineering
Indian Institute of Technology, Roorkee.
Email: raiakhand@gmail.com, shumefme@iitr.ac.in

Abstract:
The accurate determination of the remaining useful life (RUL) of bearings is of immense
importance in the condition-based maintenance of rotating machinery. In this paper, a
data-driven prognostic approach based on a nonlinear autoregressive neural network with
exogenous inputs (NARX-NN), in combination with a wavelet-filter technique, is applied to the
RUL estimation of bearings. Firstly, the vibration signals generated in an experimental test rig
are processed with the proposed wavelet filter to augment the impulsive characteristics of the
bearing signals and improve the quality of fault feature extraction. Secondly, a variety of
time-domain features are extracted from the processed bearing signals. However, these features
exhibit a highly non-monotonic behaviour as the bearing condition degrades. To overcome this
drawback, a new health indicator (HI) based on the Mahalanobis distance (MD) criterion and the
cumulative sum (CUMSUM) chart is proposed. Thirdly, the NARX-NN is designed as a time delay
neural network (TDNN). The derived HI and the age of the bearing are then used as inputs, with
the life percentage of the bearing as output, to train the TDNN model, which, unlike the usual
artificial neural networks (ANNs), performs a one-step ahead prediction of the bearing RUL. The
results suggest that the proposed method can predict the RUL of bearings with an acceptable
degree of accuracy, and that it outperforms the self-organizing map-based indicator and the
traditional feed forward neural networks (FFNNs) for RUL inference.
Keywords: Rolling element bearings, Prognostics, Health indicator, NARX neural network,
Remaining useful life.

1. Introduction
Rolling element bearings (REBs) are among the most critical components of rotating
machinery. According to the literature [1], 40–50% of all motor failures occur due to the
malfunctioning of bearings. Such malfunctions can lead to catastrophic breakdown of the
machinery, which in turn increases machine downtime and overhauling costs and causes huge
economic losses to the industry. The development of an efficient maintenance strategy therefore
becomes necessary to guarantee the proper functioning of rotating machinery. Recently,
condition-based maintenance (CBM) has evolved as a competent maintenance technique and is
being widely utilized by industry. In CBM, maintenance events are planned before the failure of
the machinery, thereby reducing the risk of severe breakdowns and avoiding unnecessary
maintenance expenses [2]. The three key steps for an effective CBM are data acquisition, data
processing and decision-making [3], as depicted in Figure 1.
Diagnostics is an assessment of the existing (and past) health of a system based on
observed symptoms. It deals with fault detection, isolation and identification when a fault
occurs. Prognostics, in contrast, deals with the evaluation of the future health of a machine and
the time left before the occurrence of failure, provided the current machine age and the past
operation profile are known. The time left before a failure is observed is called the remaining
useful life (RUL). Although a massive amount of literature is available on the diagnostics of
REBs, prognostics is an emerging research area that has gained momentum in recent years. The
precise estimation of RUL is one of the challenging tasks in the prognostics of REBs.

Fig. 1. Steps in CBM of rotating machinery: (1) Data acquisition: gathering and storing the
vibration data through condition monitoring using sensors and transducers; (2) Data processing:
extraction and selection of features through various signal processing/signal analysis
techniques; (3) Decision-making: maintenance of the asset through diagnosis and prognosis.

The literature [4] cites three different approaches for prognostics: physics-based
prognostics, data-driven prognostics and integrated approaches. The majority of the approaches
applied for the RUL estimation of REBs are data-driven approaches that depend upon the
availability of run-to-failure bearing data for training the machine-learning models. The most
familiar machine-learning techniques are those based on neural networks. A good number of
articles in the literature utilize the neural network approach for predicting the remnant life
of bearings [5-9]. For instance, Shao and Nezu [5] utilized the feed forward neural network
(FFNN) to perform single-step and multi-step ahead prediction of the bearing health state by
using a progression-based prediction model of the bearing damage process. Gebraeel et al. [6]
utilized single bearing and clustered bearing FFNN models in association with weight
application techniques for anticipating the RUL of bearings. Built upon the same principle,
Huang et al. [7] exploited the minimum quantization error derived from self-organizing maps
(SOM-MQE) to train the FFNNs and estimate the bearing RUL. Although the techniques discussed in
these papers are able to forecast the RUL of bearings, they have some limitations. The weight
application techniques adopted in [6,7] necessitate the training of several FFNNs, which is
quite time consuming. Besides, the RUL prediction methods proposed in [6,7] require the setting
up of failure thresholds, which needs a lot of practical experience and relies on human
intuition. This difficulty is addressed in [8,9], where the life percentage of the bearing was
employed to build the RUL prediction models and no failure threshold was required. The work in
[8] used the Weibull universal failure rate function (WUFRF) to fit the defect features and
supplied them as inputs to an artificial neural network (ANN) for predicting the RUL of pump
bearings. Ali et al. [9] likewise utilized the WUFRF-fitted fault features to train the
simplified fuzzy adaptive resonance theory map (SFAM) neural network and realize prognostics of
bearing RUL. WUFRF was deployed to eliminate the noise in the fault features and attain
monotonic indicators, which help to minimize the ANN training and testing inaccuracies. Unlike
the previous works [5-7], the studies [8,9] emphasized the use of monotonic features in the
determination of bearing RUL. However, the use of Weibull-fitted measurements for building the
prediction model is a tedious task and difficult to implement in a real-time environment.
A further challenge in bearing prognostics is the extraction of effective fault features from
the bearing signals. The available literature offers a variety of features for the diagnosis and
prognosis of bearing defects. Among them, the time-domain features are frequently used [10-12]
and are therefore utilized in this paper. Before feature extraction, the raw bearing signals are
processed by a wavelet filter to enhance the weak periodic impulses generated by the bearing
defects and thus increase the sensitivity of the time-domain features to defective bearing
signals. In the past, wavelet-based filters have been used extensively for condition monitoring
of bearings due to their excellent multiresolution capability in time-frequency analysis [13-15].
Hong et al. [13] utilized a Lempel-Ziv complexity measure based on continuous wavelet transform
(CWT) filtering to assess the severity of faults in bearings. Verma et al. [14] proposed a Morlet
wavelet filter to improve the signal-to-noise ratio of bearing signals before utilizing them for
the prognosis of bearings. Qiu et al. [15] performed a wavelet denoising of the signals using an
optimal Morlet wavelet filter and later proposed SOM-MQE to assess the bearing performance
degradation. However, the defect features extracted from the wavelet-filtered signals still
cannot be used directly to estimate the RUL of bearings, for several reasons: (i) the large
dimensionality of the feature space may introduce a lot of noise in the training process; (ii)
some features may respond to a certain defect of certain severity while others may not [15], so
a single HI is desired to comprehensively reflect the bearing condition; (iii) the extracted
features show severe fluctuations due to the unstable nature of damage propagation in bearings
and finally lose their monotonicity towards the end of the bearing lifetime. Many recently
developed HIs [16-18] suffer from this deficiency. To address these shortcomings, an HI based on
MD-CUMSUM is proposed in this paper. The MD criterion is exploited to fuse the time-domain
features and CUMSUM is used to extract a monotonically increasing behavior of the HI as the
bearing progresses to its failure.
In this paper, a reduced structure of NARX-NN known as the time-delay neural network
(TDNN) is utilized to forecast the RUL. Both the NARX-NN and the TDNN are primarily types of
recurrent neural networks (RNNs) [19]. RNNs have already been applied successfully for bearing
prognostics in some previous works [5,20]. The RNN architectures suggested in these papers
utilize a time-window of past values of a given time-series for one-step and multi-step ahead
predictions of the same time-series. These works were, however, limited to forecasting the
bearing condition rather than estimating the RUL of bearings. On the other hand, a NARX-NN or
TDNN deals with the one-step and multi-step ahead predictions of a given time-series with the
help of a different time-series. The input time-series that assists in the prediction of the
output time-series is called the exogenous input [21]. Thus, the constructed HI serves as an
exogenous input and the output is set to the life percentage of the bearings for training the
TDNN and predicting the bearing RUL. Although a TDNN topology has been used in the paper, it is
referred to as NARX-NN hereafter to emphasize the use of exogenous inputs. In addition, the
traditional FFNNs have no memory and exhibit forgetting behavior once the current input is
mapped to the current output. Consequently, in this study a NARX-NN is used due to its ability
to take into account the past observations that are critical in anticipating the RUL of
bearings. Overall, this paper proposes a novel approach based on a fusion of wavelet filtering,
MD-CUMSUM and NARX-NN to estimate the RUL of bearings. The remainder of this paper is organized
as follows. Section 2 discusses the wavelet-based filtering approach and the technical
background of NARX-NN. Section 3 introduces the proposed framework for obtaining the advocated
HI and assessing the RUL. The experimental apparatus is described in Section 4 along with the
results and discussions. Finally, the work is concluded in Section 5.

2. Theoretical Background
2.1 Signature enhancement using wavelet-filter approach
A wavelet family is constructed from a single function δ(t) by translation and dilation:
$$\delta_{p,q}(t) = \frac{1}{\sqrt{p}}\,\delta\!\left(\frac{t-q}{p}\right) \tag{1}$$
where p is the scaling parameter, q is the time localization parameter and δ(t) is called the
‘mother wavelet’.
The continuous wavelet transform (CWT) of a signal x(t) with finite energy is the
convolution of x(t) with the complex conjugate of the scaled and shifted version of the mother
wavelet δ(t). Mathematically, the CWT is represented as follows:
$$CWT(p,q) = \frac{1}{\sqrt{p}} \int_{-\infty}^{\infty} x(t)\,\delta^{*}\!\left(\frac{t-q}{p}\right) dt \tag{2}$$
where δ*(t) represents the complex conjugate of δ(t).



The mother wavelet δ(t) must be a window function, meaning that
$$\int_{-\infty}^{\infty} |\delta(t)|\,dt < \infty$$
and must satisfy the admissibility condition [22]:
$$C_{\delta} = 2\pi \int_{-\infty}^{\infty} \frac{|\hat{\delta}(\omega)|^{2}}{|\omega|}\,d\omega < \infty \tag{3}$$
where δ̂(ω) is the Fourier transform of δ(t). Eq. (3) can be utilized to obtain the inversion
formula in order to reconstruct the original signal as follows:
$$x(t) = \frac{1}{C_{\delta}} \int_{-\infty}^{\infty}\int_{0}^{\infty} CWT(p,q)\,\delta_{p,q}(t)\,\frac{1}{p^{2}}\,dp\,dq \tag{4}$$

The wavelet signal processing technique offers a variety of wavelet families to process
vibration signals. As an example, Abbasion et al. [23] used the discrete Meyer wavelet to
denoise the raw signals for effective diagnosis of the bearings. The articles [24,25] used the
Daubechies family of wavelets for signal processing and analysis. Singh et al. [26] utilized the
Morlet and complex Morlet wavelets for bearing fault detection in induction motors. Khanam et
al. [27] executed a wavelet decomposition of the bearing vibration signal using the Symlet-5
wavelet in order to estimate the defect size on the outer race of a ball bearing. Further
information on the wavelet families can be found in [28].
In this paper, three mother wavelets, namely the Daubechies-10, Morlet and discrete Meyer
wavelets, are considered for analysis based on their frequent usage in earlier works. The
optimal wavelet filter is constructed in the following steps:
Step 1 The vibration signal is first decomposed through CWT into different scales using the
Daubechies-10 wavelet. Thus, a set of continuous wavelet coefficients (CWCs) is
generated at the various scales with more resolution in the time and frequency domains.
Step 2 The Shannon entropy of the CWCs is evaluated using the relation [29,30]:
$$Sh_{CWT}(p) = -\sum_{i=1}^{N} p(s_i)\,\log_{2} p(s_i) \tag{5}$$
where s_i denotes the ith scale and p(s_i) is the energy probability distribution computed
as:
$$p(s_i) = \frac{E(s_i)}{\sum_{i=1}^{N} E(s_i)}, \qquad E(s_i) = \sum_{j=1}^{n} \left|C(s_i,j)\right|^{2} \tag{6}$$
where C(s_i, j) represents the CWCs at scale s_i, E(s_i) is the energy of the CWCs at scale
s_i, N is the total number of decomposed CWT scales and n is the number of CWCs at each
scale.
Step 3 Steps 1 and 2 are repeated for the remaining two analyzing wavelets. In this manner,
the Shannon wavelet entropy corresponding to each of the three mother wavelets is
obtained.
Step 4 The wavelet basis with the minimum Shannon wavelet entropy value produces the
sparsest set of wavelet coefficients with maximum periodicity and energy concentration
[31, 32] and is therefore selected for further analysis.
Step 5 Once the appropriate analyzing wavelet has been selected, the next step is to choose the
best scale. Since the objective of wavelet filtering is to retrieve the impulsive content of
the signal, the kurtosis values of the wavelet coefficients at the various scales could be
estimated and the scale with the highest kurtosis value retained. Kurtosis is often used as a
measure of impulsiveness in the signal but, at the same time, it is sensitive to random
impulses produced by noise [33]. Thus, the use of kurtosis alone may result in a false
choice of the desired scale. The CWT decomposes the original signal into a number of
frequency components represented by the various scales, and the wavelet coefficients at
scales containing the bearing defect frequencies show a higher energy content than
those at the remaining scales [34]. Hence, in this paper, the product of energy and kurtosis
(PEK) is maximized to obtain the optimal scale for fault feature extraction.
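
The selection logic of Steps 2-5 can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: it assumes the CWCs of each candidate wavelet are already available as a list of rows (one row per scale), the function names are hypothetical, and the kurtosis is taken in its population form for brevity. Rows are assumed to be non-constant so that the kurtosis is defined.

```python
import math

def scale_energies(coeffs):
    """Energy E(s_i) of the coefficients at each scale, as in Eq. (6)."""
    return [sum(c * c for c in row) for row in coeffs]

def shannon_wavelet_entropy(coeffs):
    """Shannon entropy of the energy distribution over scales, as in Eq. (5)."""
    energies = scale_energies(coeffs)
    total = sum(energies)
    probs = [e / total for e in energies if e > 0.0]
    return -sum(p * math.log2(p) for p in probs)

def kurtosis(row):
    """Fourth central moment divided by the squared variance (population form)."""
    n = len(row)
    mean = sum(row) / n
    m2 = sum((c - mean) ** 2 for c in row) / n
    m4 = sum((c - mean) ** 4 for c in row) / n
    return m4 / (m2 * m2)

def select_wavelet(coeffs_by_wavelet):
    """Name of the wavelet whose coefficients give the minimum entropy (Step 4)."""
    return min(coeffs_by_wavelet,
               key=lambda name: shannon_wavelet_entropy(coeffs_by_wavelet[name]))

def select_scale(coeffs):
    """Index of the scale with the largest product of energy and kurtosis (Step 5)."""
    energies = scale_energies(coeffs)
    pek = [e * kurtosis(row) for e, row in zip(energies, coeffs)]
    return max(range(len(pek)), key=pek.__getitem__)
```

With real data, `coeffs_by_wavelet` would hold one coefficient matrix per candidate mother wavelet, and `select_scale` would be applied to the matrix of the wavelet returned by `select_wavelet`.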

2.2 Theory of NARX network


NARX-NN is a form of dynamic RNN that has been applied successfully for
modeling nonlinear systems [19,21,35-37]. A NARX-NN is composed of several layers with
feedback connections. The output of the system is fed back to the input for a fixed number of
time steps. The mathematical formulation of the NARX model can be represented as [38]:
$$y(n+1) = f\left[x(n-d_x), \ldots, x(n-1), x(n);\; y(n-d_y), \ldots, y(n)\right] \tag{7}$$
where x(n) and y(n) are the input and output of the system at discrete time step n, d_x ≥ 1 and
d_y ≥ 1 (with d_x ≤ d_y) are the input and output memory orders, respectively, and f(.) is a
nonlinear mapping function. The output of the NARX-NN can be fully written as [39]:
$$y(n+1) = f_o\left[b_o + \sum_{h=1}^{N_h} w_{ho}\, f_h\!\left(b_h + \sum_{i=0}^{d_x} w_{ih}\,x(n-i) + \sum_{j=0}^{d_y} w_{jh}\,y(n-j)\right)\right] \tag{8}$$
where w_ho, w_ih and w_jh (i = 0,1,…, d_x; j = 0,1,…, d_y; h = 1,2,…, N_h) are the network
weights, b_h and b_o are the biases, and f_h(.) and f_o(.) are the activation functions of the
hidden and output layers.
A standard multilayer perceptron (MLP) network, which in this work is a feed forward
neural network, is used to approximate the unknown nonlinear mapping function f(.), and the
ensuing architecture is known as a NARX network. Figure 2(a) depicts the series-parallel
structure of a NARX neural network with one input layer, one hidden layer and one output layer.
d_x and d_y are the numbers of delayed inputs and outputs, respectively, and z^-1 represents the
unit time delay. The time-series x(n) is referred to as the exogenous or external input.
Generally, the typical structure of a NARX network consists of feedback connections from
the output neuron. However, when applied to a univariate time-series prediction, the NARX
network is designed as a feedforward time delay neural network (TDNN) without delayed feedback
loops [40]. In this case, the output memory order d_y of the NARX neural network is reduced to
zero. Mathematically, a TDNN represents a function of the form:
$$y(n+1) = f\left[x(n-d_x), \ldots, x(n-1), x(n)\right] \tag{9}$$
Here y(n+1) represents the one-step ahead predicted value of the output time-series using
the past values of the input time-series x(n). Thus, in the case of a TDNN, Eq. (8) reduces to:
$$y(n+1) = f_o\left[b_o + \sum_{h=1}^{N_h} w_{ho}\, f_h\!\left(b_h + \sum_{i=0}^{d_x} w_{ih}\,x(n-i)\right)\right] \tag{10}$$
It should be noted that the TDNN is driven by the past values of the exogenous or external
input x(n), in contrast to the previously used RNNs, which apply the past values of the same
time-series that is to be predicted. There is no existing standard for representing a TDNN and
therefore, in this paper, a TDNN is referred to as a NARX-NN only to distinguish it from
the conventional FFNN arrangements. Fig. 2(b) represents the structure of a TDNN deduced
from NARX-NN by eliminating the tapped delay lines for the output time-series.
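
To make the structure of Eq. (10) concrete, the following Python sketch evaluates a single one-step ahead TDNN output from a tapped delay line. It is only an illustration under stated assumptions: the input is a single scalar series (the paper feeds both the HI and the bearing age), the hidden activation is a log-sigmoid, the output neuron is linear, and the weights passed in are placeholders rather than trained values. The function names are hypothetical.

```python
import math

def logsig(v):
    """Log-sigmoid activation assumed for the hidden layer."""
    return 1.0 / (1.0 + math.exp(-v))

def tapped_delay(x, n, d_x):
    """Return [x(n), x(n-1), ..., x(n-d_x)] for a scalar input series x."""
    if n - d_x < 0:
        raise ValueError("not enough past samples for the requested delay")
    return [x[n - i] for i in range(d_x + 1)]

def tdnn_predict(x, n, w_in, b_h, w_out, b_o):
    """One-step ahead output y(n+1) with the structure of Eq. (10).

    w_in[h][i] weights x(n-i) into hidden neuron h, and w_out[h] weights
    hidden neuron h into the single linear output neuron.
    """
    d_x = len(w_in[0]) - 1
    taps = tapped_delay(x, n, d_x)
    hidden = [logsig(b + sum(w * t for w, t in zip(row, taps)))
              for row, b in zip(w_in, b_h)]
    return b_o + sum(w * h for w, h in zip(w_out, hidden))
```

Because no past outputs y(n-j) enter the computation, the prediction depends only on the exogenous series, which is the property that distinguishes the TDNN from the full NARX form of Eq. (8).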

Fig. 2. (a) Architecture of NARX-NN (b) Architecture of TDNN

3. Proposed framework
The proposed method employed for the RUL prediction of bearings is portrayed in Figure
3. The various steps are summarized as follows:
Step 1: The vibration signals are collected through machine condition monitoring and the
acquired raw signals are denoised using the wavelet-based filter proposed in Section 2.1.
Step 2: The eight time-domain features listed in Table 1 are extracted from the
wavelet-filtered signals. The computational formulas for the features are also included in
Table 1. Here C̄(s_i, j) and σ(s_i, j) are the mean and standard deviation, respectively, of
the filtered signal data, calculated as:
$$\bar{C}(s_i,j) = \frac{1}{n}\sum_{j=1}^{n} C(s_i,j) \tag{11}$$
$$\sigma(s_i,j) = \sqrt{\frac{\sum_{j=1}^{n}\left(C(s_i,j)-\bar{C}(s_i,j)\right)^{2}}{n-1}} \tag{12}$$

Step 3: The third step involves the fusion of the extracted time-domain features by applying
the MD criterion. MD is a commonly used indicator for detecting the degradation process in
bearings [41-44]. It offers the advantages of scale-invariance and correlation
identification between different features [43]. Let the constructed feature set be denoted
by F_{p×q} = [f_ij]_{p×q}, i = 1,2,…, p; j = 1,2,…, q, where f_ij represents the ith
observation of the jth feature, p is the total number of observations and q is the total
number of features. To make sure that each variable contributes equally to the MD measure,
the observations in F are normalized as:
$$z_{ij} = \frac{f_{ij} - \bar{f}_j}{\sigma_j} \tag{13}$$
$$\bar{f}_j = \frac{1}{m}\sum_{i=1}^{m} f_{ij} \tag{14}$$
$$\sigma_j = \sqrt{\frac{\sum_{i=1}^{m}\left(f_{ij} - \bar{f}_j\right)^{2}}{m-1}} \tag{15}$$
where m is the number of healthy observations in the dataset.
Finally, the MD for a normalized test feature vector z_ij is calculated using the following
relation:
$$MD(i) = \frac{1}{q}\, z_{ij}\, C^{-1} z_{ij}^{T} \tag{16}$$
where C is the correlation matrix of the normalized features.
Step 4: After the calculation of MD, the CUMSUM method is initiated to extract the required HI.
The CUMSUM chart is a well-established methodology to detect an out-of-control process
[45,46]. However, here the authors are more interested in the ability of CUMSUM charts
to provide a monotonically growing curve once the degradation in the bearing
commences. The CUMSUM calculates the upward and downward deviations from the
target value as follows [47]:
$$CS_i^{+} = \max\left[0,\; MD_i - (\mu_0 + k) + CS_{i-1}^{+}\right]$$
$$CS_i^{-} = \max\left[0,\; (\mu_0 - k) - MD_i + CS_{i-1}^{-}\right] \tag{17}$$
where CS_i^+ and CS_i^- are the upward and downward CUMSUM, μ0 is the target value
and k is the slack value, equal to half of the process shift that is to be detected. The HI
proposed in this paper is then given as:
$$HI = CS_i^{+} \tag{18}$$

Fig. 3. Flowchart for the proposed method. Both the training bearing data and the testing
bearing data (unknown RUL) obtained from machine condition monitoring undergo wavelet filtering
of the signal, feature extraction from the filtered signal and construction of the HI based on
MD-CUMSUM. Training branch: the HI and the age of the bearing are set as NARX-NN inputs and the
bearing life percentage as output; the NARX-NN model is trained while searching over the numbers
of hidden neurons and delays for the optimal network, i.e. the model with the lowest MSE on the
20% test data. Testing branch: the HI and the age of the bearing are set as inputs to the
optimal NARX-NN, which predicts the bearing life percentages used for RUL prediction.

Step 5: Finally, the developed HI is used to train the NARX-NN. In addition, the age of the
bearing, being crucial to RUL determination [8,9], is added to the input dataset. The output
is set to the life percentage of the bearing. Before being fed to the neural network, the
features are normalized to the range 0.1-0.9 using the relation given below [48]:
$$x_n = 0.8 \times \frac{x - x_{min}}{x_{max} - x_{min}} + 0.1 \tag{19}$$
Step 6: An extensive number of iterations is carried out to arrive at the network giving the
best performance. The optimality of the NARX-NN is decided by two parameters, i.e. the
number of hidden neurons and the number of input delays. The selection rules for the best
NARX-NN are discussed in the upcoming sections. The optimal network is then utilized to
forecast the bearing RUL.
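
As an illustration of the feature extraction in Step 2, the eight time-domain features of Table 1 can be computed along the following lines. This Python sketch is an assumption-laden reading of the table rather than the authors' code: the function name is hypothetical, the peak-based factors are taken on absolute values (the usual convention), the standard deviation uses the n-1 divisor of Eq. (12), and the input is assumed to be a non-constant, non-zero list of filtered samples.

```python
import math

def time_domain_features(c):
    """Eight time-domain features of one filtered record c (list of floats)."""
    n = len(c)
    mean = sum(c) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in c) / (n - 1))
    rms = math.sqrt(sum(v * v for v in c) / n)
    peak = max(abs(v) for v in c)
    mean_abs = sum(abs(v) for v in c) / n
    mean_sqrt_abs = sum(math.sqrt(abs(v)) for v in c) / n
    return {
        "rms": rms,
        "kurtosis": sum((v - mean) ** 4 for v in c) / (n * std ** 4),
        "skewness": sum((v - mean) ** 3 for v in c) / (n * std ** 3),
        "crest_factor": peak / rms,
        "impulse_factor": peak / mean_abs,
        "shape_factor": rms / mean_abs,
        "margin_factor": peak / mean_sqrt_abs ** 2,
        "peak_to_peak": max(c) - min(c),
    }
```

Applying the function to every 2048-point record of a bearing would yield one eight-element feature vector per inspection point, which is the form of the feature matrix F used in Step 3.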

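Table 1: Time-domain features

Feature           Formula
RMS               $\sqrt{\frac{1}{n}\sum_{j=1}^{n} C(s_i,j)^{2}}$
Kurtosis          $\frac{1}{n}\sum_{j=1}^{n} \frac{\left(C(s_i,j)-\bar{C}(s_i,j)\right)^{4}}{\sigma(s_i,j)^{4}}$
Skewness          $\frac{1}{n}\sum_{j=1}^{n} \frac{\left(C(s_i,j)-\bar{C}(s_i,j)\right)^{3}}{\sigma(s_i,j)^{3}}$
Crest factor      $\max\left|C(s_i,j)\right| \,/\, RMS$
Impulse factor    $\max\left|C(s_i,j)\right| \,/\, \frac{1}{n}\sum_{j=1}^{n}\left|C(s_i,j)\right|$
Shape factor      $RMS \,/\, \frac{1}{n}\sum_{j=1}^{n}\left|C(s_i,j)\right|$
Margin factor     $\max\left|C(s_i,j)\right| \,/\, \left(\frac{1}{n}\sum_{j=1}^{n}\sqrt{\left|C(s_i,j)\right|}\right)^{2}$
Peak to peak      $\max\left(C(s_i,j)\right) - \min\left(C(s_i,j)\right)$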

4. Application of the proposed method

4.1 Experimental set up:


The test rig used to acquire the bearing signals is shown in Figs. 4(a) and 4(b). The test
setup consists of a rotating shaft supported by two 6205 bearings placed in the pedestals E and
G, respectively. The bearings have an inside diameter of 25 mm, an outside diameter of 52 mm, a
ball diameter of 8 mm, 9 balls and a zero contact angle. An AC motor drives the shaft at a
constant speed of 1500 RPM. A combination of load discs and a power screw mechanism mounted on
the test bed is employed to apply a load of approximately 1 kN on the shaft. The loading
provision on the shaft is depicted in Fig. 4(a). The power screw mechanism is composed of a
movable frame, a circular wheel, a lead screw bolt and a lead screw nut. The movable frame holds
the circular wheel on a short axle at its bottom. The lead screw nut is fixed at the top of the
frame. The screw nut is connected to the lead screw bolt, which is provided with a handle to
adjust the screw movement. As the lead screw turns, the circular wheel is pressed against the
shaft and rotates with it. The load is applied or released by twisting the lead screw handle in
the clockwise or anticlockwise direction. Run-to-failure tests are carried out to collect the
vibration data over the whole life of the bearings. Bearing failure is inferred from an abrupt
increase in the RMS values. If a bearing in one of the pedestals fails, the test is terminated.
The next test proceeds with new bearings in both the housings E and G. The failure time of each
bearing is recorded starting from the time of its installation.
The data acquisition arrangement consists of PCB 608A11 piezoelectric ICP
accelerometers mounted on the bearing housings and a National Instruments compact data
acquisition (NI cDAQ-9174) system programmed with NI LabVIEW software. Vibration data are
collected for a duration of 0.1 seconds every 10 minutes. The sampling frequency is set to
20.48 kHz, so each 0.1-second record contains 2048 points. Table 2 provides the details of the
failure time and type of defect for the damaged bearings considered in the paper.

Table 2. Bearings’ failure history

Test bearing   Nomenclature   Type of failure            Failure time (min)
Bearing 1      B1             Inner race defect (IRD)    19,650
Bearing 2      B2             Outer race defect (ORD)    14,310
Bearing 3      B3             Inner race defect (IRD)    20,880
Bearing 4      B4             Outer race defect (ORD)    16,640
Bearing 5      B5             Ball defect (BD)           8,250

Fig. 4(a). Loading arrangement on the bearing test rig


(1) Load disc, (2) Lead screw, (3) Lead screw nut, (4) Rotating circular disc, (5) Short axle, (6) Frame, (7)
Handle

Fig. 4(b). Bearing test rig
(A) AC motor, (B) Speed control unit, (C) Tachometer, (D) Flexible Coupling , (E) Bearing housing 1,
(F) Accelerometer 1 (G) Bearing housing 2, (H) Accelerometer 2, (I) Load disc, (J) Power screw
arrangement, (K) Data acquisition unit, (L) Computer

4.2 Selection of optimal network


The structure of the NARX network used in this work comprises a three-layer
feedforward neural network:
(i) An input layer with five input nodes corresponding to the five variables and associated
recurrent nodes
(ii) A hidden layer with a log-sigmoid transfer function and an optimal number of hidden
neurons obtained through iterations
(iii) An output layer with a pure linear transfer function and a single neuron.
The training algorithm used for the NARX network is the well-known Levenberg–
Marquardt (LM) algorithm [49]. The network performance is assessed based on the mean square
error (MSE) defined by the following equation:
$$e = \frac{1}{N}\sum_{i=1}^{N}\left(y_p^{i} - y_a^{i}\right)^{2} \tag{20}$$
where y_p^i and y_a^i are the predicted and actual values at inspection point i, respectively,
and N is the number of inspection points.
During the training phase, the feature vectors are divided into three subsets: the training
set, the validation set and the test set. The training data are used to adjust the weights and
biases, while the validation data are used to stop the training when the network generalization
stops improving. The test data do not affect the training process and judge the network
performance in an independent manner. In the early stages of the training process, the MSE for
both the training set and the validation set decreases, but after a certain point the MSE for
the validation set starts increasing. This implies that the network has started overfitting the
training data and the training should be stopped [50]. This procedure for preserving
generalization is called early stopping. Overfitting means that the network performs well on
the data used during training but gives poor performance on unseen data such as the test set.
As such, the optimal network in this paper is the one that produces the lowest test MSE.
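
The error measure of Eq. (20) and the early-stopping rule described above can be sketched in a few lines of Python. This is an illustrative outline rather than the training routine used in the paper: the function names and the `patience` parameter (the number of consecutive non-improving epochs tolerated before stopping) are hypothetical, and the validation MSE history is assumed to be recorded once per training epoch.

```python
def mean_square_error(predicted, actual):
    """MSE between predicted and actual values, as in Eq. (20)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)

def early_stopping_epoch(val_mse_history, patience=5):
    """Index of the epoch whose weights would be kept under early stopping.

    Training is treated as stopped once the validation MSE has failed to
    improve for `patience` consecutive epochs; the best epoch seen so far
    is returned.
    """
    best_epoch = 0
    best_mse = float("inf")
    fails = 0
    for epoch, mse in enumerate(val_mse_history):
        if mse < best_mse:
            best_mse, best_epoch, fails = mse, epoch, 0
        else:
            fails += 1
            if fails >= patience:
                break
    return best_epoch
```

Candidate networks with different numbers of hidden neurons and input delays could then be compared by calling `mean_square_error` on the held-out test set and keeping the candidate with the smallest value.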

4.3 Results and discussions


All the analysis in this work has been performed in the MATLAB software environment.
The results of the wavelet filtering for a defect signal sample acquired from bearing B2 are
shown in Fig. 5(a)-(f). Fig. 5(a) shows the raw defect signal sample. Fig. 5(b) depicts the
Shannon entropies computed from the CWCs obtained by processing the raw signal with the Db-10,
Morlet and discrete Meyer wavelet bases. The Morlet wavelet produces the minimum Shannon entropy
of 4.64 and is therefore selected for further investigation. Fig. 5(c) plots the Morlet wavelet
function. Fig. 5(d) portrays the CWCs in the time-scale plane obtained by processing the raw
signal with the Morlet wavelet. Fig. 5(e) shows the variation of PEK values at different scales.
The maximum value of PEK is observed at scale no. 43. Finally, Fig. 5(f) portrays the filtered
signal corresponding to the raw signal shown in Fig. 5(a). After wavelet filtering of the
signals, the defect features are extracted from them. As an illustration, Fig. 6 shows the trend
of the eight time-domain features over the entire lifetime of bearing B2. The first 500 feature
vectors from each of the bearings are taken as the healthy or reference data and Eqs. (13)-(16)
are utilized to calculate the MDs for the test feature vectors. A plot of the MD for bearing
B2’s failure history is given in Fig. 7(a). The MD combines the fault information from the
individual time-domain features and is therefore capable of reflecting the bearing degradation
more accurately. However, it shows a highly fluctuating behaviour as the bearing damage becomes
more severe. To eliminate the noise in the MD, CUMSUM is applied using Eqs. (17)-(18) and the
resultant HI is plotted in Fig. 7(b). The value of μ0 in Eq. (17) is set to the average of the
first 500 healthy MD values, the shift to be detected in the MD measure is set to 3 times the
standard deviation of the first 500 MD values (3σ) and the value of k is taken as 1.5σ. Fig.
7(b) shows that the proposed HI is free of unwanted oscillations and rises monotonically as the
bearing life approaches its end. Thus, it serves as an appropriate feature for the purpose of
RUL prediction. The HIs for the remaining bearings are shown in Fig. 8.
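
The construction of the HI from Eqs. (13)-(18), with μ0 and k chosen from the healthy baseline as described above, can be outlined in Python as follows. The sketch makes several assumptions that are not stated in the paper: the correlation matrix C is estimated from the healthy observations only, the product with the inverse of C is obtained by solving a linear system with a small Gauss-Jordan routine instead of forming the inverse explicitly, C is taken to be non-singular, and all function names are hypothetical.

```python
import math

def column_stats(rows):
    """Mean and sample standard deviation per feature, Eqs. (14)-(15)."""
    m, q = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / m for j in range(q)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (m - 1))
            for j in range(q)]
    return means, stds

def normalize(row, means, stds):
    """Eq. (13) applied to one observation."""
    return [(v - mu) / s for v, mu, s in zip(row, means, stds)]

def correlation_matrix(z_rows):
    """Correlation matrix of already-normalized healthy observations."""
    m, q = len(z_rows), len(z_rows[0])
    return [[sum(r[a] * r[b] for r in z_rows) / (m - 1) for b in range(q)]
            for a in range(q)]

def solve(a, b):
    """Solve a @ y = b by Gauss-Jordan elimination with partial pivoting."""
    q = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(q)]
    for col in range(q):
        pivot = max(range(col, q), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(q):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][q] / aug[i][i] for i in range(q)]

def mahalanobis(z, corr):
    """Eq. (16): MD = (1/q) z C^-1 z^T for one normalized observation z."""
    y = solve(corr, z)
    return sum(zi * yi for zi, yi in zip(z, y)) / len(z)

def cusum_upper(md, mu0, k):
    """Upward CUMSUM of Eq. (17), used as the HI of Eq. (18)."""
    hi, cs = [], 0.0
    for value in md:
        cs = max(0.0, value - (mu0 + k) + cs)
        hi.append(cs)
    return hi

def health_indicator(features, n_healthy):
    """Return (MD series, HI series) for a run-to-failure feature matrix."""
    healthy = features[:n_healthy]
    means, stds = column_stats(healthy)
    corr = correlation_matrix([normalize(r, means, stds) for r in healthy])
    md = [mahalanobis(normalize(r, means, stds), corr) for r in features]
    base = md[:n_healthy]
    mu0 = sum(base) / n_healthy
    sigma = math.sqrt(sum((v - mu0) ** 2 for v in base) / (n_healthy - 1))
    return md, cusum_upper(md, mu0, 1.5 * sigma)  # k = 1.5 sigma as in the text
```

In the setting of this section, `features` would be the matrix of eight-element feature vectors of one bearing and `n_healthy` would be 500.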
After the construction of HI, the next step is to train the NARX-NN. The inputs to
the NARX-NN are taken as the HI and age of the bearing while the output is set to life
percentage of the bearing. Life percentage of equipment being proportional to the lifetime is an
excellent indicator of bearing health condition. The choice of output as the life percentage is
reasonable because the health state of a machine deteriorates as its age increases. The bearing
will fail completely when the life percentage reaches 100% and there is no requirement of setting
up a threshold for RUL assessment. Note that if the life percentage of a bearing is T%, the
network target will be set equal to T/100. To validate the effectiveness of the proposed approach,

P a g e | 13
two cases have been considered in the paper: Case I uses bearings B2-B5 for training the
NARX-NN and bearing B1 for testing; Case II tests bearing B3 against the
NARX-NN trained on the remaining bearings. The input data are normalized between 0.1 and 0.9
using Eq. (19) to improve the efficiency of the NARX-NN. The test bearing data are also
normalized using the maximum and minimum values of the training data; test values below the
minimum are set to 0.1 and those above the maximum are set to 0.9. The data are
initially separated at random into 80% for the training and validation sets
and the remaining 20% for the test set. Keeping the test set fixed at 20%, the validation
set is varied over 10%, 20% and 30%, and the division that yields the minimum test MSE is
retained. On this basis, the data are finally divided into 60% for training, 20% for validation
and 20% for testing. After a considerable number of iterations and retraining, the
parameters of the optimal network, i.e. the one with the lowest MSE on the 20% test set, are obtained and are provided
in Table 3. The MSEs between the actual and predicted life percentages over the complete
lifetime of the test bearings for Cases I and II are also provided in Table 3.
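The normalization step can be illustrated with a short sketch. Eq. (19) is not shown in this part of the paper, so a linear min-max mapping onto [0.1, 0.9] is assumed here; the function names and the toy numbers are hypothetical. The scaling limits are taken from the training data only, and test values outside that range are clipped to 0.1 or 0.9 as described above.

```python
import numpy as np

def fit_minmax(train):
    """Column-wise minimum and maximum of the training inputs."""
    train = np.asarray(train, dtype=float)
    return train.min(axis=0), train.max(axis=0)

def scale_01_09(x, x_min, x_max):
    """Map x linearly onto [0.1, 0.9] (assumed form of Eq. (19)) and clip outliers."""
    x = np.asarray(x, dtype=float)
    scaled = 0.1 + 0.8 * (x - x_min) / (x_max - x_min)
    return np.clip(scaled, 0.1, 0.9)

# Toy example (assumption): columns are [HI, age in minutes].
train = np.array([[0.0, 0.0], [10.0, 100.0]])
test = np.array([[-1.0, 50.0], [5.0, 150.0]])

x_min, x_max = fit_minmax(train)
test_scaled = scale_01_09(test, x_min, x_max)
print(test_scaled)  # [[0.1 0.5]
                    #  [0.5 0.9]]
```

The sketch assumes that each training column has a non-zero range; a constant column would need separate handling.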

Fig. 5. (a) Raw defective signal (b) Shannon wavelet entropies for the three mother wavelets (c)
Shape of the Morlet wavelet (d) CWCs at scales 1:64 using Morlet wavelet (e) PEK variation
with scales (f) Filtered signal.

Fig. 6. Features extracted over the complete lifetime of bearing B2

Fig. 7. (a) MD and (b) HI for the entire failure history of bearing B2

Fig. 8. Proposed HI for bearings B1, B3, B4 and B5

The training results for the optimal NARX network for Case I are shown in Figs. 9(a)-9(c).
Fig. 9(a) plots the training response of the network. Fig. 9(b) depicts the learning
curves for the optimal network. Fig. 9(c) further indicates that a regression value close
to one is obtained for each of the training, validation and independent test datasets. Figs. 10(a)
and 10(b) compare the predicted and actual life percentages for the two tested bearings B1 and
B3. From Figs. 10(a) and (b), a simple relation can be established to predict the RUL of
bearings, which is given by:
$$\mathrm{RUL} = \frac{T_{100}}{T_c}\, t_c - t_c \qquad (21)$$
where $T_{100}$ and $T_c$ are the total life percentage and the current life percentage respectively, and $t_c$ is
the time for which the bearing has run, i.e. the current age of the bearing.

Table 3. Training and testing results of the optimal NARX-NN prediction model
Results                                          | Case I          | Case II
Training bearings                                | B2, B3, B4, B5  | B1, B2, B4, B5
Testing bearings                                 | B1              | B3
Optimal no. of hidden neurons                    | 6               | 20
Optimal no. of input delays                      | 10              | 8
Training MSE                                     | 0.0118          | 0.0103
Validation MSE                                   | 0.0119          | 0.0097
Independent (20%) test data MSE                  | 0.0113          | 0.0123
Overall prediction MSEs for bearings B1 and B3   | 0.0055          | 0.0099

Fig. 9. Training results for the best NARX-NN in Case I (a) Output of the network (b) Performance
of the network and (c) Regression plots for the 60% training, 20% validation and 20% test data.

As an example, suppose the RUL of bearing B1 is to be calculated
after 11,000 minutes of operation. From Fig. 10(a), the predicted current life
percentage at 11,000 minutes is 0.579 × 100 = 57.9%, while the total life percentage always
remains equal to 100%. The current age of the bearing is 11,000 minutes. Using relation
(21), the predicted RUL of the bearing is 7,998 minutes and the predicted failure time is
11,000 + 7,998 = 18,998 minutes. The actual RUL of the bearing is, however, equal to 8,650
minutes. The following expression can measure the prediction accuracy:
$$\mathrm{Accuracy} = 1 - \frac{\left| \mathrm{RUL}_{predicted} - \mathrm{RUL}_{actual} \right|}{\mathrm{RUL}_{actual}} \qquad (22)$$
Thus, the proposed approach predicts the RUL of bearing B1 at the inspected
time-step with a satisfactory accuracy of 92.46%. The RUL values for the
bearings B1 and B3 at various time-steps are calculated in a similar manner and are shown in
Figs. 11(a) and (b) respectively.
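The worked example for bearing B1 can be reproduced with a few lines of code. This is a minimal sketch of relations (21) and (22), with the error in (22) taken as an absolute value; the function names are illustrative, and the inputs (life fraction 0.579, age 11,000 minutes, actual RUL 8,650 minutes) are the values quoted in the example.

```python
def predicted_rul(life_fraction, age, total_life=1.0):
    """Relation (21): RUL = (T100 / Tc) * tc - tc, with life given as a fraction."""
    return (total_life / life_fraction) * age - age

def prediction_accuracy(rul_pred, rul_actual):
    """Relation (22): 1 - |RUL_predicted - RUL_actual| / RUL_actual."""
    return 1.0 - abs(rul_pred - rul_actual) / rul_actual

rul = predicted_rul(0.579, 11_000)               # network output 0.579 at 11,000 min
acc = prediction_accuracy(round(rul), 8_650)     # actual RUL of B1 is 8,650 min
print(f"{rul:.0f} min, {acc:.2%}")               # 7998 min, 92.46%
```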

Fig. 10. Prediction results for Cases I and II (a) Actual vs. predicted life percentage for bearing
B1 (b) Actual vs. predicted life percentage for bearing B3

Fig. 11. Prediction results for Cases I and II (a) Actual vs. predicted RUL for bearing B1 (b)
Actual vs. predicted RUL for bearing B3

It is apparent that the damage during the later stages of bearing life grows at a much
faster rate than in the early stages, and therefore an accurate determination of RUL
during the end phases becomes more essential in order to prevent the sudden failure of bearings.
From Figs. 11(a) and (b), it is seen that in the beginning the predicted RUL differs significantly
from the actual RUL. However, as the age of the bearing advances, the predicted RUL moves
closer to the actual RUL and the accuracy of prediction increases. Figs. 11(a) and (b) also
show that the RUL values predicted at the majority of the inspection points are below the actual
RUL values, which is desirable in bearing prognostics: an underestimated RUL value
provides an early warning of the bearing failure and leaves enough time to
implement maintenance actions. At the same time, Table 3 indicates that the prediction MSEs
achieved for the test bearings B1 and B3 are small (0.0055 and 0.0099 respectively) and adequate for the purpose of RUL
evaluation. The obtained results demonstrate the effectiveness of the proposed technique
in assessing the bearing RUL.

4.4 Comparisons
The proposed method differs from previous works in two aspects: (i) it uses a new
monotonic index based on the MD criterion and CUMSUM, and (ii) it applies a different ANN structure,
i.e. the NARX-NN, instead of the usual FFNN models utilized earlier. A comparison with other works from these two
perspectives is presented in the following paragraphs.
To confirm the effectiveness of the suggested HI, we compared it with the SOM-MQE
used in Refs. [7, 15]. The first 500 normalized time-domain feature vectors were used to
build the SOM model and the SOM-MQE was calculated for each of the bearings B1-B5. Then,
the SOM-MQEs and the age of the bearing were supplied as inputs to train the NARX-NN and
predict the bearing RUL. The corresponding results for Case I are plotted in Figs. 12(a) and (b).
As discussed before, prediction accuracy matters more in the late life of the bearings
than in the early phases. Therefore, to judge the performance of the suggested HI in the
final period, the prediction MSEs over the last 50 time-steps are also calculated. Table 4 provides
the overall MSE and the MSE over the last 50 time-steps using the MD-CUMSUM HI
and the SOM-MQE. With the proposed HI, the overall prediction MSE and the MSE over the last 50
time-steps are about 10% and 99.6% lower, respectively, than with the SOM-MQE.
Thus, the MD-CUMSUM based HI is clearly advantageous over the SOM-MQE. It is
worth mentioning that CUMSUM can also be applied to existing degradation indicators
such as the SOM-MQE to improve their performance. However, the task
of validating the efficacy of CUMSUM with other HIs is beyond the scope of this paper and is
left to interested researchers.
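The two comparison metrics can be written out explicitly. The sketch below assumes that the quoted percentages are relative MSE reductions, (MSE_ref - MSE_new) / MSE_ref, which reproduces the figures stated above from the Table 4 values; the function names are illustrative.

```python
import numpy as np

def mse(actual, predicted, last_n=None):
    """Mean squared error, optionally restricted to the last `last_n` time-steps."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if last_n is not None:
        actual, predicted = actual[-last_n:], predicted[-last_n:]
    return float(np.mean((actual - predicted) ** 2))

def relative_reduction(mse_ref, mse_new):
    """Fractional reduction of the MSE relative to a reference method."""
    return (mse_ref - mse_new) / mse_ref

# Table 4 values: SOM-MQE (reference) vs. the MD-CUMSUM HI.
print(f"{relative_reduction(0.0061, 0.0055):.1%}")    # 9.8%  (full lifetime)
print(f"{relative_reduction(0.0042, 1.65e-05):.1%}")  # 99.6% (last 50 time-steps)
```

For a pair of life-percentage curves, `mse(actual, predicted, last_n=50)` gives the end-of-life error used in the comparison.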
The methodology proposed in this paper is similar to that of Ref. [8] in the sense
that an ANN technique is adopted for building the prediction model and the life percentage of the
bearing is used to calculate the RUL. Although the present work is inspired by Ref.
[8], there are several differences: (i) Instead of the traditional RMS and kurtosis features, a more
robust measure, i.e. the MD, is utilized in this work. (ii) As an alternative to the WFRF-fitted features,
CUMSUM is applied to extract a monotonic trend from the MD. Consequently, the time spent in
fitting the features is eliminated. Moreover, the fitting of test feature samples collected in
an online manner, i.e. one sample at a time, may give rise to errors in finding the best-approximating
WFRF function. (iii) The ANN used in Ref. [8] utilizes the measurement values at the current
and previous inspection points for training. Here, however, a NARX-NN is utilized,
which takes the measurement values at the past inspection points for training and outputs a
one-step-ahead predicted value of the life percentage or RUL. (iv) The selection of the optimal prediction
model in the earlier work is based on the training MSE, and the validation set is constructed using
the actual measurements from all the available failure histories. In this work, a data
division approach is employed to build the validation set and an independent test set is
used to choose the best model, thereby guarding against errors arising from
overfitting in the ANN.
Let us compare the effectiveness of the proposed NARX-NN structure with
the previously used ANN structure, which consists of two hidden layers with three neurons in the first
hidden layer and two neurons in the second. The ANN in [8] uses the HI and the
age of the bearings at the current and previous inspection points as inputs. The idea is to utilize
the current measurement values as well as the change of measurement values at these points. In
our case, the NARX-NN utilizes the HI and the age values at the previous inspection points only.
Here, the motive is to use the previous measurement values to attain a one-step-ahead predicted
value of the life percentage. To ensure a fair comparison, the structure of the NARX-NN is taken
identical to the ANN in [8]: it is constructed with two hidden layers of
three and two neurons respectively, and the number of input delays is
set to two so that the number of input nodes matches that of the ANN in [8]. As suggested
in [8], the training sets for the NARX-NN and ANN are composed using the derived HI
measurements and the validation sets are composed of the actual MD measurements. The
optimal NARX-NN is again selected based on the training MSE, as for the earlier ANN.
The only factor of interest here is the supply of inputs at different
inspection points, i.e. at the current and previous time-steps for the ANN and at the previous
time-steps only for the NARX-NN. Both the ANN and the NARX-NN are trained several times
and the ones giving the minimum training MSEs are retained for RUL prediction. The
associated prediction MSEs for Case I are tabulated in Table 5. Table 5 shows
that although the overall prediction MSEs for both NNs remain almost the same,
the NARX-NN achieves a 58% lower MSE than the ANN of [8] over the last 50
time-steps. Also, the proposed NARX-NN structure imposes no restriction on the
number of delays, which can be optimized to achieve still lower prediction MSEs.
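The difference between the two input arrangements can be made concrete with a small sketch of the tapped-delay construction for the exogenous inputs (HI and age). This is an illustration under stated assumptions, not the authors' implementation: the function name and toy data are hypothetical, and the delayed output feedback that a full NARX model also uses is omitted for brevity.

```python
import numpy as np

def lagged_inputs(hi, age, target, n_delays, include_current=False):
    """Build input rows from HI and age at past (and optionally current) points.

    The row for time t holds [hi, age] at t-1 ... t-n_delays (NARX-style),
    preceded by the values at t when include_current is True (ANN of [8]).
    """
    hi, age, target = (np.asarray(a, dtype=float) for a in (hi, age, target))
    lags = range(0 if include_current else 1, n_delays + 1)
    rows = []
    for t in range(n_delays, len(hi)):
        rows.append([v for d in lags for v in (hi[t - d], age[t - d])])
    return np.array(rows), target[n_delays:]

# Toy series (assumption): four inspection points.
hi = [0.1, 0.2, 0.3, 0.4]
age = [1.0, 2.0, 3.0, 4.0]
life = [0.25, 0.50, 0.75, 1.00]

# NARX-style: two delays, previous points only -> 4 input nodes.
X_narx, y_narx = lagged_inputs(hi, age, life, n_delays=2)
# ANN of [8]: current and previous point -> also 4 input nodes.
X_ann, y_ann = lagged_inputs(hi, age, life, n_delays=1, include_current=True)
print(X_narx.shape, X_ann.shape)  # (2, 4) (3, 4)
```

With two delays the NARX-style regressor has the same number of input nodes as the current-plus-previous arrangement, which is the condition used above to keep the comparison fair.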
We further extended the comparison to a simple FFNN. The structure of the FFNN is taken to be
similar to that of the NARX-NN, i.e. it consists of an input layer, a hidden layer and an output layer. The
FFNN, however, uses the HI and the bearing age at the current inspection point only as
inputs. The output of the network remains the same, i.e. the bearing life percentage. Here, the
optimal FFNN is determined by the number of hidden neurons only and is chosen in a manner analogous
to that discussed in this paper for the NARX-NN. The number of hidden neurons is finally
set to 10 and the network is run eight times to attain the best model. The prediction results of the
FFNN are shown in Table 5. The NARX-NN improves the overall prediction accuracy (in terms of MSE) by 8%
and the prediction accuracy over the final 50 time-steps by 50.33% as compared to the traditional FFNN
model.
The authors would like to mention that the comparisons of NN models should be treated
as valid only for the given settings of the network parameters. A more rigorous comparison would
require the use of optimization methods such as genetic algorithms or particle swarm optimization
to make a more accurate choice of the hidden neurons, or of both the hidden neurons and the
input delays, depending upon the network architecture. This will be studied in future work.
However, whichever NN model is used, the HI based on MD-CUMSUM gives
lower prediction errors in all the cases considered and can be used effectively to predict the bearing RUL. In
addition, the one-step-ahead output value predicted by the NARX-NN provides an early
indication of the RUL, which is particularly valuable when the bearing is close to failure.

Table 4. Comparison results for the HI suggested in the paper and the SOM-MQE for Case I
Results                                      | SOM-MQE         | Suggested HI
Training bearings                            | B2, B3, B4, B5  | B2, B3, B4, B5
Testing bearings                             | B1              | B1
Prediction MSE over the full lifetime        | 0.0061          | 0.0055
Prediction MSE over the last 50 time-steps   | 0.0042          | 1.65e-05

Table 5. Comparison results of the NARX-NN with the ANN used in [8] and the traditional FFNN
Results                           | NARX-NN         | ANN [8]         | FFNN
Training bearings                 | B2, B3, B4, B5  | B2, B3, B4, B5  | B2, B3, B4, B5
Testing bearings                  | B1              | B1              | B1
Overall prediction MSE            | 0.0059          | 0.0061          | 0.0064
MSE over the last 50 time-steps   | 1.49e-04        | 3.58e-04        | 3.00e-04

Fig. 12. Prediction results for Case I using SOM-MQE (a) Actual vs. predicted life percentage
for bearing B1 (b) Actual vs. predicted RUL for bearing B1

5. Conclusions
This paper proposes a neural network approach in combination with a wavelet-based
denoising method to assess the RUL of a bearing. The raw signals are treated with the wavelet
filter before the feature extraction process. The extracted features are then fused using the MD
criterion. The derived MD is subjected to the CUMSUM process to obtain a monotonically
increasing trend towards the end of the bearing life. The paper dispenses with the Weibull fitting phase
suggested in some of the research articles discussed in the introduction. Finally, the HI
based on MD-CUMSUM is used to train the NARX-NN and estimate the bearing RUL. The
experimental outcomes indicate that the predictions achieved by the proposed method become
more accurate in the later stages of bearing life. The comparison results demonstrate that
the suggested HI outperforms the commonly used SOM-MQE indicator and that the adopted NN
configuration provides better prediction results than the conventional FFNNs. The efficiency and
the accuracy of the proposed method could be enhanced by training the NARX model with a group
of bearings working under different load and speed combinations, which will be a subject of
future study. Future work also aims to combine the method with optimization
techniques such as genetic algorithms and particle swarm optimization, so that the optimal
network can be selected systematically instead of by the trial-and-error method used in the paper.

Acknowledgements
The authors gratefully acknowledge the financial support provided by the Aeronautics Research and
Development Board (ARDB), Government of India (Project Grant Number DRD-826-MID).

References

[1] Nandi, S., Toliyat, H., & Li, X. (2005). Condition monitoring and fault diagnosis of
electrical motors-a review. IEEE Transactions on Energy Conversion, 20(4), 719-729.
[2] Shin, J. H., & Jun, H. B. (2015). On condition based maintenance policy. Journal of
Computational Design and Engineering, 2(2), 119-127.
[3] Jardine, A. K., Lin, D., & Banjevic, D. (2006). A review on machinery diagnostics and
prognostics implementing condition-based maintenance. Mechanical systems and signal
processing, 20(7), 1483-1510.
[4] Heng, A., Zhang, S., Tan, A. C., & Mathew, J. (2009). Rotating machinery prognostics:
State of the art, challenges and opportunities. Mechanical Systems and Signal
Processing, 23(3), 724-739.
[5] Shao, Y., & Nezu, K. (2000). Prognosis of remaining bearing life using neural
networks. Proceedings of the Institution of Mechanical Engineers, Part I: Journal of
Systems and Control Engineering, 214(3), 217-230.
[6] Gebraeel, N., Lawley, M., Liu, R., & Parmeshwaran, V. (2004). Residual life predictions
from vibration-based degradation signals: a neural network approach. IEEE Transactions
on Industrial Electronics, 51(3), 694-700.

[7] Huang, R., Xi, L., Li, X., Liu, C. R., Qiu, H., & Lee, J. (2007). Residual life predictions
for ball bearings based on self-organizing map and back propagation neural network
methods. Mechanical Systems and Signal Processing, 21(1), 193-207.
[8] Tian, Z. (2012). An artificial neural network method for remaining useful life prediction
of equipment subject to condition monitoring. Journal of Intelligent
Manufacturing, 23(2), 227-237.
[9] Ali, J. B., Chebel-Morello, B., Saidi, L., Malinowski, S., & Fnaiech, F. (2015). Accurate
bearing remaining useful life prediction based on Weibull distribution and artificial
neural network. Mechanical Systems and Signal Processing, 56, 150-172.
[10] Yang, Y., Liao, Y., Meng, G., & Lee, J. (2011). A hybrid feature selection scheme for
unsupervised learning and its application in bearing fault diagnosis. Expert Systems with
Applications, 38(9), 11311-11320.
[11] Shen, C., Wang, D., Kong, F., & Tse, P. W. (2013). Fault diagnosis of rotating
machinery based on the statistical parameters of wavelet packet paving and a generic
support vector regressive classifier. Measurement, 46(4), 1551-1564.
[12] Ali, J. B., Fnaiech, N., Saidi, L., Chebel-Morello, B., & Fnaiech, F. (2015). Application
of empirical mode decomposition and artificial neural network for automatic bearing
fault diagnosis based on vibration signals. Applied Acoustics, 89, 16-27.
[13] Hong, H., & Liang, M. (2009). Fault severity assessment for rolling element bearings
using the Lempel–Ziv complexity and continuous wavelet transform. Journal of sound
and vibration, 320(1), 452-468.
[14] Verma, A. K., Sreejith, B., & Srividya, A. (2010). Roller Bearing Defect Prognosis using
Likelihood Parameters and Proportional Hazards Model. International Journal of
Performability Engineering, 6(5), 425.
[15] Qiu, H., Lee, J., Lin, J., & Yu, G. (2003). Robust performance degradation assessment
methods for enhanced rolling element bearing prognostics. Advanced Engineering
Informatics, 17(3), 127-140.
[16] Pan, Y., Chen, J., & Guo, L. (2009). Robust bearing performance degradation assessment
method based on improved wavelet packet–support vector data description. Mechanical
Systems and Signal Processing, 23(3), 669-681.
[17] Pan, Y., Chen, J., & Li, X. (2010). Bearing performance degradation assessment based on
lifting wavelet packet decomposition and fuzzy c-means. Mechanical Systems and Signal
Processing, 24(2), 559-566.
[18] Yu, J. (2011). Bearing performance degradation assessment using locality preserving
projections and Gaussian mixture models. Mechanical Systems and Signal
Processing, 25(7), 2573-2588.
[19] Sheremetov, L., Cosultchi, A., Martínez-Muñoz, J., Gonzalez-Sánchez, A., & Jiménez-
Aquino, M. A. (2014). Data-driven forecasting of naturally fractured reservoirs based on
nonlinear autoregressive neural networks with exogenous input. Journal of Petroleum
Science and Engineering, 123, 106-119.

[20] Tse, P. W., & Atherton, D. P. (1999). Prediction of machine deterioration using vibration
based fault trends and recurrent neural networks. Journal of vibration and
acoustics, 121(3), 355-362.
[21] Andalib, A., & Atry, F. (2009). Multi-step ahead forecasts for electricity prices using
NARX: a new approach, a critical analysis of one-step ahead forecasts. Energy
Conversion and Management, 50(3), 739-747.
[22] Tse, P. W., Peng, Y. H., & Yam, R. (2001). Wavelet analysis and envelope detection
for rolling element bearing fault diagnosis—their effectiveness and flexibilities. Journal
of vibration and acoustics, 123(3), 303-310.
[23] Abbasion, S., Rafsanjani, A., Farshidianfar, A., & Irani, N. (2007). Rolling element
bearings multi-fault classification based on the wavelet denoising and support vector
machine. Mechanical Systems and Signal Processing, 21(7), 2933-2945.
[24] Lou, X., & Loparo, K. A. (2004). Bearing fault diagnosis based on wavelet transform and
fuzzy inference. Mechanical systems and signal processing, 18(5), 1077-1095.
[25] Samanta, B., & Al-Balushi, K. R. (2003). Artificial neural network based fault
diagnostics of rolling element bearings using time-domain features. Mechanical systems
and signal processing, 17(2), 317-328.
[26] Singh, S., Kumar, A., & Kumar, N. (2014). Motor Current Signature Analysis for
Bearing Fault Detection in Mechanical Systems. Procedia Materials Science, 6, 171-177.
[27] Khanam, S., Tandon, N., & Dutt, J. K. (2014). Fault size estimation in the outer race of
ball bearing using discrete wavelet transform of the vibration signal. Procedia
Technology, 14, 12-19.
[28] Misiti, M., Misiti, Y., Oppenheim, G., & Poggi, J. M. (Eds.). (2013). Wavelets and their
Applications. John Wiley & Sons.
[29] Godwin, J. L., Matthews, P., & Watson, C. (2014, June). Robust multivariate statistical
ensembles for bearing fault detection and identification. In 2014 IEEE Conference on
Prognostics and Health Management (PHM) (pp. 1-11). IEEE.
[30] Ren, W. X., & Sun, Z. S. (2008). Structural damage identification by using wavelet
entropy. Engineering Structures, 30(10), 2840-2849.
[31] Jiang, Y., Tang, B., Qin, Y., & Liu, W. (2011). Feature extraction method of wind turbine
based on adaptive Morlet wavelet and SVD. Renewable energy, 36(8), 2146-2153.
[32] Hemmati, F., Orfali, W., & Gadala, M. S. (2016). Roller bearing acoustic signature
extraction by wavelet packet transform, applications in fault detection and size
estimation. Applied Acoustics, 104, 101-118.
[33] Bozchalooi, I. S., & Liang, M. (2008). A joint resonance frequency estimation and in-
band noise reduction method for enhancing the detectability of bearing fault
signals. Mechanical Systems and Signal Processing, 22(4), 915-933.
[34] Yan, R., & Gao, R. X. (2009). Energy-based feature extraction for defect diagnosis in
rotary machines. IEEE Transactions on Instrumentation and Measurement, 58(9), 3130-
3139.

[35] Yassin, I. M., Taib, M. N., Hassan, H. A., Zabidi, A., & Tahir, N. M. (2010, December).
Heat exchanger modeling using NARX model with binary PSO-based structure selection
method. In 2010 International Conference on Computer Applications and Industrial
Electronics (ICCAIE) (pp. 368-373). IEEE.
[36] Jain, V., Sambi, S., Kumar, S., Kumar, B., & Kumar, S. (2015). Modeling of a UASB
Reactor by NARX Networks for Biogas Production. Chemical Product and Process
Modeling, 10(2), 113-121.
[37] Asgari, H., Chen, X., Morini, M., Pinelli, M., Sainudiin, R., Spina, P. R., & Venturini, M.
(2015). NARX models for simulation of the start-up operation of a single-shaft gas
turbine. Applied Thermal Engineering.
[38] Çoruh, S., Geyikçi, F., Kılıç, E., & Çoruh, U. (2014). The use of NARX neural network
for modeling of adsorption of zinc ions using activated almond shell as a potential
biosorbent. Bioresource technology, 151, 406-410.
[39] Ardalani-Farsa, M., & Zolfaghari, S. (2010). Chaotic time series prediction with residual
analysis method using hybrid Elman–NARX neural networks. Neurocomputing, 73(13),
2540-2553.
[40] Xie, H., Tang, H., & Liao, Y. H. (2009, July). Time series prediction based on NARX
neural networks: An advanced approach. In 2009 International Conference on Machine
Learning and Cybernetics (Vol. 3, pp. 1275-1279). IEEE.
[41] Junsheng, C., Dejie, Y., & Yu, Y. (2006). A fault diagnosis approach for roller bearings
based on EMD method and AR model. Mechanical Systems and Signal
Processing, 20(2), 350-362.
[42] Soylemezoglu, A., Jagannathan, S., & Saygin, C. (2010). Mahalanobis Taguchi system
(MTS) as a prognostics tool for rolling element bearing failures. Journal of
Manufacturing Science and Engineering, 132(5), 051014.
[43] Lin, J., & Chen, Q. (2013). Fault diagnosis of rolling bearings based on multifractal
detrended fluctuation analysis and Mahalanobis distance criterion. Mechanical Systems
and Signal Processing, 38(2), 515-533.
[44] Hu, J., Zhang, L., & Liang, W. (2013). Dynamic degradation observer for bearing fault
by MTS–SOM system. Mechanical Systems and Signal Processing, 36(2), 385-400.
[45] Castagliola, P., & Maravelakis, P. E. (2011). A CUSUM control chart for monitoring
the variance when parameters are estimated. Journal of Statistical Planning and
Inference, 141(4), 1463-1478.
[46] Abbasi, S. A., Riaz, M., & Miller, A. (2012). Enhancing the performance of CUSUM
scale chart. Computers & Industrial Engineering, 63(2), 400-409.
[47] Montgomery, D. C. (2009). Introduction to statistical quality control. John Wiley &
Sons.
[48] Singh, G. K., & Al Kazzaz, S. A. A. S. (2009). Isolation and identification of dry bearing
faults in induction machine using wavelet transform. Tribology international, 42(6), 849-
861.

[49] Wu, S. J., Gebraeel, N., Lawley, M., & Yih, Y. (2007). A neural network integrated
decision support system for condition-based optimal predictive maintenance
policy. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and
Humans, 37(2), 226-236.
[50] Tian, Z., Wong, L., & Safaei, N. (2010). A neural network approach for remaining useful
life prediction utilizing both failure and suspension histories. Mechanical Systems and
Signal Processing, 24(5), 1542-1555.

Research Highlights

• A novel method for remaining useful life (RUL) prediction of bearings is proposed.
• Wavelet filtering approach is employed to denoise the bearing signals.
• A new HI based on time-domain feature extraction, Mahalanobis distance (MD) and
cumulative sum (CUMSUM) is proposed.
• A neural network based approach is used to build the RUL prediction model.
