A Feature Extraction and Machine Learning Framework for Bearing Fault Diagnosis

Renewable Energy 191 (2022) 987–997
Bodi Cui a,1, Yang Weng b,*, Ning Zhang c,*
a School of Mechanical Engineering, Jiangsu Ocean University, Lianyungang, 222005, China
b School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, USA
c State Key Laboratory of Power Systems, Department of Electrical Engineering, Tsinghua University, Beijing, 100084, China
Article info

Article history: Received 12 April 2021; Received in revised form 23 January 2022; Accepted 10 April 2022; Available online 14 April 2022

Index Terms: Bearing fault diagnosis; Feature extraction; Machine learning; Nonstationary signals; Time-frequency analysis

Abstract

Wind power generation has been widely adopted due to its renewable nature and decreasing capital cost per kW. However, existing equipment ages rapidly, leading to higher failure rates, greater operation and maintenance costs, and worsening safety conditions, calling for improved condition monitoring and fault diagnosis for wind turbines. Past methods utilize physical models, but they are only successful in laboratory environments. As increasing data are becoming available, there are methods applying machine learning without careful discrimination, leading to low accuracy. To solve this problem, first this paper proposes to conduct unsupervised learning to understand data properties, e.g., structural density. Subsequently, the sensitivity analysis is conducted to extract the significant features and to avoid overfitting. The sensitivity of various features that are characteristics of wind turbine bearings may vary significantly under different working conditions. During such a process, the piece-wise properties are studied to improve supervised learning. By combining the properties of data and regression, a three-stage learning algorithm is proposed to refine and learn the most useful information for turbine bearing fault diagnosis. The proposed framework is validated by using real data from diversified data sets for nonstationary vibration signals of bearings.

© 2022 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction

Wind energy has been widely recognized as one of the most efficient, economical, and environmentally friendly distributed energy resources. For such reasons, installed wind power capacity has increased substantially over the last decade. The worldwide wind power capacity reached 650 GW by the end of 2019. China alone has installed more than 200 GW and will double that number by 2030 [1]. Though wind energy is not currently the largest energy source, it has the largest number of generators in the power system since the capacity of each wind generator is much smaller than that of thermal and hydroelectric generators [2]. With continuous installation, aging wind generators present increasing challenges for sustainable operation and economical maintenance (O&M) of existing installations. Wind generators are mounted on top of high towers with enclosed shells, thus preventing manual monitoring and diagnosis. Therefore, different from solar panels, the higher uncertainty with respect to the rotating mechanism destabilizes revenues, creating a large O&M cost. For example, [3] shows a potential 30% reduction in O&M cost in offshore environments if better diagnosis algorithms can be deployed. This potential leads to a strong interest in optimizing the O&M strategies in both industry and academia.

O&M costs are more critical for wind turbines (WTs) than for other generators because the corresponding malfunction is much more complex [4]. The reasons are as follows: 1) The turbines operate continuously, seven days per week, 24 h per day; 2) There are many rotating machines that cannot possibly operate maintenance-free, even with a perfect design; comparatively, solar panels include no rotating components and can reach a maintenance-free condition if carefully designed; and 3) Wind turbines are directly exposed to the environment, where the wind speed is variable, thus introducing variable mechanical loads, which add greater stress to the mechanical parts. As a result, some WTs fail, sometimes frequently.

* Corresponding authors.
E-mail addresses: cui_bodi@jou.edu.cn (B. Cui), yang.weng@asu.edu (Y. Weng), ningzhang@tsinghua.edu.cn (N. Zhang).
1 This work was done while B. Cui was a visiting scholar at Arizona State University.
https://doi.org/10.1016/j.renene.2022.04.061
To reduce the failure rate, it is crucial to apply reliable and cost-effective condition monitoring (CM) techniques so that the best maintenance intervention can be determined in time to allow the machine to return to work as early as possible [5,6]. Additionally, improved CM leads to a better understanding of the design shortcomings, which can be used to improve WT design [7]. For this purpose, CM systems are increasingly installed. Past research shows great potential for economic benefit. Such gains are proven to be substantial and largely dependent on the fault detection rate [8,9]. Therefore, CM systems grew from 25% in 2012 to 36.1% in 2015. The key purpose of CM is to monitor and identify bearing failures on wind turbines. For example, the gearbox and generator are studied due to the high percentage of replacement over the wind farm life and to considerable equipment downtime, leading to substantial production loss. Recent studies show that bearings cause approximately 70% of gearbox downtime and 21%–70% of generator downtime [10].

In addition, corrective and preventive maintenance are inefficient at avoiding these failures. Corrective maintenance is only performed after the malfunction has occurred, so it cannot avoid any failures. Preventive maintenance is a time-based approach that faces two main challenges: under- and over-maintenance. In other words, either the system performance is not effectively monitored, resulting in unexpected failures, or there is an excessive number of maintenance activities, resulting in a waste of resources. Therefore, it is crucial to use condition-based maintenance on bearings and to develop methods that can be used to diagnose faults early. In general, bearing diagnosis is a well-recognized field of research; however, this is not the case for machines operating under nonstationary loads, such as wind turbines [11,12]. In the case of varying load/speed, vibrational signals generated by rolling element bearings are affected by operational factors, making the diagnosis difficult [13–15]. These difficulties arise from the variation in vibration-based diagnostic features caused mostly by load/speed variation (operation factors), low energy of sought-after features, and low signal-to-noise levels [16,17]. Analysis of the signal from the main bearing is even more difficult due to the very low rotational speed of the main shaft.

In recent years, machine learning (ML) approaches, such as support vector machines (SVM), artificial neural networks (ANN), and deep neural networks (DNN), have been widely employed to resolve complex nonlinear and uncertain problems across various disciplines. Successful applications of ML in the wind energy industry can be found in many recent studies [18–20]. These studies can be divided into four general categories: prediction, design optimization, optimal control, and fault detection [21]. The ML-based prediction of important parameters of wind power generation, such as wind speed and wind power, has been researched extensively [22–24]. Plenty of research has been done on deploying ML to improve the design of wind energy systems, such as wind turbine design and wind farm design [25]. At the same time, ML technologies are showing sustainable strategies in the field of control, as shown in related research on maximum power tracking control [26], pitch angle control [27], speed control [28], and so on during the process of wind power generation. Research on fault diagnosis of wind energy using ML has also received continuous and widespread attention. Leahy et al. diagnosed and predicted wind turbine generator faults from SCADA data using support vector machines [29]. High recall and low precision were also found for the diagnostic and prognostic cases. Tang et al. built a classifier to identify gear, bearing, shaft, and general transmission failures based on manifold learning and Shannon wavelet support vector machines [30]. A standard SVM with a radial basis function kernel obtained an accuracy of 76%. Jiang et al. proposed multiscale convolutional neural networks for fault diagnosis of wind turbine gearboxes [31]. Several studies have proposed ML methods to detect wind turbine blade faults (such as cracks, erosion, and pitch angle twists) [32].

The focus of this work is the fault diagnosis of wind turbine bearings. Schlechtingen et al. predicted the bearing failure of wind turbine generators by a comparative analysis of neural network and regression models using bearing temperature data [33]. They found that the nonlinear NN approaches outperform the regression models. Bangalore et al. used an artificial neural network approach to predict early faults in the bearings of a planetary gearbox in a wind turbine from bearing temperature measurements [34]. They conducted case studies in which their method predicted a gearbox bearing fault a week before it was indicated by the vibration-based condition monitoring system. Bach-Andersen et al. used convolutional neural networks to classify expert-labeled vibration data containing main bearing faults (rotor, planetary stage, and helical stage bearing faults) [35]. They found that the deep CNN approach achieves better performance on the test data than a logistic classifier or a shallow multilayer perceptron.

In this work, an approach is proposed to diagnose the bearings used in wind turbines based on a customized ML framework. For such a design, it is first observed that the wind is characterized by its speed and direction, which are affected by several factors, e.g., geographic location, climate characteristics, height above ground, and surface topography. For example, wind turbines interact with the wind, capturing a portion of its kinetic energy and converting it into usable energy. Therefore, the modeling is customized with these factors for knowledge enhancement. In the process of fault diagnosis, the selection of characteristic parameters significantly influences the diagnostic accuracy. Thus, a systematic method based on neighborhood component analysis (NCA) is presented to assess the prediction accuracy of each feature and to find the most effective combination for fault diagnosis of wind turbine bearings. Various statistical features are used in this work, drawn from notably different categories: the time domain, the frequency domain, and the time-frequency domain. For this purpose, hybrid-domain feature parameters defined on multiple domains are introduced to describe the state of the rolling bearing. Furthermore, the fault diagnosis of wind turbine bearings is performed based on time-frequency analysis and a supervised learning method.

To demonstrate the performance, the proposed method is validated on different data sets and conditions. The experimental results show that the proposed method can extract useful features accurately and efficiently from nonstationary vibrational bearing signals. Further, the accuracy of fault diagnosis is significantly higher than that of past methods.

The remainder of the paper is organized as follows. Section 2 analyzes the impacts of working conditions on wind turbine bearings. Section 3 performs the data-driven feature extraction and selection based on neighborhood component analysis. Section 4 develops a bearing fault diagnosis method using ML. Section 5 conducts experiments for numerical validation. Section 6 compares the performance of the proposed method and a regression-based method. Section 7 concludes the paper.

2. Impacts of working conditions on wind turbine bearings

Wind turbines (WTs) are exposed to a variable environment, operate 24 h a day, 7 days a week, and experience rapid changes in temperature, air pressure, wind shear, wind speed, and total load. Due to these variations, the bearings of wind turbines undergo continuously changing loads. Therefore, the characteristics of the working conditions of wind turbine bearings are first studied to understand the causes of bearing faults. The observations will subsequently be incorporated into our feature pool.
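As a hedged numerical preview of the two wind models used in the following subsections, the Weibull density of the mean wind speed (Eq. (1)) and the Prandtl logarithmic shear law (Eq. (2)), the Python sketch below evaluates both. The function names and the parameter values (k = 2, C = 8 m/s, a roughness length of 0.03 m, and an 80 m hub height) are illustrative assumptions, not values taken from this paper.

```python
import numpy as np


def weibull_pdf(v_m, k, c):
    """Weibull density of the mean wind speed, Eq. (1); k is shape, c is scale."""
    v_m = np.asarray(v_m, dtype=float)
    return (k / c) * (v_m / c) ** (k - 1) * np.exp(-((v_m / c) ** k))


def log_law_speed(v_ref, z, z_ref=10.0, z0=0.03):
    """Mean wind speed at height z from the logarithmic shear law, Eq. (2)."""
    return v_ref * np.log(z / z0) / np.log(z_ref / z0)


# Assumed site parameters, chosen only for illustration.
v = np.linspace(0.0, 40.0, 4001)           # wind speed grid in m/s
pdf = weibull_pdf(v, k=2.0, c=8.0)

# Trapezoidal integration: the density should integrate to about 1.
area = np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(v))

# Most likely mean wind speed on the grid.
mode = v[np.argmax(pdf)]

# Speed at an assumed 80 m hub height, given 8 m/s at the 10 m reference height.
v_hub = log_law_speed(8.0, z=80.0)

print(f"area={area:.4f}, mode={mode:.2f} m/s, v_hub={v_hub:.2f} m/s")
```

With these assumed values, the density concentrates around moderate speeds and decays quickly for large speeds, and the speed at hub height is larger than at the reference height, which matches the qualitative statements in this section.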
2.1. Effect of nonstationary mechanical load

Mechanical loads cause fatigue damage to several parts of wind turbine bearings, thereby reducing the useful life of the system. There are two types of mechanical loads: static and dynamic. Static loads result from the mean wind speed and determine the absolute mechanical load level. Dynamic loads, induced by turbulence and gusts, are more closely related to fatigue damage.

1) Mean wind speed: The mean wind speed determines the basic working load of a wind turbine. This speed is quantified by the hourly mean of wind speeds. It is also used to determine the economic viability of a wind energy project. The probability distribution of the mean wind speed is predicted from measurements collected over several years. The experimentally obtained wind distribution can be approximated by a Weibull distribution given by

$$p(V_m) = \frac{k}{C}\left(\frac{V_m}{C}\right)^{k-1} e^{-\left(V_m/C\right)^{k}}, \tag{1}$$

where $k$ and $C$ are the shape and scale coefficients, respectively. These coefficients are adjusted to match the wind data at a particular site. The Weibull probability function reveals that large wind speeds rarely occur, whereas moderate winds are more frequent.

2) Turbulence: Turbulence includes all wind speed fluctuations with frequencies above the spectral gap. Therefore, turbulence contains all components in the range from seconds to minutes. In general, turbulence exerts a minor impact on the annual energy capture. However, it has major impacts on aerodynamic loads and power quality.

2.2. Effect of wind farm location

Wind field characteristics exert important impacts on the type of wind turbine bearing failure. From the wind field point of view, wind farms are primarily divided into offshore and onshore wind farms. Terrain, even in the absence of obstacles, produces friction forces that slow the winds in the lower layers, which produces wind shear. This effect is more important at lower heights and when the terrain is complex. Wind shear unbalances the wind turbine generator and makes the transmission system more easily damaged. Comparatively, the problem is less pronounced for offshore wind farms since the sea surface is generally flat and the wind encounters much less resistance. Furthermore, the wind changes less frequently at sea than on land, and therefore the mechanical loads of offshore wind farms are more stable.

The shear of wind speed is a function of height. Different mathematical models have been proposed to describe wind shear, and one model is the Prandtl logarithmic law [36],

$$\frac{V_m(z)}{V_m(z_{ref})} = \frac{\ln(z/z_0)}{\ln(z_{ref}/z_0)}, \tag{2}$$

where $z$ is the height above ground level, $z_{ref}$ is the reference height (usually 10 m), and $z_0$ is the roughness length. The discussion in this section is used to build the feature pool for feature selection in the next section.

2.3. Types and impacts of the bearing faults

According to the characteristics of vibration signals during operation, the faults of rolling bearings can be mainly divided into two categories: surface damage faults and wear faults.

1) Surface damage faults: During the operation of the bearing, the rolling ball, the outer race, and the inner race are all subjected to periodic pulsating loads, resulting in cyclically changing contact stress. When the number of stress cycles reaches a certain value, fatigue damage occurs on the rolling ball or the working surfaces of the inner and outer races. This phenomenon, caused by impact load and alternating stress and also known as bearing fatigue failure, is the main cause of rolling bearing faults. When bearing fatigue failure occurs, significant shock load, vibration, and noise are generated.

For bearings in wind turbines, this type of fatigue failure is further aggravated by the nonlinear and uncertain nonstationary loads, especially sudden shock loads. When such failures are not detected at an early stage, the surface damage increases further, resulting in complete bearing failure and even equipment accidents. Therefore, it is important to diagnose the fatigue failure of rolling bearings.

2) Wear faults: In the normal use of the bearing, the friction between the bearing balls, the outer race, and the inner race leads to wear failures. This kind of failure usually takes a long time to occur because it is a gradual process. In general, wear faults do not cause bearing damage immediately, and the degree of harm is far less than that of surface damage faults.

3. Feature parameter selection for improving the diagnosis accuracy

Feature selection is a critical component of the data workflow because high-dimensional data can degrade model performance [37]. For example, the training time can increase exponentially with numerous features, which also causes overfitting problems. Feature selection methods help with these problems by reducing the dimensions without much loss of the total information. This approach also helps to make sense of the features and their importance. Feature selection methods aid in creating an accurate predictive model by choosing features that will yield good or better accuracy while requiring fewer data [38].

Feature selection is the process of selecting a subset of relevant features for use in model construction. Given a set of features

$$x = \{x_1, \ldots, x_N\}, \tag{3}$$

the feature selection problem is to find a subset

$$x' \subseteq x \tag{4}$$

that maximizes the classification accuracy on an unseen test set, where

$$x' = \{x_{i_1}, \ldots, x_{i_M}\}, \quad N \geq M. \tag{5}$$

Usually, $x'$ maximizes some score function. It should be noted that feature selection is different from dimensionality reduction. Both methods seek to reduce the number of attributes in the data set, but a dimensionality reduction method does so by creating new combinations of attributes, whereas feature selection methods include and exclude attributes present in the data without changing them.
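The distinction between selecting a subset as in Eqs. (3)–(5) and reducing dimensionality can be sketched in a few lines of Python. The score function below (a ratio of between-class to within-class variation), the helper name `select_top_m`, and the synthetic data are assumptions made only for illustration; they are not the NCA-based criterion used later in this paper. The PCA step is included only to contrast a method that builds new attribute combinations.

```python
import numpy as np


def select_top_m(X, y, m):
    """Return the indices of the m features with the largest (assumed) score."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    score = between / (within + 1e-12)
    return np.sort(np.argsort(score)[::-1][:m])


# Synthetic two-class data: only feature 2 carries class information.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 5))
X[:, 2] += 3.0 * y

# Feature selection: keeps M = 2 of the N = 5 original columns, unchanged.
idx = select_top_m(X, y, m=2)
X_sel = X[:, idx]

# Dimensionality reduction (PCA via SVD): builds new linear combinations.
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_pca = X_centered @ Vt[:2].T

print("selected feature indices:", idx)
```

Both outputs have two columns, but the columns of `X_sel` are original attributes that remain interpretable (for example, as one of the bearing features), whereas each column of `X_pca` mixes all five attributes.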
3.1. Feature parameters used to describe the status of bearings

Statistical algorithms are used to analyze the time signals of vibration sensors. The employed approach involves calculating statistical values for a trend analysis and learning more about the shape of the vibration signal. It is assumed that vibration signals taken from intact bearings are normally distributed since the sonic pulses are generated by the rolling elements and surfaces in a random way. If a fault starts to develop on the bearing, sonic pulses with higher energy (spikes) can be found in the vibration signals. The presence of those spikes can be identified with the statistical analysis algorithms described below. These statistical algorithms provide an absolute measure of the condition of the respective bearing; no training phase is required to learn the baseline values, which correspond to a fault-free condition of the component. The disadvantage of these algorithms is their relatively poor selectivity, which means that a change in the value does not indicate which component of a bearing is faulty, for example, the inner ring surface.

Various statistical characteristics are used in this study, drawn from notably different categories: time-domain, frequency-domain, and time-frequency features. For example, hybrid-domain feature parameters defined in multiple domains are introduced to describe the state of the rolling bearing.

1) Time-domain feature parameters: The crest factor has a value of approximately 3 for an ideal normally distributed signal. The reason is that the maximum value in a normally distributed signal (the so-called white noise) is nearly $3\sigma$, whereas the root mean square (RMS) is $1\sigma$. If a fault is developing in a bearing, the spike amplitudes, and therefore the maximum value of the signal, increase. In an early occurrence of the faulty condition, the RMS value will not be affected since the spike energy is small relative to the overall energy of the signal. Therefore, the RMS value does not change significantly, and the crest factor increases. The fault detection and generation of condition monitoring system (CMS) alarms are performed by trend analysis of the crest factor. Although the calculation of the statistical values of vibrational signals is independent of the measurement time, an appropriate time window should be measured, for example, 1 s. In any case, the measurement time should be a multiple of the grid period (20 ms in Europe) because this will help eliminate signal distortions related to grid EMC effects.

2) Frequency-domain feature parameters: Energy values of the intrinsic mode function (IMF) components are obtained from vibration acceleration signals of the bearing data. To effectively assess different fault locations and different degrees of performance degradation of a rolling bearing with a unified assessment index, a state assessment method based on the relative compensation distance of multiple domain features and locally linear embedding is adopted [39]. The time-domain, frequency-domain, and time-frequency features are listed in Table 1, and Fig. 1 is a visualization of four parameters in the table.

Table 1
Feature parameters.

Time-domain feature parameters:
$x_{\max} = \max(x_i)$
$x_{pp} = \max(x_i) - \min(x_i)$
$x_{rms} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}$
$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
$S_f = \dfrac{x_{rms}}{\frac{1}{N}\sum_{i=1}^{N} |x_i|}$
$I_f = \dfrac{x_{\max}}{\frac{1}{N}\sum_{i=1}^{N} |x_i|}$
$CL_f = \dfrac{x_{\max}}{\left(\frac{1}{N}\sum_{i=1}^{N} \sqrt{|x_i|}\right)^2}$
$C_k = \dfrac{\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^4}{x_{rms}^4}$
$CF = \dfrac{x_{\max}}{x_{rms}}$

Frequency-domain feature parameters:
$FC = \dfrac{\sum_{n=1}^{N} \dot{x}(n)\, x(n)}{2\pi \sum_{n=1}^{N} x(n)^2}$
$FMS = \dfrac{\sum_{n=1}^{N} \dot{x}(n)^2}{4\pi^2 \sum_{n=1}^{N} x(n)^2}$
$FRMS = \sqrt{FMS}$
$FV = FMS - (FC)^2$
$p_1 = \dfrac{\sum_{k=1}^{H} s(k)}{H}$
$p_2 = \dfrac{\sum_{k=1}^{H} (s(k) - p_1)^2}{H - 1}$
$p_3 = \dfrac{\sum_{k=1}^{H} (s(k) - p_1)^3}{H \left(\sqrt{p_2}\right)^3}$
$p_4 = \dfrac{\sum_{k=1}^{H} (s(k) - p_1)^4}{H\, p_2^2}$
$p_5 = \dfrac{\sum_{k=1}^{H} f_k\, s(k)}{\sum_{k=1}^{H} s(k)}$
$p_6 = \sqrt{\dfrac{\sum_{k=1}^{H} (f_k - p_5)^2 s(k)}{H}}$
$p_7 = \sqrt{\dfrac{\sum_{k=1}^{H} f_k^2\, s(k)}{\sum_{k=1}^{H} s(k)}}$
$p_8 = \sqrt{\dfrac{\sum_{k=1}^{H} f_k^4\, s(k)}{\sum_{k=1}^{H} f_k^2\, s(k)}}$
$p_9 = \dfrac{\sum_{k=1}^{H} f_k^2\, s(k)}{\sqrt{\sum_{k=1}^{H} s(k) \sum_{k=1}^{H} f_k^4\, s(k)}}$
$p_{10} = \dfrac{p_6}{p_5}$
$p_{11} = \dfrac{\sum_{k=1}^{H} (f_k - p_5)^3 s(k)}{H\, p_6^3}$
$p_{12} = \dfrac{\sum_{k=1}^{H} (f_k - p_5)^4 s(k)}{H\, p_6^4}$

Here $x_i$ (equivalently $x(n)$), $i = 1, 2, \ldots, N$, is the signal series, $\dot{x}(n)$ is its time derivative, and $N$ is the number of data points; $s(k)$ is the spectrum for $k = 1, 2, \ldots, H$, $H$ is the number of spectrum lines, and $f_k$ is the frequency value of the $k$th spectrum line.

Fig. 1. Feature weights for classification.
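To make the time-domain entries of Table 1 concrete, the following Python sketch computes them for a synthetic signal and shows the behaviour described above: a single spike raises the crest factor sharply while the RMS changes little. The function names, the 50 Hz test tone, and the spike amplitude are assumptions for illustration; the spectrum $s(k)$ is taken here to be the magnitude of the one-sided FFT, which is also an assumption since the source does not specify it.

```python
import numpy as np


def time_domain_features(x):
    """Time-domain parameters following the definitions in Table 1."""
    x = np.asarray(x, dtype=float)
    abs_mean = np.mean(np.abs(x))
    x_max = np.max(x)
    x_rms = np.sqrt(np.mean(x ** 2))
    x_bar = np.mean(x)
    return {
        "x_max": x_max,
        "x_pp": x_max - np.min(x),
        "x_rms": x_rms,
        "mean": x_bar,
        "shape_factor": x_rms / abs_mean,
        "impulse_factor": x_max / abs_mean,
        "clearance_factor": x_max / np.mean(np.sqrt(np.abs(x))) ** 2,
        "kurtosis": np.mean((x - x_bar) ** 4) / x_rms ** 4,
        "crest_factor": x_max / x_rms,
    }


def frequency_centre(x, fs):
    """Parameter p5 of Table 1, using an assumed magnitude spectrum s(k)."""
    s = np.abs(np.fft.rfft(x))
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return np.sum(f * s) / np.sum(s)


fs = 12_000                               # sampling frequency in Hz
t = np.arange(1200) / fs                  # 0.1 s window, five full 50 Hz cycles
clean = np.sin(2 * np.pi * 50 * t)        # fault-free stand-in signal
spiky = clean.copy()
spiky[100] += 5.0                         # one impulsive spike (assumed fault)

feat_clean = time_domain_features(clean)
feat_spiky = time_domain_features(spiky)
p5_clean = frequency_centre(clean, fs)

print("crest factor:", feat_clean["crest_factor"], "->", feat_spiky["crest_factor"])
print("RMS:", feat_clean["x_rms"], "->", feat_spiky["x_rms"])
```

In this toy case the RMS changes by only a few percent while the crest factor grows several times over, which is why trend analysis of the crest factor can flag early faults that the RMS misses.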
3.2. Neighborhood component analysis (NCA) for feature selection

Neighborhood component analysis (NCA) is a nonparametric, embedded method for selecting features with the goal of maximizing the prediction accuracy of regression and classification algorithms. NCA feature selection with regularization learns feature weights by minimizing an objective function that measures the average leave-one-out classification or regression loss over the training data. In this work, the NCA method is used to select the feature parameters for classification of bearing faults.

Consider a multiclass classification problem with a training set containing $n$ observations

$$S = \{(x_i, y_i),\ i = 1, 2, \ldots, n\}, \tag{6}$$

where $x_i \in \mathbb{R}^p$ are the feature vectors, $y_i \in \{1, 2, \ldots, c\}$ are the class labels, and $c$ is the number of classes. The aim is to learn a classifier $f: \mathbb{R}^p \to \{1, 2, \ldots, c\}$ that accepts a feature vector and makes a prediction $f(x)$ of the true label $y$. Consider a randomized classifier that
randomly picks a point, Ref($x$), from $S$ as the ‘reference point’ for $x$, and labels $x$ using the label of the reference point Ref($x$).

This scheme is similar to that of a 1-nearest neighbor (NN) classifier, where the reference point is chosen to be the nearest neighbor of the new point $x$. In NCA, the reference point is chosen randomly, and all points in $S$ have some probability of being selected as the reference point. The probability $P(\mathrm{Ref}(x) = x_j \mid S)$ that point $x_j$ is picked from $S$ as the reference point for $x$ is higher if $x_j$ is closer to $x$ as measured by the distance function $d_w$, where

$$d_w(x_i, x_j) = \sum_{r=1}^{p} w_r^2 \, |x_{ir} - x_{jr}|, \tag{7}$$

and $w_r$ are the feature weights. Assume that

$$P(\mathrm{Ref}(x) = x_j \mid S) \propto k(d_w(x, x_j)), \tag{8}$$

where $k$ is some kernel or similarity function that assumes large values when $d_w(x, x_j)$ is small. Let

$$k(z) = \exp\left(-\frac{z}{\sigma}\right). \tag{9}$$

The reference point for $x$ is chosen from $S$, so the sum of $P(\mathrm{Ref}(x) = x_j \mid S)$ over all $j$ must be equal to 1 [40]. Therefore, it is possible to write

$$P(\mathrm{Ref}(x) = x_j \mid S) = \frac{k(d_w(x, x_j))}{\sum_{j=1}^{n} k(d_w(x, x_j))}. \tag{10}$$

Now, consider the leave-one-out application of this randomized classifier; that is, predicting the label of $x_i$ using the data in $S^{-i}$, the training set $S$ excluding the point $(x_i, y_i)$. The probability that point $x_j$ is picked as the reference point for $x_i$ is

$$p_{ij} = P(\mathrm{Ref}(x_i) = x_j \mid S^{-i}) = \frac{k(d_w(x_i, x_j))}{\sum_{j=1, j \neq i}^{n} k(d_w(x_i, x_j))}. \tag{11}$$

The leave-one-out probability of a correct classification is the probability $p_i$ that the randomized classifier correctly classifies observation $i$ using $S^{-i}$:

$$p_i = \sum_{j=1, j \neq i}^{n} P(\mathrm{Ref}(x_i) = x_j \mid S^{-i}) \, I(y_i = y_j) \tag{12}$$

$$= \sum_{j=1, j \neq i}^{n} p_{ij} \, y_{ij}, \tag{13}$$

where

$$y_{ij} = I(y_i = y_j) = \begin{cases} 1 & \text{if } y_i = y_j \\ 0 & \text{otherwise.} \end{cases} \tag{14}$$

The average leave-one-out probability of a correct classification using the randomized classifier can be written as

$$F(w) = \frac{1}{n} \sum_{i=1}^{n} p_i. \tag{15}$$

The right-hand side of $F(w)$ depends on the weight vector $w$. The goal of neighborhood component analysis is to maximize $F(w)$ with respect to $w$. With a regularization term, the objective becomes

$$F(w) = \frac{1}{n} \sum_{i=1}^{n} p_i - \lambda \sum_{r=1}^{p} w_r^2 = \frac{1}{n} \sum_{i=1}^{n} \left[ \sum_{j=1, j \neq i}^{n} p_{ij} \, y_{ij} - \lambda \sum_{r=1}^{p} w_r^2 \right] = \frac{1}{n} \sum_{i=1}^{n} F_i(w), \tag{16}$$

where $\lambda$ is the regularization parameter. The regularization term drives many of the weights in $w$ to 0.

After choosing the kernel parameter $\sigma$ in $p_{ij}$ as 1, the weight vector $w$ is found via the minimization problem for given $\lambda$,

$$\hat{w} = \arg\min_{w} f(w) = \arg\min_{w} \frac{1}{n} \sum_{i=1}^{n} f_i(w), \tag{17}$$

where

$$f(w) = -F(w), \tag{18}$$

$$f_i(w) = -F_i(w). \tag{19}$$

Note that $\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} p_{ij} = 1$, and the argument of the minimum does not change if a constant is added to an objective function. Therefore, the objective function is rewritten by adding the constant 1:

$$\hat{w} = \arg\min_{w} \{1 + f(w)\} = \arg\min_{w} \left\{ \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} p_{ij} (1 - y_{ij}) + \lambda \sum_{r=1}^{p} w_r^2 \right\} = \arg\min_{w} \left\{ \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} p_{ij} \, l(y_i, y_j) + \lambda \sum_{r=1}^{p} w_r^2 \right\}, \tag{20}$$

where the loss function is defined as

$$l(y_i, y_j) = \begin{cases} 1 & \text{if } y_i \neq y_j \\ 0 & \text{otherwise.} \end{cases} \tag{21}$$

The argument of the minimum is the weight vector that minimizes the classification error.

4. Bearing fault diagnosis using ML

4.1. ML methods

ML is the process of building an inductive model that learns from a limited amount of data without specialist intervention. This learning implies finding an underlying set of structures (or patterns) that are useful to understand relationships in data that might not be exactly similar to the data on which learning occurred. In general, ML is divided into two categories: supervised learning and unsupervised learning. Supervised learning predicts an output variable using labeled input data, while unsupervised learning draws inferences from data without labeled inputs (as done by clustering algorithms, recommender systems, etc.). Within supervised learning, one can distinguish between models that predict a numeric variable (regression) and models that predict a categorical variable (classifiers). Learning in models translates into fitting a model's parameters to a specific dataset, iteratively updating them with several passes through the data until a specific predefined function is minimized.

In the field of wind power, both supervised learning and unsupervised learning have been successfully applied to fault
detection [33–35,41,42]. Helbing et al. reviewed a selection of studies using ANN and deep learning since 2009 [18]. They found that supervised approaches might often outperform unsupervised approaches at fault detection. Furthermore, supervised learning enables additional fault diagnosis due to the labeled input data. Therefore, supervised ML algorithms including support vector machine, Naive Bayes, k-nearest neighbor, and artificial neural network were adopted to perform the bearing fault diagnosis in the following section.

4.2. Support vector machine (SVM)

SVM is a computational learning method for small sample classification. Algorithmically, SVM builds an optimal separating hyperplane $f(x) = 0$ between data sets by solving a constrained quadratic optimization problem based on structural risk minimization (SRM) [43,44],

$$y = f(x) = W^T x + b = \sum_{i=1}^{N} W_i x_i + b, \tag{22}$$

where $W$ is an $N$-dimensional vector and $b$ is a scalar. The optimal separating hyperplane is the separating hyperplane that creates the maximum distance between the plane and the nearest data. By converting the optimization problem with the Kuhn-Tucker condition into the equivalent Lagrangian dual quadratic optimization problem, the classifier based on the support vectors can be obtained.

4.3. Naive Bayes classifier

Naive Bayes is a generative model with high learning and predicting efficiency, which is easy to implement [45]. This method attempts to maximize the posterior

$$P(Y \mid X) \propto P(Y) P(X \mid Y). \tag{23}$$

The class-conditional likelihood is

$$P(X \mid Y) = P(X_1, X_2, \ldots, X_n \mid Y), \tag{24}$$

which, under the conditional independence assumption, factorizes as

$$P(X \mid Y) = P(X_1 \mid Y) P(X_2 \mid Y) \cdots P(X_n \mid Y). \tag{25}$$

The parameters $P(Y = y_k)$ ($1 \leq k \leq K$, where $K$ is the total number of labels) and $P(X_i = x_i \mid Y = y_k)$ ($x_i$ is a value of random variable $X_i$) can be obtained via maximum likelihood estimation (MLE) on the data set. When a new sample $(x_1, x_2, \ldots, x_n)^T$ arrives, its label can be predicted as

$$\hat{y} = \arg\max_{k \in \{1, 2, \ldots, K\}} P(Y = y_k) \prod_{i=1}^{n} P(X_i = x_i \mid Y = y_k). \tag{26}$$

$$d(x_i, x_j) = \sqrt{\sum_{k=1}^{K} \left(x_i(k) - x_j(k)\right)^2}. \tag{27}$$

4.5. Artificial neural networks (ANN)

As one of the most popular ML methods, an artificial neural network (ANN) simulates the function of biological nervous systems, where a certain number of simple elements called neurons constitute the ANN and operate in parallel. This kind of algorithm performs noticeably well for regression and classification by building the relationship between inputs and outputs through adjusting the weights of the connections between its elements and the biases.

A two-layer feed-forward network with sigmoid hidden and softmax output neurons is employed to classify the bearing condition in this study. The network is trained with the scaled conjugate gradient backpropagation learning algorithm. When a training loop begins, the data are transformed through the network layers, mapping the input values to the output values, which is called the forward propagation of the signal. Then the loss function evaluates the deviations between the target and output values, and the weight matrix in each hidden layer is dynamically tuned to produce better outputs. Finally, the classification is performed by the softmax function of the output neurons.

4.6. Fault diagnosis process based on feature extraction and ML

The flow chart of the fault diagnosis process is shown in Fig. 2, where the feature extraction/selection process is integrated in the upper figure and the ML part in the lower figure. Together, an accurate bearing fault diagnosis is achieved.

5. Experiments and discussion

5.1. Experimental data

Numerical validations are extensively conducted with similar results. For example, one of the data sets on a rolling bearing was obtained from the Bearing Data Center of Case Western Reserve University (https://engineering.case.edu/bearingdatacenter) [47]. The bearing type is an SKF6205 deep-groove ball bearing. Acceleration sensors were arranged on the bearing seat at the driving end and fan end of the motor. The vibrational signals were acquired with a 16-channel data logger, and the signal sampling frequency was set as 12 kHz.

This study attempts to use a small number of sample data to obtain the fault diagnosis model. Therefore, the bearing vibration signal is sampled and only 64 sample data under each operating condition for each fault mode are used to train and validate the diagnosis model. Since the bearings of wind turbines are subjected to transiently varying loads and rotational speeds, the fatigue faults of the bearings are diverse. To validate the accuracy and robustness
of fault diagnosis, at first the test data are recombined into four sets
of data. The first data set corresponds with a relatively uniform
fault; that is, the numbers of faults of the inner race, the outer race,
4.4. K-nearest neighbors (KNN) and the ball are substantially equal. The inner race, the outer race,
and the ball fault separately occupy a majority in the remaining
KNN is an instance-based ML method. For a testing data set, the three data sets. Next, these four sets of data for analysis are used to
KNN calculates the distance between the testing sample and all the verify the adaptability of the proposed fault diagnosis method.
training samples. Then, the K nearest neighbors are counted and
vote for the label of the testing data [46]. For example, the 5.2. Data characteristics of the bearing fault signals
Euclidean distance is used to calculate sample distance for KNN as
follows: To understand the bearing fault signals, the faults are manually
992
Fig. 3. Classification of normal and faulty bearings with peak-peak values.
Fig. 2. Flow chart of bearing fault diagnosis based on feature extraction and ML.
classified by numerical comparisons. The normal bearings are first

distinguished from faulty bearings by time-domain feature pa-
rameters. The peak-peak comparison of normal and faulty bearings
is shown in Fig. 3. It can be seen that when the peak-peak value is
less than 0.55, the bearing does not malfunction. Therefore, the
normal bearings can be distinguished using the peak-peak values.
Next, the remaining faulty bearing signals are classified to determine the type of fault in these bearings. In this step, analysis is performed to distinguish between ball faults and inner/outer race faults. The peak-peak comparison of ball faults and inner/outer race faults is shown in Fig. 4. It is determined that when the peak-peak value is greater than 15, a rolling element failure has occurred; when the peak-peak value is greater than 4 and less than 15, a race failure has occurred; and when the peak-peak value is less than 4, the two faults are aliased together and cannot be distinguished. Therefore, multiple time-domain feature parameters are used to subdivide the data with peak-peak values less than 4. The results show that the distinction is best when using the peak-peak indicator, but there is still a ball fault bearing that cannot be correctly distinguished.

Fig. 4. Classification of ball and outer/inner race bearing faults with peak-peak values.
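
The manual rules described above can be summarized in a short Python sketch. Only the thresholds 0.55, 4, and 15 come from the text; the function names, the sample window, and the handling of values that fall exactly on a threshold are assumptions made for illustration, and the sketch is not the authors' implementation.

```python
def peak_to_peak(signal):
    """Peak-to-peak value of a sequence of vibration samples."""
    return max(signal) - min(signal)


def manual_rule(pp):
    """Map a peak-to-peak value to a coarse label using the quoted thresholds.

    The text does not say what happens exactly at 4 or 15; assigning both
    boundary values to the race class is an assumption of this sketch.
    """
    if pp < 0.55:
        return "normal"
    if pp > 15:
        return "ball fault"
    if pp >= 4:
        return "race fault"
    return "ambiguous"  # ball and race faults overlap below 4


window = [0.10, -0.20, 0.15, -0.05]  # hypothetical samples, not from the data set
print(manual_rule(peak_to_peak(window)))
print([manual_rule(v) for v in (0.3, 2.0, 9.0, 20.0)])
```

The "ambiguous" branch is the region where the text reports that the peak-peak value alone is not sufficient and further feature parameters are needed.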
Next, the inner and outer race faults are distinguished. The comparison of the values is shown in Fig. 5. It can be seen that when the peak-peak value is between 2 and 8, the two fault signals are mixed and are difficult to distinguish. When the peak-peak value is found in other ranges, the two fault signals can be directly distinguished. Therefore, the data with peak-peak values between 2 and 8 are subdivided. The results show that there are still 5 data points that are mixed and cannot be distinguished.

According to the analysis above, the peak-peak parameters can effectively distinguish between normal bearings and faulty bearings; however, when classifying bearing fault types, time-domain parameters cannot correctly classify all faults, especially when distinguishing between inner and outer race faults. Therefore, the proposed ML methods are used to classify the bearing faults. The validation is presented in the next section.
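
As an illustration of the kind of classifier referred to here, the KNN voting rule of Section 4.4 can be sketched in plain Python. The feature vectors, the labels, and the choice K = 3 are hypothetical and serve only to make the example runnable; the paper does not state its value of K or a tie-breaking rule.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Label a query by majority vote among its k nearest training samples."""
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # Ties fall to whichever label Counter returns first; the paper gives no rule.
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional features (not taken from the bearing data set).
train_x = [(0.2, 0.1), (0.3, 0.2), (5.0, 4.8), (5.2, 5.1), (9.8, 0.3), (10.1, 0.1)]
train_y = ["normal", "normal", "inner race", "inner race", "ball", "ball"]

print(knn_predict(train_x, train_y, (5.1, 5.0)))
```

Unlike the manual thresholds, this rule needs no hand-tuned cut-off: the label follows from the labeled training samples closest to the query.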
Fig. 5. Classification of outer and inner race bearing faults with peak-peak values.

5.3. Fault classification in its entirety

To match the effects of wind field characteristics on bearing fault types, the bearing fault signals are regrouped. In other words, three sets of bearing fault signals are constructed, and these sets separately contain more ball faults, outer race faults, and inner race faults. This approach allows us to utilize different diagnostic methods for different wind farms, thus improving the relevance and accuracy of diagnosis. The ML algorithms mentioned in Section 4 above were used to train the classifier. Then, the bearing data after the feature extraction and selection process are simultaneously classified into normal bearings, ball fault bearings, inner race fault bearings and outer race fault bearings by the trained classifier.

The classification rate of bearing fault for each method when classifying in its entirety is shown in Table 2. The comparison of fault classification accuracy and training time with different methods is shown in Fig. 6 and Fig. 7, respectively. Notice that the classification accuracy obtained by the KNN method is the highest. Again, the KNN method trained with a small number of fault data produces the highest classification accuracy of 89.4%. Although the training time of the KNN method is slightly higher than that of the other three methods, the training of the KNN method is still fast. Furthermore, as shown in Fig. 6, the classification accuracy improves as the training data increases.

Table 2
Classification rate of bearing fault for each method in its entirety.

Method                  Classification rate
SVM                     75%
Naive Bayes classifier  70.83%
KNN                     87.5%
ANN                     83.33%

Fig. 6. Comparison of fault diagnosis accuracy with different methods.

Fig. 7. Comparison of training time with different methods.

5.4. Fault classification by steps

In this section, a step-by-step method was used to classify the bearing faults with ML. That is, the process was performed with the following steps:

Step 1: Build a classifier to distinguish the normal bearings from all bearings. At this time, the bearings were divided into two categories, normal bearings and faulty bearings.
Step 2: Build a classifier to distinguish the ball fault bearings from the faulty bearings. At this time, the faulty bearings were divided into two categories, ball fault bearings and race fault bearings including the inner and outer race fault bearings.
Step 3: Build a classifier to distinguish the inner race fault bearings from the race fault bearings. At this point, the bearing fault has been completely classified.

The classification rate of bearing fault for each method by steps is shown in Table 3. Note that it is easy to separate normal bearings from faulty bearings, with a classification rate of 100% for all the proposed ML methods. This means that the proposed method works
very well for fault detection, and can distinguish between normal and faulty bearings at a very high classification rate. The classification rate decreased in the Step 2 stage, but the rate of all methods was still above 80%. Next, the classification performance of the four algorithms further decreased and showed significant differences for classifying the inner race fault and the outer race fault. The KNN method achieved a good performance with a classification rate above 90% in the Step 3 stage. The classification rate of the Naive Bayes method is relatively poor, below 70%. The results of the average rate show that the KNN method has the best classification performance, which is slightly higher than that of the ANN algorithm.

Table 3
Classification rate of bearing fault for each method by steps.

Method                  Step 1   Step 2   Step 3   Average rate
SVM                     100%     83.33%   75%      86.11%
Naive Bayes classifier  100%     83.33%   66.67%   83.33%
KNN                     100%     94.44%   91.67%   95.37%
ANN                     100%     94.44%   83.33%   92.59%

5.5. Analysis and discussion

A comparison of the classification accuracy in Tables 2 and 3 shows that the classification performance of the ANN and KNN methods can be improved by using the step-by-step classification method. At the same time, the KNN method produces the best classification performance in both the entirety and the step-by-step classification. Thus the KNN algorithm can provide good results for bearing fault diagnosis.

6. Comparison between the proposed method and regression based method

6.1. Regression-based fault detection method

Regression-based fault detection identifies how signals and features are related to a state output in different components. This relationship is captured by fitting regression models when the system is in a healthy state. Therefore, this regression model is also known as the normal behavior model (NBM). When new data are obtained, the NBM prediction of the state variable is compared with the measured values. The difference between the predicted and actual measured values is called the residual, and the residuals are used for monitoring the condition. Residuals that exceed a given error threshold indicate malfunctioning.

For fault detection in wind turbines, the residuals indicate how closely the wind turbine's behavior resembles the one observed during training of the NBM. It is reasonable to assume that in a normally functioning wind turbine, the residuals are homogeneously randomly distributed. Therefore, trends or excessively large residuals indicate a departure from normal functioning, and an alarm is triggered if the residuals exceed the given threshold. A number of studies have established regression models to predict state parameters such as bearing temperature, lubricant pressure, generator temperature and power curves in the healthy state, and have successfully performed fault detection in wind turbines [19,33]. It should be noted that the above methods cannot directly classify the bearing fault types when detecting bearing faults.

In this section, an attempt is made to use the regression-based method for fault diagnosis of rolling bearings. The public data are first used to train the regression method for a continuous output or label. Then a threshold is given to discretize the regression result into discrete labels, e.g., fault or no-fault. Finally, the proposed ML methods are compared with the regression-based method.

6.2. Regression-based modeling of bearing fault

The public dataset obtained from the Bearing Data Center of Case Western Reserve University was used to compare the proposed method with the regression-based method [47]. For comparison, the regression model was trained with the same training and testing data used by the proposed ML model. In general, both parametric modeling techniques (linear regression and nonlinear regression) and non-parametric modeling techniques (ANN) are employed to predict the bearing fault. The bearing vibration signal after the feature extraction and selection process is used as the input data of the regression model. The status of the bearing is divided into normal with label 0, ball fault with label 1, inner race fault with label 2, and outer race fault with label 3. The label value of the bearing is used as the output data of the regression model. The training of the regression model was performed in MATLAB version 2019a. Subsequently, randomly selected data samples are input into the trained regression model to predict the corresponding bearing state.

Table 4
MAE and RMSE for the prediction methods.

Method                MAE      RMSE
Linear regression     0.5426   0.6574
Nonlinear regression  0.5052   0.6258
ANN                   0.5589   0.6658

Fig. 8. Comparison of actual labels with predictive label based on regression model and KNN.

6.3. Evaluation criteria

To analyze the difference of each model clearly, the evaluation criteria are introduced for comparison. Mean absolute error (MAE) and root mean square error (RMSE) are used to evaluate the performance. The mathematical expressions are as follows:

$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|, \tag{28}$$
Table 5
Comparison between the actual value and predicted value of bearing state produced with nonlinear regression.

Actual value     1       1       1       1       0       2       2       2
Predicted value  1.7699  1.4837  1.8018  1.2616  0.5165  2.6506  2.5833  2.1380
Actual value     2       2       3       3       3       3       3       3
Predicted value  2.6871  2.1910  2.5462  2.9518  2.5096  1.2209  2.2482  2.7204
Actual value     3       0       0       0       1       1       2       0
Predicted value  2.9060  1.0983  0.3385  0.4730  1.3737  0.8637  2.1730  0.5525
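
The values in Table 5 are enough to recompute the error measures of Section 6.3 and to illustrate the thresholding step discussed in Section 6.4. The following Python sketch is an illustration rather than the authors' MATLAB code: it evaluates MAE and RMSE for the nonlinear regression predictions and then applies a cut-off of 2.5 for the outer race label (label 3, following the labeling of Section 6.2). The variable and function names are chosen here for illustration, and treating a prediction equal to the threshold as flagged is an assumption.

```python
import math

# Actual labels and nonlinear-regression predictions transcribed from Table 5.
actual = [1, 1, 1, 1, 0, 2, 2, 2,
          2, 2, 3, 3, 3, 3, 3, 3,
          3, 0, 0, 0, 1, 1, 2, 0]
predicted = [1.7699, 1.4837, 1.8018, 1.2616, 0.5165, 2.6506, 2.5833, 2.1380,
             2.6871, 2.1910, 2.5462, 2.9518, 2.5096, 1.2209, 2.2482, 2.7204,
             2.9060, 1.0983, 0.3385, 0.4730, 1.3737, 0.8637, 2.1730, 0.5525]


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y))


def outer_race_counts(y, y_hat, threshold=2.5):
    """Return (outer race samples flagged, all outer race samples,
    inner race samples flagged by mistake) for the given threshold."""
    flagged = [a for a, p in zip(y, y_hat) if p >= threshold]
    return flagged.count(3), y.count(3), flagged.count(2)


print(round(mae(actual, predicted), 4), round(rmse(actual, predicted), 4))
hits, total, inner_mistaken = outer_race_counts(actual, predicted)
print(hits, total, round(100 * hits / total, 1), inner_mistaken)
```

Under these assumptions the sketch gives an MAE of about 0.505 and an RMSE of about 0.626, in line with the nonlinear regression row of Table 4. With the 2.5 threshold it flags five of the seven outer race samples (71.4%) together with three inner race samples.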
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left[ y_t - \hat{y}_t \right]^2}, \tag{29}$$

where $y_t$ is the actual value, $\hat{y}_t$ is the predicted value, and n is the number of data.

6.4. Comparison and discussion

The trained regression models were evaluated with the MAE and RMSE criteria, and the results are listed in Table 4. Notice that the MAE and RMSE values corresponding to the nonlinear regression model are the smallest; that is, the accuracy of the nonlinear regression model is the highest. The actual labels and the predicted values are shown in Fig. 8. It can be seen that the actual value of the label and the predictive value generated by the KNN method are both discrete data. The KNN method produced three prediction errors, and the other predictions were consistent with the actual values. The label that is predicted based on the regression method is a continuous label. A threshold needs to be given to determine the state of the bearing.

The comparison between the actual value and predicted value of the bearing label produced with nonlinear regression is shown in Table 5. Currently, how to determine the threshold is the key to fault diagnosis. For example, if the threshold for outer race faults is set to 2.5, five of the seven bearings with outer race faults will be correctly identified, which corresponds to a detection ratio of 71.4%. Unfortunately, three bearings with inner race faults are then mistakenly believed to have outer race faults. Obviously, the design of thresholds heavily depends on the designer's experience and knowledge. At the same time, the threshold setting will also change for different wind farm environments. The proposed ML method can automatically complete the classification of faults without the need to set thresholds, and it provides a diagnostic accuracy of about 90% with KNN. Therefore, the proposed method outperforms the regression-based method for bearing fault diagnosis.

7. Conclusion

This paper proposes a novel scheme for wind turbine bearing fault diagnosis based on data mining and ML. The proposed method works directly on original data obtained from vibration signals of bearings under different operation conditions. There are two main steps in the proposed method. (1) A systematic method is presented to assess the prediction accuracy of each characteristic to find the combination of the most effective features for fault diagnosis of wind turbine bearings. (2) ML is utilized to perform the fault diagnosis of wind turbine bearings. The experiments demonstrate that this method can effectively extract features from nonstationary signals, obtaining excellent results in the fault diagnosis of wind turbine bearings.

CRediT authorship contribution statement

Bodi Cui: Conceptualization, Methodology, Writing – original draft, Writing – review & editing, Data curation, Software, Formal analysis, Validation. Yang Weng: Conceptualization, Methodology, Writing – original draft, Writing – review & editing. Ning Zhang: Methodology, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported in part by the Department of Energy under grants DE-AR00001858-1631 and DE-EE0009355, the National Science Foundation (NSF) under the grants ECCS-1810537 and ECCS-2048288.

References

[1] Q. Xu, C. Kang, N. Zhang, Y. Ding, Q. Xia, R. Sun, J. Xu, A probabilistic method for determining grid-accommodable wind power capacity based on multi-scenario system operation simulation, IEEE Trans. Smart Grid 7 (1) (Jan 2016) 400–409.
[2] E. Gonzalez, B. Stephen, D. Infield, J.J. Melero, Using high-frequency scada data for wind turbine performance monitoring: a sensitivity study, Renew. Energy 131 (Feb 2019) 841–853.
[3] J. Carroll, A. McDonald, I. Dinwoodie, D. McMillan, M. Revie, I. Lazakis, Availability, operation and maintenance costs of offshore wind turbines with different drive train configurations, Wind Energy 20 (2) (Jul 2017) 361–378.
[4] D. Milborrow, Operation and maintenance costs compared and revealed, Wind Stats 19 (3) (2006) 3.
[5] W. Liu, B. Tang, J. Han, X. Lu, N. Hu, Z. He, The structure healthy condition monitoring and fault diagnosis methods in wind turbines: a review, Renew. Sustain. Energy Rev. 44 (Jan 2015) 466–472.
[6] G. Pedro, T. Mark, P. Pinar, P. Mayorkinos, Condition monitoring of wind turbines: techniques and methods, Renew. Energy 46 (2012) 169–178.
[7] P. Tchakoua, R. Wamkeue, M. Ouhrouche, F. Slaoui-Hasnaoui, T. Tameghe, G. Ekemb, Wind turbine condition monitoring: state-of-the-art review, new trends, and future challenges, Energies 7 (4) (Apr 2014) 2595–2630.
[8] K. Fischer, F. Besnard, L. Bertling, Reliability-Centered Maintenance for Wind Turbines Based on Statistical Analysis and Practical Experience 27 (1) (Mar 2012) 184–195.
[9] A. May, D. McMillan, S. Thons, Economic analysis of condition monitoring systems for offshore wind turbine sub-systems, IET Renew. Power Gener. 9 (8) (Nov 2015) 900–907.
[10] H.D.M. de Azevedo, A.M. Araújo, N. Bouchonneau, A review of wind turbine bearing condition monitoring: state of the art and challenges, Renew. Sustain. Energy Rev. 56 (Apr 2016) 368–379.
[11] Z. Feng, M. Liang, Fault diagnosis of wind turbine planetary gearbox under nonstationary conditions via adaptive optimal kernel time-frequency analysis, Renew. Energy 66 (Jan 2014) 468–477.
[12] X. Chen, Z. Feng, Iterative generalized time-frequency reassignment for planetary gearbox fault diagnosis under nonstationary conditions, Mech. Syst. Signal Process. 80 (May 2016) 429–444.
[13] Z. Feng, W. Zhu, D. Zhang, Time-frequency demodulation analysis via vold-kalman filter for wind turbine planetary gearbox fault diagnosis under nonstationary speeds, Mech. Syst. Signal Process. 128 (Apr 2019) 93–109.
[14] I. Vamsi, G. Sabareesh, P. Penumakala, Comparison of condition monitoring techniques in assessing fault severity for a wind turbine gearbox under non-stationary loading, Mech. Syst. Signal Process. 124 (Jan 2019) 1–20.
[15] R.U. Maheswari, R. Umamaheswari, Trends in non-stationary signal processing techniques applied to vibration analysis of wind turbine drive train – a contemporary survey, Mech. Syst. Signal Process. 85 (2017) 296–311.
[16] Y. Guo, J. Keller, Investigation of High-Speed Shaft Bearing Loads in Wind Turbine Gearboxes through Dynamometer Testing, vol. 21, Wind energy, 2018, pp. 139–150.
[17] E. Marino, A. Giusti, L. Manuel, Offshore wind turbine fatigue loads: the influence of alternative wave modeling for different turbulent and mean winds, Renew. Energy 102 (2017) 157–169.
[18] G. Helbing, M. Ritter, Deep learning for fault detection in wind turbines, Renew. Sustain. Energy Rev. 98 (2018) 189–198.
[19] A. Stetco, F. Dinmohammadi, X. Zhao, V. Robu, D. Flynn, M. Barnes, J. Keane, G. Nenadic, Machine learning methods for wind turbine condition monitoring: a review, Renew. Energy 133 (Oct 2019) 620–635.
[20] Y. Wang, R. Zou, F. Liu, L. Zhang, Q. Liu, A review of wind speed and wind power forecasting with deep neural networks, Appl. Energy 304 (2021).
[21] F. Elyasichamazkoti, A. Khajehpoor, Application of machine learning for wind energy from design to energy-water nexus: a survey, Energy Nexus 2 (2021).
[22] M.M. Nezhad, A. Heydari, E. Pirshayan, D. Groppi, D.A. Garcia, A novel forecasting model for wind speed assessment using sentinel family satellites images and machine learning method, Renew. Energy 179 (Aug 2021) 2198–2211.
[23] A. Marvuglia, A. Messineo, Monitoring of wind farms' power curves using machine learning techniques, Appl. Energy 98 (2012) 574–583.
[24] M. Sharifzadeh, A. Sikinioti-Lock, N. Shah, Machine-learning methods for integrated renewable power generation: a comparative study of artificial neural networks, support vector regression, and Gaussian process regression, Renew. Sustain. Energy Rev. 108 (2019) 513–538.
[25] B. Cheng, J. Du, Y. Yao, Machine learning methods to assist structure design and optimization of dual darrieus wind turbines, Energy (2021), https://doi.org/10.1016/j.energy.2021.122643.
[26] S. Sabzevari, A. Karimpour, M. Monfared, M.N. Sistani, Mppt control of wind turbines by direct adaptive fuzzy-pi controller and using ann-pso wind speed estimator, J. Renew. Sustain. Energy 9 (2017), 013302.
[27] G. Abdalrahman, W. Melek, F.-S. Lien, Pitch angle control for a small-scale darrieus vertical axis wind turbine with straight blades (h-type vawt), Renew. Energy 114 (2017) 1353–1362, 013302.
[28] H.M. Hasanien, S.M. Muyeen, Speed control of grid-connected switched reluctance generator driven by variable speed wind turbine using adaptive neural network controller, Elec. Power Syst. Res. 84 (2012) 206–213.
[29] K. Leahy, R. Hu, I. Konstantakopoulos, C. Spanos, A. Agogino, D. O'Sullivan, Diagnosing and predicting wind turbine faults from scada data using support vector machines, Int. J. Prognostics Health Manag. 9 (2018) 1–11.
[30] B. Tang, T. Song, F. Li, L. Deng, Fault diagnosis for a wind turbine transmission system based on manifold learning and shannon wavelet support vector machine, Renew. Energy 62 (2014) 1–9.
[31] G. Jiang, H. He, J. Yan, P. Xie, Multiscale convolutional neural networks for fault diagnosis of wind turbine gearbox, IEEE Trans. Ind. Electron. 66 (4) (2018) 3196–3207.
[32] K. Chandrasekhar, N. Stevanovic, E.J. Cross, N. Dervilis, K. Worden, Damage detection in operational wind turbine blades using a new approach based on machine learning, Renew. Energy 168 (2021) 1249–1264.
[33] M. Schlechtingen, I.F. Santos, Comparative analysis of neural network and regression based condition monitoring approaches for wind turbine fault detection, Mech. Syst. Signal Process. 25 (5) (2011) 1849–1875.
[34] P. Bangalore, L.B. Tjernberg, An artificial neural network approach for early fault detection of gearbox bearings, IEEE Trans. Smart Grid 6 (2) (2015) 980–987.
[35] M. Bach-Andersen, B. Rømer-Odgaard, O. Winther, Deep learning for automated drivetrain fault detection, Wind Energy 21 (1) (2018) 29–41.
[36] F.D. Bianchi, H.D. Battista, R.J. Mantz, Wind Turbine Control Systems, Springer, 2007.
[37] I. Guyon, A. Elisseeff, An introduction to variable and feature selection, J. Mach. Learn. Res. 3 (2003) 1157–1182.
[38] P. Marti-Puig, A. Blanco-M, J. Cardenas, J. Cusido, J. Sole-Casals, Effects of the Pre-processing Algorithms in Fault Diagnosis of Wind Turbines, Environmental Modelling and Software, Feb 2018, 0–1.
[39] S. Kang, D. Ma, Y. Wang, C. Lan, Q. Chen, V. Mikulovich, Method of assessing the state of a rolling bearing based on the relative compensation distance of multiple-domain features and locally linear embedding, Mech. Syst. Signal Process. 86 (Oct 2017) 40–57.
[40] W. Yang, K. Wang, W. Zuo, Neighborhood component feature selection for high-dimensional data, J. Comput. 7 (1) (Jan 2012) 161–168.
[41] R. Liu, B. Yang, E. Zio, X. Chen, Artificial intelligence for fault diagnosis of rotating machinery: a review, Mech. Syst. Signal Process. 108 (Feb 2018) 33–47.
[42] R. Zhao, R. Yan, Z. Chen, K. Mao, P. Wang, R.X. Gao, Deep learning and its applications to machine health monitoring, Mech. Syst. Signal Process. 115 (Jan 2019) 213–237.
[43] W. Liu, Z. Wang, J. Han, G. Wang, Wind turbine fault diagnosis method based on diagonal spectrum and clustering binary tree svm, Renew. Energy 50 (Feb 2013) 1–6.
[44] Y. Wang, S. Kang, Y. Jiang, G. Yang, L. Song, V. Mikulovich, Classification of fault location and the degree of performance degradation of a rolling bearing based on an improved hyper-sphere-structured multi-class support vector machine, Mech. Syst. Signal Process. 29 (May 2012) 404–414.
[45] V. Muralidharan, V. Sugumaran, A comparative study of naive bayes classifier and bayes net classifier for fault diagnosis of monoblock centrifugal pump using wavelet analysis, Appl. Soft Comput. 12 (8) (2012) 2023–2029.
[46] D. He, R. Li, J. Zhu, Plastic bearing fault diagnosis based on a two-step data mining approach, IEEE Trans. Ind. Electron. 60 (8) (2013) 3429–3440.
[47] Bearing Data Center, Case Western Reserve University [Online]. Available: https://engineering.case.edu/bearingdatacenter.