Conference Paper
Author(s):
Abdallah, Imad; Ntertimanis, Vasileios; Mylonas, Charilaos; Tatsis, Konstantinos; Chatzi, Eleni; Dervilis, Nikolaos; Worden, Keith; Maguire, Eoghan
Publication date:
2018
Permanent link:
https://doi.org/10.3929/ethz-b-000313962
Rights / license:
In Copyright - Non-Commercial Use Permitted
Fault diagnosis of wind turbine structures using decision tree learning algorithms with big data
E. Maguire
Vattenfall AB
ABSTRACT: In the context of optimized Operation & Maintenance of wind energy infrastructure, it is important to develop decision support tools able to guide operators and engineers in the management of these assets. This task is particularly challenging given the multiplicity of uncertainties involved, from the point of view of the aggregated data, the available knowledge with respect to the wind turbine structures and sub-components, as well as the constantly varying operational and environmental loads. We here propose to propagate wind turbine telemetry through a decision tree learning algorithm to detect faults, errors, damage, patterns, anomalies and abnormal operation. The use of decision trees is motivated by the fact that they tend to be easier to implement and interpret than other quantitative data-driven methods. Furthermore, the telemetry consists of data from condition and structural health monitoring systems, which lends itself naturally to the field of Big Data, as large amounts are continuously sampled at high rates from thousands of wind turbines. In this paper, we review several decision tree algorithms formerly proposed by the machine learning community (i.e. ID3, C4.5, C5.0, J48, SPRINT, FACT, FIRM, SLIQ, CHAID, QUEST, CRUISE, PUBLIC, BOAT, RAINFOREST, MARS, RIPPER and CART); we then train an ensemble Bagged decision tree classifier on a large condition monitoring dataset from an offshore wind farm comprising 48 wind turbines, and use it to automatically extract paths linking excessive vibration faults to their possible root causes. We finally give an outlook on an architecture for implementing decision tree learning in the context of cloud computing for big data, namely involving cloud-based Apache Hadoop for very large data storage and handling, and Apache Spark for efficiently running machine learning algorithms.
[Figure 2 (plot omitted): curves of information entropy, Gini impurity and classification error versus the fraction of class i = 1, f1.]

Figure 2: Impurity index I_d for a two-class example as a function of the probability of one of the classes, f_1, using the information entropy, Gini impurity and classification error. In all cases, the impurity is at its maximum when the fraction of data within a node with class 1 is 0.5, and zero when all data are in the same category.
where i is the class to be predicted, n is the number of possible classes, and f_i is the fraction of the training data belonging to class i. The second option is to measure the Gini impurity (G). In this case, a leaf is considered pure if all the data contained within it have the same class. The Gini impurity can be computed inside each node using the following simplified equation:

I_d(T) \equiv G(T) = 1 - \sum_{i=1}^{n} f_i^2 \qquad (3)

The third method is to use the classification error (CE):

I_d(T) \equiv CE(T) = 1 - \max_i f_i \qquad (4)

where the maximum is taken among the fractions f_i within the data T that have class i. Figure 2 shows the three impurity indices, for a node with data that are categorized into two classes, as a function of the fraction of the data having a specific class. If all of the data belong to one class, the impurity is zero. On the other hand, if half of the data have one class and the remaining data belong to the other class, the impurity is at its maximum (Carrasco Kind & Brunner 2013).
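To make equations (3) and (4) concrete, the three impurity indices plotted in Figure 2 can be computed directly from the class fractions of a node. The following is a minimal Python sketch (not the paper's code), using the balanced two-class node of the figure as a check:

```python
import numpy as np

def class_fractions(labels):
    """Fractions f_i of the data in a node belonging to each class."""
    _, counts = np.unique(labels, return_counts=True)
    return counts / counts.sum()

def entropy(labels):
    """Information entropy of a node (the entropy curve in Figure 2)."""
    f = class_fractions(labels)
    return -np.sum(f * np.log2(f))

def gini(labels):
    """Equation (3): G(T) = 1 - sum_i f_i^2."""
    f = class_fractions(labels)
    return 1.0 - np.sum(f ** 2)

def class_error(labels):
    """Equation (4): CE(T) = 1 - max_i f_i."""
    f = class_fractions(labels)
    return 1.0 - f.max()

# Two-class node with half the data in each class: impurity is maximal,
# as in Figure 2 at f1 = 0.5.
node = np.array([0, 0, 1, 1])
print(entropy(node), gini(node), class_error(node))  # 1.0 0.5 0.5
```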
There exist several algorithms for training decision trees from data, including ID3, C4.5, C5.0, J48, SPRINT, FACT, FIRM, SLIQ, CHAID, QUEST, CRUISE, PUBLIC, BOAT, RAINFOREST, MARS, RIPPER and CART. In the following we briefly mention some of the more common algorithms. Ross Quinlan developed ID3 (Quinlan 1986), which stands for Iterative Dichotomiser 3; its later iterations include C4.5 and C5.0. ID3 attempts to generate the smallest multiway tree. If the problem involves real-valued variables, they are first binned into intervals, each interval being treated as an unordered nominal attribute. Techniques such as converting the trained tree into a set of rules and pruning trees to avoid over-fitting are usually applied to improve the ability of the tree to generalise to unseen data. The accuracy of each rule is then evaluated to determine the order in which the rules should be applied. C5.0 is essentially the same as C4.5 but uses less memory and builds smaller rule sets while being more accurate.

CART (Breiman, Friedman, Olshen, & Stone 1984), which stands for Classification and Regression Trees, is very similar to C4.5, but it differs in that it supports numerical target variables (regression) and constructs binary trees based on a numerical splitting criterion recursively applied to the data, instead of constructing rule sets.

According to (Lim, Loh, & Shih 2000), C4.5 and QUEST have the best combinations of error rate and speed, but C4.5 tends to produce trees with twice as many leaves as those from QUEST. QUEST is a binary-split decision tree algorithm for classification and regression (Loh & Shih 1997). It uses a significance test to select variables for splitting. An advantage of the QUEST tree algorithm is that it is not biased in split-variable selection, unlike CART, which is biased towards selecting split-variables that allow more splits and that have more missing values.

CRUISE is one of the most accurate decision tree classifiers (Loh 2011) and is also capable of efficiently performing multiway splits. (Kim & Loh 2001) proposed CRUISE, which stands for Classification Rule with Unbiased Interaction Selection and Estimation; it splits each node into as many as J subnodes, which precludes the use of greedy search methods. CRUISE is practically free of selection bias (Kim & Loh 2001) and is capable of integrating interactions between variables when growing the tree. CRUISE borrows and improves upon ideas from many sources, but especially from FACT, QUEST and GUIDE for split selection, and from CART for pruning.

Finally, a word about RainForest, which is not a decision tree classifier per se but rather a framework. RainForest was proposed to make decision tree construction more scalable (as was BOAT, which is in fact faster than RainForest by a factor of three while constructing exactly the same decision tree, and can handle a wide range of splitting criteria (Gehrke, Ganti, Ramakrishnan, & Loh 1999)). According to (Gehrke, Ramakrishnan, & Ganti 2000), a thorough examination of the algorithms in the literature (including C4.5, CART, CHAID, FACT, ID3 and extensions, SLIQ, SPRINT and QUEST) shows that the greedy schema described in Algorithm 1 can be refined to a generic RainForest tree induction schema. In fact, most decision tree algorithms consider every attribute individually and need the distribution of class labels for each distinct value of an attribute to decide on the splitting criterion.
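This per-attribute class-label distribution is exactly the aggregate that the RainForest authors isolate (their so-called AVC-set, for Attribute-Value, Classlabel). A minimal pandas sketch, on made-up data, of building one such aggregate in a single scan:

```python
import pandas as pd

# Hypothetical training partition: one nominal attribute plus class labels.
df = pd.DataFrame({
    "attribute": ["a", "a", "b", "b", "b", "c"],
    "label":     ["NoFault", "Vibr", "NoFault", "NoFault", "Vibr", "NoFault"],
})

# AVC-set: count of each class label for each distinct attribute value.
# This small summary is all a splitting criterion (e.g. Gini gain) needs,
# so it can be built without holding the full dataset in main memory.
avc_set = df.groupby(["attribute", "label"]).size().unstack(fill_value=0)
print(avc_set)
```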
RainForest is a comprehensive approach for decision tree classifiers that separates the scalability aspects of tree-construction algorithms from the central features that determine the quality of the tree. RainForest concentrates on the growth phase of a decision tree, due to the time-consuming nature of this step. RainForest closes the gap between the main-memory dataset limitations of algorithms in the machine learning and statistics literature and the scalability requirements of a data mining environment (Singh & Sulekh 2017).

Next we present a demonstration of decision tree classifier learning based on a real-world SCADA data set from the Lillgrund offshore wind farm.

Table 1: The attributes (features) used as input to train the decision tree classifier.

Attributes      Description
HSGenTmp        Mean temperature of gear bearing on generator side
Po_max          Max value of active power
DMaxMeanPow     Difference between max and mean active power
GRpm_max        Max generator RPM
DMaxMeanGRpm    Difference between max and mean generator RPM
Pi_min          Min collective pitch
DMaxMinV        Difference between max and min wind speed
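For illustration, the derived attributes of Table 1 can be assembled from standard 10-minute SCADA statistics. The column names and values below are hypothetical placeholders, not the actual Lillgrund schema:

```python
import pandas as pd

# Two hypothetical 10-minute SCADA records (values are made up).
scada = pd.DataFrame({
    "gear_bearing_temp_gen_side_mean": [61.2, 63.8],
    "active_power_max":   [2301.0, 2297.5],
    "active_power_mean":  [2210.4, 1995.1],
    "generator_rpm_max":  [1512.0, 1498.3],
    "generator_rpm_mean": [1480.7, 1301.9],
    "pitch_angle_min":    [-1.2, 0.4],
    "wind_speed_max":     [14.8, 16.2],
    "wind_speed_min":     [9.1, 4.9],
})

# Derived attributes of Table 1, one column per feature.
features = pd.DataFrame({
    "HSGenTmp":     scada["gear_bearing_temp_gen_side_mean"],
    "Po_max":       scada["active_power_max"],
    "DMaxMeanPow":  scada["active_power_max"] - scada["active_power_mean"],
    "GRpm_max":     scada["generator_rpm_max"],
    "DMaxMeanGRpm": scada["generator_rpm_max"] - scada["generator_rpm_mean"],
    "Pi_min":       scada["pitch_angle_min"],
    "DMaxMinV":     scada["wind_speed_max"] - scada["wind_speed_min"],
})
```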
…diagnostics; for instance, wind speed, power and RPM are strongly inter-dependent attributes, but for the same wind speed a wind turbine could be found operating at two different power output levels (during distinct time periods) or at two different generator RPM levels, and both cases are considered normal operating modes. How so? Indeed, this happens very often when a wind turbine has to de-rate the power output following a demand from the grid side, or reduce the generator RPM under a specific noise or load control mode. The target variable TurbineState for classification is shown in Table 2. It can take two labels: NoFault, indicating that the wind turbine is producing electric power under normal operating conditions, and Vibr, indicating an excessive vibration fault resulting in the wind turbine shutting down and, furthermore, triggering a message to be sent to the vibration support technical team (in order for some corrective action to take place).
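As a sketch of how such a binary target could be constructed from a turbine status log (the actual Lillgrund alarm codes are not given here, so the codes below are placeholders):

```python
import pandas as pd

# Hypothetical status log, one code per 10-minute record (codes are assumed).
log = pd.DataFrame({"status_code": [0, 0, 228, 0, 229]})

VIBRATION_FAULT_CODES = {228, 229}  # placeholder excessive-vibration codes

# TurbineState: "Vibr" for vibration shutdowns, "NoFault" otherwise.
log["TurbineState"] = log["status_code"].apply(
    lambda c: "Vibr" if c in VIBRATION_FAULT_CODES else "NoFault"
)
print(log["TurbineState"].value_counts())
```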
Bagged decision tree construction. The second step is the construction of the Bagged decision tree classifier. An ensemble of decision trees is often more accurate than any single tree classifier (Bauer & Kohavi 1999, Dietterich 1996). Bagging (Breiman 1996), boosting (Schapire 1990) and random forests are three popular methods for creating accurate ensembles. Previous research indicates that boosting is more prone to overfitting the training data (Freund & Schapire 1996, Opitz & Maclin 1999); consequently, the presence of noise causes a greater decrease in the performance of boosting. Therefore, this study uses bagging to create an ensemble of bagged decision tree classifiers (using the standard CART algorithm), in order to better address the noise in the condition monitoring data. Other decision tree algorithms and ensemble approaches will be investigated in future work. This technique can be summarized as: (i) take a bootstrap sample from the data set, (ii) train an un-pruned classification tree, and (iii) aggregate the trained tree classifiers. In more detail, bagging predictors comprise a method for generating multiple versions of a predictor, each on random subsets of the original dataset, and fusing these into a unique final aggregated predictor. This aggregated predictor can typically be used for reducing the variance of a black-box estimator, by introducing randomization into the construction procedure and forming an ensemble (for proof refer to (Breiman 1996, Bühlmann 2012)). The bagging algorithm consists in (1) constructing a bootstrap sample (X_*^{(1)}, Y_*^{(1)}), \ldots, (X_*^{(n)}, Y_*^{(n)}) by randomly drawing n times with replacement from the original data (X^{(1)}, Y^{(1)}), \ldots, (X^{(n)}, Y^{(n)}); (2) computing the bootstrapped estimator (i.e. tree classifier) \hat{g}_* = h_n\big(X_*^{(1)}, Y_*^{(1)}, \ldots, X_*^{(n)}, Y_*^{(n)}\big), where the function h_n(\cdot) defines the estimator as a function of the data; and (3) repeating the first two steps k = 1, \ldots, K times, so that the bagged estimator is

\hat{g}_{bagg} = E\big[\hat{g}_*^{k}\big] \qquad (5)

In the machine learning and statistics literature, the two main performance measures for a classification tree algorithm are its predictive quality and the construction time of the resulting tree. In this paper the predictive quality is given by the misclassification rate on the validation data set. As shown in Figure 4, the misclassification rate of the trained Bagged tree is less than 1% when the bag size is more than 10.
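The paper does not list its training code; a minimal scikit-learn sketch of the same recipe (bootstrap samples, un-pruned CART base trees, aggregation by majority vote, misclassification rate on a validation split) might look as follows, with random placeholder data standing in for the Table 1 attributes:

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Random placeholder data standing in for the seven Table 1 attributes.
rng = np.random.default_rng(0)
X = rng.random((1000, 7))
y = rng.choice(["NoFault", "Vibr"], size=1000)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Bagging: K bootstrap samples, one un-pruned CART tree per sample,
# predictions fused by majority vote (cf. steps (1)-(3) and eq. (5)).
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # un-pruned CART (scikit-learn >= 1.2 API)
    n_estimators=50,                     # tree bag size K
    bootstrap=True,
    random_state=0,
)
bag.fit(X_tr, y_tr)

misclassification_rate = 1.0 - bag.score(X_val, y_val)
print(f"validation misclassification rate: {misclassification_rate:.3f}")
```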
[Figure 3 (tree diagram omitted): the trained decision tree classifier; the root splits on DMaxMeanGRPM at 380.51, with further splits on GRpm_max (1539.05, 1503.60), DMaxMinV (10.60, 8.05), DMaxMeanGRPM (999.04, 110.45) and Po_max (2377), leading to NoFault and Vibr leaves.]

[Figure 4 (plot omitted).] Figure 4: Misclassification rate of the validation set as a function of tree bag size.

Fault diagnostics (offline). The final step uses the newly generated decision tree classifier (e.g. Figure 3) to create diagnostics for individual target fault classes, i.e. Vibr. One aspect of fault diagnostics deals with offline root cause analysis, which we demonstrate here. When diagnosing faults, we are interested in identifying the root causes (or sequence of events) that lead to a large portion of the overall abnormal behavior, where the decision tree edges leading to faults become root cause candidates. In the literature, one can find a limited number of proposed programmatic algorithms by which decision tree classifiers can be scanned/probed for root cause analysis (Solé, Muntés-Mulero, Rana, & Estrada 2017). One implementation is as follows (Zheng, Lloyd, & Brewer 2004); a sketch of the path-extraction steps is given after the trace example below:

1. Ignore the leaf nodes that correspond to normal operations (i.e. NoFault), as they will not be useful in diagnosing faults.
2. Identify, in the decision tree, all leaf nodes with the target fault class of interest, i.e. Vibr.
3. Ranking: list the leaf nodes with the target fault class ranked by failure count, to prioritize their importance.
4. Noise Filtering: in diagnostics, we are interested in identifying the root causes that result in a large proportion of the overall faults. Thus we retain the leaf nodes accounting for more than a certain threshold of the total number of faults (e.g. threshold = 5%).
5. Extract traces (paths) containing the target fault leaf node (C), and all edges (E) and internal nodes (N) up to the root node (R).
6. Extract rules at each internal node of the traces.
7. Node Merging: we merge nodes on a path by eliminating ancestor nodes that are logically "subsumed" by successor nodes.

Below is an example of an automated extraction of a sequential trace of events (root causes) from the trained decision tree classifier (see Figure 3), leading from the classified fault to the root of the tree:

Vibr ← 1503.6 <= GRpm_max < 1539.05 ← 110.455 <= DMaxMeanGRpm < 380.51 ← DMaxMinV >= 10.65
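A minimal sketch of steps 2 and 5 of the procedure above, for a scikit-learn CART tree; the tree_ attributes used here are part of scikit-learn's tree internals, but the traversal itself is an illustration, not the paper's implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fault_traces(tree: DecisionTreeClassifier, feature_names, fault_class="Vibr"):
    """Find leaves predicting fault_class (step 2) and walk each path back
    to the root, collecting the split rule at every internal node (step 5)."""
    t = tree.tree_
    fault_idx = list(tree.classes_).index(fault_class)

    # Parent pointers plus the branch direction taken to reach each child.
    parent, branch = {0: None}, {}
    for node in range(t.node_count):
        for child, sign in ((t.children_left[node], "<="),
                            (t.children_right[node], ">")):
            if child >= 0:
                parent[child], branch[child] = node, sign

    traces = []
    for node in range(t.node_count):
        is_leaf = t.children_left[node] < 0
        if is_leaf and np.argmax(t.value[node]) == fault_idx:
            rules, n = [], node
            while parent[n] is not None:  # leaf-to-root order, as in the text
                p = parent[n]
                rules.append(f"{feature_names[t.feature[p]]} {branch[n]} "
                             f"{t.threshold[p]:.2f}")
                n = p
            # n_node_samples supports the failure-count ranking of step 3.
            traces.append((t.n_node_samples[node], rules))
    return sorted(traces, key=lambda tr: tr[0], reverse=True)
```

Sorting by leaf sample count realizes the ranking of step 3; the noise-filtering threshold of step 4 can then be applied to the returned counts.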
[Figure 5 (block diagram omitted): WT system data sources (CM data, SHM data, DAQ loads) stream real-time telemetry through APIs into pre-processing (filtering, categorization, quality checks, synchronization), then into cloud computing, storage & analytics, whose outputs are: predict events, classify new events, root cause analysis, updated probabilities.]

Figure 5: Graphical summary of the proposed framework: decision tree learning with big data for fault diagnostics.
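Within such an architecture, the learning step could, for example, run on Apache Spark's MLlib. The sketch below trains a single decision tree on the Table 1 attributes; the HDFS path and table schema are assumptions, not the paper's deployment:

```python
from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wt-fault-diagnosis").getOrCreate()

# Feature table persisted on the Hadoop layer (path and schema assumed).
df = spark.read.parquet("hdfs:///lillgrund/scada_features.parquet")

features = ["HSGenTmp", "Po_max", "DMaxMeanPow", "GRpm_max",
            "DMaxMeanGRpm", "Pi_min", "DMaxMinV"]

pipeline = Pipeline(stages=[
    StringIndexer(inputCol="TurbineState", outputCol="label"),
    VectorAssembler(inputCols=features, outputCol="features"),
    DecisionTreeClassifier(labelCol="label", featuresCol="features"),
])

model = pipeline.fit(df)   # distributed training across the cluster
```

A bagged ensemble along the lines of Section "Bagged decision tree construction" could be approximated in MLlib with RandomForestClassifier by setting featureSubsetStrategy="all", so that each tree sees a bootstrap sample but all attributes.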