ETH Library

Fault diagnosis of wind turbine structures using decision tree learning algorithms with big data

Conference Paper

Author(s):
Abdallah, Imad; Ntertimanis, Vasileios; Mylonas, Charilaos; Tatsis, Konstantinos; Chatzi, Eleni; Dervilis, Nikolaos; Worden, Keith; Maguire, Eoghan

Publication date:
2018

Permanent link:
https://doi.org/10.3929/ethz-b-000313962

Rights / license:
In Copyright - Non-Commercial Use Permitted

Originally published in:


https://doi.org/10.1201/9781351174664-382

Fault diagnosis of wind turbine structures using decision tree learning algorithms with big data

I. Abdallah, V. Dertimanis, H. Mylonas, K. Tatsis & E. Chatzi


Department of Civil, Environmental and Geomatic Engineering
ETH Zürich, Zürich, Switzerland.

N. Dervilis & K. Worden


Department of Mechanical Engineering
The University of Sheffield, Sheffield, United Kingdom.

E. Maguire
Vattenfall AB

ABSTRACT: In the context of optimized Operation & Maintenance of wind energy infrastructure, it is impor-
tant to develop decision support tools, able to guide operators and engineers in the management of these assets.
This task is particularly challenging given the multiplicity of uncertainties involved, from the point of view of
the aggregated data, the available knowledge with respect to the wind turbine structures, and sub-components, as
well as the constantly varying operational and environmental loads. We here propose to propagate wind turbine
telemetry through a decision tree learning algorithm to detect faults, errors, damage patterns, anomalies and abnormal operation. The use of decision trees is motivated by the fact that they tend to be easier to implement and interpret than other quantitative data-driven methods. Furthermore, the telemetry consists of data from condition and structural health monitoring systems, which lends itself nicely to the field of Big Data, as large amounts are continuously sampled at high rate from thousands of wind turbines. In this paper, we review several decision tree algorithms previously proposed by the machine learning community (i.e. ID3, C4.5, C5.0, J48, SPRINT, FACT, FIRM, SLIQ, CHAID, QUEST, CRUISE, PUBLIC, BOAT, RAINFOREST, MARS, RIPPER and CART); we then train an ensemble Bagged decision tree classifier on a large condition monitoring dataset from an offshore wind farm comprising 48 wind turbines, and use it to automatically extract paths linking excessive vibration faults to their possible root causes. We finally give an outlook of an architecture to implement decision tree learning in the context of cloud computing for big data, namely involving cloud based Apache Hadoop software for very large data storage and handling, and Apache Spark for efficiently running machine-learning algorithms.

1 INTRODUCTION

Wind turbine maintenance and inspection relies on conventional techniques (Yang & Sun 2013), such as visual inspection, non-destructive evaluation and standard signal processing, trend analysis and statistics of data streamed from the Supervisory Control And Data Acquisition (SCADA) system. Specialized condition monitoring (CM) systems are only available on specific components such as the gearbox and main bearing (Hameed, Hong, Ahn, & Song 2009), while, far from forming part of the actual engineering practice (Grasse, Trappe, Thoens, & Said 2011), structural health monitoring (SHM) systems are deployed mostly for research purposes, or temporarily during the certification stage. In fact, there exists a dislocation between (i) data derived from CM systems (e.g. power output, rotor RPM), (ii) data obtained from specialized SHM (e.g. tower acceleration, strain on the blade root), and (iii) specialized maintenance strategies of individual WT components. As a result, there is currently no holistic approach, and no systematic, quantitative and automated tools, for monitoring and diagnostics of WTs, for operation, maintenance (O&M) and decision making within their life-cycle. Towards this end, we propose to perform automated fault diagnostics and root cause analysis of faults on wind turbines (WTs) on the basis of decision tree classifiers. A decision tree is a predictive model that maps observations to their target values or labels. The key concept lies in running WT telemetry data through a decision tree learning algorithm for detecting faults, errors, damage patterns, anomalies and abnormal operation (i.e., "end states"). A decision tree essentially comprises a machine learning tool for classification of event outcomes. For a given initiating event, multiple end states are possible, linking each event to an associated probability of occurrence. Once built and trained, and given a new set of measurements, the decision tree may be used to predict "end states" and classify (discover) previously unknown "end states". By examining the paths that lead to failure-predicting leaf nodes, one may distinguish the possible sources (root causes) of error. The remainder of this article is organized as follows. In section 2 we revisit the decision tree learning theory. In section 3 we train an ensemble of bagged decision tree classifiers with the standard CART algorithm on a condition monitoring data set from the Lillgrund offshore wind farm comprising 48 wind turbines, and use it to perform diagnostics to elucidate the root cause of excessive vibrations. Finally, in section 4 we further our discussion to show how decision tree learning can be expanded to big data based applications for monitoring and diagnostics of wind turbines, using the object-oriented decision tree concept (Abdallah, Dertimanis, & Chatzi 2017).

2 DECISION TREES

A Decision Tree (also called a Classification or Prediction Tree) is designed to classify or predict a discrete category from the data. Decision Trees (DTs) are a non-parametric supervised learning method used for classification (and regression). In the machine learning sense, the goal is to create a classification model (classification tree) that predicts the value of a target variable (also known as label or class) by learning simple decision rules inferred from the data features (also known as attributes or predictors). In Figure 1, an internal node N denotes a test on an attribute, an edge E represents an outcome of the test, and the leaf nodes C represent class labels or class distributions.

Figure 1: Graphical representation of a decision tree (DT) classifier. DT terminologies are also shown (R: root node, N: internal node, C: leaf node, E: edge).

Four reasons motivated us to work with decision tree classifiers. First, they can be learned and updated from data relatively fast compared to other methods. Second, they are visually more intuitive, simpler and easier to assimilate and interpret by humans/engineers. Third, unlike other classification methods, with decision tree classifiers one is able to perform data-driven root cause analysis of faults; one can trace a path from the end state (e.g. blade damage) to the initiating event (e.g. wrong parameters in the control system), in a way that follows the sequence and chronology of how events are interlinked. Last, it has been shown that the accuracy of decision tree classifiers is comparable or superior (especially for ensemble decision tree classifiers) to other models, and that they in fact display the best combination of error rate and speed (Lim, Loh, & Shih 1997, Hand 1997, Lim, Loh, & Shih 2000, Caruana & Niculescu-Mizil 2006).

A decision tree is a tree-structured classifier built by starting with a single node that encompasses the entire data and recursively splitting the data within a node, generally into two branches (some algorithms can perform multiway splits), by selecting the variable (dimension) that best classifies the samples according to a split criterion, i.e. the one that maximizes the information gain (Equation 1) among the random subsample of dimensions obtained at every point. The splitting continues until a terminal leaf is created that meets a stopping criterion, such as a minimum leaf size or a variance threshold. Each terminal leaf contains data that belong to one or more classes. Within this leaf, a model is applied that provides a fairly comprehensible prediction, especially in situations where many variables may exist that interact in a nonlinear manner, as is often the case on wind turbines (Carrasco Kind & Brunner 2013). Algorithm 1 shows pseudocode of a generic decision tree learning algorithm.

Formally, the splitting is done by choosing the attribute that maximizes the Information Gain (I_G), which is defined in terms of the impurity degree index I_d:

I_G(T, M) = I_d(T) - \sum_{m \in M} \frac{|T_m|}{|T|} I_d(T_m)    (1)

where T is the training data in a given node, M is one of the possible dimensions along which the node may be split, m are the possible values of M, |T| is the size of the training data, |T_m| is the number of objects for a given subset m within the current node, and I_d is the function that represents the degree of impurity of the information. There are three standard methods to compute the impurity index I_d. The first method is to use the information entropy (H), which is defined by:

I_d(T) \equiv H(T) = -\sum_{i=1}^{n} f_i \log_2(f_i)    (2)

where i is the class to be predicted, n is the number of possible classes, and f_i is the fraction of the training data belonging to class i.
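As a simple numerical illustration (the numbers are chosen here for exposition and are not taken from the wind farm data set): a node holding 10 records of class 1 and 10 records of class 2 has entropy H(T) = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1. A candidate split that sends 8 records of class 1 and 2 records of class 2 to one child, and 2 of class 1 and 8 of class 2 to the other, gives each child an entropy of -(0.8 \log_2 0.8 + 0.2 \log_2 0.2) ≈ 0.72, so that, by Equation (1), the information gain of this split is I_G ≈ 1 - 0.72 = 0.28.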
The second option is to measure the Gini impurity (G). In this case, a leaf is considered pure if all the data contained within it have the same class. The Gini impurity can be computed inside each node using the following simplified equation:

I_d(T) \equiv G(T) = 1 - \sum_{i=1}^{n} f_i^2    (3)

The third method is to use the classification error (C_E):

I_d(T) \equiv C_E(T) = 1 - \max_i f_i    (4)

where the maximum value is taken among the fractions f_i within the data T that have class i. Figure 2 shows the three impurity indices, for a node with data that are categorized into two classes, as a function of the fraction of the data having a specific class. If all of the data belong to one class, the impurity is zero. On the other hand, if half of the data have one class and the remaining data belong to the other class, the impurity is at its maximum (Carrasco Kind & Brunner 2013).

Figure 2: Impurity index I_d for a two-class example as a function of the probability of one of the classes, f_1, using the information entropy, Gini impurity and classification error. In all cases, the impurity is at its maximum when the fraction of data within a node with class 1 is 0.5, and zero when all data are in the same category.
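The three impurity indices and the information gain defined above can be computed in a few lines. The following NumPy sketch is purely illustrative (the function names are ours, and the code is not part of the monitoring toolchain described later in the paper).

import numpy as np

def impurity(labels, method="entropy"):
    # Impurity index I_d of a node, given the array of class labels it contains
    _, counts = np.unique(labels, return_counts=True)
    f = counts / counts.sum()                       # class fractions f_i
    if method == "entropy":                         # Equation (2)
        return -np.sum(f * np.log2(f))
    if method == "gini":                            # Equation (3)
        return 1.0 - np.sum(f ** 2)
    if method == "error":                           # Equation (4)
        return 1.0 - f.max()
    raise ValueError(f"unknown impurity method: {method}")

def information_gain(parent_labels, child_labels, method="entropy"):
    # Equation (1): `child_labels` holds one label array per child node T_m
    n = len(parent_labels)
    weighted = sum(len(c) / n * impurity(c, method) for c in child_labels)
    return impurity(parent_labels, method) - weighted

# Tiny usage example with two synthetic classes:
parent = np.array([1] * 10 + [2] * 10)
left, right = parent[:12], parent[12:]              # one candidate binary split
print(information_gain(parent, [left, right], method="gini"))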
There exist several algorithms for training decision trees from data, including ID3, C4.5, C5.0, J48, SPRINT, FACT, FIRM, SLIQ, CHAID, QUEST, CRUISE, PUBLIC, BOAT, RAINFOREST, MARS, RIPPER and CART. In the following we briefly mention some of the more common algorithms. Ross Quinlan developed ID3 (Quinlan 1986), which stands for Iterative Dichotomiser 3, and its later iterations include C4.5 and C5.0. ID3 attempts to generate the smallest multiway tree. If the problem involves real-valued variables, they are first binned into intervals, each interval being treated as an unordered nominal attribute. C4.5 converts the trained trees into sets of if-then rules (Quinlan 1994) and improves on ID3 by allowing both discrete and continuous attributes, missing attribute values and attributes with differing costs; pruning of the trees to avoid over-fitting is usually applied to improve the ability of the tree to generalise to unseen data. The accuracy of each rule is then evaluated to determine the order in which the rules should be applied. C5.0 is essentially the same as C4.5, but uses less memory and builds smaller rule sets while being more accurate.

CART (Breiman, Friedman, Olshen, & Stone 1984), which stands for Classification and Regression Trees, is very similar to C4.5, but it differs in that it supports numerical target variables (regression) and constructs binary trees based on a numerical splitting criterion recursively applied to the data, instead of constructing rule sets.

According to (Lim, Loh, & Shih 2000), C4.5 and QUEST have the best combinations of error rate and speed, but C4.5 tends to produce trees with twice as many leaves as those from QUEST. QUEST is a binary-split decision tree algorithm for classification and regression (Loh & Shih 1997). It uses a significance test to select variables for splitting. An advantage of the QUEST tree algorithm is that it is not biased in split-variable selection, unlike CART, which is biased towards selecting split-variables that allow more splits and those that have more missing values.

CRUISE is one of the most accurate decision tree classifiers (Loh 2011) and is also efficiently capable of performing multiway splits. (Kim & Loh 2001) proposed CRUISE, which stands for Classification Rule with Unbiased Interaction Selection and Estimation; it splits each node into as many as J subnodes, which precludes the use of greedy search methods. CRUISE is practically free of selection bias (Kim & Loh 2001) and is capable of integrating interactions between variables when growing the tree. CRUISE borrows and improves upon ideas from many sources, but especially from FACT, QUEST, and GUIDE for split selection and from CART for pruning.

Finally, a word about RainForest, which is not a decision tree classifier per se but rather a framework. RainForest was proposed to make decision tree construction more scalable (as was BOAT, which is in fact faster than RainForest by a factor of three while constructing exactly the same decision tree, and can handle a wide range of splitting criteria (Gehrke, Ganti, Ramakrishnan, & Loh 1999)). According to (Gehrke, Ramakrishnan, & Ganti 2000), a thorough examination of the algorithms in the literature (including C4.5, CART, CHAID, FACT, ID3 and extensions, SLIQ, SPRINT and QUEST) shows that the greedy schema described in Algorithm 1 can be refined to a generic RainForest tree induction schema.
In fact, most decision tree algorithms consider every attribute individually and need the distribution of class labels for each distinct value of an attribute to decide on the splitting criteria. RainForest is a comprehensive approach for decision tree classifiers that separates the scalability aspects of algorithms for constructing a decision tree from the central features that determine the quality of the tree. RainForest concentrates on the growth phase of a decision tree, due to the time-consuming nature of this step, and closes the gap between the main-memory limitations of algorithms in the machine learning and statistics literature and the scalability requirements of a data mining environment (Singh & Sulekh 2017).

Next we present a demonstration of decision tree classifier learning based on a real-world SCADA data set from the Lillgrund offshore wind farm.

Algorithm 1: Pseudocode of a generic recursive decision tree learning algorithm.

    DTClassifier(TR, Target, Attr)
    Input:  TR: training examples; Target: target label; Attr: set of descriptive attributes
    Output: DT: decision tree classifier
    Create a Root node for the tree;
    if all examples in TR have the same label t_i then
        return a single-node tree, i.e. a leaf node with that label;
    else if the set of attributes Attr is empty then
        return the single-node tree, i.e. Root, labeled with the most common value of Target in TR;
    else
        pick an attribute A from Attr (such that A maximizes I_G) and create a node R for it;
        for each possible value v_i of A do
            let TR_{v_i} be the subset of TR that have value v_i for A;
            add an out-going edge E to node R, labeled with the value v_i;
            if TR_{v_i} is empty then
                attach a leaf node to edge E, labeled with the most common value of Target in TR;
            else
                call DTClassifier(TR_{v_i}, Target, Attr − {A}) and attach the resulting tree as the subtree under edge E;
            end
        end
        return the subtree rooted at R;
    end
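To make the generic schema of Algorithm 1 concrete, the following is a minimal Python sketch of the same recursion (an illustrative re-implementation, not the code used in this study). Training examples are represented as dictionaries of categorical attributes, `target` names the label key, `attrs` is a set of attribute names, and empty subsets are avoided by iterating only over attribute values that actually occur in the node.

import math
from collections import Counter

def entropy(rows, target):
    # Equation (2) applied to the labels of the examples in `rows`
    counts = Counter(r[target] for r in rows)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def info_gain(rows, attr, target):
    # Equation (1) for a multiway split on the categorical attribute `attr`
    subsets = {}
    for r in rows:
        subsets.setdefault(r[attr], []).append(r)
    weighted = sum(len(s) / len(rows) * entropy(s, target) for s in subsets.values())
    return entropy(rows, target) - weighted

def dt_classifier(rows, target, attrs):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                      # all examples share one label -> leaf
        return labels[0]
    if not attrs:                                  # no attributes left -> majority-vote leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    node = {"attribute": best, "children": {}}
    for value in {r[best] for r in rows}:          # one out-going edge per observed value
        subset = [r for r in rows if r[best] == value]
        node["children"][value] = dt_classifier(subset, target, attrs - {best})
    return node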
3 DEMONSTRATION

Condition monitoring data from 48 wind turbines were collected over a period of 12 months and sampled every 10 minutes, across 64 channels. In total, more than 2.5 million records were available, of which 980 excessive vibration error events are recorded across all wind turbines. The error event of interest in this demonstration is excessive structural vibrations.

Table 1: The attributes (features) used as input to train the decision tree classifier.

Attribute      Description
HSGenTmp       Mean temperature of the gear bearing on the generator side
Po_max         Max value of active power
DMaxMeanPow    Difference between max and mean active power
GRpm_max       Max generator RPM
DMaxMeanGRpm   Difference between max and mean generator RPM
Pi_min         Min collective pitch
DMaxMinV       Difference between max and min wind speed

Table 2: Target variable TurbineState. The turbine state is defined in this demonstration according to two labels.

Class label    Description
NoFault        Normal operation, turbine is producing power, no faults
Vibr           Structural vibrations error resulting in: "Inform Vibration Support"

Data Pre-processing. The first step prepares the data for the construction of the prediction tree classifier. Knowledge of the process is helpful in the elimination of parameters (features) that are not significant. The parameters recorded by the SCADA system can be grouped into the following categories: (i) system related data, e.g. turbine number and index, and time stamp, which are turbine specific and can therefore be excluded from the decision tree classifier training, (ii) operating performance parameters such as power output, pitch and rotor speed, (iii) environmental parameters such as wind speed and wind direction, (iv) temperature measurements for various components such as gearbox, bearings and generator, and finally (v) dynamics parameters such as tower top accelerations. The attributes chosen in this demonstration are shown in Table 1. The table includes both original sensor attributes, such as the maximum generator rotational speed over a 10 min period (GRpm_max), and additional derived parameters, such as the difference between max and min wind speed over a 10 min period (DMaxMinV).
We open a small parenthesis here. As with most pattern recognition methods, tree-based classification methods work best if the proper features are selected to start with; preprocessing by data dimensionality reduction techniques, such as principal component analysis (PCA) or independent component analysis (ICA), or by optimal feature selection approaches, such as the wrapper approach integrated with genetic or best-first search algorithms, can be effective because they find important axes to be used as a guideline for the selection of the features upon which a decision tree is trained. However, in this demonstration we chose not to do so, in order to test the limits of a decision tree classifier for fault diagnostics; for instance, wind speed, power and RPM are strongly inter-dependent attributes, but for the same wind speed a wind turbine could be found operating at two different power output levels (during distinct time periods) or two different generator RPM levels, and both cases are considered normal operating modes. How so? Indeed, this happens very often when a wind turbine has to de-rate its power output following a demand from the grid side, or has to reduce the generator RPM under a specific noise or load control mode. The target variable TurbineState for classification is shown in Table 2. It can take two labels: NoFault, indicating that the wind turbine is producing electric power under normal operating conditions, and Vibr, indicating an excessive vibration fault resulting in the wind turbine shutting down and, furthermore, triggering a message to be sent to the vibration support technical team (in order for some corrective action to take place).
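As a pre-processing illustration, the derived attributes of Table 1 and the two-label target of Table 2 could be assembled with pandas as sketched below. This is only a sketch: the raw column names, the file name and the error-code value are assumptions, not the actual Lillgrund SCADA schema.

import pandas as pd

raw = pd.read_csv("scada_10min.csv", parse_dates=["timestamp"])    # hypothetical 10-min export

features = pd.DataFrame({
    "HSGenTmp":     raw["gen_bearing_temp_mean"],                   # assumed raw channel names
    "Po_max":       raw["active_power_max"],
    "DMaxMeanPow":  raw["active_power_max"] - raw["active_power_mean"],
    "GRpm_max":     raw["generator_rpm_max"],
    "DMaxMeanGRpm": raw["generator_rpm_max"] - raw["generator_rpm_mean"],
    "Pi_min":       raw["pitch_angle_min"],
    "DMaxMinV":     raw["wind_speed_max"] - raw["wind_speed_min"],
})

# TurbineState of Table 2: Vibr if the 10-min record carries the excessive
# vibration error code (assumed column and value), NoFault otherwise.
target = raw["error_code"].eq("InformVibrationSupport").map({True: "Vibr", False: "NoFault"})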
Bagged decision tree construction. The second step is the construction of the Bagged decision tree classifier. An ensemble of decision trees is often more accurate than any single tree classifier (Bauer & Kohavi 1999, Dietterich 1996). Bagging (Breiman 1996), boosting (Schapire 1990) and random forests are three popular methods of creating accurate ensembles. Previous research indicates that boosting is more prone to overfitting the training data (Freund & Schapire 1996, Opitz & Maclin 1999); consequently, the presence of noise causes a greater decrease in the performance of boosting. Therefore, this study uses bagging to create an ensemble of bagged decision tree classifiers (using the standard CART algorithm), to better address the noise in the condition monitoring data. Other decision tree algorithms and ensemble approaches will be investigated in future work. This technique can be summarized as: (i) take a bootstrap sample from the data set, (ii) train an un-pruned classification tree, and (iii) aggregate the trained tree classifiers. In more detail, bagging predictors comprise a method for generating multiple versions of a predictor, each on random subsets of the original dataset, and fusing these into a unique final aggregated predictor. This aggregated predictor can typically be used for reducing the variance of a black-box estimator, by introducing randomization into the construction procedure and forming an ensemble (for a proof refer to (Breiman 1996, Bühlmann 2012)). The bagging algorithm consists in (1) constructing a bootstrap sample (X_*^{(1)}, Y_*^{(1)}), ..., (X_*^{(n)}, Y_*^{(n)}) by randomly drawing n times with replacement from the original data (X^{(1)}, Y^{(1)}), ..., (X^{(n)}, Y^{(n)}), (2) computing the bootstrapped estimator (i.e. the tree classifier) \hat{g}_* = h_n(X_*^{(1)}, Y_*^{(1)}, ..., X_*^{(n)}, Y_*^{(n)}), where the function h_n(·) defines the estimator as a function of the data, and (3) repeating steps 1 and 2 M times, where M is often chosen as 50 or 100, yielding \hat{g}_*^k, k = 1, ..., M; the bagged estimator is then \hat{g}_{bagg} = \sum_{k=1}^{M} \hat{g}_*^k / M. In theory, if M → ∞, the bagged estimator is

\hat{g}_{bagg} = E[\hat{g}_*^k]    (5)

In the machine learning and statistics literature, the two main performance measures for a classification tree algorithm are its predictive quality and the construction time of the resulting tree. In this paper the predictive quality is given by the misclassification rate on the validation data set. As shown in Figure 4, the misclassification rate of the trained Bagged tree is less than 1% when the bag size is more than 10.

Figure 4: Misclassification rate of the validation set as a function of tree bag size.
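One possible realisation of the bagged CART ensemble and of the validation curve of Figure 4 is sketched below with scikit-learn; this is an illustration, not the authors' implementation. It reuses the `features` and `target` objects from the pre-processing sketch above, and the bag sizes are example values.

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(
    features, target, test_size=0.3, stratify=target, random_state=0)

for bag_size in (2, 6, 10, 50, 100):                 # M, the number of bootstrapped trees
    bagged = BaggingClassifier(
        DecisionTreeClassifier(criterion="gini"),    # CART-style base trees
        n_estimators=bag_size,
        bootstrap=True,                              # sample the training set with replacement
        random_state=0)
    bagged.fit(X_train, y_train)
    misclassification = 1.0 - bagged.score(X_val, y_val)
    print(bag_size, round(misclassification, 4))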
Fault diagnostics (offline). The final step uses the newly generated decision tree classifier (e.g. Figure 3) to create diagnostics for individual target fault classes, i.e. Vibr. One aspect of fault diagnostics deals with offline root cause analysis, which we demonstrate here. When diagnosing faults, we are interested in identifying the root causes (or sequences of events) that lead to a large portion of the overall abnormal behavior, where the decision tree edges leading to faults become root cause candidates. In the literature, one can find a limited number of proposed programmatic algorithms by which decision tree classifiers can be scanned/probed for root cause analysis (Solé, Muntés-Mulero, Rana, & Estrada 2017). One implementation is as follows (Zheng, Lloyd, & Brewer 2004):

1. Ignore the leaf nodes that correspond to normal operation (i.e. NoFault), as they will not be useful in diagnosing faults.
2. Identify, in the decision tree, all leaf nodes with the target fault class of interest, i.e. Vibr.
3. Ranking: list the leaf nodes with the target fault class ranked by failure count to prioritize their importance.
4. Noise filtering: in diagnostics, we are interested in identifying the root causes that result in a large proportion of the overall faults. Thus we retain the leaf nodes accounting for more than a certain threshold of the total number of faults (e.g. threshold = 5%).
5. Extract traces (paths) containing the target fault leaf node (C), and all edges (E) and internal nodes (N) up to the root node (R).
6. Extract the rules at each internal node of the traces.
7. Node merging: we merge nodes on a path by eliminating ancestor nodes that are logically "subsumed" by successor nodes.

Below is an example of an automated extraction of a sequential trace of events (root causes) from the trained decision tree classifier (see Figure 3), leading from the classified fault to the root of the tree:

Vibr ← 1503.6 <= GRpm_max < 1539.05 ← 110.455 <= DMaxMeanGRpm < 380.51 ← DMaxMinV >= 10.65
Figure 3: Part of the decision tree classifier based on the SCADA data from 48 wind turbines.

This sequential trace of events indicates that the fault Vibr can possibly occur when, over a period of 10 minutes, the maximum generator speed is in the range 1503.6–1539.05 RPM, the difference between max and mean generator speed is in the range 110.45–380.51 RPM, and the difference between the max and min wind speed exceeds 10.65 m/s. Note that it is not possible to infer from the data the time interval during which the large change in wind speed occurred, but it could well be an indication of a gust or of excessive turbulence over a short period of time, resulting in the excessive vibration fault. Another example of an automated extraction of a sequential trace of events (root causes) leading from the classified fault to the root of the tree is as follows:

Vibr ← Po_max < 2377 ← DMaxMinV >= 8.05 ← DMaxMeanGRpm >= 380.51

This sequential trace of events indicates that the fault Vibr can possibly occur when, over a period of 10 minutes, the maximum electric power output does not exceed 2377 kW, the difference between max and mean generator speed exceeds 380.51 RPM, and the difference between the max and min wind speed exceeds 8.05 m/s.

This approach to data-driven root cause analysis elegantly elucidates the traces of events that lead to a fault. These traces can subsequently be used by an engineer to design simulation scenarios to try and replicate the faults and to propose mitigating actions.
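The seven-step procedure above can be scripted directly against a trained tree. The following sketch (our illustrative re-implementation, not the code behind Figure 3) walks a single fitted scikit-learn CART tree, keeps the Vibr leaves, ranks and filters them by the number of training records they capture, and assembles each trace from the leaf back to the root; steps 6 and 7 (rule extraction and node merging) are omitted for brevity. For the bagged ensemble, the same function can be applied to every tree in `bagged.estimators_`.

import numpy as np

def fault_traces(tree_clf, feature_names, fault_label="Vibr", threshold=0.05):
    t = tree_clf.tree_
    fault_idx = list(tree_clf.classes_).index(fault_label)
    parent, edge = {}, {}                            # parent pointers and edge conditions
    for node in range(t.node_count):
        for child, sign in ((t.children_left[node], "<="), (t.children_right[node], ">")):
            if child != -1:
                parent[child] = node
                edge[child] = f"{feature_names[t.feature[node]]} {sign} {t.threshold[node]:.2f}"
    # Steps 1-2: keep only leaves whose predicted class is the fault of interest
    leaves = [n for n in range(t.node_count)
              if t.children_left[n] == -1 and np.argmax(t.value[n]) == fault_idx]
    total = sum(t.n_node_samples[n] for n in leaves)
    traces = []
    for leaf in sorted(leaves, key=lambda n: t.n_node_samples[n], reverse=True):  # step 3: ranking
        if t.n_node_samples[leaf] < threshold * total:                            # step 4: noise filtering
            continue
        path, node = [], leaf
        while node in parent:                        # step 5: trace leaf -> root
            path.append(edge[node])
            node = parent[node]
        traces.append(f"{fault_label} ← " + " ← ".join(path))
    return traces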
4 OUTLOOK: BIG DATA BASED MONITORING AND DIAGNOSTICS FRAMEWORK

So far we have demonstrated how offline root cause analysis can be conducted via ensemble-based Bagged decision tree learning from the SCADA data of 48 wind turbines, based on the CART algorithm. In this section we summarize an outlook for a monitoring and diagnostics framework that would perform on big data and in real-time. The intent is for this framework to be deployed on a cloud such as Azure and to scale as the volume of streamed data increases. Figure 5 shows an overview of the architecture of the framework. The main features of the framework include:

• A hardware-software package, able to acquire and fuse in real-time both condition monitoring (CM) data (e.g. electric power output, rotor speed) and specialized structural health monitoring (SHM) data (e.g. tower accelerations, strains on the blade) from WT components (e.g. blades, gearbox, tower, etc.).
• A computational core: object-oriented decision tree learning algorithm, Bayesian network (BN) based computation of probabilities, and a root-cause discovery algorithm (as demonstrated in the previous section).
• Distributed cloud based data storage and analytics of the high rate of real-time data streaming from the measuring unit on the WT, which interfaces with the real-time decision-tree software unit. Hence, the toolkit does not need to save data and do heavy computations on the remote WT.
• An online user interface to visualize the output from the decision tree (decision support tool).

Figure 5: Graphical summary of the proposed framework for decision tree learning with big data for fault diagnostics: acquisition hardware (CM and SHM data, DAQ); cloud computing, storage and analytics; object-oriented real-time decision tree (filtering, categorization, quality checks, synchronization); outputs (predict events, classify new events, root cause analysis, updated probabilities).

In the following we elaborate a bit more on: (i) the cloud computing, storage and analytics, and (ii) the object-oriented decision tree learning aspects.
(i) The efficient fusion of the large bulk of information made available from the condition and structural health monitoring systems constitutes a "Big Data" problem, as large amounts of data are continuously sampled at high and diverse rates from the WTs within a wind farm. Thus, the integration of distributed cloud based data storage and analytics is a natural fit, with a consequent reduction in the cost of handling and manipulating said data (see Figure 6).

Figure 6: Cloud computing and storage.

In the proposed framework, large scale storage of historical data from WTs is done via the Hadoop Distributed File System (HDFS).
HDFS is a distributed file system that is fault tolerant and can load large volumes of data in a distributed manner, even if errors occur. Telemetry and data acquisition are supported by Apache Kafka, which is a distributed streaming platform that aims to provide a unified, high-throughput, low-latency methodology for handling real-time data feeds. Kafka is a fault-tolerant system, meaning that if any error occurs, published data would still be available in case the application or Kafka is stopped, goes offline and is then restarted. Thus, Kafka can be guaranteed not to miss any data that is issued from the wind turbines. Finally, for data processing purposes, Apache Spark is used. Spark is an in-memory, fast and general engine for large-scale data processing. In addition, Spark can scale to thousands of machines in a fault-tolerant manner. These characteristics make Apache Spark very suitable to process and analyze the large volumes of data that are gathered from the wind turbines (Canizo, Onieva, Conde, Charramendieta, & Trujillo 2017).
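As an illustration of the Spark side of the framework, the sketch below trains a decision tree with Spark MLlib on the 10-min SCADA records stored on HDFS; it is a minimal example under assumed HDFS paths and the column names of Table 1 (the actual framework would additionally consume the Kafka stream in real time, which is omitted here).

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import DecisionTreeClassifier

spark = SparkSession.builder.appName("wt-fault-diagnosis").getOrCreate()

scada = spark.read.parquet("hdfs:///windfarm/scada_10min")        # hypothetical HDFS path

feature_cols = ["HSGenTmp", "Po_max", "DMaxMeanPow", "GRpm_max",
                "DMaxMeanGRpm", "Pi_min", "DMaxMinV"]

pipeline = Pipeline(stages=[
    StringIndexer(inputCol="TurbineState", outputCol="label"),    # NoFault / Vibr -> numeric label
    VectorAssembler(inputCols=feature_cols, outputCol="features"),
    DecisionTreeClassifier(labelCol="label", featuresCol="features",
                           impurity="gini", maxDepth=10),
])

model = pipeline.fit(scada)
model.write().overwrite().save("hdfs:///windfarm/models/dt_fault")  # hypothetical model path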
(ii) A priori decision trees are built first to serve the fundamental diagnostic tasks, by combining engineering knowledge of the system, failure modes and domain knowledge. The dynamic decision tree classifiers are then trained over time from wind turbine telemetry (CM & SHM data). The limitation of traditional decision tree classifiers appears when a component displays a behaviour with feedback (i.e., after a repair/update in the system) or for evolving systems (e.g. when new sensors are integrated, or with aging of the system), which implies a need to establish several decision trees based on the possible ordering of the events or based on new initiating events. One way around this is an innovation that we introduce in this framework in the form of object-oriented decision tree learning (Wyss & Durán 2001). To this end, a WT is viewed as a multi-layered system of objects (e.g. structure, controller, actuator, etc.) that are defined on the basis of abstract super-classes, attributed with specific properties and methods. The decision tree classifiers are further mapped into Bayesian Networks for further assessment of the conditional probabilities (Bearfield & Marsh 2005, Jassens, Wets, Brijs, Vanhoof, Arentz, & Timmermans 2006). The integration of the object-oriented decision tree learning and the BN delivers updated probabilities of occurrence associated with each event and end state, which would form a solid indicator of the risk of future failure of any given component.
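Purely as an illustration of this object-oriented view (class names and interfaces are our assumptions, not the implementation of the framework), each WT component object could own its telemetry channels and its own decision tree, so that trees can be added or retrained per object when sensors are integrated or a component is repaired, before the outputs are passed on to the Bayesian network layer.

from abc import ABC, abstractmethod
from sklearn.tree import DecisionTreeClassifier

class WTComponent(ABC):
    # Abstract super-class for wind turbine objects (structure, controller, actuator, ...)
    def __init__(self, name, channels):
        self.name = name
        self.channels = channels                    # telemetry channels owned by this object
        self.classifier = DecisionTreeClassifier()

    def retrain(self, X, y):
        # Retrain only this object's tree, e.g. after a repair or a sensor upgrade;
        # X is assumed to be a pandas DataFrame holding the telemetry columns.
        self.classifier.fit(X[self.channels], y)

    @abstractmethod
    def end_states(self):
        # Fault end states this object can produce (fed to the Bayesian network layer)
        ...

class Tower(WTComponent):
    def end_states(self):
        return ["Vibr"]

class Gearbox(WTComponent):
    def end_states(self):
        return ["HighBearingTemp"]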
5 CONCLUSIONS

We presented a review of several decision tree classification algorithms. We then demonstrated how data-driven and automated root cause analysis can be conducted via ensemble-based Bagged decision tree learning from the SCADA data of 48 wind turbines, based on the CART algorithm. Root cause analysis is here cast in the sense of programmatically discovering the sequence of events (paths) leading from a classified fault at the leaf, all the way to the root of the decision tree classifier. These traces can subsequently be used by an engineer to design simulation scenarios to try and replicate the faults and to propose mitigating actions. Finally, we briefly presented an architecture for a monitoring and diagnostics framework that would perform on big data. In particular, we highlighted the need for cloud based storage and computing, and an innovative approach based on object-oriented decision tree learning that extends the traditional decision tree classifier concept. In the future, more concrete implementations and results of the proposed framework will be disseminated.

6 ACKNOWLEDGEMENTS

The authors acknowledge the support of the European Research Council via the ERC Starting Grant WINDMIL (ERC-2015-StG #679843) on the topic of Smart Monitoring, Inspection and Life-Cycle Assessment of Wind Turbines. We would like to thank Vattenfall AB for providing access to the SCADA data from the Lillgrund offshore wind farm.

REFERENCES

Abdallah, I., V. Dertimanis, & E. Chatzi (2017). Data-driven identification, classification and update of decision trees for monitoring and diagnostics of wind turbines. In 2nd ECCOMAS Thematic Conference, UNCECOMP, Rhodes Island, Greece.
Bauer, E. & R. Kohavi (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Mach. Learn. 36, 105–139.
Bearfield, G. & W. Marsh (2005). Generalising event trees using Bayesian networks with a case study of train derailment. In International Conf. on Comp. Saf., Reli., and Sec., SAFECOMP, Fredrikstad, Norway, pp. 52–66.
Breiman, L. (1996). Bagging predictors. Mach. Learn. 24, 123–140.
Breiman, L., J. Friedman, R. Olshen, & C. Stone (1984). Classification and Regression Trees. Wadsworth Inc.
Bühlmann, P. (2012). Bagging, boosting and ensemble methods. In J. Gentle and Y. Mori (Eds.), Handbook of Comp. Stat., Chapter 33, pp. 985–1022. Springer.
Canizo, M., E. Onieva, A. Conde, S. Charramendieta, & S. Trujillo (2017). Real-time predictive maintenance for wind turbines using big data frameworks. In Proc. IEEE Int. Conf. on Prognostics and Health Management, Allen, TX, USA, pp. 70–77.
Carrasco Kind, M. & R. Brunner (2013). TPZ: Photometric redshift PDFs and ancillary information by using prediction trees and random forests. Monthly Notices of the Royal Astronomical Society 432, 1483–1501.
Caruana, R. & A. Niculescu-Mizil (2006). An empirical comparison of supervised learning algorithms. In Proc. of the 23rd Intern. Conf. on Machine Learning, Pittsburgh, Pennsylvania, USA, pp. 161–168.
Dietterich, T. (1996). Applying the weak learning framework to understand and improve C4.5. In Proc. 13th International Conference on Machine Learning, pp. 96–104.
Freund, Y. & R. Schapire (1996). Experiments with a new boosting algorithm. In Proc. 13th International Conference on Machine Learning, pp. 148–156.
Gehrke, J., V. Ganti, R. Ramakrishnan, & W. Loh (1999). BOAT - optimistic decision tree construction. In Proc. of the 1999 ACM SIGMOD Intern. Conf. on Management of Data, Stockholm, Sweden, pp. 169–180.
Gehrke, J., R. Ramakrishnan, & V. Ganti (2000). RainForest - A framework for fast decision tree construction of large datasets. Data Min. Knowl. Discov. 4(2/3), 127–162.
Grasse, F., V. Trappe, S. Thoens, & S. Said (2011). Structural health monitoring of wind turbine blades by strain measurement and vibration analysis. In Proc. of the 8th International Conference on Struct. Dyn. (EURODYN), Leuven, Belgium.
Hameed, Z., Y. S. Hong, S. H. Ahn, & C. K. Song (2009). Condition monitoring and fault detection of wind turbines and related algorithms: A review. Renewable and Sustainable Energy Reviews 13, 1–39.
Hand, D. (1997). Construction and Assessment of Classification Rules. Chichester: John Wiley and Sons.
Jassens, D., G. Wets, T. Brijs, K. Vanhoof, T. Arentz, & H. Timmermans (2006). Integrating Bayesian networks and decision trees in a sequential rule-based transportation model. Euro. J. Oper. Res. 1, 16–34.
Kim, H. & W. Loh (2001). Classification trees with unbiased multiway splits. J. Amer. Statist. Assoc. 96, 598–604.
Lim, T., W. Loh, & Y. Shih (1997). An empirical comparison of decision trees and other classification methods. Technical Report TR 979, Department of Statistics, UW Madison.
Lim, T., W. Loh, & Y. Shih (2000). A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning Journal 4, 203–228.
Loh, W. & Y. Shih (1997). Split selection methods for classification trees. Statistica Sinica 7, 815–840.
Loh, W. Y. (2011). Classification and regression trees. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 1, 14–23.
Opitz, D. & R. Maclin (1999). Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research 11, 169–198.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning 1, 81–106.
Quinlan, J. R. (1994). Programs for Machine Learning. Morgan Kaufmann Publishers.
Schapire, R. (1990). The strength of weak learnability. Mach. Learn. 5, 197–227.
Singh, K. & R. Sulekh (2017). The comparison of various decision tree algorithms for data analysis. International Journal of Engineering and Computer Science 6, 21557–21562.
Solé, M., V. Muntés-Mulero, A. I. Rana, & G. Estrada (2017). Survey on models and techniques for root-cause analysis. ArXiv e-prints.
Wyss, G. D. & F. A. Durán (2001). OBEST: The object-based event scenario tree methodology. Technical Report SAND2001-0828, Sandia National Laboratories, Albuquerque, NM, USA.
Yang, B. & D. Sun (2013). Testing, inspecting and monitoring technologies for wind turbine blades: A survey. Renewable and Sustainable Energy Reviews 22, 515–526.
Zheng, A. X., J. Lloyd, & E. Brewer (2004). Failure diagnosis using decision trees. In Proc. of the 1st Intern. Conf. on Autonomic Comp., ICAC, Washington, DC, USA, pp. 36–43.