
Safety and Reliability – Safe Societies in a Changing World – Haugen et al. (Eds)
© 2018 Taylor & Francis Group, London, ISBN 978-0-8153-8682-7

Fault diagnosis of wind turbine structures using decision tree learning algorithms with big data

I. Abdallah, V. Dertimanis, H. Mylonas, K. Tatsis & E. Chatzi


Department of Civil, Environmental and Geomatic Engineering, ETH Zürich, Zürich, Switzerland

N. Dervilis & K. Worden


Department of Mechanical Engineering, The University of Sheffield, Sheffield, UK

E. Maguire
Vattenfall AB, Edinburgh, UK

ABSTRACT: In the context of Operation and Maintenance of wind energy infrastructure, it is important to develop decision support tools able to guide engineers in the management of these assets. This task is particularly challenging given the multiplicity of uncertainties involved, from the point of view of the aggregated data, the available knowledge with respect to the wind turbine structures, as well as the varying operational and environmental loads. We propose to propagate wind turbine telemetry through a decision tree learning algorithm to detect faults, damage and abnormal operation. The use of decision trees is motivated by the fact that they tend to be easier to implement and interpret than other quantitative data-driven methods. Furthermore, the telemetry consists of data from condition and structural health monitoring systems, which lends itself naturally to a Big Data setting, as large amounts of data are continuously sampled at high rates from thousands of wind turbines. In this paper, we review several decision tree algorithms; we then train an ensemble Bagged decision tree classifier on a dataset from an offshore wind farm comprising 48 wind turbines, and use it to automatically extract paths linking excessive vibration faults to their possible root causes. We finally give an outlook of a cloud-computing-based architecture to implement decision tree learning involving Apache Hadoop and Spark.

1 INTRODUCTION

Wind Turbine (WT) maintenance and inspection relies on conventional techniques (Yang & Sun 2013), such as visual inspection, non-destructive evaluation, and standard signal processing, trend analysis and statistics of data streamed from the Supervisory Control And Data Acquisition (SCADA) system. Specialized Condition Monitoring (CM) systems are only available for specific components, such as the gearbox and main bearing (Hameed et al. 2009), while, still far from forming part of actual engineering practice (Grasse et al. 2011), Structural Health Monitoring (SHM) systems are deployed mostly for research purposes, or temporarily during the certification stage. In fact, there exists a dislocation between (i) data derived from CM systems (e.g. power output, rotor RPM), (ii) data obtained from specialized SHM (e.g. tower acceleration, strain at the blade root), and (iii) specialized maintenance strategies for individual WT components. As a result, there is currently no holistic approach, nor systematic, quantitative and automated tools, for monitoring and diagnostics of WTs, for operation and maintenance (O&M) and decision making within their life cycle. Towards this end, we propose to perform automated fault diagnostics and root cause analysis of faults on WTs on the basis of decision tree classifiers. A decision tree is a predictive model that maps observations to their target values or labels. The key concept lies in running WT telemetry data through a decision tree learning algorithm for detecting faults, errors, damage patterns, anomalies and abnormal operation (i.e., end states).

A decision tree essentially comprises a machine learning tool for classification of event outcomes. For a given initiating event, multiple end states are possible, linking each event to an associated probability of occurrence. Once built and trained, and given a new set of measurements, the decision tree may be used to predict end states and classify (discover) previously unknown end states. By examining the paths that lead to failure-predicting leaf nodes, one may distinguish the possible sources (root causes) of error. The remainder of this article is organized as follows. In Section 2 we revisit decision tree learning theory. In Section 3 we train an ensemble of bagged decision tree classifiers with the standard CART algorithm on a condition monitoring data set from the Lillgrund offshore wind farm, comprising 48 wind turbines, and use it to perform diagnostics that elucidate the root cause of excessive vibrations. Finally, in Section 4 we further our discussion to show how decision tree learning can be expanded to big data based applications for monitoring and diagnostics of wind turbines, using the object-oriented decision tree concept (Abdallah et al. 2017).
2 DECISION TREES

A Decision Tree (also called a Classification or Prediction Tree) is designed to classify or predict a discrete category from the data. Decision Trees (DTs) are a non-parametric supervised learning method used for classification (and regression). In the machine learning sense, the goal is to create a classification model (classification tree) that predicts the value of a target variable (also known as the label or class) by learning simple decision rules inferred from the data features (also known as attributes or predictors). In Figure 1, an internal node N denotes a test on an attribute, an edge E represents an outcome of the test, and the leaf nodes C represent class labels or class distributions.

Figure 1. Graphical representation of a decision tree (DT) classifier. DT terminologies are also shown.

Four reasons motivated us to work with decision tree classifiers. First, they can be learned and updated from data relatively fast compared to other methods. Second, they are visually more intuitive, simpler and easier to assimilate and interpret by humans/engineers. Third, unlike other classification methods, with decision tree classifiers one is able to perform data-driven root cause analysis of faults: one can trace a path from the end state (e.g. blade damage) to the initiating event (e.g. wrong parameters in the control system), in a way that follows the sequence and chronology of how events are interlinked. Last, it has been shown that the accuracy of decision tree classifiers is comparable or superior to that of other models (especially for ensemble decision tree classifiers), and that they in fact display the best combination of error rate and speed (Lim, Loh, & Shih 1997, Hand 1997, Lim, Loh, & Shih 2000, Caruana & Niculescu-Mizil 2006).

A decision tree is a tree-structured classifier built by starting with a single node that encompasses the entire data and recursively splitting the data within a node, generally into two branches (some algorithms can perform multiway splits), by selecting the variable (dimension) that best classifies the samples according to a split criterion, i.e. the one that maximizes the information gain (Equation 1) among the random subsample of dimensions obtained at every point. The splitting continues until a terminal leaf is created that meets a stopping criterion, such as a minimum leaf size or a variance threshold. Each terminal leaf contains data belonging to one or more classes. Within this leaf, a model is applied that provides a fairly comprehensible prediction, especially in situations where many variables may exist that interact in a nonlinear manner, as is often the case on wind turbines (Carrasco Kind & Brunner 2013). Algorithm 1 shows pseudocode of a generic decision tree learning algorithm.
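The algorithm block itself does not survive in this version of the text; as a stand-in, the following is a minimal Python sketch of the generic greedy induction scheme just described, using the information entropy of Equation 2 below as the impurity measure (all function names and stopping parameters are ours, not the paper's Algorithm 1; in practice a library implementation would be used):

    from collections import Counter
    from math import log2

    def entropy(labels):
        # Information entropy of a list of class labels (cf. Equation 2).
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def best_split(X, y):
        # Scan every dimension and candidate threshold; return the split
        # that maximizes the information gain (cf. Equation 1).
        best_dim, best_thr, best_gain = None, None, 0.0
        for dim in range(len(X[0])):
            for thr in sorted({x[dim] for x in X}):
                left = [y[i] for i, x in enumerate(X) if x[dim] < thr]
                right = [y[i] for i, x in enumerate(X) if x[dim] >= thr]
                if not left or not right:
                    continue
                gain = entropy(y) - (len(left) * entropy(left)
                                     + len(right) * entropy(right)) / len(y)
                if gain > best_gain:
                    best_dim, best_thr, best_gain = dim, thr, gain
        return best_dim, best_thr, best_gain

    def grow_tree(X, y, depth=0, max_depth=10, min_leaf=5):
        # Stop on a pure, small or deep node: create a majority-class leaf.
        if len(set(y)) == 1 or len(y) < min_leaf or depth >= max_depth:
            return {"leaf": True, "label": Counter(y).most_common(1)[0][0]}
        dim, thr, gain = best_split(X, y)
        if gain <= 0.0:
            return {"leaf": True, "label": Counter(y).most_common(1)[0][0]}
        li = [i for i, x in enumerate(X) if x[dim] < thr]
        ri = [i for i, x in enumerate(X) if x[dim] >= thr]
        return {"leaf": False, "dim": dim, "threshold": thr,
                "left": grow_tree([X[i] for i in li], [y[i] for i in li],
                                  depth + 1, max_depth, min_leaf),
                "right": grow_tree([X[i] for i in ri], [y[i] for i in ri],
                                   depth + 1, max_depth, min_leaf)}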
Formally, the splitting is done by choosing the attribute that maximizes the Information Gain (IG), which is defined in terms of the impurity degree index:

IG(T, M) = I_d(T) - \sum_{m \in M} \frac{|T_m|}{|T|} I_d(T_m)    (1)

where T is the training data in a given node, M is one of the possible dimensions along which the node may be split, m are the possible values of M, |T| is the size of the training data, |T_m| is the number of objects in the subset m within the current node, and I_d is the function that represents the degree of impurity of the information. There are three standard methods to compute the impurity index I_d. The first method is to use the information entropy (H), which is defined by:

I_d(T) \equiv H(T) = -\sum_{i=1}^{n} f_i \log_2 f_i    (2)

where i is the class to be predicted, n is the number of possible classes, and f_i is the fraction of the training data belonging to class i. The second option is to measure the Gini impurity (G). In this case, a leaf is considered pure if all the data contained within it have the same class. The Gini impurity can be computed inside each node using the following simplified equation:
I_d(T) \equiv G(T) = 1 - \sum_{i=1}^{n} f_i^2    (3)

The third method is to use the classification error (CE):

I_d(T) \equiv CE(T) = 1 - \max_i f_i    (4)

where the maximum is taken over the fractions f_i of the data in T that have class i. Figure 2 shows the three impurity indices, for a node with data that are categorized into two classes, as a function of the fraction of the data having a specific class. If all of the data belong to one class, the impurity is zero. On the other hand, if half of the data have one class and the remaining data belong to the other class, the impurity is at its maximum (Carrasco Kind & Brunner 2013).

Figure 2. Impurity index I_d for a two-class example as a function of the probability of one of the classes, f_1, using the information entropy, Gini impurity and classification error. In all cases, the impurity is at its maximum when the fraction of data within a node with class 1 is 0.5, and zero when all data are in the same category.
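As a quick numerical check of this behaviour (a sketch of ours, not part of the original paper), the three indices of Equations 2–4 can be evaluated for a two-class node:

    from math import log2

    def entropy(fracs):
        # Equation 2: information entropy of the class fractions in a node.
        return -sum(f * log2(f) for f in fracs if f > 0)

    def gini(fracs):
        # Equation 3: Gini impurity.
        return 1 - sum(f * f for f in fracs)

    def class_error(fracs):
        # Equation 4: classification error.
        return 1 - max(fracs)

    # Two-class node: impurity as a function of the fraction f1 of class 1.
    for f1 in (0.0, 0.25, 0.5, 0.75, 1.0):
        fracs = (f1, 1 - f1)
        print(f1, round(entropy(fracs), 3), round(gini(fracs), 3),
              round(class_error(fracs), 3))
    # All three indices are zero at f1 = 0 or 1 (pure node), maximal at 0.5.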
There exist several algorithms for training decision trees from data, including ID3, C4.5, C5.0, J48, SPRINT, FACT, FIRM, SLIQ, CHAID, QUEST, CRUISE, PUBLIC, BOAT, RAINFOREST, MARS, RIPPER and CART. In the following we briefly mention some of the more common algorithms. Ross Quinlan developed ID3 (Quinlan 1986), which stands for Iterative Dichotomiser 3; its later iterations include C4.5 and C5.0. ID3 attempts to generate the smallest multiway tree. If the problem involves real-valued variables, they are first binned into intervals, each interval being treated as an unordered nominal attribute. C4.5 converts the trained trees into sets of if-then rules (Quinlan 1994) and improves on ID3 by allowing both discrete and continuous attributes, missing attribute values and attributes with differing costs; pruning of the trees to avoid over-fitting is usually applied, to improve the ability of the tree to generalise to unseen data. The accuracy of each rule is then evaluated, to determine the order in which the rules should be applied. C5.0 is essentially the same as C4.5, but uses less memory and builds smaller rule sets while being more accurate.

CART (Breiman, Friedman, Olshen, & Stone 1984), which stands for Classification and Regression Trees, is very similar to C4.5, but it differs in that it supports numerical target variables (regression) and constructs binary trees based on a numerical splitting criterion recursively applied to the data, instead of constructing rule sets.

According to (Lim, Loh, & Shih 2000), C4.5 and QUEST have the best combinations of error rate and speed, but C4.5 tends to produce trees with twice as many leaves as those from QUEST. QUEST is a binary-split decision tree algorithm for classification and regression (Loh & Shih 1997). It uses a significance test to select variables for splitting. An advantage of the QUEST algorithm is that it is not biased in split-variable selection, unlike CART, which is biased towards selecting split-variables that allow more splits and those that have more missing values.

CRUISE is one of the most accurate decision tree classifiers (Loh 2011) and is also capable of efficiently performing multiway splits. (Kim & Loh 2001) proposed CRUISE, which stands for Classification Rule with Unbiased Interaction Selection and Estimation; it splits each node into multiple subnodes, which precludes the use of greedy search methods. CRUISE is practically free of selection bias (Kim & Loh 2001) and is capable of integrating interactions between variables when growing the tree. CRUISE borrows and improves upon ideas from many sources, especially from FACT, QUEST and GUIDE for split selection, and from CART for pruning.

Finally, a word about RainForest, which is not a decision tree classifier per se, but rather a framework. RainForest was proposed to make decision tree construction more scalable (as was BOAT, which is in fact faster than RainForest by a factor of three while constructing exactly the same decision tree, and can handle a wide range of splitting criteria (Gehrke et al. 1999)). According to (Gehrke et al. 2000), a thorough examination of the algorithms in the literature (including C4.5, CART, CHAID, FACT, ID3 and extensions, SLIQ, SPRINT and QUEST) shows that the generic greedy induction schema can be refined to a generic RainForest tree induction schema.
In fact, most decision tree algorithms consider every attribute individually and need the distribution of class labels for each distinct value of an attribute in order to decide on the splitting criterion. RainForest is a comprehensive approach for decision tree classifiers that separates the scalability aspects of the algorithms for constructing a decision tree from the central features that determine the quality of the tree. RainForest concentrates on the growth phase of a decision tree, owing to the time-consuming nature of this step, and closes the gap between the main-memory limitations of the algorithms in the machine learning and statistics literature and the scalability requirements of a data mining environment (Singh & Sulekh 2017).

Next, we present a demonstration of decision tree classifier learning based on a real-world SCADA data set from the Lillgrund offshore wind farm.
3 DEMONSTRATION

Condition monitoring data from 48 wind turbines were collected over a period of 12 months, sampled every 10 minutes across 64 channels. In total, more than 2.5 million records were available, of which 980 excessive vibration error events were recorded across all wind turbines. The error event of interest in this demonstration is excessive structural vibration.

Data pre-processing. The first step prepares the data for the construction of the prediction tree classifier. Knowledge of the process is helpful in the elimination of parameters (features) that are not significant. The parameters recorded by the SCADA system can be grouped into the following categories: (i) system-related data (e.g. turbine number and index, time stamp), which are turbine specific and can therefore be excluded from the decision tree classifier training; (ii) operating performance parameters, such as power output, pitch and rotor speed; (iii) environmental parameters, such as wind speed and wind direction; (iv) temperature measurements for various components, such as the gearbox, bearings and generator; and finally (v) dynamics parameters, such as tower top accelerations. The attributes chosen in this demonstration are shown in Table 1. The table includes both original sensor attributes, such as the maximum generator rotational speed over a 10 min period (GRpm_max), and additional derived parameters, such as the difference between the max and min wind speed over a 10 min period (DMaxMinV).

Table 1. The attributes (features) used as input to train the decision tree classifier.

Attribute        Description
HSGenTmp         Mean temperature of gear bearing on generator side
Po_max           Max value of active power
DMaxMeanPow      Difference between max and mean active power
GRpm_max         Max generator RPM
DMaxMeanGRpm     Difference between max and mean generator RPM
Pi_min           Min collective pitch
DMaxMinV         Difference between max and min wind speed
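For illustration, derived attributes of this kind can be computed from the raw SCADA channels with a few lines of pandas; the raw column names ("Po", "GRpm", "V") and the file name below are our assumptions, not the actual Lillgrund channel names:

    import pandas as pd

    # Hypothetical high-rate SCADA log with a datetime column and raw
    # channels "Po" (active power), "GRpm" (generator RPM), "V" (wind speed).
    scada = pd.read_csv("scada_raw.csv", parse_dates=["timestamp"],
                        index_col="timestamp")

    # Aggregate to 10-minute statistics, mirroring the attributes of Table 1.
    agg = scada.groupby(pd.Grouper(freq="10min")).agg(
        Po_max=("Po", "max"), Po_mean=("Po", "mean"),
        GRpm_max=("GRpm", "max"), GRpm_mean=("GRpm", "mean"),
        V_max=("V", "max"), V_min=("V", "min"))

    # Derived attributes used to train the classifier.
    agg["DMaxMeanPow"] = agg["Po_max"] - agg["Po_mean"]
    agg["DMaxMeanGRpm"] = agg["GRpm_max"] - agg["GRpm_mean"]
    agg["DMaxMinV"] = agg["V_max"] - agg["V_min"]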

We open a small parenthesis here. As with most pattern recognition methods, tree-based classification methods work best if the proper features are selected to start with; preprocessing by data dimensionality reduction techniques, such as principal component analysis (PCA) or independent component analysis (ICA), or by optimal feature selection approaches, such as the wrapper approach integrated with genetic or best-first search algorithms, can be effective, because these methods find important axes to be used as guidelines for the selection of the features upon which a decision tree is trained.
However, in this demonstration we chose not to do so, in order to test the limits of a decision tree classifier for fault diagnostics. For instance, wind speed, power and RPM are strongly inter-dependent attributes, but for the same wind speed a wind turbine could be found operating at two different power output levels (during distinct time periods), or at two different generator RPM levels, and both cases are considered normal operating modes. How so? This happens very often when a wind turbine has to de-rate its power output following a demand from the grid side, or has to reduce the generator RPM under a specific noise or load control mode. The target variable TurbineState for classification is shown in Table 2. It can take two labels: NoFault, indicating that the wind turbine is producing electric power under normal operating conditions, and Vibr, indicating an excessive vibration fault resulting in the wind turbine shutting down and, furthermore, triggering a message to be sent to the vibration support technical team (in order for some corrective action to take place).

Table 2. Target variable TurbineState, defined in this demonstration according to two labels.

Class label   Description
NoFault       Normal operation, turbine is producing power, no faults
Vibr          Structural vibrations error resulting in: "Inform Vibration Support"
Bagged decision tree construction. The second step is the construction of the Bagged decision tree classifier. An ensemble of decision trees is often more accurate than any single tree classifier (Bauer & Kohavi 1999, Dietterich 1996). Bagging (Breiman 1996), boosting (Schapire 1990) and random forests are three popular methods of creating accurate ensembles. Previous research indicates that boosting is more prone to overfitting the training data (Freund & Schapire 1996, Opitz & Maclin 1999); consequently, the presence of noise causes a greater decrease in the performance of boosting. Therefore, this study uses bagging to create an ensemble of bagged decision tree classifiers (using the standard CART algorithm), so as to better address the noise in the condition monitoring data. Other decision tree algorithms and ensemble approaches will be investigated in future work. This technique can be summarized as: (i) take a bootstrap sample from the data set, (ii) train an un-pruned classification tree, and (iii) aggregate the trained tree classifiers. In more detail, bagging predictors comprise a method for generating multiple versions of a predictor, each on a random subset of the original dataset, and fusing these into a unique final aggregated predictor. This aggregated predictor can typically be used for reducing the variance of a black-box estimator, by introducing randomization into the construction procedure and forming an ensemble (for a proof refer to (Breiman 1996, Bühlmann 2012)). The bagging algorithm consists in (1) constructing a bootstrap sample (X_*^{(1)}, Y_*^{(1)}), ..., (X_*^{(n)}, Y_*^{(n)}) by randomly drawing n times with replacement from the original data (X^{(1)}, Y^{(1)}), ..., (X^{(n)}, Y^{(n)}); (2) computing the bootstrapped estimator (i.e. the tree classifier) \hat{g}_* = h_n((X_*^{(1)}, Y_*^{(1)}), ..., (X_*^{(n)}, Y_*^{(n)})), where the function h_n(\cdot) defines the estimator as a function of the data; and (3) repeating steps 1 and 2 M times, where M is often chosen as 50 or 100, yielding {\hat{g}_{*k}, k = 1, ..., M}, with the bagged estimator

\hat{g}_{bagg} = \frac{1}{M} \sum_{k=1}^{M} \hat{g}_{*k}.

In theory, M \to \infty, in which case the bagged estimator becomes

\hat{g}_{bagg} = E[\hat{g}_*]    (5)
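In practice, steps 1–3 are available off the shelf. The following sketch (ours) trains such a bag of un-pruned CART trees with scikit-learn, assuming a hypothetical 10-minute feature table holding the attributes of Table 1 and the TurbineState label of Table 2 (the file name and split settings are assumptions):

    import pandas as pd
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical 10-minute feature table (attributes of Table 1 plus the
    # TurbineState label of Table 2); the file name is an assumption.
    data = pd.read_csv("lillgrund_10min.csv")
    features = ["HSGenTmp", "Po_max", "DMaxMeanPow", "GRpm_max",
                "DMaxMeanGRpm", "Pi_min", "DMaxMinV"]
    X_tr, X_val, y_tr, y_val = train_test_split(
        data[features], data["TurbineState"],
        test_size=0.3, stratify=data["TurbineState"])

    # Bag of un-pruned CART trees: each tree is fit on a bootstrap resample
    # of the training data; predictions are aggregated by majority vote.
    bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                               bootstrap=True)
    bagged.fit(X_tr, y_tr)
    print("validation misclassification rate:",
          1.0 - bagged.score(X_val, y_val))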
In the machine learning and statistics literature, the two main performance measures for a classification tree algorithm are its predictive quality and the construction time of the resulting tree. In this paper, the predictive quality is given by the misclassification rate on the validation data set. As shown in Figure 4, the misclassification rate of the trained Bagged tree is less than 1% when the bag size is more than 10.

Figure 4. Misclassification rate of the validation set as a function of tree bag size.

Fault diagnostics (offline). The final step uses the newly generated decision tree classifier (e.g. Figure 3) to create diagnostics for individual target fault classes, i.e. Vibr. One aspect of fault diagnostics deals with offline root cause analysis, which we will demonstrate here. When diagnosing faults, we are interested in identifying the root causes (or sequences of events) that lead to a large portion of the overall abnormal behavior; the decision tree edges leading to faults then become root cause candidates. In the literature, one can find a limited number of proposed programmatic algorithms by which decision tree classifiers can be scanned/probed for root cause analysis (Solé, Muntés-Mulero, Rana, & Estrada 2017). One implementation is as follows (Zheng, Lloyd, & Brewer 2004):
1. Ignore the leaf nodes that correspond to normal operation (i.e. NoFault), as they will not be useful in diagnosing faults.
2. Identify, in the decision tree, all leaf nodes with the target fault class of interest, i.e. Vibr.
3. Ranking: list the leaf nodes with the target fault class, ranked by failure count, to prioritize their importance.
4. Noise filtering: in diagnostics, we are interested in identifying the root causes that result in a large proportion of the overall faults. Thus, we retain the leaf nodes accounting for more than a certain threshold of the total number of faults (e.g. threshold = 5%).
5. Extract the traces (paths) containing the target fault leaf node (C), and all edges (E) and internal nodes (N) up to the root node (R).
6. Extract the rules at each internal node of the traces.
7. Node merging: merge nodes on a path by eliminating ancestor nodes that are logically subsumed by successor nodes (a programmatic sketch of steps 1–5 follows below).
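The following is our illustrative sketch of steps 1–5 for a single fitted scikit-learn CART tree (e.g. one member of the bag, bagged.estimators_[0], with feature_names as in Table 1); rule merging (step 7) and the exact traversal used in the cited works are omitted:

    def fault_paths(clf, feature_names, fault_label="Vibr", min_frac=0.05):
        # Enumerate root-to-leaf paths of a fitted DecisionTreeClassifier
        # that predict the target fault class (steps 1, 2, 4, 5 simplified).
        t = clf.tree_
        fault_idx = list(clf.classes_).index(fault_label)
        total_faults = sum(
            t.n_node_samples[n] for n in range(t.node_count)
            if t.children_left[n] == -1
            and t.value[n, 0].argmax() == fault_idx)
        paths = []

        def walk(node, rules):
            if t.children_left[node] == -1:            # leaf node reached
                if t.value[node, 0].argmax() != fault_idx:
                    return                             # step 1: skip NoFault
                count = t.n_node_samples[node]         # failure count
                if count >= min_frac * total_faults:   # step 4: filtering
                    paths.append((count, rules))
                return
            name = feature_names[t.feature[node]]
            thr = t.threshold[node]
            walk(t.children_left[node], rules + ["%s <= %.2f" % (name, thr)])
            walk(t.children_right[node], rules + ["%s > %.2f" % (name, thr)])

        walk(0, [])                                    # step 5: trace paths
        return sorted(paths, reverse=True)             # step 3: rank leaves

The returned (count, rule-list) pairs can then be printed as traces of the kind shown next.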
Below is an example of an automated extraction of a sequential trace of events (root causes) from the trained decision tree classifier (see Figure 3), leading from the classified fault to the root of the tree:

Vibr ← 1503.6 <= GRpm_max < 1539.05 ← 110.455 <= DMaxMeanGRpm < 380.51 ← DMaxMinV >= 10.65

This sequential trace of events indicates that the fault Vibr can possibly occur when, over a period of 10 minutes, the maximum generator speed is in the range 1503.6–1539.05 RPM, the difference between the max and mean generator speed is in the range 110.455–380.51 RPM, and the difference between the max and min wind speed exceeds 10.65 m/s. Note that it is not possible to infer from the data the time interval during which the large change in wind speed occurred, but it could well be an indication of a gust or of excessive turbulence over a short period of time, resulting in the excessive vibration fault. Another example of an automated extraction of a sequential trace of events (root causes), leading from the classified fault to the root of the tree, is as follows:

Vibr ← Po_max < 2377 ← DMaxMinV >= 8.05 ← DMaxMeanGRpm >= 380.51

This sequential trace of events indicates that the fault Vibr can possibly occur when, over a period of 10 minutes, the maximum electric power output does not exceed 2377 kW, the difference between the max and mean generator speed exceeds 380.51 RPM, and the difference between the max and min wind speed exceeds 8.05 m/s.

This approach to data-driven root cause analysis elegantly elucidates the traces of events that lead to a fault. These traces can subsequently be used by an engineer to design simulation scenarios to try to replicate the faults and to propose mitigating actions.

Figure 3. Part of the decision tree classifier based on the SCADA data from 48 wind turbines.

4 OUTLOOK: BIG DATA BASED MONITORING AND DIAGNOSTICS FRAMEWORK

So far, we have demonstrated how offline root cause analysis can be conducted via ensemble-based Bagged decision tree learning from the SCADA data of 48 wind turbines, based on the CART algorithm. In this section we summarize an outlook for a monitoring and diagnostics framework that would perform on big data and in real time. The intent is for this framework to be deployed on a cloud, such as Azure, and to scale as the volume of streamed data increases. Figure 5 shows an overview of the architecture of the framework. The main features of the framework include:
Figure 5. Graphical summary of the proposed framework: decision tree learning with big data for fault diagnostics.

• Hardware-software package, able to acquire and fuse in real time both condition monitoring (CM) data (e.g. electric power output, rotor speed) and specialized structural health monitoring (SHM) data (e.g. tower accelerations, strains on the blades) from WT components (e.g. blades, gearbox, tower, etc.).
• Computational core: an object-oriented decision tree learning algorithm, Bayesian network (BN) based computation of probabilities, and a root-cause discovery algorithm (as demonstrated in the previous section).
• Distributed cloud-based data storage and analytics for the high-rate, real-time data streaming from the measuring units on the WTs, which interfaces with the real-time decision-tree software unit. Hence, the toolkit does not need to save data and carry out heavy computations on the remote WT.
• Online user interface to visualize the output from the decision tree (decision support tool).

In the following, we elaborate further on (i) the cloud computing, storage and analytics aspects, and (ii) the object-oriented decision tree learning aspects.

i. The fusion of the large bulk of information made available from the condition and structural health monitoring systems constitutes a Big Data problem, as large amounts of data are continuously sampled at high and diverse rates from the WTs within a wind farm. Thus, the integration of distributed cloud-based data storage and analytics is a natural fit, with a consequent reduction in the cost of handling and manipulating said data (see Figure 6). In the proposed framework, large-scale storage of historical data from WTs is done via the Hadoop Distributed File System (HDFS). HDFS is a distributed file system that is fault tolerant and can load large volumes of data, in a distributed manner, even if errors occur. Telemetry and data acquisition are supported by Apache Kafka, a distributed streaming platform that aims to provide a unified, high-throughput, low-latency methodology for handling real-time data feeds. Kafka is a fault-tolerant system, meaning that if any error occurs, published data remain available even if the application or Kafka is stopped, goes offline and is then restarted. Thus, Kafka can be guaranteed not to miss any data issued from the wind turbines. Finally, for data processing purposes, Apache Spark is used. Spark is a fast, general, in-memory engine for large-scale data processing. In addition, Spark can scale to thousands of machines in a fault-tolerant manner. These characteristics make Apache Spark very suitable for processing and analyzing the large volumes of data that are gathered from the wind turbines (Canizo, Onieva, Conde, Charramendieta, & Trujillo 2017).
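To give a flavour of this ingestion chain, the following is a minimal PySpark sketch (ours, not from the cited implementation) that subscribes to a Kafka topic of turbine telemetry and persists the raw feed to HDFS for later training of the decision trees; the broker address, topic name and storage paths are assumptions:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("wt-telemetry-ingest").getOrCreate()

    # Subscribe to the (hypothetical) telemetry topic published by the WTs.
    telemetry = (spark.readStream
                 .format("kafka")
                 .option("kafka.bootstrap.servers", "broker:9092")
                 .option("subscribe", "wt-telemetry")
                 .load())

    # Kafka records arrive as binary key/value pairs; keep payload as text.
    records = telemetry.selectExpr("CAST(value AS STRING) AS record",
                                   "timestamp")

    # Persist the raw feed to HDFS for batch training of the classifiers.
    query = (records.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/wt/telemetry")
             .option("checkpointLocation", "hdfs:///chk/wt-telemetry")
             .start())
    query.awaitTermination()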
Figure 6. Cloud computing and storage.

ii. A priori decision trees are built first, to serve the fundamental diagnostic tasks, by combining engineering knowledge of the system, failure modes and domain knowledge. Dynamic decision tree classifiers are then trained over time from wind turbine telemetry (CM & SHM data). The limitation of traditional decision tree classifiers appears when a component displays behaviour with feedback (i.e., after a repair/update in the system) or for evolving systems (e.g. when new sensors are integrated, or as the system ages), which implies a need to establish several decision trees based on the possible orderings of the events, or based on new initiating events. One way around this is an innovation that we introduce in this framework in the form of object-oriented decision tree learning (Wyss & Durán 2001). To this end, a WT is viewed as a multi-layered system of objects (e.g. structure, controller, actuator, etc.) that are defined on the basis of abstract super-classes, attributed with specific properties and methods. The decision tree classifiers are further mapped into Bayesian Networks for further assessment of the conditional probabilities (Bearfield & Marsh 2005, Janssens, Wets, Brijs, Vanhoof, Arentze, & Timmermans 2006). The integration of the object-oriented decision tree learning and the BN delivers an updated probability of occurrence associated with each event and end state, which would form a solid indicator of the risk of future failure of any given component.

5 CONCLUSIONS

We presented a review of several decision tree classification algorithms. We then demonstrated how data-driven and automated root cause analysis can be conducted via ensemble-based Bagged decision tree learning from the SCADA data of 48 wind turbines, based on the CART algorithm. Root cause analysis is here cast in the sense of programmatically discovering the sequences of events (paths) leading from a classified fault at a leaf all the way to the root of the decision tree classifier. These traces can subsequently be used by an engineer to design simulation scenarios to try to replicate the faults and to propose mitigating actions. Finally, we briefly presented an architecture for a monitoring and diagnostics framework that would perform on big data. In particular, we highlighted the need for cloud-based storage and computing, and an innovative approach based on object-oriented decision tree learning that extends the traditional decision tree classifier concept. In the future, more concrete implementations and results of the proposed framework will be disseminated.

ACKNOWLEDGEMENTS

The authors acknowledge the support of the European Research Council via the ERC Starting Grant WINDMIL (ERC-2015-StG #679843) on the topic of Smart Monitoring, Inspection and Life-Cycle Assessment of Wind Turbines. We would like to thank Vattenfall AB for providing access to the SCADA data from the Lillgrund offshore wind farm.

REFERENCES

Abdallah, I., V. Dertimanis, & E. Chatzi (2017). Data-driven identification, classification and update of decision trees for monitoring and diagnostics of wind turbines. In 2nd ECCOMAS Thematic Conference, UNCECOMP, Rhodes Island, Greece.
Bauer, E. & R. Kohavi (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Mach. Learn. 36, 105–139.
Bearfield, G. & W. Marsh (2005). Generalising event trees using Bayesian networks with a case study of train derailment. In International Conf. on Comp. Saf., Reli., and Sec., SAFECOMP, Fredrikstad, Norway, pp. 52–66.
Breiman, L. (1996). Bagging predictors. Mach. Learn. 24, 123–140.
Breiman, L., J. Friedman, R. Olshen, & C. Stone (1984). Classification and Regression Trees. Wadsworth Inc.
Bühlmann, P. (2012). Bagging, boosting and ensemble methods. In J. Gentle and Y. Mori (Eds.), Handbook of Computational Statistics, Chapter 33, pp. 985–1022. Springer.
Canizo, M., E. Onieva, A. Conde, S. Charramendieta, & S. Trujillo (2017). Real-time predictive maintenance for wind turbines using big data frameworks. In Proc. of the IEEE Int. Conf. on Prognostics and Health Management, Allen, TX, USA, pp. 70–77.
Carrasco Kind, M. & R. Brunner (2013). TPZ: Photometric redshift PDFs and ancillary information by using prediction trees and random forests. Monthly Notices of the Royal Astronomical Society 432, 1483–1501.
Caruana, R. & A. Niculescu-Mizil (2006). An empirical comparison of supervised learning algorithms. In Proc. of the 23rd Intern. Conf. on Machine Learning, Pittsburgh, Pennsylvania, USA, pp. 161–168.
Dietterich, T. (1996). Applying the weak learning framework to understand and improve C4.5. In Proc. 13th International Conference on Machine Learning, pp. 96–104.
Freund, Y. & R. Schapire (1996). Experiments with a new boosting algorithm. In Proc. 13th International Conference on Machine Learning, pp. 148–156.
Gehrke, J., R. Ramakrishnan, & V. Ganti (2000). RainForest—A framework for fast decision tree construction of large datasets. Data Min. Knowl. Discov. 4(2/3), 127–162.
Gehrke, J., V. Ganti, R. Ramakrishnan, & W. Loh (1999). BOAT—Optimistic decision tree construction. In Proc. of the 1999 ACM SIGMOD Intern. Conf. on Management of Data, Stockholm, Sweden, pp. 169–180.
Grasse, F., V. Trappe, S. Thoens, & S. Said (2011). Structural health monitoring of wind turbine blades by strain measurement and vibration analysis. In Proc. of the 8th International Conference on Structural Dynamics (EURODYN), Leuven, Belgium.
Hameed, Z., Y.S. Hong, S.H. Ahn, & C.K. Song (2009). Condition monitoring and fault detection of wind turbines and related algorithms: A review. Renewable and Sustainable Energy Reviews 13, 1–39.
Hand, D. (1997). Construction and Assessment of Classification Rules. Chichester: John Wiley and Sons.
Janssens, D., G. Wets, T. Brijs, K. Vanhoof, T. Arentze, & H. Timmermans (2006). Integrating Bayesian networks and decision trees in a sequential rule-based transportation model. Euro. J. Oper. Res. 175(1), 16–34.
Kim, H. & W. Loh (2001). Classification trees with unbiased multiway splits. J. Amer. Statist. Assoc. 96, 598–604.
Lim, T., W. Loh, & Y. Shih (1997). An empirical comparison of decision trees and other classification methods. Technical Report TR 979, Department of Statistics, UW Madison.
Lim, T., W. Loh, & Y. Shih (2000). A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning 40, 203–228.
Loh, W. & Y. Shih (1997). Split selection methods for classification trees. Statistica Sinica 7, 815–840.
Loh, W.Y. (2011). Classification and regression trees. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 1, 14–23.
Opitz, D. & R. Maclin (1999). Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research 11, 169–198.
Quinlan, J.R. (1986). Induction of decision trees. Machine Learning 1, 81–106.
Quinlan, J.R. (1994). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers.
Schapire, R. (1990). The strength of weak learnability. Mach. Learn. 5, 197–227.
Singh, K. & R. Sulekh (2017). The comparison of various decision tree algorithms for data analysis. International Journal of Engineering and Computer Science 6, 21557–21562.
Solé, M., V. Muntés-Mulero, A.I. Rana, & G. Estrada (2017). Survey on models and techniques for root-cause analysis. ArXiv e-prints.
Wyss, G.D. & F.A. Durán (2001). OBEST: The Object Based Event Scenario Tree methodology. Technical Report SAND2001-0828, Sandia National Laboratories, Albuquerque, NM, USA.
Yang, B. & D. Sun (2013). Testing, inspecting and monitoring technologies for wind turbine blades: A survey. Renewable and Sustainable Energy Reviews 22, 515–526.
Zheng, A.X., J. Lloyd, & E. Brewer (2004). Failure diagnosis using decision trees. In Proc. of the 1st Intern. Conf. on Autonomic Computing, ICAC, Washington, DC, USA, pp. 36–43.
