
Renewable Energy 174 (2021) 218–235

Contents lists available at ScienceDirect

Renewable Energy
journal homepage: www.elsevier.com/locate/renene

Sequence-based modeling of deep learning with LSTM and GRU networks for structural damage detection of floating offshore wind turbine blades

Do-Eun Choe a,*, Hyoung-Chul Kim b, Moo-Hyun Kim c
a Dept. of Civil Eng., New Mexico State Univ., 3035 South Espina St., Las Cruces, NM, 88003, USA
b Center for Energy and Environmental Sustainability (CEES), Prairie View A&M Univ., Roy G. Perry College of Engineering, Prairie View, TX, 77446, USA
c Dept. of Ocean Eng., Texas A&M Univ., 727 Ross Street, College Station, TX, 77843, USA

ARTICLE INFO

Article history:
Received 27 October 2020
Received in revised form 7 March 2021
Accepted 6 April 2021
Available online 16 April 2021

Keywords:
Offshore wind energy
Machine learning
Deep learning
Long short-term memory
Gated recurrent unit

ABSTRACT

This paper proposes and tests a sequence-based modeling of deep learning (DL) for structural damage detection of floating offshore wind turbine (FOWT) blades using Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) neural networks. The complete framework was developed with four different designs of deep networks using unidirectional or bidirectional layers of LSTM and GRU networks. These neural networks, specifically developed to learn long-term and short-term dependencies within sequential information such as time-series data, are successfully trained with the sensor signals of damaged FOWTs. The sensor data were simulated, due to the limited availability of field data from damaged FOWTs, using multiple computational methods previously validated with experimental tests. The simulations accounted for damage scenarios with various intensities, locations, and damage shapes, totaling 1320 damage scenarios. Both the presence of damage and its location were detected up to an accuracy of 94.8% using the best performing model of the selected network when tested on independent signals. The K-fold cross-validation accuracy of the selected network is estimated to be 91.7%. The presence of damage itself was detected with an accuracy of 99.9% based on the cross-validation, regardless of the damage location. Structural damage detection using deep learning is not restricted by assumptions about the system or the environmental conditions, as the networks learn the system directly from the data. The framework can be applied to various types of civil and offshore structures. Furthermore, sequence-based modeling enables engineers to harness vast amounts of digital information to improve the safety of structures.

Published by Elsevier Ltd.

* Corresponding author.
E-mail addresses: dchoe@nmsu.edu (D.-E. Choe), khc4156@gmail.com (H.-C. Kim), m-kim3@tamu.edu (M.-H. Kim).
https://doi.org/10.1016/j.renene.2021.04.025
0960-1481/Published by Elsevier Ltd.

1. Introduction

The rapid increase in demand for clean and sustainable energy has promoted the development of technology and the commercialization of the floating offshore wind turbine (FOWT). The development of FOWT aims to harvest larger amounts of energy from deeper water than the current fixed-base offshore wind turbine. Its various advantages over the on-land or fixed-base offshore wind turbine have made FOWT more appealing than ever before, particularly with the successful commissioning of projects such as WindFloat in Portugal [1] and Hywind in Scotland [2,3]. However, FOWT is in the infancy stage of commercialization. This is due to challenges that complicate the predictive outcomes for its highly nonlinear system, which involves the uncertainties inherent in the system, environment, and operational conditions. The uncertainties inherent in the floating system include larger inertia loading caused by motions, potential instability caused by blade pitch control, and potentially high fluctuations of power output due to motions. The modeling uncertainties increase as the hydrodynamics of the floating structure and the turbine aerodynamics are coupled in the system. In addition, the uncertainties during operation and monitoring increase due to the longer physical distance of FOWT from the coast compared to the current fixed-base offshore wind turbine. These disadvantages result in higher operation and maintenance costs [4,5]. Structural Health Monitoring (SHM) has been discussed as a vital option that may significantly
improve the safety and the cost-efficiency of FOWT systems, whose service lives are longer than those of typical offshore structures [6]. By continuously monitoring the sensor signals from the tower or the blade, the on-site maintenance and inspection schedule can be optimized [7]. Early detection of structural damage could prevent catastrophic failures and secondary damage leading to exorbitant repair costs [8].

SHM methods based on modal properties as damage-sensitive features have been suggested considering the characteristics of FOWT structures. The structural members of FOWT are typically slender, e.g., blades and towers, and experience highly nonlinear environmental conditions including extreme winds and waves [9,10]. However, Tcherniak et al. [11] reported that the application of modal-property-based SHM methods to FOWT structures is limited due to several violations of the assumptions of modal-property-based methods in the FOWT system and environment.

Several non-modal-property-based methods have recently been developed for wind turbine applications. Dervilis et al. [12] investigated SHM of wind turbine blades using the Artificial Neural Network (ANN), which is a traditional shallow, non-deep-learning neural network technique. It was reported that the damage could be detected before a visible crack in the structure was observed. Häckell et al. and Tsiapoki et al. [6,13] introduced and validated a three-tier modular SHM framework with real field wind turbine data. They analyzed training and testing data using a combination of machine learning, damage parameters, and probabilistic modeling to determine whether damage was present for on-land wind turbines. Wang et al. [14] introduced a deep autoencoder for identifying impending wind turbine blade breakages based on supervisory control and data acquisition data collected from four wind farms located in China. However, these methods have been developed for on-land turbines and may not fully consider the nature of FOWT behavior, such as the dynamic coupling of the blades' aerodynamics and the hydrodynamics involved in floating body motion. Therefore, reliable SHM methods for FOWT are critically needed for the safe, reliable, and cost-effective maintenance of FOWT systems. In an effort to develop a reliable SHM method, we propose a sequence-based modeling of Deep Learning (DL) using Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. This approach has not previously been applied to this problem in structural engineering, but its potential for providing multiple advantages is high.

DL is a subarea of Machine Learning (ML), which is in turn a subarea of Artificial Intelligence (AI). DL uses deep layers of neural networks to learn the system directly from the data, which has drastically improved performance over traditional shallow neural networks. The traditional neural network, typically called the Artificial Neural Network (ANN) [15], is a single-hidden-layer neural network that feeds the input only forward. Multi-layered neural networks, i.e., deep learning, enable fine tuning of the associated parameters of the network by back propagation as well as by increasing the model complexity with deeper layers [16]. In addition, the traditional neural network cannot learn from sequential data as an input. The Recurrent Neural Network (RNN) is a deep learning method that learns from sequential data. The recent development of the LSTM neural network, a type of RNN, improved the learning performance by introducing the "forget gate" [17].

The advantage of DL methods for the maintenance of FOWT is that they are less restricted by assumptions about the system and the environmental conditions, because the deep network learns the system directly from the data. The modal-property-based methods carry various assumptions for their applications, whereas the DL method does not. For instance, Operational Modal Analysis (OMA) [18] assumes the system to be linear, stationary, and observable to be applicable. The substructure of FOWT is a floating system rather than a stationary one. In addition, the coupled dynamics of the hydrodynamics of the floating body and the turbine aerodynamics are highly nonlinear, which does not respect the linearity assumption of OMA. The suggested DL method may potentially be applicable to various types of future structures with dissimilar environmental and mechanical conditions.

Besides the specific DL applications, various data-driven modeling methods have been applied to solve wind energy problems [19–24]. Data-driven modeling is a broader concept than ML and refers to methods that are informed by data rather than by the theories behind the system, encompassing ML, ANN, and DL methods that utilize multiple layers of neural networks.

Both LSTM and GRU networks are types of Recurrent Neural Networks (RNNs) that are specifically developed to learn long-term and short-term relationships within sequence data, often used for text recognition, weather prediction, or medical detection. These networks were chosen to design the deep networks and train on the sensor signals of FOWT in our research. It has been reported that it is practically difficult to train the traditional RNN, as the gradients tend to either vanish or diverge [25]. The LSTM neural network was first introduced by Hochreiter & Schmidhuber [17] and was followed by many variations of the algorithm. The LSTM algorithm introduced a few unique functions, including the "forget gate", which allows the network to efficiently capture sequential dependency and differentiated LSTM from traditional RNNs. The GRU neural network was developed more recently by Cho et al. [26]. It has been rapidly adopted due to its simpler structure and faster training time. Overall, it has demonstrated comparable or superior performance to the LSTM network. Bahdanau et al. [27] reported that LSTM and GRU units performed comparably to each other based on their machine translation experiments; however, evidence showing that this applies to other research areas is lacking. Chung et al. [28] tested both LSTM and GRU for polyphonic music and speech signals, where the GRU performed slightly better than but comparably to LSTM. Other than this, reports addressing the relative performance of LSTM and GRU are scarce.

In this paper, the sequence-based modeling of DL is proposed and tested for structural damage detection of FOWT blades. The following section describes basic properties of the FOWT and the numerical modeling of the FOWT system and various damage scenarios. The next section describes the principle of the primary unit of the LSTM and GRU networks for sequence-based modeling, including conceptual and mathematical details of how the LSTM and GRU networks learn long-term dependency from sequential data. The section entitled Framework provides the details and visual representations of the entire DL process within its subsections. The last two sections of the paper present the results of the damage detection of FOWT blades using the proposed deep networks and provide a conclusion for our research findings.

2. Structural damage of FOWT blades

2.1. Simulation of OC4-DeepCwind semi-submersible FOWT

In this study, the OC4-DeepCwind semi-submersible FOWT [29] with the NREL 5 MW baseline turbine is used, as shown in Fig. 1. The turbine is a three-bladed semi-submersible-type FOWT with a capacity of 5 MW. Each blade length is 61.5 m with a hub diameter of 3 m, which makes the rotor diameter 126 m. The turbine is designed to operate between 3 m/s and 25 m/s for its cut-in and cut-off wind speeds, and the rated tip speed is 80 m/s. The hub height is 90 m, and the center of mass of the turbine is at (0.2 m, 0.0 m, 64.0 m) in the three global coordinate directions. The masses of the rotor, nacelle, and tower are measured to be 110,000 kg, 240,999 kg, and 347,460 kg, respectively. The detailed dimensions, parameters, and properties of the turbine, platform, and mooring are given in Kim et al. [10].

Fig. 1. DeepCWind OC4 semi-submersible FOWT: Simulation model (left) and actual model testing [30] (right).

To ensure that the simulated data represent the responses of the actual wind turbine, the floating platform, and the mooring system, two software packages, FAST [31] and OrcaFlex [32], are coupled with additional sets of connected software. Fig. 2 shows the structure of the simulation software. The integration of the set of software, FAST and OrcaFlex, has been verified with actual testing data [30,33,34]. In this coupled model, OrcaFlex calculates the hydrodynamics of the substructures, passing mooring tension, hydrodynamic coefficients, and hydrodynamic forces to FAST. FAST calculates the displacements and velocities of the platform and passes them back into OrcaFlex at each time step. For the mooring dynamics in the time-domain analysis, the lumped mass element method was used. A detailed description of the coupled process between OrcaFlex and FAST can be found in Masciola et al. [32]. The hydrodynamic coefficients, such as radiation damping and added mass, and the first- and second-order wave excitation forces are obtained based on potential theory in the frequency domain by WAMIT [35] and subsequently entered into the time-domain analysis. The effects of the radiation damping force from the added mass and damping coefficients are included using the impulse response function in the form of a convolution integral. Also, the viscous drag force is added using the Morison equation for the pontoons and platform columns.

The dynamics due to the elasticity of the tower and blade in the time-domain simulation are implemented using ElastoDyn, a subroutine of FAST. In ElastoDyn, the blade model is a straight and

Fig. 2. Simulation software structures.


isotropic cantilever beam structure with distributed mass and bending stiffness. The blade mode shapes are pre-calculated using finite element method (FEM) code, specified as polynomial coefficients, and entered into the FAST time-domain simulation. BModes [36] is the FEM program code that provides dynamically coupled modes for a beam. The mode-shape polynomial equations are used as shape functions in a nonlinear beam model according to the Rayleigh-Ritz method. Deflections of the beam's node points are calculated using a linear summation of normal mode shapes. In this study, the first and second flap modes and the first edge mode for the blade, and the first two fore-aft modes and the side-to-side modes for the tower are included [37]. The lateral deformation is expressed as

u(z, t) = Σ_{a=1}^{n} φ_a(z) q_a(t)    (1)

where u(z, t) is the lateral deformation at time t and location z, and φ_a(z) and q_a(t) are the normal mode shape and the generalized coordinate for normal mode a, respectively.

The full-field wind is generated using the TurbSim program [38], a stochastic, full-field, turbulent-wind simulator that integrates various wind spectra and turbulence models. For the implementation of turbulence, the Kaimal turbulence model is used. The Kaimal model is defined in IEC 61400-1 2nd ed. [39] and 3rd ed. [40] and is used for generating the full-field wind file or hub-height wind file, including wind turbulence, in TurbSim [38], which is the pre-processor program of FAST-OrcaFlex. In this study, the accelerations in the fore-aft and side-to-side directions are obtained from the numerical sensors, and the acceleration results are used for the structural damage detection. For simplicity, only the parking condition (blade-fixed) is considered, and blade rotation and control are not considered. Nineteen numerical sensors per blade are evenly distributed along the blade length, and twenty sensors at the tower are assumed. In this paper, accelerometers are used to measure vibration; they have been widely used in long structures such as bridges [41–44], offshore structures [45], and wind turbine blades and towers [9,46–49] for industrial and research purposes. An accelerometer must be properly installed so that all directions of interest can be measured. For example, to measure the three axes of acceleration in the flap, edge, and axial directions of the wind blade, three single-axis accelerometers must be installed, one for each direction. If a multi-axis accelerometer is used, one accelerometer is enough to measure all the directions of acceleration at the specific blade length.

2.2. Damage modeling

In the present study, the damaged condition is modeled by reducing the bending stiffness and the mass of the damaged materials [8,50,51]. During the simulations of the FEM analyses and time-domain analyses, the distributions of mass and stiffness of the damaged blades were entered along the blade length. The bending stiffness is represented by the product of Young's modulus (E) and the area moment of inertia of the beam cross-section (I). The undamaged properties of mass and bending stiffness are given in Jonkman et al. [29].

The damaged condition is modeled by various damage locations, shapes, and intensities. The blade model of FAST is divided into ten blade segments. This study aims to use DL to find the damage location in one of the ten segments. To create the various damage conditions, the blade structure of each segment is divided into two elements at 19 distinct locations (Fig. 3). At each damage location, six different levels of stiffness reduction (5%, 10%, 15%, 20%, 25%, and 30%) and the corresponding mass reductions are modeled. At each damage location and each damage intensity, ten different damage shapes, defined by the edge-side width of the damaged blade, w, are modeled (Fig. 3). The top right picture in Fig. 3 shows an example damage model where the damage is located at 0.46 on the normalized scale, where 1 represents the blade tip, with a width w of 0.006 (0.006 × blade length of 61.5 m). A total of 1200 damage scenarios are simulated to be used for structural damage detection at ten locations. The damage location is expressed as an integer from 1 to 10 so that the blade root element and the blade tip element are 1 and 10, respectively, as shown in Fig. 3. The reduced stiffness and mass distributions of the damaged materials replaced the non-damaged values in the simulation model. To maintain the same number of undamaged FOWT cases, 120 additional simulations were performed with randomly generated wind speeds and turbulence, using a coefficient of variation of 0.25 for wind speed and 0.125 for wind turbulence.

3. Principle of sequence-based modeling for damage detection of FOWT

To model and predict the structural damage status of FOWT blades, we used both LSTM and GRU neural networks. In this section, the principle of the sequence-based modeling is presented through the primary units of the LSTM and GRU neural networks.

The goal of the prediction in the sequence-based modeling of this research is to identify the damage status Y = {y_1, y_2, …, y_11}, indicating the undamaged status (y_1) or one of ten damage locations (y_2 to y_11), based on the collected signals of the numerical accelerometers X, located at multiple (n) points of the wind turbine tower and blades of the FOWT.

In sequence-based modeling, the model predicts y_i based on n sets of sequential data x_i, which would typically be n sets of numerical values in traditional modeling. In other words, instead of the numerical values of the input sets x, the long-term or short-term relationships within the sets of sequence vectors x become the basis of the predictions. In this study, an input x_i is represented by an [n × t] matrix consisting of n sets of sensor signals recorded for a length of t. The training input X can be represented by X = {x_1, x_2, …, x_i, …, x_m}, where m is the total number of observations. The number of sensors, n, is also referred to as the number of features in DL. It is ideal to maintain identical time steps, t, for all observations x_1 through x_m. If not, the timestep variation of the data sets within each batch must be minimized to reduce the amount of padding or truncation of the data.

Both LSTM and GRU neural networks are types of RNNs and are known to capture long-term dependencies. In RNNs, the models are trained to maximize the conditional log-likelihood function of [26]:

max_θ (1/m) Σ_{i=1}^{m} log P_θ(y_i | x_i)    (2)

where θ is the set of unknown parameters. The conditional probability that y_i is classified as y, given the set of input x_i = {x_{1,1,i}, x_{1,2,i}, …, x_{1,t,i}, x_{2,1,i}, x_{2,2,i}, …, x_{n,t,i}}, is defined as:

P(y_i = y | x_i) = g(h_t, c_t)    (3)

where h_t = f(h_{t-1}, x_t, c) for GRU or h_t = f(c_t(h_{t-1}, x_t)) for LSTM. The hidden state, h_t, and the cell state, c_t, with the associated parameters, θ, are to be trained using the already-known training data sets X–Y, which is the main goal of the DL modeling. Fig. 4 shows the primary units of the LSTM and GRU cells, which repeat their process for each data timestep to train the cell and hidden states.

Fig. 3. Illustration of damage location index and sample damage modeling.
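As bookkeeping for the scenario grid described in Section 2.2, the 1200 damage cases can be enumerated as the product of location, element, stiffness reduction, and damage shape. The factorization into two elements per segment is our reading of the text, and all names are illustrative, not the authors' code.

```python
from itertools import product

SEGMENTS = range(1, 11)        # damage location index: 1 = blade root, 10 = tip
ELEMENTS = (1, 2)              # each segment is divided into two elements
REDUCTIONS = (0.05, 0.10, 0.15, 0.20, 0.25, 0.30)  # stiffness reduction levels
SHAPES = range(1, 11)          # ten damage widths w per location/intensity

scenarios = [
    {"segment": s, "element": e, "reduction": r, "shape": w}
    for s, e, r, w in product(SEGMENTS, ELEMENTS, REDUCTIONS, SHAPES)
]
assert len(scenarios) == 1200  # matches the 1200 simulated damage scenarios

def damaged_EI(EI_undamaged, reduction):
    """Bending stiffness EI of a damaged element, reduced by `reduction`."""
    return EI_undamaged * (1.0 - reduction)

assert damaged_EI(100.0, 0.30) == 70.0
```

Each enumerated case corresponds to one time-domain simulation with the reduced EI and mass distributions entered along the blade length.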

The principal differentiator between the LSTM network and the traditional RNN is that a few gates, such as the 'forget gate', are used at each timestep to control the passing of information along the sequence, which captures long-term dependencies with higher accuracy [17]. The diagram on the left side of Fig. 4 shows the step-by-step prediction mechanism of an LSTM unit. Upon the input of each data x_t, the recurrent gate, also called the forget gate, first decides how much of the information to drop from the previous cell state, c_{t-1}. The new cell state, c_t, is then updated based on the modified previous cell state, f_t ⊙ c_{t-1}, the new input gate, i_t, and the cell candidate, g_t. The new hidden state, h_t, is updated based on the new cell state, c_t, and the output gate, o_t. The gates can be represented as follows [52,53]:

c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(g_t)    (4a)

h_t = o_t ⊙ tanh(c_t)    (4b)

f_t = σ(W_f x_t + V_f h_{t-1} + b_f)    (4c)

i_t = σ(W_i x_t + V_i h_{t-1} + b_i)    (4d)

g_t = σ(W_c x_t + V_c h_{t-1} + b_c)    (4e)

o_t = σ(W_o x_t + V_o h_{t-1} + b_o)    (4f)

where W, V, and b are the unknown model parameters θ = {W, V, b}; W are the matrices of input weights, V are the matrices of recurrent weights, b are the bias vectors, σ represents the logistic sigmoid function, σ(x) = (1 + e^{-x})^{-1}, and ⊙ represents the element-wise product.

The GRU neural network was first introduced by Cho et al. [26]. The cell structure is presented on the right side of Fig. 4. The GRU unit includes two separate gates similar to the forget gate of LSTM, the update gate, z_t, and the reset gate, r_t, while the overall structure is simpler than LSTM. The reset gate, r_t, decides how much information to drop from the previous information and creates a state candidate, h̃_t, with the given input data x_t. For instance, if the reset gate is closed to 0, the entire information from the previous steps is ignored. The update gate, z_t, decides how much information from the previous hidden state to keep. The update gate will be mostly active with data that show long-term dependency, while the reset gate will be active with a short-term dependency of the data. The hidden state is updated by the state candidate and the update gate. After the end of the sequence of the data, the summary of the input, c, is calculated. The gates can be represented as follows [28]:

h_t = z_t ⊙ h_{t-1} + (1 − z_t) ⊙ h̃_t    (5a)

h̃_t = tanh(W_c x_t + V_c (r_t ⊙ h_{t-1}))    (5b)

r_t = σ(W_r x_t + V_r h_{t-1})    (5c)

z_t = σ(W_z x_t + V_z h_{t-1})    (5d)
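Eqs. (4a)–(4f) and (5a)–(5d) can be exercised with a minimal NumPy sketch. The weights below are random and the sizes illustrative; this mirrors the notation above (including the sigmoid on the cell candidate g_t as written in Eq. (4e)) and is not the trained network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, V, b):
    """One LSTM timestep per Eqs. (4a)-(4f); W, V, b are dicts keyed by gate."""
    f_t = sigmoid(W["f"] @ x_t + V["f"] @ h_prev + b["f"])  # forget gate (4c)
    i_t = sigmoid(W["i"] @ x_t + V["i"] @ h_prev + b["i"])  # input gate (4d)
    g_t = sigmoid(W["c"] @ x_t + V["c"] @ h_prev + b["c"])  # cell candidate (4e)
    o_t = sigmoid(W["o"] @ x_t + V["o"] @ h_prev + b["o"])  # output gate (4f)
    c_t = f_t * c_prev + i_t * np.tanh(g_t)                 # cell state (4a)
    h_t = o_t * np.tanh(c_t)                                # hidden state (4b)
    return h_t, c_t

def gru_step(x_t, h_prev, W, V):
    """One GRU timestep per Eqs. (5a)-(5d)."""
    r_t = sigmoid(W["r"] @ x_t + V["r"] @ h_prev)           # reset gate (5c)
    z_t = sigmoid(W["z"] @ x_t + V["z"] @ h_prev)           # update gate (5d)
    h_tilde = np.tanh(W["c"] @ x_t + V["c"] @ (r_t * h_prev))  # candidate (5b)
    return z_t * h_prev + (1.0 - z_t) * h_tilde             # hidden state (5a)

rng = np.random.default_rng(1)
n_in, n_hid = 5, 8                          # e.g. 5 sensor features per timestep
W = {k: rng.standard_normal((n_hid, n_in)) for k in "ficorz"}
V = {k: rng.standard_normal((n_hid, n_hid)) for k in "ficorz"}
b = {k: np.zeros(n_hid) for k in "fico"}

x_t = rng.standard_normal(n_in)
h0, c0 = np.zeros(n_hid), np.zeros(n_hid)
h1, c1 = lstm_step(x_t, h0, c0, W, V, b)
h1g = gru_step(x_t, h0, W, V)
assert h1.shape == c1.shape == (n_hid,)
```

Because h_t passes through tanh (LSTM) or is a convex combination of bounded terms (GRU), every entry of the hidden state stays in [-1, 1].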

Fig. 4. Primary units of LSTM (left) & GRU (right) cells.
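The framework presented in the next section maps the recurrent layer's final hidden state to the 11 damage classes through a fully connected layer and a softmax layer. A minimal sketch of that classification head, with random weights and illustrative sizes only, is:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())            # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(2)
n_hid, n_classes = 8, 11               # 11 classes: undamaged + 10 locations

h_T = rng.standard_normal(n_hid)       # final hidden state from the LSTM/GRU layer
W_fc = rng.standard_normal((n_classes, n_hid))
b_fc = np.zeros(n_classes)

p = softmax(W_fc @ h_T + b_fc)         # class probabilities
assert p.shape == (n_classes,)
assert np.isclose(p.sum(), 1.0)

predicted_class = int(np.argmax(p)) + 1  # 1 = undamaged, 2..11 = location 1..10
assert 1 <= predicted_class <= 11
```

Training maximizes the log of the softmax probability assigned to the true class, which is the per-observation term in Eq. (2).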


4. Framework

In this section, we present the framework of the sequence-based prediction using a DL technique structured with two of the most advanced RNNs, the LSTM and GRU networks, based upon the time-series information of structural behavior. We tested the proposed framework for the structural damage detection of FOWT blades. Fig. 5 illustrates the framework used in this paper: data collection, data preprocessing, sequence input, LSTM/GRU layers, fully connected layer, softmax layer, output layer, and prediction. The subsequent subsections provide details of the methods used in each step. Each step of the entire framework for damage detection is coded, processed, modeled, and computed using MATLAB software.

Fig. 5. Framework of sequence-based modeling of deep learning for structural damage detection.

4.1. Data collection

A total of 13,200 sets of simulations were performed: 120 sets of damaged FOWTs at each of the ten different locations with various damage levels and shapes, totaling 1200 damage scenarios, and an additional 120 sets of undamaged FOWTs, detailed in earlier sections of this study. Each set of raw data is composed of 20,000 timestep signals (1000 s) for a total of 154 sensor signals (features), including 20 sensors at the FOWT tower recorded in two local coordinate directions (x_tw, y_tw) and 19 sensors at each of the three FOWT blades recorded in two local coordinate directions (x_bl, y_bl). For the purpose of efficient training of the deep networks, only 2–12 features out of the total of 154 features were selected and tested based on domain engineering judgment. After numerous tests, three types of selections are reported in this paper: (1) 12 features at the tower and blade in 2 directions; (2) 2 features (sensor signals) recorded at the blade tip in 2 coordinate directions; and (3) 5 features including 2 tower sensors and 3 blade sensors. The optimum length of the sequences, t, per single observation was determined after testing various lengths within the range of 300–1000 timesteps collected out of the 20,000 total timesteps. After multiple trials, we found that sequence data containing fewer than 500 timesteps significantly decreased the performance, while the performance was slightly improved by sequence data comprising up to 900 timesteps. Therefore, data sets A, C, E, and F collected 900 timesteps per feature within a single observation, and B and D collected 500 timesteps, considering the large amount of data in these sets, as shown in Table 1.

Table 1 shows the sets of data used in this paper. Each feature set is tested in two variations. Data sets A, C, and E were collected using an identical timeline for all observations, starting from 400 s (8000 timesteps). Data sets B, D, and F were collected at multiple time durations per damage scenario. These variable timelines were randomly chosen under the condition that the signals have at least a 50% overlap between the sequences. The overlapping technique is commonly used in image processing DL applications. For instance, data set C includes a total of 1320 observations, collecting only one sequence per feature for each damage scenario. Data set D includes a total of 6600 observations, collecting five sequences per feature at random time durations for each damage scenario (Table 1). Fig. 6 shows a sample data set out of the 1320 scenarios used for data set F, displaying the full-length raw data from two sensors. The figure also presents the data slicing process with 50% overlap.

Table 1
List of collected data sets.

Set | Features (n) | Length of input x (t) | Size of single input x (n × t) | Classes (s) | Damage scenarios per location | Samples per scenario | Extracted timeline | Observations (m)
A   | 12 | 900 | 12 × 900 | 11 | 120 | 1 | Identical | 1320
B   | 12 | 500 | 12 × 500 | 11 | 120 | 3 | Random    | 3960
C   | 5  | 900 | 5 × 900  | 11 | 120 | 1 | Identical | 1320
D   | 5  | 500 | 5 × 500  | 11 | 120 | 5 | Random    | 6600
E   | 2  | 900 | 2 × 900  | 11 | 120 | 1 | Identical | 1320
F   | 2  | 900 | 2 × 900  | 11 | 120 | 5 | Random    | 6600

Once the data sets A to F were prepared, the data were then divided into two groups of training and testing sets: 80% for training and 20% for testing. The common split of the data in DL is either 70:30 (training:testing) or 80:20 (training:testing). In this research, the 80:20 ratio was chosen in order to secure a sufficient number of training sets. For K-fold cross-validation, the data split is redefined for the validation of the final network, which is detailed in a later section.

4.2. Data pre-treatment: normalization, wavelet filter, & denoise

Data normalization is critical to obtain an unbiased model and to train the network efficiently. Suppose that n features of input sequence data with various units and dimensions are used to train a deep network, and one of the feature sequences contains significantly higher numbers due to the engineering unit used for the measurement. The network will then rely on this particular feature sequence over the other features. To avoid this problem, data normalization is commonly recommended to improve the performance of the DL process. This is usually accomplished uniformly between 0 and 1 or in a zero-centered transformation. In the research reported here, the sets of data are normalized to center 0 and standard deviation 1, called z-score standardization, which is one of the most commonly used data normalization methods. The transformation is performed with the equation x_{t,norm} = (x_t − x̄)/v, where x_{t,norm} is the data in the transformed space, x_t is the data in the original space, x̄ is the mean of the time series of x_i, and v is the standard deviation of the time series signal. This common approach of data normalization is used in statistical analysis as well as in DL techniques for effective training of the networks.

Wavelet decomposition filters [54] were also tested for performance improvement of training and predictions. The wavelet filter extracts additional feature vectors, which are added to the original data to train the model. The performance of the model trained with the added features from the wavelet filters was then compared to the performance with non-treated data. The technique has been successfully used in recent studies of signal classification [55–57], although none of these studies was associated with DL classification. It is reported to improve the performance of predictions for noisy data. Lately, Yildirim [58] reported improved prediction accuracies for LSTM network classification by using wavelet-filtered electrocardiogram signals as an added feature vector. The wavelet filter bank is composed of high-pass and low-pass filters (HP and LP), which are repeated for i levels. The filters are represented as

H = Σ_{k=−∞}^{∞} s[k] f_h[2n − k],  L = Σ_{k=−∞}^{∞} s[k] f_g[2n − k]

where s is the input signal and f_h and f_g are the high-pass and low-pass filters, respectively. The output of the HP filter at each level i is the detail coefficients (cDi). The output of the LP filter is the approximation coefficients (cAi), which are passed to the HP and LP filters at the next level. In our research, the added features up to the 3rd level of detail coefficients (cD1, cD2, & cD3) from the high-pass filter were tested for improvement of the performance. We used the Daubechies db6 wavelet member [59] of the wavelet family for the filter. The technique did not significantly improve the performance in this study, which we expected, as the data used in this research are free from noise. This finding is detailed in later sections of this paper. Fig. 7 shows a sample observation, x_i, for data with five features.
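The two pre-treatment steps above can be sketched in a few lines: z-score standardization of one feature sequence, followed by a two-channel analysis filter bank that keeps the detail coefficients cD1–cD3 as added features. For portability this sketch uses the simple Haar filter pair rather than the Daubechies db6 filters used in the study (via MATLAB), so the coefficient values are illustrative only.

```python
import numpy as np

def zscore(x):
    """Z-score standardization: zero mean, unit standard deviation."""
    return (x - x.mean()) / x.std()

def dwt_level(s, lo, hi):
    """One level of a two-channel analysis filter bank: convolve with the
    low-pass and high-pass filters, then downsample by 2.
    Returns (approximation cA, detail cD)."""
    cA = np.convolve(s, lo)[::2]
    cD = np.convolve(s, hi)[::2]
    return cA, cD

# Haar analysis pair used here for brevity; the study used Daubechies db6.
lo = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass filter f_g
hi = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass filter f_h

rng = np.random.default_rng(3)
signal = zscore(rng.standard_normal(900))  # one normalized 900-timestep feature
assert abs(signal.mean()) < 1e-9 and np.isclose(signal.std(), 1.0)

# Detail coefficients cD1, cD2, cD3: only cA is passed down to the next level.
details = []
cA = signal
for _ in range(3):
    cA, cD = dwt_level(cA, lo, hi)
    details.append(cD)
assert len(details) == 3                   # cD1, cD2, cD3 added as extra features
```

Each cD vector is half the length of its input, so the added features are shorter than the raw sequence and would need their own normalization before being appended, as in the figures above.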

Fig. 6. Sample data out of the total of 1320 scenarios used for data set F; sliced samples for training.

D.-E. Choe, H.-C. Kim and M.-H. Kim, Renewable Energy 174 (2021) 218-235

The figure shows the raw signal (top left), the normalized signal (top right), the wavelet-filtered added feature for level 1 (cD1) only (bottom left), and the normalized added features (bottom right). Fig. 8 shows the sample added features for levels 1 to 3 (cD1, cD2, & cD3) in addition to the original signal.

While the wavelet filter method adds information to help find the important content in noisy data, the wavelet denoise technique removes unimportant information from the data. We also performed wavelet denoising using Donoho and Johnstone's universal threshold to quell the data noise [60]. Assuming that a time-series input x_n is expressed as x_n = f(n) + σε(n), the objective of denoising is to suppress the noise of x_n and recover f(n). The denoising algorithm is composed of three steps: first, decompose the time-series input with a chosen wavelet member and a chosen level of decomposition; second, apply soft thresholding to the detailed coefficients (cDi) of each level with a chosen thresholding method; third, reconstruct the time series from the original cAi at the last level i and the detailed coefficients cD1 to cDi, at levels 1 to i, modified in the previous step. The model trained with denoised input data is then compared to the model trained with untreated data. Although field data may include a significant level of noise, our simulated data generates much less noise. In fact, the data denoise technique we used did not improve model training in our study. Fig. 9 shows the sample denoised data compared to the original signal through levels 1, 2, and 3.

4.3. Recurrent layers of LSTM and GRU neural networks

The sensor signals from the FOWT are entered one at a time as input sets into the LSTM/GRU cell discussed in the previous section (Fig. 4). Fig. 10 shows the continuous process of the data input in the sequence t = 1 to t = T with various configurations. The layer can be unidirectional or bidirectional, and it can be made deeper by stacking layers. The standard unidirectional LSTM/GRU network layer is shown in Fig. 10 (top left). Bidirectional LSTM (BiLSTM) layers scan the data once forward and then backward and combine both passes to predict the structural damage location, as shown in Fig. 10 (lower left). The bidirectional LSTM/GRU enables the efficient capture of both past and future contexts. The bidirectional LSTM can be represented by the same equations used for the gates and the states, h_t, f_t, i_t, & o_t, as in equation (4), repeated for the forward and backward directions, denoted h→_t, f→_t, i→_t, & o→_t and h←_t, f←_t, i←_t, & o←_t, respectively. Finally, both sets

Fig. 7. Sample raw and treated data. Observation 5; damage location at 2.


Fig. 8. Sample wavelet filtered signals: original and 1st to 3rd detailed band.

Fig. 9. Sample data denoise, levels 1 to 3, using the Daubechies dB6 wavelet & Donoho and Johnstone's universal threshold.

of information are used to predict the damage location by y_T = σ_f(h→_T, h←_T), where σ_f represents the function used to estimate the output [53,61].

A deeper network of either unidirectional or bidirectional LSTM or GRU layers can be designed as needed. The right side of Fig. 10 shows a deeper BiLSTM network layer where two layers of BiLSTM networks are stacked together. The hidden output of one BiLSTM layer is propagated through t in the same way as in a single BiLSTM, and at the same time, the output of the one BiLSTM layer is used as the input data to the next BiLSTM. Every hidden layer of the second BiLSTM layer receives an input sequence, which includes the output sequences of both the forward and backward layers from the previous layer. In this research, the complexity of the GRU network is not developed further because a single layer of unidirectional GRU performed better than the most complex form of the LSTM layer, and bidirectional or double-layered configurations did not improve the performance of GRU for the data used in this research.

Computationally, the modeling of damage detection using deep neural networks can be done by either sequence-to-one regression, which is the prediction from the given sequence sets to the damage location as a continuous number, or sequence classification, which is the prediction from the sequences to the damage location as a discrete number representing the segment number of the damaged location. In this paper, we chose to perform the sequence classification of the damaged segment expressed as a categorical variable.

4.4. Proposed networks for structural damage detection of FOWT blades

In this research, we designed and tested numerous neural networks to find the structural damage of FOWT blades. Four deep networks are presented in this paper, as shown in Fig. 11. Each proposed network contains different recurrent layers: (I) standard LSTM, (II) standard GRU, (III) single layer of BiLSTM, and (IV) stacked BiLSTM, the details and selection of which were presented in the previous section. More than a single layer of unidirectional GRU led to an unnecessary increase in network complexity that led to overfitting the data. Therefore, only single-layered GRU networks are presented in this paper.

In order to avoid overfitting or underfitting the training data, it is critical to optimize the network complexity. A less complex model would underfit the data, which results in a high prediction error in both training and testing sets. However, an overly complex network tends to overfit the data, resulting in a low prediction error in training sets while producing a high prediction error in the testing sets. The ideal model produces a similar error rate in the predictions for both training and testing sets. The network hyperparameters are detailed in a later section of this paper.
226
D.-E. Choe, H.-C. Kim and M.-H. Kim Renewable Energy 174 (2021) 218e235

Fig. 10. LSTM, GRU, BiLSTM, and double-layered BiLSTM network layers.
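To make the bidirectional read-out concrete, the toy sketch below runs a recurrent cell over the input sequence forward and backward and keeps both final hidden states, mirroring the y_T = σ_f(h→_T, h←_T) read-out described above. A one-unit tanh RNN stands in for the full LSTM/GRU cell, and all weights are arbitrary illustrative values rather than trained parameters.

```python
import math

def rnn_pass(xs, w_x=0.5, w_h=0.8):
    """One-unit tanh recurrent cell run over the whole sequence;
    returns the final hidden state."""
    h = 0.0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
    return h

xs = [0.1, -0.3, 0.7, 0.2]                    # one sensor feature over t = 1..T
h_fwd = rnn_pass(xs)                          # forward final state  (h with right arrow)
h_bwd = rnn_pass(list(reversed(xs)))          # backward final state (h with left arrow)
combined = (h_fwd, h_bwd)                     # pair fed to the read-out sigma_f
print(all(-1.0 < h < 1.0 for h in combined))  # tanh keeps states in (-1, 1): True
```

A stacked (deeper) BiLSTM would feed the per-timestep forward/backward outputs of one layer as the input sequence of the next, as described for the right side of Fig. 10.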

Fig. 11. Proposed networks for deep learning of structural damage detection of FOWT blades.
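For reference, the four configurations of Fig. 11 can be summarized as a hypothetical layer specification (layer type, hidden units, direction). The names and the 96-unit size are illustrative assumptions; only the layer counts and directionality follow the paper, and 96 is one point in the 64-256 range explored later.

```python
# Hypothetical spec of the four proposed recurrent stacks (Fig. 11).
NETWORKS = {
    "I":   [("LSTM", 96, "uni")],
    "II":  [("GRU", 96, "uni")],
    "III": [("BiLSTM", 96, "bi")],
    "IV":  [("BiLSTM", 96, "bi"), ("BiLSTM", 96, "bi")],  # stacked BiLSTM
}

def recurrent_depth(name):
    """Number of stacked recurrent layers in a configuration."""
    return len(NETWORKS[name])

print(recurrent_depth("II"), recurrent_depth("IV"))  # 1 2
```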

4.5. Fully connected & Softmax layer

The fully connected layer (Fc) follows the recurrent layers of LSTM/GRU/BiLSTM, where the outputs of these recurrent layers are connected to every activated unit of the next layer, the Softmax layer. In this research, the fully connected layer multiplies the input data by the weight, W, from the last iteration of the previous recurrent layer and adds the bias, b. This layer outputs the classification based on the trained recurrent layer given the input data.

The Softmax layer provides a vector, S(y_i), as an outcome that includes the probabilities of each target class based on the weights from the last step of the previous layer: S(y_i) = P(y = y_i | x_i; θ), where i = 1 to 11. In this research, the Softmax function calculates the probabilities that the damage has occurred at location y_i = 1 to s given the input sensor data (x_i), where s = 11 is the size of the vector S and the number of classes, representing the damage statuses: the no damage (ND) status and the possible damage locations 1 to 10 defined on the FOWT blades. The probabilities are between 0 and 1 and are later used to determine the damage location. The typical Softmax function is used, which calculates the ratio of the exponential of each output value from the Fc layer to the sum of the exponentials of all output values from the Fc layer.

4.6. Fine-tuning network

In order to train an unbiased model with the highest accuracy and the simplest structure, it is necessary to fine-tune the hyperparameters such as the number of hidden units and the learning rate. The number of hidden units is the size of the hidden state vector per batch. A larger number of hidden units reflects increasing model complexity, and a network with an excessive number of hidden units for a given system and data attribution may result in the overfitting of data. In the damage detection models, it was defined between 64 and 256 units after testing the training progress. The selected number of hidden units for each case and the selection progress will be shown in the later section with the results of the network training.
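The Fc-plus-Softmax read-out of Section 4.5 can be sketched as follows; the hidden state `h`, the weights `W`, and the bias `b` are arbitrary illustrative values, not trained parameters.

```python
import math

def fc(h, W, b):
    """Fully connected layer: z = W h + b (11 class scores)."""
    return [sum(wij * hj for wij, hj in zip(row, h)) + bi for row, bi in zip(W, b)]

def softmax(z):
    """S(y_i) = exp(z_i) / sum_j exp(z_j), with max subtracted for stability."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

CLASSES = ["ND"] + [str(i) for i in range(1, 11)]   # 11 damage states
h = [0.2, -0.1, 0.4]                                # toy hidden state (3 units)
W = [[0.1 * ((i + j) % 3 - 1) for j in range(3)] for i in range(11)]
b = [0.0] * 11
probs = softmax(fc(h, W, b))
pred = CLASSES[max(range(11), key=lambda i: probs[i])]  # most probable damage status
print(len(probs), round(sum(probs), 6))  # 11 1.0
```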


Learning rate (LR) is a hyperparameter used in the stochastic gradient descent optimization algorithm to find the unknown parameters, θ = {W, V, b}, of the model discussed in equations (4) and (5). It controls how much to change the model parameters, θ, upon the estimated error when the model weights are updated. In fact, this is the most critical hyperparameter during the training. A lower value of LR tends to overfit the model by leading it to a local minimum, whereas a larger value of LR may not stabilize the model. The optimum LR varies depending on the model complexity and the data attributions. It should be explored over a large interval and then refined using a smaller interval. The LR can be scheduled to change over the iterations. In this research, the LRs were fine-tuned and tested for each different network and data set to find the best network.

Batch size is the number of data sets that are scanned at each iteration out of the total number of observations used for the training data. In our research, a batch size of 50 was used for all cases after testing various numbers ranging between 20 and 200. For example, if one training data set has 1000 observations, then the entire data set is divided into 20 groups of 50 observations, and only 50 data are scanned at each iteration. It takes 20 iterations to scan the entire set of observations, which is called an epoch; in this case, one epoch is equal to 20 iterations. In the network training, the maximum epoch also defines when to stop the network training, since excessive iterations after the model finds the optimum status may also cause overfitting problems. It is desirable to keep the maximum epoch within a reasonable range. The settings of the hyperparameters are related to one another. For instance, a lower setting of LR requires longer training, meaning a higher value for the maximum epoch. In this research, the maximum epoch is defined within the 200 to 600 range depending on the model complexity and the defined learning rates.

4.7. Testing, cross-validation, & estimators of prediction error and accuracy

The trained networks are tested using independent sensor signals not employed during the training process. A data split ratio of 80-20 is used for the training and testing sets. However, accuracies obtained from the testing sets may differ slightly from the true values of the prediction accuracies depending on the number of observations used during training and testing or on the magnitude of bias incorporated into the model. Once the model complexity and hyperparameters were fine-tuned, cross-validation was performed for the selected models for each network-data set.

Cross-validation is critical to obtain an unbiased model and to allow the estimation of prediction errors and accuracies that more closely represent the true values. In this section, the K-fold cross-validation method and the estimators of the prediction error and accuracy used for structural damage detection in FOWTs are presented.

In machine learning and statistical predictions, there are several cross-validation methods and variations such as leave-p-out, hold-out, and K-fold cross-validation. Cross-validation was studied earlier by Stone [62] as a leave-one-out method in the context of regression, which is the same as the leave-p-out method when p = 1. Geisser [63,64] studied the V-fold method. The leave-p-out method is an exhaustive validation method which tests and validates all possible ways to split the data. Leave-one-out validation results in the same computation as K-fold using K = m, where m is the number of observations, and is also considered an exhaustive method. Hold-out testing is known for its inefficient use of the data because it does not account for the variance with respect to the training set, as reported by Dietterich [65]. In the same report, it was concluded that K-fold is a more appropriate method when comparing one algorithm to another. We used K-fold cross-validation, which has been shown to be one of the most reliable and non-exhaustive validation methods for estimating an unbiased model [66]. In addition, Blum et al. [67] reported that the K-fold method provides more accurate estimates of prediction errors compared to hold-out testing.

K-fold cross-validation splits the training data set into K sets with a size of m/K, where m is the total number of observations. K − 1 sets are used for training the network and one set is used for validation; this independent training process is repeated K times by choosing a different validation set through k = 1 to k = K. Model accuracy is calculated by averaging the K accuracies, and the most accurately trained model is used for prediction.

Fig. 12 visualizes the iterations of repeated training for K-fold cross-validation. In this research, K = 8 is used. The eight (K) sets of data are divided into two groups, seven sets for training and one set for validation. The eight-fold split is randomly selected within the group such that (X, Y) = ({X_k, U_k}, {Y_k, V_k}), where X_k and Y_k represent the k-th training set of the input signals (X) and the corresponding classifications (Y) with a size of m − (m/K), and U_k and V_k represent the k-th validation set with a size of m/K.

The cross-validation estimator, CV [66], is used as the estimator of the prediction error, PE, given an algorithm A(·) and the sets of data (X, Y) in the cross-validation of the structural damage detection model of the FOWT. The estimator of the prediction error is represented as follows:

PE[A(·); (X, Y)] = CV = (1/K) Σ_{k=1}^{K} [ (1/(m/K)) Σ_{(x_i, y_i) ∈ (U_k, V_k)} L[A(X_k; ·); (x_i, y_i)] ]   (6)

where A(a; b) is the return of a function by algorithm A trained by the set a upon the new input b, and L[·] is a component loss which represents the misclassification of each observation. In this research, the loss is defined as a weighted classification error:

L[A(a; ·); (x_i, y_i)] = w_i I{A(a; x_i) ≠ y_i}   (7)

where w_i is the weight, which sums to 1, Σ_{i=1}^{n} w_i = 1, and I{·} is the indicator function: I{x} = 1 if x is true and, otherwise, I{x} = 0. In the structural damage detection, I{·} = 0 if the damage status of the FOWT blades detected by the deep neural network model is the same as the actual damage status, meaning no loss. Otherwise, the value returns 1 with the given weight. Note that A(·) is trained by X_k and tested for x_i ∈ U_k in equation (7).

Similarly, the estimator of the prediction accuracy given the network algorithms and the sets of numerical sensor data of the FOWT can be calculated as follows:

PA[A(·); (X, Y)] = (1/K) Σ_{k=1}^{K} [ (1/(m/K)) Σ_{(x_i, y_i) ∈ (U_k, V_k)} I{A(X_k; x_i) = y_i} ]   (8)

5. Results of damage detection

5.1. Summary results

Table 2 shows the results of the damage detection with the selected models for the four proposed networks (Fig. 11) with various sets of sensor data (Table 1). The general model selection

Fig. 12. Iterations k = 1 to K in K-fold cross-validation with K = 8.
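The repeated training of Fig. 12 can be sketched as a toy K-fold loop. A trivial majority-class "learner" stands in for the deep network here, so the resulting number only illustrates the per-fold averaging of equation (8), not the paper's accuracies.

```python
import random

def kfold_accuracy(X, Y, K, train_fn):
    """Split into K random folds, train on K-1, validate on the held-out
    fold, and return the average of the K validation accuracies."""
    m = len(X)
    idx = list(range(m))
    random.Random(0).shuffle(idx)            # random split of the folds
    folds = [idx[i::K] for i in range(K)]    # equal fold sizes when K divides m
    accs = []
    for k in range(K):
        val = set(folds[k])
        model = train_fn([X[i] for i in idx if i not in val],
                         [Y[i] for i in idx if i not in val])
        correct = sum(1 for i in val if model(X[i]) == Y[i])  # I{A(X_k; x_i) = y_i}
        accs.append(correct / len(val))
    return sum(accs) / K

def majority_learner(Xtr, Ytr):
    pred = max(set(Ytr), key=Ytr.count)      # always predict the most common class
    return lambda x: pred

X = list(range(24))          # dummy observations
Y = [0] * 18 + [1] * 6       # imbalanced labels, majority class 0
pa = kfold_accuracy(X, Y, K=8, train_fn=majority_learner)
print(round(pa, 2))  # 0.75
```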

philosophy is applied to this research: an unbiased model with the highest accuracy and the simplest structure is to be selected. The selected model (D-II) exhibits 91.7% cross-validation accuracy based on the estimator over the K-fold iterations and was trained with network design II using the data set with five features (sensors), data D. The detailed K-fold cross-validation results of the selected D-II model are shown in Table 3, where the best network prediction reached 94.8%. The learning rate was scheduled with an initial value of 0.007 and a drop rate of 0.3 every 100 epochs, up to the maximum epoch of 600 in this case. In all cases, the best results are obtained using normalized data input sequences with no other data treatments.

It is found that the performance of the single-layer unidirectional GRU network (network II) was superior to even the most complex LSTM network, the double-layered bidirectional LSTM network (network IV). In most of the data selections, the performance of the GRU network demonstrated a 10-30% gain in accuracy compared to the other networks tested. The accuracies reached 89.7% using a 12-feature selection, 91.7% with a 5-feature selection, and 81.8% with a 2-feature selection. It is also observed that the selection of more than five features does not improve performance.

Except for data sets A and F, the bidirectional layer of LSTM (BiLSTM, network III) slightly improves the performance of the LSTM network (network I); however, the results are comparable to one another. It was also found that the double-layered BiLSTM network performs better than the single-layered BiLSTM for most of the data selections except for data B. However, the computational time is roughly 300% longer compared to the single-layered BiLSTM, and the performance improvement is a marginal percentage: 2%

Table 2
Results of the selected models for all proposed networks and data.

Case Data Network Prediction Accuracy (PA[·]) Average Training Accuracy Initial LR LR Drop (rate/freq.) # of Features # of Hidden Units Batch Size Max Epoch

A-I A I 0.628 0.831 0.001 0.5/200 12 128 50 350
A-II II 0.757 0.834 0.0008 0.5/200 200 50 220
A-III III 0.600 0.833 0.001 0.5/100 128 50 350
A-IV IV 0.619 0.887 0.0005 0.5/100 128×2 50 250
B-I B I 0.746 0.906 0.001 0.5/200 12 128 50 300
B-II II 0.867 0.947 0.0005 0.5/200 128 50 600
B-III III 0.770 0.957 0.001 0.5/100 128 50 300
B-IV IV 0.735 0.926 0.001 0.5/100 128×2 50 300
C-I C I 0.678 0.941 0.002 0.5/200 5 96 50 600
C-II II 0.640 0.959 0.002 0.5/200 96 50 600
C-III III 0.695 0.982 0.007 0.5/100 96 50 450
C-IV IV 0.686 0.998 0.007 0.5/100 96×2 50 400
D-I D I 0.640 0.998 0.007 0.3/100 5 96 50 600
D-II II 0.917 0.997 0.007 0.3/100 96 50 600
D-III III 0.819 0.998 0.007 0.3/100 96 50 600
D-IV IV 0.848 0.980 0.007 0.3/100 96×2 50 450
E-I E I 0.665 0.947 0.005 0.5/100 2 128 50 600
E-II II 0.711 0.867 0.001 0.5/200 200 50 600
E-III III 0.674 0.915 0.001 0.5/100 200 50 600
E-IV IV 0.699 0.991 0.001 0.5/100 128×2 50 600
F-I F I 0.731 0.991 0.001 0.5/200 2 128 50 600
F-II II 0.818 0.999 0.001 0.5/200 200 50 600
F-III III 0.434 0.986 0.005 0.5/100 128 50 600
F-IV IV 0.625 0.994 0.005 0.5/100 96×2 50 450


Table 3
Results of K-fold cross-validation of the selected model of D-II.

Data k=1 k=2 k=3 k=4 k=5 k=6 k=7 k=8 Best Accuracy PA[·]

Validation 0.895 0.932 0.948 0.925 0.918 0.910 0.892 0.918 0.948 0.917
Training 0.999 0.998 1.000 0.999 0.999 0.998 0.999 0.999 1.000 —
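As a quick check, the PA[·] entry of Table 3 is simply the average of the eight per-fold validation accuracies (equation (8) with equal fold sizes):

```python
# Validation accuracies for k = 1..8 from Table 3.
val_acc = [0.895, 0.932, 0.948, 0.925, 0.918, 0.910, 0.892, 0.918]
pa = sum(val_acc) / len(val_acc)
print(f"{pa:.3f}")   # 0.917 (the PA[·] entry)
print(max(val_acc))  # 0.948 (best fold, k = 3)
```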

increase in data A and C; a 3% increase in data D; and a 19% increase in data F.

It is also observed that unnecessary increases in model complexity result in lower performance. This is particularly clear when the model is trained using a small number of features. For instance, the model trained with data set F (the smallest number of features) shows an accuracy of 73.1% with a single-layer LSTM network (F-I); however, the double-layered BiLSTM network (F-IV) decreases the performance accuracy to 62.5% for the same data set.

5.2. Summary of the selected model, D-II

The K-fold iterations of the selected model D-II are displayed in Table 3. The prediction accuracy (PA[·]), 91.7%, is estimated based on equation (8). The best model exhibits 94.8% accuracy at k = 3. Note that PA[·] is defined only based on the accuracies of the validation sets after being trained by the training sets; therefore, the table cell that would represent PA[·] in the bottom row remains blank in the table. In this K-fold cross-validation, out of the total 6600 observations (m), the number of observations used during each training (m − m/K) is 5775 and the number of observations for validation (m/K) is 825. The entire training process is repeated K = 8 times as shown in Fig. 12. A single observation includes the signals from 5 sensors of 500 timesteps each. The K folds are randomly split and, therefore, the total numbers of observations for each class may not be exactly the same.
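The split sizes quoted above follow directly from m = 6600 and K = 8:

```python
# Per-fold split sizes for the K-fold cross-validation of model D-II.
m, K = 6600, 8
n_val = m // K       # observations in each validation fold
n_train = m - n_val  # observations used in each training pass
print(n_val, n_train)  # 825 5775
```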
Fig. 13 and Fig. 14 show the confusion matrices of the selected model of case D-II at k = 2 and k = 3, which compare the detected damage status to the true damage status. ND stands for the no damage status, and the numbers 1 to 10 represent the locations of the damage. The diagonal boxes display the numbers of true classifications for the 11 different damage states. The lower and upper white triangles show the numbers of false classifications. The bottom line of the confusion matrix shows the prediction accuracies (%) of each true damage group, and the bottom right corner box displays the overall prediction accuracy. Similarly, the right-side column of the confusion matrix displays the prediction accuracies (%) of each predicted damage group. The prediction accuracies reached 93.2% (k = 2) and 94.8% (k = 3) for determining both the presence and the location of the damage, as displayed in the box at the bottom right corner of Figs. 13 and 14. In the model at k = 3, out of the 5.2% of total misclassifications, 3.1% locate the damage within a one-segment distance. The damage location of only 2.1% of the entire data set is located more than one segment away from the true damage location.

Fig. 13. Confusion matrix of the selected model, k = 2.

The presence of damage itself was detected with higher accuracy (99.9%) regardless of the identification of its location in this model. The damage statuses 1 to 10 indicate that damage (D) existed, while ND indicates no damage. From the network at k = 3 (Fig. 14), only 1 sample out of the 825 samples tested provided a false identification of damage; this was at location 1 and was predicted as ND by the network. The rest of the 824 samples show true D-D or true ND-ND regardless of the damage location. Thus, the results suggest that the model is 99.9% accurate for detecting the presence of damage. Similarly, the confusion matrix of the network at k = 2 shows only 2 misclassifications out of 825 observations. This suggests that the model detected the presence of damage with 99.8% accuracy.

5.3. Selection of model complexity

In addition to the depth of the network layers, the model complexity is also determined by the number of hidden units. A higher number of hidden units increases the model complexity, which may lead to overfitting the data and producing a biased model. On the other hand, an insufficient number of hidden units decreases the performance of the model. The relationship between model complexity, errors, and model bias (underfitting or overfitting) was

discussed in the earlier section. Fig. 15 shows the training progress of Network II trained with set D by decreasing the number of hidden units: (a) 256 units; (b) 128 units; (c) 96 units; and (d) 64 units. The left-side figures of (a), (b), (c), and (d) show the model accuracies of the training sets (solid line) and the validation sets (triangle mark) during the training progress. The figures on the right side show the loss function. At this stage, all models use a constant learning rate of 0.001 to determine the model complexity. The ideal progression shows the validation accuracy as close as possible to the training accuracy throughout while maintaining the highest level of accuracy. Although an increase in the number of hidden units leads to similar validation accuracies, 87.33%, 86.75%, 88.92% and 88.92%, the gap between the training (solid line) and validation (triangle mark) accuracies increases over the iterations in figures (a) and (b). In addition, the loss functions increase over the iterations in figures (a) and (b). These results indicate that the model slightly overfits the training data. In contrast, Fig. 15 (c) and (d) show stabilized loss compared to (a) and (b), and their validation accuracies are close to the training accuracies. Fig. 15 (c) shows an accuracy level of 88.92% and (d) shows 88.90%. Using these results, we chose to further investigate models with 64 and 96 hidden units and to fine-tune our approach as detailed in the following section.

5.4. Improving accuracy

After the model complexity is chosen based on the depth of the network layers and the number of hidden units, the networks were fine-tuned to achieve the highest accuracy. As detailed in earlier sections, model hyperparameters such as the learning rate, minimum batch size, and maximum number of epochs were controlled in order to improve accuracy. The most critical hyperparameter for improving the model is the learning rate (LR), which controls the changes of the unknown parameters θ = {W, V, b}. The expression of the changes in the stochastic optimization algorithm is θ_{i+1} = θ_i − α∇l(θ_i), where α > 0 is the learning rate and l(θ_i) is the loss function over the entire data set. Note that θ is updated at each batch, a subset of the entire data set, while l(θ_i) uses the entire data used for training. A batch size of 50 was used for our research; the dynamics between the hyperparameters allows the model to use a fixed batch size at this reasonable level. We proceeded to control the learning rate for each network design. Table 4 shows the fine-tuning progress of the selected model D-II. Test set A is for the network with 64 hidden units, and set B is the progress for the network with 96 hidden units. Fine-tuning of the network resulted in the highest testing accuracy for test B5, which reached 92.7% and was used for the final configuration of the D-II model. The details of the 24 fine-tuned network models (A-I to F-IV) are summarized in Table 2.

5.5. Effects of wavelet filters and denoise

In this section, the effects of the added features from the wavelet filter and of the denoise technique are presented. Data treatments other than data normalization did not significantly improve the results of the training and predictions. Instead, they interfered with the training process. Since the data is simulated and lacks the noise expected from field sensors, we expected this outcome. However, the results are presented here because data treatments are a critical part of the framework for increasing the performance of models and will play a significant role in training a network with field data. The results do not exclude the need for the wavelet filter and wavelet denoise techniques when the network is trained with field data or when the data type is dissimilar to the data used in this research.

Fig. 14. Confusion matrix of the selected model, k = 3.

Table 5 shows the results of the model testing and training accuracy after fine-tuning. Samples of treated data were presented in Figs. 7, 8, and 9. Results from models trained by denoised data are listed according to the level of denoise, 1 to 3, and compared with the results of the untreated data shown in the top four rows of the table. It was observed that the denoise process removes necessary information from the signals in this research. Therefore, the performance of the model decreases as the denoise level increases. This may be due to the fact that the training data is simulated and without noise. The bottom three rows of Table 5 show the results of the fine-tuned models trained by data sets with added features from the wavelet filter, cDi, the detailed coefficients at level i output from the HP filter at each level. For the type of data used in our research, the added features did not improve the training results.

6. Conclusion

A sequence-based modeling and prediction method of DL was proposed and tested for structural damage detection of FOWT

Fig. 15. Comparison by the number of hidden units.


Table 4
Accuracies by learning rate (LR) during the fine-tuning.

Test Testing Accuracy Training Accuracy Initial LR LR Drop (rate/freq.) # of Hidden Units Batch Size Max Epoch

A1 0.777 0.847 0.0005 0.5/200 64 50 600
A2 0.850 0.920 0.0008 0.5/200 64 50 600
A3 0.889 0.952 0.001 0.5/200 64 50 600
A5 0.897 0.999 0.0035 0.4/100 64 50 600
A6 0.886 0.998 0.007 0.3/100 64 50 600
B1 0.825 0.880 0.0005 0.5/200 96 50 600
B2 0.880 0.942 0.001 0.5/200 96 50 600
B3 0.889 0.997 0.0035 0.4/100 96 50 600
B4 0.917 0.998 0.005 0.5/100 96 50 600
B5 0.948 0.997 0.007 0.3/100 96 50 600
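The "Initial LR" and "LR Drop (rate/freq.)" columns of Table 4 correspond to a piecewise-constant decay schedule (the SGD update itself is θ_{i+1} = θ_i − α∇l(θ_i), as given in Section 5.4); a minimal sketch:

```python
def lr_schedule(initial_lr, drop_rate, drop_freq, epoch):
    """Piecewise-constant decay: the LR is multiplied by drop_rate
    every drop_freq epochs."""
    return initial_lr * (drop_rate ** (epoch // drop_freq))

# Test B5 settings: initial LR 0.007, drop rate 0.3 every 100 epochs.
print(lr_schedule(0.007, 0.3, 100, 50))             # 0.007
print(round(lr_schedule(0.007, 0.3, 100, 150), 6))  # 0.0021
```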

Table 5
Training results of wavelet filters and denoise for case D-II.

Treatment Method Levels Testing Accuracy Training Accuracy Initial LR LR Drop (rate/freq.) # of Hidden Units Batch Size Max Epoch

Untreated — 0.927 0.997 0.007 0.3/100 96 50 600
Wavelet Denoise Level 1 0.848 1.000 0.007 0.3/100 96 50 600
Wavelet Denoise Level 2 0.803 0.999 0.007 0.3/100 96 50 600
Wavelet Denoise Level 3 0.450 1.000 0.007 0.3/100 96 50 600
Wavelet Filter cD1 0.830 0.979 0.001 0.9/100 128 50 600
Wavelet Filter cD1 & cD2 0.823 0.991 0.005 0.3/100 200 50 600
Wavelet Filter cD1, cD2, & cD3 0.843 0.974 0.005 0.3/100 200 50 600

blades, using uni/bi-directional LSTM and GRU neural networks. was not significantly improved after using more than five features
The complete framework was developed and presented. A total of (sensors). The future work of this research will be a systematic DL
1200 damage scenarios were simulated with various shapes, in- monitoring of the entire FOWT system. In addition to the structural
tensities, and locations of 5 MW semisubmersible FOWT blades, in failure of blades, holistic system failure scenarios will be considered
addition to 120 simulations of the undamaged FOWT. Various data treatment methods were applied and tested to improve the performance of the network. We found that the wavelet filter and denoise treatments neither improved nor interfered with the network training performance in this research. This may be because the simulated data likely does not contain the noise commonly expected in field data. Therefore, the data treatments should not be excluded if a network is trained on field data or if the data characteristics are dissimilar to those used in this research.

Four different deep network designs were proposed and tested. Overall, the network with a single GRU unit (Network II) performed better than the other networks in training on numerical sensor signals from the FOWTs. However, the results from the deep network with stacked BiLSTM layers (Network IV) demonstrated that this configuration is comparable to the GRU networks, depending on the data characteristics. The difference in performance between the networks with single and double layers of BiLSTM was not significant.

The presence of structural damage was successfully detected using the selected network (Network II) with 99.9% accuracy, tested with independent testing signals and repeated for K randomly split groups during K-fold cross-validation. The best network during K-fold validation reached 100% accuracy. The overall damage status, which indicates both the presence of damage and its location, was detected with an accuracy of 91.7% based on K-fold cross-validation. The testing accuracy of the best model reaches 94.8% in the detection of damage status at k = 3. Of the 5.2% of incorrect predictions in this model, 3.1% fall within a 1-segment distance of the true damage location; only 2.1% of the testing signals are misclassified by more than one segment from the true damage location. K-fold cross-validation was performed to obtain unbiased results and to calculate estimators of the prediction accuracies. The prediction of damage status will be extended to other failure modes, including structural, mechanical, and operational failures. Alternative DL methods will also be considered.

The methodology and framework of the sequence-based modeling provided in this paper can be applied to various engineering research fields, especially in data-driven modeling. We believe that sequence-based modeling will enable engineers to advance technologies and knowledge across engineering fields by harnessing the massive amounts of digital information currently being accumulated.

Funding acknowledgements

This material is based upon work supported by the National Science Foundation (NSF) under Grant #2118586. This material is also based upon work supported by the National Science Foundation under Grant #1914692.

Data accessibility statement

Some or all data, models, or code generated or used during the study will be available through the funding agency (National Science Foundation, United States) and are also available from the corresponding author by request.

CRediT authorship contribution statement

Do-Eun Choe: Conceptualization, Methodology, Formal analysis, Funding acquisition, Investigation, Software, Project administration, Validation, Visualization, Writing – original draft, Writing – review & editing. Hyoung-Chul Kim: Data curation, Software, Validation, Writing – original draft, Writing – review & editing. Moo-Hyun Kim: Software, Writing – review & editing, Supervision.
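The wavelet filter and denoise treatments evaluated in this study follow the general wavelet-shrinkage idea: decompose the signal, soft-threshold the detail coefficients, and reconstruct. The sketch below is a minimal one-level Haar version written for illustration only; the function name, the even-length-signal assumption, and the threshold value are ours, not the authors' implementation.

```python
import math

def haar_denoise(signal, threshold):
    """One-level Haar wavelet shrinkage: transform, soft-threshold
    the detail coefficients, then invert. Assumes len(signal) is even."""
    s = 2 ** -0.5  # 1/sqrt(2), orthonormal Haar scaling factor
    pairs = list(zip(signal[::2], signal[1::2]))
    approx = [(a + b) * s for a, b in pairs]   # low-frequency content
    detail = [(a - b) * s for a, b in pairs]   # high-frequency, noise-prone

    def soft(d):
        # Soft thresholding shrinks each coefficient toward zero
        return math.copysign(max(abs(d) - threshold, 0.0), d)

    detail = [soft(d) for d in detail]
    out = []
    for a, d in zip(approx, detail):           # inverse Haar transform
        out.extend([(a + d) * s, (a - d) * s])
    return out
```

With a threshold of zero the signal is reconstructed exactly; increasing the threshold progressively smooths high-frequency content, which is why such a filter can be unnecessary for clean simulated signals yet useful for noisy field measurements.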
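Part of Network II's appeal is the GRU's compact gating: two gates rather than the LSTM's three. A single GRU time step in its standard formulation, reduced to scalar input and hidden state for readability, can be sketched as follows; the weight dictionary is an illustrative toy, not the trained model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, W):
    """One GRU time step for scalar input x and hidden state h,
    using the standard update-gate / reset-gate formulation."""
    z = sigmoid(W["Wz"] * x + W["Uz"] * h + W["bz"])  # update gate
    r = sigmoid(W["Wr"] * x + W["Ur"] * h + W["br"])  # reset gate
    h_cand = math.tanh(W["Wh"] * x + W["Uh"] * (r * h) + W["bh"])  # candidate state
    # Blend old state and candidate; z controls how much is overwritten
    return (1.0 - z) * h + z * h_cand
```

Because the new state is a convex combination of the old state and a bounded candidate, the hidden state remains in (-1, 1) when started there, which contributes to the stability of these units over long sensor sequences.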
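The K-fold protocol and the segment-distance error analysis summarized above can be expressed generically. The helpers below are a hypothetical sketch (the names and tolerance semantics are ours): `k_fold_indices` produces the K random splits, with each fold serving once as the held-out test set, and `damage_status_accuracy` counts a prediction as correct when the predicted damage segment lies within a given distance of the true one (tolerance = 0 reproduces exact-match accuracy).

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices once, then deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def damage_status_accuracy(y_true, y_pred, tolerance=0):
    """Fraction of predictions whose segment label lies within
    `tolerance` segments of the true damage location."""
    hits = sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred))
    return hits / len(y_true)
```

Evaluating with tolerance = 1 in addition to tolerance = 0 separates outright misses from near misses, mirroring the within-one-segment breakdown of the 5.2% error reported above.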
D.-E. Choe, H.-C. Kim and M.-H. Kim, Renewable Energy 174 (2021) 218–235
Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.