
Proceedings of the ASME 2017 36th International Conference on Ocean, Offshore and Arctic Engineering

OMAE2017
June 25-30, 2017, Trondheim, Norway

OMAE2017-62077

EFFICIENT ALGORITHM FOR DISCRETIZATION OF METOCEAN DATA INTO CLUSTERS OF ARBITRARY SIZE AND DIMENSION

Samuel Kanner (E-mail: skanner@principlepowerinc.com), Alexia Aubault, Antoine Peiffer, Bingbin Yu
Principle Power, Inc., Emeryville, California, 94608

ABSTRACT

In order to run a fatigue analysis on a floating structure, it is common practice among ocean engineers to rely upon a large set of test cases, each with a unique set of environmental conditions. For a specific test site, the issue remains of how to obtain a limited set of environmental conditions for these test cases, sometimes known as bins, which can accurately recreate the conditions. When considering a floating offshore wind turbine, it is necessary to obtain a timeseries of not only the wave conditions, but also the wind conditions (and perhaps current, if possible). Thus, it is common to have more than 5 dimensions in the timeseries (e.g., significant wave height, wave period, wave direction, wind speed, wind direction, etc.). The creation of bins in two dimensions is quite easily solved by creating an arbitrary grid and taking the mean of all the observations which fall in a specific cell. In higher dimensions, an N-dimensional cell is not easily visualized and so the resulting set of bins cannot easily be graphically represented. In this paper, an efficient, iterative algorithm is developed to convert N-dimensional metocean data into a set of discrete bins of arbitrary size. The algorithm works by setting a tolerance level on the number of observations that must be included in a cell in order to create a bin. If the population threshold is not met, the observations remain unbinned and another iteration is required. Generally, the population threshold can be a function of iteration number so that all observations will be binned. The algorithm can properly take into account extreme data by setting a tolerance level on the N-dimensional distance by which an observation can be included in a certain bin. A quality measure, q, is created to measure the level of representation of the original data by a set of bins, independent of the number of bins. Depending on the tolerance levels, the algorithm can be completed in seconds on a normal laptop for the available data set of 20 years with a 3-hour sampling rate. The observations and bins from a case study are shown as an example of how the bins can be created and visualized.

1 INTRODUCTION

This paper concerns the reduction of multivariate metocean data in order to perform a fatigue analysis of an ocean structure in a computationally effective manner. Normally, a wind turbine is only subject to bivariate environmental conditions (wind speed and direction, $U_w$ and $\theta_w$, respectively). Likewise, trivariate data is commonly used to study an offshore structure (significant wave height and period, as well as mean wave direction, $H_s$, $T_p$ and $\theta_m$). However, for structures such as fixed or floating offshore wind turbines (OWTs), it is not patently obvious which type

1 Copyright © 2017 ASME

Downloaded From: http://proceedings.asmedigitalcollection.asme.org/ on 01/04/2018 Terms of Use: http://www.asme.org/about-asme/terms-of-use


of environmental conditions will drive the design of a certain structure component. Thus, the consideration of multivariate data (number of dimensions, p = 5) is necessary in order to accurately simulate the environmental conditions. If the site in consideration has an enhanced bi-modal oceanic spectrum, then the number of dimensions may be increased to 8 (p = 8, wind-sea and swell wave height, period and direction). However, the wind-sea data are usually highly correlated to the wind data [1].

In the wind industry, design load cases (DLCs) are used to organize the various types of loading conditions a turbine may face throughout its design lifetime. As specified in the standards from DNV [2], the American Bureau of Shipping [3] and the International Electrotechnical Commission [4], the specific environmental conditions depend on the environmental data available. For short-term fatigue analysis, the structure must be simulated over a representative range of metocean conditions [5]. Rainflow counting algorithms can be used to post-process specific data sets, such as the tower base side-to-side or fore-aft bending moment, in order to estimate the damage for a specific environmental condition. After the damage from all representative environmental conditions (load cases) is calculated, a weighted summation is used to estimate the fatigue life of a structure given the probability of a load case occurring in a certain time frame [5].

Generally, wave buoy or hindcast data are the sources of metocean data at a specific site. For one of the sites considered in this study, the Dutch government has created an open-source database using hindcast data. The hindcast data is discussed in greater detail in Sec. 3. Typically these timeseries consist of decades of observations with an hourly or 3-hour sample rate, resulting in 100,000s of time steps (n corresponds to the number of time steps). Thus, for a given site the number of data is greater than 1M (p · n). Due to computational constraints, each of these n sea states cannot be simulated. In this paper we describe an efficient algorithm to find a number of load cases m, such that m << n, that represent all of the environmental conditions fairly well. The desired number of load cases depends on the precision of the results sought as well as the simulation technique and the computational resources available. Ideally, as the number of load cases increases, the fatigue life of a structure approaches the fatigue life of that structure if every single observation had been simulated and equally weighted.

Data clustering algorithms are common in many diverse data science applications, including the pharmaceutical industry [6]. Three common types of clustering algorithms are known as the K-means algorithm [7], self-organizing maps (SOM) [8] and the maximum dissimilarity algorithm (MDA) [9]. These algorithms are highly general and efficient ways to select a subset of m vectors (of length p) from a database of size n. The final m vectors are known as 'centroids' or 'clusters' since each represents a number of the original observations. These techniques start with a random guess for the initial position of the centroids (in p-dimensional space) and use various algorithms to iterate on the positions of these centroids. Despite the fact that the K-means algorithm was first described over fifty years ago, there is still much research on these clustering algorithms [10]. Recently the K-means algorithm was adapted for this purpose in [11]. Generally the SOM method results in centroids that are more dense, while the MDA algorithm results in centroids that are spaced as far as possible from each other. The K-means algorithm generally results in centroids that are somewhere in between these two methods. More information on these algorithms can be found in [6]. The algorithm presented in this paper borrows heavily from the K-means algorithm. The purpose of this paper is to present an algorithm which is specially tailored to subjective use in the analysis of floating platforms where wind conditions are relevant.

The paper is organized as follows. The clustering algorithm is described in the following section in both graphical and logical terms. A case study is presented next, using publicly available data from the Netherlands Enterprise Agency. The data is visualized in two dimensions with the number of clusters m equaling approximately 50, 100, 200, 500, 1000, and 2000 bins. Finally, an extension of the algorithm is presented where certain critical subspaces of the domain are weighted more heavily than others.

2 ALGORITHM

The algorithm is organized into two parts: an initial clustering and an extreme data clustering loop, as shown in Fig. 1. Each part will be discussed, along with the advantages and drawbacks of the algorithm.

2.1 Initial Clustering

During the initial stage the user inputs the data of size n × p and an initial grid. The grid is represented by p vectors, each of arbitrary length $q_j$, denoted by $Y_{j,k}$, where $j = 1, \ldots, p$, and $k = 1, \ldots, q_j$. For instance, the vector spanning the wave height dimension may only contain 5 elements ($q_1 = 5$), while the wave direction dimension may contain 13 ($q_2 = 13 \Rightarrow \Delta\theta_w = 30^\circ$). Thus, these vectors would form the input to a function such as ndgrid, which is commonly used in MATLAB to transform p 1-dimensional vectors into p-dimensional tensors. For the current discussion, a cell is defined as a p-dimensional volume, which is enclosed by the initial gridlines. Thus, given an initial grid with vectors of length $q_j$ there are $N_b$ cells, where

$$N_b = \prod_{j=1}^{p} q_j. \tag{1}$$

Generally, $q_j$ is $O(10^1)$, so that with 8 dimensions, this number can become quite large, $O(10^8)$. A 32-bit machine could not store an array of doubles this size. In order to prevent this from occurring, a for loop is implemented over the p dimensions so



that the largest array created is of size $n \cdot \max_j(q_j) - 1$. To ensure that no dimension is weighted disproportionately, the data is normalized using the min and max of each dimension so that the normalized data and gridlines $\bar{X}_i, \bar{Y}_{j,q_j} \in [0, 1]$.

Figure 1: Graphical representation of algorithm including extreme observation clustering loop. The flowchart contains the following steps:
  Data: $X_i = \{H_{s,i}, T_{p,i}, \theta_{m,i}, U_{w,i}, \theta_{w,i}\} = \{x_{1,i}, x_{2,i}, \ldots, x_{p,i}\}$, $i = 1, 2, \ldots, n$.
  Initialize Grid: gridlines $Y_{j,1}, \ldots, Y_{j,q_j}$ within the range of the data in each dimension, $j = 1, 2, \ldots, p$.
  Initial clustering:
    Normalization: observations and gridlines are scaled to [0, 1].
    Assign Bin: assign all observations to a window $w_k$, where $k = 1, \ldots, N_b$.
    Calculate Centroid: the centroid of window $w_k$ is the mean of the normalized observations in $w_k$.
    Population Check: if $N_k \geq N_{tol}$, all observations in $w_k$ are binned; else each observation remains unbinned.
  Are there unbinned observations? If NO: Denormalization, Output, Done. If YES: enter the extreme data clustering loop.
  Extreme data clustering loop:
    Distance Check: for all unbinned observations, $d_i$ is $\alpha$ times the minimum distance to a centroid; if $d_i < d_{tol}$, assign the observation to the closest centroid.
    Update Centroids: each centroid is recomputed as the mean of the normalized observations it represents.
    Update Distance and/or Population Tolerance(s): $d_{tol} = \beta_d d_{tol}$; $N_{tol} = \beta_N N_{tol}$.

A two-dimensional example is shown in Fig. 2. Here, the observational data is represented by points and the grid lines by dashed lines. The gridlines are evenly spaced with $q_1 = q_2 = 5$. All of the data and gridlines have been normalized such that they span [0, 1]. It is apparent that many cells, such as those in the lower-right corner, are empty. Other cells, such as those in the upper-right corner, only contain relatively few observations. The purpose of the extreme clustering algorithm is to strike a balance between the number of clusters created and the distance between an observation and its associated cluster.

2.2 Extreme Clustering

In the extreme clustering loop, clusters are created only if they pass a population threshold test. On a given iteration, the population tolerance is given as a percentage of the total number of observations. If a cell does not contain enough observations, then a cluster is not formed. This concept can be seen in Fig. 3(a), where only 3 cells have passed the population threshold test to form clusters or bins. These bins are shown by the large circles with a number corresponding to the number of observations they represent. They are located at the average position, or centroid, of all the observations they represent. The black lines show observations outside of the clusters' respective cells that pass the distance test. In p dimensions, a distance is defined as,

$$d_k = \alpha \left( \sum_{j=1}^{p} \left( \hat{x}_j - \bar{x}_{j,k} \right)^m \right)^{1/m} \tag{2}$$



where $\hat{X}$ represents the position of the nearest cluster to observation k. The $\alpha$ term will be discussed in Sec. 4, but for now should be considered to be unity. The Euclidean distance is a case of Eq. (2) where m = 2. On the following iteration, the new position of the cluster depends on the observations that have passed the distance threshold test.

During each subsequent iteration either the population threshold $N_{tol}$ or the distance threshold $d_{tol}$ (or both) is reduced by a factor $\beta_N$ or $\beta_d$. The iterations in Fig. 3 show the case where the population threshold is reduced, while the distance threshold remains constant. As shown in Fig. 3(h), by iteration 7, all observations except one have been accounted for. During this iteration, the population threshold drops to zero so that a cluster only representing one observation is formed (as shown by the circle with a '1' inside in Fig. 3(h)).

Figure 2: Two-dimensional plot of representative normalized data with initial gridding shown in dashed lines. (Axes: normalized x-distance [-] versus normalized y-distance [-].)

2.3 Pseudocode

Pseudocode is a method of conveying the logic behind an algorithm without becoming impeded by the exact implementation. The pseudocode for the clustering algorithm presented in this paper is shown in Algorithm 1. The main difficulty arises in assigning n observations into $N_b$ cells, where n is $O(10^5)$ to $O(10^6)$ and $N_b$ is $O(10^5)$ to $O(10^8)$. This problem is solved in the first for loop in Algorithm 1 by treating each of the dimensions individually. Each n × 1 observation vector (e.g., the entire time series of the significant wave heights) is repeated $1 \times (q_j - 1)$ times. Meanwhile, the gridlines, which are of size $1 \times q_j$, are repeated n × 1 times. A simple comparison test can determine the assignment of each observation in a cell for a given dimension. The cell assignment information is stored until all dimensions have been analyzed, whereby a final cell assignment for each observation can be made. By reducing the maximum size of the array created [n × (p − 1)], the algorithm can efficiently assign each observation to a cell on a 32-bit or 64-bit machine. Once the observations are assigned, a population tolerance test can be administered to determine which cells become 'bins' (correspondingly, which observations become 'binned').

In the second for loop in Algorithm 1, each of the unbinned observations is treated individually. The distance to all centroids, as given in Eq. (2), is calculated. The minimum distance for each observation is compared to a distance tolerance, which can change on each iteration. If the observation passes the distance test, then the position of that cluster is updated on the next iteration.

Algorithm 1: Basic clustering algorithm for multivariate metocean data.
    Data: Import timeseries for specific site (n × p)
    Result: m clusters of p-dimension, where m < n
    Grid initialization;
    Set all observations unbinned;
    Set initial values for tolerances N_tol, d_tol;
    Set iteration factors β_d, β_N ∈ (0, 1];
    while number of unbinned observations > 0 do
        for j = 1 : p do
            Assign all observations to a cell;
            if N_j,obs > N_tol then
                count all observations in cell j as 'binned';
                calculate centroid of observations ≡ bin;
            end
        end
        for k = 1 : N_unbinned do
            Calculate distance d_k to all centroids;
            if min(d_k) < d_tol then
                move observation into closest bin;
                calculate the centroid of the bin;
            end
        end
        d_tol = β_d d_tol;
        N_tol = β_N N_tol;
    end

2.4 Discussion

The main drawback to this routine, as compared to the K-means algorithm, is the subjectivity in assigning the initial gridlines. The K-means algorithm is initiated with m clusters randomly dispersed throughout the domain of the observational



data. The bias in the initial gridding of the data can be seen in Fig. 3(h), where the final centroids are all located near the center of each gridded cell (that has observational data). Changing the initial grid would have a large effect on the final position of the clusters. As [6] notes, however, the random initialization of the K-means or the SOM classifications also has a great influence on the final centroids.

Figure 3: Visualization of sample data for each iteration of clustering algorithm with 11 final clusters ($\beta_d = 1$, $\beta_N = 0.25$). Panels (a) to (h) show iterations 0 to 7, respectively. (Axes: normalized x-distance [-] versus normalized y-distance [-]; the number inside each circle is the number of observations in that bin.)

Another drawback to this algorithm is the fact that the final number of clusters, m, cannot be determined a priori. That is, the tolerances, $N_{tol}$ and $d_{tol}$, or the initial gridlines must be modified until the desired final number of clusters is achieved. However, after only a few rounds of trial and error, the user can generally obtain m ± 10 clusters fairly quickly.

One of the advantages of the algorithm is the ability to ensure clusters that include a wide range of values over single dimensions, much like the MDA. In Sec. 3, it is shown that a bin can be formed due to a single outlier. In this example, this outlier is accounted for in the final centroids, which may (or may not) have a significant impact on the fatigue life. A discussion on how the clusters can be modified in order to meet the user's specifications can be found in Sec. 4.
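The per-dimension cell assignment described in Sec. 2.3 can be illustrated with a short Python/NumPy sketch. This is not the authors' implementation (the paper refers to MATLAB). The function name `assign_cells`, the use of `np.searchsorted` in place of the repeated-array comparison, and the convention that a cell is the interval between two adjacent gridlines are all assumptions made for this example. The point it shows is that each observation can receive a single integer cell label from a loop over the p dimensions, so the full p-dimensional grid is never stored.

```python
import numpy as np

def assign_cells(X, gridlines):
    """Label each observation with an integer cell index, one dimension at a time.

    X         : (n, p) array of observations, assumed already normalized to [0, 1].
    gridlines : list of p sorted 1-D arrays spanning [0, 1] (hypothetical layout).
    """
    n, p = X.shape
    labels = np.zeros(n, dtype=np.int64)
    stride = 1
    for j in range(p):                      # loop over dimensions, never over cells
        edges = np.asarray(gridlines[j])
        n_int = len(edges) - 1              # intervals between adjacent gridlines
        # Interval index of each value; clipping keeps x == max in the last interval.
        idx = np.clip(np.searchsorted(edges, X[:, j], side="right") - 1, 0, n_int - 1)
        labels += idx * stride              # mixed-radix encoding of the p indices
        stride *= n_int
    return labels

rng = np.random.default_rng(0)
X = rng.random((1000, 5))                   # synthetic, already-normalized data
grid = [np.linspace(0.0, 1.0, 6)] * 5       # 6 gridlines, i.e. 5 intervals per dimension
labels = assign_cells(X, grid)
occupied, counts = np.unique(labels, return_counts=True)
print(len(occupied), "occupied cells out of", 5 ** 5)
```

Only the occupied cells appear in `occupied`, so the population test can be applied to `counts` directly; the largest temporary array in this sketch has length n.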

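The loop in Algorithm 1 can also be sketched end to end in Python/NumPy. The following is a minimal illustration under stated assumptions, not the authors' code: the function name `cluster_metocean`, the evenly spaced grid with `n_lines` gridlines per dimension, the default tolerances, the `>=` form of the population test, the absolute value inside the distance, and the `max_iter` safeguard are all choices made for this example. The weighting term α of Eq. (2) is taken as unity.

```python
import numpy as np

def cluster_metocean(X, n_lines=6, n_tol=0.05, d_tol=0.1, beta_n=0.25, beta_d=1.0,
                     order=2, max_iter=50):
    """Sketch of the binning loop; returns (centroids in original units, bin labels)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    Xn = (X - lo) / span                                # normalize each dimension to [0, 1]
    # Integer cell label per observation on an evenly spaced grid (assumed layout).
    idx = np.minimum((Xn * (n_lines - 1)).astype(np.int64), n_lines - 2)
    cell = idx @ ((n_lines - 1) ** np.arange(p))
    labels = np.full(n, -1, dtype=np.int64)             # -1 marks an unbinned observation
    n_bins = 0
    for _ in range(max_iter):
        unbinned = labels < 0
        if not unbinned.any():
            break
        # Population check: n_tol is a fraction of the total number of observations.
        cells, counts = np.unique(cell[unbinned], return_counts=True)
        for c in cells[counts >= n_tol * n]:
            labels[unbinned & (cell == c)] = n_bins
            n_bins += 1
        if n_bins > 0:
            # Centroids are recomputed from all members, including earlier distance-test joins.
            cent = np.array([Xn[labels == b].mean(axis=0) for b in range(n_bins)])
            rest = np.flatnonzero(labels < 0)
            if rest.size:
                # Distance check, Eq. (2) with alpha = 1 (abs() added so odd orders work).
                diff = np.abs(Xn[rest, None, :] - cent[None, :, :])
                d = (diff ** order).sum(axis=2) ** (1.0 / order)
                near = d.argmin(axis=1)
                ok = d[np.arange(rest.size), near] < d_tol
                labels[rest[ok]] = near[ok]
        n_tol *= beta_n                                 # relax the population tolerance
        d_tol *= beta_d                                 # optionally change the distance tolerance
    cent = np.array([Xn[labels == b].mean(axis=0) for b in range(n_bins)]).reshape(-1, p)
    return cent * span + lo, labels                     # denormalize the centroids

rng = np.random.default_rng(1)
# Synthetic two-dimensional data standing in for (Hs, Tp); not taken from the case study.
X = np.column_stack([rng.rayleigh(2.0, 2000), rng.normal(8.0, 2.0, 2000)])
centroids, labels = cluster_metocean(X)
print(centroids.shape[0], "bins;", int((labels < 0).sum()), "observations left unbinned")
```

With `beta_n < 1` the population threshold eventually falls below one observation, so every occupied cell becomes a bin and the loop ends. If both factors are set to 1 the loop may stop at `max_iter` with some labels still at -1. The distance matrix here is formed in one piece, which is convenient for a sketch but would likely need to be chunked for long hindcast records.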


Iter.  Blue  Pink  D. Green  Grey-Blue  Yellow  Teal  Maroon  D. Blue  Red  Hazel  L. Green
0      [-]   [-]   [-]       [-]        144     138   [-]     [-]      293  [-]    [-]
1      [-]   [-]   [-]       [-]        144     177   [-]     [-]      329  [-]    [-]
2      32    23    [-]       [-]        144     179   22      [-]      333  [-]    30
3      32    25    [-]       [-]        144     180   28      [-]      333  [-]    30
4      32    26    [-]       5          144     180   33      [-]      333  7      30
5      32    26    4         6          144     180   33      [-]      333  7      30
6      32    26    4         6          144     180   33      [-]      333  7      30
7      32    26    4         6          144     180   33      1        333  7      30

Table 1: Number of observations in each color-coded bin shown in Fig. 3, per iteration. The order of the columns corresponds to the bins starting in the upper-left corner and moving to the right, and then down row-by-row.

3 CASE STUDY

Metocean data from the Borssele Wind Farm Zone (BWFZ) were used to show an example of the clustering algorithm. The Netherlands Enterprise Agency (RVO.nl) has established a goal to have 8 GW of offshore wind power capacity by 2023, either in an operational or contracted stage. In order to meet this aggressive target, the Dutch government established four wind farm sites on the Dutch continental shelf. The Dutch government contracted Deltares to provide a metocean study for each wind site that would be made public. The metocean data for Site III, on which this example is based, can be found in [12]. The site is shown in Fig. 4. The wind conditions are based on high-resolution HARMONIE data from 1979 to 2013. The wave data are based on 20-year hindcast data (1992-2011), generated using a third-generation, phase-averaging, shallow-water wave model, named SWAN. The current conditions are estimated using hindcast simulations with the Delft3D-FLOW model for the same time period as the waves. Only data from this time is provided by Deltares in [12]. The data provided only consists of unimodal wave spectra (i.e., the wave data is not split into wind-wave seas and swell seas). However, it is common to split the wave data into a bimodal spectrum using information from the wind hindcast, with an algorithm such as that proposed in [13]. The metocean conditions for Site III are determined at (51.694424N, 2.961111E), as shown by the + symbol in Fig. 4, which has a water depth of 35.1 m [12]. The current data is not taken into account in the binning analysis, but is assumed to be a linear function of the wind speed.

Figure 4: Map of case-study site off the coast of Netherlands. The data is taken from Site III. (The map spans roughly 51-52 °N and 3-4 °E and marks Sites I to IV.)

3.1 Results

Two-dimensional results for using this algorithm to find 50 five-dimensional clusters are shown in Figs. 5 and 6. In Fig. 5(b), a cluster has formed over a single observation at 4.2 m $H_s$ and 3.1 m/s wind speed. In general, it is helpful to keep graphically outlying data since they may have undue influence on the fatigue life of the structure. While the algorithm successfully creates a cluster at this low wind speed, it fails to create any clusters with $H_s$ > 6 m. The lack of clusters in this range has to do with the initial grid for this data set. The bias of the initial grid can also be seen in Fig. 5(b), where there is a lack of clusters from 8-10 m/s. The absence of clusters in certain wind speed ranges can be



troublesome when calculating the fatigue life from such a limited set of data. For instance, some wind speed ranges may cause more damage than others due to resonance interactions. Not including these data can result in unconservative estimates for a fatigue life. If these large absences are noticed, it is prudent to use a data set with a larger number of clusters, which generally results in a larger subspace of the original domain. The ability to use more clusters in a fatigue analysis allows for a much more accurate and representative set of the data. This feature can be seen in Figs. 7 and 8, where wind speed as a function of wave height is shown for clusters of approximate size of 100, 200, 500, 1000 and 2000. In Figs. 5(b) and 7(a) there are extremely few clusters between 0-3 m/s, 8-10 m/s, and 16-18 m/s. This issue is nearly resolved in Fig. 7(b) when the number of bins has been increased to nearly 200 and by Fig. 8(a), the distribution of clusters looks quite continuous.

Figure 5: Selected data for algorithm with 50 bins. Panels: (a) $H_s$ vs $T_p$; (b) $U_w$ vs $H_s$.

Figure 6: Selected directional data for algorithm with 50 bins. Panels: (a) $H_s$ vs $\theta_m$; (b) $U_w$ vs $\theta_w$.

3.2 Quality Measure

For each set of bins a 'quality' measure can be used to describe how closely the clusters are located to the observations. The quality of a set of bins q is defined as,

$$q = 100 \left[ 1 - \frac{1}{n} \sum_{j=1}^{n} \left( \frac{1}{p} \sum_{k=1}^{p} \left| \hat{X}_k - X_{j,k} \right| \right) \right], \tag{3}$$

where the taxicab (Manhattan) distance is used instead of the Euclidean distance. This definition of the distance function penalizes clusters that are far away from a given observation in a single dimension. Increasing q corresponds to an increase in the 'quality' of a set of bins, in the sense that more observations are closer to their associated clusters. The q values for the bins presented in Sec. 3.1 are shown in Table 2.
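As a worked illustration of Eq. (3), the quality measure can be evaluated in a few lines of Python/NumPy. This is a sketch under assumptions: the function name `quality` and its arguments are hypothetical, and the distances are taken on data normalized by the min and max of each dimension, which the text does not state explicitly for Eq. (3) but which matches the normalization used elsewhere in the algorithm.

```python
import numpy as np

def quality(X, centroids, labels):
    """Eq. (3): 100 * (1 - mean normalized taxicab distance from observations to their bins)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    Xn = (X - lo) / span                                  # normalized observations
    Cn = (np.asarray(centroids, dtype=float) - lo) / span # centroids on the same scale
    per_obs = np.abs(Cn[labels] - Xn).mean(axis=1)        # (1/p) * sum over the p dimensions
    return 100.0 * (1.0 - per_obs.mean())                 # (1/n) * sum over the n observations

# Four corner points of the unit square, used only as a hand-checkable example.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
q_single = quality(X, [[0.5, 0.5]], np.zeros(4, dtype=int))  # one bin at the center
q_exact = quality(X, X, np.arange(4))                        # one bin per observation
print(q_single, q_exact)
```

With a single bin at the center, every observation is 0.5 away in each normalized dimension, so q is 50. With one bin per observation (m = n), q reaches its upper bound of 100.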



Nbins    q
50       93.617
101      93.720
193      94.615
565      95.625
981      96.329
1981     96.971

Table 2: Measure of quality of bins as function of number of bins.

Figure 7: Wind speed as a function of wave height for clusters of various size. Panels: (a) 101 bins; (b) 193 bins.

Figure 8: Wind speed as a function of wave height for clusters of various size. Panels: (a) 565 bins; (b) 981 bins; (c) 1981 bins.

4 WEIGHTING FUNCTION EXTENSION

This section examines the purpose and consequences of varying the $\alpha$ term in Eq. (2). This term provides the user with



more flexibility when determining which observations are 'insignificant' and can be clustered, and which observations should receive greater weight. Clearly, this process is subjective and should only be undertaken with caution. Figure 9 shows the first step in the algorithm where $\alpha = \alpha(x, y)$ is displayed with colored contours. For all observations that are within a contour of 0.5, these observations 'see' the other clusters at a minimized distance. Alternatively, for $\alpha \gg 1$, the observations tend to not be clustered during the initial iterations and are more likely to be associated to clusters that are more localized. This functionality can be useful if the user would like to ensure that a certain subset of observations is accounted for properly. However, the interpolation scheme used to calculate the $\alpha$ value at a specific observation can be poorly behaved. The final clusters are shown in Fig. 10. Compared to the final positions of the clusters in Fig. 3(h), they are quite similarly positioned. The weighting function may be useful around areas of resonance interactions. When resonant frequencies of parts of the structure (e.g., the wind turbine's tower) are excited by certain environmental phenomena, it is necessary that the final clusters include these conditions. Since the fatigue damage may be disproportionately due to these specific environmental conditions, the designer must include results from these simulations when calculating the overall damage to the structure.

Figure 9: Visualization of $\alpha$ function and first step with weighted algorithm. (Iteration 0; contours of $\alpha$ range from 0.5 to 4; axes: normalized x-distance [-] versus normalized y-distance [-].)

Figure 10: Visualization of final step of weighted algorithm. (Iteration 5; contours of $\alpha$ range from 0.5 to 4; axes: normalized x-distance [-] versus normalized y-distance [-].)

5 CONCLUSIONS

An efficient algorithm has been presented to cluster many multivariate metocean observations into a small number of load cases, which can be used along with a simulation tool to estimate the fatigue life of an ocean structure. The algorithm is similar to the popular K-means clustering algorithm used in many applications, including data science and the pharmaceutical industry. One major difference, however, is that an initial grid is specified by the user, instead of using a number of random seeds to initialize the algorithm. The subjectivity introduced in this initialization scheme can be a cause for concern, as it reduces the generality of the scheme. An informed user, however, can more easily alter the final locations of the clusters to ensure they are located in structurally relevant environmental conditions. Furthermore, a p-dimensional weighting function can be used to modify the distance function that is used to attract observations to existing clusters. The creation and interpolation of a p-dimensional function can create numerical issues and significantly slow down the algorithm. Qualitatively, it was shown in this paper that for a hindcast timeseries of 20 years, with a 3-hour sample rate and 5 dimensions, around 200 clusters can cover all observations quite adequately. In an internally conducted study, the fatigue life was shown to converge around this value as well.

Each ocean structure has a unique sensitivity to specific ocean conditions. Therefore, it is best practice to run a convergence study on the fatigue life of a structure for a wide range of bin sizes. Most likely, the values should approach the fatigue life of a structure if every single observation is used in the analysis (m = n). After this study has been performed, the bin size with the least number of bins that resulted in a fatigue life close to the final value can be used for design iterations.

REFERENCES

[1] Forristall, G. Z., and Cooper, C. K., 2016. “Metocean Extreme and Operating Conditions”. In Springer Handbook of Ocean Engineering, M. R. Dhanak and N. I. Xiros, eds. Springer International Publishing, Cham, pp. 47–76.



[2] DNV, 2013. Design of Floating Wind Turbine Structures. Offshore
Standard DNV-OS-J103, Det Norske Veritas AS, June.
[3] ABS, 2014. Guide for Building and Classing Floating Offshore
Wind Turbine Installations, July.
[4] IEC, 2009. International Standard for Wind Turbines-Part 3: De-
sign requirements for offshore wind turbines, IEC 61400-3:2009.
[5] Stewart, G. M., Robertson, A., Jonkman, J., and Lackner, M. A.,
2016. “The creation of a comprehensive metocean data set for
offshore wind turbine simulations”. Wind Energy, 19(6), June,
pp. 1151–1159.
[6] Camus, P., Mendez, F. J., Medina, R., and Cofiño, A. S., 2011.
“Analysis of clustering and selection algorithms for the study of
multivariate wave climate”. Coastal Engineering, 58(6), June,
pp. 453–462.
[7] Hartigan, J. A., and Wong, M. A., 1979. “Algorithm AS 136: A
K-Means Clustering Algorithm”. Journal of the Royal Statistical
Society. Series C (Applied Statistics), 28(1), pp. 100–108.
[8] Kohonen, T., 1990. “The self-organizing map”. Proceedings of
the IEEE, 78(9), Sept., pp. 1464–1480.
[9] Snarey, M., Terrett, N. K., Willett, P., and Wilton, D. J., 1997.
“Comparison of algorithms for dissimilarity-based compound se-
lection”. Journal of Molecular Graphics and Modelling, 15(6),
Dec., pp. 372–385.
[10] Jain, A. K., 2010. “Data clustering: 50 years beyond K-means”.
Pattern Recognition Letters, 31(8), June, pp. 651–666.
[11] Vogel, M., Hanson, J., Fan, S., Forristall, G., Li, Y., Fratantonio,
R., and Jonathan, P., 2016. “Efficient Environmental and Structural
Response Analysis by Clustering of Directional Wave Spectra”. In
OTC-27039-MS, Offshore Technology Conference.
[12] Riezebos, H. J., de Graaff, R., and Schouten, J., 2015. Metocean
study for the Borssele Wind Farm Zone. Tech. Rep. 1210467-000-
HYE-0012, Deltares, Feb.
[13] Hanson, J. L., and Phillips, O. M., 2001. “Automated Analysis of
Ocean Surface Directional Wave Spectra”. Journal of Atmospheric
and Oceanic Technology, 18(2), Feb., pp. 277–293.

