A New Pattern Recognition Methodology For Classification of Load Profiles For Ships Electric Consumers

Journal of Marine Engineering & Technology
ISSN: 2046-4177 (Print) 2056-8487 (Online) Journal homepage: https://www.tandfonline.com/loi/tmar20
G J Tsekouras, I K Hatzilau & J M Prousalidis
To cite this article: G J Tsekouras, I K Hatzilau & J M Prousalidis (2009) A new pattern recognition
methodology for classification of load profiles for ships electric consumers, Journal of Marine
Engineering & Technology, 8:2, 45-58, DOI: 10.1080/20464177.2009.11020222
To link to this article: https://doi.org/10.1080/20464177.2009.11020222
Published online: 01 Dec 2014.
GJ Tsekouras (1), IK Hatzilau (1), JM Prousalidis (1,2)
(1) Hellenic Naval Academy, Department of Electrical Engineering and Computer Science
(2) National Technical University of Athens, School of Naval Architecture and Marine Engineering, Division of Marine Engineering
In this paper a new pattern recognition methodology is presented for the classification of the daily chronological load curves of ship electric consumers (equipment) and the determination of the respective typical load curves of each one of them. It is based on pattern recognition methods, such as k-means, adaptive vector quantisation, fuzzy k-means, self-organising maps and hierarchical clustering, which are theoretically described and properly adapted. The parameters of each clustering method are properly selected by an optimisation process, which is separately applied for each one of six adequacy measures: the error function, the mean index adequacy, the clustering dispersion indicator, the similarity matrix indicator, the Davies-Bouldin indicator and the ratio of within cluster sum of squares to between cluster variation. As a study case, this methodology is applied to a set of consumers of Hellenic Navy MEKO type frigates.
AUTHORS’ BIOGRAPHIES

Dr GJ Tsekouras (Electrical & Computer Engineer from NTUA/1999, Civil Engineer from NTUA/2004, PhD in Electrical Engineering from NTUA/2006). Adjunct lecturer at the Hellenic Naval Academy, dealing with power system analysis, forecasting and pattern recognition methodologies.

Prof Dr-Ing IK Hatzilau (Electrical & Mechanical Engineer from NTUA/1965, Dr-Ing from University of Stuttgart/1969). After a few years in industry, he joined the Academic Staff of the Dept of Electrical Engineering and Computer Science in the Hellenic Naval Academy, dealing with marine electrical and control engineering issues. He is the representative of the Hellenic Navy in NATO AC/141(MCG/6)SG/4, dealing with warship electric systems.

Dr J Prousalidis (Electrical Engineer from NTUA/1991, PhD from NTUA/1997). Assistant Professor at the School of Naval and Marine Engineering of the National Technical University of Athens, dealing with electric energy systems and electric propulsion schemes on shipboard installations.

INTRODUCTION

The ‘load demand profile’, ie, the chronological energy demand curve, is a ‘characteristic parameter’ of an electric consumer (equipment) of a ship electric network, the value of which varies depending on several factors such as ship type, state, operating modes and mission. Fig 1 shows an indicative segment of a daily chronological curve of the refrigeration plant of an HN MEKO type frigate in ship-condition ‘SHORE’, based on records taken every 1 min [1].

The load demand profile is of primary importance in performing several studies such as load estimation, power sources selection, power cable rating, short circuit analysis,
No. A14 2009 Journal of Marine Engineering and Technology 45

load shedding, harmonics, modulation etc. The representative load demand profiles of each consumer can be obtained from a classification procedure of its chronological load curves in typical time intervals (also called ‘typical days’). This classification can be conducted by pattern recognition techniques [2-6], such as:

- the ‘modified follow the leader’ [2,4]
- the k-means [3-4]
- the adaptive vector quantisation [3]
- the fuzzy k-means [3-5]
- the self-organising map [4]
- the hierarchical methods [3-4].

Regarding adequacy measures commonly used, these can be:

- the mean square error [3,5]
- the mean index adequacy [2-3]
- the clustering dispersion indicator [2-4]
- the similarity matrix indicator [3]
- the Davies-Bouldin indicator [3-5]
- the modified Dunn index [4]
- the scatter index [4]
- the ratio of within cluster sum of squares to between cluster variation [3].

Fig 1: Indicative chronological energy demand curve of an electric consumer of HN MEKO type frigate

Alternatively, the load curves’ classification of consumers can be performed by data mining [7], wavelet packet transformation [8] and Fourier analysis. The last ones are also used with simplified clustering models. Conventional tools, like statistical techniques [9], need the knowledge of the ‘typical days’, which can be defined by experienced ship’s personnel, eg, chief engineers.

Evidently, one of the major issues of this classification is the definition of the ‘typical day’. In the continental power systems the load curve’s time interval is usually a day, for a study time period from a few weeks [2] to one year [3]. In ships’ power systems the corresponding time interval is not defined by any standards, while the total study period is necessarily fairly limited (varying from a few days to one month). In this paper the typical time interval is assumed to be a day.

In a previous paper [11], the authors have thoroughly discussed and highlighted the significance of developing such a methodology in terms of exploiting its results in a series of applications and studies of ship systems. In brief, these results, ie, the typical chronological load curves of each consumer, can be used as input information in:

• the formalisation of the consumer’s behaviour and the corresponding clustering using the representative load curve of each consumer;
• the design of the ship’s electric power system, estimating the respective total load demand more accurately and, hence, selecting the generators properly;
• the operation of the ship’s power system, achieving more precise short-term load forecasting, increasing the respective reliability and decreasing the respective operation cost;
• the improvement of power factor, taking into consideration the respective reactive load curves;
• the load estimation after the application of demand side management programs for each specific consumer, as well as the feasibility studies of the energy efficiency which normalise the total load demand and improve the total load factor;
• the improvement of the operation of the automatic battle management and load shedding systems, because the automatic supply of the critical consumers in each operating mode is facilitated in case of a power system fault, based on the available generators, the healthy part of the power distribution system and the load demand of each consumer.

The objective of this paper is to present a new methodology for the classification of the daily chronological load curves of ship electric consumers. This method is based on the so-called unsupervised neural networks. More specifically, for each consumer the corresponding set of load curves is organised into well-defined and separate classes, in order to successfully describe the respective demand behaviour without any intervention by the user in the classification procedure. The proposed methodology compares the results obtained from certain clustering techniques (k-means, adaptive vector quantisation (AVQ), fuzzy k-means, self-organising maps (SOM) and hierarchical methods) using six adequacy measures (mean square error, mean index adequacy, clustering dispersion indicator, similarity matrix indicator, Davies-Bouldin indicator, ratio of within cluster sum of squares to between cluster variation). The main points of this methodology are:

• the estimation of the representative typical daily load profiles for each consumer;
• the modification of the clustering techniques for this kind of classification problem, such as the appropriate weights initialisation for the k-means and fuzzy k-means;
• the proper parameters calibration, such as the amount of fuzziness for the fuzzy k-means, in order to fit the classification needs;
• the comparison of the clustering algorithms’ performance for each one of the adequacy measures;
• the selection of the proper adequacy measure.

Finally, the results of the application of the developed methodology are thoroughly presented for the case study of six electric consumers of Hellenic Navy MEKO type frigates, the chronological load curves of which have been measured within the framework of a research project [1].

PROPOSED PATTERN RECOGNITION METHODOLOGY FOR LOAD CURVES CLASSIFICATION OF SHIPS ELECTRIC CONSUMERS

The classification of daily chronological load curves of each ship electric consumer is achieved by applying the pattern recognition methodology shown in Fig 2.

Fig 2: Flow diagram of pattern recognition methodology for the classification of daily chronological load curves of ships electric consumer

The main steps are:

1. Data and features selection: Using electronic meters or ship energy management system for main consumers, the active and reactive power values are recorded (in kWh and kVArh) for each period in steps of 1 min, 15 min, etc. The chronological load curves for each individual consumer are determined for each study period (week, month).
2. Consumers’ clustering using a priori indices: Consumers can be characterised by their voltage level (high, medium, low), installed power, power factor, load factor, criticality according to ship’s operating mode etc. These indices are not necessarily related to the load curves. They can be used however for the pre-classification of consumers. It is noted that the load curves of each consumer are normalised using the respective minimum and maximum loads of the period under study.
3. Data preprocessing: The load curves of each consumer are examined for normality, in order to modify or delete the values that are obviously wrong (noise suppression). If necessary, a preliminary execution of a pattern recognition algorithm is carried out, in order to trace erroneous measurements or networks faults,

which, if uncorrected, will reduce the number of the useful typical time intervals for a constant number of clusters.
4. Typical load curves clustering for each consumer: For each consumer, a number of clustering algorithms (k-means, adaptive vector quantisation, fuzzy k-means, self-organising maps and hierarchical methods) is applied. Each algorithm is trained for the set of load diagrams and evaluated according to six adequacy measures. The parameters of the algorithms are optimised, if necessary. The developed methodology uses the clustering methods providing the most satisfactory results. This process is repeated for the total set of consumers under study. Special consumers, such as seasonal or emergency ones (eg, machine tools, fire-pumps, etc) are identified. These results can be combined with the ships’ operating mode. At this stage, the size of time interval (1, 2, 3, 4, 6 hours, half or one day) could be investigated, too.

The typical load curves of consumers used are selected by choosing the type of typical time interval (such as the most populated one, the time interval with the peak demand load, etc).

MATHEMATICAL MODELLING OF CLUSTERING METHODS AND CLUSTERING VALIDITY ASSESSMENT

General

In the study case of the chronological typical load curves of ship electric consumers, a number of $N$ analytical daily load curves is given. The main target is to determine the respective sets of days and load patterns. Generally, $N$ is defined as the population of the input (or training) vectors, which are going to be clustered. $\vec{x}_\ell = (x_{\ell 1}, x_{\ell 2}, \ldots, x_{\ell i}, \ldots, x_{\ell d})^T$ symbolises the $\ell$-th input vector and $d$ its dimension, which equals 1440 (the load measurements are taken every minute). The corresponding set is given by $X = \{\vec{x}_\ell : \ell = 1, \ldots, N\}$. It is worth mentioning that $x_{\ell i}$ are normalised using the higher and lower values of all elements of the original input patterns set, in order to obtain better results from the application of clustering methods.

Each classification process makes a partition of the initial $N$ input vectors to $M$ clusters. The $j$-th cluster has a representative, which is the respective load profile and is represented by the vector $\vec{w}_j = (w_{j1}, w_{j2}, \ldots, w_{ji}, \ldots, w_{jd})^T$ of $d$ dimension. Vector $\vec{w}_j$ expresses the cluster $j$ centre. The corresponding set is the classes set, which is defined by $W = \{\vec{w}_k, k = 1, \ldots, M\}$. The subset of input vectors $\vec{x}_\ell$ which belong to the $j$-th cluster is $\Omega_j$ and the respective population of load diagrams is $N_j$. For the study and evaluation of classification algorithms the following distance forms are defined:

1. the Euclidean distance between the $\ell_1$-th and $\ell_2$-th input vectors of the set $X$:

$$ d(\vec{x}_{\ell_1}, \vec{x}_{\ell_2}) = \sqrt{\frac{1}{d} \sum_{i=1}^{d} \left(x_{\ell_1 i} - x_{\ell_2 i}\right)^2} \tag{1} $$

2. the distance between the representative vector $\vec{w}_j$ of the $j$-th cluster and the subset $\Omega_j$, calculated as the geometric mean of the Euclidean distances between $\vec{w}_j$ and each member of $\Omega_j$:

$$ d(\vec{w}_j, \Omega_j) = \sqrt{\frac{1}{N_j} \sum_{\vec{x}_\ell \in \Omega_j} d^2(\vec{w}_j, \vec{x}_\ell)} \tag{2} $$

3. the infra-set mean distance of a set, defined as the geometric mean of the inter-distances between the members of the set, ie, for the subset $\Omega_j$:

$$ \hat{d}(\Omega_j) = \sqrt{\frac{1}{2 N_j} \sum_{\vec{x}_\ell \in \Omega_j} d^2(\vec{x}_\ell, \Omega_j)} \tag{3} $$

The basic characteristics of the five clustering methods being used are the following.

k-means model

The k-means method is the simplest hard clustering method, which gives satisfactory results for compact clusters [11]. The k-means clustering method groups the set of the $N$ input vectors to $M$ clusters using an iterative procedure. The respective steps of the algorithm are the following:

1. Initialisation of the weights of the $M$ clusters is determined. In the classic model a random choice among the input vectors is used [4]. In the developed algorithm the $w_{ji}$ of the $j$-th centre is initialised as:

$$ w_{ji}^{(0)} = a + b \cdot (j - 1)/(M - 1) \tag{4} $$

where $a$ and $b$ are properly calibrated parameters.
2. During training iteration $t$ (called ‘epoch’ $t$, hereinafter) for each training vector $\vec{x}_\ell$ its Euclidean distances $d(\vec{x}_\ell, \vec{w}_j)$ are calculated for all centres. The $\ell$-th input vector is put in the set $\Omega_j^{(t)}$, for which the distance between $\vec{x}_\ell$ and the respective centre is minimum, which means:

$$ d(\vec{x}_\ell, \vec{w}_k) = \min_{\forall j} d(\vec{x}_\ell, \vec{w}_j) \tag{5} $$

3. When the entire training set is formed, the new weights of each centre are calculated as:

$$ \vec{w}_j^{(t+1)} = \frac{1}{N_j^{(t)}} \sum_{\vec{x}_\ell \in \Omega_j^{(t)}} \vec{x}_\ell \tag{6} $$

where $N_j^{(t)}$ is the population of the respective set $\Omega_j^{(t)}$ during epoch $t$.
4. Next, the number of epochs is increased by one. This process is repeated (return to step 2) until the maximum number of epochs is used or the weights do not significantly change, ie, $|\vec{w}_j^{(t)} - \vec{w}_j^{(t+1)}| < \varepsilon$, where $\varepsilon$ is the upper limit of weight change between sequential iterations. The algorithm’s main purpose is to minimise the error function $J$:

$$ J = \frac{1}{N} \sum_{\ell=1}^{N} d^2\left(\vec{x}_\ell, \vec{w}_{k: \vec{x}_\ell \in \Omega_k}\right) \tag{7} $$

The main difference with the classic model is that the process is repeated for various pairs of $(a, b)$. The best results for each adequacy measure are recorded for various pairs $(a, b)$.

Kohonen adaptive vector quantisation (AVQ)

This algorithm is a variation of the k-means method, which belongs to the unsupervised competitive one-layer neural networks. It classifies input vectors into clusters by using a competitive layer with a constant number of neurons. Practically in each step all clusters compete with each other for the winning of a pattern. The winning cluster moves its centre to the direction of the pattern, while the rest of the clusters move their centres to the opposite direction (supervised classification) or remain stable (unsupervised classification). Here, the latter, unsupervised classification algorithm is used. The respective steps are the following:

1. Initialisation of the weights of the $M$ clusters is determined, where the weights of all clusters are equal to 0.5, that is $w_{ji}^{(0)} = 0.5, \forall j, i$.
2. During epoch $t$ each input vector $\vec{x}_\ell$ is randomly presented and its respective Euclidean distances from every neuron are calculated. In the case of existence of bias factor $\lambda$, the respective minimisation function is:

$$ f_{winner\_neuron}(\vec{x}_\ell) = j : \min_{\forall j} \left( d(\vec{x}_\ell, \vec{w}_j) + \lambda \cdot N_j / N \right) \tag{8} $$

where $N_j$ is the population of the respective set $\Omega_j$ during epoch $t-1$.
The weights of the winning neuron (with the smallest distance) are updated as:

$$ \vec{w}_j^{(t)}(n+1) = \vec{w}_j^{(t)}(n) + \eta(t) \cdot \left( \vec{x}_\ell - \vec{w}_j^{(t)}(n) \right) \tag{9} $$

where $n$ is the number of input vectors which have been presented during the current epoch, and $\eta(t)$ is the learning rate according to:

$$ \eta(t) = \eta_0 \cdot \exp\left(-\frac{t}{T_{\eta 0}}\right) > \eta_{min} \tag{10} $$

where $\eta_0$, $\eta_{min}$ and $T_{\eta 0}$ are the initial value, the minimum value and the time parameter respectively. The remaining neurons are unchangeable for $\vec{x}_\ell$, as introduced by the Kohonen winner-take-all learning rule [13,14].
3. Next, the number of epochs is increased by one. This process is repeated (return to step 2) until either the maximum number of epochs is reached or the weights converge or the error function $J$ does not improve, which means:

$$ \left| \frac{J^{(t)} - J^{(t+1)}}{J^{(t)}} \right| < \varepsilon' \quad \text{for } t \geq T_{in} \tag{11} $$

where $\varepsilon'$ is the upper limit of error function change between sequential iterations and the respective criterion is activated after $T_{in}$ epochs.

The algorithm core is executed for a specific number of neurons and the respective parameters $\eta_0$, $\eta_{min}$ and $T_{\eta 0}$ are optimised for each adequacy measure separately. This process is repeated from $M_1$ to $M_2$ neurons.

Fuzzy k-means

During the application of the k-means or the adaptive vector quantisation algorithm, each pattern is assumed to be in exactly one cluster (hard clustering). In many cases the areas of two neighbouring clusters overlap, so that there are not any valid qualitative results.

If the condition of exclusive partition of an input pattern to one cluster is to be relaxed, the fuzzy clustering techniques should be used. More specifically, each input vector $\vec{x}_\ell$ does not belong to only one cluster, but it participates in every $j$-th cluster by a membership factor $u_{\ell j}$, where:

$$ \sum_{j=1}^{M} u_{\ell j} = 1 \quad \& \quad 0 \leq u_{\ell j} \leq 1, \ \forall j \tag{12} $$

Theoretically, the membership factor gives more flexibility in the vector’s distribution. During the iterations the following objective function is minimised:

$$ J_{fuzzy} = \frac{1}{N} \sum_{j=1}^{M} \sum_{\ell=1}^{N} u_{\ell j} \cdot d^2(\vec{x}_\ell, \vec{w}_j) \tag{13} $$

The simplest algorithm is the fuzzy k-means clustering one, in which the respective steps are the following:

1. Initialisation of the weights of the $M$ clusters is determined. In the classic model a random choice among the input vectors is used [5]. In the developed algorithm the $w_{ji}$ of the $j$-th centre is initialised by equation (4).
2. During epoch $t$ for each training vector $\vec{x}_\ell$ the membership factors are calculated for every cluster:

$$ u_{\ell j}^{(t+1)} = \frac{1}{\sum_{k=1}^{M} \dfrac{d(\vec{x}_\ell, \vec{w}_j^{(t)})}{d(\vec{x}_\ell, \vec{w}_k^{(t)})}} \tag{14} $$

3. Afterwards the new weights of each centre are calculated as:

$$ \vec{w}_j^{(t+1)} = \frac{\sum_{\ell=1}^{N} \left(u_{\ell j}^{(t+1)}\right)^q \cdot \vec{x}_\ell}{\sum_{\ell=1}^{N} \left(u_{\ell j}^{(t+1)}\right)^q} \tag{15} $$

where $q$ is the amount of fuzziness in the range $(1, \infty)$ which increases as fuzziness reduces.
4. Next, the number of epochs is increased by one. This process is repeated (return to step 2) until the maximum number of epochs is used or the weights do not significantly change.
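The fuzzy k-means steps can be illustrated with a minimal sketch in plain Python. It assumes the deterministic initialisation of equation (4), the normalised distance of equation (1), and the membership and centre updates of equations (14) and (15) as printed here. The function names, the `ratio_power` argument, the `eps` guard against zero distances and the stopping tolerance `tol` are illustrative assumptions, not part of the original method description. Note that the textbook fuzzy c-means membership raises the distance ratio to the power 2/(q-1); that variant can be tried through `ratio_power`.

```python
import math


def distance(x, w):
    """Normalised Euclidean distance of equation (1)."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)) / len(x))


def linear_init(n_clusters, dim, a, b):
    """Equation (4): every component of centre j starts at a + b*(j-1)/(M-1).

    The index j below runs from 0 to M-1, so it plays the role of (j-1).
    Requires n_clusters >= 2.
    """
    return [[a + b * j / (n_clusters - 1)] * dim for j in range(n_clusters)]


def memberships(x, centres, ratio_power=1.0, eps=1e-12):
    """Membership factors of one input vector, equation (14).

    ratio_power=1.0 reproduces the equation as printed; the textbook fuzzy
    c-means form would use 2/(q-1). eps (an assumption) avoids division by
    zero when a vector coincides with a centre.
    """
    dist = [max(distance(x, w), eps) for w in centres]
    return [1.0 / sum((dj / dk) ** ratio_power for dk in dist) for dj in dist]


def fuzzy_k_means(X, n_clusters, a, b, q=2.0, max_epochs=100, tol=1e-6):
    """Iterate equations (14) and (15) until the centres stop moving."""
    dim = len(X[0])
    centres = linear_init(n_clusters, dim, a, b)
    for _ in range(max_epochs):
        U = [memberships(x, centres) for x in X]
        new_centres = []
        for j in range(n_clusters):
            weights = [u[j] ** q for u in U]          # (u_lj)^q of equation (15)
            total = sum(weights)
            new_centres.append(
                [sum(wt * x[i] for wt, x in zip(weights, X)) / total
                 for i in range(dim)]
            )
        shift = max(distance(new, old) for new, old in zip(new_centres, centres))
        centres = new_centres
        if shift < tol:                               # weights no longer change
            break
    U = [memberships(x, centres) for x in X]          # memberships for final centres
    return centres, U
```

A call such as `fuzzy_k_means(curves, n_clusters=2, a=0.1, b=0.8, q=2.0)` on normalised load curves returns the cluster centres and one row of membership factors per curve; each row sums to one, in line with equation (12).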

This process is repeated for different pairs of $(a, b)$ and for different amounts of fuzziness. The best results for each adequacy measure are recorded for different pairs $(a, b)$ and $q$.

Self-organising maps (SOM)

The Kohonen SOM [13-16] is a topologically unsupervised neural network that projects a $d$-dimensional input data set into a reduced dimension space (usually a mono-dimensional or bi-dimensional map). It is composed of a predefined grid containing $M \times 1$ or $M_1 \times M_2$ $d$-dimensional neurons $\vec{w}_k$ for mono-dimensional or bi-dimensional map respectively, which are calculated by a competitive learning algorithm updating not only the weights of the winning neuron, but also the weights of its neighbour units in inverse proportion of their distance. The neighbourhood size of each neuron shrinks progressively during the training process, starting with nearly the whole map and ending with the single neuron. The training of SOM is divided into two phases:

• rough ordering, with high initial learning rate, large radius and small number of epochs, so that neurons are arranged into a structure which approximately displays the inherent characteristics of the input data,
• fine tuning, with small initial learning rate, small radius and higher number of training epochs, in order to tune the final structure of the SOM.

The transition from the rough ordering phase to the fine tuning one takes place after $T_{s0}$ epochs.

It is mentioned that, in the case of the bi-dimensional map, the immediate exploitation of the respective clusters is not a simple problem. The map can be exploited either by inspection or by applying a second simple clustering method, such as the simple k-means method [16]. Here, only the case of the mono-dimensional map is examined.

More specifically, the respective steps of the mono-dimensional SOM algorithm are the following:

1. The number of neurons of the SOM’s grid is defined and the initialisation of the respective weights is determined. Thus, the weights can be given by (a) $w_{ki} = 0.5, \forall k, i$, (b) the random initialisation of each neuron’s weight, (c) the random choice of the input vectors for each neuron.
2. The SOM training commences by first choosing an input vector $\vec{x}_\ell$, at epoch $t$, randomly from the input vectors’ set. The Euclidean distances between the $n$-th presented input pattern $\vec{x}_\ell$ and all $\vec{w}_k$ are calculated, so as to determine the winning neuron $i'$ that is closest to $\vec{x}_\ell$ (competition stage). The $j$-th reference vector is updated (weights’ adaptation stage) according to:

$$ \vec{w}_j^{(t)}(n+1) = \vec{w}_j^{(t)}(n) + \eta(t) \cdot h_{i'j}(t) \cdot \left( \vec{x}_\ell - \vec{w}_j^{(t)}(n) \right) \tag{16} $$

where $\eta(t)$ is the learning rate according to equation (10). During the rough ordering phase, $\eta_r$, $T_{\eta 0}$ are the initial value and the time parameter respectively, while during the fine tuning phase the respective values are $\eta_f$, $T_{\eta 0}$. The $h_{i'j}(t)$ is the symmetrical neighbourhood function, which activates the $j$ neurons that are topologically close to the winning neuron $i'$, according to their geometrical distance, and which learn from the same $\vec{x}_\ell$ (collaboration stage). In this case the Gauss function is proposed:

$$ h_{i'j}(t) = \exp\left[ -\frac{d_{i'j}^2}{2 \sigma^2(t)} \right] \tag{17} $$

where $d_{i'j} = \| \vec{r}_{i'} - \vec{r}_j \|$ is the respective distance between the $i'$ and $j$ neurons, $\vec{r}_j = (x_j, y_j)$ are the respective co-ordinates in the grid, and $\sigma(t) = \sigma_0 \cdot \exp(-t / T_{\sigma 0})$ is the decreasing neighbourhood radius function, where $\sigma_0$ and $T_{\sigma 0}$ are the initial value and time parameter of the radius respectively.
3. Next, the number of the epochs is increased by one. This process is repeated (return to step 2) until either the maximum number of epochs is reached or the index $I_s$ gets the minimum value [10]:

$$ I_s(t) = J(t) + ADM(t) + TE(t) \tag{18} $$

where the quality measures of the optimum SOM are based on the quantisation error $J$, given by equation (7), the topographic error $TE$ and the average distortion measure $ADM$. The topographic error measures the distortion of the map as the percentage of input vectors for which the first $i'_1$ and second $i'_2$ winning neurons are not neighbouring map units:

$$ TE = \sum_{\ell=1}^{N} neighb(i'_1, i'_2) / N \tag{19} $$

where, for each input vector, $neighb(i'_1, i'_2)$ equals either 1, if the $i'_1$ and $i'_2$ neurons are not neighbours, or 0. The average distortion measure is given for epoch $t$ by:

$$ ADM(t) = \sum_{\ell=1}^{N} \sum_{j=1}^{M} h_{i' \to \vec{x}_\ell,\, j}(t) \cdot d^2(\vec{x}_\ell, \vec{w}_j) / N \tag{20} $$

This process is repeated for different parameters of $\sigma_0$, $\eta_f$, $\eta_r$, $T_{\eta 0}$, $T_{\sigma 0}$ and $T_{s0}$. Alternatively, the multiplication factors $\phi$ and $\xi$ are introduced, without decreasing the generalisation ability of the parameters’ calibration:

$$ T_{s0} = \phi \cdot T_{\eta 0} \tag{21} $$

$$ T_{\sigma 0} = \xi \cdot T_{\eta 0} / \ln \sigma_0 \tag{22} $$

The best results for each adequacy measure are recorded for all parameters $\sigma_0$, $\eta_f$, $\eta_r$, $T_{\eta 0}$, $\phi$ and $\xi$.

Hierarchical agglomerative algorithms

Agglomerative algorithms are based on matrix theory [6]. The input is the $N \times N$ dissimilarity matrix $P_0$. At each level $t$, when two clusters are merged into one, the size of the dissimilarity matrix $P_t$ becomes $(N - t) \times (N - t)$. Matrix $P_t$ is obtained from $P_{t-1}$ by deleting the two rows and columns that correspond to the merged clusters and adding


a new row and a new column containing the distances between the newly formed cluster and the old ones. The distance between the newly formed cluster $C_q$ (the result of merging $C_i$ and $C_j$) and an old cluster $C_s$ is determined as:

$$ d(C_q, C_s) = a_i \cdot d(C_i, C_s) + a_j \cdot d(C_j, C_s) + b \cdot d(C_i, C_j) + c \cdot \left| d(C_i, C_s) - d(C_j, C_s) \right| \tag{23} $$

where $a_i$, $a_j$, $b$ and $c$ correspond to different choices of the dissimilarity measure. It is noted that in each level $t$ the respective representative vectors are calculated by equation (6).

The basic algorithms, which are going to be used in our case, are:

• the single link algorithm (SL), which is obtained from (23) for $a_i = a_j = 0.5$, $b = 0$ and $c = -0.5$:

$$ d(C_q, C_s) = \min\{ d(C_i, C_s), d(C_j, C_s) \} = \frac{1}{2} d(C_i, C_s) + \frac{1}{2} d(C_j, C_s) - \frac{1}{2} \left| d(C_i, C_s) - d(C_j, C_s) \right| \tag{24} $$

• the complete link algorithm (CL), which is obtained from (23) for $a_i = a_j = 0.5$, $b = 0$ and $c = 0.5$:

$$ d(C_q, C_s) = \max\{ d(C_i, C_s), d(C_j, C_s) \} = \frac{1}{2} d(C_i, C_s) + \frac{1}{2} d(C_j, C_s) + \frac{1}{2} \left| d(C_i, C_s) - d(C_j, C_s) \right| \tag{25} $$

• the unweighted pair group method average algorithm (UPGMA):

$$ d(C_q, C_s) = \frac{n_i \cdot d(C_i, C_s) + n_j \cdot d(C_j, C_s)}{n_i + n_j} \tag{26} $$

where $n_i$ and $n_j$ are the respective members’ populations of clusters $C_i$ and $C_j$.
• the weighted pair group method average algorithm (WPGMA):

$$ d(C_q, C_s) = \frac{1}{2} \left( d(C_i, C_s) + d(C_j, C_s) \right) \tag{27} $$

• the unweighted pair group method centroid algorithm (UPGMC):

$$ d^{(1)}(C_q, C_s) = \frac{n_i \cdot d^{(1)}(C_i, C_s) + n_j \cdot d^{(1)}(C_j, C_s)}{n_i + n_j} - \frac{n_i \cdot n_j}{(n_i + n_j)^2} \, d^{(1)}(C_i, C_j) \tag{28} $$

where $d^{(1)}(C_q, C_s) = \| \vec{w}_q - \vec{w}_s \|^2$ and $\vec{w}_q$ is the representative centre of the $q$-th cluster according to equation (6).
• the weighted pair group method centroid algorithm (WPGMC):

$$ d^{(1)}(C_q, C_s) = \frac{1}{2} \left( d^{(1)}(C_i, C_s) + d^{(1)}(C_j, C_s) \right) - \frac{1}{4} d^{(1)}(C_i, C_j) \tag{29} $$

• the Ward or minimum variance algorithm (WARD):

$$ d^{(2)}(C_q, C_s) = \frac{(n_i + n_s) \cdot d^{(2)}(C_i, C_s) + (n_j + n_s) \cdot d^{(2)}(C_j, C_s) - n_s \cdot d^{(2)}(C_i, C_j)}{n_i + n_j + n_s} \tag{30} $$

where:

$$ d^{(2)}(C_i, C_j) = \frac{n_i \cdot n_j}{n_i + n_j} \, d^{(1)}(C_i, C_j) \tag{31} $$

It is noted that in each level $t$ the respective representative vectors are calculated by equation (6).

The steps of these algorithms are the following:

1. Initialisation: The set of the remaining patterns $R_0$ for zero level ($t = 0$) is the set of the input vectors $X$. The dissimilarity matrix $P_0 = P(X)$ is determined. Afterwards $t$ increases by one ($t = t + 1$).
2. During level $t$ clusters $C_i$ and $C_j$ are found, for which the minimisation criterion $d(C_i, C_j) = \min_{r,s = 1, \ldots, N,\ r \neq s} d(C_r, C_s)$ is satisfied.
3. Then clusters $C_i$ and $C_j$ are merged into a single cluster $C_q$ and the set of the remaining patterns $R_t$ is formed as: $R_t = (R_{t-1} - \{C_i, C_j\}) \cup \{C_q\}$.
4. The construction of the dissimilarity matrix $P_t$ from $P_{t-1}$ is realised by applying equation (23).
5. Next, the number of the levels is increased by one. This process is repeated (return to step 2) until the remaining patterns set $R_{N-1}$ is formed and all input vectors are in the same and unique cluster.

It is mentioned that the number of iterations is determined from the beginning and it equals the number of input vectors decreased by 1 ($N - 1$).

Adequacy measures

In order to evaluate the performance of the clustering algorithms and to compare them with each other, six different adequacy measures are applied. Their purpose is to obtain well-separated and compact clusters, in order to make the load curves self explanatory. The definitions of these measures are the following:

1. Mean square error or error function ($J$) given by equation (7).
2. Mean index adequacy (MIA) [2-3], which is defined as the average of the distances between each input vector assigned to the cluster and its centre:

vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
u tem, the maximum peak load of which ranges between
u1 X M

MIA ¼ t d2 ~ w j, j (32) 3.5kW and 60kW. The respective data are the 1 minute ON/
M j¼1
OFF normalised load values for each individual consumer
for a period of eleven days during November 1997 and
3. Clustering dispersion indicator (CDI),2-4 which de-
January 1998.1 The respective consumers are:
pends on the mean infra-set distance between the input
vectors in the same cluster and inversely on the infra- • chiller,
set distance between the class representative load curves:

$$\mathrm{CDI} = \frac{1}{\hat{d}(W)} \sqrt{\frac{1}{M} \sum_{k=1}^{M} \hat{d}^{2}(\Omega_k)} \tag{33}$$

4. Similarity matrix indicator (SMI) [3], which is defined as the maximum off-diagonal element of the symmetrical similarity matrix, whose terms are calculated by using a logarithmic function of the Euclidean distance between any kind of class representative load curves:

$$\mathrm{SMI} = \max_{p>q} \left\{ \left( 1 - \frac{1}{\ln\left[ d(\vec{w}_p, \vec{w}_q) \right]} \right)^{-1} \right\}, \quad p, q = 1, \ldots, M \tag{34}$$

5. Davies-Bouldin indicator (DBI) [3,5], which represents the system-wide average of the similarity measures of each cluster with its most similar cluster:

$$\mathrm{DBI} = \frac{1}{M} \sum_{k=1}^{M} \max_{p \neq q} \left\{ \frac{\hat{d}(\Omega_p) + \hat{d}(\Omega_q)}{d(\vec{w}_p, \vec{w}_q)} \right\}, \quad p, q = 1, \ldots, M \tag{35}$$

6. Ratio of within cluster sum of squares to between cluster variation (WCBCR) [3], which depends on the sum of the squared distances between each input vector and its cluster representative vector, as well as on the similarity of the cluster centres:

$$\mathrm{WCBCR} = \frac{\sum_{k=1}^{M} \sum_{\vec{x}_\ell \in \Omega_k} d^{2}(\vec{w}_k, \vec{x}_\ell)}{\sum_{1 \le q < p}^{M} d^{2}(\vec{w}_p, \vec{w}_q)} \tag{36}$$

For the same final number of clusters, the success of the various algorithms is expressed by small values of the adequacy measures. By increasing the number of clusters, all measures decrease, except for the similarity matrix indicator. An additional adequacy measure could be the number of dead clusters, i.e. clusters whose sets are empty. It is intended to minimise this number. It is noted that in equations (7) and (32)-(36), M is the number of clusters without the dead ones.

APPLICATION OF THE PROPOSED METHODOLOGY TO A SET OF CONSUMERS OF HELLENIC NAVY MEKO FRIGATES

Analysis of the case study
The developed methodology is applied to the measured data of six consumers of the HN MEKO type frigates electric system:
• chiller,
• HP compressor FWD/AFT,
• refrigeration plant,
• LP compressor FWD/AFT,
• sanitary plant,
• boiler.
Initially, the analysis of the ‘chiller’ is presented in detail, while the other consumers can be analysed in a similar way. It is supposed that the time interval is a day. The consumer’s representative typical day has been chosen to be the most populated one. The respective set of the daily chronological curves has 11 members. No curves are rejected through data pre-processing.

Application of k-means
The modified model of the k-means method is executed for 2 to 10 clusters and for different pairs (a, b), where a = {0.1, 0.11, ..., 0.45} and a + b = {0.54, 0.55, ..., 0.9}. For each number of clusters, 1332 different pairs (a, b) are examined (36 values of a times 37 values of a + b). The best results for the six adequacy measures do not refer to the same pair (a, b). From the results of Table 1, it is obvious that the developed k-means is superior to the classical one, in which the input vectors are chosen randomly during the initialisation of the centres. For the classical k-means model 100 executions are carried out and the best results for each adequacy measure are recorded. The superiority of the developed model holds for every number of neurons (clusters) examined. A second advantage is the convergence to the same results for the respective pairs (a, b), which cannot be achieved using the classical model.

Application of adaptive vector quantisation
The initial value $\eta_0$, the minimum value $\eta_{\min}$ and the time parameter $T_{\eta_0}$ of the learning rate must be properly calibrated. For example, Fig 3 presents the sensitivity of the ratio of within cluster sum of squares to between cluster variation (WCBCR) to the $\eta_0$ and $T_{\eta_0}$ parameters for 90 experiments. According to the results of Table 1, the best results of the adequacy measures are not given for the same $\eta_0$ and $T_{\eta_0}$. The $\eta_{\min}$ value does not practically improve the neural network’s behaviour, assuming that it ranges between 10^-5 and 10^-6.

Application of fuzzy k-means
In the fuzzy k-means method the model is executed for different amounts of fuzziness q = {2, 4, 6}. As an example, the WCBCR is presented in Fig 4 for different numbers of clusters for the three cases q = {2, 4, 6}. The best results are given by q = 6, while the respective values for q = 4 are quite similar to those for q = 6.
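To illustrate how the amount of fuzziness q influences the classification, the following minimal sketch computes membership degrees with the standard fuzzy c-means rule. The rule itself, the function names and the sample vectors are assumptions made for illustration; the fuzzy k-means model of this paper is defined in its theoretical section and may differ in detail.

```python
import math

def distance(x, w):
    """Euclidean distance between a load curve x and a centre w (equal length)."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))

def memberships(x, centres, q):
    """Membership degrees of x to each centre for fuzziness q > 1.

    Standard fuzzy c-means rule (assumed, not taken from the paper):
    u_k = 1 / sum_j (d_k / d_j) ** (2 / (q - 1)).
    """
    dists = [distance(x, w) for w in centres]
    # A curve that coincides with a centre belongs fully to that centre.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (q - 1.0)
    return [1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists]

# Hypothetical ON/OFF-like curve and two hypothetical centres.
curve = [1.0, 1.0, 0.0, 0.0]
centres = [[1.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0]]

for q in (2, 4, 6):  # the amounts of fuzziness examined in the text
    u = memberships(curve, centres, q)
    print(q, [round(v, 3) for v in u])
```

In this toy case the memberships move towards equal shares as q grows, which is the usual effect of a larger amount of fuzziness.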

| Methods - Parameters | J | MIA | CDI | SMI | DBI | WCBCR |
| --- | --- | --- | --- | --- | --- | --- |
| Proposed k-means | 42.234 | 0.15467 | 0.31280 | 0.35368 | 0.39264 | 0.029022 |
| a - b parameters | 0.10 - 0.59 | 0.10 - 0.59 | 0.10 - 0.59 | 0.10 - 0.59 | 0.18 - 0.67 | 0.24 - 0.31 |
| Classical k-means | 47.255 | 0.17675 | 0.33473 | 0.37643 | 0.41847 | 0.031293 |
| AVQ | 27.046 | 0.12375 | 0.31076 | 0.38665 | 0.28141 | 0.028891 |
| $\eta_0$ - $\eta_{\min}$ - $T_{\eta_0}$ parameters | 0.9 - 5×10^-7 - 2000 | 0.9 - 5×10^-7 - 2000 | 0.9 - 5×10^-7 - 5000 | 0.1 - 5×10^-7 - 4000 | 0.6 - 5×10^-7 - 2500 | 0.5 - 5×10^-7 - 3500 |
| Fuzzy k-means | 86.908 | 0.30057 | 0.72987 | 0.90689 | No convergence | 0.826710 |
| q - a - b parameters | 6 - 0.10 - 0.46 | 6 - 0.10 - 0.46 | 6 - 0.10 - 0.74 | 6 - 0.28 - 0.32 | - | 6 - 0.10 - 0.74 |
| All hierarchical algorithms (in this case only) | 27.030 | 0.12374 | 0.31280 | 0.43515 | 0.40657 | 0.029021 |
| Mono-dimensional SOM | 40.223 | 0.15241 | 0.43608 | 0.58230 | 0.48805 | 0.058237 |
| SOM parameters | 4 - 1.0 - 0.6 - 0.4 - 5×10^-4 - 750 | 4 - 1.0 - 0.6 - 0.4 - 5×10^-4 - 750 | 4 - 1.0 - 0.6 - 0.4 - 5×10^-4 - 750 | 4 - 1.0 - 0.1 - 0.4 - 5×10^-4 - 250 | 4 - 1.0 - 0.1 - 0.3 - 5×10^-4 - 750 | 4 - 1.0 - 0.6 - 0.4 - 5×10^-4 - 750 |

Table 1: Comparison of the best clustering models for 4 clusters for the chiller
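As a hedged illustration of how a WCBCR value such as those in Table 1 could be computed, the sketch below evaluates the ratio of equation (36) for a given partition and skips dead (empty) clusters, as the text prescribes for M. It assumes plain squared Euclidean distances (any constant normalisation of the distance cancels in the ratio); the function names and the sample data are hypothetical.

```python
def dist2(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def wcbcr(clusters, centres):
    """Within-cluster sum of squares over the sum of squared centre distances.

    clusters[k] is the list of input vectors assigned to centres[k].
    Dead clusters (empty lists) are left out of both sums.
    """
    live = [(w, members) for w, members in zip(centres, clusters) if members]
    if len(live) < 2:
        raise ValueError("at least two non-empty clusters are required")
    within = sum(dist2(w, x) for w, members in live for x in members)
    between = sum(dist2(live[p][0], live[q][0])
                  for p in range(len(live)) for q in range(p))
    return within / between

# Hypothetical two-dimensional example with one dead cluster.
clusters = [
    [[0.0, 0.0], [0.0, 2.0]],
    [[4.0, 0.0], [4.0, 2.0]],
    [],                          # dead cluster, ignored
]
centres = [[0.0, 1.0], [4.0, 1.0], [9.0, 9.0]]

print(wcbcr(clusters, centres))  # 0.25
```

Here the within-cluster sum of squares is 4 and the only live pair of centres is at squared distance 16, so the ratio is 0.25.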
Fig 3: WCBCR measure of the AVQ method for 4 clusters for the set of 11 training patterns of the chiller with $\eta_0$ = {0.1, 0.2, ..., 0.9}, $T_{\eta_0}$ = {500, 1000, ..., 5000}
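The 90 experiments of Fig 3 correspond to the 9 × 10 combinations of the two parameter sets in its caption. The sketch below enumerates that grid and selects the pair with the smallest value of an adequacy measure. The exponentially decaying, lower-bounded learning rate and the `evaluate` callable (standing in for "train the AVQ and return the WCBCR") are assumptions made for illustration, not the authors' implementation.

```python
import math

def learning_rate(t, eta0, eta_min, t_eta0):
    """Learning rate at epoch t: exponential decay with a lower bound (assumed form)."""
    return max(eta0 * math.exp(-t / t_eta0), eta_min)

# Parameter sets taken from the caption of Fig 3.
eta0_values = [round(0.1 * i, 1) for i in range(1, 10)]   # 0.1, 0.2, ..., 0.9
t_eta0_values = list(range(500, 5001, 500))               # 500, 1000, ..., 5000
grid = [(eta0, t_eta0) for eta0 in eta0_values for t_eta0 in t_eta0_values]

def best_setting(grid, evaluate):
    """Return the (eta0, t_eta0) pair with the smallest value of evaluate(eta0, t_eta0).

    evaluate is a hypothetical callable supplied by the user.
    """
    return min(grid, key=lambda pair: evaluate(*pair))

print(len(grid))  # 90
```

Since the best pair differs from one adequacy measure to another, the search would be repeated once per measure.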
Fig 4: WCBCR measure of the fuzzy k-means method for 2 to 10 clusters for the set of 11 training patterns of the chiller with q = 2, 4, 6

Application of mono-dimensional self-organising maps
Although the mono-dimensional SOM is theoretically well defined, there are several issues that need to be solved for the effective training of the SOM. The major problems are:
• to stop the training process of the optimum SOM. In this case the target is to minimise the index $I_s$ (equation (18)). Generally, it is noticed that the convergence is completed after 0.5 to 2.0 times $T_{\eta_0}$ epochs during the fine tuning phase, when $T_{\eta_0}$ has big values (>1000 epochs).
• the proper calibration of (a) the initial value of the neighbourhood radius $\sigma_0$, (b) the multiplicative factor between $T_{s_0}$ (epochs of the rough ordering phase) and $T_{\eta_0}$ (time parameter of learning rate), (c) the multiplicative factor between $T_{\sigma_0}$ (time parameter of neighbourhood radius) and $T_{\eta_0}$, (d) the proper initial values of the learning rate $\eta_r$ and $\eta_f$ during the rough ordering phase and the fine tuning phase respectively. The optimisation process for the mono-dimensional SOM parameters is similar to that of the adaptive vector quantisation for $\eta_0$ and $T_{\eta_0}$. The best results for each adequacy measure are presented in Table 1 for the case of 4 clusters.
• the initialisation of the weights of the neurons, where the three cases of the theoretical analysis of self-organising maps are examined and the best training behaviour is presented in case (a) (for $w_{ki} = 0.5$, $\forall k, i$).

Application of hierarchical agglomerative algorithms
In the case of the seven hierarchical models the best results are given by different models for two and three clusters, while for four clusters or more all models give the same results, as shown in Fig 5. It should be mentioned that there are no other parameters for calibration, such as the maximum number of iterations.

Comparison of clustering models for the chiller
The best results for each clustering method (modified k-means, adaptive vector quantisation, fuzzy k-means, self-organised maps and hierarchical methods) are presented in Fig 6. The optimised AVQ and WARD methods present the best behaviour for the mean square error J, the optimised AVQ and the unweighted pair group method average algorithm (UPGMA) for the MIA, the optimised AVQ and the modified k-means for the CDI, DBI and WCBCR, and the modified k-means for the SMI. All measures (except DBI) show an improved performance as the number of clusters increases.
The number of dead clusters for the WCBCR indicator for all clustering techniques and for all adequacy measures for the AVQ method are shown in Fig 6g and 6h respectively. Here, it is not obvious which measure is the best one because of the extremely small set of training patterns. However, taking into consideration the results of [3,10] and noting that the basic theoretical advantage of the WCBCR measure is that it combines the distances of the input vectors from the representative clusters and the distances between clusters (covering also the J and CDI characteristics), the application of WCBCR is proposed.
Observing Fig 6f, the improvement of the WCBCR is significant up to 4 clusters. After this value, the behaviour
Fig 5: Adequacy measures of the 7 hierarchical clustering algorithms for 2 to 10 clusters for the set of 11 training patterns of the
chiller

Fig 6: Adequacy measures and dead clusters of each clustering method for 2 to 10 clusters for the set of 11 training patterns of
the chiller
| Type | Number of clusters | Population of the representative cluster | Best clustering technique | 20 Nov 1997 | 21 Nov 1997 | 22 Nov 1997 | 23 Nov 1997 | 24 Nov 1997 | 25 Nov 1997 | 26 Nov 1997 | 27 Nov 1997 | 1 Jan 1998 | 2 Jan 1998 | 3 Jan 1998 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| a | 4 | 7 | AVQ | 2nd | 3rd | 3rd | 1st | 4th | 4th | 4th | 4th | 4th | 4th | 4th |
| b | 3 | 9 | CL - UPGMC | 2nd | 1st | 1st | 1st | 1st | 1st | 1st | 1st | 3rd | 1st | 1st |
| c | 4 | 7 | k-means | 4th | 3rd | 4th | 4th | 4th | 4th | 4th | 4th | 2nd | 1st | 1st |
| d | 4 | 6 | UPGMC | 4th | 3rd | 4th | 4th | 1st | 1st | 1st | 1st | 1st | 2nd | 1st |
| e | 3 | 8 | All hierarchical | 1st | 2nd | 1st | 1st | 1st | 1st | 1st | 3rd | 1st | 2nd | 1st |
| f | 4 | 8 | k-means | 1st | 2nd | 1st | 1st | 1st | 1st | 1st | 1st | 4th | 3rd | 1st |

Table 2: Results of the methodology for 6 electrical consumers of HN MEKO type frigates, where type ‘a’ is the chiller, type ‘b’ the HP compressor FWD/AFT, type ‘c’ the refrigeration plant, type ‘d’ the LP compressor FWD/AFT, type ‘e’ the sanitary plant and type ‘f’ the boiler
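The ‘population of the representative cluster’ column of Table 2 can be cross-checked against the clusters calendar, since the representative cluster is the most populated one. The short sketch below does this for the six consumers; the dictionary layout and function name are illustrative choices, while the labels are copied from Table 2.

```python
from collections import Counter

# Cluster label of each of the 11 days, per consumer type, copied from Table 2.
calendars = {
    "a": ["2nd", "3rd", "3rd", "1st", "4th", "4th", "4th", "4th", "4th", "4th", "4th"],
    "b": ["2nd", "1st", "1st", "1st", "1st", "1st", "1st", "1st", "3rd", "1st", "1st"],
    "c": ["4th", "3rd", "4th", "4th", "4th", "4th", "4th", "4th", "2nd", "1st", "1st"],
    "d": ["4th", "3rd", "4th", "4th", "1st", "1st", "1st", "1st", "1st", "2nd", "1st"],
    "e": ["1st", "2nd", "1st", "1st", "1st", "1st", "1st", "3rd", "1st", "2nd", "1st"],
    "f": ["1st", "2nd", "1st", "1st", "1st", "1st", "1st", "1st", "4th", "3rd", "1st"],
}

def representative(calendar):
    """Return (label, population) of the most populated cluster in a calendar."""
    label, population = Counter(calendar).most_common(1)[0]
    return label, population

for consumer, calendar in calendars.items():
    print(consumer, representative(calendar))
```

The resulting populations (7, 9, 7, 6, 8, 8) agree with the third column of Table 2.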

of the adequacy measure is gradually stabilised. The appropriate number of clusters can also be estimated graphically, using the ‘knee-rule’, which gives values between 3 and 4 clusters (see Fig 7). In Fig 8 the typical daily chronological ON/OFF load curves for the chiller are presented using the AVQ method with the best WCBCR measure for 4 clusters.
The training time for the methods under study has the ratio 0.05:1:22:24:36 (hierarchical: proposed k-means: optimised adaptive vector quantisation: mono-dimensional self-organising map: fuzzy k-means for q = 6). Therefore, the k-means, AVQ and hierarchical models have been selected.
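The graphical ‘knee-rule’ mentioned above can be approximated numerically by intersecting the tangents at the two ends of the curve of the adequacy measure against the number of clusters. The sketch below is one possible reading of that procedure: each tangent is approximated by the straight line through the two outermost points at that end, and the WCBCR values are hypothetical, not the measured ones.

```python
def line_through(p1, p2):
    """Slope and intercept of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def knee_by_tangents(xs, ys):
    """Abscissa where the approximate tangents at the two curve ends intersect."""
    m1, c1 = line_through((xs[0], ys[0]), (xs[1], ys[1]))
    m2, c2 = line_through((xs[-2], ys[-2]), (xs[-1], ys[-1]))
    if m1 == m2:
        raise ValueError("tangents are parallel; no knee can be estimated")
    return (c2 - c1) / (m1 - m2)

# Hypothetical WCBCR values for 2 to 10 clusters (illustrative only).
n_clusters = list(range(2, 11))
wcbcr_values = [0.30, 0.09, 0.029, 0.022, 0.018, 0.015, 0.013, 0.012, 0.011]

knee = knee_by_tangents(n_clusters, wcbcr_values)
print(round(knee, 2))
```

For these made-up values the intersection falls between 3 and 4 clusters; with real data the estimate depends on how the tangents are drawn.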
Fig 7: WCBCR measure of the AVQ model for 2 to 10 clusters for the set of 11 training patterns of the chiller and the use of the tangents for the estimation of the knee
Fig 8: The typical daily chronological ON/OFF load curves for the chiller for four clusters using AVQ method

Application of the clustering models for the other electric consumers
The same methodology is applied to the other five electric consumers. In Table 2, the total number of clusters, the population of the representative cluster, the respective clustering technique and the clusters calendar are registered for each consumer. It is obvious that the modified k-means, the AVQ and the hierarchical (especially UPGMC) clustering techniques compete with each other, while the fuzzy k-means and self-organising map techniques have poor results in this kind of classification. It is reminded that the representative cluster is the most populated one. In Fig 9, the respective representative typical load diagrams are presented.
Fig 9: The representative (most populated one) chronological ON/OFF load curves for each one of the six consumers

CONCLUSIONS
This paper presents an efficient pattern recognition methodology for the study of the load demand behaviour of electrical consumers of ships. Unsupervised clustering methods can be applied, such as the k-means, adaptive vector quantisation (AVQ), fuzzy k-means, mono-dimensional self-organising maps (SOM) and hierarchical methods. The performance of these methods is evaluated by six adequacy measures: mean square error, mean index adequacy, clustering dispersion indicator, similarity matrix indicator, Davies-Bouldin indicator and the ratio of within cluster sum of squares to between cluster variation. Finally, the representative daily load diagrams along with the respective populations for each consumer are calculated. This information is valuable for ship engineers, because it facilitates the formalisation of the electrical consumer’s load behaviour, the design, the operation and the reliability of the ship’s power system and the improvement of the operation of the automatic battle management system for a battleship.
From the respective application for six consumers of the HN MEKO type frigates electric system for a small dataset (only 11 training sets), it is concluded that three to four clusters suffice for the satisfactory description of the daily load curves of each consumer. It is also concluded that the optimal clustering techniques are the modified k-means, the AVQ and the hierarchical clustering techniques, while the optimal adequacy measure is the ratio of within cluster sum of squares to between cluster variation.
These results surely depend on the population of the data set, the ship operating mode (‘anchor’, ‘shore’, ‘at sea’, ‘general quarter’) and the load curve’s time interval (in this case study, it was a day). In the future, the methodology should be investigated on larger datasets for longer study periods with different time intervals (from a few hours to days).

REFERENCES
1. Hatzilau IK, Perros S, Karamolegos G, Galanis K, Dalakos A, Anastasopoulos K, Kavousanos A and Enotiadis X. 1998. Load estimation of MEKO Class frigates – Field measurements, results. Proceedings of the International Conference on The Naval Technology for the 21st Century, pp225-232, 29-30 June 1998, Piraeus, Greece.
2. Chicco G, Napoli R, Postolache P, Scutariu M and Toader C. 2003. Customer characterisation for improving the tariff offer. IEEE Transactions Power Systems, 18-1: 381-387.
3. Tsekouras GJ, Hatziargyriou ND and Dialynas EN. 2007. Two-stage pattern recognition of load curves for classification of electricity customers. IEEE Transactions Power Systems, 22-3: 1120-1128.
4. Gerbec D, Gasperic S, Smon I and Gubina F. 2005. Allocation of the load profiles to consumers using probabilistic neural networks. IEEE Transactions Power Systems, Vol 20-2: 548-555.
5. Chicco G, Napoli R and Piglione F. 2006. Comparisons among clustering techniques for electricity customer classification. IEEE Transactions Power Systems, Vol 21-2: 933-940.
6. Theodoridis S and Koutroumbas K. 1999. Pattern Recognition, 1st ed. New York: Academic Press.
7. Figueiredo V, Rodrigues F, Vale Z, Gouveia JB. 2005. An electric energy consumer characterization framework based on data mining techniques. IEEE Trans. Power Syst, Vol. 20-2: 596-602.
8. Petrescu M, Scutariu M. 2002. Load diagram characterisation by means of wavelet packet transformation. 2nd Balkan Conference, Belgrade, Yugoslavia, June 2002.
9. Chen C S, Hwang J C, Huang C W. 1997. Application of load survey systems to proper tariff design. IEEE Trans. Power Syst, Vol. 12-3: 1746-1751.
10. Tsekouras GJ, Kotoulas PB, Tsirekis CD, Dialynas EN, Hatziargyriou ND. 2008. A pattern recognition methodology for evaluation of load profiles and typical days of large electricity customers. Electrical Power Systems Research, Vol. 78-9: 1494-1510.
11. Hatzilau IK, Tsekouras GJ, Prousalidis J, Gyparis IK. On electric load characterization and categorization in ship electric installations. INEC-2008, 1-3 April 2008, Hamburg, Germany.
12. Duda RO, Hart PE, Stork DG. 2001. Pattern Classification. 1st edition, A Wiley-Interscience Publication.
13. Haykin S. 1994. Neural networks: A comprehensive foundation. 2nd Edition, Prentice Hall, NJ, 1994.
14. Kohonen T. 1989. Self-organization and associative memory. 2nd Edition, Springer-Verlag, NY.
15. SOM Toolbox for MATLAB 5. 2000. Helsinki, Finland: Helsinki Univ. Technology.
16. Chicco G, Napoli R, Piglione F, Scutariu M, Postolache P, Toader C. 2002. Options to classify electricity customers. Med Power 2002, 4-6 November, Athens, Greece.
17. Hand D, Manilla H, Smyth P. 2001. Principles of data mining. The MIT Press, Cambridge, Massachusetts, London, England.
