
2017 7th International Conference on Power Systems (ICPS)

College of Engineering Pune, India. Dec 21-23, 2017

Optimal Selection of Significance Level α for g-Means Algorithm Based Load Profiling
Mudassir A Maniar, A. R. Abhyankar
Department of Electrical Engineering
Indian Institute of Technology Delhi
New Delhi, India - 110016
Email: maniar.er@gmail.com and abhyankar@ee.iitd.ac.in

Abstract—Clustering algorithms like k-Means, Fuzzy C-Means (FCM), Self-Organizing Maps (SOM), Gaussian Mixture Model (GMM), Expectation Maximization (EM), etc. are adopted for load profiling. For all such clustering algorithms, the number of clusters to be formed should be known a priori. But an initial guess about the number of clusters is not always available for most applications. So, Cluster Validity Indices (CVI) are evaluated to quantitatively assess and judge the outcome and the number of clusters. Recently, g-Means clustering was proposed for load profiling, where the number of clusters is evaluated by an algorithm based on a specified significance level α. The selection of α is crucial for extracting the optimal number of profiles or groups. This paper demonstrates the effect of α on the load profiling outcome and proposes a methodology to select an appropriate value of α. Real-life data of 176 feeders of a Central Indian state discom has been used to validate the proposal.

Index Terms—Cluster Validity Indices (CVI), g-Means, Load Profiling, Significance level α.

I. INTRODUCTION

The restructuring of electrical power systems all across the world is giving an impetus to efficient, smart, advanced, reliable, economical and greener operation of the power system. But, along with that come new challenges in planning, operation, control and expansion of the power system due to the highly diverse and conflicting interests of different entities, viz. Gensco/IPP, Transco, Discom, MO, ISO, Regulator, etc. In this competitive environment, it is essential for each entity to have a thorough knowledge of electricity demand for survival, profit-making and expansion. The competitive environment has developed an interest in load profiling and research amongst all players. This is not a recent exercise carried out by power system operators; it was started in 1934 by the AEIC (Association of Edison Illuminating Companies) for grouping of different consumer classes, load curves of various appliances, tariff design, etc. Systematic load research was started in 1980 in the US when Order 48 was passed by FERC to implement Article 37 of PDPUA, under which it became necessary for large utilities to carry out this exercise every two years [1].

Load profiling is carried out on power consumption data (consumer level or utility level) acquired at a regular time interval of around 1, 15, 30 or 60 minutes. The objective of load profiling is to group consumers by consumption pattern similarity. Each representative pattern of a group or cluster is known as a profile. This information is useful and widely adopted by utilities for tariff design, expansion planning, distribution system management, demand response, load forecasting, etc. Chicco et al. have presented a detailed literature survey on the application of various methods adopted for load profiling from time to time [2], [3], [4]. Jardini et al. [5] have proposed statistical-analysis-based clustering for load profiling of LV consumer data of a Brazilian utility. Carpaneto et al. [6] have proposed a frequency domain approach for clustering. Chicco et al. [7], Figueiredo et al. [8], and Verdu et al. [9] have proposed Self-Organizing Maps (SOM) for load profiling on Low Voltage (LV) consumer data of distribution utilities of Romania, Portugal and Spain, respectively. Gerbec et al. [10] have proposed a probabilistic neural network for load profiling of consumers of the Slovenian distribution system. Singh et al. [11] have proposed a probability-density-function-based Expectation Maximization clustering approach for load profiling on the 95-bus UK Generic Distribution System. Mutanen et al. [12] have proposed the use of the ISODATA clustering algorithm for load profiling on consumer data of a Western Finland discom.

Except for hierarchical clustering, all algorithms require the number of clusters to be known a priori. So, to judge the number of clusters and assess the quality of the outcome, cluster validity analysis is carried out by evaluating Cluster Validity Indices (CVIs). These are mathematical functions evaluated to judge the effectiveness of the clustering outcome. Ideally, CVIs should quantify the ‘within the cluster similarity’, i.e. cohesion, and the ‘between cluster dissimilarity’, i.e. separation. The most widely adopted indices for crisp clustering are the Dunn Index (DI) [13], Calinski-Harabasz Index (CH) [14], Davies-Bouldin Index (DB) [15], Silhouette Index (SHI) [16], Xie-Beni Index (XB) [17], etc. The choice of validity indices depends on the type of clustering algorithm.

Tsekouras et al. [18] have compared k-Means, FCM, SOM and hierarchical clustering using six different cluster validity indices for load profiling on load patterns of 94 Medium Voltage (MV) consumers of the Greek power system. The authors have concluded that k-Means and hierarchical clustering are the most appropriate for load profiling. The literature survey reveals that the k-Means algorithm is the most widely preferred algorithm for load profiling for larger data sets, but it has a few limitations. The number of clusters k should be known a priori. Then, ‘within the cluster similarity’ is high but ‘between the cluster
similarity’ is low, and the performance of the algorithm is data dependent. Also, the outcome depends on the initialization of the centroids and may be a local optimum. However, the limitation of a locally optimal solution can be avoided by executing the k-Means algorithm multiple times with different random initializations and selecting the best outcome.
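As a rough illustration of this multi-start remedy, the Python sketch below (assuming scikit-learn is available; Z and k are placeholder names, and the paper's own implementation is in Matlab) runs k-Means many times from random initializations and keeps the run with the lowest within-cluster sum of squares.

# Sketch: mitigating k-Means local optima via multiple random starts.
# Assumes scikit-learn; Z is an (m x n) pattern matrix as defined in Section II.
import numpy as np
from sklearn.cluster import KMeans

def best_kmeans(Z, k, n_starts=100, seed=0):
    """Run k-Means n_starts times with random initialization and keep
    the model with the lowest within-cluster sum of squares (inertia)."""
    km = KMeans(n_clusters=k, init="random", n_init=n_starts, random_state=seed)
    km.fit(Z)
    return km  # km.labels_, km.cluster_centers_, km.inertia_

# Example usage with synthetic data standing in for feeder patterns:
Z = np.random.rand(176, 48)
model = best_kmeans(Z, k=5)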
Recently, Mets et al. [19] have adopted the g-Means clustering algorithm for load profiling. It is a variant of k-Means which evaluates the number of clusters k based on the Anderson-Darling statistical test, by verifying that the distribution of each cluster around its centroid is Gaussian at a specified significance level α. The method is based on the robust k-Means algorithm with an unsupervised approach. It is very suitable for load profiling related applications provided an appropriate value of α is selected. Though the algorithm overcomes the limitation of specifying k as an input, α must still be specified. The choice of α is crucial because a higher value of α results in a large number of clusters and a lower value may distort the clustering outcome. This paper demonstrates the effect of α on the clustering outcome and proposes a method to select its appropriate value for the load profiling application. The study has been carried out on real-life active power data of distribution system feeders acquired at one-hour intervals for the year 2015 of a Central Indian state discom.

Fig. 1: General Clustering Algorithm for Load Profiling
II. GENERAL LOAD PROFILING PROCEDURE AND g-MEANS ALGORITHM
The ultimate aim of all partitional clustering algorithms is to divide patterns amongst subgroups such that patterns in a given subgroup are as similar as possible and patterns in different subgroups are as dissimilar as possible. These m patterns are represented by a profile or data matrix Z = [x_{ij}]_{m×n}. Each row of this matrix represents a pattern and each member of this matrix is known as a feature. So, x_{ij} represents the j-th feature of the i-th pattern.

In our case, the power delivered by feeder i at an interval of one hour for a year is a time series represented as X_i = x_{i1}, x_{i2}, x_{i3}, ..., x_{in}, where n is the number of hours in a year and x_{in} is the active power carried by feeder i at a given hour of a day of a given month and year. A flowchart of the generic clustering algorithm is shown in Fig. 1.
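To make the data layout concrete, the following sketch, assuming NumPy and using illustrative names only, arranges one year of hourly active power readings into the matrix Z described above, one row per feeder, with a simple per-unit normalization as a stand-in for the (unspecified) pre-processing.

# Sketch: building the profile/data matrix Z = [x_ij]_(m x n) from hourly
# feeder readings. NumPy is assumed; 'readings' is a placeholder array.
import numpy as np

m, n = 176, 8760                   # 176 feeders, 8760 hours in a non-leap year
readings = np.random.rand(m, n)    # stand-in for measured active power values

Z = np.asarray(readings, dtype=float)   # row i = yearly time series X_i of feeder i
assert Z.shape == (m, n)

# Simple per-feeder normalization to per-unit values; the paper's actual
# pre-processing and normalization steps are not detailed in the text.
Z_pu = Z / Z.max(axis=1, keepdims=True)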
1) g-Means Algorithm: The g-Means algorithm is a combination of the k-Means algorithm and the Anderson-Darling statistical test [20]. The flow chart of the algorithm is shown in Fig. 2. Unlike other clustering algorithms, where the number of clusters k is specified as an input, it determines the optimal number of clusters from the input data. g-Means iteratively executes the k-Means algorithm, increasing the number of clusters as it progresses. After each execution of k-Means, the Anderson-Darling normality test is applied to each cluster to verify that all features/patterns in a group or cluster follow a normal distribution around its centroid at the specified significance level. If all the clusters pass the test, the algorithm terminates and the result is displayed. But, if the test fails, it merges all the clusters and repeats the process with k = k + 1. For a large dataset, the clusters that pass the test are extracted, only the clusters that fail the test are merged, and the algorithm proceeds from k = K_min till all features are grouped or k = K_max is reached. The value K_min can be available as an initial guess or can simply be taken as 2.

Fig. 2: g-Means Algorithm
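The following Python sketch outlines this loop under the assumption that scikit-learn and SciPy are available. It is a simplified stand-in rather than the authors' Matlab implementation: each cluster is projected onto its principal direction before the one-dimensional Anderson-Darling check, and SciPy tabulates critical values only at a few significance levels, so the tabulated level closest to α is used.

# Sketch: g-Means-style loop (k-Means + per-cluster Anderson-Darling check).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from scipy.stats import anderson

def cluster_is_gaussian(points, alpha):
    """Approximate per-cluster normality check: project the cluster onto its
    principal direction and apply the Anderson-Darling normality test.
    SciPy only tabulates critical values at 15, 10, 5, 2.5 and 1 percent,
    so the tabulated level nearest to alpha is used (an approximation)."""
    if len(points) < 8:            # skip very small clusters
        return True
    proj = PCA(n_components=1).fit_transform(points).ravel()
    res = anderson(proj, dist="norm")
    levels = np.asarray(res.significance_level) / 100.0
    idx = int(np.argmin(np.abs(levels - alpha)))
    return res.statistic < res.critical_values[idx]

def g_means(Z, alpha, k_min=2, k_max=60, n_starts=100, seed=0):
    """Increase k until every cluster passes the normality test (or k_max)."""
    for k in range(k_min, k_max + 1):
        km = KMeans(n_clusters=k, init="random", n_init=n_starts,
                    random_state=seed).fit(Z)
        ok = all(cluster_is_gaussian(Z[km.labels_ == c], alpha)
                 for c in range(k))
        if ok:
            return km  # labels_ and cluster_centers_ give the load profiles
    return km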
2) Anderson-Darling Test: The Anderson-Darling test is a powerful statistical hypothesis test for a normal distribution [21], [22]. It assesses whether a sample comes from a specified distribution (normal, exponential, Weibull, etc.) at a specified significance level α. A standard normal distribution curve indicating different values of α is shown in Fig. 3.

Fig. 3: A Standard Normal Distribution Curve.

Fig. 4: Representative Load Patterns of 176 HV Feeders for the Month of January (working day/weekday and non-working day/holiday, active power in p.u. versus hours).

Fig. 5: Effect of significance level α on the number of clusters evaluated based on the g-Means Algorithm (knee points represent suitable values of α).

Fig. 6: Cluster validity analysis of four monthly load patterns of 176 HV feeders based on the CH index (the knee point is used for judging the optimal number of clusters).
The test begins with an assumption that a given set of observations X = x_1, x_2, ..., x_n does not follow the hypothesized theoretical normal distribution. It evaluates the test statistic given by (1) and compares it with the critical values of the hypothesized distribution at the specified significance level α:

A^2 = -n - \sum_{i=1}^{n} \frac{2i-1}{n}\left[\ln\left(\phi(x_i)\right) + \ln\left(1 - \phi(x_{n+1-i})\right)\right]    (1)

where \phi = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right] is the cumulative distribution function (CDF) of the normal distribution.

The result of the test is logical: if the test statistic exceeds the critical value, the hypothesis of normality is rejected, and vice versa. The value of α can vary from 0 to 1. The higher the value of α, the more stringent is the criterion for the test hypothesis, as shown in Fig. 3.
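As a concrete illustration of (1), the sketch below evaluates A^2 for a sample, standardizing the observations before applying the normal CDF, and cross-checks the value against SciPy's built-in Anderson-Darling routine (assumed available); it is illustrative only and not the authors' implementation.

# Sketch: Anderson-Darling statistic of Eq. (1) for a normality check.
# Assumes NumPy/SciPy; the sample is standardized with its own mean and
# standard deviation before evaluating the normal CDF phi.
import numpy as np
from scipy.stats import norm, anderson

def anderson_darling_A2(x):
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    z = (x - x.mean()) / x.std(ddof=1)       # standardize the observations
    cdf = norm.cdf(z)                        # phi(x_i)
    i = np.arange(1, n + 1)
    return -n - np.sum((2 * i - 1) / n * (np.log(cdf) + np.log(1 - cdf[::-1])))

sample = np.random.normal(loc=0.0, scale=1.0, size=200)
A2 = anderson_darling_A2(sample)
print(A2, anderson(sample, dist="norm").statistic)  # the two values should agree closely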
III. METHODOLOGY

To analyze the effect of α on clustering, monthly data of 176 feeders of the central region discom of Madhya Pradesh State of Central India has been selected for the study [23]. Pre-processed, normalized representative load patterns of a typical working and non-working day of one month for the 176 feeders are shown in Fig. 4. The data pre-processing and normalization are not explained, as load profiling itself is not the focus of the study. There is no prior information or initial guess available about the number of clusters, as data of all types of HV feeders of the distribution system has been selected without any discrimination. The steps are as follows:
1) Read the input data.
2) Initialize α = α_min.
3) Apply the g-Means algorithm.
4) Store the result.
5) Increment α = α + Δα.
6) If α ≤ α_max, go to step 3; else stop.

The g-Means algorithm and the above procedure were programmed in Matlab 2015a [24]. During each execution of g-Means, clustering has been carried out with 100 random initial guesses and the best outcome amongst them is selected to avoid a locally optimal solution. The α was varied in four steps (a code sketch of this sweep follows the list):
1) From 0.00001 to 0.0001 in steps of 0.00001.
2) From 0.0001 to 0.001 in steps of 0.0001.
3) From 0.001 to 0.01 in steps of 0.001.
4) From 0.01 to 0.1 in steps of 0.01.
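A minimal sketch of this sweep is given below. It reuses the illustrative g_means helper from Section II, in Python/NumPy rather than the authors' Matlab code, and records the number of clusters returned for each α so that a curve such as Fig. 5 can be plotted.

# Sketch: sweeping the significance level alpha (Methodology steps 1-6).
# Reuses the illustrative g_means() helper defined earlier; Z_pu is the
# normalized (176 x n) monthly pattern matrix (names are placeholders).
import numpy as np

alpha_grid = np.concatenate([
    np.arange(0.00001, 0.0001, 0.00001),   # step 0.00001
    np.arange(0.0001, 0.001, 0.0001),      # step 0.0001
    np.arange(0.001, 0.01, 0.001),         # step 0.001
    np.arange(0.01, 0.1 + 1e-12, 0.01),    # step 0.01 (include 0.1)
])

results = {}
for alpha in alpha_grid:
    model = g_means(Z_pu, alpha, n_starts=100)       # 100 random initial guesses
    results[round(float(alpha), 5)] = model.n_clusters  # store the cluster count

# results can now be plotted as "number of clusters versus alpha" (cf. Fig. 5).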
IV. RESULTS

The g-Means clustering algorithm is repeatedly applied on the given data by varying α from 0.00001 to 0.1 on the four monthly data patterns of 176 feeders. Initially, the change is gradual, but beyond a certain value of α, which lies between 0.005 and 0.01, the number of clusters increases rapidly. Clusters with new unique patterns are revealed up to a certain value of α, but beyond that, similarly behaving patterns are extracted. A large number of clusters with few feeders are formed, which resemble each other. In fact, all such patterns should be grouped together in a single cluster, but the algorithm segregates them due to the high value of the significance level α. The outcome of g-Means at different values of α is shown by the different sub-figures of Fig. 7, while Fig. 5 shows a graph of the number of clusters versus significance level α. Increasing α beyond a certain value (for the different cases it lies between 0.005 and 0.01) does not improve the outcome. For example, for the month of January, the optimal value of α, as highlighted in Table I, is 0.01, and the knee of the curve shown in Fig. 5 lies between 0.006 and 0.01.

Fig. 7: g-Means clustering outcome for the month of January at different values of α: (a) 0.00001, (b) 0.0001, (c) 0.001, (d) 0.01, (e) beyond 0.01 (normalized active power versus hours for each cluster).

If we compare the patterns of sub-figures (a) to (d) of Fig. 7, all the patterns in each sub-figure are not very similar, while in the case of sub-figure Fig. 7(e) there are several patterns which are very similar and could be grouped together in a single cluster. Thus, unique patterns are revealed till α is 0.01; beyond that, there is no further improvement.

To validate the results, cluster validity analysis based on the CH index has been carried out on all monthly data sets, as shown in Fig. 6. The optimal number of clusters as per the CH index is represented by the knee of the curve. The knee of all the curves as per the CH index lies between 8 and 15. The result of the CVI analysis matches the g-Means outcome at α lying between 0.005 and 0.01. For the same months, the cluster validity analysis curve of Fig. 6 shows that the optimal number of clusters is around 10-14, while g-Means gives 10-13 clusters at α lying between 0.005 and 0.01. As the purpose of the paper was to study the effect of α, it was varied with such high granularity. In fact, for practical cases, α can be varied in powers of 10, i.e., 0.00001, 0.0001, 0.001, ..., and a graph as shown in Fig. 5 could be plotted to optimally judge the value of α.
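For reference, a sketch of how such a CH-index validation could be reproduced is given below, assuming scikit-learn and the illustrative Z_pu matrix from earlier; calinski_harabasz_score is evaluated over a range of cluster counts and the knee of the resulting curve is then read off (in the paper the knee is judged visually from Fig. 6).

# Sketch: CH-index curve for judging the optimal number of clusters (cf. Fig. 6).
# Assumes scikit-learn; Z_pu is the normalized monthly pattern matrix.
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

ch_curve = {}
for k in range(2, 71):
    labels = KMeans(n_clusters=k, init="random", n_init=100,
                    random_state=0).fit_predict(Z_pu)
    ch_curve[k] = calinski_harabasz_score(Z_pu, labels)

# Plotting ch_curve versus k and locating its knee gives the optimal cluster
# count, which can then be compared with the g-Means outcome at a chosen alpha.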

TABLE I: EFFECT OF SIGNIFICANCE LEVEL α ON THE NUMBER OF CLUSTERS EVALUATED BASED ON g-MEANS ALGORITHM

Sr No   α         JAN   FEB   MAR   APR
1       0.00001    5     6     4     4
2       0.00002    5     6     4     5
3       0.00003    5     7     4     5
4       0.00004    5     7     4     5
5       0.00005    5     7     5     5
6       0.00006    5     7     5     5
7       0.00007    6     7     5     5
8       0.00008    6     8     5     5
9       0.00009    6     8     5     5
10      0.0001     7     8     6     5
11      0.0002     7     8     6     5
12      0.0003     7     8     6     5
13      0.0004     7     9     6     5
14      0.0005     7     9     6     5
15      0.0006     7     9     6     5
16      0.0007     7     9     6     5
17      0.0008     8     9     6     5
18      0.0009     8     10    6     6
19      0.001      8     10    7     8
20      0.002      8     10    7     9
21      0.003      8     11    7     9
22      0.004      8     11    7     12
23      0.005      9     12    12    13
24      0.006      11    12    16    17
25      0.007      11    12    19    21
26      0.008      11    15    19    23
27      0.009      11    15    20    23
28      0.01       11    18    20    24
29      0.02       20    20    21    25
30      0.03       24    20    22    26
31      0.04       31    26    24    29
32      0.05       36    28    26    30
33      0.06       38    28    27    34
34      0.07       40    34    29    37
35      0.08       42    41    33    42
36      0.09       45    43    41    43
37      0.1        48    43    42    45

V. CONCLUSION

The g-Means algorithm combines the robustness of k-Means and the screening ability of a statistical hypothesis test (the Anderson-Darling test) to make the clustering process unsupervised. It overcomes the important drawbacks of k-Means, viz.: 1. the value of k as an input; 2. low inter-cluster dissimilarity. The statistical test overcomes the first drawback, and α acts as a control to increase intra-cluster similarity and inter-cluster dissimilarity. Thus, the algorithm is recommended for clustering applications like load profiling, where an initial guess of the number of clusters is unavailable or difficult to obtain. This paper demonstrates the effect of α on clustering and provides a procedure for its selection. Though α between 0.005 and 0.01 is proposed and found suitable for most load profiling applications, the actual choice depends largely on the data. Thus, the demonstrated procedure could be followed for judiciously selecting α for any g-Means based clustering application.

REFERENCES

[1] Load Research Manual, vol. 1, US DOE in association with ANL and AEIC, USA, Nov. 1980.
[2] G. Chicco, R. Napoli, F. Piglione, M. Scutariu, P. Postolache, and C. Toader, "Application of clustering techniques to load pattern-based electricity customer classification," Proc. 18th CIRED, pp. 6-9, 2005.
[3] G. Chicco, R. Napoli, and F. Piglione, "Comparisons among clustering techniques for electricity customer classification," IEEE Transactions on Power Systems, vol. 21, no. 2, pp. 933-940, 2006.
[4] G. Chicco, "Overview and performance assessment of the clustering methods for electrical load pattern grouping," Energy, vol. 42, no. 1, pp. 68-80, 2012.
[5] J. A. Jardini, C. M. Tahan, M. Gouvea, S. U. Ahn, and F. Figueiredo, "Daily load profiles for residential, commercial and industrial low voltage consumers," IEEE Transactions on Power Delivery, vol. 15, no. 1, pp. 375-380, 2000.
[6] E. Carpaneto, G. Chicco, R. Napoli, and M. Scutariu, "Customer classification by means of harmonic representation of distinguishing features," in Proc. 2003 IEEE Bologna Power Tech Conference, vol. 3, 2003, 7 pp.
[7] G. Chicco, R. Napoli, F. Piglione, P. Postolache, M. Scutariu, and C. Toader, "Load pattern-based classification of electricity customers," IEEE Transactions on Power Systems, vol. 19, no. 2, pp. 1232-1239, 2004.
[8] V. Figueiredo, F. Rodrigues, Z. Vale, and J. B. Gouveia, An Electric Energy Consumer Characterization Framework Based on Data Mining Techniques, 2005.
[9] S. V. Verdú, M. O. Garcia, C. Senabre, A. G. Marín, and F. G. Franco, "Classification, filtering, and identification of electrical customer load patterns through the use of self-organizing maps," IEEE Transactions on Power Systems, vol. 21, no. 4, pp. 1672-1682, 2006.
[10] D. Gerbec, S. Gasperic, I. Smon, and F. Gubina, "Allocation of the load profiles to consumers using probabilistic neural networks," IEEE Transactions on Power Systems, vol. 20, no. 2, pp. 548-555, 2005.
[11] R. Singh, B. C. Pal, and R. A. Jabr, "Statistical representation of distribution system loads using Gaussian mixture model," IEEE Transactions on Power Systems, vol. 25, no. 1, pp. 29-37, 2010.
[12] A. Mutanen, M. Ruska, S. Repo, and P. Jarventausta, "Customer classification and load profiling method for distribution systems," IEEE Transactions on Power Delivery, vol. 26, no. 3, pp. 1755-1763, 2011.
[13] J. C. Dunn, "Well-separated clusters and optimal fuzzy partitions," Journal of Cybernetics, vol. 4, no. 1, pp. 95-104, 1974.
[14] T. Caliński and J. Harabasz, "A dendrite method for cluster analysis," Communications in Statistics - Theory and Methods, vol. 3, no. 1, pp. 1-27, 1974.
[15] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 2, pp. 224-227, 1979.
[16] P. J. Rousseeuw, "Silhouettes: a graphical aid to the interpretation and validation of cluster analysis," Journal of Computational and Applied Mathematics, vol. 20, pp. 53-65, 1987.
[17] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841-847, 1991.
[18] G. J. Tsekouras, N. D. Hatziargyriou, and E. N. Dialynas, "Two-stage pattern recognition of load curves for classification of electricity customers," IEEE Transactions on Power Systems, vol. 22, no. 3, pp. 1120-1128, 2007.
[19] K. Mets, F. Depuydt, and C. Develder, "Two-stage load pattern clustering using fast wavelet transformation," IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2250-2259, 2016.
[20] G. Hamerly and C. Elkan, "Learning the k in k-means," in Advances in Neural Information Processing Systems, 2004, pp. 281-288.
[21] T. W. Anderson and D. A. Darling, "A test of goodness of fit," Journal of the American Statistical Association, vol. 49, no. 268, pp. 765-769, 1954.
[22] R. B. D'Agostino, Goodness-of-Fit Techniques. CRC Press, 1986, vol. 68.
[23] M. P. M. K. V. V. C. Ltd. [Online]. Available: http://www.mpcz.co.in
[24] Statistics and Machine Learning Toolbox: User's Guide R2015a. The MathWorks, Inc., 2015.
