Abstract—Data mining is a statistical means of extracting useful information from extremely large sets of raw data. Data mining methods are therefore under vigorous development and are commonly used in artificial intelligence fields such as image processing and robotics. Data mining has also recently been applied in the electric power industry, for tasks such as classification, clustering and forecasting. In this research work, clustering techniques are adopted to identify the phase connectivity in power systems. Phase identification is discussed on the basis of smart meter data obtained from end-users on the low-voltage (LV) feeder. Firstly, the LV network is modeled using the simulation tool OpenDSS. Secondly, the phase identification algorithm for the LV network is developed in Matlab using K-means clustering as well as Gaussian Mixture Model (GMM) clustering. Finally, the IEEE European Low Voltage Test Feeder is used to verify the proposed method. Results indicate that both methods achieve the goal of phase identification, which is to identify precisely the active loads and the phase to which each load is connected.
Keywords—Phase identification, distribution system, data mining, clustering, K-means, GMM.

I. INTRODUCTION
From personal computers to lighting systems, from high-voltage transmission grids to low-voltage distribution systems, three-phase systems are widely adopted in modern society. Modern power systems typically have three voltage levels: high-voltage (HV), medium-voltage (MV) and low-voltage (LV). The HV grid is mainly used for high-capacity long-range transmission, whereas the MV and LV grids serve regional distribution and household supply, respectively. Of the three levels, the LV grid is the one that most directly affects daily life, so it is the focus of this paper.
In typical LV distribution networks, electricity reaches the households via a step-down transformer that converts the voltage to 230 V / 50 Hz, either single-phase or three-phase. By the phasor principle of the three-phase system, the three ideal phase currents should sum to 0 A when the system is balanced. This is not always the case in practice, as system unbalances occur frequently. Unbalance may lead to negative consequences such as reduced asset lifetime, lower operational efficiency or even damage to the overloaded phase coils. Hence, identifying the phase connectivity of an LV network is crucial to preventing such consequences.
Traditionally, the phase connection layout of a network is determined by manual intervention or by signal injection. These methods are normally used in relatively small-scale electrical networks. With the growing demand for utility power, however, they become inefficient for today's large-scale networks.
The introduction of automatic meter reading (AMR) systems, in which sensors and smart meters are embedded in the household grid, allows electrical engineers to monitor unbalance across the whole network and thus to control and predict household demand. The rapid growth of modern smart grids calls for a new non-intrusive and low-cost approach.

II. PROBLEM FORMULATION
In general, the phase identification procedure in this paper is carried out in four steps:
i. Model the LV network (the IEEE LV Feeder) in the simulation tool OpenDSS.
ii. Extract the simulated data from OpenDSS using Matlab and apply the K-means and GMM clustering algorithms to place the simulated data into three clusters.
iii. Perform classification to identify the three clusters and label them precisely based on the a-priori information from the transformer side.
iv. Verify the result a posteriori by comparing the labelled clusters with the correct information provided by the IEEE LV test feeder manual.
It is noteworthy that the time series of voltage magnitude on the transformer side and on the household side are the only data required to perform the above procedure.

III. METHODOLOGY
A. Power flow problem
The clustering method used in this paper requires the time-varying voltage magnitudes as input for the identification algorithm, whereas only load profiles are available. The voltages are therefore obtained by power flow calculation.
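As a rough illustration of why a power flow calculation turns load profiles into voltage magnitudes, the sketch below solves the smallest possible case: one source bus feeding one constant-power load through a line impedance. It is not the OpenDSS solver and does not use Newton-Raphson; it is a plain fixed-point iteration, and the function name `load_bus_voltage` and every numeric value are assumptions chosen for illustration only.

```python
# Minimal single-phase, two-bus sketch: source bus -> line impedance -> load bus.
# All numeric values below are illustrative assumptions, not feeder data.

def load_bus_voltage(v_source, z_line, s_load, tol=1e-9, max_iter=100):
    """Solve V = v_source - z_line * conj(s_load / V) by fixed-point iteration."""
    v = complex(v_source)  # flat start: assume no voltage drop initially
    for _ in range(max_iter):
        i_load = (s_load / v).conjugate()   # current of a constant-power load, from S = V I*
        v_new = v_source - z_line * i_load  # voltage drop along the line
        if abs(v_new - v) < tol:
            return v_new
        v = v_new
    raise RuntimeError("power flow did not converge")


V_SOURCE = 230.0 + 0j                      # volts at the transformer side (assumed)
Z_LINE = 0.05 + 0.02j                      # ohms, line impedance (assumed)
load_profile_w = [500.0, 1500.0, 3000.0]   # active power per time step (assumed)

# One voltage magnitude per time step of the load profile.
voltage_magnitudes = [
    abs(load_bus_voltage(V_SOURCE, Z_LINE, complex(p, 0.0)))
    for p in load_profile_w
]
```

Each entry of `voltage_magnitudes` corresponds to one time step of the load profile, which is the kind of time series the clustering step takes as input. A full feeder needs the multi-bus formulation, which is why a dedicated solver is used in practice.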
One way to relate the loads to the voltages is through the bus-admittance matrix, which describes the relationship between the currents and the voltages in the power system.
By Kirchhoff's Current Law and nodal analysis, the current injection at bus k and the related voltages can be defined as [1]:
\[ I_k = V_k Y_{kG} + \sum_{\substack{m \\ m \neq k}} \frac{V_k - V_m}{Z_{km}} \tag{1} \]
where $V_k$ and $V_m$ are the voltages on buses k and m, respectively, $Y_{kG}$ is the sum of the admittances connected at bus k to ground, and $Z_{km}$ is the impedance between buses k and m, which can also be represented by the admittance $Y_{km}$:
\[ Z_{km} = \frac{1}{Y_{km}} \tag{2} \]
Moreover, $Y_{kG}$ is related to the self-admittance $Y_{kk}$:
\[ Y_{kk} = Y_{kG} + \sum_{\substack{m \\ m \neq k}} \frac{1}{Z_{km}} \tag{3} \]
With (3), the bus-admittance matrix [Y] for an n-bus system can be defined [1]. Writing its elements as $Y_{km} = G_{km} + jB_{km}$, the conjugate of the current injection at bus k is:
\[ I_k^{*} = \sum_{m=1}^{n} Y_{km}^{*} V_m^{*} = \sum_{m=1}^{n} (G_{km} - jB_{km}) V_m^{*} \tag{6} \]
Then the PQ buses can be specified as:
\[ P_k + jQ_k = V_k I_k^{*} = \sum_{m=1}^{n} \left[ (G_{km} - jB_{km}) (V_k V_m^{*}) \right] \tag{7} \]
In power systems, the Newton-Raphson (N-R) procedure is commonly used to solve the power flow equations because of its fast convergence. For typical PQ buses, the power flow equations can be summarized in a form involving the angle $\theta_{km}$, which comes from writing the complex quantities in exponential (Euler) form.

B. Simulation Software
OpenDSS
The Open Distribution System Simulator (OpenDSS) is open-source simulation software developed by the Electric Power Research Institute (EPRI) for electric utility power distribution systems [2]. One of its most important features is that it can solve radial feeders as well as other power flow problems of the kind described in the previous section by applying the Newton-Raphson method, so power flow results are easy to obtain. Because it is open source, programmers can adapt it to their needs, and external software such as Matlab can drive it directly through the Component Object Model (COM) interface.
Matlab
In this paper, Matlab is used for data analysis. To load the data from the OpenDSS model into Matlab, the COM server interface first retrieves the numerical results solved by OpenDSS and then passes them to Matlab. The data can then be post-processed for higher-level analysis such as clustering.
The interaction between OpenDSS, Matlab and COM is depicted in Fig. 1.
Fig. 1. Proposed interface between OpenDSS and Matlab via COM [3].
Fig. 1 shows that the data are passed from OpenDSS to COM and that Matlab then loads them through the COM interface. Matlab users can retrieve all power flow quantities and results via the COM interface for further processing and analysis, including the time series of three-phase voltage magnitude on the transformer side and the voltage magnitudes of the 55 active profiles on the load side.
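The two data sets named above, the three transformer-side phase voltages and the load-side voltage series, are what the later labelling step relies on. The paper does not spell out how clusters are matched to phases, so the sketch below only shows one plausible way, assumed here: give a load-side series the label of the transformer phase it correlates with most strongly (Pearson correlation). The data are synthetic stand-ins, and `pearson`, `assign_phase`, `v_trafo` and `v_node` are hypothetical names.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def assign_phase(load_series, transformer_series):
    """Return the index of the transformer phase that correlates best."""
    scores = [pearson(load_series, phase) for phase in transformer_series]
    return scores.index(max(scores))


# Synthetic stand-in data (assumption): 3 phase voltages over 50 time steps.
rng = random.Random(0)
T = 50
v_trafo = [[240.0 + rng.gauss(0.0, 1.0) for _ in range(T)] for _ in range(3)]

# Each synthetic load follows its true phase with a fixed drop plus small noise.
true_phase = [0, 1, 2, 0, 1, 2]
v_node = [
    [v - 2.0 + rng.gauss(0.0, 0.2) for v in v_trafo[p]]
    for p in true_phase
]

labels = [assign_phase(series, v_trafo) for series in v_node]
```

The same function could be applied to a cluster centroid instead of a single load, which would label a whole cluster at once.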
C. Clustering
Clustering is a mathematical means of grouping a set of objects so that objects with similar features are assigned to the same group [4]. Clustering, a subset of unsupervised learning, is also a common method in data mining. Owing to the large amount of data stored in these matrices, clustering is a useful way to simplify the problem. K-means and the Gaussian Mixture Model (GMM) are two commonly used clustering methods.
K-means clustering
For the K-means clustering algorithm, different input feature samples lead to different clustering results. The details of choosing the best training samples can be found in Section V. Besides the feature samples themselves, it is also important to derive labels from them, since the labels carry the information relevant to phase identification. To obtain the labels of the data points, a clustering algorithm is needed to find every sample's potential label, put samples with similar features under the same label, and thereby cluster the profiles into three groups. Three groups are used because each load profile is supplied by one of the phases from the substation. Among clustering algorithms, K-means can be regarded as the simplest and fastest method for data classification.
Typically, in K-means clustering, the training samples can be denoted as $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(m)} \in \mathbb{R}^{1 \times n}$. And the formulation ... Euclidean distance, whereas $\mu_j$ is the updated centroid, obtained by averaging all the sample ...
Therefore, the clustering results are not very stable compared with GMM clustering.
GMM clustering
GMM definition
Another powerful and commonly used data mining method is GMM. When dealing with data in multiple dimensions, the normal (Gaussian) distribution has to be generalized to the multivariate Gaussian distribution [4]:
\[ N(x \mid \mu, \Sigma) = \frac{1}{\sqrt{\lvert 2\pi\Sigma \rvert}} \exp\!\left( -\frac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu) \right) \tag{11} \]
where $\mu$ is the mean value and $\Sigma$ is the covariance of the Gaussian-distributed dataset. As in K-means, the training samples can be denoted as $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$ for $x \in \mathbb{R}^{n}$.
In GMM, the mixture model is regarded as a combination of multivariate Gaussian distributions:
\[ p(x) = \sum_{k=1}^{K} \pi_k \, N(x \mid \mu_k, \Sigma_k) \tag{12} \]
where K is the number of mixture components and $\pi_k$ is the mixing coefficient, which satisfies $0 \le \pi_k \le 1$ and $\sum_{k=1}^{K} \pi_k = 1$.
...
\[ \gamma(z_k) \equiv p(z_k = 1 \mid x) = \frac{\pi_k \, N(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, N(x \mid \mu_j, \Sigma_j)} \tag{15} \]
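To make Eqs. (12) and (15) concrete, the following sketch evaluates the mixture density and the posterior probability of each component for a single sample. It is restricted to the one-dimensional case for brevity, so the covariance reduces to a scalar variance; the mixture parameters are assumed values, not fitted ones, and `gaussian_pdf` and `responsibilities` are hypothetical helper names.

```python
import math


def gaussian_pdf(x, mean, var):
    """Univariate normal density N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Posterior probability of each component for sample x, as in Eq. (15)."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)  # mixture density p(x), as in Eq. (12)
    return [j / total for j in joint]


# Illustrative three-component mixture (assumed parameters, one per phase).
weights = [0.3, 0.3, 0.4]     # mixing coefficients, sum to 1
means = [-1.0, 0.0, 1.0]
variances = [0.1, 0.1, 0.1]

gamma = responsibilities(0.9, weights, means, variances)
hard_label = gamma.index(max(gamma))  # most probable component for this sample
```

Taking the component with the largest posterior turns the soft GMM assignment into a hard cluster label, which is the form needed to compare against a reference phase.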
Fig. 3. One-line diagram of the IEEE European low voltage test feeder.
Fig. 4. Comparison before (left) and after (right) applying the K-means algorithm.
Fig. 5. Comparison before (left) and after (right) applying the GMM algorithm.
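The K-means grouping whose effect Fig. 4 illustrates can be sketched with Lloyd's algorithm: assign each sample to its nearest centroid by Euclidean distance, then move each centroid to the mean of its members, and repeat until the assignment stops changing. This is a generic textbook version, not the Matlab implementation used in the paper. The deterministic initialisation (first k samples) and the toy two-dimensional points are assumptions made so that the example is reproducible.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with Euclidean distance; returns (labels, centroids)."""
    # Deterministic start (assumption): the first k samples serve as centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every sample.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Toy feature vectors in three well-separated groups (assumed data).
points = [
    (0.0, 0.1), (5.0, 5.1), (10.0, 9.9),
    (0.2, 0.0), (5.1, 4.9), (9.8, 10.1),
    (0.1, 0.2), (4.9, 5.0), (10.1, 10.0),
]
labels, centroids = kmeans(points, k=3)
```

With a random rather than fixed initialisation, repeated runs can end in different local optima, so results may vary from run to run.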
..., which is the truncated time series of voltage magnitude of the end-users. It is noteworthy that the first 50 rows of Vnode are of great importance in achieving high accuracy for both the K-means and GMM clustering algorithms.

TABLE III. CLUSTERING RESULTS BY K-MEANS WITH DIFFERENT FEATURES
Features                                 Num. in     Num. in     Num. in     Accuracy
                                         cluster 1   cluster 2   cluster 3
X = [μ; Md]'                             6           36          13          36.3636%
X = [μ; Md; Mo]'                         15          27          13          85.454%
X = [μ; Md; Mo; σ]'                      22          14          19          76.3636%
X = [μ; Md; Mo; σ; Max; Min; σ²]'        18          9           28          52.7273%
X = [Vnode(1:50,:)]'                     19          21          15          96.3636%
di = diff(Vnode(1:50,:)); X = [di]'      21          19          15          100%
*X is the input for the K-means algorithm; Vnode is the voltage matrix on the load side.

TABLE IV. CLUSTERING RESULTS BY GMM WITH DIFFERENT FEATURES
Features                                 Num. in     Num. in     Num. in     Accuracy
                                         cluster 1   cluster 2   cluster 3
X = [μ; Md]'                             19          17          19          70.9091%
X = [μ; Md; Mo]'                         19          17          19          70.9091%
X = [μ; Md; Mo; σ]'                      18          17          20          70.9091%
X = [μ; Md; Mo; σ; Max; Min; σ²]'        20          14          21          65.4545%
X = [Vnode(1:50,:)]'                     17          23          15          92.7273%
di = diff(Vnode(1:50,:)); X = [di]'      21          19          15          100%
*X is the input for the GMM algorithm; Vnode is the voltage matrix on the load side.

On the other hand, based on Kirchhoff's Voltage Law, the voltage of a node is time-dependent [9], so the voltage magnitudes of adjacent time steps should be correlated. Furthermore, the voltage amplitudes are more stable over the first 50 time steps, especially in the night hours from 00:00 to 00:50.

VI. CONCLUSION
In recent years, smart meter technologies have become widely used in the electrical system. As a result, a huge amount of data can be collected from the data center or other parties, and data mining techniques offer an efficient way to extract useful information from such large-scale datasets [10].
This paper has introduced two data mining algorithms, K-means clustering and GMM clustering, into power systems. Their performance has been tested on the IEEE LV test feeder with real datasets. Both methods used the time series of voltage magnitude at the transformer side as well as at the 55 end-users that have load profiles. Firstly, the required features, such as the voltage magnitudes of the loads and the transformer, were calculated; the selected features were then processed by the K-means and GMM clustering algorithms. Both clustering methods converge and achieve good results compared with the reference. Therefore, the proposed data mining methods are promising candidates for further use in power systems.
However, the optimal features that both clustering algorithms need as input are the first 50 real-time measurements of the voltage profiles, which become bulky in a power network with many end-users. They give rise to a matrix of dimension N_e × 50, where N_e is the number of end-users, which is relatively expensive to process when N_e exceeds 100. Future work on both K-means and GMM clustering might therefore focus on finding more efficient features.

REFERENCES
[1] N. Mohan, Electric Power Systems. Hoboken, NJ: John Wiley & Sons, 2012.
[2] R. Misa, "Impact of Plug-In Electrical Vehicles and Wind Generators on Harmonic Distortion of Electric Distribution Systems," Master's thesis, Michigan Technological University, 2014.
[3] Meghasai, S. Monger, R. Vega, and H. Krisnaswami, "Simulation of Smart Functionalities of Photovoltaic Inverters by Interfacing OpenDSS and Matlab," Thesis, 1588356, The University of Texas at San Antonio, p. 70, 2015.
[4] C. Bishop, Pattern Recognition and Machine Learning. New York: Springer, 2006, pp. 430-439.
[5] A. Ng, CS 229. Class lecture, topic: "Unsupervised Learning, k-means clustering." Dept. of Computer Science, Stanford University, CA, Spring 2015.
[6] S. Z. Selim and M. A. Ismail, "K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-6, no. 1, pp. 81-87, Jan. 1984.
[7] IEEE PES. (2016, Feb.) Distribution test feeders. [Online]. Available: http://www.ewh.ieee.org/soc/pes/dsacom/testfeeders/index.html
[8] D. Arthur and S. Vassilvitskii, "K-means++: The Advantages of Careful Seeding," in SODA '07: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2007, pp. 1027-1035.
[9] T. Jones and N. Nenadic, Electromechanics and MEMS. Cambridge University Press Textbooks, 2013, pp. 11-12.
[10] X. Lu, "Data Mining Techniques in Power Quality Analysis," TU/e, Eindhoven, 2015.