Detection of Electricity Theft in Customer PDF
Abstract—Advanced Metering Infrastructure (AMI) is a core part of the Smart-grid, responsible for collecting, measuring and analyzing the energy usage data of customers. The development of this network has been possible thanks to the emergence of new information and communication technologies. However, with the arrival of these technologies, new problems have arisen in the AMI. One of these challenges is energy theft, which has been a major concern in traditional power systems worldwide. To face these challenges, datasets of electricity consumption are analyzed to detect intruders. Traditional techniques to detect intruders include machine learning and data mining approaches. In this paper, we analyze the feasibility of applying outlier detection algorithms to enhance the security of AMI through the detection of electricity theft. We explore the performance of various existing outlier detection algorithms on a real dataset (consumer energy usage). The results show the feasibility of using outlier detection algorithms in the security of AMI, and also the effectiveness of these methods on electricity consumption datasets for theft detection.

Index Terms—Outlier detection, electricity theft detection, advanced metering infrastructure, smart grid

I. INTRODUCTION

Smart-grid incorporates computer intelligence and sensors into the power system. This includes the use of smart meters in the AMI subsystem. However, with the incorporated intelligence, new security concerns are raised. One of them is electricity theft, a long-standing issue for utilities, as it can cause billions of dollars in financial losses [1]. The methods of stealing electricity include tampering with meters [2] and connecting unregistered appliances to the power grid. Many efforts are committed to addressing electricity theft issues using a variety of approaches. Traditionally, pattern recognition and data-mining techniques are applied to historical electricity usage to detect electricity theft [3], [4].

In this paper, we analyze the feasibility of applying outlier detection algorithms for enhancing the security of AMI through the detection of several types of electricity theft. Our focus in this study is to distinguish normal from malicious customers based on the analysis of consumption patterns in the AMI subsystem. In the past, many classification algorithms were used to train a classifier based on a sample database, which is then utilized to find abnormal patterns, but, to the best of our knowledge, multiple outlier detection algorithms have not been used. Our approach includes the use of seven outlier detection algorithms to detect abnormal consumption patterns; in addition, the data are preprocessed with the k-means clustering algorithm with the objective of reducing the number of measurement samples. The validation is performed by analyzing the electricity consumption of five customers, which includes seven different types of electricity theft. After comprehensive experiments, a feasibility study is performed to analyze the use of these existing outlier detection algorithms as an improvement to AMI security. The study will help future researchers to understand and extend existing outlier detection algorithms for building robust intrusion detection systems (IDSs), which are used in various components of AMI and other critical systems. The contributions of this research are as follows:
1) We have conducted a set of experiments on a public data set using state-of-the-art outlier detection techniques and compared their performance.
2) Seven types of electricity theft are generated to validate the outlier detection algorithms.
3) We have performed a feasibility study of applying these outlier detection algorithms to the detection of electricity theft in AMI.

II. RELATED WORK

In recent years, the emergence of Smart-grid has motivated research into a variety of intrusion detection techniques. One approach explores Machine-Learning Methods (MLMs) [5] for detecting anomalies, such as neural networks, support vector machines, K-nearest neighbor and Hidden Markov models. However, inaccurate models can lead to false alarms and/or missed detections. The literature also offers outlier detection techniques, which can be applied to distinguish malicious attempts to manipulate the data from the mostly nominal cases. Outlier detection techniques can be categorized into four groups: statistical, distance-based, clustering-based and density-based approaches. In statistical techniques [6]–[8], the data points are typically modeled using a stochastic distribution, and points are labeled as outliers depending on their relationship with the distributional model. Distance-based methods [9]–[11] use solely the distance space to flag outliers. The cluster-based methods [12], [13] detect outliers
TABLE I
DATASET USED IN OUR EXPERIMENTS

Dataset      Features (original)   Features (clustering)
Customer-1   24                    3
Customer-2   24                    3
Customer-3   24                    13
Customer-4   24                    9
Customer-5   24                    3

C. Experiment Design

The methodology has two steps. In the first step, we preprocessed the data; this process includes tasks such as (1) formatting and normalizing the data from the raw file, (2) selecting a season period (90 days) with 24 meter readings per day, (3) creating a synthetic data set of malicious samples which contains seven different types of electricity theft, and (4) performing k-means in conjunction with Silhouette plots to reduce the number of meter readings per day by summing the in-between samples. In the second step, we tested the seven detection algorithms for the five customers. For every customer we tested seven files, which correspond to the seven types of electricity theft, making 35 files processed in total.
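Step (3) can be illustrated with a short Python sketch. It is not the authors' generator: the function name `make_theft`, the scaling range 0.1 to 0.8, the zero-filling of interrupted readings, and the `hour_min` argument (a per-hour minimum taken from the customer's history) are all assumptions made for illustration, following the seven theft types (Type-1 to Type-7) defined in this paper.

```python
import numpy as np


def make_theft(day, kind, rng, hour_min=None):
    """Return a tampered copy of one day of hourly readings.

    Hypothetical helper: the ratio range (0.1, 0.8) and the handling of
    interrupted reports (set to zero) are assumed, not taken from the paper.
    """
    day = np.asarray(day, dtype=float)
    n = day.size
    if kind == 1:  # same constant ratio for every hour
        return day * rng.uniform(0.1, 0.8)
    if kind == 2:  # reports interrupted for a random stretch of the day
        out = day.copy()
        length = rng.integers(1, n + 1)
        start = rng.integers(0, n - length + 1)
        out[start:start + length] = 0.0
        return out
    if kind == 3:  # a different random ratio for every hour
        return day * rng.uniform(0.1, 0.8, size=n)
    if kind == 4:  # daily average, scaled by a random ratio per hour
        return day.mean() * rng.uniform(0.1, 0.8, size=n)
    if kind == 5:  # constant consumption equal to the daily average
        return np.full(n, day.mean())
    if kind == 6:  # readings in reversed order
        return day[::-1].copy()
    if kind == 7:  # between zero and the minimum value for that hour
        if hour_min is None:
            raise ValueError("kind 7 needs hour_min (per-hour minimum)")
        return rng.uniform(0.0, 1.0, size=n) * np.asarray(hour_min, dtype=float)
    raise ValueError("kind must be an integer from 1 to 7")


rng = np.random.default_rng(0)
example_day = np.arange(1.0, 25.0)  # made-up readings for one day
attacks = {k: make_theft(example_day, k, rng, hour_min=0.5 * example_day)
           for k in range(1, 8)}
```

Applying such a function with each of the seven kinds to every day of a customer's 90-day season would give the seven malicious files per customer mentioned in the text.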
We use the Area Under the ROC Curve (AUC) as the metric for measuring the performance of the algorithms. In addition, we tuned the parameter values with the help of the corresponding literature. The process included a test to find the best 'h' values (kernel width) for the algorithms LDF, KDEOS, and RDOS.
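The evaluation loop described here might look like the sketch below. It is an illustration under stated assumptions, not the setup used in the experiments: scikit-learn's `LocalOutlierFactor` stands in for the detectors (LDF, KDEOS and RDOS are not part of scikit-learn), the neighborhood size `n_neighbors` is swept in place of the kernel width h, and the data are synthetic three-feature days invented for the example.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Synthetic stand-in data: 90 normal days with three features each,
# plus 10 tampered days obtained by random per-feature down-scaling.
normal = rng.normal(loc=[1.0, 2.0, 3.0], scale=0.1, size=(90, 3))
theft = normal[:10] * rng.uniform(0.1, 0.5, size=(10, 3))
X = np.vstack([normal, theft])
y = np.r_[np.zeros(90), np.ones(10)]  # 1 marks a theft sample


def lof_auc(X, y, k):
    """AUC of LOF outlier scores against the known theft labels."""
    lof = LocalOutlierFactor(n_neighbors=k).fit(X)
    scores = -lof.negative_outlier_factor_  # larger means more outlying
    return roc_auc_score(y, scores)


# Parameter sweep, analogous in spirit to the search for the best h.
auc_by_k = {k: lof_auc(X, y, k) for k in (5, 10, 20, 30)}
best_k = max(auc_by_k, key=auc_by_k.get)
print(best_k, round(auc_by_k[best_k], 3))
```

The labels are used only to score the ranking produced by the detector; the detector itself is fitted without them, which is what makes AUC a convenient metric for unsupervised outlier methods.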
V. RESULTS
The results of applying the k-means clustering algorithm to the five customers are shown in Table I. We can see that in the majority of the cases, the features were reduced to three meter readings per day. In other words, this corresponds to three patterns of consumption during one day: low (morning), medium (afternoon) and high (night). Customer-3 and Customer-4 have more patterns of consumption in one day. These would probably correspond to commercial customers, who have many patterns of consumption during the day.

Fig. 1. Example of the types of electricity theft generated (attacks)
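The feature reduction summarized in Table I could be reproduced along the following lines. This is a hedged sketch: it assumes that the 24 hours of the day are the objects being clustered (each hour described by its readings over the season), that the number of clusters is chosen by the highest silhouette score rather than by visual inspection of Silhouette plots, and that readings of hours in the same cluster are summed. The function name `reduce_features` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def reduce_features(X, k_range=range(2, 13), seed=0):
    """Group the hourly columns of X (days x 24) and sum each group.

    Returns the reduced matrix (days x k) and the cluster label of each hour.
    """
    hours = X.T  # one row per hour of the day
    best_score, best_labels = -np.inf, None
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=seed).fit_predict(hours)
        score = silhouette_score(hours, labels)
        if score > best_score:
            best_score, best_labels = score, labels
    groups = np.unique(best_labels)
    reduced = np.column_stack(
        [X[:, best_labels == g].sum(axis=1) for g in groups])
    return reduced, best_labels


# Made-up season: 90 days with low, medium and high consumption periods.
rng = np.random.default_rng(0)
levels = np.repeat([0.2, 1.0, 3.0], 8)
season = levels + rng.normal(0.0, 0.05, size=(90, 24))
reduced, hour_labels = reduce_features(season)
print(reduced.shape)
```

Summing within clusters keeps the total daily consumption unchanged, so the reduced features remain comparable across days.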
• Type-1: Scaling down the amount of consumption by a constant ratio, i.e. reducing the consumption of every hour by the same constant
• Type-2: Interrupting the transmission of usage reports by the smart meter for a random duration of the day
• Type-3: Scaling down the amount of consumption with a different random value for every hour
• Type-4: Reporting for every hour the average consumption of the day, scaled down by a random value
• Type-5: Reporting a constant consumption all day. The value of the constant is the average consumption for that day
• Type-6: Reversing the order of the readings. In this way, the illegal customer pays less because the hours with high demand are priced higher than the hours with low demand
• Type-7: Scaling down the amount of consumption. The consumption will be between zero and the minimum value for that hour.

An example of the daily consumption of a customer with the seven types of synthetic electricity theft generated is shown in Fig. 1. In this figure, the normal consumption is colored in black; the rest of the colors correspond to the generated attacks.

A. AUC performance by seven Algorithms

As a result of the k-means clustering, we have customers with different numbers of meter readings per day. In Fig. 2, RDOS, LDF and INFLO obtain the best performance for customers with three features. For data sets with electricity theft type-4 to type-7, the best results are over 90 percent AUC. Customer-3 and Customer-4 are shown in Fig. 3 and Fig. 4. In both cases, the better performances are obtained with KDEOS, INFLO, LDF and RDOS. We also observed that for electricity theft types 5, 6 and 7 the AUC is over 75 percent, while for the rest the performance is lower. In addition, for Customer-3 and Customer-4 the AUC of the majority of the methods is lower than for Customers 1, 2 and 5 (few features). This leads us to conclude that the increase in the number of features affects the performance of some algorithms. For example, the performance of RDOS decreases dramatically when it runs on a dataset with many features.

B. Results by types of electricity theft

Table II shows the results over the full data sets, i.e. for the five customers with seven types of electricity theft each. We observed that the majority of algorithms obtain good results for all types of electricity theft (only type-2 obtains a low AUC). Table II also shows the best methods by
type, which shows that INFLO, RDOS and LDF have excellent results.

Fig. 2. AUC of customers 1, 2, 5 (3 features)
Fig. 3. AUC of customer-3 (13 features)
Fig. 4. AUC of customer-4 (9 features)
TABLE II
DETECTION LEVEL FOR TYPE OF ELECTRICITY THEFT (AUC)

Theft-Type   AUC    Methods
1            0.81   INFLO, LDF, RDOS
2            0.65   KDEOS, LDF, INFLO
3            0.91   RDOS, INFLO
4            0.94   INFLO, LDF, RDOS
5            0.95   RDOS, KDEOS
6            0.95   RDOS, LDF
7            0.94   KDEOS, INFLO

In Fig. 5, we can see the average time over the five customers. MNN and KDEOS obtain the best time, while LDF and LOF are the worst performers. We must highlight that all methods output similar values for all types of electricity theft. Fig. 6 shows the standard deviation obtained over the full data set. RDOS and INFLO are the methods with the lowest standard deviation.

C. Effect of the clustering

Due to space limitations we show the results of feature reduction only for customer-2 in Fig. 7 and 8. In these figures, we observed the effect on the AUC performance of using 24 features vs 3 features. The figures show that for the majority
Fig. 6. General standard deviation over all data set
Fig. 8. AUC with 3 features for customer 2
[3] G. Tsekouras, N. Hatziargyriou, and E. Dialynas, “Two-stage pattern recognition of load curves for classification of electricity customers,” IEEE Transactions on Power Systems, vol. 22, no. 3, pp. 1120–1128, 2007. [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-34548048165&doi=10.1109%2fTPWRS.2007.901287&partnerID=40&md5=13b0469f43888be7d57410f728775244
[4] Y. Zhang, W. Chen, and J. Black, “Anomaly detection in premise energy consumption data,” 2011. [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-82855163956&doi=10.1109%2fPES.2011.6039858&partnerID=40&md5=65a396de5313a3981c18b939f44e8570
[5] S. Dua and X. Du, Data Mining and Machine Learning in Cybersecurity, 1st ed. Boston, MA, USA: Auerbach Publications, 2011.
[6] V. Barnett and T. Lewis, Outliers in Statistical Data, ser. Wiley Series in Probability & Statistics. Wiley, 1994. [Online]. Available: https://books.google.com.pr/books?id=B44QAQAAIAAJ
[7] N. Billor, A. Hadi, and P. Velleman, “BACON: Blocked adaptive computationally efficient outlier nominators,” Computational Statistics and Data Analysis, vol. 34, no. 3, pp. 279–298, 2000. [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-0034282347&doi=10.1016%2fS0167-9473%2899%2900101-2&partnerID=40&md5=79b7dd5e97236600daaec721ec1ff5d7
[8] E. Eskin, “Anomaly detection over noisy data using learned probability distributions,” in Proceedings of the Seventeenth International Conference on Machine Learning, ser. ICML ’00. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2000, pp. 255–262. [Online]. Available: http://dl.acm.org/citation.cfm?id=645529.658128
[9] E. M. Knorr and R. T. Ng, “Algorithms for mining distance-based outliers in large datasets,” in Proceedings of the 24th International Conference on Very Large Data Bases, ser. VLDB ’98. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1998, pp. 392–403. [Online]. Available: http://dl.acm.org/citation.cfm?id=645924.671334
[10] C. Aggarwal and P. Yu, “Outlier detection for high dimensional data,” 2001, pp. 37–46. [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-0034832620&partnerID=40&md5=838180bd526a93462c9547680d306d01
[11] S. Ramaswamy, R. Rastogi, and K. Shim, “Efficient algorithms for mining outliers from large data sets,” SIGMOD Record (ACM Special Interest Group on Management of Data), vol. 29, no. 2, pp. 427–438, 2000. [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-0039845384&doi=10.1145%2f335191.335437&partnerID=40&md5=d0562a1553e880767d386344c4682eec
[12] M. Brito, E. Chávez, A. Quiroz, and J. Yukich, “Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection,” Statistics and Probability Letters, vol. 35, no. 1, pp. 33–42, 1997. [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-0031571391&partnerID=40&md5=4016de803c224e486b5e279444f2ca13
[13] V. Hautamäki, I. Kärkkäinen, and P. Fränti, “Outlier detection using k-nearest neighbour graph,” vol. 3, 2004, pp. 430–433. [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-10044269754&doi=10.1109%2fICPR.2004.1334558&partnerID=40&md5=c2079e09347211036c9fcd2e25070210
[14] S. Depuru, L. Wang, and V. Devabhaktuni, “Support vector machine based data classification for detection of electricity theft,” 2011. [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-79958816472&doi=10.1109%2fPSCE.2011.5772466&partnerID=40&md5=36a976d4bac1ea0db3acb2b5f5007ac0
[15] S. Depuru, L. Wang, V. Devabhaktuni, and R. Green, “High performance computing for detection of electricity theft,” International Journal of Electrical Power and Energy Systems, vol. 47, no. 1, pp. 21–30, 2013. [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84870315054&doi=10.1016%2fj.ijepes.2012.10.031&partnerID=40&md5=e6fde4f7bd1a5b9c7fab2666de820387
[16] S. Depuru, L. Wang, V. Devabhaktuni, and P. Nelapati, “A hybrid neural network model and encoding technique for enhanced classification of energy consumption data,” 2011. [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-82855182204&doi=10.1109%2fPES.2011.6039050&partnerID=40&md5=7677b27c8553df10a26d29389d7175c4
[17] M. Breunig, H.-P. Kriegel, R. Ng, and J. Sander, “LOF: Identifying density-based local outliers,” SIGMOD Record (ACM Special Interest Group on Management of Data), vol. 29, no. 2, pp. 93–104, 2000. [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-0039253819&partnerID=40&md5=8237238cd72e69d886fad873ae89c433
[18] L. Latecki, A. Lazarevic, and D. Pokrajac, “Outlier detection with kernel density functions,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4571 LNAI, pp. 61–75, 2007. [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-37249036471&partnerID=40&md5=bf02ef283a026cf9a22e35f95194b686
[19] E. Schubert, A. Zimek, and H.-P. Kriegel, “Generalized outlier detection with flexible kernel density estimates,” vol. 2, 2014, pp. 542–550. [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84958543874&doi=10.1137%2f1.9781611973440.63&partnerID=40&md5=ac27f26d093348cccba3db76e25f0405
[20] W. Jin, A. Tung, J. Han, and W. Wang, “Ranking outliers using symmetric neighborhood relationship,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 3918 LNAI, pp. 577–593, 2006. [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-33745772192&doi=10.1007%2f11731139_68&partnerID=40&md5=b49bc5895564746a45af3007544b77d2
[21] B. Tang and H. He, “A local density-based approach for outlier detection,” Neurocomputing, vol. 241, pp. 171–180, 2017. [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85015321791&doi=10.1016%2fj.neucom.2017.02.039&partnerID=40&md5=072e890f93ba2dd8e30a6b2946fae646
[22] I. C. for Energy Regulation. (2018) Irish social science data archive. [Online]. Available: http://www.ucd.ie/issda/data/commissionforenergyregulationcer/
[23] P. Jokar, N. Arianpoo, and V. Leung, “Electricity theft detection in AMI using customers’ consumption patterns,” IEEE Transactions on Smart Grid, vol. 7, no. 1, pp. 216–226, 2016. [Online]. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84960349614&doi=10.1109%2fTSG.2015.2425222&partnerID=40&md5=862fb157e50102b2d986b413beed9e29