
A Method for Anomaly Prediction in Power Consumption using Long Short-Term Memory and Negative Selection

Andresso da Silva, I. S. Guarany, B. Arruda, E. C. Gurjão, R. S. Freire
Department of Electrical Engineering, Federal University of Campina Grande
Campina Grande-PB, Brazil
{andresso.silva, ivana.guarany}@ee.ufcg.edu.br, {ecg, rcsfreire}@dee.ufcg.edu.br

Abstract—To identify and predict anomalous power consumption, this paper proposes a method based on Long Short-Term Memory (LSTM) and Negative Selection that anticipates the occurrence of anomalies in power consumption and provides useful information for energy efficiency. When applied to the power consumption recorded during 20 weeks in a building, the method yielded promising results, demonstrating its effectiveness and its suitability for real-time electricity monitoring and anomaly prediction.

Index Terms—Power Consumption, Anomaly Detection, Anomaly Prediction, Negative Selection, Long Short-Term Memory.

I. INTRODUCTION

Anomalies can be defined as behaviors, values, or sets of values whose probability distribution differs from that of the data a system normally presents [10]. Anomalous values are caused by uncommon events in a system, so their identification can provide indications of the causes of the anomalous behavior [3] [11] [8]. With the increasing volume of data, methods to handle anomalous values or behaviors are necessary [1]. Anomaly identification is receiving growing attention [2] and already has a considerable number of applications in various areas [3].

Anomaly identification methods are based on techniques such as classification, clustering, and nearest neighbors, among others [11] [14] [7] [19] [2] [12] [9]. Most of these methods only identify anomalous data. In practice, it is necessary to predict whether and when an anomaly will occur [2].

In power consumption, anomalous behavior causes unwanted expenditures and faults, so forecasting anomalous consumption provides information to better manage the system. This paper presents a method that combines LSTM and Negative Selection to predict the occurrence of anomalies in the consumption of electric energy, enabling actions aimed at energy efficiency.

Unlike other methods that use LSTM for anomaly detection, the proposed method does not require the error between the value predicted by the LSTM and the actual measured value of the power, which enables the verification and classification of the next consumption values by means of the Negative Selection technique. As a case study, the method was applied to 20 weeks of power consumption data from a building at the Federal University of Campina Grande, Brazil, and the obtained results show the feasibility of the proposed method.

The paper is organized as follows: Section II presents Negative Selection essentials, while Section II-A presents Long Short-Term Memory essentials. In Section III, the proposed method is introduced and detailed. In Section IV, the experimental results are presented, while Section V presents the conclusions and future work.

II. NEGATIVE SELECTION

Artificial Immune Systems (AIS) are adaptive systems based on theoretical immunology to solve problems [12]. Negative selection is an AIS algorithm inspired by the maturation of T-cells, which are produced in the bone marrow and undergo maturation in the thymus. When T-cells are exposed to the body's proteins (self-proteins), those that bind to them are killed; on the contrary, T-cells that do not bind to the body's proteins are kept [13].

The negative selection algorithm consists of two stages, called censoring and monitoring. In the former, candidate detectors are generated; if they react to the self-data, they are discarded, and if they do not react, they are added to the set of competent detectors. In the second stage, data from the system are checked by the detectors belonging to the set of competent detectors generated in the first stage. If data received in the monitoring stage matches any of the detectors, the data is classified as non-self.

Matching is verified by means of an affinity measure between detectors, self-data, and monitored data. To compute the affinity, the data is divided into vectors of length L, called self-strings. The monitored data is likewise divided into vectors of length L, and the detectors are also vectors of length L. Self-strings and detectors may have elements belonging to the set of real numbers; this encoding is called real-valued encoding.

For this encoding, several affinity measures can be used. The r-Hamming affinity measure is described as follows. Given a detector Ab = [x11 x12 ... x1L] and a string Ag = [w11 w12 ... w1L] of monitored data, it is verified how many

978-1-7281-0397-6/19/$31.00 ©2019 IEEE


positions of the vector |Ab − Ag| = [|x11 − w11| |x12 − w12| ... |x1L − w1L|] are smaller than a deviation value ε, where |·| denotes the absolute value [4] [5] [6]. If the number of positions in which |x1k − w1k| < ε, 1 ≤ k ≤ L, is greater than or equal to the threshold parameter r, it is said that Ab and Ag partially match.

Therefore, negative selection is an algorithm that can detect changes in what may be considered the normal behavior of a system, being able to determine which data are self-data and which are non-self data.

A. Long Short-Term Memory

Recurrent neural networks (RNNs) learn from past values of a time series in order to predict its next values. One of the drawbacks of RNNs is that their training may take a prohibitively long time. Long short-term memory (LSTM) is a type of RNN that uses a gradient-based learning algorithm and was designed to overcome these problems [14], [16].

Many architectures have been proposed, and LSTM has been applied to various types of machine learning problems, such as handwriting, voice, and song recognition, as well as other problems associated with sequential learning [18]. The central idea of LSTM is the use of a memory cell that, through gates, regulates the information flow in the cell over time.

These features make LSTM particularly useful for learning sequences that correlate over long periods, as in the case of energy consumption [14].

III. LONG SHORT-TERM MEMORY AND NEGATIVE SELECTION METHOD FOR REAL-TIME ANOMALY DETECTION

In this paper we consider the active power consumption of a building. The database consists of the power consumption on Mondays over 20 weeks. The measurements were taken with a three-phase meter that provides information such as the power factor and the active and reactive powers. The active power was selected for analysis; the use of data from Mondays is an empirical choice, and the method can be applied to any day of the week. The active power is represented by an m × n matrix

    X = [ x11  x12  ...  x1n
          x21  x22  ...  x2n
          ...
          xm1  xm2  ...  xmn ]

where m is the number of Mondays (and consequently the number of weeks) and n is the total number of measurements on each Monday. Thus, each element xij represents the j-th measurement of the i-th week. The measurements represent the power consumption in 15-minute intervals, and this interval was chosen based on the norms of the Brazilian Electricity Regulatory Agency (ANEEL).

The data were divided into two sets. The first one contains 15 weeks and was used to determine the self-data. The second one contains 5 weeks and was used to test the method. In Fig. 1 we present the test weeks (Weeks 1-5) and the self-data generated using the mean and the median. The self-data can then be represented by means of a line vector

    S = [ x11  x12  ...  x1n ]

Having obtained the self-data, the set of detectors is generated using the negative selection method. This step of the method was implemented in two approaches:
• Complete data: the self-data were defined as the whole vector S, thus generating a single set of detectors.
• Divided data: the vector S was divided into N ranges with an equal number of samples, thus generating N sets of detectors.
In both approaches the encoding used real values, the chosen deviation ε was based on the standard deviation of the intervals, and the distance metric used was r-Hamming.

Fig. 1. Weeks for tests (Week 1-5) and self-data calculated from mean and median. (Stacked panels of power consumption, in kW, on Mondays over the time of day from 00:00 to 23:00: one panel for each of Weeks 1-5, one for the self-data from the mean, and one for the self-data from the median.)

Given the k measured values

    Y = [ y11  y12  ...  y1k ]

an LSTM was used to predict the consumption values [y1,k+1, ..., y1,k+L−1], where L is the length of the self-string. Then, the predicted values were presented to the set of detectors to be classified.

The proposed method is based on two stages:
• Censoring: the detector set is generated by negative selection.
• Monitoring: given the measured consumption values, the next L consumption values are predicted and it is checked whether the predicted values indicate a consumption anomaly, in which case an alarm is generated. Otherwise, the next consumption value is awaited and the monitoring cycle begins again.
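The two stages above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`r_hamming_match`, `censoring`, `monitoring`), the detector bounds, and the fixed deviation are hypothetical, while L = 2 and r = 1 mirror Table I.

```python
import random

L = 2          # length of self-strings and detectors (Table I)
R = 1          # threshold r of the r-Hamming measure (Table I)
EPSILON = 2.9  # deviation; the paper scales it by a standard deviation

def r_hamming_match(detector, string, eps=EPSILON, r=R):
    """Ab and Ag partially match when at least r positions of
    |Ab - Ag| are smaller than the deviation eps."""
    close = sum(1 for x, w in zip(detector, string) if abs(x - w) < eps)
    return close >= r

def censoring(self_strings, n_detectors, low, high, rng=random.Random(0)):
    """Censoring stage: generate random candidate detectors and keep
    only those that do NOT match any self-string."""
    detectors = []
    while len(detectors) < n_detectors:
        candidate = [rng.uniform(low, high) for _ in range(L)]
        if not any(r_hamming_match(candidate, s) for s in self_strings):
            detectors.append(candidate)
    return detectors

def monitoring(predicted_window, detectors):
    """Monitoring stage: a predicted window that matches any competent
    detector is flagged as non-self (anomalous consumption)."""
    return any(r_hamming_match(predicted_window, d) for d in detectors)
```

In the paper's setting, `self_strings` would be length-L windows of the median self-data, and `predicted_window` the L values forecast by the LSTM.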
The flowchart of the adopted method is shown in Fig. 2. The parameters adopted in the experiments are presented in Table I.

Fig. 2. Monitoring consumption flowchart for the proposed method. (Censoring: candidate detectors that match the self-data are rejected, while the others are added to the set of competent detectors. Monitoring: past and present data are used to predict the consumption, and a match between the prediction and any competent detector raises an anomaly alarm.)

TABLE I
PARAMETERS USED IN THE EXPERIMENTS

Negative Selection
Parameters                                Divided                                    Complete
Encoding                                  Real-valued                                Real-valued
Distance                                  r-Hamming                                  r-Hamming
Length of self-string (L)                 2                                          2
Threshold (r)                             1                                          1
Number of divisions in the self-data (N)  4                                          1
Number of detectors for each division     30                                         120
Deviation (ε)                             2.9 × standard deviation of each division  2.9 × standard deviation of the data

LSTM
Parameters                                Divided      Complete
Units in the hidden layers of the LSTM    150          150
Number of epochs                          300          300
Optimization method                       Adam         Adam

To evaluate the results we used

    precision = TP / (TP + FP);    recall = TP / (TP + FN)        (1)

and the Fβ-score (2)

    Fβ = (1 + β²) × (precision × recall) / ((β² × precision) + recall)        (2)

as performance measures, calculated from the numbers of true positives (TP), false negatives (FN), and false positives (FP). In this work, precision is more significant than recall because false positives (FP) are more costly than false negatives (FN); for this reason, β = 0.1 was used.

IV. EXPERIMENTAL RESULTS AND DISCUSSION

The definition and evaluation of anomalies in power consumption may have an intuitive character [19], since power consumption varies with many factors, e.g., temperature, hours of load operation in buildings, equipment, and even the day of the week. In order to calculate precision and recall values for each test week, intervals were defined where there was a greater discrepancy of values.

In the obtained results, the self-data computed from the median gives better results than the self-data computed from the mean; thus, all presented results refer to the self-data obtained from the median. Fig. 3 presents the self-data together with the measured data and the data predicted by the LSTM. The number of activated detectors at each time instant in Week 1 is also presented. Shaded areas are defined as anomalous consumption intervals.

It can be noticed in Fig. 3 that loads were activated earlier (06:45) than the usual activation time (from 08:00); in addition, the consumption is above normal for the same period. In the interval between 14:15 and 15:00 the observed values differ from the self-data and there was detection. In the interval between 16:30 and 18:00, which presents the second highest discrepancy between the self-data and the observed data, there was also detection, but only between 16:45 and 17:00. Finally, in the interval between 20:00 and 23:45, there was the smallest number of detector activations, probably because of the lower level of discrepancy between the observed data and the self-data.

The method was applied to the 5 test weeks, and the precision, recall, and F0.1-score results for both the divided and complete data-based approaches in each test week are reported in Table II.

The divided data-based approach gave better results than the complete data-based approach, which may suggest that the divided data-based approach is more robust. For week two, the complete data-based approach presented results well below those of the divided data-based approach.
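The metrics in (1) and (2) are straightforward to compute. The sketch below is illustrative (the function name and the TP/FP/FN counts are hypothetical); the chosen counts reproduce the precision of 1.00 and recall of 0.27 reported for Week 1 with the divided approach in Table II.

```python
def precision_recall_fbeta(tp, fp, fn, beta=0.1):
    """Precision and recall as in Eq. (1), F_beta as in Eq. (2).
    beta = 0.1 weights precision far more than recall, since false
    positives are considered more costly here than false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = beta ** 2 * precision + recall
    fbeta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta

# Hypothetical counts giving precision 1.00 and recall 0.27:
p, r, f = precision_recall_fbeta(tp=27, fp=0, fn=73)
```

With β = 0.1 the Fβ-score stays close to the precision even when the recall is low, which matches the pattern seen in Table II.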
Fig. 3. Results for the divided data-based approach for Week 1.
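The prediction step behind Fig. 3 (forecasting the next values from the last k measurements) amounts to sliding-window forecasting. A minimal, framework-free sketch of the window preparation follows; the function name and the toy series are illustrative, while the actual predictor in the paper is an LSTM with 150 hidden units trained with Adam (Table I).

```python
def make_windows(series, k, horizon):
    """Split a consumption series into (past, future) pairs:
    k past samples form the model input, and the next `horizon`
    samples (15-minute steps) form the prediction target."""
    pairs = []
    for i in range(len(series) - k - horizon + 1):
        pairs.append((series[i:i + k], series[i + k:i + k + horizon]))
    return pairs

# Toy series of 10 samples: 3 past values predicting the next 2.
windows = make_windows(list(range(10)), k=3, horizon=2)
```

Each pair can then be fed to any sequence model; the monitoring stage classifies the predicted `horizon` values with the detector set instead of comparing them against the measurements directly.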

TABLE II
PRECISION, RECALL AND F0.1-SCORE FOR THE PROPOSED METHOD FOR DIVIDED AND COMPLETE DATA-BASED APPROACHES

        Precision             Recall                F0.1-score
Week    Divided  Complete     Divided  Complete     Divided  Complete
1       1.00     1.00         0.27     0.13         0.97     0.93
2       0.50     0.00         0.07     0.00         0.47     0.00
3       0.90     1.00         0.19     0.02         0.87     0.68
4       0.70     0.50         0.26     0.02         0.69     0.41
5       0.75     0.75         0.27     0.14         0.74     0.81

This can be explained by week 2 being the test week most similar to the self-data. As the complete data-based approach does not take the temporal context into account, relatively close values were not detected by the generated detectors.

The presented method can be adapted to serve as a tool for determining when alternative energy sources can supply the demand of the loads in place of the power supplied by the utility. As the consumption for the next 30 minutes is estimated, it is also possible to check whether an energy storage system has enough charge to supply power to the system or part of it.

V. CONCLUSIONS

In this work, a method using LSTM and Negative Selection for real-time anomaly detection in power consumption was presented. Two approaches were evaluated, and in the obtained results the divided data-based approach gave better results than the complete data-based approach, which suggests that the divided data-based approach is more robust.

Unlike other methods found in the literature, the proposed method does not directly require the calculation of the error between the value predicted by the LSTM and the current measured value, since the negative selection technique is used. In this way, the method makes it possible to determine whether the next measured values will be anomalous or not.

In future work, improvements in the LSTM parameters will be investigated in order to improve the consumption predictions, and the developed method will be coupled to an alternative energy generation system and an energy storage system, determining when the power consumption is much higher or much lower than expected in order to decide how the loads will be fed.

ACKNOWLEDGMENT

The authors would like to thank the Post Graduate Program in Electrical Engineering (COPELE-UFCG) and the Coordination for the Improvement of Higher Education Personnel (CAPES). This work was carried out in the context of the SCIKE-Bahia Project.
REFERENCES
[1] T. Dunning, and E. Friedman, “Practical Machine Learning: A New
Look at Anomaly Detection,” O’Reilly Media, 2014.
[2] D. Hong, D. Zhao, and Y. Zhang, “The Entropy and PCA Based
Anomaly Prediction in Data Streams,” Procedia Computer Science, 96,
pp. 139–146, 2016.
[3] C. C. Aggarwal, “Outlier Analysis”, 2.ed. Springer, 2017.
[4] Hua Yang, Tao Li, Xinlei Hu, Feng Wang, and Yang Zou, “A Survey
of Artificial Immune System Based Intrusion Detection,” The Scien-
tific World Journal, vol. 2014, Article ID 156790, 11 pages, 2014.
https://doi.org/10.1155/2014/156790.
[5] Fernando P.A. Lima, Mara L.M. Lopes, Anna Diva P. Lotufo, and Carlos
R. Minussi.“An artificial immune system with continuous-learning for
voltage disturbance diagnosis in electrical distribution systems,” Expert
Systems with Applications, 56, pp.131–142, 2016.
[6] J. Timmis, A. Hone, T. Stibor, and E. Clark. “Theoretical advances in
artificial immune systems,” Theoretical Computer Science, 403, pp.131–
142, 2016.
[7] S. Ahmad, A. Lavin, S. Purdy, and Z. Agha, “Unsupervised real-time
anomaly detection for streaming data,”Neurocomputing, 262, pp.134–
147, 2017.
[8] A. Forestiero, “Self-organizing anomaly detection in data streams,”
Information Sciences, 373, pp.321–336, 2016.
[9] M. Toledano, I. Cohen, Y. Ben-Simhon, and I. Tadeski, “Real-time
anomaly detection system for time series at scale,” Proceedings of
Machine Learning Research, 71, pp.56–65, 2017.
[10] D. Hawkins, “Identification of Outliers,” Chapman and Hall, London,
1980.
[11] M. Gupta, J. Gao, C. Aggarwal and J. Han., “Outlier Detection for
Temporal Data,” Morgan & Claypool, 2014.
[12] D. Dasgupta, “Artificial immune systems and their applications”. New
York, Springer-Verlag, 1999.
[13] M. Ayara, J. Timmis, R. de Lemos, L. de Castro, and R. Duncan,
“Negative selection: How to generate detectors”, Proceedings of the
1st International Conference on Artificial Immune Systems (ICARIS),
pp.182–192, September 2002.
[14] P. Malhotra, L. Vig, G. Shroff, and P. Agarwal, “Long Short Term
Memory Networks for Anomaly Detection in Time Series,” European
Symposium on Artificial Neural Networks, Computational Intelligence
and Machine Learning, Belgium, pp.89–94, April 2015.
[15] A. Singh, “Anomaly detection for temporal data using Long
Short-term Memory,” Master Thesis, KTH Royal Institute
of Technology, 2017. [Online]. Available: http://www.diva-
portal.org/smash/get/diva2:1149130/FULLTEXT01.pdf
[16] S. Hochreiter, and J. Schmidhuber, “Long short-term memory,” Neural
computation, 9(8), pp.1735-1780, 1997.
[17] S. Hochreiter and J. Schmidhuber, “Long short-term memory,”
Dept. Fakultät für Informatik, Tech. Univ. Munich, Munich, Ger-
many, Tech. Rep. FKI-207-95, Aug. 1995. [Online]. Available:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.51.3117
[18] K. Greff, R. K. Srivastava, J. Koutnik, B. R. Steunebrink, and J.
Schmidhuber, “LSTM: A Search Space Odyssey,” IEEE Transactions on
Neural Networks and Learning Systems, 28(10), pp.2222–2232, 2017.
[19] E. Keogh, J. Lin, and A. Fu, “HOT SAX: efficiently finding the most
unusual time series subsequence,” Fifth IEEE International Conference
on Data Mining (ICDM’05), Houston, TX, 2005, pp.8–.
