Abstract—Models for forecasting residential energy consumption in the short term (a horizon of hours or days) allow their users to plan and make decisions in order to reduce their consumption. However, much of the academic work in this area uses offline environments for experimentation, disregarding the challenges of automating, monitoring and updating models in online environments. This article presents the challenges and solutions found in a case study, detailing the implementation, in a real online scenario, of consumption forecasting models for 4 Brazilian residences during 2020. It is concluded that the use of best practices and metrics for development in online environments not only increases the accuracy of forecasts, but also facilitates the development of the models and helps the speed of experimentation and the reproducibility of results.

The forecast of residential energy consumption can be used to assist residents in decision-making and cost-conscious planning [9], and energy utilities in medium- and large-scale forecasting and in detecting customer consumption anomalies [10]. It can also facilitate energy transactions between prosumers in peer-to-peer (P2P) energy markets [11], [12], promoting the efficient use of the electrical grid. Home Energy Management Systems (HEMS) can use consumption prediction as input for predictive control models [13], helping to plan the use of controllable appliances such as washing machines, air conditioning systems and electric vehicles, in order to optimize energy use and save money for the user under a variable tariff scheme.
Authorized licensed use limited to: LIVERPOOL JOHN MOORES UNIVERSITY. Downloaded on August 13,2022 at 06:16:40 UTC from IEEE Xplore. Restrictions apply.
Machine Translated by Google
culture uses test automation, monitoring and integration, infrastructure management as code, among other techniques, thus allowing continuous delivery and deployment of the system [15].

The application of DevOps culture to ML systems, known as MLOps [16], seeks to adapt DevOps techniques to the area, which differs from the practices used in traditional software systems in three respects: its dependence on data quality, obtained through correct extraction and processing; its exploratory character during development, with the testing of different configurations, model architectures and feature generation; and its monitoring not only of errors caused by faulty programming of the system, but also of errors caused by obsolete or biased models and training data.

Thus, testing systems before introducing them into production environments and monitoring their performance is considered a good practice in the development and operation of software systems. However, due to their predictive nature, these practices are difficult to define and implement in ML systems [17].

Google Research uses 28 metrics to measure the readiness of ML systems for production [18]. These metrics involve tests in 4 categories, related to the input data, the model used, the infrastructure, and the monitoring of the system, listed in TABLE I.

TABLE I. METRICS FOR THE READINESS OF ML SYSTEMS IN PRODUCTION. SOURCE: [18]

Data
1 Feature expectations are captured in data schemas
2 All the features are beneficial to the model's accuracy
3 Features are not too expensive in memory usage
4 Features adhere to business requirements
5 Pipeline has appropriate privacy control
6 New features can be added quickly
7 Code for creation of features is tested

Models
1 Changes in specifications are reviewed and versioned
2 Offline metrics correlate with real online impact
3 All hyperparameters are tuned
4 The aging effects of the model are known
5 A simpler model is not better than the current one
6 The quality of the model is sufficient in all parts of the data

Infrastructure
6 New models are gradually introduced to users
7 New models can be rolled back quickly and safely

Monitoring
1 Changes in the dependencies of the system are monitored
2 Data invariants hold in both the offline and the online environment
3 Features are calculated equally during training and prediction
4 Models are not too old
5 The model is numerically stable
6 The model shows no significant changes in training speed or serving latency
7 Prediction quality has not degraded in the online environment

Besides these metrics, another good practice in ML projects is the separation of their stages into pipelines [19], in order to facilitate the integration of the different stages, the scalability of the system and the reproducibility of the results.

One of the differences of online systems is the need for continuous training of their models to avoid the occurrence of concept drift. In [20], a strategy was defined for simulating and evaluating the effects of periodic training on time series, finding the seasonality of the input data and updating the model at each seasonal cycle, using training and validation data that reflect the most recent cycle.

III. FORECAST OF RESIDENTIAL ENERGY CONSUMPTION

With the deployment of smart meters in the residential sector, individual consumption data of greater granularity allow new applications and discoveries that help educate users and save them money.

Among these applications are the forecasting of residential consumption, which helps consumers plan expenses and make decisions [9]; medium- and large-scale forecasting by energy distributors [10]; assistance in energy transactions between prosumers (consumers who also generate electricity on a small scale) in peer-to-peer markets (energy transactions carried out directly between consumers) [12], [21]; and predictive control models of Home Energy Management Systems [13].

Unlike energy consumption on a medium and large scale, hourly individual consumption presents greater volatility, with daily consumption peaks that may occur at different times.
end up measuring only point-by-point accuracy, without analyzing temporal or shape errors.

Fig. 1 shows an example in which a constant forecast (F1), which adds no significant value for its user, has a smaller point-by-point error than a forecast whose behavior is closer to the real one but shifted in time (F3). While forecast F1 has an MAE of 0.82, forecast F3 has an MAE of 0.99 [18], [19], [20].

with an intermediary for temporary storage of the data [3], [25].

The hourly consumption and internal temperature data of the residence are sent to a remote database and are used by the proposed energy consumption forecasting solution for three ML models. Currently, the database holds information for the period from January 2020 to February 2021.
which will be the most adequate and relevant metrics for the problem.

In order to measure the readiness of the solution for production environments, we use the metrics provided by Google Research; the implementation that satisfies each metric is explained in the following sections.

A. Loading the Data

The input data are received from the cloud storage service, which stores the total energy consumption and the consumption by sector, as well as the internal temperature of the residence, with a sampling frequency of one hour.

Every hour, the cloud is searched for files; any files found that are not present locally are downloaded to the local directories.

B. Pre-processing

During pre-processing, the presence of missing hours and gross anomalies is verified. A value is considered anomalous when the consumption is less than 0 or when the temperature varies by more than 10 ºC in relation to the previous value, satisfying test Data 1: feature expectations are captured in data schemas [18].

There were occurrences of temperature anomalies in which the variation in relation to the previous hour exceeded 20 ºC, as well as moments with missing temperature and consumption readings. These cases were treated as reading errors and the values were discarded.

For the definition of features, we added three past hourly consumption values, referring to 25, 24 and 23 hours before the moment to be forecast, in addition to calendar attributes such as the time of day, day of the month and month of the year to be forecast. The choice of these features for the final prediction model was made in the exploratory data analysis stage.

However, new features can be added by altering the input files, such as by adding the internal temperature of the residence, or generated by modifying the source code at the pre-processing stage, such as by adding the 1st derivative.

To analyze the gain from introducing new features (test Data 2: all the features are beneficial, and test Data 3: features are not too expensive in memory use), a baseline reference model was used, which uses only the consumption of 24 hours before and calendar data: hour, day of the week and of the month, month and year.

This reference model was compared with other models that have additional features beyond those used in the reference model (test Model 5: a simpler model is not better than the current one), as shown in TABLE II.

Cross-validation was used for each residence, obtaining the mean over all the residences of the mean squared error (MSE) and of the adjusted error with a 2-hour window and norm 4 [24]. TABLE II also shows the percentage reductions of the MSE and of the adjusted error in relation to the reference model.

TABLE II. ANALYSIS OF THE ADDITION OF FEATURES ON THE ACCURACY OF THE MODEL

Model                                            MSE                Adjusted Error
Reference Model                                  0.0479 (0%)        0.2535 (0%)
Reference + 1st Derivative                       0.0476 (-0.62%)    0.2529 (-0.24%)
Reference + Internal temperature                 0.0502 (+4.80%)    0.2568 (+1.30%)
Reference + Consumption of 25 and 23 hours ago   0.0415 (-13.36%)   0.2043 (-19.40%)

The addition of the internal temperature of the residence as a feature of the model ended up reducing its accuracy, while using the 1st derivative of energy consumption obtained only slight gains. Due to these results, these features were not considered in the final model.

We observed a low weekly correlation for all the residences, with no great variation between weekdays and weekends, as shown in Fig. 4 for one of the residences. It should be noted that the consumption data refer to the year 2020, and this low variation may be related to the quarantine period during the Covid-19 pandemic.
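The pre-processing checks and the lag and calendar features of Section IV.B can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the dictionary layout and the sample readings are hypothetical, and only the rules quoted in the text are encoded (negative consumption, a temperature step above 10 ºC, missing hours, and consumption lags of 23, 24 and 25 hours).

```python
from datetime import datetime, timedelta

MAX_TEMP_STEP_C = 10.0  # threshold quoted in the text


def find_missing_hours(timestamps):
    """Return the hourly timestamps absent between the first and last reading."""
    present = set(timestamps)
    current, end = min(timestamps), max(timestamps)
    missing = []
    while current <= end:
        if current not in present:
            missing.append(current)
        current += timedelta(hours=1)
    return missing


def is_anomalous(consumption, temperature, previous_temperature):
    """Flag a reading with negative consumption or an abrupt temperature step."""
    if consumption < 0:
        return True
    if previous_temperature is None:
        return False
    return abs(temperature - previous_temperature) > MAX_TEMP_STEP_C


def build_features(consumption_by_hour, target):
    """Build lag and calendar features for one target hour (None if a lag is missing)."""
    features = {}
    for hours in (23, 24, 25):
        value = consumption_by_hour.get(target - timedelta(hours=hours))
        if value is None:
            return None
        features[f"lag_{hours}h"] = value
    features["hour"] = target.hour
    features["day_of_week"] = target.weekday()  # Monday is 0
    features["day_of_month"] = target.day
    features["day_of_year"] = target.timetuple().tm_yday
    features["month"] = target.month
    return features


# Hypothetical readings: 30 hours of consumption (kWh) with one hour missing.
start = datetime(2020, 7, 1, 0)
readings = {start + timedelta(hours=h): 0.1 * (h % 24) for h in range(30)}
del readings[start + timedelta(hours=10)]

missing = find_missing_hours(list(readings))
features = build_features(readings, start + timedelta(hours=30))
print(missing, features)
```

Returning `None` when a lag is unavailable keeps the decision about discarding or imputing the sample outside the feature code, which is one possible design among several.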
Fig. 5 shows the autocorrelation function of energy consumption at hourly frequency for one of the monitored residences. In the graph, a higher value on the ordinate indicates a high correlation between the time series and the same series shifted by k units, with k represented on the abscissa. Autocorrelation peaks can be observed at delays of every 24 hours, evidencing daily seasonality.

Fig. 5 Autocorrelation function of hourly energy consumption (95% significance band)

Fig. 6 shows the 1st decision tree of the model, in which it is possible to observe the relevance of the time of day in the model's prediction, while Fig. 7 shows the importance of the features for one of the residences, indicating the relative contribution of each feature in the XGBoost tree-building process. The possibility of obtaining information related to the internal structure of the model is important because it allows us to debug its operation and investigate performance or instability problems (test Infrastructure 5: it is possible to debug the model).

The models are also evaluated with extreme or even invalid inputs, assessing their robustness (test Monitoring 5: the model is numerically stable). The inputs tested are: consumption equal to zero, negative, infinite and with missing values.

D. Model Training and Prediction

The model uses the XGBoost library to predict the hourly consumption of the next 24 hours, being trained with the following features: the consumption of 23, 24 and 25 hours before, the hour of the day, day of the week, day of the month, day of the year and month.

To tune the hyperparameters of XGBoost, a grid search was performed with cross-validation partitioned into 4 subsets for each residence (test Model 3: all hyperparameters are tuned), varying the tree size, the learning rate and the objective function to be minimized. After training the models for each combination of hyperparameters, the one with the smallest mean squared error is chosen.

Fig. 7 Importance of the features for the forecast model of one of the residences of the project

The random seed used by XGBoost is fixed automatically, guaranteeing the reproducibility of the results (test Infrastructure 1: the training of the model is reproducible).

Both the data and the code are versioned by means of the Git and DVC version control systems (test Model 1: changes are reviewed and versioned).

Fig. 8 shows an example of the energy consumption forecast for one of the residences made during the month of July, in which the daily seasonality of energy consumption can be observed.

E. Evaluation

The evaluation of accuracy in a static environment was carried out by means of the method proposed in [20]. In this method, multiple models are trained, each based on training data from a different instant, in order to reflect the arrival of new data in an online environment (test Model 4: the aging effects of the model are known, and test Model 6: the quality of the model is sufficient in all parts of the data).

A daily training schedule is considered, with the data split into 80% for training and 20% for testing. The hyperparameters are defined by means of grid search, as stated in Section IV.D.

The adjusted error is used to compare the updated model with the previous one, and the one with the smaller error is used (test Infrastructure 4: the quality of the model is validated before serving it, and test Monitoring 4: models are not too old).
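The comparison between the updated model and the previous one can be sketched as a simple promotion gate. The sketch below rests on several assumptions: the MSE stands in for the adjusted error of [24], the two "models" are toy callables rather than XGBoost regressors, and the names `chronological_split` and `select_model` are hypothetical. It only illustrates the 80%/20% chronological split and the rule of keeping whichever model has the smaller held-out error.

```python
def mean_squared_error(actual, predicted):
    """Mean squared error between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def chronological_split(samples, train_fraction=0.8):
    """Split time-ordered samples without shuffling: oldest part trains, newest part tests."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


def select_model(previous_model, candidate_model, inputs, targets,
                 error_fn=mean_squared_error):
    """Return (name, error) of the model with the lower held-out error.

    A tie keeps the previous model, so a candidate must be strictly better.
    """
    previous_error = error_fn(targets, [previous_model(x) for x in inputs])
    candidate_error = error_fn(targets, [candidate_model(x) for x in inputs])
    if candidate_error < previous_error:
        return "candidate", candidate_error
    return "previous", previous_error


# Hypothetical data: ten identical days with an evening peak (kWh per hour).
daily_profile = [0.2] * 18 + [1.0] * 6
series = daily_profile * 10

# Each sample pairs the consumption of 24 hours before with the value to forecast.
samples = [(series[i - 24], series[i]) for i in range(24, len(series))]
train, test = chronological_split(samples)
train_mean = sum(target for _, target in train) / len(train)


def previous_model(lag_24h):
    """Stale stand-in model: always predicts the training mean."""
    return train_mean


def candidate_model(lag_24h):
    """Updated stand-in model: repeats the value of 24 hours before."""
    return lag_24h


test_inputs = [x for x, _ in test]
test_targets = [y for _, y in test]
chosen, error = select_model(previous_model, candidate_model, test_inputs, test_targets)
print(chosen, error)
```

In a daily schedule this gate would run after each retraining, so a worse candidate never replaces the model currently being served.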
TABLE IV. TESTS RELATED TO EACH SET

                 Test No.
Set              1     2     3     4     5     6     7
Data             A     M     M     N/A   N/A   A     M
Model            A     N/A   A     A     M     M     -
Infrastructure   A     -     -     A     M     N/A   -
Monitoring       -     A     N/A   A     M     A     -

Because the number of users is currently low, the project does not yet pay careful attention to questions of social inclusion of the system (test Model 7). When new users are invited to participate, questions of representativeness of the Brazilian population will be important so as not to bias the system.

So far there have been no changes in the structure of the input data, so the monitoring of changes (test Monitoring 1) is not currently carried out. In future stages, in case new features obtained from external sources, such as the weather forecast, are inserted, this test will become more important.

Given that the model forecasts consumption for the next 24 hours, real-time monitoring of the quality of the forecasts made (test Monitoring 7) was not done, since their accuracy can only be measured 24 hours after the forecast.

The exploratory data analysis proved to be of extreme importance, satisfying various tests (Data 2, Model 5, Infrastructure 5, Monitoring 5). Although these tests were previously performed manually, they can be reused in the future by adding them to the pipeline, where they are executed automatically.

One of the main objections at the start of the project concerned the large number of functional changes typical of that stage, on the assumption that the tests implemented then would quickly become obsolete. However, this belief proved unfounded, since the simple definition of the tests not only verified the correct execution of the code, but also guided the development process, following the test-driven development (TDD) technique [28].

The versioning of data was shown to be important not only for the online environment, but also for experimentation during the exploratory data analysis, guaranteeing the reproducibility of experiments carried out in previous versions.

During development it was necessary to balance the delivery of results with the execution of tests, so defining priorities was extremely important for the progress of the project. Priorities were selected according to the probability of the related problems occurring, as well as the impact of these problems on the system.

Another key point for the execution of the tests was the modularization of the stages of the pipeline. By accurately defining their functionalities, expected inputs and outputs, it becomes easier to alter the source code and experiment with new configurations, visualizing more clearly the impact of these changes on the project.

VI. CONCLUSION

This article presented the implementation details, the difficulties encountered in evaluating metrics proposed in the literature, and the positive and negative points of their use in the development of online machine learning systems, through a case study using real energy consumption data from 4 residences during 2020, in order to approximate a real scenario found in industry.

The results also show the difficulty in defining metrics that capture the relevant characteristics of the energy consumption prediction problem, especially the need to consider both spatial and temporal errors. Other difficulties encountered during the development of the system were also considered, and their impact was quantified. From experiments in dynamic scenarios, we were able to analyze the impact of retraining the model, which reduced the adjusted error by 11.03% in relation to a static model.

As future work, applying the system to a larger number of users can help validate its scalability, also taking into account the representation of the Brazilian population in the selection of new testers, in order to analyze the performance of the models for different consumption profiles. Beyond the residential scope on which this work focused, the proposed MLOps architecture could also contribute to the prediction of consumption in industrial or commercial establishments, a possible future scenario in which to analyze its viability, gains for the client and consumption peculiarities, and to make a comparison with the residential sector.

ACKNOWLEDGMENTS

The authors thank the “Amigos da Poli” endowment fund for financial support.

REFERENCES

[1] K. Carrie Armel, A. Gupta, G. Shrimali, and A. Albert, “Is disaggregation the holy grail of energy efficiency? The case of electricity,” Energy Policy, vol. 52, pp. 213–234, 2013.
[2] V. Hayashi, R. Arakaki, T. Fujii, K. Khalil, and F. Hayashi, “B2B B2C Architecture for Smart Meters using IoT and Machine Learning: a Brazilian Case Study,” International Conference on Smart Grids and Energy Systems, to be published, 2020.
[3] V. Hayashi, T. Fujii, R. Arakaki, H. Amaral, and A. Souza, “Boa Energia: Public Database of Residential Consumption with Data Quality,” 2020.
[4] S. Humeau, TK Wijaya, M. Vasirani, and K. Aberer, “Electricity load forecasting for residential customers: Exploiting aggregation and correlation between households,” 2013 Sustainable Internet and ICT for Sustainability, SustainIT 2013, 2013.
[5] M. Martins PB, RGD Pinto, and SP Bittencourt, “Load Disaggregation of Industrial Machinery Power Consumption Monitoring Using Factorial Hidden Markov Models,” The International Workshop on Non-Intrusive Load Monitoring (NILM), p. 6, 2018.
[6] W. Kong, ZY Dong, Y. Jia, DJ Hill, Y. Xu, and Y. Zhang, “Short-Term Residential Load Forecasting Based on LSTM Recurrent Neural Network,” IEEE Transactions on Smart Grid, vol. 10, no. 1, pp. 841–851, 2019.
[7] P. Lusis, KR Khalilpour, L. Andrew, and A. Liebman, “Short term residential load forecasting: Impact of calendar effects and forecast granularity,” Applied Energy, vol. 205, no. March, pp. 654–669, 2017.
[8] S. ben Taieb and RJ Hyndman, “A gradient boosting approach to the Kaggle load forecasting competition,” International Journal of Forecasting, vol. 30, no. 2, pp. 382–394, 2014.
[9] T. Serrenho and P. Bertoldi, “Smart home and appliances: State of the art,” Luxembourg, 2019.
[10] HLMD Amaral, JAG Maginador, RMJ Ayres, AN de Souza, and DS Gastaldello, “Integration of consumption forecasting in smart meters and smart home management
systems,” SBSE 2018 - 7th Brazilian Electrical Systems Symposium, pp. 1–6, 2018.
[11] Y. Wang, Q. Chen, T. Hong, and C. Kang, “Review of Smart Meter Data Analytics: Applications, Methodologies, and Challenges,” IEEE Transactions on Smart Grid, vol. 10, no. 3, pp. 3125–3148, 2019.
[12] W. Tushar, TK Saha, C. Yuen, P. Liddell, R. Bean, and HV Poor, “Peer-to-Peer Energy Trading With Sustainable User Participation: A Game Theoretic Approach,” IEEE Access, vol. 6, no. October, pp. 62932–62943, 2018.
[13] A. Pratt, D. Krishnamurthy, M. Ruth, H. Wu, M. Lunacek, and P. Vaynshenk, “Transactive Home Energy Management Systems: The Impact of Their Proliferation on the Electric Grid,” IEEE Electrification Magazine, vol. 4, no. 4, pp. 8–14, Dec. 2016.
[14] A. Gerossier, R. Girard, A. Bocquet, and G. Kariniotakis, “Robust day-ahead forecasting of household electricity demand and operational challenges,” Energies, vol. 11, no. 12, 2018.
[15] M. Senapathi, J. Buchan, and H. Osman, “DevOps Capabilities, Practices, and Challenges,” in Proceedings of the 22nd International Conference on Evaluation and Assessment in Software Engineering 2018, Jun. 2018, pp. 57–67.
[16] D. Sculley et al., “Hidden technical debt in machine learning systems,” Advances in Neural Information Processing Systems, vol. 2015-Janua, pp. 2503–2511, 2015.
[17] Google Cloud, “MLOps: Continuous delivery and automation pipelines in machine learning,” 2020. https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning.
[18] E. Breck, S. Cai, E. Nielsen, M. Salib, and D. Sculley, “The ML test score: A rubric for ML production readiness and technical debt reduction,” in 2017 IEEE International Conference on Big Data (Big Data), Dec. 2017, vol. 47, no. 3, pp. 1123–1132.
[19] P. Sugimura and F. Hartl, “Building a reproducible machine learning pipeline,” arXiv, 2018.
[20] JA Guajardo, R. Weber, and J. Miranda, “A model updating strategy for predicting time series with seasonal patterns,” Applied Soft Computing Journal, vol. 10, no. 1, pp. 276–283, 2010.
[21] Y. Wang, Q. Chen, T. Hong, and C. Kang, “Review of Smart Meter Data Analytics: Applications, Methodologies, and Challenges,” IEEE Transactions on Smart Grid, vol. 10, no. 3, pp. 3125–3148, 2019.
[22] A. Stefan, V. Athitsos, and G. Das, “The move-split-merge metric for time series,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 6, pp. 1425–1438, 2013.
[23] V. le Guen and N. Thome, “Shape and Time Distortion Loss for Training Deep Time Series Forecasting Models,” no. NeurIPS, pp. 1–13, 2019.
[24] S. Haben, J. Ward, D. Vukadinovic Greetham, C. Singleton, and P. Grindrod, “A new error measure for forecasts of household level, high resolution electrical energy consumption,” International Journal of Forecasting, vol. 30, no. 2, pp. 246–256, 2014.
[25] R. Arakaki, VT Hayashi, and WV Ruggiero, “Available and Fault Tolerant IoT System: Applying Quality Engineering Method,” 2nd International Conference on Electrical, Communication and Computer Engineering, ICECCE 2020, no. June, 2020.
[26] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 13-17-August, pp. 785–794, 2016.
[27] Plotly Technologies Inc., “Dash.” Plotly Technologies Inc., Montreal, QC.
[28] K. Beck, Test Driven Development: By Example, 1st ed. Addison Wesley Professional, 2002.