

Integrating Small Scale Green Energy into Smart Grids: Prediction for Peak Load Reduction

Conference Paper · August 2018


DOI: 10.1109/COMAPP.2018.8460222



Sonam Rinchen∗, Abdulsalam Yassine∗, Kevin Schwartzentruber∗, Hamsa Ahmed∗, Andy Armitage†
∗Department of Software Engineering, Lakehead University, 955 Oliver Road, Thunder Bay, Canada
†Faculty of Business, Lakehead University, 955 Oliver Road, Thunder Bay, Canada
srinchen@lakeheadu.ca, ayassine@lakeheadu.ca, kschwart@lakeheadu.ca, hahmed7@lakeheadu.ca, ajarmita@lakeheadu.ca

Abstract: Emerging smart grid technologies allow for the integration of clean energy from small-scale energy generators (SEGs). In this paper, we investigate a model by which an electric grid operator (EGO) schedules the integration of clean energy from residential homes acting as SEGs. These SEGs are equipped with rooftop photovoltaic (PV) panels and a bank of utility-grade battery systems. The challenge facing the EGO is that home-based battery systems require several hours of sunlight to charge from rooftop PV panels, but take an average of only 90 minutes to discharge to 30% of their original capacity. The EGO must therefore schedule the discharging cycle so that it coincides with the highest peak load of the day for efficient cost reduction. We propose a model that allows the EGO to predict the highest peak of energy consumption on the distribution feed where the SEGs are connected. To realize the system, we acquired a dataset of energy consumption time series for approximately 1500 houses, including 3 SEGs. We performed the prediction using the multivariate autoregressive integrated moving average (MARIMA) method and achieved 92.64% accuracy. This paper describes the real-life implementation of both the system and the prediction model.

I. INTRODUCTION

Recently, small-scale energy generators (SEGs) such as residential houses, small businesses, and communities have become integral players in stabilizing the electric grid [1]. SEGs use systems such as rooftop solar photovoltaic (PV) power systems and wind turbine stations for energy generation. The generated energy is either fed directly into the power grid under an incentive policy mechanism, such as the Feed-in Tariff (FiT) policy in Ontario, Canada, or stored locally in batteries for future use [2]. The former approach concerns government-regulated markets where licensed utility companies sell electricity. In the latter approach, energy producers can freely trade energy in semi-regulated or open electricity markets. This paper focuses on the latter approach. Specifically, we study the integration of clean energy into the smart grid from home-based battery systems. In this situation, the challenge facing the electric grid operator (EGO) is that home-based battery systems require several hours of sunlight to charge from rooftop solar panels, and an average of 90 minutes to discharge to 30% of their original capacity [3], [4]. The EGO must be able to schedule the discharging cycle so that it coincides with the time of the highest peak load during the day for efficient cost reduction.

The above problem is illustrated in Fig. 1. The peak load in the shaded square includes the highest load (582214 kW) in a 24-hour period. The data in the figure are actual aggregated daily consumption data from approximately 1500 houses connected to a power distribution system in Thunder Bay, Ontario, Canada. In this example, the EGO must schedule the discharging process within the shaded square area. Otherwise, it will not benefit from reducing the cost of the maximum peak load. However, accurately determining when the highest peak will occur is challenging in practice. In this paper, we investigate and develop a novel prediction algorithm that determines the peak load using machine learning algorithms and weather data. Our mechanism allows the EGO to determine the time of the highest peak load with high accuracy.

Fig. 1. One day of energy consumption of a power feed for 1500 houses, at 5-minute intervals

Prediction mechanisms for peak load reduction have been explored extensively in the past. The most common approaches close to our work employ machine learning models for short-term load forecasting (STLF). For example, the works in [5] and [6] use artificial neural networks (ANN) for STLF, while [7] and [8] use fuzzy logic and random forest techniques, respectively, to relax the complexities of non-linear forecasting models. Other studies such as [9] and [10] implement support vector regression (SVR) instead of ANN for adaptive STLF. For daily forecasts of peak load demand, the works in [11] and [12] propose hybrid mechanisms that combine statistical approaches and AI models. The work in [15] proposes an adaptive univariate and multivariate approach that selects the best forecasting model among several methods as a means of coping with forecasting errors. The work in [16] uses Bayesian networks to predict time series of energy consumption for short- and long-term forecasting. Other work such as [14] analyzes energy consumption data to predict contributors to peak hours, while [13] focuses on identifying activities for healthcare applications. These approaches are valid, but they fall short when real-world guarantees must be satisfied for meaningful end-user applications. In this paper, we study the issue from a practical point of view. We propose a system that applies to real-world scenarios where the size and resolution of the data generated by smart meters vary according to the utility's cyber infrastructure. Our prediction model is based on the multivariate autoregressive integrated moving average (MARIMA) method [18]. We present preliminary results of deploying the prediction mechanism, including its accuracy in determining the highest peak load of the underlying distribution power feed.

The rest of the paper is organized as follows. Section II presents the components of the proposed platform, followed by a case study in Section III. Finally, Section IV concludes the paper and provides directions for future work.

II. SYSTEM OVERVIEW

In our system, several houses equipped with solar panels and utility-grade battery systems are connected to electric distribution power feeds. The EGO monitors the home batteries via a direct control mechanism and discharges them during peak times. The main requirements for the EGO to discharge the batteries into the power feed are as follows:
• The EGO must collect daily data about the status of the power feed and predict the peak times. Determining the time of the highest peak is of utmost importance for reducing the cost of paying for extra energy.
• Energy load is a stream of large, high-resolution time series data. Processing such a vast amount of continuous data requires a cost-effective and resource-efficient mechanism that meets the requirements of timely integration of clean energy into the electric power grid.
• The EGO must also have a mechanism to acquire past weather data and future weather forecasts to predict the energy load on the feeder. Weather plays a significant role in consumers' energy consumption behavior.
• Home batteries depend on solar panels to charge. They take longer to charge than to discharge because sunlight is intermittent, especially during winter. Therefore, the EGO in practice has only one time window during the day to discharge the batteries into the grid.

To support these requirements, this paper presents a platform capable of processing and analyzing a large volume of energy consumption data streaming from the utility's power feed. The following subsections provide more details about the design approach and the prediction mechanism used to determine the highest peak load during the day.

A. System Components

Fig. 2 shows the components of the system. The system consists of residential homes connected to the distribution network. Through monitoring and data collection services (e.g., supervisory control and data acquisition (SCADA)), the EGO collects energy consumption data about the feeds that supply the homes with electricity. Every 5 minutes, a data stream delivers the energy consumption of all consumers connected to the feed. The online analytical processing (OLAP) database stores the energy consumption of each feed for 30 days only. Note that the 30-day history is specific to our system. Other utilities may store data for any period. Furthermore, the frequency of data acquisition may vary (e.g., 5 min, 15 min) depending on the configuration of the system. In either case, the OLAP database is updated daily. The prediction and control services are responsible for forecasting the time of the highest peak during the day. They also control the home battery systems and issue discharge signals according to the prediction results. In our system, the utility company and the homeowners have a contract that allows the EGO to keep 30% of the batteries' original capacity.

Fig. 2. Residential houses acting as SEGs are equipped with utility-grade battery banks connected to the electric distribution system

B. Prediction Mechanism

In the system described above, energy consumption data form a time series collected every 5 minutes. The series is updated every day and stored for 30 days only. The goal is to predict the coming hours using historical energy consumption readings and weather data, since weather is correlated with how consumers use their electricity. In this paper, we propose the use of the multivariate autoregressive integrated moving average (MARIMA) model to perform the prediction [18]. The rationale for using MARIMA is as follows. First, time series models such as ARIMA are applied to variables measured over time. In our model, multiple variables affect the prediction, including time, energy consumption, temperature, dew point, humidity, wind speed, visibility, pressure, and wind chill. These variables have varying effects on energy consumption. Second, the model requires the data to consist of persistent readings. Otherwise, the error is magnified over longer periods. In our system, the data are captured at fixed intervals, and the goal is to forecast over short time steps. Furthermore, MARIMA models are useful for data that exhibit non-stationarity due to the presence of trend and seasonality in the observations [19].

The MARIMA model is an extension of the simpler autoregressive (AR) and moving-average (MA) models. Such models can, in general, be expressed as ARMA(p, q) [18], where p and q represent the orders of the autoregressive and moving average components, respectively. In AR models, the value of a variable in one period is related to its values in previous periods. The AR model is defined as follows:

$$ y_t = \sum_{i=1}^{p} \gamma_i \, y_{t-i} + \epsilon_t \qquad (1) $$

where $\gamma_i$ is the coefficient of the lagged variable at time $t - i$ and $\epsilon_t$ is the error term at time $t$.
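The following Python sketch makes the recursion in equation (1) concrete. It forecasts several hours ahead by feeding each prediction back in as a lagged value, with future error terms replaced by their mean of zero. It then picks the hour with the largest forecast, which is the quantity the control service needs to time the battery discharge. This is a minimal illustration under stated assumptions, not the MARIMA implementation used in this paper. The function names are hypothetical, and the AR(2) coefficients and load values are invented. The optional constant `const` is zero in equation (1); here it stands in for the mean term that appears once a non-zero series mean is introduced.

```python
from collections.abc import Sequence


def ar_forecast(history: Sequence[float], gammas: Sequence[float],
                steps: int, const: float = 0.0) -> list[float]:
    """Recursive multi-step forecast with an AR(p) model (equation (1)).

    gammas[i] multiplies y_{t-1-i}; unknown future errors are set to their
    mean of zero. `const` is 0 in equation (1); a non-zero value acts as the
    mean term of a series with non-zero mean (an assumption of this sketch).
    """
    p = len(gammas)
    if len(history) < p:
        raise ValueError("need at least p past values")
    values = list(history)
    forecasts = []
    for _ in range(steps):
        y_next = const + sum(g * values[-1 - i] for i, g in enumerate(gammas))
        forecasts.append(y_next)
        values.append(y_next)  # the forecast becomes a lagged value for the next step
    return forecasts


def peak_step(forecasts: Sequence[float]) -> int:
    """0-based index of the largest forecast (the first one on ties)."""
    return max(range(len(forecasts)), key=lambda k: forecasts[k])


# Illustrative numbers only: hourly load in kW and made-up AR(2) coefficients.
forecast = ar_forecast([390.0, 410.0], gammas=[1.5, -0.75], steps=4, const=100.0)
print(forecast)             # [422.5, 426.25, 422.5, 414.0625]
print(peak_step(forecast))  # 1, i.e. two hours ahead
```

With these made-up values the forecast peaks at index 1, two hours ahead, so that hour would be chosen for the discharge. A real deployment would use coefficients estimated from the feed's history. In the multivariate MARIMA case, lagged values of the weather variables would also enter the sum.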

In the MA model, a variable may be related to the residuals of previous periods. The MA model is defined as follows:

$$ y_t = \sum_{i=1}^{q} \phi_i \, \epsilon_{t-i} + \epsilon_t \qquad (2) $$

where $t = 1, 2, \ldots, N$ and $\phi_i$ is the coefficient of the lagged error term at time $t - i$.

Combining equations (1) and (2) gives the ARMA model:

$$ y_t = \sum_{i=1}^{p} \gamma_i \, y_{t-i} + \sum_{i=1}^{q} \phi_i \, \epsilon_{t-i} + \epsilon_t \qquad (3) $$

Written out term by term, equation (3) becomes:

$$ y_t = \gamma_1 y_{t-1} + \cdots + \gamma_p y_{t-p} + \epsilon_t + \phi_1 \epsilon_{t-1} + \cdots + \phi_q \epsilon_{t-q} \qquad (4) $$

Now assume a non-zero mean $\mu$ such that $y_t = Y_t + \mu$, where $Y_t$ follows equation (4). Substituting $Y_t = y_t - \mu$ into equation (4) gives:

$$ y_t = \gamma_1 y_{t-1} + \cdots + \gamma_p y_{t-p} + \epsilon_t + \phi_1 \epsilon_{t-1} + \cdots + \phi_q \epsilon_{t-q} + \hat{\mu} \qquad (5) $$

where $\hat{\mu} = \mu - \gamma_1 \mu - \cdots - \gamma_p \mu$.

The ARIMA model adds a differencing operator to handle non-stationary data. Differencing subtracts the previous value of the time series from the current value. The operation is repeated until the effect of the trend is removed. Let the first difference operator $\Delta$ be defined as $\Delta y_t = y_t - y_{t-1} = (1 - L) y_t$, where $L$ is the lag operator ($L y_t = y_{t-1}$). The generalized differencing operator is $\Delta^d = (1 - L)^d$ for any positive integer $d$.

Including the difference operator in equation (3), the ARIMA model becomes [18]:

$$ (1 - L)^d y_t = (1 - L)^d \sum_{i=1}^{p} \gamma_i \, y_{t-i} + \sum_{i=1}^{q} \phi_i \, \epsilon_{t-i} + \epsilon_t \qquad (6) $$

III. EVALUATION

As mentioned above, this study aims to predict the daily peak time of power usage for EGOs. EGOs can then use the highest predicted peak hour to discharge home batteries and thereby lower energy generation costs. In our evaluation, we used a dataset of actual energy consumption on the power feed connecting 1500 homes in Thunder Bay, Ontario, Canada. The data are measured in kilowatts and contain a one-year energy time series collected from January 2014 to December 2014 at 5-minute resolution. User privacy is ensured by aggregating all consumption values so that individual energy consumption is not identifiable. Other forms of privacy protection that allow data sharing are discussed in [17], [21], [22], and [23]. In addition, we collected weather data for the same period from Environment Canada [20]. The weather data contain hourly measurements of temperature, dew point, humidity, wind speed, visibility, pressure, and wind chill, as shown in Fig. 3. The following subsections describe, step by step, how the daily peak time is predicted using MARIMA.

Fig. 3. Weather data for the period between January 2014 and December 2014 from Environment Canada

A. Data Preparation and Processing

First, we resolved the mismatch between the power consumption and weather data points. Power consumption data were captured every five minutes, whereas weather data were captured every hour. To make the dataset consistent, we converted the five-minute data points into hourly data points by averaging all five-minute data points within the respective hour. We also checked the dataset thoroughly for missing and invalid values and filled in missing values by averaging. The data ready for prediction consist of eleven variables: power consumption, temperature, dew point, humidity, wind speed, visibility, pressure, wind chill, Second, Sine, and Cosine. The Second, Sine, and Cosine variables represent the respective date and time. The processed dataset consists of 721 data points per variable. These points were divided into a training set and a testing set: of the 721 data points, 696 were used for training and the remaining 25 for testing. The training set was used to fit the model, and the testing set was compared with the predicted values.

The behavior of the model was tested through a series of variations on the orders of the autoregressive (AR), integration (I), and moving average (MA) components. These orders can be obtained with the auto.arima() function of the forecast package for R, the free, open-source implementation of the S language developed at Bell Laboratories. The function automatically generates a set of optimal (p, d, q) parameters. It searches through combinations of order parameters and picks the set that optimizes a model-fit criterion (by default, the corrected Akaike information criterion, AICc). The parameters (p, d, q) are defined as follows:
• p (AR): the number of lag observations included in the model, also called the lag order;
• d (I): the number of times the raw observations are differenced, also called the degree of differencing;
• q (MA): the size of the moving average window, also called the order of the moving average.
Table I shows the optimal (p, d, q) obtained by applying auto.arima() to each variable.
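Two of the preparation steps above are mechanical enough to sketch in plain Python. The first is averaging twelve 5-minute readings into one hourly value. The second is differencing a series d times, as the d column of Table I specifies, i.e., applying $(1 - L)^d$. Its inverse is included so that values forecast on the differenced scale can be mapped back to the original scale. A chronological split like the 696/25 split above is included for completeness. The helper names are hypothetical. The sketch assumes gap-free data aligned to the top of the hour, and it does not reproduce how the R forecast or marima packages perform differencing internally.

```python
from statistics import mean

READINGS_PER_HOUR = 12  # 60 min / 5 min


def hourly_means(readings_5min):
    """Average each block of twelve consecutive 5-minute readings.

    Assumes the series starts at the top of an hour and has no gaps.
    """
    if len(readings_5min) % READINGS_PER_HOUR:
        raise ValueError("series must contain whole hours of 5-minute readings")
    return [mean(readings_5min[k:k + READINGS_PER_HOUR])
            for k in range(0, len(readings_5min), READINGS_PER_HOUR)]


def difference(series, d):
    """Apply (1 - L)^d. Also return the first value of each intermediate
    level, which is needed to undo the differencing later."""
    heads, out = [], list(series)
    for _ in range(d):
        heads.append(out[0])
        out = [cur - prev for prev, cur in zip(out, out[1:])]
    return out, heads


def undifference(diffs, heads):
    """Inverse of `difference`: cumulatively sum once per level, innermost first."""
    out = list(diffs)
    for head in reversed(heads):
        level = [head]
        for step in out:
            level.append(level[-1] + step)
        out = level
    return out


def chronological_split(series, n_train):
    """Split without shuffling: the first n_train points train, the rest test."""
    return series[:n_train], series[n_train:]


# A quadratic trend becomes constant after two differences (d = 2).
diffs, heads = difference([1, 4, 9, 16], d=2)
print(diffs, heads)                      # [2, 2] [1, 3]
print(undifference(diffs + [2], heads))  # [1, 4, 9, 16, 25]
```

The extra `2` appended before `undifference` plays the role of a forecast made on the differenced scale, and the call returns it to the original scale. Applied to the 721 hourly points, `chronological_split(hourly, 696)` would yield 696 training and 25 testing points. Keeping the split chronological preserves the time ordering that ARIMA-type models rely on.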

TABLE I
MARIMA MODEL PARAMETER COMPONENTS

Variable      p (AR)   d (I)   q (MA)
Power         5        1       4
Temp          2        1       1
DewPoint      3        1       3
Humidity      5        1       3
Visibility    1        0       1
Pressure      2        1       1
WindChill     2        1       0
Second        1        0       3
Sine          1        0       0
Cosine        1        0       0

Fig. 4. Correlation coefficient of the MARIMA model

B. Feature Selection

We developed a first version of the MARIMA model using the parameters above to check the correlation of each feature. Fig. 4 shows the correlation for each feature. According to Fig. 4, power, temperature, pressure, wind chill, Second, and Cosine scored high in correlation. This indicates that these features would have a strong impact on the predicted value. In contrast, dew point, humidity, visibility, and Sine scored lowest and would have the least impact. Feature selection is based on the correlation value. However, in our case it was too early to decide on variable selection, because even the lowest correlation score still had a relatively significant impact of 20%. Thus, in the next step we selected the optimum number of features for the MARIMA model according to the lowest prediction error.

C. Finding the Optimum Number of Features

Since the previous step did not eliminate a single variable, we tried to find the optimum number of features by determining the model accuracy. We sorted all features in descending order of correlation value and ran eleven iterations, adding one feature in each iteration. Fig. 5 shows a dip in both root mean square error (RMSE) and mean absolute error (MAE) after the addition of the 8th variable. This suggests that the model with the seven features with the highest correlation values performs best.

Although we had now acquired all the information needed to finalize the model, we looked for further ways to optimize it. We first ran the model with all eleven variables and their respective AR, I, and MA patterns and recorded its accuracy. We then ran the model ten more times. Each time, we removed one specific feature (leaving the other nine features) along with its respective AR, I, and MA terms (if needed). For example, the first run omitted Temperature, and the second run restored Temperature and omitted only Dewpoint. In this way, we recorded the impact of removing each variable on the accuracy of the model. As shown in Table II, removing dew point, humidity, visibility, or Sine either decreased or increased the accuracy of the model by about 1%. These four variables therefore barely contribute to the accuracy of the final model, and we concluded that the final model should exclude them. (Note: the power variable was never removed in this process because power is the dependent variable.)

TABLE II
CORRESPONDING MARIMA MODEL ACCURACY WITH RESPECT TO EACH FEATURE

Feature       Accuracy (%)
Temp          98.06
DewPoint      91.05
Humidity      90.06
Visibility    90.09
Pressure      69.77
WindChill     78.61
Second        81.42
Sine          92.03
Cosine        70.41
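The two selection procedures above can be sketched as plain bookkeeping around the forecasting model. The first adds features one at a time in order of correlation while tracking RMSE and MAE. The second re-runs the model with each feature left out in turn. This Python sketch rests on several assumptions. Features are ranked by the absolute Pearson correlation of each feature with power, which may not match the MARIMA-based coefficients plotted in Fig. 4. `evaluate` and `score` are hypothetical callbacks that would wrap fitting and forecasting with MARIMA. The paper does not define its accuracy measure, so `score` simply returns whatever accuracy figure is used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_by_correlation(features, target):
    """Feature names sorted by |correlation| with the target, strongest first."""
    return sorted(features, key=lambda name: abs(pearson(features[name], target)),
                  reverse=True)


def rmse(actual, predicted):
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def forward_errors(ranked, evaluate):
    """Add features one at a time in ranked order and record (k, RMSE, MAE).

    `evaluate(names)` is a hypothetical callback that fits the model on those
    features (plus power) and returns (actual, predicted) for the test set.
    """
    curve = []
    for k in range(1, len(ranked) + 1):
        actual, predicted = evaluate(ranked[:k])
        curve.append((k, rmse(actual, predicted), mae(actual, predicted)))
    return curve


def leave_one_out(features, score):
    """Score all features, then every subset with exactly one feature removed.

    `score(names)` is a hypothetical callback returning model accuracy; power,
    the dependent variable, is never in `features` and so is never removed.
    """
    baseline = score(list(features))
    dropped = {f: score([g for g in features if g != f]) for f in features}
    return baseline, dropped


power = [1.0, 2.0, 3.0, 4.0]
feats = {"Temp": [2.0, 4.0, 6.0, 8.0], "Sine": [1.0, -1.0, 1.0, -1.0],
         "Pressure": [4.0, 3.0, 2.0, 2.0]}
print(rank_by_correlation(feats, power))  # ['Temp', 'Pressure', 'Sine']
```

In this toy example, `Temp` is perfectly correlated with the power series, `Pressure` strongly (negatively) correlated, and `Sine` only weakly, so they rank in that order. The ranked list is what `forward_errors` would consume.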
Fig. 5. Error results of testing MARIMA on various features

D. Results

The final model is built using the information acquired in the steps above. We took the differencing parameter "I" from Table I and applied the define.dif function of the marima package [18] to each feature. The define.dif function returns the differenced values in one of its returned attributes, called y.dif. Similarly, we used the AR and MA parameters of the selected features to define the model, which returns the AR and MA patterns along with other attributes. We then called the marima function of the same package with y.dif, the MA pattern, and the AR pattern as parameters to build the final model. Finally, the arma.forecast function of the marima package produces the prediction for the dependent variable. Fig. 6 shows the predicted versus actual power consumption for 30 December 2014. The predicted peak time on this day is off by one hour. However, if the granularity of the dataset were changed from every hour to every 15 minutes, the same model's predicted peak time could be off by only 15 minutes. The model achieved an average accuracy of 92.64%.

Fig. 6. Actual vs. predicted results using the MARIMA model: predicting the highest peak on 30 December 2014 with 92.64% accuracy

IV. CONCLUSION AND FUTURE WORK

This paper presented a system for integrating small-scale energy generators into the smart grid. The prediction model described in the paper allows electric grid operators to schedule the discharge of home battery systems during peak times to balance energy demand and reduce electricity costs. We have shown that the proposed model can be effective and practical, particularly as it is deployed in a real-life setting: the dataset used in this study consists of the actual energy consumption of houses connected to the grid in Thunder Bay, Ontario, Canada. In the future, we plan to examine new prediction models and enhance the performance of the system so that the highest peak is predicted on much more granular data points. We also plan to assess the proposed model using additional feature selection methods and prediction models.

REFERENCES

[1] M. E. Peck and D. Wagman, "Energy trading for fun and profit buy your neighbor's rooftop solar power or sell your own-it'll all be on a blockchain," IEEE Spectrum, vol. 54, no. 10, pp. 56-61, Oct. 2017.
[2] FIT, http://www.energy.gov.on.ca/en/fit-and-microfit-program/2-year-fit-review/
[3] L. Zhang, Z. Mu, and C. Sun, "Remaining useful life prediction for lithium-ion batteries based on exponential model and particle filter," IEEE Access, vol. PP, no. 99, pp. 1-1, doi: 10.1109/ACCESS.2018.2816684.
[4] A. Degla, M. Chikh, A. Chouder, F. Bouchafaa, and A. Taallah, "Update battery model for photovoltaic application based on comparative analysis and parameter identification of lead-acid battery models behaviour," IET Renewable Power Generation, vol. 12, no. 4, pp. 484-493, Mar. 2018.
[5] L. Hernández, C. Baladrón, J. M. Aguiar, L. Calavia, B. Carro, A. Sánchez-Esguevillas, F. Pérez, A. Fernández, and J. Lloret, "Artificial neural network for short-term load forecasting in distribution systems," Energies, vol. 7, pp. 1576-1598, 2014.
[6] A. S. Khwaja, M. Naeem, A. Anpalagan, A. Venetsanopoulos, and B. Venkatesh, "Improved short-term load forecasting using bagged neural networks," Electric Power Systems Research, vol. 125, pp. 109-115, 2015.
[7] K. B. Song, Y. S. Baek, D. H. Hong, and G. Jang, "Short-term load forecasting for the holidays using fuzzy linear regression method," IEEE Transactions on Power Systems, vol. 20, pp. 96-101, 2005.
[8] N. Huang, G. Lu, and D. Xu, "A permutation importance-based feature selection method for short-term electricity load forecasting using random forest," Energies, vol. 9, p. 767, 2016.
[9] G. Lv, X. Wang, and Y. Jin, "Short-term load forecasting in power system using least squares support vector machine," Computational Intelligence, Theory and Applications, vol. 38, pp. 117-126, 2006.
[10] Y. H. Chen, W. C. Hong, W. Shen, and N. N. Huang, "Electric load forecasting based on a least squares support vector machine with fuzzy time series and global harmony search algorithm," Energies, vol. 9, p. 70, 2016.
[11] M. Ghayekhloo, M. Menhaj, and M. Ghofrani, "A hybrid short-term load forecasting with a new data preprocessing framework," Electric Power Systems Research, vol. 119, pp. 138-148, 2015.
[12] C. W. Lee and B. Y. Lin, "Application of hybrid quantum tabu search with support vector regression (SVR) for load forecasting," Energies, vol. 9, p. 873, 2016.
[13] A. Yassine, S. Singh, and A. Alamri, "Mining human activity patterns from smart home big data for health care applications," IEEE Access, vol. 5, pp. 13131-13141, 2017, doi: 10.1109/ACCESS.2017.2719921.
[14] S. Singh and A. Yassine, "Mining energy consumption behavior patterns for households in smart grid," IEEE Transactions on Emerging Topics in Computing, doi: 10.1109/TETC.2017.2692098.
[15] M. Matijaš, J. A. Suykens, and S. Krajcar, "Load forecasting using a multivariate meta-learning system," Expert Systems with Applications, vol. 40, pp. 4427-4437, 2013.
[16] S. Singh and A. Yassine, "Big data mining of energy time series for behavioral analytics and energy consumption forecasting," Energies, vol. 11, p. 452, 2018.
[17] A. Yassine, A. A. Nazari Shirehjini, and S. Shirmohammadi, "Smart meters big data: Game theoretic model for fair data sharing in deregulated smart grids," IEEE Access, vol. 3, pp. 2743-2754, 2015.
[18] H. Spliid, "Multivariate time series estimation using MARIMA," 38th Symposium in Anvendt Statistik, 2016, orbit.dtu.dk.
[19] P. K. Kenabatho, B. P. Parida, D. B. Moalafhi, and T. Segosebe, "Analysis of rainfall and large-scale predictors using a stochastic model and artificial neural network for hydrological applications in southern Africa," Hydrological Sciences Journal, vol. 60, no. 11, pp. 1943-1955, 2015.
[20] Weather Canada, https://weather.gc.ca/, last accessed April 2018.
[21] A. Yassine and S. Shirmohammadi, "Measuring user's privacy payoff using intelligent agents," in Proc. IEEE International Conference on Computational Intelligence for Measurement Systems and Applications (CIMSA'09), 2009, pp. 169-174.
[22] A. Yassine and S. Shirmohammadi, "Privacy and the market for private data: A negotiation model to capitalize on private data," in Proc. IEEE/ACS International Conference on Computer Systems and Applications, Doha, 2008, pp. 669-678.
[23] A. Yassine, A. A. N. Shirehjini, S. Shirmohammadi, and T. T. Tran, "Knowledge-empowered agent information system for privacy payoff in eCommerce," Knowledge and Information Systems, vol. 32, no. 2, pp. 445-473, Aug. 2012.
