
energies

Article
A High Precision Artificial Neural Networks Model
for Short-Term Energy Load Forecasting
Ping-Huan Kuo 1 and Chiou-Jye Huang 2,*

1 Computer and Intelligent Robot Program for Bachelor Degree, National Pingtung University,
Pingtung 90004, Taiwan; phkuo@mail.nptu.edu.tw
2 School of Electrical Engineering and Automation, Jiangxi University of Science and Technology,
Ganzhou 341000, Jiangxi, China
* Correspondence: chioujye@163.com; Tel.: +86-137-2624-7572

Received: 14 December 2017; Accepted: 9 January 2018; Published: 16 January 2018
Energies 2018, 11, 213; doi:10.3390/en11010213

Abstract: One of the most important research topics in smart grid technology is load forecasting, because the accuracy of load forecasting highly influences the reliability of smart grid systems. In the past, load forecasting was obtained by traditional analysis techniques such as time series analysis and linear regression. Since load forecasting focuses on aggregated electricity consumption patterns, researchers have recently integrated deep learning approaches with machine learning techniques. In this study, an accurate deep neural network algorithm for short-term load forecasting (STLF) is introduced. The forecasting performance of the proposed algorithm is compared with the performances of five artificial intelligence algorithms that are commonly used in load forecasting. The Mean Absolute Percentage Error (MAPE) and Cumulative Variation of Root Mean Square Error (CV-RMSE) are used as accuracy evaluation indexes. The experimental results show that the MAPE and CV-RMSE of the proposed algorithm are 9.77% and 11.66%, respectively, demonstrating very high forecasting accuracy.

Keywords: artificial intelligence; convolutional neural network; deep neural networks; short-term
load forecasting

1. Introduction
Nowadays, there is a persistent need to accelerate the development of low-carbon energy technologies in order to address the global challenges of energy security, climate change, and economic growth [3]. Among the various green technologies to be developed, smart grids [1] are particularly important because they are key to the integration of several other low-carbon energy technologies [2], such as power charging for electric vehicles, on-grid connection of variable renewable energy sources, and demand response.
The forecast of electricity load is important for the power system scheduling adopted by energy providers [4]. Inefficient storage and discharge of electricity incur unnecessary costs, while even a small improvement in electricity load forecasting can reduce production costs and increase trading advantages [4], particularly during peak electricity consumption periods. Therefore, it is important for electricity providers to model and forecast electricity load as accurately as possible, in both the short term [5–12] (one day to one month ahead) and the medium term [13] (one month to five years ahead).
With the development of big data and artificial intelligence (AI) technology, new machine learning methods have been applied to the power industry, where large volumes of electricity data need to be carefully managed. According to the McKinsey Global Institute [14], AI could be applied in the electricity
industry for power demand and supply prediction, because a power grid load forecast affects many
stakeholders. Based on the short-term forecast (1–2 days ahead), power generation systems can
determine which power sources to access in the next 24 h, and transmission grids can timely assign
appropriate resources to clients based on current transmission requirements. Moreover, using an
appropriate demand and supply forecast, electricity retailers can calculate energy prices based on
estimated demand more efficiently.
Powerful data collection and analysis technologies are becoming more available on the market, so power companies are beginning to explore the feasibility of obtaining more accurate short-term load forecasts using AI. For instance, in the United Kingdom (UK), the National Grid is currently working with DeepMind [15,16], a Google-owned AI team, to predict power supply and demand peaks in the UK based on information from smart meters and weather-related variables. This cooperation aims to maximize the use of intermittent renewable energy and to reduce the UK national energy usage by 10%. Therefore, it is expected that electricity demand and supply could be predicted and managed in real time through deep learning technologies, optimizing load dispatch and reducing operation costs.
Load forecasting can be categorized by the length of the forecast interval. Although there is no official categorization in the power industry, there are four load forecasting types [17]: very short term load forecasting (VSTLF), short term load forecasting (STLF), medium term load forecasting (MTLF), and long term load forecasting (LTLF). The VSTLF typically predicts the load for a period of less than 24 h, STLF predicts the load for a period greater than 24 h and up to one week, MTLF forecasts the load for a period from one week up to one year, and LTLF forecasts the load for a period longer than one year. The load forecasting type is chosen based on the application requirements. Namely, VSTLF and STLF are applied to everyday power system operation and spot price calculation, so their accuracy requirement is much higher than that of a long term prediction. The MTLF and LTLF are used for the prediction of power usage over a long period of time, and they are often referenced in long-term contracts when determining system capacity, costs of operation and system maintenance, and future grid expansion plans. Moreover, in smart grids integrated with a high percentage of intermittent renewable energy, the load forecasting requirements are more demanding than for traditional power generation sources, in order to maintain grid stability.
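To make this taxonomy concrete, a minimal sketch of the horizon-to-category mapping is given below; it is an illustration of the definitions above, not tooling from the original study:

```python
def forecast_category(horizon_hours: int) -> str:
    """Classify a load forecast by horizon length, following [17]."""
    if horizon_hours < 24:
        return "VSTLF"        # very short term: less than 24 h
    if horizon_hours <= 24 * 7:
        return "STLF"         # short term: more than 24 h, up to one week
    if horizon_hours <= 24 * 365:
        return "MTLF"         # medium term: one week up to one year
    return "LTLF"             # long term: longer than one year

print(forecast_category(24 * 3))  # the 3-day horizon used later -> "STLF"
```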
In addition, load forecasting can be classified by calculation method into statistical methods and computational intelligence (CI) methods. With recent developments in computational science and smart metering, traditional load forecasting methods have been gradually replaced by AI technology. Smart meters for residential buildings became available on the market around 2010, and since then, various studies on STLF for residential communities have been published [18,19]. When compared with traditional statistical forecasting methods, the ability of AI technology to analyze large amounts of data in a very short time frame has displayed obvious advantages [10].
Some of the frequently used load forecasting methods include linear regression [5,6,20], autoregressive methods [7,21], and artificial neural networks [9,22,23]. Furthermore, clustering methods have also been proposed [24]. In [20,25], similar time sequences were matched, while in [24] the focus was on customer classification. A novel approach based on the support vector machine was proposed in [26,27]. Other forecasting methods, such as exponential smoothing and Kalman filters, have also been applied in a few studies [28]. A careful literature review of the latest STLF methods can be found in [8]. In [13], it was shown that the accuracy of STLF is influenced by many factors, such as temperature, humidity, wind speed, etc. In many studies, artificial neural network (ANN) forecasting methods [9–11,29] have been proven to be more accurate than traditional statistical methods, and the accuracy of different ANN methods has been reviewed by many researchers [1,30]. In [31], a multi-model partitioning algorithm (MMPA) for short-term electricity load forecasting was proposed; according to the obtained experimental results, the MMPA method is better than the autoregressive integrated moving average (ARIMA) method. In [17], the authors used an ANN-based method reinforced by a wavelet denoising algorithm. The wavelet method was used to factorize the electricity load data into signals with different frequencies; therefore, the wavelet denoising algorithm provides good electricity load data for neural network training and improves the load forecasting accuracy.
In this study, a new load forecasting model based on a deep learning algorithm is presented. The forecasting accuracy of the proposed model is within the requested range, and the model has the advantages of simplicity and high forecasting performance. The major contributions of this paper are: (1) introduction of a precise deep neural network model for energy load forecasting; (2) comparison of the performances of several forecasting methods; and (3) creation of a novel research direction in time sequence forecasting based on convolutional neural networks.
2. Methodology of Artificial Neural Networks
Artificial neural networks (ANNs) are computing systems inspired by biological neural networks. The general structure of an ANN contains neurons, weights, and biases. Owing to their powerful modeling ability, ANNs are still very popular in the machine learning field. Many ANN structures are used in machine learning problems, but the Multilayer Perceptron (MLP) [32] is the most commonly used ANN type. The MLP is a fully connected artificial neural network, and its structure is shown in Figure 1. In general, the MLP consists of one input layer, one or more hidden layers, and one output layer. The MLP network presented in Figure 1 is the most common MLP structure, which has only one hidden layer. In the MLP, all the neurons of the previous layer are fully connected to the neurons of the next layer. In Figure 1, x1, x2, x3, ..., x6 are the neurons of the input layer, h1, h2, h3, h4 are the neurons of the hidden layer, and y1, y2, y3, y4 are the neurons of the output layer. In the case of energy load forecasting, the input is the past energy load, and the output is the future energy load. Although the MLP structure is very simple, it provides good results in many applications. The most commonly used algorithm for MLP training is the backpropagation algorithm.

Figure 1. The Multilayer Perceptron (MLP) structure.
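As a concrete illustration of the MLP just described, the following is a minimal sketch in Keras. The input and output sizes match the (24 × 7) h history and (24 × 3) h horizon used later in this paper; the hidden width of 64 and the Adam optimizer are assumptions made only for the example:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Past week of hourly loads in, next three days of hourly loads out.
mlp = tf.keras.Sequential([
    layers.Input(shape=(24 * 7,)),                # x1..x168: past energy load
    layers.Dense(64, activation="relu"),          # one hidden layer, as in Figure 1
    layers.Dense(24 * 3, activation="sigmoid"),   # y1..y72: future load in [0, 1]
])
# Training by backpropagation, minimizing the mean squared error.
mlp.compile(optimizer="adam", loss="mse")
```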

Although MLPs are very good at modeling and pattern recognition, convolutional neural networks (CNNs) provide better accuracy in highly non-linear problems, such as energy load forecasting. The CNN uses the concept of weight sharing. A one-dimensional convolution and pooling layer are presented in Figure 2. The lines in the same color denote the same shared weight, and sets of the shared weights can be treated as kernels. After the convolution process, the inputs x1, x2, x3, ..., x6 are transformed into the feature maps c1, c2, c3, c4. The next step in Figure 2 is pooling, wherein the feature map of the convolution layer is sampled and its dimension is reduced. For instance, in Figure 2 the dimension of the feature map is 4, and after the pooling process that dimension is reduced to 2. Pooling is an important procedure for extracting the important convolution features.
Figure 2. The one-dimensional (1D) convolution and pooling layer.
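To see weight sharing and pooling in action, here is a small self-contained sketch that reproduces the shapes of Figure 2: six inputs, one shared three-tap kernel producing the four-element feature map c1–c4, and max pooling of size 2 halving it. The input and kernel values are arbitrary, chosen only for illustration:

```python
import numpy as np

def conv1d_valid(x, kernel):
    """1D convolution with a single shared-weight kernel (no padding)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def max_pool1d(c, size=2):
    """Keep the maximum of each non-overlapping window of `size` features."""
    return np.array([c[i:i + size].max() for i in range(0, len(c) - size + 1, size)])

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])  # inputs x1..x6
kernel = np.array([0.5, -1.0, 0.5])           # one set of shared weights
c = conv1d_valid(x, kernel)                   # feature map c1..c4 (length 4)
p = max_pool1d(c)                             # pooled features (length 2)
print(c, p)
```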

The other popular solution to the forecasting problem is the Long Short Term Memory network (LSTM) [33]. The LSTM is a recurrent neural network, which has been used to solve many time sequence problems. The structure of the LSTM is shown in Figure 3, and its operation is illustrated by the following equations:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{1}$$

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{2}$$

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \tag{3}$$

$$C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t \tag{4}$$

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{5}$$

$$h_t = o_t \times \tanh(C_t) \tag{6}$$
where $x_t$ is the network input, $h_t$ is the output of the hidden layer, and $\sigma$ denotes the sigmoidal function; $C_t$ is the cell state, and $\tilde{C}_t$ denotes the candidate value of the state. Besides, there are three gates in the LSTM: $i_t$ is the input gate, $o_t$ is the output gate, and $f_t$ is the forget gate. The LSTM is designed to solve the long-term dependency problem. In general, the LSTM provides good forecasting results.

Figure 3. The Long Short Term Memory network (LSTM) structure.
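The gate equations (1)–(6) translate directly into code. Below is a sketch of a single LSTM time step in NumPy, under the common convention that each weight matrix acts on the concatenation [h_{t-1}, x_t]; the dimensions and random initialization are illustrative only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W and b hold the parameters of the four gates."""
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])      # Eq. (1): forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])      # Eq. (2): input gate
    c_tilde = np.tanh(W["C"] @ z + b["C"])  # Eq. (3): candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde      # Eq. (4): new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])      # Eq. (5): output gate
    h_t = o_t * np.tanh(c_t)                # Eq. (6): hidden output
    return h_t, c_t

# Illustrative dimensions: 1 input feature, 8 hidden units.
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(8, 9)) * 0.1 for k in "fiCo"}
b = {k: np.zeros(8) for k in "fiCo"}
h, c = np.zeros(8), np.zeros(8)
h, c = lstm_step(np.array([0.5]), h, c, W, b)
```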
3. The Proposed Deep Neural Network
The structure of the proposed deep neural network, DeepEnergy, is shown in Figure 4. Unlike the general forecasting methods based on the LSTM, the DeepEnergy uses the CNN structure. The input layer denotes the information on the past load, and the output values represent the future energy load. There are two main processes in DeepEnergy: feature extraction and forecasting. The feature extraction is performed by three convolution layers (Conv1, Conv2, and Conv3) and three pooling layers (Pooling1, Pooling2, and Pooling3). Conv1–Conv3 are one-dimensional (1D) convolutions, and their feature maps are all activated by the Rectified Linear Unit (ReLU) function. The kernel sizes of Conv1, Conv2, and Conv3 are 9, 5, and 5, respectively, and the depths of the feature maps are 16, 32, and 64, respectively. The pooling method of Pooling1 to Pooling3 is max pooling, and the pooling size is equal to 2; therefore, after each pooling process, the dimension of the feature map is divided by 2 to extract the important features of the deeper layers.

In the forecasting process, the first step is to flatten the Pooling3 layer into one dimension and to construct a fully connected structure between the Flatten layer and the Output layer. In order to fit the values previously normalized in the range [0, 1], the sigmoidal function is chosen as the activation function of the output layer. Furthermore, in order to overcome the overfitting problem, the dropout technology [34] is adopted in the fully connected layer. Namely, dropout is an efficient way to prevent overfitting in artificial neural networks: during the training process, neurons are randomly dropped. As shown in Figure 4, the output values of the chosen neurons (the gray circles) are set to zero in a given training iteration, and the chosen neurons are randomly changed during the training process.

Furthermore, the flowchart of the proposed DeepEnergy is presented in Figure 5. Firstly, the raw energy load data are loaded into memory. Then, data preprocessing is executed and the data are normalized in the range [0, 1] in order to fit the characteristics of the machine learning model. For the purpose of validating the generalization performance of DeepEnergy, the data are split into training data and testing data, where the training data are used for training the proposed model. After the data are split, the proposed DeepEnergy network is created and initialized. Before the training, the training data are randomly shuffled to force the proposed model to learn the complicated relationships between the input and output data. The training data are split into several batches, and, according to the order of the shuffled data, the model is trained on all of the batches. During the training process, if the desired Mean Square Error (MSE) is not reached in the current epoch, the training continues until the maximal number of epochs or the desired MSE is reached; on the contrary, if the maximal number of epochs is reached, the training process stops regardless of the MSE value. Finally, the performance is evaluated on the testing data to demonstrate the feasibility and practicability of the proposed method.

Figure 4. The DeepEnergy structure.
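Putting the pieces together, a sketch of the described architecture in Keras might look as follows. The kernel sizes (9, 5, 5), feature-map depths (16, 32, 64), pooling size 2, flatten, dropout, and sigmoid output come from the text above, while the 'same' padding, the dropout rate of 0.5, and the Adam optimizer are assumptions, since the paper does not state them:

```python
import tensorflow as tf
from tensorflow.keras import layers

deep_energy = tf.keras.Sequential([
    layers.Input(shape=(24 * 7, 1)),                                      # past week of hourly load
    layers.Conv1D(16, kernel_size=9, activation="relu", padding="same"),  # Conv1
    layers.MaxPooling1D(pool_size=2),                                     # Pooling1
    layers.Conv1D(32, kernel_size=5, activation="relu", padding="same"),  # Conv2
    layers.MaxPooling1D(pool_size=2),                                     # Pooling2
    layers.Conv1D(64, kernel_size=5, activation="relu", padding="same"),  # Conv3
    layers.MaxPooling1D(pool_size=2),                                     # Pooling3
    layers.Flatten(),
    layers.Dropout(0.5),                          # dropout in the fully connected part [34]
    layers.Dense(24 * 3, activation="sigmoid"),   # future load, normalized to [0, 1]
])
deep_energy.compile(optimizer="adam", loss="mse")
```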
Figure 5. The DeepEnergy flowchart.
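The training loop of Figure 5 (shuffle, train on batches, stop at the desired MSE or at the maximal number of epochs) can be sketched with a small custom callback. Here `deep_energy` refers to the architecture sketch above, while `X_train`, `y_train`, the MSE threshold, and the epoch budget are hypothetical placeholders:

```python
import tensorflow as tf

class StopAtTargetMSE(tf.keras.callbacks.Callback):
    """Stop training once the epoch's MSE falls below the desired value."""
    def __init__(self, target_mse):
        super().__init__()
        self.target_mse = target_mse

    def on_epoch_end(self, epoch, logs=None):
        if logs and logs.get("loss", float("inf")) <= self.target_mse:
            self.model.stop_training = True

# shuffle=True re-orders the training data every epoch; training runs until
# the desired MSE or the maximal number of epochs is reached.
deep_energy.fit(X_train, y_train, epochs=500, batch_size=32,
                shuffle=True, callbacks=[StopAtTargetMSE(1e-4)])
```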

4. Experimental Results

In the experiment, the USA District public consumption dataset and the electric load dataset from 2016 provided by the Electric Reliability Council of Texas were used. Since the support vector machine (SVM) [35] is a popular machine learning technology, the radial basis function (RBF) kernel of the SVM was chosen in the experiment to demonstrate the SVM performance. Besides, the random forest (RF) [36], decision tree (DT) [37], MLP, LSTM, and the proposed DeepEnergy network were also implemented and tested. The results of load forecasting by all of the methods are shown in Figures 6–11. In the experiment, the training data were two-month data, and the test data were one-month data. In order to evaluate the performances of all of the listed methods, the dataset was divided into 10 partitions. In the first partition, the training data consisted of energy load data collected in January and February 2016, and the test data consisted of data collected in March 2016. In the second partition, the training data were data collected in February and March 2016, and the test data were data collected in April 2016. The following partitions can be deduced by the same analogy (a sketch of this protocol is given after the next paragraph).

In Figures 6–11, the red curves denote the forecasting results of the corresponding models, and the blue curves represent the ground truth. The vertical axes represent the energy load (MWh), and the horizontal axes denote the time (hour). The energy load from the past (24 × 7) h was used as the input of the forecasting model, and the predicted energy load in the next (24 × 3) h was the output of the forecasting model: after the models received the past (24 × 7) h of data, they forecasted the next (24 × 3) h of energy load, shown as the red curves in Figures 6–11. The differences between the red and blue curves indicate the performances of the corresponding models. For the sake of comparison fairness, the testing data were not used during the training process of the models. According to the results presented in Figures 6–11, the proposed DeepEnergy network has the best prediction performance among all of the models.
Figure 6. The forecasting results of support vector machine (SVM): (a) Partial results A; (b) Partial results B; (c) Partial results C; (d) Partial results D; (e) Partial results E; (f) Partial results F.

Figure 7. The forecasting results of random forest (RF): (a) Partial results A; (b) Partial results B; (c) Partial results C; (d) Partial results D; (e) Partial results E; (f) Partial results F.

Figure 8. The forecasting results of decision tree (DT): (a) Partial results A; (b) Partial results B; (c) Partial results C; (d) Partial results D; (e) Partial results E; (f) Partial results F.

Figure 9. The forecasting results of Multilayer Perceptron (MLP): (a) Partial results A; (b) Partial results B; (c) Partial results C; (d) Partial results D; (e) Partial results E; (f) Partial results F.

Figure 10. The forecasting results of LSTM: (a) Partial results A; (b) Partial results B; (c) Partial results C; (d) Partial results D; (e) Partial results E; (f) Partial results F.

Figure 11. The forecasting results of proposed DeepEnergy: (a) Partial results A; (b) Partial results B; (c) Partial results C; (d) Partial results D; (e) Partial results E; (f) Partial results F.
In order to evaluate the performance of the forecasting models more accurately, the Mean Absolute Percentage Error (MAPE) and the Cumulative Variation of Root Mean Square Error (CV-RMSE) were employed. The MAPE and CV-RMSE are defined by Equations (7) and (8), respectively, where $y_n$ denotes the measured value, $\hat{y}_n$ is the estimated value, and $N$ represents the sample size.

$$\mathrm{MAPE} = \frac{1}{N}\sum_{n=1}^{N}\left|\frac{y_n - \hat{y}_n}{y_n}\right| \tag{7}$$

$$\mathrm{CV\text{-}RMSE} = \frac{\sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(y_n - \hat{y}_n\right)^2}}{\frac{1}{N}\sum_{n=1}^{N} y_n} \tag{8}$$
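A direct NumPy transcription of Equations (7) and (8) reads as follows; the results are multiplied by 100 to match the percentage figures reported in Tables 1 and 2 (Equation (8) is taken here in the standard form of the RMSE normalized by the mean measured load):

```python
import numpy as np

def mape(y, y_hat):
    """Eq. (7): Mean Absolute Percentage Error, in percent."""
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def cv_rmse(y, y_hat):
    """Eq. (8): RMSE normalized by the mean measured load, in percent."""
    return 100.0 * np.sqrt(np.mean((y - y_hat) ** 2)) / np.mean(y)
```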

The detailed experimental results are presented numerically in Tables 1 and 2. As shown in Tables 1 and 2, the MAPE and CV-RMSE of the DeepEnergy model are the smallest, and its error performance is the best among all models; namely, its average MAPE and CV-RMSE are 9.77% and 11.66%, respectively. The MAPE of the MLP model is the largest among all of the models, with an average error of about 15.47%. On the other hand, the CV-RMSE of the SVM model is the largest among all models, with an average error of about 17.47%. According to the average MAPE and CV-RMSE values, the electric load forecasting accuracy of the tested models in descending order is as follows: DeepEnergy, RF, LSTM, DT, SVM, and MLP.

Table 1. The experimental results in terms of Mean Absolute Percentage Error (MAPE) given
in percentages.

Test SVM RF DT MLP LSTM DeepEnergy


#1 7.327408 7.639133 8.46043 9.164315 10.40804813 7.226127
#2 7.550818 8.196129 10.23476 11.14954 9.970662683 8.244051
#3 13.07929 10.11102 12.14039 19.99848 14.85568499 11.00656
#4 16.15765 17.27957 19.86511 22.45493 12.83487893 12.17574
#5 5.183255 6.570061 8.50582 15.01856 5.479091542 5.41808
#6 10.33686 9.944028 11.11948 10.94331 11.7681534 9.070998
#7 8.934657 6.698508 8.634132 7.722149 7.583802292 9.275215
#8 18.5432 16.09926 17.17215 16.93843 15.6574951 13.2776
#9 49.97551 17.9049 21.29354 29.06767 16.31443679 11.18214
#10 11.20804 8.221766 10.68665 12.20551 8.390061493 10.80571
Average 14.82967 10.86644 12.81125 15.46629 11.32623153 9.768222

Table 2. The experimental results in terms of Cumulative Variation of Root Mean Square Error
(CV-RMSE) given in percentages.

Test SVM RF DT MLP LSTM DeepEnergy


#1 9.058992 9.423908 10.57686 10.65546 12.16246177 8.948922
#2 10.14701 10.63412 12.99834 13.91199 12.19377007 10.46165
#3 17.02552 12.42314 14.58249 23.2753 16.9291218 13.30116
#4 21.22162 21.1038 24.48298 23.63544 14.13596516 14.63439
#5 6.690527 7.942747 10.10017 15.44461 6.334195125 6.653999
#6 11.88856 11.6989 13.39033 12.20149 12.96057349 10.74021
#7 10.77881 7.871596 10.35254 8.716806 8.681353107 10.85454
#8 19.49707 17.09079 18.95726 17.73124 16.55737557 14.51027
#9 54.58171 19.91185 24.84425 29.37466 17.66342548 13.01906
#10 13.80167 10.15117 13.06351 13.39278 10.20235927 13.47003
Average 17.46915 12.8252 15.33487 16.83398 12.78206008 11.65942

It is obvious that the red curve in Figure 11, which denotes the DeepEnergy algorithm, tracks the ground truth better than the corresponding curves in Figures 6–10, which further verifies that the proposed DeepEnergy algorithm has the best prediction performance. Therefore, the DeepEnergy STLF algorithm proposed in this paper is shown to be practical and effective. Although the LSTM has good performance in time sequence problems, in this study the reduction of its training loss was not fast enough to handle this forecasting problem, because the size of the input and output data is too large for the traditional LSTM neural network. Therefore, the traditional LSTM is not suitable for this kind of prediction. Finally, the experimental results show that the proposed DeepEnergy network provides the best results in energy load forecasting.

5. Discussion
Traditional machine learning methods, such as the SVM, random forest, and decision tree, are widely used in many applications. In this study, these methods also provide acceptable results. With respect to the SVM, the support vectors are mapped into a higher dimensional space by the kernel function, so the selection of the kernel function is very important. In order to achieve the goal of nonlinear energy load forecasting, the RBF is chosen as the SVM kernel. When compared with the SVM, the learning concept of the decision tree is much simpler; namely, the decision tree is a flowchart-like structure that is easy to understand and interpret. However, a single decision tree does not have the ability to solve complicated problems. Therefore, the random forest, which combines numerous decision trees, provides a model ensemble solution. In this paper, the experimental results of the random forest are better than those of the decision tree and the SVM, which shows that the model ensemble solution is effective in energy load forecasting.

With respect to neural networks, the MLP is the simplest ANN structure. Although the MLP can model the nonlinear energy forecasting task, its performance in this experiment is not outstanding. On the other hand, the LSTM considers data relationships across time steps during training. According to the results, the LSTM can deal with time sequence problems, and its forecasted trend is roughly correct. However, the proposed CNN structure, named DeepEnergy, achieves the best results in the experiment. The experiments demonstrate that the most important features can be extracted by the designed 1D convolution and pooling layers. This also confirms that the CNN structure is effective in forecasting, and the proposed DeepEnergy gives outstanding results. This paper not only provides a comparison of traditional machine learning and deep learning methods, but also opens a new research direction in energy load forecasting.
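For reference, the three traditional baselines discussed above can be assembled in scikit-learn roughly as sketched below. Since SVR predicts a single output, the 72-hour target is wrapped in a multi-output regressor, while the tree-based models handle multiple outputs natively; the hyperparameters and the `X_train`/`y_train`/`X_test` arrays are placeholders:

```python
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

models = {
    "SVM": MultiOutputRegressor(SVR(kernel="rbf")),  # RBF kernel, as in the text
    "DT": DecisionTreeRegressor(),                   # single flowchart-like tree
    "RF": RandomForestRegressor(n_estimators=100),   # ensemble of many trees
}

for name, model in models.items():
    model.fit(X_train, y_train)      # X: past 168 h, y: next 72 h
    y_hat = model.predict(X_test)    # compare with mape() / cv_rmse() above
```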

6. Conclusions
This paper proposes a powerful deep convolutional neural network model (DeepEnergy) for energy load forecasting. The proposed network is validated by experiments using the load data from the past seven days. In the experiment, data from the coast area of the USA were used, and the historical electricity demand of consumers was considered. According to the experimental results, DeepEnergy can precisely predict the energy load over the next three days. In addition, the proposed algorithm was compared with five AI algorithms that are commonly used in load forecasting. The comparison showed that the performance of DeepEnergy was the best among all tested algorithms; namely, DeepEnergy had the lowest values of both MAPE and CV-RMSE. According to all of the obtained results, the proposed method can reduce monitoring expenses, the initial cost of hardware components, and long-term maintenance costs in future smart grids. Simultaneously, the results verify that the proposed DeepEnergy STLF method has strong generalization ability and robustness; thus, it can achieve very good forecasting performance.

Acknowledgments: This work was supported by the Ministry of Science and Technology, Taiwan, Republic of China, under Grant MOST 106-2218-E-153-001-MY3.
Author Contributions: Ping-Huan Kuo wrote the program and designed the DNN model. Chiou-Jye Huang planned this study and collected the energy load dataset. Ping-Huan Kuo and Chiou-Jye Huang contributed to drafting and revising the manuscript.

Conflicts of Interest: The authors declare no conflict of interest.

References
1. Raza, M.Q.; Khosravi, A. A review on artificial intelligence based load demand forecasting techniques for
smart grid and buildings. Renew. Sustain. Energy Rev. 2015, 50, 1352–1372. [CrossRef]
2. Da Graça Carvalho, M.; Bonifacio, M.; Dechamps, P. Building a low carbon society. Energy 2011, 36, 1842–1847.
[CrossRef]
3. Jiang, B.; Sun, Z.; Liu, M. China’s energy development strategy under the low-carbon economy. Energy 2010,
35, 4257–4264. [CrossRef]
4. Cho, H.; Goude, Y.; Brossat, X.; Yao, Q. Modeling and forecasting daily electricity load curves: A hybrid approach.
J. Am. Stat. Assoc. 2013, 108, 7–21. [CrossRef]
5. Javed, F.; Arshad, N.; Wallin, F.; Vassileva, I.; Dahlquist, E. Forecasting for demand response in smart
grids: An analysis on use of anthropologic and structural data and short term multiple loads forecasting.
Appl. Energy 2012, 96, 150–160. [CrossRef]
6. Iwafune, Y.; Yagita, Y.; Ikegami, T.; Ogimoto, K. Short-term forecasting of residential building load for
distributed energy management. In Proceedings of the 2014 IEEE International Energy Conference, Cavtat,
Croatia, 13–16 May 2014; pp. 1197–1204. [CrossRef]
7. Short Term Electricity Load Forecasting on Varying Levels of Aggregation. Available online: https://arxiv.
org/abs/1404.0058v3 (accessed on 11 January 2018).
8. Gerwig, C. Short term load forecasting for residential buildings—An extensive literature review. Smart Innov. Syst.
2015, 39, 181–193.
9. Hippert, H.S.; Pedreira, C.E.; Souza, R.C. Neural networks for short-term load forecasting: A review and evaluation.
IEEE Trans. Power Syst. 2001, 16, 44–55. [CrossRef]
10. Metaxiotis, K.; Kagiannas, A.; Askounis, D.; Psarras, J. Artificial intelligence in short term electric load
forecasting: A state-of-the-art survey for the researcher. Energy Convers. Manag. 2003, 44, 1524–1534. [CrossRef]
11. Tzafestas, S.; Tzafestas, E. Computational intelligence techniques for short-term electric load forecasting.
J. Intell. Robot. Syst. 2001, 31, 7–68. [CrossRef]
12. Ghayekhloo, M.; Menhaj, M.B.; Ghofrani, M. A hybrid short-term load forecasting with a new data
preprocessing framework. Electr. Power Syst. Res. 2015, 119, 138–148. [CrossRef]
13. Xia, C.; Wang, J.; McMenemy, K. Short, medium and long term load forecasting model and virtual load
forecaster based on radial basis function neural networks. Int. J. Electr. Power Energy Syst. 2010, 32, 743–750.
[CrossRef]
14. Bughin, J.; Hazan, E.; Ramaswamy, S.; Chui, M. Artificial Intelligence—The Next Digital Frontier? Mckinsey
Global Institute: New York, NY, USA, 2017; pp. 1–80.
15. Oh, C.; Lee, T.; Kim, Y.; Park, S.; Kwon, S.B.; Suh, B. Us vs. Them: Understanding Artificial Intelligence
Technophobia over the Google DeepMind Challenge Match. In Proceedings of the 2017 CHI Conference on
Human Factors in Computing Systems, Denver, CO, USA, 6–11 May 2017; pp. 2523–2534. [CrossRef]
16. Skilton, M.; Hovsepian, F. Example Case Studies of Impact of Artificial Intelligence on Jobs and Productivity.
In 4th Industrial Revolution; National Academies Press: Washington, DC, USA, 2018; pp. 269–291.
17. Ekonomou, L.; Christodoulou, C.A.; Mladenov, V. A short-term load forecasting method using artificial
neural networks and wavelet analysis. Int. J. Power Syst. 2016, 1, 64–68.
18. Valgaev, O.; Kupzog, F. Low-Voltage Power Demand Forecasting Using K-Nearest Neighbors Approach.
In Proceedings of the Innovative Smart Grid Technologies—Asia (ISGT-Asia), Melbourne, VIC, Australia,
28 November–1 December 2016.
19. Valgaev, O.; Kupzog, F. Building Power Demand Forecasting Using K-Nearest Neighbors Model—Initial Approach.
In Proceedings of the IEEE PES Asia-Pacific Power Energy Conference, Xi’an, China, 25–28 October 2016;
pp. 1055–1060.
20. Humeau, S.; Wijaya, T.K.; Vasirani, M.; Aberer, K. Electricity load forecasting for residential customers:
Exploiting aggregation and correlation between households. In Proceedings of the 2013 Sustainable Internet
and ICT for Sustainability, Palermo, Italy, 30–31 October 2013.

21. Veit, A.; Goebel, C.; Tidke, R.; Doblander, C.; Jacobsen, H. Household electricity demand forecasting:
Benchmarking state-of-the-art method. In Proceedings of the 5th International Confrerence Future Energy
Systems, Cambridge, UK, 11–13 June 2014; pp. 233–234. [CrossRef]
22. Jetcheva, J.G.; Majidpour, M.; Chen, W. Neural network model ensembles for building-level electricity load
forecasts. Energy Build. 2014, 84, 214–223. [CrossRef]
23. Kardakos, E.G.; Alexiadis, M.C.; Vagropoulos, S.I.; Simoglou, C.K.; Biskas, P.N.; Bakirtzis, A.G. Application
of time series and artificial neural network models in short-term forecasting of PV power generation.
In Proceedings of the 2013 48th International Universities’ Power Engineering Conference, Dublin, Ireland,
2–5 September 2013; pp. 1–6. [CrossRef]
24. Fujimoto, Y.; Hayashi, Y. Pattern sequence-based energy demand forecast using photovoltaic energy records.
In Proceedings of the 2012 International Conference on Renewable Energy Research and Applications,
Nagasaki, Japan, 11–14 November 2012.
25. Chaouch, M. Clustering-based improvement of nonparametric functional time series forecasting: Application
to intra-day household-level load curves. IEEE Trans. Smart Grid 2014, 5, 411–419. [CrossRef]
26. Niu, D.; Dai, S. A short-term load forecasting model with a modified particle swarm optimization algorithm
and least squares support vector machine based on the denoising method of empirical mode decomposition
and grey relational analysis. Energies 2017, 10, 408. [CrossRef]
27. Deo, R.C.; Wen, X.; Qi, F. A wavelet-coupled support vector machine model for forecasting global incident
solar radiation using limited meteorological dataset. Appl. Energy 2016, 168, 568–593. [CrossRef]
28. Soubdhan, T.; Ndong, J.; Ould-Baba, H.; Do, M.T. A robust forecasting framework based on the Kalman
filtering approach with a twofold parameter tuning procedure: Application to solar and photovoltaic
prediction. Sol. Energy 2016, 131, 246–259. [CrossRef]
29. Hahn, H.; Meyer-Nieberg, S.; Pickl, S. Electric load forecasting methods: Tools for decision making. Eur. J.
Oper. Res. 2009, 199, 902–907. [CrossRef]
30. Zhang, G.; Patuwo, B.E.; Hu, M.Y. Forecasting with artificial neural networks: The state of the art. Int. J. Forecast.
1998, 14, 35–62. [CrossRef]
31. Pappas, S.S.; Ekonomou, L.; Moussas, V.C.; Karampelas, P.; Katsikas, S.K. Adaptive load forecasting of the
Hellenic electric grid. J. Zhejiang Univ. A 2008, 9, 1724–1730. [CrossRef]
32. White, B.W.; Rosenblatt, F. Principles of neurodynamics: Perceptrons and the theory of brain mechanisms.
Am. J. Psychol. 1963, 76, 705. [CrossRef]
33. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
[CrossRef] [PubMed]
34. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent
Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [CrossRef]
35. Suykens, J.A.K.; Vandewalle, J. Least squares support vector machine classifiers. Neural Process. Lett. 1999, 9,
293–300. [CrossRef]
36. Liaw, A.; Wiener, M. Classification and Regression by randomForest. R News 2002, 2, 18–22. [CrossRef]
37. Safavian, S.R.; Landgrebe, D. A Survey of Decision Tree Classifier Methodology. IEEE Trans. Syst. Man Cybern.
1991, 21, 660–674. [CrossRef]

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).
