
This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

Short-Term Load and Wind Power Forecasting Using Neural Network-Based Prediction Intervals
Hao Quan, Student Member, IEEE, Dipti Srinivasan, Senior Member, IEEE, and Abbas Khosravi, Member, IEEE

Abstract: Electrical power systems are evolving from today’s centralized bulk systems to more decentralized systems. Penetrations of renewable energies, such as wind and solar power, significantly increase the level of uncertainty in power systems. Accurate load forecasting becomes more complex, yet more important for management of power systems. Traditional methods for generating point forecasts of load demands cannot properly handle uncertainties in system operations. To quantify potential uncertainties associated with forecasts, this paper implements a neural network (NN)-based method for the construction of prediction intervals (PIs). A newly introduced method, called lower upper bound estimation (LUBE), is applied and extended to develop PIs using NN models. A new problem formulation is proposed, which translates the primary multiobjective problem into a constrained single-objective problem. Compared with the cost function, this new formulation is closer to the primary problem and has fewer parameters. Particle swarm optimization (PSO) integrated with the mutation operator is used to solve the problem. Electrical demands from Singapore and New South Wales (Australia), as well as wind power generation from Capital Wind Farm, are used to validate the PSO-based LUBE method. Comparative results show that the proposed method can construct higher quality PIs for load and wind power generation forecasts in a short time.

Index Terms: Load forecasting, neural network (NN), particle swarm optimization (PSO), prediction interval (PI), uncertainty, wind power.

Manuscript received January 31, 2013; revised June 30, 2013; accepted July 26, 2013. This work was supported by the National Research Foundation under Grant R-263-000-A66-279.
H. Quan and D. Srinivasan are with the Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117576 (e-mail: quan.hao@nus.edu.sg; dipti@nus.edu.sg).
A. Khosravi is with the Center for Intelligent Systems Research, Deakin University, Geelong 3217, Australia (e-mail: abbas.khosravi@deakin.edu.au).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNNLS.2013.2276053
2162-237X © 2013 IEEE

I. INTRODUCTION

With power systems growth and the penetration of renewable energy sources, the system complexity and uncertainty levels have significantly increased. The load and the renewable energy forecasting processes have become even more complex, and more accurate forecasts are required for the management of power systems. Long-term forecasts of the peak electricity demand are needed for capacity planning and maintenance scheduling [1]. Medium-term forecasts are used for maintenance planning, fuel scheduling, and hydro reservoir management [2]. Short-term load forecasting (STLF) is a fundamental and vital factor in day-to-day operations, unit commitment (UC) and scheduling functions, evaluation of net interchange, and system security analysis [2]. The STLF and the renewable energy forecasting are required for the control and scheduling of power systems and affect the system reliability and fuel consumption.

Forecasting models that are popularly applied to electrical load and the renewable energy forecasting can be classified into three categories: 1) statistical models, such as autoregressive (AR), AR integrated moving average (ARIMA), and exponential smoothing (ES) models [1], [3], [4]; 2) artificial intelligence models, such as neural networks (NNs) [5]–[9], fuzzy logic systems (FLSs) [10]–[12], expert systems, and so on; and 3) hybrid models such as neuro-fuzzy systems [13], [14], to name a few. In [1], different forecasting methods including ARIMA modeling, periodic AR modeling, double seasonality of Holt-Winters ES, and principal component analysis are considered for STLF. Electricity demands from 10 European countries are used as case studies to compare these methods. In [4], two time series models are proposed, namely the multiplicative decomposition model and the seasonal ARIMA model. These two models are implemented and compared for STLF using Singapore (SG) data sets. Hippert et al. [5] provide a comprehensive review of STLF using NN models. Different forecasting strategies (iterative, multimodel, and single-model multivariate forecasting) are investigated. Issues such as NN design, implementation, and validation are also covered. In [7], three techniques called error output, resampling, and multilinear regression are applied to STLF for constructing confidence intervals using NN models. In [9], the second-generation wavelets are combined with recurrent NNs to improve the accuracy of solar radiation prediction. In [15], a hybrid NN and enhanced particle swarm optimization (PSO) are used as the forecast engine for wind power forecast. Focusing on feature selection, an irrelevancy filter and a redundancy filter are applied to select the set of candidate inputs of NN.

Like NNs, FLSs are also universal approximators. Recently, FLSs have developed quickly and are popularly applied to forecasting applications [14]. In [13], the adaptive neuro-fuzzy inference system model is applied for very short-term wind forecasting using Tasmania data sets. In [10], an interval type-2 FLS (IT2 FLS) is applied to STLF for handling uncertainties. The output of an IT2 FLS is an interval (the type reduced set composed of the left and right end points), but is not a prediction interval (PI). A PI has a prescribed probability associated with it called the confidence level. The output obtained from an IT2 FLS does not have this feature and is a simple interval. An IT2 FLS model has built-in features (e.g., membership functions with uncertain mean and variance) for handling uncertainties and minimizing their effects on the

quality of output. It still, however, cannot assign a confidence level to its type reduced set (interval).

Electrical power systems are evolving from today’s centralized bulk systems to more decentralized systems [16]. Renewable energy sources, such as the wind and solar power generations, with their advantages of being cheaper, more flexible, and environmentally friendly, become the key to a sustainable energy supply infrastructure. However, penetrations of these increase the level of uncertainty in power systems. More advanced methods for accurate load and wind power forecasts under various uncertainties become urgent for smart grid applications. In [17], wind generation is considered as a negative load. This further increases the complexity of load forecasting. Most of the applications on STLF and wind power forecasting are point forecasts. Point forecasts, however, cannot properly handle the uncertainties associated with data sets [18], [19].

In this paper, STLF and wind power forecasting are implemented using NN-based PIs. PIs are excellent tools for the quantification of uncertainties associated with point forecasts and predictions [20], [21]. By definition, a PI is an estimate of an interval in which a future observation will fall, with a certain probability [(1 − α)%], given what has already been observed [22]. Typically, a PI consists of a lower and an upper bound, and the confidence level that the targets will lie within the two bounds. For point forecasts, only one predicted point is provided for one target value. Point forecasts provide only the prediction error but tell nothing about the probability for correct predictions. PIs not only provide a range in which targets are highly likely to be covered, but also have an indication of their accuracy called the coverage probability.

Delta, Bayesian, and bootstrap are three traditional methods commonly used for construction of NN-based PIs [19], [23]. In spite of the advantages of PIs, applications of these methods are still less popular than NN point forecasts. Implementation difficulties, special assumptions about the data distribution, and massive computational requirements [24]–[26] hinder widespread applications of these methods for decision making. To overcome these problems, a new method called lower upper bound estimation (LUBE) method for PI construction was proposed in [27]. The LUBE method makes no assumption about data distribution, and avoids calculation of matrices such as Jacobian and Hessian matrices. As we know, wind power is intermittent and very volatile in nature. Thus, assumptions about the data distributions seem problematic and in doubt. That is where the nonparametric LUBE method can be used. Comparative results reported in [27] reveal that the LUBE method is simpler, faster, and more reliable than the traditional methods.

From the decision-making point of view, larger coverage probability and narrower width are always expected. However, these two aspects of PIs are conflicting with each other. This optimization problem can be formulated and solved in different ways. In this paper, after introducing the evaluation indices of PIs, three problem formulation methods are summarized and developed. Obviously, the primary problem is a multiobjective optimization problem for larger coverage probability and narrower width. Previously, the primary problem has been translated into a single-objective one using cost functions [18]–[21], [28]. In this paper, a new problem formulation method is proposed. The nominal coverage probability is considered as a hard constraint. Our only objective is to minimize the width of PIs. The new constrained single-objective problem formulation is closer to the primary problem and has fewer parameters than cost functions.

To solve the new constrained single-objective problem, traditional derivative-based algorithms, such as gradient descent methods, cannot be applied. These methods also run the risk of being trapped into local optima. Therefore, more intelligent and powerful optimization methods are needed. To optimize NN parameters, PSO [15], [29], genetic algorithm (GA) [30], and simplified swarm optimization [31] have been introduced in the literature. In this paper, PSO, which is powerful for parameter optimization, is employed to solve the problem. The mutation operator, which helps to achieve diversity in GA, is also integrated into PSO to improve the exploratory capabilities and help in jumping out of local optima. PSO serves two purposes here. First, PSO is used to solve the newly formulated constrained single-objective problem, that is, to handle the constraints and optimize the objective. Second, PSO with the mutation operator is used as the training algorithm through optimizing the connection weights of NN models. Data sets from electrical load demands and wind power generations are used to validate this method. For the purpose of comparisons, ARIMA, ES, and naive models are also built using the same data sets. Comparative results show that the PSO-based LUBE method can construct higher quality PIs for load and wind power forecasting applications.

The main contributions of this paper are as follows.
1) A new problem formulation method for PI construction is proposed. The primary multiobjective problem is formulated and solved as a constrained single-objective problem.
2) A new PI width evaluation index, which is suitable for training NN models, is proposed.
3) PSO associated with the mutation operator is integrated into the LUBE method for the first time, giving the PSO-based LUBE method. This PSO associated with the mutation operator has a very strong searching capability.
4) Different types of prediction tasks including electrical load and wind power generation forecasts are implemented and compared together.
5) The obtained results from three case studies show that the quality of PIs has been significantly improved compared with ARIMA, ES, and naive models.
6) Implementation of the proposed method is straightforward and much easier; the PI construction time is also much shorter than that of the traditional methods.

The rest of this paper is organized as follows. Section II introduces the evaluation indices of PIs. Three problem formulation methods are provided in Section III. The proposed PSO-based LUBE method is described in Section IV. Case studies and results and discussions are presented in Sections V and VI, respectively. Finally, Section VII concludes this paper and provides guidelines for future work.

II. EVALUATION INDICES OF PIs

In the literature, various methods have been applied to evaluate the performance of point forecasts. Out of these methods, the most popular ones are error-based measures, such as mean square errors (MSEs) and mean absolute percentage errors (MAPE). Likewise, the quality of PIs needs to be quantitatively evaluated. In this section, evaluation indices for both the coverage probability and width of PIs are first introduced. Specifically, a new index for width evaluation is proposed. The new index is suitable for training NN models. Finally, a cost function for the comprehensive evaluation of PIs is developed.

A. PI Coverage Probability

Usually, the coverage probability (or confidence level) is considered as the key feature of PIs. PI coverage probability (PICP) shows the probability with which target values will be covered by the upper and lower bounds. A larger PICP means more targets lie within the constructed PIs and vice versa. PICP is defined as follows [18]:

$$\mathrm{PICP} = \frac{1}{N}\sum_{i=1}^{N}\epsilon_i \tag{1}$$

where $N$ is the number of samples and $\epsilon_i$ is a Boolean variable, which shows the coverage behavior of PIs. If the target value $y_i$ is covered between the lower bound $L_i$ and upper bound $U_i$, $\epsilon_i = 1$; otherwise $\epsilon_i = 0$. Mathematically, $\epsilon_i$ is defined as follows:

$$\epsilon_i = \begin{cases} 1, & \text{if } y_i \in [L_i, U_i]; \\ 0, & \text{if } y_i \notin [L_i, U_i]. \end{cases} \tag{2}$$

To have valid PIs, PICP should not be less than the nominal confidence level of PIs. Otherwise, PIs are invalid and should be discarded. The ideal case for PICP is PICP = 100%, which means all the targets are covered by PIs.

B. PI Normalized Average Width and PI Normalized Root-Mean-Square Width

The quality of PIs is often evaluated by PICP, and discussion about the width of PIs is either ignored or vaguely presented [32]. If the upper and lower bounds of PIs are chosen as extreme values of the targets (maximum and minimum values), a high PICP (even 100% PICP) can be easily achieved. The argument here is that PIs that are too wide convey little information and are of no use for decision making. Width of PIs determines their informativeness. In the literature, a quantitative measure of the width is defined as PI normalized average width (PINAW) [28]

$$\mathrm{PINAW} = \frac{1}{NR}\sum_{i=1}^{N}(U_i - L_i) \tag{3}$$

where $R$ is the range of the underlying targets (maximum minus minimum). The purpose of using $R$ is to normalize the PI average width in percentage. In this way, PINAW can be used for objective comparisons, regardless of the techniques used for their construction or the magnitudes of the underlying targets.

The format of PINAW is similar to the MAPE used for point forecasts. It gives equal weights to each width of PIs. It is MSE that is frequently used to train NN models instead of MAPE. This is because MSE magnifies bigger forecasting errors and results in better training performances. Inspired by this, a new width evaluation index for PIs, called PI normalized root-mean-square width (PINRW), is developed for training in this paper

$$\mathrm{PINRW} = \frac{1}{R}\sqrt{\frac{1}{N}\sum_{i=1}^{N}(U_i - L_i)^2}. \tag{4}$$

The new index PINRW is functionally similar to MSE, and magnifies wider intervals. In practice, the experimental results show that PINRW tends to obtain higher quality PIs than PINAW, just as MSE is a much better cost function for training than MAPE. In the rest of this paper, PINRW is used for training NN models and PINAW for testing.

C. Coverage Width-Based Criterion

PICP and PINAW (or PINRW) each assess only one aspect of PIs. Focusing on only one side of PIs may lead to misleading results. In practice, a measure is required to simultaneously address both aspects and comprehensively evaluate the overall quality of PIs. An interesting index, called coverage width-based criterion (CWC), is proposed in [28]

$$\mathrm{CWC} = \mathrm{PINAW}\left(1 + \gamma(\mathrm{PICP})\,e^{-\eta(\mathrm{PICP}-\mu)}\right) \tag{5}$$

where γ(PICP) = 1 for training. μ and η are the two controlling parameters. The nominal confidence level [(1 − α)%] can be used as a guidance for choosing μ. It stands for the preassigned PICP that must be satisfied. η is a hyperparameter that magnifies the difference between the PICP and μ. If the preassigned PICP is not satisfied, CWC penalizes this term exponentially. When the PICP reaches around μ, there is a balance between the PINRW (for training) and PICP. CWC provides a comprehensive assessment for both evaluation indices. It tries to find a tradeoff between informativeness (PINAW and PINRW) and validity (PICP) of PIs.

If the preassigned PICP is satisfied, the comparison between two CWCs reasonably pays more attention to the narrower PINAW. Thus, for test samples, γ(PICP) is a step function whose value is determined by whether the PICP is satisfied

$$\gamma(\mathrm{PICP}) = \begin{cases} 0, & \mathrm{PICP} \ge \mu; \\ 1, & \mathrm{PICP} < \mu. \end{cases} \tag{6}$$

When evaluating test PIs, if PICP is not less than the assigned μ, γ(PICP) = 0 and, by (5), CWC reduces to PINAW. Otherwise, γ(PICP) = 1 and the corresponding penalty is included in CWC.

III. PROBLEM FORMULATION

A. Primary Problem Formulation

From the optimization point of view, higher PICP and narrower PINAW are the two objectives for high-quality PIs.
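As a rough illustration of the indices defined in Section II, the following Python sketch computes PICP (1), PINAW (3), PINRW (4), and CWC (5) and (6) for a handful of made-up intervals. The function names, the sample numbers, the use of fractions rather than percentages for PICP and μ, and the value η = 50 are assumptions chosen for illustration; they are not taken from the paper's implementation.

```python
import math


def picp(y, lower, upper):
    """PI coverage probability, eqs. (1)-(2): share of targets inside [L_i, U_i]."""
    covered = [1 if lo <= t <= up else 0 for t, lo, up in zip(y, lower, upper)]
    return sum(covered) / len(y)


def pinaw(y, lower, upper):
    """PI normalized average width, eq. (3); R is the range of the targets."""
    r = max(y) - min(y)
    return sum(up - lo for lo, up in zip(lower, upper)) / (len(y) * r)


def pinrw(y, lower, upper):
    """PI normalized root-mean-square width, eq. (4)."""
    r = max(y) - min(y)
    mean_sq = sum((up - lo) ** 2 for lo, up in zip(lower, upper)) / len(y)
    return math.sqrt(mean_sq) / r


def cwc(width, coverage, mu, eta, training):
    """Coverage width-based criterion, eq. (5).

    gamma is fixed to 1 for training and follows the step function (6) for
    testing. `width` is PINRW for training and PINAW for testing. Coverage
    and mu are fractions in [0, 1] here (an assumption of this sketch).
    """
    gamma = 1.0 if (training or coverage < mu) else 0.0
    return width * (1.0 + gamma * math.exp(-eta * (coverage - mu)))


# Made-up targets and interval bounds, for illustration only.
y = [1.0, 2.0, 3.0, 5.0]
lower = [0.5, 1.5, 3.5, 4.0]
upper = [1.5, 2.5, 4.5, 6.0]

coverage = picp(y, lower, upper)      # three of four targets are covered
avg_width = pinaw(y, lower, upper)
rms_width = pinrw(y, lower, upper)    # larger than PINAW: wide intervals weigh more

print(coverage, avg_width, rms_width)
print(cwc(avg_width, coverage, mu=0.9, eta=50.0, training=False))  # penalized
print(cwc(avg_width, coverage, mu=0.7, eta=50.0, training=False))  # equals PINAW
```

With a nominal level above the achieved coverage, the exponential term dominates CWC; once the coverage requirement is met on test data, CWC collapses to the width index alone, which is the behavior described for (6).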

Thus, the primary problem can be modeled as a multiobjective problem. These objectives are, however, conflicting with each other, as improving one objective will worsen the other. If it is solved by some multiobjective methods, the nondominated solutions will appear on the Pareto front.

Objectives: Finding optimal weights ω* to:
    Maximize: PICP(ω);
    Minimize: PINAW(ω).
Constraints: PINAW(ω) > 0;
    0 ≤ PICP(ω) ≤ 100%.

B. Single-Objective Problem Formulation

In [18]–[20], and [28], if the nominal PICP (μ) was preassigned, the primary problem was transformed into a single-objective problem. This is realized through using CWC as a cost function defined in (5).

Objective: Finding optimal weights ω* to:
    Minimize: CWC(ω).

The CWC cost function provides a comprehensive evaluation on both PICP and PINAW (or PINRW for training). At the beginning of the training process, PICP is usually very low, and then CWC gives a heavy exponential penalty on this term. As the training continues, PICP becomes higher and higher, and the penalty for the unsatisfied PICP decreases exponentially. Once PICP is near the preassigned coverage probability (μ), there is a balance between the validity (PICP) and the informativeness (PINAW and PINRW) of PIs.

C. Constrained Single-Objective Problem Formulation

In this paper, a new problem formulation method is proposed. Because PICP is usually considered as the fundamental feature and determines the validity of PIs, it is reasonable to regard it as the hard constraint. This means that the preassigned PICP must be satisfied for valid PIs. Under this hard constraint, the remaining objective is to minimize PINAW. In this way, the primary problem is successfully represented as a constrained single-objective problem

Objective: Finding optimal weights ω* to:
    Minimize: PINAW(ω).
Constraints: PINAW(ω) > 0;
    μ ≤ PICP(ω) ≤ 100%.

where μ is the nominal confidence level, which can be set to (1 − α)%. As shown in the model, this new formulation is a single-objective problem with three constraints. Our only objective is to minimize the average width of PIs. Of the three constraints, μ ≤ PICP(ω) is the hard constraint that must be met. If proper controls are given in the process of calculation, the other two constraints can be satisfied automatically.

Compared with the cost function method, there are two obvious advantages of this problem formulation. It has fewer parameters and is closer to the primary problem. Considering the above CWC cost function as an example, parameters such as μ, η, and γ(PICP) need to be assigned in advance. The performance of the cost function is also sensitive to these parameters. They need to be fine tuned carefully.

To solve the constrained single-objective problem using PSO, the criteria for replacing one particle $\vec{m}$ with another particle $\vec{n}$ are as follows [33], [34].
1) Particle $\vec{n}$ is feasible and particle $\vec{m}$ is not.
2) Both particles are feasible or they have the same satisfaction of constraints, but $\vec{n}$ yields a better objective function value.
3) Both particles are infeasible, but $\vec{n}$ results in the lower sum of constraint violations.

IV. PSO-BASED LUBE METHOD FOR STLF

A. LUBE Method

Traditional methods construct NN-based PIs in two steps [20]: 1) they regress the given data set to a specified model or function, which is the same as point forecasts and 2) according to the assumed data distribution, the statistical mean and variance values are calculated; if the Jacobian or Hessian matrix is needed, it is also calculated at this step. With this information, PIs are then constructed.

Traditional methods for the construction of PIs suffer from various problems. For example, the delta method makes assumptions on data and residual distributions [32]. In the process of PI construction, the derivatives also need to be calculated. Jacobian and Hessian matrices are required by the delta and Bayesian methods, respectively [32]. This may result in singularity problems, which then decrease the reliability of PIs. Calculations for Jacobian and Hessian matrices also significantly increase the computation time. Thus, the complexity of traditional methods hinders widespread applications of PIs.

A new method called the LUBE method was proposed in [27] to construct NN-based PIs. The LUBE method adopts an NN with two outputs to directly construct PIs in one step without any assumption about the data distribution. The two outputs of the NN correspond to the lower and upper bounds of PIs. This design format is similar to point forecasts; the process of PI construction is easy and straightforward. However, the functions are totally different. A symbolic NN for the LUBE method is shown in Fig. 1. The number of layers and neurons in each layer can be chosen freely in real NN models. The LUBE method constructs PIs in just one step, and thus is simpler and faster to implement.

B. Implementation of the PSO-Based LUBE Method

The flow chart of the proposed PSO-based LUBE method is shown in Fig. 2.

1) Data Splitting: The whole data sets are split into three sets: a) training set; b) validation set; and c) test set. The training set is used to adjust the connection weights of NNs. The validation set is applied to determine the optimal NN structure and other undetermined parameters. The test set will evaluate the final performance of the algorithm. After splitting,

the training and validation data sets are normalized to [−1, 1], and the same settings are applied to the test set for normalization.

Fig. 1. NN model for LUBE method to generate upper and lower bounds of PIs. [Diagram: inputs 1 to n, a hidden layer, and two outputs giving the upper bound and lower bound of the prediction interval around the target.]

2) First Seasonal Difference and Correlation Analysis: The purpose of differencing is to make a time series stationary. It is particularly effective for a seasonal time series. Correlation analysis is applied to help in choosing the inputs of NNs. The detailed implementation of this part can be found in Section V.

3) Determination of the Optimal NN Structure: For each candidate NN structure, the NN is trained and validated using the training and validation sets five times. Median values of PINAWs (with satisfied PICPs) are used to determine the optimal structure of NNs. Similarly, this method is also applicable to other undetermined parameters.

4) Initialization: NN weight and PSO parameter initializations are crucial for the proposed algorithm. The initialization process directly influences the quality of PIs and repeatability of the algorithm. Several NN weight initialization methods have been investigated [35] and compared in advance. The performances of fixed initial weights, zero symmetric and nonsymmetric random initialization, and Nguyen–Widrow (NW) method are examined by a list of experiments. Comparative results show that zero nonsymmetric random initialization has the worst results. One possible explanation is that the input data sets are normalized to [−1, 1], which is symmetric. On the other hand, the NW method repeatedly obtains the best and most stable results. The NW method chooses initial weights to distribute the active region of each neuron in the layer approximately evenly across the layer’s input space [36]. Thus, the NW method is chosen in this algorithm for NN weight initialization.

PSO parameter initialization consists of particle position and velocity initialization. Because NN connection weights are represented as the position of particles, position initialization has been completed in weight initialization. Particle velocity is randomly initialized with zero symmetric numbers.

Fig. 2. PSO-based LUBE method for the construction and evaluation of PIs. [Flow chart: Start; split data set into training set (Dtraining), validation set (Dvalidation), and test set (Dtest), then normalization; perform seasonal differencing and correlation analysis; use the validation set to determine the optimal structure of NN and other parameters; initialization of NN and PSO parameters; velocity and position update; mutation operator; PI construction and evaluation on Dtraining and Dvalidation; update pbest and gbest; training termination? (if no, continue training; if yes, proceed); construct PIs for Dtest and evaluation; five repeats? (if no, repeat; if yes, end).]

5) Velocity and Position Update: Velocity and position update are the core of the PSO algorithm. The particles will exchange their findings with each other in the update process. In this way, the information will be exchanged efficiently throughout the whole swarm. The classic formulas for velocity and position update [34] are shown below

$$v_n(t+1) = W v_n(t) + c_1\,\mathrm{rand}()\,(p_{best,n} - x_n(t)) + c_2\,\mathrm{rand}()\,(g_{best,n} - x_n(t)) \tag{7}$$

$$x_n(t+1) = x_n(t) + v_n(t+1) \tag{8}$$

where $v_n$ is the particle velocity in the nth dimension, rand() is a random number between 0 and 1, $W$ is a scaling factor, and $c_1$ and $c_2$ are the scaling factors that determine the relative pull of pbest and gbest [37]. In addition to the two updates, the ranges for velocity and position are limited to $V_{max}$ and $X_{max}$, respectively.

6) Mutation Operator: Selection, crossover, and mutation are three main operators in GA. Mutation operator, which helps to achieve diversity in GA, is integrated into PSO.
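A minimal sketch, under stated assumptions, of the velocity and position update in (7) and (8) together with the particle replacement criteria listed in Section III-C. The function names, the tuple representation of a particle's (PICP, PINAW) result, the symmetric clipping to [−Vmax, Vmax] and [−Xmax, Xmax], and all parameter values are hypothetical choices made for illustration, not the authors' implementation.

```python
import random


def is_better(cand, incumbent, mu):
    """Return True if `cand` should replace `incumbent`.

    Each argument is a (picp, pinaw) tuple; mu is the nominal coverage.
    The only constraint tracked here is mu <= PICP (assumption of the sketch).
    """
    c_viol = max(0.0, mu - cand[0])       # constraint violation of candidate
    i_viol = max(0.0, mu - incumbent[0])  # constraint violation of incumbent
    if c_viol == 0.0 and i_viol > 0.0:    # rule 1: feasible beats infeasible
        return True
    if c_viol > 0.0 and i_viol == 0.0:
        return False
    if c_viol == i_viol:                  # rule 2: equal feasibility, compare width
        return cand[1] < incumbent[1]
    return c_viol < i_viol                # rule 3: smaller violation wins


def pso_step(x, v, pbest, gbest, w, c1, c2, v_max, x_max, rng):
    """One update of a single particle following eqs. (7) and (8).

    x, v, pbest, gbest are lists of equal length (one entry per NN weight).
    Velocity and position are clipped symmetrically (assumed interpretation
    of the Vmax and Xmax limits).
    """
    new_x, new_v = [], []
    for n in range(len(x)):
        vn = (w * v[n]
              + c1 * rng.random() * (pbest[n] - x[n])
              + c2 * rng.random() * (gbest[n] - x[n]))
        vn = max(-v_max, min(v_max, vn))          # limit velocity
        xn = max(-x_max, min(x_max, x[n] + vn))   # limit position
        new_v.append(vn)
        new_x.append(xn)
    return new_x, new_v


rng = random.Random(0)  # seeded only to make the illustration repeatable
x, v = pso_step([0.0, 0.5], [0.0, 0.0], [1.0, 0.5], [1.0, 0.5],
                w=0.7, c1=2.0, c2=2.0, v_max=0.2, x_max=1.0, rng=rng)
print(x, v)

# A candidate that meets mu = 0.9 replaces one that does not, even if wider.
print(is_better((0.92, 0.30), (0.85, 0.10), mu=0.9))
```

Keeping feasibility and width as separate quantities, rather than folding them into one penalized score, is what removes the η and γ parameters that the CWC cost function needs.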

This integration strongly enhances the searching capacity and avoids being trapped into local optima. In the flow chart shown in Fig. 2, Gaussian mutation is added to each connection weight after the position update. The mean and standard deviation of the Gaussian distribution are the weight value and 10% of that weight value, respectively. The mutation rate exponentially decreases as the optimization continues.

7) PI Construction and Evaluation for Training: The validation set has been applied to determine the optimal NN structure. In this step, the training and validation sets are combined together to train the NN. After update of the NN connection weights, the LUBE method is then applied to construct new PIs. PI assessment indices (PICP and PINRW) are calculated.

8) Update pbest and gbest: pbest is the personal best value of each particle and gbest is the best value of the whole swarm. Compared with the cost function method, the constrained single-objective optimization is different. When updating the pbest and gbest, the criterion introduced in Section III is applied. The feasibility and the objective function will be considered together. In the cost function method, by contrast, only CWC is considered.

9) Training Termination: The training terminates when the maximum number of iterations is reached or when only little improvement is made over a certain number of iterations. Otherwise, the training process continues and returns to Step 5.

10) Test and Evaluation: Once the training process terminates, the gbest value is chosen to generate PIs for the test set. PICP and PINAW instead of PINRW are calculated and recorded. For the comprehensive evaluation purpose, CWC is also calculated. γ(PICP) is a step function defined in (6) for testing. The whole process is repeated five times. Results in each run and the median values are reported.

V. CASE STUDIES

A. Data Sets

The demand data sets are real electrical load data from SG and New South Wales (NSW, Australia). These two different areas stand for two different types of load profiles. The load pattern of SG is obviously more fixed than that of NSW because of the climate and regional reasons. SG has a tropical rainforest climate with no distinctive seasons, uniform temperature, and pressure. Temperatures of this city-state usually range from 23 °C to 32 °C (73 °F–90 °F). On the other hand, NSW is more influenced by the seasonal factors and thus has more complex load patterns.

The chosen time periods are from January 2007 to December 2011, with 48 load points in each day. These five-year load data sets are further divided into three sets: training set, validation set, and test set. The periods used for training, validation, and testing are the first three years, the fourth year, and fifth year, respectively. The typical weekly curves of SG and NSW load, from Monday to Sunday, are shown in Fig. 3. One-week ahead load forecasting using NN-based PIs will be implemented and compared for both case studies.

Fig. 3. Typical weekly load of SG and NSW (January 22–28, 2007). [Two panels: SG load (MW) and NSW load (MW), each plotted against the half-hour index from 0 to 350.]

The wind power generation data sets are from Capital Wind Farm (Captl_WF). The Captl_WF is located in NSW, around 30 km northeast of Canberra, just southeast of Lake George and north of Bungendore. The wind farm was completed in 2009 and cost around A$220 million. The wind farm was built as a part of the Kurnell Desalination Plant project to offset the power usage of the desalination plant. The total capacity is 140 MW. The original 5-min interval data sets are combined into 1-h interval ones using average values. Some missing points are filled by the neighborhood values. The whole year of 2010 is chosen, out of which the first six months are used for training, the following three months for validation, and the last three months for testing. Because one-day (24 h) ahead wind power generation forecasting is commonly used in the UC and economic dispatches (EDs), one-day ahead PIs for wind power generation are implemented in this paper.

B. Correlation Analysis

In time series analysis, identification of models usually relies on correlation analysis. For ARIMA models, ARIMA(p, d, q), the autocorrelation function (ACF) and partial ACF (PACF) are used to determine the q and p orders. ACF and PACF are useful tools to analyze the correlation between the forecast values and the historical data sets. Inspired by the successful applications of ACF and PACF, we apply them to determine the input values of NN that are most related to the forecast values. However, in time series analysis, ACF and PACF are meaningful only for a stationary time series. A time series that is seasonal or has varied mean and variance values is nonstationary.

Unfortunately, the load data sets are nonstationary because they are seasonal time series with daily and weekly patterns. Thus, ACF and PACF make little sense on the original data sets. How to remove the seasonality of data sets becomes important for ACF and PACF analysis. Differencing is a common way to make a time series stationary. Because one-week ahead load forecasting is studied in this paper, first seasonal difference [38] is conducted here

$$y(t) = y(t_0) - y(t_0 - T) \tag{9}$$

where $T$ is the cycle of a time series, here $T$ is the weekly cycle: $T = 48 \times 7 = 336$. $y(t_0)$ and $y(t)$ are the time series
before and after the first seasonal difference. ACF and PACF analyses of first seasonal differenced SG and NSW load from 2007 to 2009 are implemented. The example of SG is shown in Fig. 4.

Fig. 4. ACF and PACF analysis of first seasonal differenced SG load from 2007 to 2009. [Panels: first seasonal differenced SG load (MW) versus half-hour; partial ACF versus lag; ACF versus lag.]

The implementations of wind power generation PIs are similar to the STLF, as shown in the flow chart of Fig. 2. The main difference is that no differencing is conducted. Wind power is an intermittent and volatile renewable resource. It fluctuates from time to time, so it has no obvious daily and weekly patterns. Therefore, no seasonal differencing is conducted on wind power generation forecasts. The PIs are constructed on the original data sets. Furthermore, some limitations are set to the upper and lower bounds of Captl_WF. The upper limitation is set to the capacity of 140 MW and the lower limitation is 0.

C. Inputs of the NN

How to determine the inputs of NN for STLF is still an open question. Previously, they were usually determined based on experience or a priori knowledge about the behavior of the system. A rather intuitive guess is that there must be homologous instants in the past to the current instant, either the same moment yesterday (24 h ago) or the same instant one week ago, two weeks ago, and so on [39].

In this paper, the inputs of NN are chosen based on the ACF and PACF analyses, as well as some empirical guidelines from [39]–[42]. The first input of NN for load forecasting is the day mark. The day mark in each week is noted as {1, 2, 3, 4, 5, 6, 7}; it is then normalized and added to the input set of NN. The day mark can distinguish different daily load patterns in each week. Lagged (one-week ahead, lags > T) values with the larger absolute ACF and PACF are also considered. With the trial and error method, there are 16 inputs in total for NN load forecasting models. We also leave a half-hour time slack for power system weekly ahead planning, such as the UC scheduling and EDs. The number of inputs for NN wind power generation forecasting is chosen as 24 without day marks. In the remainder of this paper, training of NN for load forecasting is based on the data sets after the first seasonal difference. When calculating the assessment indices of PIs, the data sets are transformed back into the original load. In contrast, the PIs of wind power generation forecasting are constructed on the original data sets.

VI. RESULTS AND DISCUSSION

A. Determination of Optimal NN Structure and Parameters

Fully connected feedforward NNs with two hidden layers are chosen in the three case studies. The activation functions in the hidden and output layers are tansig and purelin, respectively. The quality of PIs is sensitive to the structure of NNs. NNs that are too small have a poor learning capacity and NNs that are too large have a low generalization power. They suffer from underfitting and overfitting problems, respectively. Determining the number of neurons in the two hidden layers therefore becomes crucial for constructing high-quality PIs. In the literature, several methods such as network pruning [43]–[45], cascade correlation [46], and hybrid evolutionary NN construction [47] have been applied to determine this optimality. In [20], a k-fold cross validation was applied to address this problem. Because the time series in load and wind power forecasting is sequential, the k-fold cross validation cannot be directly implemented here. However, the idea is very similar to cross validation. The number of neurons in the two hidden layers (n1 and n2) varies from 1 to 10, 1 ≤ n1, n2 ≤ 10. Thus, there are 10 × 10 = 100 candidate NN structures in total. Each candidate NN is trained and validated five times using the training and validation sets. Then, the median PINAW of the validation set is used as the criterion for a better structure. If the hard constraint of μ ≤ PICP(ω) has not been met, then the PINAW objective is arbitrarily set to a very large value. Thus, this candidate NN structure will be discarded.

Determination of the optimal NN structure needs to be balanced between the network complexity, generalization, and learning capacity of NNs. Fig. 5 shows the NN structure versus the median PINAW of the validation set of NSW load. The lowest point in this 3-D plot is chosen as the optimal NN structure. As shown in Fig. 5, the optimal numbers of neurons in the hidden layers are n1 = 8 and n2 = 1. Therefore, the optimal NN structure is 16-8-1-2 for NSW load. Because of the space limitation, the similar plots for SG load and Captl_WF are omitted; the corresponding optimal NN structures are 16-5-1-2 and 24-8-5-2.

In addition to the optimal NN structure, the validation set can also be used to determine other parameters in the algorithm. Actually, the cost function CWC is not necessarily needed to solve the constrained single-objective optimization problem. For the purpose of comprehensive evaluation, CWC is also applied for testing. Table I shows the typical parameters of the three case studies for PSO and CWC. W_max and W_min are the maximum and minimum values of the inertia weight W
for previous velocity. W plays an important role in controlling the PSO convergence and it linearly decreases as iterations increase.

Fig. 5. Optimal NN structure of NSW load. [Axes: n1, n2, and PINAW (%) of validation dataset.]

TABLE I
PARAMETERS FOR PSO AND CWC

The nominal coverage probability μ_nominal is 90%. When training NNs, μ_train is set to 91%–93% according to performances of the validation set. Usually, μ_train is 1%–3% greater than μ_nominal. This conservative setting leaves some slack for the test set. In this way, the nominal coverage probability will be easily reached for testing.

B. Training Process

The presentation of the results for NN-based PIs mainly consists of three parts: 1) the training process; 2) test results; and 3) discussions on quality of PIs. The training process shows the convergence behavior of PICP and PINRW for the gbest particle. It shows how they change through the optimization process. The final performance of the proposed method is examined on the test set. Test results are presented in the form of figures and tables. Finally, the qualities of PIs, including the repeatability and computation time, are further discussed.

Fig. 6. PICP and PINRW of gbest particles during training for SG load. [Axes: iterations versus PICP (%) and PINRW (%) of gbest particles.]

Fig. 7. PICP and PINRW of gbest particles during training for NSW load. [Axes: iterations versus PICP (%) and PINRW (%) of gbest particles.]

Fig. 8. PICP and PINRW of gbest particles during training for Captl_WF. [Axes: iterations versus PICP (%) and PINRW (%) of gbest particles.]

Figs. 6–8 show the PICP and PINRW of gbest particles over iterations for SG load, NSW load, and Captl_WF, respectively. The population size is 80 for SG and NSW load, and 200 for Captl_WF; the numbers of iterations are all set to 2000.

As shown in Figs. 6–8, the training processes for all the case studies converge in a straightforward manner. The PICP of the gbest particle for load forecasting only has a rapid drop at the beginning. After the first few iterations, PICP changes little and quickly reaches the preassigned coverage probability μ_train. For the gbest PICP of Captl_WF, not only does a rapid drop happen at the beginning, but it also has a small perturbation in the
middle. This implies that the algorithm pays much more attention to the hard constraint at first. The particles with satisfied PICP have the priority to survive and are chosen as the gbest particles. This matches our design of the problem formulation. The PINRW of the gbest particle decreases sharply at the beginning. This means that once the hard constraint of μ_train is satisfied, the algorithm quickly shifts its emphasis to minimizing the PINRW objective. As the optimization proceeds, PINRW gradually plateaus. From 100 to 600 iterations, PINRW minimization makes little improvement. However, after about 600 iterations, PINRW continues to reduce step by step. Finally, PINRW takes its optimal value and converges to a good solution. This implies the strong searching capacity of PSO combined with the mutation operator.

C. Test Results

For unbiased assessments, the whole year of 2011 is used for testing. Therefore, there are 365 × 48 = 17 520 test samples in total for load forecasting. The number of test samples for Captl_WF is 92 × 24 = 2208 (last three months of 2010). For better visualization, test result figures for only one week are shown in Figs. 9–11. The numerical results of the whole test set are shown later in Table II.

Fig. 9. SG weekly load and PIs for testing (March 21–27, 2011). [Axes: half-hour versus upper bound, lower bound, and test data.]

Fig. 10. NSW weekly load and PIs for testing (March 21–27, 2011). [Axes: half-hour versus upper bound, lower bound, and test data.]

Fig. 11. Captl_WF weekly generation and PIs for testing (October 1–7, 2010). [Axes: hours versus upper bound, lower bound, and test data.]

From Figs. 9–11, the constructed PIs cover a large percentage of the real test samples. The real test samples (blue line) lie within the constructed lower and upper bounds (pink and red lines) in most of the cases. For SG and NSW load forecasts, the shapes of the three lines are very similar to each other. However, for Captl_WF, the upper bounds and the real test data have strong fluctuations, which show the high uncertainties of wind power. The lower bounds of Captl_WF mostly drop to zero. This is because, in the training and validation sets, wind power outputs frequently and unexpectedly drop to zero. The percentage of zero values is 30.42% in the whole Captl_WF data set. This is called the intermittence of wind power. To cover the real test data with a high PICP (≥ 90%), the lower bounds frequently have to be set to zero.

High PI coverages imply that the PICP indices for the test samples are very satisfactory using the proposed method. In this way, the validity of PIs has been confirmed. On the other hand, the width of PIs for SG load is different from that for NSW load. The PIs for SG load can be very narrow, as the pink and red lines of the lower and upper bounds are tight with each other. The PIs are wider for NSW load. This is because, as mentioned before, the load pattern of NSW is more irregular than that of SG because of climatic and regional reasons. Furthermore, the widths of PIs for Captl_WF are much larger than those for load forecasts. The widths of PIs are determined by the uncertainty level of the data sets. Under the preassigned PICP, a lower level of uncertainty results in narrower PIs and vice versa. Thus, the PSO-based LUBE method can construct high-quality PIs for data sets under different levels of uncertainty. It can handle different types of prediction tasks.

D. Discussions on Quality of PIs

To validate the repeatability of the algorithm, and to provide quantitative and convincing results, each case study is repeated five times. Results in each run, as well as the median values of PICP, PINAW, and CWC instead of the best ones, are shown in Table II. PI construction time for test samples is also reported.
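
As a minimal illustrative sketch (not the authors' implementation), indices of the kind reported in Table II can be computed from the lower and upper bounds as shown below. The definitions assumed here are the ones commonly used in the LUBE literature: PICP is the fraction of targets that fall inside the interval, PINAW is the mean interval width normalized by the range of the targets, and CWC = PINAW(1 + γ(PICP) e^{−η(PICP − μ)}), where γ(PICP) is 0 when PICP ≥ μ and 1 otherwise. The function names, the value η = 50, and the toy data are assumptions made only for illustration; the exact forms and parameter values used in this paper are those of (6) and Table I.

```python
import math


def picp(targets, lower, upper):
    """PI coverage probability: fraction of targets inside [lower, upper]."""
    covered = sum(1 for y, lo, up in zip(targets, lower, upper) if lo <= y <= up)
    return covered / len(targets)


def pinaw(targets, lower, upper):
    """PI normalized average width: mean width divided by the target range."""
    target_range = max(targets) - min(targets)
    mean_width = sum(up - lo for lo, up in zip(lower, upper)) / len(targets)
    return mean_width / target_range


def cwc(picp_value, pinaw_value, mu=0.90, eta=50.0):
    """Coverage width-based criterion (assumed form, eta is a hypothetical value).

    The step function gamma removes the penalty once the coverage
    constraint PICP >= mu is satisfied.
    """
    gamma = 0.0 if picp_value >= mu else 1.0
    return pinaw_value * (1.0 + gamma * math.exp(-eta * (picp_value - mu)))


# Toy data (hypothetical): the third target lies outside its interval.
targets = [10.0, 12.0, 14.0, 16.0, 20.0]
lower = [9.0, 11.0, 15.0, 15.0, 18.0]
upper = [11.0, 13.0, 17.0, 17.0, 22.0]

p = picp(targets, lower, upper)   # 4 of 5 targets are covered
w = pinaw(targets, lower, upper)  # mean width 2.4 over a target range of 10
print(p, w, cwc(p, w))
```

With these toy values the coverage (0.8) is below μ = 0.9, so the exponential penalty is active and CWC is much larger than PINAW; when the coverage constraint is met, CWC reduces to PINAW.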
TABLE II
PI EVALUATION INDICES AND CONSTRUCTION TIME FOR TEST SAMPLES

From Table II, we can draw the following conclusions.

1) The demonstrated results imply the strong repeatability and stability of the proposed algorithm. Across the five replicates, the results show a high consistency for PICP, PINAW, and CWC. The standard deviations of CWCs for the three case studies are 0.1045, 1.3525, and 1.6602. The obtained results have small variations in different runs.

2) For all the runs and case studies, the preassigned PICP (90%) can be satisfied. This means that the constructed PIs cover the target values with a high probability. It clearly shows that the hard constraint [μ ≤ PICP(ω)] in the problem formulation is successfully met.

3) The median value of PINAW for NSW load is 23.50%, which is obviously larger than the 16.05% of SG load. The average widths of PIs are different for the two case studies. Under a certain PICP, the widths of PIs rely on the uncertainty level of the data sets. NSW load has a higher level of uncertainty, thus its PIs are wider than the PIs of SG load. In addition, the widths of PIs for wind power generation are obviously larger than those for load forecasting. Although the forecast horizon is one-day ahead, PINAWs of PIs are still much larger than those for one-week ahead load forecasting. This strongly shows the high uncertainty of wind power.

4) PI construction time is one of the key characteristics for algorithm design. This is especially true for online applications. Under a hardware configuration of Intel Core2 Duo CPU E8500 3.16 GHz and 4 GB of RAM, the average PI construction time for the test set is 12.30, 14.56, and 4.27 ms, respectively. This is very fast and as simple as point forecasts.

E. Result Comparisons With Benchmark Models

For unbiased comparisons, ARIMA, ES, and naive models are used as three benchmarks for one-week ahead load and one-day ahead wind power generation PIs using the same data sets. Two methods, iterative multistep and direct one-step ARIMA modeling, are tried. For the iterative multistep method, a multiplicative seasonal ARIMA model [48] is built. Double seasonal intraday and intraweek cycles are considered in this method. The iterative multistep method works well on point forecasts. However, the widths of PIs increase very quickly as the number of time steps increases. PINAWs of PIs for one-week ahead are even larger than 100%. These are too large and are of no use for comparisons.

Multistep models have the risk of accumulating errors. Usually, ARIMA models have a better performance for one-step ahead forecasting. The concept of the direct one-step method is to resample the original half-hour interval data sets into a new one-week interval time series. One step of the new time series is one week. Thus, the one-step ahead forecasting on the new time series using ARIMA models can directly construct one-week ahead PIs. For example, to forecast load point y{t}, the chosen time series is

$$y\{t - LT\},\; y\{t - (L - 1)T\},\; y\{t - (L - 2)T\},\; \ldots,\; y\{t - 3T\},\; y\{t - 2T\},\; y\{t - T\}. \tag{10}$$

Here T = 48 × 7 = 336 is the weekly cycle for load forecasting and L is the length of look-back weeks. One year contains 52 weeks; if the look-back length is four years, then L = 52 × 4 = 208. The PI of y{t} is then constructed based on the resampled time series. For the wind power generation forecasts, the forecast horizon is one-day ahead, so the resampling cycle in (10) is T = 24 for 24 h, and the look-back length L is nine months (approximately L = 9 × 30 = 270 days).

The above direct one-step method is conducted on the ARIMA, ES, and naive models to construct 90% PIs. The naive model is similar to the persistence model of point forecasts, which states that the variable's future value will be the same as the last one measured. The simulations of the three benchmark models are implemented in the statistical software R [49]. The numerical results of the three models are listed in Table II. From this table, the proposed PSO-based LUBE method outperforms the ARIMA, ES, and naive models. The quality of PIs has been significantly improved. Because CWC provides a comprehensive evaluation of both aspects of PIs, the following discussions on the improvements are mainly focused on CWC. The percentage improvement is defined as follows:

$$\frac{\text{Compared result} - \text{New result}}{\text{Compared result}} \times 100\%. \tag{11}$$

For the three case studies, the percentage improvements of the median and best CWCs compared with ARIMA, ES, and
naive models are listed in Table III. For the NSW load, all the three benchmarks fail to construct valid PIs with satisfied PICPs. Although PINAWs of the first two models are narrower than that of the proposed method, their PICPs are all unsatisfied and lower than the preassigned value of 90%. Thus, CWCs put a penalty on the violation of the PICP hard constraint. This penalty term is also added to the naive model of Captl_WF. Among all the four methods, the proposed method obtains the best results, whereas the naive models have the worst results for the three case studies. Furthermore, the proposed method uses only one NN model for one test set. In contrast, the three benchmark models apply multiple forecast models. The number of models is equal to the number of test samples.

TABLE III
CWC PERCENTAGE IMPROVEMENTS TO BENCHMARK MODELS

In addition, the LUBE method outperforms the traditional methods on both the quality of PIs and computation time. These advantages have been verified in [20] and [27]. On the one hand, the LUBE method can construct PIs with satisfied PICP and narrower PINAW than the traditional methods. On the other hand, the computational requirement for PI construction of traditional methods is at least 10 times more than that of LUBE methods [27]. Thus, the proposed method can construct higher quality PIs in a shorter time for load and wind power generation forecasting applications.

VII. CONCLUSION

STLF and renewable energy forecasting are of great importance for controlling and scheduling of smart grids. The uncertainty of power systems increases because of the random nature of climate and the penetration of renewable energies such as wind and solar power. To overcome the deficiencies of point forecasts in handling uncertainty, this paper implements the STLF and short-term wind power forecasting using NN-based PIs. PIs are excellent tools for the quantification of uncertainties associated with point forecasts and predictions. Traditional methods for PI construction suffer from various problems. A newly proposed method called the LUBE method is adopted and further developed to construct PIs. The primary multiobjective problem is successfully transformed into a constrained single-objective problem. The advantages of this new problem formulation are that it is closer to the original problem and has fewer parameters than the cost function. PSO, with a strong searching ability for parameter adjustment, is integrated with the mutation operator. With the enhanced searching capacity, PSO is then used to solve the new problem and train the NN models. Correlation analysis is applied to help in choosing the inputs of NN models. Comparative results for both load and wind power generation data sets show that not only are a high PICP and a narrow PINAW obtained, but the PI construction time also remains short. The quality of PIs is significantly improved compared with ARIMA, ES, and naive models. In conclusion, the proposed PSO-based LUBE method constructs higher quality PIs for different types of prediction tasks in a short time.

To preserve the uncertainties in the original demand data sets, we did not give special consideration to holidays. Further improvements can be made by smoothing out the holiday data sets through averaging, separating the weekends from weekdays, and applying enhanced input selection methods. In the future, the proposed method will be applied to model the uncertainties of power systems after the penetration of renewable energy resources. The obtained upper and lower bounds of PIs will be further incorporated into UC scheduling and EDs for decision making and risk assessment in power systems.

REFERENCES

[1] J. Taylor and P. McSharry, “Short-term load forecasting methods: An evaluation based on European data,” IEEE Trans. Power Syst., vol. 22, no. 4, pp. 2213–2219, Nov. 2007.
[2] D. Srinivasan and M. Lee, “Survey of hybrid fuzzy neural approaches to electric load forecasting,” in Proc. IEEE Int. Conf. Syst., Man Cybern., Intell. Syst. 21st Century, vol. 5. Oct. 1995, pp. 4004–4008.
[3] A. Conejo, M. Plazas, R. Espinola, and A. Molina, “Day-ahead electricity price forecasting using the wavelet transform and ARIMA models,” IEEE Trans. Power Syst., vol. 20, no. 2, pp. 1035–1042, May 2005.
[4] J. Deng and P. Jirutitijaroen, “Short-term load forecasting using time series analysis: A case study for Singapore,” in Proc. IEEE Conf. CIS, Jun. 2010, pp. 231–236.
[5] H. Hippert, C. Pedreira, and R. Souza, “Neural networks for short-term load forecasting: A review and evaluation,” IEEE Trans. Power Syst., vol. 16, no. 1, pp. 44–55, Feb. 2001.
[6] R. Sood, I. Koprinska, and V. Agelidis, “Electricity load forecasting based on autocorrelation analysis,” in Proc. IJCNN, Jul. 2010, pp. 1–8.
[7] A. da Silva and L. Moulin, “Confidence intervals for neural network based short-term load forecasting,” IEEE Trans. Power Syst., vol. 15, no. 4, pp. 1191–1196, Nov. 2000.
[8] Q. Yu, H. Tang, K. Tan, and H. Li, “Rapid feedforward computation by temporal encoding and learning with spiking neurons,” IEEE Trans. Neural Netw. Learn. Syst., 2013, doi: 10.1109/TNNLS.2013.2245677.
[9] G. Capizzi, C. Napoli, and F. Bonanno, “Innovative second-generation wavelets construction with recurrent neural networks for solar radiation forecasting,” IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 11, pp. 1805–1815, Nov. 2012.
[10] A. Khosravi, S. Nahavandi, D. Creighton, and D. Srinivasan, “Interval type-2 fuzzy logic systems for load forecasting: A comparative study,” IEEE Trans. Power Syst., vol. 27, no. 3, pp. 1274–1282, Aug. 2012.
[11] D. Hidalgo, P. Melin, and O. Castillo, “An optimization method for designing type-2 fuzzy inference systems based on the footprint of uncertainty using genetic algorithms,” Expert Syst. Appl., vol. 39, no. 4, pp. 4590–4598, 2012.
[12] J. R. Castro, O. Castillo, P. Melin, O. Mendoza, and A. Rodríguez-Díaz, “An interval type-2 fuzzy neural network for chaotic time series prediction with cross-validation and Akaike test,” in Soft Computing for Intelligent Control and Mobile Robotics. New York, NY, USA: Springer-Verlag, 2011, pp. 269–285.
[13] C. Potter and M. Negnevitsky, “Very short-term wind forecasting for Tasmanian power generation,” IEEE Trans. Power Syst., vol. 21, no. 2, pp. 965–972, May 2006.
[14] P. Melin, J. Soto, O. Castillo, and J. Soria, “A new approach for time series prediction using ensembles of ANFIS models,” Expert Syst. Appl., vol. 39, no. 3, pp. 3494–3506, 2012.
[15] N. Amjady, F. Keynia, and H. Zareipour, “Wind power prediction by a new forecast engine composed of modified hybrid neural network and enhanced particle swarm optimization,” IEEE Trans. Sustainable Energy, vol. 2, no. 3, pp. 265–276, Jul. 2011.
[16] Y. Atwa, E. El-Saadany, M. Salama, and R. Seethapathy, “Optimal renewable resources mix for distribution system energy loss minimization,” IEEE Trans. Power Syst., vol. 25, no. 1, pp. 360–370, Feb. 2010.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

12 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

[17] Y. Makarov, P. Etingov, J. Ma, Z. Huang, and K. Subbarao, “Incorpo- [40] V. Yadav and D. Srinivasan, “A SOM-based hybrid linear-neural model
rating uncertainty of wind power generation forecast into power system for short-term load forecasting,” Neurocomputing, vol. 74, no. 17,
operation, dispatch, and unit commitment procedures,” IEEE Trans. pp. 2874–2885, 2011.
Sustainable Energy, vol. 2, no. 4, pp. 433–442, Oct. 2011. [41] S.-T. Chen, D. Yu, and A. Moghaddamjo, “Weather sensitive short-
[18] A. Khosravi, S. Nahavandi, and D. Creighton, “Construction of optimal term load forecasting using nonfully connected artificial neural network,”
prediction intervals for load forecasting problems,” IEEE Trans. Power IEEE Trans. Power Syst., vol. 7, no. 3, pp. 1098–1105, Aug. 1992.
Syst., vol. 25, no. 3, pp. 1496–1503, Aug. 2010. [42] S. Fan and R. Hyndman, “Short-term load forecasting based on a semi-
[19] A. Khosravi, E. Mazloumi, S. Nahavandi, D. Creighton, and J. van parametric additive model,” IEEE Trans. Power Syst., vol. 27, no. 1,
Lint, “Prediction intervals to account for uncertainties in travel pp. 134–141, Feb. 2012.
time prediction,” IEEE Trans. Intell. Transp. Syst., vol. 12, no. 2, [43] A. Luchetta, “Automatic generation of the optimum threshold for
pp. 537–547, Jun. 2011. parameter weighted pruning in multiple heterogeneous output neural
[20] H. Quan, D. Srinivasan, and A. Khosravi, “Construction of neural networks,” Neurocomputing, vol. 71, nos. 16–18, pp. 3553–3560, 2008.
network-based prediction intervals using particle swarm optimization,” [44] P. L. Narasimha, W. H. Delashmit, M. T. Manry, J. Li, and F. Maldon-
in Proc. IJCNN, Jun. 2012, pp. 647–653. ado, “An integrated growing-pruning method for feedforward network
[21] H. Quan, D. Srinivasan, A. Khosravi, S. Nahavandi, and D. Creighton, training,” Neurocomputing, vol. 71, nos. 13–15, pp. 2831–2847, 2008.
“Construction of neural network-based prediction intervals for short- [45] E. Ricci and R. Perfetti, “Improved pruning strategy for radial basis
term electrical load forecasting,” in Proc. IEEE Symp. CIASG, Apr. 2013, function networks with dynamic decay adjustment,” Neurocomputing,
pp. 66–72. vol. 69, nos. 13–15, pp. 1728–1732, 2006.
[46] S. F. Christian and C. Lebiere, “The cascade-correlation learning archi-
[22] C. Chatfield, “Calculating interval forecasts,” J. Business Econ. Stat.,
tecture,” in Advances in Neural Information Processing Systems, vol. 2.
vol. 11, no. 2, pp. 121–135, Apr. 1993.
San Mateo, CA, USA: Morgan Kaufmann, 1990, pp. 524–532.
[23] C. Sheng, J. Zhao, W. Wang, and H. Leung, “Prediction intervals
[47] L. Ma and K. Khorasani, “New training strategies for constructive neural
for a noisy nonlinear time series based on a bootstrapping reservoir
networks with application to regression problems,” Neural Netw., vol. 17,
computing network ensemble,” IEEE Trans. Neural Netw. Learn. Syst.,
no. 4, pp. 589–609, 2004.
vol. 24, no. 7, pp. 1036–1048, Jul. 2013.
[48] D. Yang, P. Jirutitijaroen, and W. M. Walsh, “Hourly solar irradiance
[24] G. Chryssolouris, M. Lee, and A. Ramsey, “Confidence interval pre- time series forecasting using cloud cover index,” Solar Energy, vol. 86,
diction for neural network models,” IEEE Trans. Neural Netw., vol. 7, no. 12, pp. 3531–3543, 2012.
no. 1, pp. 229–232, Jan. 1996. [49] R. Shumway and D. Stoffer, Time Series Analysis and Its Applications:
[25] J. T. G. Hwang and A. A. Ding, “Prediction intervals for artificial neural With R Examples (Springer Texts in Statistics). New York, NY, USA:
networks,” J. Amer. Stat. Assoc., vol. 92, no. 438, pp. 748–757, 1997. Springer-Verlag, 2010.
[26] T. Heskes, “Practical confidence and prediction intervals,” in Advances in
Neural Information Processing Systems, vol. 9. Cambridge, MA, USA:
MIT Press, 1997, pp. 176–182.
[27] A. Khosravi, S. Nahavandi, D. Creighton, and A. F. Atiya, “Lower
upper bound estimation method for construction of neural network-
based prediction intervals,” IEEE Trans. Neural Netw., vol. 22, no. 3,
pp. 337–346, Mar. 2011.
[28] A. Khosravi, S. Nahavandi, and D. Creighton, “Prediction interval con-
struction and optimization for adaptive neurofuzzy inference systems,”
IEEE Trans. Fuzzy Syst., vol. 19, no. 5, pp. 983–988, Oct. 2011.
[29] F. Valdez, P. Melin, and O. Castillo, “Evolutionary method combining
particle swarm optimisation and genetic algorithms using fuzzy logic
for parameter adaptation and aggregation: The case neural network
optimisation for face recognition,” Int. J. Artif. Intell. Soft Comput.,
vol. 2, nos. 1–2, pp. 77–102, 2010.
[30] M. Pulido, P. Melin, and O. Castillo, “Genetic optimization of ensemble
neural networks for complex time series prediction,” in Proc. IJCNN,
2011, pp. 202–206.
[31] W.-C. Yeh, “New parameter-free simplified swarm optimization for
artificial neural network training and its application in the prediction
of time series,” IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 4,
pp. 661–665, Apr. 2013.
[32] A. Khosravi, S. Nahavandi, D. Creighton, and A. F. Atiya, “Compre-
hensive review of neural network-based prediction intervals and new
advances,” IEEE Trans. Neural Netw., vol. 22, no. 9, pp. 1341–1356,
Sep. 2011.
[33] K. Zielinski and R. Laur, “Constrained single-objective optimization
using particle swarm optimization,” in Proc. IEEE CEC, Jul. 2006,
pp. 443–450.
[34] G. Pulido and C. Coello, “A constraint-handling mechanism for par-
ticle swarm optimization,” in Proc. IEEE CEC, vol. 2. Jun. 2004,
pp. 1396–1403.
[35] A. Pavelka and A. Proch, “Algorithms for initialization of neural network
weights random numbers in MATLAB,” in Proc. Control Eng., vol. 2. 2004,
pp. 453–459.
[36] D. Nguyen and B. Widrow, “Improving the learning speed of 2-layer
neural networks by choosing initial values of the adaptive weights,” in
Proc. IJCNN, vol. 3. Jun. 1990, pp. 21–26.
[37] R. Eberhart and J. Kennedy, “A new optimizer using particle swarm
theory,” in Proc. 6th Int. Symp. MHS, Oct. 1995, pp. 39–43.
[38] B. Bowerman, R. O’Connell, and A. Koehler, Forecasting, Time Series,
and Regression: An Applied Approach (Duxbury Applied Series). Bel-
mont, CA, USA: Thomson Brooks/Cole, 2005.
[39] D. Srinivasan, Z. Guofan, A. Khosravi, S. Nahavandi, and D. Creighton,
“Hybrid neural-evolutionary model for electricity price forecasting,” in
Proc. IJCNN, Jul./Aug. 2011, pp. 3164–3169.

Hao Quan (S’11) received the B.Eng. degree in water conservancy and
hydropower engineering from the Huazhong University of Science and
Technology, Wuhan, China, in 2008. He is currently pursuing the Ph.D.
degree with the Department of Electrical and Computer Engineering,
National University of Singapore (NUS), Singapore.
He joined NUS in 2011. His current research interests include
uncertainty modeling in distributed power systems, forecasting, neural
networks, and unit commitment scheduling.

Dipti Srinivasan (SM’02) received the M.Eng. and Ph.D. degrees in
electrical engineering from the National University of Singapore,
Singapore, in 1991 and 1994, respectively.
She was with the University of California at Berkeley’s Computer Science
Division, Berkeley, CA, USA, as a Post-Doctoral Researcher, from 1994
to 1995. In June 1995, she joined the Faculty of the Electrical and
Computer Engineering Department, National University of Singapore, where
she is an Associate Professor. From 1998 to 1999, she was a Visiting
Faculty with the Department of Electrical and Computer Engineering,
Indian Institute of Science, Bangalore, India. Her current research
interests include the development of hybrid neural network architectures,
learning methods and their practical applications for large complex
engineered systems, such as the electric power system and urban
transportation systems.
Dr. Srinivasan is currently serving as an Associate Editor of the IEEE
TRANSACTIONS ON NEURAL NETWORKS and the IEEE TRANSACTIONS ON
INTELLIGENT TRANSPORTATION SYSTEMS. She received the IEEE PES
Outstanding Engineer Award in 2010.

Abbas Khosravi (M’07) received the B.Sc. degree
in electrical engineering from the Sharif University
of Technology, Tehran, Iran, in 2002, the M.Sc. degree in
electrical engineering from the Amirkabir University
of Technology, Tehran, in 2005, and the Ph.D.
degree from Deakin University, Burwood, Australia,
in 2010.
He was with the eXiT Group, University of
Girona, Girona, Spain, from 2006 to 2007, conduct-
ing research in artificial intelligence. Currently, he
is a Research Fellow with the Centre for Intelligent
Systems Research, Deakin University. His current research interests include
theory and application of neural networks and fuzzy logic systems for
modeling, analysis, control, and optimization of operations within complex
systems.
Mr. Khosravi is a recipient of the Alfred Deakin Post-Doctoral Research
Fellowship in 2011.
