DOI 10.1007/s00521-013-1480-1
ORIGINAL ARTICLE
Abstract  Modeling and forecasting of time series data are integral parts of many scientific and engineering applications. Increasing the precision of the performed forecasts is highly desirable but a difficult task, facing a number of mathematical as well as decision-making challenges. This paper presents a novel approach for linearly combining multiple models in order to improve time series forecasting accuracy. Our approach is based on the assumption that each future observation of a time series is a linear combination of the arithmetic mean and median of the forecasts from all the participating models, together with a random noise. The proposed ensemble is constructed with five different forecasting models and is tested on six real-world time series. The obtained results demonstrate that the forecasting accuracies are significantly improved through our combination mechanism. A nonparametric statistical analysis is also carried out to show the superior forecasting performance of the proposed ensemble scheme over the individual models as well as a number of other forecast combination techniques.

Keywords  Time series · Forecast combination · Box-Jenkins models · Artificial neural networks · Elman networks · Support vector machines

R. Adhikari (✉) · R. K. Agrawal
School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi 110067, India
e-mail: adhikari.ratan@gmail.com
R. K. Agrawal
e-mail: rkajnu@gmail.com

1 Introduction

Many natural as well as synthetic phenomena can be expressed in terms of time series, which are sequential collections of observations measured over successive times. Analysis and forecasting of time series data have fundamental importance in various scientific and engineering applications. As such, improving forecasting accuracy is a matter of constant attention among researchers [1]. Various linear and nonlinear forecasting models are the outcomes of extensive work in this area during the past three decades [1–3]. However, due to the stochastic nature of a time series, it is evident that no single model alone can capture all the intrinsic details of the associated data-generating process [4]. Hence, it is quite risky and inappropriate to rely upon only one of the available models for forecasting future data. On the other hand, combining forecasts from conceptually different models is a reliable approach for decreasing the model selection risk while at the same time improving overall forecasting precision. Moreover, the accuracies obtained through combining forecasts are often better than those of any individual model in isolation [4–6].

Intrigued by their strengths and benefits, many forecast combination algorithms have been developed during the last two decades. At present, selecting the most promising combination scheme for a particular application is itself a nontrivial task [5]. Here, it is worth mentioning that a combination of multiple forecasts attempts to enhance forecasting precision at the expense of increased computational complexity. Obviously, a balanced trade-off between the accuracy improvement and the associated computational cost is highly desirable in an ideal combination scheme. An extensive body of literature has shown that simple methods of combining often provide much better accuracies than more complicated and sophisticated techniques [6, 7]. Thus, for combining forecasts, one should use straightforward tools and avoid intricacies.

The simple average, in which all component forecasts are weighted equally, is so far the most elementary, yet
Neural Comput & Applic
One such benchmark technique, due to Zhang [24], adopts a hybridization of autoregressive integrated moving average (ARIMA) [2, 3, 24] and artificial neural network (ANN) models [2, 4, 24]. In this hybrid mechanism, the linear correlation structure of the time series is modeled through ARIMA, and then the remaining residuals, which contain only the nonlinear part, are modeled through an ANN. With three real-world datasets, Zhang [24] showed that his hybrid scheme provided reasonably improved accuracies and also outperformed each component model. Recently, Khashei and Bijari [25, 26] further explored this approach, thereby suggesting a similar but slightly modified and more robust combination method.

Forecast combinations through nonlinear techniques have been analyzed as well in the time series literature, but to a limited extent. This is mainly due to the lack of recognized studies which document the success of such schemes [5]. Adhikari and Agrawal [27] recently developed a nonlinear weighted-ensemble technique that considers both the individual forecasts as well as the correlations among pairs of forecasts. Their scheme was able to provide reasonably enhanced forecasting accuracies for three popular time series datasets.

In spite of several improved techniques, the robustness and efficiency of elementary statistical combination methods are always appreciated in the literature [1]. Much empirical evidence shows that the naïve simple average notably outperforms more complex ensemble schemes [7, 28]. A robust alternative to the simple average is the trimmed mean, in which forecasts are averaged after excluding an equal percentage of the highest and lowest forecasts [5, 7, 8]. In a recent comprehensive study, Jose and Winkler [7] found that trimmed means were able to provide slightly more accurate results than simple averages and reduced the risk of high errors. But hitherto, there is no rigorous method for selecting the exact amount of trimming. The median (i.e., the ultimate trimming) has also been studied as an alternative to the simple average, with varied results [7–12]. Thus, it seems advantageous to adequately combine both the simple average and the median.

3 The proposed combination method

3.1 Formulation of the proposed combination method

Let $\mathbf{Y} = [y_1, y_2, \ldots, y_N]^T \in \mathbb{R}^N$ be the actual testing dataset of a time series, which is forecasted using $n$ different models, and let $\hat{\mathbf{Y}}^{(i)} = \left[\hat{y}_1^{(i)}, \hat{y}_2^{(i)}, \ldots, \hat{y}_N^{(i)}\right]^T$ be the $i$th forecast of $\mathbf{Y}$ $(i = 1, 2, \ldots, n)$. Let $u_j$ and $v_j$, respectively, be the mean and median of $\left\{\hat{y}_j^{(1)}, \hat{y}_j^{(2)}, \ldots, \hat{y}_j^{(n)}\right\}$, $j = 1, 2, \ldots, N$. Then, the proposed combined forecast of $\mathbf{Y}$ is defined as $\hat{\mathbf{Y}} = [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N]^T$, where

$$\hat{y}_j = \begin{cases} v_j, & \text{for } \alpha = 0 \\ u_j, & \text{for } \alpha = 1 \\ \alpha u_j + (1-\alpha) v_j + e_j, & \text{for } 0 < \alpha < 1 \end{cases} \qquad (1)$$

$$e_j \sim N(0, \sigma^2) \quad \forall j = 1, 2, \ldots, N.$$

In Eq. 1, $\{e_j \mid j = 1, 2, \ldots, N\}$ is assumed to be a white noise process, i.e., a sequence of independent, identically distributed (i.i.d.) random variables which follow the normal distribution with zero mean and a constant variance $\sigma^2$. These white noise terms are introduced as the trade-offs between the accuracy improvement and a proper combination of the two averages. The parameter $\alpha$ manages the contributions of the two averages in the final combined forecast. The median is more dominating for $0 \le \alpha < 0.5$, whereas the simple average is more dominating for $0.5 < \alpha \le 1$. The proposed ensemble scheme can be written in vector form as follows:

$$\hat{\mathbf{Y}} = \alpha \mathbf{U} + (1-\alpha)\mathbf{V} + \mathbf{E}, \qquad 0 \le \alpha \le 1 \qquad (2)$$

where $\mathbf{U} = [u_1, u_2, \ldots, u_N]^T$ and $\mathbf{V} = [v_1, v_2, \ldots, v_N]^T$ are, respectively, the vectors of means and medians, and $\mathbf{E} = [e_1, e_2, \ldots, e_N]^T$ is the vector of the white noise terms.

3.2 Selection of the tuning parameters

Our proposed scheme combines the simple average and median of the individual forecasts in an unbiased manner. The success of the scheme solely depends on the suitable selection of the parameters $\alpha$ and $\sigma$. Here, we suggest effective techniques for selecting these parameters.

For selecting $\alpha$, first, we consider one of the two ranges: $0 \le \alpha < 0.5$ (median dominating) or $0.5 < \alpha \le 1$ (mean dominating). In either of these ranges, we vary $\alpha$ in an arithmetic progression with a certain step size (i.e., common difference) $s$. The desired value of $\alpha$ is then taken to be the mean of its values in this particular range and is denoted as $\alpha^*$. The precise mathematical formulation of $\alpha^*$ is given as follows:

$$\alpha^* = \frac{1}{N_s} \sum_{i=1}^{N_s} \left[ l + (i-1)s \right] \qquad (3)$$

where $N_s = 0.5/s$ and $l = 0, 0.5$ in the ranges $0 \le \alpha < 0.5$ and $0.5 < \alpha \le 1$, respectively.

It is obvious that smaller values of the step size $s$ ensure better estimation of the final combined forecast. The value $s = 0.01$ is used for all empirical works in this paper.
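As a minimal illustration, the element-wise mean-and-median combination of Eq. 2 can be sketched in Python/NumPy for a chosen $\alpha$ and noise level. This is not the authors' code (the paper's experiments were carried out in MATLAB), and the function name is ours:

```python
import numpy as np

def combine_forecasts(forecasts, alpha, sigma=0.0, rng=None):
    """Combine n individual forecasts of length N as alpha*U + (1-alpha)*V + E (Eq. 2).

    forecasts : array-like of shape (n, N), one row per forecasting model
    alpha     : weight of the mean vector U; (1 - alpha) weights the median vector V
    sigma     : standard deviation of the white noise terms e_j ~ N(0, sigma^2)
    """
    forecasts = np.asarray(forecasts, dtype=float)
    U = forecasts.mean(axis=0)        # element-wise mean of the n forecasts
    V = np.median(forecasts, axis=0)  # element-wise median of the n forecasts
    rng = np.random.default_rng() if rng is None else rng
    E = rng.normal(0.0, sigma, size=U.shape)  # white noise vector
    return alpha * U + (1.0 - alpha) * V + E

# With alpha = 1 and no noise, the combination reduces to the simple average:
f = [[1.0, 2.0], [3.0, 2.0], [5.0, 8.0]]
print(combine_forecasts(f, alpha=1.0))  # -> [3. 4.]
```

With `alpha=0.0` the same call returns the element-wise medians, matching the boundary cases of Eq. 1.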
The nature of the white noise terms has a crucial impact on the success of our combination scheme, and so the noise variance $\sigma^2$ must be chosen with utmost care. The value of $\sigma^2$ is closely related to the deviation between the simple average and median of the forecasts. Keeping this fact in mind, we suggest choosing $\sigma^2$ as the variance of the difference between the mean and median vectors, i.e.,

$$\sigma^2 = \mathrm{var}(\mathbf{U} - \mathbf{V}) \qquad (4)$$

Equation 4 provides a rational as well as robust method for selecting the noise variance in our combination scheme. After choosing the two tuning parameters $\alpha$ and $\sigma$ through Eqs. 3 and 4, the combined forecast vector is given by

$$\hat{\mathbf{Y}} = \alpha^* \mathbf{U} + (1-\alpha^*)\mathbf{V} + \mathbf{E} \qquad (5)$$

The requisite steps of our combination method are concisely summarized in Algorithm 1.

Algorithm 1  The proposed linear combination of multiple forecasts

Inputs: The individual forecasts $\hat{\mathbf{Y}}^{(i)}$ $(i = 1, 2, \ldots, n)$ of $\mathbf{Y}$
Output: The combined forecast $\hat{\mathbf{Y}}$
1. Compute the mean vector $\mathbf{U}$ and the median vector $\mathbf{V}$ from the individual forecasts.
2. Select a certain step size $s$ and a range for $\alpha$ from either $0 \le \alpha < 0.5$ or $0.5 < \alpha \le 1$.
3. Compute the values of the parameters $\alpha^*$ and $\sigma^2$ through Eqs. 3 and 4, respectively.
4. Generate $N$ i.i.d. random variables $e_j \sim N(0, \sigma^2)$ $(j = 1, 2, \ldots, N)$ and define the vector $\mathbf{E} = [e_1, e_2, \ldots, e_N]^T$.
5. Finally, calculate the proposed combined forecast vector through Eq. 5.

A schematic depiction of our proposed combination method is presented in Fig. 1.

3.3 Salient features of our proposed combination mechanism

1. The primary advantage of the proposed scheme is that it reduces the overall forecasting error in a more precise manner than various other ensemble mechanisms. In our method, a large extent of the error is already reduced through the simple average and median of the individual forecasts. Then, combining these averages results in a further reduction of the forecasting error. Hence, the proposed methodology is evidently better than directly combining the forecasts from the individual models.
2. The proposed method benefits from the forecasting skills of diverse constituent models, unlike some others, which combine only a few particular ones. For example, Zhang [24] suggested a hybridization of two models, viz. ARIMA and ANN. Similarly, Tseng et al. [29] suggested the ensemble scheme SARIMABP, which combines seasonal ARIMA (SARIMA) and backpropagation ANN (BP-ANN) models for seasonal time series forecasting. However, in many situations, our linear ensemble method can be apparently more accurate, as it combines a larger number of competing forecasting models.
3. Considering the wide pool of available forecast combination schemes, a major challenge nowadays faced by the time series research community is to select the most appropriate method of combining forecasts. Our proposed mechanism improves the forecasting accuracy as well as diminishes the model selection risk to a great extent. The formulation of our method suggests that it apparently performs much better than both the simple average and the median. As such, the proposed scheme provides a potentially good choice in the domain of time series forecast combination.
4. Our proposed scheme is notably simple and much more computationally efficient than various existing combination methods. This is due to the fact that many sophisticated methods, e.g., RLS, dynamic RLS, NRLS, outperformance, etc., require repeated in-sample applications of the constituent forecasting models, thus entailing large amounts of computational time. On the contrary, our proposed method applies the component models only once and, hence, saves a lot of the associated computations.

4 The component forecasting models

The effectiveness of a forecast combination mechanism depends a lot on the constituent models. Several studies document that, for a good combination scheme, the component forecasting models need to be as diverse and competent as possible [4, 8, 21]. Armstrong [8], in his extensive study on combining forecasts, further emphasized using four to five constituent models for achieving maximum combined accuracy. Based on these studies and recommendations, here we use the following five diverse forecasting models to build up our proposed ensemble:

• The Autoregressive Integrated Moving Average (ARIMA) [2, 3].
• The Support Vector Machine (SVM) [30].
• The iterated Feedforward Artificial Neural Network (FANN) [2, 25, 32, 33].
• The iterated Elman ANN (EANN) [5, 34].
• The direct EANN [5, 34].

In the forthcoming subsections, we concisely describe these five forecasting techniques.
4.1 The ARIMA model

ARIMA models are the most extensively used statistical techniques for time series forecasting. They were developed in the benchmark work of Box and Jenkins [3] and are also commonly known as the Box-Jenkins models. The underlying hypothesis of these models is that the associated time series is generated from a linear combination of predefined numbers of past observations and random error terms. A typical ARIMA model is mathematically given by

$$\phi(L)(1-L)^d y_t = \theta(L)\varepsilon_t \qquad (6)$$

where

$$\phi(L) = 1 - \sum_{i=1}^{p} \phi_i L^i, \quad \theta(L) = 1 + \sum_{j=1}^{q} \theta_j L^j, \quad L y_t = y_{t-1}.$$

The parameters $p$, $d$, and $q$ are, respectively, the number of autoregressive terms, the degree of differencing, and the number of moving average terms; $y_t$ is the actual time series observation, and $\varepsilon_t$ is a white noise term. The white noise terms are basically i.i.d. normal variables with zero mean and a constant variance. It is customary to refer to the model represented through Eq. 6 as the ARIMA $(p, d, q)$ model. This model effectively converts a nonstationary time series to a stationary one through a series of ordinary differencing processes. A single differencing is enough for most applications. The appropriate ARIMA model parameters are usually determined through the well-known Box-Jenkins three-step iterative model building methodology [3, 24].

The ARIMA $(0, 1, 0)$ model, i.e., $y_t - y_{t-1} = \varepsilon_t$, is in particular known as the random walk (RW) model and is commonly used for modeling nonstationary data [24]. Box and Jenkins [3] further generalized the basic ARIMA model to forecast seasonal time series as well, and this extended model is referred to as the seasonal ARIMA (SARIMA). A SARIMA $(p, d, q) \times (P, D, Q)_s$ model adopts an additional seasonal differencing process to remove the effect of seasonality from the dataset. Like ARIMA $(p, d, q)$, the parameters $(p, P)$, $(q, Q)$, and $(d, D)$ of a SARIMA model represent the autoregressive, moving average, and differencing terms, respectively, and $s$ denotes the period of seasonality.

4.2 The SVM model

SVM is a relatively recent statistical learning technique, originally developed by Vapnik [30]. It is based on the principle of Structural Risk Minimization (SRM), whose objective is to find a decision rule with good generalization ability through selecting some special training data points, viz. the support vectors [30, 31]. Time series forecasting is a branch of support vector regression (SVR) problems in which an optimal separating hyperplane is constructed to correctly classify real-valued outputs. But the explicit knowledge of this mapping is avoided through the use of a kernel function that satisfies Mercer's condition [31]. Given a training dataset of $N$ points $\{\mathbf{x}_i, y_i\}_{i=1}^{N}$ with $\mathbf{x}_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}$, the goal of SVM is to approximate the unknown data-generating function in the following form:

$$f(\mathbf{x}, \mathbf{w}) = \mathbf{w} \cdot \varphi(\mathbf{x}) + b \qquad (7)$$

where $\mathbf{w}$ is the weight vector, $\mathbf{x}$ is the input vector, $\varphi$ is the nonlinear mapping to a higher dimensional feature space, and $b$ is the bias term.

Using Vapnik's $\varepsilon$-insensitive loss function, given in Eq. 8, the SVM regression is converted to a quadratic programming problem (QPP) to minimize the empirical risk, as defined in Eq. 9.

$$L_{\varepsilon}(y, f(\mathbf{x}, \mathbf{w})) = \begin{cases} 0, & \text{if } |y - f(\mathbf{x}, \mathbf{w})| \le \varepsilon \\ |y - f(\mathbf{x}, \mathbf{w})| - \varepsilon, & \text{otherwise} \end{cases} \qquad (8)$$

$$J(\mathbf{w}, \xi_i, \xi_i^*) = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} \left(\xi_i + \xi_i^*\right). \qquad (9)$$

In Eq. 9, $C$ is the positive regularization constant that assigns a penalty to misfit, and $\xi_i$, $\xi_i^*$ are the nonnegative slack variables.
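The $\varepsilon$-insensitive loss of Eq. 8 is simple to express in code: it is zero inside the $\varepsilon$-tube around the regression function and grows linearly with the excess absolute residual outside it. A small NumPy sketch (function name ours):

```python
import numpy as np

def eps_insensitive_loss(y, f, eps):
    """Vapnik's epsilon-insensitive loss (Eq. 8): zero inside the eps-tube,
    |residual| - eps outside it."""
    r = np.abs(np.asarray(y, dtype=float) - np.asarray(f, dtype=float))
    return np.where(r <= eps, 0.0, r - eps)

# Residuals 0.2, 0.0, 2.0 with eps = 0.5 give losses 0.0, 0.0, 1.5:
print(eps_insensitive_loss([1.0, 2.0, 5.0], [1.2, 2.0, 3.0], eps=0.5))
```

Only the points falling outside the tube incur slack ($\xi_i$, $\xi_i^*$ in Eq. 9) and can become support vectors.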
After solving the associated QPP, the optimal decision hyperplane is given by

$$y(\mathbf{x}) = \sum_{i=1}^{N_s} (\alpha_i - \alpha_i^*) K(\mathbf{x}, \mathbf{x}_i) + b_{\mathrm{opt}} \qquad (10)$$

where $N_s$ is the number of support vectors, $\alpha_i$ and $\alpha_i^*$ $(i = 1, 2, \ldots, N_s)$ are the Lagrange multipliers, $b_{\mathrm{opt}}$ is the optimal bias, and $K(\mathbf{x}, \mathbf{x}_i)$ is the kernel function.

From the several choices for the SVM kernel function, the Radial Basis Function (RBF) kernel [31], defined as $K(\mathbf{x}, \mathbf{y}) = \exp\left(-\|\mathbf{x} - \mathbf{y}\|^2 / 2\sigma^2\right)$, $\sigma$ being a tuning parameter, is used in this paper. While fitting the SVM models, the associated hyperparameters $C$ and $\sigma$ are determined by precisely following the grid-search technique, as recommended and utilized by Chapelle [35].

4.3 The FANN model

Initially inspired by the biological structure of the human brain, ANNs have gradually achieved great success and recognition in versatile domains, including time series forecasting. The major advantage of ANNs is their nonlinear, flexible, and model-free nature [4, 25, 33]. ANNs have a remarkable ability to adaptively recognize relationships in input data, learn from experience, and then utilize the gained knowledge to predict unseen future patterns. Unlike other nonlinear statistical models, ANNs do not require any information about the intrinsic data-generating process. Moreover, an ANN can always be designed to approximate any nonlinear continuous function as closely as desired. For this reason, ANNs are referred to as universal function approximators [24, 33].

The most common ANN architecture used in time series forecasting is the multilayer perceptron (MLP). An MLP is a feedforward architecture of an input layer, one or more hidden layers, and an output layer, arranged in such a way that each layer consists of several interconnected nodes, which transmit processed information to the next layer. It is also known as an FANN model. An FANN with a single hidden layer is often sufficient for practical time series forecasting applications, and so FANNs with single hidden layers are considered in this paper. There are two extensively popular approaches for multi-periodic forecasts through FANNs, viz. iterative and direct [5, 32]. An iterative approach consists of one neuron in the output layer, and the value of the next period is forecasted using the current predicted value as one of the inputs. On the contrary, the number of output neurons in a direct approach is precisely equal to the forecasting horizon, i.e., the number of future observations to be forecasted. In short-term forecasting, the direct method is usually more accurate than its iterative counterpart, but there is no firm conclusion in this regard [32]. The network structures for the iterative and direct FANN forecasting methods are shown in Fig. 2a, b, respectively.

[Fig. 2 The FANN architectures for the following: a iterative forecasting, b direct forecasting]

4.4 The EANN model

Relatively recently, EANNs have attracted notable attention from the time series forecasting community. An EANN has a recurrent network structure that differs from a common feedforward ANN through the inclusion of an extra context layer and feedback connections [34]. The context layer is continuously fed back by the outputs from the hidden layer, and as such it acts as a reservoir of past information. This recurrence imparts robustness and dynamism to the network so that it can perform temporal nonlinear mappings. The architecture of an EANN model is shown in Fig. 3. EANNs generally provide better forecasting accuracies than FANNs due to the introduction of additional memory units. But they require more network weights, especially hidden nodes, in order to properly model the associated temporal relationship. However, there is no rigorous
guideline in the literature for selecting the optimal structure of an EANN model [5]. In this paper, we set the number of hidden nodes to 25 and the training algorithm to traingdx [36] for all EANN models.

5 Empirical results and discussions

Six real-world time series from different domains are used in order to empirically examine the performance of our proposed ensemble scheme. These are collected from the Time Series Data Library (TSDL) [37], a publicly available online repository of a wide variety of time series datasets. Table 1 presents the descriptions of these six time series, and Fig. 4 depicts their corresponding time plots. The horizontal and vertical axes of each time plot, respectively, represent the indices and actual values of successive observations. Here, we are considering short-term forecasting, and so the size of the testing dataset for each time series is kept reasonably small.

All experiments are performed in MATLAB. The default neural network toolbox [36] is used for the FANN and EANN models. The forecasting accuracies are evaluated through the mean-squared error (MSE) and the symmetric mean absolute percentage error (SMAPE), which are defined as follows:

$$\mathrm{MSE} = \frac{1}{N} \sum_{t=1}^{N} (y_t - \hat{y}_t)^2 \qquad (11)$$

$$\mathrm{SMAPE} = \frac{1}{N} \sum_{t=1}^{N} \frac{|y_t - \hat{y}_t|}{(y_t + \hat{y}_t)/2} \times 100 \qquad (12)$$

where $y_t$ and $\hat{y}_t$ are, respectively, the actual and forecasted observations.

Table 1  Descriptions of the six time series

| Time series | Description | Nature | Total size | Testing size |
|---|---|---|---|---|
| Canadian Lynx (LYNX) | Number of lynx trapped per year in the Mackenzie River district of Northern Canada (1821–1934) | Stationary, nonseasonal | 114 | 14 |
| Sunspots (SNSPOT) | The annual number of observed sunspots (1700–1987) | Stationary, nonseasonal | 288 | 67 |
| USA Real GNP (RGNP) | Real GNP of USA in billions of dollars (1890–1974) | Nonstationary, nonseasonal | 85 | 15 |
| Child Births (BIRTHS) | Births per 10,000 of 23-year-old women in USA (1917–1975) | Nonstationary, nonseasonal | 59 | 10 |
| Airline Passengers (AP) | Monthly number of international airline passengers (in thousands) (January 1949–December 1960) | Monthly, seasonal | 144 | 12 |
| USA Expenditure (UE) | Quarterly new plant/equipment expenditures in USA (1964–1976) | Quarterly, seasonal | 52 | 8 |
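The two error measures of Eqs. 11 and 12 can be computed as follows. This is an illustrative NumPy sketch under our own naming (the paper's evaluations were run in MATLAB):

```python
import numpy as np

def mse(y, y_hat):
    """Mean-squared error (Eq. 11)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean((y - y_hat) ** 2)

def smape(y, y_hat):
    """Symmetric mean absolute percentage error (Eq. 12), in percent.
    Each absolute error is scaled by the mean of the actual and forecasted values."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return 100.0 * np.mean(np.abs(y - y_hat) / ((y + y_hat) / 2.0))

y, y_hat = [100.0, 200.0], [110.0, 180.0]
# MSE = (10^2 + 20^2)/2 = 250; SMAPE = 100 * mean(10/105, 20/190) ~ 10.03 %
print(round(mse(y, y_hat), 2), round(smape(y, y_hat), 2))
```

Unlike the ordinary MAPE, the symmetric form of Eq. 12 penalizes over- and under-forecasts of the same magnitude comparably, since the denominator uses the average of $y_t$ and $\hat{y}_t$.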
[Fig. 4 Time plots of the following: a LYNX, b SNSPOT, c RGNP, d BIRTHS, e AP, f UE]
Tables 3 and 4  The MSE and SMAPE values of the forecasting methods for the six time series

| Method | LYNX (MSE, SMAPE) | SNSPOT (MSE, SMAPE) | RGNP (MSE, SMAPE) | BIRTHS (MSE, SMAPE) | AP (MSE, SMAPE) | UE (MSE, SMAPE) |
|---|---|---|---|---|---|---|
| ARIMA | 0.0153, 3.40 | 803.34, 44.44 | 1,906.05, 4.92 | 98.00, 4.71 | 0.0291, 2.48 | 1.88, 4.01 |
| SVM | 0.0527, 6.12 | 792.96, 33.375 | 1,194.95, 4.00 | 61.87, 4.53 | 0.0177, 2.35 | 1.61, 3.51 |
| Iterated FANN | 0.0319, 5.25 | 669.03, 40.472 | 813.38, 2.75 | 93.37, 5.48 | 0.0378, 3.39 | 2.76, 4.71 |
| Iterated EANN | 0.0365, 5.48 | 899.89, 30.435 | 322.17, 2.27 | 75.50, 5.41 | 0.0333, 3.32 | 2.58, 3.94 |
| Direct EANN | 0.0189, 3.80 | 1,479.64, 53.616 | 1,772.28, 5.12 | 124.8, 5.66 | 0.0967, 5.64 | 1.94, 3.80 |
| Simple average | 0.0172, 3.75 | 693.11, 30.848 | 301.78, 2.05 | 45.52, 3.62 | 0.0189, 2.23 | 0.91, 2.28 |
| Trimmed mean (40 % trimming) | 0.0222, 4.21 | 743.97, 31.633 | 301.22, 1.96 | 39.74, 3.33 | 0.0196, 2.33 | 1.11, 2.63 |
| Median | 0.0232, 4.29 | 795.67, 31.007 | 296.84, 2.00 | 39.35, 3.42 | 0.0176, 2.25 | 1.28, 2.67 |
| LSR | 0.0254, 4.10 | 881.16, 113.62 | 263.48, 1.90 | 81.06, 4.31 | 0.0188, 2.27 | 1.70, 3.84 |
| Outperformance | 0.0152, 3.60 | 686.79, 31.654 | 285.65, 2.18 | 45.87, 3.64 | 0.0143, 2.16 | 1.19, 2.72 |
| Proposed-I (0 ≤ α < 0.5) | 0.0146, 3.33 | 664.21, 29.802 | 174.64, 1.56 | 26.51, 2.81 | 0.0127, 1.97 | 0.792, 2.08 |
| Proposed-II (0.5 < α ≤ 1) | 0.0115, 3.03 | 646.00, 30.245 | 255.27, 1.72 | 29.90, 2.82 | 0.0139, 2.03 | 0.725, 2.36 |
dataset are given in transformed scale (original MSE = MSE × 10^4).

The following important observations are evident from Tables 3 and 4:

1. The obtained forecasting accuracies vary notably among the individual models, and no single model alone could achieve the best forecasting results for all datasets.
2. In terms of MSE, the simple average and the median outperformed the best individual models for three and four datasets, respectively. In terms of SMAPE, both of them outperformed the best individual model for four datasets.
3. Our proposed schemes, viz. Proposed-I and Proposed-II, outperformed all individual forecasting models as well as the linear combination methods in terms of both MSE and SMAPE.
4. Among themselves, the Proposed-I scheme achieved the least MSE and SMAPE values for three and five datasets, respectively, whereas the Proposed-II scheme achieved the least MSE and SMAPE values for three and one datasets, respectively.

We present two bar diagrams in Fig. 5 for visual depictions of the forecasting performances of the different methods for all six time series. We have transformed the error measures for some datasets in order to depict them uniformly in Fig. 5a, b. In Fig. 5a, the MSE values for SNSPOT and RGNP are divided by 10,000, and those for BIRTHS and UE are divided by 1,000 and 100, respectively. Similarly, in Fig. 5b, the SMAPE values for the SNSPOT data are divided by 10. Figure 5a, b clearly show that the two forms of our proposed scheme, viz. Proposed-I and Proposed-II, achieved the least MSE and SMAPE values throughout.

We further show the percentage reductions in the MSE and SMAPE of the best individual models through our proposed schemes in Fig. 6a, b, respectively. From these figures, it can be seen that, except for SNSPOT, our proposed schemes reduced the forecasting errors of the best individual models to considerably large extents for all datasets. Only for SNSPOT are the amounts of error reduction small, which can be credited to the reasonably good performances of the corresponding best individual models for this dataset.

The diagrams of the actual observations and their forecasts through the proposed combination scheme for all six time series are shown in Fig. 7. The closeness between the actual and forecasted observations for each dataset is clearly visible in all six plots of Fig. 7.

We have carried out the nonparametric Friedman test for a statistical analysis of the obtained forecasting results. This test evaluates the null hypothesis (H0) that all the forecasting methods are equally effective in terms of MSE or SMAPE against the alternative hypothesis (H1) that all
of them are not equally effective [38]. The obtained Friedman test results are as follows:

• For forecasting MSE, the Friedman χ² statistic is 46.92 and p = 0.0000022.
• For forecasting SMAPE, the Friedman χ² statistic is 46.69 and p = 0.0000024.

Here, p represents the probability of obtaining a test statistic at least as extreme as the one observed if the null hypothesis were true. From the sufficiently small values of p, we reject the null hypotheses at the 95 % confidence level and conclude that the forecasting methods differ significantly in terms of both MSE and SMAPE.

The Friedman test results are depicted in Fig. 8a, b. In these figures, the mean rank of a forecasting method is
[Fig. 5 Bar diagrams of the (a) MSE and (b) SMAPE values of the twelve methods (ARIMA, SVM, Iterated FANN, Iterated EANN, Direct EANN, Simple average, Trimmed mean, Median, LSR, Outperformance, Proposed-I, Proposed-II) for the six time series]

[Fig. 6 Percentage reductions in (a) MSE and (b) SMAPE of the best individual models through the proposed schemes]
[Fig. 7 Diagrams of actual and forecasted observations for the time series: a LYNX, b SNSPOT, c RGNP, d BIRTHS, e AP, f UE]
pointed to by a circle, and the horizontal bar across each circle is the critical difference. The performances of two methods differ significantly if their mean ranks differ by at least the critical difference, i.e., if their horizontal bars are nonoverlapping. From Fig. 8a, b, it can be seen that the individual forecasting methods do not differ significantly among themselves, but many of them are significantly outperformed by our combination schemes in terms of both the obtained MSE and SMAPE values.
22. Pollock DSG (1993) Recursive estimation in econometrics. Comput Stat Data An 44:37–75
23. Bunn D (1975) A Bayesian approach to the linear combination of forecasts. J Oper Res Q 26:325–329
24. Zhang GP (2003) Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 50:159–175
25. Khashei M, Bijari M (2010) An artificial neural network (p, d, q) model for time series forecasting. J Expert Syst Appl 37(1):479–489
26. Khashei M, Bijari M (2011) A novel hybridization of artificial neural networks and ARIMA models for time series forecasting. J Appl Soft Comput 11:2664–2675
27. Adhikari R, Agrawal RK (2012) A novel weighted ensemble technique for time series forecasting. In: Proceedings of the 16th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Kuala Lumpur, Malaysia, pp 38–49
28. Winkler RL, Clemen RT (1992) Sensitivity of weights in combining forecasts. Oper Res 40(3):609–614
29. Tseng FM, Yu HC, Tzeng GH (2002) Combining neural network model with seasonal time series ARIMA model. J Technol Forecast Soc Change 69:71–87
30. Vapnik V (1995) The nature of statistical learning theory. Springer, New York
31. Suykens JAK, Vandewalle J (1999) Least squares support vector machines classifiers. Neural Process Lett 9(3):293–300
32. Hamzaçebi C, Akay D, Kutay F (2009) Comparison of direct and iterative artificial neural network forecast approaches in multi-periodic time series forecasting. J Expert Syst Appl 36(2):3839–3844
33. Gheyas IA, Smith LS (2011) A novel neural network ensemble architecture for time series forecasting. Neurocomputing 74(18):3855–3864
34. Lim CP, Goh WY (2005) The application of an ensemble of boosted Elman networks to time series prediction: a benchmark study. J Comput Intel 3(2):119–126
35. Chapelle O (2002) Support vector machines: introduction principles, adaptive tuning and prior knowledge. Ph.D. Thesis, University of Paris, France
36. Demuth H, Beale M, Hagan M (2010) Neural network toolbox user's guide. The MathWorks, Natick, MA
37. Hyndman RJ (2011) Time series data library (TSDL). http://robjhyndman.com/TSDL/
38. Hollander M, Wolfe DA (1999) Nonparametric statistical methods. Wiley, Hoboken, NJ