
Computational Intelligence, Volume 0, Number 0, 2014

A ROLLING GREY MODEL OPTIMIZED BY PARTICLE SWARM OPTIMIZATION IN ECONOMIC PREDICTION

LI LIU,1,2 QIANRU WANG,3 JIANZHOU WANG,3,4 AND MING LIU5

1 School of Software Engineering, Chongqing University, Chongqing, China
2 School of Computing, National University of Singapore, Singapore
3 School of Information Science and Engineering, Lanzhou University, Gansu, China
4 School of Mathematics and Statistics, Lanzhou University, Gansu, China
5 School of Electrical and Information Engineering, The University of Sydney, Sydney, Australia

Grey system theory has been widely used to forecast economic data that are often nonlinear, irregular, and
nonstationary. Current forecasting models based on grey system theory can adapt to various economic time series
data. However, these models ignore the importance of model parameter optimization and the use of recent data,
which leads to poor forecasting accuracy. In this article, we propose a novel forecasting model, called the particle swarm
optimization rolling grey model (PSO-RGM(1,1)), based on a rolling-mechanism GM with parameters optimized
by the particle swarm optimization algorithm. The simple model is shown to be very effective in forecasting
tertiary industry data sequences, which are short and noisy but regular in secular trend. The experimental results
show that PSO-RGM(1,1) outperforms other commonly used forecasting models on three real economic data sets.
Our empirical study shows that PSO is the best overall algorithm for optimizing the parameter of RGM
compared with other well-known metaheuristics. Furthermore, we evaluated other variant PSOs and found that
single-particle PSO outperforms the others overall in terms of prediction accuracy, convergence speed, and degree of
certainty.

Received 4 January 2013; Revised 22 September 2013; Accepted 8 June 2014


Key words: grey system theory, economic forecasting, rolling mechanism, particle swarm optimization.

1. INTRODUCTION

Effective forecasting is a vital component of tertiary industry economic risk management. Forecasting helps to establish economic policies for governments, organizations, and
enterprises to avoid management risks and plan future budgets. Over the past few decades,
many forecasting models were proposed and focused on improving forecasting accuracy.
These forecasting models can be divided into two categories: causal models and time series
models (Liu et al. 1991). The causal model assumes that independent variables can explain the variations in dependent variables, and that the historical relationship between dependent and independent variables will remain valid in the future. Causal models include multiple linear regression analysis and econometric models. However, the limitations of causal models
include the availability and reliability of independent variables.
Time series models assume that history will repeat itself. The future values are
forecasted based on the information obtained from the past and current data points. In the lit-
erature, two main approaches for time series prediction are statistical and machine-learning-
based approaches. The well-known statistical models proposed include autoregressive,
moving average, autoregressive moving average (ARMA), and autoregressive integrated
moving average (ARIMA) models. The widely used machine-learning approaches include
the neural network (NN) based models (Quah and Srinivasan 1999; Rabiner 1989; Roman

Address correspondence to Jianzhou Wang, School of Information Science and Engineering, Lanzhou University, Gansu
730000, China; e-mail: wjz@lzu.edu.cn

Correction added on 29 October 2015, after first online publication: the first affiliation has been added for Li Liu and the rest
of the author affiliations have been reordered as a result of this change.

© 2014 Wiley Periodicals, Inc.



and Jameel 1996), support vector machines (De Gooijer and Hyndman 2006; He et al.
2008; Shen et al. 2010; Tkacz 2001), fuzzy systems (Kandel 1991), linear regression,
Kalman filtering (Ma and Teng 2004), and hidden Markov models (Rabiner 1989). All
of these approaches have been used to learn forecasting models. The statistical models are generally not as accurate as the learning-based approaches for nonlinear problems (Kayacan et al. 2010) and are usually used for short-term forecasting. However, the
machine-learning-based approaches are often limited by insufficient data (Hsu 2003) and
the training time is relatively long (Jo 2003).
However, tertiary industry economic data are often highly nonlinear, irregular, and non-
stationary, but upward moving in secular trend. Besides, the data sequences generated from
these yearly or quarterly tertiary industry reports are often short. On the other hand, risk management requires not only the prediction of individual data points but also the estimation of the trend in tertiary industry. As such, it is very difficult to fit a model with conventional linear statistical methods or NNs to forecast either short-term or medium-term production from these noisy and insufficient data sets. In this study, grey prediction theory is applied to alleviate this problem.
Grey system theory was introduced and developed by Deng (1989) for mathematical
analysis on the data set with uncertainty and roughness. It only requires a small set of train-
ing data, which are discrete or incomplete, to construct a model for forecasting. The data
with uncertainty and roughness are called “grey” data. Grey system theory has been widely
and successfully used for short-term forecasting in many areas. It has been shown that the grey model (GM) is more robust to noise and lack of modeling information than conventional methods, because grey predictors adapt their variables to new conditions as new outputs become available. In recent years, GM has been optimized
in many different ways. Rolling mechanism (RM) is one of the most effective methods to
improve the performance of GM (Akay and Atak 2007; He et al. 2008; Ju-Long 1982; Tang
and Yin 2012). It incorporates the most recent data to handle noisy sequences. Tang and
Yin (2012) improved the forecasting accuracy for education expenditure by RM. Zhao et al.
(2012) forecasted the per capita annual net income of rural households in China by using
RM and obtained greater accuracy. Besides, RM is able to extend GM to relatively long-term
forecasting.
Recently, researchers have also paid close attention to the optimization problem of improving the predictive ability of GM, the parameters of which are constant or set artificially (Kayacan et al. 2010). Several optimization techniques, such as the genetic algorithm (GA) (Min et al. 2006; Yao and Chu 2008) and NN (Hsu 2010, 2011; Hsu and Chen 2003), have been proposed. However, they did not consider the impact of the recent data in the sequence. In this article, we propose an RM-based GM optimized by using the particle swarm optimization (PSO) algorithm to handle short and noisy but secularly regular data sequences. The purposes of the proposed method are not only to improve the prediction accuracy of individual years but also to improve the accuracy of the trend over the next few years. PSO, which was developed by Eberhart and Kennedy (1995), is considered a tool for the optimization of difficult numerical problems. The PSO algorithm has been enormously successful in about 700 applications (Poli 2008). It does not require that the optimization problem be differentiable. More specifically, the empirical studies conducted by
Durillo et al. (2010) and Calborean et al. (2013) showed that PSO was the best overall algorithm in terms of convergence speed on multiobjective optimization problems and in the application of automatic design space exploration to superscalar computer systems. Our empirical study also shows that PSO is the best overall algorithm for parameter
optimization when considering the accuracy, convergence speed, and degree of certainty for
economic predictions.

This article is organized as follows. Section 2 outlines the basic GM and the improved
GM model with RM. Section 3 presents the proposed forecasting model. We report the
experiments and evaluation results on three economic data sets in Section 4. Section 5 dis-
cusses the performances of other forecasting models and variant PSOs, and the effects of
other model factors. Section 6 concludes this article.

2. FUNDAMENTAL CONCEPTS OF GREY MODEL

The grey system theory focuses on extracting the realistic governing laws of a system from its available data, which generally contain white noise. A GM in grey system theory is denoted by GM(n,m), where n indicates the order of the difference equation and m indicates the number of variables. Although various types of GMs exist, we focus our attention on the GM(1,1) model in this study because of its computational efficiency. Besides, GM(1,1) needs to fit only one variable and thus can be used to model insufficient data sequences.

2.1. Principle of the Basic Model GM(1,1)


GM(1,1) is the basic GM, which has been widely applied to short-term prediction because of its computational efficiency. It uses a first-order differential equation to predict an unknown system. The GM(1,1) algorithm is described in the following steps:
Step 1: The original time sequence is initiated by

$$x^{(0)} = \left( x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n) \right),$$

where $x^{(0)}(i)$ is the time series data at time $i$, and $n$ is the length of the sequence, which must be equal to or larger than 4.

On the basis of the initial sequence $x^{(0)}$, a new sequence

$$x^{(1)} = \left( x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n) \right)$$

is set up through the accumulated generating operation, which is monotonically increasing to weaken the variation tendency, defined as

$$x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i).$$

For instance, a time series sequence $x^{(0)} = (1, 2, 3, 4, 5)$ representing 5-year economic data does not have a clear regularity. Grey system theory applies accumulated generation to $x^{(0)}$ to obtain a new sequence $x^{(1)} = (1, 3, 6, 10, 15)$, which has a clear growing tendency.
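To make the operation concrete, the AGO is simply a cumulative sum. The following minimal Python sketch (ours, not from the article; numpy is assumed) reproduces the example above:

import numpy as np

# Original sequence x^(0) from the example above.
x0 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Accumulated generating operation (AGO): x^(1)(k) = x^(0)(1) + ... + x^(0)(k).
x1 = np.cumsum(x0)
print(x1)  # [ 1.  3.  6. 10. 15.]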
Step 2: Establish the first-order differential equation of GM(1,1) as

$$\frac{dx^{(1)}}{dt} + a x^{(1)} = b \qquad (1)$$

and its difference equation

$$x^{(0)}(k) + a z^{(1)}(k) = b, \quad 2 \le k \le n, \qquad (2)$$



where a is the development coefficient, b is the driving coefficient, and $z^{(1)} = \left( z^{(1)}(2), z^{(1)}(3), \ldots, z^{(1)}(n) \right)$ is the sequence generated by $z^{(1)}(k) = \alpha x^{(1)}(k) + (1-\alpha) x^{(1)}(k-1)$.

In the original GM(1,1), α is set to 0.5, so that $z^{(1)}(k)$ is the mean value of adjacent data: $z^{(1)}(k) = 0.5\, x^{(1)}(k) + 0.5\, x^{(1)}(k-1)$.
Step 3: From equation (2), we can obtain the following system of equations:

$$\begin{aligned}
x^{(0)}(2) + a z^{(1)}(2) &= b,\\
x^{(0)}(3) + a z^{(1)}(3) &= b,\\
&\;\;\vdots\\
x^{(0)}(n) + a z^{(1)}(n) &= b.
\end{aligned} \qquad (3)$$

Here, $P = [a, b]^T$ is the vector of coefficient parameters, which can be computed by employing the least squares method:

$$P = \left( B^T B \right)^{-1} B^T Y_N, \qquad (4)$$

where $Y_N$ is the constant vector

$$Y_N = \left[ x^{(0)}(2), x^{(0)}(3), \ldots, x^{(0)}(n) \right]^T,$$

and $B$ is the accumulated matrix

$$B = \begin{bmatrix} -z^{(1)}(2) & 1 \\ -z^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z^{(1)}(n) & 1 \end{bmatrix}.$$

Step 4: Substituting the $P$ computed by equation (4) into equation (3), the predicted value of $x^{(1)}$ at time $k$ is

$$\hat{x}^{(1)}(k) = \left( x^{(1)}(1) - \frac{b}{a} \right) e^{-a(k-1)} + \frac{b}{a}. \qquad (5)$$

After performing an inverse accumulated generating operation on equation (5), the predicted value of $x^{(0)}$ at time $k$ is $\hat{x}^{(0)}(k) = \hat{x}^{(1)}(k) - \hat{x}^{(1)}(k-1)$, where $2 \le k \le n$.
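The four steps can be condensed into a short routine. The sketch below is our illustration (the function name gm11 and its interface are ours, not the authors' code): it fits a and b by least squares per equation (4), applies equation (5), and restores the predictions by the inverse AGO.

import numpy as np

def gm11(x0, horizon, alpha=0.5):
    """Fit GM(1,1) to x0 (positive values, length >= 4) and forecast
    `horizon` further points; alpha is the background-value weight."""
    n = len(x0)
    x1 = np.cumsum(x0)                              # Step 1: AGO sequence x^(1)
    z1 = alpha * x1[1:] + (1.0 - alpha) * x1[:-1]   # Step 2: background values z^(1)(2..n)
    B = np.column_stack((-z1, np.ones(n - 1)))      # Step 3: accumulated matrix B
    Y = x0[1:]                                      # constant vector Y_N
    a, b = np.linalg.lstsq(B, Y, rcond=None)[0]     # least squares, eq. (4)
    k = np.arange(1, n + horizon + 1)               # time indices 1..n+horizon
    x1_hat = (x1[0] - b / a) * np.exp(-a * (k - 1)) + b / a   # eq. (5)
    return np.diff(x1_hat, prepend=0.0)[n:]         # Step 4: inverse AGO, future part only

# Example: forecast three points from a short, noisy, upward series (made-up data).
print(gm11(np.array([10.0, 12.5, 14.1, 16.8, 19.2]), horizon=3))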

2.2. GM(1,1) with Rolling Mechanism (RGM(1,1))


GM(1,1) uses the whole data set for prediction. However, the recent data can improve forecast accuracy (Akay and Atak 2007). RM, which is a metabolism technique that updates the input data by discarding old data in each loop of grey prediction, can be applied to perform the prediction. The idea of RM is that, in each rolling step, the data used for the next forecast are the most recent data.

Algorithm: RGM(1,1).

Input:
$x^{(0)} = \left( x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(l) \right)$ — a sequence of sample data.

Output:
$\hat{x}^{(0)} = \left( \hat{x}^{(0)}(l+1), \hat{x}^{(0)}(l+2), \ldots, \hat{x}^{(0)}(l+n) \right)$ — a sequence of predicted data.

Parameters:
l — the number of sample data used to build the GM(1,1) model in each rolling loop.
m — the number of data predicted in each loop; n data are predicted in total, with $m \le n$.
k — an integer called the rolling number, $k = \lceil n/m \rceil$.

1: $R_s = \left( x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(l) \right)$;
2: Build a GM(1,1) model using the data set $R_s$ by the algorithm described in Section 2.1;
3: $k = 1$;
4: WHILE $k \le \lceil n/m \rceil$ DO
5:   Calculate $\hat{R}_m = \left( \hat{x}^{(0)}(l+(k-1)m+1), \hat{x}^{(0)}(l+(k-1)m+2), \ldots, \hat{x}^{(0)}(l+km) \right)$ by using the latest model GM(1,1) with α = 0.5;
6:   $R_s = \left( \hat{x}^{(0)}(km+1), \ldots, \hat{x}^{(0)}(l+(k-1)m), \hat{x}^{(0)}(l+(k-1)m+1), \ldots, \hat{x}^{(0)}(l+km) \right)$;
7:   Rebuild the GM(1,1) model using the data set $R_s$;
8:   $k = k + 1$;
9: END WHILE
10: RETURN $\hat{x}^{(0)} = \left( \hat{x}^{(0)}(l+1), \hat{x}^{(0)}(l+2), \ldots, \hat{x}^{(0)}(l+n) \right)$;
Figure 1 shows an example of predicting the data of the sixth to eighth years from the first 5 years' data by using the rolling grey model (RGM(1,1)) algorithm, where l = 5 and m = 1. Unlike the method that predicts all of the data of the sixth to eighth years by using the same GM(1,1) model built from the first 5 years' data, RGM(1,1) only predicts the data of the sixth year $\hat{x}^{(0)}(6)$ with the GM(1,1) model built from the first 5 years' data $\left( x^{(0)}(1), x^{(0)}(2), x^{(0)}(3), x^{(0)}(4), x^{(0)}(5) \right)$. It then predicts the data of the seventh year $\hat{x}^{(0)}(7)$ with a new model built from its former 5 years' data, namely the data of the second to fifth years and the sixth year's predicted value $\left( x^{(0)}(2), x^{(0)}(3), x^{(0)}(4), x^{(0)}(5), \hat{x}^{(0)}(6) \right)$,
FIGURE 1. An example of the forecasting procedure by RGM(1,1), where l = 5, m = 1, and k = 3.



which can improve the prediction accuracy of the subsequent data. Similarly, it predicts the data of the eighth year $\hat{x}^{(0)}(8)$ with the model built from its former 5 years' data $\left( x^{(0)}(3), x^{(0)}(4), x^{(0)}(5), \hat{x}^{(0)}(6), \hat{x}^{(0)}(7) \right)$. In total, three loops are calculated in RGM(1,1) to predict $\hat{x}^{(0)}(6)$, $\hat{x}^{(0)}(7)$, and $\hat{x}^{(0)}(8)$; hence, the rolling number k is 3.
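For the m = 1 case illustrated in Figure 1, the rolling loop wraps a GM(1,1) fit in a few lines. The sketch below (ours) reuses the hypothetical gm11 helper from Section 2.1:

import numpy as np

def rgm11(x0, n_pred, alpha=0.5):
    """Rolling GM(1,1) with m = 1: predict n_pred points one at a time,
    sliding an l-point window that absorbs each new prediction."""
    window = list(x0)                    # R_s, the current l-point sample window
    preds = []
    for _ in range(n_pred):              # the rolling number k equals n_pred when m = 1
        nxt = gm11(np.array(window), horizon=1, alpha=alpha)[0]
        preds.append(nxt)
        window = window[1:] + [nxt]      # drop the oldest point, append the forecast
    return np.array(preds)

# Example matching Figure 1: l = 5 samples, predict years 6-8 (made-up data).
print(rgm11(np.array([10.0, 12.5, 14.1, 16.8, 19.2]), n_pred=3))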

3. GM(1,1) WITH ROLLING MECHANISM OPTIMIZED BY PSO

Because α directly influences the calculation of a and b in the GM(1,1) model and is one of the most important factors affecting the performance of the model, we present an algorithm based on RGM(1,1) combined with PSO, which optimizes the parameter α in each rolling period to improve the forecasting accuracy.

3.1. The Parameter α in GM(1,1)


In the GM(1,1) model, the value of α is customarily set to the mean value for each $z^{(1)}(k) = \alpha x^{(1)}(k) + (1-\alpha) x^{(1)}(k-1)$ in the generated sequence $z^{(1)} = \left( z^{(1)}(2), z^{(1)}(3), \ldots, z^{(1)}(n) \right)$. This means that each data point has equal weight in the prediction. However, Tan (2000) found that the GM(1,1) model often performs poorly and produces delay errors for quickly growing sequences when using the mean value. Tan proposed a method that sets α to $\frac{p-1}{2p}$, where $p = \sum_{k=2}^{n} \frac{x^{(1)}(k)}{x^{(1)}(k-1)}$, to widen the adaptability of the GM(1,1) model to various kinds of time sequences. Chang et al. (2005) found that an RGM with a variable α value generates better forecasts than one with a fixed α value; they determined the α value by the timely percent change. From these studies, we can see that for the trend prediction of nonmonotonic functions, the forecasting outcomes are much better if the value of α is set appropriately for the grey predicted results. However, Tan's method uses the whole data set to calculate a fixed value of α; it does not consider the effect of recent data, which would improve accuracy. In this article, we propose a new algorithm framework, α-RGM(1,1), which is based on RGM(1,1) and considers the recent data in future prediction.

3.2. α-RGM(1,1) Algorithm


Assume that l initial sample data $x^{(0)} = \left( x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(l) \right)$ are used to build the GM(1,1) model and to predict n future data $x^{(0)} = \left( x^{(0)}(l+1), x^{(0)}(l+2), \ldots, x^{(0)}(l+n) \right)$. The best α value should be found in each loop by a given method. In the original GM(1,1), α always equals 0.5.

Algorithm: α-RGM(1,1).

Input:
$x^{(0)} = \left( x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(l) \right)$ — a sequence of sample data.

Output:
$\hat{x}^{(0)} = \left( \hat{x}^{(0)}(l+1), \hat{x}^{(0)}(l+2), \ldots, \hat{x}^{(0)}(l+n) \right)$ — a sequence of predicted data.

Parameters:
l — the number of sample data used to build the GM(1,1) model in each rolling loop.
m — the number of data predicted in each loop; n data are predicted in total, with $m \le n$.
k — an integer called the rolling number, $k = \lceil n/m \rceil$.

1: $R_s = \left( x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(l) \right)$;
2: $k = 1$;
3: WHILE $k \le \lceil n/m \rceil$ DO
4:   Find the best value of α by using a given strategy;
5:   Build a GM(1,1) model with the parameter α using the data set $R_s$;
6:   Calculate $\hat{R}_m = \left( \hat{x}^{(0)}(l+(k-1)m+1), \hat{x}^{(0)}(l+(k-1)m+2), \ldots, \hat{x}^{(0)}(l+km) \right)$ by GM(1,1);
7:   $R_s = \left( \hat{x}^{(0)}(km+1), \ldots, \hat{x}^{(0)}(l+(k-1)m), \hat{x}^{(0)}(l+(k-1)m+1), \ldots, \hat{x}^{(0)}(l+km) \right)$;
8:   $k = k + 1$;
9: END WHILE
10: RETURN $\hat{x}^{(0)} = \left( \hat{x}^{(0)}(l+1), \hat{x}^{(0)}(l+2), \ldots, \hat{x}^{(0)}(l+n) \right)$.

3.3. Calculating α by PSO


In the α-RGM(1,1) algorithm, the strategy for finding a value of α can be designed in a variety of ways. RGM(1,1) sets the value of α to 0.5, which does not consider any influence of the sequence data. Although Tan's strategy can adapt to various sequences, it does not consider the impact of the recent data in the sequence. Chang's strategy considers only the timely percent change for the prediction; it can perform well on short-term prediction, but it cannot adapt to a noisy data sequence for relatively long-term prediction. In this article, we use PSO to find the value of α in each loop of α-RGM(1,1). We name our PSO-based algorithm PSO-RGM(1,1) (or simply PSO-RGM).

3.3.1. α-PSO Algorithm. In this section, we present the α-PSO algorithm, which finds the best value of α based on PSO. PSO, consisting of a swarm of particles, iteratively searches large spaces of candidate solutions, where the fitness is calculated by a certain quality measure, with few or no assumptions about the problem being optimized. Each particle has a position that represents a possible solution and a velocity that represents the direction and distance of the move to the next solution. Both variables are updated at every iteration according to the evolution equations (α-PSO algorithm, lines 27–28). During the movements, each particle decides its next velocity and position, moving into the best area in terms of its current best fitness value together with the population's best fitness value at every iteration.

Algorithm: α-PSO.

Input:
$x_s^{(0)} = \left( x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(q) \right)$ — a sequence of training data.
$x_p^{(0)} = \left( x^{(0)}(q+1), x^{(0)}(q+2), \ldots, x^{(0)}(q+d) \right)$ — a sequence of verifying data.

Output:
$\alpha_{best}$ — the value of α with the best fitness value in the particle search space.

Parameters:
q — the number of sample data used to build the RGM(1,1) model.
d — the number of data to be predicted, used in the fitness function.
popsize — the population size of candidates of α in the particle search space.
$iter_{max}$ — the maximum number of iterations.
$w, c_1, c_2$ — the parameters of PSO.

1: /* Initialize popsize candidates with values between 0 and 1 */
2: FOR EACH $i$: $1 \le i \le$ popsize DO
3:   $\alpha_i^1 = rand()$;
4: END FOR
5: $P = \{ \alpha_i^{iter} : 1 \le i \le popsize \}$;
6: $iter = 1$;
7: /* Find the best value of α repeatedly until the maximum number of iterations is reached. */
8: WHILE ($iter \le iter_{max}$) DO
9:   /* Find the best fitness value for each candidate */
10:  FOR EACH $\alpha_i^{iter} \in P$ DO
11:    Build RGM(1,1) by using $x_s^{(0)}$ with the value $\alpha_i^{iter}$;
12:    Calculate $\hat{x}_p^{(0)} = \left( \hat{x}^{(0)}(q+1), \hat{x}^{(0)}(q+2), \ldots, \hat{x}^{(0)}(q+d) \right)$ by RGM(1,1);
13:    /* Keep the best fitness value of the ith candidate in history. */
14:    IF ($pBest_i > fitness(\alpha_i^{iter})$) THEN
15:      $pBest_i = fitness(\alpha_i^{iter})$;
16:    END IF
17:  END FOR
18:  /* Choose the candidate with the best fitness value of all the candidates */
19:  FOR EACH $\alpha_i^{iter} \in P$ DO
20:    IF ($gBest > pBest_i$) THEN
21:      $gBest = pBest_i$;
22:      $\alpha_{best} = \alpha_i^{iter}$;
23:    END IF
24:  END FOR
25:  /* Update the values of all the candidates by using PSO's evolution equations. */
26:  FOR EACH $\alpha_i^{iter} \in P$ DO
27:    $v_i^{iter+1} = w \cdot v_i^{iter} + c_1 \cdot rand() \cdot \left( pBest_i - \alpha_i^{iter} \right) + c_2 \cdot rand() \cdot \left( gBest - \alpha_i^{iter} \right)$;
28:    $\alpha_i^{iter+1} = \alpha_i^{iter} + v_i^{iter+1}$;
29:  END FOR
30:  $P = \{ \alpha_i^{iter+1} : 1 \le i \le popsize \}$;
31:  $iter = iter + 1$;
32: END WHILE
33: RETURN $\alpha_{best}$

$\alpha_i^{iter}$ is the $i$th candidate solution at the $iter$th iteration. $pBest_i$ is the best fitness value of the $i$th candidate found so far. $gBest$ is the best fitness value found so far among all the candidates. $rand()$ generates a random value between 0 and 1. A population of α candidates is initialized with a set of random values between 0 and 1. They move iteratively toward the new candidates with the best fitness values. We define the fitness function as

$$fitness\left(\alpha_i^{iter}\right) = \mathrm{sigmoid}\left( \frac{1}{d} \sum_{k=1}^{d} \left| \frac{x^{(0)}(q+k) - \hat{x}^{(0)}(q+k)}{x^{(0)}(q+k)} \right| \right), \qquad (6)$$

which calculates the fitness value of the $i$th candidate at the $iter$th iteration. The fitness function indicates the average degree of the forecasting bias compared with the actual data. Theoretically, the range of the forecasting bias is $[0, \infty)$. The $\mathrm{sigmoid}(\cdot)$ is a sigmoid function that maps from $[0, \infty)$ to $[0, 1)$. It is mathematically formulated as

$$\mathrm{sigmoid}(z) = \frac{z}{\sqrt{1 + z^2}}. \qquad (7)$$

The candidate with the minimum fitness value is elected as the best solution at the current iteration. The α-PSO algorithm updates all candidates according to the fitness values by using the velocity and position evolution equations. Finally, the candidate with the best fitness value is selected from the set of best fitness values at every iteration as the solution of α.
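A compact sketch of the whole search (ours, with illustrative names, reusing the hypothetical rgm11 helper) implements the fitness of equations (6) and (7) and the updates of lines 27–28; note that, as in the canonical PSO, the sketch tracks best positions alongside best fitness values:

import numpy as np

def fitness(alpha, x_train, x_verify):
    """Eq. (6): sigmoid of the average absolute forecasting bias on the verify data."""
    pred = rgm11(x_train, n_pred=len(x_verify), alpha=alpha)
    bias = np.mean(np.abs((x_verify - pred) / x_verify))
    return bias / np.sqrt(1.0 + bias ** 2)          # eq. (7): maps [0, inf) into [0, 1)

def alpha_pso(x_train, x_verify, popsize=1, iter_max=100, w=0.5, c1=2.0, c2=2.0):
    pos = np.random.rand(popsize)                   # candidate alphas in (0, 1)
    vel = np.zeros(popsize)
    pbest_val = np.full(popsize, np.inf)            # best fitness per particle (minimized)
    pbest_pos = pos.copy()
    gbest_val, gbest_pos = np.inf, pos[0]
    for _ in range(iter_max):
        for i in range(popsize):
            f = fitness(pos[i], x_train, x_verify)
            if f < pbest_val[i]:
                pbest_val[i], pbest_pos[i] = f, pos[i]
            if f < gbest_val:
                gbest_val, gbest_pos = f, pos[i]
        r1, r2 = np.random.rand(popsize), np.random.rand(popsize)
        vel = w * vel + c1 * r1 * (pbest_pos - pos) + c2 * r2 * (gbest_pos - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)          # keep alpha inside [0, 1]
    return gbest_pos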

3.3.2. Parameters. In the α-PSO algorithm, the configuration of the cognitive weight ($c_1$), the social weight ($c_2$), and the inertia weight ($w$) has a significant impact on forecasting performance, as well as on convergence speed. The two acceleration coefficients $c_1$ and $c_2$ move the particle toward the individual best and the global best solutions found so far, respectively. A larger $c_1$ indicates that the particle has higher confidence in self-exploitation; on the other hand, $c_2$ contributes to exploration in the direction of the global best. $w$ controls the influence of a particle's historical velocity on its current velocity.

Both theoretical and empirical studies on various problems solved by PSO are available to help in the selection of appropriate values (Bergh 2002; Kennedy et al. 2001; Trelea 2003). Generally, $c_1$ equals $c_2$ and ranges over $[0, 4]$. A usual choice is fixing both $c_1$ and $c_2$ to 1.494. Ratnaweera et al. (2004) investigated time-varying acceleration coefficients, in which $c_1$ and $c_2$ vary with time. It is suggested that, with a large starting value of $c_1$ and a small starting value of $c_2$, particles are encouraged to move around at the beginning for better exploitation of their local spaces. The value of $c_1$ then becomes smaller while the value of $c_2$ becomes larger, to boost convergence to the global optimum gradually until the end of the iterations. Equations (8) and (9) present the linear variation, where $c_1^+$ and $c_2^-$ are the start values, and $c_1^-$ and $c_2^+$ are the end values:

$$c_1(iter) = c_1^+ - \left( c_1^+ - c_1^- \right) \frac{iter}{iter_{max}}. \qquad (8)$$

$$c_2(iter) = c_2^- + \left( c_2^+ - c_2^- \right) \frac{iter}{iter_{max}}. \qquad (9)$$
A proper value of the inertia weight provides a balance between global and local exploration. A large inertia weight favors global search (GS), while a small inertia weight favors local search (Shi and Eberhart 1998, 1999). In general, settings near 1 facilitate GS, and settings in $[0.2, 0.5]$ facilitate rapid local search. Linearly decreasing weight control was suggested by Shi and Eberhart (1999), in which the inertia weight is dynamically adapted according to the linearly decreasing equation (10); $w^+$ and $w^-$ are usually set to 0.9 and 0.4:

$$w(iter) = w^+ - \left( w^+ - w^- \right) \frac{iter}{iter_{max}}. \qquad (10)$$
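The three linear schedules of equations (8)–(10) fit in a few lines; the sketch below is ours, and the start and end values shown are merely illustrative defaults:

def linear_schedules(it, it_max,
                     c1_start=2.5, c1_end=0.5,   # c1+ and c1- in eq. (8)
                     c2_start=0.5, c2_end=2.5,   # c2- and c2+ in eq. (9)
                     w_start=0.9, w_end=0.4):    # w+ and w- in eq. (10)
    frac = it / it_max
    c1 = c1_start - (c1_start - c1_end) * frac   # eq. (8): decreases over the run
    c2 = c2_start + (c2_end - c2_start) * frac   # eq. (9): increases over the run
    w = w_start - (w_start - w_end) * frac       # eq. (10): 0.9 down to 0.4
    return c1, c2, w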
The nonlinearly decreasing inertia weight (equation (11)) was also proposed by Alfi and Modares (2011), who incorporated the hyperbolic tangent function (equation (12)) to update $w_i$ of each particle $i$:

$$w_i(iter) = \frac{1}{1 + \tanh\left( NI_i(iter) \right)}, \qquad (11)$$

$$\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}, \qquad (12)$$

where $NI_i^{iter}$ is the neighborhood index of the particle $i$, which is calculated at each iteration as

$$NI_i^{iter} = \frac{fitness\left(\alpha_i^{iter}\right) - gWorst}{gBest - gWorst}, \qquad (13)$$

where $gWorst^{iter}$ is the global worst fitness value at the current iteration. A small $NI_i^{iter}$ indicates that the current position is bad and needs global exploration with a large inertia weight. On the contrary, a large $NI_i^{iter}$ indicates the need for local exploitation with a small inertia weight.
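Equations (11)–(13) reduce to two lines per particle; a sketch (ours), assuming the fitness is minimized so that gBest < gWorst:

import math

def adaptive_w(fit_i, g_best, g_worst):
    """Neighborhood-index based inertia weight, eqs. (11)-(13)."""
    ni = (fit_i - g_worst) / (g_best - g_worst)   # eq. (13): NI_i lies in [0, 1]
    return 1.0 / (1.0 + math.tanh(ni))            # eq. (11): bad position -> w near 1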
The constriction factor χ was proposed by Clerc and Kennedy (2002) to control the magnitude of the velocities, instead of $w$. The velocity update scheme (α-PSO algorithm, line 27) is replaced with the following:

$$v_i^{iter+1} = \chi \cdot \left( v_i^{iter} + c_1 \cdot rand() \cdot \left( pBest_i - \alpha_i^{iter} \right) + c_2 \cdot rand() \cdot \left( gBest - \alpha_i^{iter} \right) \right) \qquad (14)$$

and

$$\chi = \frac{2k}{\left| 2 - \varphi - \sqrt{\varphi^2 - 4\varphi} \right|}, \qquad (15)$$

where $\varphi = c_1 + c_2$ and generally $k = 1$.
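For instance, with the usual φ = c1 + c2 = 4.1 and k = 1, equation (15) yields χ ≈ 0.73, the widely quoted constriction coefficient. A one-line sketch (ours):

import math

def constriction(c1=2.05, c2=2.05, k=1.0):
    phi = c1 + c2                                  # eq. (15) requires phi > 4
    return 2.0 * k / abs(2.0 - phi - math.sqrt(phi ** 2 - 4.0 * phi))

print(constriction())  # ~0.7298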


More sophisticated PSO variants continue to be introduced in attempts to improve optimization performance by alleviating premature convergence or adapting the parameters during optimization. However, we use the canonical PSO in the following experiments for the comparisons. We will discuss the effect of other parameter configuration methods on forecasting performance in Section 5.2.1. Besides, we will investigate the impact of the train-to-verify ratio (q : d) in Section 5.2.2, which also has an important effect on forecasting results.

4. EXPERIMENTS AND EVALUATIONS

4.1. Data Sets


The prediction of the development of tertiary industry plays an important role in the areas of economics and finance. To evaluate the ability of the PSO-RGM algorithm to handle this type of prediction, we collected three featured economic data sets: Financial Intermediation, Real Estate, and Semiconductor Industry Production. All these data sets were collected from the China Statistical Yearbook, National Bureau of Statistics of China, and are shown in Figure 2.

Financial Intermediation in Beijing from 1994 to 2010 has relatively smooth trends. Real Estate in Beijing from 1994 to 2010 appears noisy, which may be caused by Chinese government policies on real estate and the financial crisis in 2008. Taiwan Semiconductor Industry Production appears regular from 1994 to 2000 but irregular since 2000. All these data sets are small: there are 17 data points in each of the first two data sets and nine data points in the last.

4.2. Evaluation Metrics


Prediction accuracy is an important criterion for evaluating a forecasting algorithm (Yokuma and Armstrong 1995). We use three kinds of evaluation metrics to evaluate the prediction accuracy: the accuracy of forecasting a single point, the overall accuracy of forecasting multiple points, and the accuracy of the predicted trend.

A usual way to examine the single-point forecasting accuracy of a model is to evaluate the prediction error (PE) for the $i$th year by comparing the actual sequence $\{x^{(0)}(i)\}_{i=1}^{n}$ with the forecasting sequence $\{\hat{x}^{(0)}(i)\}_{i=1}^{n}$:

$$PE(i)(\%) = \left| \frac{x^{(0)}(i) - \hat{x}^{(0)}(i)}{x^{(0)}(i)} \right|, \quad i = 1, 2, \ldots, n. \qquad (16)$$

We used three evaluation metrics to evaluate the overall accuracy of multiple predicted points: mean absolute percentage error (MAPE), mean absolute deviation (MAD), and mean squared error (MSE), which are often adopted, as by De Gooijer and Hyndman (2006) and Tang and Yin (2012). MAPE is a generally accepted metric for prediction accuracy; the criterion for interpreting MAPE (Hsu and Wang 2007) is listed in Table 1:

$$MAPE(\%) = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{x^{(0)}(i) - \hat{x}^{(0)}(i)}{x^{(0)}(i)} \right|. \qquad (17)$$

Mean absolute deviation and MSE measure the average magnitude of the forecast errors, but the latter imposes a greater penalty on a large error than on several small errors.
FIGURE 2. The experiments used three data sets, each of which was split into two groups: sample data and a sequence of five test data. A portion of the sample data was used to train the prediction model, while the rest was retained for model verification. (a) Financial Intermediation; (b) Real Estate; (c) Semiconductor Industry Production. All values are in 100 million yuan.

TABLE 1. Criterion of MAPE.

MAPE(%)    Forecasting power
<10        Excellent
10–20      Good
20–50      Reasonable
>50        Incorrect

MAPE, mean absolute percentage error.

The smaller the values, the closer the predicted values are to the actual values (Chen et al. 2012):

$$MAD = \frac{1}{n} \sum_{i=1}^{n} \left| x^{(0)}(i) - \hat{x}^{(0)}(i) \right|, \qquad (18)$$

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left( x^{(0)}(i) - \hat{x}^{(0)}(i) \right)^2. \qquad (19)$$

Besides, the coefficient of determination, denoted $r^2$, is also applied to evaluate the models in our experiments:

$$r^2 = 1 - \frac{SSE}{SST}, \qquad (20)$$

where

$$SSE = \sum_{i=1}^{n} \left( x^{(0)}(i) - \hat{x}^{(0)}(i) \right)^2, \qquad SST = \sum_{i=1}^{n} x^{(0)}(i)^2 - \left( \sum_{i=1}^{n} x^{(0)}(i) \right)^2 / n.$$

The sum of squares total (SST) measures the deviations of the observations from their mean. The sum of squares error (SSE) measures the deviations of the observations from their predicted values. The smaller the SSE, the more reliable the predictions obtained from the model. Therefore, the higher the value of $r^2$, the more successful the model is at predicting statistical data (Zhao et al. 2012). The maximum value of the coefficient of determination $r^2$ is 1. Whereas MAPE, MAD, and MSE measure the mean performance over all predicted data points, $r^2$ indicates how well the predicted data points fit the trend.
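All four metrics follow directly from equations (16)–(20); a sketch (ours) over numpy arrays of actual and predicted values:

import numpy as np

def evaluate(actual, pred):
    """Return MAPE(%), MAD, MSE, and r^2 as defined in equations (17)-(20)."""
    err = actual - pred
    mape = 100.0 * np.mean(np.abs(err / actual))   # eq. (17)
    mad = np.mean(np.abs(err))                     # eq. (18)
    mse = np.mean(err ** 2)                        # eq. (19)
    sse = np.sum(err ** 2)
    sst = np.sum(actual ** 2) - np.sum(actual) ** 2 / len(actual)
    return mape, mad, mse, 1.0 - sse / sst         # eq. (20)

# Example: the PSO-RGM column for Real Estate in Table 2.
actual = np.array([658.0, 821.0, 844.0, 1062.0, 1006.0])
pred = np.array([690.0, 787.0, 899.0, 1038.0, 1019.0])
print(evaluate(actual, pred))  # MAPE ~3.8%, r^2 ~0.94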

4.3. Experimental Setup


The experiments were divided into three parts: Experiment I, Experiment II, and Experiment III. Experiment I compared the performance of PSO-RGM with those of GM and RGM. Experiment II compared the performance of PSO-RGM with those of well-known economic forecasting models. Experiment III compared the performance of RGM models whose parameter α was optimized by different popular metaheuristic methods.
All three data sets were used in our experiments. Financial Intermediation and Real Estate used 12 data points from 1994 to 2005 as the sample set, while the other five data points from 2006 to 2010 were used as the test set. Using only four sample data points selected from 1994 to 1997 in Semiconductor Industry Production to predict the next five data points is a real challenge for most forecasting methods.
Experiment I aims to compare PSO-RGM with the other two basic GMs. GM and RGM were constructed with a fixed α value of 0.5. In RGM and PSO-RGM, the rolling number k equals 5. In PSO-RGM, the train-to-verify ratio q : d was set to 3 : 1, namely q = 9 and d = 3 for Financial Intermediation and Real Estate, and q = 3 and d = 1 for Semiconductor Industry Production.
The purpose of Experiment II is to compare PSO-RGM with five other well-known economic forecasting models: the autoregressive (AR) model, ARMA, ARIMA, the Volterra model (Volterra), and the neural network (NN). We also considered other commonly used forecasting models in this study, such as the Markov forecasting model (MM), the hidden Markov forecasting model, the generalized autoregressive conditional heteroskedasticity model, and so on. However, a relatively sufficient number of sample data is required to train these models. For instance, the number of observations produced from the sample set in our study (e.g., three training data in Semiconductor Industry Production) is insufficient to fit the variables of the Markov forecasting model.
Experiment III was designed to compare different metaheuristics for optimizing the parameter α of the RGM. Six well-known metaheuristics, ant colony optimization (ACO), estimation of distribution algorithms (EDA), GA, global search (GS), simulated annealing (SA), and Tabu search (TABU), were compared with PSO using the same objective function in equation (6) to find the parameter α with the best value. Each RGM model with a different metaheuristic is simply implemented by replacing the α-PSO algorithm with the corresponding metaheuristic algorithm with the same inputs and outputs (see the α-RGM algorithm, line 4).

For all the learning-based methods, including NN and all the metaheuristics, we split the sample set into two parts, a training set and a verifying set. The train-to-verify ratio was set to 3 : 1. We will discuss the effect of the train-to-verify ratio in Section 5.2.2.
The parameters used by the different forecasting models and the metaheuristics (Appendix: Table 1) were selected based on analysis from the literature or from this study. We will discuss the influence of the parameter settings on the prediction metrics, but we do not intend to study the properties of each model or metaheuristic, nor to analyze the similarities and differences between them in detail, which is out of the scope of this study.

For PSO-RGM, we set popsize = 1 and iter_max = 100. We selected the weights c1 = 2, c2 = 2, and w = 0.5. Each forecasting model and each optimized RGM model was executed 100 times, because the randomness or probability mechanisms within them can cause varying results. We chose the predicted values with the best r² in our comparisons. The certainty of the models and the metaheuristics will be discussed in Section 5.2.4.

4.4. Experiment I
Table 2 shows the prediction results of the three algorithms. PSO-RGM obtains the best overall prediction accuracy. For Financial Intermediation, GM obtains the best single-point accuracy at the data points in 2007 and 2010, with PEs of 11.49% and 3.6%, respectively. PSO-RGM has the lowest PEs at the other points. Besides, PSO-RGM outperformed both GM and RGM on MAPE, MAD, MSE, and r². Figure 3(a) and Figure 3(d) intuitively show that very little difference exists among the three models.

For Real Estate, Figure 3(b) shows that the PEs obtained by PSO-RGM, all under 7%, were smaller than those of GM and RGM. It also shows that PSO-RGM forecasts the trend very well: it catches the descending trend between 2009 and 2010. Therefore, PSO-RGM obtains a very high r². Both GM and RGM performed poorly on every metric.

TABLE 2. The Parameters and Predicted Values Calculated by GM, RGM, and PSO-RGM, Respectively, and the Comparison of the Evaluation Metrics on These Three Models.

                          GM                          RGM                         PSO-RGM
Year      Actual    α     Predicted  PE(%)      α     Predicted  PE(%)      α      Predicted  PE(%)

Financial Intermediation
2006      982       0.5   994        1.24       0.5   994        1.24       0.484  990        0.83
2007      1,302     0.5   1,152      11.49      0.5   1,101      14.62      0.501  1,147      11.90
2008      1,519     0.5   1,336      12.01      0.5   1,282      15.31      0.583  1,360      10.46
2009      1,603     0.5   1,549      3.36       0.5   1,494      6.29       0.589  1,622      1.15
2010      1,863     0.5   1,796      3.60       0.5   1,758      5.61       0.575  1,955      4.90
MAPE(%)                   6.3452                      8.7544                       5.8500*
MAD                       93.1390                     128.5700                     86.4316*
MSE                       12,666                      22,025                       11,618*
r²                        0.8560                      0.7495                       0.8679*

Real Estate
2006      658       0.5   835        26.93      0.5   835        26.93      0.312  690        4.87
2007      821       0.5   1,076      31.07      0.5   1,054      28.36      0.363  787        4.11
2008      844       0.5   1,387      64.28      0.5   1,329      57.39      0.378  899        6.44
2009      1,062     0.5   1,787      68.29      0.5   1,674      57.56      0.414  1,038      2.29
2010      1,006     0.5   2,303      128.89     0.5   2,106      109.24     0.253  1,019      1.33
MAPE(%)                   63.8925                     55.9013                      3.8100*
MAD                       599.65                      521.2528                     31.6248*
MSE                       520,120                     380,754                      1,182*
r²                        −24.2190                    −17.4615                     0.9427*

Semiconductor Industry Production
1998      2,834     0.5   2,932      3.49       0.5   2,932      3.49       0.493  2,933      3.49
1999      4,235     0.5   3,559      15.95      0.5   3,540      16.39      0.378  4,515      6.63
2000      7,144     0.5   4,319      39.54      0.5   4,271      40.20      0.408  5,355      25.03
2001      5,269     0.5   5,241      0.52       0.5   5,151      2.23       0.359  6,102      15.81
2002      6,529     0.5   6,360      2.58       0.5   6,207      4.91       0.270  6,746      3.34
MAPE(%)                   12.4203                     13.4504                      10.8597*
MAD                       759.2134                    820.9533                     649.467*
MSE                       1,695,156                   1,771,760                    930,740*
r²                        0.2983                      0.2666                       0.6147*

Note: Boldface refers to the best annual performance on PE with respect to the three models. The combination of boldface and a star symbol refers to the best performance on MAPE, MAD, MSE, and r², respectively. GM, grey model; RGM, rolling grey model; PSO, particle swarm optimization; MAPE, mean absolute percentage error; MAD, mean absolute deviation; MSE, mean squared error; PE, prediction error.

A MAD larger than 500 means that the average PE is more than 50% of the actual values. The MAD and MSE metrics for PSO-RGM indicate that the average difference between the predicted value and the actual value in each year does not exceed ±50 (±5%).
FIGURE 3. (a)–(c): The comparison of the annual values predicted by GM, RGM, and PSO-RGM, respectively, with the actual values for (a) Financial Intermediation, (b) Real Estate, and (c) Semiconductor Industry Production. (d)–(f): The annual errors of the predicted values compared with the actual values for the same three data sets.

For Semiconductor Industry Production, although PSO-RGM with its MAPE of 10.8597% is not much better than either GM or RGM, it greatly improved the r² and the MSE. A singularity arises at year 2000 with a spike, followed by a sudden drop in the next year. All three models performed poorly but reasonably in this year; PSO-RGM performs best, with a 25% PE, and RGM also obtains reasonable accuracy. The reason is that both RGM and PSO-RGM use the RM, which is more sensitive to recent changes.

4.5. Experiment II
Table 3 and Figure 4 show the comparison of the six models on the different evaluation metrics. PSO-RGM generally outperformed the other models, except ARIMA and NN, across all three data sets.

For Financial Intermediation, the ARIMA, NN, and PSO-RGM models obtain excellent results on all evaluation metrics. The other models also achieved good MAPEs of less than 20% because of the relative regularity of this data set. Figure 4(a) shows that the NN model obtains the best prediction values in almost every year. However, the NN model with the best r² was selected after 100 trials. We found that there were huge differences among the results from these 100 trials; therefore, we could not find the best one in real economic forecasting, where the future data are unknown and we cannot calculate the r² or the MAPE to compare different trained networks. We will discuss the uncertainty of the NN models in Section 5.1.2. The metaheuristics that incorporate randomness mechanisms also encounter this uncertainty. The degree of certainty of PSO-RGM will be discussed in Section 5.2.4.

For Real Estate, the MAPE of PSO-RGM is less than 5%, which is better than the other models. The other models except ARMA obtain fairly good MAPEs, but their r² values are much lower than the 0.94 of PSO-RGM.

TABLE 3. The Comparison of the Evaluation Metrics on PSO-RGM with Other Classic Forecasting Methods.

                 AR         ARMA        ARIMA      Volterra    NN         PSO-RGM

Financial Intermediation
MAPE(%)          10.971     14.566      5.203      18.034      2.719*     5.850
MAD              165.167    228.341     76.441     278.113     35.009*    86.431
MSE              33,008     65,747      9,156      93,012      1,541*     11,618
r²               0.624      0.252       0.895      −0.057      0.982*     0.867

Real Estate
MAPE(%)          8.816      12.264      9.229      8.839       7.340      3.810*
MAD              78.057     111.915     82.486     79.908      60.669     31.624*
MSE              7,869      24,763      9,624      11,163      8,958      1,182*
r²               0.618      −0.200      0.533      0.458       0.565      0.942*

Semiconductor Industry Production
MAPE(%)          35.924     48.045      13.399     47.028      19.830     10.859*
MAD              2,069.015  2,882.921   821.550    2,465.420   855.625    649.467*
MSE              8,259,920  11,703,282  1,670,647  9,708,412   1,015,599  930,740*
r²               −2.419     −3.844      0.308      −3.018      0.579      0.615*

Note: The combination of boldface and a star symbol refers to the best performance on MAPE, MAD, MSE, and r², respectively. PSO, particle swarm optimization; RGM, rolling grey model; AR, autoregressive; ARMA, autoregressive moving average; ARIMA, autoregressive integrated moving average; MAPE, mean absolute percentage error; MAD, mean absolute deviation; NN, neural network; MSE, mean squared error.

FIGURE 4. The comparison of the annual prediction errors among the different forecasting models (PSO-RGM, AR, ARMA, ARIMA, Volterra, and NN) for (a) Financial Intermediation, (b) Real Estate, and (c) Semiconductor Industry Production.

PSO-RGM still obtains a good MAPE, under 15%, for Semiconductor Industry Production, whereas most of the other models except ARIMA and NN are actually not applicable to this data set: only four data points are not sufficient for training, which leads them to overestimate the values after year 2000, which was followed by a sharp fall in year 2001. The models trained by ARMA and Volterra are almost incorrect in terms of their MAPEs, which are as high as approximately 50%. The ARIMA model predicted well, with a MAPE of 13%. Figure 4(c) shows that almost all of the yearly PEs from ARIMA are a little higher than those of PSO-RGM.
FIGURE 5. The comparison of the annual prediction errors of the different metaheuristics (PSO-RGM, ACO, EDA, GA, GS, and SA) with respect to the actual values for (a) Financial Intermediation, (b) Real Estate, and (c) Semiconductor Industry Production.

4.6. Experiment III


Figure 5 and Table 4 show the comparison between PSO-RGM and the RGMs optimized by the different metaheuristics. For Financial Intermediation, we found that the predicted values are nearly the same when using EDA, GA, GS, SA, and TABU. ACO overestimates the first data point but accurately predicts the rest. The other metaheuristics achieve a relatively low PE at the first data point and higher PEs at the following data points. All of the metaheuristics obtain excellent MAPEs for Financial Intermediation, but PSO is the best on r². PSO outperforms all the other metaheuristics for Real Estate; ACO, EDA, and TABU were superior to GA, GS, and SA. For the last data set, EDA, with a MAPE of 10.655%, outperforms the other metaheuristics on average PE. The MAPE of PSO is 0.2 higher than that of EDA. However, PSO obtains the best r² of all the metaheuristics. This also indicates that the RGM model with α optimized by any of these metaheuristics except SA is able to perform well on Semiconductor Industry Production.

5. DISCUSSION

In this section, we discuss the factors related to the statistical models and the NN model that influence the forecasting performance. We also examine the performance of variant PSOs and the effect of the train-to-verify ratio. Furthermore, we present and discuss two other important evaluation metrics: convergence speed and degree of certainty.

5.1. Forecasting Models


A large variety of statistical and machine-learning-based models are available for forecasting economic time series, including GDP growth, currency inflation, divisia, and so on. Moreover, evaluations and comparisons of the forecasting performance of different models have been presented in many studies. To the best of our knowledge, this is the first time that different forecasting models have been compared and evaluated on tertiary industry data sets that have a relatively small number of samples.

5.1.1. Statistical Models. According to the results of our experiments, ARIMA, which is a generalization of ARMA, outperforms the other two linear models. The reason for the unfavorable score produced by ARMA is that the moving-average model that combines with AR in

TABLE 4. The Comparison of the Evaluation Metrics on PSO-RGM with RGM Using Other Metaheuristics Optimizing α.

                 ACO        EDA       GA       GS       SA           TABU     PSO

Financial Intermediation
MAPE(%)          9.540      10.605    10.748   10.752   10.75        10.701   5.850*
MAD              128.025    159.996   162.459  162.521  162.488      161.687  86.431*
MSE              21,815     31,764    32,652   32,675   32,663       32,367   11,618*
r²               0.751      0.638     0.628    0.628    0.628        0.631    0.867*

Real Estate
MAPE(%)          6.040      7.927     11.174   11.210   11.188       9.283    3.810*
MAD              53.309     69.636    101.864  102.212  102.001      83.834   31.624*
MSE              4,149      8,540     17,644   17,749   17,686       12,524   1,182*
r²               0.798      0.585     0.144    0.139    0.142        0.392    0.942*

Semiconductor Industry Production
MAPE(%)          12.873     10.655*   16.546   16.550   209.100      16.415   10.859
MAD              809.637    632.350*  866.136  866.350  11,530.881   858.827  649.467
MSE              1,846,038  942,916   986,472  986,709  172,959,090  978,221  930,740*
r²               0.235      0.609     0.591    0.591    −70.593      0.595    0.615*

Note: The combination of boldface and a star symbol refers to the best performance using the seven metaheuristics on MAPE, MAD, MSE, and r², respectively. PSO, particle swarm optimization; RGM, rolling grey model; ACO, ant colony optimization; EDA, estimation of distribution algorithms; GA, genetic algorithm; GS, global search; SA, simulated annealing; TABU, Tabu search; MAPE, mean absolute percentage error; MAD, mean absolute deviation; MSE, mean squared error.

the model requires that the data sequence follow a fairly linear trend and have a definite rhythmic pattern of fluctuations. However, the sequences in all three data sets are neither very regular nor seasonal, which causes the irregular information to be removed entirely by the moving-average method. In ARIMA, there is an additional parameter called the differencing degree, which indicates the number of nonseasonal differences used to fine-tune the model to be more accurate than ARMA.
Finding appropriate values of the arguments p and q in the AR(p), ARMA(p, q), and ARIMA(p, d, q) models can be facilitated by plotting the partial autocorrelation function for an estimate of p, and likewise using the autocorrelation function for an estimate of q. We used the forward–backward approach to find p for the AR model, and the AICc recommended by Brockwell and Davis (2009) to find p and q for the ARMA and ARIMA models. These models can be fitted by least squares regression to find the parameter values that minimize the error term after p and q are chosen. More parameters beyond those in the AR model must be fitted for the moving-average part of ARMA and ARIMA. In our experiment, for Financial Intermediation and Real Estate, AR(4), ARMA(3,1), and ARIMA(3, 3, 1) obtained the best performances, respectively. We also tried other combinations of p, q, and d. When we increased the value of d, we found that the forecasting performance improved gradually until d = 4; when d is set to 4, the sample data are not sufficient for fitting the additional parameters required by ARIMA. On the other hand, performance gets worse when q is larger than 1. For Semiconductor Industry Production, with its extremely short sample sequence, only ARIMA(1, 1, 1) works for building the ARIMA forecasting model; the samples are insufficient for fitting the parameters of a higher order ARIMA.
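As a point of reference, fitting such a model takes only a few lines with a standard library. The sketch below (ours, not the setup used in the experiments; the series values are made up) uses the statsmodels package:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# A short yearly series standing in for one of the sample sets (made-up values).
series = np.array([120.0, 141.0, 166.0, 183.0, 214.0, 236.0, 271.0, 305.0])

# ARIMA(1, 1, 1): one AR term, one nonseasonal difference, one MA term.
result = ARIMA(series, order=(1, 1, 1)).fit()
print(result.forecast(steps=5))  # the next five predicted values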

In general, the performances of statistical models depend heavily on their argument orders, as well as on the number of input sample data. We also observed that the statistical models have good prediction accuracy at the first point but often perform poorly at the rest. This indicates that statistical models are usually more suitable for short-term forecasting.

5.1.2. Neural Network. Neural networks, used as nonlinear forecasting models, have gained enormous popularity and success in time series forecasting because there is now growing evidence that economic time series contain nonlinearities. As we expected, NN outperforms most of the other models, as shown in Table 3.

However, many parameters need to be elaborately configured, and there are no established rules for choosing appropriate values of these parameters for economic forecasting. We had to resort to trial and error to obtain appropriate values that lead to the best forecasting performance. Although there have been many studies on how to tune the parameters of NNs, clearly, selection over the whole space of the parameters is beyond the scope of this article.
We examined the performance with different configurations of combinations of three key parameters: train-to-verify ratio (1:1, 2:1, 3:1, 4:1, and 5:1), feedback delays (1–5), and hidden layers (1–10) for Financial Intermediation. However, it is hard to find a rule for the correlations between the parameters and the forecasting performance. Consequently, it is difficult to find an appropriate combination of parameters that brings the model to its best performance in practical economic forecasting, where MAPE and r² are unknown.
During our experiment on NN, we trained the network models 100 times for each configuration with the same parameter setting. The forecasting values with the best performance by NN (the best r² in Table 3) were selected for comparison with the other models in Experiment II. However, we found that there were giant differences in forecasting values among the networks trained with the same configuration. This is caused by the randomness and probability mechanisms inside the NN training methods. Besides, the small number of sample data is another reason that a relatively steady model cannot be trained. The best MAPE shown in Figure 6 is 2.71%, but the worst one is 95.56%; most of the MAPEs are between 10% and 20%. It is impractical to use NN in a real economic forecasting application where the future data are unknown: it is difficult to select the "best" network because the MAPE or the r² cannot be evaluated without the actual data. One possible solution could be calculating the mean forecasting values from all trained networks. However, experimental results show that the average MAPE is above 15% by using this regression method.

Another common criticism of NNs is that they require an enormous amount of data for training in real-world operation. Any machine-learning method needs sufficient representative samples to build the underlying structure so as to be general enough to handle new cases, including forecasting.
FIGURE 6. An illustration showing the high degree of uncertainty in forecasting performance for the Financial Intermediation data set when using NN to construct networks 100 times under the same configuration with a 4:1 train-to-verify ratio, four feedback delays, and three hidden layers: (a) MAPE; (b) MAPE (below 25%); (c) r²; (d) r² (between 0 and 1).

As such, the NN model establishing a network with only four samples for Semiconductor Industry Production was not effective compared with the models trained from the other two data sets with relatively longer sample sequences. We found that the minimum number of samples required by the Matlab Neural Network Toolbox to build an effective NN model is 10.

5.2. Metaheuristics
The metaheuristics were used to search for the optimal parameter α in this study. They use different high-level strategies that address the exploitation and exploration of the search spaces. Exploration generally refers to the identification of new high-quality solutions by visiting entirely new regions of a search space, while exploitation refers to searching within the neighborhood of previously visited regions. One class of metaheuristics, for example, TABU, SA, EDA, and GS, aims to escape from local minima and move on to explore other, better local minima by using different neighborhood structures according to various probability distributions or merely random mechanisms. Metaheuristics such as ACO and EA, as well as PSO, incorporate an intelligent learning component to identify high-quality regions by recombining previous solutions or sampling the search space, so as to strike a balance between exploration and exploitation.
It is widely accepted that it is hardly possible to produce a completely accurate survey of metaheuristics. In Table 5 we list three important characteristics summarized by Blum and Roli (2003) to differentiate among the metaheuristics used in this study. However, we do not go into the details of comparing the effectiveness of exploitation or exploration among them, nor do we analyze the different concepts or philosophies behind them. We will discuss two factors that impact the forecasting results, as well as two metrics besides accuracy that can also validate the forecasting performance of the metaheuristics.

5.2.1. Parameter Settings. Almost all metaheuristics require setting a number of parameters, which might lead to different outcomes, for example, multiple locally optimal solutions in the parameter space in terms of solution quality (Silberholz and Golden 2010). It is believed to be hard to conclude general rules for parameter configuration or to figure out the principles underlying it. Hence, we conducted an experiment

TABLE 5. Three Significant Characteristics on Different Metaheuristics.

         Number of solutions(a)   Dynamic neighborhood structure(b)   History usage(c)
ACO      Population               Yes                                 Yes
EDA      Population               Yes                                 No
GA       Population               Yes                                 No
GS       Single                   No                                  No
SA       Single                   No                                  No
TABU     Single                   No                                  Yes
PSO      Population               Yes                                 Yes

(a) The number of solutions used at the same time. Single means the metaheuristic searches only one point in the search space at any time, while Population means the metaheuristic performs the search process by visiting a set of points at the same time.
(b) Whether the metaheuristic changes the way it constructs the neighborhood in the course of the search.
(c) Whether the metaheuristic makes use of the search history.

that used various parameter configurations of PSO to find out how sensitive the forecasting performance is to the variation of parameter settings for economic prediction.

Acceleration Coefficient. We evaluated the constant setting and the linearly varying settings of c1 and c2 on prediction accuracy. Among the constant settings, the configuration c1 = c2 = 1.5 is the best, which is in accordance with most previous conclusions. In the linearly varying setting (equations (8) and (9)), there is not much improvement in the metrics compared with the constant setting. We also evaluated the forecasting performance with diverse combinations of the start values c1+ and c2− and the end values c1− and c2+ ranging over [0.5, 4] with a step of 0.5, and found that there still is not much difference among them for all three data sets.

Inertia Weight. We evaluated three kinds of w settings: constant, linearly decreasing, and nonlinearly decreasing. In the constant setting, the optimal value is w = 0.5 for all of the data sets. We also observed that the performance stays exactly the same when the population size is 10 with different values of w ranging over [0, 1]. In the linearly decreasing setting (equation (10)), we varied the combinations of w− and w+ ranging over [0, 1] with a step of 0.1. The results showed nearly no difference in the metrics among the different combinations, which indicates that the initial setting does not have much impact on the forecasting performance when w is updated linearly.
We also used the nonlinearly varying setting (equation 11) and the constriction factor χ
(equation 15) to update the particles' velocities (equation 14). Figure 7 shows that the nonlin-
early varying setting, and the constriction factor setting combined with linearly varying c1 and c2,
can improve the prediction performance. The nonlinearly varying method does
not require an initial setting of w⁻ or w⁺; it calculates w dynamically according to the
current situation: a large w is set if the current position is far away from the global best posi-
tion, and a small w is set if the current position is near it. The constriction
factor can slow down the velocities, but it needs to be combined with the linearly varying method to
control the effects of c1 and c2 so that a larger portion of the space is searched.
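A minimal sketch of these two update schemes follows. Because equations (10) and (15) are not reproduced at this point, we assume the standard linearly decreasing inertia weight and the Clerc and Kennedy (2002) constriction factor; the default w bounds are illustrative assumptions:

```python
import math

def linear_w(t, t_max, w_start=0.9, w_end=0.4):
    # Linearly decrease w from w_start (w+) to w_end (w-) over t_max iterations.
    return w_start - (w_start - w_end) * t / t_max

def constriction_chi(c1, c2):
    # Clerc-Kennedy constriction factor; valid when phi = c1 + c2 > 4.
    phi = c1 + c2
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))
```

For example, constriction_chi(2.05, 2.05) returns approximately 0.729, the value commonly used to damp particle velocities.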

Population Size. We observed that the performance of PSO gets worse as the
population size increases, regardless of the other parameter settings. We found that the single parti-
cle PSO achieves the best forecasting accuracy. The results on the population size of the PSO
indicate that exploitation is more significant than exploration when searching for the
FIGURE 7. The influence of different methods of calculating the inertia weight w on MAPE.

optimal parameter in the RGM. In fact, PSO becomes a single-solution-based algorithm rather than
a population-based one when the population size is set to 1.
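The following sketch makes this concrete: a canonical PSO over the scalar RGM parameter α, which degenerates to the single particle search discussed above when the population size is 1. The fitness callable and the search bounds are illustrative assumptions, not the paper's exact implementation:

```python
import random

def pso_alpha(fitness, bounds=(0.0, 1.0), pop_size=1, iter_max=80,
              w=0.5, c1=1.5, c2=1.5):
    """Canonical PSO sketch minimizing `fitness` over a scalar alpha.
    With pop_size=1, the personal and global bests coincide."""
    lo, hi = bounds
    pos = [random.uniform(lo, hi) for _ in range(pop_size)]
    vel = [0.0] * pop_size
    pbest, pbest_fit = pos[:], [fitness(p) for p in pos]
    g = min(range(pop_size), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g], pbest_fit[g]
    for _ in range(iter_max):
        for i in range(pop_size):
            r1, r2 = random.random(), random.random()
            vel[i] = (w * vel[i] + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = min(max(pos[i] + vel[i], lo), hi)  # clamp to bounds
            f = fitness(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i], f
    return gbest, gbest_fit
```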

5.2.2. Train-to-Verify Ratio. We configured various train-to-verify ratios to find out
how the usage of recent data influences performance under the different metaheuristics.
The ratios were set to 1:1, 2:1, 3:1, 4:1, and 5:1 for Financial Intermediation
and Real Estate, and to 1:1 and 3:1 for Semiconductor Industry Production. Because there are
12 sample data in both Financial Intermediation and Real Estate, the ratio 2:1 means that
eight sample data were used as training data for building the RGM, while four sample
data served as verifying data for finding the value of α with the best fitness. Only two
ratios were examined on Semiconductor Industry Production because only four sample
data were used. A larger proportion of training data means that more of the recent data are used for
training; otherwise, more of the recent data are used for verification. In our study, we found that
the prediction accuracy improves as the proportion of training data increases
when using PSO, and all of the metaheuristics follow this rule. This is because the GM
can be trained more effectively when recent data are used. We therefore suggest using a higher
train-to-verify ratio when optimizing the parameter α of the rolling GM for economic prediction.
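A chronological split by this ratio can be sketched as follows. The exact windowing inside the rolling mechanism is not detailed here, so this sketch assumes the chronologically earlier points train the RGM and the most recent points verify the candidate α; as the training share grows, the training window extends further into the recent data:

```python
def split_train_verify(series, train=2, verify=1):
    """Split a short series by a train-to-verify ratio; with 12 samples and
    ratio 2:1 this yields 8 training points and 4 verifying points."""
    n_train = round(len(series) * train / (train + verify))
    return series[:n_train], series[n_train:]
```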

5.2.3. Convergence. Because Sudholt (2008) concluded that the computational com-
plexity of evolutionary algorithms and swarm intelligence remains a challenging issue,
we use convergence speed as one of the evaluation metrics in our empirical study. Conver-
gence speed appears connected with exploitation effectiveness, that is, finding
a set of possible solutions quickly without wasting too much time in regions of the search
space that are either already explored or do not provide high-quality solutions
(Blum and Roli 2003). However, exploration and exploitation are widely believed to be two con-
flicting goals in many applications, which suggests a trade-off between the
convergence speed and the prediction accuracy. Although fast computation is not the primary
concern in economic forecasting, which is mostly monthly, quar-
terly, or yearly based, we compared the convergence speeds of the different metaheuristics by
using a convergence criterion defined as less than 10⁻⁵ difference between 10
consecutive values of the fitness function.
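A direct reading of this stopping rule, as a sketch (we interpret the criterion as the max-min spread over the last 10 fitness values, which is one plausible reading):

```python
def has_converged(fitness_history, window=10, tol=1e-5):
    """Stop when the last `window` consecutive fitness values differ by
    less than `tol` (interpreted here as max - min over the window)."""
    if len(fitness_history) < window:
        return False
    recent = fitness_history[-window:]
    return max(recent) - min(recent) < tol
```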
Figure 8 illustrates the evolution of the fitness at the first predicted year for
all of the data sets. Figure 8(a)–8(c) compares the evolutions among canonical PSOs with
different population sizes. We observed that a larger population size can bring about faster
convergence. However, fast convergence does not mean low runtime: the time complexity
of PSO is O(itermax × popsize × O(fitness)), so the runtime depends on both the population
size and the number of iterations. According to our empirical study, itermax can be set to 60–80
for the single particle PSO. Figure 8(d)–8(f) compares the convergence speed
among variant PSOs. There is no general rule for these PSOs across all of the data sets, but all
PSOs converge within 60–80 iterations at most.
We also evaluated the convergence speed of the other metaheuristics in our study, and found
that the single-solution-based methods have fixed convergence speeds for each data set, while the
population-based methods, ACO, EDA, and GA, have variable convergence speeds. How-
ever, all population-based methods converge between iterations 20 and 60. In addition, we
observed that ACO has the best convergence speed overall.

5.2.4. Certainty. Because most metaheuristic methods incorporate randomness or
probability mechanisms into their operations, the forecasting results usually differ
between trials even under the same configuration.

FIGURE 8. Comparison of the convergence speed of the fitness values among canonical PSOs with different
population sizes ((a)–(c)), and among PSOs with different parameter setting methods ((d)–(f)), when predicting
the first year value for all the data sets.

On the other hand, the best result generated by these methods cannot be known in real forecasting applications,
where the future values are not available to calculate the metrics for comparison. Hence,
the certainty of a metaheuristic is also one of the most significant factors in forecasting
performance.
We defined the degree of certainty using the standard deviation,
DC(M) = \sqrt{\frac{1}{n}\sum_{k=1}^{n}(M_k - \bar{M})^2},
where n is the number of trials, M_k is the value of the kth forecasting trial
on the metric M, and \bar{M} is the average value over all n trials. The DC indicates the degree of the
differences in the metrics among forecasts made under the same configuration; the smaller
the DC, the higher the degree of certainty.
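The DC measure is simply the population standard deviation of a metric over the repeated trials, which can be computed directly:

```python
import math

def degree_of_certainty(values):
    """DC(M): population standard deviation of a metric (e.g., MAPE or r^2)
    over the n forecasting trials."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)
```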
Figure 9 shows the distribution of the results of 100 trials on the two met-
rics using the single particle canonical PSO-RGM under the same configuration for each
data set. The maximum differences in MAPE are 3.52% for Financial Intermediation,
7.19% for Real Estate, and 1.15% for Semiconductor Industry Production. The average
MAPEs over the 100 trials are 7.47%, 9.06%, and 11.53%, compared with the minimum
MAPEs of 5.85%, 3.81%, and 10.85%, respectively. Similarly, the average r² are 0.81, 0.88,
and 0.59, with maximum r² of 0.86, 0.94, and 0.61 for the three data sets, respectively.
The DC(MAPE) are 0.03, 0.08, and 0.01, and the DC(r²) are 0.002, 0.003, and 0.0004,
respectively.
We found that PSO with a larger population size has higher certainty, as shown in
Figure 10. A larger population of particles better guarantees the certainty of the particles' direction
and position toward the globally best region. However, a larger population may lead to a local minimum quickly.

FIGURE 9. The distributions of (a) MAPE and (b) r² over 100 trials using the single particle canonical PSO-RGM.

FIGURE 10. Degree of certainty with different population sizes for the three data sets.

There is thus a trade-off between certainty and accuracy: according to our study, the degree of cer-
tainty decreases as the population size decreases, while the accuracy improves.
Variant PSOs with different parameter settings were also evaluated in this study. Figure 11
shows the differences of the variant PSOs in DC(MAPE) and DC(r²)
compared with the canonical PSO. The PSOs with linearly varying w, or with the constriction factor
χ combined with linearly varying c1 and c2, can improve the degree of certainty; however, only
the PSO with the constriction factor combined with linearly varying c1 and c2 achieves a better
degree of certainty than the canonical PSO for all three data sets.

FIGURE 11. The difference values of variant PSOs against the canonical PSO on DC(MAPE) and DC(r²).

TABLE 6. The Degree of Certainty of Different Metaheuristics.

            ACO         EDA      GA        GS        SA       TABU

Financial Intermediation (train-to-verify ratio 5:1)
DC(MAPE)    0.48        0.0279   4×10⁻⁴    5×10⁻¹⁶   5×10⁻⁴   10⁻²
DC(r²)      0.0412      2×10⁻³   2×10⁻⁵    4×10⁻⁶    2×10⁻⁵   8×10⁻⁴
Real Estate (train-to-verify ratio 5:1)
DC(MAPE)    3.27        0.06     10⁻³      9×10⁻¹⁶   10⁻³     0.0326
DC(r²)      2.77×10⁻⁴   0.01     10⁻⁴      10⁻¹⁷     10⁻⁴     4×10⁻³
Semiconductor Industry Production (train-to-verify ratio 3:1)
DC(MAPE)    0.3         0.02     10⁻¹⁵     10⁻¹⁵     10⁻³     6×10⁻³
DC(r²)      0.0208      2×10⁻⁴   6×10⁻⁶    2×10⁻¹⁷   9×10⁻⁴   10⁻⁴

ACO, ant colony optimization; EDA, estimation of distribution algorithms; GA, genetic algorithm;
GS, global search; SA, simulated annealing; TABU, tabu search; MAPE, mean absolute percentage
error.

Although the nonlinearly varying w method can improve the forecasting performance, it has higher
uncertainty than the other methods.
Table 6 also shows the degrees of certainty of the other metaheuristics. We found that all of the
other metaheuristics have a high degree of certainty except ACO, and GS has the best certainty.
From these empirical studies, a metaheuristic with a degree of certainty of DC(MAPE)
≤ 0.1 and DC(r²) ≤ 0.01 is acceptable for economic prediction.

6. CONCLUSIONS AND FUTURE WORK

A practical economic prediction method not only forecasts a single data point accu-
rately but is also able to accurately forecast a trend that contains several consecutive
data points. Many methods have been proposed for economic prediction. However, these
prediction methods seldom address both objectives on short and noisy data
sequences. Motivated by recent progress in optimization-based prediction, we proposed the
PSO-RGM(1,1) model, which is able to produce reasonable predictions for short and noisy
time series. We evaluated and compared PSO-RGM(1,1) with not only other commonly
used forecasting models but also the RGM(1,1) models optimized by other well-known

metaheuristics, on the Financial Intermediation, Real Estate, and Semiconductor Industry
Production data sets; each of them moves steadily upward in secular trend but is noisy and
short. Experimental results showed that PSO-RGM(1,1) generally outperforms the other
models.
We evaluated other variant PSOs and found that the single particle PSO outperforms
the others in terms of the evaluation metrics, convergence speed, and degree of certainty. We also
observed that nonlinearly varying w can greatly improve the forecasting performance,
but it leads to more uncertainty. An extension of this work is to balance the
conflicts between different evaluation metrics by handling the forecasting problem with a
multi-objective fitness function. Furthermore, our future research will focus on analyzing
the principles of balancing exploitation and exploration of metaheuristics in forecasting.

ACKNOWLEDGMENTS

The authors would like to thank the corresponding editor and the anonymous reviewers
for their valuable comments, which greatly helped to improve the quality of this work.
This work has been supported in part by the National University of Singapore under
grants R-252-000-478-133 and R-252-000-478-750.
This manuscript is submitted to the Special Issue of Computational Intelligence
on Incentives and Trust in E-Commerce with guest editors Stephen Marsh,
Jie Zhang, and Christian Damsgaard Jensen, and assistant editor Zeinab Noorian, email:
z.noorian@unb.ca.

REFERENCES

AKAY, D., and M. ATAK. 2007. Grey prediction with rolling mechanism for electricity demand forecasting of
Turkey. Energy, 32(9):1670–1675.
ALFI, A., and H. MODARES. 2011. System identification and control using adaptive particle swarm optimization.
Applied Mathematical Modelling, 35(3):1210–1221.
BERGH, F. 2002. An analysis of particle swarm optimizers, Ph.D. Thesis, University of Pretoria, Pretoria, South
Africa.
BLUM, C., and A. ROLI. 2003. Metaheuristics in combinatorial optimization: Overview and conceptual
comparison. ACM Computing Surveys, 35(3):268–308.
BROCKWELL, P. J., and R. A. DAVIS. 2009. Time Series: Theory and Methods (2nd ed.). Springer: New York.
CALBOREAN, H., R. JAHR, T. UNGERER, and L. VINTAN. 2013. A comparison of multi-objective algorithms
for the automatic design space exploration of a superscalar system. In Advances in Intelligent Control
Systems and Computer Science, vol. 187. Edited by DUMITRACHE, I., Advances in Intelligent Systems
and Computing. Springer: Berlin Heidelberg, pp. 489–502.
CHANG, S. C., H. C. LAI, and H. C. YU. 2005. A variable P value rolling grey forecasting model for Taiwan
semiconductor industry production. Technological Forecasting and Social Change, 72(5):623–640.
CHEN, C. F., M. C. LAI, and C. C. YEH. 2012. Forecasting tourism demand based on empirical mode
decomposition and neural network. Knowledge-Based Systems, 26(0):281–287.
CLERC, M., and J. KENNEDY. 2002. The particle swarm—explosion, stability, and convergence in a multidi-
mensional complex space. IEEE Transactions on Evolutionary Computation, 6(1):58–73.
DE GOOIJER, J. G., and R. J. HYNDMAN. 2006. 25 years of time series forecasting. International Journal of
Forecasting, 22(3):443–473.

DENG, J. 1989. Grey Prediction and Decision-Making. Huazhong University of Science and Technology
Press: Wuhan, China.
DURILLO, J. J., A. J. NEBRO, F. LUNA, C. A. COELLO COELLO, and E. ALBA. 2010. Convergence speed in
multi-objective metaheuristics: Efficiency criteria and empirical study. International Journal for Numerical
Methods in Engineering, 84(11):1344–1375.
EBERHART, R., and J. KENNEDY. 1995. A new optimizer using particle swarm theory. In Proceedings of the 1995
6th International Symposium on Micro Machine and Human Science, Nagoya, Japan, pp. 39–43.
HE, W., Z. WANG, and H. JIANG. 2008. Model optimizing and feature selecting for support vector regression in
time series forecasting. Neurocomputing, 72(1–3):600–611.
HSU, C. C., and C. Y. CHEN. 2003. Applications of improved grey prediction model for power demand
forecasting. Energy Conversion and Management, 44(14):2241–2249.
HSU, L., and C. WANG. 2007. Forecasting the output of integrated circuit industry using a grey model improved
by the Bayesian analysis. Technological Forecasting and Social Change, 74(6):843–853.
HSU, L. C. 2003. Applying the grey prediction model to the global integrated circuit industry. Technological
Forecasting and Social Change, 70(6):563–574.
HSU, L. C. 2010. A genetic algorithm based nonlinear grey Bernoulli model for output forecasting in integrated
circuit industry. Expert Systems with Applications, 37(6):4318–4323.
HSU, L. C. 2011. Using improved grey forecasting models to forecast the output of opto-electronics industry.
Expert Systems with Applications, 38(11):13879–13885.
JO, T. C. 2003. The effect of virtual term generation on the neural based approaches to time series prediction. In
The IEEE Fourth Conference on Control and Automation. Concordia University, Montreal, Canada, Vol. 3,
pp. 516–520.
JU-LONG, D. 1982. Control problems of grey systems. Systems and Control Letters, 1(5):288–294.
KANDEL, A. 1991. Fuzzy Expert Systems. CRC Press: Boca Raton, FL.
KAYACAN, E., B. ULUTAS, and O. KAYNAK. 2010. Grey system theory-based models in time series prediction.
Expert Systems with Applications, 37(2):1784–1789.
KENNEDY, J., R. C. EBERHART, and Y. SHI. 2001. Swarm Intelligence. Morgan Kaufmann Publishers: San
Francisco.
LIU, X. Q., B. W. ANG, and T. N. GOH. 1991. Forecasting of electricity consumption: A comparison between
an econometric model and a neural network model. In IEEE International Joint Conference on Neural
Networks, Seattle, WA, Vol. 2, pp. 1254–1259.
MA, J., and J. TENG. 2004. Predict chaotic time-series using unscented Kalman filter. In Proceedings of 2004
International Conference on Machine Learning and Cybernetics, Shanghai, China, Vol. 2, pp. 687–690.
MIN, H., H. JEUNG KO, and C. SEONG KO. 2006. A genetic algorithm approach to developing the multi-echelon
reverse logistics network for product returns. Omega, 34(1):56–69.
POLI, R. 2008. Analysis of the publications on the applications of particle swarm optimisation. Journal of
Artificial Evolution and Applications, 2008:1–10.
QUAH, T. S., and B. SRINIVASAN. 1999. Improving returns on stock investment through neural network
selection. Expert Systems with Applications, 17(4):295–301.
RABINER, L. R. 1989. A tutorial on hidden Markov models and selected applications in speech recognition.
Proceedings of the IEEE, 77(2):257–286.
RATNAWEERA, A., S. K. HALGAMUGE, and H. C. WATSON. 2004. Self-organizing hierarchical particle swarm
optimizer with time-varying acceleration coefficients. IEEE Transactions on Evolutionary Computation,
8(3):240–255.
ROMAN, J., and A. JAMEEL. 1996. Backpropagation and recurrent neural networks in financial analysis of
multiple stock market returns. In IEEE System Sciences Proceedings of the 29th Hawaii International
Conference, Maui, HI, Vol. 2, pp. 454–460.

SHEN, J., Z. CANXIN, C. LIAN, H. HU, and M. MAMMADOV. 2010. Investment decision model via an improved
BP neural network. In 2010 IEEE International Conference on Information and Automation (ICIA), Harbin,
China, pp. 2092–2096.
SHI, Y., and R. C. EBERHART. 1998. Parameter selection in particle swarm optimization. In Evolutionary Pro-
gramming VII, vol. 1447. Edited by PORTO, V. W., N. SARAVANAN, D. WAAGEN, and A. E. EIBEN, Lecture
Notes in Computer Science. Springer: Berlin Heidelberg, pp. 591–600.
SHI, Y., and R. C. EBERHART. 1999. Empirical study of particle swarm optimization. In CEC 99. Proceedings
of the 1999 Congress on Evolutionary Computation, Washington, DC, Vol. 3, p. 1950.
SILBERHOLZ, J., and B. GOLDEN. 2010. Comparison of metaheuristics. In Handbook of Metaheuristics. Edited
by GENDREAU, M., and J. Y. POTVIN. Springer: New York, pp. 625–640.
SUDHOLT, D. 2008. Computational complexity of evolutionary algorithms, hybridizations, and swarm intelli-
gence, Ph.D. Thesis, Dortmund University of Technology, Dortmund, Germany.
TAN, G. 2000. The structure method and application of background value in grey system GM(1,1) model (I).
Systems Engineering-Theory and Practice, 2000(4):98–103.
TANG, H. W. V., and M. S. YIN. 2012. Forecasting performance of grey prediction for education expenditure
and school enrollment. Economics of Education Review, 31(4):452–462.
TKACZ, G. 2001. Neural network forecasting of Canadian GDP growth. International Journal of Forecasting,
17(1):57–69.
TRELEA, I. C. 2003. The particle swarm optimization algorithm: convergence analysis and parameter selection.
Information Processing Letters, 85:317–325.
YAO, M. J., and W. M. CHU. 2008. A genetic algorithm for determining optimal replenishment cycles to
minimize maximum warehouse space requirements. Omega, 36(4):619–631.
YOKUMA, J. T., and J. S. ARMSTRONG. 1995. Beyond accuracy: comparison of criteria used to select forecasting
methods. International Journal of Forecasting, 11(4):591–597.
ZHAO, Z., J. WANG, J. ZHAO, and Z. SU. 2012. Using a grey model optimized by differential evolution algorithm
to forecast the per capita annual net income of rural households in China. Omega, 40(5):525–532.

APPENDIX

TABLE A1. The Parameter Configurations in Our Experiments.

Parameters

Models
AR Scalar: n/2, where n is the length of the sample data
Parameter fitting: forward–backward approach (a)

ARMA Number of AR lags and MA lags: 1 and 1


Parameter fitting: AICc

ARIMA Number of AR lags and MA lags: 3 and 1


Differencing degree: 3
Parameter fitting: AICc

Volterra Delays, dimensions, and Volterra orders: 1, 1, and 1


NN (b) Network type: nonlinear autoregressive network (NAR)
Feedback delays: 1–5
Hidden layer size: 1–10
Pre/postprocessing functions: removeconstantrows/mapminmax (c)
Training and performance function: Levenberg–Marquardt and MSE
Metaheuristics
ACO Maximum iterations: 100
Population size: 1,000
Pheromone deposition and evaporation fraction: 1 and 0.7
Moving speed: 0.1
TABLE A1. Continued

Parameters

Metaheuristics
EDA Maximum iterations: 100
Population size: 1,000
Learning and sampling method: mixture of full Gaussian model
Replacement method: choose best elitism
Selection method: truncation selection

GA Population size: 20
Generations: 100
Crossover fraction: 0.8
Migration interval and fraction: 20 and 0.2
Selection/crossover/mutation functions: stochastic/scattered/Gaussian

GS Local search algorithm: trust region reflective (d)

SA Initial and stop temperature: 1 and 10⁻⁸
Annealing rate: 0.8

TABU Maximum iterations: 200
TABU list length: random number between 6 and 13
Local search scale: 5
(a) We also used four other algorithms for computing the least-squares
autoregressive model, including Burg's lattice-based method, the geometric
lattice approach, the least-squares approach, and the Yule–Walker approach,
and found that the forward–backward approach obtained the best results
in comparison.
(b) The combinations of the NN parameters that led to the best performance
differed among the three data sets. We chose the best predicted values
and metrics from the NN models with different combinations on different
data sets in our subsequent comparison studies.
(c) removeconstantrows preprocesses matrices by removing rows with constant
values. mapminmax postprocesses matrices by mapping row minimum and
maximum values to [0, 1].
(d) We also tested three other algorithms for finding neighbor values,
including active set, interior point, and SQP. However, the performance of
the different local search algorithms does not seem to differ much.
