Grey system theory has been widely used to forecast economic data that are often nonlinear, irregular, and nonstationary. Current forecasting models based on grey system theory can adapt to various economic time series. However, these models ignore the importance of model parameter optimization and of using recent data, which leads to poor forecasting accuracy. In this article, we propose a novel forecasting model, the particle swarm optimization rolling grey model (PSO-RGM(1,1)), based on a rolling-mechanism GM whose parameters are optimized by the particle swarm optimization algorithm. This simple model is shown to be very effective in forecasting tertiary industry data sequences, which are short and noisy but regular in secular trend. The experimental results show that PSO-RGM(1,1) outperforms other commonly used forecasting models on three real economic data sets. Our empirical study shows that PSO is the best overall algorithm for optimizing the parameter of RGM compared with other well-known metaheuristics. Furthermore, we evaluated several PSO variants and found that single-particle PSO outperforms the others overall in terms of prediction accuracy, convergence speed, and degree of certainty.
1. INTRODUCTION
Address correspondence to Jianzhou Wang, School of Information Science and Engineering, Lanzhou University, Gansu
730000, China; e-mail: wjz@lzu.edu.cn
Correction added on 29 October 2015, after first online publication: the first affiliation has been added for Li Liu and the rest
of the author affiliations have been reordered as a result of this change.
and Jameel 1996), support vector machines (De Gooijer and Hyndman 2006; He et al. 2008; Shen et al. 2010; Tkacz 2001), fuzzy systems (Kandel 1991), linear regression, Kalman filtering (Ma and Teng 2004), and hidden Markov models (Rabiner 1989). All of these approaches have been used to learn forecasting models. Statistical models are generally not as accurate as learning-based approaches for nonlinear problems (Kayacan et al. 2010) and are usually used for short-term forecasting. Machine-learning-based approaches, in turn, are often limited by insufficient data (Hsu 2003) and relatively long training times (Jo 2003).
However, tertiary industry economic data are often highly nonlinear, irregular, and nonstationary, yet upward moving in secular trend. Moreover, the data sequences generated from yearly or quarterly tertiary industry reports are often short. On the other hand, risk management requires not only the prediction of individual data points but also the estimation of the trend in the tertiary industry. It is therefore very difficult to fit a model with conventional linear statistical methods or NNs to forecast either short-term or medium-term production from such noisy and insufficient data sets. In this study, grey prediction theory is employed to alleviate the problem.
Grey system theory was introduced and developed by Deng (1989) for the mathematical analysis of data sets with uncertainty and roughness. It requires only a small set of training data, which may be discrete or incomplete, to construct a forecasting model. Data with uncertainty and roughness are called "grey" data. Grey system theory has been widely and successfully used for short-term forecasting in many areas. The grey model (GM) has been shown to be more robust to noise and to a lack of modeling information than conventional methods, because grey predictors adapt their variables to new conditions as new outputs become available. In recent years, GM has been optimized in many different ways. The rolling mechanism (RM) is one of the most effective methods for improving the performance of GM (Akay and Atak 2007; He et al. 2008; Ju-Long 1982; Tang and Yin 2012). It incorporates recent data to handle noisy sequences. Tang and Yin (2012) improved the forecasting accuracy for education expenditure with RM. Zhao et al. (2012) forecasted the per capita annual net income of rural households in China using RM and obtained greater accuracy. Moreover, RM extends GM to relatively long-term forecasting.
Recently, researchers have also paid close attention to optimization to improve the predictive ability of GM, whose parameters are otherwise constant or set manually (Kayacan et al. 2010). Several optimization techniques, such as the genetic algorithm (GA) (Min et al. 2006; Yao and Chu 2008) and NN (Hsu 2010, 2011; Hsu and Chen 2003), have been proposed. However, they did not consider the impact of the most recent data in the sequence. In this article, we propose an RM-based GM optimized by the particle swarm optimization (PSO) algorithm to handle short and noisy but secularly regular data sequences. The purpose of the proposed method is to improve not only the prediction accuracy for individual years but also the accuracy of the trend over the next few years. PSO, developed by Eberhart and Kennedy (1995), is considered a tool for difficult numerical optimization problems. The PSO algorithm has been enormously successful in about 700 applications (Poli 2008). It does not require that the optimization problem be differentiable. More specifically, the empirical studies conducted by Durillo et al. (2010) and Calborean et al. (2013) showed that PSO was the best overall algorithm in terms of convergence speed on multiobjective optimization problems and in the automatic design space exploration of superscalar computer systems. Our empirical study also shows that PSO is the best overall algorithm for parameter optimization when considering the accuracy, convergence speed, and degree of certainty of economic predictions.
PSO-RGM IN ECONOMIC PREDICTION
This article is organized as follows. Section 2 outlines the basic GM and the improved GM with RM. Section 3 presents the proposed forecasting model. We report the experiments and evaluation results on three economic data sets in Section 4. Section 5 discusses the performance of other forecasting models and PSO variants, and the effects of other model factors. Section 6 concludes the article.
Grey system theory focuses on extracting the realistic governing laws of a system from the available data, which generally contain white noise. A GM in grey system theory is denoted GM(n,m), where n indicates the order of the difference equation and m indicates the number of variables. Although various types of GMs exist, we focus on the GM(1,1) model in this study because of its computational efficiency. Moreover, GM(1,1) needs to fit only one variable and can therefore be used to model insufficient data sequences.
where $x^{(0)}(i)$ is the time series datum at time $i$, and $n$ is the length of the sequence, which must be equal to or larger than 4.

On the basis of the initial sequence $x^{(0)}$, a new sequence

$$x^{(1)} = \left(x^{(1)}(1),\, x^{(1)}(2),\, \ldots,\, x^{(1)}(n)\right)$$

is generated by the first-order accumulated generating operation (1-AGO), where $x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i)$.

For instance, a time series sequence $x^{(0)} = (1, 2, 3, 4, 5)$ representing 5-year economic data does not have a clear regularity. Applying accumulated generation to $x^{(0)}$ yields the new sequence $x^{(1)} = (1, 3, 6, 10, 15)$, which has a clear growing tendency.
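The accumulated generating operation above can be sketched in a few lines (an illustrative sketch, not the authors' code):

```python
# 1-AGO: the first-order accumulated generating operation used to smooth
# an irregular raw sequence into one with a clear growing tendency.
from itertools import accumulate

def ago(x0):
    """Return the 1-AGO sequence x1, where x1[k] = sum(x0[:k+1])."""
    return list(accumulate(x0))

x0 = [1, 2, 3, 4, 5]   # raw 5-year economic data
x1 = ago(x0)           # accumulated sequence: [1, 3, 6, 10, 15]
```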
Step 2: Establishing the first-order differential equation of GM(1,1) as

$$\frac{dx^{(1)}}{dt} + a x^{(1)} = b \qquad (1)$$
and its difference equation is

$$x^{(0)}(k) + a z^{(1)}(k) = b, \qquad k = 2, 3, \ldots, n, \qquad (2)$$

where $a$ is the development coefficient, $b$ is the driving coefficient, and $z^{(1)} = \left(z^{(1)}(2), z^{(1)}(3), \ldots, z^{(1)}(n)\right)$ is the background-value sequence generated by $z^{(1)}(k) = \alpha x^{(1)}(k) + (1-\alpha)\, x^{(1)}(k-1)$.
In the preceding text, $P = [a, b]^T$ is the vector of coefficient parameters, which can be computed by the least squares method:

$$P = \left(B^T B\right)^{-1} B^T Y_N \qquad (4)$$
Step 4: Substituting $P$ from equation (4) into equation (3), the predicted value of $x^{(1)}$ at time $k$ is

$$\hat{x}^{(1)}(k) = \left(x^{(1)}(1) - \frac{b}{a}\right) e^{-a(k-1)} + \frac{b}{a}. \qquad (5)$$
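Equations (1) through (5) can be sketched as follows. This is a minimal illustration of the standard GM(1,1) construction with background weight α (default 0.5), not the authors' implementation; the matrix $B$ stacks the negated background values $z^{(1)}(k)$ against a column of ones so that least squares recovers $P = [a, b]^T$ per equation (4):

```python
import numpy as np

def gm11_fit(x0, alpha=0.5):
    """Fit GM(1,1): return (a, b) by least squares on the difference equation."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                              # 1-AGO
    z1 = alpha * x1[1:] + (1 - alpha) * x1[:-1]     # background values z^(1)(k)
    B = np.column_stack([-z1, np.ones_like(z1)])    # rows: [-z^(1)(k), 1]
    Y = x0[1:]                                      # Y_N
    a, b = np.linalg.lstsq(B, Y, rcond=None)[0]     # P = (B^T B)^{-1} B^T Y_N
    return a, b

def gm11_predict(x0, a, b, steps):
    """Forecast `steps` future values via equation (5) and inverse AGO."""
    x0 = np.asarray(x0, dtype=float)
    k = np.arange(len(x0) + steps)                  # k = 0 corresponds to time 1
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a   # equation (5)
    x0_hat = np.diff(x1_hat, prepend=0.0)           # inverse AGO restores x^(0)
    return x0_hat[len(x0):]
```

For a near-exponential sequence the fit is essentially exact; a negative development coefficient $a$ corresponds to a growing series.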
Algorithm: RGM(1,1).
Input:
$x^{(0)} = \left(x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(l)\right)$ — a sequence of sample data.
Output:
$\hat{x}^{(0)} = \left(\hat{x}^{(0)}(l+1), \hat{x}^{(0)}(l+2), \ldots, \hat{x}^{(0)}(l+n)\right)$ — a sequence of predicted data.
Parameters:
l — the number of sample data used to build the GM(1,1) model in each rolling loop.
m — the number of data predicted in each loop; n data are to be predicted in total, with m ≤ n.
k — an integer called the rolling number, $k = \lceil n/m \rceil$.
1: $R_s = \left(x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(l)\right)$;
2: Build a GM(1,1) model using the data set $R_s$ by the algorithm described in Section 2.1;
3: $k = 1$;
4: WHILE $k \le \lceil n/m \rceil$ DO
5: Calculate $\hat{R}_m = \left(\hat{x}^{(0)}(l+(k-1)m+1), \hat{x}^{(0)}(l+(k-1)m+2), \ldots, \hat{x}^{(0)}(l+km)\right)$
which can improve the prediction accuracy of the subsequent data. Similarly, it predicts the datum of the eighth year, $\hat{x}^{(0)}(8)$, with the model built from its former 5 years' data $\left(x^{(0)}(3), x^{(0)}(4), x^{(0)}(5), \hat{x}^{(0)}(6), \hat{x}^{(0)}(7)\right)$. In total, three loops are calculated in RGM(1,1) to predict $\hat{x}^{(0)}(6)$, $\hat{x}^{(0)}(7)$, and $\hat{x}^{(0)}(8)$. Hence, the rolling number k is 3.
Because α directly influences the calculation of a and b in the GM(1,1) model and is one of the most important factors affecting model performance, we present an algorithm based on RGM(1,1) combined with PSO, which optimizes the parameter α in each rolling period to improve the forecasting accuracy.
Algorithm: α-RGM(1,1).
Input:
$x^{(0)} = \left(x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(l)\right)$ — a sequence of sample data.
Output:
$\hat{x}^{(0)} = \left(\hat{x}^{(0)}(l+1), \hat{x}^{(0)}(l+2), \ldots, \hat{x}^{(0)}(l+n)\right)$ — a sequence of predicted data.
Parameters:
l — the number of sample data used to build the GM(1,1) model in each rolling loop.
m — the number of data predicted in each loop; n data are to be predicted in total, with m ≤ n.
k — an integer called the rolling number, $k = \lceil n/m \rceil$.
1: $R_s = \left(x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(l)\right)$;
2: $k = 1$;
3: WHILE $k \le \lceil n/m \rceil$ DO
4: Find the best value of α by using a given strategy;
5: Build a GM(1,1) model with the parameter α using the data set $R_s$;
6: Calculate $\hat{R}_m = \left(\hat{x}^{(0)}(l+(k-1)m+1), \hat{x}^{(0)}(l+(k-1)m+2), \ldots, \hat{x}^{(0)}(l+km)\right)$ by GM(1,1);
7: $R_s = \left(x^{(0)}(km+1), \ldots, x^{(0)}(l+(k-1)m), \hat{x}^{(0)}(l+(k-1)m+1), \ldots, \hat{x}^{(0)}(l+km)\right)$, where entries with index greater than $l$ are previously predicted values;
8: $k = k + 1$;
9: END WHILE
10: RETURN $\hat{x}^{(0)} = \left(\hat{x}^{(0)}(l+1), \hat{x}^{(0)}(l+2), \ldots, \hat{x}^{(0)}(l+n)\right)$.
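The rolling loop above can be sketched as follows (an illustrative sketch; the function and parameter names are ours, and the α-selection strategy of line 4 is left as a pluggable callback, which in the article would be the α-PSO algorithm):

```python
import numpy as np

def gm11_forecast(window, alpha, steps):
    """Fit GM(1,1) with background weight alpha and forecast `steps` ahead."""
    x0 = np.asarray(window, dtype=float)
    x1 = np.cumsum(x0)                              # 1-AGO
    z1 = alpha * x1[1:] + (1 - alpha) * x1[:-1]     # background values
    B = np.column_stack([-z1, np.ones_like(z1)])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]
    k = np.arange(len(x0) + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    return np.diff(x1_hat, prepend=0.0)[len(x0):]

def rgm_rolling(x0, l, m, n, choose_alpha=lambda window: 0.5):
    """Predict n future values, m per rolling loop (lines 1-10 above)."""
    window = list(x0[:l])                  # R_s
    preds = []
    while len(preds) < n:                  # k = 1 .. ceil(n/m)
        alpha = choose_alpha(window)       # line 4: strategy (e.g., alpha-PSO)
        step = gm11_forecast(window, alpha, m)
        preds.extend(step)
        window = window[m:] + list(step)   # line 7: slide the window forward
    return preds[:n]
```

With the default constant α = 0.5 this reduces to plain RGM(1,1); swapping in an optimizer for `choose_alpha` yields the PSO-RGM behavior.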
3.3.1. α-PSO Algorithm. In this section, we present the α-PSO algorithm, which finds the best value of α based on PSO. PSO, consisting of a swarm of particles, iteratively searches large spaces of candidate solutions, where fitness is calculated by a certain quality measure, with few or no assumptions about the problem being optimized. Each particle has a position, which represents a possible solution, and a velocity, which represents the direction and magnitude of the move to the next solution. Both variables are updated at every iteration according to the update equations (α-PSO Algorithm, lines 27-29). During the movement, each particle chooses its next velocity and position toward the best area based on its own best fitness value and the population's best fitness value at every iteration.
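The particle updates described above can be sketched as a generic PSO minimizer (a sketch under the usual PSO conventions; the function name, bounds, and defaults are ours, not the paper's listing):

```python
import random

def pso_minimize(fitness, lo=0.0, hi=1.0, popsize=5, iters=100,
                 w=0.5, c1=2.0, c2=2.0):
    """Minimize a 1-D fitness function over [lo, hi] with canonical PSO."""
    pos = [random.uniform(lo, hi) for _ in range(popsize)]
    vel = [0.0] * popsize
    pbest = pos[:]                                   # personal best positions
    pfit = [fitness(p) for p in pos]
    g = min(range(popsize), key=lambda i: pfit[i])
    gbest, gfit = pbest[g], pfit[g]                  # global best
    for _ in range(iters):
        for i in range(popsize):
            r1, r2 = random.random(), random.random()
            vel[i] = (w * vel[i]
                      + c1 * r1 * (pbest[i] - pos[i])   # cognitive pull
                      + c2 * r2 * (gbest - pos[i]))     # social pull
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))  # clamp to [lo, hi]
            f = fitness(pos[i])
            if f < pfit[i]:
                pbest[i], pfit[i] = pos[i], f
                if f < gfit:
                    gbest, gfit = pos[i], f
    return gbest, gfit
```

With popsize = 1 this reduces to the single-particle PSO the article later evaluates, in which the cognitive and social pulls track the same best position.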
Algorithm: α-PSO.
Input:
$x_s^{(0)} = \left(x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(q)\right)$ — a sequence of training data.
$x_p^{(0)} = \left(x^{(0)}(q+1), x^{(0)}(q+2), \ldots, x^{(0)}(q+d)\right)$ — a sequence of verifying data.
COMPUTATIONAL INTELLIGENCE
Output:
$\alpha_{best}$ — the value of α with the best fitness value in the particle search space.
Parameters:
$\alpha_i^{iter}$ is the $i$th candidate solution at the $iter$th iteration. $pBest_i$ is the best fitness value of the $i$th candidate found so far. $gBest$ is the best fitness value found so far among all the candidates.
which calculates the fitness value of the $i$th candidate at the $iter$th iteration. The fitness function indicates the average degree of forecasting bias with respect to the actual data. Theoretically, the range of the forecasting bias is $[0, \infty)$, and sigmoid(·) is the sigmoid function that maps $[0, \infty)$ to $[0, 1)$. It is mathematically formulated as

$$\mathrm{sigmoid}(z) = \frac{z}{\sqrt{1 + z^2}}. \qquad (7)$$
The candidate with the minimum fitness value is selected as the best solution at the current iteration. The α-PSO algorithm updates all candidates according to their fitness values by using the velocity and position evolution equations. Finally, the candidate with the best fitness value among the per-iteration bests is selected as the solution for α.
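The bounded fitness mapping of equation (7) can be sketched as follows; since equation (6) defining the exact bias is not reproduced in this excerpt, a MAPE-like average is assumed here as an illustrative stand-in:

```python
import math

def sigmoid(z):
    """Equation (7): monotone map from [0, inf) onto [0, 1)."""
    return z / math.sqrt(1.0 + z * z)

def fitness(actual, predicted):
    """Squash an average forecasting bias (assumed MAPE-like) into [0, 1)."""
    bias = sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
    return sigmoid(bias)
```

A perfect forecast maps to fitness 0, and arbitrarily large biases approach, but never reach, 1, which keeps candidate fitness values comparable.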
$$c_2(iter) = c_2^- + \left(c_2^+ - c_2^-\right)\frac{iter}{iter_{max}}. \qquad (9)$$
A proper value of the inertia weight provides a balance between global and local exploration. A large inertia weight favors global search (GS), while a small inertia weight favors local search (Shi and Eberhart 1998, 1999). In general, settings near 1 facilitate GS, and settings in the range [0.2, 0.5] facilitate rapid local search. Shi and Eberhart (1999) suggested linearly decreasing weight control, in which the inertia weight is dynamically adapted by equation (10); $w^+$ and $w^-$ are usually set to 0.9 and 0.4.

$$w(iter) = w^+ - \left(w^+ - w^-\right)\frac{iter}{iter_{max}}. \qquad (10)$$
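The linearly decreasing schedule of equation (10) can be sketched in one line (an illustrative sketch with the usual defaults of 0.9 and 0.4):

```python
def linear_inertia(it, iter_max, w_plus=0.9, w_minus=0.4):
    """Equation (10): w falls linearly from w_plus (global) to w_minus (local)."""
    return w_plus - (w_plus - w_minus) * it / iter_max
```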
The nonlinearly decreasing inertia weight (equation 11) was proposed by Alfi and Modares (2011), who incorporated the hyperbolic tangent function (equation 12) to update $w_i$ for each particle $i$:

$$w_i(iter) = \frac{1}{1 + \tanh\left(NI_i^{iter}\right)}, \qquad (11)$$

$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}, \qquad (12)$$
where $NI_i^{iter}$ is the neighborhood index of particle $i$, calculated at each iteration as

$$NI_i^{iter} = \frac{\mathrm{fitness}\left(\alpha_i^{iter}\right) - gWorst^{iter}}{gBest^{iter} - gWorst^{iter}}, \qquad (13)$$

where $gWorst^{iter}$ is the global worst fitness value at the current iteration. A small $NI_i^{iter}$ indicates that the current position is bad and needs global exploration with a large inertia weight. Conversely, a large $NI_i^{iter}$ indicates that local exploitation with a small inertia weight is required.
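Equations (11) through (13) can be sketched as follows (an illustrative sketch; names are ours). A particle whose fitness is near the swarm's worst gets a neighborhood index near 0 and hence a large weight for global exploration; one near the best gets an index near 1 and a smaller weight for local exploitation:

```python
import math

def neighborhood_index(fit_i, g_best, g_worst):
    """Equation (13): normalize a particle's fitness between worst and best."""
    return (fit_i - g_worst) / (g_best - g_worst)

def adaptive_inertia(fit_i, g_best, g_worst):
    """Equations (11)-(12): w_i = 1 / (1 + tanh(NI_i))."""
    return 1.0 / (1.0 + math.tanh(neighborhood_index(fit_i, g_best, g_worst)))
```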
The constriction factor was proposed by Clerc and Kennedy (2002) to control the magnitude of the velocities instead of using w. The velocity update scheme (α-PSO Algorithm, line 27) is replaced with the following:
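The Clerc and Kennedy (2002) constriction update takes the standard form sketched below; the coefficients $c_1 = c_2 = 2.05$ are the values commonly used with this scheme, not values taken from this article:

```python
import math
import random

def constriction_velocity(v, x, pbest, gbest, c1=2.05, c2=2.05):
    """Standard Clerc-Kennedy constriction: chi scales the whole update."""
    phi = c1 + c2                                   # must exceed 4
    chi = 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))
    r1, r2 = random.random(), random.random()
    return chi * (v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x))
```

For φ = 4.1 the factor χ ≈ 0.7298, which damps the velocities and guarantees convergence without explicit velocity clamping.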
Real Estate, and Semiconductor Industry Production. All of these data sets were collected from the China Statistical Yearbook, National Bureau of Statistics of China, and are shown in Figure 2.

Financial Intermediation in Beijing from 1994 to 2010 has a relatively smooth trend. Real Estate in Beijing from 1994 to 2010 appears noisy, which may be caused by Chinese government policy on real estate and the financial crisis in 2008. Taiwan Semiconductor Industry Production appears regular from 1994 to 2000 but irregular after 2000. All of these data sets are small: 17 data points in each of the first two, and nine data points in the last.
We used three evaluation metrics to evaluate the overall accuracy of multiple predicted points: mean absolute percentage error (MAPE), mean absolute deviation (MAD), and mean squared error (MSE), which are widely adopted (De Gooijer and Hyndman 2006; Tang and Yin 2012). MAPE is a generally accepted metric for prediction accuracy. The formula of MAPE (Hsu and Wang 2007) is listed in Table 1.
$$\mathrm{MAPE}(\%) = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{x^{(0)}(i) - \hat{x}^{(0)}(i)}{x^{(0)}(i)}\right|. \qquad (17)$$
Mean absolute deviation and MSE measure the average magnitude of the forecast errors, but the latter imposes a greater penalty on one large error than on several small errors. The smaller the values, the closer the predicted values are to the actual values (Chen et al. 2012).

[Figure 2 panels: (a) Financial Intermediation, (b) Real Estate, (c) Semiconductor Industry Production; each plots Value (100 million yuan) against year.]

FIGURE 2. The experiments used three data sets, each of which was split into two groups: sample data and a sequence of five test data. A portion of the sample data was used to train the prediction model, while the rest was retained for model verification.
$$\mathrm{MAD} = \frac{1}{n}\sum_{i=1}^{n}\left|x^{(0)}(i) - \hat{x}^{(0)}(i)\right|, \qquad (18)$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(x^{(0)}(i) - \hat{x}^{(0)}(i)\right)^2. \qquad (19)$$
The sum of squares total (SST) measures the deviations of the observations from their mean. The sum of squares error (SSE) measures the deviations of the observations from their predicted values. The smaller the SSE, the more reliable the predictions obtained from the model. Therefore, the higher the value of r², the more successful the model is at predicting statistical data (Zhao et al. 2012). The maximum value of the coefficient of determination r² is 1. Whereas MAPE, MAD, and MSE measure the mean performance over all predicted data points, r² indicates how well the predicted data points fit the trend.
from 1994 to 1997 in Semiconductor Industry Production to predict the next five data points is a real challenge for most forecasting methods.

Experiment I aims to compare PSO-RGM with the two basic GMs. GM and RGM were constructed with a fixed α value of 0.5. In RGM and PSO-RGM, the rolling number k equals 5. In PSO-RGM, the train-to-verify ratio q:d was set to 3:1, namely q = 9 and d = 3 for Financial Intermediation and Real Estate, and q = 3 and d = 1 for Semiconductor Industry Production.
The purpose of Experiment II is to compare PSO-RGM with five other well-known economic forecasting models: the autoregressive model (AR), ARMA, ARIMA, the Volterra model (Volterra), and the neural network (NN). We also considered other commonly used forecasting models, such as the Markov forecasting model (MM), the hidden Markov forecasting model, and the generalized autoregressive conditional heteroskedasticity model. However, these models require a relatively large number of samples for training. For instance, the number of observations produced from the sample set in our study (e.g., three training data points in Semiconductor Industry Production) is insufficient to fit the variables of the Markov forecasting model.
Experiment III was designed to compare different metaheuristics for optimizing the parameter α of the RGM. Six well-known metaheuristics, namely ant colony optimization (ACO), estimation of distribution algorithms (EDA), GA, GS, simulated annealing (SA), and tabu search (TABU), were compared with PSO using the same objective function in equation (6) to find the best value of the parameter α. Each RGM model with a different metaheuristic is implemented simply by replacing the α-PSO algorithm with the corresponding metaheuristic with the same inputs and outputs (see the α-RGM algorithm, line 4).
For all the learning-based methods, including NN and all metaheuristics, we split the sample set into two parts: a training set and a verifying set. The train-to-verify ratio was set to 3:1. We discuss the effect of the train-to-verify ratio in Section 5.2.2.
The parameters used by the different forecasting models and metaheuristics (Appendix: Table 1) were selected based on analyses from the literature or from this study. We discuss the influence of the parameter settings on the prediction metrics, but we do not intend to study the properties of each model or metaheuristic, nor to analyze the similarities and differences between them in detail, which are beyond the scope of this study.
For PSO-RGM, we set popsize = 1 and itermax = 100. We selected the weights c1 = 2, c2 = 2, and w = 0.5. Each forecasting model and each optimized RGM model was executed 100 times, because the randomness or probability mechanisms within them can produce varying results. We chose the predicted values with the best r² for our comparisons. The certainty of the models and metaheuristics is discussed in Section 5.2.4.
4.4. Experiment I
Table 2 shows the prediction results of the three algorithms. PSO-RGM obtains the best overall prediction accuracy. For Financial Intermediation, GM obtains the best single-point accuracy at the data points in 2007 and 2010, with PEs of 11.49% and 3.6%, respectively. PSO-RGM has the lowest PEs at the other points. Moreover, PSO-RGM outperformed both GM and RGM on MAPE, MAD, MSE, and r². Figures 3(a) and 3(d) show that very little difference exists among the three models.
For Real Estate, Figure 3(b) shows that the PEs obtained by PSO-RGM, all under 7%, were smaller than those of GM and RGM. It also shows that PSO-RGM forecasts the trend very well, catching the descending trend between 2009 and 2010. Therefore, PSO-RGM obtains a very high r². Both GM and RGM performed poorly on every metric.
TABLE 2. The Parameters and Predicted Values Calculated by GM, RGM, and PSO-RGM, Respectively, and the Comparison of the Evaluation Metrics on These Three Models.

                      GM                      RGM                     PSO-RGM
Year   Actual   α     Predicted  PE(%)   α     Predicted  PE(%)   α      Predicted  PE(%)

Financial Intermediation
2006     982    0.5     994       1.24   0.5     994       1.24   0.484     990       0.83
2007   1,302    0.5   1,152      11.49   0.5   1,101      14.62   0.501   1,147      11.90
2008   1,519    0.5   1,336      12.01   0.5   1,282      15.31   0.583   1,360      10.46
2009   1,603    0.5   1,549       3.36   0.5   1,494       6.29   0.589   1,622       1.15
2010   1,863    0.5   1,796       3.60   0.5   1,758       5.61   0.575   1,955       4.90

Real Estate
2006     658    0.5     835      26.93   0.5     835      26.93   0.312     690       4.87
2007     821    0.5   1,076      31.07   0.5   1,054      28.36   0.363     787       4.11
2008     844    0.5   1,387      64.28   0.5   1,329      57.39   0.378     899       6.44
2009   1,062    0.5   1,787      68.27   0.5   1,674      57.56   0.414   1,038       2.29
2010   1,006    0.5   2,303     128.89   0.5   2,106     109.24   0.253   1,019       1.33
An MAD larger than 500 means that the average PE is more than 50% of the actual values. The MAD and MSE metrics for PSO-RGM indicate that the average difference between the predicted value and the actual value in each year does not exceed ±50 (±5%).
[Figure 3 panels: (a)/(d) Financial Intermediation, (b)/(e) Real Estate, (c)/(f) Semiconductor Industry Production; the top row plots Value (100 million yuan) and the bottom row PE(%) against year.]
FIGURE 3. (a)-(c): The comparison of the annual values predicted by GM, RGM, and PSO-RGM, respectively, with the actual values. (d)-(f): The annual errors of the predicted values compared with the actual values.
For Semiconductor Industry Production, although PSO-RGM, with a MAPE of 10.8597%, is not much better than either GM or RGM, it greatly improved the r² and the MSE. A singularity arises in year 2000 with a spike, followed by a sudden drop in the next year. All three models performed poorly but reasonably in this year. PSO-RGM performs best, with a 25% PE. RGM also obtains reasonable accuracy in this year, because both RGM and PSO-RGM use the RM, which is more sensitive to recent changes.
4.5. Experiment II
Table 3 and Figure 4 show the comparison of the six models on the different evaluation metrics. PSO-RGM generally outperformed the other models, except ARIMA and NN, across all three data sets.
For Financial Intermediation, the ARIMA, NN, and PSO-RGM models obtain excellent results on all evaluation metrics. The other models also achieve good MAPEs of less than 20% because of the relative regularity of this data set. Figure 5(a) shows that the NN model obtains the best prediction values at almost every year. However, the NN model with the best r² was selected after 100 trials, and we found huge differences among the results of these 100 trials. The best network therefore cannot be identified in real economic forecasting, where the future data are unknown and the r² or MAPE cannot be calculated to compare different trained networks. We discuss the uncertainty of the NN models in Section 5.1.2. The metaheuristics that incorporate a randomness mechanism also encounter this uncertainty. The degree of certainty of PSO-RGM is discussed in Section 5.2.4.
For Real Estate, the MAPE of PSO-RGM is less than 5%, which is better than that of the other models. The other models except ARMA achieve fairly good MAPEs, but their r² values are much lower than the 0.94 of PSO-RGM.
TABLE 3. The Comparison of the Evaluation Metrics on PSO-RGM with Other Classic Forecasting Methods.

Real Estate
           AR       ARMA     ARIMA    Volterra   NN       PSO-RGM
MAPE(%)    8.816    12.264   9.229    8.839      7.340    3.810*
MAD        78.057   111.915  82.486   79.908     60.669   31.624*
MSE        7,869    24,763   9,624    11,163     8,958    1,182*
r²         0.618    0.200    0.533    0.458      0.565    0.942*
[Figure 4 panels: (a) Financial Intermediation, (b) Real Estate, (c) Semiconductor Industry Production; PE(%) against year for PSO-RGM, AR, ARMA, ARIMA, Volterra, and NN.]
FIGURE 4. The comparison of the annual prediction errors among different forecasting models.
PSO-RGM still obtains a good MAPE of 15% for Semiconductor Industry Production, whereas most of the other models except ARIMA and NN are effectively not applicable to this data set: only four data points are insufficient for training, leading them to overestimate the values after year 2000, which was followed by a sharp fall in year 2001. The models trained by ARMA and Volterra are almost unusable, with MAPEs as high as approximately 50%. The ARIMA model predicted well, with a MAPE of 13%. Figure 5(c) shows that almost all of the yearly PEs from ARIMA are slightly higher than those of PSO-RGM.
[Figure 5 panels: (a) Financial Intermediation, (b) Real Estate, (c) Semiconductor Industry Production; PE(%) against year for PSO-RGM, ACO, EDA, GA, GS, and SA.]
FIGURE 5. The comparison of the annual prediction errors of different metaheuristics compared with the
actual values.
5. DISCUSSION
In this section, we discuss the factors related to the statistical models and the NN model that influence forecasting performance. We also examine the performance of PSO variants and the effect of the train-to-verify ratio. Furthermore, we present and discuss two other important evaluation metrics: convergence speed and degree of certainty.
5.1.1. Statistical Models. According to our experimental results, ARIMA, which is a generalization of ARMA, outperforms the other two linear models. The reason for the unfavorable score produced by ARMA is that the moving-average model combined with AR in
TABLE 4. The Comparison of the Evaluation Metrics on PSO-RGM with RGM Using Other Metaheuristics Optimizing α.

Real Estate
           ACO      EDA      GA       GS       SA       TABU     PSO-RGM
MAPE(%)    6.040    7.927    11.174   11.210   11.188   9.283    3.810*
MAD        53.309   69.636   101.864  102.212  102.001  83.834   31.624*
MSE        4,149    8,540    17,644   17,749   17,686   12,524   1,182*
r²         0.798    0.585    0.144    0.139    0.142    0.392    0.942*
the model requires the data sequence to follow a fairly linear trend and have a definite rhythmic pattern of fluctuations. However, the sequences in all three data sets are neither very regular nor seasonal, which causes the irregular information to be removed entirely by the moving-average method. ARIMA has an additional parameter called the differencing degree, which indicates the number of nonseasonal differences used to fine-tune the model beyond ARMA.
Finding appropriate values of the arguments p and q in the AR(p), ARMA(p, q), and ARIMA(p, d, q) models can be facilitated by plotting the partial autocorrelation function for an estimate of p and, likewise, the autocorrelation function for an estimate of q. We used the forward-backward approach to find p for the AR model, and the AICc recommended by Brockwell and Davis (2009) to find p and q for the ARMA and ARIMA models. Once p and q are chosen, these models can be fitted by least squares regression to find the parameter values that minimize the error term. The moving-average components of ARMA and ARIMA require more parameters to be fitted than the AR model. In our experiments, for Financial Intermediation and Real Estate, AR(4), ARMA(3,1), and ARIMA(3, 3, 1) obtained the best performances, respectively. We also tried other combinations of p, q, and d. When we increased the value of d, the forecasting performance improved gradually until d = 4; at d = 4, the sample data are no longer sufficient to fit the additional parameters required by ARIMA. On the other hand, performance degrades when q is larger than 1. For Semiconductor Industry Production, with its extremely short sample sequence, only ARIMA(1, 1, 1) works for building the ARIMA forecasting model; the samples are insufficient for fitting the parameters of higher-order ARIMA models.
In general, the performance of the statistical models depends strongly on their argument orders, as well as on the number of input samples. We also observed that the statistical models achieve good prediction accuracy at the first point but often perform poorly at the rest, indicating that statistical models are usually more suitable for short-term forecasting.
5.1.2. Neural Network. Neural networks, used as nonlinear forecasting models, have gained enormous popularity and success in time series forecasting, because there is growing evidence that economic time series contain nonlinearities. As expected, NN outperforms most of the other models, as shown in Table 3.

However, many parameters must be carefully configured, and there are no established rules for choosing appropriate values for economic forecasting. We had to resort to trial and error to obtain the values that lead to the best forecasting performance. Although there have been many studies on how to tune the parameters of NNs, a selection over the whole parameter space is clearly beyond the scope of this article.
We examined the performance of different combinations of three key parameters for Financial Intermediation: the train-to-verify ratio (1:1, 2:1, 3:1, 4:1, and 5:1), feedback delays (1-5), and hidden layers (1-10). However, it is hard to find a rule relating the parameters to the forecasting performance. Consequently, it is difficult to find a combination of parameters that brings the model to its best performance in practical economic forecasting, where MAPE and r² are unknown.
In our NN experiments, we trained the network models 100 times for each configuration with the same parameter setting. The forecasting values with the best performance (the best r² in Table 3) were selected for comparison with the other models in Experiment II. However, we found giant differences in forecasting values among networks trained with the same configuration, caused by the randomness and probability mechanisms inside the NN training methods. The small number of samples is another reason that a relatively stable model cannot be trained. The best MAPE shown in Figure 6 is 2.71%, but the worst is 95.56%; most MAPEs lie between 10% and 20%. This makes it impractical to use NN in real economic forecasting applications, where the future data are unknown: the "best" network cannot be selected because the MAPE or r² cannot be evaluated without the actual data. One possible solution is to average the forecasting values over all trained networks; however, experimental results show that the average MAPE of this regression approach is above 15%.

Another common criticism of NNs is that they require an enormous amount of training data in real-world operation. Any machine-learning method needs sufficiently representative samples to build the underlying structure and generalize to new cases,
[Figure 6 panels: (a) MAPE(%), (b) MAPE(%) below 25%, (c) r², (d) r² between 0 and 1, each plotted against the 100 trained networks.]
FIGURE 6. An illustration of the high degree of uncertainty in forecasting performance for the Financial
Intermediation data set, obtained by training the neural network 100 times under the same configuration with a
4:1 train-to-verify ratio, four feedback delays, and three hidden layers.
COMPUTATIONAL INTELLIGENCE
including forecasting. As such, the NN model built from only four samples for
Semiconductor Industry Production was not effective compared with the models trained on
the other two data sets, which have relatively longer sample sequences. We found that the
minimum number of samples required by the Matlab Neural Network Toolbox to build
an effective NN model is 10.
5.2. Metaheuristics
The metaheuristics were used to search for the optimal parameter α in this study. They use
different high-level strategies to address the exploitation and exploration of the search
space. Exploration generally refers to the identification of new high-quality solutions by
visiting entirely new regions of a search space, while exploitation refers to the examination
of regions within the neighborhood of previously visited solutions. One class of metaheuris-
tics, for example, TS, SA, EDA, and GS, aims to escape from local minima and to move
on to explore other, better local minima by using different neighborhood structures accord-
ing to various probability distributions or purely random mechanisms. Metaheuristics such
as ACO and EA, as well as PSO, incorporate an intelligent learning component to identify
high quality regions by recombination of previous solutions or sampling the search space to
strike a balance between exploration and exploitation.
It is widely accepted that it is hardly possible to produce a completely accurate survey
of metaheuristics. In Table 5, we list three important characteristics summarized by Blum and Roli
(2003) to differentiate among the metaheuristics used in this study. However, we
do not go into the details of comparing the effectiveness of their exploitation or explo-
ration, nor do we analyze the different concepts or philosophies behind them. We will
discuss two factors that impact the forecasting results, as well as two metrics
besides accuracy that can also validate the forecasting performance of the metaheuristics.
5.2.1. Parameter Settings. Almost all metaheuristics require setting a number
of parameters, which may lead to different outcomes, for example, multiple locally optimal
solutions in the parameter space in terms of solution quality (Silberholz and Golden 2010).
It is believed to be difficult either to derive rules for parameter configuration or to
uncover the principles underlying it in general applications. Hence, we conducted an experiment
that used various parameter configurations of PSO to find out how sensitive the forecasting
performance is to the variation of parameter settings in economic prediction.
Acceleration Coefficient. We evaluated constant and linearly varying settings
of c1 and c2 with respect to prediction accuracy. Among the constant settings, the configuration c1 = c2 = 1.5
is the best, in accordance with most previous conclusions. With the linearly varying
setting (equations 8 and 9), there is not much improvement in the metrics compared with the
constant setting. We also evaluated the forecasting performance with diverse combinations
of the start and end values of c1 and c2, ranging over [0.5, 4] with
a step of 0.5, and found that there is still little difference among them for all three
data sets.
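As a sketch, the linearly varying coefficients of equations (8) and (9) can be implemented as below. The start and end values used here (2.5 to 0.5 for c1, and the reverse for c2) are illustrative picks from the [0.5, 4] range scanned above, not the authors' recommended constants.

```python
def varying_coefficient(start, end, t, t_max):
    """Linearly interpolate an acceleration coefficient from `start`
    at iteration 0 to `end` at iteration `t_max`."""
    return start + (end - start) * t / t_max

T = 100
c1 = [varying_coefficient(2.5, 0.5, t, T) for t in range(T + 1)]  # decreasing
c2 = [varying_coefficient(0.5, 2.5, t, T) for t in range(T + 1)]  # increasing
```

The usual intent of this schedule is to favor the cognitive component (c1) early for exploration and the social component (c2) late for exploitation.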
Population Size. We observed that the performance of PSO degrades as the
population size increases, regardless of other parameter settings. We found that single-parti-
cle PSO is the best for forecasting accuracy. The results on the population size of PSO
indicate that exploitation is more important than exploration in the search for the
FIGURE 7. The influence of different methods to calculate the inertia weight w on MAPE.
optimal parameter in RGM. In fact, PSO becomes a single-solution-based algorithm rather than
a population-based one when the population size is set to 1.
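A minimal single-particle PSO for a one-dimensional parameter such as α might look as follows. Because a lone particle's personal best coincides with the global best, this sketch adds a small random perturbation (in the spirit of van den Bergh's guaranteed-convergence modification) so that the particle keeps moving; this perturbation, the toy fitness function, and the constants are our illustrative choices, not the paper's exact algorithm.

```python
import random

def single_particle_pso(fitness, lo, hi, iters=80, w=0.7, c=1.5,
                        rho=0.1, seed=1):
    """Minimize `fitness` over [lo, hi] with one PSO particle.
    With a single particle, pbest and gbest coincide, so a random
    perturbation term (scaled by rho) keeps the search alive."""
    rnd = random.Random(seed)
    x = rnd.uniform(lo, hi)
    v = 0.0
    best_x, best_f = x, fitness(x)
    for _ in range(iters):
        v = (w * v
             + c * rnd.random() * (best_x - x)    # pull toward the best
             + rho * (2.0 * rnd.random() - 1.0))  # random perturbation
        x = min(hi, max(lo, x + v))               # clamp to the search bounds
        f = fitness(x)
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f

# Toy quadratic standing in for the RGM forecasting error as a function of alpha.
alpha, err = single_particle_pso(lambda a: (a - 0.3) ** 2, 0.0, 1.0)
```

The 60–80 iteration budget reported later for the single-particle PSO is reflected in the default `iters=80`.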
5.2.3. Convergence. Because Sudholt (2008) concluded that the computational com-
plexity of evolutionary algorithms and swarm intelligence remains a challenging issue,
we use convergence speed as one of the evaluation metrics in our empirical study. The conver-
gence speed appears to be connected with the effectiveness of exploitation, which can find
a set of possible solutions quickly without wasting too much time in regions of the search
space that are either already explored or that do not provide high-quality solutions
(Blum and Roli 2003). However, exploration and exploitation are believed to be two con-
flicting goals in many applications, which indicates that a contradiction may exist between
convergence speed and prediction accuracy. Although fast computation is not the primary
purpose in economic forecasting, most of which is monthly, quar-
terly, or yearly based, we compared the convergence speeds of the different metaheuristics
using a convergence criterion defined as a difference of less than 10⁻⁵ between 10
consecutive values of the fitness function.
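The stopping rule described above can be expressed directly; this sketch assumes the fitness value is recorded once per iteration.

```python
def has_converged(history, window=10, tol=1e-5):
    """Convergence criterion from the study: the spread of the last
    `window` consecutive fitness values is below `tol`."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) < tol
```

In the optimization loop this check would be evaluated after each iteration, and the iteration at which it first fires is the convergence speed being compared.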
Figure 8 shows an illustration of the evolution of the fitness at the first predicted year for
all of the data sets. Figure 8(a)–8(c) compares the evolution among canonical PSOs with
different population sizes. We observed that a larger population size can bring about faster
convergence. However, fast convergence does not mean low runtime: the time complexity
of PSO is O(itermax × popsize × O(fitness)), so the runtime depends on both the population
size and the iteration number. According to our empirical study, itermax can be set to 60–80
in the single-particle PSO. Figure 8(d)–8(f) shows the comparison of the convergence speed
among the variant PSOs. There is no general rule for these PSOs over all of the data sets, but all
PSOs converge after at most 60–80 iterations.
We also evaluated the convergence speed of the other metaheuristics in our study and found
that the single-solution-based methods have fixed convergence speeds for each data set, while the
population-based methods, ACO, EDA, and GA, have varying convergence speeds. How-
ever, all population-based methods converge within 20 to 60 iterations. Moreover, we
observed that ACO has the best convergence speed overall.
5.2.4. Certainty. Because most metaheuristic methods incorporate either
randomness or a probabilistic mechanism in their operations, the forecasting results
usually differ at each trial, even with the same configuration. On the other hand, the
PSO-RGM IN ECONOMIC PREDICTION
FIGURE 8. Comparison of the convergence speed of the fitness values among canonical PSOs with different
population sizes ((a)–(c)), as well as among PSOs with different parameter setting methods ((d)–(f)), for predicting
the first-year value of all the data sets.
best result generated by these methods cannot be known in real forecasting applications,
where the future values are not available to calculate the metrics for comparison. Hence,
the certainty of a metaheuristic is also one of the most significant factors in forecasting
performance.
We defined the degree of certainty using the standard deviation,
DC(M) = √( Σₖ₌₁ⁿ (Mₖ − M̄)² / n ),
where n is the number of trials, Mₖ is the value of the kth forecasting trial
on the metric M, and M̄ is the average value over all n trials. The DC indicates the degree of
difference in the metric among forecasts with the same configuration: the smaller
the DC, the higher the degree of certainty.
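The DC measure is simply the population standard deviation of a metric over repeated trials, e.g.:

```python
import math

def degree_of_certainty(trials):
    """DC(M): population standard deviation of metric values over n trials.
    A smaller DC means a higher degree of certainty."""
    n = len(trials)
    mean = sum(trials) / n
    return math.sqrt(sum((m - mean) ** 2 for m in trials) / n)
```

For example, MAPE values of [7.0, 7.2, 7.4] over three trials give DC ≈ 0.163.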
Figure 9 shows the distribution of the results of 100 trials on the two met-
rics using single-particle canonical PSO-RGM under the same configuration for each
data set. The maximum differences in MAPE are 3.52% for Financial Intermediation,
7.19% for Real Estate, and 1.15% for Semiconductor Industry Production. The average
MAPEs over all 100 trials are 7.47%, 9.06%, and 11.53%, compared with the minimum
MAPEs of 5.85%, 3.81%, and 10.85%, respectively. Similarly, the average r² values are 0.81, 0.88,
and 0.59, with maximum r² values of 0.86, 0.94, and 0.61 for the three data sets, respectively.
The DC(MAPE) values are 0.03, 0.08, and 0.01, and the DC(r²) values are 0.002, 0.003, and 0.0004,
respectively.
We found that PSO with a larger population size can have higher certainty, as shown in
Figure 10. A larger population of particles better guarantees the certainty of the particles'
direction and position toward a globally best region. However, a larger population may lead to a local minimum
FIGURE 9. The distributions on MAPE and r 2 of 100 trials by using single particle canonical PSO-RGM.
FIGURE 10. Degree of certainty with different population sizes for the three data sets.
quickly. There is thus a contradiction between certainty and accuracy: the degree of cer-
tainty decreases as the population size decreases, while a smaller population leads to better
accuracy according to our study.
Variant PSOs with different parameter settings were also evaluated in this study. Figure 11
shows the differences of the variant PSOs from the canonical PSO in DC(MAPE) and DC(r²).
The PSOs with linearly varying w, or with a constriction factor
combined with linearly varying c1 and c2, can improve the degree of certainty, and only
the PSO with a constriction factor combined with linearly varying c1 and c2 achieves better
FIGURE 11. The difference values of variant PSOs against the canonical PSO on DC(MAPE) and DC(r²).
certainty than the canonical PSO for all three data sets. Although the nonlinearly varying
w method can improve the forecasting performance, it has higher uncertainty than the other
methods.
Table 6 also shows the degree of certainty of the other metaheuristics. We found that all
of them have a high degree of certainty except ACO, and GS has the best certainty.
From these empirical studies, a metaheuristic with a degree of certainty of DC(MAPE)
≤ 0.1 and DC(r²) ≤ 0.01 is acceptable for economic prediction.
A practical economic prediction method not only forecasts a single data point accu-
rately but is also able to accurately forecast a trend that contains several consecutive
data points. Many methods have been proposed for economic prediction; however, these
prediction methods are seldom able to meet both objectives on short and noisy data
sequences. Motivated by recent progress in optimization-based prediction, we proposed the
PSO-RGM(1,1) model, which produces reasonable predictions on short and noisy
time series. We evaluated and compared PSO-RGM(1,1) with not only other commonly
used forecasting models but also the RGM(1,1) models optimized by other well-known
metaheuristics.
ACKNOWLEDGMENTS
The authors would like to thank the corresponding editor and the anonymous reviewers
for their valuable comments, which greatly helped to improve the quality of this work.
This work has been supported in part by the National University of Singapore under
grants R-252-000-478-133 and R-252-000-478-750.
This manuscript is submitted to the Special Issue of Computational Intelligence
on Incentives and Trust in E-Commerce with guest editors, namely, Stephen Marsh,
Jie Zhang, and Christian Damsgaard Jensen, and assistant editor Zeinab Noorian, email:
z.noorian@unb.ca.
REFERENCES
AKAY, D., and M. ATAK. 2007. Grey prediction with rolling mechanism for electricity demand forecasting of
Turkey. Energy, 32(9):1670–1675.
ALFI, A., and H. MODARES. 2011. System identification and control using adaptive particle swarm optimization.
Applied Mathematical Modelling, 35(3):1210–1221.
BERGH, F. 2002. An analysis of particle swarm optimizers, Ph.D. Thesis, University of Pretoria, Pretoria, South
Africa.
BLUM, C., and A. ROLI. 2003. Metaheuristics in combinatorial optimization: Overview and conceptual
comparison. ACM Computing Surveys, 35(3):268–308.
BROCKWELL, P. J., and R. A. DAVIS. 2009. Time Series: Theory and Methods (2nd ed.). Springer: New York.
CALBOREAN, H., R. JAHR, T. UNGERER, and L. VINTAN. 2013. A comparison of multi-objective algorithms
for the automatic design space exploration of a superscalar system. In Advances in Intelligent Control
Systems and Computer Science, vol. 187. Edited by DUMITRACHE, L., Advances in Intelligent Systems
and Computing. Springer: Berlin Heidelberg, pp. 489–502.
CHANG, S. C., H. C. LAI, and H. C. YU. 2005. A variable P value rolling grey forecasting model for Taiwan
semiconductor industry production. Technological Forecasting and Social Change, 72(5):623–640.
CHEN, C. F., M. C. LAI, and C. C. YEH. 2012. Forecasting tourism demand based on empirical mode
decomposition and neural network. Knowledge-Based Systems, 26(0):281–287.
CLERC, M., and J. KENNEDY. 2002. The particle swarm—explosion, stability, and convergence in a multidi-
mensional complex space. IEEE Transactions on Evolutionary Computation, 6(1):58–73.
DE GOOIJER, J. G., and R. J. HYNDMAN. 2006. 25 years of time series forecasting. International Journal of
Forecasting, 22(3):443–473.
PSO-RGM IN ECONOMIC PREDICTION
DENG, J. 1989. Grey Prediction and Decision-Making, Vol. M. Huazhong University of Science and Technology
Press: Wuhan, China.
DURILLO, J. J., A. J. NEBRO, F. LUNA, C. A. COELLO COELLO, and E. ALBA. 2010. Convergence speed in
multi-objective metaheuristics: Efficiency criteria and empirical study. International Journal for Numerical
Methods in Engineering, 84(11):1344–1375.
EBERHART, R., and J. KENNEDY. 1995. New optimizer using particle swarm theory. In Proceedings of the 1995
6th International Symposium on Micro Machine and Human Science: Nagoya, Japan, pp. 39–43.
HE, W., Z. WANG, and H. JIANG. 2008. Model optimizing and feature selecting for support vector regression in
time series forecasting. Neurocomputing, 72(1–3):600–611.
HSU, C. C., and C. Y. CHEN. 2003. Applications of improved grey prediction model for power demand
forecasting. Energy Conversion and Management, 44(14):2241–2249.
HSU, L., and C. WANG. 2007. Forecasting the output of integrated circuit industry using a grey model improved
by the Bayesian analysis. Technological Forecasting and Social Change, 74(6):843–853.
HSU, L. C. 2003. Applying the grey prediction model to the global integrated circuit industry. Technological
Forecasting and Social Change, 70(6):563–574.
HSU, L. C. 2010. A genetic algorithm based nonlinear grey Bernoulli model for output forecasting in integrated
circuit industry. Expert Systems with Applications, 37(6):4318–4323.
HSU, L. C. 2011. Using improved grey forecasting models to forecast the output of opto-electronics industry.
Expert Systems with Applications, 38(11):13879–13885.
JO, T. C. 2003. The effect of virtual term generation on the neural based approaches to time series prediction. In
The IEEE Fourth Conference on Control and Automation. Concordia University, Montreal, Canada, Vol. 3,
pp. 516–520.
JU-LONG, D. 1982. Control problems of grey systems. Systems and Control Letters, 1(5):288–294.
KANDEL, A. 1991. Fuzzy Expert Systems. CRC Press: Boca Raton, FL.
KAYACAN, E., B. ULUTAS, and O. KAYNAK. 2010. Grey system theory-based models in time series prediction.
Expert Systems with Applications, 37(2):1784–1789.
KENNEDY, J., R. C. EBERHART, and Y. SHI. 2001. Swarm Intelligence. Morgan Kaufmann Publishers: San
Francisco.
LIU, X. Q., B. W. ANG, and T. N. GOH. 1991. Forecasting of electricity consumption: A comparison between
an econometric model and a neural network model. In IEEE International Joint Conference on Neural
Networks, Seattle, WA, Vol. 2, pp. 1254–1259.
MA, J., and J. TENG. 2004. Predict chaotic time-series using unscented Kalman filter. In Proceedings of 2004
International Conference on Machine Learning and Cybernetics, Shanghai, China, Vol. 2, pp. 687–690.
MIN, H., H. JEUNG KO, and C. SEONG KO. 2006. A genetic algorithm approach to developing the multi-echelon
reverse logistics network for product returns. Omega, 34(1):56–69.
POLI, R. 2008. Analysis of the publications on the applications of particle swarm optimisation. Journal of
Artificial Evolution and Applications, 2008:1–10.
QUAH, T. S., and B. SRINIVASAN. 1999. Improving returns on stock investment through neural network
selection. Expert Systems with Applications, 17(4):295–301.
RABINER, L. R. 1989. A tutorial on hidden Markov models and selected applications in speech recognition.
Proceedings of the IEEE, 77(2):257–286.
RATNAWEERA, A., S. K. HALGAMUGE, and H. C. WATSON. 2004. Self-organizing hierarchical particle swarm
optimizer with time-varying acceleration coefficients. IEEE Transactions on Evolutionary Computation,
8(3):240–255.
ROMAN, J., and A. JAMEEL. 1996. Backpropagation and recurrent neural networks in financial analysis of
multiple stock market returns. In IEEE System Sciences Proceedings of the 29th Hawaii International
Conference, Maui, HI, Vol. 2, pp. 454–460.
COMPUTATIONAL INTELLIGENCE
SHEN, J., Z. CANXIN, C. LIAN, H. HU, and M. MAMMADOV. 2010. Investment decision model via an improved
BP neural network. In 2010 IEEE International Conference on Information and Automation (ICIA), Harbin,
China, pp. 2092–2096.
SHI, Y., and R. C. EBERHART. 1998. Parameter selection in particle swarm optimization. In Evolutionary Pro-
gramming VII, vol. 1447. Edited by PORTO, V. W., N. SARAVANAN, D. WAAGEN, and A. E. EIBEN, Lecture
Notes in Computer Science. Springer: Berlin Heidelberg, pp. 591–600.
SHI, Y., and R. C. EBERHART. 1999. Empirical study of particle swarm optimization. In CEC 99. Proceedings
of the 1999 Congress on Evolutionary Computation, Washington, DC, Vol. 3, p. 1950.
SILBERHOLZ, J., and B. GOLDEN. 2010. Handbook of metaheuristics. Edited by GENDREAU, M., and J. Y.
POTVIN. Springer: New York, pp. 625–640.
SUDHOLT, D. 2008. Computational complexity of evolutionary algorithms, hybridizations, and swarm intelli-
gence, Ph.D. Thesis, Dortmund University of Technology, Dortmund, Germany.
TAN, G. 2000. The structure method and application of background value in grey system GM(1,1) model (I).
Systems Engineering-Theory and Practice, 2000(4):98–103.
TANG, H. W. V., and M. S. YIN. 2012. Forecasting performance of grey prediction for education expenditure
and school enrollment. Economics of Education Review, 31(4):452–462.
TKACZ, G. 2001. Neural network forecasting of Canadian GDP growth. International Journal of Forecasting,
17(1):57–69.
TRELEA, I. C. 2003. The particle swarm optimization algorithm: Convergence analysis and parameter selection.
Information Processing Letters, 85(6):317–325.
YAO, M. J., and W. M. CHU. 2008. A genetic algorithm for determining optimal replenishment cycles to
minimize maximum warehouse space requirements. Omega, 36(4):619–631.
YOKUMA, J. T., and J. S. ARMSTRONG. 1995. Beyond accuracy: comparison of criteria used to select forecasting
methods. International Journal of Forecasting, 11(4):591–597.
ZHAO, Z., J. WANG, J. ZHAO, and Z. SU. 2012. Using a grey model optimized by differential evolution algorithm
to forecast the per capita annual net income of rural households in China. Omega, 40(5):525–532.
APPENDIX

Parameters of the models:

AR
  Scalar: n/2, where n is the length of the sample data
  Parameter fitting: forward–backward approach^a

Parameters of the metaheuristics:

EDA
  Maximum iterations: 100
  Population size: 1,000
  Learning and sampling method: mixture of full Gaussian models
  Replacement method: choose best elitism
  Selection method: truncation selection

GA
  Population size: 20
  Generations: 100
  Crossover fraction: 0.8
  Migration interval and fraction: 20 and 0.2
  Selection/crossover/mutation functions: stochastic/scattered/Gaussian