Abstract—In this paper, we propose a strategy to improve the forecasting of traffic accidents in Concepción, Chile. The forecasting strategy consists of four stages: embedding, decomposition, estimation, and recomposition. At the first stage, the Hankel matrix is used to embed the original time series. At the second stage, the Singular Value Decomposition (SVD) technique is applied. SVD extracts the singular values and the singular vectors, which are used to obtain the components of low and high frequency. At the third stage, the estimation is implemented with an Autoregressive Neural Network (ANN) based on Particle Swarm Optimization (PSO). The final stage is recomposition, where the forecasted value is obtained. The results are compared with the values given by the conventional forecasting process. Our strategy shows high accuracy and is superior to the conventional process.

Index Terms—Autoregressive neural network, particle swarm optimization, singular value decomposition.

I. INTRODUCTION

… the decomposition of a time series into components of low and high frequency from the singular values of the Hankel matrix. The strategy is applied in four stages: embedding with the Hankel matrix, decomposing with SVD, estimating with ANN-PSO, and recomposing with simple addition. The time series are the traffic accidents of Concepción, Chile (number of accidents and number of injured people), from the year 2000 to 2012, with weekly sampling.

The paper is structured as follows. Section II describes the time series forecasting strategy. Section III presents the forecasting accuracy metrics. Section IV presents the results and discussion. Section V gives the conclusions.

II. TIME SERIES FORECASTING STRATEGY

Our forecasting strategy is presented in Figure 1. It consists of four stages: embedding, decomposition, estimation, and recomposition. Embedding means mapping the time series into a Hankel matrix; decomposition is developed with SVD; the …
ISSN 2395-8618
Fig. 1. Time series forecasting strategy
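As an illustration of the embedding and decomposition stages described above, the following minimal NumPy sketch (not the authors' code; the window length L and the toy series are arbitrary choices for illustration) maps a series into a Hankel matrix and splits it with SVD into elementary matrices, from which the components are later extracted:

```python
import numpy as np

def hankel_embed(x, L):
    """Embed a series of length N into an L x (N - L + 1) Hankel matrix:
    row i holds the window x[i], x[i+1], ..., shifted by one sample."""
    N = len(x)
    m = N - L + 1
    return np.array([x[i:i + m] for i in range(L)], dtype=float)

# Toy weekly series (arbitrary values, illustration only)
x = np.array([3.0, 5.0, 4.0, 6.0, 5.0, 7.0, 6.0])
H = hankel_embed(x, L=2)            # 2 x 6 Hankel matrix

# SVD: H = U @ diag(s) @ Vt; each singular triplet yields one
# elementary matrix A_i = s_i * u_i v_i^T, the basis of the components
U, s, Vt = np.linalg.svd(H, full_matrices=False)
A = [s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s))]

# The elementary matrices add back up to the original Hankel matrix
assert np.allclose(sum(A), H)
```

Each anti-diagonal of the Hankel matrix holds one repeated sample of the series, which is what makes the later extraction of a component from the first row and last column of each A_i possible.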
Therefore, the matrix Ai contains the i-th component; the extraction process is

Ci = [Ai(1, :), Ai(2 : n, m)^T]   (5)

where Ci is the i-th component; the elements of Ci are located in the first row and the last column of Ai.

The optimal number of components (M) is given by the maximum peak of the differential energy ∆E of each pair of sequential components, and its computation is

∆Ei = Ei − Ei+1   (6)

where Ei is the energy of the i-th component, and i = 1, ..., M − 1. The energy of each singular value is computed with

Ei = si^2 / (∑_{j=1}^{M} sj^2)   (7)

where si is the i-th singular value of the Hankel matrix obtained before. The first component extracted is the component CL and the second is the component CH (if M = 2). When the optimal number of components is M > 2, the component CH is computed with the summation of the components from 2 to M, as follows:

CH = ∑_{i=2}^{M} Ci   (8)

where x is the data input matrix, of order N × P; N is the sample length, and P is the number of input variables (lag terms); w is the weight matrix of order P × Nh, with Nh hidden units. The sigmoid transfer function is applied at the hidden layer:

φ(net) = 1/(1 + e^(−net))   (11)

The weights of the ANN connections, w and b, are adjusted with the PSO learning algorithm. In the swarm, each of the Np particles has a position vector Xi = (Xi1, Xi2, ..., XiD) and a velocity vector Vi = (Vi1, Vi2, ..., ViD); each particle is considered a potential solution in a D-dimensional search space. During each iteration the particles are accelerated toward the previous best position, denoted by pid, and toward the global best position, denoted by pgd. The swarm has Np × D values and is initialized randomly; D is computed with P × Nh + Nh. The process finishes when the lowest error is obtained based on the fitness function evaluation, or when the maximum number of iterations is reached [20], [21].

Vid^(l+1) = I^l × Vid^l + c1 × rd1 × (pid^l − Xid^l) + c2 × rd2 × (pgd^l − Xid^l)   (12)

Xid^(l+1) = Xid^l + Vid^(l+1)   (13)

I^l = Imax − ((Imax − Imin)/itermax) × l   (14)

where i = 1, ..., Np and d = 1, ..., D; I denotes the inertia weight; c1 and c2 are learning factors; rd1 and rd2 are positive random numbers in the range [0, 1]; and l is the l-th iteration. The inertia weight decreases linearly: in equation (14), Imax is the maximum value of the inertia, Imin is the lowest, and itermax is the total number of iterations.

The particle Xid represents the optimal solution of the set of weights in the neural network; therefore Xid contains the ANN connection weights w and b.

D. Recomposing the time series

The recomposition of the time series is done with the addition of the estimated components; the forecasted time series is then obtained using

x̂ = ĈL + ĈH   (15)

III. FORECASTING ACCURACY METRICS

The number of lags for the ANN is determined with the Generalized Cross-Validation (GCV) metric, which determines the best number based on the accuracy of the forecasting by probing a determined range of values. The evaluation of the forecasting is computed with the metrics Mean Absolute Percentage Error (MAPE), coefficient of determination R2, Root Mean Squared Error (RMSE), and Relative Error (RE).

RMSE = sqrt( (1/Nv) ∑_{i=1}^{Nv} (xi − x̂i)^2 )   (16)

GCV = RMSE / (1 − K/Nv)^2   (17)

MAPE = [ (1/Nv) ∑_{i=1}^{Nv} |(xi − x̂i)/xi| ] × 100   (18)

R2 = 1 − σ^2(er) / σ^2(x)   (19)

RE = ∑_{i=1}^{Nv} (xi − x̂i)/xi   (20)

where Nv is the validation (testing) sample size, xi is the i-th observed value, x̂i is the i-th estimated value, and K is the number of lagged values.

IV. RESULTS AND DISCUSSION

The applied data are available from the CONASET web site [22]; they represent the number of accidents and the number of injured people of Concepción, Chile, from the year 2000 to 2012, with weekly sampling. The training data set contains 70% of the sample; consequently, the testing data set contains the remaining 30%. In the next subsections, the time series forecasting strategy presented in Fig. 1 is evaluated.

A. Embedding and Decomposition

The time series is embedded in a Hankel matrix; the optimal number of components M was determined using an initial number of M = N/2 components. Once this number of components was obtained, the differential energy ∆E of each component was computed; this is shown in Fig. 2, where the maximum peak represents the optimal M. The embedding and the decomposition are executed again with the optimal M; for the time series number of accidents the optimum was M = 6, and for the time series injured people the optimum found was M = 4. The component of low frequency extracted and estimated for the time series number of accidents is shown in Fig. 4a, while the component of high frequency of the same time series is shown in Fig. 4b. The component of low frequency extracted and estimated for the time series injured people is shown in Fig. 5a, while the component of high frequency of the same time series is shown in Fig. 5b.

B. Estimation and Recomposition

The calibration of the number of lags of the ANN was determined with the GCV metric; for the two time series an optimal ANN(K, Nh, 1) was found, with K = 7 inputs for the time series number of accidents and K = 5 for the time series injured people, as shown in Fig. 3. The number of hidden nodes was set to Nh = 6 for the two time series; this value was computed with the natural logarithm of the training data set length (normally used in our experiments).

The PSO learning algorithm was applied to determine the weights of the ANN. After trial and error, it was configured with a swarm of Np × D dimension, with Np = 40 particles and D = P × Nh + Nh; the inertia weight parameter I decreases linearly from a maximum value of 1 to a minimum value of 0.2; the acceleration factors c1 and c2 were fixed at 1.05 and 2.95, respectively; and itermax is 2500.

The evaluation performed at the testing stage for the time series number of accidents is presented in Fig. 6 and Table I. The observed values vs. the estimated values are illustrated in Fig. 6a, reaching a good accuracy, while the relative error is presented in Fig. 6b, which shows that 98.5% of the points present an error lower than ±10%. The evaluation performed at the testing stage for the time series number of injured people is presented in Fig. 7 and Table II. The observed values vs. the estimated values are illustrated in Fig. 7a, reaching a good accuracy, while the relative error is presented in Fig. 7b, which shows that 95.54% of the points present an error lower than ±10%.

TABLE I
NUMBER OF ACCIDENTS FORECASTING

                SVD-ANN-PSO    ANN-PSO
Components      6              —
RMSE            0.0211         0.087
MAPE            3.17%          14.48%
R2              98.29%         70.71%
RE ±10%         98.5%          48.76%

The results presented in Table I show that the higher accuracy in the forecasting of the time series number of accidents is achieved with the model SVD-ANN-PSO(7,6,1), with an RMSE of 0.0211 and a MAPE of 3.17%; 98.5% of the points have a relative error lower than ±10%.
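For concreteness, the accuracy metrics of equations (16)–(20) used in these tables can be sketched as follows (a minimal implementation on made-up observed/estimated vectors, not the paper's data):

```python
import numpy as np

def rmse(x, xh):
    """Equation (16): root mean squared error over the Nv test points."""
    return np.sqrt(np.mean((x - xh) ** 2))

def gcv(x, xh, K):
    """Equation (17): generalized cross-validation with K lagged inputs."""
    Nv = len(x)
    return rmse(x, xh) / (1 - K / Nv) ** 2

def mape(x, xh):
    """Equation (18): mean absolute percentage error."""
    return np.mean(np.abs((x - xh) / x)) * 100

def r2(x, xh):
    """Equation (19): one minus the residual-to-data variance ratio."""
    return 1 - np.var(x - xh) / np.var(x)

def re(x, xh):
    """Equation (20): summed relative error."""
    return np.sum((x - xh) / x)

# Toy observed/estimated vectors (illustration only)
x = np.array([10.0, 12.0, 11.0, 13.0])
xh = np.array([10.0, 12.0, 11.0, 13.0])
```

A perfect forecast (xh equal to x) gives RMSE, GCV, MAPE, and RE of 0 and an R2 of 1, which is the sanity check these definitions should pass.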
Fig. 2. ∆E : (a) Number of Accidents, (b) Injured people
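The selection of the optimal M from the maximum ∆E peak, as in equations (6) and (7), can be sketched as follows. The singular values here are made up for illustration, and the mapping from the peak index to M follows one plausible reading of the rule (the peak between components i and i+1 keeps components up to i+1), not necessarily the authors' exact convention:

```python
import numpy as np

# Hypothetical singular values of a Hankel matrix (illustration only)
s = np.array([9.0, 4.0, 1.5, 1.0, 0.8, 0.7])

# Equation (7): relative energy of each singular value (sums to 1)
E = s**2 / np.sum(s**2)

# Equation (6): differential energy of each pair of sequential components
dE = E[:-1] - E[1:]

# One plausible convention: the maximum peak sits between components
# i and i+1 (0-based), so keep components up to i+1, i.e. M = i + 2
M = int(np.argmax(dE)) + 2
```

With these toy singular values the energy drops sharply after the first component, so the peak of ∆E is at the first pair and M = 2, matching the CL/CH split described for the M = 2 case.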
TABLE II
INJURED PEOPLE FORECASTING

V. CONCLUSIONS
Fig. 5. Injured people components: (a) CL, (b) CH
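The PSO update rules of equations (12)–(14), used above to train the ANN, can be sketched as follows. This is a hedged illustration, not the paper's implementation: the fitness is a stand-in sphere function rather than the ANN forecasting error, and the swarm size and iteration count are reduced for brevity (the paper uses Np = 40 and itermax = 2500):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (smaller than the paper's, for brevity)
Np, D, iter_max = 8, 5, 50
c1, c2 = 1.05, 2.95            # acceleration factors from the paper
I_max, I_min = 1.0, 0.2        # inertia bounds from the paper

def fitness(x):
    """Stand-in fitness (sphere function); the paper minimizes an
    ANN forecasting error instead."""
    return np.sum(x**2, axis=-1)

X = rng.uniform(-1, 1, (Np, D))        # positions (candidate weight sets)
V = np.zeros((Np, D))                  # velocities
P = X.copy()                           # personal bests p_id
g = X[np.argmin(fitness(X))].copy()    # global best p_gd
f_init = fitness(X).min()              # best fitness before any update

for l in range(iter_max):
    I = I_max - (I_max - I_min) / iter_max * l          # equation (14)
    r1, r2 = rng.random((Np, D)), rng.random((Np, D))
    V = I * V + c1 * r1 * (P - X) + c2 * r2 * (g - X)   # equation (12)
    X = X + V                                           # equation (13)
    better = fitness(X) < fitness(P)                    # update p_id
    P[better] = X[better]
    g = P[np.argmin(fitness(P))].copy()                 # update p_gd
```

Because the personal bests P only ever improve and g is the best of P, the global-best fitness is non-increasing across iterations regardless of how the swarm itself moves.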
… with the proposed strategy for the two time series analyzed, SVD-ANN-PSO shows superiority with respect to the conventional implementation. For the time series number of accidents, SVD-ANN-PSO reaches an RMSE of 0.0211 and a MAPE of 3.17%, against the conventional ANN-PSO, which reaches an RMSE of 0.087 and a MAPE of 14.48%. For the time series injured people, SVD-ANN-PSO reaches an RMSE of 0.0172 and a MAPE of 3.58%, against the conventional ANN-PSO, which reaches an RMSE of 0.101 and a MAPE of 21.42%.

In the future, this strategy will be evaluated with data of traffic accidents from other regions of Chile and from other countries, and with time series from other engineering fields.

ACKNOWLEDGEMENTS

This research was partially supported by the Chilean National Science Fund through the project Fondecyt-Regular 1131105 and by the VRIEA of the Pontificia Universidad Católica de Valparaíso.

REFERENCES

[1] Hornik K., Stinchcombe M., White H.: Multilayer feedforward networks are universal approximators. Neural Networks. 2(5), 359–366 (1989)
[2] Svozil D., Kvasnicka V., Pospichal J.: Introduction to multi-layer feed-forward neural networks. Chemometrics and Intelligent Laboratory Systems. 39(1), 43–62 (1997)
[3] Chattopadhyay G., Chattopadhyay S.: Autoregressive forecast of monthly total ozone concentration: A neurocomputing approach. Computers & Geosciences. 35(9), 1925–1932 (2009)
[4] Maali Y., Al-Jumaily A.: Multi Neural Networks Investigation based Sleep Apnea Prediction. Procedia Computer Science. 24, 97–102 (2013)
[5] Rojas I., Pomares H., Bernier J.L., Ortega J., Pino B., Pelayo F.J., Prieto A.: Time series analysis using normalized PG-RBF network with regression weights. Neurocomputing. 42(1–4), 267–285 (2002)
[6] Roh S.B., Oh S.K., Pedrycz W.: Design of fuzzy radial basis function-based polynomial neural networks. Fuzzy Sets and Systems. 185(1), 15–37 (2011)
[7] Liu F., Ng G.S., Quek C.: RLDDE: A novel reinforcement learning-based dimension and delay estimator for neural networks in time series prediction. Neurocomputing. 70(7–9), 1331–1341 (2007)
[8] Scarselli F., Tsoi A.C.: Universal Approximation Using Feedforward Neural Networks: A Survey of Some Existing Methods, and Some New Results. Neural Networks. 11(1), 15–37 (1998)
[9] Gheyas I.A., Smith L.S.: A novel neural network ensemble architecture for time series forecasting. Neurocomputing. 74(18), 3855–3864 (2011)
[10] Gao D., Kinouchi Y., Ito K., Zhao X.: Neural networks for event extraction from time series: a back propagation algorithm approach. Future Generation Computer Systems. 21(7), 1096–1105 (2005)
[11] Khashei M., Bijari M., Ali G.: Hybridization of autoregressive integrated moving average (ARIMA) with probabilistic neural networks (PNNs). Computers & Industrial Engineering. 63(1), 37–45 (2012)
[12] Jeong K., Koo C., Hong T.: An estimation model for determining the annual energy cost budget in educational facilities using SARIMA (seasonal autoregressive integrated moving average) and ANN (artificial neural network). Energy. 71, 71–79 (2014)
[13] Wei Y., Chen M.C.: Forecasting the short-term metro passenger flow with empirical mode decomposition and neural networks. Transportation Research Part C: Emerging Technologies. 21(1), 148–162 (2012)
[14] Shoaib M., Shamseldin A.Y., Melville B.W.: Comparative study of different wavelet based neural network models for rainfall–runoff modeling. Journal of Hydrology. 515, 47–58 (2014)
[15] Zhou J., Duan Z., Li Y., Deng J., Yu D.: PSO-based neural network optimization and its utilization in a boring machine. Journal of Materials Processing Technology. 178(1–3), 19–23 (2006)
[16] Mohandes M.A.: Modeling global solar radiation using particle swarm optimization (PSO). Solar Energy. 86(11), 3137–3145 (2012)
[17] de Mingo López L.F., Blas N.G., Arteta A.: The optimal combination: Grammatical swarm, particle swarm optimization and neural networks. Journal of Computational Science. 3(1–2), 46–55 (2012)
[18] Shores T.S.: Applied Linear Algebra and Matrix Analysis. Springer, 291–293 (2007)
[19] Freeman J.A., Skapura D.M.: Neural Networks: Algorithms, Applications, and Programming Techniques. Addison-Wesley, California (1991)
[20] Eberhart R.C., Shi Y., Kennedy J.: Swarm Intelligence. Morgan Kaufmann, San Francisco, CA (2001)
[21] Yang X.S.: Particle Swarm Optimization. In: Nature-Inspired Optimization Algorithms, Chapter 7. Elsevier, 99–110 (2014)
[22] National Commission of Transit Security (CONASET), http://www.conaset.cl