
Neurocomputing 347 (2019) 59–81


A deep increasing–decreasing-linear neural network for financial time series prediction

Ricardo de A. Araújo a,∗, Nadia Nedjah b, Adriano L.I. Oliveira c, Silvio R. de L. Meira d

a Laboratório de Inteligência Computacional do Araripe, Instituto Federal do Sertão Pernambucano, Brazil
b Departamento de Engenharia Eletrônica e Telecomunicações, Universidade Estadual do Rio de Janeiro, Brazil
c Centro de Informática, Universidade Federal de Pernambuco, Brazil
d Instituto SENAI para Inovação em Tecnologias da Informação e Comunicação, Recife, Pernambuco, Brazil

Article history:
Received 26 March 2018
Revised 28 January 2019
Accepted 6 March 2019
Available online 13 March 2019
Communicated by Dr. F.A. Khan

Keywords:
Deep increasing–decreasing-linear neural network
Gradient-based learning
Financial time series
Prediction

Abstract

Several neural network models have been proposed in the literature to predict the future behavior of financial time series. However, an intrinsic limitation arises from this particular prediction task when it is modeled via neural networks, since the prediction, when sampled in daily frequency, has a 1-step-ahead delay with respect to the real time series observations. In order to overcome such a drawback, we present a deep increasing–decreasing-linear neural network (wherein each layer is composed of a set of increasing–decreasing-linear processing units) to predict the behavior of financial time series. In addition, we present a learning process to train the proposed model using a descending gradient-based approach. In order to assess the model's prediction performance, we use twelve financial time series from relevant stock markets around the world. The obtained results show that the proposed model is competitive in terms of predictive performance, and more effective when compared to recent models presented in the literature of time series prediction.

© 2019 Elsevier B.V. All rights reserved.

1. Introduction

The need for knowledge about the future has attracted significant interest from researchers to find ways to build models able to predict financial time series [1,14,25,28,38,54,56]. In this context, prediction is considered potentially valuable for the decision making process in financial investments [34,52], allowing investors to maximize profit while minimizing the risks of their operations within the stock market [4].

Several neural network models have been proposed in the literature to predict financial time series [3,11,13,14,28,35,37,41,43,47,49,54,56]. The reason for that is the emergence of the efficient market hypothesis (EMH) [49], since there is a uniform agreement that the generator phenomenon of the stock market has some nonlinearity [4,5,26].

In this sense, although there is no consensus on the predictability (efficiency) of the stock market, several empirical studies have shown that the generator phenomenon of financial time series is to some extent predictable [3,4,7,12,14,26,28,45,53,54,56]. However, a particular dilemma still remains unanswered when we consider the prediction of financial time series, sampled in daily frequency, using neural network-based models [1,3,4,14,17,31,44]. This is because it is not possible to predict financial time series without the 1-step-ahead delay with respect to the real time series observations [1,10].

Recently, with the advance of deep learning, several deep neural network models have been presented in the literature for signal and image analysis and recognition [18,22,29,40,48,51]. In this context, Gashler and Ashmore [21] and Chong et al. [4] have achieved promising empirical results using deep neural network models for time series prediction, showing new paths to be forged in this research area.

In this way, we intend to extend our previous work [14], answering some open issues about the generator phenomenon of daily frequency financial time series, as well as developing a deep neural network model, called the deep increasing–decreasing-linear perceptron (DIDLP), for producing a hybrid mapping to predict financial time series. Each layer of the proposed model is composed of a set of increasing–decreasing-linear processing units, followed by an activation function. In addition to that, we have extended the descending gradient-based learning process to deal with the deep structure of the proposed model, using ideas from the back-propagation algorithm [24,42] and employing a systematic approach to circumvent the nondifferentiability problem of the increasing–decreasing operator, according to [36].

∗ Corresponding author.
E-mail addresses: ricardo.araujo@ifsertao-pe.edu.br (R.d.A. Araújo), nadia@eng.uerj.br (N. Nedjah), alio@cin.ufpe.br (A.L.I. Oliveira), silvio@meira.com (S. R. de L. Meira).

https://doi.org/10.1016/j.neucom.2019.03.017

Fig. 1. Behavior of the time series: (a) CAC40, (b) DAX, (c) DJIA, (d) HANGSENG, (e) IBOVESPA, (f) IPSA, (g) MERVAL, (h) NASDAQ, (i) NIKKEI225, (j) NYSE, (k) SP500 and (l)
SSE.

In order to assess the model's prediction performance, we use twelve financial time series from relevant stock markets around the world. The obtained results demonstrate the competitiveness of the proposed deep model in terms of predictive performance, as well as its better effectiveness when compared to recent models presented in the literature of time series prediction.

We organize this work as follows. In Section 2, we present a time series analysis of the investigated financial time series. After that, in Section 3, we describe the proposed model and the proposed learning process. Subsequently, in Section 4, we show and analyze the simulations and the experimental results. At the end, in Section 5, we draw some useful conclusions and point out some promising future directions of this work.

2. The time series analysis

The Cac 40 Index (CAC), Dax Index (DAX), Dow Jones Industrial Average Index (DJIA), Hang Seng Index (HS), Bovespa Index (IBOV), Ipsa Index (IPSA), Merval Index (MER), Nasdaq Index (NASDAQ), Nikkei 225 Index (NK), New York Stock Exchange Index (NYSE), Standard and Poor 500 Index (SP) and Sse Index (SSE), the time series investigated in this work, represent the closing values of the stock indices from the most important financial markets around the world. These time series are composed of daily frequency observations of the stock index from 2013/07/12 to 2017/06/29, as depicted in Fig. 1.

Initially, we have investigated the autocorrelation function (ACF) [2], depicted in Fig. 2. In this figure we can observe a hyperbolic decay, confirming the supposition of a short-term dominant linear dependence and suggesting the presence of some nonlinear dependence in the generator phenomenon of these time series.

However, the autocorrelation function cannot be used to evaluate nonlinear dependencies [2]. Therefore, we have investigated the mean mutual information (MMI) [19,46], depicted in Fig. 3. This graphic reveals the existence of nonlinear dependence in all these time series. Observe that the MMI value converges around 0.5 for the CAC40, DAX, HANGSENG, IBOVESPA, IPSA, NYSE and NIKKEI225 time series, around 1.0 for the DJIA, NASDAQ, SP500 and SSE time series, and around 1.5 for the MERVAL time series.

Even with the confirmation of the nonlinear dependence of these time series, it is necessary to assess the nature of such nonlinearity. In this context, we have investigated the Hurst parameter (HP) [16,27,32], depicted in Fig. 4. Note that, when we consider low order time lags to calculate the Hurst parameter, the obtained values for all time series are close to 0.5. The practical implication of such fact is that, for low order time lags, the generator phenomenon of these time series tends to be guided by a random walk process. Nevertheless, when we consider high order time lags to calculate the Hurst parameter, it is possible to observe values significantly smaller than 0.5.

Fig. 2. Autocorrelation function of the time series: (a) CAC40, (b) DAX, (c) DJIA, (d) HANGSENG, (e) IBOVESPA, (f) IPSA, (g) MERVAL, (h) NASDAQ, (i) NIKKEI225, (j) NYSE, (k)
SP500 and (l) SSE.

Observe that the increase in the order of the time lags implies a decrease of the Hurst parameter value. This is the behavior of an anti-persistent time series, that is, a time series built by a self-similar process with long-term nonlinear dependency. In this sense, it is possible to observe that there is a strong relation between predictability and the order of the time lags. A wrong choice of the time lag order can make a time series unpredictable.

In an attempt to confirm, in a subjective way, the Hurst parameter analysis, we investigate the lagplot graphic of a low order time lag, depicted in Fig. 5, and of a high order time lag, depicted in Fig. 6. Note that, for the low order time lag, we have chosen the first time lag, while for the high order time lag, we have chosen the one that produces the lowest Hurst parameter value (to better characterize the time series generator phenomenon), for each particular time series.

Fig. 5 suggests the presence of a short-term dominant linear relationship for all these time series. Note that there is a dominant linear structure within the lagplot graph, confirming the Hurst parameter analysis. In addition to that, Fig. 6 reveals nonlinear structures in the lagplot graph, characterizing the long-term subdominant nonlinear relationship in these time series. Therefore, the lagplot analysis has confirmed, in a subjective way, that these time series are generated by a balanced combination between linear and nonlinear components.

Besides, we have investigated the first discrete derivative, depicted in Fig. 7. In this figure we can observe the presence of both increasing and decreasing components in the time series. This confirms the hypothesis that the long-term subdominant nonlinear component can be represented by a balanced combination of increasing and decreasing mappings.

Therefore, the time series analysis has provided several pieces of evidence, considering samples in daily frequency, that the investigated financial time series can be modeled in terms of a balanced combination of a short-term dominant linear component and a long-term subdominant nonlinear component with both increasing and decreasing behaviors. A small code illustration of these exploratory statistics follows.
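Purely as an illustration of the statistics used in this section (the paper does not publish its analysis code), the sketch below computes a sample ACF and a histogram-based mutual information between x(t) and x(t−k) in Python. The function names, the binning choice and the synthetic random-walk input are assumptions for the example, not the authors' implementation.

```python
# Minimal sketch of the exploratory statistics of Section 2 (illustrative).
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation for lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

def mutual_information(x, lag, bins=16):
    """Histogram estimate of I(x(t); x(t-lag)) in nats."""
    a, b = x[lag:], x[:-lag]
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])))

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=1000))   # stand-in for a stock index series
print(acf(series, 5))                        # slowly decaying ACF for a random walk
print([mutual_information(series, k) for k in (1, 10, 50)])
```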
3. The proposed model

The proposed model in this work, called the deep increasing–decreasing-linear perceptron (DIDLP), has a deep layer architecture composed of increasing–decreasing-linear processing units, each given by a combination between a hybrid nonlinear morphological operator (an increasing–decreasing operator) and a linear operator (the classical perceptron), followed by an activation function. It is worth mentioning that the proposed model is inspired by previous neural architectures proposed by Pessoa and Maragos [36] and de A. Araújo et al. [8,9]. However, instead of using morphological-rank-linear or dilation-erosion-linear processing units, respectively, we use increasing–decreasing-linear processing units.


Fig. 3. Mean mutual information of the time series: (a) CAC40, (b) DAX, (c) DJIA, (d) HANGSENG, (e) IBOVESPA, (f) IPSA, (g) MERVAL, (h) NASDAQ, (i) NIKKEI225, (j) NYSE,
(k) SP500 and (l) SSE.

In this sense, it can be verified that the models presented in [8,9,36] are particular cases of the proposed model in this work. Furthermore, it is worth mentioning that the proposed model is a direct mapping to model the generator phenomenon of daily frequency financial time series, due to the inclusion of linear, increasing and decreasing components (characteristics found within the generator phenomena of the investigated financial time series – see Section 2). In the sequel, we present the formal definition of the proposed model.

3.1. Model definition

The nth output of a processing unit from the lth layer of the proposed DIDLP is given by

$$y_n^{(l)} = f\left(u_n^{(l)}\right), \quad n = 1, \ldots, N_l, \tag{1}$$

in which

$$u_n^{(l)} = \lambda_n^{(l)}\alpha_n^{(l)} + \left(1 - \lambda_n^{(l)}\right)\beta_n^{(l)}, \quad \lambda_n^{(l)} \in [0, 1], \tag{2}$$

with

$$\beta_n^{(l)} = \sum_{i=1}^{N_{l-1}} y_i^{(l-1)}\, p_{n,i}^{(l)} + \rho_n^{(l)}, \tag{3}$$

and

$$\alpha_n^{(l)} = \theta_n^{(l)}\tau_n^{(l)} + \left(1 - \theta_n^{(l)}\right)\kappa_n^{(l)}, \quad \theta_n^{(l)} \in [0, 1], \tag{4}$$

in which

$$\tau_n^{(l)} = \varphi_n^{(l)}\delta_n^{(l)} + \left(1 - \varphi_n^{(l)}\right)\varepsilon_n^{(l)}, \quad \varphi_n^{(l)} \in [0, 1], \tag{5}$$

and

$$\kappa_n^{(l)} = \omega_n^{(l)}\bar{\delta}_n^{(l)} + \left(1 - \omega_n^{(l)}\right)\bar{\varepsilon}_n^{(l)}, \quad \omega_n^{(l)} \in [0, 1], \tag{6}$$

with

$$\delta_n^{(l)} = \delta_{a_n^{(l)}}\left(\mathbf{y}^{(l-1)}\right) = \bigvee_{i=1}^{N_{l-1}}\left(y_i^{(l-1)} + a_{n,i}^{(l)}\right), \tag{7}$$

$$\varepsilon_n^{(l)} = \varepsilon_{b_n^{(l)}}\left(\mathbf{y}^{(l-1)}\right) = \bigwedge_{i=1}^{N_{l-1}}\left(y_i^{(l-1)} + b_{n,i}^{(l)}\right), \tag{8}$$

$$\bar{\delta}_n^{(l)} = \bar{\delta}_{c_n^{(l)}}\left(\mathbf{y}^{(l-1)}\right) = \bigwedge_{i=1}^{N_{l-1}}\left(\left(y_i^{(l-1)}\right)^{*} + c_{n,i}^{(l)}\right), \tag{9}$$

Fig. 4. Hurst parameter of the time series: (a) CAC40, (b) DAX, (c) DJIA, (d) HANGSENG, (e) IBOVESPA, (f) IPSA, (g) MERVAL, (h) NASDAQ, (i) NIKKEI225, (j) NYSE, (k) SP500
and (l) SSE.

and

$$\bar{\varepsilon}_n^{(l)} = \bar{\varepsilon}_{d_n^{(l)}}\left(\mathbf{y}^{(l-1)}\right) = \bigvee_{i=1}^{N_{l-1}}\left(\left(y_i^{(l-1)}\right)^{*} + d_{n,i}^{(l)}\right). \tag{10}$$

Note that $N_l$ denotes the number of processing units within the lth layer. Parameters $\lambda_n^{(l)}, \theta_n^{(l)}, \varphi_n^{(l)}, \omega_n^{(l)}, \rho_n^{(l)} \in [0, 1]$, and parameters $a_n^{(l)}, b_n^{(l)}, c_n^{(l)}, d_n^{(l)}, p_n^{(l)} \in \mathbb{R}^{N_{l-1}}$. Notice that the output of any processing unit employs an activation function $f(u_n^{(l)})$. Any activation function can be used within the proposed model. However, in this work we take advantage of the logistic and relu activation functions, which are respectively given by:

$$f\left(u_n^{(l)}\right) = \frac{1}{1 + \exp\left(-u_n^{(l)}\right)}, \tag{11}$$

and

$$f\left(u_n^{(l)}\right) = \max\left(0, u_n^{(l)}\right). \tag{12}$$

Also, notice that the internal activation ($u_n^{(l)}$) of any processing unit is composed of a combination (whose mix parameter is $\lambda_n^{(l)}$) between $\beta_n^{(l)}$ and $\alpha_n^{(l)}$, which represent the linear and nonlinear modules of the processing unit, respectively. The linear module is composed only of the classical perceptron. The nonlinear module is composed of the increasing–decreasing-linear operator, which is given by a combination (whose mix parameter is $\theta_n^{(l)}$) between the increasing module (defined by $\tau_n^{(l)}$) and the decreasing module (defined by $\kappa_n^{(l)}$). The increasing module is composed of another combination (whose mix parameter is $\varphi_n^{(l)}$) between a dilation operator (defined by $\delta_n^{(l)}$) and an erosion operator (defined by $\varepsilon_n^{(l)}$). The decreasing module is composed of another combination (whose mix parameter is $\omega_n^{(l)}$) between an anti-dilation operator (defined by $\bar{\delta}_n^{(l)}$) and an anti-erosion operator (defined by $\bar{\varepsilon}_n^{(l)}$). Fig. 8 depicts the processing unit of the proposed DIDLP model.

Vector $p_n^{(l)}$ represents the coefficients of the linear operator and the scalar $\rho_n^{(l)}$ represents the bias. Vectors $a$, $b$, $c$ and $d$ represent, respectively, the structuring elements of the dilation ($\delta_{a_n^{(l)}}(\mathbf{y}^{(l-1)})$), erosion ($\varepsilon_{b_n^{(l)}}(\mathbf{y}^{(l-1)})$), anti-dilation ($\bar{\delta}_{c_n^{(l)}}(\mathbf{y}^{(l-1)})$) and anti-erosion ($\bar{\varepsilon}_{d_n^{(l)}}(\mathbf{y}^{(l-1)})$) operators. Note that $\bigvee$ and $\bigwedge$ represent the supremum and infimum operators. The operation $+^{\prime}$ differs from $+$ only in that $(-\infty) +^{\prime} (+\infty) = +\infty$, instead of $(-\infty) + (+\infty) = -\infty$. Finally, the term $x_i^{*}$ represents the conjugate of the element $x_i$, given by

$$x_i^{*} = \begin{cases} +\infty, & \text{if } x_i = -\infty \\ -\infty, & \text{if } x_i = +\infty \\ -x_i, & \text{otherwise,} \end{cases} \tag{13}$$

Fig. 5. Lagplot graphic (low order time lag) of the time series: (a) CAC40, (b) DAX, (c) DJIA, (d) HANGSENG, (e) IBOVESPA, (f) IPSA, (g) MERVAL, (h) NASDAQ, (i) NIKKEI225,
(j) NYSE, (k) SP500 and (l) SSE.

in which $-x_i$ represents the inverse of $x_i$.

It is worth mentioning that the input and output of the model are respectively given by:

$$\mathbf{y}^{(0)} = \mathbf{x} = (x_1, \ldots, x_{N_0}), \tag{14}$$

and

$$\mathbf{y}^{(L)} = \mathbf{y} = (y_1, \ldots, y_{N_L}). \tag{15}$$
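Before turning to the learning process, the following minimal NumPy sketch makes the forward pass of a single processing unit (Eqs. (1)–(12)) concrete. It assumes finite inputs, so the conjugate of Eq. (13) reduces to plain negation and the distinction between $+$ and $+^{\prime}$ never arises; all names and parameter values are illustrative, not the authors' Matlab implementation.

```python
# Sketch (illustrative) of one increasing-decreasing-linear processing unit.
import numpy as np

def unit_forward(y_prev, lam, theta, phi, omega, rho, p, a, b, c, d,
                 activation="relu"):
    delta  = np.max(y_prev + a)     # dilation, Eq. (7): supremum
    eps    = np.min(y_prev + b)     # erosion, Eq. (8): infimum
    adelta = np.min(-y_prev + c)    # anti-dilation, Eq. (9), conjugate = -y
    aeps   = np.max(-y_prev + d)    # anti-erosion, Eq. (10)
    tau    = phi * delta + (1 - phi) * eps         # increasing module, Eq. (5)
    kappa  = omega * adelta + (1 - omega) * aeps   # decreasing module, Eq. (6)
    alpha  = theta * tau + (1 - theta) * kappa     # nonlinear module, Eq. (4)
    beta   = np.dot(y_prev, p) + rho               # linear module, Eq. (3)
    u      = lam * alpha + (1 - lam) * beta        # internal activation, Eq. (2)
    if activation == "relu":                       # Eq. (12)
        return max(0.0, u)
    return 1.0 / (1.0 + np.exp(-u))                # logistic, Eq. (11)

rng = np.random.default_rng(0)
y_prev = rng.random(5)                             # outputs of the previous layer
p, a, b, c, d = (rng.uniform(-1, 1, 5) for _ in range(5))
print(unit_forward(y_prev, lam=0.5, theta=0.5, phi=0.5, omega=0.5,
                   rho=0.1, p=p, a=a, b=b, c=c, d=d))
```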
3.2. Learning process

We present a descending gradient-based learning process to design the proposed model, based on ideas from [8,9,36]. According to the DIDLP definition, we can see that it requires the adjustment of the following parameters, which are termed as weights:

$$w_n^{(l)} = \left(\lambda_n^{(l)}, \theta_n^{(l)}, \varphi_n^{(l)}, \omega_n^{(l)}, \rho_n^{(l)}, p_n^{(l)}, a_n^{(l)}, b_n^{(l)}, c_n^{(l)}, d_n^{(l)}\right), \tag{16}$$

with $n = 1, \ldots, N_l$ and $l = 1, \ldots, L$. Note that variable $L$ represents the number of layers of the DIDLP.

The proposed learning process employs a supervised training approach, where the DIDLP weights are adjusted according to an error criterion until convergence. In this context, it is necessary to define the cost function $J$, which is given by

$$J = \frac{1}{M}\sum_{m=1}^{M} \xi(m), \tag{17}$$

in which

$$\xi(m) = \|e(m)\|^2, \tag{18}$$

with

$$\|e(m)\|^2 = \sum_{n=1}^{N_L} \left[e_n(m)\right]^2, \tag{19}$$

where $M$ represents the amount of training patterns, and $e(m) = [e_1(m), \ldots, e_{N_L}(m)]$ is the signal error for the $m$th training pattern. Note that $e_n(m)$ is the instantaneous error of the $n$th processing unit, which is given by

$$e_n(m) = t_n(m) - y_n^{(L)}(m), \tag{20}$$

where $t_n(m)$ and $y_n^{(L)}(m)$ represent the $n$th target and the $n$th output, respectively.

Considering that the cost function generates an error surface, the main problem is to find a weight vector that minimizes the error between target and output.

Fig. 6. Lagplot graphic (high order time lag) of the time series: (a) CAC40, (b) DAX, (c) DJIA, (d) HANGSENG, (e) IBOVESPA, (f) IPSA, (g) MERVAL, (h) NASDAQ, (i) NIKKEI225,
(j) NYSE, (k) SP500 and (l) SSE.

In this context, the weight vector can be updated according to the iterative formula:

$$w_n^{(l)}(i+1) = w_n^{(l)}(i) + \mu\, g_n^{(l)}(i), \tag{21}$$

where $\mu > 0$ represents the learning rate and $g_n^{(l)}$ is the gradient of the cost function $J$.

Based on the steepest descent algorithm, we can see that

$$g_n^{(l)}(i) = \nabla J = -\frac{\partial \xi(i)}{\partial w_n^{(l)}}. \tag{22}$$

Let $W^{(l)}$ and $G^{(l)}$ be matrices given by

$$W^{(l)} = \left(w_1^{(l)}, \ldots, w_{N_l}^{(l)}\right), \tag{23}$$

and

$$G^{(l)} = \left(g_1^{(l)}, \ldots, g_{N_l}^{(l)}\right). \tag{24}$$

Then, Eq. (21) can be rewritten as

$$W^{(l)}(i+1) = W^{(l)}(i) + \mu\, G^{(l)}(i). \tag{25}$$

In this context, we can use the chain rule to evaluate $G^{(l)}(i)$:

$$\frac{\partial \xi}{\partial w_n^{(l)}} = \frac{\partial \xi}{\partial y_n^{(l)}}\,\frac{\partial y_n^{(l)}}{\partial u_n^{(l)}}\,\frac{\partial u_n^{(l)}}{\partial w_n^{(l)}}. \tag{26}$$

From Eq. (26) we have:

$$\frac{\partial y_n^{(l)}}{\partial u_n^{(l)}} = \dot{f}\left(u_n^{(l)}\right). \tag{27}$$

Let

$$e^{(l)} \equiv -\frac{1}{2}\,\frac{\partial \xi}{\partial \mathbf{y}^{(l)}}, \tag{28}$$

and

$$\gamma_n^{(l)} \equiv \frac{\partial u_n^{(l)}}{\partial w_n^{(l)}}. \tag{29}$$

Then, Eq. (26) can be rewritten as:

$$\frac{\partial \xi}{\partial w_n^{(l)}} = -2\,e_n^{(l)}\,\dot{f}\left(u_n^{(l)}\right)\gamma_n^{(l)}. \tag{30}$$

If we define the local gradients by:

$$\zeta^{(l)} \equiv e^{(l)} \odot \dot{F}\left(\mathbf{u}^{(l)}\right), \tag{31}$$

with $\dot{F}(\mathbf{u}^{(l)}) = \left(\dot{f}(u_1^{(l)}), \ldots, \dot{f}(u_{N_l}^{(l)})\right)$ and $\odot$ denoting an element by element multiplication, then Eq. (30) can be rewritten as:

$$\frac{\partial \xi}{\partial w_n^{(l)}} = -2\,\zeta_n^{(l)}\,\gamma_n^{(l)}. \tag{32}$$

However, from Eqs. (22) and (32), we have:

$$g_n^{(l)} = 2\,\zeta_n^{(l)}\,\gamma_n^{(l)}. \tag{33}$$
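As a toy illustration of the iterative rule of Eqs. (21) and (22) (not taken from the paper), the sketch below applies $w(i+1) = w(i) + \mu g(i)$, with $g = -\partial \xi / \partial w$, to a one-dimensional quadratic cost; the iterates converge to the minimizer for a suitable learning rate.

```python
# Toy steepest-descent loop (illustrative): xi(w) = (w - 3)^2 has minimizer 3.
def steepest_descent(w0, grad_xi, mu=0.1, epochs=100):
    w = float(w0)
    for _ in range(epochs):
        g = -grad_xi(w)   # Eq. (22): g = -d(xi)/dw
        w = w + mu * g    # Eq. (21): move along the negative error gradient
    return w

print(steepest_descent(0.0, grad_xi=lambda w: 2.0 * (w - 3.0)))  # ~3.0
```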



Fig. 7. Discrete derivative of the time series (increasing component – blue line – and decreasing component – red line): (a) CAC40, (b) DAX, (c) DJIA, (d) HANGSENG, (e)
IBOVESPA, (f) IPSA, (g) MERVAL, (h) NASDAQ, (i) NIKKEI225, (j) NYSE, (k) SP500 and (l) SSE. (For interpretation of the references to color in this figure legend, the reader is
referred to the web version of this article.)

Let $\Gamma^{(l)}$ be a matrix given by:

$$\Gamma^{(l)} = \left(\gamma_1^{(l)}, \ldots, \gamma_{N_l}^{(l)}\right). \tag{34}$$

Then, Eq. (33) can be rewritten as

$$G^{(l)} = 2\,\mathrm{diag}\left(\zeta^{(l)}\right)\Gamma^{(l)}. \tag{35}$$

The main problem for such approach is how to compute the local gradients $\zeta^{(l)}$, when $l < L$, due to the term $e^{(l)}$. An efficient way to overcome such problem is to recursively compute $e^{(l)}$ with the back-propagation algorithm [24,42].

Therefore, from Eqs. (15), (18) and (19), when $l = L$, Eq. (28) is written as:

$$e^{(L)} = -\frac{1}{2}\,\frac{\partial \xi}{\partial \mathbf{y}^{(L)}} = e. \tag{36}$$

When $l < L$, according to the chain rule, we have:

$$\frac{\partial \xi}{\partial \mathbf{y}^{(l)}} = \sum_{n=1}^{N_{l+1}} \frac{\partial \xi}{\partial y_n^{(l+1)}}\,\frac{\partial y_n^{(l+1)}}{\partial \mathbf{y}^{(l)}}. \tag{37}$$

Therefore, Eq. (28) can be written as:

$$e^{(l)} = -\frac{1}{2}\sum_{n=1}^{N_{l+1}} \frac{\partial \xi}{\partial y_n^{(l+1)}}\,\frac{\partial y_n^{(l+1)}}{\partial \mathbf{y}^{(l)}} = \sum_{n=1}^{N_{l+1}} e_n^{(l+1)}\,\frac{\partial y_n^{(l+1)}}{\partial \mathbf{y}^{(l)}}, \tag{38}$$

in which

$$\frac{\partial y_n^{(l+1)}}{\partial \mathbf{y}^{(l)}} = \dot{f}\left(u_n^{(l+1)}\right)\frac{\partial u_n^{(l+1)}}{\partial \mathbf{y}^{(l)}}. \tag{39}$$

If we define

$$\upsilon_n^{(l)} \equiv \frac{\partial u_n^{(l)}}{\partial \mathbf{y}^{(l-1)}}, \tag{40}$$

then from Eqs. (31), (38) and (39) we have:

$$e^{(l)} = \sum_{n=1}^{N_{l+1}} \zeta_n^{(l+1)}\,\upsilon_n^{(l+1)}. \tag{41}$$

Let $\Upsilon^{(l)}$ be a matrix given by:

$$\Upsilon^{(l)} = \left(\upsilon_1^{(l)}, \ldots, \upsilon_{N_l}^{(l)}\right). \tag{42}$$

Then, Eq. (41) can be written as:

$$e^{(l)} = \zeta^{(l+1)}\,\Upsilon^{(l+1)}. \tag{43}$$
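The resulting backward recursion can be sketched as follows; the layer sizes, the $\dot{f}$ values and the $\Upsilon$ matrices are random placeholders, since computing them requires the derivatives developed in the remainder of this section.

```python
# Sketch (illustrative) of the recursion of Eqs. (31), (36) and (41)-(43).
# Shapes: e and fdot at layer l are (N_l,); Upsilon at layer l is (N_l, N_{l-1}).
import numpy as np

def backpropagate(e_L, fdot_per_layer, Upsilon_per_layer):
    """Return the list [e^(1), ..., e^(L)] from the output error e^(L)."""
    L = len(fdot_per_layer)
    e = [None] * L
    e[L - 1] = e_L                                    # Eq. (36): e^(L) = e
    for l in range(L - 2, -1, -1):
        zeta_next = e[l + 1] * fdot_per_layer[l + 1]  # Eq. (31), elementwise
        e[l] = zeta_next @ Upsilon_per_layer[l + 1]   # Eqs. (41)/(43)
    return e

rng = np.random.default_rng(0)
fdot = [rng.random(3), rng.random(4), rng.random(2)]
Ups  = [None, rng.random((4, 3)), rng.random((2, 4))]
print([v.shape for v in backpropagate(rng.random(2), fdot, Ups)])  # (3,), (4,), (2,)
```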

Fig. 8. The nth processing unit of the proposed DIDLP.

Note that the main problem of this learning process is how to evaluate the following derivatives: $\dot{f}$, $\upsilon_n^{(l)} = \frac{\partial u_n^{(l)}}{\partial \mathbf{y}^{(l-1)}}$ and $\gamma_n^{(l)} = \frac{\partial u_n^{(l)}}{\partial w_n^{(l)}}$.

As we use two activation functions in the proposed model, there are two ways to estimate the derivative $\dot{f}(u_n^{(l)})$. The first one is from Eq. (11), which is given by:

$$\dot{f}\left(u_n^{(l)}\right) = f\left(u_n^{(l)}\right)\left[1 - f\left(u_n^{(l)}\right)\right], \tag{44}$$

and the second one is from Eq. (12), which is given by:

$$\dot{f}\left(u_n^{(l)}\right) = \begin{cases} 1 & u_n^{(l)} > 0 \\ 0 & \text{otherwise.} \end{cases} \tag{45}$$

In this context, from Eqs. (16) and (29), we have:

$$\gamma_n^{(l)} = \left(\frac{\partial u_n^{(l)}}{\partial \lambda_n^{(l)}}, \frac{\partial u_n^{(l)}}{\partial \theta_n^{(l)}}, \frac{\partial u_n^{(l)}}{\partial \varphi_n^{(l)}}, \frac{\partial u_n^{(l)}}{\partial \omega_n^{(l)}}, \frac{\partial u_n^{(l)}}{\partial \rho_n^{(l)}}, \frac{\partial u_n^{(l)}}{\partial p_n^{(l)}}, \frac{\partial u_n^{(l)}}{\partial a_n^{(l)}}, \frac{\partial u_n^{(l)}}{\partial b_n^{(l)}}, \frac{\partial u_n^{(l)}}{\partial c_n^{(l)}}, \frac{\partial u_n^{(l)}}{\partial d_n^{(l)}}\right). \tag{46}$$

Similarly, from Eqs. (2), (3) and (40), we have:

$$\upsilon_n^{(l)} = \lambda_n^{(l)}\,\frac{\partial \alpha_n^{(l)}}{\partial \mathbf{y}^{(l-1)}} + \left(1 - \lambda_n^{(l)}\right)p_n^{(l)}, \tag{47}$$

in which

$$\frac{\partial \alpha_n^{(l)}}{\partial \mathbf{y}^{(l-1)}} = \frac{\partial \alpha_n^{(l)}}{\partial a_n^{(l)}} + \frac{\partial \alpha_n^{(l)}}{\partial b_n^{(l)}} + \frac{\partial \alpha_n^{(l)}}{\partial c_n^{(l)}} + \frac{\partial \alpha_n^{(l)}}{\partial d_n^{(l)}}. \tag{48}$$

Next, we demonstrate how to evaluate Eqs. (46) and (48). The partial derivative $\frac{\partial u_n^{(l)}}{\partial \lambda_n^{(l)}}$ is given by:

$$\frac{\partial u_n^{(l)}}{\partial \lambda_n^{(l)}} = \alpha_n^{(l)} - \beta_n^{(l)}. \tag{49}$$

The partial derivative $\frac{\partial u_n^{(l)}}{\partial \theta_n^{(l)}}$ is given by:

$$\frac{\partial u_n^{(l)}}{\partial \theta_n^{(l)}} = \frac{\partial u_n^{(l)}}{\partial \alpha_n^{(l)}}\,\frac{\partial \alpha_n^{(l)}}{\partial \theta_n^{(l)}}, \tag{50}$$

in which

$$\frac{\partial u_n^{(l)}}{\partial \alpha_n^{(l)}} = \lambda_n^{(l)}, \tag{51}$$

and

$$\frac{\partial \alpha_n^{(l)}}{\partial \theta_n^{(l)}} = \tau_n^{(l)} - \kappa_n^{(l)}. \tag{52}$$

The partial derivative $\frac{\partial u_n^{(l)}}{\partial \varphi_n^{(l)}}$ is given by:

$$\frac{\partial u_n^{(l)}}{\partial \varphi_n^{(l)}} = \frac{\partial u_n^{(l)}}{\partial \alpha_n^{(l)}}\,\frac{\partial \alpha_n^{(l)}}{\partial \tau_n^{(l)}}\,\frac{\partial \tau_n^{(l)}}{\partial \varphi_n^{(l)}} = \lambda_n^{(l)}\,\frac{\partial \alpha_n^{(l)}}{\partial \tau_n^{(l)}}\,\frac{\partial \tau_n^{(l)}}{\partial \varphi_n^{(l)}}, \tag{53}$$

in which

$$\frac{\partial \alpha_n^{(l)}}{\partial \tau_n^{(l)}} = \theta_n^{(l)}, \tag{54}$$

and

$$\frac{\partial \tau_n^{(l)}}{\partial \varphi_n^{(l)}} = \delta_n^{(l)} - \varepsilon_n^{(l)}. \tag{55}$$

The partial derivative $\frac{\partial u_n^{(l)}}{\partial \omega_n^{(l)}}$ is given by:

$$\frac{\partial u_n^{(l)}}{\partial \omega_n^{(l)}} = \frac{\partial u_n^{(l)}}{\partial \alpha_n^{(l)}}\,\frac{\partial \alpha_n^{(l)}}{\partial \kappa_n^{(l)}}\,\frac{\partial \kappa_n^{(l)}}{\partial \omega_n^{(l)}} = \lambda_n^{(l)}\,\frac{\partial \alpha_n^{(l)}}{\partial \kappa_n^{(l)}}\,\frac{\partial \kappa_n^{(l)}}{\partial \omega_n^{(l)}}, \tag{56}$$

in which

$$\frac{\partial \alpha_n^{(l)}}{\partial \kappa_n^{(l)}} = 1 - \theta_n^{(l)}, \tag{57}$$

and

$$\frac{\partial \kappa_n^{(l)}}{\partial \omega_n^{(l)}} = \bar{\delta}_n^{(l)} - \bar{\varepsilon}_n^{(l)}. \tag{58}$$

The partial derivative $\frac{\partial u_n^{(l)}}{\partial \rho_n^{(l)}}$ is given by:

$$\frac{\partial u_n^{(l)}}{\partial \rho_n^{(l)}} = 1 - \lambda_n^{(l)}. \tag{59}$$

The partial derivative $\frac{\partial u_n^{(l)}}{\partial p_n^{(l)}}$ is given by:

$$\frac{\partial u_n^{(l)}}{\partial p_n^{(l)}} = \frac{\partial u_n^{(l)}}{\partial \beta_n^{(l)}}\,\frac{\partial \beta_n^{(l)}}{\partial p_n^{(l)}}, \tag{60}$$

in which

$$\frac{\partial u_n^{(l)}}{\partial \beta_n^{(l)}} = 1 - \lambda_n^{(l)}, \tag{61}$$

and

$$\frac{\partial \beta_n^{(l)}}{\partial p_n^{(l)}} = \mathbf{y}^{(l-1)}. \tag{62}$$

However, a problem arises for estimating the partial derivatives $\frac{\partial u_n^{(l)}}{\partial a_n^{(l)}}$, $\frac{\partial u_n^{(l)}}{\partial b_n^{(l)}}$, $\frac{\partial u_n^{(l)}}{\partial c_n^{(l)}}$ and $\frac{\partial u_n^{(l)}}{\partial d_n^{(l)}}$. Recall that the dilation, erosion, anti-dilation and anti-erosion operators are not differentiable in the usual way, due to the infimum and supremum functions used within these operators.

In this sense, de A. Araújo et al. [8,9,14] have demonstrated that the morphological operators of dilation, erosion, anti-dilation and anti-erosion can be seen as particular cases of the rank operator. The $r$th rank operator of a vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T \in \mathbb{R}^n$ is the $r$th component of the vector ordered decreasingly ($x_{(1)} \geq x_{(2)} \geq \cdots \geq x_{(n)}$), and is given by:

$$R_r(\mathbf{x}) = x_{(r)}, \quad r = 1, 2, \ldots, n. \tag{63}$$

Therefore, from Eq. (63), we have:

$$\delta_{a_n^{(l)}}\left(\mathbf{y}^{(l-1)}\right) \equiv R_1\left(\mathbf{y}^{(l-1)} + a_n^{(l)}\right), \tag{64}$$

$$\varepsilon_{b_n^{(l)}}\left(\mathbf{y}^{(l-1)}\right) \equiv R_n\left(\mathbf{y}^{(l-1)} + b_n^{(l)}\right), \tag{65}$$

$$\bar{\delta}_{c_n^{(l)}}\left(\mathbf{y}^{(l-1)}\right) \equiv R_n\left(\left(\mathbf{y}^{(l-1)}\right)^{*} + c_n^{(l)}\right), \tag{66}$$

$$\bar{\varepsilon}_{d_n^{(l)}}\left(\mathbf{y}^{(l-1)}\right) \equiv R_1\left(\left(\mathbf{y}^{(l-1)}\right)^{*} + d_n^{(l)}\right). \tag{67}$$

Therefore, we can estimate these derivatives using the concept of the rank indicator vector, originally proposed by Pessoa and Maragos [36] in terms of morphological operators. However, a drawback arises from such approach, since it can lead to abrupt changes, compromising the numerical robustness of the learning process [36].

In this way, we employ the concept of the smoothed rank indicator vector, extended in terms of morphological operators, because the smoothed rank operator can approximate morphological operators in terms of differentiable operators. Notice that such approach depends on a smoothed impulse function $Q_\sigma(\mathbf{x}) = [q_\sigma(x_1), q_\sigma(x_2), \ldots, q_\sigma(x_n)]$, in which

$$q_\sigma(x_i) = \exp\left[-\frac{1}{2}\left(\frac{x_i}{\sigma}\right)^2\right], \quad \forall\, i = 1, \ldots, n, \tag{68}$$

wherein $\sigma > 0$ is a smoothing factor that directly affects the estimation of the derivatives. Note that the proposed learning process also works with $\sigma \to 0$, since the derivatives are then estimated in terms of the usual rank indicator vector.

The partial derivative $\frac{\partial u_n^{(l)}}{\partial a_n^{(l)}}$ is given by:

$$\frac{\partial u_n^{(l)}}{\partial a_n^{(l)}} = \frac{\partial u_n^{(l)}}{\partial \alpha_n^{(l)}}\,\frac{\partial \alpha_n^{(l)}}{\partial a_n^{(l)}} = \lambda_n^{(l)}\,\frac{\partial \alpha_n^{(l)}}{\partial a_n^{(l)}}, \tag{69}$$

in which

$$\frac{\partial \alpha_n^{(l)}}{\partial a_n^{(l)}} = \frac{\partial \alpha_n^{(l)}}{\partial \tau_n^{(l)}}\,\frac{\partial \tau_n^{(l)}}{\partial \delta_n^{(l)}}\,\frac{\partial \delta_n^{(l)}}{\partial a_n^{(l)}} = \theta_n^{(l)}\,\frac{\partial \tau_n^{(l)}}{\partial \delta_n^{(l)}}\,\frac{\partial \delta_n^{(l)}}{\partial a_n^{(l)}}, \tag{70}$$

where

$$\frac{\partial \tau_n^{(l)}}{\partial \delta_n^{(l)}} = \varphi_n^{(l)}, \tag{71}$$

and

$$\frac{\partial \delta_n^{(l)}}{\partial a_n^{(l)}} = \frac{Q_\sigma\left(\delta_n^{(l)}\cdot\mathbf{1} - \left(\mathbf{y}^{(l-1)} + a_n^{(l)}\right)\right)}{Q_\sigma\left(\delta_n^{(l)}\cdot\mathbf{1} - \left(\mathbf{y}^{(l-1)} + a_n^{(l)}\right)\right)\cdot\mathbf{1}^T}. \tag{72}$$

The partial derivative $\frac{\partial u_n^{(l)}}{\partial b_n^{(l)}}$ is given by:

$$\frac{\partial u_n^{(l)}}{\partial b_n^{(l)}} = \frac{\partial u_n^{(l)}}{\partial \alpha_n^{(l)}}\,\frac{\partial \alpha_n^{(l)}}{\partial b_n^{(l)}} = \lambda_n^{(l)}\,\frac{\partial \alpha_n^{(l)}}{\partial b_n^{(l)}}, \tag{73}$$

in which

$$\frac{\partial \alpha_n^{(l)}}{\partial b_n^{(l)}} = \frac{\partial \alpha_n^{(l)}}{\partial \tau_n^{(l)}}\,\frac{\partial \tau_n^{(l)}}{\partial \varepsilon_n^{(l)}}\,\frac{\partial \varepsilon_n^{(l)}}{\partial b_n^{(l)}} = \theta_n^{(l)}\,\frac{\partial \tau_n^{(l)}}{\partial \varepsilon_n^{(l)}}\,\frac{\partial \varepsilon_n^{(l)}}{\partial b_n^{(l)}}, \tag{74}$$

where

$$\frac{\partial \tau_n^{(l)}}{\partial \varepsilon_n^{(l)}} = 1 - \varphi_n^{(l)}, \tag{75}$$

and

$$\frac{\partial \varepsilon_n^{(l)}}{\partial b_n^{(l)}} = \frac{Q_\sigma\left(\varepsilon_n^{(l)}\cdot\mathbf{1} - \left(\mathbf{y}^{(l-1)} + b_n^{(l)}\right)\right)}{Q_\sigma\left(\varepsilon_n^{(l)}\cdot\mathbf{1} - \left(\mathbf{y}^{(l-1)} + b_n^{(l)}\right)\right)\cdot\mathbf{1}^T}. \tag{76}$$
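A minimal sketch of the smoothed rank indicator of Eqs. (68) and (72), for the dilation case; the helper names are illustrative.

```python
# Smoothed rank indicator (illustrative): the Gaussian impulse q_sigma turns
# the non-smooth argmax of the dilation into a normalized, differentiable
# weight vector, as in Eqs. (68) and (72).
import numpy as np

def q_sigma(x, sigma):
    return np.exp(-0.5 * (x / sigma) ** 2)   # smoothed impulse, Eq. (68)

def d_dilation_d_a(y_prev, a, sigma=0.05):
    """d(delta)/d(a), Eq. (72): smoothed indicator of where the max occurs."""
    z = y_prev + a
    delta = z.max()                  # dilation value, Eq. (7)
    w = q_sigma(delta - z, sigma)    # large where z_i is near the maximum
    return w / w.sum()               # normalization by Q_sigma(.) . 1^T

y_prev = np.array([0.2, 0.9, 0.4])
a = np.zeros(3)
print(d_dilation_d_a(y_prev, a, sigma=0.05))  # mass concentrates on index 1
print(d_dilation_d_a(y_prev, a, sigma=1.0))   # larger sigma spreads the mass
```

Note how a small σ concentrates the derivative mass on the maximizing entry (the hard rank indicator), while a larger σ spreads it; this is exactly the trade-off controlled by the smoothing factor σ.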

The partial derivative $\frac{\partial u_n^{(l)}}{\partial c_n^{(l)}}$ is given by:

$$\frac{\partial u_n^{(l)}}{\partial c_n^{(l)}} = \frac{\partial u_n^{(l)}}{\partial \alpha_n^{(l)}}\,\frac{\partial \alpha_n^{(l)}}{\partial c_n^{(l)}} = \lambda_n^{(l)}\,\frac{\partial \alpha_n^{(l)}}{\partial c_n^{(l)}}, \tag{77}$$

in which

$$\frac{\partial \alpha_n^{(l)}}{\partial c_n^{(l)}} = \frac{\partial \alpha_n^{(l)}}{\partial \kappa_n^{(l)}}\,\frac{\partial \kappa_n^{(l)}}{\partial \bar{\delta}_n^{(l)}}\,\frac{\partial \bar{\delta}_n^{(l)}}{\partial c_n^{(l)}} = \left(1 - \theta_n^{(l)}\right)\frac{\partial \kappa_n^{(l)}}{\partial \bar{\delta}_n^{(l)}}\,\frac{\partial \bar{\delta}_n^{(l)}}{\partial c_n^{(l)}}, \tag{78}$$

where

$$\frac{\partial \kappa_n^{(l)}}{\partial \bar{\delta}_n^{(l)}} = \omega_n^{(l)}, \tag{79}$$

and

$$\frac{\partial \bar{\delta}_n^{(l)}}{\partial c_n^{(l)}} = \frac{Q_\sigma\left(\bar{\delta}_n^{(l)}\cdot\mathbf{1} - \left(\left(\mathbf{y}^{(l-1)}\right)^{*} + c_n^{(l)}\right)\right)}{Q_\sigma\left(\bar{\delta}_n^{(l)}\cdot\mathbf{1} - \left(\left(\mathbf{y}^{(l-1)}\right)^{*} + c_n^{(l)}\right)\right)\cdot\mathbf{1}^T}. \tag{80}$$

The partial derivative $\frac{\partial u_n^{(l)}}{\partial d_n^{(l)}}$ is given by:

$$\frac{\partial u_n^{(l)}}{\partial d_n^{(l)}} = \frac{\partial u_n^{(l)}}{\partial \alpha_n^{(l)}}\,\frac{\partial \alpha_n^{(l)}}{\partial d_n^{(l)}} = \lambda_n^{(l)}\,\frac{\partial \alpha_n^{(l)}}{\partial d_n^{(l)}}, \tag{81}$$

in which

$$\frac{\partial \alpha_n^{(l)}}{\partial d_n^{(l)}} = \frac{\partial \alpha_n^{(l)}}{\partial \kappa_n^{(l)}}\,\frac{\partial \kappa_n^{(l)}}{\partial \bar{\varepsilon}_n^{(l)}}\,\frac{\partial \bar{\varepsilon}_n^{(l)}}{\partial d_n^{(l)}} = \left(1 - \theta_n^{(l)}\right)\frac{\partial \kappa_n^{(l)}}{\partial \bar{\varepsilon}_n^{(l)}}\,\frac{\partial \bar{\varepsilon}_n^{(l)}}{\partial d_n^{(l)}}, \tag{82}$$

where

$$\frac{\partial \kappa_n^{(l)}}{\partial \bar{\varepsilon}_n^{(l)}} = 1 - \omega_n^{(l)}, \tag{83}$$

and

$$\frac{\partial \bar{\varepsilon}_n^{(l)}}{\partial d_n^{(l)}} = \frac{Q_\sigma\left(\bar{\varepsilon}_n^{(l)}\cdot\mathbf{1} - \left(\left(\mathbf{y}^{(l-1)}\right)^{*} + d_n^{(l)}\right)\right)}{Q_\sigma\left(\bar{\varepsilon}_n^{(l)}\cdot\mathbf{1} - \left(\left(\mathbf{y}^{(l-1)}\right)^{*} + d_n^{(l)}\right)\right)\cdot\mathbf{1}^T}. \tag{84}$$

4. Simulations and experimental results

In our simulations, we normalize all time series to lie within the range [0, 1] and we divide the data into three sets [39]: a training set (50% of the data points), a validation set (25% of the data points) and a test set (25% of the data points).

In order to establish the performance of the proposed predictor, we compare our results with those obtained with classical and robust models presented in the literature: (i) statistical models (the autoregressive integrated moving average – ARIMA [2] and the support vector regressor [30] with linear (SVRL), polynomial (SVRP) and radial basis function (SVRR) kernels), (ii) neural network models (the multilayer perceptron – MLP [24] and the linear and nonlinear neural network – LNNN [56]), (iii) morphological models (the dilation-erosion model – DEM [10] and the dendrite morphological neuron – DMN [57]), (iv) dynamic models (the nonlinear autoregressive neural network with exogenous Takens inputs – NARXT [32]), and (v) hybrid models (the prediction evolutionary Levenberg–Marquardt neural networks – PELMNN [1] and the robust automatic phase-adjustment method – RAA [12]).

It is worth mentioning that the concept of automatic phase adjustment [14] was applied to all prediction models investigated in this work in order to ensure a fair comparison among the investigated models. Observe that the reason for such inclusion is to adjust time phase distortions, which can occur in the generator phenomenon of financial time series [10,12].

For the experiments using the proposed model, it is necessary to define a basic architecture of the DIDLP, denoted DIDLP(I; H = h1, ..., hk; O; μ; σ). Term I represents the input dimensionality, term hk defines the amount of processing units in the kth hidden layer, term O represents the output dimensionality (as we are dealing with 1-step-ahead prediction problems, we fixed O = 1), term μ represents the learning rate and term σ represents the smoothing factor.

For the amount of hidden layers (k), we have used an empirical methodology, using cross validation, wherein the values 1, 3 and 8 are investigated. Table 1 summarizes the investigated values for each hk.

Table 1
Summary of the investigated values for each hk.

k   h1    h2    h3    h4   h5   h6   h7   h8
1   100   –     –     –    –    –    –    –
3   100   50    10    –    –    –    –    –
8   100   100   100   50   50   50   10   10

Note that for all k = 1, 3, 8 we employ the relu activation function for all hidden layers and the logistic activation function for the output layer. In this way, the best value found for k is 3. Fig. 9 illustrates the architecture of the proposed DIDLP.

For the learning rate (μ), we also have used an empirical methodology, using cross validation, wherein the values 0.001, 0.01 and 0.1 are investigated. In this way, the best value found for μ is 0.01. For the smoothing factor (σ), we also have employed an empirical methodology, using cross validation, wherein the values 0.005, 0.05 and 0.5 are investigated. In this sense, the best value found for σ is 0.05.

The choice of the input dimensionality was based on the time series analysis presented in Section 2. Table 2 summarizes the chosen time lags for each time series. Note that the first time lag is not used because a specific structure is necessary to use the automatic phase adjustment concept, since the key step of this concept is the two-step prediction to adjust time phase distortions.

Table 2
Summary of the investigated values for time lags.

Time series   Time lags
CAC           2–106
DAX           2–91
DJIA          2–44
HS            2–82
IBOV          2–72
IPSA          2–110
MER           2–51
NASDAQ        2–105
NK            2–86
NYSE          2–59
SP            2–50
SSE           2–91

The initial values of the parameters are $a_n^{(l)}, b_n^{(l)}, c_n^{(l)}, d_n^{(l)}, p_n^{(l)}, \rho_n^{(l)} \in [-1, 1]$ and $\lambda_n^{(l)}, \theta_n^{(l)}, \varphi_n^{(l)}, \omega_n^{(l)} \in [0, 1]$. For the learning process, three stop conditions are used: (i) the number of training epochs (epochs = 10,000), (ii) the process training (Pt ≤ 10−6) [39], and (iii) the generalization loss (Gl > 5%) [39]. All experiments using the proposed model were performed with the Matlab software.

It is worth mentioning that relevant metrics are used to evaluate the prediction performance: the mean squared error (MSE) [6], the mean absolute percentage error (MAPE) [6], the Theil statistic (THEIL) [23], the prediction of change in direction (POCID) [55] and the average relative variance (ARV) [23].

Fig. 9. The architecture of the proposed DIDLP.

Fig. 10. Design and evaluation procedure for the proposed model.

Observe that, for each model setting, we have performed fifty experiments, where the mean (MEAN) and the standard deviation (STD) were computed for each performance metric. Fig. 10 depicts the procedure used to design and to evaluate the proposed model.

In addition, in order to statistically validate the model with the best performance, we apply the Friedman test [20] with significance level α = 0.05, since it establishes a performance rank for the investigated models. Furthermore, we use a post hoc test, called the Tukey test [50], with α = 0.05, to evaluate the pairwise performance of all investigated models. It is worth mentioning that we have used both tests considering all datasets together (using the approach proposed by Demsar [15] and Nobrega and Oliveira [33]). A sketch of the evaluation metrics is given below.
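For concreteness, common formulations of the five metrics are sketched below; the exact definitions used in this work follow [6,23,55], so the THEIL and POCID variants shown here should be read as assumptions. The Friedman and Tukey procedures themselves are available in standard statistical packages (e.g., scipy.stats.friedmanchisquare and, in recent SciPy releases, scipy.stats.tukey_hsd).

```python
# Common formulations (assumed) of the evaluation metrics for target t and
# prediction y; lower is better except for POCID (fraction of correct signs).
import numpy as np

def metrics(t, y):
    t, y = np.asarray(t, float), np.asarray(y, float)
    mse = np.mean((t - y) ** 2)                        # MSE
    mape = np.mean(np.abs((t - y) / t))                # MAPE
    # THEIL: squared error relative to the naive random-walk prediction
    theil = np.sum((t[1:] - y[1:]) ** 2) / np.sum((t[1:] - t[:-1]) ** 2)
    # POCID: fraction of correctly predicted changes of direction
    pocid = np.mean((t[1:] - t[:-1]) * (y[1:] - y[:-1]) > 0)
    # ARV: squared error relative to predicting the series mean
    arv = np.sum((t - y) ** 2) / np.sum((t - t.mean()) ** 2)
    return {"MSE": mse, "MAPE": mape, "THEIL": theil, "POCID": pocid, "ARV": arv}

rng = np.random.default_rng(0)
t = 100.0 + np.cumsum(rng.normal(size=200))   # synthetic positive index level
y = t + rng.normal(scale=0.1, size=200)       # a near-perfect prediction
print(metrics(t, y))
```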
4.1. Analysis of the ARV metric

In Table 3, we summarize the obtained results for all financial time series, considering the MEAN and STD statistics, as well as the results of the Friedman test (χ² = 106.60 and p-value = 8.72E−18) and the Tukey test for the ARV metric.

According to Table 3, we can observe that the proposed model obtained the best prediction performance, considering the ARV metric, for the CAC, DAX, DJIA, HS, IBOV, NASDAQ, NK, NYSE, SP and SSE time series. Even without the best performance for the IPSA and MER time series, the proposed model achieved similar performance to those yielded by the most accurate prediction models for these time series (the DMN and RAA models, respectively). Such fact implies that the proposed model is the best prediction model in 83% of the investigated time series. Also, it is worth mentioning that the obtained values of the ARV metric within the range [8E−7,2E−4] suggest that the proposed model has a far better performance than a naive prediction model.

According to the Friedman test, it is possible to confirm, statistically, the results presented in Table 3. Note that the proposed model obtained the smallest rank value for the Friedman test, suggesting that it can be considered the most accurate prediction model for the investigated time series, considering the ARV metric. Besides, it is worth mentioning that the pairwise analysis provided by the Tukey test reveals that the proposed model has a better performance regarding all pairs, considering the ARV metric. We can observe that the highest value for the Tukey test statistic is −1.50 (with respect to the pair DIDLP – RAA), suggesting that the proposed model has a statistically better prediction performance than the best model among all the investigated models.

4.2. Analysis of the MAPE metric

In Table 4, we summarize the obtained results for all financial time series, considering the MEAN and STD statistics, as well as the results of the Friedman test (χ² = 107.15 and p-value = 6.77E−18) and the Tukey test for the MAPE metric.

Table 4 reveals that the proposed model obtained the best prediction performance for the MAPE metric, considering the DAX, DJIA, HS, IBOV, MER, NK, NYSE and SP time series. Although without the best performance for the CAC, IPSA, NASDAQ and SSE time series, the proposed model obtained equal or similar performances to those yielded by the most accurate prediction models for these time series (the NARXT and RAA models, respectively). It is worth mentioning that the proposed model is the best prediction model in 67% of the investigated time series. Furthermore, we can observe that the obtained values of the MAPE metric within the range [8E−5,1E−3] suggest that the proposed model has small percentage deviations.

Once again, according to the Friedman test results, it is possible to confirm, statistically, the results presented in Table 4. Also, we observe that the proposed model achieved the smallest rank value for the Friedman test, suggesting that it can be considered the best prediction model for the investigated time series, considering the MAPE metric.
Table 3
Summary of the testing performance for the ARV metric.

Model    CAC       DAX       DJIA      HS        IBOV      IPSA      MER       NASDAQ    NK        NYSE      SP        SSE       Friedman (Position, Rank)   Tukey (Statistic, p-value)

DIDLP    2.21E−5   1.93E−5   3.07E−5   8.25E−7   1.30E−5   0.0001    3.16E−5   1.03E−5   1.45E−5   5.09E−6   6.41E−6   0.0002    1, 1.25                     −, −
  ± STD  2.88E−5   3.19E−5   3.78E−5   1.08E−6   2.09E−5   0.0002    6.09E−5   2.20E−5   2.55E−5   5.23E−6   7.76E−6   0.0003
ARIMA    0.0159    0.0141    0.0075    0.0313    0.0795    0.0084    0.0148    0.0124    0.0246    0.0239    0.0131    0.1027    10, 9.33                    −8.08, 3.98E−8
  ± STD  0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
DEM      0.0002    0.0006    0.0010    0.0004    0.0008    0.0010    0.0053    0.0016    0.0003    0.0006    0.0007    0.0050    4, 3.83                     −2.58, 7.92E−2
  ± STD  0.0002    0.0003    0.0018    0.0003    0.0011    0.0013    0.0031    0.0027    0.0002    0.0007    0.0010    0.0074
DMN      3.72E−5   0.0004    0.0017    0.0002    0.0007    0.0001    0.0013    0.0016    0.0002    0.0009    0.0021    0.0067    3, 3.50                     −2.25, 1.26E−1
  ± STD  8.02E−5   0.0006    0.0018    0.0002    0.0006    0.0001    0.0013    0.0031    0.0002    0.0006    0.0025    0.0111
LNNN     0.0003    0.0031    0.0165    0.0015    0.0085    0.0255    0.0013    0.0136    0.0004    0.0006    0.0101    0.0364    6, 7.00                     −5.75, 9.36E−5
  ± STD  0.0002    0.0067    0.0219    0.0017    0.0159    0.0381    0.0026    0.0118    0.0007    0.0013    0.0119    0.0638
MLP      0.0135    0.0120    0.0091    0.0260    0.0802    0.0093    0.0137    0.0119    0.0215    0.0229    0.0141    0.0631    7, 7.58                     −6.33, 1.68E−5
  ± STD  1.21E−7   7.18E−7   7.60E−8   1.79E−7   1.02E−7   5.55E−7   1.08E−7   2.25E−6   4.53E−7   2.30E−7   2.11E−7   3.34E−7
NARXT    3.60E−5   0.0031    0.0038    0.0018    0.0043    0.0017    0.0139    0.0073    8.67E−5   2.06E−5   0.0007    0.0036    5, 4.17                     −2.92, 4.75E−2
  ± STD  7.84E−5   0.0029    0.0082    0.0019    0.0066    0.0028    0.0289    0.0154    8.63E−5   4.14E−5   0.0010    0.0046
PELMNN   0.0147    0.0136    0.0088    0.0292    0.0844    0.0108    0.0161    0.0135    0.0236    0.0243    0.0149    0.0849    11, 10.08                   −8.83, 1.96E−9
  ± STD  0.0003    0.0005    0.0001    0.0008    0.0008    0.0003    0.0007    0.0002    0.0008    0.0003    0.0015    0.0142
RAA      0.0003    0.0001    0.0036    0.0004    0.0001    2.31E−6   1.74E−5   0.0033    5.25E−5   0.0001    0.0018    3.89E−5   2, 2.75                     −1.50, 3.08E−1
  ± STD  0.0006    0.0002    0.0034    0.0007    0.0003    3.54E−6   1.47E−5   0.0039    6.59E−5   0.0002    0.0034    6.44E−5
SVRL     0.0133    0.0127    0.0109    0.0259    0.0834    0.0110    0.0190    0.0132    0.0219    0.0233    0.0145    0.0653    9, 9.25                     −8.00, 5.48E−8
  ± STD  0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
SVRP     0.0130    0.0124    0.0099    0.0264    0.0794    0.0096    0.0188    0.0123    0.0219    0.0224    0.0141    0.0679    8, 8.42                     −7.17, 1.12E−6
  ± STD  0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
SVRR     0.0167    0.0138    0.0095    0.0293    0.0856    0.0136    0.0218    0.0161    0.0234    0.0240    0.0138    0.0937    12, 10.83                   −9.58, 7.48E−11
  ± STD  0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
Table 4
Summary of the testing performance for the MAPE metric.

Model    CAC       DAX       DJIA      HS        IBOV      IPSA      MER       NASDAQ    NK        NYSE      SP        SSE       Friedman (Position, Rank)   Tukey (Statistic, p-value)

DIDLP    0.0007    0.0004    0.0007    8.41E−5   0.0002    0.0010    0.0003    0.0002    0.0003    0.0002    0.0002    0.0004    1, 1.25                     −, −
  ± STD  0.0005    0.0004    0.0005    7.96E−5   0.0002    0.0015    0.0004    0.0004    0.0004    0.0002    0.0002    0.0004
ARIMA    0.0228    0.0154    0.0103    0.0187    0.0189    0.0143    0.0094    0.0103    0.0166    0.0156    0.0094    0.0090    9, 9.17                     −7.92, 7.51E−8
  ± STD  0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
DEM      0.0022    0.0033    0.0032    0.0020    0.0019    0.0038    0.0065    0.0033    0.0018    0.0026    0.0022    0.0019    5, 4.08                     −2.83, 5.42E−2
  ± STD  0.0016    0.0013    0.0034    0.0010    0.0019    0.0040    0.0023    0.0034    0.0009    0.0014    0.0020    0.0019
DMN      0.0008    0.0022    0.0050    0.0010    0.0018    0.0017    0.0026    0.0029    0.0013    0.0032    0.0041    0.0020    3, 3.67                     −2.42, 1.00E−1
  ± STD  0.0014    0.0016    0.0033    0.0011    0.0011    0.0008    0.0019    0.0034    0.0012    0.0016    0.0030    0.0022
LNNN     0.0030    0.0046    0.0127    0.0037    0.0055    0.0184    0.0024    0.0111    0.0019    0.0017    0.0090    0.0049    6, 6.67                     −5.42, 2.33E−4
  ± STD  0.0019    0.0061    0.0143    0.0026    0.0067    0.0221    0.0029    0.0079    0.0023    0.0023    0.0064    0.0053
MLP      0.0200    0.0139    0.0114    0.0166    0.0188    0.0147    0.0089    0.0103    0.0150    0.0156    0.0098    0.0067    7, 7.67                     −6.42, 1.30E−5
  ± STD  7.36E−8   5.29E−7   6.41E−8   7.82E−8   2.65E−8   3.97E−7   5.39E−8   1.40E−6   2.33E−7   8.20E−8   1.10E−7   1.01E−7
NARXT    0.0007    0.0064    0.0048    0.0041    0.0043    0.0047    0.0062    0.0056    0.0010    0.0004    0.0020    0.0016    4, 3.83                     −2.58, 7.92E−2
  ± STD  0.0012    0.0050    0.0075    0.0035    0.0043    0.0056    0.0103    0.0088    0.0008    0.0005    0.0023    0.0016
PELMNN   0.0215    0.0151    0.0113    0.0177    0.0198    0.0162    0.0098    0.0112    0.0160    0.0158    0.0102    0.0080    11, 10.17                   −8.92, 1.38E−9
  ± STD  0.0003    0.0002    0.0001    0.0003    0.0002    0.0002    0.0002    7.87E−5   0.0004    0.0002    0.0006    0.0007
RAA      0.0018    0.0013    0.0066    0.0018    0.0007    0.0002    0.0003    0.0052    0.0007    0.0006    0.0031    0.0002    2, 2.75                     −1.50, 3.08E−1
  ± STD  0.0027    0.0010    0.0059    0.0021    0.0008    0.0002    0.0001    0.0039    0.0006    0.0010    0.0035    0.0002
SVRL     0.0199    0.0144    0.0128    0.0166    0.0195    0.0159    0.0109    0.0111    0.0152    0.0157    0.0101    0.0068    10, 9.33                    −8.08, 3.98E−8
  ± STD  0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
SVRP     0.0196    0.0141    0.0120    0.0167    0.0186    0.0149    0.0108    0.0108    0.0154    0.0154    0.0101    0.0071    8, 8.50                     −7.25, 8.41E−7
  ± STD  0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
SVRR     0.0234    0.0151    0.0118    0.0178    0.0200    0.0181    0.0117    0.0127    0.0163    0.0158    0.0098    0.0083    12, 10.92                   −9.67, 5.12E−11
  ± STD  0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000

Furthermore, the pairwise analysis provided by the Tukey test reveals that the proposed model has a better performance regarding all pairs, considering the MAPE metric. Note that the highest value for the Tukey test statistic is −1.50 (with respect to the pair DIDLP – RAA), suggesting that the proposed model has a statistically better prediction performance than the best model among all the investigated models.

4.3. Analysis of the MSE metric

In Table 5, we summarize the obtained results for all financial time series, considering the MEAN and STD statistics, as well as the results of the Friedman test (χ² = 106.60 and p-value = 8.72E−18) and the Tukey test for the MSE metric.

According to Table 5, except for the SSE time series, the proposed model obtained the best prediction performance, considering the MSE metric, for all time series investigated in this work. For the particular case of the SSE time series, we can observe that similar results can be achieved with the proposed model with respect to the most accurate prediction model for this time series (the RAA model). Note that the proposed model achieved the best performance in 92% of the investigated time series. Also, we can verify that the obtained values of the MSE metric within the range [3E−9,1E−6] indicate that the predictions generated by the proposed model are very close to the real time series values.

Again, according to the Friedman test results, it is possible to confirm, statistically, the results presented in Table 5. In addition to that, we observe that the proposed model achieved the smallest rank value for the Friedman test, suggesting that it can be considered the best prediction model for the investigated time series, considering the MSE metric. Furthermore, the pairwise analysis provided by the Tukey test reveals that the proposed model has a better performance regarding all pairs, considering the MSE metric. We further observe that the highest value for the Tukey test statistic is −1.50 (with respect to the pair DIDLP – RAA), suggesting that the proposed model has a statistically better prediction performance than the best model among all the investigated models.

4.4. Analysis of the POCID metric

In Table 6, we summarize the obtained results for all financial time series, considering the MEAN and STD statistics, as well as the results of the Friedman test (χ² = 105.74 and p-value = 1.29E−17) and the Tukey test for the POCID metric.

Table 6 reveals that, except for the CAC, IPSA, MER and SSE time series, the proposed model obtained the best prediction performance, considering the POCID metric, for all time series investigated in this work. It is worth mentioning that similar results can be achieved with the proposed model with respect to the most accurate prediction models for these time series (the NARXT, DMN and RAA models). In this sense, we verify that the proposed model achieved the best performance in 67% of the investigated time series. Besides, we observe that the obtained values of the POCID metric within the range [0.97,0.99] indicate that the proposed model has a much better performance than a "coin-tossing" experiment.

Once again, according to the Friedman test results, it is possible to confirm, statistically, the results presented in Table 6. We observe that the proposed model achieved the smallest rank value for the Friedman test, suggesting that it can be considered the best prediction model for the investigated time series, considering the POCID metric. Furthermore, the pairwise analysis provided by the Tukey test reveals that the proposed model has a better performance regarding all pairs, considering the POCID metric. We further observe that the highest value for the Tukey test statistic is −1.29 (with respect to the pair DIDLP – RAA), suggesting that the proposed model has a statistically better prediction performance than the best model among all the investigated models.

4.5. Analysis of the THEIL metric

In Table 7, we summarize the obtained results for all financial time series, considering the MEAN and STD statistics, as well as the results of the Friedman test (χ² = 106.48 and p-value = 9.19E−18) and the Tukey test for the THEIL metric.

According to Table 7, we can verify that, except for the IPSA, MER and SSE time series, the proposed model obtained the best prediction performance, considering the THEIL metric, for all time series investigated in this work. However, even without the best performance, similar results can be found with the proposed model with respect to the best prediction model for these time series (the RAA model). In this way, the proposed model has the best performance in 75% of the investigated time series. In addition to that, the obtained values of the THEIL metric within the range [3E−5,1E−2] indicate that the prediction generated by the proposed model does not have a 1-step-ahead delay regarding its real time series observations. It is worth mentioning that, except for the proposed model and the DEM, DMN, LNNN, NARXT and RAA models (THEIL ≤ 1), the prediction models considered in this work are not able to overcome such a behavior (THEIL ≥ 1).

Again, according to the Friedman test results, it is possible to confirm, statistically, the results presented in Table 7. We observe that the proposed model achieved the smallest rank value for the Friedman test, suggesting that it can be considered the best prediction model for the investigated time series, considering the THEIL metric. Furthermore, the pairwise analysis provided by the Tukey test reveals that the proposed model has a better performance regarding all pairs, considering the THEIL metric. We further observe that the highest value for the Tukey test statistic is −1.29 (with respect to the pair DIDLP – RAA), suggesting that the proposed model has a statistically better prediction performance than the best model among all the investigated models.

4.6. Analysis of the prediction behavior

Fig. 11 depicts a comparative analysis between real and predicted observations (the last ten points of the test set), built using the proposed model, for all time series investigated in this work. In these figures we cannot identify any presence of a 1-step-ahead delay of the prediction with respect to the observed time series values. Note that the prediction is almost superimposed on the real observations. This means that the proposed model is able to efficiently adjust any time phase distortions, which can occur in the representation of the investigated financial time series.

4.7. Analysis of the residuals

Fig. 12 depicts the histogram of the residuals generated by the proposed model for all investigated financial time series. These figures suggest that the residuals are not normally distributed. In an attempt to confirm such a claim, we have used the Kolmogorov–Smirnov test, which rejected the null hypothesis, at the 5% significance level, that the residuals are normally distributed, for all time series investigated in this work. The results of the Kolmogorov–Smirnov test are summarized in Table 8.

Fig. 13 depicts the autocorrelation function of the residuals generated by the proposed model for all investigated financial time series. These figures suggest that the residuals are autocorrelated, due to the presence of a characteristic behavior like a hyperbolic decay. In order to confirm such a claim, we have used the Ljung-Box test.
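A sketch of these residual diagnostics in Python, using SciPy for the Kolmogorov–Smirnov test and statsmodels for the Ljung-Box test; the standardization step, the lag choice and the DataFrame return of acorr_ljungbox (recent statsmodels releases) are assumptions of the example.

```python
# Residual diagnostics sketch (illustrative, not the authors' implementation).
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import acorr_ljungbox

def residual_diagnostics(residuals, lags=20, alpha=0.05):
    r = np.asarray(residuals, dtype=float)
    z = (r - r.mean()) / r.std(ddof=1)
    ks_stat, ks_p = stats.kstest(z, "norm")   # H0: residuals are normal
    lb = acorr_ljungbox(r, lags=[lags])       # H0: no autocorrelation
    lb_p = float(lb["lb_pvalue"].iloc[0])
    return {"ks_reject": ks_p < alpha, "ks_pvalue": ks_p,
            "ljungbox_reject": lb_p < alpha, "ljungbox_pvalue": lb_p}

rng = np.random.default_rng(0)
print(residual_diagnostics(rng.standard_t(df=3, size=500)))  # heavy-tailed example
```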
Table 5
Summary of the testing performance for the MSE metric.

Model    CAC       DAX       DJIA      HS        IBOV      IPSA      MER       NASDAQ    NK        NYSE      SP        SSE       Friedman (Position, Rank)   Tukey (Statistic, p-value)

DIDLP    3.47E−7   2.04E−7   3.19E−7   3.71E−9   4.66E−8   1.45E−6   1.38E−7   6.46E−8   1.13E−7   3.93E−8   3.66E−8   5.12E−8   1, 1.25                     −, −
  ± STD  4.52E−7   3.37E−7   3.92E−7   4.87E−9   7.52E−8   2.90E−6   2.66E−7   1.38E−7   1.98E−7   4.03E−8   4.43E−8   7.43E−8
ARIMA    0.0003    0.0001    7.76E−5   0.0001    0.0003    0.0001    6.47E−5   7.74E−5   0.0002    0.0002    7.49E−5   2.33E−5   10, 9.33                    −8.08, 3.98E−8
  ± STD  0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
DEM      3.13E−6   6.31E−6   1.02E−5   1.58E−6   2.99E−6   1.47E−5   2.30E−5   1.03E−5   2.27E−6   5.01E−6   4.22E−6   1.13E−6   4, 3.83                     −2.58, 7.92E−2
  ± STD  3.33E−6   3.39E−6   1.84E−5   1.18E−6   3.88E−6   1.85E−5   1.35E−5   1.69E−5   1.63E−6   5.61E−6   5.44E−6   1.68E−6
DMN      5.85E−7   4.34E−6   1.80E−5   7.09E−7   2.34E−6   1.98E−6   5.82E−6   9.92E−6   1.29E−6   6.74E−6   1.22E−5   1.52E−6   3, 3.50                     −2.25, 1.26E−1
  ± STD  1.26E−6   6.47E−6   1.86E−5   9.63E−7   2.16E−6   1.80E−6   5.84E−6   1.94E−5   1.33E−6   4.54E−6   1.44E−5   2.52E−6
LNNN     4.60E−6   3.33E−5   0.0002    6.55E−6   3.05E−5   0.0004    5.83E−6   8.53E−5   2.97E−6   4.69E−6   5.76E−5   8.24E−6   6, 7.00                     −5.75, 9.36E−5
  ± STD  3.90E−6   7.12E−5   0.0002    7.72E−6   5.72E−5   0.0005    1.14E−5   7.41E−5   5.74E−6   9.94E−6   6.80E−5   1.44E−5
MLP      0.0002    0.0001    9.45E−5   0.0001    0.0003    0.0001    5.98E−5   7.44E−5   0.0002    0.0002    8.03E−5   1.43E−5   7, 7.58                     −6.33, 1.68E−5
  ± STD  1.90E−9   7.59E−9   7.89E−10  8.07E−10  3.67E−10  7.88E−9   4.73E−10  1.41E−8   3.52E−9   1.78E−9   1.20E−9   0.0000
NARXT    5.66E−7   3.30E−5   3.96E−5   8.16E−6   1.55E−5   2.42E−5   6.06E−5   4.59E−5   6.74E−7   1.59E−7   3.72E−6   8.13E−7   5, 4.17                     −2.92, 4.75E−2
  ± STD  1.23E−6   3.07E−5   8.54E−5   8.57E−6   2.37E−5   4.04E−5   0.0001    9.64E−5   6.71E−7   3.20E−7   5.48E−6   1.05E−6
PELMNN   0.0002    0.0001    9.16E−5   0.0001    0.0003    0.0002    7.04E−5   8.48E−5   0.0002    0.0002    8.48E−5   1.92E−5   11, 10.08                   −8.83, 1.96E−9
  ± STD  4.57E−6   4.89E−6   1.26E−6   3.62E−6   3.03E−6   4.82E−6   3.27E−6   1.36E−6   5.95E−6   1.99E−6   8.81E−6   3.21E−6
RAA      4.14E−6   1.29E−6   3.74E−5   1.71E−6   4.83E−7   3.28E−8   7.62E−8   2.05E−5   4.08E−7   7.98E−7   1.02E−5   8.80E−9   2, 2.75                     −1.50, 3.08E−1
  ± STD  8.71E−6   1.60E−6   3.52E−5   3.04E−6   9.11E−7   5.04E−8   6.44E−8   2.44E−5   5.12E−7   1.77E−6   1.94E−5   1.46E−8
SVRL     0.0002    0.0001    0.0001    0.0001    0.0003    0.0002    8.32E−5   8.24E−5   0.0002    0.0002    8.28E−5   1.48E−5   9, 9.25                     −8.00, 5.48E−8
  ± STD  0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
SVRP     0.0002    0.0001    0.0001    0.0001    0.0003    0.0001    8.20E−5   7.68E−5   0.0002    0.0002    8.03E−5   1.54E−5   8, 8.42                     −7.17, 1.12E−6
  ± STD  0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
SVRR     0.0003    0.0001    9.83E−5   0.0001    0.0003    0.0002    9.50E−5   0.0001    0.0002    0.0002    7.86E−5   2.12E−5   12, 10.83                   −9.58, 7.48E−11
  ± STD  0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
Table 6
Summary of the testing performance for the POCID metric.

R.d.A. Araújo, N. Nedjah and A.L.I. Oliveira et al. / Neurocomputing 347 (2019) 59–81
Model Time series Friedman test Tukey test

CAC DAX DJIA HS IBOV IPSA MER NASDAQ NK NYSE SP SSE Position Rank Statistic p-value

DIDLP 0.9847 0.9858 0.9723 0.9982 0.9965 0.9783 0.9881 0.9910 0.9868 0.9940 0.9915 0.9823 1 1.42 – –
± 0.0104 ± 0.0127 ± 0.0235 ± 0.0024 ± 0.0036 ± 0.0314 ± 0.0148 ± 0.0152 ± 0.0125 ± 0.0049 ± 0.0030 ± 0.0327
ARIMA 0.6757 0.7212 0.7353 0.7281 0.6926 0.7466 0.7331 0.7309 0.6696 0.7137 0.7161 0.6858 8 8.71 −7.29 6.96E−7
± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000
DEM 0.9505 0.9000 0.9067 0.9570 0.9610 0.9095 0.7347 0.8771 0.9366 0.9239 0.9068 0.9097 5 4.29 −2.88 5.03E−2
± 0.0365 ± 0.0311 ± 0.0850 ± 0.0300 ± 0.0408 ± 0.0940 ± 0.0958 ± 0.1213 ± 0.0278 ± 0.0454 ± 0.0752 ± 0.0922
DMN 0.9865 0.9301 0.8143 0.9737 0.9662 0.9584 0.8975 0.9049 0.9542 0.9248 0.8178 0.8823 3 3.54 −2.13 1.48E−1
± 0.0209 ± 0.0578 ± 0.1267 ± 0.0322 ± 0.0232 ± 0.0196 ± 0.1027 ± 0.0856 ± 0.0426 ± 0.0435 ± 0.1172 ± 0.1105
LNNN 0.9523 0.8965 0.7773 0.9254 0.8883 0.7611 0.9034 0.6744 0.9454 0.9556 0.7347 0.8035 6 5.75 −4.33 3.18E−3
± 0.0289 ± 0.1303 ± 0.1850 ± 0.0497 ± 0.1321 ± 0.2514 ± 0.1221 ± 0.2165 ± 0.0596 ± 0.0708 ± 0.1310 ± 0.1889
MLP 0.7162 0.7434 0.6975 0.7184 0.6926 0.7330 0.7585 0.7175 0.6960 0.6838 0.7119 0.7345 7 8.63 −7.21 9.31E−7
± 0.0000 ± 0.0000 ± 0.0000 ± 0.0020 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000
NARXT 0.9865 0.8221 0.8773 0.9228 0.9134 0.8950 0.8466 0.8350 0.9665 0.9923 0.9034 0.9035 4 3.96 −2.54 8.36E−2
± 0.0156 ± 0.1482 ± 0.1669 ± 0.0681 ± 0.0932 ± 0.1305 ± 0.2360 ± 0.2063 ± 0.0234 ± 0.0102 ± 0.0849 ± 0.1166
PELMNN 0.6955 0.7018 0.6924 0.7211 0.6883 0.7403 0.7339 0.7318 0.6872 0.6829 0.6839 0.7115 10 9.83 −8.42 1.01E−8
± 0.0068 ± 0.0139 ± 0.0046 ± 0.0119 ± 0.0097 ± 0.0145 ± 0.0132 ± 0.0102 ± 0.0125 ± 0.0063 ± 0.0193 ± 0.0145
RAA 0.9622 0.9690 0.8387 0.9570 0.9861 0.9964 0.9898 0.8054 0.9744 0.9838 0.8678 0.9938 2 2.71 −1.29 3.79E−1
± 0.0457 ± 0.0285 ± 0.1482 ± 0.0553 ± 0.0120 ± 0.0038 ± 0.0038 ± 0.1458 ± 0.0234 ± 0.0340 ± 0.1187 ± 0.0074
SVRL 0.7072 0.7035 0.6429 0.7105 0.6667 0.7421 0.7203 0.7309 0.6916 0.6880 0.6949 0.7168 11 9.88 −8.46 8.59E−9
± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000
SVRP 0.7477 0.7345 0.6933 0.7368 0.6710 0.7330 0.7415 0.7220 0.7004 0.6838 0.6864 0.7434 9 8.75 −7.33 6.01E−7
± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000
SVRR 0.6757 0.6991 0.7017 0.6974 0.6710 0.7330 0.7288 0.7265 0.6608 0.6795 0.7119 0.7257 12 10.54 −9.13 5.29E−10
± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000
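For reference, POCID (prediction of change in direction) scores the fraction of time steps on which the predicted movement matches the real one. The sketch below is a minimal Python implementation under that standard definition (an assumption here, since the exact formula is stated in an earlier section of the paper); the two short arrays are hypothetical.

```python
import numpy as np

def pocid(actual, predicted):
    """Fraction of time steps whose direction of change is predicted correctly."""
    return np.mean(np.sign(np.diff(actual)) == np.sign(np.diff(predicted)))

# Hypothetical five-point series: all four movements point the right way.
actual = np.array([10.0, 10.5, 10.2, 10.8, 10.6])
predicted = np.array([10.1, 10.4, 10.3, 10.7, 10.5])
print(pocid(actual, predicted))  # 1.0
```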

Table 7
Summary of the testing performance for the THEIL metric.

Model Time series Friedman test Tukey test

CAC DAX DJIA HS IBOV IPSA MER NASDAQ NK NYSE SP SSE Position Rank Statistic p-value

DIDLP 0.0017 0.0018 0.0043 3.16E−5 0.0002 0.0125 0.0024 0.0010 0.0007 0.0002 0.0005 0.0036 1 1.25 – –
± 0.0023 ± 0.0030 ± 0.0053 ± 4.14E−5 ± 0.0003 ± 0.0251 ± 0.0046 ± 0.0020 ± 0.0012 ± 0.0002 ± 0.0006 ± 0.0052
ARIMA 1.2374 1.3162 1.0401 1.2029 1.0243 1.0321 1.0994 1.1509 1.1529 1.1269 1.0565 1.6150 10 9.33 −8.08 3.98E−8
± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000
DEM 0.0157 0.0564 0.1369 0.0135 0.0107 0.1267 0.3923 0.1531 0.0137 0.0306 0.0595 0.0788 4 3.83 −2.58 7.92E−2
± 0.0167 ± 0.0303 ± 0.2477 ± 0.0102 ± 0.0139 ± 0.1602 ± 0.2310 ± 0.2511 ± 0.0098 ± 0.0343 ± 0.0766 ± 0.1170
DMN 0.0029 0.0386 0.2413 0.0061 0.0084 0.0171 0.0993 0.1475 0.0078 0.0411 0.1722 0.1059 3 3.50 −2.25 1.26E−1
± 0.0063 ± 0.0580 ± 0.2496 ± 0.0082 ± 0.0078 ± 0.0155 ± 0.1000 ± 0.2878 ± 0.0081 ± 0.0276 ± 0.2026 ± 0.1749
LNNN 0.0230 0.2979 2.2952 0.0562 0.1091 3.1293 0.0994 1.2666 0.0178 0.0286 0.8116 0.5721 6 7.00 −5.75 9.36E−5
± 0.0196 ± 0.6380 ± 3.0479 ± 0.0665 ± 0.2047 ± 4.6827 ± 0.1936 ± 1.1010 ± 0.0345 ± 0.0606 ± 0.9578 ± 1.0030
MLP 1.0610 1.0926 1.2692 1.0045 1.0333 1.1446 1.0151 1.1057 1.0060 1.0775 1.1293 0.9956 7 7.67 −6.42 1.30E−5
± 9.65E−6 ± 6.63E−5 ± 1.06E−5 ± 6.89E−6 ± 1.32E−6 ± 6.79E−5 ± 7.93E−6 ± 0.0002 ± 2.14E−5 ± 1.08E−5 ± 1.69E−5 ± 4.79E−6
NARXT 0.0028 0.2958 0.5316 0.0701 0.0554 0.2093 1.0341 0.6816 0.0041 0.0010 0.0523 0.0564 5 4.17 −2.92 4.75E−2
± 0.0061 ± 0.2751 ± 1.1479 ± 0.0738 ± 0.0848 ± 0.3495 ± 2.1578 ± 1.4327 ± 0.0040 ± 0.0019 ± 0.0771 ± 0.0728
PELMNN 1.1485 1.2567 1.2284 1.1214 1.0872 1.3224 1.1973 1.2612 1.1096 1.1453 1.1930 1.3387 11 10.08 −8.83 1.96E−9
± 0.0234 ± 0.0450 ± 0.0158 ± 0.0293 ± 0.0109 ± 0.0415 ± 0.0566 ± 0.0199 ± 0.0363 ± 0.0124 ± 0.1261 ± 0.2240
RAA 0.0208 0.0115 0.5021 0.0147 0.0017 0.0003 0.0013 0.3048 0.0025 0.0049 0.1435 0.0006 2 2.75 −1.50 3.08E−1
± 0.0438 ± 0.0143 ± 0.4730 ± 0.0260 ± 0.0033 ± 0.0004 ± 0.0011 ± 0.3625 ± 0.0031 ± 0.0108 ± 0.2731 ± 0.0010
SVRL 1.0439 1.1615 1.5156 1.0014 1.0752 1.3555 1.4121 1.2256 1.0235 1.0975 1.1645 1.0296 9 9.25 −8.00 5.48E−8
± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000
SVRP 1.0219 1.1297 1.3733 1.0234 1.0235 1.1768 1.3916 1.1427 1.0273 1.0547 1.1286 1.0673 8 8.33 −7.08 1.49E−6
± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000
SVRR 1.2953 1.2914 1.3171 1.1286 1.1029 1.6726 1.6164 1.5035 1.1005 1.1325 1.1078 1.4777 12 10.83 −9.58 7.48E−11
± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000 ± 0.0000

Fig. 11. Prediction results of the proposed model for the time series (last ten points of the test set) – blue solid line (real value) and red dashed line (predicted value): (a)
CAC40, (b) DAX, (c) DJIA, (d) HANGSENG, (e) IBOVESPA, (f) IPSA, (g) MERVAL, (h) NASDAQ, (i) NIKKEI225, (j) NYSE, (k) SP500 and (l) SSE. (For interpretation of the references
to color in this figure legend, the reader is referred to the web version of this article.)

Table 8
Summary of the Kolmogorov–Smirnov test results for the residuals generated by the proposed model.

Time series Decision (h) p-value

CAC 1 1.5759E−49
DAX 1 2.1741E−50
DJIA 1 4.8973E−53
HS 1 7.5244E−51
IBOV 1 1.6587E−51
IPSA 1 2.5337E−49
MER 1 1.3405E−52
NASDAQ 1 9.1712E−50
NK 1 1.2506E−50
NYSE 1 3.6755E−52
SP 1 1.3358E−52
SSE 1 2.0496E−50

Table 9
Summary of the Ljung-Box Q-test results for the residuals generated by the proposed model.

Time series Decision (h) p-value

CAC 1 0.00
DAX 1 0.00
DJIA 1 0.00
HS 1 0.00
IBOV 1 0.00
IPSA 1 0.00
MER 1 0.00
NASDAQ 1 0.00
NK 1 0.00
NYSE 1 0.00
SP 1 0.00
SSE 1 0.00
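Both diagnostics are available in standard scientific Python tooling. The sketch below shows how decisions of this form can be produced; the residual series is simulated, so the numbers are illustrative stand-ins rather than the paper's results.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import acorr_ljungbox

# Simulated stand-in for the model's one-step-ahead residuals.
rng = np.random.default_rng(0)
residuals = rng.standard_t(df=3, size=500) * 1e-3  # heavy-tailed, like financial errors

# Kolmogorov-Smirnov test of the standardized residuals against N(0, 1):
# h = 1 (reject normality) when the p-value falls below the 5% significance level.
z = (residuals - residuals.mean()) / residuals.std(ddof=1)
ks_stat, ks_p = stats.kstest(z, "norm")
print("KS:", int(ks_p < 0.05), ks_p)

# Ljung-Box Q-test for residual autocorrelation up to an assumed 20 lags:
# h = 1 (reject "no autocorrelation") when lb_pvalue < 0.05.
lb = acorr_ljungbox(residuals, lags=[20], return_df=True)
print("Ljung-Box:", int(lb["lb_pvalue"].iloc[0] < 0.05), float(lb["lb_pvalue"].iloc[0]))
```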

In the Ljung-Box Q-test, the null hypothesis, at the 5% significance level, is that the residuals are not autocorrelated. The results of the Ljung-Box Q-test are summarized in Table 9.

4.8. Considerations

Considering the results presented in the previous sections, we have provided the necessary evidence that the proposed model has high predictive performance and can be used in practice to predict financial time series sampled at daily frequency. We observe that the hybrid mapping generated by the proposed model has high generalization power and is able to estimate financial time series without a 1-step-ahead delay, achieving better prediction performance than all of the other models investigated in this work with respect to the ARV, MAPE, MSE, POCID and THEIL metrics. In this sense, the Friedman test and the Tukey test support these results, statistically confirming the superior predictive performance of the proposed model.
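As an illustration of this protocol, the Friedman test treats the twelve time series as blocks and the models as treatments. The sketch below applies it to three of the per-series MSE rows reported in the summary tables above (three models are enough to show the call; the paper compares twelve):

```python
from scipy import stats

# Per-series MSE values for three of the twelve models, taken from the
# summary table; the twelve time series act as the blocks of the test.
didlp = [3.47e-7, 2.04e-7, 3.19e-7, 3.71e-9, 4.66e-8, 1.45e-6,
         1.38e-7, 6.46e-8, 1.13e-7, 3.93e-8, 3.66e-8, 5.12e-8]
raa   = [4.14e-6, 1.29e-6, 3.74e-5, 1.71e-6, 4.83e-7, 3.28e-8,
         7.62e-8, 2.05e-5, 4.08e-7, 7.98e-7, 1.02e-5, 8.80e-9]
arima = [3e-4, 1e-4, 7.76e-5, 1e-4, 3e-4, 1e-4,
         6.47e-5, 7.74e-5, 2e-4, 2e-4, 7.49e-5, 2.33e-5]

# Null hypothesis: all models perform equally well across the series.
statistic, p_value = stats.friedmanchisquare(didlp, raa, arima)
print(statistic, p_value)
# A Tukey-style post-hoc comparison of the mean ranks then locates which
# models differ significantly from the best-ranked one.
```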

Fig. 12. Histogram of residuals generated by the proposed model for the time series: (a) CAC40, (b) DAX, (c) DJIA, (d) HANGSENG, (e) IBOVESPA, (f) IPSA, (g) MERVAL, (h)
NASDAQ, (i) NIKKEI225, (j) NYSE, (k) SP500 and (l) SSE.

Also, we observe that the practical implication of the obtained values for the THEIL metric is that the prediction generated by the proposed model does not have any delay with respect to the real time series values (confirmed by the prediction graphics generated by the proposed model). It is worth mentioning that, except for the proposed model and for the DEM, DMN, LNNN, NARXT and RAA models, the prediction models considered in this work are not able to overcome this drawback.

Assuming that a financial time series can be modeled in terms of a combination of linear and nonlinear components, a possible reason for this behavior of the ARIMA, MLP, PELMNN, SVRL, SVRP and SVRR models is that the expected value (E[·]) of the prediction (z_t) tends to the last time series value (z_{t−1}), that is, E[z_t] → E[z_{t−1}]. This hypothesis can explain why the prediction has a 1-step-ahead delay with respect to the real time series value. However, further empirical evidence and a mathematical proof of this behavior must be sought in future work.
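A small simulation makes the consequence of E[z_t] → E[z_{t−1}] concrete: a predictor that merely repeats the last observation reproduces the series with a one-step delay and scores exactly 1 on a THEIL-style ratio. The snippet assumes the common Theil U2-style definition (the model's squared error divided by the random-walk error), which may differ in detail from the paper's formula:

```python
import numpy as np

rng = np.random.default_rng(42)
z = 100.0 + np.cumsum(rng.normal(0.01, 1.0, size=1000))  # hypothetical daily series

def theil_u2(actual, predicted):
    """Squared error of the forecasts divided by that of the random-walk baseline."""
    return (np.sum((predicted - actual[1:]) ** 2)
            / np.sum((actual[:-1] - actual[1:]) ** 2))

delayed = z[:-1]                      # E[z_t] -> z_{t-1}: the delayed forecast
print(theil_u2(z, delayed))           # exactly 1.0 by construction

almost_true = z[1:] + rng.normal(0.0, 0.1, size=z.size - 1)
print(theil_u2(z, almost_true))       # << 1: the forecast anticipates z_t
```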
Nonetheless, when we analyze the results obtained by the DEM, DMN, LNNN, NARXT and RAA models, we verify that the achieved values for the THEIL metric are also smaller than 1. A feasible justification of this behavior is the ability of these models to estimate a hybrid mapping, since such a mapping must be composed of both the linear and nonlinear relationships within the generator phenomenon of a financial time series. In addition, another justification relates to the concept of automatic phase adjustment, which depends strongly on the learning process to set the model's parameters, and on the model's structure itself, in order to work properly. However, even with this characteristic, it is worth mentioning that these models are not able to statistically overcome the predictive performance of the proposed model on any of the financial time series investigated in this work, considering the THEIL metric as well as the ARV, MAPE, MSE and POCID metrics.

In this sense, some evidence can justify the expressive performance of the proposed model: (i) the hypothesis presented in Section 2 that financial time series can be modeled in terms of a balanced combination of a short-term dominant linear component and a long-term subdominant nonlinear component, with both increasing and decreasing behaviors (the proposed model can be viewed as a direct mapping for this kind of time series); (ii) the ability of the proposed model to estimate the percentage of use of the short-term dominant linear component and the long-term subdominant nonlinear component in each processing unit of the deep layer structure, as illustrated in the sketch below; and (iii) the use of a descending gradient-based learning process with the concept of automatic phase adjustment.
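To make item (ii) concrete, the sketch below emulates one hybrid processing unit as a convex combination, weighted by a mixing percentage lam, of a linear response and increasing/decreasing (supremum/infimum) responses. It is an illustrative approximation only; the names, the equal blend of the two nonlinear parts, and the parameter shapes are assumptions, and the exact formulation of the DIDLP unit is the one given in Section 2.

```python
import numpy as np

def didlp_like_unit(x, a, b, w, lam):
    """Illustrative hybrid unit: lam mixes a nonlinear increasing-decreasing
    response with a linear response over the lagged inputs x."""
    increasing = np.max(x + a)                   # supremum (dilation-like) operator
    decreasing = np.min(x + b)                   # infimum (erosion-like) operator
    nonlinear = 0.5 * (increasing + decreasing)  # assumed equal blend, for brevity
    return lam * nonlinear + (1.0 - lam) * np.dot(w, x)

rng = np.random.default_rng(1)
x = np.array([0.98, 1.01, 0.99, 1.02])           # hypothetical window of lagged values
print(didlp_like_unit(x, rng.normal(0, 0.1, 4), rng.normal(0, 0.1, 4),
                      rng.normal(0, 0.5, 4), lam=0.3))
```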

Fig. 13. Autocorrelation of residuals of the proposed model for the time series: (a) CAC40, (b) DAX, (c) DJIA, (d) HANGSENG, (e) IBOVESPA, (f) IPSA, (g) MERVAL, (h) NASDAQ,
(i) NIKKEI225, (j) NYSE, (k) SP500 and (l) SSE.

From the obtained results we can also verify that the proposed learning process is able to converge to optimal points of the error surface, since the achieved values for the ARV, MAPE, MSE and THEIL metrics tend to 0 and the achieved values for the POCID metric tend to 1. Note that the proposed learning process is stable (yielding low values of the standard deviation for all metrics considered), even with the nondifferentiability problem of both the infimum and supremum operators employed within the processing unit structure, showing that the smoothed rank indicator vector used in the proposed learning process can remove this drawback. Moreover, the residual analysis (through the histogram and the autocorrelation function) suggests that the residuals are not normally distributed and are autocorrelated (confirmed by the Kolmogorov–Smirnov test and the Ljung-Box Q-test, respectively, at the 5% significance level).
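The role of such a smoothing can be illustrated with a generic softmax-based surrogate for the supremum: the weight vector behaves as a smoothed rank indicator, so the operator becomes differentiable everywhere. This construction is consistent with, but not necessarily identical to, the smoothing defined in the paper:

```python
import numpy as np

def smooth_max(x, beta=50.0):
    """Differentiable surrogate for the supremum of x.

    The softmax weights act like a smoothed rank indicator vector: as beta
    grows they approach a one-hot indicator of the argmax, recovering max(x),
    while keeping a well-defined gradient with respect to every input.
    """
    w = np.exp(beta * (x - x.max()))  # shifted for numerical stability
    w /= w.sum()
    return float(np.dot(w, x))

x = np.array([0.2, 0.9, 0.7])
print(smooth_max(x), x.max())  # nearly identical for moderately large beta
# The infimum is handled symmetrically: smooth_min(x) = -smooth_max(-x).
```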
5. Conclusions

In this work we have presented several pieces of empirical evidence that the investigated financial time series, sampled at daily frequency, are somehow predictable. We have supported this hypothesis based on: (i) a quantitative approach, using the autocorrelation function (to analyze linear dependencies), the mean mutual information (to analyze nonlinear dependencies) associated with the Hurst parameter (to assess the nature of such nonlinearity in terms of long-term or short-term components of self-similar processes) and the first derivative (to analyze the presence of increasing or decreasing components); and (ii) a qualitative approach, using the lag plot (to analyze graphically the relationships among time lags).

Based upon such evidence, we developed an efficient neural network, the deep increasing–decreasing-linear perceptron (DIDLP). Each layer of the proposed model is composed of hybrid neurons (having linear and nonlinear components with increasing and decreasing operators) arranged in a deep structure to build a hybrid model of financial time series. Additionally, for the proposed model design, we have presented a descending gradient-based learning process with automatic phase adjustment, using ideas from the back-propagation algorithm and employing a systematic approach to circumvent the nondifferentiability problem of the increasing–decreasing operator.

The performance evaluation strongly supports the conclusion that the proposed model has stable learning and high generalization power to predict real financial time series. It is worth mentioning that, regardless of the time series, all DIDLP models avoid the 1-step-ahead delay of the prediction with respect to the real time series values (all values for the THEIL metric are smaller than 1, which the prediction graphics also confirm). Note that the proposed model obtained consistently superior performance, considering all metrics (ARV, MAPE, MSE, POCID and THEIL), with respect to all other models considered in this work. Therefore, we can safely conclude that the proposed model has far better effectiveness overall.

As future work, a particular study of the CPU time and the computational complexity of the proposed model and its learning process must be performed. Also, additional research must be performed to analyze the impact of using the cross-entropy cost function within the proposed learning process. As we empirically define the number of layers, and hence the number of processing units in each layer, a sensitivity analysis must be performed to choose optimal values for these parameters. A further investigation of the application of the proposed model to image analysis and recognition must be performed. An additional study must be done in order to determine why the ARIMA, MLP, PELMNN, SVRL, SVRP and SVRR models are not able to avoid predictions with a 1-step-ahead delay, even with the use of the concept of automatic phase adjustment, since no theoretical justification of this behavior is yet known. Finally, an additional experimental analysis must be done to show the potential, in practice, of the proposed model in terms of the benefit for financial investors.
Narx network: an empirical evaluation, Neurocomputing 71 (16-18) (2008)
References 3335–3343.
[33] J.P. Nobrega, A.L.I. Oliveira, Kalman filter-based method for online sequential
[1] S. Asadi, E. Hadavandi, F. Mehmanpazir, M.M. Nakhostin, Hybridization of evo- extreme learning machine for regression problems, Eng. Appl. Artif. Intel. 44
lutionary Levenberg–Marquardt neural networks and data pre-processing for (2015) 101–110.
stock market prediction, Knowl. Based Syst. 35 (2012) 245–258. [34] N. Oliveira, P. Cortez, N. Areal, Stock market sentiment lexicon acquisition us-
[2] G.E.P. Box, G.M. Jenkins, G.C. Reinsel, Time Series Analysis: Forecasting and ing microblogging data and statistical measures, Dec. Supp. Syst. 85 (2016)
Control, third, Prentice Hall, New Jersey, 1994. 62–73.
[3] R.C. Cavalcante, R.C. Brasileiro, V.L. Souza, J.P. Nobrega, A.L. Oliveira, Computa- [35] J. Patel, S. Shah, P. Thakkar, K. Kotecha, Predicting stock market index us-
tional intelligence and financial markets: a survey and future directions, Expert ing fusion of machine learning techniques, Expert Syst. Appl. 42 (4) (2015)
Syst. Appl. 55 (2016) 194–211. 2162–2172.
[4] E. Chong, C. Han, F.C. Park, Deep learning networks for stock market analysis [36] L.F.C. Pessoa, P. Maragos, Neural networks with hybrid morphological rank lin-
and prediction: methodology, data representations, and case studies, Expert ear nodes: a unifying framework with applications to handwritten character
Syst. Appl. 83 (2017) 187–205. recognition, Pattern Recogn. 33 (20 0 0) 945–960.
[5] M.P. Clements, P.H. Franses, N.R. Swanson, Forecasting economic and financial [37] D.H.B. Phan, S.S. Sharma, P.K. Narayan, Stock return forecasting: some new ev-
time-series with non-linear models, Int. J. Forecast. 20 (2) (2004) 169–183. idence, Int. Rev. Financ. Anal. 40 (2015) 38–51.
[6] M.P. Clements, D.F. Hendry, On the limitations of comparing mean square fore- [38] M. Podsiadlo, H. Rybinski, Financial time series forecasting using rough sets
cast errors, J. Forecast. 12 (8) (1993) 617–637. with time-weighted rule voting, Expert Syst. Appl. 66 (2016) 219–233.
[7] N. Crato, E. Ruiz, Can we evaluate the predictability of financial markets? Int. [39] L. Prechelt, Proben1: A set of Neural Network Benchmark Problems and Bench-
J. Forecast. 28 (1) (2012) 1–2. marking Rules, Technical Report, 1994.
[8] R. de A. Araujo, A.L.I. Oliveira, S.R. de L. Meira, A class of hybrid multilayer [40] J. Qian, J. Yang, Y. Tai, H. Zheng, Exploring deep gradient information for bio-
perceptrons for software development effort estimation problems, Expert Syst. metric image feature representation, Neurocomputing 213 (2016) 162–171.
Appl. 90 (2017) 1–12. [41] M. Qin, Z. Li, Z. Du, Red tide time series forecasting by combining Arima and
[9] R. de A. Araujo, A.L.I. Oliveira, S.R. de L. Meira, A morphological neural network deep belief network, Knowl. Based Syst. 125 (2017) 39–52.
for binary classification problems, Eng. Appl. Artif. Intel. 65 (2017) 12–28. [42] D.E. Rumelhart, J.L. McCleland, Parallel Distributed Processing, Explorations in
[10] R. de A. Araújo, A class of hybrid morphological perceptrons with ap- the Microstructure of Cognition Volume 1 & 2, MIT Press, 1987.
plication in time series forecasting, Knowl. Based Syst. 24 (4) (2011) [43] W. Shen, X. Guo, C. Wu, D. Wu, Forecasting stock indices using radial basis
513–529. function neural networks optimized by artificial fish swarm algorithm, Knowl.
[11] R. de A. Araújo, A morphological perceptron with gradient-based learning for Based Syst. 24 (3) (2011) 378–385.
brazilian stock market forecasting, Neural Netw. 28 (2012) 61–81. [44] R. Sitte, J. Sitte, Neural networks approach to the random walk dilemma of
[12] R. de A. Araújo, A robust automatic phase-adjustment method for financial financial time series, Appl. Intell. 16 (3) (2002) 163–171.
forecasting, Knowl. Based Syst. 27 (2012) 245–261. [45] D. Sornette, W.-X. Zhou, Predictability of large future changes in major finan-
[13] R. de A. Araújo, T. Fereira, An intelligent hybrid morphological-rank-linear cial indices, Int. J. Forecast. 22 (1) (2006) 153–168.
method for financial time series prediction, Neurocomputing 72 (10) (2009) [46] M.B. Stojanovic, M.M. Bozic, M.M. Stankovic, Z.P. Stajic, A methodology for
2507–2524. training set instance selection using mutual information in time series pre-
[14] R. de A. Araújo, A.L.I. Oliveira, S. Meira, A hybrid model for high-frequency diction, Neurocomputing 141 (2014) 236–245.
stock market forecasting, Expert Syst. Appl. 42 (8) (2015) 4081–4096. [47] X. Sun, T. Li, Q. Li, Y. Huang, Y. Li, Deep belief echo-state network and its ap-
[15] J. Demsar, Statistical comparisons of classifiers over multiple data sets, J. Mach. plication to time series prediction, Knowl. Based Syst. 130 (2017) 17–29.
Learn. Res. 7 (2006) 1–30. [48] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Van-
[16] N. Diniz, F.G. Lima, A.C. da Silva Filho, The impact of the hurst window in houcke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the
the financial time series forecast: an analysis through the exchange rate, Rev. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
Busin. Res. 12 (2012) 27–33. 2015, pp. 1–9.
[17] E.F. Fama, Random walks in stock market prices, Financ. Anal. J. 21 (5) (1965) [49] A.K. Tiwari, C.T. Albulescu, S.M. Yoon, A multifractal detrended fluctuation
55–59. analysis of financial market efficiency: comparison using dow jones sector ETF
[18] H.M. Fayek, M. Lech, L. Cavedon, Evaluating deep learning architectures for indices, Phys. A Stat. Mech. Appl. 483 (2017) 182–192.
speech emotion recognition, Neural Netw. 92 (2017) 60–68. [50] J.W. Tukey, Comparing individual means in the analysis of variance, Biometrics
[19] A. Fraser, H. Swinney, Independent coordinates for strange atractors from mu- 5 (1949) 99–114.
tual information, Phys. Rev. A 33 (2) (1986) 1134–1140. [51] M. Tzelepi, A. Tefas, Deep convolutional learning for content based image re-
[20] M. Friedman, A comparison of alternative tests of significance for the problem trieval, Neurocomputing 275 (2018) 2467–2478.
of m rankings, Ann. Math. Statist. 11 (1) (1940) 86–92. [52] W. Wang, W. Pedrycz, X. Liu, Time series long-term forecasting model based
[21] M.S. Gashler, S.C. Ashmore, Modeling time series data with deep fourier neural on information granules and fuzzy clustering, Eng. Appl. Artif. Intel. 41 (2015)
networks, Neurocomputing 188 (2016) 3–11. 17–24.
[22] R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accu- [53] J. Westerlund, P. Narayan, Testing for predictability in panels of any time series
rate object detection and semantic segmentation, in: Proceedings of the 2014 dimension, Int. J. Forecast. 32 (4) (2016) 1162–1177.
IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’14, IEEE [54] J. Xue, S. Zhou, Q. Liu, X. Liu, J. Yin, Financial time series prediction using
Computer Society, 2014, pp. 580–587. l2,1RF-ELM, Neurocomputing 277 (2018) 176–186. Hierarchical Extreme Learn-
[23] T.H. Hann, E. Steurer, Much ado about nothing? exchange rate forecasting: ing Machines.
neural networks vs. linear models using monthly and weekly data, Neurocom- [55] J. Yao, C.L. Tan, A case study on using neural networks to perform technical
puting 10 (1996) 323–339. forecasting of forex, Neurocomputing 34 (1-4) (20 0 0) 79–98.
[24] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, New [56] U. Yolcu, E. Egrioglu, C.H. Aladag, A new linear & nonlinear artificial neu-
Jersey, 1998. ral network model for time series forecasting, Dec. Supp. Syst. 54 (3) (2013)
[25] H. Herwartz, Stock return prediction under Garch – an empirical assessment, 1340–1347.
Int. J. Forecast. 33 (3) (2017) 569–580. [57] E. Zamora, H. Sossa, Dendrite morphological neurons trained by stochastic gra-
dient descent, Neurocomputing 260 (2017) 420–431.

Ricardo de A. Araújo graduated in 2006 in Computer Science at Catholic University of Pernambuco, Brazil, and in 2012 obtained an M.Sc. degree in Computer Science at Federal University of Pernambuco, Brazil. Since 2016 he holds a Ph.D. degree in Computer Science from Federal University of Pernambuco – Informatics Center, Brazil. He joined the Informatics Department of the Federal Institute of Sertão Pernambucano as an Associate Professor. He is currently a Research Leader of the Araripe Intelligent Computing Laboratory. He has published more than twenty journal papers and more than seventy conference papers. His main research interests include neural networks, evolutionary computing and mathematical morphology with applications in prediction and classification.

Adriano L. I. Oliveira graduated in 1993 in Electrical Engineering and in 1997 received an M.Sc. degree in Computer Science, both at Federal University of Pernambuco. Since 2004 he holds a Ph.D. degree in Computer Science from Federal University of Pernambuco. In 2006 he was elected an IEEE Senior Member. He is currently an Associate Professor at the Informatics Center of Federal University of Pernambuco. Also, he is currently a researcher at the National Institute of Science and Technology for Software Engineering. He has published more than twenty journal papers and more than ninety conference papers. His main research interests include neural networks and evolutionary computing with applications in prediction and classification.

Nadia Nedjah graduated in 1987 in Systems Engineering and Computation and in 1990 obtained an M.Sc. degree, also in Systems Engineering and Computation. Both degrees were obtained from University of Annaba, Algeria. Since 1997 she holds a Ph.D. degree from University of Manchester – Institute of Science and Technology, UK. She joined the Department of Electronics Engineering and Telecommunications of the Engineering Faculty of the State University of Rio de Janeiro as an Associate Professor. She is currently a member of the Intelligent System research area in the Electronics Engineering Post-graduate programme of the State University of Rio de Janeiro, Brazil. She is the Editor-in-Chief of the International Journals of High Performance System Architecture and of Innovative Computing Applications, both published by Inderscience, UK. She published three authored books about Functional and Re-writing Languages, Hardware/Software Co-design for Systems Acceleration, and Hardware for Soft Computing vs. Soft Computing for Hardware. She (co-)guest edited more than fifteen special issues of high-impact journals and more than forty organized books on computational intelligence related topics, such as Evolvable Machines, Genetic Systems Programming, Evolutionary Machine Design: Methodologies and Applications, and Real-World Multi-Objective System Engineering. She (co-)authored more than ninety journal papers and more than one hundred conference papers. She is Associate Editor of more than ten international journals, such as Taylor & Francis's International Journal of Electronics, Elsevier's Integration, the VLSI Journal and Microprocessors and Microsystems, and IET's Computers & Digital Techniques. She organized two major conferences related to computational intelligence: the 7th edition of Intelligent Systems Design and Applications and the 5th edition of Hybrid Intelligent Systems. She was also one of the founders of the International Conference on Adaptive and Intelligent Systems. (More details can be found at her home page: http://www.eng.uerj.br/nadia/english.html.)

Silvio R. de L. Meira graduated in 1977 in Electrical Engineering at Technological Institute of Aeronautics and in 1981 received an M.Sc. degree in Computer Science at Federal University of Pernambuco. Since 1985 he holds a Ph.D. degree in Computer Science from University of Kent at Canterbury. He is currently an Emeritus Professor at the Informatics Center of Federal University of Pernambuco, an Extraordinary Professor at the CESAR.school, and a Senior Research Scientist at ISITICS.com. He has published more than fifty journal papers and more than two hundred conference papers. His main research interests include software engineering, social machines, social networking, creativity, innovation and entrepreneurship.
