The paper focuses on financial data forecasting by means of a one-step-ahead nonlinear model with exogenous inputs. The main aim is the development of a methodology to forecast the exchange rate between the Euro and the US Dollar. The prediction task is carried out by two recurrent neural networks: the standard NARX neural network and an LSTM-based approach. The exogenous inputs consist of historical trading data and three widely used technical indicators, namely a variant of the moving average, the Upper Bollinger Frequency Band and the Lower Bollinger Frequency Band. In order to obtain accurate forecasting algorithms, the exogenous inputs are filtered using the well-known Gaussian low-pass filter. The quality of each method is evaluated in terms of both quantitative and qualitative metrics, namely the root mean squared error, the mean absolute percentage error, and the prediction of change in direction. Extensive experiments point out that the most suited forecasting method is based on the proposed LSTM neural network for the NARX model.
Keywords: Exchange Rate Forecasting, NARX Model, NARX Networks, LSTM Networks, Statistical Metrics
DOI: 10.24818/issn14531305/24.1.2020.01
1 Introduction
Modern financial markets are chaotic systems that are very hard to predict, mainly due to their volatility and non-linearity. Therefore, stock market prediction is regarded as one of the most challenging applications of time series forecasting and has been under continuous investigation for years. A variety of statistical methods have been introduced in the time series domain and then applied to the problem of financial data analysis and forecasting, such as the autoregressive (AR) family of models, exponential smoothing, and moving average (MA) techniques. However, conventional statistical models mostly fail to capture the complexity and non-stationary structure of financial data. During the past decades, machine learning (ML) models have turned into serious competitors to classical statistical models in the time series analysis and forecasting field. The most successful ones, such as artificial neural networks (ANN) and support vector machines (SVM), have been widely used to predict financial time series due to their ability to learn from nonlinear data and successfully perform classification and prediction tasks. Several machine-learning techniques with high forecasting accuracy have been reported in the literature ([1], [2], [3], [4]).
Nowadays, the field of financial forecasting is developing fast, financial data being the main valuable source of information for investors and traders. Most of the modern studies on financial data forecasting focus on developing hybridized models which combine the advantages of statistical, data mining and machine learning algorithms (see [5], [6], [7]). On the one hand, the aim of using combined models is to process the financial datasets with data mining techniques before considering them in the learning process, in order to remove noise or to smooth the data. Several data pre-processing techniques, as for instance feature extraction and feature selection, can decrease the redundancy and noise in time series data and consequently increase the performance of the prediction models ([8], [9], [10], [11]). On the other hand, since the exact price of stocks is unpredictable, a series of research works focus on predicting the price movements, so that the problem of data forecasting is reduced to a classification problem [12].
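As a simple illustration of the smoothing step mentioned above, the following Python sketch applies a Gaussian low-pass filter, of the kind used later in this paper, to a noisy series. The function name, parameters and toy data are our own illustrative assumptions, not taken from the paper:

```python
import numpy as np

def gaussian_lowpass(series, sigma=2.0, radius=None):
    """Smooth a 1-D series with a truncated Gaussian kernel (low-pass filter)."""
    if radius is None:
        radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()                      # unit gain at zero frequency
    # same-length convolution with edge padding to limit boundary artefacts
    padded = np.pad(series, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

# Noisy sine: the filter should track the slow component and damp the noise
t = np.linspace(0, 4 * np.pi, 400)
noisy = np.sin(t) + 0.3 * np.random.default_rng(0).standard_normal(t.size)
smooth = gaussian_lowpass(noisy, sigma=4.0)
assert smooth.shape == noisy.shape
```

Because the kernel is normalised to unit sum, a constant series passes through unchanged, which is the expected behaviour of a low-pass smoother.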
6 Informatica Economică vol. 24, no. 1/2020
In the literature, one of the most recently explored ML fields is the deep learning (DL) research area. A series of recurrent neural networks, such as convolutional neural networks and long short-term memory (LSTM) networks, have been developed to solve classification and prediction tasks ([13], [14], [15], [16]). The research reported in this paper aims to investigate the potential of LSTM-based approaches to implement the NARX model.
The outline of the paper is as follows. In Section 2 the NARX prediction model is discussed briefly. Next, the NARX neural network and the LSTM networks are described. The proposed methodology is provided in the fourth section of the paper. In Section 5 the experimental results of the proposed methodology, together with a comparative analysis, are presented. The concluding remarks are given in the final section of the paper.

2 The NARX Prediction Model. The NARX Neural Networks
The NARX model (Nonlinear AutoRegressive with eXogenous input) is well suited for modelling non-linear time series as well as for one-step and multi-step ahead prediction. To develop the general forecasting model in the case of non-stationary time series, one can use the Partial Autocorrelation Function (PACF) to establish the delay variable.
In the following we denote by:
T - the total number of time periods
Y - the variable to be forecast
XT - the set of N latent variables used to forecast, XT = (XT(1), XT(2), ..., XT(N))^T
d_Y, d_XT = (d_XT(1), ..., d_XT(N)) - the corresponding delays.
Note that the delay variable of a time series X corresponds to the lag d at which the PACF computed for X drops immediately after the d-th lag. For 1 ≤ t ≤ T, Y_t is the value at the moment of time t, and we denote by Ŷ_t the predicted value of Y_t. The general NARX forecasting model is expressed by [17]:

Ŷ(t+p) = f(Y_t^(d_Y), XT_t^(d_XT))    (1)

where
f is a non-linear function
Y_t^(d_Y) = {Y_t, Y_{t-1}, Y_{t-2}, ..., Y_{t-d_Y+1}}
XT_t^(d_XT) = {XT_t^(d_XT(1))(1), XT_t^(d_XT(2))(2), ..., XT_t^(d_XT(N))(N)}
and, for each 1 ≤ i ≤ N,
XT_t^(d_XT(i))(i) = {XT_t(i), XT_{t-1}(i), ..., XT_{t-d_XT(i)+1}(i)}.

In our study we consider p = 1, equation (1) then describing the one-step-ahead prediction model. In order to simplify (1), we define
d = max{d_Y, max{d_XT(1), ..., d_XT(N)}}
and the prediction model

Ŷ(t+1) = f(Y_t^(d), XT_t^(d))    (2)

where
f is a non-linear function
Y_t^(d) = {Y_t, Y_{t-1}, Y_{t-2}, ..., Y_{t-d+1}}
XT_t^(d) = {XT_t, XT_{t-1}, XT_{t-2}, ..., XT_{t-d+1}}
XT_t = (XT_t(1), XT_t(2), ..., XT_t(N))^T.

The non-linear function f can be computed by a neural network, the most common choice being the NARX networks.
The NARX neural networks (NARXNN) are recurrent dynamic networks (RNN) specially tailored to model nonlinear systems, as for instance time series. Moreover, NARX networks have high generalization performance and good learning ability, which makes them well suited for financial data. The NARXNN topology consists of three layers connected with each other, namely an input layer F_X, a hidden layer F_H, and an output layer F_Y. The general architecture of the NARXNN is displayed in Figure 1.
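To make model (2) concrete, the following minimal numpy sketch assembles the one-step-ahead training pairs it prescribes: each input row stacks the d most recent values of Y and of each exogenous series, and the target is Y_{t+1}. The function name and toy data are illustrative; in the paper the nonlinear map f itself is realised by a NARX network rather than coded by hand:

```python
import numpy as np

def narx_design_matrix(y, xt, d):
    """Build one-step-ahead NARX training pairs following model (2).

    y  : (T,)   target series Y
    xt : (T, N) exogenous series XT(1..N)
    d  : int    common delay d = max{d_Y, d_XT(1), ..., d_XT(N)}
    Row t stacks {Y_t, ..., Y_{t-d+1}} and {XT_t, ..., XT_{t-d+1}};
    the corresponding target is Y_{t+1}.
    """
    T = len(y)
    rows, targets = [], []
    for t in range(d - 1, T - 1):
        lagged_y = y[t - d + 1 : t + 1][::-1]           # Y_t, ..., Y_{t-d+1}
        lagged_x = xt[t - d + 1 : t + 1][::-1].ravel()  # XT_t, ..., XT_{t-d+1}
        rows.append(np.concatenate([lagged_y, lagged_x]))
        targets.append(y[t + 1])
    return np.asarray(rows), np.asarray(targets)

# Toy check on a random-walk series with one exogenous input (N = 1)
rng = np.random.default_rng(1)
y = np.cumsum(rng.standard_normal(50))
xt = rng.standard_normal((50, 1))
X, z = narx_design_matrix(y, xt, d=3)
assert X.shape == (47, 6) and z.shape == (47,)
```

Any regressor trained on (X, z) then plays the role of f; the paper uses the NARXNN and an LSTM network for this purpose.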
The training of NARXNN is of supervised type and usually uses a gradient descent approach. In most cases, the local memories of F_H and F_Y are determined by the Levenberg-Marquardt variant of the backpropagation learning algorithm [18]. Since the actual output is available during the training of the network, a series-parallel architecture is created, where the estimated target is replaced by the actual output. After the training step, the series-parallel architecture is converted into a parallel configuration in order to perform the prediction task. The standard performance function is defined in terms of the mean square network errors. Denoting by |.| the number of elements of the argument, the size of the hidden layer F_H can be computed in many ways, one of the most frequently used expressions being

|F_H| = 2[√((|F_Y| + 2)·|F_X|)]    (3)

3 The LSTM Neural Networks
Long Short-Term Memory is a type of RNN architecture that allows the long-term learning of time-step dependencies. The general idea of LSTM is to recurrently project the input sequence into a sequence of hidden representations. At each time step, the LSTM learns the hidden representation by jointly considering the associated input and the previous hidden representation to capture the sequential dependency [19].
The learning process is accomplished using specific memory blocks located in the recurrent hidden layer. The memory blocks are created from auto-connected cells in which the temporal states of the neural network are saved. Also, each block has an input structure and an output structure. The LSTM has the ability to remove or add information to the cell state. A memory cell consists of four units: an input gate, a forget gate, an output gate, and a self-recurrent neuron. Each unit controls the interactions between neighbouring memory cells and the memory cell itself. The input gate decides if the input modifies the state of the memory cell. The forget gate chooses whether to remember or to forget the previous state of the memory cell. The output gate has the role to decide if the state of the memory cell is passed on to the output.
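The gate mechanism described above can be sketched as a single numpy time step. The parameter layout (one stacked weight matrix for the four units) and all names below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step with input, forget and output gates.

    W : (4H, D), U : (4H, H), b : (4H,) stack the parameters for the
    input gate i, forget gate f, candidate update g and output gate o.
    """
    H = h_prev.size
    z = W @ x + U @ h_prev + b           # pre-activations for all four units
    i = sigmoid(z[0:H])                  # input gate: does x modify the cell?
    f = sigmoid(z[H:2*H])                # forget gate: keep or drop old state?
    g = np.tanh(z[2*H:3*H])              # candidate cell update
    o = sigmoid(z[3*H:4*H])              # output gate: expose the cell state?
    c = f * c_prev + i * g               # new cell state (remove/add information)
    h = o * np.tanh(c)                   # new hidden representation
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4                              # input size, hidden size
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):        # unroll over 5 time steps
    h, c = lstm_step(x, h, c, W, U, b)
assert h.shape == (H,)
```

The cell state c carries long-range information forward largely unchanged when the forget gate stays open, which is what lets the LSTM capture dependencies at large temporal distances.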
The training difficulty occurs when there are temporal dependencies at large distances between the time series values. In this case, small changes in the iterative flow could lead to large effects on the results of future iterations. The number of hidden neurons is usually computed based on the number of training samples, the input size and the output dimension, as follows [21]:

|F_H| = T / (α·(|F_X| + |F_Y|))    (10)

where α is an arbitrary scaling factor, α ∈ [2, 10]. Note that smaller values of the parameter α could lead to non-overfitting learning schemes, at the price of a significantly larger computational effort.

The moving average of the time series V is computed by averaging the past n recorded values, as follows:

MA_n(V)_t = (1/n) · Σ_{i=1}^{n} V_{t-(i-1)}    (11)

Note that all the previous values used to compute the MA indicator (11) are weighted by the same value p = 1/n. In case n is large, one can instead consider weights such that higher values are associated with more recent data and lower values are assigned to older data. The version of MA used in our study is inspired by the exponential ranking distribution probability.
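Expressions (3), (10) and (11) translate directly into code. In this hypothetical sketch the bracket in (3) is read as rounding down, which is our assumption; the function names are illustrative:

```python
import numpy as np

def hidden_size_rule3(n_in, n_out):
    """|F_H| = 2*floor(sqrt((|F_Y| + 2) * |F_X|)), expression (3)."""
    return 2 * int(np.sqrt((n_out + 2) * n_in))

def hidden_size_rule10(n_samples, n_in, n_out, alpha=2):
    """|F_H| = T / (alpha * (|F_X| + |F_Y|)), expression (10), alpha in [2, 10]."""
    return int(n_samples / (alpha * (n_in + n_out)))

def moving_average(v, n):
    """MA_n(V)_t = (1/n) * sum_{i=1..n} V_{t-(i-1)}, indicator (11)."""
    v = np.asarray(v, dtype=float)
    return np.array([v[t - n + 1 : t + 1].mean() for t in range(n - 1, len(v))])

assert hidden_size_rule3(n_in=5, n_out=1) == 6
assert hidden_size_rule10(n_samples=1000, n_in=5, n_out=1, alpha=2) == 83
assert np.allclose(moving_average([1, 2, 3, 4], n=2), [1.5, 2.5, 3.5])
```

Note how (10) ties the hidden-layer size to the amount of training data, while (3) depends only on the layer widths; larger α in (10) yields a smaller, cheaper network.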
We implemented the NARX forecasting model (2) using either the standard NARX neural network described in Section 2 or the LSTM neural network presented in Section 3. The results are shown in Table 3 and Table 4. The graphic representation of the forecasted values versus the true ones corresponding to the data in Table 2, Table 3 and Table 4 is depicted in Figure 3, Figure 4 and Figure 5, respectively.
The POCID values vary between 55% in the case of the NARX NN and 60% when the LSTM NN is considered.
Based on the obtained results, we conclude that the LSTM neural network is better suited than the standard NARX neural network to forecast future values of the analysed data, from both the accuracy and the stability points of view.
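The evaluation metrics used above (RMSE, MAPE and POCID) can be sketched as follows. The POCID variant below, which compares the directions of consecutive differences, is one common formulation and may differ in detail from the paper's definition:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

def pocid(y_true, y_pred):
    """Prediction Of Change In Direction: percentage of steps where the
    predicted move (up vs. not up) agrees with the actual move."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    true_dir = np.diff(y_true) > 0
    pred_dir = np.diff(y_pred) > 0
    return float(100.0 * np.mean(true_dir == pred_dir))

# Toy exchange-rate-like series (illustrative values, not the paper's data)
y_true = [1.10, 1.12, 1.11, 1.13, 1.15]
y_pred = [1.11, 1.12, 1.12, 1.14, 1.14]
assert round(pocid(y_true, y_pred), 1) == 75.0
```

RMSE and MAPE quantify the magnitude of the forecast error, while POCID is the qualitative metric: it rewards getting the direction of the next move right even when the level is slightly off.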
6 Conclusions and Suggestions for Further Work
The reported work focuses on the development of financial data forecasting methods implementing the one-step-ahead nonlinear model with exogenous inputs. In our study the main aim is to obtain a prediction methodology to forecast the exchange rate between the Euro and the US Dollar. The prediction task is carried out by two recurrent neural networks, the standard NARX neural network and an LSTM-based approach. We analysed five exogenous time series, two of them of stock-based type and the other three belonging to the technical indicators class. In order to obtain accurate results, the exogenous inputs are filtered using the well-known Gaussian low-pass filter.
To establish meaningful conclusions regarding the accuracy of the obtained results, various performance metrics have been considered. Extensive experiments pointed out that the most suited forecasting method is based on the proposed LSTM neural network for the NARX model.
We conclude that the results are encouraging and entail future work toward extending this approach to more complex NN-based models as well as hybrid techniques.