The paper focuses on financial data forecasting by means of a one-step-ahead nonlinear model with exogenous inputs. The main aim is the development of a methodology to forecast the exchange rate between the Euro and the US Dollar. The prediction task is carried out by two recurrent neural networks: the standard NARX neural network and an LSTM-based approach. The exogenous inputs consist of historical trading data and three widely used technical indicators, namely a variant of the moving average, the Upper Bollinger Frequency Band and the Lower Bollinger Frequency Band. In order to obtain accurate forecasting algorithms, the exogenous inputs are filtered using the well-known Gaussian low-pass filter. The quality of each method is evaluated in terms of both quantitative and qualitative metrics, namely the root mean squared error, the mean absolute percentage error, and the prediction of change in direction. Extensive experiments point out that the most suited forecasting method is based on the proposed LSTM neural network for the NARX model.
Keywords: Exchange Rate Forecasting, NARX Model, NARX Networks, LSTM Networks, Statistical Metrics
DOI: 10.24818/issn14531305/24.1.2020.01
1 Introduction
Modern financial markets are chaotic systems that are very hard to predict, mainly due to their volatility and non-linearity. Therefore, stock market prediction is regarded as one of the most challenging applications of time series forecasting and has been under continuous investigation for years. A variety of statistical methods have been introduced in the time series domain and then applied to the problem of financial data analysis and forecasting, such as the autoregressive (AR) family of models, exponential smoothing, and moving average (MA) techniques. However, conventional statistical models mostly fail to capture the complexity and non-stationary structure of financial data. During the past decades, machine learning (ML) models have turned into serious competitors to classical statistical models in the time series analysis and forecasting field. The most successful ones, such as artificial neural networks (ANN) and support vector machines (SVM), have been widely used to predict financial time series due to their ability to learn from nonlinear data and successfully perform classification and prediction tasks. Several machine-learning techniques with high forecasting accuracy have been reported in the literature ([1], [2], [3], [4]).
Nowadays, the field of financial forecasting is developing fast, financial data being the main valuable source of information for investors and traders. Most of the modern studies on financial data forecasting focus on developing hybridized models which combine the advantages of statistical, data mining and machine learning algorithms (see [5], [6], [7]). On the one hand, the aim of using combined models is to process the financial datasets with data mining techniques before considering them in the learning process, in order to remove noise or to smooth the data. Several data pre-processing techniques, as for instance feature extraction and feature selection, can decrease the redundancy and noise in time series data and consequently increase the performance of the prediction models ([8], [9], [10], [11]). On the other hand, since the exact price of stocks is unpredictable, a series of research works focus on predicting the price movements, so that the problem of data forecasting is reduced to a classification problem [12].
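As a simple illustration of the smoothing step mentioned above, the following Python sketch applies a Gaussian low-pass filter, of the kind used later in this paper, to a noisy series. The function name, parameters and toy data are our own illustrative assumptions, not taken from the paper:

```python
import numpy as np

def gaussian_lowpass(series, sigma=2.0, radius=None):
    """Smooth a 1-D series with a truncated Gaussian kernel (low-pass filter)."""
    if radius is None:
        radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()                      # unit gain at zero frequency
    # same-length convolution with edge padding to limit boundary artefacts
    padded = np.pad(series, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

# Noisy sine: the filter should track the slow component and damp the noise
t = np.linspace(0, 4 * np.pi, 400)
noisy = np.sin(t) + 0.3 * np.random.default_rng(0).standard_normal(t.size)
smooth = gaussian_lowpass(noisy, sigma=4.0)
assert smooth.shape == noisy.shape
```

Because the kernel is normalised to unit sum, a constant series passes through unchanged, which is the expected behaviour of a low-pass smoother.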
6 Informatica Economică vol. 24, no. 1/2020
In the literature, one of the most recently explored ML fields is the deep learning (DL) research area. A series of recurrent neural networks, such as convolutional neural networks and long short-term memory (LSTM) networks, have been developed to solve classification and prediction tasks ([13], [14], [15], [16]). The research reported in this paper aims to investigate the potential of LSTM-based approaches to implement the NARX model.
The outline of the paper is as follows. In Section 2 the NARX prediction model is discussed briefly. Next, the NARX neural network and the LSTM networks are described. The proposed methodology is provided in the fourth section of the paper. In Section 5 the experimental results of the proposed methodology, together with a comparative analysis, are presented. The concluding remarks are given in the final section of the paper.

2 The NARX Prediction Model. The NARX Neural Networks
The NARX model (Nonlinear AutoRegressive with eXogenous input) is well suited for modelling non-linear time series as well as for one-step and multi-step ahead prediction. To develop the general forecasting model in the case of non-stationary time series, one can use the Partial Autocorrelation Function (PACF) to establish the delay variable.
In the following we denote by:
T - the total number of time periods
Y - the variable to be forecast
XT - the set of N latent variables used to forecast, XT = (XT(1), XT(2), ..., XT(N))^T
d_Y, d_XT = (d_XT(1), ..., d_XT(N)) - the corresponding delays.
Note that the delay variable of a time series X corresponds to the lag d at which the PACF computed for X drops immediately after the d-th lag. For 1 ≤ t ≤ T, Y_t is the value at the moment of time t, and we denote by Ŷ_t the predicted value of Y_t. The general NARX forecasting model is expressed by [17]:

Ŷ(t+p) = f(Y_t^(d_Y), XT_t^(d_XT))    (1)

where
f is a non-linear function
Y_t^(d_Y) = {Y_t, Y_{t-1}, Y_{t-2}, ..., Y_{t-d_Y+1}}
XT_t^(d_XT) = {XT_t^(d_XT(1))(1), XT_t^(d_XT(2))(2), ..., XT_t^(d_XT(N))(N)}
and, for each 1 ≤ i ≤ N,
XT_t^(d_XT(i))(i) = {XT_t(i), XT_{t-1}(i), ..., XT_{t-d_XT(i)+1}(i)}.

In our study we consider p = 1, equation (1) then describing the one-step-ahead prediction model. In order to simplify (1), we define
d = max{d_Y, max{d_XT(1), ..., d_XT(N)}}
and the prediction model

Ŷ(t+1) = f(Y_t^(d), XT_t^(d))    (2)

where
f is a non-linear function
Y_t^(d) = {Y_t, Y_{t-1}, Y_{t-2}, ..., Y_{t-d+1}}
XT_t^(d) = {XT_t, XT_{t-1}, XT_{t-2}, ..., XT_{t-d+1}}
XT_t = (XT_t(1), XT_t(2), ..., XT_t(N))^T.

The non-linear function f can be computed by a neural network, the most common choice being the NARX networks.
The NARX neural networks (NARXNN) are recurrent dynamic networks (RNN) specially tailored to model nonlinear systems, as for instance time series. Moreover, NARX networks have high generalization performance and good learning ability, which makes them well suited for financial data. The NARXNN topology consists of three layers connected with each other, namely an input layer F_X, a hidden layer F_H, and an output layer F_Y. The general architecture of the NARXNN is displayed in Figure 1.
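To make model (2) concrete, the following minimal numpy sketch assembles the one-step-ahead training pairs it prescribes: each input row stacks the d most recent values of Y and of each exogenous series, and the target is Y_{t+1}. The function name and toy data are illustrative; in the paper the nonlinear map f itself is realised by a NARX network rather than coded by hand:

```python
import numpy as np

def narx_design_matrix(y, xt, d):
    """Build one-step-ahead NARX training pairs following model (2).

    y  : (T,)   target series Y
    xt : (T, N) exogenous series XT(1..N)
    d  : int    common delay d = max{d_Y, d_XT(1), ..., d_XT(N)}
    Row t stacks {Y_t, ..., Y_{t-d+1}} and {XT_t, ..., XT_{t-d+1}};
    the corresponding target is Y_{t+1}.
    """
    T = len(y)
    rows, targets = [], []
    for t in range(d - 1, T - 1):
        lagged_y = y[t - d + 1 : t + 1][::-1]           # Y_t, ..., Y_{t-d+1}
        lagged_x = xt[t - d + 1 : t + 1][::-1].ravel()  # XT_t, ..., XT_{t-d+1}
        rows.append(np.concatenate([lagged_y, lagged_x]))
        targets.append(y[t + 1])
    return np.asarray(rows), np.asarray(targets)

# Toy check on a random-walk series with one exogenous input (N = 1)
rng = np.random.default_rng(1)
y = np.cumsum(rng.standard_normal(50))
xt = rng.standard_normal((50, 1))
X, z = narx_design_matrix(y, xt, d=3)
assert X.shape == (47, 6) and z.shape == (47,)
```

Any regressor trained on (X, z) then plays the role of f; the paper uses the NARXNN and an LSTM network for this purpose.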
The training of NARXNN is of supervised type and usually uses a gradient descent approach. In most cases, the local memories of F_H and F_Y are determined by the Levenberg-Marquardt variant of the backpropagation learning algorithm [18]. Since the actual output is available during the training of the network, a series-parallel architecture is created, where the estimated target is replaced by the actual output. After the training step, the series-parallel architecture is converted into a parallel configuration in order to perform the prediction task. The standard performance function is defined in terms of the mean square network errors. Denoting by |.| the number of elements of the argument, the size of the hidden layer F_H can be computed in many ways, one of the most frequently used expressions being

|F_H| = 2[√((|F_Y| + 2)·|F_X|)]    (3)

3 The LSTM Neural Networks
Long Short-Term Memory is a type of RNN architecture that allows the long-term learning of time-step dependencies. The general idea of LSTM is to recurrently project the input sequence into a sequence of hidden representations. At each time step, the LSTM learns the hidden representation by jointly considering the associated input and the previous hidden representation to capture the sequential dependency [19].
The learning process is accomplished using specific memory blocks located in the recurrent hidden layer. The memory blocks are created from auto-connected cells in which the temporal states of the neural network are saved. Also, each block has an input structure and an output structure. The LSTM has the ability to remove or add information to the cell state. A memory cell consists of four units: an input gate, a forget gate, an output gate, and a self-recurrent neuron. Each unit controls the interactions between neighbouring memory cells and the memory cell itself. The input gate decides if the input modifies the state of the memory cell. The forget gate chooses whether to remember or to forget the previous state of the memory cell. The output gate has the role to decide if the state of the memory cell is passed on to the output.
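The gate mechanism described above can be sketched as a single numpy time step. The parameter layout (one stacked weight matrix for the four units) and all names below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step with input, forget and output gates.

    W : (4H, D), U : (4H, H), b : (4H,) stack the parameters for the
    input gate i, forget gate f, candidate update g and output gate o.
    """
    H = h_prev.size
    z = W @ x + U @ h_prev + b           # pre-activations for all four units
    i = sigmoid(z[0:H])                  # input gate: does x modify the cell?
    f = sigmoid(z[H:2*H])                # forget gate: keep or drop old state?
    g = np.tanh(z[2*H:3*H])              # candidate cell update
    o = sigmoid(z[3*H:4*H])              # output gate: expose the cell state?
    c = f * c_prev + i * g               # new cell state (remove/add information)
    h = o * np.tanh(c)                   # new hidden representation
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4                              # input size, hidden size
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):        # unroll over 5 time steps
    h, c = lstm_step(x, h, c, W, U, b)
assert h.shape == (H,)
```

The cell state c carries long-range information forward largely unchanged when the forget gate stays open, which is what lets the LSTM capture dependencies at large temporal distances.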
The training difficulty occurs when there are temporal dependencies at large distances between the time series values. In this case, small changes in the iterative flow could lead to large effects on the results of future iterations. The number of hidden neurons is usually computed based on the number of training samples, the input size and the output dimension, as follows [21]:

|F_H| = T / (α·(|F_X| + |F_Y|))    (10)

where α is an arbitrary scaling factor, α ∈ [2, 10]. Note that smaller values of the parameter α could lead to non-overfitting learning schemes, at the price of a significantly larger computational effort.

The moving average of the time series V is computed by averaging the past n recorded values, as follows:

MA_n(V)_t = (1/n) · Σ_{i=1}^{n} V_{t-(i-1)}    (11)

Note that all the previous values used to compute the MA indicator (11) are weighted by the same value p = 1/n. In case n is large, one can instead consider weights such that higher values are associated with more recent data and lower values are assigned to older data. The version of MA used in our study is inspired by the exponential ranking distribution probability.
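Expressions (3), (10) and (11) translate directly into code. In this hypothetical sketch the bracket in (3) is read as rounding down, which is our assumption; the function names are illustrative:

```python
import numpy as np

def hidden_size_rule3(n_in, n_out):
    """|F_H| = 2*floor(sqrt((|F_Y| + 2) * |F_X|)), expression (3)."""
    return 2 * int(np.sqrt((n_out + 2) * n_in))

def hidden_size_rule10(n_samples, n_in, n_out, alpha=2):
    """|F_H| = T / (alpha * (|F_X| + |F_Y|)), expression (10), alpha in [2, 10]."""
    return int(n_samples / (alpha * (n_in + n_out)))

def moving_average(v, n):
    """MA_n(V)_t = (1/n) * sum_{i=1..n} V_{t-(i-1)}, indicator (11)."""
    v = np.asarray(v, dtype=float)
    return np.array([v[t - n + 1 : t + 1].mean() for t in range(n - 1, len(v))])

assert hidden_size_rule3(n_in=5, n_out=1) == 6
assert hidden_size_rule10(n_samples=1000, n_in=5, n_out=1, alpha=2) == 83
assert np.allclose(moving_average([1, 2, 3, 4], n=2), [1.5, 2.5, 3.5])
```

Note how (10) ties the hidden-layer size to the amount of training data, while (3) depends only on the layer widths; larger α in (10) yields a smaller, cheaper network.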
We implemented the NARX forecasting model (2) using either the standard NARX neural network described in Section 2 or the LSTM neural network presented in Section 3. The results are shown in Table 3 and Table 4. The graphic representation of the forecasted values versus the true ones corresponding to the data in Table 2, Table 3 and Table 4 is depicted in Figure 3, Figure 4 and Figure 5, respectively.
The POCID values vary between 55% in the case of the NARX NN and 60% when the LSTM NN is considered.
Based on the obtained results, we conclude that the LSTM neural network is better suited than the standard NARX neural network to forecast future values of the analysed data, from both the accuracy and the stability points of view.
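The evaluation metrics used above (RMSE, MAPE and POCID) can be sketched as follows. The POCID variant below, which compares the directions of consecutive differences, is one common formulation and may differ in detail from the paper's definition:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

def pocid(y_true, y_pred):
    """Prediction Of Change In Direction: percentage of steps where the
    predicted move (up vs. not up) agrees with the actual move."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    true_dir = np.diff(y_true) > 0
    pred_dir = np.diff(y_pred) > 0
    return float(100.0 * np.mean(true_dir == pred_dir))

# Toy exchange-rate-like series (illustrative values, not the paper's data)
y_true = [1.10, 1.12, 1.11, 1.13, 1.15]
y_pred = [1.11, 1.12, 1.12, 1.14, 1.14]
assert round(pocid(y_true, y_pred), 1) == 75.0
```

RMSE and MAPE quantify the magnitude of the forecast error, while POCID is the qualitative metric: it rewards getting the direction of the next move right even when the level is slightly off.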
6 Conclusions and Suggestions for Further Work
The reported work focuses on the development of financial data forecasting methods implementing the one-step-ahead nonlinear model with exogenous inputs. In our study the main aim is to obtain a prediction methodology to forecast the exchange rate between the Euro and the US Dollar. The prediction task is carried out by two recurrent neural networks, the standard NARX neural network and an LSTM-based approach. We analysed five exogenous time series, two of them of stock-based type and the other three belonging to the technical indicators class. In order to obtain accurate results, the exogenous inputs are filtered using the well-known Gaussian low-pass filter.
To establish meaningful conclusions regarding the accuracy of the obtained results, various performance metrics have been considered. Extensive experiments pointed out that the most suited forecasting method is based on the proposed LSTM neural network for the NARX model.
We conclude that the results are encouraging and entail future work toward extending this approach to more complex NN-based models as well as hybrid techniques.