Abstract - This work shows an implementation of deep reinforcement learning on currency pairs in the FOREX currency market, using deep learning techniques combined with reinforcement learning. Profit is obtained using databases extracted from recent years, and the results found are analyzed in terms of the loss function, compensation, and behavior of the system.

Keywords - FOREX, DQN, trading, forecasting, back testing, indicator.

I. INTRODUCTION

This paper discusses reinforcement learning techniques applied to the FOREX (foreign exchange) market, using prediction and forecasting algorithms and, specifically, Deep Reinforcement Learning.

Throughout history, there have been attempts to implement automatic trading systems based mainly on probabilistic techniques such as linear models, for example the ARIMA and ARMAX models. Although there are robust prediction algorithms based on dynamic programming, these are exclusively oriented to the stock market and thus have a totally different design and dynamics, since the market for shares such as Microsoft or Facebook is mainly linked to market sentiment and to a steady rise or fall in price over a period of years. By contrast, the bibliography on machine learning algorithms applied to the FOREX market is not very extensive, due to the high volatility of this market and because its movements are not directly based on market sentiment but rather correlate strongly with high-impact world news; above all, it is a market manipulated by Market Makers.

Building predictive models for this market is therefore a very expensive process. Given these considerations, there is a need to apply dynamic programming methodologies that are adjustable over time, since the relationship and correlation between currencies does not always have fractal behavior. On this point, works such as [1], [2], [3] and [4] seek to apply forecasting using a model with memory, "LSTM" (long short-term memory). The authors in [1] reach an accuracy of 76% using the USD/CNY currency pair, and the document suggests the use of hybrid adaptation architectures. Likewise, in [5] a neural network structure is built that provides memory functions equipped with an adaptation mechanism based on reinforcement learning. Although that document presents results with gains of around 85%, its design resembles that of a learning robot, since it does not consider adaptation to fundamental news or to non-fractal movements; in the same way, evaluation data for the algorithm are not shown. Finally, a recently published work on forecasting through deep learning [6] shows empirical results based on profit, without the ability to evaluate the capacity of the algorithm as such. A point of great interest in that work is its comparison of different deep learning frameworks on FOREX, which gives a basis for finding the correct hyperparameters of a deep neural network focused on the forex market.

The recent antecedents in [6], [7] and [8] on the application of machine learning to the foreign exchange market mainly use an architecture with memory, "LSTM", which seeks first to find fractal behaviors in the market so that intraday forecasting can be applied with trading techniques.

Thus, a hybrid architecture based on the deep learning model and Q-learning offers the advantage of providing an approximation in forecasting, so that the algorithm can be used in the first instance as a kind of indicator hunter and at the same time as a back testing methodology for the operations carried out.

In Q-learning applied to this field, the main objective is the maximization of profits. Q-learning as an algorithm thus supports a dynamic indicator that is not based on models like Fibonacci or Bollinger bands. This work shows the adaptation of Q-learning based on deep learning to the FOREX market, so as to obtain a tool adaptable to the market for specific currencies.

II. BACKGROUND

A. Deep Q learning.

Deep Q learning can be defined as Q-learning with its Q-table replaced by a neural network for its optimization. The proposed approach iterates over the state space and the action space through a neural network to approximate the Q function; the neural network can then be trained on the error between the Q-values. The formulation is given as follows [9]:

$$L(\theta) = \mathbb{E}\left[\left(y_t - Q(s_t, a_t; \theta)\right)^2\right] \tag{1}$$
Authorized licensed use limited to: University of Canberra. Downloaded on May 23,2021 at 03:57:09 UTC from IEEE Xplore. Restrictions apply.
In (1), $a_t$ represents the agent's action and $s_t$ the current state of the agent; the reward of the selected action is given as a real number, and $\theta$ denotes the network parameters that index the mean squared error. The target $y_t$, defined as the objective function, is given by:

$$y_t = r + \gamma\, Q^{*}(s', a'; \theta) \tag{2}$$

where $Q^{*}(s', a'; \theta)$ corresponds to the new, updated state, and future rewards are assumed to carry a penalty given by the discount factor $\gamma$. Up to this point, the model follows the work presented in [9], [10] for the stock market. The algorithm is then optimized by stochastic gradient descent, with the gradient given by:

$$\nabla_{\theta} L_t(\theta_t) = \mathbb{E}_{s,a,r,s'}\left[\left(y_t - Q_{\theta}(s, a)\right) \nabla_{\theta} Q_{\theta}(s, a)\right] \tag{3}$$

where $t$ indexes the time steps.

Fig. 2. Q learning algorithm for analysis in the FOREX market.
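As a minimal sketch of how (1), (2) and (3) fit together, the following Python fragment computes the target, the squared error, and one stochastic-gradient update. It is not the paper's implementation: a linear Q-function with one weight row per action stands in for the neural network, $Q^{*}(s', a'; \theta)$ is read as the maximum over next actions (the usual DQN convention, assumed here), and the names `q_values`, `td_target` and `sgd_step` are hypothetical.

```python
def q_values(theta, state):
    """Linear Q-function (assumption): one weight row per action,
    used here as a stand-in for the neural network Q(s, a; theta)."""
    return [sum(w * x for w, x in zip(weights, state)) for weights in theta]


def td_target(theta, reward, next_state, gamma):
    """Eq. (2): y_t = r + gamma * Q*(s', a'; theta),
    with Q* taken as the maximum over next actions (assumption)."""
    return reward + gamma * max(q_values(theta, next_state))


def sgd_step(theta, batch, gamma, lr):
    """One gradient step on the loss of Eq. (1) using the gradient of Eq. (3).

    batch holds (state, action, reward, next_state) tuples.
    Updates theta in place and returns the mean squared error
    of the batch measured before the update.
    """
    loss = 0.0
    grads = [[0.0] * len(row) for row in theta]
    for state, action, reward, next_state in batch:
        y = td_target(theta, reward, next_state, gamma)
        error = y - q_values(theta, state)[action]
        loss += error ** 2
        # For a linear Q, grad_theta Q(s, a) is the state vector
        # placed on the row of the chosen action.
        for i, x in enumerate(state):
            grads[action][i] += error * x
    n = len(batch)
    for a, row in enumerate(theta):
        for i in range(len(row)):
            row[i] += lr * grads[a][i] / n  # moves Q(s, a) toward y_t
    return loss / n


# Two actions, two state features, all weights starting at zero.
theta = [[0.0, 0.0], [0.0, 0.0]]
batch = [([1.0, 0.0], 0, 1.0, [0.0, 1.0]),
         ([0.0, 1.0], 1, 0.0, [1.0, 0.0])]
loss_before = sgd_step(theta, batch, gamma=0.9, lr=0.1)
```

Replacing the linear `q_values` with a neural network changes only how $Q$ and its gradient are computed; the target and the error term keep the same form.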
Subsequently, the implementation of the algorithm takes as a fundamental basis the works [14] and [7], where the application of Deep Q learning to market shares can be observed. Based on this algorithm, the reward function is modified to the expression:

$$\text{Reward} = (\text{CurrentPrice} - \text{Stock\_Prices} - \text{Commission}) \times \text{NrShareOfStocks}$$
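The reward expression can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's code: the function name `trade_reward` and its parameter names are hypothetical, `stock_price` is read as the price at which the position was opened, and the default commission of 0.2 is the value reported in the text, which depends on the trading company used.

```python
def trade_reward(current_price, stock_price, n_shares, commission=0.2):
    """Reward = (CurrentPrice - Stock_Prices - Commission) * NrShareOfStocks.

    stock_price is assumed to be the opening price of the position;
    commission defaults to the 0.2 reported in the paper.
    """
    return (current_price - stock_price - commission) * n_shares


# Illustrative numbers only: (105 - 100 - 0.2) * 10
reward = trade_reward(105.0, 100.0, n_shares=10)
```

Because the commission is subtracted before scaling by the number of shares, a trade is rewarded only when the price moves by more than the commission.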
The expression above is the reward function, which takes into account the time with the highest volume movement in the market and includes a commission of 0.2. It should be noted that this value depends on the trading company used.

IV. RESULTS

This section analyzes the results obtained with the methodology shown in Fig. 3. Starting from (1) and (2), the model was implemented with databases collected for the CHF/EUR, EUR/USD, GBP/USD and NZD/JPY currency pairs, analyzing how the behavior of the model started and the profit and loss results for each currency.

A. CHF/EUR currency pair

When analyzing the results, it is necessary to know the bases and characteristics of this currency. The CHF/EUR pair is used more fluidly in swing-type strategies [15], [16]. Its volatility is relatively stable, and in times of economic uncertainty it has been shown to follow a trend in favor of the CHF [16]. Regarding the loss function, a variation below a value of 10 is observed; although there is an outlier peak above a value of 60, the behavior is relatively linear after 20 epochs.

Fig. 4. CHF/EUR reward function.

B. EUR/USD currency pair

The results shown for EUR/USD are relatively different from those for the CHF/EUR pair, although there is a correlation between these two pairs given by the EUR currency. It is necessary to point out that this pair has greater movement in the market, driven mainly by the USD [15], [16]. Fig. 5 and Fig. 6 show the loss function and the behavior of the penalty and reward for this currency. The model presented in this work behaves better for this currency, thus showing its applicability.
C. GBP/USD currency pair
For the GBP/USD pair, it is necessary to know that it is highly volatile, because both the GBP and the USD influence the market independently, which makes the construction of correlation models, or even forecasting, highly difficult [15], [16]. The loss and compensation functions show that the proposed model is not applicable to this pair.
Fig. 7. GBP/USD loss function.

DQN
CURRENCIES   Predicted profit in training   Predicted profit in testing
EUR/USD      29                             64
CHF/EUR      0                              0
GBP/USD      0                              1
NZD/JPY      21                             199
perform a multipoint prediction in the prediction with the current algorithm.

Finally, for a future implementation it is recommended to use high-volume currencies capable of moving a high number of pips in the market through quick operations, and at the same time to attach a sentiment analysis tool to this algorithm so that it can quantify the sentiment of market operators.

REFERENCES

[1] T. Zhou, "Trend forecasting based on long short-term memory and its variations with hybrid activation functions," Brunel University London, 2020.
[2] L. Di Persio and O. Honchar, "Artificial neural networks architectures for stock price prediction: Comparisons and applications," Int. J. Circuits, Syst. Signal Process., vol. 10, pp. 403–413, 2016.
[3] C. Sang and M. Di Pierro, "Improving trading technical analysis with TensorFlow Long Short-Term Memory (LSTM) Neural Network," J. Financ. Data Sci., vol. 5, no. 1, pp. 1–11, 2019, doi: 10.1016/j.jfds.2018.10.003.
[4] T. Kim and H. Y. Kim, "Forecasting stock prices with a feature fusion LSTM-CNN model using different representations of the same data," PLoS One, vol. 14, no. 2, pp. 1–23, 2019, doi: 10.1371/journal.pone.0212320.
[5] M. R. Alimoradi and A. Husseinzadeh Kashan, "A league championship algorithm equipped with network structure and backward Q-learning for extracting stock trading rules," Appl. Soft Comput. J., vol. 68, pp. 478–493, 2018, doi: 10.1016/j.asoc.2018.03.051.
[6] A. J. Dautel, W. K. Härdle, S. Lessmann, and H.-V. Seow, "Forex exchange rate forecasting using deep recurrent neural networks," Digit. Financ., no. 0123456789, 2020, doi: 10.1007/s42521-020-00019-x.
[7] T. Théate and D. Ernst, "An Application of Deep Reinforcement Learning to Algorithmic Trading," 2020.
[8] L. Dymova, P. Sevastjanov, and K. Kaczmarek, "A Forex trading expert system based on a new approach to the rule-base evidential reasoning," Expert Syst. Appl., vol. 51, pp. 1–13, Jun. 2016, doi: 10.1016/j.eswa.2015.12.028.
[9] L. Chen and Q. Gao, "Application of deep reinforcement learning on automated stock trading," Proc. IEEE Int. Conf. Softw. Eng. Serv. Sci. ICSESS, vol. 2019-Octob, pp. 29–33, 2019, doi: 10.1109/ICSESS47205.2019.9040728.
[10] Z. Ju, Y. Liu, D. Zhou, and R. Goebel, Series Editors. 2019.
[11] T. Nguyen Thi Thu and V. Dang Xuan, "FoRex Trading Using Supervised Machine Learning," Int. J. Eng. Technol., vol. 7, no. 4.15, p. 400, 2018, doi: 10.14419/ijet.v7i4.15.23024.
[12] J. S. Chou, D. N. Truong, and T. L. Le, "Interval Forecasting of Financial Time Series by Accelerated Particle Swarm-Optimized Multi-Output Machine Learning System," IEEE Access, vol. 8, no. 2008, pp. 14798–14808, 2020, doi: 10.1109/ACCESS.2020.2965598.
[13] J. Chan, Automation of Trading Machine for Traders How to Develop Trading Models.
[14] S. Selvin, R. Vinayakumar, E. A. Gopalakrishnan, V. K. Menon, and K. P. Soman, "Stock price prediction using LSTM, RNN and CNN-sliding window model," 2017 Int. Conf. Adv. Comput. Commun. Informatics, ICACCI 2017, vol. 2017-Janua, pp. 1643–1647, 2017, doi: 10.1109/ICACCI.2017.8126078.
[15] F. Serrano, "Day trading and stock market operations for."
[16] P. A. From, "Advanced financial trading program financial trading."

Grover Aruquipa Aruquipa
Mechatronic engineer by profession. Investor in the FOREX stock market and market shares, currently a member of the StartTrade Investments community.