
2021 International Conference on Emerging Smart Computing and Informatics (ESCI)

AISSMS Institute of Information Technology, Pune, India. Mar 5-7, 2021

Analysis of Algorithmic Trading with Q-Learning in the Forex Market

978-1-7281-8519-4/20/$31.00 ©2021 IEEE | DOI: 10.1109/ESCI50559.2021.9396948

Aruquipa A. Grover
Centro de Investigación, Desarrollo e Innovación en Ingeniería Mecatrónica
Universidad Católica Boliviana “San Pablo”
La Paz, Bolivia
grover.aruquipa.9@gmail.com

Rojas S. Gabriel
Centro de Investigación, Desarrollo e Innovación en Ingeniería Mecatrónica
Universidad Católica Boliviana “San Pablo”
La Paz, Bolivia
gabriel.rojas.ucb@gmail.com

Abstract - This work presents an implementation of deep reinforcement learning on currency pairs in the FOREX currency market, combining deep learning techniques with reinforcement learning. Profit is obtained using databases extracted from recent years, and the results are analyzed in terms of the loss function, the compensation and the behavior of the system.

Keywords: FOREX, DQN, trading, forecasting, backtesting, indicator.

I. INTRODUCTION

This paper discusses reinforcement learning techniques applied to the FOREX (foreign exchange) market, using prediction and forecasting algorithms, specifically Deep Reinforcement Learning.

Historically, there have been attempts to implement automatic trading systems based mainly on probabilistic techniques, such as the linear ARIMA and ARMAX models. Although there are robust prediction algorithms based on dynamic programming, these are oriented exclusively to the stock market and therefore have a totally different design and dynamics, since the market for shares such as Microsoft or Facebook is mainly linked to market sentiment and to a sustained rise or fall in price over a period of years. When investigating machine learning algorithms applied to the FOREX market, one finds that the literature is not very extensive. This is due to the high volatility of this market and to the fact that its movements are not directly based on market sentiment but rather correlate strongly with high-impact world news; above all, it is a market manipulated by market makers.

Building predictive models for this market is therefore a very expensive process. Given these conditions, there is a need for dynamic programming methodologies that adapt over time, since the relationship and correlation between currencies does not always show fractal behavior. In this context, works such as [1], [2], [3] and [4] apply forecasting using a model with memory, the LSTM (long short-term memory). The authors of [1] reach an accuracy of 76% on the USD/CNY currency pair, and that work suggests the use of hybrid adaptation architectures. In [5], a neural network structure is built that provides memory functions equipped with an adaptation mechanism based on reinforcement learning. Although that work reports gains of around 85%, its design resembles a learning robot, since it does not consider adaptation to fundamental news or to non-fractal movements; likewise, evaluation data for the algorithm are not shown. Finally, a recently published work on forecasting with deep learning [6] shows empirical results based on profit, without evaluating the capacity of the algorithm as such. A point of great interest in that work is its comparison of different deep learning frameworks in FOREX, which gives a basis for finding suitable hyperparameters for a deep neural network focused on the forex market.

The recent work in [6], [7] and [8] on the application of machine learning to the foreign exchange market mainly uses an architecture with memory (LSTM). Its first purpose is to find fractal behaviors in the market, so that intraday forecasting can be applied together with trading techniques.

Thus, a hybrid architecture based on a deep learning model and Q-learning offers the advantage of providing an approximate forecast, so that the algorithm can be used first as a kind of indicator hunter and, at the same time, as a backtesting methodology for the operations carried out.

In this field, the main objective of Q-learning is the maximization of profit. Q-learning therefore supports a dynamic indicator that is not based on models such as Fibonacci or Bollinger bands. This work shows the adaptation of Q-learning based on deep learning to the FOREX market, in order to obtain a tool that adapts to the market for specific currencies.

II. BACKGROUND
A. Deep Q learning.

Deep Q learning can be defined as Q-learning with its Q-table replaced by a neural network. The proposed approach iterates over the state space and the action space, using a neural network to approximate the Q function; the network can then be trained on the error between the Q-values. The formulation is given as follows [9].
$$L(\theta) = \mathbb{E}\left[\left(y_t - Q(s_t, a_t; \theta)\right)^2\right] \qquad (1)$$

In (1), $a_t$ represents the agent's action and $s_t$ the current state of the agent; the reward of the selected action is given as a real number $r$, and $\theta$ denotes the parameters over which the mean squared error is computed. The target $y_t$, which acts as the objective of the fit, is defined by:
$$y_t = r + \gamma Q^{*}(s', a'; \theta) \qquad (2)$$
where $Q^{*}(s', a'; \theta)$ is the Q-value of the new, updated state, and future rewards are assumed to be penalized by the factor $\gamma$. Up to this point, the model follows the work presented in [9] and [10] for the stock market. The algorithm is then optimized by stochastic gradient descent, with the gradient given by:
$$\nabla_{\theta} L_t(\theta_t) = \mathbb{E}_{s,a,r,s'}\left[\left(y_t - Q_{\theta_t}(s, a)\right)\nabla_{\theta_t} Q_{\theta_t}(s, a)\right] \qquad (3)$$
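As an illustration of equations (1), (2) and (3), the following minimal sketch replaces the neural network with a linear Q function so that the gradient can be written by hand. This is not the authors' implementation: the linear model, the function names, the learning rate and the use of a maximum over next actions in the target are all assumptions made for the example. The discount value 0.92 is the initial value the paper states.

```python
GAMMA = 0.92      # discount factor; the paper states an initial value of 0.92
N_ACTIONS = 3     # buy, sell, stop: the action types named in the paper


def q_values(theta, state):
    """Linear stand-in for the network: Q(s, a; theta) = theta[a] . s (assumption)."""
    return [sum(w * x for w, x in zip(theta[a], state)) for a in range(N_ACTIONS)]


def td_target(theta, reward, next_state, gamma=GAMMA):
    """Equation (2), reading Q* as the best next-state value (assumption)."""
    return reward + gamma * max(q_values(theta, next_state))


def squared_error(theta, state, action, target):
    """Single-sample version of the loss in equation (1)."""
    return (target - q_values(theta, state)[action]) ** 2


def sgd_step(theta, state, action, target, lr=0.01):
    """One update along equation (3); for a linear Q, grad_theta Q is the state."""
    error = target - q_values(theta, state)[action]
    theta[action] = [w + lr * error * x for w, x in zip(theta[action], state)]
    return theta


# Hypothetical two-feature states and a reward of 1.0, chosen only for illustration.
theta = [[0.0, 0.0] for _ in range(N_ACTIONS)]
state, next_state = [1.0, 0.5], [1.0, 0.6]
y = td_target(theta, reward=1.0, next_state=next_state)
before = squared_error(theta, state, 0, y)
theta = sgd_step(theta, state, 0, y)
after = squared_error(theta, state, 0, y)
print(before, after)  # the squared error shrinks after one step
```

The target is held fixed during the update, which matches the usual reading of (3) as a gradient with respect to the current parameters only.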
Where ‫ ݐ‬is defined by the time steps Fig. 2. Q learning algorithm for analysis in the FOREX market.

In this way, Fig. 2 shows the steps to be carried out based


on Fig. 1 and equations (1), (2) and (3).
Subsequently, different operations are considered in which
the agent interacts with an E environment, in this case the
FOREX currency market, performing a sequence of actions
along the data frame, observing penalties and rewards. In each
time sample, the agent selects a buy, sell or stop action, from a
set of actions defined within the evaluation function, ‫ ܣ‬ൌ
ሼͳǡ ǥ ‫ܭ‬ሽ. The action is passed to the agent priori and is able to
modify its internal state and achieve change their status to buy
or sell, also the environment must be due to the stochastic
behavior of the data. The agent's goal is to interact with the
environment by selecting stocks in a way that maximizes future
earnings. We make the standard assumption that future rewards
are discounted by a Bellman factor initially of 0.92.
Fig. 1. Deep Q compensation model -Learning.
B. Implementation parameters
Fig. 1 shows the referential model using, the development For the implementation of algorithm 1, it is necessary to
environment is created from the data obtained from the consider that the databases used for training and validation are
currency pairs, the evaluation function, and the policies are extracted from the page finance.Yahoo.com. Showing the
given according to (1), (2) and (3). opening and closing price parameters mainly, it should be taken
In this way, following works [2] and [6] as a reference, a Note. Also, that the prices are given in intervals of days, which
methodology is sought, the rules and conditions for automatically removes the noise from intraday operations.
implementation in the foreign exchange market, and a TABLE I. EUR/USD DATABASE SAMPLES.
formulation for profit maximization is also sought.
Date Open High Low Close Adj
III. METHODOLOGY Close
1/5/2015 0.83206 0.83235 0.83126 0.8321 0.8321
A. Formulation of the parameters of the trading algorithm.
1/6/2015 0.83222 0.83256 0.83126 0.83223 0.83223
Considering the difference between the stock market and
1/7/2015 0.83254 0.83268 0.83195 0.83251 0.83251
the currency market and the experience of professional traders
shown in [11], [12] and [13] with technical methods, the 1/8/2015 0.83257 0.83269 0.83195 0.83253 0.83253
following parameters are taken to formulate the problem. 1/9/2015 0.83256 0.83269 0.83195 0.83251 0.83251
The states ‫ ݏ‬ൌ  ሾ‫݌‬ǡ ݄ǡ ܾሿ the variable ‫ ݌‬defines the states of 1/12/2015 0.8325 0.83269 0.83195 0.83256 0.83256
the model where p includes the market price information, ݄ the 1/13/2015 0.83257 0.83268 0.83195 0.83257 0.83257
stock trend and ܾ the balance after trading, there is also no limit
to the number of traders, the agent can modify the buy and sell 1/14/2015 0.8326 0.83269 0.83195 0.83259 0.83259
values according to the algorithm. The Q values or states are 1/15/2015 0.83255 1.1566 0.83195 0.83255 0.83255
started randomly, the algorithm is evaluated in 100 and 50 1/16/2015 1.0057 1.0183 0.97554 1.0051 1.0051
times as they do in [9], finally a penalty with a value of 0.97 is
considered. 1/19/2015 1.0072 1.0076 0.98103 1.0074 1.0074
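To make the state definition $s = [p, h, b]$ concrete, the sketch below builds one state per day from the first rows of Table I. It is only an illustration: the paper does not say how the trend $h$ is computed, so taking it as the sign of the close-to-close change is an assumption, as are the function names, the CSV layout and the starting balance of 0.

```python
import csv
import io

# First rows of Table I (EUR/USD daily samples), inlined so the sketch runs offline.
SAMPLE = """Date,Open,High,Low,Close,Adj Close
1/5/2015,0.83206,0.83235,0.83126,0.8321,0.8321
1/6/2015,0.83222,0.83256,0.83126,0.83223,0.83223
1/7/2015,0.83254,0.83268,0.83195,0.83251,0.83251
1/8/2015,0.83257,0.83269,0.83195,0.83253,0.83253
1/9/2015,0.83256,0.83269,0.83195,0.83251,0.83251
"""


def load_closes(text):
    """Parse the CSV text and return the list of closing prices."""
    return [float(row["Close"]) for row in csv.DictReader(io.StringIO(text))]


def build_states(closes, balance=0.0):
    """Build s = [p, h, b] per day; h is the sign of the daily change (assumption)."""
    states = []
    for prev, cur in zip(closes, closes[1:]):
        trend = (cur > prev) - (cur < prev)  # +1 up, -1 down, 0 flat
        states.append([cur, trend, balance])
    return states


closes = load_closes(SAMPLE)
states = build_states(closes)
for s in states:
    print(s)
```

The first day has no previous close, so five rows yield four states. In a full agent the balance $b$ would be updated after every trade instead of staying constant.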

Subsequently, the implementation of the algorithm takes as its fundamental basis the works [14] and [7], where the application of Deep Q learning to market shares can be observed. Based on this algorithm, the reward function is modified from:

$$\text{Reward} = (\text{Current price} - \text{Price of shares}) \times \text{Number of shares of stocks}$$

to the expression:

$$\text{Reward} = (\text{Current price} - \text{Stock\_Prices} - \text{Commission}) \times \text{Number of shares of stocks}$$
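The two reward expressions can be compared with a short sketch. The function names and the example prices are hypothetical, and the default commission of 0.2 is the value the paper reports for its setup; a different broker would imply a different value.

```python
COMMISSION = 0.2  # commission value reported in the paper; broker-dependent


def reward_without_commission(current_price, stock_price, n_shares):
    """Original reward: (current price - price of shares) * number of shares."""
    return (current_price - stock_price) * n_shares


def reward(current_price, stock_price, n_shares, commission=COMMISSION):
    """Modified reward: the commission is subtracted per share before scaling."""
    return (current_price - stock_price - commission) * n_shares


# Hypothetical trade: bought at 1.0, current price 1.5, 10 units held.
gross = reward_without_commission(1.5, 1.0, 10)
net = reward(1.5, 1.0, 10)
print(gross, net)  # the commission lowers the reward of the same trade
```

Because the commission sits inside the parentheses, it scales with the number of shares, so a trade whose price gain is smaller than the commission receives a negative reward.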
The reward expression takes into account the time with the highest volume movement in the market, and includes a commission of 0.2. It should be noted that this value depends on the trading company used.

IV. RESULTS

This section analyzes the results obtained with the methodology shown in Fig. 3, based on (1) and (2). The model was implemented with databases collected for the CHF/EUR, EUR/USD, GBP/USD and NZD/JPY currency pairs, analyzing how the model behaved from the start and the profit and loss results for each currency.

A. CHF/EUR currency pair

When analyzing the results, it is necessary to know the bases and characteristics of this currency pair. CHF/EUR is used most readily in swing-type strategies [15], [16]. Its volatility is relatively stable, and in times of economic uncertainty it has been shown to follow a trend in favor of the CHF [16]. The loss function varies by less than a value of 10; although there is an outlying peak above 60, the behavior is relatively linear after 20 epochs.

Fig. 3. CHF/EUR loss function.

Regarding the compensation and penalty of the system, after 30 epochs the behavior shows no high variance with respect to the previously trained behavior.

Fig. 4. CHF/EUR reward function.

B. EUR/USD currency pair

The results for EUR/USD differ somewhat from those for the CHF/EUR pair, although the two pairs are correlated through the EUR currency. This pair has greater movement in the market, driven mainly by the USD [15], [16]. Fig. 5 and Fig. 6 show the loss function and the behavior of the penalty and reward for this pair. The model presented in this work behaves better for this pair, which shows its applicability.

Fig. 5. EUR/USD loss function.

Fig. 6. EUR/USD reward function.

C. GBP/USD currency pair
GBP/USD is a highly volatile pair because both the GBP and the USD influence the market independently, which makes the construction of correlation models, or even forecasting, highly difficult [15], [16]. The loss and compensation functions show that the proposed model is not applicable to this pair.

Fig. 7. GBP/USD loss function.

Fig. 8. GBP/USD reward function.

D. NZD/JPY currency pair

Finally, the Asian pair NZD/JPY has high volatility worldwide, due to the constant movement in this pair [15]. Fig. 9 and Fig. 10 show that the loss function has high volatility, which indicates that the model is not applicable if it is used in intraday strategies.

Fig. 9. NZD/JPY loss function.

Fig. 10. NZD/JPY reward function.

TABLE II. PROFIT OF CURRENCIES.

Currencies   DQN predicted profit in training   DQN predicted profit in testing
EUR/USD      29                                 64
CHF/EUR      0                                  0
GBP/USD      0                                  1
NZD/JPY      21                                 199

The most important point regarding these results is the take profit, or profit: Table II lists the profits for the selected currencies. The model is applicable, with a reliable profit, for the EUR/USD and NZD/JPY currencies. Even so, this system does not include variables such as a STOP LOSS or a variable STOP LOSS that would provide risk control. The final results therefore show that the algorithm is reliable enough to be used as a BACKTESTING indicator for the EUR/USD and NZD/JPY currencies.

V. CONCLUSIONS

This work shows a novel approach to the application of Q learning in the FOREX market, considered one of the most volatile markets. Unlike previous works developed for the stock market, this algorithm considers the variability of the market, involving an optimization function and modifying over time the reward and penalty parameters, the sampling frequency and the number of epochs to be used in the trading techniques. The results presented clearly show profit gains which, although not very high compared to share prices, are high in comparison with the previous work in the FOREX market mentioned above. Therefore, the results of the proposed algorithm for the EUR/USD and NZD/JPY currencies show that this approach can be used as a backtesting tool for the validation of strategies based on technical and fundamental news, methods and indicators such as MACD or RSI. For future work, the aim is to implement the algorithm in real-time systems with scalping strategies, so that profit is maximized over smaller time intervals, whether seconds or minutes. It is also expected to be able to
perform multipoint prediction with the current algorithm.

Finally, for a future implementation it is recommended to use high-volume currencies capable of moving a large number of pips in the market through quick operations and, at the same time, to attach a sentiment analysis tool to this algorithm so that it can quantify the sentiment of market operators.

REFERENCES
[1] T. Zhou, “Trend forecasting based on long short-term memory and its variations with hybrid activation functions,” Brunel University London, 2020.
[2] L. Di Persio and O. Honchar, “Artificial neural networks architectures for stock price prediction: Comparisons and applications,” Int. J. Circuits, Syst. Signal Process., vol. 10, pp. 403–413, 2016.
[3] C. Sang and M. Di Pierro, “Improving trading technical analysis with TensorFlow Long Short-Term Memory (LSTM) Neural Network,” J. Financ. Data Sci., vol. 5, no. 1, pp. 1–11, 2019, doi: 10.1016/j.jfds.2018.10.003.
[4] T. Kim and H. Y. Kim, “Forecasting stock prices with a feature fusion LSTM-CNN model using different representations of the same data,” PLoS One, vol. 14, no. 2, pp. 1–23, 2019, doi: 10.1371/journal.pone.0212320.
[5] M. R. Alimoradi and A. Husseinzadeh Kashan, “A league championship algorithm equipped with network structure and backward Q-learning for extracting stock trading rules,” Appl. Soft Comput. J., vol. 68, pp. 478–493, 2018, doi: 10.1016/j.asoc.2018.03.051.
[6] A. J. Dautel, W. K. Härdle, S. Lessmann, and H.-V. Seow, “Forex exchange rate forecasting using deep recurrent neural networks,” Digit. Financ., no. 0123456789, 2020, doi: 10.1007/s42521-020-00019-x.
[7] T. Théate and D. Ernst, “An Application of Deep Reinforcement Learning to Algorithmic Trading,” 2020.
[8] L. Dymova, P. Sevastjanov, and K. Kaczmarek, “A Forex trading expert system based on a new approach to the rule-base evidential reasoning,” Expert Syst. Appl., vol. 51, pp. 1–13, Jun. 2016, doi: 10.1016/j.eswa.2015.12.028.
[9] L. Chen and Q. Gao, “Application of deep reinforcement learning on automated stock trading,” Proc. IEEE Int. Conf. Softw. Eng. Serv. Sci. ICSESS, vol. 2019-Octob, pp. 29–33, 2019, doi: 10.1109/ICSESS47205.2019.9040728.
[10] Z. Ju, Y. Liu, D. Zhou, and R. Goebel, Series Editors. 2019.
[11] T. Nguyen Thi Thu and V. Dang Xuan, “FoRex Trading Using Supervised Machine Learning,” Int. J. Eng. Technol., vol. 7, no. 4.15, p. 400, 2018, doi: 10.14419/ijet.v7i4.15.23024.
[12] J. S. Chou, D. N. Truong, and T. L. Le, “Interval Forecasting of Financial Time Series by Accelerated Particle Swarm-Optimized Multi-Output Machine Learning System,” IEEE Access, vol. 8, no. 2008, pp. 14798–14808, 2020, doi: 10.1109/ACCESS.2020.2965598.
[13] J. Chan, Automation of Trading Machine for Traders How to Develop Trading Models.
[14] S. Selvin, R. Vinayakumar, E. A. Gopalakrishnan, V. K. Menon, and K. P. Soman, “Stock price prediction using LSTM, RNN and CNN-sliding window model,” 2017 Int. Conf. Adv. Comput. Commun. Informatics, ICACCI 2017, vol. 2017-Janua, pp. 1643–1647, 2017, doi: 10.1109/ICACCI.2017.8126078.
[15] F. Serrano, “Day trading and stock market operations for.”
[16] PA From, “Advanced financial trading program financial trading.”

Grover Aruquipa Aruquipa
Mechatronic engineer by profession. Investor in the FOREX stock market and market shares, currently a member of the StartTrade Investments community.
