Commodity and Forex Trade Automation Using Deep Reinforcement Learning


Usha B A, Professor and Head, Department of ISE, BMSIT&M, Bengaluru, INDIA (ushaba@bmsit.in)
Manjunath T N, Professor, Department of ISE, BMSIT&M, Bengaluru, INDIA (manju.tn@bmsit.in)
Thrivikram Mudunuri, Department of ISE, BMSIT&M, Bengaluru, INDIA (tmudunuri@outlook.com)

Abstract— Machine learning is an application of artificial intelligence based on the theory that machines can learn from data, discern patterns and make decisions with negligible human intervention. With today's world inundated by data, machine learning is highly relevant because of the sheer amount of learning potential, and it caters to a variety of applications including image recognition, speech recognition, weather prediction and portfolio optimization. The forex exchange is a market that allows traders and investors to buy, sell and exchange the currencies of various nations. It is regarded as the largest financial market, with over 5 trillion US dollars in daily trades, larger than the equity and futures markets combined. The commodity market allows the buying, selling and exchanging of raw materials or primary products. Using machine learning, this project develops an agent that automates the trade of a given commodity or currency in a simulated market, with the objectives of maximizing returns and minimizing losses for the trader. The model learns from trends in historical market data and is capable of buying, selling or holding a trade at a given instance. The model is validated by running the agent on unseen market data of a later period and analyzing the returns generated.

Keywords— machine learning, deep learning, reinforcement learning, commodity trading, forex trading

I. INTRODUCTION

Machine learning has been the cornerstone of solutions to most problems in computer science over the last few decades. It has made its way into most modern applications of information technology and has, with that, become a rather central, although usually latent, part of our lives. With the colossal amounts of data made available to researchers and the public in general, there is good reason to believe that effective and innovative data analysis will become an even more prevalent and necessary ingredient of technological advancement [1]. Machine learning can be broadly classified into supervised learning, unsupervised learning and reinforcement learning.

Most existing machine learning systems use supervised learning or deep learning to predict the prices of currencies and commodities from historical market data, treating the task as a regression problem. This essentially helps human traders make informed decisions and take calculated risks. Since the goal of trading currencies and commodities is to make a profit rather than merely produce predictions, our experiment instead uses deep reinforcement learning to make trading decisions without human intervention [2]. Reinforcement learning uses intelligent bots, or agents, that learn from an environment by interacting with it over a period of time. This is accomplished through feedback from the environment: the agent is rewarded and/or penalized based on the actions it takes, and it uses this feedback to decide its next course of action with the goal of maximizing its reward for every action.

The union of deep learning and reinforcement learning is well suited to the profit optimization problem in forex and commodity trading. A neural network sequential model performs action prediction, while the reinforcement learning agent makes the decisions that maximize reward; working in tandem, the two optimize profit generation. The action space of trading is simplified to buy, hold or sell, which makes trading an ideal application of deep reinforcement learning [3]. Our agent is trained on historical market data to observe price trends of currencies and commodities, and is then run on an unseen period of forex and commodity market data with the aim of maximizing profit. The return on investment is calculated as a performance measure of the agent. The goal is to produce a model that maximizes returns while also minimizing losses in a bear market.

II. LITERATURE SURVEY

Typical reinforcement learning problems encompass learning what to do, that is, how to map situations to actions, with the objective of maximizing a numerical reward signal. They are essentially closed-loop problems, since the learning system's actions influence its later inputs. Most importantly, the learner is not told which actions to take, as in many other forms of machine learning, but must instead discover which actions yield the most reward by trying them out. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards, which is what makes reinforcement learning critical in many applications [4].

Supervised learning is the form of machine learning studied in most current research. It involves learning from a training set of labelled examples provided by a competent external supervisor. Each labelled example pairs the description of a situation with a specification, the label, of the ideal action the system must take in that situation.

The goal of supervised learning is to develop a system capable of extrapolating and generalizing its responses so that it makes accurate predictions for situations not present in the training set. Despite its relevance and significance in many applications, supervised learning is inadequate for learning from interaction with an environment. Reinforcement learning differs from supervised learning in that it learns from interaction, and it would complement a supervised learning system [4]. This reiterates that supervised learning, by itself, is not the best approach to the profit optimization problem in forex and commodity trading.

Google's DeepMind published a paper in 2013 presenting a technique that combines deep learning and reinforcement learning to play games with a discrete action space. It achieved tremendous results, with each type of learning supplementing the other and overcoming the other's drawbacks [5].

Forex markets are the largest markets in the world, and commodity markets deal in primary products and raw materials, making both relatively attractive to an amateur investor who may not be comfortable buying equity stock [6]. In this paper, we attempt to solve the problem of automated profit generation and loss minimization in the forex and commodity markets using deep reinforcement learning.

III. PROBLEM DEFINITION

This paper addresses the problem of automated profit generation in the forex and commodity markets. Profit is defined as the amount of money gained minus the amount of money spent. Investment is the total money spent on the purchase of a currency or commodity. The return is the amount of money obtained on sale of the purchased currencies or commodities.

For a given currency or commodity, let the purchase cost and the sale price per unit be denoted c_p and c_s respectively. The profit P per unit of the currency or commodity is then:

    P = c_s - c_p    (1)

Return on investment is a financial metric of profitability that is widely used in industry and academia to measure the return or gain from an investment. Return on investment, or ROI, is the ratio of the gain from an investment to the total expense of making the investment, including trading fees, taxes and so on. It is very useful in evaluating the potential return from a stand-alone investment [7]. ROI is used as a performance measure of the obtained results and, expressed as a percentage per unit traded, is:

    ROI = 100 * (c_s - c_p) / c_p    (2)
IV. METHODOLOGY

The methodology is organized into the following steps:

A. Data Collection
B. Data Preparation and Formatting
C. Agent Training
D. Agent Testing

A. Data Collection

The datasets used are detailed in Table I.

TABLE I. DATASETS

    Category       Dataset
    Commodities    Gold (XAU/USD)
                   Crude Oil (OPEC)
                   Silver (XAG/USD)
    Forex          United States Dollar $ (USD/INR)
                   Euro € (EUR/INR)

Historical prices of Gold and Silver are obtained from investing.com as the XAU/USD and XAG/USD trading pairs. They denote the price of one troy ounce of pure gold and one troy ounce of pure silver in United States dollars.

Historical crude oil prices are obtained from the Organization of the Petroleum Exporting Countries (OPEC) in United States dollars. The price reflects the cost of crude oil per barrel on a given day.

Historical prices of USD and EUR are obtained from investing.com as the USD/INR and EUR/INR trading pairs. They denote the price of 1.00 United States dollar and 1.00 euro in Indian rupees.

The obtained data is stored as separate relations before any further processing.

B. Data Preparation and Formatting

All the datasets are reduced to two fields, namely date and price. The datasets are first checked for null or missing values in either field, and rows containing them are removed. Since the data is a time series, the date field is treated as the index and converted into date objects for easy manipulation.

The historical prices of currencies and commodities are not entirely continuous because of trading holidays across the various markets. For convenience and easy comparison of results across markets, multiple inner joins are performed to obtain a coherent dataset containing prices of all the commodities and currencies for every day in the period, i.e. only prices of currencies and commodities traded on the same corresponding day are included. This ensures accurate plotting of graphs with date as one of the axes and also allows analysis of the relationship between the price changes of the various items.

Prices of commodities and currencies quoted in USD are multiplied by the value of the USD/INR trading pair on the corresponding date to obtain the price of the items in Indian rupees.

Prices of Gold and Silver are divided by 31.1035 to express them as cost per gram, as they are initially quoted per troy ounce. This ensures granularity in trade execution, since the price of a troy ounce of gold or silver may be significantly high for a given initial investment amount.

Oil prices are divided by 158.987 to express them as cost per liter, as they are initially quoted per barrel; likewise, the price of a whole barrel of oil may be significantly high for a given initial investment amount.
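A minimal pandas sketch of the preparation steps described above (drop missing rows, index by date, inner-join the series, convert to INR and to per-gram/per-liter prices); the file names and column labels are illustrative assumptions, not taken from the paper:

```python
import pandas as pd

TROY_OUNCE_GRAMS = 31.1035   # grams per troy ounce
BARREL_LITERS = 158.987      # liters per oil barrel

def load_series(path: str, name: str) -> pd.DataFrame:
    """Load a CSV with 'date' and 'price' columns, drop missing rows, index by date."""
    df = pd.read_csv(path, usecols=["date", "price"]).dropna()
    df["date"] = pd.to_datetime(df["date"])
    return df.set_index("date").rename(columns={"price": name})

# Hypothetical input files; the actual sources are investing.com and OPEC exports.
frames = {
    "xau_usd": load_series("xau_usd.csv", "xau_usd"),
    "xag_usd": load_series("xag_usd.csv", "xag_usd"),
    "oil_usd": load_series("oil_usd.csv", "oil_usd"),
    "usd_inr": load_series("usd_inr.csv", "usd_inr"),
    "eur_inr": load_series("eur_inr.csv", "eur_inr"),
}

# Inner-join on the date index so only days traded in every market remain.
data = frames["xau_usd"]
for name in ["xag_usd", "oil_usd", "usd_inr", "eur_inr"]:
    data = data.join(frames[name], how="inner")

# Convert USD-quoted series to INR, then to per-gram / per-liter prices.
data["gold_inr_per_gram"] = data["xau_usd"] * data["usd_inr"] / TROY_OUNCE_GRAMS
data["silver_inr_per_gram"] = data["xag_usd"] * data["usd_inr"] / TROY_OUNCE_GRAMS
data["oil_inr_per_liter"] = data["oil_usd"] * data["usd_inr"] / BARREL_LITERS
```

The chronological split of this joined dataset into training and test ranges at a fixed date is described next.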

The dataset is then divided into a training set and a test set based on a selected fixed date. The training set consists of values indexed earlier than the fixed date and the test set consists of values indexed later than it. This corresponds to predicting future prices based on historical market data.

C. Agent Training

The training process involves defining the layers of the neural network and tuning the hyperparameters to achieve optimal weights. The hyperparameters are specified in Table II. In a time-series problem, since each trade affects the next, care must be taken to choose the right parameters; this is a crucial activity and can determine the long-term outcome of the agent. The window size is set to 10 to prevent overfitting.

The mean squared error, which measures the average of the squares of the errors, is used as the loss function, and the Adam optimizer is used with a learning rate of 10^-3. Training is carried out for 800 episodes in mini-batches of 32.

TABLE II. HYPERPARAMETERS OF THE NEURAL NETWORK

    Hyperparameter     Value
    Window size        10
    Batch size         32
    Learning rate      10^-3
    Episodes           800
    Gamma              0.90
    Epsilon            1.0
    Epsilon min        0.02
    Epsilon decay      0.990
    Train set range    2010-01-01 to 2017-12-31
    Test set range     2017-01-01 to 2018-11-30
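The paper does not list the exact network architecture or update rule, but the hyperparameters in Table II (window size, gamma, epsilon schedule, MSE loss, Adam) are consistent with a DQN-style trader. The following Keras sketch is one possible reading of that setup; the layer widths, the replay buffer and the state encoding (a window of price changes) are assumptions for illustration, not specifications from the paper:

```python
import random
from collections import deque

import numpy as np
from tensorflow import keras

WINDOW_SIZE = 10      # state: last 10 price changes (assumed encoding)
ACTIONS = 3           # 0 = hold, 1 = buy, 2 = sell
GAMMA = 0.90
EPSILON, EPSILON_MIN, EPSILON_DECAY = 1.0, 0.02, 0.990
BATCH_SIZE = 32

def build_model() -> keras.Model:
    """Small sequential network; layer widths are illustrative assumptions."""
    model = keras.Sequential([
        keras.layers.Input(shape=(WINDOW_SIZE,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(ACTIONS, activation="linear"),
    ])
    model.compile(loss="mse", optimizer=keras.optimizers.Adam(learning_rate=1e-3))
    return model

model = build_model()
memory = deque(maxlen=10000)  # replay buffer of (state, action, reward, next_state, done)

def act(state: np.ndarray, epsilon: float) -> int:
    """Epsilon-greedy action selection over buy/hold/sell."""
    if random.random() < epsilon:
        return random.randrange(ACTIONS)
    q_values = model.predict(state[np.newaxis, :], verbose=0)
    return int(np.argmax(q_values[0]))

def replay() -> None:
    """One Q-learning update on a mini-batch sampled from the replay buffer."""
    if len(memory) < BATCH_SIZE:
        return
    batch = random.sample(memory, BATCH_SIZE)
    states = np.array([s for s, _, _, _, _ in batch])
    next_states = np.array([ns for _, _, _, ns, _ in batch])
    q = model.predict(states, verbose=0)
    q_next = model.predict(next_states, verbose=0)
    for i, (_, action, reward, _, done) in enumerate(batch):
        target = reward if done else reward + GAMMA * np.max(q_next[i])
        q[i, action] = target
    model.fit(states, q, epochs=1, verbose=0)

# After each training episode, exploration is annealed:
#   EPSILON = max(EPSILON_MIN, EPSILON * EPSILON_DECAY)
```

In such a setup, epsilon would be multiplied by the decay factor (0.990) after every episode until it reaches the minimum of 0.02, gradually shifting the agent from exploration to exploitation over the 800 episodes.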

D. Agent Testing

The trained agent is tested on unseen market data of a later period obtained from the same sources and allowed to execute trades in a simulated environment. The action space consists of buy, sell and hold. The absolute return, initial investment and return on investment are computed to measure the performance of the trained agent.

The return on investment achieved by the agent, expressed as a percentage, is compared to the actual return from the market. With the cost of an item on the first day of the trading period denoted c_f and the cost of the same item on the last day of the trading period denoted c_l, the actual market return M, expressed as a percentage, is computed as:

    M = 100 * (c_l - c_f) / c_f    (3)

V. RESULTS

The absolute return, initial investment and return on investment obtained over multiple runs of the agent on the market data are presented in Table III.

TABLE III. PERFORMANCE

    Market      Run       Return (₹)   Investment (₹)   Market return   ROI
    Commodity   Gold I    -1289.26     133034.25        -46.91%         -0.97%
    Commodity   Gold II   10900.47     187977.64        -46.91%         5.8%
    Commodity   Oil I     53.29        2335.40          -1.27%          2.28%
    Commodity   Oil II    180.83       1879.22          -1.27%          9.62%
    Commodity   Silver    53.29        2335.40          -3.47%          2.28%
    Forex       EUR       71.62        3512.21          3.95%           2.04%
    Forex       USD       -7.47        2436.28          -5.33%          -0.31%

    Return and Investment values are in Indian rupees (₹).

Fig. 1. Gold I Trades in 2018
Fig. 2. Gold II Trades in 2018
Fig. 3. Oil I Trades in 2018
Fig. 4. Oil II Trades in 2018
Fig. 5. Silver Trades in 2017
Fig. 6. Euro Trades in 2017
Fig. 7. US Dollar Trades in 2017

The market returns are calculated as the change in the price of the commodity or currency between the first and last dates of the given trading period. This reflects the return generated by purchasing the item on the first day of the trading period and selling it on the last day.

The losses incurred without the use of the model are significantly high in bear markets, and the returns generated without the model are noticeably lower than those generated by the model. It is therefore inferred that the agent is able to consistently generate higher returns and to minimize losses in bear markets.

VI. CONCLUSION

This experiment proposes a deep reinforcement learning method to solve the problem of automated trade execution for profit generation. Deep reinforcement learning has thus proven successful in the development of an automated trading system. The agent does not use any financial concept to derive insights or boost performance.

On comparing the market return with the agent's return on investment, the agent is observed to produce a positive profit and outperform the market on multiple occasions. The agent is also especially effective during bear runs, minimizing losses.

The major limitation of the presented approach is that the agent produces a profit only in the short term. Despite the availability of ample historical data, the model is unable to capture long-term trends because of its short-term state representation. The agent is, however, able to successfully detect short-term trends and execute trades accordingly.

Future work could focus on the agent's ability to run for longer periods, taking long-term returns into consideration. This would aid long-term investors who do not have the time or expertise to constantly monitor the markets: a one-time setup could continue to generate income for an investor with minimal human intervention or expertise.

ACKNOWLEDGMENT

We express our gratitude to the Department of Information Science and Engineering, B.M.S. Institute of Technology and Management, for providing the necessary infrastructure, guidance and motivation to undertake this experiment.

REFERENCES

[1] A. J. Smola and S. Vishwanathan, Introduction to Machine Learning, Cambridge University Press, 2008.
[2] Z. Jiang and J. Liang, "Cryptocurrency Portfolio Management with Deep Reinforcement Learning," in Intelligent Systems Conference, London, 2017.
[3] H. van Hasselt, A. Guez and D. Silver, "Deep Reinforcement Learning with Double Q-Learning," in The AAAI Conference on Artificial Intelligence, Phoenix, 2016.
[4] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, Cambridge, Massachusetts: The MIT Press, 2015.
[5] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra and M. Riedmiller, "Playing Atari with Deep Reinforcement Learning," DeepMind Technologies, NIPS Deep Learning Workshop, 2013.
[6] J. Chen, "Forex - FX," Investopedia, 2018. [Online]. Available: https://www.investopedia.com/terms/f/forex.asp.
[7] A. Beattie, "FYI on ROI: A Guide to Calculating Return on Investment," Investopedia, 3 December 2018. [Online]. Available: https://www.investopedia.com/articles/basics/10/guide-to-calculating-roi.asp.
