
RCURRENCY: Live Digital Asset Trading Using a Recurrent Neural Network-based Forecasting System
Yapeng Jasper Hu Ralph van Gurp Ashay Somai
Delft University of Technology Delft University of Technology Delft University of Technology
Distributed Systems Group Distributed Systems Group Distributed Systems Group
Delft, The Netherlands Delft, The Netherlands Delft, The Netherlands
y.j.hu@student.tudelft.nl r.v.gurp@student.tudelft.nl a.somai@student.tudelft.nl
arXiv:2106.06972v1 [cs.AI] 13 Jun 2021

Hugo Kooijman Jan S. Rellermeyer


Delft University of Technology Delft University of Technology
Distributed Systems Group Distributed Systems Group
Delft, The Netherlands Delft, The Netherlands
h.v.kooijman@student.tudelft.nl j.s.rellermeyer@tudelft.nl

Abstract—Consistent alpha generation, i.e., maintaining an edge over the market, underpins the ability of asset traders to reliably generate profits. Technical indicators and trading strategies are commonly used tools to determine when to buy/hold/sell assets, yet these are limited by the fact that they operate on known values. Over the past decades, multiple studies have investigated the potential of artificial intelligence in stock trading in conventional markets, with some success. In this paper, we present RCURRENCY, an RNN-based trading engine to predict data in the highly volatile digital asset market which is able to successfully manage an asset portfolio in a live environment. By combining asset value prediction and conventional trading tools, RCURRENCY determines whether to buy, hold or sell digital currencies at a given point in time. Experimental results show that, given the data of an interval t, a prediction with an error of less than 0.5% of the data at the subsequent interval t+1 can be obtained. Evaluation of the system through backtesting shows that RCURRENCY can be used to successfully not only maintain a stable portfolio of digital assets in a simulated live environment using real historical trading data but even increase the portfolio value over time.

Index Terms—Machine Learning, Digital currencies, Neural networks, Forecasting, Deep learning, Bitcoin

I. INTRODUCTION

Ever since the emergence of modern financial markets in the 17th century, specialists have made an effort to maximize profits from trading goods, services, stocks and securities. Consequently, consistent alpha generation [1], a term to describe the edge over the market or the ability to make profit, has remained a hot topic [2, 3]. With the advent of the digital age and the ability to process large amounts of data, many strategies have revolved around new techniques and computational breakthroughs in an effort to predict the upward and downward trends of assets (e.g., [4, 5]). The reliability of these predictions often remains questionable [6, 7]. This difficulty in predicting traditional asset data is all the more true for the recent digital asset (e.g. Bitcoin) markets, which have proven to be extraordinarily volatile [8–10].

However, the recent rise of AI and ML has opened up new opportunities. AI techniques can analyze enormous amounts of data and observe decades of conventional experience at a rate that is only limited by the complexity of the technique and the available computational power [11, 12]. Furthermore, modern machine learning algorithms are capable of discovering patterns in data that are hidden to the human eye due to the volume and complexity of the data [13]. Unsurprisingly, investment and asset management firms have started to invest billions into developing AI-based asset trading and investment applications [14].

In this paper, we present our findings and experiences in designing and implementing RCURRENCY, an automated trading engine tasked with managing a portfolio of digital assets (Section III). The primary goal of our research is to provide insights into the capabilities of AI-powered digital asset trading in a highly volatile and therefore challenging live market environment. Effectively managing a portfolio requires the AI-based trading engine to estimate when to sell off assets, buy new ones, or hold the current position. Decisions are made based on data gathered from a digital asset exchange at a regular interval. For this purpose, we combine the prediction with previous (known) values and apply a trading strategy to derive and execute a final buy/hold/sell decision. In practice, achieving a stable portfolio is considered a high bar for automatic trading engines to master.

We experimentally evaluate different trading strategies based on actual historical market data (Section IV). Our results show that the prediction accuracy is highly dependent on the configuration of the neural network. The Root Mean
Squared Error over a prediction can increase significantly when incorrect parameters are chosen. We thoroughly evaluate the most important parameter spaces and their impact on the prediction accuracy. When an adequate set of parameters is chosen, the trading engine is capable of estimating future asset values with an error of approximately 0.4%. In practice, this makes it possible to reliably predict up and down motions, as well as approximate the absolute change in value. This, in turn, allows the system to maintain and increase portfolio value over time.

II. RELATED WORK

The suggestion of predicting stock prices utilizing historic data has been an active research topic for decades. One of the early prediction methods based on an inter-temporal general equilibrium model was proposed by Balvers et al. [15]. These early methods used a purely analytical approach applying basic mathematics and statistics. However, some schools of thought even then disputed the merit of the approach and results of predicting the stock prices. For instance, it has been empirically shown that if a random walk strategy accurately reflects the reality, then all models concerning (stock) market predictions are meaningless [6].

Nowadays, more variables are added to the prediction model, such as the market volume [16], Twitter sentiment [17], investor sentiment, or media news content [18]. Whereas fundamental analysis describes the examination of data focused on macro-economic variables [19], technical analysis puts more emphasis on extracting trends and patterns of an investment instrument’s price, volume, breadth and trading activities [20–22]. The methodology proposed and analyzed in this paper focuses on technical analysis while it could be expanded to support fundamental analysis in future works.

Most recently, multiple studies have used AI techniques for prediction. Examples of concrete techniques include particle swarm optimization [23], nearest neighbor classification [24], and neural networks [25]. An empirical study by Gu et al. [26] compared multiple AI techniques in combination with trading techniques in the conventional stock market. Their fundamental result is that overall AI can be successfully utilized and the predictive capability of the AI techniques carries a significant advantage. However, using AI in highly volatile markets such as digital assets [9] has gone largely unexplored to this date.

Some initial efforts to predict prices and trends of Bitcoin and other digital currencies have been undertaken by McNally et al. [27] and Alessandretti et al. [28]. McNally et al. [27] explore prediction of Bitcoin prices based on the Simple Moving Average (SMA) technical indicator, through RNN, LSTM and ARIMA methods.

Alessandretti et al. [28] investigate price prediction of a large number of cryptocurrencies using tree boosting techniques, as well as an LSTM-based method. The researchers simulate trading of currencies, taking into account transaction fees, but fail to explore the myriad of strategies traders use to determine their investments. In either case the number of input variables is limited, resulting in the loss of potentially valuable network input data.

III. THE RCURRENCY LIVE TRADING SYSTEM

In order to address the challenge of making reliable predictions of cryptocurrency markets for live trading, we designed the RCURRENCY system and structured it into three separate components. The responsibility of the first component is to collect and preprocess the time-ordered series of data. The second component feeds this data into the neural network and, by doing so, trains it to recognize patterns in the processed data. These two components are sufficient to grant the system predictive capabilities for single steps in the time series data. The third component allows the system to make trading decisions depending not only on a single prediction, but also on an elaborate analysis of trends and patterns. It implements and applies various trading strategies, each of which calculates a trading signal based on its respective predefined rules. The flow diagram of the RCURRENCY system is visualized in Figure 1.

A. Training Data

A neural network-based prediction method is only as reliable as the data that can be leveraged to train the network. As a result, an essential step in the processing pipeline is to collect the raw data and turn it into preprocessed training data for the learning algorithm to digest. Our system obtains raw Bitcoin training data by calling an API provided by CryptoCompare (https://min-api.cryptocompare.com/). This company offers services concerning over 3,000 digital assets, including the ability to retrieve historical price data. Such data comprises candlestick data points in one-minute time frames, where each data point contains four values: the open, high, low, and close price of the time frame. The system runs a simple script to recursively and sequentially call this API to fetch historical Bitcoin price data, which consists of approximately 80,000 hourly data points, spanning back to August 2010. In contrast, live data on the current market condition is instead fetched directly from a cryptocurrency exchange to minimize acquisition delays.

The raw data then undergoes preprocessing. Unfortunately, approximately the first 30 percent of the data set contains many near-zero values that tend to deteriorate the prediction capabilities of the network. These values do not contain sufficient variance for the gradient descent of the error function to converge. As a consequence, all data up until May 2013 has been manually removed, roughly up until the time Bitcoin reached values around one hundred USD.

Next, the system applies some technical analysis on the resulting data set, which involves extracting trends and patterns from the price data by constructing technical trading indicators. Each indicator is calculated according to its respective formula, where each formula uses some (or all) of the data set as input. RCURRENCY implements the financial technical indicators. The calculated indicator values are added
as additional rows to the training data set. This extra training data provides the neural network with more information about the market and should thus lead to higher predictive accuracy.

Fig. 1: Structure and data flow in RCURRENCY.

The data is subsequently converted to what is called a stationary data set. Such a data set is created by taking the difference of each data point from the previous data point in a column-based manner.

Finally, row-based normalization is applied to the data set to fit the resulting values in the range of -1 and 1 in order to obtain better results and reduce the training time [29]. The new row-normalized matrix is mathematically defined as follows: given a matrix x with dimensions m×n, the new matrix y satisfies:

\[
y_{i,j} = 2 * \left( \frac{x_{i,j} - y_{\min}}{y_{\max} - y_{\min}} \right) - 1, \quad 1 \le i \le m, \; 1 \le j \le n. \tag{1}
\]

B. Recurrent Neural Network

There are various options available regarding networks capable of consuming a series of data points in order to make predictions on future values, such as the time series analysis technique ARIMA [30], or neural networks. As evaluated by Kohzadi et al. [31], a neural network outperforms the ARIMA model over longer periods of time, justifying its preference for our system. Specifically, a recurrent neural network functions as RCURRENCY’s predictor, which is a particular type of neural network capable of exhibiting temporal dynamic behavior. As first introduced by [32], these networks model the connections between layer nodes as a directed graph along a temporal sequence. The input data for RCURRENCY’s recurrent neural network consists of so-called candlesticks, data structures containing numerical stock market information on assets. The values in a candlestick are dependent on previous candlestick values and therefore do not follow a random walk [33].

A conventional recurrent neural network experiences problems with the gradient descent when learning long-term dependencies, as depicted in the research by Bengio et al. [34]. When an LSTM layer is used in the architecture of the recurrent neural network, the problems of exploding and vanishing gradients are mitigated [35]. An example of a long-term dependency in the current context is given by Tsinaslanidis and Kugiumtzis [36]. Their paper shows that a previously encountered subsequence of the time series data could be similar to the current trend. The consequent actions of the price following the previously encountered subsequence could carry value in the predictions of the actions following the current subsequence.

The recurrent neural network encompasses three layers:

1) Linear layer
The linear layer is required to map the nodes of the input layer to the number of nodes in the hidden layer. It is a fully connected layer, of which every edge has an individual weight.

2) FastLSTM layer
The FastLSTM layer is a special, faster version of the conventional LSTM layer. It combines the calculation of the input, forget and output gates and the hidden state in a single step. The composite functions of the standard LSTM cell are based on [37] and are defined as follows:

\[
\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f) \\
c_t &= f_t c_{t-1} + i_t \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_{t-1} + b_o) \\
h_t &= o_t \tanh(c_t).
\end{aligned}
\tag{2}
\]

However, as the network implements a FastLSTM layer rather than a conventional LSTM layer, the composite functions are slightly altered [38]:

\[
\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \\
c_t &= f_t c_{t-1} + \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) \\
h_t &= \tanh(c_t).
\end{aligned}
\tag{3}
\]

3) Output layer with Mean Squared Error (MSE) performance function
The output layer evaluates the neural network when it is used for training purposes. If the neural network is used to predict a value, the value that is returned reflects the output of the specified output layer. In this case, the MSE function is used to evaluate the network. It does that as follows:

\[
MSE = \frac{1}{m * n} \sum_{i=1}^{m} \sum_{j=1}^{n} (x_{ij} - \hat{x}_{ij})^2 \tag{4}
\]

where x_{ij} is the actual value, \hat{x}_{ij} is the predicted value in row i and column j, m is the total number of rows, and n is the total number of columns.
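As a minimal sketch of the error measure in Eq. (4), the following Python snippet computes the MSE over an m×n matrix of actual and predicted values, together with its square root (the RMSE used in the evaluation). It is an illustration only and not the authors' C++ implementation; the function names `mse` and `rmse` and the nested-list matrix layout are assumptions made for this example.

```python
import math


def mse(actual, predicted):
    """Mean squared error over an m x n matrix, following Eq. (4)."""
    m = len(actual)        # number of rows
    n = len(actual[0])     # number of columns
    total = sum(
        (actual[i][j] - predicted[i][j]) ** 2
        for i in range(m)
        for j in range(n)
    )
    return total / (m * n)


def rmse(actual, predicted):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical 2 x 2 example: only one entry differs, by 2.
actual = [[1.0, 2.0], [3.0, 4.0]]
predicted = [[1.0, 2.0], [3.0, 6.0]]
print(mse(actual, predicted), rmse(actual, predicted))
```

Because the errors are squared before averaging, a single large deviation dominates the result, which is why the RMSE penalizes large prediction errors more heavily than many small ones.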
The network also uses the Adam optimizer, introduced by Kingma and Ba [39], to speed up the training process. The Adam optimizer is computationally efficient, requires little memory, is invariant to diagonal re-scaling of the gradients, and is well suited for large quantities of data, or problems with many parameters. Therefore, it is a good fit for the solution. The resulting formula for updating parameters is:

\[
\begin{aligned}
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \\
\hat{v}_t &= \frac{v_t}{1 - \beta_2^t}, \\
\alpha_t &= \frac{\alpha * \sqrt{1 - \beta_2^t}}{1 - \beta_1^t}, \\
\theta_{t+1} &= \theta_t - \frac{\alpha_t * m_t}{\sqrt{\hat{v}_t} + \epsilon} \, \hat{m}_t,
\end{aligned}
\tag{5}
\]

where α is the step size, m_t is the mean at time t, and v_t is the uncentered variance of the gradients at time t.

C. Trading Strategies and Prediction

Technical analysis of digital assets is indeed sensitive to the trading strategy, i.e., the methods of buying and selling based on predefined rules to make trading decisions. There have been longstanding discourses on how useful trading strategies are in practice [40]. Common trading strategies based on time series pattern analysis, such as momentum and contrarian, as well as other technical analysis strategies have had varying degrees of success [24, 41, 42]. As such, it was decided to add this element to the experimental AI, giving it the capability of applying four different trading strategies based on technical indicators. Due to the vast number of strategies available and conceivable, priority was given to those that are most commonly adopted by existing trading platforms, such as TradingView (https://www.tradingview.com/) or Cryptowatch (https://cryptowat.ch/). Each trading strategy is briefly discussed below.

Every time period (minutely, hourly, daily) the neural network makes a prediction of the next period’s asset price. This value is digested by the chosen trading strategy or strategies to determine a decision signal, which is most commonly facilitated by observing the intersection between two or more moving averages. Volatility of the portfolio value can be contained by setting limits on the minimum and maximum amount of portfolio value that can be traded each time period. Altering these limits and changing the trading strategies determines the level of risk management in trading.

• Rate-of-Change (ROC)
The ROC strategy considers the change of the predicted value relative to the last actual value. When the change is below, in between, or above a lower and upper threshold, a decision is made to sell, hold or buy, respectively.

• Relative Strength Indicator (RSI)
This strategy tracks upward and downward trends in asset value. A sufficiently long series of positive value changes result in an asset being marked as overbought (indicating the holder should sell) and vice versa.

• Double Exponential Moving Averages (DEMA)
The DEMA strategy utilizes two moving averages to determine whether to buy, hold or sell an asset. Crossovers of these averages indicate buy and sell moments as the momentum changes between long-term and short-term asset value.

• Moving Average Convergence / Divergence (MACD)
MACD is a well-known strategy based on the difference between two moving averages with different lengths, called the slow and fast moving averages. This is then compared to the moving average of the difference itself, called the signal line. A decision to buy, hold or sell is made whenever the difference crosses the signal line.

• Random Walk
A controversial yet widely regarded investment theory known as the efficient market hypothesis states that all available information is immediately reflected in the value of an asset, and therefore no consistently profitable trading strategies should exist [43]. Consequently, a series of random actions should be able to approximate the results of a series of calculated trades.

D. Implementation

RCURRENCY is implemented in C++ with various libraries making up its core. First, the historical data is fetched using the API provided by CryptoCompare. Since the API is currently not rate-limited or metered, this step takes only minutes. Live data is fetched in real-time by creating a WebSocket connection with the exchange (Binance) and fetching the candles after they have formed.

After obtaining the historical data, it is preprocessed and fed to the neural network. This process utilizes two well-known libraries: Armadillo (http://arma.sourceforge.net/) and TA-Lib (http://ta-lib.org/). The Armadillo library provides highly optimized matrix computation functionality desired for regular data set manipulation. TA-Lib provides all necessary functionality for efficiently calculating technical indicators from stock market data. As such, preprocessing and preparation of the training dataset is completed in under a minute.

The recurrent neural network, as described in Section III-B, is implemented using MLPack [44]. MLPack is a fast and flexible C++ machine learning library, implementing various types of configurable neural networks and machine learning methods. It utilizes the Armadillo library for linear algebra computations, allowing for relatively low network training times. Training the network on all of the historical data between May 15th, 2013 and January 31st, 2019 takes approximately 10.5 seconds on a single commodity machine (Apple MacBook Pro model A1502, 2.4GHz Intel Core i5-4258U, 8GB RAM). The prediction of a data point as well as an associated trading decision can be obtained in under a millisecond.
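To make the rule-based strategies of Section III-C more concrete, the sketch below shows how a ROC threshold rule and a MACD crossover rule could turn a price series (history plus the predicted next value) into a buy/hold/sell signal. It is written in Python for brevity and is not the authors' C++/TA-Lib implementation. The thresholds of ±0.5% and the MACD periods 12/26/9 are assumptions chosen for illustration (the paper does not state its settings), and `ema`, `roc_signal` and `macd_signal` are hypothetical helper names.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def roc_signal(last_actual, predicted, lower=-0.005, upper=0.005):
    """ROC rule: compare the predicted relative change with two thresholds."""
    change = (predicted - last_actual) / last_actual
    if change < lower:
        return "sell"
    if change > upper:
        return "buy"
    return "hold"


def macd_signal(prices, fast=12, slow=26, signal=9):
    """MACD rule: act when the MACD line crosses its signal line."""
    if len(prices) < 2:
        return "hold"
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    macd = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd, signal)
    previous = macd[-2] - signal_line[-2]
    current = macd[-1] - signal_line[-1]
    if previous <= 0 < current:
        return "buy"    # MACD crossed above the signal line
    if previous >= 0 > current:
        return "sell"   # MACD crossed below the signal line
    return "hold"


# Hypothetical usage: append the predicted price to the known history.
history = [100.0, 101.0, 102.5, 101.8]
predicted_next = 103.0
print(roc_signal(history[-1], predicted_next))
print(macd_signal(history + [predicted_next]))
```

Appending the prediction to the known history before evaluating the indicator is one way to let a strategy "digest" the predicted value; whether the original system does exactly this is not stated in the text.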
IV. EXPERIMENTAL EVALUATION

Considering the fact that the training data set is time-dependent, Time Series k-fold Cross Validation was applied to assess the accuracy of the AI’s predictive capabilities and express them numerically in a meaningful manner. The training data set is divided into a training part (80%) of size p samples and a validation part (20%) of size q samples, where each sample represents the values of one time unit.

The value k represents how many samples are used in the validation step, or how many time units are predicted in each training-validation cycle. Next, a software environment was developed to facilitate this validation method. The software continuously trained the model on samples 1, ..., p, produced a prediction for the next hour p + 1, validated the prediction using validation sample p + 1, trained the model anew on samples 1, ..., p + 1, and so on. Setting the value of k to 1 means q cycles are performed to complete one series of cross validation. An increasing value of k means a decreasing number of training-validation cycles. After each cycle, the error is calculated and stored. Finally, an error function is used to measure the performance. A range of values for k was tested in order to obtain the lowest margin of error, as shown in Figure 2 where the Root Mean Squared Error is expressed for the number of chosen folds.

Fig. 2: Performance of the network under k-fold cross-validation.

As expected, the AI makes fewer predictive errors for a lower value of k, as a larger number of folds reduces the amount of data that is used to train the network before the prediction cycle(s). The figure also shows that the prediction error on all four elements stays well below fifty dollars for lower values of k. This effectively allows the system to accurately predict up- and downward trends, as the absolute change between each interval is on average higher than the prediction error. Since the initial plan for the trading engine was to make hourly predictions in the first place, the value of k was fixed to one. A new validation cycle is automatically performed each hour after the JavaScript API pulls new data.

A. Parameter Optimization

The performance of the neural network is highly dependent on several parameters that fine-tune different aspects of the network. These parameters relate to the topology of the network itself, as well as mathematical functions used to increase the speed of learning and overcome local optima.

To optimize the hyperparameters, the validation technique of the previous section was applied while conducting an exhaustive search for the minimal RMSE by modifying a single parameter at a time. While related works using neural networks often remain silent on how and why specific hyperparameters have been chosen, in practice they are crucial to the accuracy of the RNN and therefore to the success of any RNN-based agent.

1) Edge Weight Initialization: The initialization values of these edges will notably influence the training time of the neural network. A more optimal setting of initial weights will lead to a faster and better convergence of the final weight values, and this will reduce the training time significantly [45]. As Figure 3 shows, random numbers in the range of -0.75 and 0.75 proved to have the lowest RMSE across the different folds of the Bitcoin data set.

Fig. 3: Impact of the different weight initialization values on the RMSE.

2) Optimization Algorithm: The role of the optimization algorithm is to adjust the weights on each training iteration of the network, impacting both the speed of convergence as well as the accuracy of the prediction [46]. Stochastic Gradient Descent (SGD) methods are the most commonly used optimization algorithms. After testing it with different step sizes against the Adam optimizer [39], the latter proved to give the best balance of speed and accuracy at a learning rate of 0.01. The results are shown in Figure 4a.

3) Rho Value (Look-back): The rho value sets the length of the back-propagation through time (BPTT) used to train the network. Effectively, it determines how many recent values are used to predict the output at the next step in the time series. As shown in Figure 4b, the optimal value was found to be around 150.
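The interplay between the look-back length and the walk-forward validation with k = 1 can be sketched as follows. This Python snippet is an illustration under simplifying assumptions, not the authors' implementation: the series is one-dimensional, the model is not retrained between cycles, and `predict` is a hypothetical callable standing in for the trained network. The names `walk_forward_rmse` and `last_value` are introduced here for the example only.

```python
import math


def walk_forward_rmse(series, p, rho, predict):
    """Walk-forward validation with k = 1.

    The first p samples form the initial training part. For every later
    sample t, the rho most recent known values are passed to `predict`,
    and the prediction is compared with the actual value series[t].
    Assumes rho <= p so that every window is complete.
    """
    errors = []
    for t in range(p, len(series)):
        window = series[t - rho:t]          # look-back window of length rho
        errors.append(predict(window) - series[t])
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def last_value(window):
    """Naive stand-in predictor: repeat the most recent known value."""
    return window[-1]


# Hypothetical usage on a short synthetic series.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(walk_forward_rmse(series, p=4, rho=2, predict=last_value))
```

A naive predictor such as `last_value` also gives a useful reference point: a trained network is only informative if its RMSE under the same procedure is lower.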
4) Hidden Layer Complexity: The complexity of the hidden layer refers to the number of hidden nodes used in the FastLSTM layer. Increasing the number of nodes brings the potential to learn more complex rules, but at the same time requires more data to train. Furthermore, it increases the risk of over-fitting and thereby losing the ability to generalize, which results in inaccurate results when the network is used with slightly different data sets. Figure 5 shows that the RMSE was the lowest at approximately 36 hidden layer nodes.

Fig. 4: Impact of parameter optimization on the RMSE.

Fig. 5: Performance of the different hidden layer sizes, measured against the RMSE.

B. Evaluating Trading Strategy Results

This section discusses the results obtained by validating the trading engine and each of its five previously described trading strategies on historical data, a technique popularly known as backtesting. The set of historical data points acts as a simulation of live input and updates the portfolio by executing hypothetical ("dummy") trades. The performance of each trading strategy in minimizing losses to the portfolio is dependent on the quality of the prediction used to determine a course of action, as well as the quality of the strategy itself. The recurrent neural network in these experiments utilizes the optimized parameter configuration as mentioned in the previous section to maximize prediction accuracy.

Each experiment concerns trading the BTC/USDT pair, with BTC as the base currency and USDT as the quote currency. The goal is to maximize the portfolio’s total monetary value in the quote currency. The total value is calculated by converting the portfolio to quote currency at each period.

However, merely achieving a higher monetary value does not imply successful trading if the trader is outperformed by the market itself. If the BTC/USDT value was only ever increasing during a certain time period, then even taking no action whatsoever would have turned a profit. Incidentally, this is known as the Buy-And-Hold strategy. With this strategy, a designated asset volume is simply bought and held; no further trading actions are taken. This strategy therefore provides an appropriate baseline for comparison, also known as "trading against Alpha". The results of this comparison and the experiments in general are shown in Figure 6.

Fig. 6: Portfolio growth of 10,000 USDT between January 31st and September 13th, 2019.

The AI was trained on approximately 90% of the data set and used to predict the remaining 10% of the data. Each strategy was tested independently, each receiving an initial portfolio containing 10,000 USDT and subsequently used for trades using the stock market data spanning January 31st to September 13th 2019, approximately 5,400 hours. The Alpha baseline was generated by simply converting the 10,000 USDT into BTC at t = 0 and taking no further trading actions. Some limitations have been imposed on the trading agent in order to achieve more stable results. Primarily, a maximum of 25% of the total portfolio value may be traded at a time. Secondarily, no trades will be made if the expected value increase across the traded volume is smaller than the trading fee (set at 0.1%).

Figure 6 shows that during this part of historical Bitcoin data the RSI strategy managed to achieve the highest portfolio value, with the MACD strategy coming in as a very close second and the DEMA strategy performing only slightly worse. However, all three strategies managed to maintain a stable portfolio value, keeping up with and even outperforming the Alpha baseline. The ROC strategy performs almost as badly as the Random Walk, both barely managing to retain a third of the Alpha baseline value.
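The backtesting setup described above (a 25% cap per trade, a 0.1% fee, no trade when the expected move does not cover the fee, and a Buy-And-Hold baseline) can be sketched as follows. This is a simplified Python illustration, not the authors' implementation. The function names `backtest` and `buy_and_hold` are hypothetical, the fee-free baseline is an assumption (the paper does not say whether the initial conversion is charged), and the signals are assumed to be supplied by some strategy.

```python
def backtest(prices, predictions, signals, cash=10_000.0,
             max_fraction=0.25, fee=0.001):
    """Replay prices step by step and return the final value in quote currency.

    prices[i]      -- actual price at step i
    predictions[i] -- predicted price for the next step, made at step i
    signals[i]     -- "buy", "sell" or "hold", produced by a strategy
    """
    base = 0.0  # amount of base currency (e.g. BTC) held
    for price, predicted, signal in zip(prices, predictions, signals):
        # Skip the trade if the expected relative move does not cover the fee.
        if abs(predicted - price) / price < fee:
            continue
        total = cash + base * price
        budget = max_fraction * total   # at most 25% of portfolio value per trade
        if signal == "buy":
            spend = min(cash, budget)
            base += spend * (1 - fee) / price
            cash -= spend
        elif signal == "sell":
            volume = min(base, budget / price)
            cash += volume * price * (1 - fee)
            base -= volume
    return cash + base * prices[-1]


def buy_and_hold(prices, cash=10_000.0):
    """Baseline: convert everything at the first price and hold until the end."""
    return cash / prices[0] * prices[-1]


# Hypothetical two-step example.
prices = [100.0, 110.0]
predictions = [110.0, 110.0]
signals = ["buy", "hold"]
print(backtest(prices, predictions, signals), buy_and_hold(prices))
```

In this toy example the capped strategy ends below the baseline because only a quarter of the portfolio participates in the price increase, which illustrates why the cap stabilizes results at the cost of slower gains in a rising market.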
Although the AI is able to outperform the Alpha baseline Fig. 7: Sharpe Ratios of the six trading strategies between
with certain strategies, the question remains whether the profit February 13th and September 13th, 2019.
merits the risks involved. One way of answering this question
is by calculating the Sharpe Ratio [47], which gives an
indication of how well an investor is compensated by the asset value) and seven values derived from this data (technical
returns of an asset compared to a risk-free asset with respect indicators), while the four outputs represent the predictions for
to the risk. The Sharpe Ratio is calculated by subtracting some the four asset values. Performance is measured using the Root
risk-free asset return rate from the considered asset return rate, Mean Squared Error, which emphasizes the gravity of large
divided by the standard-deviation of the considered asset return prediction errors. Results obtained from the TSKCV suggest
rate. Figure 7 shows the monthly Sharpe Ratios of each of the that the network is capable of predicting new asset values
six trading strategies measured over the final seven (virtual) with approximately 0.4% error across the four output channels.
months of the backtesting period, ranging from February 13th The strategies have varying rates of success in maintaining
to September 13th, 2019. The return rate R is calculated as a steady portfolio value throughout the historical dataset but
the percentage of excess returns relative to the start of each month. As for determining the risk-free rate of return Rf, monthly U.S. Treasury Bills (T-Bills) made for an appropriate risk-free asset, as the portfolio is measured in USDT. The T-Bills during the backtesting period had a risk-free rate of roughly 2%, as indicated on the website of the United States Treasury Department [48]. To put these numbers into perspective, a Sharpe Ratio above 1 is generally considered a good investment, and one above 2 is considered excellent [49].

The diagram reveals that even though the RSI and MACD strategies attained the highest portfolio values, it is the DEMA strategy that holds the highest Sharpe Ratio. Since the returns (R) of the RSI and MACD strategies were higher and the risk-free rate of return (Rf) of each month was the same for every strategy, the explanation is that the RSI and MACD strategies displayed a higher volatility (expressed as σR) than the DEMA strategy during this backtesting period. The Random Walk strategy suffered both major losses and high volatility, so its Sharpe Ratio fell below zero. Although the ROC strategy suffered similar losses, it managed to keep a positive Sharpe Ratio due to lower volatility levels.

V. CONCLUSION AND FUTURE WORK

With RCURRENCY, we have built a viable AI-based trading agent that is able to operate within the highly volatile cryptocurrency market and even generate profit under real market conditions. RCURRENCY uses a FastLSTM-backed recurrent neural network to predict future asset prices. Network inputs are a combination of direct data (four values representing

every single one performs better than the randomized strategy. Although RCURRENCY was tested on Bitcoin, its design allows the system to support other digital assets as well.

No strategy was able to correctly predict large and sudden drops in asset value. This, however, is a common problem of existing AI-based solutions, and future research could shed light on whether a broader set of input data improves the ability to predict such events.

Neural networks generally perform better given more training data. One readily available input is volume data, which measures what quantity of an asset has been traded in a given period of time. Unfortunately, volume data had to be excluded because it was unknown from which exchanges it originated, a problem that would likely lead to poor predictions. Establishing the origins of the volume data would provide the system with additional training data.

Supporting an alternative digital asset requires that the training data include the historical data of that asset. Subsequently, the hyper-parameters have to be verified or re-optimized for the newly added data. Another possible extension is to combine the historical data of multiple assets in separate slices per asset, giving the model the possibility to exploit correlations between the slices.

Lastly, it is a common trader's tactic to combine multiple trading strategies to maximize profits. Further simulations could test how the performance of the trading engine changes when two, three, or even more trading strategies are combined.
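
As a supplementary illustration of the Sharpe Ratio comparison in the evaluation, the following minimal sketch computes (R - Rf) / σR from a list of monthly returns. It is not the code used by RCURRENCY: the function name `sharpe_ratio`, the sample return series, and the conversion of the roughly 2% T-Bill rate into a monthly rate by dividing by 12 are all assumptions made for illustration.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free_rate):
    """Return (mean(R) - Rf) / sigma_R for a series of periodic returns.

    `returns` and `risk_free_rate` must use the same period (here: months)
    and the same unit (fractions, e.g. 0.03 for 3%).
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns to estimate volatility")
    sigma = stdev(returns)  # sample standard deviation, used as sigma_R
    if sigma == 0:
        raise ValueError("volatility is zero; the Sharpe Ratio is undefined")
    return (mean(returns) - risk_free_rate) / sigma


# Hypothetical monthly returns, NOT taken from the paper's backtest.
steady = [0.03, 0.02, 0.04, 0.03]       # lower average return, low volatility
volatile = [0.15, -0.08, 0.12, -0.05]   # higher average return, high volatility
losing = [-0.05, -0.10, 0.02, -0.07]    # negative average return

# Assumption: a 2% annual risk-free rate spread evenly over 12 months.
monthly_rf = 0.02 / 12

if __name__ == "__main__":
    for name, series in [("steady", steady), ("volatile", volatile), ("losing", losing)]:
        print(f"{name}: mean={mean(series):.4f}, sharpe={sharpe_ratio(series, monthly_rf):.2f}")
```

With these made-up numbers, the volatile series has the higher average return but the lower Sharpe Ratio, which mirrors the explanation given for DEMA versus RSI and MACD. The losing series ends up with a negative ratio because its average return lies below the risk-free rate.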
REFERENCES

[1] W. H. Gross, “Consistent alpha generation through structure,” Financial Analysts Journal, vol. 61, no. 5, pp. 40–43, 2005.
[2] G. P. C. Fung, J. X. Yu, and W. Lam, “News sensitive stock trend prediction,” in Adv. in Knowledge Discovery and Data Mining, M.-S. Chen, P. S. Yu, and B. Liu, Eds., 2002.
[3] E. W. Saad, D. V. Prokhorov, and D. C. Wunsch, “Comparative study of stock trend prediction using time delay, recurrent and probabilistic neural networks,” IEEE Transactions on neural networks, vol. 9, no. 6, pp. 1456–1470, 1998.
[4] M.-C. Lee, “Using support vector machine with a hybrid feature selection method to the stock trend prediction,” Expert Systems with Applications, vol. 36, no. 8, pp. 10896–10904, 2009.
[5] L.-P. Ni, Z.-W. Ni, and Y.-Z. Gao, “Stock trend prediction based on fractal feature selection and support vector machine,” Expert Systems with Applications, vol. 38, no. 5, 2011.
[6] E. F. Fama, “Random walks in stock market prices,” Financial analysts journal, vol. 51, no. 1, pp. 75–80, 1995.
[7] C. S. Vui, G. K. Soon, C. K. On, R. Alfred, and P. Anthony, “A review of stock market prediction with artificial neural network (ann),” in 2013 IEEE International Conference on Control System, Computing and Engineering, 2013.
[8] R. Ali, J. Barrdear, R. Clews, and J. Southgate, “The economics of digital currencies,” Bank of England Quarterly Bulletin, p. Q3, 2014.
[9] A. H. Dyhrberg, “Bitcoin, gold and the dollar–a garch volatility analysis,” Finance Research Letters, vol. 16, 2016.
[10] D. Yermack, “Is bitcoin a real currency? an economic appraisal,” in Handbook of digital currency. Elsevier, 2015, pp. 31–43.
[11] L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in COMPSTAT’2010, 2010, pp. 177–186.
[12] L. Bottou, F. E. Curtis, and J. Nocedal, “Optimization methods for large-scale machine learning,” Siam Review, vol. 60, no. 2, pp. 223–311, 2018.
[13] I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, 2016.
[14] J. Lee, “One of the world’s oldest quants is going all-in with robots,” Bloomberg, 2019.
[15] R. J. Balvers, T. F. Cosimano, and B. McDonald, “Predicting stock returns in an efficient market,” The Journal of Finance, vol. 45, no. 4, pp. 1109–1128, 1990.
[16] C. Brooks, “Predicting stock index volatility: can market volume help?” Journal of Forecasting, vol. 17, no. 1, pp. 59–80, 1998.
[17] X. Zhang, H. Fuehres, and P. A. Gloor, “Predicting stock market indicators through twitter ‘i hope it is not as bad as i fear’,” Procedia-Social and Behavioral Sciences, vol. 26, 2011.
[18] P. C. Tetlock, “Giving content to investor sentiment: The role of media in the stock market,” The Journal of finance, vol. 62, no. 3, pp. 1139–1168, 2007.
[19] J. S. Abarbanell and B. J. Bushee, “Fundamental analysis, future earnings, and stock prices,” Journal of accounting research, vol. 35, no. 1, pp. 1–24, 1997.
[20] M. C. Thomsett, Mastering technical analysis. Dearborn Trade Publishing, 1999.
[21] M. J. Pring, Technical analysis explained: The successful investor’s guide to spotting investment trends and turning points. McGraw-Hill New-York, 1991, vol. 4.
[22] T. A. Meyers, The Technical Analysis Course: A Winning Program for Investors and Traders. Irwin Prof. Publ., 1994.
[23] J. Nenortaite and R. Simutis, “Stocks’ trading system based on the particle swarm optimization algorithm,” in International Conference on Computational Science, 2004.
[24] L. A. Teixeira and A. L. I. De Oliveira, “A method for automatic stock trading combining technical analysis and nearest neighbor classification,” Expert systems with applications, vol. 37, no. 10, pp. 6885–6890, 2010.
[25] M. Lam, “Neural network techniques for financial performance prediction: integrating fundamental and technical analysis,” Decision support systems, vol. 37, no. 4, pp. 567–581, 2004.
[26] S. Gu, B. Kelly, and D. Xiu, “Empirical asset pricing via machine learning,” NBER, Tech. Rep., 2018.
[27] S. McNally, J. Roche, and S. Caton, “Predicting the price of bitcoin using machine learning,” in 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). IEEE, 2018, pp. 339–343.
[28] L. Alessandretti, A. ElBahrawy, L. M. Aiello, and A. Baronchelli, “Anticipating cryptocurrency prices using machine learning,” Complexity, vol. 2018, 2018.
[29] J. Sola and J. Sevilla, “Importance of input data normalization for the application of neural networks to complex industrial problems,” IEEE Trans Nucl Sci, vol. 44, no. 3, 1997.
[30] T. C. Mills, Time series techniques for economists. Cambridge University Press, 1991.
[31] N. Kohzadi, M. S. Boyd, B. Kermanshahi, and I. Kaastra, “A comparison of artificial neural network and time series models for forecasting commodity prices,” Neurocomputing, vol. 10, no. 2, pp. 169–181, 1996.
[32] T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, “Recurrent neural network based language model,” in Eleventh annual conference of the international speech communication association, 2010.
[33] A. W. Lo and A. C. MacKinlay, “Stock market prices do not follow random walks: Evidence from a simple specification test,” The review of financial studies, vol. 1, no. 1, pp. 41–66, 1988.
[34] Y. Bengio, P. Simard, P. Frasconi et al., “Learning long-term dependencies with gradient descent is difficult,” IEEE transactions on neural networks, vol. 5, no. 2, pp. 157–166, 1994.
[35] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., 1997.
[36] P. E. Tsinaslanidis and D. Kugiumtzis, “A prediction scheme using perceptually important points and dynamic time warping,” Expert Systems with Applications, vol. 41, no. 15, pp. 6848–6860, 2014.
[37] A. Graves, A.-r. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in Acoustics, speech and signal processing (icassp), 2013 ieee international conference on. IEEE, 2013, pp. 6645–6649.
[38] M. Edel, “Fast lstm layer documentation,” jun 2018. [Online]. Available: https://github.com/mlpack/mlpack/blob/master/src/mlpack/methods/ann/layer/fast_lstm.hpp
[39] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[40] P. J. Knez and M. J. Ready, “Estimating the profits from trading strategies,” The Review of Fin. Studies, vol. 9, no. 4, 1996.
[41] J. Conrad and G. Kaul, “An anatomy of trading strategies,” The Review of Financial Studies, vol. 11, no. 3, pp. 489–519, 1998.
[42] H. Bessembinder and K. Chan, “Market efficiency and the returns to technical analysis,” Financial mgmnt, 1998.
[43] S. Basu, “Investment performance of common stocks in relation to their price-earnings ratios: A test of the efficient market hypothesis,” The journal of Finance, vol. 32, no. 3, pp. 663–682, 1977.
[44] R. Curtin, M. Edel, M. Lozhnikov, Y. Mentekidis, S. Ghaisas, and S. Zhang, “Mlpack 3: A fast, flexible machine learning library,” Journal of Open Source Softw., 2018.
[45] J. Y. Yam and T. W. Chow, “Feedforward networks training speed enhancement by optimal initialization of the synaptic coefficients,” IEEE Trans Neural Netw, vol. 12, no. 2, 2001.
[46] “A weight initialization method for improving training speed in feedforward neural network,” Neurocomputing, vol. 30, no. 1, pp. 219–232, 2000.
[47] W. F. Sharpe, “Mutual fund performance,” The Journal of business, vol. 39, no. 1, pp. 119–138, 1966.
[48] U.S. Department of the Treasury, “Daily treasury yield curve rates,” https://bit.ly/3kYRIax, 2019.
[49] W. Goetzmann, J. Ingersoll, M. I. Spiegel, and I. Welch, “Sharpening sharpe ratios,” Tech. Rep., 2002.
