ABSTRACT
Designing profitable and reliable trading strategies is challenging in the highly volatile cryptocurrency market. Existing works applied deep reinforcement learning methods and optimistically reported increased profits in backtesting, which may suffer from the false positive issue due to overfitting. In this paper, we propose a practical approach to address backtest overfitting for cryptocurrency trading using deep reinforcement learning. First, we formulate the detection of backtest overfitting as a hypothesis test. Then, we train the DRL agents, estimate the probability of overfitting, and reject the overfitted agents, increasing the chance of good trading performance. Finally, on 10 cryptocurrencies over a testing period from 05/01/2022 to 06/27/2022 (during which the crypto market crashed two times), we show that the less overfitted deep reinforcement learning agents have a higher Sharpe ratio than that of more overfitted agents, an equal-weight strategy, and the S&P BDM Index (market benchmark), offering confidence in possible deployment to a real market.

CCS CONCEPTS
• Computing methodologies → Markov decision processes; Neural networks; Reinforcement learning.

KEYWORDS
Deep reinforcement learning, Markov decision process, cryptocurrency trading, backtest overfitting

ACM Reference Format:
Berend Gort, Xiao-Yang Liu, Xinghang Sun, Jiechao Gao, Shuaiyu Chen, and Christina Dan Wang. 2022. Deep Reinforcement Learning for Cryptocurrency Trading: Practical Approach to Address Backtest Overfitting. In Proceedings of ACM Conference (Conference'17). ACM, New York, NY, USA, 9 pages. https://doi.org/XXXXXXX.XXXXXXX

∗ B. Gort finished this project as a research assistant at Columbia University.
† Corresponding author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Conference'17, July 2017, Washington, DC, USA
© 2022 Association for Computing Machinery.
ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. . . $15.00
https://doi.org/XXXXXXX.XXXXXXX

1 INTRODUCTION
A profitable and reliable trading strategy in the cryptocurrency market is critical for hedge funds and investment banks. Deep reinforcement learning (DRL) methods prove to be a promising approach [17], including crypto portfolio allocation [2, 27] and trade execution [23]. However, three major challenges prohibit adoption and deployment in a real market: 1) the cryptocurrency market is highly volatile; 2) the historical market data have a low signal-to-noise ratio [13]; and 3) there are large fluctuations (e.g., market crashes) in the cryptocurrency market.
Existing works may suffer from the backtest overfitting issue. Many methods [35, 50, 51] adopted a walk-forward backtest method and optimistically reported increased profits in backtesting. The walk-forward method divides data in a training-validation-testing manner, but using a single validation set can easily result in overfitting. Another approach [27] considered a k-fold cross-validation method (essentially leave one period out) under the assumption that the training set and the validation sets are drawn from an IID process, which does not hold in financial tasks. For example, [36] showed that there is a very strong time-series momentum effect in crypto returns. Finally, DRL algorithms are highly sensitive to hyperparameters, resulting in high variability of their performance [12, 24, 38]. A researcher may get "lucky" on the validation set and obtain a false positive agent. A key question from practitioners is: "Are the results reproducible under different market situations?"
This paper proposes a practical approach to address the backtest overfitting issue. Researchers in the DRL cryptocurrency trading niche may submit papers containing overfitted backtest results, so a quantitative metric for detecting overfitting is of significant value. First, we formulate the detection of backtest overfitting as a hypothesis test. Such a test employs an estimated probability of overfitting to determine whether a trained agent is acceptable. Second, we provide detailed steps to estimate the probability of backtest overfitting. If the probability exceeds the threshold of the hypothesis test, we reject the agent. Finally, on 10 cryptocurrencies over a testing period from 05/01/2022 to 06/27/2022 (during which the crypto market crashed two times), we show that the less overfitted deep reinforcement learning agents have a higher Sharpe ratio than that of the more overfitted agents, an equal-weight strategy, and the S&P BDM Index (market benchmark), offering confidence in possible deployment to a real market. The DRL-based strategy appears to be related to volatility-managed strategies, which buy (sell) when volatility is high (low) and are popular in the asset management industry [40]. We hope DRL practitioners may apply the
2 RELATED WORKS
Existing works can be classified into three categories: 1) backtests using the walk-forward method; 2) backtests using cross-validation methods; and 3) backtests with hyperparameter tuning.
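The difference between the first two categories comes down to how the historical data are split. The following is an illustrative sketch (not code from any of the surveyed works) of the two schemes over a time series of length n, with the split fractions and fold count chosen only for illustration:

```python
# Illustrative sketch (not the surveyed papers' code) of two backtest
# data-splitting schemes over indices 0..n-1 of a time series.

def walk_forward(n, train_frac=0.6, val_frac=0.2):
    """Single chronological train/validation/test split."""
    t_end = int(n * train_frac)
    v_end = int(n * (train_frac + val_frac))
    return (list(range(t_end)),
            list(range(t_end, v_end)),
            list(range(v_end, n)))

def k_fold(n, k=5):
    """Leave one contiguous period out as validation; train on the rest.
    Treating folds as exchangeable assumes IID data, which rarely holds
    for financial time series."""
    fold = n // k
    splits = []
    for i in range(k):
        lo, hi = i * fold, ((i + 1) * fold if i < k - 1 else n)
        val = list(range(lo, hi))
        train = list(range(0, lo)) + list(range(hi, n))
        splits.append((train, val))
    return splits
```

The walk-forward scheme validates on a single hold-out period, which is exactly why a lucky validation window can mask overfitting; the k-fold scheme reuses every period for validation but implicitly assumes the folds are interchangeable.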
indicators. Over the training period (from 02/02/2022 to 04/30/2022, as shown in Fig. 3), we compute Pearson correlations of the features and obtain a correlation matrix in Fig. 1. We list the 9 technical indicators as follows:
• Relative Strength Index (RSI) measures price fluctuation.
• Moving Average Convergence Divergence (MACD) is a momentum indicator for moving averages.
• Commodity Channel Index (CCI) compares the current price to the average price over a period.
• Directional Index (DX) measures the trend strength by quantifying the amount of price movement.
• Rate of Change (ROC) is the speed at which a variable changes over a period [20].
• Ultimate Oscillator (ULTSOC) measures the price momentum of an asset across multiple timeframes [21].
• Williams %R (WILLR) measures overbought and oversold levels [43].
• On Balance Volume (OBV) measures buying and selling pressure as a cumulative indicator that adds volume on up-days and subtracts volume on down-days [20].
• The Hilbert Transform Dominant (HT) is used to generate in-phase and quadrature components of a detrended real-valued signal to analyze variations of the instantaneous phase and amplitude [41].
As shown in Fig. 1, if two features have a correlation exceeding ±60%, we drop one of the two. Finally, I = 6 uncorrelated features are kept in the feature vector f_t ∈ R^{6D}: trading volume, RSI, DX, ULTSOC, OBV, and HT. Since the OHLC prices are highly correlated, p_t in s_t is chosen to be the close price over the period [t, t+1]. Note that the close price of period [t, t+1] equals the open price of period [t+1, t+2]. The feature vector f_t characterizes the market situation. For the case D = 10, s_t has size 81.

3.2 Building Market Environment
We build a market environment by replaying historical data, following the style of OpenAI Gym [9]. A trading agent interacts with the market environment in multiple episodes, where an episode replays the market data (a time series) in a time-driven manner from t = 0 to t = T − 1. At the beginning (t = 0), the environment sends an initial state s_0 to the agent, which returns an action a_0. Then, the environment executes the action a_t and sends a reward value r_t and a new state s_{t+1} to the agent, for t = 0, ..., T − 1. Finally, s_{T−1} is set to be the terminal state.
The market environment has the following three functions:
• reset function resets the environment to s_0 = [b_0, h_0, p_0, f_0], where b_0 is the investment capital and h_0 = 0 (zero vector) since there are no share holdings yet.
• step function takes an action a_t and updates state s_t to s_{t+1}. For s_{t+1} at time t + 1, p_{t+1} and f_{t+1} are accessible by looking up the time series of market data, and we update b_{t+1} and h_{t+1} as follows:
        b_{t+1} = b_t − p_t^T a_t,    h_{t+1} = h_t + a_t.    (1)
• reward function computes the change in portfolio value v_t = b_t + p_t^T h_t, i.e., r(s_t, a_t, s_{t+1}) = v_{t+1} − v_t, as follows:
        r(s_t, a_t, s_{t+1}) = (b_{t+1} + p_{t+1}^T h_{t+1}) − (b_t + p_t^T h_t).    (2)

Trading constraints
1) Transaction fees. Each trade has transaction fees, and different brokers charge varying commissions. For cryptocurrency trading, we assume that the transaction cost is 0.3% of the value of each trade. Therefore, (2) becomes
        r(s_t, a_t, s_{t+1}) = (b_{t+1} + p_{t+1}^T h_{t+1}) − (b_t + p_t^T h_t) − c_t,    (3)
and the transaction fee c_t is
        c_t = p_t^T |a_t| × 0.3%,    (4)
where |a_t| means taking the entry-wise absolute value of a_t.
2) Non-negative balance. We do not allow shorting; thus we make sure that b_{t+1} ∈ R_+ is non-negative,
        b_{t+1} = b_t − p_t^T a_t^S − p_t^T a_t^B ≥ 0,  for t = 0, ..., T − 1,    (5)
where a_t^S ∈ R_−^D and a_t^B ∈ R_+^D denote the selling orders and buying orders, respectively, such that a_t = a_t^S + a_t^B. Therefore, action a_t is executed as follows: first execute the selling orders a_t^S and then the buying orders a_t^B; if there is not enough cash, a buying order will not be executed.
3) Risk control. The cryptocurrency market regularly drops in terms of market capitalization, sometimes even by ≥ 70%. To control the risk in these market situations, we employ the Cryptocurrency Volatility Index (CVIX) [6]. Extreme market situations increase the value of CVIX. Once the CVIX exceeds a certain threshold, we stop buying and sell all our cryptocurrency holdings. We resume trading once the CVIX returns under the threshold.

3.3 Training a Trading Agent
A trading agent learns a policy π(a_t|s_t) that maximizes the discounted cumulative return R = Σ_{t=0}^{∞} γ^t r(s_t, a_t, s_{t+1}), where γ ∈ (0, 1] is a discount factor and r(s_t, a_t, s_{t+1}) is given in (3). The Bellman equation gives the optimality condition for an MDP problem, which takes a recursive form as follows:
        Q^π(s_t, a_t) = E_{s_{t+1}}[r(s_t, a_t, s_{t+1})] + γ E_{a_{t+1}}[Q^π(s_{t+1}, a_{t+1})].    (6)
There are tens of DRL algorithms that can be adapted to crypto trading. Popular ones are TD3 [19], SAC [22], and PPO [47].
Next, we describe a general flow of agent training. At the beginning of training, we set hyperparameters such as the learning rate, batch size, etc. DRL algorithms are highly sensitive to hyperparameters, meaning that an agent's trading performance may vary significantly. We use multiple sets of hyperparameters in the training stage for different trials. Each trial trains with one set of hyperparameters and obtains a trained agent. Then, we pick the DRL agent with the best-performing hyperparameters and re-train it on the whole training data.

3.4 Backtest Overfitting Issue
Backtesting [14] uses historical data to simulate the market and evaluates the performance of an agent, namely, how an agent would have performed had it been run over a past time period. Researchers often perform backtests by splitting the data into two
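To make the environment mechanics concrete, here is a hedged sketch (not the authors' implementation) of a single step that applies the balance and holdings update (1), executes sells before buys, skips unaffordable buys, charges the 0.3% fee of (4), and returns the reward of (3). The fee is folded into the balance updates rather than subtracted as a separate c_t term, which yields the same reward value:

```python
import numpy as np

FEE = 0.003  # 0.3% transaction cost, as in Eq. (4)

def step(b, h, p, p_next, a):
    """One environment step. b: cash balance, h: holdings vector,
    p: current prices, p_next: next-period prices, a: order vector
    (negative entries = sell, positive = buy). Returns (b, h, reward)."""
    v_t = b + p @ h                      # portfolio value before the step
    sells = np.minimum(a, 0.0)           # a_t^S: non-positive entries
    buys = np.maximum(a, 0.0)            # a_t^B: non-negative entries
    # Sells execute first and add cash, net of the 0.3% fee.
    b += p @ (-sells) * (1.0 - FEE)
    h = h + sells
    # A buy executes only if the remaining cash covers cost plus fee;
    # otherwise it is skipped (non-negative balance constraint).
    for i in range(len(buys)):
        cost = p[i] * buys[i] * (1.0 + FEE)
        if 0.0 < cost <= b:
            b -= cost
            h[i] += buys[i]
    v_next = b + p_next @ h              # value at the new prices
    return b, h, v_next - v_t            # reward as in Eq. (3)
```

The order of execution mirrors the text: selling frees cash that subsequent buying orders may use within the same step.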
Figure 4: Logit distribution f(λ) of three conventional agents.
Figure 5: Logit distribution f(λ) of three DRL agents.
Metrics × Method          S&P BDM Index   Equal-weight   PPO WF    PPO KCV   PPO       TD3       SAC
Cumulative return         −50.78%         −47.78%        −49.39%   −55.54%   −34.96%   −59.08%   −59.48%
Sharpe ratio              −1.89           −1.73          −1.67     −1.89     −1.59     −1.86     −1.88
Prob. of overfitting p    -               -              17.5%     7.9%      8.0%      9.6%      21.3%
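The "Prob. of overfitting" row and the logit distributions f(λ) in Figures 4 and 5 follow the estimator of Bailey et al. [3]: for each data split, select the best configuration in-sample, compute its relative rank out-of-sample, and form a logit λ; the probability of backtest overfitting is the fraction of logits at or below zero. The following is a hedged sketch of that idea, not the paper's exact procedure; the (n + 1) rank normalization is one common choice to keep the rank strictly inside (0, 1):

```python
import numpy as np

def prob_of_backtest_overfitting(is_perf, oos_perf):
    """is_perf, oos_perf: arrays of shape (n_splits, n_configs) holding
    in-sample and out-of-sample Sharpe ratios per split and per
    hyperparameter configuration. Returns (PBO estimate, logits)."""
    n_splits, n_configs = is_perf.shape
    logits = []
    for s in range(n_splits):
        best = int(np.argmax(is_perf[s]))        # config chosen in-sample
        rank = int((oos_perf[s] <= oos_perf[s, best]).sum())
        w = rank / (n_configs + 1.0)             # relative OOS rank in (0, 1)
        logits.append(np.log(w / (1.0 - w)))     # logit lambda
    logits = np.array(logits)
    # lambda <= 0 means the in-sample winner ranked below the OOS median.
    return float((logits <= 0).mean()), logits
```

An agent whose in-sample winners keep ranking high out-of-sample produces a logit distribution concentrated above zero and hence a small estimated probability of overfitting.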
Figure 6: The average trading cumulative return curves for conventional agents. From 05/01/2022 to 06/27/2022, the initial capital is $1,000,000.
Figure 7: The average trading cumulative return curves for DRL algorithms. From 05/01/2022 to 06/27/2022, the initial capital is $1,000,000.
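The two headline metrics in the table, cumulative return and Sharpe ratio, can be computed from a series of periodic portfolio values. This is a generic sketch with standard definitions; the annualization factor periods_per_year is an assumption, not a value taken from the paper:

```python
import numpy as np

def cumulative_return(values):
    """Cumulative return over a series of portfolio values."""
    return values[-1] / values[0] - 1.0

def sharpe_ratio(values, periods_per_year=365):
    """Annualized Sharpe ratio of the periodic returns of `values`
    (risk-free rate taken as zero). periods_per_year is an assumed
    annualization factor."""
    r = np.diff(values) / values[:-1]
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)
```

Both metrics in the table are negative because the test window covers two market crashes; the ranking across agents, not the absolute level, is what the comparison relies on.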
Conference’17, July 2017, Washington, DC, USA Berend Gort, Xiao-Yang Liu, Xinghang Sun, Jiechao Gao, Shuaiyu Chen, and Christina Dan Wang
6 CONCLUSION
In this paper, we have shown the importance of addressing the backtest overfitting issue in cryptocurrency trading with deep reinforcement learning. Results show that the least overfitted agent, PPO (with the combinatorial CV method), outperforms the conventional agents (WF and KCV methods), two other DRL agents (TD3 and SAC), and the S&P BDM Index in cumulative return and Sharpe ratio, showing good robustness.
For future work, it will be interesting to 1) explore the evolution of the probability of overfitting during training and for different agents; 2) test limit order settings and trade closure; 3) explore large-scale data, such as all currencies corresponding to the S&P BDM Index; and 4) consider more features for the state space, including fundamentals and sentiment features.

REFERENCES
[1] Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron C Courville, and Marc Bellemare. 2021. Deep reinforcement learning at the edge of the statistical precipice. Advances in Neural Information Processing Systems 34 (2021), 29304–29320.
[2] Andrew Ang, Tom Morris, et al. 2022. Asset Allocation with Crypto: Application of Preferences for Positive Skewness. (2022).
[3] David H Bailey, Jonathan Borwein, Marcos Lopez de Prado, and Qiji Jim Zhu. 2016. The probability of backtest overfitting. Journal of Computational Finance, forthcoming (2016).
[4] David H Bailey, Jonathan M Borwein, Marcos López de Prado, and Qiji Jim Zhu. 2014. Pseudomathematics and financial charlatanism: The effects of backtest overfitting on out-of-sample performance. Notices of the AMS 61, 5 (2014), 458–471.
[5] Lawrence E Barker. 2005. Logit models from economics and other fields.
[6] Yosef Bonaparte. 2021. Introducing the Cryptocurrency VIX: CVIX. SSRN Electronic Journal (2021), 1–9. https://doi.org/10.2139/ssrn.3948898
[7] Yosef Bonaparte. 2021. Introducing the Cryptocurrency VIX: CVIX. Available at SSRN 3948898 (2021).
[8] Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Nazanin Mohammadi Sepahvand, Edward Raff, Kanika Madan, Vikram Voleti, et al. 2021. Accounting for variance in machine learning benchmarks. Proceedings of Machine Learning and Systems 3 (2021), 747–769.
[9] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. OpenAI Gym. (2016), 1–4. arXiv:1606.01540
[10] Michael W Browne. 2000. Cross-validation methods. Journal of Mathematical Psychology 44 (2000), 108–132.
[11] Stephanie CY Chan, Samuel Fishman, John Canny, Anoop Korattikara, and Sergio Guadarrama. 2019. Measuring the reliability of reinforcement learning algorithms. arXiv preprint arXiv:1912.05663 (2019).
[12] Kaleigh Clary, Emma Tosch, John Foley, and David Jensen. 2019. Let's play again: variability of deep reinforcement learning agents in Atari environments. arXiv preprint arXiv:1904.06312 (2019).
[13] Christian Conrad, Anessa Custovic, and Eric Ghysels. 2018. Long- and short-term Cryptocurrency volatility components: A GARCH-MIDAS Analysis. Journal of Risk and Financial Management 11, 2 (2018), 23.
[14] Marcos Lopez De Prado. 2018. Advances in financial machine learning. John Wiley & Sons.
[15] Yue Deng, Feng Bao, Youyong Kong, Zhiquan Ren, and Qionghai Dai. 2016. Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems 28, 3 (2016), 653–664.
[16] Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, and Noah Smith. 2020. Fine-tuning pretrained language models: Weight initializations, data orders, and early stopping. arXiv preprint arXiv:2002.06305 (2020).
[17] Fan Fang, Carmine Ventre, Michail Basios, Leslie Kanthan, David Martinez-Rego, Fan Wu, and Lingbo Li. 2022. Cryptocurrency trading: a comprehensive survey. Financial Innovation 8, 1 (2022), 1–59.
[18] Farhad Farokhi and Mohamed Ali Kaafar. 2020. Modelling and quantifying membership information leakage in machine learning. arXiv preprint arXiv:2001.10648 (2020).
[19] Scott Fujimoto, Herke Van Hoof, and David Meger. 2018. Addressing Function Approximation Error in Actor-Critic Methods. 35th International Conference on Machine Learning, ICML 2018 4 (2018), 2587–2601. arXiv:1802.09477
[20] Dirk F Gerritsen, Elie Bouri, Ehsan Ramezanifar, and David Roubaud. 2020. The profitability of technical trading rules in the Bitcoin market. Finance Research Letters 34 (2020), 101263.
[21] M Ugur Gudelek, S Arda Boluk, and A Murat Ozbayoglu. 2017. A deep learning based stock trading model with 2-D CNN trend detection. In IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 1–8.
[22] Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, et al. 2018. Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905 (2018).
[23] Ben Hambly, Renyuan Xu, and Huining Yang. 2021. Recent advances in reinforcement learning in finance. arXiv preprint arXiv:2112.04553 (2021).
[24] Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. 2018. Deep reinforcement learning that matters. 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 (2018), 3207–3214.
[25] Chris C Heyde. 1963. On a property of the lognormal distribution. Journal of the Royal Statistical Society: Series B (Methodological) 25, 2 (1963), 392–393.
[26] Dow Jones Indices and Index Methodology. 2022. S&P Digital Market Indices Methodology. January (2022).
[27] Zhengyao Jiang and Jinjun Liang. 2017. Cryptocurrency portfolio management with deep reinforcement learning. In Intelligent Systems Conference (IntelliSys). IEEE, 905–913.
[28] Arthur Juliani, Vincent-Pierre Berges, Ervin Teng, Andrew Cohen, Jonathan Harper, Chris Elion, Chris Goy, Yuan Gao, Hunter Henry, Marwan Mattar, et al. 2018. Unity: A general platform for intelligent agents. arXiv preprint arXiv:1809.02627 (2018).
[29] Alexander Kuhnle, Michael Schaarschmidt, and Kai Fricke. 2017. Tensorforce: a TensorFlow library for applied reinforcement learning. Web page. https://github.com/tensorforce/tensorforce
[30] Xinyi Li, Yinchuan Li, Yuancheng Zhan, and Xiao-Yang Liu. 2019. Optimistic bull or pessimistic bear: Adaptive deep reinforcement learning for stock portfolio allocation. arXiv preprint arXiv:1907.01503 (2019).
[31] Zechu Li, Xiao-Yang Liu, Jiahao Zheng, Zhaoran Wang, Anwar Walid, and Jian Guo. 2021. FinRL-Podracer: High performance and scalable deep reinforcement learning for quantitative finance. In Proceedings of the Second ACM International Conference on AI in Finance. 1–9.
[32] Eric Liang, Richard Liaw, Robert Nishihara, Philipp Moritz, Roy Fox, Ken Goldberg, Joseph Gonzalez, Michael Jordan, and Ion Stoica. 2018. RLlib: Abstractions for distributed reinforcement learning. In International Conference on Machine Learning. PMLR, 3053–3062.
[33] Richard Liaw, Eric Liang, Robert Nishihara, Philipp Moritz, Joseph E Gonzalez, and Ion Stoica. 2018. Tune: A research platform for distributed model selection and training. arXiv preprint arXiv:1807.05118 (2018).
[34] Jimmy Lin, Daniel Campos, Nick Craswell, Bhaskar Mitra, and Emine Yilmaz. 2021. Significant improvements over the state of the art? A case study of the MS MARCO document ranking leaderboard. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2283–2287.
[35] Xiao-Yang Liu, Hongyang Yang, Jiechao Gao, and Christina Dan Wang. 2021. FinRL: Deep reinforcement learning framework to automate trading in quantitative finance. In Proceedings of the Second ACM International Conference on AI in Finance. 1–9.
[36] Yukun Liu and Aleh Tsyvinski. 2021. Risks and returns of cryptocurrency. The Review of Financial Studies 34, 6 (2021), 2689–2727.
[37] Andrew W Lo. 2003. "The Statistics of Sharpe Ratios": Author's Response. Financial Analysts Journal 59, 5 (2003), 17–17.
[38] Horia Mania, Aurelia Guy, and Benjamin Recht. 2018. Simple random search provides a competitive approach to reinforcement learning. arXiv preprint arXiv:1803.07055 (2018).
[39] John Moody and Matthew Saffell. 2001. Learning to trade via direct reinforcement. IEEE Transactions on Neural Networks 12, 4 (2001), 875–889.
[40] Alan Moreira and Tyler Muir. 2017. Volatility-managed portfolios. The Journal of Finance 72, 4 (2017), 1611–1644.
[41] Noemi Nava, Tiziana Di Matteo, and Tomaso Aste. 2016. Time-dependent scaling patterns in high frequency financial data. The European Physical Journal Special Topics 225, 10 (2016), 1997–2016.
[42] Jerzy Neyman and Egon S Pearson. 1933. The testing of statistical hypotheses in relation to probabilities a priori. In Mathematical Proceedings of the Cambridge Philosophical Society, Vol. 29. Cambridge University Press, 492–510.
[43] He Ni and Hujun Yin. 2009. Exchange rate prediction using hybrid neural networks and trading indicators. Neurocomputing 72, 13-15 (2009), 2815–2823.
[44] Hyungjun Park, Min Kyu Sim, and Dong Gu Choi. 2020. An intelligent financial portfolio trading strategy using deep Q-learning. Expert Systems with Applications 158 (2020), 113573.
[45] Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. 2021. Stable-baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research (2021).
[46] PM Robinson and CA Sims. 1994. Time series with strong dependence. In Advances in Econometrics, Sixth World Congress, Vol. 1. 47–95.
[47] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017).