Sofien Kaabar
Back-testing is your proof of work for any strategy you might think of. If the trader’s
friend is the trend, then back-testing is his best friend. Many traders out there just “accept”
the mainstream strategies that are already out there and use them in hopes of becoming
rich.
The correct framework is to gather ideas from here and there, build on them, combine
them, add a touch of your own creativity, and create a strategy that fits your own risk
profile. Trading has a huge research and analysis component, and it should not
be delegated to common online knowledge. Below are some common back-testing traps and
how to overcome them.
I have just published a new book after the success of New Technical Indicators in
Python. It features a more complete description and addition of complex trading
strategies with a Github page dedicated to the continuously updated code. If you feel
that this interests you, feel free to visit the below link, or if you prefer to buy the PDF
version, you could contact me on Linkedin.
Overfitting is when a model forecasts data using a relationship that follows the past
values so closely that it fails to capture the general underlying relationship. Thus, on the
training period the results will be good, but on the testing period they will be disappointing.
Underfitting is the opposite of overfitting: it is the equivalent of a model that has not
done its homework to fully understand the data. We have what we call a bias problem and a
variance problem, both of which are considered “fit” issues:
Bias is also known as underfitting and this is simply when the model encounters a
signal and thinks it is noise.
Variance is also known as overfitting and this is simply when the model encounters
noise and thinks it is a signal.
The problem in real life is that by reducing one you tend to increase the other, so you
are making a trade-off between the two. You want to find the right balance between them.
This is called the Bias-Variance Trade-Off.
Possible remedies for overfitting:
Decreasing the training period so that the model does not exactly fit the training data.
If you are using neural networks, consider a dropout function and an early stopping
technique.
A possible remedy for underfitting:
Increasing the training period so that the model finds more data to work with and is
able to make predictions based on sufficient information.
Naturally, when we back-test our model, we want it to apply its knowledge to
unseen data, so we split our historical data in two. The first part is the training
set, which the model analyzes to learn the relationship. The second (more recent) part is
the testing set, on which the learned relationship is evaluated.
In other words, the model has learned the relationship by looking at the training set
and expects it to continue on the test set, so it predicts on the test set.
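As a minimal sketch of the split described above, the snippet below cuts a time-ordered list of prices into an older training part and a more recent testing part without shuffling. The function name `chronological_split`, the 70% default, and the sample prices are illustrative assumptions, not values from the original study.

```python
def chronological_split(series, train_fraction=0.7):
    """Split a time-ordered sequence into (train, test) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # round() avoids an off-by-one from floating-point products such as 10 * 0.7
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Hypothetical prices, oldest first
prices = [100.0, 101.5, 101.0, 102.3, 103.1, 102.8, 104.0, 105.2, 104.7, 106.0]
train, test = chronological_split(prices, train_fraction=0.7)
print(len(train), "training points,", len(test), "testing points")
```

Keeping the order intact matters here: shuffling before splitting would let the model train on observations that come after some of the test observations.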
If we want to be technically correct, the only out-of-sample period is the one we haven’t
seen yet (i.e. the future), but for now, we can consider the test set to be out-of-sample.
Overfitting and Underfitting occur during the in-sample period and we see the
disappointing results on the out-of-sample period.
For instance, let us assume that the average historical bid-ask spread on the USDCAD
pair given by your broker is 0.6 pips. The best thing to do is to suppose that the actual
spread is at least the historical average plus a margin for all the unexpected costs. This is
an example of biasing costs upward, because we know that over time bid-ask spreads are
getting more competitive (i.e. smaller), which is a positive thing for market efficiency.
The disadvantage of doing so is that many short-term models will get filtered out. For
instance, models that run on M5 and M15 time frames are more sensitive to costs than
models that run on hourly time frames, so cost management is imperative for the
model to provide consistent results.
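To make the cost sensitivity concrete, here is a small sketch that adds a safety margin to the 0.6-pip average and compares how much of a winning trade the spread consumes on a short and a longer time frame. The 0.4-pip margin and the gross profits of 3 and 20 pips are hypothetical numbers chosen only for illustration.

```python
def conservative_spread(avg_spread_pips, margin_pips):
    """Historical average spread plus a safety margin, in pips."""
    return avg_spread_pips + margin_pips


def cost_share(gross_profit_pips, spread_pips):
    """Fraction of a winning trade's gross profit consumed by the spread."""
    return spread_pips / gross_profit_pips


# 0.6 pips comes from the text; the 0.4-pip margin is a hypothetical choice
spread = conservative_spread(0.6, margin_pips=0.4)

# Hypothetical average gross profit per winning trade on each time frame
m5_share = cost_share(gross_profit_pips=3.0, spread_pips=spread)
h1_share = cost_share(gross_profit_pips=20.0, spread_pips=spread)
print(f"M5: {m5_share:.0%} of gross profit, H1: {h1_share:.0%}")
```

Under these assumptions the same spread takes a far larger bite out of the short-term trade, which is why such models are the first to be filtered out.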
Look-Ahead Bias
This bias is well known in the field of back-testing and research, and although it is
considered more of a rookie mistake, it must never be forgotten. Look-ahead bias occurs
when you use information that would not yet have been available at the time of the
prediction. Consider a strategy that relies on the daily closing values of the S&P500 to
make predictions. When designing the model, you erroneously use today’s close to predict
today’s close. You are then using future information to try to predict the future, which
is of course unfeasible in live trading.
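One way to guard against this mistake is to build the dataset so that each target close is paired only with closes that come strictly before it. The helper below is a hedged sketch of that idea; `make_lagged_dataset` and the sample closes are hypothetical.

```python
def make_lagged_dataset(closes, n_lags=1):
    """Pair each target close with the n_lags closes strictly before it."""
    features, targets = [], []
    for t in range(n_lags, len(closes)):
        features.append(closes[t - n_lags:t])  # closes up to t-1 only
        targets.append(closes[t])              # the value to predict at t
    return features, targets


# Hypothetical daily closes, oldest first
closes = [1.0, 2.0, 3.0, 4.0, 5.0]
features, targets = make_lagged_dataset(closes, n_lags=2)
print(features)
print(targets)
```

Because the slice ends at index `t` exclusively, the close being predicted never appears among its own inputs.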
The table below shows quarterly GDP data for the United Kingdom in 2015
Notice that the third-quarter GDP, which refers to the period between
01/07/2018 and 30/09/2018, is published on 09/11/2018, hence a lag of more than a
month. Any researcher should be careful not to align these figures with other figures that
actually do get released on 30/09/2018. Using data on a date when it would not yet
have been available is referred to as look-ahead bias. Failing to make the
appropriate adjustments will make the whole analysis erroneous, unrealistic, and
impossible to reproduce. Economic data particularly suffers from this problem, and
adjustments should be made to take the lag into account.
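A simple way to respect the publication lag is to index each figure by its release date rather than by the period it describes, and to look up only what had been released by the decision date. The sketch below assumes a list of (period end, release date, value) records. The Q3 dates come from the text, while the Q2 release date and both value labels are placeholders, not real data.

```python
from bisect import bisect_right
from datetime import date

# (period_end, release_date, value). Values are placeholders, not real GDP figures.
releases = [
    (date(2018, 6, 30), date(2018, 8, 10), "Q2 figure"),  # release date is hypothetical
    (date(2018, 9, 30), date(2018, 11, 9), "Q3 figure"),  # dates taken from the text
]


def latest_known(records, as_of):
    """Return the most recent record released on or before as_of, else None."""
    ordered = sorted(records, key=lambda r: r[1])
    release_dates = [r[1] for r in ordered]
    i = bisect_right(release_dates, as_of)
    return ordered[i - 1] if i > 0 else None


# On 30/09/2018 the Q3 figure is not public yet, so the model only sees Q2
print(latest_known(releases, date(2018, 9, 30)))
print(latest_known(releases, date(2018, 11, 9)))
```

With pandas, the same point-in-time lookup is commonly done with an as-of merge on the release date; the pure-Python version is shown only to keep the example dependency-free.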
Some governmental data is changed and updated after its release, which can introduce
errors into the model. For instance, the above GDP table has had many revisions before
finally settling on the above values. Revisions also mean that the correct
information might arrive even later than expected. After correcting for look-ahead bias,
we may find ourselves in front of a newly revised value that could alter predictions and
tamper with the training of the model. A possible and naïve way to correct for this issue
is to use only the preliminary releases and disregard any updates. Since the revisions
will only be known at a later time, it is useless to incorporate them into the model.
We deal with what we have now.
When performing a back-test, you have to define what type of strategy you have
and which regimes (states) you will be testing it on. Let us look at an example.
Consider a strategy that goes long (initiates a buy order) each time we have three
consecutive lower lows and goes short (initiates a sell order) each time we have three
consecutive higher highs. It is clear that this is a contrarian strategy better suited to a
ranging market.
Now, if we apply it to a trending market such as the S&P500, what would be the result? I
will give you a hint: Bad.
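The contrarian rule above can be sketched as follows. This is one possible reading, assuming that "three consecutive lower lows" means each of the last three bars made a low below the previous bar's low (and symmetrically for highs). The function name and sample bars are hypothetical.

```python
def contrarian_signals(highs, lows, n=3):
    """Return 1 (long), -1 (short) or 0 (no signal) for each bar.

    Long after n consecutive lower lows, short after n consecutive higher highs.
    The signal on bar t uses bar t's own high and low, so it can only be
    acted on once that bar has closed.
    """
    signals = [0] * len(lows)
    for t in range(n, len(lows)):
        window = range(t - n + 1, t + 1)
        if all(lows[i] < lows[i - 1] for i in window):
            signals[t] = 1
        elif all(highs[i] > highs[i - 1] for i in window):
            signals[t] = -1
    return signals


# Hypothetical bars: three lower lows in a row, then a bounce
highs = [11.0, 10.0, 9.0, 8.0, 8.5]
lows = [10.0, 9.0, 8.0, 7.0, 7.5]
print(contrarian_signals(highs, lows))
```

In a persistent trend this rule keeps fading the move, which is the intuition behind the poor result on a trending market.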
The answer is clear, trending markets require trend-following strategies. But what about
markets that alternate between ranging and trending? In that case, we need other tools
to approximate the current state and use a proper strategy. This can get complicated very
fast. Here is an example of a trend-following strategy I like to use from time to time:
Non-Stationary Data
A time series is stationary if it has a constant mean and a constant variance (volatility)
over time. A changing mean will cause the model to produce erroneous forecasts.
In other more technical terms, stationarity is when prices diffuse at a slower rate
than a geometric random walk. Financial data have too much noise and differencing
or taking log returns will make them almost stationary at the cost of losing their
memory, but that is the best we have got right now at the very basic level.
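The two transformations mentioned above can be sketched in a few lines. The prices below are hypothetical and grow by exactly 10% per step, which makes the contrast easy to see: the simple differences keep growing with the price level, while the log returns stay constant.

```python
import math


def differences(prices):
    """First differences: p[t] - p[t-1]."""
    return [b - a for a, b in zip(prices, prices[1:])]


def log_returns(prices):
    """Log returns: ln(p[t] / p[t-1])."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


prices = [100.0, 110.0, 121.0]  # hypothetical prices, +10% per step
print(differences(prices))
print(log_returns(prices))
```

Either series has one observation fewer than the price series, and neither retains the price level itself, which is the loss of memory the text refers to.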
Note that we are talking about feeding our machine learning models with inputs to produce
a forecast.
As time series (prices per se) exhibit significant autocorrelation over small intervals of
time, it is rational to assume that it is quite easy for the model to deliver seemingly good
results. When you have a machine learning algorithm that has predictive power, you
must use it on stationary data; otherwise, the results will be misleading.
In layman’s terms, we will apply a machine learning model to actual prices (and not
returns) of Bitcoin relative to the US dollar (BTCUSD pair) and evaluate our predictions
on the out-of-sample dataset. Below are the rules and results of the predictions:
Rules
Results
R-square: 0.98
Interpretation
The R-square means that our model explains 98% of the variations, that is a 0.99
correlation between the predicted values and the actual values. A utopian model like
this does not exist in such a complex world, and whoever is capable of making such a
model will be a billionaire in less than a week.
Something must be wrong here, and there is. First, we take a look at the below graph
showing the high linear correlation between the predicted and real values using a simple
linear regression model that uses past prices as explanatory variables to guide it with
finding future prices.
R-square between the predicted and actual values shows a superior modelling power of our algorithm. (Image by
author)
However, if we plot the actual and predicted values on a line chart to better visualize the
correlation and see how well our model is doing, we see something strange. Indeed, it
seems that our model is following the actual values with a lag of one. If today’s values go
up, our forecast for tomorrow is also up and vice versa. It appears that we are not really
doing anything but repeating yesterday’s news. Not only does this model have no
predictive power, but over time the transaction costs will eat any random
profits that may come with luck.
Actual values vs predicted values. The model seems to be simply replicating the value of yesterday. (Image by
author)
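The lag effect is easy to reproduce on synthetic data. The sketch below does not use BTCUSD or the regression from the experiment. It assumes a simulated Gaussian random walk and a "model" that simply repeats the previous value, then scores that naive forecast on price levels and on price changes.

```python
import random


def r_squared(actual, predicted):
    """1 - SSE/SST for a set of predictions."""
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1.0 - sse / sst


random.seed(0)  # fixed seed so the simulated path is reproducible
prices = [100.0]
for _ in range(1000):
    prices.append(prices[-1] + random.gauss(0.0, 1.0))  # synthetic random walk

# Naive forecast on levels: tomorrow's price equals today's price
r2_levels = r_squared(prices[1:], prices[:-1])

# Same naive forecast on changes: tomorrow's change equals today's change
changes = [b - a for a, b in zip(prices, prices[1:])]
r2_changes = r_squared(changes[1:], changes[:-1])

print(f"R-square on levels:  {r2_levels:.3f}")
print(f"R-square on changes: {r2_changes:.3f}")
```

On levels the score is very high even though the forecast contains no information beyond yesterday's price, while on changes the same forecast scores poorly. A high R-square on raw prices is therefore not evidence of predictive power.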
Test #2: Stationary Data
The models we use assume that the time series is stationary, which in turn
allows a real forecast, that is, the model is actually doing something useful.
Evidently, most machine learning models should be used to predict the differences
between prices (in the case of asset predictions). Let us now repeat the above
experiment with the exact same rules, only this time using returns data
(differenced data can also be used). The lagged period is also the same.
The results show that the model now does a bit worse than a random walk
model, which might suggest that the dataset itself behaves like one. Are the fluctuations
of BTCUSD random? It takes more analysis to answer that question, but for now we can
safely say that our algorithm with its current parameters cannot forecast the direction of
the asset. This is not surprising, because a simple linear model cannot be expected to
predict a highly complex market.
If you are also interested by more technical indicators and using Python to create
strategies, then my best-selling book on Technical Indicators may interest you:
Rules
Results
R-square: 0.03
Accuracy: 49%
The plot of predicted vs actual values shows no correlation between the two whatsoever, giving a stronger
conviction of the underperformance of our model. (Image by author)
The returns of BTCUSD were much more volatile than predicted with the linear regression model. The model has
been wrong about 51% of the time. (Image by author)
Another measure worth mentioning in the case of a linear model is the R-square. This
goodness-of-fit measure is very common in econometrics. It is the percentage of the
variation in the dependent variable that is explained by the independent variable(s).
Before we introduce the formula (which is very simple), we must mention two calculations.
SSE (sum of squared errors) is the part left unexplained by the model, and SST (total sum
of squares) is the unexplained plus the explained part. Intuitively, the R-square is one
minus the share of SST that SSE represents, that is, the percentage explained by the
model.
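Written out under the usual convention, the two sums and the R-square look as follows. The symbols are notation assumed here rather than taken from the article: the y_i are the actual values, the ŷ_i are the model's predictions, and ȳ is the mean of the actual values.

```latex
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}
```

When the predictions are perfect, SSE is zero and the R-square is 1. When the model does no better than always predicting the mean, SSE equals SST and the R-square is 0.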
This gives you a profit of 80 x 0.2 = $16. Alright, not bad, but let us look at the remaining
20 losing trades, which have lost $1 each. This gives you a loss of 20 x 1.0 = $20. Your
net profit is therefore -$4.00. Hence, by getting it right 8 out of 10 times, you have
managed to lose money. How do we fix this?
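The arithmetic above can be checked with a short sketch. It assumes 100 trades in total (the 80 winners and 20 losers from the text), a $0.20 gain per winner, and a $1.00 loss per loser. The function name is hypothetical.

```python
def net_profit(n_trades, hit_ratio, avg_win, avg_loss):
    """Net result of n_trades given the share of winners and the average
    win and loss per trade (avg_loss passed as a positive number)."""
    winners = round(n_trades * hit_ratio)
    losers = n_trades - winners
    return winners * avg_win - losers * avg_loss


result = net_profit(n_trades=100, hit_ratio=0.8, avg_win=0.2, avg_loss=1.0)
print(f"Net profit: {result:.2f}")
```

A high hit ratio alone says little. What matters is the hit ratio combined with the size of the average win relative to the average loss.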
Risk-reward Trade-off and the Hit Ratio. At a risk-reward Ratio of 1.00, we need 50% to breakeven. (Image by
author)
We have to expect at least $1.8 for every $1.0 we are risking. This gives us some
wiggle room. With a risk-reward ratio of 1.8, we only need a hit ratio of 35.71% to
break even. Thus, consider evaluating a strategy that had a 40% hit ratio with a
risk-reward ratio of 1.82, using the same position sizes.
To compute the required hit ratio to break even, you can use the following formula:
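The original formula did not survive in this copy, so the following is a sketch under stated assumptions rather than the author's exact expression. Assuming every trade risks one unit to gain `risk_reward` units, with equal position sizes and no transaction costs, the expected profit p x risk_reward - (1 - p) is zero when p = 1 / (1 + risk_reward).

```python
def breakeven_hit_ratio(risk_reward):
    """Hit ratio at which expected profit is zero, given the reward earned
    per unit risked. Assumes equal position sizes and no transaction costs."""
    if risk_reward <= 0:
        raise ValueError("risk_reward must be positive")
    return 1.0 / (1.0 + risk_reward)


for rr in (1.0, 1.8):
    print(f"risk-reward {rr}: break-even hit ratio {breakeven_hit_ratio(rr):.2%}")
```

Any hit ratio above this threshold gives a positive expectancy before costs, which is why a 40% hit ratio can be acceptable at a risk-reward ratio close to 1.8.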
An example of a Strategy that has produced positive cumulative returns. (Image by author)
Although it does look good, when we take the years one by one, we find some losing
years that are much less attractive and can actually wipe us out if we start trading this
strategy at the wrong time. This raises the question: if we were truly trading
this strategy and had a bad year, would we continue? Unfortunately, we will never
know, and that is why we need a strategy that wins most of the time (years) and not one
that greatly outperforms in a few years but spends most years losing money or being
flat.
We should look at the evolution of the strategy and not just stick to basic performance
statistics. The above strategy had a 61.67% hit ratio but still manages to be somewhat
bad.
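A year-by-year breakdown is straightforward to compute from a trade log. The sketch below uses an entirely hypothetical log of (year, return) pairs in which one strong year hides three losing ones. The function names are assumptions as well.

```python
from collections import defaultdict


def yearly_returns(trades):
    """Sum trade returns per calendar year. trades: iterable of (year, return)."""
    totals = defaultdict(float)
    for year, ret in trades:
        totals[year] += ret
    return dict(sorted(totals.items()))


def share_of_winning_years(by_year):
    """Fraction of years that ended with a positive total return."""
    return sum(1 for r in by_year.values() if r > 0) / len(by_year)


# Hypothetical trade log: (year, return of the trade)
trades = [(2017, 0.30), (2017, 0.05), (2018, -0.04),
          (2018, -0.02), (2019, -0.03), (2020, -0.01)]

by_year = yearly_returns(trades)
total = sum(by_year.values())
print(by_year)
print(f"Total return: {total:.2f}, winning years: {share_of_winning_years(by_year):.0%}")
```

In this made-up example the cumulative result is positive, yet only one year in four made money, which is the pattern the paragraph above warns about.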
Following a “buy when the RSI(14) touches 20 and sell when the RSI(14) touches 80”
strategy, we get the following results for both tests:
Notice the huge difference between the final results? Even though they are both
negative, the one without risk management did much worse. We do not want that to
happen when we switch from simulated to live trading.
Conclusion
There will always be some form of bias in a back-test. Our job as researchers and
traders is to minimize these biases so as to maximize the probability that back-tested
results are realized. We are all of course familiar with the sayings that history does not
repeat itself and that past performance is not a guarantee of future profits, but the past
is the best we have in our fight against the future. If you manage to incorporate at least
some of the above points, then you are likely on the right track. Remember, finding your
strategy is not an overnight process. Be patient.