A common goal of investment funds is to deliver a higher percentage return than the overall market without incurring a greater probability of a financial loss. To devise investment strategies to achieve this goal, firms and analysts typically feed historical market data into computer programs that test a multitude of combinations of financial instruments, weighting factors, decision points and other parameters, all to identify an “optimal” design. With this “optimal” design in hand, they tout the potential return that an investment based on this design is likely to deliver, based on its simulated performance on historical data – a process known as backtesting. However, in all too many cases, such investments deliver only disappointing performance when fielded.1 The “optimal” design turns out to be a false discovery.

Three features of financial research make this field particularly prone to false discoveries. First, the probability of finding a profitable investment strategy is very low, due to intense competition. Second, true findings are mostly short-lived, as a result of the rapidly changing nature of financial systems. Third, unlike in the natural sciences, it is rarely possible to verify statistical findings through controlled experiments. In the absence of controlled experiments, it is virtually impossible to debunk a false claim. One would hope that, in such circumstances, researchers would be particularly careful when conducting statistical inference. Sadly, the opposite is true.

A leading reason for the failure of investment models is backtest overfitting (see the Glossary, page 25, for a definition of this and other italicised terms). This occurs when historical market data is used to develop an investment model, fund or strategy, but too many model variations are tried relative to the amount of data available. It is a form of selection bias under multiple testing. Models, funds and strategies suffering from this type of statistical overfitting typically target the random patterns present in the limited in-sample test data on which they are based, and thus often perform erratically when presented with new, truly out-of-sample data. The sobering consequence is that a substantial portion of the models, funds and strategies employed in the investment world may be merely statistical mirages.

Designing investment strategies by computer search

The potential for backtest overfitting in the financial field has grown enormously in recent years with the increased use of computer programs to search a space of millions or even billions of parameter variations for a given model, fund or strategy. Even very simple investment strategies typically have numerous parameters and choices. Suppose that an investor believes that there may be monthly patterns in certain sets of stocks that may lead to a profitable strategy, say by purchasing shares on a fixed day of the month, and selling on another fixed date. There are many variations for such a strategy, as illustrated in Figure 1.

Note that even with this simple investment strategy (which, by the way, is very unlikely to produce reliable market-beating profits), there are 435 choices just for the start and end dates of each monthly investment cycle. Admittedly, not all of these choices count as independent trials, but each additional choice raises the probability of a fluke. In any event, it is clear that designing such a strategy by searching via computer over the space of all parameter combinations, in order to design an “optimal” strategy, is virtually certain to produce an overfitted backtest, unless one explicitly guards against it using rigorous statistical tools and a solid economic rationale.2

Overfitting in the design of stock funds

Consider the problem of designing an investment fund to meet some desired performance profile. One increasingly popular investment product is the exchange-traded fund (ETF), namely a mutual fund that may be freely traded during the day like an
parameters and selecting only the “optimal”
parameters for an index fund subsequently
fielded in the financial markets.
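
The figure of 435 start and end date choices quoted earlier for the simple monthly strategy can be reproduced with a short enumeration. This is only a sketch under an assumption the article does not spell out: that the count refers to unordered pairs of distinct days (buy on the earlier, sell on the later) in a 30-day month. The names `DAYS_IN_MONTH` and `entry_exit_pairs` are hypothetical.

```python
from itertools import combinations

# Assumption (not stated in the article): a 30-day month, with the purchase
# day strictly before the sale day within the same monthly cycle.
DAYS_IN_MONTH = 30


def entry_exit_pairs(days_in_month=DAYS_IN_MONTH):
    """Return every (buy_day, sell_day) pair with buy_day < sell_day."""
    return list(combinations(range(1, days_in_month + 1), 2))


pairs = entry_exit_pairs()
print(len(pairs))  # 30 * 29 / 2 = 435 candidate date pairs
```

Each pair is one strategy variation that a backtest could evaluate, before any other parameter (stock selection, holding rules, and so on) is varied.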
Evaluating investments
Investments are typically evaluated by the Sharpe ratio, a metric of the performance of an asset relative to its volatility (“riskiness”).7 It is calculated by dividing the expected excess returns relative to a risk-free asset, like a US Treasury bond, by the standard deviation of the returns. To make Sharpe ratios comparable across investments with different sampling frequency, the ratio is often “annualised”, by multiplying it by the square root of the number of observations in a year. However, annualised Sharpe ratios should not be thought of as t-values for testing the significance of the sample mean, since they do not take into account the number of observations. To correct for this problem, the present authors proposed the probabilistic Sharpe ratio,8 which allows one to test the significance of the Sharpe ratio under general conditions of stationarity and ergodicity.

Another useful tool is the false strategy theorem.9 An investment analyst may carry out a large number of simulation trials on historical data, and report only the model, fund or strategy with the maximum Sharpe ratio. But the distribution of the maximum Sharpe ratio is clearly not the same as the distribution of a Sharpe ratio randomly chosen among the trials. Instead, the expected value of the maximum Sharpe ratio is greater than the expected value of the Sharpe ratio from a random trial. In particular, given an investment strategy with expected Sharpe ratio zero and non-zero variance, the expected value of the maximum Sharpe ratio steadily increases, up from zero, as a function of the number of trials. One can thus deduce an expected maximum Sharpe ratio, namely the hurdle or threshold that the reported Sharpe ratio must exceed before it can be considered a significant finding. This result is known as the false strategy theorem: given a sample of estimated performance statistics $\{S_k\}$, $k = 1, \ldots, K$, each independently following a zero-mean, unit-variance Gaussian distribution, we have

\[
\mathrm{E}\left[\max_k \{S_k\}\right] \approx (1 - \gamma)\, Z^{-1}\left[1 - \frac{1}{K}\right] + \gamma\, Z^{-1}\left[1 - \frac{1}{Ke}\right]
\]

where $\mathrm{E}[\cdot]$ denotes expected value, $Z^{-1}[\cdot]$ denotes the inverse of the standard Gaussian cumulative distribution function, $e$ is Euler’s number (2.718281828…, the base of natural logarithms), and $\gamma$ is the Euler–Mascheroni constant (approx. 0.5772156649…).

In practical terms, the false strategy theorem tells us that the optimal outcome of an unknown number of historical market data simulations is right-unbounded. In other words, with enough trials, there is no Sharpe ratio threshold sufficiently large to reject the hypothesis that a strategy is false. The rule of thumb of halving the backtest’s Sharpe ratio, popular among many investment professionals, has no scientific basis. Again, given the ease with which one can use a computer to explore literally thousands, millions, or even billions of variations of a given strategy and select only the “optimal” variation, it follows that it is very easy to find impressive-looking strategy variations that are nothing more than false positives.

The present authors combined the ideas behind the probabilistic Sharpe ratio and the false strategy theorem to derive a formula for deflating the Sharpe ratio.10 The deflated Sharpe ratio is the probability that an observed Sharpe ratio was drawn from a distribution with positive mean, after controlling for sample length, skewness, kurtosis, and the number of strategy variations explored. Let us suppose that a researcher is constructing a financial model or strategy based on the daily closing values of the FTSE 100 index. An observed annualised Sharpe ratio of 1, where the backtest length is 10 years of daily returns, may appear to be strong evidence of a true discovery. However, if the researcher conducted three or more independent trials, our confidence that the finding is statistically significant is below the standard 95% cutoff. Figure 3 shows the deflated Sharpe ratios for strategies with observed annualised Sharpe ratios of 0.5, 1, and 1.5, as a function of the number of trials. In practice, investment strategies’ returns often exhibit positive autocorrelation, negative skewness, and fat tails, which further depress the deflated Sharpe ratio. The implication is that, in most cases, as few as three independent trials suffice to produce an investment strategy that is likely false.

Figure 3: Deflated Sharpe ratios as a function of the number of trials, based on backtests of 10 years of independent and identically distributed normal daily returns.
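
As a rough illustration of the quantities in this box, the sketch below computes an annualised Sharpe ratio and the expected maximum Sharpe ratio given by the false strategy theorem, and compares the latter with a small Monte Carlo simulation. It is not the authors' code. The function names, the default of 252 observations per year, and the simulation settings are assumptions made here for illustration; only the approximation formula itself is taken from the text.

```python
import math
import random
from statistics import NormalDist, mean, stdev

EULER_MASCHERONI = 0.5772156649  # gamma, as quoted in the text


def annualised_sharpe(returns, risk_free=0.0, periods_per_year=252):
    """Mean excess return over its standard deviation, scaled by
    sqrt(periods_per_year). 252 trading days is an assumption."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def expected_max_sharpe(num_trials):
    """False strategy theorem approximation of E[max_k S_k] for K
    independent standard Gaussian performance statistics."""
    if num_trials < 2:
        # With K = 1 the formula needs Z^-1[0], which is unbounded below.
        raise ValueError("num_trials must be at least 2")
    z_inv = NormalDist().inv_cdf
    g = EULER_MASCHERONI
    return ((1 - g) * z_inv(1 - 1 / num_trials)
            + g * z_inv(1 - 1 / (num_trials * math.e)))


def simulated_max_sharpe(num_trials, num_repeats=2000, seed=0):
    """Monte Carlo check: average of the maximum of K standard Gaussian
    draws, repeated num_repeats times (hypothetical settings)."""
    rng = random.Random(seed)
    return mean(
        max(rng.gauss(0.0, 1.0) for _ in range(num_trials))
        for _ in range(num_repeats)
    )


if __name__ == "__main__":
    for k in (10, 100, 1000):
        print(k, round(expected_max_sharpe(k), 3),
              round(simulated_max_sharpe(k), 3))
```

Under these assumptions the hurdle rises steadily with the number of trials: it is about 2.5 standard deviations for 100 trials, even though every simulated strategy has a true Sharpe ratio of zero.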
yet they underestimated the index level by 10.6% for the initial recovery year, 2003. A similar phenomenon was seen in 2008, when strategists overestimated the S&P 500’s year-end level by a whopping 64.3%, but then underestimated the index by 10.9% for the first half of 2009. In other words, as Kaissar lamented, “the forecasts were least useful when they mattered most”.

In 2018, the present authors published, with a colleague, an in-depth analysis of 68 market forecasters, including many who employ technical analysis, a relatively unsophisticated form of historical data analysis.3 Expanding on an earlier study, we analysed forecasts based on two key factors: the time-frame of the forecast and the importance and specificity of the forecast. Our study found that the