
30 Years of Evidence for (and against)

Anomalous Returns
from Cash Flow Statements
in the United States and lessons on quantitative validation.

HangukQuant *1, *2

January 13, 2023


Abstract

In this paper, we examine the excess returns generated by the accrual effect. We find it moderately useful, particularly when paired with long-biased weights, but the evidence for excess returns is far weaker than the results touted in the literature.

*1: hangukquant@gmail.com, hangukquant.substack.com
*2: DISCLAIMER: the contents of this work are not intended as investment, legal, tax or any other advice, and are for informational purposes only. It is illegal to make unauthorized copies, to forward this article to an unauthorized user, or to post it electronically without express written consent from HangukQuant.

1 Introduction

The accrual anomaly is well known in the literature, and significant effort has been devoted to studying its returns and their possible explanations. Some useful papers in this regard are:

1. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1558464.

2. https://link.springer.com/article/10.1023/B:REQU.0000015852.00973.8f.

3. https://www.sciencedirect.com/science/article/pii/S0927538X15000293.

The common explanations given for the accrual anomaly are the earnings fixation hypothesis and the manipulation of income statements. The resulting difference between earnings persistence and cash flow persistence creates a gap in expected future returns. We refer readers to the cited papers for more in-depth arguments. Here, we present the code and implementation. We initially find attractive results consistent with the literature, but these returns diminish once we control for potential biases. Although the conclusions are anticlimactic, we make some illuminating notes on controlling for look-ahead bias in testing studies. We also open issues in our Russian Doll hypothesis testing and suggest developments for future works.

1.1 A Note of Precaution

There is absolutely no warranty or guarantee implied with this product. I provide no guarantee that it will be functional, destructive or constructive in any sense of the word. Use at your own risk; trading is a risky operation.

2 Notes

When I first came across this anomaly, I talked about it on Twitter (https://twitter.com/HangukQuant/status/1609517231624572929). Those posts contained my initial drafts of the equity curve, which I thought were worth paying attention to. That was before I had done more extensive tests, and to my disappointment the anomaly did not survive the series of tests I put it through. I want to make an aside here: many more of these will come. Many of the strategies published, including those I talk about, will end up being ghost patterns that represent insignificant phenomena. I could obfuscate the results and post only the attractive artefacts of a strategy, making it look like a strong source of alpha, but that is obviously NOT the objective of our work. Our job is to train each other’s ‘bullshit detector’ and make better scientists and traders of ourselves. We will unapologetically continue to try to refute our own work, and fantastically disappoint you in the process. Hopefully you, too, can learn from us detailing our mistakes. The source code is provided in the cbz file in the attendant post.

3 Results and Implementation

We refer readers to the papers cited in Section 1 for the reasoning behind the accrual anomaly. Here we discuss the implementation and initial results, followed by the changes that diminish those results. We offer notes on avoiding similar mistakes and open running issues in the Russian Doll model.

We implement the percent operating accrual variant of the anomaly. Data is obtained from eodhistoricaldata. Signals are rebalanced on the filing date rather than the reporting date to prevent lookahead errors. To avoid ambiguity about when a report was released, for a filing on day $t$ we assume the relevant figures are available only at $t + 1$. The signal is computed as
$$\alpha_i = \frac{NI_i - CO_i}{|NI_i|} \tag{1}$$

where $NI_i$ is the net income of stock $i$ and $CO_i$ is its cash from operations. We want to take long positions in the low-accrual decile and short positions in the high-accrual decile. Therefore, we take the random variable $X_i = -\alpha_i$ and use order statistics to obtain $X_{(i)}, i \in [n]$, the signals sorted in increasing order and represented by their ranks $1, \dots, n$. We use ranks rather than raw values because $|NI_i|$ can be very close to zero, which throws off the forecast range and therefore weight stability across time. To construct a long-biased portfolio that exploits positive equity drift, we take the forecast

$$f_{(i)} = X_{(i)} - \frac{\frac{1+n}{2}}{2}, \tag{2}$$

where subtracting the numerator $\frac{1+n}{2}$ (the mean rank) alone would make the forecast a centred random variable, and the further division by two makes $\mathbb{E}[f_{(i)}] > 0$, a long-biased forecast. This forecast is passed through the Russian Doll (see HangukQuant [2]), which scales forecast values serially and normalizes volatility across assets. The positions are then traded walk-forward, with volatility normalized through time, to obtain results. Hypothesis testing from the Russian Doll is employed to test for significance.
The provided code looks across the following geographical markets:

['GSPC', 'RUT', 'FTSE', 'OBX', 'SSMI']  # sp500, rut2000, ftse100, oslo, swiss

but for brevity we only discuss our simulations on the components of the SP500 (excluding banking stocks).

The equity curve is presented in Figure 1, with a Sharpe ratio of 1.343 under cost-free assumptions. Since this is a quarterly rebalancing scheme with only minor volatility adjustments on liquid assets, we can expect the cost-adjusted Sharpe ratio to remain attractive, conservatively above 1.00. All figures presented here and in the remainder of the paper are on a logarithmic scale.
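For reference, the sketch below shows a common way to compute an annualised Sharpe ratio from daily strategy returns, along with the cumulative log equity that a logarithmic plot displays. The 252-day annualisation, zero risk-free rate and simple daily returns are assumptions; the paper does not state its exact convention.

```python
import math
import statistics


def annualised_sharpe(returns, periods_per_year=252):
    """Mean over sample standard deviation of periodic returns, scaled by sqrt(periods).
    Assumes a zero risk-free rate."""
    return statistics.mean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)


def log_equity_curve(returns):
    """Cumulative log wealth: running sum of log(1 + r_t), the equity curve on a log scale."""
    curve, level = [], 0.0
    for r in returns:
        level += math.log1p(r)
        curve.append(level)
    return curve


daily = [0.004, -0.002, 0.003, 0.001, -0.001]
print(annualised_sharpe(daily), log_equity_curve(daily)[-1])
```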

Figure 1: pnl, 485 stocks of sp500 traded with long tilt

At this point, we are tempted to reason as follows. Since this is a long-biased portfolio (with some short positions), part of its performance must be attributable to the equity index itself. The mean Sharpe of the SP500 is roughly 0.65 to 0.75, and even though our strategy is market-biased, its Sharpe of 1.34 nearly doubles that of the index, so our strategy must be useful beyond market drift. Recall that we can test this with the hypothesis tests for managerial skill in the Russian Doll. We run the Monte Carlo permutation tests and obtain p-values < 0.01 for the asset timing, asset picking and trader tests. Remarkable. Statistical evidence for excess returns: alpha found and dusted.

Not. Several of our initial assumptions are wrong. First, the SP500 is a dynamic asset universe, with stocks entering and leaving the index. The Sharpe computed from the market-weight adjusted index components is therefore not directly comparable to the historical performance of the SP500. This is survivorship bias, a form of lookahead bias, but even moderately experienced traders mostly know that already. The next issue is our prior: the a priori assumption that the appropriate benchmark for our strategy is the market-weighted index return is questionable. Large stocks face lower trading friction and different liquidity pressures, and there are undoubtedly differences in pricing between stocks in different liquidity quantiles. Even if we were to compare cost-free trading performance between our strategy and the market-weighted index, a dollar parity portfolio is likely to be a better null model than the cap-weighted index. This point is perhaps less obvious.

In an ideal world, we would have a database of when stocks were actively listed, accurate pricing data and infinite computational resources. Tracking the history of listed stocks in an index could easily expand the universe to tens of thousands of stocks, even over moderate spans of time. The backtest then becomes increasingly computationally expensive, and tools such as permutation testing become impractical to run for thousands of iterations in post-hoc analysis.

Fortunately, we can employ a simple trick: apply the same biases to a horse race in a controlled experiment, under dollar parity and the same lookahead bias. That is, we run a backtest with the same biases, but with position forecasts distributed equally among assets. Figure 2 shows the horse race, with the parity portfolio in orange. The same volatility targeting, both serial and cross-sectional, was applied to keep all other factors invariant.
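The control experiment amounts to running two books through the same backtest, differing only in their forecasts. The toy version below is a sketch under several assumptions. Synthetic returns with a common positive drift stand in for the biased universe. Forecasts are held fixed and carry no information about returns. The Russian Doll's volatility targeting is omitted, which is harmless here because with constant weights the Sharpe ratio is unaffected by rescaling. All names and parameters are hypothetical.

```python
import math
import random
import statistics


def sharpe(returns, periods_per_year=252):
    return statistics.mean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)


def unit_gross(weights):
    """Rescale weights so that gross exposure sum(|w|) equals one."""
    total = sum(abs(w) for w in weights)
    return [w / total for w in weights]


def horse_race(asset_returns, forecasts):
    """Return (strategy Sharpe, parity Sharpe) computed on the same returns, so any
    bias in the universe affects both books equally. asset_returns is a list of
    days, each a list of per-asset returns; forecasts are fixed per-asset values."""
    books = {
        "strategy": unit_gross(forecasts),
        "parity": unit_gross([1.0] * len(forecasts)),   # equal long positions
    }
    sharpes = {}
    for name, weights in books.items():
        pnl = [sum(w * r for w, r in zip(weights, day)) for day in asset_returns]
        sharpes[name] = sharpe(pnl)
    return sharpes["strategy"], sharpes["parity"]


rng = random.Random(0)
n_assets, n_days = 50, 1000
# Synthetic universe with a shared positive drift; not real market data.
returns = [[rng.gauss(0.0004, 0.02) for _ in range(n_assets)] for _ in range(n_days)]
forecasts = [rank - (1 + n_assets) / 4 for rank in range(1, n_assets + 1)]
strategy_sharpe, parity_sharpe = horse_race(returns, forecasts)
print(f"strategy {strategy_sharpe:.2f} vs parity {parity_sharpe:.2f}")
```

Because the forecasts here are uninformative, any gap between the two Sharpe ratios reflects only noise. That is the comparison the parity control is meant to expose.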

Figure 2: pnl, 485 stocks of sp500 traded with long parity

The resulting parity portfolio yields a Sharpe of 1.28, compared to 1.34 for the strategy. Clearly, we are no longer that impressed by our original results. The strategy is only moderately better, despite being a long-short portfolio: not nothing, but not much.

Diagnosis. We then found another interesting statistic that revealed an issue. The hypothesis tests are designed to detect skill relative to a null model of random positions, so we should expect a parity portfolio, which is non-random but encodes no timing or picking decisions, to show no statistically significant timing or picking skill. Yet running the hypothesis tests on the parity portfolio, we obtain p-values < 0.01 on the timing, picking and trader tests. Preposterous.

It turns out that whenever the overall market drift is positive and we are generally long the market, shuffling the decision-making process has a good chance of attributing positive weights to zero-return components. We assigned zero return to periods in which a stock did not trade, so reshuffling the decision vector can assign a ‘trade’ decision to an inactive stock. Combined with the overall positive drift, this lowers the returns of the shuffled null portfolios and severely underestimates the p-value. We dealt with this in the data shuffler permutation, but overlooked that the decision shuffler faces a related problem. We are opening this issue here and look to fix it in future works.
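This failure mode can be reproduced in a toy setting. A parity book goes long every listed stock, unlisted periods carry zero return, and a timing-style permutation test shuffles each asset's decision series through time. This is a hypothetical reconstruction, not the Russian Doll's code; the test statistic (total PnL), the per-asset shuffle and the listing mask are all assumptions. The `respect_listing` flag illustrates one possible remedy, restricting the shuffle to periods in which the stock actually traded. It is only an illustration, not the fix planned for future work.

```python
import random


def total_pnl(decisions, returns):
    """Sum of decision * return over all (asset, period) cells."""
    return sum(d * r for d_row, r_row in zip(decisions, returns) for d, r in zip(d_row, r_row))


def shuffle_decisions(row, active, rng, respect_listing):
    """Permute one asset's decision series through time."""
    out = row[:]
    if not respect_listing:
        rng.shuffle(out)                       # may move trades onto unlisted periods
        return out
    slots = [t for t, is_active in enumerate(active) if is_active]
    values = [row[t] for t in slots]
    rng.shuffle(values)                        # permute only among listed periods
    for t, v in zip(slots, values):
        out[t] = v
    return out


def timing_p_value(decisions, returns, active, n_perm=200, respect_listing=False, seed=0):
    """One-sided Monte Carlo permutation p-value for the observed total PnL."""
    rng = random.Random(seed)
    observed = total_pnl(decisions, returns)
    hits = 0
    for _ in range(n_perm):
        shuffled = [shuffle_decisions(d, a, rng, respect_listing)
                    for d, a in zip(decisions, active)]
        if total_pnl(shuffled, returns) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy universe: asset k lists at period 5k; unlisted periods get zero return.
data_rng = random.Random(1)
n_assets, n_periods = 20, 200
active = [[t >= 5 * k for t in range(n_periods)] for k in range(n_assets)]
returns = [[data_rng.gauss(0.002, 0.01) if a else 0.0 for a in row] for row in active]
parity = [[1.0 if a else 0.0 for a in row] for row in active]  # long whenever listed

naive = timing_p_value(parity, returns, active)
masked = timing_p_value(parity, returns, active, respect_listing=True)
print(naive, masked)
```

The parity book has no timing skill, yet the naive shuffle reports a tiny p-value, because shuffled trades land on zero-return unlisted periods. The masked shuffle leaves the parity book unchanged and returns $p = 1$.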

General Recommendations. The gold standard for avoiding biased tests is, well, to avoid the bias. However, we often do not have the resources or data to make these adjustments, and must settle for first-order approximations. Drawing on lessons from this work, we suggest three improvements over such rough approximations for quantitative researchers.

1) In the presence of directional bias factors, we propose a parity-adjusted horse race as a control, combined with non-parametric tests for drawing significance conclusions. The adjustment accounts for the other risk management factors and the directional bias present in the testing engine, and is independent of signal generation. Comparing the accrual portfolio to the parity portfolio, pairing their returns and applying a one-sample Wilcoxon signed-rank test to their differences, we obtain a p-value of 0.02157 against the alternative hypothesis that the median return of the accrual portfolio is higher. Despite only a small difference in Sharpe ratios, the difference is significant on the SP500 dataset. However, this significance is not replicated in any other dataset. The statistical tests provide probabilistic arguments for the results obtained. The difference between the parity portfolio and index returns also offers information on the degree of lookahead bias in absolute Sharpe, for the same portfolio size and backtest duration. For a strategy with long bias $b$, a lookahead bias regularisation constant can be computed from the biased parity Sharpe $P_s$ and the bias-free equal-weighted index Sharpe $I_s$:

$$c = b\,(P_s - I_s). \tag{3}$$

The expected risk-adjusted return of the strategy portfolio should then incur this penalty:

$$\hat{S}_s = S_s - c, \tag{4}$$

where $S_s$ is the raw strategy Sharpe and $\hat{S}_s$ is our bias-adjusted Sharpe estimate for the strategy.
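The paired test in point 1) can be sketched without external libraries. The function below computes the one-sided Wilcoxon signed-rank statistic on paired return differences (accrual minus parity) and uses the large-sample normal approximation for the p-value. It drops zero differences and averages tied ranks. No continuity or tie-variance correction is applied, so its p-values can differ slightly from library implementations such as `scipy.stats.wilcoxon(d, alternative="greater")`. The function name is hypothetical, and no attempt is made to reproduce the 0.02157 figure.

```python
import math


def wilcoxon_signed_rank_greater(differences):
    """One-sided Wilcoxon signed-rank test, alternative: median difference > 0.

    Zero differences are dropped, tied |d| receive average ranks, and the p-value
    uses the normal approximation without continuity or tie corrections.
    Requires at least one non-zero difference. Returns (W_plus, p_value).
    """
    d = [x for x in differences if x != 0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail P(Z >= z)
```

Applied to paired daily returns, `wilcoxon_signed_rank_greater([a - p for a, p in zip(accrual, parity)])` returns $W^+$ and the one-sided p-value. The normal approximation is only reasonable for long return series.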

2) Regardless of directional bias factors, we propose publishing results for the same factor applied in a long-short, directionally neutral framework, where possible. We are still comparing, apples to apples, stocks that existed in the past, but with the drift factor removed. The directionally neutral performance therefore significantly mutes the impact of both known and unknown bias factors. Figure 3 presents the directionally neutral accrual portfolio. Its Sharpe is revised sharply downward to 0.283. We now test whether Equation 4 is roughly correct in adjusting for bias.
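As a rough illustration of the directionally neutral variant, centring the ranks on their full mean $(1+n)/2$, rather than on half of it as in Equation (2), yields forecasts that sum to zero. The book is then long and short in equal dollar amounts. The sketch below uses a hypothetical `neutral_weights` helper that scales the centred ranks to unit gross exposure as a stand-in for the Russian Doll scaling. It expects the signals $X_i = -\alpha_i$ and at least two assets.

```python
def neutral_weights(signals):
    """Dollar-neutral weights from signals X_i: rank in increasing order, centre the
    ranks on their mean (1 + n) / 2, then scale to a gross exposure of one
    (0.5 long, 0.5 short). Requires at least two signals."""
    n = len(signals)
    order = sorted(range(n), key=lambda i: signals[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    centred = [r - (1 + n) / 2 for r in ranks]
    gross = sum(abs(c) for c in centred)
    return [c / gross for c in centred]


weights = neutral_weights([0.3, -1.0, 2.0, 0.1])
print(weights)  # [0.125, -0.375, 0.375, -0.125]
```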

Denote the strategy's long bias by $b$ and the Sharpe of the fully long parity portfolio by $P_s$. Let the Sharpe of the equal-weighted index be denoted $I_s$, a figure commonly reported in the financial literature and news media. The adjustment is $c = b(P_s - I_s)$. We then argue that both known and unknown biases are muted in the directionally neutral portfolio, such that the adjusted strategy Sharpe is

$$
\begin{aligned}
\hat{S}_s &= S_s - c && (5)\\
&= S_s - b\,(P_s - I_s) && (6)\\
&\approx b \cdot I_s + D_s && (7)
\end{aligned}
$$

where $D_s$ is the Sharpe of the directionally neutral portfolio. Now assume an equal-weighted SP500 Sharpe of about $I_s = 0.70$, which is roughly consistent with reported figures. Our directionally biased accrual portfolio holds approximately a fraction $b = 0.80$ of its positions long. Comparing with the biased parity portfolio gives a regularisation constant $c = 0.80(1.28 - 0.70) = 0.464$ and an adjusted strategy Sharpe of $1.34 - 0.464 = 0.876$. In contrast, using $b \cdot I_s + D_s$ we obtain $0.8 \cdot 0.7 + 0.283 = 0.843$. The two estimates of bias-adjusted Sharpe are very similar. Despite not explicitly controlling for survivorship bias, the directionally neutral portfolio appears to proxy for the optimistic effects. Both adjusted estimates are considerably more conservative than the raw performance. This is not mathematically formalised, however, and requires further empirical and theoretical justification.
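The two estimates compared above can be reproduced with a few lines of arithmetic. The helper names are hypothetical; the inputs are the figures quoted in the text ($b = 0.80$, $P_s = 1.28$, $I_s = 0.70$, $S_s = 1.34$, $D_s = 0.283$).

```python
def lookahead_constant(b, parity_sharpe, index_sharpe):
    """Equation (3): c = b * (P_s - I_s)."""
    return b * (parity_sharpe - index_sharpe)


def adjusted_sharpe(strategy_sharpe, b, parity_sharpe, index_sharpe):
    """Equations (4) and (6): S_hat = S_s - b * (P_s - I_s)."""
    return strategy_sharpe - lookahead_constant(b, parity_sharpe, index_sharpe)


def neutral_proxy_sharpe(b, index_sharpe, neutral_sharpe):
    """Equation (7): S_hat is approximately b * I_s + D_s."""
    return b * index_sharpe + neutral_sharpe


# Figures quoted in the text.
b, P_s, I_s, S_s, D_s = 0.80, 1.28, 0.70, 1.34, 0.283
c = lookahead_constant(b, P_s, I_s)
print(round(c, 3), round(adjusted_sharpe(S_s, b, P_s, I_s), 3), round(neutral_proxy_sharpe(b, I_s, D_s), 3))
# prints: 0.464 0.876 0.843
```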

Figure 3: pnl, 485 stocks of sp500 traded with equal long-short

3) We propose applying Monte Carlo permutation tests to both the strategy's decision-making process and its data components. The basis for using Monte Carlo permutation methods in hypothesis testing is discussed in HangukQuant [1]. Besides being applicable to both directional and non-directional strategies, these tests allow p-values to be quantified with finer control over different skill factors, such as asset picking and asset timing. Updates on the open issue will be provided in future works.

4 Conclusion

We discussed new frameworks for adjusting for both known and unknown biases in quantitative testing without requiring more data, and encouraged better reporting standards in the quantitative literature. These improvements apply to both descriptive statistics and hypothesis testing on the efficacy of systematic trading rules.

References

[1] HangukQuant. Probabilistic Analysis of Trading Systems, Paper. https://hangukquant.substack.com/p/probabilistic-analysis-of-trading.

[2] HangukQuant. Statistical Suites with Russian Doll System (IMPORTANT!). https://hangukquant.substack.com/p/statistical-suites-with-russian-doll.
