
International Journal of Forecasting 32 (2016) 1193–1207


Equity premium prediction: Are economic and technical indicators unstable?
Fabian Baetje a, Lukas Menkhoff b,c,∗
a Department of Economics, Leibniz University Hannover, Königsworther Platz 1, D-30167 Hannover, Germany
b DIW Berlin (German Institute for Economic Research), 10108 Berlin, Germany
c Humboldt-University Berlin, Germany

Keywords: Equity premium predictability; Economic indicators; Technical indicators; Break tests

Abstract

We show that technical indicators deliver stable economic value in predicting the US equity premium over the out-of-sample period from 1966 to 2014. The results tentatively improve over time, and beat alternatives over a large continuum of sub-periods. In contrast, economic indicators work well only until the 1970s, but lose predictive power thereafter, even when considering the last crisis. Translating the predictive power of technical indicators into a standard investment strategy delivers an annualized average Sharpe ratio of 0.55 p.a. (after transaction costs) for investors who entered the market at any point in time.
© 2016 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

∗ Corresponding author at: DIW Berlin (German Institute for Economic Research), 10108 Berlin, Germany. Tel.: +49 0 30 89 789 435.
E-mail addresses: baetje@gif.uni-hannover.de (F. Baetje), lmenkhoff@diw.de (L. Menkhoff).
http://dx.doi.org/10.1016/j.ijforecast.2016.02.006

1. Introduction

There is a long-standing debate as to whether the equity premium is predictable or not. While predictability seemed to be largely accepted for some time (e.g., Campbell & Shiller, 1988a,b; Cochrane, 2008; Fama & French, 1988, 1989), Goyal and Welch (2008) presented strong evidence that challenged this view of predictability. They showed that the standard economic indicators that are used for predicting equity returns perform poorly over time, due at least partly to instability issues. In particular, a large share of the good forecasting performance arises from the period up to the early 1970s, but there is little evidence of predictability in later decades. This indicates that many earlier results in favor of predictability may be driven by the specific samples, rather than suggesting systematic return predictability. Thus, our prime goal is to examine the stability of the predictive performances of forecasting indicators over time.

There are two recent developments which provide further motivation for our analysis. First, economic indicators predict the equity premium quite well in crisis periods, which might lead to considerably improved forecasting results when the recent crisis of 2008/09 is included. Second, Neely, Rapach, Tu, and Zhou (2014) show that their universe of 14 technical trading rules is also able to predict the equity premium out-of-sample. The performances of these indicators are comparable to those of 14 conventional indicators which are based on economic reasoning, such as the "dividend-price ratio". Hence, expanding the sample period and the universe of predictors seems to allow for a more powerful test regarding the stability of return predictability than the earlier literature, such as Goyal and Welch (2008).

Based on these arguments, we examine the possible instability of economic and technical indicators for predicting the US equity premium thoroughly. While the two kinds of indicators provide similar predictive performances over
the full sample (Neely et al., 2014), their degrees of instability are completely different: our main finding is that only technical indicators provide stable economic value relative to the historical average (as a standard benchmark). Transforming this kind of equity premium prediction into a conventional investment strategy generates an annualized average Sharpe ratio of 0.55 p.a. (after transaction costs) for an investor who entered the market at any point in time from the mid-1960s onward. Reassuringly, this performance tentatively improves over time, thus standing in clear contrast to economic indicators which lost their predictive power after the 1970s, even when we consider the recent crisis period.

Our data and procedures follow the main studies in this field closely, in particular Goyal and Welch (2008) and Neely et al. (2014), in order to make our analysis directly comparable. We choose the same selection of 14 economic and 14 technical indicators for predicting the equity premium over the sample period from 1966 to the end of 2014. Based on this (extended) replication of the earlier result of Neely et al. (2014), which shows the forecasting power of economic and technical indicators over the full sample period, we implement various tests for uncovering potential forecasting instability over time.

We start by assessing the predictive performance over time, applying the Goyal and Welch (2003, 2008) approach. Specifically, the first prediction is made for January 1966, and the in-sample estimation period is extended one month at a time thereafter. We find that the predictive performance of the economic indicators does not seem to improve when the sample is extended by an additional nine years relative to that of Goyal and Welch (2008). We also find an unstable predictive performance when applying the same procedure to the set of technical indicators introduced by Neely et al. (2014); however, the performance has improved over recent decades. The obvious instability motivates us to examine the empirical relationship between the equity premium and the forecasting variables using conventional break tests. The results do show break points for the economic indicators (and hardly any for the technical indicators), but the empirical evidence of breakpoints is not consistent across various tests.

We propose to account for instability in the forecasting process (as indicated by the Goyal and Welch approach and break tests) by neglecting data from the distant past. Instead of using a fixed starting point and enlarging the sample period from there on (see Goyal & Welch, 2003, 2008), we use a fixed end-point and shorten the sample by shifting the initial estimation period successively through time, thus examining hundreds of overlapping sub-periods. This procedure introduces the idea of rolling windows but avoids the unreliable results of a standard rolling window approach. The results confirm that economic indicators do not have stable forecasting powers (Goyal & Welch, 2008), and especially not in recent periods. In contrast, we show that the technical indicators can forecast the equity premium more accurately through to the most recent decades.

Finally, we follow this rolling recursive approach to assess the stability of forecasts using utility-based metrics. More specifically, we consider a mean–variance investor who optimizes his risk-return profile depending on the predicted equity premium. The performance is determined via the certainty equivalent return and the Sharpe ratio, using various risk-aversion coefficients, transaction costs and constraints on portfolio weights. We find that technical indicators are able to beat alternative investment strategies in almost all relevant cases, especially during most sub-periods. In contrast, the declining predictive ability of economic indicators translates into a disappointing economic value for investment strategies.

Beyond its close relationship with the work of Goyal and Welch (2008) and Neely et al. (2014), our research belongs to four broader strands of the literature. We first refer to studies on the prediction of the equity premium using economic indicators, such as the dividend-price ratio, as surveyed by Rapach and Zhou (2013). These studies are reflected and refined in the collection of papers summarized by Spiegel (2008), including those by Campbell and Thompson (2008), Cochrane (2008), Goyal and Welch (2008) and Lettau and Van Nieuwerburgh (2008). More recently, Rapach, Strauss, and Zhou (2010) suggested that combination measures of economic indicators may lead to better forecasting results. Second, our study is inspired by earlier works showing the usefulness of certain technical indicators for predicting stock returns, such as Brock, Lakonishok, and LeBaron (1992), Brown, Goetzmann, and Kumar (1998), Lo, Mamaysky, and Wang (2000) and Neely et al. (2014). Third, our main concern here is not actually the predictability as such, but its potential instability. In a first step, we follow Paye and Timmermann (2006) and Rapach and Wohar (2006) in testing for the structural stability of predictive regression models. Our results provide evidence of structural instability, but not always at the same point in time. In order to take into account breaks of which the exact timing is unclear, we employ a recursive estimation setting based on rolling initialization periods. Fourth and finally, we evaluate the predictive performance in economic terms, which provides a direct linkage to its practical usefulness. In this respect, we follow Cenesizoglu and Timmermann (2012) and Rapach and Zhou (2013), among others. Again, our main objective is not to determine the economic value of certain strategies over the full sample, but to analyze the instability of economic values over many sub-periods.

This paper is organized in five more sections. Section 2 discusses the approach and the data. Section 3 contains our examination of out-of-sample equity premium prediction, focusing on the analysis of instability. The results are assessed in Section 4, where we analyze the economic value from the preceding section. Further robustness tests are sketched in Section 5, and Section 6 concludes.

2. Approach and data

This section provides background information for the research presented in later sections. It describes the forecasting approach (Section 2.1) and the data being used (Section 2.2).

2.1. Forecasting approach

Our empirical application is based on the typical specification for equity premium prediction, i.e.

$$r_{t+1} = \alpha_i + \beta_i x_{i,t} + \varepsilon_{i,t+1}, \tag{1}$$

where $r_{t+1}$ is the equity premium at time $t+1$ and $x_{i,t}$ is the one-month lagged predictive variable, stemming from a broad set of economic variables and technical trading rules, indexed by $i$. $\varepsilon_{i,t+1}$ denotes the corresponding equity premium innovation. In addition, we also make use of forecasting strategies which should yield superior predictive performances by addressing concerns of in-sample overfitting, model uncertainty and parameter instability (summarized by Rapach & Zhou, 2013). Specifically, we use forecasting strategies that incorporate information from our full set of predictor variables that stem from economic variables, technical indicators or both. We follow Neely et al. (2014) in this respect and estimate latent factor structure models, as proposed by Stock and Watson (2002a,b). Regarding the number of principal components used in the predictive setting, we employ the Schwarz information criterion (SIC), and assume a maximum number of three common components based on the set of 14 economic variables and technical indicators, and four based on the full set of 28 predictors. The results on alternative strategies are presented in the robustness section.

Given the empirical finding that the out-of-sample evidence of equity premium prediction is worse than for in-sample prediction (see for example Bossaerts & Hillion, 1999; Goyal & Welch, 2003, 2008), our application is based solely on ex-ante identification (see Campbell, 2008). Therefore, we are interested in whether predictor variables deliver equity premium forecasts in a real-time setting, and, more precisely, whether they outperform the historical average that is used commonly as a benchmark specification. To address the out-of-sample aspect of our analysis, the predictive regression in Eq. (1) is converted into a real-time setting, where we split the total sample into an initialization period $[1 : s-1]$ and an out-of-sample evaluation period $[s : T]$. More specifically, one-step-ahead forecasts are obtained by recursive estimates.

The out-of-sample forecast accuracy is then assessed using the $R^2_{OS}$ evaluation statistic suggested by Campbell and Thompson (2008):

$$R^2_{OS} = 1 - \frac{\sum_{t=s}^{T} (r_t - \hat{r}_t)^2}{\sum_{t=s}^{T} (r_t - \bar{r}_t)^2}, \tag{2}$$

where $\{\hat{r}_t\}_{t=s}^{T}$ represents the out-of-sample forecasts based on the predictive variables and $\{\bar{r}_t\}_{t=s}^{T}$ are the forecasts using the historical average instead. Moreover, when examining whether the predictors contain significant information above and beyond the historical average, we make use of the MSFE-adjusted test statistic proposed by Clark and West (2007) to compare the forecast accuracy of nested models. Specifically, the performance comparison is based on the null hypothesis of equal or lower mean squared forecasting errors under the benchmark specification against the one-sided alternative of lower mean squared forecasting errors when using the predictive variable under analysis. Statistical inference is then assessed via an upper tail test, corresponding to t-statistics obtained by regressing $\{d_t\}_{t=s}^{T}$ on a constant.

$$d_t = (r_t - \bar{r}_t)^2 - \left[ (r_t - \hat{r}_{t,i})^2 - (\bar{r}_t - \hat{r}_{t,i})^2 \right] \quad \text{for } t = s, \ldots, T. \tag{3}$$

During the course of the examination, we allow for the use of various different specifications, to ensure that the empirical results are stable and economically important.

2.2. Data description

Our sample consists of monthly observations from December 1950 to December 2014, giving a total of 769 observations, which should be long enough for our objective of stability screening. The dataset and the sample size are those of Neely et al. (2014), but updated by three additional years. Our application is based on forecasting the monthly US equity premium, which is defined as the difference between the continuously compounded log return on the S&P 500 (including dividends) and the log return on a risk-free bill. We make use of 14 economic variables that have been used prevalently in the empirical literature, and also focus on 14 predictive variables that stem from the category of technical trading rules, for comparison purposes. A detailed description of the variables is given in Appendix 1.

2.2.1. Economic indicators

The set of 14 economic predictor variables is a representative outline of variables that are used commonly to predict the equity return (see for example Goyal & Welch, 2008; Rapach et al., 2010). These variables comprise information about both stock characteristics: the (log) dividend-price ratio (DP); the (log) dividend yield (DY); the (log) earnings-price ratio (EP); the (log) dividend-payout ratio (DE); the equity risk premium volatility (RVOL); the book-to-market ratio (BM); and net equity expansion (NTIS); and interest related information: the Treasury bill rate (TBL); the long-term yield (LTY); the long-term return (LTR); the term spread (TMS); the default yield spread (DFY); the default return spread (DFR) and inflation (INFL).1

1 We follow Neely et al. (2014) by using a slightly different volatility measure that was proposed by Mele (2007) and attenuates the outlier problem in October 1987. Because inflation information is released with a one-month delay, we follow Goyal and Welch (2008) by inserting one additional month of waiting.

2.2.2. Technical indicators

Following Neely et al. (2014), the full set of 14 technical indicators is based on three kinds of popular technical trading strategies. At the end of each period, i.e., each month in our setting, each of these indicators provides a buy (sell) signal based on recent price movements. We
generate six technical trading strategies based on moving-average rules, which compare short-term (1, 2, 3 months) and long-term (9, 12 months) moving averages in order to detect changes in stock price trends. In addition, we also obtain two technical trading strategies by comparing current and past stock prices, i.e., momentum rules. If the current price level exceeds the level 6 or 12 months earlier, then the trading rule generates a buy signal, i.e., a trend-following perspective.2 The third category is based on volume rules. These six technical trading indicators relate the volume to price changes (short-term = 1, 2, 3 months; long-term = 9, 12 months) so as to detect strong price trend movements, as was proposed by Granville (1963). The importance of volume comes from the interpretation that price movements that are confirmed by high trading volumes generate more serious signals of stock price trends.

2 Due to a referee request, we apply the popular six-month momentum rule here, whereas Neely et al. (2014) use a nine-month momentum rule. However, the differences between the two are minor and do not affect our findings.

2.2.3. Descriptive statistics

Descriptive statistics for the US equity premium and predictor variables are reported in Table 1. The equity premium provides a return of 0.52% per month on average, with a monthly standard deviation of 4.20%, leading to an annualized Sharpe ratio of 0.43. The summary statistics on the technical indicators show a sample mean in the range of 0.68 to 0.73, which involves buy signals for at least two-thirds of the whole sample range. The first order autocorrelation coefficients for the technical indicators are highly statistically significant and in the range of 0.60 to 0.83. This tentatively supports the underlying assumption of the technical analysis that past price trends persist into the future.

Economic predictors, on the other hand, confirm the previous findings of a highly statistically significant persistency near the unit root for almost all variables. With the exception of the long-term return (LTR), the default return spread (DFR) and the inflation rate (INFL), all economic variables are highly autocorrelated, with first order autocorrelation coefficients near one. The second to third order autocorrelation coefficients illustrate that the persistent behavior of economic variables decays more slowly over time than that of technical indicators.

Table 1
Summary statistics.

Variable      Mean   Std.  Skew.   Kurt.  AC(1)  AC(2)  AC(3)  Sharpe ratio
r_t           0.52   4.20  −0.67    5.42   0.06  −0.03   0.04  0.43

Economic variables
DP           −3.51   0.42  −0.31    2.47   0.99   0.98   0.97
DY           −3.50   0.42  −0.31    2.49   0.99   0.98   0.97
EP           −2.78   0.43  −0.85    6.09   0.99   0.97   0.94
DE           −0.73   0.30   2.54   18.06   0.99   0.95   0.90
RVOL          0.14   0.05   0.81    3.88   0.96   0.92   0.88
BM            0.53   0.25   0.52    2.60   0.99   0.99   0.98
NTIS          0.01   0.02  −1.08    4.46   0.98   0.95   0.92
TBL           4.46   3.05   0.88    4.20   0.99   0.97   0.95
LTY           6.15   2.72   0.83    3.22   0.99   0.98   0.98
LTR           0.55   2.75   0.51    6.33   0.04  −0.07  −0.02
TMS           1.69   1.42  −0.11    2.81   0.96   0.91   0.86
DFY           0.96   0.45   1.81    7.54   0.97   0.92   0.88
DFR           0.02   1.38  −0.34   10.00  −0.09  −0.06  −0.02
INFL          0.30   0.33   0.55    7.29   0.61   0.47   0.38

Technical indicators
MA(1,9)       0.69   0.46  −0.82    1.68   0.70   0.55   0.43
MA(1,12)      0.72   0.45  −0.96    1.92   0.78   0.65   0.53
MA(2,9)       0.70   0.46  −0.85    1.72   0.77   0.60   0.47
MA(2,12)      0.72   0.45  −0.95    1.91   0.83   0.69   0.56
MA(3,9)       0.70   0.46  −0.88    1.77   0.79   0.62   0.48
MA(3,12)      0.72   0.45  −0.98    1.95   0.83   0.68   0.57
MOM(6)        0.69   0.46  −0.82    1.67   0.69   0.55   0.44
MOM(12)       0.73   0.44  −1.05    2.10   0.81   0.72   0.64
VOL(1,9)      0.68   0.47  −0.77    1.60   0.60   0.54   0.42
VOL(1,12)     0.71   0.46  −0.90    1.82   0.70   0.64   0.50
VOL(2,9)      0.68   0.47  −0.75    1.57   0.76   0.56   0.46
VOL(2,12)     0.70   0.46  −0.88    1.77   0.82   0.65   0.56
VOL(3,9)      0.69   0.46  −0.84    1.70   0.76   0.58   0.45
VOL(3,12)     0.70   0.46  −0.88    1.78   0.83   0.70   0.58

Notes: The table reports summary statistics, including the mean, standard deviation (Std.), skewness (Skew.) and kurtosis (Kurt.), of the monthly log equity premium (in percentages) and predictor variables that stem from economic and technical indicators. We also report the first to third-order autocorrelation coefficients AC(.) and the annualized Sharpe ratio for the log equity premium. The sample period is December 1950 to December 2014. A full description of the data is given in the data appendix.

3. Out-of-sample equity premium prediction

This section presents our prediction results in four steps. We start by replicating earlier exercises for a somewhat longer period (Section 3.1). Next, we apply
the Goyal and Welch (2003, 2008) stability procedure to the economic and technical indicators (Section 3.2) and analyze these time series using conventional break tests (Section 3.3). Finally, we apply a rolling-recursive estimation approach to measure the performance stability over time (Section 3.4).

3.1. Out-of-sample prediction results

As the first step in our empirical analysis, we document the forecasting results of the 14 economic and 14 technical indicators when applying a standard recursive setting. This allows comparisons with earlier studies, and in particular with that of Neely et al. (2014), which covers a somewhat shorter period, from January 1966 to December 2011. In line with the literature, we start with a 15-year in-sample period and produce a forecast for January 1966, then increase the in-sample period by one month and make forecasts for February 1966, etc. At the end, we evaluate the average performance over these hundreds of forecasts. We find that adding three additional years of observations does not change the results qualitatively.

While detailed results are reported in Table A.I in the Appendix, we refer only to information from the full set of economic and technical indicators, by forming principal components. As expected, the $R^2_{OS}$ for the economic variables is negative and that for the technical indicators is positive. Nevertheless, the p-values for the MSFE-adjusted test statistic are below 0.05 for the economic variables, whereas the technical indicators only outperform the historical average at the 10% level.3 The principal components based on both economic and technical indicators (Panel C) indicate highly statistically significantly better performances at the 1% level, with a $R^2_{OS}$ value of 1.30% for the full sample. Overall, this supports the notion that technical indicators contain information above and beyond that in economic variables over the business cycle, as was shown by Neely et al. (2014).

3 Clark and West (2007) mentioned that the null hypothesis can be rejected even if we observe a negative $R^2_{OS}$, due to the adjustment term which accounts for a potential upward bias in the MSFE, produced by parameter estimates that are zero under the null.

Finally, the majority of the forecasting power of nearly all of the predictor variables is in recession periods, which is in line with the findings reported by Cochrane (1999, 2007) and Fama and French (1989), and highlighted by Henkel, Martin, and Nardari (2011), among others.

3.2. Dynamic out-of-sample prediction performance

As was mentioned by Timmermann (2008), "Most of the time the forecasting models perform rather poorly, but there is evidence of relatively short-lived periods with modest return predictability", which might lead to a positive $R^2_{OS}$ over the full sample period. This is in line with the findings of Goyal and Welch (2008), who show that the predictive ability of economic variables increases sharply during the oil price shock recession in the 1970s, but that the same models perform poorly if these unusual years are excluded from the sample.

To examine whether the forecast performance over the full sample (as documented in Table A.I) may benefit from short-lived periods, this section follows Goyal and Welch (2003, 2008), who propose focusing on the cumulative sum of differences in the squared forecast errors under the benchmark specification and the squared forecast errors based on predictive variables (CDSFE):

$$CDSFE(t, i) = \sum_{t=s}^{T} \left( (r_t - \bar{r}_t)^2 - (r_t - \hat{r}_{t,i})^2 \right). \tag{4}$$

To save space, Fig. 1 shows the out-of-sample performances of the principal component indicators, relative to the benchmark, at each point in time. First, values above zero indicate that the predictive model has a positive performance up to the point in time that is being considered. Second, an increasing process makes a positive contribution, whereas a declining line implies that the predictive performance is negative in the period under consideration. The three panels show the predictive performances of three principal components, representing economic indicators, technical indicators and all indicators (graphs for all 28 single indicators are available on request).

Overall, we confirm earlier findings. (i) We show that no prediction model outperforms the historical average consistently over time, i.e., there are no persistently upward sloping curves. (ii) Local predictability is concentrated in recessions rather than expansions. (iii) The indicator $PC_{Econ}$ (see Panel A) provides some outperformance through the first half of the sample, with a sharp improvement in predictive performance during the recessions of the 1970s (as was mentioned by Goyal & Welch, 2008) and 1980s.4 (iv) None of the 14 individual economic indicators performs considerably better than $PC_{Econ}$. (v) The performances of the principal component predictive regressions based on technical indicators, $PC_{Tech}$ (see Panel B), are never much worse than that of the benchmark over longer periods, i.e., there are only small negative values, and the long-term trend is rather upwards than downwards. (vi) Looking at the 14 technical indicators individually largely confirms these findings. (vii) Finally, Panel C shows the forecasting performance when combining the information from economic and technical indicators ($PC_{All}$). Overall, the pattern follows $PC_{Econ}$, but it is moderated by the influence of $PC_{Tech}$.

4 This finding is in line with the strong deterioration in the predictive performance of dividend-price ratios since the mid 90s that was shown by Ang and Bekaert (2007), Goyal and Welch (2003) and Lettau and Ludvigson (2001), which resulted from a sharp increase in their persistency.

Given this strongly time-dependent predictive ability, further analysis seems warranted, to analyze whether the predictability is driven solely by specific samples or whether the predictor variables show a systematic relationship. These aspects are analyzed in Sections 3.3 and 3.4.

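For reference, the three rule families described in Section 2.2.2 (moving-average, momentum and volume rules) can be written as 0/1 buy signals, which is consistent with the sample means between 0.68 and 0.73 reported in Table 1. The sketch below is an illustration only: the function names are hypothetical, the weak inequality (`>=`) and the on-balance-volume construction for the volume rule follow the usual formulation in the technical-analysis literature and are assumptions here, since the text does not spell them out.

```python
import numpy as np


def moving_average(series, window):
    """Trailing moving average; entries before a full window are NaN."""
    series = np.asarray(series, dtype=float)
    out = np.full(len(series), np.nan)
    csum = np.cumsum(np.insert(series, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


def ma_signal(price, short, long):
    """MA(short, long): buy (1) if the short moving average is at or above the long one."""
    ma_s = moving_average(price, short)
    ma_l = moving_average(price, long)
    with np.errstate(invalid="ignore"):
        signal = (ma_s >= ma_l).astype(float)
    signal[np.isnan(ma_l)] = np.nan  # no signal until the long window is full
    return signal


def mom_signal(price, lag):
    """MOM(lag): buy (1) if the current price is at or above the price `lag` months ago."""
    price = np.asarray(price, dtype=float)
    signal = np.full(len(price), np.nan)
    signal[lag:] = (price[lag:] >= price[:-lag]).astype(float)
    return signal


def vol_signal(price, volume, short, long):
    """VOL(short, long): MA rule applied to on-balance volume (assumed construction)."""
    price = np.asarray(price, dtype=float)
    volume = np.asarray(volume, dtype=float)
    # Volume is added in months with a non-negative price change, subtracted otherwise.
    direction = np.where(np.diff(price, prepend=price[0]) >= 0, 1.0, -1.0)
    obv = np.cumsum(volume * direction)
    return ma_signal(obv, short, long)
```

Calling `ma_signal(price, s, l)` and `vol_signal(price, volume, s, l)` for `s` in (1, 2, 3) and `l` in (9, 12), together with `mom_signal(price, 6)` and `mom_signal(price, 12)`, yields 14 signal series, matching the count of technical indicators in Table 1.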
Fig. 1. Dynamic predictive performance at any point of time. Notes: The graphs plot the dynamic out-of-sample predictive performances of forecasts based on principal components from the full set of macroeconomic variables ($PC_{Econ}$) or technical indicators ($PC_{Tech}$), or from taking both predictor groups into account simultaneously ($PC_{All}$). Following Goyal and Welch (2003, 2008), the graphs show the cumulative sum of differences in the squared prediction errors under the benchmark specification, and the squared prediction errors based on information regarding predictive variables: $CDSFE(t, i) = \sum_{t=s}^{T} ((r_t - \bar{r}_t)^2 - (r_t - \hat{r}_{t,i})^2)$, where $(r_t - \bar{r}_t)^2$ are squared prediction errors of the historical average and $(r_t - \hat{r}_{t,i})^2$ are the forecasting errors of the model $i$ named in the headings. The shaded areas correspond to NBER-dated recessions. Overall, upward-sloping curves indicate that lower MSPEs are achieved by making use of predictor variables.
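The curves described in the caption can be reproduced from any pair of forecast series with a running sum. The sketch below is a hedged illustration: `cdsfe` is a hypothetical helper name, and it reads the statistic as the sum accumulated from the first out-of-sample date up to each date, which is how a curve "at each point in time" is obtained.

```python
import numpy as np


def cdsfe(r, r_hat, r_bar):
    """Running sum of squared benchmark errors minus squared model errors.

    r     : realized equity premium over the out-of-sample period
    r_hat : forecasts from the predictive model
    r_bar : historical-average (benchmark) forecasts
    """
    r, r_hat, r_bar = (np.asarray(a, dtype=float) for a in (r, r_hat, r_bar))
    return np.cumsum((r - r_bar) ** 2 - (r - r_hat) ** 2)
```

The last element of the returned array has the same sign as the full-sample out-of-sample R-squared, while the month-to-month increments show when the model gained or lost ground relative to the benchmark.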

3.3. Structural stability tests

Early evidence of instability in predictive performances, using valuation ratios (see Ang & Bekaert, 2007; Goyal & Welch, 2003 and Lettau & Ludvigson, 2001, for example), has recently been linked to the presence of occasional break dates. However, the possibility of occasional changes does not seem to be restricted to economic variables: Park and Irwin (2007) mention that technical trading strategies are also subject to substantial changes, and their profitability tended to vanish after the late 1990s.

In Section 3.2 above, we related the equity premium to predictor variables in a recursive estimation setting. This achieves the most efficient coefficient estimates by incorporating more information as it becomes available. Nevertheless, these out-of-sample forecasts are based on the assumption that the underlying relationship is constant or sparsely time-varying. However, the recent literature (e.g., Lettau & Van Nieuwerburgh, 2008; Pesaran & Timmermann, 2002; Pettenuzzo & Timmermann, 2011; Rapach et al., 2010) has highlighted the effects of model and parameter instability, due to occasional structural breaks. Such breaks might also explain weak out-of-sample results compared to their in-sample counterparts (see Clark & McCracken, 2005).5 Paye and Timmermann (2006) and Rapach and Wohar (2006) provide evidence of the presence of structural breaks in the 1990s, and highlight the fact that the relationship between the equity premium and the dividend-price ratio decreased substantially after 1990. Interest rate related variables, like the term spread, offer breakpoints in the 1970s. Accordingly, ignoring the presence of possible breaks would lead to biased estimates, and thus to a failure to predict the equity premium out-of-sample.

5 Pesaran and Timmermann (2002, 2004) show that the usage of pre-break data can improve the stock return predictability in the presence of structural breaks.

Postulating one breakpoint up to time $T$, the data generating process is of the following form

$$\begin{aligned} r_{t+1} &= \alpha_{1,i} + \beta_{1,i} x_{i,t} + \varepsilon_{i,t+1}, & t &= 1, \ldots, k_1, \\ r_{t+1} &= \alpha_{2,i} + \beta_{2,i} x_{i,t} + \varepsilon_{i,t+1}, & t &= k_1, \ldots, T-1. \end{aligned} \tag{5}$$

To examine whether there are structural breaks in the equity premium prediction regressions, we run three kinds of empirical break tests, following Paye and Timmermann (2006) and Rapach and Wohar (2006) in this respect. (1) Using in-sample predictive regressions, we employ the Andrews (1993) SupF statistic, testing the null hypothesis of no structural break against the alternative of occasional changes at unknown dates. We impose a 15% trimming percentage when determining the minimum window length between breaks.6 (2) Allowing for multiple breaks, we employ the Bai and Perron (1998) UDmax and WDmax (5%) statistics for testing the null hypothesis of no structural breaks against the alternative of multiple breaks, with a maximum of five occasional changes. Bai (1997) and Bai and Perron (2006) mention that the UDmax and WDmax statistics can be more powerful than the Andrews SupF test in the case of multiple breaks. (3) Finally, we make use of Elliott and Müller's (2006) $\widehat{qLL}$ test, which has good power and size properties even under heteroskedastic settings.

The results shown in Table 2 do not provide consistent evidence of structural instability. While the empirical evidence for technical indicators is quite clear (nearly all tests fail to reject the null hypothesis of no structural break), predictive regressions using economic indicators seem to be affected by breaks more intensively. Nevertheless, the findings are mixed and depend strongly on the selected break-test. Thus, while the previous evidence of structural instability cannot be confirmed, it is not obvious whether or when the predictive performance might offer major in-

the empirical relationship between the equity premium and the predictor variables, it is less clear whether the predictive ability is stable or exists only at specific points in time (e.g., at the beginning or at the end of the sample). Naturally, rolling window regressions might be suitable for accounting for such shifts; however, this approach has several disadvantages. Concerning the bias-efficiency trade-off, while rolling window regressions might reduce the potential estimation bias, this approach does suffer from an increase in the estimation uncertainty (see Pesaran & Timmermann, 2007). In addition, breaks seem to be frequent, meaning that, in order to account for this fact, the initialization period should be comparably short, which is opposite to the requirements of the precise identification of common components.

We therefore account for these effects by using a rolling-recursive estimation setting, where we allow the in-sample estimation period (15 years) to vary over time. In our case, we shift the starting point of the out-of-sample period continuously forward by one month. Such a procedure is equal to different subsample analyses without the sample start being chosen arbitrarily. In addition, we are able to examine whether the sample under analysis is responsible for the out-of-sample predictability results obtained, or whether the predictive ability remains even under more recent subsamples, i.e., forecasting stability over time.

Specifically, Fig. 2 shows the time-varying process of the $R^2_{OS}$ by starting with an estimation window over the evaluation period 1966:01–2014:12. Thus, the first points of the three strategies, shown in Panels A to C, are exactly the $R^2_{OS}$ values mentioned in Section 3.1, for example 1.30% for $PC_{All}$. Next, we examine the out-of-sample predictability over the sample 1966:02–2014:12 using an initialization period from 1951:01 to 1966:01, and so on. Thus, the black line in Panel C shows, month by month, the average forecasting performance (measured by $R^2_{OS}$) of the $PC_{All}$ strategy from that month on until 2014:12. To account for the problems that can arise from the use of
short out-of-sample evaluation periods, our analysis ends
stability. Therefore, we attempt to highlight possible insta-
with the evaluation period 1995:01–2014:12, i.e., covering
bility by accounting for possible breaks in a more dynamic
at least 20 years.7
estimation setting in the following section.
Concerning our subsample stability analysis, Fig. 2
shows that there are large differences in the forecast
3.4. Performance stability in a rolling-recursive setting performances of the economic and technical indicators
over time. The time-varying R2OS values of the economic
Motivated by the concerns of Clark and McCracken predictor variables (represented by principal component
(2005) and Pesaran and Timmermann (2007) as to possible predictive regressions) do not outperform the benchmark
distortions of the earlier approaches, we apply a rolling- model consistently. In fact, the opposite is the case,
recursive setting here. This is new in the literature on i.e., most of the time findings reveal higher prediction
equity premium prediction, and complements the other errors (negative R2OS ) than those of forecasts made by
approaches. the historical average. Remarkably, some of the economic
The findings presented so far have been based on predictor variables never exceed the zero line. In contrast,
recursive estimates over the full sample range, which the technical indicators seem to be much more robust
might benefit strongly from the specific sample period predictors over time, though at a low level of predictability.
under analysis (see Clark & McCracken, 2005). Moreover,
as there is no distinct evidence of structural breaks in
7 As was mentioned by Inoue and Kilian (2004) and Hansen and
Timmermann (2012), out-of-sample forecast evaluation results have
6 Given general nonstationarities in the regressors, the statistical reduced power when using short sample periods. Thus, our shortest
inference is based on the Hansen (2000) heteroskedastic fixed-regressor evaluation period covers at least 240 months, which should avoid such
bootstrap, which has better size properties in finite samples. problems.
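The rolling-recursive scheme of Section 3.4 can be illustrated with a minimal sketch. It assumes a single predictor, an OLS predictive regression estimated on an expanding window that begins `init` months before each evaluation start, and the historical average over the same window as the benchmark. The function names, the default of 180 initialization months and the 240-month minimum evaluation length are illustrative assumptions, not the authors' code.

```python
import numpy as np

def r2_os(r, x, start, init=180):
    """Out-of-sample R^2 (in percent) for forecasts of r[s], s >= start.

    r[s] is the equity premium in month s and x[s] the predictor in month s.
    The estimation window begins at start - init and expands recursively.
    """
    lo = start - init
    sq_model, sq_bench = 0.0, 0.0
    for s in range(start, len(r)):
        y = r[lo + 1:s]              # premia observed up to month s-1
        z = x[lo:s - 1]              # predictor lagged by one month
        b, a = np.polyfit(z, y, 1)   # OLS slope and intercept
        sq_model += (r[s] - (a + b * x[s - 1])) ** 2
        sq_bench += (r[s] - y.mean()) ** 2   # historical-average benchmark
    return 100.0 * (1.0 - sq_model / sq_bench)

def rolling_recursive_r2(r, x, init=180, min_eval=240):
    """R^2_OS for every evaluation start, shifting the start by one month."""
    starts = range(init, len(r) - min_eval + 1)
    return np.array([r2_os(r, x, s, init) for s in starts])
```

Each element of the returned array corresponds to one point on a line of the kind plotted in Fig. 2: the first uses the earliest evaluation start, and the last leaves exactly `min_eval` months for evaluation.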

Table 2
Equity premium predictive regression and structural break tests.

Panel A: Bivariate predictive regression forecasts

| Predictor | SupF | Breakpoint | UDmax | WDmax (5%) | qLL | Predictor | SupF | Breakpoint | UDmax | WDmax (5%) | qLL |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DP | 10.00 | 1994:10 | 9.72 | 13.38∗∗∗∗ | −10.95 | MA(1,9) | 7.75 | 2000:08 | 9.36 | 11.32 | −10.42 |
| DY | 11.57 | 1994:10 | 11.21∗ | 14.03∗∗∗∗ | −11.23 | MA(1,12) | 3.93 | 1998:09 | 6.88 | 8.72 | −7.53 |
| EP | 3.76 | 1982:06 | 11.85∗∗ | 15.00∗∗∗∗ | −10.33 | MA(2,9) | 4.30 | 2000:09 | 6.59 | 10.00 | −7.08 |
| DE | 14.47∗∗ | 1974:08 | 18.35∗∗∗ | 18.35∗∗∗∗ | −7.59 | MA(2,12) | 3.36 | 1961:10 | 4.57 | 6.85 | −6.79 |
| RVOL | 4.89 | 1961:10 | 7.01 | 8.25 | −11.06 | MA(3,9) | 3.24 | 1961:10 | 5.84 | 8.11 | −7.49 |
| BM | 6.58 | 1969:03 | 10.64∗ | 12.87∗∗∗∗ | −9.83 | MA(3,12) | 3.56 | 1961:10 | 5.36 | 7.88 | −7.88 |
| NTIS | 9.89 | 2003:02 | 10.86∗ | 13.13∗∗∗∗ | −16.66∗∗ | MOM(6) | 6.98 | 2000:01 | 10.28∗ | 12.23 | −9.69 |
| TBL | 19.07∗∗∗ | 1974:08 | 23.74∗∗∗ | 23.74∗∗∗∗ | −10.16 | MOM(12) | 3.83 | 1961:10 | 5.88 | 8.06 | −7.40 |
| LTY | 16.33∗∗ | 1974:08 | 20.53∗∗∗ | 20.53∗∗∗∗ | −8.89 | VOL(1,9) | 4.25 | 1961:10 | 5.77 | 7.93 | −6.40 |
| LTR | 4.84 | 1961:10 | 6.71 | 7.21 | −8.67 | VOL(1,12) | 4.95 | 1964:06 | 6.75 | 9.27 | −7.35 |
| TMS | 12.15∗∗ | 1975:05 | 14.13∗∗ | 14.13∗∗∗∗ | −11.94 | VOL(2,9) | 8.42 | 1969:04 | 10.51∗ | 10.79 | −8.70 |
| DFY | 4.61 | 1961:10 | 11.19∗ | 13.17∗∗∗∗ | −13.49∗ | VOL(2,12) | 7.82 | 1965:09 | 10.15 | 11.35 | −9.38 |
| DFR | 9.42 | 1973:01 | 11.22∗ | 11.22 | −8.77 | VOL(3,9) | 4.66 | 2000:01 | 10.83∗ | 13.88∗∗∗∗ | −8.84 |
| INFL | 7.94 | 1974:08 | 9.88 | 11.62 | −9.42 | VOL(3,12) | 5.01 | 2000:01 | 7.79 | 10.48 | −8.52 |

Panel B: Principal component predictive regression forecasts

| Predictor | SupF | Breakpoint | UDmax | WDmax (5%) | qLL | Predictor | SupF | Breakpoint | UDmax | WDmax (5%) | qLL |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PCEcon (1st) | 6.97 | 1982:06 | 10.66∗ | 12.54 | −9.61 | PCTech (1st) | 4.30 | 2000:02 | 7.59 | 9.75 | −7.15 |
| PCEcon (1st–2nd) | 10.28 | 1994:11 | 14.50∗∗ | 17.68∗∗∗∗ | −19.93∗∗ | PCTech (1st–2nd) | 6.54 | 1965:03 | 8.81 | 11.85 | −12.41 |
| PCEcon (1st–3rd) | 31.10∗∗∗ | 1994:11 | 30.07∗∗∗ | 30.07∗∗∗∗ | −21.44 | PCTech (1st–3rd) | 7.09 | 1965:03 | 9.30 | 12.37 | −15.17 |

Panel C: Principal component predictive regression forecasts, all predictors

| Predictor | SupF | Breakpoint | UDmax | WDmax (5%) | qLL |
|---|---|---|---|---|---|
| PCAll (1st) | 4.52 | 2000:02 | 7.27 | 9.57 | −6.85 |
| PCAll (1st–2nd) | 5.45 | 1994:10 | 9.06 | 12.87 | −10.00 |
| PCAll (1st–3rd) | 11.50 | 1994:11 | 14.62∗∗ | 19.31∗∗∗∗ | −18.71 |
| PCAll (1st–4th) | 21.86∗ | 1994:11 | 21.33∗∗∗ | 21.33∗∗∗∗ | −18.37 |

Notes: This table reports several test statistics for analyzing whether an occasional change exists over the sample period 1950:12–2014:12. We employ the Andrews (1993) SupF statistic, and estimated breakpoints with stars refer to significance levels of 10% (*), 5% (**), and 1% (***), based on the heteroskedastic fixed-regressor bootstrap proposed by Hansen (2000). The SupF statistic tests the null hypothesis of no structural break against the one-sided alternative that a structural change exists. We also account for multiple breaks (following Bai & Perron, 1998) by testing the null hypothesis of zero breaks against the alternative of up to five breaks. The 10%, 5%, and 1% critical values are equal to 10.16, 11.70 and 15.41 for UDmax (stars refer to significance at the corresponding levels) and the 5% critical value for WDmax equals 12.81; significance is indicated by ∗∗∗∗. In addition, qLL indicates the test statistic proposed by Elliott and Müller (2006), with stars referring to significance levels of 10% (*), 5% (**), and 1% (***).
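To make the single-break test concrete, the following is a minimal sketch of a sup-F search over candidate break dates for the two-regime regression in Eq. (5). It assumes a homoskedastic Wald-type statistic and 15% trimming at both ends of the sample; it does not implement the Hansen (2000) fixed-regressor bootstrap used for inference in Table 2, so it yields a statistic and a break date but no p-value. The function names are hypothetical.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared OLS residuals of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def sup_f(y, x, trim=0.15):
    """Largest Wald-type break statistic and the break index attaining it.

    y[t] is the premium to be predicted and x[t] the already lagged predictor.
    A break at index k means observations [0, k) and [k, n) have their own
    intercept and slope.
    """
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    ssr_restricted = ssr(y, X)                 # no break: one regime
    lo = int(np.ceil(trim * n))
    hi = int(np.floor((1.0 - trim) * n))
    best_f, best_k = -np.inf, None
    for k in range(lo, hi + 1):
        ssr_unrestricted = ssr(y[:k], X[:k]) + ssr(y[k:], X[k:])
        # n - 4 degrees of freedom: two coefficients in each regime
        f = (ssr_restricted - ssr_unrestricted) / (ssr_unrestricted / (n - 4))
        if f > best_f:
            best_f, best_k = f, k
    return best_f, best_k
```

Under these assumptions the returned index plays the role of the "Breakpoint" column in Table 2, although the reported values need not coincide with this simplified statistic.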

While most of the technical indicators exhibit a substantial decline in the $R^2_{OS}$ for the out-of-sample evaluation period in the 1990s (as was mentioned by Park & Irwin, 2007), the predictive performance subsequently returns to its previous level. This relative forecasting stability of the technical indicators is transferred to forecasting strategies that take both economic variables and technical indicators into account.

The figure also illustrates the time-varying predictability during periods of recession and expansion. In line with the analyses presented earlier, the predictive ability of indicators consistently exhibits prediction errors that are higher than the historical average in expansions, but profits from recession phases.

Overall, our analysis illustrates that, in contrast to the literature's focus on economic variables (motivated by Cochrane, 1999, 2007), technical indicators clearly exhibit a greater stability over time.

4. Economic value of equity premium prediction

The quality of equity premium prediction is often assessed based on the returns generated by forecasting strategies, as we do in Section 3 above. However, the high instability demonstrated provides a strong motivation for examining the economic value of such strategies. Thus, Section 4.1 introduces asset allocation decisions for measuring the economic value of equity premium forecasts, Section 4.2 applies them to our data, and Section 4.3 examines their temporal stability.

4.1. Asset allocation

Statistical measures of forecast ability are informative but not necessarily decisive for investment and asset allocation decisions. Cenesizoglu and Timmermann (2012) show that statistical and economic measures of forecasting performance are only weakly positively correlated. Accordingly, low or even negative $R^2_{OS}$ values, such as those documented in Section 3, may still provide economic value.

In examining the economic value of forecasting indicators, our method follows those of Campbell and Thompson (2008), Marquering and Verbeek (2004) and Neely et al. (2014), in order to keep our results comparable with theirs. We consider an investor who composes his portfolio optimally by allocating across assets, in our case the equity premium, and a risk-free asset according to

$$
r_{p,s} = w_{s-1} r_s + r_{f,s}, \tag{6}
$$

where $r_{p,s}$ represents the portfolio return at the end of period s, determined by allocating a share of $w_{s-1}$ to the risky asset and $1 - w_{s-1}$ to the risk-free bill. For the sake of simplicity, we conduct the asset allocation exercises using simple (instead of log) returns. We postulate a mean–variance utility function of the following form:

$$
U(r_{p,s}) = E_{s-1}(r_{p,s}) - \frac{1}{2} \gamma \, Var_{s-1}(r_{p,s}), \tag{7}
$$

where γ indicates an investor's degree of relative risk-aversion. Maximizing the utility function with respect to

Fig. 2. Time-varying predictive performances. Notes: These figures show the time-varying out-of-sample predictive performances measured by $R^2_{OS}$, over different subsamples. Our analysis starts with recursive forecast estimation over an initial in-sample estimation period of 15 years (1950:12–1965:12), and conducts real-time forecasts up to 2014:12. Next, we discard the most distant data (i.e. 1950:12), yielding an in-sample estimation sample of 1951:01–1966:01 (15 years), and perform out-of-sample forecasts up to 2014:12, the most recent data point of our sample period. The beginning of the out-of-sample evaluation period is given on the x-axis. Thus, the last $R^2_{OS}$ is obtained over the sample period 1995:01–2014:12. The black line shows the time-varying $R^2_{OS}$, the grey solid line signals the $R^2_{OS}$ during recessions, and the grey dotted line corresponds to the $R^2_{OS}$ over expansions. The corresponding predictive regressions are named in the headings.

$w_{s-1}$ yields an optimal portfolio weight for the investor of

$$
w_{s-1} = \frac{1}{\gamma} \left( \frac{E_{s-1}(r_s)}{Var_{s-1}(r_s)} \right). \tag{8}
$$

As can be seen from Eq. (8), and fully in line with conventional theory, optimal portfolio allocation depends positively on the equity risk premium forecast and negatively on the conditional variance.8 Because the volatility is latent and has to be approximated, we follow the recent literature (like Christiansen, Schmeling, & Schrimpf, 2012) and rely on realized volatility forecasts.9

8 As a referee mentioned, this conventional procedure does not take into account that high (risk-adjusted) returns may be adequate investments for a risk-tolerant investor who "buys" risky investments and "sells" insurance to a more risk-averse investor.

9 The dynamics of the stock market volatility form an important factor for asset allocation decisions. In contrast to other studies, which use constant or slightly time-varying volatility measures (based on rolling window estimates of monthly historical returns), we do not consider such approaches to be an appropriate way to capture the true latent volatility process (see Andersen, Bollerslev, Diebold, & Labys, 2003).
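As a small illustration of Eqs. (6) and (8), the sketch below maps a premium forecast and a variance forecast into a portfolio weight and then into a realized portfolio return. The default risk aversion of three and the bounds of 0 and 1.5 mirror the no-short-sale and 50% leverage restrictions that the paper applies in Section 4.2; the function names and the numbers in the usage lines are made up for illustration.

```python
import numpy as np

def mv_weight(premium_forecast, variance_forecast, gamma=3.0,
              w_min=0.0, w_max=1.5):
    """Mean-variance weight of Eq. (8), clipped to [w_min, w_max]."""
    w = np.asarray(premium_forecast) / (gamma * np.asarray(variance_forecast))
    return np.clip(w, w_min, w_max)

def portfolio_return(w_prev, premium, rf):
    """Realized portfolio return of Eq. (6): w_{s-1} * r_s + r_{f,s}."""
    return w_prev * premium + rf

# Hypothetical monthly inputs: 0.5% forecast premium, 5% forecast volatility.
w = mv_weight(0.005, 0.05 ** 2)
realized = portfolio_return(w, premium=0.02, rf=0.004)
```

Clipping after applying Eq. (8) is one simple way to impose the constraints; it is a design assumption of this sketch rather than a documented detail of the paper's implementation.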

Specifically, the realized volatility is defined as the sum of daily squared returns in month t,

$$
RV_t = \sum_{\tau=1}^{M_t} r_{t,\tau}^2, \tag{9}
$$

where $M_t$ is the number of trading days and $r_{t,\tau}$ denotes the return on day τ in month t. Due to the high persistency of $RV_t$, volatility forecasts are then obtained by using an AR(1)-process based on the log of the realized variance, which shifts the distribution closer to normality (see Christiansen et al., 2012). Using the same volatility estimate for all models rules out any differences in portfolio allocations being implied by the model specifications (see Fig. A.I in the Appendix). Furthermore, we check whether equity premium prediction models also add economic value due to volatility forecasts, but the results are nearly unchanged (as is reported in the robustness section).

4.2. The economic value of forecasting models

To determine the economic relevance, we use different measures to examine the relative performances of equity premium forecasts and predictions based on the historical average. In addition to the average realized portfolio return and the corresponding standard deviation, we also show the difference in the realized utility when using predictor variables instead of the historical average (i.e., the certainty equivalent return). This utility gain can be understood as a management fee that an investor is willing to pay in order to have access to the information in the prediction model rather than that in the historical average. In what follows, the values reported are annualized, such that they can be understood as an annualized percentage management fee.

$$
\Delta CER = \left( (\hat{\mu}_i - 0.5 \gamma \hat{\sigma}_i^2) - (\hat{\mu}_0 - 0.5 \gamma \hat{\sigma}_0^2) \right) \times 1200. \tag{10}
$$

Here, $\hat{\mu}_i$ ($\hat{\sigma}_i^2$) indicates the sample average (variance) of the portfolio return formed on prediction model i, while $\hat{\mu}_0$ ($\hat{\sigma}_0^2$) denotes the sample average (variance) using the historical average forecast instead. In addition, we also report the annualized Sharpe ratio, which is defined as the portfolio excess return divided by its volatility. We follow Campbell and Thompson (2008) and Cooper and Priestley (2009) and choose a relative risk aversion coefficient of three and transaction costs of 50 basis points per turnover, and constrain the optimal portfolio weight for the investor by preventing short sales of stocks and taking leverage of no more than 50% (variations are reported in the robustness section). Our findings are documented in Table 3.

We note that average portfolio returns of forecasting strategies do not differ much from each other when relying on different conditioning variables. However, the sample variances are more heterogeneous, which leads to larger differences when looking at certainty equivalent returns. In comparison to the results based on the MSFE, Table 3 shows that most of the predictive regressions add economic value beyond the historical average, even if the $R^2_{OS}$ values were very small. In what follows, we focus on the annualized utility gains, but our findings for Sharpe ratios are qualitatively the same.

Seven of the 14 economic variables outperform the historical average according to positive utility gains, while only two offer positive $R^2_{OS}$ values. However, we find large differences in the realized utility gains. Three of the economic indicators perform comparatively well, with annualized gains of over 1.80%. This means that access to the information in the predictive regression forecast rather than the historical average has a value of at least 180 basis points for investors. The highest utility gain is provided by the term spread, with a gain of 304 basis points.

Concerning the technical indicators, the results are more in line with previous evidence. While the maximum utility gain is only 213 basis points, all of the forecasts using technical indicators are valuable. Similarly to the limited $R^2_{OS}$, the added value is smaller than for the best economic indicators. Nevertheless, 10 of the technical indicators generate utility gains of more than 50 basis points, and five of these indicators report average gains of over 100 basis points.

Portfolio performance measures that make use of principal component predictive regressions behave well (see Neely et al., 2014), and individual principal component predictive regressions add economic value (economic variables by 241 basis points; technical indicators by 218 basis points). Even better, PCAll offers the highest Sharpe ratio (0.47), with an average utility gain of 277 basis points.

4.3. Stability of economic values

Analogous to Section 3.4, we also investigate both whether the reported utility gains are stable over time and whether they exist in the more recent history. Given the time-varying nature of the $R^2_{OS}$ (Fig. 2), performance measures might face the same problems; i.e., the economic value could profit from an empirical relationship in the distant past. To account for possible instabilities, Fig. 3 shows the annualized Sharpe ratio for a mean–variance investor with a relative risk aversion coefficient of three and allocation constraints of $0 \le w_{s-1} \le 1.5$ (with transaction costs imposed). To make our results easy to compare, we use the same rolling-recursive scheme as in Section 3.4, allowing the initialization period to vary over time. For comparison purposes, this figure also shows the Sharpe ratios from using historical average forecasts and a simple buy-and-hold strategy in the S&P 500.

This means, for example, that the first point on the black line in Panel A provides the average Sharpe ratio for all out-of-sample forecasts made by the PCEcon strategy for the period between January 1966 and December 2014, while the last data point covers the sample from January 1995 to December 2014, in order to have at least 20 years for calculating an average Sharpe ratio.

The resulting lines show a very heterogeneous pattern across the three strategies indicated by the Sharpe ratios of investment strategies starting at different points in time. Principal component forecasts based on economic variables (Panel A) performed relatively well until the 1970s. Since then, the Sharpe ratio has declined, and the previously detected utility gains have vanished, not

Table 3
Economic measures of the forecasting performance (transaction costs = 50 bp).

| Predictor | Mean | Std. | ΔCER (ann.) | SR | Predictor | Mean | Std. | ΔCER (ann.) | SR |
|---|---|---|---|---|---|---|---|---|---|
| HA | 0.92% | 5.34 | 5.96% | 0.32 | | | | | |

Panel A: Bivariate predictive regression forecasts

| Predictor | Mean | Std. | ΔCER (ann.) | SR | Predictor | Mean | Std. | ΔCER (ann.) | SR |
|---|---|---|---|---|---|---|---|---|---|
| DP | 0.72% | 4.29 | −0.64% | 0.24 | MA(1,9) | 0.89% | 4.93 | 0.40% | 0.33 |
| DY | 0.68% | 4.22 | −1.03% | 0.21 | MA(1,12) | 1.02% | 4.98 | 1.84% | 0.42 |
| EP | 0.92% | 5.00 | 0.51% | 0.34 | MA(2,9) | 0.92% | 4.97 | 0.65% | 0.35 |
| DE | 0.77% | 4.31 | −0.03% | 0.28 | MA(2,12) | 1.04% | 4.96 | 2.13% | 0.43* |
| RVOL | 0.96% | 5.55 | 0.03% | 0.34 | MA(3,9) | 0.97% | 4.87 | 1.43% | 0.39 |
| BM | 0.85% | 5.14 | −0.54% | 0.29 | MA(3,12) | 0.93% | 5.06 | 0.56% | 0.34 |
| NTIS | 0.92% | 5.40 | −0.15% | 0.32 | MOM(6) | 0.87% | 4.97 | 0.02% | 0.31 |
| TBL | 0.90% | 3.97 | 2.04% | 0.42 | MOM(12) | 0.94% | 5.17 | 0.46% | 0.34 |
| LTY | 0.86% | 3.69 | 1.89% | 0.41 | VOL(1,9) | 0.92% | 5.06 | 0.51% | 0.34 |
| LTR | 0.83% | 4.98 | −0.43% | 0.29 | VOL(1,12) | 0.98% | 4.91 | 1.48% | 0.39 |
| TMS | 1.13% | 5.04 | 3.04% | 0.49** | VOL(2,9) | 0.97% | 5.16 | 0.88% | 0.37 |
| DFY | 0.95% | 5.47 | 0.06% | 0.33 | VOL(2,12) | 0.96% | 5.12 | 0.80% | 0.36 |
| DFR | 0.85% | 5.22 | −0.68% | 0.28 | VOL(3,9) | 0.90% | 5.08 | 0.19% | 0.32 |
| INFL | 0.88% | 4.84 | 0.37% | 0.33 | VOL(3,12) | 0.97% | 4.98 | 1.17% | 0.38 |

Panel B: Principal component predictive regression forecasts

| Predictor | Mean | Std. | ΔCER (ann.) | SR | Predictor | Mean | Std. | ΔCER (ann.) | SR |
|---|---|---|---|---|---|---|---|---|---|
| PCEcon | 0.92% | 3.89 | 2.41% | 0.45 | PCTech | 1.06% | 5.01 | 2.18% | 0.44* |

Panel C: Principal component predictive regression forecasts, all predictors

| Predictor | Mean | Std. | ΔCER (ann.) | SR |
|---|---|---|---|---|
| PCAll | 1.02% | 4.42 | 2.77% | 0.47* |

Notes: The table reports means and standard deviations (Std.) of portfolio returns for a mean–variance investor with a relative risk aversion coefficient of three and transaction costs of 50 basis points per monthly turnover over the evaluation period 1966:01–2014:12. ΔCER denotes the annualized certainty equivalent return gain of predictive regression forecasts in comparison to the historical average forecast, and SR is the annualized Sharpe ratio, defined as the average portfolio excess return divided by the sample standard deviation. Conditional variance forecasts are obtained by an AR(1) process of the stock returns realized volatility. HA indicates the historical average forecast where the portfolio performance measures are given in levels; Panel A reveals results for bivariate predictive models; Panel B shows results using the principal component extracted from the full set of macroeconomic variables (PCEcon) or technical indicators (PCTech); and Panel C indicates the predictive performance when taking the economic and technical indicators into account simultaneously (PCAll). The number of factors is selected according to the Schwarz information criterion (SIC). In addition, we follow Ledoit and Wolf (2008) and test for equality of the Sharpe ratios between historical average forecasts and predictive regressions using the stationary block-bootstrap procedure of Politis and Romano (1994), with 5000 repetitions and a block size of five. We test the null hypothesis of equal or lower Sharpe ratios using historical average forecasts against the one-sided (upper-tail) alternative of a higher Sharpe ratio using the predictive variable under analysis.
* Indicates significance at the 10% level.
** Indicates significance at the 5% level.
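As a rough consistency check of Eq. (10) against Table 3, the sketch below recomputes annualized certainty equivalent returns from the rounded monthly means and standard deviations in the table, with a risk aversion of three. It assumes both table columns are monthly percentages; because the inputs are rounded to two decimals, the results can only be expected to match the reported figures approximately. The function names are illustrative.

```python
def cer(mean_pct, std_pct, gamma=3.0):
    """Annualized certainty equivalent return in percent.

    mean_pct and std_pct are the monthly mean and standard deviation of the
    portfolio return in percent, as listed in Table 3.
    """
    mu = mean_pct / 100.0
    sigma = std_pct / 100.0
    return 1200.0 * (mu - 0.5 * gamma * sigma ** 2)

def delta_cer(mean_i, std_i, mean_0, std_0, gamma=3.0):
    """Utility gain of model i over the benchmark, as in Eq. (10)."""
    return cer(mean_i, std_i, gamma) - cer(mean_0, std_0, gamma)

ha_level = cer(0.92, 5.34)                     # reported level: 5.96%
gain_pcall = delta_cer(1.02, 4.42, 0.92, 5.34)  # reported gain: 2.77%
gain_tms = delta_cer(1.13, 5.04, 0.92, 5.34)    # reported gain: 3.04%
```

With these inputs the recomputed values lie within roughly 0.1 percentage points of the reported ones, a gap that is plausibly due to the rounding of the table entries.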

only compared to the portfolio allocations based on the historical average forecast, but also compared to a simple buy-and-hold strategy. There are visible forecasting improvements since the late 1980s, but they still do not compensate for the earlier decline fully.

A completely different path is found for the technical indicators, as Panel B shows. While the $R^2_{OS}$ is small in magnitude, the reported Sharpe ratio indicates a tentatively increasing slope. With some exceptions, trading strategies based on PCTech forecasts are more valuable than those using either the historical average or a simple buy-and-hold strategy. Comparing full sample Sharpe ratios (given in Table 3) with the average Sharpe ratio using our rolling-recursive estimation setting, we confirm the previous finding of a highly unstable prediction performance for economic variables. While PCEcon yields an annualized Sharpe ratio of 0.45 over the sample period 1966:01–2014:12, the average Sharpe ratio shrinks to 0.43 when we consider our complete set of subsamples. In contrast, the average Sharpe ratio of PCTech is 0.55, which indicates a rise of 0.11 points compared to the evaluation sample starting in 1966:01. Again, the behavior of PCAll is closely related. The reported benefits are affected strongly by the performance of technical indicators. While we observe some economic outperformance in the distant past, only technical indicators stabilize the performance measure afterwards.

A behavior similar to that shown in Fig. 3 is obtained if we consider utility gains measured by the annualized certainty equivalent return (Fig. 4).

The performances of the three investment strategies of course improve when the transaction costs of 50 basis points are neglected. This case is shown in the Appendix for comparison purposes (see Table A.II and Figs. A.II and A.III).

5. Robustness tests

This section briefly describes the robustness exercises that are documented at length in Appendix B of this paper. These tests move in three directions: (1) we combine economic and technical indicators in various ways (Table A.III and Fig. A.IV), (2) we demonstrate the small effects of various alternative specifications of the volatility prediction models (Tables A.IV and A.V), and (3) we examine the effects of alternative restrictions on the portfolio formation, i.e., leveraged investments and shorting (Table A.VI). Our results remain qualitatively unchanged in all of these cases.

6. Conclusions

Equity premium prediction is a long-standing issue in the literature (see Spiegel, 2008). At least for standard economic variables, Goyal and Welch (2008) demonstrated that the high predictive ability of economic indicators that was observed until the 1970s basically disappeared thereafter. Given this, it seems obvious to examine the predictive ability of technical indicators, and Neely et al.

Fig. 3. Time-varying Sharpe ratios of forecasting strategies (transaction costs = 50 bp). Notes: The graphs show the time-varying annualized Sharpe ratio
using the rolling-recursive estimation setting of forecasting strategies named in the headings. Our analysis starts with recursive forecast estimation over
an initial in-sample estimation period of 15 years (1950:12–1965:12), and conducts real-time forecasts up to 2014:12. Next, we discard the most distant
data (i.e. 1950:12), yielding an in-sample estimation sample of 1951:01–1966:01 (15 years), and perform out-of-sample forecasting up to 2014:12, the
most recent data in our sample period. The beginning of the portfolio formation period is given on the x-axis. Thus, the last Sharpe ratio corresponds to the
sample period 1995:01–2014:12. We assume a relative risk-aversion coefficient of three, and transaction costs of 50 basis points per monthly turnover.
The black line shows the annualized Sharpe ratio of predictive regression forecasts, the grey solid line indicates the Sharpe ratio based on historical average
forecasts, and the grey dotted line corresponds to a simple buy-and-hold strategy.

(2014) showed their potential. We contribute to this literature by complementing the analysis of forecasting ability with a focus on possible instability. We confirm the instability of economic indicators that was indicated by Goyal and Welch (2008), using more data and a range of methods. In contrast, we find technical indicators to show much less instability, supporting the favorable impression given by Neely et al. (2014), and even indicating that technical indicator-based forecasting may have economic value.

As it is our ambition to complement earlier work, we follow the main approach in this literature closely (see Goyal & Welch, 2003, 2008; Neely et al., 2014). Thus, we consider the same set of economic and technical indicators as before, and basically replicate earlier results, merely extending the sample period to recent years. We then apply the structural stability test, with somewhat disappointing results, as various tests do not converge at the breakpoints. However, there do seem to be breakpoints, and this motivates us to complement the standard recursive approach for demonstrating the forecasting performance over time by using a specific rolling-recursive approach.

We aim to simulate the fact that investors may enter the market at a point in time that is unknown ex ante, and that

Fig. 4. Time-varying realized utility of the forecasting strategies (transaction costs = 50 bp). Notes: These graphs show the time-varying annualized
realized utility ($CER_p = 1200(\hat{\mu}_p - 0.5 \gamma \hat{\sigma}_p^2)$) using the rolling-recursive estimation setting of forecasting strategies named in the headings. Our analysis
starts with recursive forecast estimation over an initial in-sample estimation period of 15 years (1950:12–1965:12), and conducts real-time forecasts up
to 2014:12. Next, we discard the most distant data (i.e. 1950:12), yielding an in-sample estimation sample of 1951:01–1966:01 (15 years), and perform
out-of-sample forecasting up to 2014:12, the most recent data point in our sample period. The beginning of the portfolio formation period is given on the
x-axis. Thus, the last CER corresponds to the sample period 1995:01–2014:12. We assume a relative risk-aversion coefficient of three, and transaction costs
of 50 basis points per monthly turnover. The black line shows the annualized CER of the predictive regression forecasts, the grey solid line indicates the
CER based on historical average forecasts, and the grey dotted line corresponds to a simple buy-and-hold strategy.

they do not use an infinite amount of information from the past, due to possible instabilities (which we have shown to be real). Thus, we propose a 15-year in-sample estimation period and forecast the equity premium from that point on until the last month of our sample, i.e., December 2014. Accordingly, an investor may have invested in any month between January 1966 and January 1995 (in order to ensure a long out-of-sample period). We find from this rolling-recursive approach that the performance of a strategy using economic indicators would indeed be very unstable over time, but that strategies built on technical indicators are much better.

Thus, we examine the possible economic value of such forecasting indicators. Largely confirming earlier findings, economic indicators do not provide consistent economic value. In contrast, however, technical indicators consistently deliver economic value. This value tends to increase over time – which is in stark contrast to economic indicators – and is higher than the economic value of benchmark investment strategies over almost all sub-samples. As a single figure showing the predictive ability of technical indicators, their average annualized Sharpe ratio (over the rolling-recursive estimations) is about 0.55 (after transaction costs), thus providing sizeable utility gains of

152 (105) basis points on average compared to a strategy based on the historical mean (buy and hold).

Overall, the predictive abilities of economic and technical indicators seem to be of a similar quality when assessed by their long-term forecasting errors. However, the performances over time are completely different: the economic indicators lose power, but the technical indicators remain powerful or even increase in predictive power. Thus, technical indicators perform more consistently well over time. When we complement the statistical performance with measures of the economic value, which should be more important for the functioning of financial markets, the discrepancy between economic and technical indicators widens still further. Only technical indicators provide economic value and they do so in quite a stable way. This may motivate further research on the robustness of our findings and their origins.

Acknowledgments

We thank two anonymous referees, an associate editor, and participants at several workshops and seminars, in particular Guglielmo Maria Caporale, Ana-Maria Fuertes, Richard Payne and Maik Schmeling, for their very useful comments.

Appendix A. Supplementary data

Supplementary material related to this article can be found online at http://dx.doi.org/10.1016/j.ijforecast.2016.02.006.
Henkel, S. J., Martin, J. S., & Nardari, F. (2011). Time-varying short-horizon
predictability. Journal of Financial Economics, 99(3), 560–580.
References Inoue, A., & Kilian, L. (2004). In-sample or out-of-sample tests of
predictability: which one should we use? Econometric Reviews, 23(4),
Andersen, T. G., Bollerslev, T., Diebold, F. X., & Labys, P. (2003). Modeling 371–402.
and forecasting realized volatility. Econometrica, 71(2), 529–626. Ledoit, O., & Wolf, M. (2008). Robust performance hypothesis testing with
Andrews, D. W. K. (1993). Tests for parameter instability and structural the Sharpe ratio. Journal of Empirical Finance, 15(5), 850–859.
change with unknown change point. Econometrica, 61(4), 821–856. Lettau, M., & Ludvigson, S. (2001). Consumption, aggregate wealth, and
Ang, A., & Bekaert, G. (2007). Stock return predictability: is it there? expected stock returns. Journal of Finance, 56(3), 815–849.
Review of Financial Studies, 20(3), 651–707. Lettau, M., & Van Nieuwerburgh, S. (2008). Reconciling the return
Bai, J. (1997). Estimation of a change point in multiple regressions. Review predictability evidence. Review of Financial Studies, 21(4), 1607–1652.
of Economics and Statistics, 79(4), 551–563. Lo, A. W., Mamaysky, H., & Wang, J. (2000). Foundations of technical anal-
Bai, J., & Perron, P. (1998). Estimating and testing linear models with ysis: computational algorithms, statistical inference, and empirical
multiple structural changes. Econometrica, 66(1), 47–78. implementation. Journal of Finance, 55(4), 1705–1765.
Bai, J., & Perron, P. (2006). Multiple structural change models: a Marquering, W., & Verbeek, M. (2004). The economic value of predicting
simulation analysis. In P. C. B. Phillips, D. Corbae, S. Durlauf, & B. E. stock index returns and volatility. Journal of Financial and Quantitative
Hansen (Eds.), Econometric theory and practice: Frontiers of analysis Analysis, 39(2), 407–429.
and applied research. New York: Cambridge University Press. Mele, A. (2007). Asymmetric stock market volatility and the cyclical
Bossaerts, P., & Hillion, P. (1999). Implementing statistical criteria to behavior of expected returns. Journal of Financial Economics, 86(2),
select return forecasting models: what do we learn? Review of 446–478.
Financial Studies, 12(2), 405–428. Neely, C. J., Rapach, D., Tu, J., & Zhou, G. (2014). Forecasting the equity risk
Brock, W., Lakonishok, J., & LeBaron, B. (1992). Simple technical trading premium: the role of technical indicators. Management Science, 60(7),
rules and the stochastic properties of stock returns. Journal of Finance, 1772–1791.
Park, C. H., & Irwin, S. H. (2007). What do we know about the profitability
47(5), 1731–1764.
Brown, S. J., Goetzmann, W. N., & Kumar, A. (1998). The Dow theory: of technical analysis? Journal of Economic Surveys, 21(4), 786–826.
William Peter Hamilton’s track record reconsidered. Journal of Paye, B. S., & Timmermann, A. (2006). Instability of return prediction
Finance, 53(4), 1311–1333. models. Journal of Empirical Finance, 13(3), 274–315.
Campbell, J. Y. (2008). Viewpoint: estimating the equity premium. The Pesaran, M. H., & Timmermann, A. (2002). Market timing and return
Canadian Journal of Economics, 41(1), 1–21. prediction under model instability. Journal of Empirical Finance, 9(5),
Campbell, J. Y., & Shiller, R. J. (1988a). The dividend-price ratio and 495–510.
Pesaran, M. H., & Timmermann, A. (2004). How costly is it to ignore breaks
expectations of future dividends and discount factors. Review of
when forecasting the direction of a time series? International Journal
Financial Studies, 1(3), 195–228.
of Forecasting, 20(3), 411–425.
Campbell, J. Y., & Shiller, R. J. (1988b). Stock prices, earnings, and expected
Pesaran, M. H., & Timmermann, A. (2007). Selection of estimation window
dividends. Journal of Finance, 43(3), 661–676.
in the presence of breaks. Journal of Econometrics, 137(1), 134–161.
Campbell, J. Y., & Thompson, S. B. (2008). Predicting excess stock returns
Pettenuzzo, D., & Timmermann, A. (2011). Predictability of stock returns
out of sample: can anything beat the historical average? Review of
and asset allocation under structural breaks. Journal of Econometrics,
Financial Studies, 21(4), 1509–1531.
164(1), 60–78.

Fabian Baetje is a Ph.D. student at the Leibniz University Hannover.

Lukas Menkhoff is professor of economics at Humboldt-University Berlin and the German Institute for Economic Research (DIW Berlin). His research interests are in economics and finance. Before joining Berlin in 2015, he was professor at the University of Kiel and the Kiel Institute for two years, at the Leibniz University Hannover for 12 years, at the RWTH Aachen for five years and held positions in the financial industry.
