You are on page 1of 37

Predicting Recessions with

Leading Indicators: Model


Averaging and Selection Over
the Business Cycle

Travis Berge
April 2013
RWP 13-05
April 2013
Predicting recessions with leading indicators:
model averaging and selection over the business cycle

Abstract
This paper evaluates the ability of several commonly followed economic indicators to predict business
cycle turning points. As a baseline, forecasts from univariate models are combined by taking averages or
by weighting forecasts with model-implied posterior probabilities. These combined forecasts are compared
to those from a sophisticated model selection algorithm that allows for nonlinear model specications. The
preferred forecasting model is one that allows for nonlinear behavior across the business cycle and combines
information from the yield curve with other indicators, especially at very short and very long horizons.
JEL: C25, C53, E32
Keywords: Business cycle turning points; variable selection; model averaging; probabilistic forecasts.
Travis J. Berge
Federal Reserve Bank of Kansas City
One Memorial Drive
Kansas City, MO 64198
Email: travis.j.berge@kc.frb.org

The views herein do not necessarily reect the views of the Federal Reserve Bank of Kansas City or the Federal
Reserve System.
1 Introduction
The business cycle turning points of the National Bureau of Economic Research determine accounts
of economic conditions in the United States, even decades following the end of a given recession.
For households, recession suggests lower earnings and high unemployment. For businesses, slow
economic growth reduces demand for products and decreases the number of protable economic
opportunities. Given these observations, the enthusiasm with which households, businesses and
policymakers wish to determine the current and future states of the economy comes as no surprise.
Classifying economic variables into leading, coincident and lagging indicators of economic activ-
ity is a long-lived tradition in economic research (Burns & Mitchell 1946). This paper examines the
ability of closely-followed leading indicators to produce useful forecasts of recession. Academic re-
search emphasizes the information content of the yield curve for turning point prediction (see, e.g.,
Estrella & Mishkin, 1998; Wright, 2006; Kauppi & Saikkonen, 2008; and Rudebusch & Williams,
2009). However, practitioners follow a wide range of economic indicators, so that the production
of accurate forecasts of business cycle turning points requires the combination of information from
many indicators. Of course, forecast accuracy depends critically on using the correct indicators to
forecast at the proper horizon. Moreover, while there are many economic indicators that provide
useful signals of the present and future states of the economy, the relationships between these indi-
cators and the state of the economy has not been stable over time. For example, the nancialization
of the U.S. economy and decline of manufacturing as a major sector of the economy likely altered
the predictive content of many nancial indicators of the economy. Further complicating matters,
Hamilton (2005) and Morley & Piger (2012) have documented the asymmetry of many economic
indicators around business cycle turning points. Asymmetry complicates the forecasting problem,
as it is not sucient to specify a model for the joint evolution of the state of the economy and a
vector of leading indicators. Instead, dierent forecasting models are required when forecasting at
dierent horizons.
This paper produces forecasts of business cycle turning points, focusing on combining informa-
tion from a standard set of economic variables. The baseline model relates each possible covariate
to the state of the economy and forecasting up to 24 months in the future. Two methods pro-
duce forecasts that condition on multiple indicators of the economy. First, a forecast combination
1
approach weights univariate models, either equally or with a Bayesian Model Averaging approach
that weights each forecast by its model-implied posterior probability. Secondly, a model selection
algorithm is used to endogenously select the forecasting model most appropriate for each forecast
horizon. By producing models that are biased, the method is able to condition on a large number
of indicators but produce forecasts that have low variance. The method also allows for nonlinear
dynamics across the business cycle.
Many of the indicators analyzed contain information that can be exploited to make forecasts of
future states of the economy. The power of the yield curve as a predictor of future economic activity
endures. The results indicate that the predictive power is limited to forecasts made at the medium-
termreal economic variables most accurately describe the current state of the economy. Interest
rate spreads on corporate bonds also appear to contain information useful for forecasting, especially
at very long horizons. Both model averaging and model selection produce useful probabilistic
forecasts of recession, although the forecasts are produced with models that are quite dierent.
When forecasts are weighted by posterior probabilities, the data strongly favor one or possibly
two indicators for forecasts at each horizon. In contrast, forecasts produced using model selection
methods condition on a much larger number of indicators. The use of many dierent indicators
improves forecast ability, particularly in an out-of-sample forecast exercise.
The plan for the paper is as follows. The next section describes the methods used for model
combination and model selection. Section 3 describes the data and the evaluation of probabilistic
forecasts of a discrete outcome. Sections 4 and 5 present the empirical results.
2 Empirical setup
2.1 A baseline model
Let Y
t
denote the state of the business cycle as determined by the NBER business cycle dating
committee, where Y
t
= 1 denotes that month t is an NBER-dened recession and Y
t
= 0 indicates
2
an expansion instead.
1
Assume that Y
t
is related to an unobserved variable, y
t
:
Y
t
=
_

_
1 if y
t
0
0 if y
t
< 0.
The model relates a vector of observables x
th1
to the latent variable
y
t
= f(x
th1
) +
t
, (1)
where f(.) is R
K+1
R, x is a (K +1) 1 vector of K observables plus a constant and
t
is an iid
shock with unit variance. A typical probit or logit model would specify f(x) as a linear function,
but equation (1) is written more generally to encompass a variety of possible specications.
The objective in this paper is to forecast the state of the economy conditional on a set of
covariates. Let E be the expectations operator, and let p
t|th1
denote the conditional probability
of recession, so that
E [Y
t
= 1|x
th1
] p
t|th1
= (y
t
), (2)
where (.) is a cumulative distribution function. In the application below is specied to be the
logistic function, but any twice-dierentiable continuous distribution function could be used.
A large literature has used a model similar to that in (1) and (2) to relate future economic
activity to the term structure of interest rates. Estrella & Mishkin (1998), Wright (2006) and
Rudebusch & Williams (2009) focus on simple limited dependent models, conditioning on the slope
and level of the yield curve. There are many ways that one could extend the specication described
in (1) and (2). Dueker (2005), Chauvet & Potter (2005) and Kauppi & Saikkonen (2008) extend the
model by focusing on various dynamic specications of the same basic setup. Chauvet & Senyuz
(2012) provide a more modern specication, as they relate the state of the economy to a dynamic
factor model of the yield curve.
Conditioning on other economic indicators is likely to improve forecast ability. Although the
yield curve is a useful summary of market expectations for the future path of short-term interest
rates, it is not a sucient statistic for summarizing future states of the economy. Risk and term
premia also complicate the relationship between the yield curve and macroeconomic variables.
1
See www.nber.org/cycles/
3
Moreover, the nonlinear behavior of real variables across the business cycle makes model speci-
cation dicult. Under nonlinearity, it is not enough to iterate a linear model forward to produce
forecasts. As indicated by equation (1), all forecasts in the paper are made using the direct method.
This allows for the evaluation of the forecasting models themselves to reveal the information con-
tent carried in various indicators. Direct forecasting allows more sophisticated specications of the
function relating the latent variable y
t
to covariates x
th1
.
The remainder of this section introduces the two methods that are used to combine information
from dierent leading indicators, forecast combination and model selection.
2.2 Averaging model forecasts
Model averaging has long been recognized as a method of combining information from a given set
of models. Previous applications have shown that model averaging also tends to improve forecast
accuracy, either because the combination either combines information from partially overlapping
information sets as in the canonical work of Bates & Granger (1969), or because the combina-
tion alleviates possible model misspecication (Hendry & Clements, 2004; Stock & Watson, 2004;
Timmermann, 2006). In addition, the model weights themselves can be of interest if they are
constructed so that they give the posterior probability that a given model produced the observed
data. In the current application, these posteriors reveal information regarding the usefulness of
particular indicators at various forecast horizons.
The method has a solid statistical foundation and is straightforward to implement. Each of a
set of M models is used to produce a forecast of some event y
t
, resulting in { y
1t
, y
2t
, ..., y
Mt
}. In the
current application, recall that y
t
is the latent variable that relates covariates to the aggregate state
of the economy. The combination problem is to nd weights w
m
for each forecast to combine the
individual forecasts into a single forecast y
C
t
= C( y
1t
, ..., y
Mt
, w
1
, w
2
, ..., w
M
). In principle, because
forecasts are useful only to the extent that they impact the actions of policymakers and other
economic agents, the weights are the outcome of the minimization of a loss function, which in turn
could reect the underlying utility of decision makers, in a way analogous to Elliott & Lieli (2009).
In the current application, the tradeo between true and false positives or true and false nega-
tives for recession forecasts is uncertain. For this reason, two weighting schemes produce recession
forecasts. The rst is the most basic application of model averaging: each of the M forecasts is
4
assigned the same weight. The equally weighted forecast of the latent is then
y
EW
t
=
1
M
M

i=1
y
it
. (3)
A Bayesian Model Averaging (BMA) framework is used to produced forecasts that are weighted
to reect the posterior probability of each model (Leamer 1978). These weights can be interpreted
as the posterior probability that a given model is true. The Bayesian model averaged forecast is
the probability-weighted sum of the model-specic forecasts:
y
BMA
t
=
M

i=1
y
it
Pr(M
i
|D
th1
) (4)
where y
it
is the forecast from model i and Pr(M
i
|D
th1
) denotes posterior probability of model
i conditional on the data available at the time the forecast is made. By Bayes Law, the posterior
probability of model i is proportional to that models marginal likelihood multiplied by its prior
probability:
Pr(M
i
|D
t
) Pr(D
t
|M
i
)Pr(M
i
),
where
Pr(D
t
|M
i
) =
_
Pr(D
t
|
i
, M
i
)Pr(
i
|M
i
)
i
.
Implementation of (4) requires an estimate of the marginal likelihood, which is an average
of the likelihood function with respect to the prior distribution for each model parameter. The
marginal likelihood of each model is approximated with the Bayesian information criterion, which
is a consistent estimate of the marginal likelihood of a model (Raftery 1995). This approximation is
commonly used in applied work and is advantageous since its it requires only a maximum likelihood
estimate and allows the researcher to set aside the production of priors for each models parameters
(see, e.g., Sala-I-Martin, Doppelhofer & Miller (2004), Brock, Durlauf & West (2007) and Morley &
Piger (2012)). Since there is no natural method that can be used to incorporate economic intuition
into the prior for each model, each model is assumed equally likely a priori.
Given these assumptions, model posterior probabilities are calculated as model t relative to
the t of all models, or
Pr(M
i
|D
th1
) =
exp(

BIC
i
)

M
i=1
exp(

BIC
i
)
.
For both the equally-weighted and Bayesian Model Averaged (BMA) forecasts, model forecasts
5
of the latent are made using equation (3) or (4), respectively. Probabilistic forecasts are then made
by applying equation (2).
2.3 Model selection via the boosting algorithm
Model averaging is one solution to the problem of model specication. An alternative solution to
the problem is to perform model selection. This section introduces a model selection algorithm as
a methodology with which one can produce empirically-driven forecasting models of the business
cycle. We wish to model the relationship between the observed discrete variable Y
t
and a vector
of potential covariates, x
t
= (x
1t
, x
2t
, ..., x
Kt
). In the section above, this was achieved by dening
an unobserved continuous latent variable that depends linearly on covariates. The approach here
is more general, in the sense that we wish to endogenously model the choice of covariates to be
included in the model (that is, forecasts may be made using only a subset of x
1
, x
2
, ..., x
K
). The
method allows for a potentially non-linear relationship between the latent and the covariates.
These goals are accomplished by estimating a function F : R
K
R that minimizes the expected
loss L(Y, F), i.e.,

F(x) arg min


F(x)
E [L(Y, F(x))] . (5)
We require only that the loss function is dierentiable with respect to the function F. The setup
encompasses many dierent types of problems. For example, if Y were a continuous outcome, then
specifying L(Y, F(x)) as squared-error loss and F(x) as a linear function would result in a standard
OLS regression. More often the loss function is specied as the negative of the log-likelihood of
the errors distribution. F can be specied very generallyin the machine-learning literature it is
common to specify F(x) as decision trees, a non-parametric method. Smoothing splines are another
common choice.
The following algorithm minimizes the empirical counterpart to (5) by specifying that F(x)
is an ane combination of so-called weak learners, each of which are specied separately. The
algorithm is due to Friedman (2001) and can be summarized as follows. The algorithm begins by
initializing the learner in order to compute an approximate gradient of the loss function. Step 3
ts each weak learner to the current estimate of the negative gradient of the loss function. Step
4 searches among each weak learner and chooses the one that most quickly descends the function
space and then chooses the step size. In step 5 we iterate on 2-4 until iteration M, which will be
6
endogenously determined.
Functional Gradient Descent.
1. Initialize the model. Choose a functional form for each weak learner, f
(k)
, k = 1, ...K. Each
weak learner is a regression estimator with a xed set of inputs. Most commonly each covariate
receives its own functional form, which need not be identical across each variable.
Let m denote iterations of the algorithm, and set m = 0. Initialize the strong learner F
0
. It
is common to set F
0
equal to the constant c that minimizes the empirical loss.
2. Increase m by 1.
3. Projection. Compute the negative gradient of the loss function evaluated at the current
estimate of F,

F
m1
. This produces:
u
m
{u
m,t
}
t=1,...,T
=
L(Y
t
, F)
F
|
F=

F
m1
(xt)
, t = 1, ..., T
Fit each of the K weak learners separately to the current negative gradient vector u
m
, and
produce predicted values from each weak learner.
4. Update F
m
. Let

f
()
m
denote the weak learner with the smallest residual sum of squares among
the K weak learners. Update the estimate of F by adding the weak learner to the estimate
of F:

F
m
=

F
m1
+

m
Most algorithms simply use a constant but suciently small shrinkage factor, . Alternatively,
one can solve an additional minimization problem for the best step-size.
5. Iterate. Iterate on Steps 2 through 4 until m = M.
The two parameters and M jointly determine the number of iterations required by the algo-
rithm in order to converge. Small values of are desirable to avoid overtting, since the computa-
tional cost of additional iterations is low. In the applications below, M is chosen to minimize an
Akaike Information Criterion criterion; i.e. M arg min
m
AIC(m).
2
A weak learner can be se-
lected many times throughout the course of the boosting algorithm, or not at all. This data-driven
approach of model selection is very exible; dierent specications of the loss function L and the
weak learners lead to approximations of dierent models. Moreover, when used in a forecasting ex-
ercise, re-estimating the model at each point in time also allows the relationship between covariates
and the dependent variable to change.
2
Throughout, the AIC is dened as 2 log-likelihood + 2 df, where df denotes the degrees of freedom of the
model Models with better t have lower AIC values.
7
3 Data and evaluation
3.1 Data
The implementation of either the forecasting schemes described in the previous section requires
that the model-space to be dened. In the interest of parsimony, and following the seminal work of
Estrella & Mishkin (1998), the analysis is limited to the commonly followed nancial and macroe-
conomic indicators listed in table 1.
While the majority of the literature focuses on the slope of the yield curve as a predictor of future
economic activity, in principle other features of the curve, such as its level and curvature, may also
be important predictors. The level, slope and curvature of the yield curve are constructed using
monthly averages of the daily yields of zero-coupon 3-month, 2-year and 10-year yields compiled
by Gurkaynak, Sack & Wright (2007). Specically, the level of the curve is calculated by taking
the mean of the 3-month, 2-year and 10-year yields; the slope of the curve is constructed as the
dierence between the 10-year and the 3-month yields; and curvature is measured by taking the
dierence between two times the 2-year yield and the sum of the 3-month and 10-year yields.
Additional nancial variables are included in the forecasting exercise. The TED spread and two
dierent corporate bond spreads measure the degree of credit risk in the economy. Other nancial
indicators in the empirical model include money growth rates (both nominal and real), and, because
they are forward looking, changes in a stock price index and the value of the U.S. dollar. The VIX
is included in the model search in recognition that nancial volatility may presage a decline in real
economic activity.
3
Several variables that describe the real economy are also included as predictors in the model.
Industrial production serves as a proxy for output. Housing permits proxy for the housing market.
In order to gauge the health of the labor market, the four-week moving average of initial claims for
unemployment insurance and a measure of hours worked are included in the model. The purchasing
managers index, a commonly followed leading indicator, is also included in the model. The data
span the period January 1973June 2012 at a monthly frequency. Table 1 lists the indicators in
detail.
3
VIX is taken from CBOE and is available from 1990 to present. Pre-1990, VIX is proxied by the within month
standard deviation of the daily return to the S&P 500 index, normalized to the same mean and variance as the VIX
for the period when they overlap (1990-2012). Actual and implied volatility correlated at 0.869.
8
[Table 1 about here.]
3.2 Evaluating forecasts of discrete outcomes
Model and forecast evaluation is undertaken using three distinct metrics. The rst is intended
to measure in-sample t and is analogous to the R
2
statistic of a standard linear regression. The
other two measures focus on the predictive ability of each model, with one measure focusing on each
models ability to classify future dates into recessions and expansions. The pseudo-R
2
developed
by Estrella (1998) measures the goodness-of-t of a model t to discrete outcomes. The pseudo-R
2
can be written
pseudoR
2
= 1
_
logL
u
logL
c
_
(2/n)logLc
. (6)
L
u
is the value of the likelihood function and L
c
is the value of the likelihood function under
the constraint that all coecients are zero except for the constant. As with its standard OLS
counterpart, a value of zero indicates that the model does not t the data whereas a value of one
indicates perfect t.
The test statistic of Giacomini & White (2006) is used to evaluate predictive ability. The GW
test assesses the dierence between the losses associated with the predictions from a prospective
model and those from a null model. Importantly given the methods used to make forecasts, the
statistic tests the conditional forecast and does not depend on the asymptotic convergence of model
parameters to their true values. Specically, let L(
i
t
) be a loss function, where i {0, 1} denotes
the model used to produce the forecast, and
i
t
y
i
t
y
t
is the forecast error. Note that in the
current application, the outcome variable, y, is discrete and takes a value of zero and one, while
the prediction y is a probability. The test statistic that evaluates P predictions can be written as:
GW
1,0
=

L

L
/

P
N(0, 1), where

L =
1
P

t
_
L(
1
t
) L(
0
t
)
_
;
2
=
1
P

t
_
L(
1
t
) L(
0
t
)
_
2
GW test statistics are computed for two loss functions: absolute error and squared error. Since
forecasts are made multiple periods into the future, tables display p-values from Giacomini-White
tests that are robust to heteroskedasticity and autocorrelation.
The Giacomini-White test with squared-error loss is closely related to the commonly-used
Quadratic Probability Score (Brier & Allen 1951), another popular evaluation test for probabilistic
9
forecasts. One well-known drawback of the QPS is that it does not measure resolution or dis-
crimination; it is simply a test of the distance from the probabilistic forecast and the realized 0/1
outcome. Thus, the nal tool used to evaluate the forecasts explicitly recognizes that the problem
is one of classication. Classication is a distinct measure of model performance since two models
could have dierent model t but still classify the discrete outcome in the same way (Hand &
Vinciotti 2003).
The area under the Receiver Operating Characteristics (ROC) curve is used to evaluate each
models ability to distinguish between recessions and expansions. The ROC curve describes all
possible combinations of true positive and false positive rates that arise as one varies the threshold
used to make binomial forecasts from an real-valued classier. As a threshold c is varied from 0 to
1, and treating FP(c) as the abscissa, a curve is traced out in {FP(c), TP(c)} space that describes
the classication ability of the model. The area underneath this curve (AUC) is a well-known
summary statistic that describes the classication ability of a given model.
4
The statistic has a
lower bound of 0.5 and an upper bound of 1 where a higher value indicates superior classication
ability. The statistic has standard asymptotic properties, although for inferential purposes standard
errors are found with the bootstrap.
An important advantage of ROC curves relative to the Giacomini-White test described above is
that it does not require the specication of a loss function. Instead, the ROC curve is a description
of the tradeos between true positives and false negatives produced by a forecasting model. The
statistic is a non-parametric summary of the potential classication ability of a particular model.
In the current application, this is advantageous since it is hard to know how to weigh true and false
positives against true and false negatives when forecasting states of the business cycle.
4 In-sample results
Although the primary interest is in the out-of-sample performance of the two models, this section
rst evaluates their in-sample performance. Estimating the models on the full sample provides
a sense of how each model and forecast method performs on average. As an initial investigation
into the statistical and predictive power of commonly acknowledged leading indicators, table 2
presents estimates of univariate logit models at each horizon. For each variable, the full sample
4
For a complete introduction to ROC curves, see Pepe (2003).
10
of available data is used to estimate model parameters, which are then used to produce predicted
recession probabilities. The predicted probabilities are then compared to NBER-dened recessions
for evaluation. Table 2 displays three summary statistics that measure model t for each variable
and at each forecast horizon. The rst statistic contains the pseudo-R
2
that provides a measure of
model t. The second row is the t-statistic that tests the null hypothesis that the model coecient
from the logit regression is zero. The nal row is the AUC statistic, a non-parametric summary
statistic that summarizes the models ability to discern expansions from recessions. In the interest
of concision, tests of forecast accuracy are not presented for in-sample results.
[Table 2 about here.]
Variables that describe real economic activity perform best in the very short-term. For variables
that describe real economic activity, both model t and classication ability decline as the forecast
horizon grows, an observation that is not unsurprising since the NBER recession dates themselves
are based on the contemporaneous behavior of similar variables. Both industrial production and
initial claims, for example, are useful indicators of the state of the economy at short horizons,
each with high pseudo-R
2
and AUC statistics approaching 0.90. In both instances, however, the
predictive power of the indicator is limited at longer horizons.
In contrast, many of the nancial indicators exhibit forward-looking behavior as their most
precise t is not contemporaneous. The slope of the yield curve performs best at the horizon of 12
months. Unconditionally, the level and curvature of the yield curve appear to contain only modest
information about the state of the economy. Variables that reect turmoil in nancial markets
give information that reects the probability of a recession, particularly in the near-term. Changes
in stock prices also appear to contain modest explanatory power, although only at short forecast
horizons. Neither nominal nor real money growth appear to consistently contain useful information
across forecast horizons, with very low pseudo-R
2
values and modest forecast ability. Finally,
although model t and classication ability are distinct features of a model, in the application here
the two align very closely. Models that have a better t also display superior classication ability.
4.1 Model averaging
Table 3 evaluates the forecasts produced by the two weighting schemes described in section 2.2.
The top panel presents results from the equally-weighted forecasts of each of the K models; the
11
bottom panel weights each forecast according to its posterior probability as in equation (4). Fore-
cast combination produces forecasts that do not perform worse than their univariate counterparts,
indeed, often forecast performance is improved. Weighing the probabilistic forecasts by model pos-
terior probabilities does not improve the recession forecasts over the equally-weighted forecasts.
At forecast horizons of six months or less, the equally weighted forecasts clearly outperform the
weighted forecasts.
[Table 3 about here.]
Figure 1 gives further insight into the performance of the combined forecasts. The posterior
probabilities of the models strongly favor only one or in same cases, two, forecast models. Con-
temporaneously, both industrial production and initial claims are included in the forecast model;
forecasts from all other models receive a weight of zero. The slope of the yield curve is strongly
preferred at forecast horizons of six and twelve months; indeed, it is the only forecast model in-
cluded in the model averaging procedure. At longer horizons18 and 24 monthscorporate yield
spreads dominate in terms of model t, and are the only indicators that receive non-zero weight.
[Figure 1 about here.]
4.2 Forecasts from the boosted models
4.2.1 A linear model
An alternative to assigning weights to univariate forecasts is to use a model selection procedure to
specify the forecast model. This section evaluates the empirical performance of one such procedure,
boosting. Equation (5) is estimated using a loss function of negative one-half times the Bernoulli
log likelihood function. Each indicator listed in Table 1 is included in the model search. The initial
weak learner specied as a univariate linear function; this is equivalent to the logit models used
in the forecast averaging exercise. At each iteration m, the covariate that minimizes the empirical
loss at that iteration is included in the forecast model. After the M iterations, the nal model is:

F
M
(x) =
M

m=1

m

f
m
(x)

f
m
(x) = f

(x) (7)
= arg min
k
T

t=1
(u
t


f
k
(x
t
))
2
12
I set
m
= = 0.1, as is common in the boosting literature. The number of boosting iterations M
is chosen to be the one that minimizes the AIC of the nal boosted model.
Table 4 presents the in-sample estimates of equations (5) and (7). As the number of iter-
ations grows large, the boosted model is equivalent to a kitchen-sink logit model. Thus as a
method of comparison, the table presents the ratio of the coecient from the boosted model to
its unrestricted kitchen-sink logit counterpart for each variable included in the model search (i.e.,

boost
/
kitchensink
).
5
The forecasting models produced by the boosting model dier from their BMA
counterpart in that there are many more indicators included in the forecast model at each horizon
on average. However, the coecients for the variables included in the forecasting model are shrunk
signicantly towards zero.
[Table 4 about here.]
Figure 2 presents the model for each forecast horizon in a slightly dierent way. The gure
shows the fraction of iterations that the boosting algorithm selected a particular covariate for
both the linear and non-linear variants of the model.
6
Although there are many more variables
included in the boosted model, the variables that are selected are qualitatively similar to the
forecasting models produced by Bayesian Model Averaging. At short forecast horizons, measures
of real activity and a nancial indicators are included in the forecast model. Unsurprisingly, when
forecasting a six and twelve months ahead, the model relies on the slope of the yield curve to
forecast business cycle turning points. Only at some horizons do other elements of the yield curve
enter the modelcurvature enters into models forecasting into the medium-termand the level
of the curve is informative for forecasts into the very distant future. Corporate yield spreads are
included in the models that forecast into the distant future. Finally, several variables are never
or only marginally included in the nal forecasting models. The measures of money growth, both
nominal and real, as well as VIX, are essentially never included in the nal forecast model.
5
Since the boosted model minimizes one-half the log of the odds-ratio, coecients from the boosted model are
doubled to facilitate comparison.
6
The plot is of
h
k
for each covariate k. h denotes forecast horizon and denotes the covariate with minimum
mean squared error at iteration m.

h
k
=
1
M
M

m=1
I(k = ); (8)
13
The bottom half of table 4 describes the classication ability of each model. The value of
the maximized AIC is shown in the table. In the interest of brevity I do not report the AIC
models from the univariate models reported in table 2 but for each forecast horizon, the boosted
model is strongly preferred. For example, at the horizon of 12 months, the AIC of the best-tting
univariate modelthe one using the slope of the yield curveis 280.3, which is clearly dominated by
the boosted AIC. In terms of classication ability, the models produce predicted probabilities that
track NBER recession dates very well, particularly at short horizons. When nowcasting, the linear
model achieves an AUC statistic of 0.97, indicating that model achieves near-perfect classication
of economic activity contemporaneously. At longer horizons the AUC decreasesat a forecast
horizon of two years the AUC falls to 0.84. Relative to the AUC values in table 2, the boosted
model has considerably higher classication ability than the best univariate model at each horizon.
Although many of the forecasting models are sparse, adding information from disparate variables
improves both model t and forecast ability.
4.2.2 A nonlinear model
In a standard logistic model, the unobserved latent variable depends on the covariates in a linear
fashion, as in equation (1). However, this functional form is used only because it is convenient, and
there are good reasons to believe that a non-linear specication is appropriate. Financial market
indicators are often erratic and behave in a non-linear fashion. Hamilton (2005) and Morley &
Piger (2012) argue that movements from one state of the economy to another are associated with
non-linear movements in real economic variables.
For these reasons, non-linearity is introduced into the forecast model using the smoothing
splines of Eilers & Marx (1996). Incorporating smoothing splines into the boosting algorithm is
straightforward. The weak learner for each covariate k within the boosting algorithm is specied
to minimize the penalized sum of squared error:
PSSE(f
k
, ) =
T

t=1
[y
t
f
k
(x
t
)]
2
+
_
[f
k

(z)]
2
dz (9)
where the smoothing parameter determines the magnitude of the penalty for functions with
a large second derivative. Splines have several attractive features: they are estimated globally,
conserve moments of the data and are computationally ecient as they are extensions of generalized
14
linear models.
7
As with the linear case, at each iteration m the covariate that best minimizes the
empirical loss at that iteration is included in the forecast model. Let f
k
m
be the smoothing spline
t to indicator k at iteration m, then at each iteration the weak learner can be expressed:

f
m
(x) = f

(x)

f
k
(x) = arg min
f(x)
_
PSSE(f, , x
k
)
_
(10)
= arg min
k
T

t=1
(u
t


f
k
(x
t
))
2
As before, the number of boosting iterations M is chosen to be the one that minimizes the AIC of
the nal boosted model.
The performance of the non-linear models in-sample is impressive. Table 5 displays the results
of the non-linear model t to the full sample. Comparing information criteria to that from the lin-
ear model presented in table 4, it is clear that the data prefer a non-linear version of the forecasting
model. For each forecast horizon, the AIC strongly prefers a non-linear specication to a linear
boosted model. The non-linear models also classify the data more accurately than their linear
counterparts. For each forecast horizon, the AUC indicates that the non-linear specication im-
proves the classication ability of the indicators. The dierence between forecasting models is most
notable at long forecast horizons. The dierence between the AUC of the linear and the non-linear
models is not statistically dierent when forecasting at h = 0. However, when forecast six months
ahead, the dierence is statistically signicant at the 90 percent condence level. When forecasting
at horizons longer than six months the dierence is signicant at the 95 percent condence interval.
[Table 5 about here.]
The second panel of gure 2 presents the fraction of time the algorithm included a particular
covariate at each iteration for each non-linear forecasting model. It is clear that no covariate
dominates a forecasting model as was the case when performing model averaging. Nonlinearity
allows some covariates to enter into the forecasting model when they were excluded from the linear
7
Eilers & Marx approximate the penalty term in (9) by constraining the dierence in parameter values of the
spline in neighboring regions of the data, transforming the problem into a modied regression equation that is
computationally very ecient. Buhlmann & Yu (2003) have explored boosting with smoothing spline weak learners
and nd that setting = 4 is a reasonable setting for this parameter, which is the value used here.
15
case. For example, VIX, which was rarely included in the linear model, enters into the nonlinear
model at all forecast horizons, the contemporaneous model most frequently.
[Figure 2 about here.]
4.3 Comparison to NBER recession dates
Figure 3 shows the recession probabilities from each of the methods estimated at a forecast horizon
of zero months. The graphs also contain an optimal threshold that is used to produce a binomial
series. Specically, given the assumption of symmetric utility/disutility from true/false positives,
the utility of a classier is the dierence between the true and false positive rates (Berge & Jord`a
2011). The threshold shown for each forecast is the one that maximizes this dierence.
As was highlighted by the models chosen by the BMA averaging method and the model selection
scheme, the economic indicators vary widely in the information that they carry at dierent forecast
horizons. When forecasting contemporaneous probability of recession, many of the indicators were
found to have only modest utility. Although model averaging alleviates many problems concerning
model misspecication, the average still depends critically on the utility of the underlying indica-
tors. As seen in panel a) of gure 3, the result of averaging ineective measures is a forecast that
hovers near the unconditional probability of recession. The BMA forecast (panel b) highlights the
opposite extreme. The model relies heavily on only one indicator, initial claims for unemployment
insurance. The forecasts appear to be an improvement over the unweighted averagethe probabil-
ity of recession crosses threshold during each of the recessionsbut the forecasts also have a high
variance. The bottom half of the gure shows the in-sample forecasts from the model selection
method. The two boosted models produce in-sample forecasts that are quite similar. The proba-
bilities from each cross the threshold during NBER recessions, and conditional on the use of the
optimal threshold, there are no false positives in the sample.
[Figure 3 about here.]
The in-sample probabilities of recession can be combined with the threshold value to produce
a chronology of business cycle turning points for the U.S. economy. Table 6 shows the peaks and
troughs from each of the four models, relative to NBER recession dates. For each model, peaks
and troughs are the rst and nal month for which the recession probability is equal to or greater
16
than the threshold. I also impose the natural restriction each phase of the business cycle lasts more
than three months. The recession dates that result from the model averaging scheme align with
the NBER dates less closely than dates from the boosted models. For example, both the weighted
and unweighted model averaging schemes produce a peak date for the most recent recession period
of August 2008, whereas the peak date implied by the boosted models is March 2008. The model
averaging schemes produce two false negative eventsunweighted BMA misses the 1990 and 2001
recessions, while the weighted BMA forecasts miss the 1981 and 1991 recessions. The linear boosted
event also produces a false negative event for the 2001 recession. In the case of the BMA forecast
and the linear boosted model forecast, the probability of recession does cross the threshold, but
dips back below the threshold during the recession event. In each case, the rule that a recession
last more than three months lters these signals out. Finally, each of the models appear to have
an easier time identifying trough dates rather than peak; for example, the trough dates from the
non-linear boosted model are never more than one month dierent than the NBER-dened dates.
[Table 6 about here.]
5 Out-of-sample performance
Both the model averaging scheme and the model selection algorithm produce valuable probabilistic
forecasts of recession in-sample. However, aggressive model search may produce models that overt
the data, which would reduce the out-of-sample forecast ability of the method. Weighting forecasts
with a statistical measure of t carries the same risk. This section considers the out-of-sample
performance of the forecasts produced by the four alternative methods presented above. Forecasts
are produced using an expanding window, and the initial out-of-sample forecast is made for May
1985. From this point forward, at each point in time, a total of 20 forecast models are estimated:
each of the four dierent model produces forecasts of recession at ve horizons, zero, six, 12, 18
and 24 months. After the forecasts have been produced, an additional data point is added to the
model and the process is repeated.
Table 7 displays the results of the out-of-sample exercise. Although the forecast performance
is diminished relative to the in-sample forecasts, it is clear that each of the methods produces
forecasts that are superior to a naive forecast. All four methods clearly outperform a naive forecast
in terms of the Giacomini-White test statistic, and the Area Under the ROC curve also shows
17
that the models have superior classication ability. As was the case for the for the in-sample
forecasts, the forecasts produced by the boosting algorithms outperform those produced by the
model combination schemes.
[Table 7 about here.]
A risk of either the model combination or model selection scheme is that they are unable to
produce weights or model parameters that can produce functional forecasts out-of-sample. This
concern does not appear to be born out by either of the methods. The posterior probabilities of
the models do not change in a meaningful way as the models are re-estimated with the expanding
window. For example, the slope of the yield curve receives by far the highest weight when producing
BMA-weighted forecasts at a forecast horizon of 12 monthsin-sample, the slope of the yield curve
received a weight of 0.99. For each of the models forecasted with the expanding window, the
minimum weight received by the slope of the yield curve was 0.997; essentially the BMA weights
did not change through time.
In contrast, the boosted models display a much higher degree of exibility. Figure 4 is the out-of-
sample analogue of gure 2; it displays the frequency of times for which a given covariate is selected
by the boosting algorithm for the linear model that forecasts the twelve month ahead probability
of recession. In the interest of legibility, the ve most frequently selected covariates are displayed,
with all other covariates grouped together so that for each forecast model the frequencies summed
across covariates is equal to one. The forecast models are quite stable throughout expansion periods,
relying primarily on the slope and level of the yield curve. However, the models display substantial
variation around business cycle turning points. For example, at the most recent recession, the
inuence of the Treasury yield curve goes down substantially, while the inuence of the BAA-10Y
yield spread, the value of the dollar and industrial production increase substantially (these are all
within the All others category). That the boosted model outperforms the model combination
schemes reveals that although the boosted models change through time, the method is able to select
those covariates that contain useful information for forecasting purposes.
[Figure 4 about here.]
To get a sense for the performance of the models in real-time, gure 5 displays the out-of-
sample contemporaneous (h=0) forecasts for each of the four forecast models between July 2007
18
and January 2010 in six month increments. The NBER declared the most recent recession to
have been December 2007June 2009. Forecasts need to be considered within the context of their
thresholds, which for the full-sample, are shown in gure 3.
8
The forecasts from each of the
boosted models easily surpass their thresholds of 0.40 beginning in January 2008. The forecasted
probability of recession dips back below the optimal threshold for one month (in February for
the linear model; March for the nonlinear boosted model), but after that each stays above the
threshold values until July of 2009. The boosted models provide a clear signal that the economy
is in recession; the estimated probability of recession is nearly one in each of the months displayed
in the gure. The forecasts from the model combination schemes provide a less clear signal. The
unweighted model average does not surpass its threshold value of 0.21 until September of 2008, and
achieves a maximum value of only 0.44 in November, 2008. The BMA-weighted forecasts perform
somewhat betterthe estimated probability hovers near a value of 0.3 beginning in January of
2008. However, the signal does not cross its threshold value of 0.55 until September 2008. The
BMA-weighted contemporaneous forecast stays above this value until May 2009.
[Figure 5 about here.]
6 Conclusion
There is an intense interest in establishing the current and future states of the business cycle.
Yet even a casual consumer of macroeconomic news would observe the spotty record economists
have when forecasting economic downturns. This paper evaluates the information content of many
commonly cited economic indicators, providing evidence that many economic indicators do con-
tain information useful for forecasting business cycle turning points. However, dierent indicators
provide valuable information about dierent forecast horizons. This observation complicates the
modeling decision faced by forecasters, especially when one considers the nonlinear behavior of
many dierent economic indicators around business cycle turning points.
The methods of model averaging and model selection highlight the diculties of forecasting
business cycle turning points. Since many commonly followed indicators are valuable only at
particular forecast horizons or have only modest predictive value at any horizon, a simple model
8
The thresholds change little when estimated in a true, out-of-sample way. For this reason, and in the interest of
concision, I suppress comparison to the out-of-sample threshold.
19
average dilutes useful information. In contrast, a weighting scheme that gives each model its weight
based on a posterior probability produces more accurate forecasts. However, the forecasts rely on
very few indicators; posterior probabilities of the univariate models t at dierent horizons strongly
prefer one indicator to all others. Forecasts of the current state of the economy rely on measures
of real economic activity: industrial production and initial claims. In contrast, forecasts into the
medium-term (six or 12 months) rely on signals originating from the bond market. When forecasting
with a model selection algorithm, models tend to condition on a wide number of economic indicators.
Boosting exploits the bias-variance tradeo: although estimates of model parameters are biased,
the forecasts are more accurate due to reduced variance. The method is also able to take the
nonlinear behavior of real economic variables around business cycle turning points into account.
Out-of-sample evidence indicates that the boosted models do not overt.
Overall, the results indicate that there is no sucient summary statistic for identifying or
forecasting business cycle turning points. Indeed, this is the approach taken by the Business Cycle
Dating Committee of the NBER, who consider a wide-variety of economic indicators when making
pronouncements regarding the state of the economy. At the same time, the power of the yield curve
as a predictor of future economic activity endures, although combining the information from the
yield curve with information from other leading indicators improves forecast ability.
20
References
Bates, J. & Granger, C. (1969), The combination of forecasts, Operations Research Quarterly
20, 451468.
Berge, T. J. & Jord`a, O. (2011), Evaluating the classication of economic activity, American
Economic Journal: Macroeconomics 3.
Brier, G. & Allen, R. (1951), Verication of weather forecasts, in Compendium of Meteorology,
American Meteorology Society, pp. 841848.
Brock, W. A., Durlauf, S. N. & West, K. D. (2007), Model uncertainty and policy evaluation: Some
theory and empirics, Journal of Econometrics 136(2), 629664.
URL: http://ideas.repec.org/a/eee/econom/v136y2007i2p629-664.html
Buhlmann, P. & Yu, B. (2003), Boosting with the l2 loss: Regression and classication, Journal of
the American Statistical Association 98, 324340.
Burns, A. F. & Mitchell, W. C. (1946), Measuring Business Cycles, National Bureau of Economic
Research, New York, NY.
Chauvet, M. & Potter, S. (2005), Forecasting recessions using the yield curve, Journal of Forecasting
24(2), 77103.
Chauvet, M. & Senyuz, Z. (2012), A dynamic factor model of the yield curve as a predictor of the
economy, Technical Report 2012-32, Federal Reserve Board.
Dueker, M. (2005), Dynamic forecasts of qualitative variables: A qual var model of u.s. recessions,
Journal of Business & Economic Statistics 23, 96104.
URL: http://ideas.repec.org/a/bes/jnlbes/v23y2005p96-104.html
Eilers, P. H. & Marx, B. D. (1996), Flexible smoothing with b-splines and penalties, Statistical
Science 11(2), 89121.
Elliott, G. & Lieli, R. P. (2009), Predicting binary outcomes. U.C.S.D. mimeograph.
Estrella, A. (1998), A new measure of t for equations with dichotomous dependent variables,
Journal of Business & Economic Statistics 16(2), 198205.
Estrella, A. & Mishkin, F. S. (1998), Predicting u.s. recessions: Financial variables as leading
indicators, The Review of Economics and Statistics 80(1), 4561.
Friedman, J. (2001), Greedy function approximation: A gradient boosting machine, The Annals of
Statistics 29(5).
Giacomini, R. & White, H. (2006), Tests of conditional predictive ability, Econometrica
74(6), 15451578.
Gurkaynak, R. S., Sack, B. & Wright, J. H. (2007), The u.s. treasury yield curve: 1961 to the
present, Journal of Monetary Economics 54(8), 22912304.
Hamilton, J. D. (2005), Whats real about the business cycle?, Review pp. 435452.
Hand, D. J. & Vinciotti, V. (2003), Local versus global models for classication problems: Fitting
models where it matters, The American Statitician 57, 124131.
21
Hendry, D. F. & Clements, M. P. (2004), Pooling of forecasts, The Econometrics Journal 7.
Kauppi, H. & Saikkonen, P. (2008), Predicting u.s. recessions with dynamic binary response models,
The Review of Economics and Statistics 90(4), 777791.
Leamer, E. E. (1978), Specication Searches: Ad Hoc Inference With Nonexperimental Data, John
Wiley & Sons, Inc., New York, NY.
Morley, J. & Piger, J. (2012), The asymmetric business cycle, The Review of Economics and
Statistics 94(1), 208221.
Pepe, M. S. (2003), The Statistical Evaluation of Medical Test for Classication and Prediction,
Oxford: Oxford University Press.
Raftery, A. E. (1995), Bayesian model selection in social research, Sociological Methodology 25, 111
163.
Rudebusch, G. D. & Williams, J. C. (2009), Forecasting recessions: the puzzle of the enduring
power of the yield curve, Journal of Business & Economic Statistics 27(4).
Sala-I-Martin, X., Doppelhofer, G. & Miller, R. I. (2004), Determinants of long-term growth: a
bayesian averaging of classical estimates (bace) approach, The American Economic Review
94(4), 813835.
Stock, J. H. & Watson, M. (2004), Combination forecasts of output growth in a seven-country data
set, Journal of Forecasting (23), 405430.
Timmermann, A. (2006), Forecast combinations, in G. Elliott, C. Granger & A. Timmermann, eds,
Handbook of Economic Forecasting.
Wright, J. H. (2006), The yield curve and predicting recessions, Technical report, Federal Reserve
Board.
22
Figures and tables
Variable Denition Transformation
Interest rates and interest rate spreads
Level of yield curve Average of 3-mo, 2- and 10-year yields
Slope of yield curve 10-yr less 3-mo yield
Curvature of yield curve 2x2-yr minus sum of 3-mo and 10-yr yield
TED spread 3-mo. ED less 3-mo. treasury yield
BAA corporate spread BAA less 10-yr. treasury yield
AAA corporate spread AAA less 10-yr. treasury yield
Other nancial variables
Change in stock index Dow Jones Industrial Average 3-month log dierence
Money growth M2 3-month log dierence
Real money growth M2 deated by CPI 3-month log dierence
U.S. dollar Trade-weighted dollar 3-month log dierence
VIX VIX from CBOE and extended following Bloom
Macroeconomic indicators
Output Industrial production (s.a.) 3-month log dierence
Housing permits 3-month log dierence
Initial claims 4-week moving average (s.a.) 3-month log dierence
Weekly hours, manufacturing 3-month log dierence
Purchasing managers index 3-month log dierence
Table 1: Variables included in forecasting models.
23
N h=0 h=6 h=12 h=18 h=24
Level Pseudo-R
2
477 0.03 0.05 0.05 0.04 0.01
t-statistic 3.82 4.97 4.52 4.06 2.35
AUC 0.61 0.65 0.65 0.64 0.59
Slope Pseudo-R
2
477 0.04 0.23 0.28 0.21 0.10
t-statistic -4.09 -8.48 -8.62 -7.96 -6.23
AUC 0.63 0.83 0.88 0.86 0.78
Curve Pseudo-R
2
477 0.00 0.00 0.01 0.02 0.03
t-statistic 0.76 0.66 1.56 3.07 3.31
AUC 0.49 0.49 0.54 0.61 0.63
TED Pseudo-R
2
477 0.27 0.19 0.07 0.04 0.01
t-statistic 9.16 8.24 5.43 4.55 2.02
AUC 0.83 0.82 0.76 0.72 0.62
SP AAA10Y Pseudo-R
2
477 0.01 0.14 0.21 0.24 0.15
t-statistic -2.08 -7.05 -7.90 -8.04 -6.92
AUC 0.565 0.77 0.85 0.87 0.81
SP BAA10Y Pseudo-R
2
477 0.00 0.10 0.18 0.22 0.14
t-statistic 0.69 -6.15 -7.49 -7.67 -6.65
AUC 0.50 0.74 0.83 0.86 0.80
DJIA Pseudo-R
2
477 0.11 0.05 0.00 0.00 0.00
t-statistic -6.57 -4.91 -1.42 0.52 1.23
AUC .741 0.71 0.57 0.51 0.54
M2 Pseudo-R
2
477 0.01 0.00 0.01 0.00 0.00
t-statistic 1.87 1.37 1.68 0.27 -0.00
AUC 0.57 0.57 0.60 0.52 0.49
RM2 Pseudo-R
2
477 0.02 0.04 0.02 0.04 0.02
t-statistic -2.61 -3.89 -3.12 -3.78 -3.07
AUC 0.62 0.65 0.61 0.64 0.61
TWD Pseudo-R
2
477 0.01 0.01 0.00 0.00 0.00
t-statistic 2.51 2.06 1.23 -1.30 -0.70
AUC 0.57 0.55 0.54 0.55 0.54
INDPRO Pseudo-R
2
477 0.32 0.04 0.00 0.00 0.00
t-statistic -8.58 -4.19 -0.92 0.05 -0.14
AUC 0.88 0.71 0.59 0.48 0.53
PERMIT Pseudo-R
2
477 0.15 0.12 0.02 0.02 0.00
t-statistic -7.44 -6.88 -3.24 -3.07 -1.28
AUC 0.76 0.76 0.67 0.67 0.59
IC4WSA Pseudo-R
2
477 0.33 0.06 0.01 0.00 0.00
t-statistic 8.74 5.17 2.52 0.73 -0.75
AUC 0.88 0.74 0.65 0.55 0.55
AWHMAN Pseudo-R
2
477 0.11 0.02 0.00 0.00 0.00
t-statistic -6.12 -3.07 -1.98 -0.16 0.18
AUC 0.79 0.65 0.61 0.50 0.53
NAPM Pseudo-R
2
477 0.10 0.05 0.00 0.00 0.00
t-statistic -6.01 -4.32 -1.19 -1.13 0.29
AUC 0.69 0.68 0.62 0.59 0.50
VIX Pseudo-R
2
477 0.09 0.01 0.00 0.01 0.03
t-statistic 5.95 2.57 -0.68 -1.60 -3.24
AUC 0.77 0.64 0.47 0.53 0.63
Table 2: In-sample statistics for univariate forecasts for each indicator. The rst row presents the
pseudo-R
2
of Estrella & Mishkin (1998), the second row is the z-score of the test of whether the
slope coecient is dierent from zero, and the nal row gives the classication ability of the logit
model as measured by the AUC.
24
Forecast horizon
h=0 h=6 h=12 h=18 h=24
N 477 470 464 458 452
Equally weighted
GW (abs. error) 0.00 0.00 0.00 0.00 0.00
GW (sq. error) 0.00 0.00 0.00 0.00 0.00
AUC 0.953 0.888 0.858 0.872 0.811
(0.01) (0.02) (0.02) (0.02) (0.02)
BMA weights
GW (abs. error) 0.00 0.00 0.00 0.00 0.00
GW (sq. error) 0.00 0.00 0.01 0.02 0.08
AUC 0.890 0.834 0.887 0.876 0.813
(0.02) (0.03) (0.02) (0.02) (0.02)
Table 3: Forecast accuracy of combined recession forecasts, unweighted and weighted. See text for
details.
25
Forecast horizon
h=0 h=6 h=12 h=18 h=24
Level 7.1 9.4
Slope 26.7 53.7 50.4
Curve 33.2 58.5
TED 22.1 30.4 13.5
SP BAA10Y -2.3
SP AAA10Y 12.6 7.9
DJIA 31.1 33.4 51.4 33.9
M2
RM2 0.7 0.7
TWD 4.8 13.3 35.3 68.9
INDPRO 28.1 25.2 14.2
PERMIT 15.4 26.5
IC4WSA 13.1 9.9 12.7 15.5
AWHMAN 6.1 18.6 7.3
NAPM 6.5 17.7 41.8
VIX 31.8
const. 5.1 -9.9 -41.4 29.7
N 470 464 458 452 446
AIC 176.9 272.6 272.5 260.0 270.4
GW (abs. error) 0.00 0.00 0.00 0.00 0.00
GW (sq. error) 0.00 0.00 0.02 0.00 0.01
AUC 0.973 0.926 0.903 0.898 0.844
(0.01) (0.01) (0.02) (0.02) (0.03)
Table 4: In-sample results for linear boosted model. For each regressor, the table shows the
ratio of the coecient of the boosted model to the coecient from an unrestricted kitchen-sink
logit regression, expressed as a percentage. indicates that the coecient was not selected by
the boosted model. GW test statistics are p-values from a one-sided test that the model forecast
outperforms the null. P-values are robust to heteroskedasticity and autocorrelation. Standard
errors of AUC statistic in parentheses. See text for details. Standard errors of AUC statistic in
parentheses. See text for details.
26
Forecast horizon
h=0 h=6 h=12 h=18 h=24
N 470 464 458 452 446
AIC 139.1 193.5 189.2 192.0 184.5
GW (abs. error) 0.00 0.00 0.00 0.00 0.00
GW (sq. error) 0.00 0.00 0.00 0.00 0.00
AUC 0.985 0.970 0.957 0.948 0.920
(0.00) (0.01) (0.01) (0.01) (0.02)
Table 5: In-sample performance of boosted model t with smoothing splines as weak learners.
GW test statistics are p-values from a one-sided test that the model forecast outperforms the null.
P-values are robust to heteroskedasticity and autocorrelation. Standard errors of AUC statistic in
parentheses. See text for details. Standard errors of AUC statistic in parentheses. See text for
details.
27
Peak dates
NBER BMA (unweighted) BMA (weighted) Linear boost Non-linear boost
1973:11 8 10 6 2
1980:1 -2 4 -4 -1
1981:7 0 0 1
1990:7 4 3
2001:3 -1 0
2007:12 9 9 3 3
Trough dates
NBER BMA (unweighted) BMA (weighted) Linear boost Non-linear boost
1975:3 -2 0 2 1
1980:7 0 1 1 0
1982:11 -9 0 0
1991:3 -1 0
2001:11 -6 1
2009:6 -1 -1 0 0
Table 6: Business cycle turning points, 1973-2012. Value shown for model-implied recession dates
is the peak/trough calculated using the optimal threshold and a rule that each phase of the business
cycle last more than three months, relative to the NBER-dened peak/trough date. indicates
that the model did not generate a business cycle phase lasting 3 months that corresponds to the
NBER recession. See text for details.
28
Forecast horizon
h=0 h=6 h=12 h=18 h=24
N 332 332 332 332 332
Equally weighted
GW (abs. error) 0.00 0.00 0.00 0.00 0.00
GW (sq. error) 0.01 0.01 0.00 0.00 0.04
AUC 0.810 0.721 0.756 0.797 0.705
(0.04) (0.04) (0.03) (0.03) (0.04)
BMA weighted
GW (abs. error) 0.00 0.00 0.00 0.00 0.00
GW (sq. error) 0.03 0.14 0.00 0.00 0.00
AUC 0.803 0.741 0.887 0.923 0.842
(0.05) (0.04) (0.03) (0.02) (0.03)
Linear boost
GW (abs. error) 0.000 0.000 0.000 0.000 0.000
GW (sq. error) 0.002 0.041 0.049 0.100 0.125
AUC 0.928 0.890 0.904 0.870 0.858
(0.03) (0.03) (0.02) (0.03) (0.03)
Nonlinear boost
GW (abs. error) 0.000 0.000 0.000 0.000 0.000
GW (sq. error) 0.001 0.007 0.003 0.005 0.015
AUC 0.947 0.902 0.949 0.915 0.920
(0.02) (0.03) (0.01) (0.03) (0.03)
Table 7: Out-of-sample forecast performance. GW test statistics are p-values from a one-sided
test that the model forecast outperforms the null. P-values are robust to heteroskedasticity and
autocorrelation. Standard errors of AUC statistic in parentheses. See text for details.
29
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1

L
e
v
e
l


S
l
o
p
e


C
u
r
v
e

T
E
D


A
A
A
-
1
0
Y


B
A
A
-
1
0
Y


D
J
I
A


M
2

R
M
2

T
W
D


I
N
D
P
R
O


P
E
R
M
I
T


I
C
4
W
S
A


A
W
H
M
A
N


N
A
P
M

V
I
X

h=1
h=6
h=12
h=18
h=24
Figure 1: Posterior probabilities, in-sample BMA exercise
30
0.0%
5.0%
10.0%
15.0%
20.0%
25.0%
30.0%
35.0%
40.0%
C
o
n
s
t
L
e
v
e
l
S
lo
p
e

C
u
r
v
e

T
E
D

B
A
A
-1
0
Y

A
A
A
-1
0
Y

D
JIA

M
2

R
M
2

T
W
D

IP

P
e
r
m
it
IC
4
W
M
A

A
W
H
M
A
N

N
A
P
M

V
IX

h=0
h=6
h=12
h=18
h=24
0.0%
5.0%
10.0%
15.0%
20.0%
25.0%
30.0%
35.0%
40.0%
C
o
n
s
t
L
e
v
e
l
S
lo
p
e

C
u
r
v
e

T
E
D

B
A
A
-1
0
Y

A
A
A
-1
0
Y

D
JIA

M
2

R
M
2

T
W
D

IP

P
e
r
m
it
IC
4
W
M
A

A
W
H
M
A
N

N
A
P
M

V
IX

h=0
h=6
h=12
h=18
h=24
Figure 2: In-sample model selection frequency, linear and non-linear models. The bar graph presents
the fraction of iterations for which a particular covariate was selected by the model selection pro-
cedure for each model, linear and non-linear and for each forecast horizon. See text for details.
31
0
.
2
.
4
.
6
.
8
1
1970m1 1980m1 1990m1 2000m1 2010m1
date
Optimal threshold for unweighted average: 0.21
Unweighted average in-sample forecast
0
.
2
.
4
.
6
.
8
1
1970m1 1980m1 1990m1 2000m1 2010m1
date
Optimal threshold for BMA: 0.58
BMA weighted in-sample forecast
0
.
2
.
4
.
6
.
8
1
1970m1 1980m1 1990m1 2000m1 2010m1
date
Optimal threshold for unweighted average: 0.41
Linear boosted model
0
.
2
.
4
.
6
.
8
1
1970m1 1980m1 1990m1 2000m1 2010m1
date
Optimal threshold for BMA: 0.41
Non-linear boosted model
Figure 3: In-sample recession probabilities forecast at horizon of zero months. The top panels
contain forecasts from model combination schemes, both unweighted (left) and Bayesian model
averaged (right). The two bottom panels contain recession probabilities from linear (left) and non-
linear (right) boosted forecast models. Grey shading indicates NBER-dened recession dates and
dashed line denotes optimal threshold. See text for details.
32
0
0.25
0.5
0.75
1
M
a
y
-
8
4

M
a
y
-
8
5

M
a
y
-
8
6

M
a
y
-
8
7

M
a
y
-
8
8

M
a
y
-
8
9

M
a
y
-
9
0

M
a
y
-
9
1

M
a
y
-
9
2

M
a
y
-
9
3

M
a
y
-
9
4

M
a
y
-
9
5

M
a
y
-
9
6

M
a
y
-
9
7

M
a
y
-
9
8

M
a
y
-
9
9

M
a
y
-
0
0

M
a
y
-
0
1

M
a
y
-
0
2

M
a
y
-
0
3

M
a
y
-
0
4

M
a
y
-
0
5

M
a
y
-
0
6

M
a
y
-
0
7

M
a
y
-
0
8

M
a
y
-
0
9

M
a
y
-
1
0

M
a
y
-
1
1

M
a
y
-
1
2

AllOthers
PERMIT
Curve
M2
Level
Slope
Figure 4: Out of sample selection frequency of covariates for linear boosted model. The gure
shows the fraction of time a covariate was selected for the model forecasting 12 months ahead at
time t, as forecast model is run through the sample. See text for details.
33
0
.
2
5
.
5
.
7
5
1
P
r
o
b
a
b
i
l
i
t
y

o
f

r
e
c
e
s
s
i
o
n
July 2007 Jan 2008 July 2008 Jan 2009 July 2009 Jan 2010
Equal weights BMA weights
Linear boosted Non-linear boosted
Figure 5: Contemporaneous out-of-sample recession probabilities at forecast horizon of h=0. See
text for details.
34
Appendix
Data sources
Table 8: Data sources
Variable Available dates Source
Interest rates
Eective Fed Funds Rate 1954m7-2012m12 FRB H.15 release
3-month Eurodollar deposit rate 1971m1-2012m12 FRB H.15 release
3-month Treasury yield (secondary market) 1934m1-2012m12 Gurkaynak, Sack and Wright (2006)
1-year Treasury yield (constant maturity) 1953m4-2012m12 Gurkaynak, Sack and Wright (2006)
10-year Treasury yield (constant maturity) 1953m4-2012m12 Gurkaynak, Sack and Wright (2006)
Moodys BAA corporate bond yield 1947m1-2012m12 FRB H.15 release
Other nancial variables
Dow Jones Industrial Average 1947m1-2012m12 Dow Jones & Company
M2 (seasonally adj.) 1959m1-2012m12 FRB H.6 release
Trade-weighted US Dollar: major currencies 1973m1-2012m12 FRB H.10 release
Macroeconomic indicators
CPI index (seasonally adj.) 1947m1-2012m12 U.S. Dept of Labor: BLS
Industrial production (seasonally adj.) 1947m1-2012m12 FRB G.17 release
ISM manufacturing PMI index 1948m1-2012m12 Institute for Supply Management
Housing permits 1960m1-2012m12 U.S. Census Bureau
Average weekly hours: manufacturing 1947m1-2012m12 U.S. Dept of Labor: BLS
Initial claims for unemployment insurance 1967m1-2012m12 U.S. Dept of Labor: BLS
35