Are you sure?
This action might not be possible to undo. Are you sure you want to continue?
Leading Indicators: Model
Averaging and Selection Over
the Business Cycle
Travis Berge
April 2013
RWP 1305
April 2013
Predicting recessions with leading indicators:
model averaging and selection over the business cycle
∗
Abstract
This paper evaluates the ability of several commonly followed economic indicators to predict business
cycle turning points. As a baseline, forecasts from univariate models are combined by taking averages or
by weighting forecasts with modelimplied posterior probabilities. These combined forecasts are compared
to those from a sophisticated model selection algorithm that allows for nonlinear model speciﬁcations. The
preferred forecasting model is one that allows for nonlinear behavior across the business cycle and combines
information from the yield curve with other indicators, especially at very short and very long horizons.
• JEL: C25, C53, E32
• Keywords: Business cycle turning points; variable selection; model averaging; probabilistic forecasts.
Travis J. Berge
Federal Reserve Bank of Kansas City
One Memorial Drive
Kansas City, MO 64198
Email: travis.j.berge@kc.frb.org
∗
The views herein do not necessarily reﬂect the views of the Federal Reserve Bank of Kansas City or the Federal
Reserve System.
1 Introduction
The business cycle turning points of the National Bureau of Economic Research determine accounts
of economic conditions in the United States, even decades following the end of a given recession.
For households, recession suggests lower earnings and high unemployment. For businesses, slow
economic growth reduces demand for products and decreases the number of proﬁtable economic
opportunities. Given these observations, the enthusiasm with which households, businesses and
policymakers wish to determine the current and future states of the economy comes as no surprise.
Classifying economic variables into leading, coincident and lagging indicators of economic activ
ity is a longlived tradition in economic research (Burns & Mitchell 1946). This paper examines the
ability of closelyfollowed leading indicators to produce useful forecasts of recession. Academic re
search emphasizes the information content of the yield curve for turning point prediction (see, e.g.,
Estrella & Mishkin, 1998; Wright, 2006; Kauppi & Saikkonen, 2008; and Rudebusch & Williams,
2009). However, practitioners follow a wide range of economic indicators, so that the production
of accurate forecasts of business cycle turning points requires the combination of information from
many indicators. Of course, forecast accuracy depends critically on using the correct indicators to
forecast at the proper horizon. Moreover, while there are many economic indicators that provide
useful signals of the present and future states of the economy, the relationships between these indi
cators and the state of the economy has not been stable over time. For example, the ﬁnancialization
of the U.S. economy and decline of manufacturing as a major sector of the economy likely altered
the predictive content of many ﬁnancial indicators of the economy. Further complicating matters,
Hamilton (2005) and Morley & Piger (2012) have documented the asymmetry of many economic
indicators around business cycle turning points. Asymmetry complicates the forecasting problem,
as it is not suﬃcient to specify a model for the joint evolution of the state of the economy and a
vector of leading indicators. Instead, diﬀerent forecasting models are required when forecasting at
diﬀerent horizons.
This paper produces forecasts of business cycle turning points, focusing on combining informa
tion from a standard set of economic variables. The baseline model relates each possible covariate
to the state of the economy and forecasting up to 24 months in the future. Two methods pro
duce forecasts that condition on multiple indicators of the economy. First, a forecast combination
1
approach weights univariate models, either equally or with a Bayesian Model Averaging approach
that weights each forecast by its modelimplied posterior probability. Secondly, a model selection
algorithm is used to endogenously select the forecasting model most appropriate for each forecast
horizon. By producing models that are biased, the method is able to condition on a large number
of indicators but produce forecasts that have low variance. The method also allows for nonlinear
dynamics across the business cycle.
Many of the indicators analyzed contain information that can be exploited to make forecasts of
future states of the economy. The power of the yield curve as a predictor of future economic activity
endures. The results indicate that the predictive power is limited to forecasts made at the medium
term—real economic variables most accurately describe the current state of the economy. Interest
rate spreads on corporate bonds also appear to contain information useful for forecasting, especially
at very long horizons. Both model averaging and model selection produce useful probabilistic
forecasts of recession, although the forecasts are produced with models that are quite diﬀerent.
When forecasts are weighted by posterior probabilities, the data strongly favor one or possibly
two indicators for forecasts at each horizon. In contrast, forecasts produced using model selection
methods condition on a much larger number of indicators. The use of many diﬀerent indicators
improves forecast ability, particularly in an outofsample forecast exercise.
The plan for the paper is as follows. The next section describes the methods used for model
combination and model selection. Section 3 describes the data and the evaluation of probabilistic
forecasts of a discrete outcome. Sections 4 and 5 present the empirical results.
2 Empirical setup
2.1 A baseline model
Let Y
t
denote the state of the business cycle as determined by the NBER business cycle dating
committee, where Y
t
= 1 denotes that month t is an NBERdeﬁned recession and Y
t
= 0 indicates
2
an expansion instead.
1
Assume that Y
t
is related to an unobserved variable, y
t
:
Y
t
=
_
¸
¸
_
¸
¸
_
1 if y
t
≥ 0
0 if y
t
< 0.
The model relates a vector of observables x
t−h−1
to the latent variable
y
t
= f(x
t−h−1
) +ε
t
, (1)
where f(.) is R
K+1
→R, x is a (K +1) ×1 vector of K observables plus a constant and ε
t
is an iid
shock with unit variance. A typical probit or logit model would specify f(x) as a linear function,
but equation (1) is written more generally to encompass a variety of possible speciﬁcations.
The objective in this paper is to forecast the state of the economy conditional on a set of
covariates. Let E be the expectations operator, and let p
tt−h−1
denote the conditional probability
of recession, so that
E [Y
t
= 1x
t−h−1
] ≡ p
tt−h−1
= Λ(y
t
), (2)
where Λ(.) is a cumulative distribution function. In the application below Λ is speciﬁed to be the
logistic function, but any twicediﬀerentiable continuous distribution function could be used.
A large literature has used a model similar to that in (1) and (2) to relate future economic
activity to the term structure of interest rates. Estrella & Mishkin (1998), Wright (2006) and
Rudebusch & Williams (2009) focus on simple limited dependent models, conditioning on the slope
and level of the yield curve. There are many ways that one could extend the speciﬁcation described
in (1) and (2). Dueker (2005), Chauvet & Potter (2005) and Kauppi & Saikkonen (2008) extend the
model by focusing on various dynamic speciﬁcations of the same basic setup. Chauvet & Senyuz
(2012) provide a more modern speciﬁcation, as they relate the state of the economy to a dynamic
factor model of the yield curve.
Conditioning on other economic indicators is likely to improve forecast ability. Although the
yield curve is a useful summary of market expectations for the future path of shortterm interest
rates, it is not a suﬃcient statistic for summarizing future states of the economy. Risk and term
premia also complicate the relationship between the yield curve and macroeconomic variables.
1
See www.nber.org/cycles/
3
Moreover, the nonlinear behavior of real variables across the business cycle makes model speciﬁ
cation diﬃcult. Under nonlinearity, it is not enough to iterate a linear model forward to produce
forecasts. As indicated by equation (1), all forecasts in the paper are made using the direct method.
This allows for the evaluation of the forecasting models themselves to reveal the information con
tent carried in various indicators. Direct forecasting allows more sophisticated speciﬁcations of the
function relating the latent variable y
t
to covariates x
t−h−1
.
The remainder of this section introduces the two methods that are used to combine information
from diﬀerent leading indicators, forecast combination and model selection.
2.2 Averaging model forecasts
Model averaging has long been recognized as a method of combining information from a given set
of models. Previous applications have shown that model averaging also tends to improve forecast
accuracy, either because the combination either combines information from partially overlapping
information sets as in the canonical work of Bates & Granger (1969), or because the combina
tion alleviates possible model misspeciﬁcation (Hendry & Clements, 2004; Stock & Watson, 2004;
Timmermann, 2006). In addition, the model weights themselves can be of interest if they are
constructed so that they give the posterior probability that a given model produced the observed
data. In the current application, these posteriors reveal information regarding the usefulness of
particular indicators at various forecast horizons.
The method has a solid statistical foundation and is straightforward to implement. Each of a
set of M models is used to produce a forecast of some event y
t
, resulting in {ˆ y
1t
, ˆ y
2t
, ..., ˆ y
Mt
}. In the
current application, recall that y
t
is the latent variable that relates covariates to the aggregate state
of the economy. The combination problem is to ﬁnd weights w
m
for each forecast to combine the
individual forecasts into a single forecast ˆ y
C
t
= C(ˆ y
1t
, ..., ˆ y
Mt
, w
1
, w
2
, ..., w
M
). In principle, because
forecasts are useful only to the extent that they impact the actions of policymakers and other
economic agents, the weights are the outcome of the minimization of a loss function, which in turn
could reﬂect the underlying utility of decision makers, in a way analogous to Elliott & Lieli (2009).
In the current application, the tradeoﬀ between true and false positives or true and false nega
tives for recession forecasts is uncertain. For this reason, two weighting schemes produce recession
forecasts. The ﬁrst is the most basic application of model averaging: each of the M forecasts is
4
assigned the same weight. The equally weighted forecast of the latent is then
ˆ y
EW
t
=
1
M
M
i=1
ˆ y
it
. (3)
A Bayesian Model Averaging (BMA) framework is used to produced forecasts that are weighted
to reﬂect the posterior probability of each model (Leamer 1978). These weights can be interpreted
as the posterior probability that a given model is true. The Bayesian model averaged forecast is
the probabilityweighted sum of the modelspeciﬁc forecasts:
ˆ y
BMA
t
=
M
i=1
ˆ y
it
Pr(M
i
D
t−h−1
) (4)
where ˆ y
it
is the forecast from model i and Pr(M
i
D
t−h−1
) denotes posterior probability of model
i conditional on the data available at the time the forecast is made. By Bayes’ Law, the posterior
probability of model i is proportional to that model’s marginal likelihood multiplied by its prior
probability:
Pr(M
i
D
t
) ∝ Pr(D
t
M
i
)Pr(M
i
),
where
Pr(D
t
M
i
) =
_
Pr(D
t
θ
i
, M
i
)Pr(θ
i
M
i
)∂θ
i
.
Implementation of (4) requires an estimate of the marginal likelihood, which is an average
of the likelihood function with respect to the prior distribution for each model parameter. The
marginal likelihood of each model is approximated with the Bayesian information criterion, which
is a consistent estimate of the marginal likelihood of a model (Raftery 1995). This approximation is
commonly used in applied work and is advantageous since its it requires only a maximum likelihood
estimate and allows the researcher to set aside the production of priors for each model’s parameters
(see, e.g., SalaIMartin, Doppelhofer & Miller (2004), Brock, Durlauf & West (2007) and Morley &
Piger (2012)). Since there is no natural method that can be used to incorporate economic intuition
into the prior for each model, each model is assumed equally likely a priori.
Given these assumptions, model posterior probabilities are calculated as model ﬁt relative to
the ﬁt of all models, or
Pr(M
i
D
t−h−1
) =
exp(
BIC
i
)
M
i=1
exp(
BIC
i
)
.
For both the equallyweighted and Bayesian Model Averaged (BMA) forecasts, model forecasts
5
of the latent are made using equation (3) or (4), respectively. Probabilistic forecasts are then made
by applying equation (2).
2.3 Model selection via the boosting algorithm
Model averaging is one solution to the problem of model speciﬁcation. An alternative solution to
the problem is to perform model selection. This section introduces a model selection algorithm as
a methodology with which one can produce empiricallydriven forecasting models of the business
cycle. We wish to model the relationship between the observed discrete variable Y
t
and a vector
of potential covariates, x
t
= (x
1t
, x
2t
, ..., x
Kt
). In the section above, this was achieved by deﬁning
an unobserved continuous latent variable that depends linearly on covariates. The approach here
is more general, in the sense that we wish to endogenously model the choice of covariates to be
included in the model (that is, forecasts may be made using only a subset of x
1
, x
2
, ..., x
K
). The
method allows for a potentially nonlinear relationship between the latent and the covariates.
These goals are accomplished by estimating a function F : R
K
→R that minimizes the expected
loss L(Y, F), i.e.,
ˆ
F(x) ≡ arg min
F(x)
E [L(Y, F(x))] . (5)
We require only that the loss function is diﬀerentiable with respect to the function F. The setup
encompasses many diﬀerent types of problems. For example, if Y were a continuous outcome, then
specifying L(Y, F(x)) as squarederror loss and F(x) as a linear function would result in a standard
OLS regression. More often the loss function is speciﬁed as the negative of the loglikelihood of
the error’s distribution. F can be speciﬁed very generally—in the machinelearning literature it is
common to specify F(x) as decision trees, a nonparametric method. Smoothing splines are another
common choice.
The following algorithm minimizes the empirical counterpart to (5) by specifying that F(x)
is an aﬃne combination of socalled ‘weak learners,’ each of which are speciﬁed separately. The
algorithm is due to Friedman (2001) and can be summarized as follows. The algorithm begins by
initializing the learner in order to compute an approximate gradient of the loss function. Step 3
ﬁts each weak learner to the current estimate of the negative gradient of the loss function. Step
4 searches among each weak learner and chooses the one that most quickly descends the function
space and then chooses the step size. In step 5 we iterate on 24 until iteration M, which will be
6
endogenously determined.
Functional Gradient Descent.
1. Initialize the model. Choose a functional form for each weak learner, f
(k)
, k = 1, ...K. Each
weak learner is a regression estimator with a ﬁxed set of inputs. Most commonly each covariate
receives its own functional form, which need not be identical across each variable.
Let m denote iterations of the algorithm, and set m = 0. Initialize the strong learner F
0
. It
is common to set F
0
equal to the constant c that minimizes the empirical loss.
2. Increase m by 1.
3. Projection. Compute the negative gradient of the loss function evaluated at the current
estimate of F,
ˆ
F
m−1
. This produces:
u
m
≡ {u
m,t
}
t=1,...,T
= −
∂L(Y
t
, F)
∂F

F=
ˆ
F
m−1
(xt)
, t = 1, ..., T
Fit each of the K weak learners separately to the current negative gradient vector u
m
, and
produce predicted values from each weak learner.
4. Update F
m
. Let
ˆ
f
(κ)
m
denote the weak learner with the smallest residual sum of squares among
the K weak learners. Update the estimate of F by adding the weak learner κ to the estimate
of F:
ˆ
F
m
=
ˆ
F
m−1
+ρ
ˆ
f
κ
m
Most algorithms simply use a constant but suﬃciently small shrinkage factor, ρ. Alternatively,
one can solve an additional minimization problem for the best stepsize.
5. Iterate. Iterate on Steps 2 through 4 until m = M.
The two parameters ρ and M jointly determine the number of iterations required by the algo
rithm in order to converge. Small values of ρ are desirable to avoid overﬁtting, since the computa
tional cost of additional iterations is low. In the applications below, M is chosen to minimize an
Akaike Information Criterion criterion; i.e. M ≡ arg min
m
AIC(m).
2
A weak learner can be se
lected many times throughout the course of the boosting algorithm, or not at all. This datadriven
approach of model selection is very ﬂexible; diﬀerent speciﬁcations of the loss function L and the
weak learners lead to approximations of diﬀerent models. Moreover, when used in a forecasting ex
ercise, reestimating the model at each point in time also allows the relationship between covariates
and the dependent variable to change.
2
Throughout, the AIC is deﬁned as −2 ×loglikelihood + 2 ×df, where df denotes the degrees of freedom of the
model Models with better ﬁt have lower AIC values.
7
3 Data and evaluation
3.1 Data
The implementation of either the forecasting schemes described in the previous section requires
that the modelspace to be deﬁned. In the interest of parsimony, and following the seminal work of
Estrella & Mishkin (1998), the analysis is limited to the commonly followed ﬁnancial and macroe
conomic indicators listed in table 1.
While the majority of the literature focuses on the slope of the yield curve as a predictor of future
economic activity, in principle other features of the curve, such as its level and curvature, may also
be important predictors. The level, slope and curvature of the yield curve are constructed using
monthly averages of the daily yields of zerocoupon 3month, 2year and 10year yields compiled
by Gurkaynak, Sack & Wright (2007). Speciﬁcally, the level of the curve is calculated by taking
the mean of the 3month, 2year and 10year yields; the slope of the curve is constructed as the
diﬀerence between the 10year and the 3month yields; and curvature is measured by taking the
diﬀerence between two times the 2year yield and the sum of the 3month and 10year yields.
Additional ﬁnancial variables are included in the forecasting exercise. The TED spread and two
diﬀerent corporate bond spreads measure the degree of credit risk in the economy. Other ﬁnancial
indicators in the empirical model include money growth rates (both nominal and real), and, because
they are forward looking, changes in a stock price index and the value of the U.S. dollar. The VIX
is included in the model search in recognition that ﬁnancial volatility may presage a decline in real
economic activity.
3
Several variables that describe the real economy are also included as predictors in the model.
Industrial production serves as a proxy for output. Housing permits proxy for the housing market.
In order to gauge the health of the labor market, the fourweek moving average of initial claims for
unemployment insurance and a measure of hours worked are included in the model. The purchasing
managers index, a commonly followed leading indicator, is also included in the model. The data
span the period January 1973–June 2012 at a monthly frequency. Table 1 lists the indicators in
detail.
3
VIX is taken from CBOE and is available from 1990 to present. Pre1990, VIX is proxied by the within month
standard deviation of the daily return to the S&P 500 index, normalized to the same mean and variance as the VIX
for the period when they overlap (19902012). Actual and implied volatility correlated at 0.869.
8
[Table 1 about here.]
3.2 Evaluating forecasts of discrete outcomes
Model and forecast evaluation is undertaken using three distinct metrics. The ﬁrst is intended
to measure insample ﬁt and is analogous to the R
2
statistic of a standard linear regression. The
other two measures focus on the predictive ability of each model, with one measure focusing on each
model’s ability to classify future dates into recessions and expansions. The pseudoR
2
developed
by Estrella (1998) measures the goodnessofﬁt of a model ﬁt to discrete outcomes. The pseudoR
2
can be written
pseudoR
2
= 1 −
_
logL
u
logL
c
_
−(2/n)logLc
. (6)
L
u
is the value of the likelihood function and L
c
is the value of the likelihood function under
the constraint that all coeﬃcients are zero except for the constant. As with it’s standard OLS
counterpart, a value of zero indicates that the model does not ﬁt the data whereas a value of one
indicates perfect ﬁt.
The test statistic of Giacomini & White (2006) is used to evaluate predictive ability. The GW
test assesses the diﬀerence between the losses associated with the predictions from a prospective
model and those from a null model. Importantly given the methods used to make forecasts, the
statistic tests the conditional forecast and does not depend on the asymptotic convergence of model
parameters to their true values. Speciﬁcally, let L(ˆ ε
i
t
) be a loss function, where i ∈ {0, 1} denotes
the model used to produce the forecast, and ˆ ε
i
t
≡ ˆ y
i
t
− y
t
is the forecast error. Note that in the
current application, the outcome variable, y, is discrete and takes a value of zero and one, while
the prediction ˆ y is a probability. The test statistic that evaluates P predictions can be written as:
GW
1,0
=
∆
¯
L
ˆ σ
L
/
√
P
→N(0, 1), where
∆
¯
L =
1
P
t
_
L(ˆ ε
1
t
) −L(ˆ ε
0
t
)
_
; ˆ σ
2
=
1
P
t
_
L(ˆ ε
1
t
) −L(ˆ ε
0
t
)
_
2
GW test statistics are computed for two loss functions: absolute error and squared error. Since
forecasts are made multiple periods into the future, tables display pvalues from GiacominiWhite
tests that are robust to heteroskedasticity and autocorrelation.
The GiacominiWhite test with squarederror loss is closely related to the commonlyused
Quadratic Probability Score (Brier & Allen 1951), another popular evaluation test for probabilistic
9
forecasts. One wellknown drawback of the QPS is that it does not measure resolution or dis
crimination; it is simply a test of the distance from the probabilistic forecast and the realized 0/1
outcome. Thus, the ﬁnal tool used to evaluate the forecasts explicitly recognizes that the problem
is one of classiﬁcation. Classiﬁcation is a distinct measure of model performance since two models
could have diﬀerent model ﬁt but still classify the discrete outcome in the same way (Hand &
Vinciotti 2003).
The area under the Receiver Operating Characteristics (ROC) curve is used to evaluate each
model’s ability to distinguish between recessions and expansions. The ROC curve describes all
possible combinations of true positive and false positive rates that arise as one varies the threshold
used to make binomial forecasts from an realvalued classiﬁer. As a threshold c is varied from 0 to
1, and treating FP(c) as the abscissa, a curve is traced out in {FP(c), TP(c)} space that describes
the classiﬁcation ability of the model. The area underneath this curve (AUC) is a wellknown
summary statistic that describes the classiﬁcation ability of a given model.
4
The statistic has a
lower bound of 0.5 and an upper bound of 1 where a higher value indicates superior classiﬁcation
ability. The statistic has standard asymptotic properties, although for inferential purposes standard
errors are found with the bootstrap.
An important advantage of ROC curves relative to the GiacominiWhite test described above is
that it does not require the speciﬁcation of a loss function. Instead, the ROC curve is a description
of the tradeoﬀs between true positives and false negatives produced by a forecasting model. The
statistic is a nonparametric summary of the potential classiﬁcation ability of a particular model.
In the current application, this is advantageous since it is hard to know how to weigh true and false
positives against true and false negatives when forecasting states of the business cycle.
4 Insample results
Although the primary interest is in the outofsample performance of the two models, this section
ﬁrst evaluates their insample performance. Estimating the models on the full sample provides
a sense of how each model and forecast method performs on average. As an initial investigation
into the statistical and predictive power of commonly acknowledged leading indicators, table 2
presents estimates of univariate logit models at each horizon. For each variable, the full sample
4
For a complete introduction to ROC curves, see Pepe (2003).
10
of available data is used to estimate model parameters, which are then used to produce predicted
recession probabilities. The predicted probabilities are then compared to NBERdeﬁned recessions
for evaluation. Table 2 displays three summary statistics that measure model ﬁt for each variable
and at each forecast horizon. The ﬁrst statistic contains the pseudoR
2
that provides a measure of
model ﬁt. The second row is the tstatistic that tests the null hypothesis that the model coeﬃcient
from the logit regression is zero. The ﬁnal row is the AUC statistic, a nonparametric summary
statistic that summarizes the model’s ability to discern expansions from recessions. In the interest
of concision, tests of forecast accuracy are not presented for insample results.
[Table 2 about here.]
Variables that describe real economic activity perform best in the very shortterm. For variables
that describe real economic activity, both model ﬁt and classiﬁcation ability decline as the forecast
horizon grows, an observation that is not unsurprising since the NBER recession dates themselves
are based on the contemporaneous behavior of similar variables. Both industrial production and
initial claims, for example, are useful indicators of the state of the economy at short horizons,
each with high pseudoR
2
and AUC statistics approaching 0.90. In both instances, however, the
predictive power of the indicator is limited at longer horizons.
In contrast, many of the ﬁnancial indicators exhibit forwardlooking behavior as their most
precise ﬁt is not contemporaneous. The slope of the yield curve performs best at the horizon of 12
months. Unconditionally, the level and curvature of the yield curve appear to contain only modest
information about the state of the economy. Variables that reﬂect turmoil in ﬁnancial markets
give information that reﬂects the probability of a recession, particularly in the nearterm. Changes
in stock prices also appear to contain modest explanatory power, although only at short forecast
horizons. Neither nominal nor real money growth appear to consistently contain useful information
across forecast horizons, with very low pseudoR
2
values and modest forecast ability. Finally,
although model ﬁt and classiﬁcation ability are distinct features of a model, in the application here
the two align very closely. Models that have a better ﬁt also display superior classiﬁcation ability.
4.1 Model averaging
Table 3 evaluates the forecasts produced by the two weighting schemes described in section 2.2.
The top panel presents results from the equallyweighted forecasts of each of the K models; the
11
bottom panel weights each forecast according to its posterior probability as in equation (4). Fore
cast combination produces forecasts that do not perform worse than their univariate counterparts,
indeed, often forecast performance is improved. Weighing the probabilistic forecasts by model pos
terior probabilities does not improve the recession forecasts over the equallyweighted forecasts.
At forecast horizons of six months or less, the equally weighted forecasts clearly outperform the
weighted forecasts.
[Table 3 about here.]
Figure 1 gives further insight into the performance of the combined forecasts. The posterior
probabilities of the models strongly favor only one or in same cases, two, forecast models. Con
temporaneously, both industrial production and initial claims are included in the forecast model;
forecasts from all other models receive a weight of zero. The slope of the yield curve is strongly
preferred at forecast horizons of six and twelve months; indeed, it is the only forecast model in
cluded in the model averaging procedure. At longer horizons—18 and 24 months—corporate yield
spreads dominate in terms of model ﬁt, and are the only indicators that receive nonzero weight.
[Figure 1 about here.]
4.2 Forecasts from the boosted models
4.2.1 A linear model
An alternative to assigning weights to univariate forecasts is to use a model selection procedure to
specify the forecast model. This section evaluates the empirical performance of one such procedure,
boosting. Equation (5) is estimated using a loss function of negative onehalf times the Bernoulli
log likelihood function. Each indicator listed in Table 1 is included in the model search. The initial
weak learner speciﬁed as a univariate linear function; this is equivalent to the logit models used
in the forecast averaging exercise. At each iteration m, the covariate that minimizes the empirical
loss at that iteration is included in the forecast model. After the M iterations, the ﬁnal model is:
ˆ
F
M
(x) =
M
m=1
ρ
m
ˆ
f
m
(x)
ˆ
f
m
(x) = f
κ
(x) (7)
κ = arg min
k
T
t=1
(u
t
−
ˆ
f
k
(x
t
))
2
12
I set ρ
m
= ρ = 0.1, as is common in the boosting literature. The number of boosting iterations M
is chosen to be the one that minimizes the AIC of the ﬁnal boosted model.
Table 4 presents the insample estimates of equations (5) and (7). As the number of iter
ations grows large, the boosted model is equivalent to a ‘kitchensink’ logit model. Thus as a
method of comparison, the table presents the ratio of the coeﬃcient from the boosted model to
its unrestricted ‘kitchensink’ logit counterpart for each variable included in the model search (i.e.,
β
boost
/β
kitchensink
).
5
The forecasting models produced by the boosting model diﬀer from their BMA
counterpart in that there are many more indicators included in the forecast model at each horizon
on average. However, the coeﬃcients for the variables included in the forecasting model are shrunk
signiﬁcantly towards zero.
[Table 4 about here.]
Figure 2 presents the model for each forecast horizon in a slightly diﬀerent way. The ﬁgure
shows the fraction of iterations that the boosting algorithm selected a particular covariate for
both the linear and nonlinear variants of the model.
6
Although there are many more variables
included in the boosted model, the variables that are selected are qualitatively similar to the
forecasting models produced by Bayesian Model Averaging. At short forecast horizons, measures
of real activity and a ﬁnancial indicators are included in the forecast model. Unsurprisingly, when
forecasting a six and twelve months ahead, the model relies on the slope of the yield curve to
forecast business cycle turning points. Only at some horizons do other elements of the yield curve
enter the model—curvature enters into models forecasting into the mediumterm—and the level
of the curve is informative for forecasts into the very distant future. Corporate yield spreads are
included in the models that forecast into the distant future. Finally, several variables are never
or only marginally included in the ﬁnal forecasting models. The measures of money growth, both
nominal and real, as well as VIX, are essentially never included in the ﬁnal forecast model.
5
Since the boosted model minimizes onehalf the log of the oddsratio, coeﬃcients from the boosted model are
doubled to facilitate comparison.
6
The plot is of ψ
h
k
for each covariate k. h denotes forecast horizon and κ denotes the covariate with minimum
mean squared error at iteration m.
ψ
h
k
=
1
M
M
m=1
I(k = κ); (8)
13
The bottom half of table 4 describes the classiﬁcation ability of each model. The value of
the maximized AIC is shown in the table. In the interest of brevity I do not report the AIC
models from the univariate models reported in table 2 but for each forecast horizon, the boosted
model is strongly preferred. For example, at the horizon of 12 months, the AIC of the bestﬁtting
univariate model–the one using the slope of the yield curve–is 280.3, which is clearly dominated by
the boosted AIC. In terms of classiﬁcation ability, the models produce predicted probabilities that
track NBER recession dates very well, particularly at short horizons. When nowcasting, the linear
model achieves an AUC statistic of 0.97, indicating that model achieves nearperfect classiﬁcation
of economic activity contemporaneously. At longer horizons the AUC decreases—at a forecast
horizon of two years the AUC falls to 0.84. Relative to the AUC values in table 2, the boosted
model has considerably higher classiﬁcation ability than the best univariate model at each horizon.
Although many of the forecasting models are sparse, adding information from disparate variables
improves both model ﬁt and forecast ability.
4.2.2 A nonlinear model
In a standard logistic model, the unobserved latent variable depends on the covariates in a linear
fashion, as in equation (1). However, this functional form is used only because it is convenient, and
there are good reasons to believe that a nonlinear speciﬁcation is appropriate. Financial market
indicators are often erratic and behave in a nonlinear fashion. Hamilton (2005) and Morley &
Piger (2012) argue that movements from one state of the economy to another are associated with
nonlinear movements in real economic variables.
For these reasons, nonlinearity is introduced into the forecast model using the smoothing
splines of Eilers & Marx (1996). Incorporating smoothing splines into the boosting algorithm is
straightforward. The weak learner for each covariate k within the boosting algorithm is speciﬁed
to minimize the penalized sum of squared error:
PSSE(f
k
, λ) =
T
t=1
[y
t
−f
k
(x
t
)]
2
+λ
_
[f
k
(z)]
2
dz (9)
where the smoothing parameter λ determines the magnitude of the penalty for functions with
a large second derivative. Splines have several attractive features: they are estimated globally,
conserve moments of the data and are computationally eﬃcient as they are extensions of generalized
14
linear models.
7
As with the linear case, at each iteration m the covariate that best minimizes the
empirical loss at that iteration is included in the forecast model. Let f
k
m
be the smoothing spline
ﬁt to indicator k at iteration m, then at each iteration the weak learner can be expressed:
ˆ
f
m
(x) = f
κ
(x)
ˆ
f
k
(x) = arg min
f(x)
_
PSSE(f, λ, x
k
)
_
(10)
κ = arg min
k
T
t=1
(u
t
−
ˆ
f
k
(x
t
))
2
As before, the number of boosting iterations M is chosen to be the one that minimizes the AIC of
the ﬁnal boosted model.
The performance of the nonlinear models insample is impressive. Table 5 displays the results
of the nonlinear model ﬁt to the full sample. Comparing information criteria to that from the lin
ear model presented in table 4, it is clear that the data prefer a nonlinear version of the forecasting
model. For each forecast horizon, the AIC strongly prefers a nonlinear speciﬁcation to a linear
boosted model. The nonlinear models also classify the data more accurately than their linear
counterparts. For each forecast horizon, the AUC indicates that the nonlinear speciﬁcation im
proves the classiﬁcation ability of the indicators. The diﬀerence between forecasting models is most
notable at long forecast horizons. The diﬀerence between the AUC of the linear and the nonlinear
models is not statistically diﬀerent when forecasting at h = 0. However, when forecast six months
ahead, the diﬀerence is statistically signiﬁcant at the 90 percent conﬁdence level. When forecasting
at horizons longer than six months the diﬀerence is signiﬁcant at the 95 percent conﬁdence interval.
[Table 5 about here.]
The second panel of ﬁgure 2 presents the fraction of time the algorithm included a particular
covariate at each iteration for each nonlinear forecasting model. It is clear that no covariate
dominates a forecasting model as was the case when performing model averaging. Nonlinearity
allows some covariates to enter into the forecasting model when they were excluded from the linear
7
Eilers & Marx approximate the penalty term in (9) by constraining the diﬀerence in parameter values of the
spline in neighboring regions of the data, transforming the problem into a modiﬁed regression equation that is
computationally very eﬃcient. Buhlmann & Yu (2003) have explored boosting with smoothing spline weak learners
and ﬁnd that setting λ = 4 is a reasonable setting for this parameter, which is the value used here.
15
case. For example, VIX, which was rarely included in the linear model, enters into the nonlinear
model at all forecast horizons, the contemporaneous model most frequently.
[Figure 2 about here.]
4.3 Comparison to NBER recession dates
Figure 3 shows the recession probabilities from each of the methods estimated at a forecast horizon
of zero months. The graphs also contain an optimal threshold that is used to produce a binomial
series. Speciﬁcally, given the assumption of symmetric utility/disutility from true/false positives,
the utility of a classiﬁer is the diﬀerence between the true and false positive rates (Berge & Jord`a
2011). The threshold shown for each forecast is the one that maximizes this diﬀerence.
As was highlighted by the models chosen by the BMA averaging method and the model selection
scheme, the economic indicators vary widely in the information that they carry at diﬀerent forecast
horizons. When forecasting contemporaneous probability of recession, many of the indicators were
found to have only modest utility. Although model averaging alleviates many problems concerning
model misspeciﬁcation, the average still depends critically on the utility of the underlying indica
tors. As seen in panel a) of ﬁgure 3, the result of averaging ineﬀective measures is a forecast that
hovers near the unconditional probability of recession. The BMA forecast (panel b) highlights the
opposite extreme. The model relies heavily on only one indicator, initial claims for unemployment
insurance. The forecasts appear to be an improvement over the unweighted average—the probabil
ity of recession crosses threshold during each of the recessions—but the forecasts also have a high
variance. The bottom half of the ﬁgure shows the insample forecasts from the model selection
method. The two boosted models produce insample forecasts that are quite similar. The proba
bilities from each cross the threshold during NBER recessions, and conditional on the use of the
optimal threshold, there are no false positives in the sample.
[Figure 3 about here.]
The insample probabilities of recession can be combined with the threshold value to produce
a chronology of business cycle turning points for the U.S. economy. Table 6 shows the peaks and
troughs from each of the four models, relative to NBER recession dates. For each model, peaks
and troughs are the ﬁrst and ﬁnal month for which the recession probability is equal to or greater
16
than the threshold. I also impose the natural restriction each phase of the business cycle lasts more
than three months. The recession dates that result from the model averaging scheme align with
the NBER dates less closely than dates from the boosted models. For example, both the weighted
and unweighted model averaging schemes produce a peak date for the most recent recession period
of August 2008, whereas the peak date implied by the boosted models is March 2008. The model
averaging schemes produce two false negative events–unweighted BMA misses the 1990 and 2001
recessions, while the weighted BMA forecasts miss the 1981 and 1991 recessions. The linear boosted
event also produces a false negative event for the 2001 recession. In the case of the BMA forecast
and the linear boosted model forecast, the probability of recession does cross the threshold, but
dips back below the threshold during the recession event. In each case, the rule that a recession
last more than three months ﬁlters these signals out. Finally, each of the models appear to have
an easier time identifying trough dates rather than peak; for example, the trough dates from the
nonlinear boosted model are never more than one month diﬀerent than the NBERdeﬁned dates.
[Table 6 about here.]
5 Outofsample performance
Both the model averaging scheme and the model selection algorithm produce valuable probabilistic
forecasts of recession insample. However, aggressive model search may produce models that overﬁt
the data, which would reduce the outofsample forecast ability of the method. Weighting forecasts
with a statistical measure of ﬁt carries the same risk. This section considers the outofsample
performance of the forecasts produced by the four alternative methods presented above. Forecasts
are produced using an expanding window, and the initial outofsample forecast is made for May
1985. From this point forward, at each point in time, a total of 20 forecast models are estimated:
each of the four diﬀerent model produces forecasts of recession at ﬁve horizons, zero, six, 12, 18
and 24 months. After the forecasts have been produced, an additional data point is added to the
model and the process is repeated.
Table 7 displays the results of the outofsample exercise. Although the forecast performance
is diminished relative to the insample forecasts, it is clear that each of the methods produces
forecasts that are superior to a naive forecast. All four methods clearly outperform a naive forecast
in terms of the GiacominiWhite test statistic, and the Area Under the ROC curve also shows
17
that the models have superior classiﬁcation ability. As was the case for the for the insample
forecasts, the forecasts produced by the boosting algorithms outperform those produced by the
model combination schemes.
[Table 7 about here.]
A risk of either the model combination or model selection scheme is that they are unable to
produce weights or model parameters that can produce functional forecasts outofsample. This
concern does not appear to be born out by either of the methods. The posterior probabilities of
the models do not change in a meaningful way as the models are reestimated with the expanding
window. For example, the slope of the yield curve receives by far the highest weight when producing
BMAweighted forecasts at a forecast horizon of 12 months–insample, the slope of the yield curve
received a weight of 0.99. For each of the models forecasted with the expanding window, the
minimum weight received by the slope of the yield curve was 0.997; essentially the BMA weights
did not change through time.
In contrast, the boosted models display a much higher degree of ﬂexibility. Figure 4 is the outof
sample analogue of ﬁgure 2; it displays the frequency of times for which a given covariate is selected
by the boosting algorithm for the linear model that forecasts the twelve month ahead probability
of recession. In the interest of legibility, the ﬁve most frequently selected covariates are displayed,
with all other covariates grouped together so that for each forecast model the frequencies summed
across covariates is equal to one. The forecast models are quite stable throughout expansion periods,
relying primarily on the slope and level of the yield curve. However, the models display substantial
variation around business cycle turning points. For example, at the most recent recession, the
inﬂuence of the Treasury yield curve goes down substantially, while the inﬂuence of the BAA10Y
yield spread, the value of the dollar and industrial production increase substantially (these are all
within the ”All others” category). That the boosted model outperforms the model combination
schemes reveals that although the boosted models change through time, the method is able to select
those covariates that contain useful information for forecasting purposes.
[Figure 4 about here.]
To get a sense for the performance of the models in realtime, ﬁgure 5 displays the outof
sample contemporaneous (h=0) forecasts for each of the four forecast models between July 2007
18
and January 2010 in six month increments. The NBER declared the most recent recession to
have been December 2007–June 2009. Forecasts need to be considered within the context of their
thresholds, which for the fullsample, are shown in ﬁgure 3.
8
The forecasts from each of the
boosted models easily surpass their thresholds of 0.40 beginning in January 2008. The forecasted
probability of recession dips back below the optimal threshold for one month (in February for
the linear model; March for the nonlinear boosted model), but after that each stays above the
threshold values until July of 2009. The boosted models provide a clear signal that the economy
is in recession; the estimated probability of recession is nearly one in each of the months displayed
in the ﬁgure. The forecasts from the model combination schemes provide a less clear signal. The
unweighted model average does not surpass its threshold value of 0.21 until September of 2008, and
achieves a maximum value of only 0.44 in November, 2008. The BMAweighted forecasts perform
somewhat better—the estimated probability hovers near a value of 0.3 beginning in January of
2008. However, the signal does not cross its threshold value of 0.55 until September 2008. The
BMAweighted contemporaneous forecast stays above this value until May 2009.
[Figure 5 about here.]
6 Conclusion
There is an intense interest in establishing the current and future states of the business cycle.
Yet even a casual consumer of macroeconomic news would observe the spotty record economists
have when forecasting economic downturns. This paper evaluates the information content of many
commonly cited economic indicators, providing evidence that many economic indicators do con
tain information useful for forecasting business cycle turning points. However, diﬀerent indicators
provide valuable information about diﬀerent forecast horizons. This observation complicates the
modeling decision faced by forecasters, especially when one considers the nonlinear behavior of
many diﬀerent economic indicators around business cycle turning points.
The methods of model averaging and model selection highlight the diﬃculties of forecasting
business cycle turning points. Since many commonly followed indicators are valuable only at
particular forecast horizons or have only modest predictive value at any horizon, a simple model
8
The thresholds change little when estimated in a true, outofsample way. For this reason, and in the interest of
concision, I suppress comparison to the outofsample threshold.
19
average dilutes useful information. In contrast, a weighting scheme that gives each model its weight
based on a posterior probability produces more accurate forecasts. However, the forecasts rely on
very few indicators; posterior probabilities of the univariate models ﬁt at diﬀerent horizons strongly
prefer one indicator to all others. Forecasts of the current state of the economy rely on measures
of real economic activity: industrial production and initial claims. In contrast, forecasts into the
mediumterm (six or 12 months) rely on signals originating from the bond market. When forecasting
with a model selection algorithm, models tend to condition on a wide number of economic indicators.
Boosting exploits the biasvariance tradeoﬀ: although estimates of model parameters are biased,
the forecasts are more accurate due to reduced variance. The method is also able to take the
nonlinear behavior of real economic variables around business cycle turning points into account.
Outofsample evidence indicates that the boosted models do not overﬁt.
Overall, the results indicate that there is no suﬃcient summary statistic for identifying or
forecasting business cycle turning points. Indeed, this is the approach taken by the Business Cycle
Dating Committee of the NBER, who consider a widevariety of economic indicators when making
pronouncements regarding the state of the economy. At the same time, the power of the yield curve
as a predictor of future economic activity endures, although combining the information from the
yield curve with information from other leading indicators improves forecast ability.
20
References
Bates, J. & Granger, C. (1969), ‘The combination of forecasts’, Operations Research Quarterly
20, 451–468.
Berge, T. J. & Jord`a, O. (2011), ‘Evaluating the classiﬁcation of economic activity’, American
Economic Journal: Macroeconomics 3.
Brier, G. & Allen, R. (1951), Veriﬁcation of weather forecasts, in ‘Compendium of Meteorology’,
American Meteorology Society, pp. 841–848.
Brock, W. A., Durlauf, S. N. & West, K. D. (2007), ‘Model uncertainty and policy evaluation: Some
theory and empirics’, Journal of Econometrics 136(2), 629–664.
URL: http://ideas.repec.org/a/eee/econom/v136y2007i2p629664.html
Buhlmann, P. & Yu, B. (2003), ‘Boosting with the l2 loss: Regression and classiﬁcation’, Journal of
the American Statistical Association 98, 324–340.
Burns, A. F. & Mitchell, W. C. (1946), Measuring Business Cycles, National Bureau of Economic
Research, New York, NY.
Chauvet, M. & Potter, S. (2005), ‘Forecasting recessions using the yield curve’, Journal of Forecasting
24(2), 77–103.
Chauvet, M. & Senyuz, Z. (2012), A dynamic factor model of the yield curve as a predictor of the
economy, Technical Report 201232, Federal Reserve Board.
Dueker, M. (2005), ‘Dynamic forecasts of qualitative variables: A qual var model of u.s. recessions’,
Journal of Business & Economic Statistics 23, 96–104.
URL: http://ideas.repec.org/a/bes/jnlbes/v23y2005p96104.html
Eilers, P. H. & Marx, B. D. (1996), ‘Flexible smoothing with bsplines and penalties’, Statistical
Science 11(2), 89–121.
Elliott, G. & Lieli, R. P. (2009), Predicting binary outcomes. U.C.S.D. mimeograph.
Estrella, A. (1998), ‘A new measure of ﬁt for equations with dichotomous dependent variables’,
Journal of Business & Economic Statistics 16(2), 198–205.
Estrella, A. & Mishkin, F. S. (1998), ‘Predicting u.s. recessions: Financial variables as leading
indicators’, The Review of Economics and Statistics 80(1), 45–61.
Friedman, J. (2001), ‘Greedy function approximation: A gradient boosting machine’, The Annals of
Statistics 29(5).
Giacomini, R. & White, H. (2006), ‘Tests of conditional predictive ability’, Econometrica
74(6), 1545–1578.
Gurkaynak, R. S., Sack, B. & Wright, J. H. (2007), ‘The u.s. treasury yield curve: 1961 to the
present’, Journal of Monetary Economics 54(8), 2291–2304.
Hamilton, J. D. (2005), ‘What’s real about the business cycle?’, Review pp. 435–452.
Hand, D. J. & Vinciotti, V. (2003), ‘Local versus global models for classiﬁcation problems: Fitting
models where it matters’, The American Statitician 57, 124–131.
21
Hendry, D. F. & Clements, M. P. (2004), ‘Pooling of forecasts’, The Econometrics Journal 7.
Kauppi, H. & Saikkonen, P. (2008), ‘Predicting u.s. recessions with dynamic binary response models’,
The Review of Economics and Statistics 90(4), 777–791.
Leamer, E. E. (1978), Speciﬁcation Searches: Ad Hoc Inference With Nonexperimental Data, John
Wiley & Sons, Inc., New York, NY.
Morley, J. & Piger, J. (2012), ‘The asymmetric business cycle’, The Review of Economics and
Statistics 94(1), 208–221.
Pepe, M. S. (2003), The Statistical Evaluation of Medical Test for Classiﬁcation and Prediction,
Oxford: Oxford University Press.
Raftery, A. E. (1995), ‘Bayesian model selection in social research’, Sociological Methodology 25, 111–
163.
Rudebusch, G. D. & Williams, J. C. (2009), ‘Forecasting recessions: the puzzle of the enduring
power of the yield curve’, Journal of Business & Economic Statistics 27(4).
SalaIMartin, X., Doppelhofer, G. & Miller, R. I. (2004), ‘Determinants of longterm growth: a
bayesian averaging of classical estimates (bace) approach’, The American Economic Review
94(4), 813–835.
Stock, J. H. & Watson, M. (2004), ‘Combination forecasts of output growth in a sevencountry data
set’, Journal of Forecasting (23), 405–430.
Timmermann, A. (2006), Forecast combinations, in G. Elliott, C. Granger & A. Timmermann, eds,
‘Handbook of Economic Forecasting’.
Wright, J. H. (2006), The yield curve and predicting recessions, Technical report, Federal Reserve
Board.
22
Figures and tables
Variable Deﬁnition Transformation
Interest rates and interest rate spreads
Level of yield curve Average of 3mo, 2 and 10year yields –
Slope of yield curve 10yr less 3mo yield –
Curvature of yield curve 2x2yr minus sum of 3mo and 10yr yield –
TED spread 3mo. ED less 3mo. treasury yield –
BAA corporate spread BAA less 10yr. treasury yield –
AAA corporate spread AAA less 10yr. treasury yield –
Other ﬁnancial variables
Change in stock index Dow Jones Industrial Average 3month log diﬀerence
Money growth M2 3month log diﬀerence
Real money growth M2 deﬂated by CPI 3month log diﬀerence
U.S. dollar Tradeweighted dollar 3month log diﬀerence
VIX VIX from CBOE and extended following Bloom –
Macroeconomic indicators
Output Industrial production (s.a.) 3month log diﬀerence
Housing permits – 3month log diﬀerence
Initial claims 4week moving average (s.a.) 3month log diﬀerence
Weekly hours, manufacturing – 3month log diﬀerence
Purchasing managers index – 3month log diﬀerence
Table 1: Variables included in forecasting models.
23
N h=0 h=6 h=12 h=18 h=24
Level PseudoR
2
477 0.03 0.05 0.05 0.04 0.01
tstatistic 3.82 4.97 4.52 4.06 2.35
AUC 0.61 0.65 0.65 0.64 0.59
Slope PseudoR
2
477 0.04 0.23 0.28 0.21 0.10
tstatistic 4.09 8.48 8.62 7.96 6.23
AUC 0.63 0.83 0.88 0.86 0.78
Curve PseudoR
2
477 0.00 0.00 0.01 0.02 0.03
tstatistic 0.76 0.66 1.56 3.07 3.31
AUC 0.49 0.49 0.54 0.61 0.63
TED PseudoR
2
477 0.27 0.19 0.07 0.04 0.01
tstatistic 9.16 8.24 5.43 4.55 2.02
AUC 0.83 0.82 0.76 0.72 0.62
SP AAA10Y PseudoR
2
477 0.01 0.14 0.21 0.24 0.15
tstatistic 2.08 7.05 7.90 8.04 6.92
AUC 0.565 0.77 0.85 0.87 0.81
SP BAA10Y PseudoR
2
477 0.00 0.10 0.18 0.22 0.14
tstatistic 0.69 6.15 7.49 7.67 6.65
AUC 0.50 0.74 0.83 0.86 0.80
DJIA PseudoR
2
477 0.11 0.05 0.00 0.00 0.00
tstatistic 6.57 4.91 1.42 0.52 1.23
AUC .741 0.71 0.57 0.51 0.54
M2 PseudoR
2
477 0.01 0.00 0.01 0.00 0.00
tstatistic 1.87 1.37 1.68 0.27 0.00
AUC 0.57 0.57 0.60 0.52 0.49
RM2 PseudoR
2
477 0.02 0.04 0.02 0.04 0.02
tstatistic 2.61 3.89 3.12 3.78 3.07
AUC 0.62 0.65 0.61 0.64 0.61
TWD PseudoR
2
477 0.01 0.01 0.00 0.00 0.00
tstatistic 2.51 2.06 1.23 1.30 0.70
AUC 0.57 0.55 0.54 0.55 0.54
INDPRO PseudoR
2
477 0.32 0.04 0.00 0.00 0.00
tstatistic 8.58 4.19 0.92 0.05 0.14
AUC 0.88 0.71 0.59 0.48 0.53
PERMIT PseudoR
2
477 0.15 0.12 0.02 0.02 0.00
tstatistic 7.44 6.88 3.24 3.07 1.28
AUC 0.76 0.76 0.67 0.67 0.59
IC4WSA PseudoR
2
477 0.33 0.06 0.01 0.00 0.00
tstatistic 8.74 5.17 2.52 0.73 0.75
AUC 0.88 0.74 0.65 0.55 0.55
AWHMAN PseudoR
2
477 0.11 0.02 0.00 0.00 0.00
tstatistic 6.12 3.07 1.98 0.16 0.18
AUC 0.79 0.65 0.61 0.50 0.53
NAPM PseudoR
2
477 0.10 0.05 0.00 0.00 0.00
tstatistic 6.01 4.32 1.19 1.13 0.29
AUC 0.69 0.68 0.62 0.59 0.50
VIX PseudoR
2
477 0.09 0.01 0.00 0.01 0.03
tstatistic 5.95 2.57 0.68 1.60 3.24
AUC 0.77 0.64 0.47 0.53 0.63
Table 2: Insample statistics for univariate forecasts for each indicator. The ﬁrst row presents the
pseudoR
2
of Estrella & Mishkin (1998), the second row is the zscore of the test of whether the
slope coeﬃcient is diﬀerent from zero, and the ﬁnal row gives the classiﬁcation ability of the logit
model as measured by the AUC.
24
Forecast horizon
h=0 h=6 h=12 h=18 h=24
N 477 470 464 458 452
Equally weighted
GW (abs. error) 0.00 0.00 0.00 0.00 0.00
GW (sq. error) 0.00 0.00 0.00 0.00 0.00
AUC 0.953 0.888 0.858 0.872 0.811
(0.01) (0.02) (0.02) (0.02) (0.02)
BMA weights
GW (abs. error) 0.00 0.00 0.00 0.00 0.00
GW (sq. error) 0.00 0.00 0.01 0.02 0.08
AUC 0.890 0.834 0.887 0.876 0.813
(0.02) (0.03) (0.02) (0.02) (0.02)
Table 3: Forecast accuracy of combined recession forecasts, unweighted and weighted. See text for
details.
25
Forecast horizon
h=0 h=6 h=12 h=18 h=24
Level – – – 7.1 9.4
Slope – 26.7 53.7 50.4 –
Curve – 33.2 58.5 – –
TED 22.1 30.4 – – 13.5
SP BAA10Y 2.3 – – – –
SP AAA10Y – – – 12.6 7.9
DJIA 31.1 33.4 51.4 – 33.9
M2 – – – – –
RM2 – 0.7 0.7 – –
TWD – 4.8 13.3 35.3 68.9
INDPRO 28.1 25.2 – – 14.2
PERMIT 15.4 26.5 – – –
IC4WSA 13.1 9.9 12.7 – 15.5
AWHMAN – – 6.1 18.6 7.3
NAPM – 6.5 17.7 – 41.8
VIX 31.8 – – – –
const. 5.1 9.9 41.4 – 29.7
N 470 464 458 452 446
AIC 176.9 272.6 272.5 260.0 270.4
GW (abs. error) 0.00 0.00 0.00 0.00 0.00
GW (sq. error) 0.00 0.00 0.02 0.00 0.01
AUC 0.973 0.926 0.903 0.898 0.844
(0.01) (0.01) (0.02) (0.02) (0.03)
Table 4: Insample results for linear boosted model. For each regressor, the table shows the
ratio of the coeﬃcient of the boosted model to the coeﬃcient from an unrestricted kitchensink
logit regression, expressed as a percentage. “–” indicates that the coeﬃcient was not selected by
the boosted model. GW test statistics are pvalues from a onesided test that the model forecast
outperforms the null. Pvalues are robust to heteroskedasticity and autocorrelation. Standard
errors of AUC statistic in parentheses. See text for details. Standard errors of AUC statistic in
parentheses. See text for details.
26
Forecast horizon
h=0 h=6 h=12 h=18 h=24
N 470 464 458 452 446
AIC 139.1 193.5 189.2 192.0 184.5
GW (abs. error) 0.00 0.00 0.00 0.00 0.00
GW (sq. error) 0.00 0.00 0.00 0.00 0.00
AUC 0.985 0.970 0.957 0.948 0.920
(0.00) (0.01) (0.01) (0.01) (0.02)
Table 5: Insample performance of boosted model ﬁt with smoothing splines as weak learners.
GW test statistics are pvalues from a onesided test that the model forecast outperforms the null.
Pvalues are robust to heteroskedasticity and autocorrelation. Standard errors of AUC statistic in
parentheses. See text for details. Standard errors of AUC statistic in parentheses. See text for
details.
27
Peak dates
NBER BMA (unweighted) BMA (weighted) Linear boost Nonlinear boost
1973:11 8 10 6 2
1980:1 2 4 4 1
1981:7 0 – 0 1
1990:7 – – 4 3
2001:3 – 1 – 0
2007:12 9 9 3 3
Trough dates
NBER BMA (unweighted) BMA (weighted) Linear boost Nonlinear boost
1975:3 2 0 2 1
1980:7 0 1 1 0
1982:11 9 – 0 0
1991:3 – – 1 0
2001:11 – 6 – 1
2009:6 1 1 0 0
Table 6: Business cycle turning points, 19732012. Value shown for modelimplied recession dates
is the peak/trough calculated using the optimal threshold and a rule that each phase of the business
cycle last more than three months, relative to the NBERdeﬁned peak/trough date. ‘–’ indicates
that the model did not generate a business cycle phase lasting 3 months that corresponds to the
NBER recession. See text for details.
28
Forecast horizon
h=0 h=6 h=12 h=18 h=24
N 332 332 332 332 332
Equally weighted
GW (abs. error) 0.00 0.00 0.00 0.00 0.00
GW (sq. error) 0.01 0.01 0.00 0.00 0.04
AUC 0.810 0.721 0.756 0.797 0.705
(0.04) (0.04) (0.03) (0.03) (0.04)
BMA weighted
GW (abs. error) 0.00 0.00 0.00 0.00 0.00
GW (sq. error) 0.03 0.14 0.00 0.00 0.00
AUC 0.803 0.741 0.887 0.923 0.842
(0.05) (0.04) (0.03) (0.02) (0.03)
Linear boost
GW (abs. error) 0.000 0.000 0.000 0.000 0.000
GW (sq. error) 0.002 0.041 0.049 0.100 0.125
AUC 0.928 0.890 0.904 0.870 0.858
(0.03) (0.03) (0.02) (0.03) (0.03)
Nonlinear boost
GW (abs. error) 0.000 0.000 0.000 0.000 0.000
GW (sq. error) 0.001 0.007 0.003 0.005 0.015
AUC 0.947 0.902 0.949 0.915 0.920
(0.02) (0.03) (0.01) (0.03) (0.03)
Table 7: Outofsample forecast performance. GW test statistics are pvalues from a onesided
test that the model forecast outperforms the null. Pvalues are robust to heteroskedasticity and
autocorrelation. Standard errors of AUC statistic in parentheses. See text for details.
29
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
L
e
v
e
l
S
l
o
p
e
C
u
r
v
e
T
E
D
A
A
A

1
0
Y
B
A
A

1
0
Y
D
J
I
A
M
2
R
M
2
T
W
D
I
N
D
P
R
O
P
E
R
M
I
T
I
C
4
W
S
A
A
W
H
M
A
N
N
A
P
M
V
I
X
h=1
h=6
h=12
h=18
h=24
Figure 1: Posterior probabilities, insample BMA exercise
30
0.0%
5.0%
10.0%
15.0%
20.0%
25.0%
30.0%
35.0%
40.0%
C
o
n
s
t
L
e
v
e
l
S
lo
p
e
C
u
r
v
e
T
E
D
B
A
A
1
0
Y
A
A
A
1
0
Y
D
JIA
M
2
R
M
2
T
W
D
IP
P
e
r
m
it
IC
4
W
M
A
A
W
H
M
A
N
N
A
P
M
V
IX
h=0
h=6
h=12
h=18
h=24
0.0%
5.0%
10.0%
15.0%
20.0%
25.0%
30.0%
35.0%
40.0%
C
o
n
s
t
L
e
v
e
l
S
lo
p
e
C
u
r
v
e
T
E
D
B
A
A
1
0
Y
A
A
A
1
0
Y
D
JIA
M
2
R
M
2
T
W
D
IP
P
e
r
m
it
IC
4
W
M
A
A
W
H
M
A
N
N
A
P
M
V
IX
h=0
h=6
h=12
h=18
h=24
Figure 2: Insample model selection frequency, linear and nonlinear models. The bar graph presents
the fraction of iterations for which a particular covariate was selected by the model selection pro
cedure for each model, linear and nonlinear and for each forecast horizon. See text for details.
31
0
.
2
.
4
.
6
.
8
1
1970m1 1980m1 1990m1 2000m1 2010m1
date
Optimal threshold for unweighted average: 0.21
Unweighted average insample forecast
0
.
2
.
4
.
6
.
8
1
1970m1 1980m1 1990m1 2000m1 2010m1
date
Optimal threshold for BMA: 0.58
BMA weighted insample forecast
0
.
2
.
4
.
6
.
8
1
1970m1 1980m1 1990m1 2000m1 2010m1
date
Optimal threshold for unweighted average: 0.41
Linear boosted model
0
.
2
.
4
.
6
.
8
1
1970m1 1980m1 1990m1 2000m1 2010m1
date
Optimal threshold for BMA: 0.41
Nonlinear boosted model
Figure 3: Insample recession probabilities forecast at horizon of zero months. The top panels
contain forecasts from model combination schemes, both unweighted (left) and Bayesian model
averaged (right). The two bottom panels contain recession probabilities from linear (left) and non
linear (right) boosted forecast models. Grey shading indicates NBERdeﬁned recession dates and
dashed line denotes optimal threshold. See text for details.
32
0
0.25
0.5
0.75
1
M
a
y

8
4
M
a
y

8
5
M
a
y

8
6
M
a
y

8
7
M
a
y

8
8
M
a
y

8
9
M
a
y

9
0
M
a
y

9
1
M
a
y

9
2
M
a
y

9
3
M
a
y

9
4
M
a
y

9
5
M
a
y

9
6
M
a
y

9
7
M
a
y

9
8
M
a
y

9
9
M
a
y

0
0
M
a
y

0
1
M
a
y

0
2
M
a
y

0
3
M
a
y

0
4
M
a
y

0
5
M
a
y

0
6
M
a
y

0
7
M
a
y

0
8
M
a
y

0
9
M
a
y

1
0
M
a
y

1
1
M
a
y

1
2
AllOthers
PERMIT
Curve
M2
Level
Slope
Figure 4: Out of sample selection frequency of covariates for linear boosted model. The ﬁgure
shows the fraction of time a covariate was selected for the model forecasting 12 months ahead at
time t, as forecast model is run through the sample. See text for details.
33
0
.
2
5
.
5
.
7
5
1
P
r
o
b
a
b
i
l
i
t
y
o
f
r
e
c
e
s
s
i
o
n
July 2007 Jan 2008 July 2008 Jan 2009 July 2009 Jan 2010
Equal weights BMA weights
Linear boosted Nonlinear boosted
Figure 5: Contemporaneous outofsample recession probabilities at forecast horizon of h=0. See
text for details.
34
Appendix
Data sources
Table 8: Data sources
Variable Available dates Source
Interest rates
Eﬀective Fed Funds Rate 1954m72012m12 FRB H.15 release
3month Eurodollar deposit rate 1971m12012m12 FRB H.15 release
3month Treasury yield (secondary market) 1934m12012m12 Gurkaynak, Sack and Wright (2006)
1year Treasury yield (constant maturity) 1953m42012m12 Gurkaynak, Sack and Wright (2006)
10year Treasury yield (constant maturity) 1953m42012m12 Gurkaynak, Sack and Wright (2006)
Moody’s BAA corporate bond yield 1947m12012m12 FRB H.15 release
Other ﬁnancial variables
Dow Jones Industrial Average 1947m12012m12 Dow Jones & Company
M2 (seasonally adj.) 1959m12012m12 FRB H.6 release
Tradeweighted US Dollar: major currencies 1973m12012m12 FRB H.10 release
Macroeconomic indicators
CPI index (seasonally adj.) 1947m12012m12 U.S. Dept of Labor: BLS
Industrial production (seasonally adj.) 1947m12012m12 FRB G.17 release
ISM manufacturing PMI index 1948m12012m12 Institute for Supply Management
Housing permits 1960m12012m12 U.S. Census Bureau
Average weekly hours: manufacturing 1947m12012m12 U.S. Dept of Labor: BLS
Initial claims for unemployment insurance 1967m12012m12 U.S. Dept of Labor: BLS
35
This action might not be possible to undo. Are you sure you want to continue?