Econometric Reviews, 26(2–4):113–172, 2007
Copyright © Taylor & Francis Group, LLC
ISSN: 07474938 print/15324168 online
DOI: 10.1080/07474930701220071
BAYESIAN ANALYSIS OF DSGE MODELS
Sungbae An School of Economics and Social Sciences, Singapore Management
University, Singapore
Frank Schorfheide Department of Economics, University of Pennsylvania,
Philadelphia, Pennsylvania, USA
This paper reviews Bayesian methods that have been developed in recent years to estimate
and evaluate dynamic stochastic general equilibrium (DSGE) models. We consider the estimation
of linearized DSGE models, the evaluation of models based on Bayesian model checking,
posterior odds comparisons, and comparisons to vector autoregressions, as well as the nonlinear
estimation based on a secondorder accurate model solution. These methods are applied to data
generated from correctly speciﬁed and misspeciﬁed linearized DSGE models and a DSGE model
that was solved with a secondorder perturbation method.
Keywords Bayesian analysis; DSGE models; Model evaluation; Vector autoregressions.
JEL Classiﬁcation C11; C32; C51; C52.
1. INTRODUCTION
Dynamic stochastic general equilibrium (DSGE) models are micro
founded optimizationbased models that have become very popular in
macroeconomics over the past 25 years. They are taught in virtually
every Ph.D. program and represent a signiﬁcant share of publications in
macroeconomics. For a long time the quantitative evaluation of DSGE
models was conducted without formal statistical methods. While DSGE
models provide a complete multivariate stochastic process representation
for the data, simple models impose very strong restrictions on actual time
series and are in many cases rejected against less restrictive speciﬁcations
such as vector autoregressions (VAR). Apparent model misspeciﬁcations
were used as an argument in favor of informal calibration approaches
along the lines of Kydland and Prescott (1982).
Received August 20, 2005; Accepted July 15, 2006
Address correspondence to Frank Schorfheide, Department of Economics, University of
Pennsylvania, 3718 Locust Walk, Philadelphia, PA 19104, USA; Email: schorf@ssc.upenn.edu
114 S. An and F. Schorfheide
Subsequently, many authors have developed econometric frameworks
that formalize aspects of the calibration approach by taking model
misspeciﬁcation explicitly into account. Examples are Smith (1993),
Watson (1993), Canova (1994), DeJong et al. (1996), Diebold et al.
(1998), Geweke (1999b), Schorfheide (2000), Dridi et al. (2007), and
Bierens (2007). At the same time, macroeconomists have improved the
structural models and relaxed many of the misspeciﬁed restrictions of
the ﬁrst generation of DSGE models. As a consequence, more traditional
econometric techniques have become applicable. The most recent vintage
of DSGE models is not just attractive from a theoretical perspective but
is also emerging as a useful tool for forecasting and quantitative policy
analysis in macroeconomics. Moreover, owing to improved time series ﬁt
these models are gaining credibility in policymaking institutions such as
central banks.
This paper reviews Bayesian estimation and evaluation techniques
that have been developed in recent years for empirical work with DSGE
models.
1
We focus on methods that are built around a likelihood function
derived from the DSGE model. The econometric analysis has to cope
with several challenges, including potential model misspeciﬁcation and
identiﬁcation problems. We will illustrate how a Bayesian framework can
address these challenges. Most of the techniques described in this article
have been developed and applied in other papers. Our contribution is
to give a uniﬁed perspective, by applying these methods successively to
artiﬁcial data generated from a DSGE model and a VAR. We provide some
evidence on the performance of Markov Chain Monte Carlo (MCMC)
methods that have been applied to the Bayesian estimation of DSGE
models. Moreover, we present new results on the use of ﬁrstorder accurate
versus secondorder accurate solutions in the estimation of DSGE models.
The paper is structured as follows. Section 2 outlines two versions of
a ﬁveequation New Keynesian DSGE model along the lines of Woodford
(2003) that differ with respect to the monetary policy rule. This model
serves as the foundation for the current generation of largescale models
that are used for the analysis of monetary policy in academic and central
bank circles. Section 3 discusses some preliminaries, in particular the
challenges that have to be confronted by the econometric framework.
We proceed by generating several data sets that are used subsequently
for model estimation and evaluation. We simulate samples from the ﬁrst
order accurate solution of the DSGE model presented in Section 2,
from a modiﬁed version of the DSGE model in order to introduce
1
There is also an extensive literature on classical estimation and evaluation of DSGE models,
but a detailed survey of these methods is beyond the scope of this article. The interested reader is
referred to Kim and Pagan (1995) and the book by Canova (2007), which discusses both classical
and Bayesian methods for the analysis of DSGE models.
Bayesian Analysis of DSGE Models 115
misspeciﬁcation, and from the secondorder accurate solution of the
benchmark DSGE model.
Owing to the computational burden associated with the likelihood
evaluation for nonlinear solutions of the DSGE model, most of the
empirical literature has estimated linearized DSGE models. Section 4
describes a RandomWalk Metropolis (RWM) algorithm and an Importance
Sampler (IS) algorithm that can be used to calculate the posterior
moments of DSGE model parameters and transformations thereof. We
compare the performance of the algorithms and provide an example
in which they explore the posterior distribution only locally in the
neighborhood of modes that are separated from each other by a deep
valley in the surface of the posterior density.
Model evaluation is an important part of the empirical work with
DSGE models. We consider three techniques in Section 5: posterior
predictive model checking, model comparisons based on posterior odds,
and comparisons of DSGE models to VARs. We illustrate these techniques
by ﬁtting correctly speciﬁed and misspeciﬁed DSGE models to artiﬁcial
data. In Section 6 we construct posterior distributions for DSGE model
parameters based on a secondorder accurate solution and compare them
to posteriors obtained from a linearized DSGE model. Finally, Section 7
concludes and provides an outlook on future work.
2. A PROTOTYPICAL DSGE MODEL
Our model economy consists of a ﬁnal goods producing ﬁrm, a
continuum of intermediate goods producing ﬁrms, a representative
household, and a monetary as well as a ﬁscal authority. This model has
become a benchmark speciﬁcation for the analysis of monetary policy
and is analyzed in detail, for instance, in Woodford (2003). To keep the
model speciﬁcation simple, we abstract from wage rigidities and capital
accumulation. More elaborate versions can be found in Smets and Wouters
(2003) and Christiano et al. (2005).
2.1. The Agents and Their Decision Problems
The perfectly competitive, representative, ﬁnal goods producing ﬁrm
combines a continuum of intermediate goods indexed by j ∈ [0, 1] using
the technology
Y
t
=
__
1
0
Y
t
(j )
1−v
dj
_
1
1−v
. (1)
Here 1,v > 1 represents the elasticity of demand for each intermediate
good. The ﬁrm takes input prices P
t
(j ) and output prices P
t
as given.
116 S. An and F. Schorfheide
Proﬁt maximization implies that the demand for intermediate goods is
Y
t
(j ) =
_
P
t
(j )
P
t
_
−1,v
Y
t
. (2)
The relationship between intermediate goods prices and the price of the
ﬁnal good is
P
t
=
__
1
0
P
t
(j )
v−1
v
dj
_
v
v−1
. (3)
Intermediate good j is produced by a monopolist who has access to the
linear production technology
Y
t
(j ) = A
t
N
t
(j ), (4)
where A
t
is an exogenous productivity process that is common to all
ﬁrms and N
t
(j ) is the labor input of ﬁrm j . Labor is hired in a perfectly
competitive factor market at the real wage W
t
. Firms face nominal rigidities
in terms of quadratic price adjustment costs
AC
t
(j ) =
[
2
_
P
t
(j )
P
t −1
(j )
−¬
_
2
Y
t
(j ), (5)
where [ governs the price stickiness in the economy and ¬ is the steady
state inﬂation rate associated with the ﬁnal good. Firm j chooses its labor
input N
t
(j ) and the price P
t
(j ) to maximize the present value of future
proﬁts
t
_
∞
s=0
[
s
Q
t +st
_
P
t +s
(j )
P
t +s
Y
t +s
(j ) −W
t +s
N
t +s
(j ) −AC
t +s
(j )
_
_
. (6)
Here Q
t +st
is the time t value of a unit of the consumption good in period
t +s to the household, which is treated as exogenous by the ﬁrm.
The representative household derives utility from real money balances
M
t
,P
t
and consumption C
t
relative to a habit stock. We assume that the
habit stock is given by the level of technology A
t
. This assumption ensures
that the economy evolves along a balanced growth path even if the utility
function is additively separable in consumption, real money balances,
and leisure. The household derives disutility from hours worked H
t
and
maximizes
t
_
∞
s=0
[
s
_
(C
t +s
,A
t +s
)
1−t
−1
1 −t
+,
M
ln
_
M
t +s
P
t +s
_
−,
H
H
t +s
_
_
, (7)
Bayesian Analysis of DSGE Models 117
where [ is the discount factor, 1,t is the intertemporal elasticity of
substitution, and ,
M
and ,
H
are scale factors that determine steadystate
real money balances and hours worked. We will set ,
H
= 1. The household
supplies perfectly elastic labor services to the ﬁrms taking the real wage
W
t
as given. The household has access to a domestic bond market where
nominal government bonds B
t
are traded that pay (gross) interest R
t
.
Furthermore, it receives aggregate residual real proﬁts D
t
from the ﬁrms
and has to pay lumpsum taxes T
t
. Thus the household’s budget constraint
is of the form
P
t
C
t
+B
t
+M
t
−M
t −1
+T
t
= P
t
W
t
H
t
+R
t −1
B
t −1
+P
t
D
t
+P
t
SC
t
, (8)
where SC
t
is the net cash inﬂow from trading a full set of statecontingent
securities. The usual transversality condition on asset accumulation applies,
which rules out Ponzi schemes.
Monetary policy is described by an interest rate feedback rule of the
form
R
t
= R
∗ 1−j
R
t
R
j
R
t −1
e
e
R,t
, (9)
where e
R,t
is a monetary policy shock and R
∗
t
is the (nominal) target rate.
We consider two speciﬁcations for R
∗
t
, one in which the central bank reacts
to inﬂation and deviations of output from potential output:
R
∗
t
= r ¬
∗
_
¬
t
¬
∗
_
ç
1
_
Y
t
Y
∗
t
_
ç
2
(output gap rule speciﬁcation) (10)
and a second speciﬁcation in which the central bank responds to deviations
of output growth from its equilibrium steadystate ¸:
R
∗
t
= r ¬
∗
_
¬
t
¬
∗
_
ç
1
_
Y
t
¸Y
t −1
_
ç
2
(output growth rule speciﬁcation). (11)
Here r is the steadystate real interest rate, ¬
t
is the gross inﬂation
rate deﬁned as ¬
t
= P
t
,P
t −1
, and ¬
∗
is the target inﬂation rate, which in
equilibrium coincides with the steadystate inﬂation rate. Y
∗
t
in (10) is the
level of output that would prevail in the absence of nominal rigidities.
The ﬁscal authority consumes a fraction ¸
t
of aggregate output Y
t
,
where ¸
t
∈ [0, 1] follows an exogenous process. The government levies a
lumpsum tax (subsidy) to ﬁnance any shortfalls in government revenues
(or to rebate any surplus). The government’s budget constraint is given by
P
t
G
t
+R
t −1
B
t −1
= T
t
+B
t
+M
t
−M
t −1
, (12)
where G
t
= ¸
t
Y
t
.
118 S. An and F. Schorfheide
2.2. Exogenous Processes
The model economy is perturbed by three exogenous processes.
Aggregate productivity evolves according to
lnA
t
= ln¸ +lnA
t −1
+lnz
t
, where lnz
t
= j
z
lnz
t −1
+e
z,t
. (13)
Thus on average technology grows at the rate ¸, and z
t
captures exogenous
ﬂuctuations of the technology growth rate. Deﬁne g
t
= 1,(1 −¸
t
). We
assume that
lng
t
= (1 −j
g
)lng +j
g
lng
t −1
+e
g ,t
. (14)
Finally, the monetary policy shock e
R,t
is assumed to be serially
uncorrelated. The three innovations are independent of each other at all
leads and lags and are normally distributed with means zero and standard
deviations o
z
, o
g
, and o
R
, respectively.
2.3. Equilibrium Relationships
We consider the symmetric equilibrium in which all intermediate goods
producing ﬁrms make identical choices so that the j subscript can be
omitted. The market clearing conditions are given by
Y
t
= C
t
+G
t
+AC
t
and H
t
= N
t
. (15)
Since the households have access to a full set of statecontingent claims,
Q
t +st
in (6) is
Q
t +st
= (C
t +s
,C
t
)
−t
(A
t
,A
t +s
)
1−t
. (16)
It can be shown that output, consumption, interest rates, and inﬂation have
to satisfy the following optimality conditions
1 = [
t
__
C
t +1
,A
t +1
C
t
,A
t
_
−t
A
t
A
t +1
R
t
¬
t +1
_
(17)
1 =
1
v
_
1 −
_
C
t
A
t
_
t
_
+[(¬
t
−¬)
__
1 −
1
2v
_
¬
t
+
¬
2v
_
−[[
t
__
C
t +1
,A
t +1
C
t
,A
t
_
−t
Y
t +1
,A
t +1
Y
t
,A
t
(¬
t +1
−¬)¬
t +1
_
. (18)
In the absence of nominal rigidities ([ = 0) aggregate output is given by
Y
∗
t
= (1 −v)
1,t
A
t
g
t
, (19)
Bayesian Analysis of DSGE Models 119
which is the target level of output that appears in the output gap rule
speciﬁcation.
Since the nonstationary technology process A
t
induces a stochastic
trend in output and consumption, it is convenient to express the model
in terms of detrended variables c
t
= C
t
,A
t
and y
t
= Y
t
,A
t
. The model
economy has a unique steadystate in terms of the detrended variables
that is attained if the innovations e
R,t
, e
g ,t
, and e
z,t
are zero at all times.
The steadystate inﬂation ¬ equals the target rate ¬
∗
and
r =
¸
[
, R = r ¬
∗
, c = (1 −v)
1,t
, and y = g (1 −v)
1,t
. (20)
Let ˆ x
t
= ln(x
t
,x) denote the percentage deviation of a variable x
t
from its
steadystate x. Then the model can be expressed as
1 = [
t
e
−tˆ c
t +1
+tˆ c
t
+
¨
R
t
−ˆ z
t +1
−ˆ ¬
t +1
] (21)
1 −v
v[¬
2
_
e
tˆ c
t
−1
_
=
_
e
ˆ ¬
t
−1
_
__
1 −
1
2v
_
e
ˆ ¬
t
+
1
2v
_
−[
t
__
e
ˆ ¬
t +1
−1
_
e
−tˆ c
t +1
+tˆ c
t
+ˆ y
t +1
−ˆ y
t
+ˆ ¬
t +1
_
(22)
e
ˆ c
t
−ˆ y
t
= e
−ˆ g
t
−
[¬
2
g
2
_
e
ˆ ¬
t
−1
_
2
(23)
¨
R
t
= j
R
¨
R
t −1
+(1 −j
R
)ç
1
ˆ ¬
t
+(1 −j
R
)ç
2
(ˆ y
t
− ˆ g
t
) +e
R,t
(24)
ˆ g
t
= j
g
ˆ g
t −1
+e
g ,t
(25)
ˆ z
t
= j
z
ˆ z
t −1
+e
z,t
. (26)
For the output growth rule speciﬁcation, Equation (24) is replaced by
¨
R
t
= j
R
¨
R
t −1
+(1 −j
R
)ç
1
ˆ ¬
t
+(1 −j
R
)ç
2
_
Aˆ y
t
+ ˆ z
t
_
+e
R,t
. (27)
2.4. Model Solutions
Equations (21) to (26) form a nonlinear rational expectations system
in the variables ˆ y
t
, ˆ c
t
, ˆ ¬
t
,
¨
R
t
, ˆ g
t
, and ˆ z
t
that is driven by the vector of
innovations e
t
= [e
R,t
, e
g ,t
, e
z,t
]
. This rational expectations system has to be
solved before the DSGE model can be estimated. Deﬁne
2
s
t
= [ˆ y
t
, ˆ c
t
, ˆ ¬
t
,
¨
R
t
, e
R,t
, ˆ g
t
, ˆ z
t
]
.
2
Under the output growth rule speciﬁcation for R
∗
t
the vector s
t
also contains ˆ y
t −1
.
120 S. An and F. Schorfheide
The solution of the rational expectations system takes the form
s
t
= 4(s
t −1
, e
t
; 0). (28)
From an econometric perspective, s
t
can be viewed as a (partially latent)
state vector in a nonlinear state space model and (28) is the state transition
equation.
A variety of numerical techniques are available to solve rational
expectations systems. In the context of likelihoodbased DSGE model
estimation, linear approximation methods are very popular because
they lead to a state–space representation of the DSGE model that can
be analyzed with the Kalman ﬁlter. Linearization and straightforward
manipulation of Equations (21) to (23) yields
ˆ y
t
=
t
[ˆ y
t +1
] + ˆ g
t
−
t
[ ˆ g
t +1
] −
1
t
_
¨
R
t
−
¨
t
[¬
t +1
] −
t
[ˆ z
t +1
]
_
(29)
ˆ ¬
t
= [
t
[ ˆ ¬
t +1
] +«(ˆ y
t
− ˆ g
t
) (30)
ˆ c
t
= ˆ y
t
− ˆ g
t
, (31)
where
« = t
1 −v
v¬
2
[
. (32)
Equations (29) to (31) combined with (24) to (26) and the trivial
identity e
R,t
= e
R,t
form a linear rational expectations system in s
t
for
which several solution algorithms are available, for instance, Blanchard
and Kahn (1980), Binder and Pesaran (1997), King and Watson (1998),
Uhlig (1999), Anderson (2000), Kim (2000), Christiano (2002), and Sims
(2002). Depending on the parameterization of the DSGE model there are
three possibilities: no stable rational expectations solution exists, the stable
solution is unique (determinacy), or there are multiple stable solutions
(indeterminacy). We will focus on the case of determinacy and restrict
the parameter space accordingly. The resulting law of motion for the j th
element of s
t
takes the form
s
j ,t
=
J
i=1
4
(s)
j ,i
s
i,t −1
+
n
l =1
4
(e)
j ,l
e
l ,t
, j = 1, . . . , J . (33)
Here J denotes the number of elements of the vector s
t
and n is the
number of shocks stacked in the vector e
t
. Here the coefﬁcients 4
(s)
j ,i
and
4
(e)
j ,l
are functions of the structural parameters of the DSGE model.
While in many applications ﬁrstorder approximations are sufﬁcient,
the higherorder reﬁnement is an active area of research. For instance,
Bayesian Analysis of DSGE Models 121
if the goal of the analysis is to compare welfare across policies or market
structures that do not have ﬁrstorder effects on the model’s steadystate
or to study asset pricing implications of DSGE models, a more accurate
solution may be necessary; see, for instance, Kim and Kim (2003) or
Woodford (2003). A simple example can illustrate this point. Consider a
oneperiod model in which utility is a smooth function of consumption and
is approximated as
U(ˆ c) = U(0) +U
(1)
(0)ˆ c +
1
2
U
(2)
(0)ˆ c
2
+o(ˆ c
2
), (34)
where ˆ c is the percentage deviation of consumption from its steadystate
and the superscript i denotes the ith derivative. Suppose that consumption
is a smooth function of a zero mean random shock e which is scaled by o.
The secondorder approximation of ˆ c around the steadystate (o = 0) is
ˆ c = C
(1)
(c)oe +
1
2
C
(2)
(c)o
2
e
2
+o
p
(o
2
). (35)
If ﬁrstorder approximations are used for both U and C, then the expected
utility is simply approximated by the steadystate utility U(0), which, for
instance, in the DSGE model described above, is not affected by the
coefﬁcients ç
1
and ç
2
of the monetary policy rule (24). A secondorder
approximation of both U and C leads to
[U(ˆ c)] = U(0) +
1
2
U
(2)
(0)C
(1)
2
(c)o
2
+
1
2
U
(1)
(0)C
(2)
(c)o
2
+o(o
2
). (36)
Using a secondorder expansion of the utility function together with a ﬁrst
order expansion of consumption ignores the second term in (36) and is
only appropriate if either U
(1)
(0) or C
(2)
(c) is zero. The ﬁrst case arises
if the marginal utility of consumption in the steadystate is zero, and the
second case arises if the percentage deviation of consumption from the
steadystate is a linear function of the shock.
A secondorder accurate solution to the DSGE model can be obtained
from a secondorder expansion of the equilibrium conditions (21) to (26).
Algorithms to construct such solutions have been developed by Judd
(1998), Collard and Juillard (2001), Jin and Judd (2002), Schmitt
Grohé and Uribe (2006), Kim et al. (2005), and Swanson et al. (2005).
The resulting state transition equation can be expressed as
s
j ,t
= 4
(0)
j
+
J
i=1
4
(s)
j ,i
s
i,t −1
+
n
l =1
4
(e)
j ,l
e
l ,t
+
J
i=1
J
l =1
4
(ss)
j ,il
s
i,t −1
s
l ,t −1
+
J
i=1
n
l =1
4
(se)
j ,il
s
i,t −1
e
l ,t
+
n
i=1
n
l =1
4
(ee)
j ,il
e
i,t
e
l ,t
. (37)
122 S. An and F. Schorfheide
As before, the coefﬁcients are functions of the parameters of the DSGE
model. For the subsequent analysis we use Sims’ (2002) procedure to
compute a ﬁrstorder accurate solution of the DSGE model and Schmitt
Grohé and Uribe’s (2006) algorithm to obtain a secondorder accurate
solution.
While perturbation methods approximate policy functions only locally,
there are several global approximation schemes, including projection
methods such as the ﬁniteelements method and Chebyshev–polynomial
method on the spectral domain. Judd (1998) covers various solution
methods, and Taylor and Uhlig (1990), Den Haan and Marcet (1994), and
Aruoba et al. (2004) compare the accuracy of alternative solution methods.
2.5. Measurement Equations
The model is completed by deﬁning a set of measurement equations
that relate the elements of s
t
to a set of observables. We assume that
the time period t in the model corresponds to one quarter and that the
following observations are available for estimation: quartertoquarter per
capita GDP growth rates (YGR), annualized quartertoquarter inﬂation
rates (INFL), and annualized nominal interest rates (INT). The three
series are measured in percentages, and their relationship to the model
variables is given by the set of equations
YGR
t
= ¸
(Q)
+100(ˆ y
t
− ˆ y
t −1
+ ˆ z
t
)
INFL
t
= ¬
(A)
+400ˆ ¬
t
(38)
INT
t
= ¬
(A)
+r
(A)
+4¸
(Q)
+400
¨
R
t
.
The parameters ¸
(Q)
, ¬
(A)
, and r
(A)
are related to the steady states of the
model economy as
¸ = 1 +
¸
(Q)
100
, [ =
1
1 +r
(A)
,400
, ¬ = 1 +
¬
(A)
400
.
The structural parameters are collected in the vector 0. Since in the
ﬁrstorder approximation the parameters v and [ are not separately
identiﬁable, we express the model in terms of «, deﬁned in (32). Let
0 = [t, «, ç
1
, ç
2
, j
R
, j
g
, j
z
, r
(A)
, ¬
(A)
, ¸
(Q)
, o
R
, o
g
, o
z
]
.
For the quadratic approximation the composite parameter « will be
replaced by either ([, v) or alternatively by («, v). Moreover, 0 will be
augmented with the steadystate ratio c,y, which is equal to 1,g .
Bayesian Analysis of DSGE Models 123
3. PRELIMINARIES
Numerous formal and informal econometric procedures have been
proposed to parameterize and evaluate DSGE models, ranging from
calibration, e.g., Kydland and Prescott (1982), over generalized method of
moments (GMM) estimation of equilibrium relationships, e.g., Christiano
and Eichenbaum (1992), minimum distance estimation based on the
discrepancy among VAR and DSGE model impulse response functions,
e.g., Rotemberg and Woodford (1997) and Christiano et al. (2005), to
fullinformation likelihoodbased estimation as in Altug (1989), McGrattan
(1994), Leeper and Sims (1994), and Kim (2000). Much of the
methodological debate surrounding the various estimation (and model
evaluation) techniques is summarized in papers by Kydland and Prescott
(1996), Hansen and Heckman (1996), and Sims (1996).
We focus on Bayesian estimation of DSGE models, which has three
main characteristics. First, unlike GMM estimation based on equilibrium
relationships such as the consumption Euler equation (21), the price
setting equation of the intermediate goods producing ﬁrms (22), or the
monetary policy rule (24), the Bayesian analysis is systembased and ﬁts
the solved DSGE model to a vector of aggregate time series. Second,
the estimation is based on the likelihood function generated by the
DSGE model rather than, for instance, the discrepancy between DSGE
model responses and VAR impulse responses. Third, prior distributions
can be used to incorporate additional information into the parameter
estimation. Any estimation and evaluation method is confronted with the
following challenges: potential model misspeciﬁcation and possible lack
of identiﬁcation of parameters of interest. We will subsequently elaborate
these challenges and discuss how a Bayesian approach can be used to cope
with them.
Throughout the paper we will use the following notation: the n ×1
vector y
t
stacks the time t observations that are used to estimate the DSGE
model. In the context of the model developed in Section 2 y
t
is composed
of YGR
t
, INFL
t
, INT
t
. The sample ranges from t = 1 to T and the sample
observations are collected in the matrix Y with rows y
t
. We denote the
prior density by p(0), the likelihood function by (0  Y ), and the posterior
density by p(0  Y ).
3.1. Potential Model Misspeciﬁcation
If one predicts a vector of time series y
t
, for instance composed of
output growth, inﬂation, and nominal interest rates, by a function of past
y
t
’s, then the resulting forecast error covariance matrix is nonsingular.
Hence any DSGE model that generates a rankdeﬁcient covariance matrix
for y
t
is clearly at odds with the data and suffers from an obvious form
124 S. An and F. Schorfheide
of misspeciﬁcation. This singularity is an obstacle to likelihood estimation.
Hence one branch of the literature has emphasized procedures that can
be applied despite the singularity, whereas the other branch has modiﬁed
the model speciﬁcation to remove the singularity by adding socalled
measurement errors, e.g., Sargent (1989), Altug (1989), Ireland (2004),
or additional structural shocks as in Leeper and Sims (1994) and more
recently Smets and Wouters (2003). In this paper we pursue the latter
approach by considering a model in which the number of structural shocks
(monetary policy shock, government spending shock, technology growth
shock) equals the number of observables (output growth, inﬂation, interest
rates) to which the model is ﬁtted.
A second source of misspeciﬁcation that is more difﬁcult to correct
is potentially invalid crosscoefﬁcient restrictions on the time series
representation of y
t
generated by the DSGE model. Invalid restrictions
manifest themselves in poor outofsample ﬁt relative to more densely
parameterized reference models such as VARs. Del Negro et al. (2006)
document that even an elaborate DSGE model with capital accumulation
and various nominal and real frictions has difﬁculties attaining the ﬁt
achieved with VARs that have been estimated with welldesigned shrinkage
methods. While the primary research objectives in the DSGE model
literature is to overcome discrepancies between models and reality, it is
important to have empirical strategies available that are able to cope with
potential model misspeciﬁcation.
Once one acknowledges that the DSGE model provides merely an
approximation to the law of motion of the time series y
t
, then it seems
reasonable to assume that there need not exist a single parameter vector
0
0
that delivers, say, the “true” intertemporal substitution elasticity or price
adjustment costs and, simultaneously, the most precise impulse responses
to a technology or monetary policy shock. Each estimation method is
associated with a particular measure of discrepancy between the ‘true’
law of motion and the class of approximating models. Likelihoodbased
estimators, for instance, asymptotically minimize the Kullback–Leibler
distance (see White, 1982).
One reason that pure maximum likelihood estimation of DSGE models
has not turned into the estimation method of choice is the “dilemma of
absurd parameter estimates.” Estimates of structural parameters generated
with maximum likelihood procedures based on a set of observations Y
are often at odds with additional information that the research may have.
For example, estimates of the discount factor [ should be consistent with
our knowledge about the average magnitude of real interest rates, even if
observations on interest rates are not included in the estimation sample
Y . Time series estimates of aggregate labor supply elasticities or price
adjustment costs should be broadly consistent with microlevel evidence.
However, due to the stylized nature and the resulting misspeciﬁcation
Bayesian Analysis of DSGE Models 125
of most DSGE models, the likelihood function often peaks in regions
of the parameter space that appear to be inconsistent with extraneous
information.
In a Bayesian framework, the likelihood function is reweighted by a
prior density. The prior can bring to bear information that is not contained
in the estimation sample Y . Since priors are always subject to revision, the
shift from prior to posterior distribution can be an indicator of the tension
between different sources of information. If the likelihood function peaks
at a value that is at odds with the information that has been used to
construct the prior distribution, then the marginal data density of the
DSGE model, deﬁned as
p(Y ) =
_
(0  Y )p(0)d0, (39)
will be low compared to, say, a VAR, and in a posterior odds comparison
the DSGE model will automatically be penalized for not being able to
reconcile the two sources of information with a single set of parameters.
Section 5 will discuss several methods that have been proposed to assess
the ﬁt of DSGE models: posterior odds comparisons of competing model
speciﬁcations, posterior predictive checks, and comparisons to reference
models that relax some of the crosscoefﬁcient restrictions generated by the
DSGE models.
3.2. Identiﬁcation
Identiﬁcation problems can arise owing to a lack of informative
observations or, more fundamentally, from a probability model that
implies that different values of structural parameters lead to the same
joint distribution for the observables Y . At ﬁrst glance the identiﬁcation
of DSGE model parameters does not appear to be problematic. The
parameter vector 0 is typically low dimensional compared to VARs and
the model imposes tight restrictions on the time series representation
of Y . However, recent experience with the estimation of New Keynesian
DSGE models has cast some doubt on this notion and triggered a more
careful assessment. Lack of identiﬁcation is documented in papers by
Beyer and Farmer (2004) and Canova and Sala (2005). The former paper
provides an algorithm to construct families of observationally equivalent
linear rational expectations models, whereas the latter paper compares
the informativeness of different estimators with respect to key structural
parameters in a variety of DSGE models.
The delicate identiﬁcation problems that arise in rational expectations
models can be illustrated in a simple example adopted from Lubik and
Schorfheide (2006). Consider the following two models, in which y
t
is
126 S. An and F. Schorfheide
the observed endogenous variable and u
t
is an unobserved shock process.
In model
1
, the u
t
’s are serially correlated:
1
: y
t
=
1
:
t
[y
t +1
] +u
t
, u
t
= ju
t −1
+e
t
, e
t
∼ iid
_
0, (1 −j,:)
2
_
.
(40)
In model
2
the shocks are serially uncorrelated, but we introduce
a backwardlooking term [y
t −1
on the righthand side to generate
persistence:
2
: y
t
=
1
:
t
[y
t +1
] +[y
t −1
+u
t
, u
t
= e
t
,
e
t
∼ iid
_
0,
_
: +
_
:
2
−4[:
2:
_
2
_
.
(41)
For both speciﬁcations, the law of motion of y
t
is
y
t
= çy
t −1
+p
t
, p
t
∼ iid(0, 1). (42)
Under restrictions on the parameter spaces that guarantee uniqueness of a
stable rational expectations solution
3
we obtain the following relationships
between ç and the structural parameters:
1
: ç = j,
2
: ç =
1
2
(: −
_
:
2
−4[:).
In model
1
the parameter : is not identiﬁable, and in model
2
, the
parameters : and [ are not separately identiﬁable. Moreover, models
1
and
2
are observationally equivalent. The likelihood functions of
1
and
2
have ridges that can cause serious problems for numerical optimization
procedures. The calculation of a valid conﬁdence set is challenging since it
is difﬁcult to trace out the parameter subspace along which the likelihood
function is constant.
While Bayesian inference is based on the likelihood function, even a
weakly informative prior can introduce curvature into the posterior density
surface that facilitates numerical maximization and the use of MCMC
methods. Consider model
1
. Here 0 = [j, :]
and the likelihood function
can be written as (j, :  Y ) =
(j). Straightforward manipulations of the
Bayes theorem yield
p(j, :  Y ) = p(j  Y )p(:  j). (43)
3
To ensure determinacy we impose : > 1 in
1
and  : −
_
:
2
−4[:   2 and
 : +
_
:
2
−4[:  > 2 in
2
.
Bayesian Analysis of DSGE Models 127
Thus the prior distribution is not updated in directions of the parameter
space in which the likelihood function is ﬂat. This is of course well known
in Bayesian econometrics; see Poirier (1998) for a discussion.
It is difﬁcult to detect directly identiﬁcation problems in large DSGE
models, since the mapping from the vector of structural parameters 0
into the state–space representation that determines the joint probability
distribution of Y is highly nonlinear and typically can only be evaluated
numerically. The posterior distribution is well deﬁned as long as the
joint prior distribution of the parameters is proper. However, lack of
identiﬁcation provides a challenge for scientiﬁc reporting, as the audience
typically would like to know what features of the posterior are generated
by the prior rather than the likelihood. A direct comparison of priors and
posteriors can often provide valuable insights about the extent to which
data provide information about the parameters of interest.
3.3. Priors
As indicated in our previous discussion, prior distributions will play an
important role in the estimation of DSGE models. They might downweigh
regions of the parameter space that are at odds with observations not
contained in the estimation sample Y . They might also add curvature
to a likelihood function that is (nearly) ﬂat in some dimensions of
the parameter space and therefore strongly inﬂuence the shape of the
posterior distribution.
4
While, in principle, priors can be gleaned from
personal introspection to reﬂect strongly held beliefs about the validity
of economic theories, in practice most priors are chosen based on some
observations.
For instance, Lubik and Schorfheide (2006) estimate a twocountry
version of the model described in Section 2. Priors for the autocorrelations
and standard deviations of the exogenous processes, the steadystate
parameters ¸
(Q)
, ¬
(A)
, and r
(A)
, as well as the standard deviation of the
monetary policy shock, are quantiﬁed based on regressions run on pre
(estimation)sample observations of output growth, inﬂation, and nominal
interest rates. The priors for the coefﬁcients in the monetary policy rule
are loosely centered around values typically associated with the Taylor rule.
The prior for the parameter that governs price stickiness is chosen based
on microevidence on price setting behavior provided, for instance, in Bils
and Klenow (2004). To ﬁnetune the prior distribution, in particular the
distribution of shock standard deviations, it is often helpful to simulate the
prior predictive distribution for various sample moments and check that
4
The role of priors in DSGE model estimation is markedly different from the role of priors
in VAR estimation. In the latter case priors are essentially used to reduce the dimensionality of the
econometric model and hence the sampling variability of the parameter estimates.
128 S. An and F. Schorfheide
the prior does not place little or no mass in a neighborhood of important
features of the data. Such an occurrence would suggest that the model is
incapable of explaining salient data properties. A formalization of such a
prior predictive check can be found in Geweke (2005).
Table 2 lists the marginal prior distributions for the structural
parameters of the DSGE model, that we will use in the subsequent
analysis. These priors are adopted from Lubik and Schorfheide (2006). For
convenience, it is assumed that all parameters are a priori independent.
In applications in which the independence assumption is unreasonable
one could derive parameter transformations, such as steady rate ratios,
autocorrelations, or relative volatilities, and specify independent priors
on the transformed parameters, which induce dependent priors for the
original parameters. As mentioned before, rational expectations models
can have multiple equilibria. While this may be of independent interest we
do not pursue this direction in this paper. Hence, the prior distribution
is truncated at the boundary of the determinacy region. Prior to the
truncation the distribution speciﬁed in Table 2 places about 2% of its
mass on parameter values that imply indeterminacy. The parameters v
(conditional on «) and 1,g only affect the secondorder approximation of
the DSGE model.
3.4. The Road Ahead
Throughout this paper we are estimating and evaluating versions of
the DSGE model presented in Section 2 based on simulated data. Table 1
provides a characterization of the model speciﬁcations and data sets.
TABLE 1 Model speciﬁcations and data sets
1
(L) Benchmark Model with output gap rule, consists of Eqs. (21)–(26), solved by ﬁrstorder
approximation.
1
(Q) Benchmark Model with output gap rule, consists of Eqs. (21)–(26), solved by secondorder
approximation.
2
(L) DSGE Model with output growth rule, consists of Eqs. (21)–(26), however, Eq. (24) is
replaced by Eq. (27), solved by ﬁrstorder approximation.
3
(L) Same as
1
(L), except that prices are nearly ﬂexible: « = 5.
4
(L) Same as
1
(L), except that central bank does not respond to the output gap: ç
2
= 0.
5
(L) Same as
1
(L), except in ﬁrstorder approximation Eq. (29)is replaced by
(1 +h)ˆ y
t
=
¨
t
[y
t +1
] +hˆ y
t −1
+ ˆ g
t
−
t
[ ˆ g
t +1
] −
1
t
_
¨
R
t
−
¨
t
[¬
t +1
] −[ˆ z
t +1
]
_
with h = .95.
1
(L) 80 observations generated with
1
(L)
1
(Q) 80 observations generated with
1
(Q)
2
(L) 80 observations generated with
2
(L)
5
(L) 80 observations generated with
5
(L)
Bayesian Analysis of DSGE Models 129
Model
1
is the benchmark speciﬁcation of the DSGE model in which
monetary policy follows an interestrate rule that reacts to the output gap.
1
(L) is approximated by a linear rational expectations system, whereas
1
(Q) is solved with a secondorder perturbation method. In
2
(L) we
replace the output gap in the interestrate feedback rule by output growth.
3
and
4
are identical to
1
(L), except that in one case we impose that
the prices are nearly ﬂexible (« = 5) and in the other case the central
bank does not respond to output (ç
2
= 0). In
5
(L) we replace the log
linearized consumption Euler equation by an equation that includes lagged
output on the righthand side. Loosely speaking, this speciﬁcation can be
motivated with a model in which agents derive utility from the difference
between individual consumption and the overall level of consumption in
the previous period.
Conditional on parameter vectors reported in Tables 2 and 3 we
generate four data sets with 80 observations each. We use the labels
1
(L),
1
(Q),
2
(L), and
5
(L) to denote data sets generated from
1
(L),
1
(Q),
2
(L), and
5
(L), respectively. By and large, the values for the
DSGE model parameters resemble empirical estimates obtained from post
1982 U.S. data. The sample size is fairly realistic for the estimation of
monetary DSGE model. Many industrialized countries experienced a high
inﬂation episode in the 1970s that ended with a disinﬂation in the 1980s.
Subsequently, the level and the variability of inﬂation and interest rates
have been fairly stable, so that it is not uncommon to estimate constant
coefﬁcient DSGE models based on data sets that begin in the early 1980s.
TABLE 2 Prior distribution and DGPs—linear analysis (Sections 4 and 5)
Prior
DGP
Name Domain Density Para (1) Para (2)
1
(L),
2
(L),
5
(L)
t
+
Gamma 2.00 .50 2.00
«
+
Gamma .20 .10 .15
ç
1
+
Gamma 1.50 .25 1.50
ç
2
+
Gamma .50 .25 1.00
j
R
[0, 1) Beta .50 .20 .60
j
G
[0, 1) Beta .80 .10 .95
j
Z
[0, 1) Beta .66 .15 .65
r
(A)
+
Gamma .50 .50 .40
¬
(A)
+
Gamma 7.00 2.00 4.00
¸
(Q)
Normal .40 .20 .50
100o
R
+
InvGamma .40 4.00 .20
100o
G
+
InvGamma 1.00 4.00 .80
100o
Z
+
InvGamma .50 4.00 .45
Notes: Paras (1) and (2) list the means and the standard deviations for Beta, Gamma, and Normal
distributions; the upper and lower bound of the support for the Uniform distribution; s and v for
the Inverse Gamma distribution, where p
G
(o  v, s) ∝ o
−v−1
e
−vs
2
,2o
2
. The effective prior is truncated
at the boundary of the determinacy region.
130 S. An and F. Schorfheide
TABLE 3 Prior distribution and DGP—Nonlinear analysis (Section 6)
Prior
DGP
Name Domain Density Para (1) Para (2)
1
(Q)
t
+
Gamma 2.00 .50 2.00
«
+
Gamma .30 .20 .33
ç
1
+
Gamma 1.50 .25 1.50
ç
2
+
Gamma .50 .25 .125
j
R
[0, 1) Beta .50 .20 .75
j
G
[0, 1) Beta .80 .10 .95
j
Z
[0, 1) Beta .66 .15 .90
r
(A)
+
Gamma .80 .50 1.00
¬
(A)
+
Gamma 4.00 2.00 3.20
¸
(Q)
Normal .40 .20 .55
100o
R
+
InvGamma .30 4.00 .20
100o
G
+
InvGamma .40 4.00 .60
100o
Z
+
InvGamma .40 4.00 .30
v [0, 1) Beta .10 .05 .10
1,g [0, 1) Beta .85 .10 .85
Notes: See Table 2. Parameters v and 1,g only affect the secondorder accurate
solution of the DSGE model.
Section 4 illustrates the estimation of linearized DSGE models under
correct speciﬁcation, that is, we construct posteriors for
1
(L) and
2
(L) based on the data sets
1
(L) and
2
(L). Section 5 considers
model evaluation techniques. We begin with posterior predictive checks
in Section 5.1, which are implemented based on
1
(L) for data sets
1
(L) (correct speciﬁcation of DSGE model) and
5
(L) (misspeciﬁcation
of the consumption Euler equation). In Section 5.2 we proceed with
the calculation of posterior model probabilities for speciﬁcations
1
(L),
3
(L), and
4
(L) conditional on the data set
1
(L). Subsequently in
Section 5.3 we compare
1
(L) to a vector autoregressive speciﬁcation
using
1
(L) (correct speciﬁcation) and
5
(L) (misspeciﬁcation). Finally,
in Section 6 we study the estimation of DSGE models solved with a second
order perturbation method. We compare posteriors for
1
(Q) and
1
(L)
conditional on
1
(Q).
4. ESTIMATION OF LINEARIZED DSGE MODELS
We will begin by describing two algorithms that can be used to generate
draws from the posterior distribution of 0 and subsequently illustrate their
performance in the context of models
1
(L),
1
(L) and
2
(L),
2
(L).
4.1. Posterior Computations
We consider an RWM algorithm and an IS algorithm to generate draws
from the posterior distribution of 0. Both algorithms require the evaluation
Bayesian Analysis of DSGE Models 131
of (0  Y )p(0). The computation of the nonnormalized posterior density
proceeds in two steps. First, the linear rational expectations system is solved
to obtain the state transition equation (33). If the parameter value 0
implies indeterminacy (or nonexistence of a stable rational expectations
solution), then (0  Y )p(0) is set to zero. If a unique stable solution
exists, then the Kalman ﬁlter
5
is used to evaluate the likelihood function
associated with the linear state–space system (33) and (38). Since the
prior is generated from wellknown densities, the computation of p(0) is
straightforward.
The RWM algorithm belongs to the more general class of Metropolis–
Hastings algorithms. This class is composed of universal algorithms that
generate Markov chains with stationary distributions that correspond
to the posterior distributions of interest. A ﬁrst version of such an
algorithm had been constructed by Metropolis et al. (1953) to solve a
minimization problem and was later generalized by Hastings (1970). Chib
and Greenberg (1995) provide an excellent introduction to Metropolis–
Hastings algorithms. The RWM algorithm was ﬁrst used to generate draws
from the posterior distribution of DSGE model parameters by Schorfheide
(2000) and Otrok (2001). We will use the following implementation based
on Schorfheide (2000):
6
RandomWalk Metropolis (RWM) Algorithm
1. Use a numerical optimization routine to maximize ln(0  Y ) +lnp(0).
Denote the posterior mode by
˜
0.
2. Let
¯
l be the inverse of the Hessian computed at the posterior mode
˜
0.
3. Draw 0
(0)
from (
˜
0, c
2
0
¯
l) or directly specify a starting value.
4. For s = 1, . . . , n
sim
, draw 6 from the proposal distribution (0
(s−1)
, c
2
¯
l).
The jump from 0
(s−1)
is accepted (0
(s)
= 6) with probability
min]1, r (0
(s−1)
, 6  Y )] and rejected (0
(s)
= 0
(s−1)
) otherwise. Here
r (0
(s−1)
, 6  Y ) =
(6  Y )p(6)
(0
(s−1)
 Y )p(0
(s−1)
)
.
5. Approximate the posterior expected value of a function h(0) by
1
n
sim
n
sim
s=1
h(0
(s)
).
5
Since according to our model s
t
is stationary, the Kalman ﬁlter can be initialized with the
unconditional distribution of s
t
. To make the estimation in Section 4 comparable to the DSGE
VAR analysis presented in Section 5.3 we adjust the likelihood function to condition on the ﬁrst
four observations in the sample: (0  Y ),(0  y
1
, . . . , y
4
). These four observations are later used to
initialize the lags of a VAR.
6
This version of the algorithm is available in the userfriendly DYNARE (2005) package, which
automates the Bayesian estimation of linearized DSGE models and provides routines to solve DSGE
models with higherorder perturbation methods.
132 S. An and F. Schorfheide
Under fairly general regularity conditions, e.g., Walker (1969),
Crowder (1988), and Kim (1998), the posterior distribution of 0 will be
asymptotically normal. The algorithm constructs a Gaussian approximation
around the posterior mode and uses a scaled version of the asymptotic
covariance matrix as the covariance matrix for the proposal distribution.
This allows for an efﬁcient exploration of the posterior distribution at least
in the neighborhood of the mode.
The maximization of the posterior density kernel is carried out with
a version of the BFGS quasiNewton algorithm, written by Chris Sims
for the maximum likelihood estimation of a DSGE model conducted in
Leeper and Sims (1994). The algorithm uses a fairly simple line search
and randomly perturbs the search direction if it reaches a cliff caused by
nonexistence or nonuniqueness of a stable rational expectations solution
for the DSGE model. Prior to the numerical maximization we transform
all parameters so that their domain is unconstrained. This parameter
transformation is only used in Step 1 of the RWM algorithm. The elements
of the Hessian matrix in Step 2 are computed numerically for various
values of the increment d0. There typically exists a range for d0 over which
the derivatives are stable. As d0 approaches zero the numerical derivatives
will eventually deteriorate owing to inaccuracies in the evaluation of
the objective function. While Steps 1 and 2 are not necessary for the
implementation of the RWM algorithm, they are often helpful.
The RWM algorithm generates a sequence of dependent draws from
the posterior distribution of 0 that can be averaged to approximate
posterior moments. Geweke (1999a, 2005) reviews regularity conditions
that guarantee the convergence of the Markov chain generated by
Metropolis–Hastings algorithms to the posterior distribution of interest
and the convergence of
1
n
sim
n
sim
s=1
h(0
(s)
) to the posterior expectations
[h(0)  Y ].
DeJong et al. (2000) used an IS algorithm to calculate posterior
moments of the parameters of a linearized stochastic growth model. The
idea of the algorithm is based on the identity
[h(0)  Y ] =
_
h(0)p(0  Y )d0 =
_
h(0)p(0  Y )
q(0)
q(0)d0.
Draws from the posterior density p(0  Y ) are replaced by draws from
the density q(0) and reweighted by the importance ratio p(0  Y ),q(0) to
obtain a numerical approximation of the posterior moment of interest.
Hammersley and Handscomb (1964) were among the ﬁrst to propose this
method and Geweke (1989) provides important convergence results. The
particular version of the IS algorithm used subsequently is of the following
form.
Bayesian Analysis of DSGE Models 133
Importance Sampling (IS) Algorithm
1. Use a numerical optimization routine to maximize ln(0  Y ) +lnp(0).
Denote the posterior mode by
˜
0.
2. Let
¯
l be the inverse of the Hessian computed at the posterior mode
˜
0.
3. Let q(0) be the density of a multivariate t distribution with mean
˜
0, scale
matrix c
2
¯
l, and v degrees of freedom.
4. For s = 1, . . . , n
sim
generate draws 0
(s)
from q(0).
5. Compute ˜ w
s
= (0
(s)
 Y )p(0
(s)
),q(0
(s)
) and w
s
= ˜ w
s
,
n
sim
s=1
˜ w
s
.
6. Approximate the posterior expected value of a function h(0) by
n
sim
s=1
w(0
(s)
)h(0
(s)
).
The accuracy of the IS approximation depends on the similarity
between posterior kernel and importance density. If the two are equal, then
the importance weights w
s
are constant and one averages independent
draws to approximate the posterior expectations of interest. If the
importance density concentrates its mass in a region of the parameter
space in which the posterior density is very low, then most of the w
s
’s will be
much smaller than 1,n
sim
and the approximation method is inefﬁcient. We
construct a Gaussian approximation of the posterior near the mode, scale
the asymptotic covariance matrix, and replace the normal distribution with
a fattailed t distribution to implement the algorithm.
4.2. Simulation Results for
1
(L)/
1
(L)
We begin by estimating the loglinearized output gap rule speciﬁcation
1
(L) based on data set
1
(L). We use the RWM algorithm (c
0
= 1, c =
0.3) to generate 1 million draws from the posterior distribution. The
rejection rate is about 45%. Moreover, we generate 200 independent draws
from the truncated prior distribution. Figure 1 depicts the draws from
the prior as well as every 5,000th draw from the posterior distribution
in twodimensional scatter plots. We also indicate the location of the
posterior mode in the 12 panels of the ﬁgure. A visual comparison of
priors and posteriors suggests that the 80 observations sample contains
little information on the riskaversion parameter t and the policy rule
coefﬁcients ç
1
and ç
2
. There is, however, information about the degree
of pricestickiness, captured by the slope coefﬁcient « of the Phillipscurve
relationship (30). Moreover, the posteriors of steadystate parameters, the
autocorrelation coefﬁcients, and the shock standard deviations are sharply
peaked relative to the prior distributions. The lack of information about
some of the key structural parameters resembles the empirical ﬁndings
based on actual observations.
There exists an extensive literature on convergence diagnostics for
MCMC methods such as the RWM algorithm. An introduction to this
134 S. An and F. Schorfheide
FIGURE 1 Prior and posterior – Model
1
(L), Data
1
(L). The panels depict 200 draws from
prior and posterior distributions. Intersections of solid lines signify posterior mode values.
literature can be found, for instance, in Robert and Casella (1999).
These authors distinguish between convergence of the Markov chain to
its stationary distribution, convergence of empirical averages to posterior
moments, and convergence to iid sampling. While many convergence
diagnostics are based on formal statistical tests, we will consider informal
graphical methods in this paper. More speciﬁcally, we will compare draws
and recursively computed means from multiple chains.
We run four independent Markov chains. Except for t and ç
2
, all
parameters are initialized at their posterior mode values. The initial values
for (t, ç
2
) are (3.0, .1), (3.0, 2.0), (.4, .1), and (.6, 2.0), respectively. As
before, we set c = 0.3, generate 1 million draws for each chain, and plot
every 5,000th draw of (t, ç
2
) in Figure 2. Panels (1, 1) and (1, 2) of the
ﬁgure depict posterior contours at the posterior mode. Visual inspection of
Bayesian Analysis of DSGE Models 135
FIGURE 2 Draws from multiple chains – Model
1
(L), Data
1
(L). Panels (1,1) and (1,2): contours
of posterior density at “low” and “high” mode as function of t and ç
2
. Panels (2,1) to (3,2): 200
draws from four Markov chains generated by the Metropolis Algorithm. Intersections of solid lines
signify posterior mode values.
the plots suggests that all four chains, despite the use of different starting
values, converge to the same (stationary) distribution and concentrate in
the highposterior density region. Figure 3 plots recursive means for the
four chains. Despite different initializations, the means converge in the
long run.
We generate another set of 1 million draws using the IS algorithm with
v = 3 and c = 1.5. As for the draws from the RWM algorithm, we compute
recursive means. Since neither the IS nor the RWM approximation of the
posterior means are exact, we construct standard error estimates. For the
importance sampler we follow Geweke (1999b), and for the Metropolis
chain we use Newey–West standard error estimates, truncated at 140 lags.
7
In Figure 4 we plot (recursive) conﬁdence intervals for the RWM and the
IS approximation of the posterior means. For most parameters the two
7
It is not guaranteed that the Newey–West standard errors are formally valid.
136 S. An and F. Schorfheide
FIGURE 3 Recursive means from multiple chains – Model
1
(L), Data
1
(L). Each line
corresponds to recursive means (as a function of the number of draws) calculated from one of the
four Markov chains generated by the Metropolis Algorithm.
conﬁdence intervals overlap. Exceptions are, for instance, «, r
(A)
, j
g
, and
o
g
. However, the magnitude of the discrepancy is generally small relative to,
say, the difference between the posterior mode and the estimated posterior
means.
4.3. A Potential Pitfall: Simulation Results for
2
(L)/
2
(L)
We proceed by estimating the output growth rule version
2
(L) of the
DSGE model based on 80 observations generated from this model (Data
Set
2
(L)). Unlike the posterior for the output gap version, the
2
(L)
posterior has (at least) two modes. One of the modes, which we label
the “high” mode,
˜
0
(h)
, is located at t = 2.06 and ç
2
= .97. The value of
Bayesian Analysis of DSGE Models 137
FIGURE 4 RWM algorithm vs. Importance Sampling – Model
1
(L), Data
1
(L). Panels depict
posterior modes (solid), recursively computed 95% bands for posterior means based on the
metropolis algorithm (dotted) and the importance sampler (dashed).
ln[(0  Y )p(0)] is equal to −175.48. The second (“low”) mode,
˜
0
(l )
, is
located at t = 1.44 and ç
2
= .81 and attains the value −183.23. The ﬁrst
two panels of Figure 5 depict the contours of the nonnormalized posterior
density as a function of t and ç
2
at the low and the high mode, respectively.
Panel (1,1) is constructed by setting 0 =
˜
0
(l )
and then graphing posterior
contours for t ∈ [0, 3] and ç
2
∈ [0, 2.5], keeping all other parameters ﬁxed
at their respective
˜
0
(l )
values. Similarly, Panel (1,2) is obtained by exploring
the shape of the posterior as a function of t and ç
2
, ﬁxing the other
parameters at their respective
˜
0
(h)
values. The intersection of the solid
lines in panels (1,1), (2,1), and (3,1) signify the low mode, whereas the
solid lines in the remaining panels indicate the location of the high mode.
138 S. An and F. Schorfheide
FIGURE 5 Draws from multiple chains – Model
2
(L), Data
2
(L). Panels (1,1) and (1,2): contours
of posterior density at “low” and “high” mode as function of t and ç
2
. Panels (2,1) to (3,2): 200
draws from four Markov chains generated by the metropolis algorithm. Intersections of solid lines
signify “low” (left panels) and “high” (right panels) posterior mode values.
The two modes are separated by a deep valley which is caused by complex
eigenvalues of the state transition matrix.
As in Section 4.2 we generate 1 million draws each for model
2
from
four Markov chains that were initialized at the following values for (t; ç
2
):
(3.0; .1), (3.0; 2.0), (.4; .1), and (.6; 2.0). The remaining parameters were
initialized at the low (high) mode for Chains 1 and 3 (2 and 4).
8
We plot
every 5,000th draw of t and ç
2
from the four Markov chains in Panels (2,1)
to (3,2). Unlike in Panels (1,1) and (1,2), the remaining parameters are
not ﬁxed at their
˜
0
(l )
and
˜
0
(h)
values. It turns out that Chains 1 and 3
explore the posterior distribution locally in the neighborhood of the low
mode, whereas Chains 2 and 4 move through the posterior surface near the
high mode. Given the conﬁguration of the RWM algorithm the likelihood
8
The covariance matrices of the proposal distributions are obtained from the Hessians
associated with the respective modes.
Bayesian Analysis of DSGE Models 139
of crossing the valley that separates the two modes is so small that it did not
occur. The recursive means associated with the four chains are plotted in
Figure 6. They exhibit two limit points, corresponding to the two posterior
modes. While each chain appears to be stable, it only explores the posterior
in the neighborhood of one of the modes.
We explored various modiﬁcations of the RWM algorithm by changing
the scaling factor c and by setting the offdiagonal elements of
¯
l to zero.
Nevertheless, 1,000,000 draws were insufﬁcient for the chains initialized
at the low mode to cross over to the neighborhood of the high mode.
Two remarks are in order. First, extremum estimators that are computed
with numerical optimization methods suffer from the same problem as
FIGURE 6 Recursive means from multiple chains – Model
2
(L), Data
(
L). Each line corresponds
to recursive means (as a function of the number of draws) calculated from one of the four Markov
chains generated by the metropolis algorithm.
140 S. An and F. Schorfheide
the Bayes estimators in this example. One might ﬁnd a local rather than
the global extremum of the objective function. Hence, it is good practice
to start the optimization from different points in the parameter space
to increase the likelihood that the global optimum is found. Similarly,
in Bayesian computation it is helpful to start MCMC methods from
different regions of the parameter space, or vary the distribution q(0)
in the importance sampler. Second, in many applications as well as in
this example there exists a global mode that dominates the local modes.
Hence an exploration of the posterior in the neighborhood of the global
mode might still provide a good approximation to the overall posterior
distribution. All subsequent computations are based on the output gap rule
speciﬁcation of the DSGE model.
4.4. Parameter Transformations for
1
(L)/
1
(L)
Macroeconomists are often interested in posterior distributions of
parameter transformations h(0) to address questions such as what fraction
of the variation in output growth is caused by monetary policy shocks, and
what happens to output and inﬂation in response to a monetary policy
shock. Answers can be obtained from the moving average representation
associated with the state space model composed of (33) and (38). We will
focus on variance decompositions in the remainder of this subsection.
Fluctuations of output growth, inﬂation, and nominal interest rate
in the DSGE model are due to three shocks: technology growth shocks,
government spending shocks, and monetary policy shocks. Hence variance
decompositions of the endogenous variables lie in a threedimensional
simplex, which can be depicted as a triangle in
2
. In Figure 7 we plot
200 draws from the prior distribution (left panels) and 200 draws from the
posterior (right panels) of the variance decompositions of output growth
and inﬂation. The posterior draws are obtained by converting every 5,000th
draw generated with the RWM algorithm. The corners Z, G, and R of the
triangles correspond to decompositions in which the shocks e
z
, e
g
, and e
R
explain 100% of the variation, respectively.
Panels (1,1) and (2,1) indicate that the prior distribution of the
variance decomposition is informative in the sense that it is not uniform
over the simplex. For instance, the output gap policy rule speciﬁcation
1
implies that the government spending shock does not affect inﬂation and
nominal interest rates. Hence all prior and posterior draws concentrate on
the R −Z edge of the simplex. While the prior mean of the fraction of
inﬂation variability explained by the monetary policy shock is about 40%,
a 90% probability interval ranges from 0 to 95%. A priori about 10% of
the variation in output growth is due to monetary policy shocks. A 90%
probability interval ranges from 0 to 20%. The data provide additional
information about the variance decomposition. The posterior distribution
Bayesian Analysis of DSGE Models 141
FIGURE 7 Prior and posterior variance decompositions – Model
1
(L), Data
1
(L). The panels
depict 200 draws from prior and posterior distributions arranged on a 3dimensional simplex. The
three corners (Z,G,R) correspond to 100% of the variation being due to the shocks e
z,t
, e
g ,t
, and
e
R,t
, respectively.
is much more concentrated than the prior. The probability interval for
the contribution of monetary policy shocks to output growth ﬂuctuations
shrinks to the range from 1 to 4.5%. The posterior probability interval for
the fraction of inﬂation variability explained by the monetary policy shock
ranges from 15 to 37%.
4.5. Empirical Applications
There is a rapidly growing empirical literature on the Bayesian
estimation of DSGE models that applies the techniques described in
this paper. The following incomplete list of references aims to give an
overview of this work. Preceding the Bayesian literature are papers that
use maximum likelihood techniques to estimate DSGE models. Altug
(1989) estimates Kydland and Prescott’s (1982) timetobuild model, and
McGrattan (1994) studies the macroeconomic effects of taxation in an
estimated business cycle model. Leeper and Sims (1994) and Kim (2000)
estimated DSGE models that are usable for monetary policy analysis.
Canova (1994), DeJong et al. (1996), and Geweke (1999a) proposed
Bayesian approaches to calibration that do not exploit the likelihood
function of the DSGE model and provide empirical applications that assess
business cycle and asset pricing implications of simple stochastic growth
142 S. An and F. Schorfheide
models. The literature on likelihoodbased Bayesian estimation of DSGE
models began with work by LandonLane (1998), DeJong et al. (2000),
Schorfheide (2000), and Otrok (2001). DeJong et al. (2000) estimate a
stochastic growth model and examine its forecasting performance, Otrok
(2001) ﬁts a real business cycle with habit formation and timetobuild to
the data to assess the welfare costs of business cycles, and Schorfheide
(2000) considers cashinadvance monetary DSGE models. DeJong and
Ingram (2001) study the cyclical behavior of skill accumulation, whereas
Chang et al. (2002) estimate a stochastic growth model augmented with a
learningbydoing mechanism to amplify the propagation of shocks. Chang
and Schorfheide (2003) study the importance of labor supply shocks
and estimate a homeproduction model. FernándezVillaverde and Rubio
Ramírez (2004) use Bayesian estimation techniques to ﬁt a cattle–cycle
model to the data.
Variants of the smallscale New Keynesian DSGE model presented in
Section 2 have been estimated by Rabanal and RubioRamírez (2005a,b)
for the U.S. and the Euro Area. Lubik and Schorfheide (2004) estimate the
benchmark New Keynesian DSGE model without restricting the parameters
to the determinacy region of the parameter space. Schorfheide (2005)
allows for regimeswitching of the target inﬂation level in the monetary
policy rule. Canova (2004) estimates a smallscale New Keynesian model
recursively to assess the stability of the structural parameters over time.
Galí and Rabanal (2005) use an estimated DSGE model to study the effect
of technology shocks on hours worked. Largescale models that include
capital accumulation and additional real and nominal frictions along the
lines of Christiano et al. (2005) have been analyzed by Smets and Wouters
(2003, 2005) both for the U.S. and the Euro Area. Models similar to Smets
and Wouters (2003) have been estimated by Laforte (2004), Onatski and
Williams (2004), and Levin et al. (2006) to study monetary policy. Many
central banks are in the process of developing DSGE models along the
lines of Smets and Wouters (2003) that can be estimated with Bayesian
techniques and used for policy analysis and forecasting.
Bayesian estimation techniques have also been used in the open
economy literature. Lubik and Schorfheide (2003) estimate the small
open economy extension of the model presented in Section 2 to examine
whether the central banks of Australia, Canada, England, and New Zealand
respond to exchange rates. A similar model is ﬁtted by Del Negro (2003) to
Mexican data. Justiniano and Preston (2004) extend the empirical analysis
to situations of imperfect exchange rate passthrough. Adolfson et al.
(2004) analyze an open economy model that includes capital accumulation
as well as numerous real and nominal frictions. Lubik and Schorfheide
(2006), Rabanal and Tuesta (2006), and de Walque and Wouters (2004)
have estimated multicountry DSGE models.
Bayesian Analysis of DSGE Models 143
5. Model Evaluation
In Section 4 we discussed the estimation of a linearized DSGE model
and reported results on the posterior distribution of the model parameters
and variance decompositions. We tacitly assumed that model and prior
provide an adequate probabilistic representation of the uncertainty with
respect to data and parameters. This section studies various techniques that
can be used to evaluate the model ﬁt. We will distinguish the assessment of
absolute ﬁt from techniques that aim to determine the ﬁt of a DSGE model
relative to some other model. The ﬁrst approach can be implemented
by a posterior predictive model check and has the ﬂavor of a classical
hypothesis test. Relative model comparisons, on the other hand, are
typically conducted by enlarging the model space and applying Bayesian
inference and decision theory to the extended model space. Section 5.1
discusses Bayesian model checks, Section 5.2 reviews model posterior odds
comparisons, and Section 5.3 describes a more elaborate model evaluation
based on comparisons between DSGE models and VARs.
5.1. Posterior Predictive Checks
Predictive checks as a tool to assess the absolute ﬁt of a probability
model have been advocated, for instance, by Box (1980). A probability
model is considered as discredited by the data if one observes an
event that is deemed very unlikely by the model. Such model checks
are controversial from a Bayesian perspective. Methods that determine
whether actual data lie in the tail of a model’s data distribution potentially
favor alternatives that make unreasonably diffuse predictions. Nevertheless,
posterior predictive checks have become a valuable tool in applied
Bayesian analysis though they have not been used much in the context of
DSGE models. An introduction to Bayesian model checking can be found,
for instance, in books by Gelman et al. (1995), Lancaster (2004), and
Geweke (2005).
Let Y
rep
be a sample of observations of length T that we could
have observed in the past or that we might observe in the future. We
can derive the predictive distribution of Y
rep
given the current state of
knowledge:
p(Y
rep
 Y ) =
_
p(Y
rep
 0)p(0  Y )d0. (44)
Let h(Y ) be a test quantity that reﬂects an aspect of the data that we
want to examine. A quantitative model check can be based on Bayesian
pvalues. Suppose that the test quantity is univariate, nonnegative, and
has a unimodal density. Then one could compute probability of the tail
144 S. An and F. Schorfheide
event by
_
]h(Y
rep
) ≥ h(Y )]p(Y
rep
 Y )dY
rep
, (45)
where ]x ≥ a] is the indicator function that is 1 if x ≥ : and zero
otherwise. A small tail probability can be viewed as evidence against model
adequacy.
Rather than constructing numerical approximations of the tail
probabilities for univariate functions h(·), we use a graphical approach to
illustrate the model checking procedure in the context of model
1
(L). We
use the following data transformations: the correlation between inﬂation
and lagged interest rates, lagged inﬂation and current interest rates,
output growth and lagged output growth, and output growth and interest
rates. To obtain draws from the posterior predictive distribution of h(Y
rep
)
we take every 5,000th parameter draw of 0
(s)
generated with the RWM
algorithm, simulate a sample Y
rep
of 80 observations
9
from the DSGE model
conditional on 0
(s)
, and calculate h(Y
rep
).
We consider two cases: in the case of no misspeciﬁcation, depicted in
the left panels of Figure 8, the posterior is constructed based on data set
1
(L), whereas under misspeciﬁcation, illustrated in the two right panels
of Figure 8, the posterior is obtained from data set
5
(L). Each panel
of the ﬁgure depicts 200 draws from the posterior predictive distribution
in bivariate scatter plots. Moreover, the intersections of the solid lines
indicate the actual values h(Y ). Since
1
(L) was generated from model
1
(L), whereas
5
(L) was not, we would expect the actual values of h(Y )
to lie further in the tails of the posterior predictive distribution in the
right panels than in the left panels. Indeed a visual inspection of the plots
provides some evidence in which dimensions
1
(L) is at odds with data
generated from a model that includes lags of output in the household’s
Euler equation. The observed autocorrelation of output growth in
5
(L) is
much larger and the actual correlation between output growth and interest
rates is much lower than the draws generated from the posterior predictive
distribution.
5.2. Posterior Odds Comparisons of DSGE Models
The Bayesian framework is naturally geared toward the evaluation
of relative model ﬁt. Researchers can place probabilities on competing
models and assess alternative speciﬁcations based on their posterior odds.
An excellent survey on model comparisons based on posterior probabilities
9
To obtain draws from the unconditional distribution of y
t
we initialize s
0
= 0 and generate
180 observations, discarding the ﬁrst 100.
Bayesian Analysis of DSGE Models 145
FIGURE 8 Posterior predictive check for Model
1
(L). We plot 200 draws from the posterior
predictive distribution of various sample moments. Intersections of solid lines signify the observed
sample moments.
can be found in Kass and Rafterty (1995). For concreteness, suppose in
addition to the speciﬁcation
1
(L) we consider a version of the New
Keynesian model, denoted by
3
(L) in which prices are nearly ﬂexible,
that is, « = 5. Moreover, there is a model
4
(L), according to which the
central bank does not respond to output at all and ç
2
= 0. If we are willing
to place prior probabilities ¬
i,0
on the three competing speciﬁcations then
posterior model probabilities can be computed by
¬
i,T
=
¬
i,0
p(Y 
i
)
j =1,3,4
¬
j ,0
p(Y 
j
)
, i = 1, 3, 4. (46)
The key object in the calculation of posterior probabilities is the marginal
data density p(Y 
i
), which is deﬁned in (39).
Posterior model probabilities can be used to weigh predictions from
different models. In many instances, researchers take a shortcut and
use posterior probabilities to justify the selection of a particular model
speciﬁcation. All subsequent analysis is then conditioned on the chosen
model. It is straightforward to verify that under a 0–1 loss function (the
loss attached to choosing the wrong model is one), the optimal decision
is to select the highest posterior probability model. This 0–1 loss function
is an attractive benchmark in situations in which the researcher believes
that all the important models are included in the analysis. However, even
in situations in which all the models under consideration are misspeciﬁed,
146 S. An and F. Schorfheide
the selection of the highest posterior probability model has some desirable
properties. It has been shown under various regularity conditions, e.g.,
Phillips (1996) and FernándezVillaverde and RubioRamírez (2004), that
posterior odds (or their large sample approximations) asymptotically favor
the DSGE model that is closest to the ‘true’ data generating process in the
KullbackLeibler sense. Moreover, since the logmarginal data density can
be rewritten as
lnp(Y  ) =
T
t =1
lnp(y
t
 Y
t −1
, )
=
T
t =1
ln
__
p(y
t
 Y
t −1
, 0, )p(0  Y
t −1
, )d0
_
, (47)
where lnp(Y  ) can be interpreted as a predictive score (Good,
1952) and the model comparison based on posterior odds captures the
relative onestepahead predictive performance. The practical difﬁculty
in implementing posterior odds comparisons is the computation of the
marginal data density. We will subsequently consider two numerical
approaches: Geweke’s (1999b) modiﬁed harmonic mean estimator and
Chib and Jeliazkov’s (2001) algorithm.
Harmonic mean estimators are based on the identity
1
p(Y )
=
_
f (0)
(0  Y )p(0)
p(0  Y )d0, (48)
where f (0) has the property that
_
f (0)d0 = 1 (see Gelfand and Dey,
1994). Conditional on the choice of f (0) an obvious estimator is
ˆ
p
G
(Y ) =
_
1
n
sim
n
sim
s=1
f (0
(s)
)
(0
(s)
 Y )p(0
(s)
)
_
−1
, (49)
where 0
(s)
is drawn from the posterior p(0  Y ). To make the numerical
approximation efﬁcient, f (0) should be chosen so that the summands are
of equal magnitude. Geweke (1999b) proposed to use the density of a
truncated multivariate normal distribution,
f (0) = t
−1
(2¬)
−d,2
 V
0

−1,2
exp
_
−0.5(0 −
¯
0)
V
−1
0
(0 −
¯
0)
_
×
_
(0 −
¯
0)
V
−1
0
(0 −
¯
0) ≤ F
−1
,
2
d
(t)
_
. (50)
Here
¯
0 and V
0
are the posterior mean and covariance matrix computed
from the output of the posterior simulator, d is the dimension of the
Bayesian Analysis of DSGE Models 147
parameter vector, F
,
2
d
is the cumulative density function of a ,
2
random
variable with d degrees of freedom, and t ∈ (0, 1). If the posterior of 0
is in fact normal then the summands in (49) are approximately constant.
Chib and Jeliazkov (2001) use the following equality as the basis for their
estimator of the marginal data density:
p(Y ) =
(0  Y )p(0)
p(0  Y )
. (51)
The equation is obtained by rearranging the Bayes theorem and has
to hold for all 0. While the numerator can be easily computed, the
denominator requires a numerical approximation. Thus,
ˆ
p
CS
(Y ) =
(
˜
0  Y )p(
˜
0)
ˆ
p(
˜
0  Y )
, (52)
where we replaced the generic 0 in (51) by the posterior mode
˜
0. Within
the RWM Algorithm denote the probability of moving from 0 to 6 by
:(0, 6  Y ) = min]1, r (0, 6  Y )], (53)
where r (0, 6  Y ) was in the description of the algorithm. Moreover, let
q(0,
˜
0  Y ) be the proposal density for the transition from 0 to
˜
0. Then the
posterior density at the mode can be approximated by
ˆ
p(
˜
0  Y ) =
1
n
sim
n
sim
s=1
:(0
(s)
,
˜
0  Y )q(0
(s)
,
˜
0  Y )
J
−1
J
j =1
:(
˜
0, 0
(j )
 Y )
, (54)
where ]0
(s)
]
n
sim
s=1
are sampled draws from the posterior distribution with the
RWM algorithm and ]0
(j )
]
J
j =1
are draws from q(
˜
0, 0  Y ) given the posterior
mode value
˜
0.
We ﬁrst use Geweke’s harmonic mean estimator to approximate
the marginal data density associated with
1
(L),
1
(L). The results are
summarized in Figure 9. We calculate marginal data densities recursively
(as a function of the number of MCMC draws) for the four chains that
were initialized at different starting values as discussed in Section 4. For
the output gap rule all four chains lead to estimates of around −196.7.
Figure 10 provides a comparison of Geweke’s versus Chib and Jeliazkov’s
approximation of the marginal data density for the output gap rule
speciﬁcation. A visual inspection of the plot suggests that both estimators
converge to the same limit point. However, the modiﬁed harmonic mean
estimator appears to be more reliable if the number of simulations is small.
148 S. An and F. Schorfheide
FIGURE 9 Data densities from multiple chains – Model
1
(L), Data
1
(L). For each Markov
chain, log marginal data densities are computed recursively with Geweke’s modiﬁed harmonic mean
estimator and plotted as a function of the number of draws.
FIGURE 10 Geweke vs. Chib–Jeliazkov data densities – Model
1
(L), Data
1
(L). Log marginal
data densities are computed recursively with Geweke’s modiﬁed harmonic mean estimator as well
as the Chib–Jeliazkov estimator and plotted as a function of the number of draws.
Bayesian Analysis of DSGE Models 149
TABLE 4 Log marginal data densities based on
1
(L)
Speciﬁcation lnp(Y  ) Bayes factor versus
1
(L)
Benchmark model
1
(L) −196.7 1.00
Model with nearly ﬂexible prices
3
(L) −245.6 exp[48.9]
No policy reaction to output
4
(L) −201.9 exp[5.2]
Notes: The log marginal data densities for the DSGE model speciﬁcations are
computed based on Geweke’s (1999a) modiﬁed harmonic mean estimator.
Based on data set
1
(L) we also estimate speciﬁcation
3
(L) in which
prices are nearly ﬂexible and speciﬁcation
4
(L) in which the central
bank does not respond to output. The model comparison results are
summarized in Table 4. The marginal data density associated with
3
(L)
is −245.6, which translates into a Bayes factor (ratio of posterior odds
to prior odds) of approximately e
49
in favor of
1
(L). Hence, data set
1
(L) provides very strong evidence against ﬂexible prices. Given the
fairly concentrated posterior of « depicted in Figure 1 this result is not
surprising. The marginal data density of model
4
(L) is equal to −201.9
and the Bayes factor of model
1
(L) versus model
4
(L) is ‘only’ e
5
. The
DSGE model implies that when actual output is close to the target ﬂexible
price output, inﬂation will also be close to its target value. Vice versa,
deviations of output from target coincide with deviations of inﬂation from
its target value. This mechanism makes it difﬁcult to identify the policy rule
coefﬁcients and imposing an incorrect value for ç
2
is not particularly costly
in terms of ﬁt.
5.3. Comparison of a DSGE Model with a VAR
The notion of potential misspeciﬁcation of a DSGE model can be
incorporated in the Bayesian framework by including a more general
reference model
0
into the model set. A natural choice of reference
model in dynamic macroeconomics is a VAR, as linearized DSGE models,
at least approximately, can be interpreted as restrictions on a VAR
representation. The ﬁrst step of the comparison is typically to compute
posterior probabilities for the DSGE model and the VAR, which can be
used to detect the presence of misspeciﬁcations. Since the VAR parameter
space is generally much larger than the DSGE model parameter space,
the speciﬁcation of a prior distribution for the VAR parameter requires
careful attention. Possible pitfalls are discussed in Sims (2003). A VAR with
a prior that is very diffuse is likely to be rejected even against a misspeciﬁed
DSGE model. In a more general context this phenomenon is often called
Lindley’s paradox. We will subsequently present a procedure that allows us
to document how the marginal data density of the DSGE model changes as
150 S. An and F. Schorfheide
the crosscoefﬁcient restrictions that the DSGE model imposes on the VAR
are relaxed.
If the data favor the VAR over the DSGE model then it becomes
important to investigate further the deﬁciencies of the structural model.
This can be achieved by comparing the posterior distributions of
interesting population characteristics such as impulse response functions
obtained from the DSGE model and from a VAR representation that does
not dogmatically impose the DSGE model restrictions. We will refer to
the latter benchmark model as DSGEVAR and discuss an identiﬁcation
scheme that allows us to construct structural impulse response functions
for the vector autoregressive speciﬁcation against which the DSGE model
can be evaluated. Building on work by Ingram and Whiteman (1994) the
DSGEVAR approach of Del Negro and Schorfheide (2004) was designed
to improve forecasting and monetary policy analysis with VARs. The
framework has been extended to a model evaluation tool in Del Negro
et al. (2006) and used to assess the ﬁt of a variant of the Smets and Wouters
(2003) model.
To construct a DSGEVAR we proceed as follows. Consider a vector
autoregressive speciﬁcation of the form
y
t
= 4
0
+4
1
y
t −1
+· · · +4
p
y
t −p
+u
t
, [u
t
u
t
] = l. (55)
Deﬁne the k ×1 vector x
t
= [1, y
t −1
, . . . , y
t −p
]
, the coefﬁcient matrix
4 = [4
0
, 4
1
, . . . , 4
p
]
, the T ×n matrices Y and U composed of rows y
t
and
u
t
, and the T ×k matrix X with rows x
t
. Thus the VAR can be written as
Y = X4 +U. Let
D
0
[·] be the expectation under DSGE model and deﬁne
the autocovariance matrices
!
XX
(0) =
D
0
[x
t
x
t
], !
XY
(0) =
D
0
[x
t
y
t
].
A VAR approximation of the DSGE model can be obtained from restriction
functions that relate the DSGE model parameters to the VAR parameters,
4
∗
(0) = !
−1
XX
(0)!
XY
(0), l
∗
(0) = !
YY
(0) −!
YX
(0)!
−1
XX
(0)!
XY
(0). (56)
This approximation is typically not exact because the state–space
representation of the linearized DSGE model generates moving average
terms. Its accuracy depends on the number of lags p, and the magnitude
of the roots of the moving average polynomial. We will document below
that four lags are sufﬁcient to generate a fairly precise approximation of
the model
1
(L).
10
10
In principle one could start from a VARMA model to avoid the approximation error. The
posterior computations, however, would become signiﬁcantly more cumbersome.
Bayesian Analysis of DSGE Models 151
In order to account for potential misspeciﬁcation of the DSGE model
it is assumed that there is a vector 0 and matrices 4
A
and l
A
such that the
data are generated from the VAR in (55) with the coefﬁcient matrices
4 = 4
∗
(0) +4
A
, l = l
∗
(0) +l
A
. (57)
The matrices 4
A
and l
A
capture deviations from the restriction functions
4
∗
(0) and l
∗
(0).
Bayesian analysis of this model requires the speciﬁcation of a prior
distribution for the DSGE model parameters, p(0), and the misspeciﬁcation
matrices. Rather than specifying a prior in terms of 4
A
and l
A
it is
convenient to specify it in terms of 4 and l conditional on 0. We assume
l 0 ∼ W
_
zTl
∗
(0), zT −k, n
_
(58)
4 l, 0 ∼
_
4
∗
(0),
1
zT
_
l
−1
⊗ !
XX
(0)
_
−1
_
,
where W denotes the inverted Wishart distribution.
11
The prior
distribution can be interpreted as a posterior calculated from a sample of
zT observations generated from the DSGE model with parameters 0; see
Del Negro and Schorfheide (2004). It has the property that its density is
approximately proportional to the Kullback–Leibler discrepancy between
the VAR approximation of the DSGE model and the 4l VAR, which is
emphasized in Del Negro et al. (2006). z is a hyperparameter that scales the
prior covariance matrix. The prior is diffuse for small values of z and shifts
its mass closer to the DSGE model restrictions as z −→∞. In the limit the
VAR is estimated subject to the restrictions (56). The prior distribution is
proper provided that zT ≥ k +n.
The joint posterior density of VAR and DSGE model parameters can be
conveniently factorized as
p
z
(4, l, 0  Y ) = p
z
(4, l Y , 0)p
z
(0  Y ). (59)
The zsubscript indicates the dependence of the posterior on the
hyperparameter. It is straightforward to show, e.g., Zeller (1971), that the
posterior distribution of 4 and l is also of the inverted Wishart normal
form:
l Y , 0 ∼ W
_
(1 +z)T
¨
l
b
(0), (1 +z)T −k, n
_
4 Y , l, 0 ∼
_
¨
4
b
(0), l ⊗ (zT!
XX
(0) +X
X)
−1
_
,
(60)
11
Ingram and Whiteman (1994) constructed a VAR prior from a stochastic growth model
to improve the forecast performance of the VAR. However, their setup did not allow for the
computation of a posterior distribution for 0.
152 S. An and F. Schorfheide
where
¨
4
b
(0) and
¨
l
b
(0) are the given by
¨
4
b
(0) =
_
z
1 +z
!
XX
(0) +
1
1 +z
X
X
T
_
−1
_
z
1 +z
!
XY
+
1
1 +z
X
Y
T
_
¨
l
b
(0) =
1
(1 +z)T
(zT!
YY
(0) +Y
Y ) −(zT!
YX
(0) +Y
X)
×(zT!
XX
(0) +X
X)
−1
(zT!
XY
(0) +X
Y )].
Hence the larger the weight z of the prior, the closer the posterior mean
of the VAR parameters is to 4
∗
(0) and l
∗
(0), the values that respect
the crossequation restrictions of the DSGE model. On the other hand,
if z = (n +k),T, then the posterior mean is close to the OLS estimate
(X
X)
−1
X
Y . The marginal posterior density of 0 can be obtained through
the marginal likelihood
p
z
(Y  0) =
zT!
XX
(0) +X
X
−
n
2
(1 +z)T
¨
l
b
(0)
−
(1+z)T−k
2
zT!
XX
(0)
−
n
2
zTl
∗
(0)
−
zT−k
2
×
(2¬)
−nT,2
2
n((1+z)T−k)
2
n
i=1
![((1 +z)T −k +1 −i),2]
2
n(zT−k)
2
n
i=1
![(zT −k +1 −i),2]
.
A derivation is provided in Del Negro and Schorfheide (2004). The paper
also shows that in large samples the resulting estimator of 0 can be
interpreted as a Bayesian minimum distance estimator that projects the
VAR coefﬁcient estimates onto the subspace generated by the restriction
functions (56).
Since the empirical performance of the DSGEVAR procedure crucially
depends on the weight placed on the DSGE model restrictions, it is
important to consider a datadriven procedure to determine z. A natural
criterion for the choice of z in a Bayesian framework is the marginal data
density
p
z
(Y ) =
_
p
z
(Y  0)p(0)d0. (61)
For computational reasons we restrict the hyperparameter to a ﬁnite grid
A. If one assigns equal prior probability to each grid point then the
normalized p
z
(Y )’s can be interpreted as posterior probabilities for z.
Del Negro et al. (2006) emphasize that the posterior of z provides a
measure of ﬁt for the DSGE model: high posterior probabilities for large
Bayesian Analysis of DSGE Models 153
values of z indicate that the model is well speciﬁed and a lot of weight
should be placed on its implied restrictions. Deﬁne
ˆ
z = argmax
z∈A
p
z
(Y ). (62)
If p
z
(Y ) peaks at an intermediate value of z, say between 0.5 and 2,
then a comparison between DSGEVAR(
ˆ
z) and DSGE model impulse
responses can yield important insights about the misspeciﬁcation of the
DSGE model.
An impulse response function comparison requires the identiﬁcation
of structural shocks in the context of the VAR. So far, the VAR given in (55)
has been speciﬁed in terms of reduced form disturbances u
t
. According to
the DSGE model, the onestepahead forecast errors u
t
are functions of the
structural shocks e
t
, which we represent by
u
t
= l
tr
De
t
. (63)
l
tr
is the Cholesky decomposition of l, and D is an orthonormal
matrix that is not identiﬁable based on the likelihood function associated
with (55). Del Negro and Schorfheide (2004) proposed to construct D as
follows. Let A
0
(0) be the contemporaneous impact of e
t
on y
t
according to
the DSGE model. Using a QR factorization, the initial response of y
t
to the
structural shocks can be can be uniquely decomposed into
_
cy
t
ce
t
_
DSGE
= A
0
(0) = l
∗
tr
(0)D
∗
(0), (64)
where l
∗
tr
(0) is lower triangular and D
∗
(0) is orthonormal. The initial
impact of e
t
on y
t
in the VAR, on the other hand, is given by
_
cy
t
ce
t
_
VAR
= l
tr
D. (65)
To identify the DSGEVAR, we maintain the triangularization of its
covariance matrix l and replace the rotation D in (65) with the function
D
∗
(0) that appears in (64). The rotation matrix is chosen so that in absence
of misspeciﬁcation the DSGE’s and the DSGEVAR’s impulse responses
to all shocks approximately coincide. To the extent that misspeciﬁcation
is mainly in the dynamics, as opposed to the covariance matrix of
innovations, the identiﬁcation procedure can be interpreted as matching,
at least qualitatively, the shortrun responses of the VAR with those from
the DSGE model. The estimation of the DSGEVAR can be implemented
as follows.
154 S. An and F. Schorfheide
MCMC Algorithm for DSGEVAR
1. Use the RWM algorithm to generate draws 0
(s)
from the marginal
posterior distribution p
z
(0  Y ).
2. Use Geweke’s modiﬁed harmonic mean estimator to obtain a numerical
approximation of p
z
(Y ).
3. For each draw 0
(s)
generate a pair 4
(s)
, l
(s)
, by sampling from the
W − distribution. Moreover, compute the orthonormal matrix
D
(s)
= D
∗
(0) as described above.
We now implement the DSGEVAR procedure for
1
(L) based on
artiﬁcially generated data. All the results reported subsequently are based
on a DSGEVAR with p = 4 lags. The ﬁrst step of our analysis is to construct
a posterior distribution for the parameter z. We assume that z lies on the
grid A = ].25, .5, .75, 1, 5, ∞] and assign equal probabilities to each grid
point. For z = ∞ the misspeciﬁcation matrices 4
A
and l
A
are zero and we
estimate the VAR by imposing the restrictions 4 = 4
∗
(0) and l = l
∗
(0).
Table 5 reports log marginal data densities for the DSGEVAR as a
function of z for data sets
1
(L) and
5
(L). The ﬁrst row reports the
marginal data density associated with the state space representation of
the DSGE model, whereas the second row corresponds to the DSGE
VAR(∞). If the VAR approximation of the DSGE model were exact, then
the values in the ﬁrst and second row of Table 5 would be identical. For
data set
1
(L), which has been directly generated from the state space
representation of
1
(L), the two values are indeed quite close. Moreover,
the marginal data densities are increasing as a function of z, indicating that
there is no evidence of misspeciﬁcation and that it is best to dogmatically
impose the DSGE model restrictions.
With respect to data set
5
(L) the DSGE model is misspeciﬁed. This
misspeciﬁcation is captured in the inverse Ushape of the marginal data
density function, which is representative for the shapes found in empirical
applications in Del Negro and Schorfheide (2004) and Del Negro et al.
(2006). The value z = 1.00 has the highest likelihood. The marginal data
TABLE 5 Log marginal data densities for
1
(L) DSGEVARs
Speciﬁcation
1
(L)
5
(L)
DSGE Model −196.66 −245.62
DSGEVAR z = ∞ −196.88 −244.54
DSGEVAR z = 5.00 −198.87 −241.95
DSGEVAR z = 1.00 −206.57 −238.59
DSGEVAR z = .75 −209.53 −239.40
DSGEVAR z = .50 −215.06 −241.81
DSGEVAR z = .25 −231.20 −253.61
Notes: See Table 4.
Bayesian Analysis of DSGE Models 155
densities drop substantially for z ≥ 5 and z ≤ 0.5. In order to assess the
nature of the misspeciﬁcation for
5
(L) we now turn to the impulse
response function comparison.
In principle, there are three different sets of impulse responses to
be compared: responses from the state–space representation of the DSGE
model, the z = ∞ and the z =
ˆ
z DSGEVAR. Moreover, these impulse
responses can either be compared based on the same parameter values
0 or their respective posterior estimates of the DSGE model parameters.
Figure 11 depicts posterior mean impulse responses for the state–space
representation of the DSGE model as well as the DSGEVAR(∞). In both
FIGURE 11 Impulse responses, DSGE, and DSGEVAR(z = ∞) – Model
1
(L), Data
5
(L).
DSGE model responses computed from state–space representation: posterior mean (solid); DSGE
VAR(z = ∞) responses: posterior mean (dashed) and pointwise 90% probability bands (dotted).
156 S. An and F. Schorfheide
cases we use the same posterior distribution of 0, namely, the one obtained
from the DSGEVAR(∞). The discrepancy between the posterior mean
responses (solid and dashed lines) indicates the magnitude of the error
that is made by approximating the state–space representation of the DSGE
model by a fourthorder VAR. Except for the response of output to the
government spending shock, the DSGE and DSGEVAR(∞) responses are
virtually indistinguishable, indicating that the approximation error is small.
The (dotted) bands in Figure 11 depict pointwise 90% probability intervals
for the DSGEVAR(∞) and reﬂect posterior parameter uncertainty with
respect to 0.
To assess the misspeciﬁcation of the DSGE model we compare posterior
mean responses of the z = ∞ (dashed) and
ˆ
z (solid) DSGEVAR in
Figure 12. Both sets of responses are constructed from the
ˆ
z posterior
of 0. The (dotted) 90% probability bands reﬂect uncertainty with respect
to the discrepancy between the z = ∞and
ˆ
z responses. A brief description
of the computation can clarify the interpretation. For each draw of 0
from the
ˆ
z DSGEVAR posterior we compute the
ˆ
z and the z = ∞
response functions, calculate their differences, and construct probability
intervals of the differences. To obtain the dotted bands in the ﬁgure,
we take the upper and lower bound of these probability intervals and
shift them according to the posterior mean of the z = ∞ response.
Roughly speaking, the ﬁgure answers the question: to what extent do the
ˆ
z
estimates of the VAR coefﬁcients deviate from the DSGE model restriction
functions 4
∗
(0) and l
∗
(0). The discrepancy, however, is transformed from
the 4–l space into impulse response functions because they are easier
to interpret.
While the relaxation of the DSGE model restrictions has little effect on
the propagation of the technology shock, the inﬂation and interest rate
responses to a monetary policy shock are markedly different. According to
the VAR approximation of the DSGE model, inﬂation returns quickly to
its steadystate level after a contractionary policy shock. The DSGEVAR(
ˆ
z)
response of inﬂation, on the other hand, has a slight hump shape, and the
reversion to the steadystate is much slower. Unlike in the DSGEVAR(∞),
the interest rate falls below steadystate in period four and stays negative
for several periods.
In a general equilibrium model a misspeciﬁcation of the households’
decision problem can have effects on the dynamics of all the endogenous
variables. Rather than looking at the dynamic responses of the endogenous
variables to the structural shocks, we can also ask to what extent are the
optimality conditions that the DSGE model imposes satisﬁed by the DSGE
VAR(
ˆ
z) responses. According to the loglinearized DSGE model output,
inﬂation, and interest rates satisfy the following relationships in response
Bayesian Analysis of DSGE Models 157
FIGURE 12 Impulse responses, DSGEVAR(z = ∞), and DSGEVAR(z = 1) – Model
1
(L), Data
5
(L). DSGEVAR(z = ∞) posterior mean responses (dashed), DSGEVAR(z = 1) posterior mean
responses (solid). Pointwise 90% probability bands (dotted) signify shifted probability intervals for
the difference between z = ∞ and z = 1 responses.
to the shock e
R,t
:
0 = ˆ y
t
−
¨
t
[y
t +1
] +
1
t
(
¨
R
t
−
¨
t
[¬
t +1
]) (66)
0 = ˆ ¬
t
−[
t
[ ˆ ¬
t +1
] −«ˆ y
t
(67)
e
R,t
=
¨
R
t
−j
R
¨
R
t −1
−(1 −j
R
)ç
1
ˆ ¬
t
−(1 −j
R
)ç
2
ˆ y
t
(68)
Figure 13 depicts the path of the righthand side of Equations (66) to (68)
in response to a monetary policy shock. Based on draws from the joint
posterior of the DSGE and VAR parameters we can ﬁrst calculate identiﬁed
158 S. An and F. Schorfheide
FIGURE 13 Impulse responses, DSGEVAR(z = ∞), and DSGEVAR(z = 1) – Model
1
(L), Data
5
(L). DSGE model responses: posterior mean (dashed); DSGEVAR responses: posterior mean
(solid) and pointwise 90% probability bands (dotted).
VAR responses to obtain the path of output, inﬂation, and interest rate,
and in a second step use the DSGE model parameters to calculate the
wedges in the Euler equation, the price setting equation, and the monetary
policy rule. The dashed lines show the posterior mean responses according
to the state–space representation of the DSGE model, and the solid lines
depict DSGEVAR responses. The top three panels of Figure 13 show that
both for the DSGE model as well as the DSGEVAR(∞) the three equations
are satisﬁed. The bottom three panels of the ﬁgure are based on the DSGE
VAR(
ˆ
z), which relaxes the DSGE model restrictions. While the price setting
relationship and the monetary policy rule restriction seem to be satisﬁed,
at least for the posterior mean, Panel (2,1) indicates a substantial violation
of the consumption Euler equation. This ﬁnding is encouraging for the
evaluation strategy since the data set
5
(L) was indeed generated from a
model with a modiﬁed Euler equation.
While this section focused mostly on the assessment of a single DSGE
model, Del Negro et al. (2006) use the DSGEVAR framework also to
compare multiple DSGE models. Schorfheide (2000) proposed a loss
functionbased evaluation framework for (multiple) DSGE models that
Bayesian Analysis of DSGE Models 159
augments potentially misspeciﬁed DSGE models with a more general
reference model, constructs a posterior distribution for population
characteristics of interest such as autocovariances and impulse responses,
and then examines the ability of the DSGE models to predict the
population characteristics. This prediction is evaluated under a loss
function and the risk is calculated under an overall posterior distribution
that averages the predictions of the DSGE models and the reference
model according to their posterior probabilities. This lossfunctionbased
approach nests DSGE model comparisons based on marginal data densities
as well as assessments based on a comparison of model moments to sample
moments, popularized in the calibration literature, as special cases.
6. NONLINEAR METHODS
For a nonlinear/nonnormal state–space model, the linear Kalman
ﬁlter cannot be used to compute the likelihood function. Instead,
numerical methods have to be used to integrate the latent state vector. The
evaluation of the likelihood function can be implemented with a particle
ﬁlter, also known as a sequential Monte Carlo ﬁlter. Gordon et al. (1993)
and Kitagawa (1996) are early contributors to this literature. Arulampalam
et al. (2002) provide an excellent survey. In economics, the particle ﬁlter
has been applied to analyze the stochastic volatility models by Pitt and
Shephard (1999) and Kim et al. (1998). Recently FernándezVillaverde
and RubioRamírez (2005) use the ﬁlter to construct the likelihood for
DSGE model solved with a projection method. We follow their approach
and use the particle ﬁlter for DSGE model
1
solved with a secondorder
perturbation method. A brief description of the procedure is given below.
We use Y
t
to denote the t ×n matrix with rows y
t
, t = 1, . . . , t. The vector
s
t
has been deﬁned in Section 2.4.
Particle Filter
1) Initialization: Draw N particles s
i
0
, i = 1, . . . , N, from the initial
distribution p(s
0
 0). By induction, in period t we start with the particles
_
s
i
t −1
_
N
i=1
, which approximate p
_
s
t −1
 Y
t −1
, 0
_
.
2) Prediction: Draw onestepahead forecasted particles
_
˜ s
i
t
_
N
i=1
from
p
_
s
t
 Y
t −1
, 0
_
. Note that
p
_
s
t
 Y
t −1
, 0
_
=
_
p (s
t
 s
t −1
, 0) p
_
s
t −1
 Y
t −1
, 0
_
ds
t −1
≈
1
N
N
i=1
p
_
s
t
 s
i
t −1
, 0
_
.
Hence one can draw N particles from p
_
s
t
 Y
t −1
, 0
_
by generating one
particle from p
_
s
t
 s
i
t −1
, 0
_
for each i.
160 S. An and F. Schorfheide
3) Filtering: The goal is to approximate
p
_
s
t
 Y
t
, 0
_
=
p
_
y
t
 s
t
, 0
_
p
_
s
t
 Y
t −1
, 0
_
p
_
y
t
 Y
t −1
, 0
_ , (69)
which amounts to updating the probability weights assigned to the particles
_
˜ s
i
t
_
N
i=1
. We begin by computing the nonnormalized importance weights
˜ ¬
i
t
= p
_
y
t
 ˜ s
i
t
, 0
_
. The denominator in (69) can be approximated by
p
_
y
t
 Y
t −1
, 0
_
=
_
p
_
y
t
 s
t
, 0
_
p
_
s
t
 Y
t −1
, 0
_
ds
t
≈
1
N
N
i=1
˜ ¬
i
t
. (70)
Now deﬁne the normalized weights
¬
i
t
=
˜ ¬
i
t
N
j =1
˜ ¬
j
t
and note that the importance sampler
_
˜ s
i
t
, ¬
i
t
_
N
i=1
approximates
12
the
updated density p
_
s
t
 Y
t
, 0
_
.
4) Resampling: We now generate a new set of particles
_
s
i
t
_
N
i=1
by
resampling with replacement N times from an approximate discrete
representation of p
_
s
t
 Y
t
, 0
_
given by
_
˜ s
i
t
, ¬
i
t
_
N
i=1
so that
Pr
_
s
i
t
= ˜ s
i
t
_
= ¬
i
t
, i = 1, . . . , N.
The resulting sample is in fact an iid sample from the discretized density of
p(s
t
 Y
t
, 0) and hence is equally weighted.
5) Likelihood Evaluation: According to (70) the log likelihood function
can be approximated by
ln(0  Y
T
) = ln p(y
1
 0) +
T
t =2
lnp
_
y
t
 Y
t −1
, 0
_
≈
N
t =1
_
1
N
N
i=1
˜ ¬
i
t
_
.
To compare results from ﬁrst and secondorder accurate DSGE model
solutions we simulated artiﬁcial data of 80 observations from the quadratic
approximation of model
1
with parameter values reported in Table 3
under the heading
1
(Q). In the remainder of this section we will refer
to these parameters as “true” values as opposed to “pseudotrue” values to
12
The sequential importance sampler (SIS) is a Monte Carlo method which utilized this aspect,
but it is well documented that it suffers from a degeneracy problem, where after a few iterations,
all but one particle will have negligible weight.
Bayesian Analysis of DSGE Models 161
be deﬁned later. These true parameter values by and large resemble the
parameter values that have been used to generate data from the loglinear
DSGE model speciﬁcations. However, there are some exceptions. The
degree of imperfect competition, v, is set to 0.1, which implies a steadystate
markup of 11% that is consistent with the estimates of Basu (1995). 1,g
is chosen as .85, which implies that the steadystate consumption amounts
to 85% of output. The slope coefﬁcient of the Phillips curve, «, is chosen
to be .33, which implies less price stickiness than in the linear case. The
implied quadratic adjustment cost coefﬁcient, [, is 53.68. We also make
monetary policy less responsive to the output gap by setting ç
2
= .125. ¸
(Q)
,
r
(A)
, and ¬
(A)
are chosen as .55, 1.0, and 3.2 to match the average output
growth rate, interest rate, and inﬂation between the artiﬁcial data and the
real U.S. data. These values are different from the mean of the historical
observations because in the nonlinear version of the DSGE model means
differ from steady states. Other exogenous process parameters, j
g
, j
z
, o
R
,
o
g
, and o
z
, are chosen so that the second moments are matched to the U.S.
data. For computational convenience, a measurement error is introduced
to each measurement equation, whose standard deviation is set as 20% of
that of each observation.
13
6.1. Conﬁguration of the Particle Filter
There are several issues concerning the practical implementation of
the particle ﬁlter. First, we need a scheme to draw from the initial state
distribution, p (s
0
 0). In the linear case, it is straightforward to calculate the
unconditional distribution of s
t
associated with the vector autoregressive
representation (33). For the secondorder approximation we rewrite the
state transition equation (37) as follows. Decompose s
t
= [x
t
, ¸
t
]
, where ¸
t
is composed of the exogenous processes e
R,t
, ˆ g
t
, and ˆ z
t
. The law of motion
for the endogenous state variables x
j ,t
can be expressed as
x
j ,t
= +
(0)
j
++
(1)
j
w
t
+w
t
+
(2)
j
w
t
(71)
where w
t
= [x
t −1
, ¸
t
]
. We generate x
0
by drawing ¸
0
from its unconditional
distribution (the law of motion for ¸
t
is linear) and setting x
−1
= x, where
x is the steadystate of x
t
.
Second, we have to choose the number of particles. For a good
approximation of the prediction error distribution, it is desirable to
have many particles, especially enough particles to capture the tails of
13
Without the measurement error two complications arise: since p(y
t
 s
t
) degenerates to a point
mass and the distribution of s
t
is that of a multivariate noncentral ,
2
the evaluation of p(y
t
 Y
t −1
)
becomes difﬁcult. Moreover, the computation of p(s
t
 Y
t
) requires the solution of a system of
quadratic equations.
162 S. An and F. Schorfheide
p(s
t
 Y
t
). Moreover, the number of particles affects the performance
of the resampling algorithm. If the number of particles is too small,
the resampling will not work well. In our implementation, the stratiﬁed
resampling scheme proposed by Kitagawa (1996) is applied. It is optimal
in terms of variance in the class of unbiased resampling schemes. If the
measurement errors in the conditional distribution p(y
t
 s
t
) are small,
more particles are needed to obtain an accurate approximation of the
likelihood function and to ensure that the posterior weights ¬
i
t
do not
assign probability one to a single particle. In our application, we found
that 40,000 particles were enough to get stable evalutaions of the likelihood
given the size of the measurement errors.
Even though the estimation procedure involves extensive random
sampling, the particle ﬁlter can be readily implemented on a good desktop
computer. We implement most of the procedure in MATLAB (2005) so
that we can exploit the symbolic toolbox to solve the DSGE model, but it
takes too much time to evaluate the likelihood function using the particle
ﬁlter. The ﬁltering step is implemented as FORTRAN mex library, so we
can call it as a function in MATLAB. Based on a sample of 80 observations
we can generate 1,000 draws with 40,000 particles in 1 h and 40 min (6.0 sec
per draw) on the AMD64 3000+ machine with 1 GB RAM. Linear methods
are more than 200 times faster: 1,000 draws are generated in 24 sec in our
Matlab routines and 3 sec in our GAUSS routines.
6.2. A Look at the Likelihood Function
We will begin our analysis by studying proﬁles of likelihood functions.
For brevity, we will refer to the likelihood obtained from the ﬁrstorder
(secondorder) accurate solution of the DSGE model as linear (quadratic)
likelihood. It is natural to evaluate the quadratic likelihood function in the
neighborhood of the true parameter values given in Table 3. However, we
evaluate the linear likelihood around the pseudotrue parameter values,
which are obtained by ﬁnding the mode of the linear likelihood using
3,000 observations. The parameter values for v and 1,g are ﬁxed at their
true values because they do not enter the linear likelihood. Recall that we
had introduced a reducedform Phillips curve parameter « in (32) because
v and [ cannot be identiﬁed with the loglinearized DSGE model.
Figure 14 shows the loglikelihood proﬁles along each structural
parameter dimension. Two features are noticeable in this comparison.
First, the quadratic loglikelihood peaks around the true parameter values,
while the linear loglikelihood attains its maximum around the pseudotrue
parameter values. The differences between the peaks are most pronounced
for the steadystate parameters r
(A)
and ¬
(A)
. While in the linearized
model steady states and longrun averages coincide, they differ if the
model is solved by a secondorder approximation. For some parameters
Bayesian Analysis of DSGE Models 163
FIGURE 14 Linear vs. quadratic approximations: likelihood proﬁles. Data Set
1
(Q). Likelihood
proﬁle for each parameter:
1
(L)/Kalman ﬁlter (dashed) and
1
(Q)/particle ﬁlter (solid). 40,000
particles are used. Vertical lines signify true (dotted) and pseudotrue (dasheddotted) values.
164 S. An and F. Schorfheide
FIGURE 15 Linear vs. quadratic approximations: Likelihood contours. Data set
1
(Q). Contours
of likelihood (solid) for
1
(L) and
1
(Q). 40,000 particles are used. Large dot indicates true and
pseudotrue parameter value. « is constant along dashed lines.
the quadratic loglikelihood function is more concave than the linear
one, which implies that the nonlinear approach is able to extract more
information on the structural parameters from the data. For instance,
it appears that the monetary policy parameter such as ç
1
can be more
precisely estimated with the quadratic approximation.
As noted before, the quadratic approximation can identify some
structural parameters that are unidentiﬁable under the loglinearized
model. g does not enter the linear version of the model and hence
the loglikelihood proﬁle along this dimension is ﬂat. In the quadratic
approximation, however, the loglikelihood is slightly sloped in 1,g = c,y
dimension. Moreover, the linear likelihood is ﬂat for the values of v and
[ that imply a constant «. This is illustrated in Figure 15, which depicts
the linear and quadratic likelihood contours in v[ space. The contours
of the linear likelihood trace out iso« lines. The right panel of Figure 15
indicates that the iso« lines intersect with the contours of the quadratic
likelihood function, which suggests that v and [ are potentially separately
identiﬁable. However, our sample of 80 observations is too short to obtain
a sharp estimate of the two parameters.
6.3. The Posterior
We now compare the posterior distributions obtained from the linear
and nonlinear analysis. In both cases we use the same prior distribution
reported in Table 3. Unlike in the analysis of the likelihood function we
now substitute the adjustment cost parameter [ with the Phillipscurve
parameter « in the quadratic approximation of the DSGE model. A beta
prior with mean .1 and standard deviation 0.05 is placed on v, which
Bayesian Analysis of DSGE Models 165
roughly covers micro evidences of 10–15% markup. Note that 1 −1,g is
the steadystate government spending–output ratio, and hence a beta prior
with mean 0.85 and standard deviation .1 is used for 1,g .
We use the RWM algorithm to generate draws from the posterior
distributions associated with the linear and quadratic approximations of
the DSGE model. We ﬁrst compute the posterior mode for the linear
speciﬁcation with the additional parameters, v and 1,g , ﬁxed at their true
values. After that, we evaluate the Hessian to be used in the RWM algorithm
at the (linear) mode without ﬁxing v and 1,g , so that the inverse Hessian
reﬂects the prior variance of the additional parameters. This Hessian is
used for both the linear and the nonlinear analysis. We use scaling factors
.5 (linear) and .25 (quadratic) to target an acceptance rate of about 30%.
The Markov chains are initialized in the neighborhood of the pseudotrue
and true parameter values, respectively.
Figures 16 and 17 depict the draws from prior, linear posterior, and
quadratic posterior distribution. 100,000 draws from each distribution are
generated and every 500th draw is plotted. The draws tend to be more
concentrated as we move from prior to posterior. While, for most of our
parameters, linear and quadratic posteriors are quite similar, there are a
few exceptions. The quadratic posterior mean of ç
1
is larger than the
linear posterior mean and closer to the true value. The quadratic posterior
means for the steadystate parameters r
(A)
and ¬
(A)
are also greater than the
corresponding linear posterior means, which is consistent with the positive
difference between “true” and “pseudotrue” values of these parameters.
Now consider the parameter v that determines the demand elasticity
for the differentiated products. Conditional on « this parameter is not
identiﬁable under the linear approximation and hence its posterior is
not updated. The parameter does, however, affect the secondorder terms
that arise in the quadratic approximation of the DSGE model. Indeed,
the quadratic posterior of v is markedly different from the prior as it
concentrates around 0.05. Moreover, there is a clear negative correlation
between « and v according to the quadratic posterior.
Table 6 reports the log marginal data densities for our linear (
1
(L))
and quadratic (
1
(Q)) approximations of the DSGE model based on
Data Set
1
(Q). The log marginal likelihoods are −416.06 for
1
(L)
and −408.09 for
1
(Q), which imply a Bayes factor of about e
8
in favor
of the quadratic approximation. This result suggests that
1
(Q) exhibits
nonlinearities that can be detected by the likelihoodbased estimation
methods. Our simulation results with respect to linear versus nonlinear
estimation are in line with the ﬁndings reported in FernándezVillaverde
and RubioRamírez (2005) for a version of the neoclassical stochastic
growth model. In their analysis the nonlinear solution combined with
the particle ﬁlter delivers a substantially better ﬁt of the model to the
data as measured by the marginal data density, both for simulated as well
166 S. An and F. Schorfheide
FIGURE 16 Posterior draws: linear vs. quadratic approximation I. Data set
1
(Q). 100,000 draws
from the prior and posterior distributions. Every 500th draw is plotted. Intersections of solid and
dotted lines signify true and pseudotrue parameter values. 40,000 particles are used.
Bayesian Analysis of DSGE Models 167
FIGURE 17 Posterior draws: Linear vs. quadratic approximation II. See Figure 16.
168 S. An and F. Schorfheide
TABLE 6 Log marginal data densities based on
1
(Q)
Speciﬁcation lnp(Y 
1
) Bayes factor vesus
1
(Q)
Benchmark model, linear solution
1
(L) −416.06 exp[7.97]
Benchmark model, quadratic solution
1
(Q) −408.09 1.00
Notes: See Table 4.
as actual data. Moreover, the authors point out that the differences in
terms of point estimates, although relatively small in magnitude, may have
important effects on the moments of the model.
7. CONCLUSIONS AND OUTLOOK
There exists by now a large and growing body of empirical work on
the Bayesian estimation and evaluation of DSGE models. This paper has
surveyed the tools used in this literature and illustrated their performance
in the context of artiﬁcial data. While most of the existing empirical work
is based on linearized models, techniques to estimate DSGE models solved
with nonlinear methods have become implementable thanks to advances
in computing technology. Nevertheless, many challenges remain. Model
size and dimensionality of the parameter space pose a challenge for MCMC
methods. Lack of identiﬁcation of structural parameters is often difﬁcult to
detect since the mapping from the structural form of the DSGE model into
a state–space representation is highly nonlinear and creates a challenge
for scientiﬁc reporting as audiences are typically interested in disentangling
information provided by the data from information embodied in the
prior. Model misspeciﬁcation is and will remain a concern in empirical
work with DSGE models despite continuous efforts by macroeconomists
to develop more adequate models. Hence it is important to develop
methods that incorporate potential model misspeciﬁcation in the measures
of uncertainty constructed for forecasts and policy recommendations.
ACKNOWLEDGMENTS
Part of the research was conducted while Schorfheide was visiting
the Research Department of the Federal Reserve Bank of Philadelphia,
for whose hospitality he is grateful. The views expressed here are those
of the authors and do not necessarily reﬂect the views of the Federal
Reserve Bank of Philadelphia or the Federal Reserve System. We thank
Herman van Dijk and Gary Koop for inviting us to write this survey
paper for Econometric Reviews. We also thank our discussants as well as
Marco Del Negro, Jesús FernándezVillaverde, Thomas Lubik, Juan Rubio
Ramírez, Georg Strasser, seminar participants at Columbia University, and
Bayesian Analysis of DSGE Models 169
the participants of the 2006 EABCN Training School in Eltville for helpful
comments and suggestions. Schorfheide gratefully acknowledges ﬁnancial
support from the Alfred P. Sloan Foundation.
REFERENCES
Adolfson, M., Laséen, S., Lindé, J., Villani, M. (2004). Bayesian estimation of an open economy
DSGE model with incomplete passthrough. Journal of International Economics, forthcoming.
Altug, S. (1989). Timetobuild and aggregate ﬂuctuations: some new evidence. International Economic
Review 30(4):889–920.
Anderson, G. (2000). A reliable and computationally efﬁcient algorithm for imposing the saddle
point property in dynamic models. Manuscript, Federal Reserve Board of Governors.
Arulampalam, M. S., Maskell, S., Gordon, N., Clapp, T. (2002). A tutorial on particle ﬁlters
for online nonlinear/nonGaussian Bayesian tracking. IEEE Transactions on Signal Processing
50(2):173–188.
Aruoba, S. B., FernándezVillaverde, J., RubioRamírez, J. F. (2004). Comparing solution methods for
dynamic equilibrium economies. Journal of Economic Dynamics and Control 30(12):2477–2508.
Basu, S. (1995). Intermediate goods and business cycles: implications for productivity and welfare.
American Economic Review 85(3):512–531.
Beyer, A., Farmer, R. (2004). On the indeterminacy of new keynesian economics. ECB Working
Paper 323.
Bierens, H. J. (2007). Econometric analysis of linearized singular dynamic stochastic general
equilibrium models. Journal of Econometrics 136(2):595–627.
Bils, M., Klenow, P. (2004). Some evidence on the importance of sticky prices. Journal of Political
Economy 112(5):947–985.
Binder, M., Pesaran, H. (1997). Multivariate linear rational expectations models: characterization
of the nature of the solutions and their fully recursive computation. Econometric Theory
13(6):877–888.
Blanchard, O. J., Kahn, C. M. (1980). The solution of linear difference models under rational
expectations. Econometrica 48(5):1305–1312.
Box, G. E. P. (1980). Sampling and Bayes inference in scientiﬁc modelling and robustness. Journal
of the Royal Statistical Society Series A 143(4):383–430.
Canova, F. (1994). Statistical inference in calibrated models. Journal of Applied Econometrics
9:S123–S144.
Canova, F. (2004). Monetary policy and the evolution of the US economy: 19482002. Manuscript,
IGIER and Universitat Pompeu Fabra.
Canova, F. (2007). Methods for Applied Macroeconomic Research. NJ: Princeton University Press.
Canova, F., Sala, L. (2005). Back to square one: identiﬁcation issues in DSGE models. Manuscript,
IGIER and Bocconi University.
Chang, Y., Gomes, J., Schorfheide, F. (2002). Learningbydoing as a propagation mechanism.
American Economic Review 92(5):1498–1520.
Chang, Y., Schorfheide, F. (2003). Laborsupply shifts and economic ﬂuctuations. Journal of Monetary
Economics 50(8):1751–1768.
Chib, S., Greenberg, E. (1995). Understanding the metropolishastings algorithm. American
Statistician 49(4):327–335.
Chib, S., Jeliazkov, I. (2001). Marginal likelihood from the metropolishastings output. Journal of the
American Statistical Association 96(453):270–281.
Christiano, L. J. (2002). Solving dynamic equilibrium models by a methods of undetermined
coefﬁcients. Computational Economics 20(1–2):21–55.
Christiano, L. J., Eichenbaum, M. (1992). Current realbusiness cycle theories and aggregate labor
market ﬂuctuations. American Economic Review 82(3):430–450.
Christiano, L. J., Eichenbaum, M., Evans, C. L. (2005). Nominal rigidities and the dynamic effects
of a shock to monetary policy. Journal of Political Economy 113(1):1–45.
Collard, F., Juillard, M. (2001). Accuracy of stochastic perturbation methods: the case of asset pricing
models. Journal of Economic Dynamics and Control 25(6–7):979–999.
170 S. An and F. Schorfheide
Crowder, M. (1988). Asymptotic expansions of posterior expectations, distributions, and densities
for stochastic processes. Annals of the Institute for Statistical Mathematics 40:297–309.
de Walque, G., Wouters, R. (2004). An open economy DSGE model linking the Euro area and the
U.S. economy. Manuscript, National Bank of Belgium.
DeJong, D. N., Ingram, B. F. (2001). The cyclical behavior of skill acquisition. Review of Economic
Dynamics 4(3):536–561.
DeJong, D. N., Ingram, B. F., Whiteman, C. H. (1996). A Bayesian approach to calibration. Journal
of Business and Economic Statistics 14(1):1–9.
DeJong, D. N., Ingram, B. F., Whiteman, C. H. (2000). A Bayesian approach to dynamic
macroeconomics. Journal of Econometrics 98(2):203–223.
Del Negro, M. (2003). Fear of ﬂoating: a structural estimation of monetary policy in Mexico.
Manuscript, Federal Reserve Bank of Atlanta.
Del Negro, M., Schorfheide, F. (2004). Priors from general equilibrium models for VARs.
International Economic Review 45(2):643–673.
Del Negro, M., Schorfheide, F., Smets, F., Wouters, R. (2006). On the ﬁt and forecasting
performance of new keynesian models. Journal of Business and Economic Statistics, forthcoming.
Den Haan, W. J., Marcet, A. (1994). Accuracy in simulation. Review of Economic Studies 61(1):3–17.
Diebold, F. X., Ohanian, L. E., Berkowitz, J. (1998). Dynamic equilibrium economies: a framework
for comparing models and data. Review of Economic Studies 65(3):433–452.
Dridi, R., Guay, A., Renault, E. (2007). Indirect inference and calibration of dynamic stochastic
general equilibrium models. Journal of Econometrics 136(2):397–430.
DYNARE (2005). software available from http://cepremap.cnrs.fr/dynare/.
FernándezVillaverde, J., RubioRamírez, J. F. (2004). Comparing dynamic equilibrium models to
data: aBayesian approach. Journal of Econometrics 123(1):153–187.
FernándezVillaverde, J., RubioRamírez, J. F. (2005). Estimating dynamic equilibrium economies:
linear versus nonlinear likelihood. Journal of Applied Econometrics 20(7):891–910.
Galí, J., Rabanal, P. (2005). Technology shocks and aggregate ﬂuctuations: how well does the RBC
model ﬁt postwar U.S. data. In: Gertler, M., Rogoff, K., eds. NBER Macroeconomics Annual
2004. Cambridge, MA: MIT Press, pp. 225–228.
GAUSS (2004). Software available from http://www.aptech.com/.
Gelfand, A. E., Dey, D. K. (1994). Bayesian model choice: asymptotics and exact calculations. Journal
of the Royal Statistical Society Series B 56(3):501–514.
Gelman, A., Carlin, J., Stern, H., Rubin, D. (1995). Bayesian Data Analysis. New York: Chapman &
Hall.
Geweke, J. (1989). Bayesian inference in econometric models using Monte Carlo integration.
Econometrica 57(6):1317–1339.
Geweke, J. (1999a). Computational experiments and reality. Manuscript, University of Iowa.
Geweke, J. (1999b). Using simulation methods for bayesian econometric models: inference,
development and communication. Econometric Reviews 18(1):1–126.
Geweke, J. (2005). Contemporary Bayesian Econometrics and Statistics. Hoboken: John Wiley & Sons.
Good, I. J. (1952). Rational decisions. Journal of the Royal Statistical Society Series B 14(1):107–114.
Gordon, N. J., Salmond, D. J., Smith, A. F. M. (1993). Novel approach to nonlinear and non
gaussian Bayesian state estimation. IEE ProceedingsF 140(2):107–113.
Hammersley, J., Handscomb, D. (1964). Monte Carlo Methods. New York: John Wiley.
Hansen, L. P., Heckman, J. J. (1996). The empirical foundations of calibration. Journal of Economic
Perspectives 10(1):87–104.
Hastings, W. K. (1970). Monte Carlo sampling methods using markov chain and their applications.
Biometrika 57(1):97–109.
Ingram, B. F., Whiteman, C. M. (1994). Supplanting the Minnesota prior: forecasting
macroeconomic time series using real business cycle model priors. Journal of Monetary
Economics 34(3):497–510.
Ireland, P. N. (2004). A method for taking models to the data. Journal of Economic Dynamics and
Control 28(6):1205–1226.
Jin, H.H., Judd, K. L. (2002). Perturbation methods for general dynamic stochastic models.
Manuscript, Hoover Institute.
Judd, K. L. (1998). Numerical Methods in Economics. Cambridge, MA: MIT Press.
Bayesian Analysis of DSGE Models 171
Justiniano, A., Preston, B. (2004). Small open economy Dsge models – speciﬁcation, estimation, and
model ﬁt. Manuscript, International Monetary Fund and Columbia University.
Kass, R. E., Rafterty, A. E. (1995). Bayes factors. Journal of the American Statistical Association 90(430):
773–795.
Kim, J. (2000). Constructing and estimating a realistic optimizing model of monetary policy. Journal
of Monetary Economics 45(2):329–359.
Kim, J., Kim, S. H. (2003). Spurious welfare reversal in international business cycle models. Journal
of International Economics 60(2):471–500.
Kim, J., Kim, S. H., Schaumburg, E., Sims, C. A. (2005). Calculating and using second order accurate
solutions of discrete time dynamic equilibirum models. Manuscript, Princeton University.
Kim, J.Y. (1998). Large sample properties of posterior densities, bayesian information criterion, and
the likelihood principle in nonstationary time series models. Econometrica 66(2):359–380.
Kim, K., Pagan, A. (1995). Econometric analysis of calibrated macroeconomic models. In: Pesaran,
H., Wickens, M., eds. Handbook of Applied Econometrics: Macroeconomics. Oxford: Basil Blackwell,
pp. 356–390.
Kim, S., Shephard, N., Chib, S. (1998). Stochastic volatility: likelihood inference and comparison
with ARCH Models. Review of Economic Studies 65(3):361–393.
King, R. G., Watson, M. W. (1998). The solution of singluar linear difference systems under rational
expectations. International Economic Review 39(4):1015–1026.
Kitagawa, G. (1996). Monte Carlo ﬁlter and smoother for nongaussian nonlinear state space
models. Journal of Computational and Graphical Statistics 5(1):1–25.
Klein, P. (2000). Using the generalized Schur form to solve a multivariate linear rational
expectations model. Journal of Economic Dynamics and Control 24(10):1405–1423.
Kydland, F. E., Prescott, E. C. (1982). Time to build and aggregate ﬂuctuations. Econometrica
50(6):1345–1370.
Kydland, F. E., Prescott, E. C. (1996). The computational experiment: an econometric tool. Journal
of Economic Perspectives 10(1):69–85.
Laforte, J.P. (2004). Comparing monetary policy rules in an estimated equilibrium model of the
US economy. Manuscript, Federal Reserve Board of Governors.
Lancaster, A. (2004). An Introduction to Modern Bayesian Econometrics. Oxford: Blackwell Publishing.
LandonLane, J. (1998). Bayesian Comparison of Dynamic Macroeconomic Models. Ph.D.
dissertation, University of Minnesota.
Leeper, E. M., Sims, C. A. (1994). Toward a modern macroeconomic model usable for policy
analysis. In: Stanley, F., Rotemberg, J. J., eds. NBER MAcroeconomics Annual 1994. Cambridge,
MA: MIT Press, pp. 81–118.
Levin, A. T., Alexei O., Williams, J. C., Williams, N. (2006). Monetary policy under uncertainty in
microfounded macroeconometric models. In: Mark G., Rogoff, K., eds. NBER Macroeconomics
Annual 2005. Cambridge, MA: MIT Press, pp. 229–287.
Lubik, T. A., Schorfheide, F. (2003). Do central banks respond to exchange rate ﬂuctuation – a
structural investigation. Manuscript, University of Pennsylvania.
Lubik, T. A., Schorfheide, F. (2004). Testing for indeterminacy: an application to US monetary
policy. American Economic Review 94(1):190–217.
Lubik, T. A., Schorfheide, F. (2006). A Bayesian look at new open economy macroeconomics. In:
Mark, G., Rogoff, K., eds. NBER Macroeconomics Annual 2005. Cambridge, MA: MIT Press,
pp. 313–366.
MATLAB (2005). Version 7. Software available from http://www.mathworks.com/.
McGrattan, E. R. (1994). The macroeconomic effects of distortionary taxation. Journal of Monetary
Economics 33(3):573–601.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., Teller, E. (1953). Equation of
state calculations by fast computing machines. Journal of Chemical Physics 21(6):1087–1092.
Onatski, A., Williams, N. (2004). Empirical and policy performance of a forwardlooking monetary
model. Manuscript, Princeton University.
Otrok, C. (2001). On measuring the welfare cost of business cycles. Journal of Monetary Economics
47(1):61–92.
Phillips, P. C. B. (1996). Econometric model determination. Econometrica 64(4):763–812.
Pitt, M. K., Shephard, N. (1999). Filtering via simulation: Auxiliary particle ﬁlters. Journal of the
American Statistical Association 94(446):590–599.
172 S. An and F. Schorfheide
Poirier, D. (1998). Revising beliefs in nonidentiﬁed models. Econometric Theory 14(4):483–509.
Rabanal, P., RubioRamírez, J. F. (2005a). Comparing new keynesian models in the Euro area: A
bayesian approach. Federal Reserve Bank of Atlanta Working Paper 2005–8.
Rabanal, P., RubioRamírez, J. F. (2005b). Comparing new keynesian models of the business cycle:
A Bayesian approach. Journal of Monetary Economics 52(6):1152–1166.
Rabanal, P., Tuesta, V. (2006). EuroDollar real exchange rate dynamics in an estimated twocountry
model: What is important and what is not. CEPR Discussion Paper No. 5957.
Robert, C., Casella, G. (1999). Monte Carlo Statistical Methods. New York: Springer Verlag.
Rotemberg, J. J., Woodford, M. (1997). An optimizationbased econometric framework for the
evaluation of monetary policy. In: Bernanke, B. S., Rotemberg, J. J., eds. NBER Macroeconomics
Annual 1997. Cambridge, MA: MIT Press, pp. 297–346.
Sargent, T. J. (1989). Two models of measurements and the investment accelerator. Journal of Political
Economy 97(2):251–287.
SchmittGrohé, S., Uribe, M. (2006). Optimal simple and implementable monetary and ﬁscal rules.
Journal of Monetray Economics. Duke University.
Schorfheide, F. (2000). Loss functionbased evaluation of DSGE models. Journal of Applied Econometrics
15(6):645–670.
Schorfheide, F. (2005). Learning and monetary policy shifts. Review of Economic Dynamics
8(2):392–419.
Sims, C. A. (1996). Macroeconomics and methodology. Journal of Economic Perspectives 10(1):105–120.
Sims, C. A. (2002). Solving linear rational expectations models. Computational Economics 20(1–2):1–20.
Sims, C. A. (2003). Probability models for monetary policy decisions. Manuscript, Princeton
University.
Smets, F., Wouters, R. (2003). An estimated dynamic stochastic general equilibrium model of the
Euro area. Journal of European Economic Association 1(5):1123–1175.
Smets, F., Wouters, R. (2005). Comparing shocks and frictions in US and Euro area business cycles:
a Bayesian DSGE Approach. Journal of Applied Econometrics 20(2):161–183.
Smith, A. A. (1993). Estimating nonlinear timeseries models using simulated vector autoregressions.
Journal of Applied Econometrics 8:S63–S84.
Swanson, E., Anderson, G., Levin, A. (2005). HigherOrder perturbation solutions to dynamic,
discretetime rational expectations models. Manuscript, Federal Reserve Bank of San
Fransisco.
Taylor, J. B., Uhlig, H. (1990). Solving nonlinear stochastic growth models: a comparison of
alternative solution methods. Journal of Business and Economic Statistics 8(1):1–17.
Uhlig, H. (1999). A toolkit for analyzing nonlinear dynamic stochastic models easily. In: Marimón,
R., Scott, A., eds. Computational Methods for the Study of Dynamic Economies. Oxford, UK: Oxford
University Press, pp. 30–61.
Walker, A. M. (1969). On the asymptotic behaviour of posterior distributions. Journal of the Royal
Statistical Society Series B 31(1):80–88.
Watson, M. W. (1993). Measures of ﬁt for calibrated models. Journal of Political Economy
101(6):1011–1041.
White, H. (1982). Maximum likelihood estimation of misspeciﬁed models. Econometrica 50(1):1–25.
Woodford, M. (2003). Interest and Prices: Foundations of a Theory of Monetary Policy. Princeton, NJ:
Princeton University Press.
Zeller, A. (1971). Introduction to Bayesian Inference in Econometrics. New York: John Wiley & Sons.
114
S. An and F. Schorfheide
Subsequently, many authors have developed econometric frameworks that formalize aspects of the calibration approach by taking model misspeciﬁcation explicitly into account. Examples are Smith (1993), Watson (1993), Canova (1994), DeJong et al. (1996), Diebold et al. (1998), Geweke (1999b), Schorfheide (2000), Dridi et al. (2007), and Bierens (2007). At the same time, macroeconomists have improved the structural models and relaxed many of the misspeciﬁed restrictions of the ﬁrst generation of DSGE models. As a consequence, more traditional econometric techniques have become applicable. The most recent vintage of DSGE models is not just attractive from a theoretical perspective but is also emerging as a useful tool for forecasting and quantitative policy analysis in macroeconomics. Moreover, owing to improved time series ﬁt these models are gaining credibility in policymaking institutions such as central banks. This paper reviews Bayesian estimation and evaluation techniques that have been developed in recent years for empirical work with DSGE models.1 We focus on methods that are built around a likelihood function derived from the DSGE model. The econometric analysis has to cope with several challenges, including potential model misspeciﬁcation and identiﬁcation problems. We will illustrate how a Bayesian framework can address these challenges. Most of the techniques described in this article have been developed and applied in other papers. Our contribution is to give a uniﬁed perspective, by applying these methods successively to artiﬁcial data generated from a DSGE model and a VAR. We provide some evidence on the performance of Markov Chain Monte Carlo (MCMC) methods that have been applied to the Bayesian estimation of DSGE models. Moreover, we present new results on the use of ﬁrstorder accurate versus secondorder accurate solutions in the estimation of DSGE models. The paper is structured as follows. Section 2 outlines two versions of a ﬁveequation New Keynesian DSGE model along the lines of Woodford (2003) that differ with respect to the monetary policy rule. This model serves as the foundation for the current generation of largescale models that are used for the analysis of monetary policy in academic and central bank circles. Section 3 discusses some preliminaries, in particular the challenges that have to be confronted by the econometric framework. We proceed by generating several data sets that are used subsequently for model estimation and evaluation. We simulate samples from the ﬁrstorder accurate solution of the DSGE model presented in Section 2, from a modiﬁed version of the DSGE model in order to introduce
There is also an extensive literature on classical estimation and evaluation of DSGE models, but a detailed survey of these methods is beyond the scope of this article. The interested reader is referred to Kim and Pagan (1995) and the book by Canova (2007), which discusses both classical and Bayesian methods for the analysis of DSGE models.
1
Bayesian Analysis of DSGE Models
115
misspeciﬁcation, and from the secondorder accurate solution of the benchmark DSGE model. Owing to the computational burden associated with the likelihood evaluation for nonlinear solutions of the DSGE model, most of the empirical literature has estimated linearized DSGE models. Section 4 describes a RandomWalk Metropolis (RWM) algorithm and an Importance Sampler (IS) algorithm that can be used to calculate the posterior moments of DSGE model parameters and transformations thereof. We compare the performance of the algorithms and provide an example in which they explore the posterior distribution only locally in the neighborhood of modes that are separated from each other by a deep valley in the surface of the posterior density. Model evaluation is an important part of the empirical work with DSGE models. We consider three techniques in Section 5: posterior predictive model checking, model comparisons based on posterior odds, and comparisons of DSGE models to VARs. We illustrate these techniques by ﬁtting correctly speciﬁed and misspeciﬁed DSGE models to artiﬁcial data. In Section 6 we construct posterior distributions for DSGE model parameters based on a secondorder accurate solution and compare them to posteriors obtained from a linearized DSGE model. Finally, Section 7 concludes and provides an outlook on future work. 2. A PROTOTYPICAL DSGE MODEL Our model economy consists of a ﬁnal goods producing ﬁrm, a continuum of intermediate goods producing ﬁrms, a representative household, and a monetary as well as a ﬁscal authority. This model has become a benchmark speciﬁcation for the analysis of monetary policy and is analyzed in detail, for instance, in Woodford (2003). To keep the model speciﬁcation simple, we abstract from wage rigidities and capital accumulation. More elaborate versions can be found in Smets and Wouters (2003) and Christiano et al. (2005). 2.1. The Agents and Their Decision Problems The perfectly competitive, representative, ﬁnal goods producing ﬁrm combines a continuum of intermediate goods indexed by j ∈ [0, 1] using the technology Yt =
0 1
1 1−
Yt (j )
1−
dj
(1)
Here 1/ > 1 represents the elasticity of demand for each intermediate good. The ﬁrm takes input prices Pt (j ) and output prices Pt as given.
(7)
.116
S. Schorfheide
Proﬁt maximization implies that the demand for intermediate goods is Yt (j ) = Pt (j ) Pt
−1/
Yt
(2)
The relationship between intermediate goods prices and the price of the ﬁnal good is Pt =
0 1
Pt (j )
−1
−1
dj
(3)
Intermediate good j is produced by a monopolist who has access to the linear production technology Yt (j ) = At Nt (j ). and leisure. We assume that the habit stock is given by the level of technology At .
(5)
where governs the price stickiness in the economy and is the steadystate inﬂation rate associated with the ﬁnal good. (4)
where At is an exogenous productivity process that is common to all ﬁrms and Nt (j ) is the labor input of ﬁrm j . An and F. Firm j chooses its labor input Nt (j ) and the price Pt (j ) to maximize the present value of future proﬁts
∞ t s=0 s
Qt +st
Pt +s (j ) Yt +s (j ) − Wt +s Nt +s (j ) − ACt +s (j ) Pt +s
(6)
Here Qt +st is the time t value of a unit of the consumption good in period t + s to the household. The representative household derives utility from real money balances Mt /Pt and consumption Ct relative to a habit stock. real money balances. The household derives disutility from hours worked Ht and maximizes
∞ t s=0 s
(Ct +s /At +s )1− − 1 + 1−
M ln
Mt +s Pt +s
−
H Ht +s
. which is treated as exogenous by the ﬁrm. Firms face nominal rigidities in terms of quadratic price adjustment costs ACt (j ) = Pt (j ) − Pt −1 (j )
2
2
Yt (j ). Labor is hired in a perfectly competitive factor market at the real wage Wt . This assumption ensures that the economy evolves along a balanced growth path even if the utility function is additively separable in consumption.
The ﬁscal authority consumes a fraction t of aggregate output Yt . it receives aggregate residual real proﬁts Dt from the ﬁrms and has to pay lumpsum taxes Tt . We consider two speciﬁcations for Rt∗ . Furthermore. and M and H are scale factors that determine steadystate real money balances and hours worked.
(12)
.Bayesian Analysis of DSGE Models
117
where is the discount factor. The government levies a lumpsum tax (subsidy) to ﬁnance any shortfalls in government revenues (or to rebate any surplus). Monetary policy is described by an interest rate feedback rule of the form Rt = Rt
∗ 1− R
R Rt −1 e R . 1] follows an exogenous process. (8)
where SCt is the net cash inﬂow from trading a full set of statecontingent securities.t is a monetary policy shock and Rt∗ is the (nominal) target rate.
(9)
where R . where Gt =
t Yt . 1/ is the intertemporal elasticity of substitution. which in equilibrium coincides with the steadystate inﬂation rate. where t ∈ [0. which rules out Ponzi schemes. We will set H = 1.t
. one in which the central bank reacts to inﬂation and deviations of output from potential output: Rt∗ = r
∗ t ∗
1
Yt Yt∗
2
(output gap rule speciﬁcation)
(10)
and a second speciﬁcation in which the central bank responds to deviations of output growth from its equilibrium steadystate : Rt∗ = r
∗ t ∗
1
Yt Yt −1
2
(output growth rule speciﬁcation)
(11)
Here r is the steadystate real interest rate. The household supplies perfectly elastic labor services to the ﬁrms taking the real wage Wt as given. The household has access to a domestic bond market where nominal government bonds Bt are traded that pay (gross) interest Rt . The usual transversality condition on asset accumulation applies. and ∗ is the target inﬂation rate. Yt∗ in (10) is the level of output that would prevail in the absence of nominal rigidities. The government’s budget constraint is given by Pt Gt + Rt −1 Bt −1 = Tt + Bt + Mt − Mt −1 . Thus the household’s budget constraint is of the form Pt Ct + Bt + Mt − Mt −1 + Tt = Pt Wt Ht + Rt −1 Bt −1 + Pt Dt + Pt SCt . t is the gross inﬂation rate deﬁned as t = Pt /Pt −1 .
Aggregate productivity evolves according to ln At = ln + ln At −1 + ln zt .2. Qt +st in (6) is Qt +st = (Ct +s /Ct )− (At /At +s )1− (16)
It can be shown that output.t
(13)
Thus on average technology grows at the rate . We assume that ln gt = (1 −
g )ln g
+
g ln gt −1
+
g . The market clearing conditions are given by Yt = Ct + Gt + ACt and Ht = Nt (15)
Since the households have access to a full set of statecontingent claims. Equilibrium Relationships We consider the symmetric equilibrium in which all intermediate goods producing ﬁrms make identical choices so that the j subscript can be omitted. respectively. and inﬂation have to satisfy the following optimality conditions 1= 1= 1
t
Ct +1 /At +1 Ct /At Ct At
t
−
At Rt At +1 t +1
t
(17) 1− 1 2
t +1 t
1−
+ ( Ct +1 /At +1 Ct /At
− )
−
+
2
t +1
−
Yt +1 /At +1 ( Yt /At
− )
(18)
In the absence of nominal rigidities ( = 0) aggregate output is given by Yt∗ = (1 − )1/ At gt .t is assumed to be serially uncorrelated.118
S. where ln zt =
z ln zt −1
+
z. (19)
. the monetary policy shock R . consumption.3. interest rates. The three innovations are independent of each other at all leads and lags and are normally distributed with means zero and standard deviations z . and R . and zt captures exogenous ﬂuctuations of the technology growth rate. Deﬁne gt = 1/(1 − t ). 2. An and F. g . Exogenous Processes The model economy is perturbed by three exogenous processes. Schorfheide
2.t
(14)
Finally.
ct . ˆ t . ˆ t .
c = (1 − )1/ .t . Model Solutions Equations (21) to (26) form a nonlinear rational expectations system ˆ ˆ in the variables yt . The steadystate inﬂation equals the target rate ∗ and r = .t . zt ]
Under the output growth rule speciﬁcation for Rt∗ the vector st also contains yt −1 . gt . Then the model can be expressed as 1= 1−
2 t z e − cˆt +1 + cˆt +Rt −ˆt +1 − ˆ t +1
(21)
e
ct ˆ
− 1 = e ˆt − 1 −
t
1−
1 2
e ˆt +
1 2 (22) (23) − gt ) + ˆ
R .Bayesian Analysis of DSGE Models
119
which is the target level of output that appears in the output gap rule speciﬁcation. y ˆ
2
ˆ ˆ R .t . g .t . The model economy has a unique steadystate in terms of the detrended variables that is attained if the innovations R .t are zero at all times. Since the nonstationary technology process At induces a stochastic trend in output and consumption.t ] . R =r
∗
.
and
y = g (1 − )1/
(20)
Let xt = ln (xt /x) denote the percentage deviation of a variable xt from its ˆ steadystate x.t z. ct .t
y 2 (ˆt
(24) (25) (26)
gt = ˆ zt = ˆ
ˆ g gt −1 ˆ z zt −1
For the output growth rule speciﬁcation. and zt that is driven by the vector of ˆ ˆ innovations t = [ R .t . This rational expectations system has to be solved before the DSGE model can be estimated. Rt . Deﬁne2 st = [ˆt . ˆ
. it is convenient to express the model in terms of detrended variables ct = Ct /At and yt = Yt /At . Equation (24) is replaced by Rt =
R Rt −1
+ (1 −
R)
1 ˆt
+ (1 −
R)
2
yt + z t + ˆ ˆ
R . and z.t
(27)
2. g .4. gt .t
y y e ˆ t +1 − 1 e − cˆt +1 + cˆt +ˆt +1 −ˆt + ˆ t +1 2
y e cˆt −ˆt = e −gˆt −
g
2
e ˆt − 1
R)
2
Rt =
R Rt −1
+ (1 −
R)
1 ˆt
+ (1 − + +
g . z. Rt .
i ( ) are functions of the structural parameters of the DSGE model. The resulting law of motion for the j th element of st takes the form
J n (s) j . ) (28)
From an econometric perspective. ˆ ˆ ˆ where = 1−
2
(32)
Equations (29) to (31) combined with (24) to (26) and the trivial identity R .t . or there are multiple stable solutions (indeterminacy). For instance.t −1 i=1
sj . Here the coefﬁcients j(s) and . Kim (2000). Binder and Pesaran (1997).t =
+
l =1
( ) j . st can be viewed as a (partially latent) state vector in a nonlinear state space model and (28) is the state transition equation. King and Watson (1998). the stable solution is unique (determinacy). linear approximation methods are very popular because they lead to a state–space representation of the DSGE model that can be analyzed with the Kalman ﬁlter. Uhlig (1999). Linearization and straightforward manipulation of Equations (21) to (23) yields yt = ˆ y t [ˆt +1 ] + gt − ˆ ˆt = ˆ t [gt +1 ] − 1 Rt −
t [ t +1 ]
−
z t [ˆt +1 ]
(29) (30) (31)
t [ ˆ t +1 ]
+ (ˆt − gt ) y ˆ
ct = yt − gt .J
(33)
Here J denotes the number of elements of the vector st and n is the number of shocks stacked in the vector t . In the context of likelihoodbased DSGE model estimation. An and F.120
S.t form a linear rational expectations system in st for which several solution algorithms are available.
. j .t = R .
. We will focus on the case of determinacy and restrict the parameter space accordingly. A variety of numerical techniques are available to solve rational expectations systems. Christiano (2002).l While in many applications ﬁrstorder approximations are sufﬁcient. Schorfheide
The solution of the rational expectations system takes the form st = (st −1 . Anderson (2000).
j = 1.i si. the higherorder reﬁnement is an active area of research.l l . t . for instance. Depending on the parameterization of the DSGE model there are three possibilities: no stable rational expectations solution exists. Blanchard and Kahn (1980). and Sims (2002).
SchmittGrohé and Uribe (2006). and Swanson et al. (2005). for instance.t =
(0) j
+
+ +
+
+
(37)
. a more accurate solution may be necessary. which. A secondorder approximation of both U and C leads to 1 2 [U (ˆ )] = U (0) + U (2) (0)C (1) (c) c 2
2
1 + U (1) (0)C (2) (c) 2
2
+ o( 2 )
(36)
Using a secondorder expansion of the utility function together with a ﬁrstorder expansion of consumption ignores the second term in (36) and is only appropriate if either U (1) (0) or C (2) (c) is zero. Suppose that consumption is a smooth function of a zero mean random shock which is scaled by . see. Algorithms to construct such solutions have been developed by Judd (1998). Jin and Judd (2002). The ﬁrst case arises if the marginal utility of consumption in the steadystate is zero. Consider a oneperiod model in which utility is a smooth function of consumption and is approximated as 1 c c c U (ˆ ) = U (0) + U (1) (0)ˆ + U (2) (0)ˆ 2 + o(ˆ 2 ). in the DSGE model described above.t l .il si.t −1 i=1 J n (s ) j . Kim et al.l l .t −1 l .Bayesian Analysis of DSGE Models
121
if the goal of the analysis is to compare welfare across policies or market structures that do not have ﬁrstorder effects on the model’s steadystate or to study asset pricing implications of DSGE models. Kim and Kim (2003) or Woodford (2003). for instance.t −1 i=1 l =1
sj . A simple example can illustrate this point. and the second case arises if the percentage deviation of consumption from the steadystate is a linear function of the shock. A secondorder accurate solution to the DSGE model can be obtained from a secondorder expansion of the equilibrium conditions (21) to (26).t i=1 l =1 J (ss) j . The resulting state transition equation can be expressed as
J n (s) j . is not affected by the coefﬁcients 1 and 2 of the monetary policy rule (24). (2005).t i=1 l =1 J ( ) j . c 2 (34)
where c is the percentage deviation of consumption from its steadystate ˆ and the superscript i denotes the ith derivative. The secondorder approximation of c around the steadystate ( = 0) is ˆ c = C (1) (c) ˆ 1 + C (2) (c) 2
2 2
+ op ( 2 )
(35)
If ﬁrstorder approximations are used for both U and C .t −1 sl .t l =1 n n ( ) j . Collard and Juillard (2001).il i.il si. then the expected utility is simply approximated by the steadystate utility U (0).i si.
For the subsequent analysis we use Sims’ (2002) procedure to compute a ﬁrstorder accurate solution of the DSGE model and SchmittGrohé and Uribe’s (2006) algorithm to obtain a secondorder accurate solution. and Taylor and Uhlig (1990). there are several global approximation schemes. including projection methods such as the ﬁniteelements method and Chebyshev–polynomial method on the spectral domain. ) or alternatively by ( . deﬁned in (32).
will be will be
. Since in the ﬁrstorder approximation the parameters and are not separately identiﬁable.5. (2004) compare the accuracy of alternative solution methods. While perturbation methods approximate policy functions only locally.
R. Let =[ . annualized quartertoquarter inﬂation rates (INFL). Moreover. Measurement Equations The model is completed by deﬁning a set of measurement equations that relate the elements of st to a set of observables.
1. we express the model in terms of . 2. z.
(Q )
. g. model economy as =1+
(Q ) (A) (Q ) (A) (A)
+ 100(ˆt − yt −1 + zt ) y ˆ ˆ + 400 ˆ t +r
(A)
(38)
(Q )
+4
+ 400Rt
.122
S. and annualized nominal interest rates (INT). Judd (1998) covers various solution methods. and Aruoba et al. and their relationship to the model variables is given by the set of equations YGRt = INFLt = INTt = The parameters (Q ) . Den Haan and Marcet (1994).
(A)
. An and F. . 2. and r (A) are related to the steady states of the
(A)
100
. The three series are measured in percentages. r (A)
. augmented with the steadystate ratio c/y.
=
1 1 + r (A) /400
.
g.
=1+
400
The structural parameters are collected in the vector . which is equal to 1/g . R.
z]
For the quadratic approximation the composite parameter replaced by either ( . Schorfheide
As before. We assume that the time period t in the model corresponds to one quarter and that the following observations are available for estimation: quartertoquarter per capita GDP growth rates (YGR). ). the coefﬁcients are functions of the parameters of the DSGE model.
then the resulting forecast error covariance matrix is nonsingular. e. unlike GMM estimation based on equilibrium relationships such as the consumption Euler equation (21).g. inﬂation. Christiano and Eichenbaum (1992).g. minimum distance estimation based on the discrepancy among VAR and DSGE model impulse response functions. Hansen and Heckman (1996).. 3. e. prior distributions can be used to incorporate additional information into the parameter estimation. or the monetary policy rule (24). the discrepancy between DSGE model responses and VAR impulse responses. ranging from calibration. Third.1. and Kim (2000). for instance composed of output growth. Second.. INTt . and Sims (1996). Rotemberg and Woodford (1997) and Christiano et al. and nominal interest rates. INFLt . We will subsequently elaborate these challenges and discuss how a Bayesian approach can be used to cope with them. for instance. which has three main characteristics. to fullinformation likelihoodbased estimation as in Altug (1989). the price setting equation of the intermediate goods producing ﬁrms (22). Throughout the paper we will use the following notation: the n × 1 vector yt stacks the time t observations that are used to estimate the DSGE model. e.g. and the posterior density by p(  Y ). The sample ranges from t = 1 to T and the sample observations are collected in the matrix Y with rows yt . Kydland and Prescott (1982). Hence any DSGE model that generates a rankdeﬁcient covariance matrix for yt is clearly at odds with the data and suffers from an obvious form
. (2005). Potential Model Misspeciﬁcation If one predicts a vector of time series yt . McGrattan (1994). Much of the methodological debate surrounding the various estimation (and model evaluation) techniques is summarized in papers by Kydland and Prescott (1996).. the Bayesian analysis is systembased and ﬁts the solved DSGE model to a vector of aggregate time series.Bayesian Analysis of DSGE Models
123
3. First. the estimation is based on the likelihood function generated by the DSGE model rather than. over generalized method of moments (GMM) estimation of equilibrium relationships. We denote the prior density by p( ). the likelihood function by (  Y ). Leeper and Sims (1994). by a function of past yt ’s. PRELIMINARIES Numerous formal and informal econometric procedures have been proposed to parameterize and evaluate DSGE models. Any estimation and evaluation method is confronted with the following challenges: potential model misspeciﬁcation and possible lack of identiﬁcation of parameters of interest. In the context of the model developed in Section 2 yt is composed of YGRt . We focus on Bayesian estimation of DSGE models.
technology growth shock) equals the number of observables (output growth. due to the stylized nature and the resulting misspeciﬁcation
. However. estimates of the discount factor should be consistent with our knowledge about the average magnitude of real interest rates. One reason that pure maximum likelihood estimation of DSGE models has not turned into the estimation method of choice is the “dilemma of absurd parameter estimates. In this paper we pursue the latter approach by considering a model in which the number of structural shocks (monetary policy shock. interest rates) to which the model is ﬁtted.g. (2006) document that even an elaborate DSGE model with capital accumulation and various nominal and real frictions has difﬁculties attaining the ﬁt achieved with VARs that have been estimated with welldesigned shrinkage methods.124
S. whereas the other branch has modiﬁed the model speciﬁcation to remove the singularity by adding socalled measurement errors.. Hence one branch of the literature has emphasized procedures that can be applied despite the singularity. Schorfheide
of misspeciﬁcation. it is important to have empirical strategies available that are able to cope with potential model misspeciﬁcation. e. inﬂation.” Estimates of structural parameters generated with maximum likelihood procedures based on a set of observations Y are often at odds with additional information that the research may have. A second source of misspeciﬁcation that is more difﬁcult to correct is potentially invalid crosscoefﬁcient restrictions on the time series representation of yt generated by the DSGE model. Sargent (1989). asymptotically minimize the Kullback–Leibler distance (see White. While the primary research objectives in the DSGE model literature is to overcome discrepancies between models and reality. Del Negro et al. This singularity is an obstacle to likelihood estimation. the “true” intertemporal substitution elasticity or price adjustment costs and. For example. even if observations on interest rates are not included in the estimation sample Y . Each estimation method is associated with a particular measure of discrepancy between the ‘true’ law of motion and the class of approximating models. the most precise impulse responses to a technology or monetary policy shock. Ireland (2004). for instance. Time series estimates of aggregate labor supply elasticities or price adjustment costs should be broadly consistent with microlevel evidence. say. Altug (1989). 1982). An and F. or additional structural shocks as in Leeper and Sims (1994) and more recently Smets and Wouters (2003). then it seems reasonable to assume that there need not exist a single parameter vector 0 that delivers. simultaneously. Likelihoodbased estimators. government spending shock. Once one acknowledges that the DSGE model provides merely an approximation to the law of motion of the time series yt . Invalid restrictions manifest themselves in poor outofsample ﬁt relative to more densely parameterized reference models such as VARs.
the likelihood function is reweighted by a prior density. in which yt is
. Since priors are always subject to revision. In a Bayesian framework. Consider the following two models. from a probability model that implies that different values of structural parameters lead to the same joint distribution for the observables Y .Bayesian Analysis of DSGE Models
125
of most DSGE models. However. and in a posterior odds comparison the DSGE model will automatically be penalized for not being able to reconcile the two sources of information with a single set of parameters. Identiﬁcation Identiﬁcation problems can arise owing to a lack of informative observations or. 3. At ﬁrst glance the identiﬁcation of DSGE model parameters does not appear to be problematic. the likelihood function often peaks in regions of the parameter space that appear to be inconsistent with extraneous information. deﬁned as p(Y ) = (  Y )p( )d . a VAR. If the likelihood function peaks at a value that is at odds with the information that has been used to construct the prior distribution. and comparisons to reference models that relax some of the crosscoefﬁcient restrictions generated by the DSGE models. say. posterior predictive checks. then the marginal data density of the DSGE model. The delicate identiﬁcation problems that arise in rational expectations models can be illustrated in a simple example adopted from Lubik and Schorfheide (2006). The former paper provides an algorithm to construct families of observationally equivalent linear rational expectations models. The prior can bring to bear information that is not contained in the estimation sample Y . Lack of identiﬁcation is documented in papers by Beyer and Farmer (2004) and Canova and Sala (2005). The parameter vector is typically low dimensional compared to VARs and the model imposes tight restrictions on the time series representation of Y . more fundamentally. Section 5 will discuss several methods that have been proposed to assess the ﬁt of DSGE models: posterior odds comparisons of competing model speciﬁcations.2. the shift from prior to posterior distribution can be an indicator of the tension between different sources of information. whereas the latter paper compares the informativeness of different estimators with respect to key structural parameters in a variety of DSGE models. (39)
will be low compared to. recent experience with the estimation of New Keynesian DSGE models has cast some doubt on this notion and triggered a more careful assessment.
t
∼ iid 0. Schorfheide
the observed endogenous variable and ut is an unobserved shock process. the law of motion of yt is yt = yt −1 +
t. 1)
(42)
Under restrictions on the parameter spaces that guarantee uniqueness of a stable rational expectations solution3 we obtain the following relationships between and the structural parameters:
1
:
= . even a weakly informative prior can introduce curvature into the posterior density surface that facilitates numerical maximization and the use of MCMC methods. but we introduce a backwardlooking term yt −1 on the righthand side to generate persistence:
2
: yt =
1
t [yt +1 ] +
yt −1 + ut .
2
ut =
2
t. While Bayesian inference is based on the likelihood function.  Y ) = p(  Y )p(  )
3
(43)
 −
2
To
2
 +
ensure determinacy − 4  > 2 in 2 . The likelihood functions of 1 and 2 have ridges that can cause serious problems for numerical optimization procedures. (1 − / )2 (40)
In model 2 the shocks are serially uncorrelated. the parameters and are not separately identiﬁable. Consider model 1 .
ut = ut −1 + t . and in model 2 .126
S. In model 1 .
t
∼ iid 0. the ut ’s are serially correlated:
1
: yt =
1
t [yt +1 ]
+ ut . Here = [ . ] and the likelihood function can be written as ( . The calculation of a valid conﬁdence set is challenging since it is difﬁcult to trace out the parameter subspace along which the likelihood function is constant. t
∼ iid(0.
we
impose
>1
in
1
and
−4
< 2
and
. Moreover. models 1 and 2 are observationally equivalent. An and F.  Y ) = ( ).
+
−4
(41)
2
For both speciﬁcations.
2
:
=
1 ( − 2
2
−4
)
In model 1 the parameter is not identiﬁable. Straightforward manipulations of the Bayes theorem yield p( .
in Bils and Klenow (2004).
. For instance. the steadystate parameters (Q ) . see Poirier (1998) for a discussion. as well as the standard deviation of the monetary policy shock.3. This is of course well known in Bayesian econometrics. A direct comparison of priors and posteriors can often provide valuable insights about the extent to which data provide information about the parameters of interest. in practice most priors are chosen based on some observations. since the mapping from the vector of structural parameters into the state–space representation that determines the joint probability distribution of Y is highly nonlinear and typically can only be evaluated numerically. inﬂation. in principle. prior distributions will play an important role in the estimation of DSGE models. In the latter case priors are essentially used to reduce the dimensionality of the econometric model and hence the sampling variability of the parameter estimates. for instance. and r (A) . (A) . it is often helpful to simulate the prior predictive distribution for various sample moments and check that
4 The role of priors in DSGE model estimation is markedly different from the role of priors in VAR estimation. However. are quantiﬁed based on regressions run on pre(estimation)sample observations of output growth.Bayesian Analysis of DSGE Models
127
Thus the prior distribution is not updated in directions of the parameter space in which the likelihood function is ﬂat. It is difﬁcult to detect directly identiﬁcation problems in large DSGE models. lack of identiﬁcation provides a challenge for scientiﬁc reporting.4 While. They might also add curvature to a likelihood function that is (nearly) ﬂat in some dimensions of the parameter space and therefore strongly inﬂuence the shape of the posterior distribution. and nominal interest rates. 3. They might downweigh regions of the parameter space that are at odds with observations not contained in the estimation sample Y . priors can be gleaned from personal introspection to reﬂect strongly held beliefs about the validity of economic theories. Priors for the autocorrelations and standard deviations of the exogenous processes. Lubik and Schorfheide (2006) estimate a twocountry version of the model described in Section 2. in particular the distribution of shock standard deviations. as the audience typically would like to know what features of the posterior are generated by the prior rather than the likelihood. The prior for the parameter that governs price stickiness is chosen based on microevidence on price setting behavior provided. The posterior distribution is well deﬁned as long as the joint prior distribution of the parameters is proper. Priors As indicated in our previous discussion. To ﬁnetune the prior distribution. The priors for the coefﬁcients in the monetary policy rule are loosely centered around values typically associated with the Taylor rule.
(27). (21)–(26). Benchmark Model with output gap rule. Prior to the truncation the distribution speciﬁed in Table 2 places about 2% of its mass on parameter values that imply indeterminacy. except in ﬁrstorder approximation Eq. autocorrelations. however. except that central bank does not respond to the output gap: 1 (L). consists of Eqs.128
S. These priors are adopted from Lubik and Schorfheide (2006).
(1 + h)ˆt = y
1 (L) 1 (Q ) 2 (L) 5 (L)
+ h yt −1 + gt − ˆ ˆ
ˆ t [gt +1 ]
−
1
Rt −
t [ t +1 ]
− [ˆt +1 ] with h = 95. Table 1 provides a characterization of the model speciﬁcations and data sets. rational expectations models can have multiple equilibria. (21)–(26).
2
1 (L). solved by ﬁrstorder approximation. The Road Ahead Throughout this paper we are estimating and evaluating versions of the DSGE model presented in Section 2 based on simulated data. For convenience. An and F. the prior distribution is truncated at the boundary of the determinacy region.
TABLE 1 Model speciﬁcations and data sets
1 (L)
1 (Q ) 2 (L)
Benchmark Model with output gap rule. consists of Eqs. or relative volatilities. it is assumed that all parameters are a priori independent. Same as Same as Same as
1 (L).
3 (L) 4 (L) 5 (L)
except that prices are nearly ﬂexible:
= 5. Table 2 lists the marginal prior distributions for the structural parameters of the DSGE model. and specify independent priors on the transformed parameters.4. Eq. (29)is replaced by t [yt +1 ]
= 0. which induce dependent priors for the original parameters. z
80 observations generated with 80 observations generated with 80 observations generated with 80 observations generated with
1 (L) 1 (Q ) 2 (L) 5 (L)
. As mentioned before. such as steady rate ratios. (24) is replaced by Eq. A formalization of such a prior predictive check can be found in Geweke (2005). DSGE Model with output growth rule. that we will use in the subsequent analysis. consists of Eqs. solved by secondorder approximation. Such an occurrence would suggest that the model is incapable of explaining salient data properties. In applications in which the independence assumption is unreasonable one could derive parameter transformations. Hence. The parameters (conditional on ) and 1/g only affect the secondorder approximation of the DSGE model. Schorfheide
the prior does not place little or no mass in a neighborhood of important features of the data. While this may be of independent interest we do not pursue this direction in this paper. (21)–(26). solved by ﬁrstorder approximation. 3.
Bayesian Analysis of DSGE Models
129
Model 1 is the benchmark speciﬁcation of the DSGE model in which monetary policy follows an interestrate rule that reacts to the output gap. 1 (L) is approximated by a linear rational expectations system, whereas 1 (Q ) is solved with a secondorder perturbation method. In 2 (L) we replace the output gap in the interestrate feedback rule by output growth. 3 and 4 are identical to 1 (L), except that in one case we impose that the prices are nearly ﬂexible ( = 5) and in the other case the central bank does not respond to output ( 2 = 0). In 5 (L) we replace the loglinearized consumption Euler equation by an equation that includes lagged output on the righthand side. Loosely speaking, this speciﬁcation can be motivated with a model in which agents derive utility from the difference between individual consumption and the overall level of consumption in the previous period. Conditional on parameter vectors reported in Tables 2 and 3 we generate four data sets with 80 observations each. We use the labels 1 (L), 1 (Q ), 2 (L), and 5 (L) to denote data sets generated from 1 (L), (Q ), 2 (L), and 5 (L), respectively. By and large, the values for the 1 DSGE model parameters resemble empirical estimates obtained from post1982 U.S. data. The sample size is fairly realistic for the estimation of monetary DSGE model. Many industrialized countries experienced a high inﬂation episode in the 1970s that ended with a disinﬂation in the 1980s. Subsequently, the level and the variability of inﬂation and interest rates have been fairly stable, so that it is not uncommon to estimate constant coefﬁcient DSGE models based on data sets that begin in the early 1980s.
TABLE 2 Prior distribution and DGPs—linear analysis (Sections 4 and 5) Prior Name Domain
+ + 1 2 R G Z + +
Density Gamma Gamma Gamma Gamma Beta Beta Beta Gamma Gamma Normal InvGamma InvGamma InvGamma
Para (1) 2 00 20 1 50 50 50 80 66 50 7 00 40 40 1 00 50
Para (2) 50 10 25 25 20 10 15 50 2 00 20 4 00 4 00 4 00
1 (L),
DGP 2 (L), 2 00 15 1 50 1 00 60 95 65 40 4 00 50 20 80 45
5 (L)
[0, 1) [0, 1) [0, 1)
+ + + + +
r (A)
(A) (Q )
100 100 100
R G Z
Notes: Paras (1) and (2) list the means and the standard deviations for Beta, Gamma, and Normal distributions; the upper and lower bound of the support for the Uniform distribution; s and for 2 2 the Inverse Gamma distribution, where p G (  , s) ∝ − −1 e − s /2 . The effective prior is truncated at the boundary of the determinacy region.
130
S. An and F. Schorfheide
TABLE 3 Prior distribution and DGP—Nonlinear analysis (Section 6) Prior Name Domain
+ + 1 2 R G Z + +
Density Gamma Gamma Gamma Gamma Beta Beta Beta Gamma Gamma Normal InvGamma InvGamma InvGamma Beta Beta
Para (1) 2 00 30 1 50 50 50 80 66 80 4 00 40 30 40 40 10 85
Para (2) 50 20 25 25 20 10 15 50 2 00 20 4 00 4 00 4 00 05 10
DGP 1 (Q ) 2 00 33 1 50 125 75 95 90 1 00 3 20 55 20 60 30 10 85
[0, 1) [0, 1) [0, 1)
+ + + + +
r (A)
(A) (Q )
100 100 100 1/g
R G Z
[0, 1) [0, 1)
Notes: See Table 2. Parameters solution of the DSGE model.
and 1/g only affect the secondorder accurate
Section 4 illustrates the estimation of linearized DSGE models under correct speciﬁcation, that is, we construct posteriors for 1 (L) and 2 (L) based on the data sets 1 (L) and 2 (L). Section 5 considers model evaluation techniques. We begin with posterior predictive checks in Section 5.1, which are implemented based on 1 (L) for data sets 1 (L) (correct speciﬁcation of DSGE model) and 5 (L) (misspeciﬁcation of the consumption Euler equation). In Section 5.2 we proceed with the calculation of posterior model probabilities for speciﬁcations 1 (L), 4 (L) conditional on the data set 1 (L). Subsequently in 3 (L), and Section 5.3 we compare 1 (L) to a vector autoregressive speciﬁcation using 1 (L) (correct speciﬁcation) and 5 (L) (misspeciﬁcation). Finally, in Section 6 we study the estimation of DSGE models solved with a secondorder perturbation method. We compare posteriors for 1 (Q ) and 1 (L) conditional on 1 (Q ). 4. ESTIMATION OF LINEARIZED DSGE MODELS We will begin by describing two algorithms that can be used to generate draws from the posterior distribution of and subsequently illustrate their performance in the context of models 1 (L)/ 1 (L) and 2 (L)/ 2 (L). 4.1. Posterior Computations We consider an RWM algorithm and an IS algorithm to generate draws from the posterior distribution of . Both algorithms require the evaluation
Bayesian Analysis of DSGE Models
131
of (  Y )p( ). The computation of the nonnormalized posterior density proceeds in two steps. First, the linear rational expectations system is solved to obtain the state transition equation (33). If the parameter value implies indeterminacy (or nonexistence of a stable rational expectations solution), then (  Y )p( ) is set to zero. If a unique stable solution exists, then the Kalman ﬁlter5 is used to evaluate the likelihood function associated with the linear state–space system (33) and (38). Since the prior is generated from wellknown densities, the computation of p( ) is straightforward. The RWM algorithm belongs to the more general class of Metropolis– Hastings algorithms. This class is composed of universal algorithms that generate Markov chains with stationary distributions that correspond to the posterior distributions of interest. A ﬁrst version of such an algorithm had been constructed by Metropolis et al. (1953) to solve a minimization problem and was later generalized by Hastings (1970). Chib and Greenberg (1995) provide an excellent introduction to Metropolis– Hastings algorithms. The RWM algorithm was ﬁrst used to generate draws from the posterior distribution of DSGE model parameters by Schorfheide (2000) and Otrok (2001). We will use the following implementation based on Schorfheide (2000):6 RandomWalk Metropolis (RWM) Algorithm Use a numerical optimization routine to maximize ln (  Y ) + ln p( ). Denote the posterior mode by ˜ . Let be the inverse of the Hessian computed at the posterior mode ˜ . 2 Draw (0) from ( ˜ , c0 ) or directly specify a starting value. For s = 1, , nsim , draw from the proposal distribution ( (s−1) , c 2 ). The jump from (s−1) is accepted ( (s) = ) with probability min 1, r ( (s−1) ,  Y ) and rejected ( (s) = (s−1) ) otherwise. Here r(
(s−1)
1. 2. 3. 4.
,
Y) =
(
(s−1)
(  Y )p( )  Y )p( (s−1) )
5. Approximate the posterior expected value of a function h( ) by nsim 1 (s) ). s=1 h( nsim
Since according to our model st is stationary, the Kalman ﬁlter can be initialized with the unconditional distribution of st . To make the estimation in Section 4 comparable to the DSGEVAR analysis presented in Section 5.3 we adjust the likelihood function to condition on the ﬁrst four observations in the sample: (  Y )/ (  y1 , , y4 ). These four observations are later used to initialize the lags of a VAR. 6 This version of the algorithm is available in the userfriendly DYNARE (2005) package, which automates the Bayesian estimation of linearized DSGE models and provides routines to solve DSGE models with higherorder perturbation methods.
5
DeJong et al. the posterior distribution of will be asymptotically normal. e. The elements of the Hessian matrix in Step 2 are computed numerically for various values of the increment d . This allows for an efﬁcient exploration of the posterior distribution at least in the neighborhood of the mode. Geweke (1999a. written by Chris Sims for the maximum likelihood estimation of a DSGE model conducted in Leeper and Sims (1994). (2000) used an IS algorithm to calculate posterior moments of the parameters of a linearized stochastic growth model.g. The idea of the algorithm is based on the identity [h( )  Y ] = h( )p(  Y )d = h( )p(  Y ) q( )d q( )
Draws from the posterior density p(  Y ) are replaced by draws from the density q( ) and reweighted by the importance ratio p(  Y )/q( ) to obtain a numerical approximation of the posterior moment of interest. The maximization of the posterior density kernel is carried out with a version of the BFGS quasiNewton algorithm. There typically exists a range for d over which the derivatives are stable. The algorithm uses a fairly simple line search and randomly perturbs the search direction if it reaches a cliff caused by nonexistence or nonuniqueness of a stable rational expectations solution for the DSGE model. Schorfheide
Under fairly general regularity conditions.132
S. This parameter transformation is only used in Step 1 of the RWM algorithm. Hammersley and Handscomb (1964) were among the ﬁrst to propose this method and Geweke (1989) provides important convergence results. they are often helpful. While Steps 1 and 2 are not necessary for the implementation of the RWM algorithm. The RWM algorithm generates a sequence of dependent draws from the posterior distribution of that can be averaged to approximate posterior moments. The particular version of the IS algorithm used subsequently is of the following form.. 2005) reviews regularity conditions that guarantee the convergence of the Markov chain generated by Metropolis–Hastings algorithms to the posterior distribution of interest nsim 1 and the convergence of nsim s=1 h( (s) ) to the posterior expectations [h( )  Y ]. Prior to the numerical maximization we transform all parameters so that their domain is unconstrained. The algorithm constructs a Gaussian approximation around the posterior mode and uses a scaled version of the asymptotic covariance matrix as the covariance matrix for the proposal distribution.
. As d approaches zero the numerical derivatives will eventually deteriorate owing to inaccuracies in the evaluation of the objective function. and Kim (1998). An and F. Crowder (1988). Walker (1969).
An introduction to this
. we generate 200 independent draws from the truncated prior distribution. Simulation Results for
1 (L) 1 (L)/ 1 (L)
We begin by estimating the loglinearized output gap rule speciﬁcation based on data set 1 (L). then most of the ws ’s will be much smaller than 1/nsim and the approximation method is inefﬁcient. We construct a Gaussian approximation of the posterior near the mode. Figure 1 depicts the draws from the prior as well as every 5. The lack of information about some of the key structural parameters resembles the empirical ﬁndings based on actual observations. Moreover. then the importance weights ws are constant and one averages independent draws to approximate the posterior expectations of interest. For s = 1. 4. We use the RWM algorithm (c0 = 1. A visual comparison of priors and posteriors suggests that the 80 observations sample contains little information on the riskaversion parameter and the policy rule coefﬁcients 1 and 2 . however. There is. If the importance density concentrates its mass in a region of the parameter space in which the posterior density is very low.2.000th draw from the posterior distribution in twodimensional scatter plots. c = 0 3) to generate 1 million draws from the posterior distribution. the posteriors of steadystate parameters. 5. scale the asymptotic covariance matrix.Bayesian Analysis of DSGE Models
133
1. Let be the inverse of the Hessian computed at the posterior mode ˜ . The rejection rate is about 45%. 3. nsim ˜ ˜ Compute ws = ( (s)  Y )p( (s) )/q( (s) ) and ws = ws / s=1 ws . If the two are equal. We also indicate the location of the posterior mode in the 12 panels of the ﬁgure. Denote the posterior mode by ˜ . and the shock standard deviations are sharply peaked relative to the prior distributions. information about the degree of pricestickiness. and replace the normal distribution with a fattailed t distribution to implement the algorithm. 6.
Importance Sampling (IS) Algorithm Use a numerical optimization routine to maximize ln (  Y ) + ln p( ). and degrees of freedom. Let q( ) be the density of a multivariate t distribution with mean ˜ . scale matrix c 2 . . s=1 w(
The accuracy of the IS approximation depends on the similarity between posterior kernel and importance density. captured by the slope coefﬁcient of the Phillipscurve relationship (30). 2. ˜ Approximate the posterior expected value of a function h( ) by nsim (s) )h( (s) ). 4. nsim generate draws (s) from q( ). Moreover. the autocorrelation coefﬁcients. There exists an extensive literature on convergence diagnostics for MCMC methods such as the RWM algorithm.
in Robert and Casella (1999). Visual inspection of
. Data 1 (L). 2 ) in Figure 2. ( 4.000th draw of ( . and ( 6. respectively. Intersections of solid lines signify posterior mode values. 1) and (1. 1). We run four independent Markov chains. Panels (1. As before. 2 0). for instance. Schorfheide
FIGURE 1 Prior and posterior – Model 1 (L). 2) of the ﬁgure depict posterior contours at the posterior mode. 2 0). While many convergence diagnostics are based on formal statistical tests. we set c = 0 3.134
S. Except for and 2 . we will compare draws and recursively computed means from multiple chains. we will consider informal graphical methods in this paper. The panels depict 200 draws from prior and posterior distributions. all parameters are initialized at their posterior mode values. 2 ) are (3 0. An and F. These authors distinguish between convergence of the Markov chain to its stationary distribution. (3 0. and convergence to iid sampling. convergence of empirical averages to posterior moments. and plot every 5. More speciﬁcally.
literature can be found. The initial values for ( . 1). generate 1 million draws for each chain.
we compute recursive means. Intersections of solid lines signify posterior mode values. the means converge in the long run. For the importance sampler we follow Geweke (1999b). we construct standard error estimates.2): contours of posterior density at “low” and “high” mode as function of and 2 .1) to (3. truncated at 140 lags. Panels (2. and for the Metropolis chain we use Newey–West standard error estimates. Figure 3 plots recursive means for the four chains. As for the draws from the RWM algorithm. converge to the same (stationary) distribution and concentrate in the highposterior density region. For most parameters the two
7
It is not guaranteed that the Newey–West standard errors are formally valid. We generate another set of 1 million draws using the IS algorithm with = 3 and c = 1 5. Panels (1.
the plots suggests that all four chains. Since neither the IS nor the RWM approximation of the posterior means are exact.1) and (1.7 In Figure 4 we plot (recursive) conﬁdence intervals for the RWM and the IS approximation of the posterior means. Data 1 (L).Bayesian Analysis of DSGE Models
135
FIGURE 2 Draws from multiple chains – Model 1 (L). Despite different initializations.
.2): 200 draws from four Markov chains generated by the Metropolis Algorithm. despite the use of different starting values.
3. the 2 (L) posterior has (at least) two modes. for instance. A Potential Pitfall: Simulation Results for
2 (L)/ 2 (L)
We proceed by estimating the output growth rule version 2 (L) of the DSGE model based on 80 observations generated from this model (Data Set 2 (L)). r (A) . and g . say. An and F. 4. which we label the “high” mode. the magnitude of the discrepancy is generally small relative to. g . One of the modes. Unlike the posterior for the output gap version. The value of
. the difference between the posterior mode and the estimated posterior means. ˜ (h) . Each line corresponds to recursive means (as a function of the number of draws) calculated from one of the four Markov chains generated by the Metropolis Algorithm. Schorfheide
FIGURE 3 Recursive means from multiple chains – Model 1 (L). Data 1 (L).136
S. . Exceptions are. However.
conﬁdence intervals overlap. is located at = 2 06 and 2 = 97.
ﬁxing the other parameters at their respective ˜ (h) values.1) signify the low mode.1).1) is constructed by setting = ˜ (l ) and then graphing posterior contours for ∈ [0. The ﬁrst two panels of Figure 5 depict the contours of the nonnormalized posterior density as a function of and 2 at the low and the high mode.Bayesian Analysis of DSGE Models
137
FIGURE 4 RWM algorithm vs. recursively computed 95% bands for posterior means based on the metropolis algorithm (dotted) and the importance sampler (dashed). Importance Sampling – Model 1 (L). is located at = 1 44 and 2 = 81 and attains the value −183 23. 3] and 2 ∈ [0.
ln [ (  Y )p( )] is equal to −175 48. 2 5]. Panel (1.
. whereas the solid lines in the remaining panels indicate the location of the high mode. Panel (1. Similarly.2) is obtained by exploring the shape of the posterior as a function of and 2 . respectively. Panels depict posterior modes (solid). ˜ (l ) . The second (“low”) mode. Data 1 (L). (2. and (3.1). keeping all other parameters ﬁxed at their respective ˜ (l ) values. The intersection of the solid lines in panels (1.
An and F. Schorfheide
FIGURE 5 Draws from multiple chains – Model 2 (L). whereas Chains 2 and 4 move through the posterior surface near the high mode.8 We plot every 5.2). (3 0. It turns out that Chains 1 and 3 explore the posterior distribution locally in the neighborhood of the low mode.2 we generate 1 million draws each for model 2 from four Markov chains that were initialized at the following values for ( . and ( 6. 2 0). ( 4.2): contours of posterior density at “low” and “high” mode as function of and 2 .1) to (3. Data 2 (L). Panels (1. the remaining parameters are not ﬁxed at their ˜ (l ) and ˜ (h) values.
8
.1) to (3. Unlike in Panels (1.2): 200 draws from four Markov chains generated by the metropolis algorithm. The remaining parameters were initialized at the low (high) mode for Chains 1 and 3 (2 and 4).000th draw of and 2 from the four Markov chains in Panels (2. 2 0). Given the conﬁguration of the RWM algorithm the likelihood
The covariance matrices of the proposal distributions are obtained from the Hessians associated with the respective modes. 1). Panels (2.1) and (1.
The two modes are separated by a deep valley which is caused by complex eigenvalues of the state transition matrix. 2 ): (3 0. 1).1) and (1. Intersections of solid lines signify “low” (left panels) and “high” (right panels) posterior mode values.2).138
S. As in Section 4.
First.000. Nevertheless. While each chain appears to be stable. Each line corresponds to recursive means (as a function of the number of draws) calculated from one of the four Markov chains generated by the metropolis algorithm. Data ( L).
. The recursive means associated with the four chains are plotted in Figure 6.000 draws were insufﬁcient for the chains initialized at the low mode to cross over to the neighborhood of the high mode. corresponding to the two posterior modes. We explored various modiﬁcations of the RWM algorithm by changing the scaling factor c and by setting the offdiagonal elements of to zero. it only explores the posterior in the neighborhood of one of the modes. Two remarks are in order.Bayesian Analysis of DSGE Models
139
of crossing the valley that separates the two modes is so small that it did not occur. 1. extremum estimators that are computed with numerical optimization methods suffer from the same problem as
FIGURE 6 Recursive means from multiple chains – Model 2 (L). They exhibit two limit points.
We will focus on variance decompositions in the remainder of this subsection. An and F.000th draw generated with the RWM algorithm. the output gap policy rule speciﬁcation 1 implies that the government spending shock does not affect inﬂation and nominal interest rates. One might ﬁnd a local rather than the global extremum of the objective function. The posterior distribution
. Similarly. Parameter Transformations for
1 (L)/ 1 (L)
Macroeconomists are often interested in posterior distributions of parameter transformations h( ) to address questions such as what fraction of the variation in output growth is caused by monetary policy shocks. Panels (1. Hence an exploration of the posterior in the neighborhood of the global mode might still provide a good approximation to the overall posterior distribution. Answers can be obtained from the moving average representation associated with the state space model composed of (33) and (38). All subsequent computations are based on the output gap rule speciﬁcation of the DSGE model. which can be depicted as a triangle in 2 . 4.1) indicate that the prior distribution of the variance decomposition is informative in the sense that it is not uniform over the simplex. and monetary policy shocks. Hence variance decompositions of the endogenous variables lie in a threedimensional simplex. Schorfheide
the Bayes estimators in this example.140
S. in many applications as well as in this example there exists a global mode that dominates the local modes. While the prior mean of the fraction of inﬂation variability explained by the monetary policy shock is about 40%. in Bayesian computation it is helpful to start MCMC methods from different regions of the parameter space. and R explain 100% of the variation. Hence all prior and posterior draws concentrate on the R − Z edge of the simplex. government spending shocks. A 90% probability interval ranges from 0 to 20%. The data provide additional information about the variance decomposition. In Figure 7 we plot 200 draws from the prior distribution (left panels) and 200 draws from the posterior (right panels) of the variance decompositions of output growth and inﬂation. and R of the triangles correspond to decompositions in which the shocks z . and nominal interest rate in the DSGE model are due to three shocks: technology growth shocks. respectively. it is good practice to start the optimization from different points in the parameter space to increase the likelihood that the global optimum is found. G .4. A priori about 10% of the variation in output growth is due to monetary policy shocks. g . or vary the distribution q( ) in the importance sampler. The posterior draws are obtained by converting every 5. and what happens to output and inﬂation in response to a monetary policy shock. The corners Z . inﬂation. Hence.1) and (2. Second. a 90% probability interval ranges from 0 to 95%. For instance. Fluctuations of output growth.
is much more concentrated than the prior.t . Canova (1994).G. Empirical Applications There is a rapidly growing empirical literature on the Bayesian estimation of DSGE models that applies the techniques described in this paper. The panels depict 200 draws from prior and posterior distributions arranged on a 3dimensional simplex.R) correspond to 100% of the variation being due to the shocks z. respectively. Data 1 (L).t . and McGrattan (1994) studies the macroeconomic effects of taxation in an estimated business cycle model. The probability interval for the contribution of monetary policy shocks to output growth ﬂuctuations shrinks to the range from 1 to 4. Preceding the Bayesian literature are papers that use maximum likelihood techniques to estimate DSGE models. The following incomplete list of references aims to give an overview of this work. The posterior probability interval for the fraction of inﬂation variability explained by the monetary policy shock ranges from 15 to 37%. and Geweke (1999a) proposed Bayesian approaches to calibration that do not exploit the likelihood function of the DSGE model and provide empirical applications that assess business cycle and asset pricing implications of simple stochastic growth
.Bayesian Analysis of DSGE Models
141
FIGURE 7 Prior and posterior variance decompositions – Model 1 (L).5.5%. 4. and R . DeJong et al. g . Leeper and Sims (1994) and Kim (2000) estimated DSGE models that are usable for monetary policy analysis. The three corners (Z.t . (1996). Altug (1989) estimates Kydland and Prescott’s (1982) timetobuild model.
and Schorfheide (2000) considers cashinadvance monetary DSGE models. Schorfheide
models.S. and New Zealand respond to exchange rates. 2005) both for the U. FernándezVillaverde and RubioRamírez (2004) use Bayesian estimation techniques to ﬁt a cattle–cycle model to the data. Many central banks are in the process of developing DSGE models along the lines of Smets and Wouters (2003) that can be estimated with Bayesian techniques and used for policy analysis and forecasting. and the Euro Area. (2000) estimate a stochastic growth model and examine its forecasting performance. England.S. Schorfheide (2005) allows for regimeswitching of the target inﬂation level in the monetary policy rule. Lubik and Schorfheide (2004) estimate the benchmark New Keynesian DSGE model without restricting the parameters to the determinacy region of the parameter space. (2000). Models similar to Smets and Wouters (2003) have been estimated by Laforte (2004). Bayesian estimation techniques have also been used in the open economy literature. DeJong et al.142
S. DeJong and Ingram (2001) study the cyclical behavior of skill accumulation. Lubik and Schorfheide (2003) estimate the small open economy extension of the model presented in Section 2 to examine whether the central banks of Australia. (2006) to study monetary policy. Canova (2004) estimates a smallscale New Keynesian model recursively to assess the stability of the structural parameters over time. The literature on likelihoodbased Bayesian estimation of DSGE models began with work by LandonLane (1998). Canada. Schorfheide (2000). Justiniano and Preston (2004) extend the empirical analysis to situations of imperfect exchange rate passthrough. An and F. (2002) estimate a stochastic growth model augmented with a learningbydoing mechanism to amplify the propagation of shocks. (2005) have been analyzed by Smets and Wouters (2003. DeJong et al. Variants of the smallscale New Keynesian DSGE model presented in Section 2 have been estimated by Rabanal and RubioRamírez (2005a. and Levin et al. Galí and Rabanal (2005) use an estimated DSGE model to study the effect of technology shocks on hours worked. and de Walque and Wouters (2004) have estimated multicountry DSGE models. (2004) analyze an open economy model that includes capital accumulation as well as numerous real and nominal frictions. whereas Chang et al. and Otrok (2001). Chang and Schorfheide (2003) study the importance of labor supply shocks and estimate a homeproduction model. Otrok (2001) ﬁts a real business cycle with habit formation and timetobuild to the data to assess the welfare costs of business cycles. Rabanal and Tuesta (2006). Lubik and Schorfheide (2006). Onatski and Williams (2004).b) for the U. Largescale models that include capital accumulation and additional real and nominal frictions along the lines of Christiano et al. and the Euro Area. Adolfson et al.
. A similar model is ﬁtted by Del Negro (2003) to Mexican data.
We will distinguish the assessment of absolute ﬁt from techniques that aim to determine the ﬁt of a DSGE model relative to some other model. Lancaster (2004). We can derive the predictive distribution of Y rep given the current state of knowledge: p(Y rep  Y ) = p(Y rep  )p(  Y )d (44)
Let h(Y ) be a test quantity that reﬂects an aspect of the data that we want to examine. Posterior Predictive Checks Predictive checks as a tool to assess the absolute ﬁt of a probability model have been advocated. Model Evaluation In Section 4 we discussed the estimation of a linearized DSGE model and reported results on the posterior distribution of the model parameters and variance decompositions. An introduction to Bayesian model checking can be found. Section 5. for instance. in books by Gelman et al. Relative model comparisons. and has a unimodal density. Let Y rep be a sample of observations of length T that we could have observed in the past or that we might observe in the future. for instance. Methods that determine whether actual data lie in the tail of a model’s data distribution potentially favor alternatives that make unreasonably diffuse predictions. posterior predictive checks have become a valuable tool in applied Bayesian analysis though they have not been used much in the context of DSGE models.Bayesian Analysis of DSGE Models
143
5. Nevertheless. A quantitative model check can be based on Bayesian pvalues.1. on the other hand. and Geweke (2005). and Section 5. A probability model is considered as discredited by the data if one observes an event that is deemed very unlikely by the model. The ﬁrst approach can be implemented by a posterior predictive model check and has the ﬂavor of a classical hypothesis test. Such model checks are controversial from a Bayesian perspective. Section 5. This section studies various techniques that can be used to evaluate the model ﬁt. are typically conducted by enlarging the model space and applying Bayesian inference and decision theory to the extended model space. Then one could compute probability of the tail
. by Box (1980).1 discusses Bayesian model checks. 5. Suppose that the test quantity is univariate. (1995). nonnegative. We tacitly assumed that model and prior provide an adequate probabilistic representation of the uncertainty with respect to data and parameters.2 reviews model posterior odds comparisons.3 describes a more elaborate model evaluation based on comparisons between DSGE models and VARs.
depicted in the left panels of Figure 8. Schorfheide
event by h(Y rep ) ≥ h(Y ) p(Y rep  Y )dY rep . Posterior Odds Comparisons of DSGE Models The Bayesian framework is naturally geared toward the evaluation of relative model ﬁt. lagged inﬂation and current interest rates. Researchers can place probabilities on competing models and assess alternative speciﬁcations based on their posterior odds.144
S.000th parameter draw of (s) generated with the RWM algorithm. whereas 5 (L) was not. and calculate h(Y rep ). To obtain draws from the posterior predictive distribution of h(Y rep ) we take every 5. 5. We use the following data transformations: the correlation between inﬂation and lagged interest rates. we use a graphical approach to illustrate the model checking procedure in the context of model 1 (L). An excellent survey on model comparisons based on posterior probabilities
To obtain draws from the unconditional distribution of yt we initialize s0 = 0 and generate 180 observations. An and F. We consider two cases: in the case of no misspeciﬁcation. and output growth and interest rates. A small tail probability can be viewed as evidence against model adequacy. Each panel of the ﬁgure depicts 200 draws from the posterior predictive distribution in bivariate scatter plots. Rather than constructing numerical approximations of the tail probabilities for univariate functions h(·). we would expect the actual values of h(Y ) to lie further in the tails of the posterior predictive distribution in the right panels than in the left panels. Indeed a visual inspection of the plots provides some evidence in which dimensions 1 (L) is at odds with data generated from a model that includes lags of output in the household’s Euler equation. whereas under misspeciﬁcation. output growth and lagged output growth. the intersections of the solid lines indicate the actual values h(Y ). (45)
where x ≥ a is the indicator function that is 1 if x ≥ and zero otherwise.
9
. discarding the ﬁrst 100. simulate a sample Y rep of 80 observations9 from the DSGE model conditional on (s) . illustrated in the two right panels of Figure 8. Since 1 (L) was generated from model 1 (L). The observed autocorrelation of output growth in 5 (L) is much larger and the actual correlation between output growth and interest rates is much lower than the draws generated from the posterior predictive distribution. the posterior is constructed based on data set 1 (L).2. Moreover. the posterior is obtained from data set 5 (L).
Bayesian Analysis of DSGE Models
145
FIGURE 8 Posterior predictive check for Model 1 (L). It is straightforward to verify that under a 0–1 loss function (the loss attached to choosing the wrong model is one). researchers take a shortcut and use posterior probabilities to justify the selection of a particular model speciﬁcation.
can be found in Kass and Rafterty (1995).0 p(Y j =1.T
=
i.4
 i) p(Y  j . Posterior model probabilities can be used to weigh predictions from different models. which is deﬁned in (39). Intersections of solid lines signify the observed sample moments. according to which the central bank does not respond to output at all and 2 = 0. We plot 200 draws from the posterior predictive distribution of various sample moments. 4
(46)
The key object in the calculation of posterior probabilities is the marginal data density p(Y  i ). = 5. This 0–1 loss function is an attractive benchmark in situations in which the researcher believes that all the important models are included in the analysis. that is. All subsequent analysis is then conditioned on the chosen model. even in situations in which all the models under consideration are misspeciﬁed.0 on the three competing speciﬁcations then posterior model probabilities can be computed by
i. In many instances. Moreover. suppose in addition to the speciﬁcation 1 (L) we consider a version of the New Keynesian model. However. the optimal decision is to select the highest posterior probability model. there is a model 4 (L).
. If we are willing to place prior probabilities i.3.0
j)
. 3.
i = 1. denoted by 3 (L) in which prices are nearly ﬂexible. For concreteness.
To make the numerical approximation efﬁcient. We will subsequently consider two numerical approaches: Geweke’s (1999b) modiﬁed harmonic mean estimator and Chib and Jeliazkov’s (2001) algorithm. that posterior odds (or their large sample approximations) asymptotically favor the DSGE model that is closest to the ‘true’ data generating process in the KullbackLeibler sense. It has been shown under various regularity conditions. e.. 1952) and the model comparison based on posterior odds captures the relative onestepahead predictive performance.
)d
. f ( ) should be chosen so that the summands are of equal magnitude. Phillips (1996) and FernándezVillaverde and RubioRamírez (2004). since the logmarginal data density can be rewritten as
T
ln p(Y 
)=
t =1 T
ln p(yt  Y t −1 .
(47)
where ln p(Y  ) can be interpreted as a predictive score (Good. Conditional on the choice of f ( ) an obvious estimator is ˆ pG (Y ) = 1 nsim
nsim
s=1
f ( (s) ) ( (s)  Y )p(
−1 (s) )
. Schorfheide
the selection of the highest posterior probability model has some desirable properties. The practical difﬁculty in implementing posterior odds comparisons is the computation of the marginal data density. (  Y )p( ) (48)
where f ( ) has the property that f ( )d = 1 (see Gelfand and Dey.146
S. d is the dimension of the
. .g.
(49)
where (s) is drawn from the posterior p(  Y ). 1994). An and F. f( )=
−1
(2 )−d/2  V  −1/2 exp −0 5( − ¯ ) V −1 ( − ¯ ) ( − ¯ ) V −1 ( − ¯ ) ≤ F −1 ( ) 2
d
×
(50)
Here ¯ and V are the posterior mean and covariance matrix computed from the output of the posterior simulator.
) )p(  Y t −1 . Moreover. Harmonic mean estimators are based on the identity 1 = p(Y ) f( ) p(  Y )d .
=
t =1
ln
p(yt  Y t −1 . Geweke (1999b) proposed to use the density of a truncated multivariate normal distribution.
r ( . If the posterior of is in fact normal then the summands in (49) are approximately constant.  Y ) was in the description of the algorithm.
(j )
(s)
. Chib and Jeliazkov (2001) use the following equality as the basis for their estimator of the marginal data density: p(Y ) = (  Y )p( ) p(  Y ) (51)
The equation is obtained by rearranging the Bayes theorem and has to hold for all . Y) . ˆ pCS (Y ) = ( ˜  Y )p( ˜ ) . However. let q( . ˜  Y ) be the proposal density for the transition from to ˜ . We calculate marginal data densities recursively (as a function of the number of MCMC draws) for the four chains that were initialized at different starting values as discussed in Section 4. The results are summarized in Figure 9.
. and ∈ (0. For the output gap rule all four chains lead to estimates of around −196 7 Figure 10 provides a comparison of Geweke’s versus Chib and Jeliazkov’s approximation of the marginal data density for the output gap rule speciﬁcation. While the numerator can be easily computed. F 2 is the cumulative density function of a 2 random d variable with d degrees of freedom. 1). (53)
where r ( . ˜  Y )q( (˜. ˆ p( ˜  Y ) (52)
where we replaced the generic in (51) by the posterior mode ˜ . Moreover. the denominator requires a numerical approximation.Bayesian Analysis of DSGE Models
147
parameter vector. A visual inspection of the plot suggests that both estimators converge to the same limit point. Thus.  Y ) given the posterior mode value ˜ . ˜ Y)
J −1
J j =1
Y)
. the modiﬁed harmonic mean estimator appears to be more reliable if the number of simulations is small. Then the posterior density at the mode can be approximated by ˆ p( ˜  Y ) =
n 1 nsim nsim s=1
(
(s)
. Within the RWM Algorithm denote the probability of moving from to by ( . We ﬁrst use Geweke’s harmonic mean estimator to approximate the marginal data density associated with 1 (L)/ 1 (L).  Y ) = min 1.
(54)
sim where (s) s=1 are sampled draws from the posterior distribution with the J RWM algorithm and (j ) j =1 are draws from q( ˜ .
An and F. For each Markov chain. log marginal data densities are computed recursively with Geweke’s modiﬁed harmonic mean estimator and plotted as a function of the number of draws.
. Data 1 (L).
FIGURE 10 Geweke vs.148
S. Log marginal data densities are computed recursively with Geweke’s modiﬁed harmonic mean estimator as well as the Chib–Jeliazkov estimator and plotted as a function of the number of draws. Schorfheide
FIGURE 9 Data densities from multiple chains – Model 1 (L). Chib–Jeliazkov data densities – Model 1 (L). Data 1 (L).
deviations of output from target coincide with deviations of inﬂation from its target value.Bayesian Analysis of DSGE Models
TABLE 4 Log marginal data densities based on Speciﬁcation Benchmark model 1 (L) Model with nearly ﬂexible prices 3 (L) No policy reaction to output 4 (L)
1 (L)
149
ln p(Y  −196 7 −245 6 −201 9
)
Bayes factor versus 1. We will subsequently present a procedure that allows us to document how the marginal data density of the DSGE model changes as
. the speciﬁcation of a prior distribution for the VAR parameter requires careful attention.00 exp[48 9] exp[5 2]
1 (L)
Notes: The log marginal data densities for the DSGE model speciﬁcations are computed based on Geweke’s (1999a) modiﬁed harmonic mean estimator. The marginal data density of model 4 (L) is equal to −201 9 and the Bayes factor of model 1 (L) versus model 4 (L) is ‘only’ e 5 . Hence. The ﬁrst step of the comparison is typically to compute posterior probabilities for the DSGE model and the VAR. Possible pitfalls are discussed in Sims (2003). can be interpreted as restrictions on a VAR representation. inﬂation will also be close to its target value. at least approximately. 5. The DSGE model implies that when actual output is close to the target ﬂexible price output. Given the fairly concentrated posterior of depicted in Figure 1 this result is not surprising.3. The model comparison results are summarized in Table 4. which can be used to detect the presence of misspeciﬁcations. as linearized DSGE models. Since the VAR parameter space is generally much larger than the DSGE model parameter space. data set 1 (L) provides very strong evidence against ﬂexible prices. which translates into a Bayes factor (ratio of posterior odds to prior odds) of approximately e 49 in favor of 1 (L). Comparison of a DSGE Model with a VAR The notion of potential misspeciﬁcation of a DSGE model can be incorporated in the Bayesian framework by including a more general reference model 0 into the model set. The marginal data density associated with 3 (L) is −245 6.
Based on data set 1 (L) we also estimate speciﬁcation 3 (L) in which prices are nearly ﬂexible and speciﬁcation 4 (L) in which the central bank does not respond to output. Vice versa. In a more general context this phenomenon is often called Lindley’s paradox. A VAR with a prior that is very diffuse is likely to be rejected even against a misspeciﬁed DSGE model. This mechanism makes it difﬁcult to identify the policy rule coefﬁcients and imposing an incorrect value for 2 is not particularly costly in terms of ﬁt. A natural choice of reference model in dynamic macroeconomics is a VAR.
10
In principle one could start from a VARMA model to avoid the approximation error.
XY (
)=
D
[xt yt ]
A VAR approximation of the DSGE model can be obtained from restriction functions that relate the DSGE model parameters to the VAR parameters.150
S.
∗
( )=
YY (
)−
YX (
)
−1 XX (
)
XY (
)
(56)
This approximation is typically not exact because the state–space representation of the linearized DSGE model generates moving average terms.
10
. This can be achieved by comparing the posterior distributions of interesting population characteristics such as impulse response functions obtained from the DSGE model and from a VAR representation that does not dogmatically impose the DSGE model restrictions. Its accuracy depends on the number of lags p.
∗
( )=
−1 XX (
)
XY (
). yt −p ] . To construct a DSGEVAR we proceed as follows. If the data favor the VAR over the DSGE model then it becomes important to investigate further the deﬁciencies of the structural model. We will document below that four lags are sufﬁcient to generate a fairly precise approximation of the model 1 (L). Building on work by Ingram and Whiteman (1994) the DSGEVAR approach of Del Negro and Schorfheide (2004) was designed to improve forecasting and monetary policy analysis with VARs. An and F. The framework has been extended to a model evaluation tool in Del Negro et al. . p ] . yt −1 . and the magnitude of the roots of the moving average polynomial. would become signiﬁcantly more cumbersome. (2006) and used to assess the ﬁt of a variant of the Smets and Wouters (2003) model. We will refer to the latter benchmark model as DSGEVAR and discuss an identiﬁcation scheme that allows us to construct structural impulse response functions for the vector autoregressive speciﬁcation against which the DSGE model can be evaluated. Consider a vector autoregressive speciﬁcation of the form yt =
0
+
1 yt −1
+ ··· +
p yt −p
+ ut . . however. Let D [·] be the expectation under DSGE model and deﬁne the autocovariance matrices
XX (
)=
D
[xt xt ]. the T × n matrices Y and U composed of rows yt and ut . The posterior computations. 1 . and the T × k matrix X with rows xt . Thus the VAR can be written as Y = X + U .
[ut ut ] =
(55)
Deﬁne the k × 1 vector xt = [1. the coefﬁcient matrix = [ 0 . Schorfheide
the crosscoefﬁcient restrictions that the DSGE model imposes on the VAR are relaxed.
(1 + )T − k. )p (  Y ) (59)
The subscript indicates the dependence of the posterior on the hyperparameter. which is emphasized in Del Negro et al. p( ). The joint posterior density of VAR and DSGE model parameters can be conveniently factorized as p ( . .
.
where W denotes the inverted Wishart distribution.  Y . Zeller (1971). It is straightforward to show.. n
XX (
). Bayesian analysis of this model requires the speciﬁcation of a prior distribution for the DSGE model parameters. and the misspeciﬁcation matrices.
(60)
11 Ingram and Whiteman (1994) constructed a VAR prior from a stochastic growth model to improve the forecast performance of the VAR. n (58)
−1
1 ( ). ∼ W T
∗ ∗
( ). T − k. ∼ Y. is a hyperparameter that scales the prior covariance matrix. The prior distribution is proper provided that T ≥ k + n. The prior is diffuse for small values of and shifts its mass closer to the DSGE model restrictions as −→ ∞. T
⊗
XX (
)
−1
.
=
∗
( )+
(57)
and capture deviations from the restriction functions The matrices ∗ ( ) and ∗ ( ). e.
⊗( T
) + X X )−1 . see Del Negro and Schorfheide (2004). their setup did not allow for the computation of a posterior distribution for . Rather than specifying a prior in terms of and it is convenient to specify it in terms of and conditional on . ∼ W (1 + )T
b( b(
). However. It has the property that its density is approximately proportional to the Kullback–Leibler discrepancy between the VAR approximation of the DSGE model and the . We assume  ∼  . In the limit the VAR is estimated subject to the restrictions (56).11 The prior distribution can be interpreted as a posterior calculated from a sample of T observations generated from the DSGE model with parameters . (2006).Bayesian Analysis of DSGE Models
151
In order to account for potential misspeciﬁcation of the DSGE model it is assumed that there is a vector and matrices and such that the data are generated from the VAR in (55) with the coefﬁcient matrices =
∗
( )+
. that the posterior distribution of and is also of the inverted Wishart normal form: Y.VAR. .g. Y) = p ( .
the closer the posterior mean of the VAR parameters is to ∗ ( ) and ∗ ( ). (2006) emphasize that the posterior of provides a measure of ﬁt for the DSGE model: high posterior probabilities for large
. then the posterior mean is close to the OLS estimate (X X )−1 X Y . The paper also shows that in large samples the resulting estimator of can be interpreted as a Bayesian minimum distance estimator that projects the VAR coefﬁcient estimates onto the subspace generated by the restriction functions (56). On the other hand. the values that respect the crossequation restrictions of the DSGE model. Del Negro et al. if = (n + k)/T . A natural criterion for the choice of in a Bayesian framework is the marginal data density p (Y ) = p (Y  )p( )d (61)
For computational reasons we restrict the hyperparameter to a ﬁnite grid . If one assigns equal prior probability to each grid point then the normalized p (Y )’s can be interpreted as posterior probabilities for . Since the empirical performance of the DSGEVAR procedure crucially depends on the weight placed on the DSGE model restrictions. Schorfheide
b(
where
) and
b(
) are the given by
XX (
b(
)= )=
1+
)+
1 XX 1+ T
−1
1+
YX (
XY
+
1 XY 1+ T
b(
1 ( T (1 + )T
YY (
)+Y Y)−( T
) + Y X) )+X Y)
×( T
XX (
) + X X )−1 ( T
XY (
Hence the larger the weight of the prior.152
S. An and F. it is important to consider a datadriven procedure to determine . The marginal posterior density of can be obtained through the marginal likelihood p (Y  ) =  T
XX (
) + X X − 2 (1 + )T
XX (
n
b(
)−
(1+ )T −k 2
 T × (2 )−nT /2 2 2
)− 2  T
n i=1
n
∗(
)−
T −k 2
n((1+ )T −k) 2 n( T −k) 2
[((1 + )T − k + 1 − i)/2]
n i=1
[( T − k + 1 − i)/2]
A derivation is provided in Del Negro and Schorfheide (2004).
So far. at least qualitatively. which we represent by ut =
tr t
(63)
. According to the DSGE model. To the extent that misspeciﬁcation is mainly in the dynamics. we maintain the triangularization of its covariance matrix and replace the rotation in (65) with the function ∗ ( ) that appears in (64). say between 0. on the other hand. the identiﬁcation procedure can be interpreted as matching. An impulse response function comparison requires the identiﬁcation of structural shocks in the context of the VAR. Del Negro and Schorfheide (2004) proposed to construct as follows. the onestepahead forecast errors ut are functions of the structural shocks t . The initial tr impact of t on yt in the VAR. is given by yt
t VAR
=
tr
(65)
To identify the DSGEVAR. then a comparison between DSGEVAR( ˆ ) and DSGE model impulse responses can yield important insights about the misspeciﬁcation of the DSGE model. and is an orthonormal tr is the Cholesky decomposition of matrix that is not identiﬁable based on the likelihood function associated with (55).
. as opposed to the covariance matrix of innovations. Using a QR factorization.Bayesian Analysis of DSGE Models
153
values of indicate that the model is well speciﬁed and a lot of weight should be placed on its implied restrictions.5 and 2. Let A0 ( ) be the contemporaneous impact of t on yt according to the DSGE model. the VAR given in (55) has been speciﬁed in terms of reduced form disturbances ut . the initial response of yt to the structural shocks can be can be uniquely decomposed into yt
t DSGE
= A0 ( ) =
∗ tr (
)
∗
( ). the shortrun responses of the VAR with those from the DSGE model. The rotation matrix is chosen so that in absence of misspeciﬁcation the DSGE’s and the DSGEVAR’s impulse responses to all shocks approximately coincide. Deﬁne ˆ = argmax p (Y )
∈
(62)
If p (Y ) peaks at an intermediate value of . The estimation of the DSGEVAR can be implemented as follows.
(64)
where ∗ ( ) is lower triangular and ∗ ( ) is orthonormal.
154
S. We now implement the DSGEVAR procedure for 1 (L) based on artiﬁcially generated data. The marginal data
TABLE 5 Log marginal data densities for Speciﬁcation DSGE Model DSGEVAR = ∞ DSGEVAR = 5 00 DSGEVAR = 1 00 DSGEVAR = 75 DSGEVAR = 50 DSGEVAR = 25 Notes: See Table 4. Table 5 reports log marginal data densities for the DSGEVAR as a function of for data sets 1 (L) and 5 (L). the two values are indeed quite close. With respect to data set 5 (L) the DSGE model is misspeciﬁed. This misspeciﬁcation is captured in the inverse U shape of the marginal data density function. 5. If the VAR approximation of the DSGE model were exact. The ﬁrst row reports the marginal data density associated with the state space representation of the DSGE model. Moreover. For = ∞ the misspeciﬁcation matrices estimate the VAR by imposing the restrictions = ∗ ( ) and = ∗ ( ). We assume that lies on the grid = 25. All the results reported subsequently are based on a DSGEVAR with p = 4 lags. (s) . then the values in the ﬁrst and second row of Table 5 would be identical. For each draw (s) generate a pair (s) . 75. An and F. compute the orthonormal matrix (s) = ∗ ( ) as described above. 1. Use the RWM algorithm to generate draws (s) from the marginal posterior distribution p (  Y ). Use Geweke’s modiﬁed harmonic mean estimator to obtain a numerical approximation of p (Y ).
1 (L) 1 (L)
DSGEVARs
5 (L)
−196 66 −196 88 −198 87 −206 57 −209 53 −215 06 −231 20
−245 62 −244 54 −241 95 −238 59 −239 40 −241 81 −253 61
. ∞ and assign equal probabilities to each grid and are zero and we point. which has been directly generated from the state space representation of 1 (L). Schorfheide
MCMC Algorithm for DSGEVAR 1. For data set 1 (L). which is representative for the shapes found in empirical applications in Del Negro and Schorfheide (2004) and Del Negro et al. the marginal data densities are increasing as a function of . The value = 1 00 has the highest likelihood. 2. (2006). indicating that there is no evidence of misspeciﬁcation and that it is best to dogmatically impose the DSGE model restrictions. 5. by sampling from the W − distribution. 3. whereas the second row corresponds to the DSGEVAR(∞). The ﬁrst step of our analysis is to construct a posterior distribution for the parameter . Moreover.
Moreover. the = ∞ and the = ˆ DSGEVAR. Figure 11 depicts posterior mean impulse responses for the state–space representation of the DSGE model as well as the DSGEVAR(∞).
. In order to assess the nature of the misspeciﬁcation for 5 (L) we now turn to the impulse response function comparison. and DSGEVAR( = ∞) – Model 1 (L). these impulse responses can either be compared based on the same parameter values or their respective posterior estimates of the DSGE model parameters. DSGE. In principle. DSGEVAR( = ∞) responses: posterior mean (dashed) and pointwise 90% probability bands (dotted). DSGE model responses computed from state–space representation: posterior mean (solid).Bayesian Analysis of DSGE Models
155
densities drop substantially for ≥ 5 and ≤ 0 5. In both
FIGURE 11 Impulse responses. there are three different sets of impulse responses to be compared: responses from the state–space representation of the DSGE model. Data 5 (L).
Rather than looking at the dynamic responses of the endogenous variables to the structural shocks. and interest rates satisfy the following relationships in response
. According to the loglinearized DSGE model output. While the relaxation of the DSGE model restrictions has little effect on the propagation of the technology shock. we can also ask to what extent are the optimality conditions that the DSGE model imposes satisﬁed by the DSGEVAR( ˆ ) responses. To obtain the dotted bands in the ﬁgure. however.156
S. inﬂation returns quickly to its steadystate level after a contractionary policy shock. and the reversion to the steadystate is much slower. the DSGE and DSGEVAR(∞) responses are virtually indistinguishable. the inﬂation and interest rate responses to a monetary policy shock are markedly different. The (dotted) bands in Figure 11 depict pointwise 90% probability intervals for the DSGEVAR(∞) and reﬂect posterior parameter uncertainty with respect to . Both sets of responses are constructed from the ˆ posterior of . Roughly speaking. The DSGEVAR( ˆ ) response of inﬂation. on the other hand. The discrepancy. The discrepancy between the posterior mean responses (solid and dashed lines) indicates the magnitude of the error that is made by approximating the state–space representation of the DSGE model by a fourthorder VAR. the interest rate falls below steadystate in period four and stays negative for several periods. is transformed from the – space into impulse response functions because they are easier to interpret. calculate their differences. To assess the misspeciﬁcation of the DSGE model we compare posterior mean responses of the = ∞ (dashed) and ˆ (solid) DSGEVAR in Figure 12. namely. An and F. indicating that the approximation error is small. A brief description of the computation can clarify the interpretation. we take the upper and lower bound of these probability intervals and shift them according to the posterior mean of the = ∞ response. In a general equilibrium model a misspeciﬁcation of the households’ decision problem can have effects on the dynamics of all the endogenous variables. Schorfheide
cases we use the same posterior distribution of . has a slight hump shape. the ﬁgure answers the question: to what extent do the ˆ estimates of the VAR coefﬁcients deviate from the DSGE model restriction functions ∗ ( ) and ∗ ( ). For each draw of from the ˆ DSGEVAR posterior we compute the ˆ and the = ∞ response functions. The (dotted) 90% probability bands reﬂect uncertainty with respect to the discrepancy between the = ∞ and ˆ responses. Except for the response of output to the government spending shock. and construct probability intervals of the differences. the one obtained from the DSGEVAR(∞). According to the VAR approximation of the DSGE model. Unlike in the DSGEVAR(∞). inﬂation.
and DSGEVAR( = 1) – Model 1 (L).Bayesian Analysis of DSGE Models
157
FIGURE 12 Impulse responses. DSGEVAR( = ∞). DSGEVAR( = ∞) posterior mean responses (dashed). Data 5 (L). Pointwise 90% probability bands (dotted) signify shifted probability intervals for the difference between = ∞ and = 1 responses. DSGEVAR( = 1) posterior mean responses (solid).
to the shock
R .t :
0 = yt − ˆ = Rt −
t [yt +1 ]
1 + (Rt −
t [ ˆ t +1 ] R) 1 ˆt
t [ t +1 ])
(66) (67)
R)
0 = ˆt −
R .t R Rt −1
− yt ˆ − (1 − ˆ 2 yt
− (1 −
(68)
Figure 13 depicts the path of the righthand side of Equations (66) to (68) in response to a monetary policy shock. Based on draws from the joint posterior of the DSGE and VAR parameters we can ﬁrst calculate identiﬁed
.
Data 5 (L).
VAR responses to obtain the path of output. which relaxes the DSGE model restrictions.1) indicates a substantial violation of the consumption Euler equation. The dashed lines show the posterior mean responses according to the state–space representation of the DSGE model. and the monetary policy rule. Schorfheide
FIGURE 13 Impulse responses. and DSGEVAR( = 1) – Model 1 (L). An and F. (2006) use the DSGEVAR framework also to compare multiple DSGE models. The bottom three panels of the ﬁgure are based on the DSGEVAR( ˆ ). This ﬁnding is encouraging for the evaluation strategy since the data set 5 (L) was indeed generated from a model with a modiﬁed Euler equation. and interest rate. While this section focused mostly on the assessment of a single DSGE model. and in a second step use the DSGE model parameters to calculate the wedges in the Euler equation. Del Negro et al. DSGEVAR( = ∞). While the price setting relationship and the monetary policy rule restriction seem to be satisﬁed. DSGE model responses: posterior mean (dashed). Schorfheide (2000) proposed a lossfunctionbased evaluation framework for (multiple) DSGE models that
. Panel (2. DSGEVAR responses: posterior mean (solid) and pointwise 90% probability bands (dotted). the price setting equation. The top three panels of Figure 13 show that both for the DSGE model as well as the DSGEVAR(∞) the three equations are satisﬁed. and the solid lines depict DSGEVAR responses. inﬂation.158
S. at least for the posterior mean.
This lossfunctionbased approach nests DSGE model comparisons based on marginal data densities as well as assessments based on a comparison of model moments to sample moments. t = 1. = p (st  st −1 .
i=1
Hence one can draw N particles from p st  Y t −1 . 2) Prediction: Draw onestepahead forecasted particles p st  Y t −1 . 6. By induction. for each i. the linear Kalman ﬁlter cannot be used to compute the likelihood function. Recently FernándezVillaverde and RubioRamírez (2005) use the ﬁlter to construct the likelihood for DSGE model solved with a projection method.Bayesian Analysis of DSGE Models
159
augments potentially misspeciﬁed DSGE models with a more general reference model. A brief description of the procedure is given below. Particle Filter i 1) Initialization: Draw N particles s0 . from the initial distribution p(s0  ). . We use Y to denote the × n matrix with rows yt . The evaluation of the likelihood function can be implemented with a particle ﬁlter. . Arulampalam et al. numerical methods have to be used to integrate the latent state vector. Instead. as special cases. dst −1 ≈ 1 N
N
sti ˜
N i=1
from
p st  sti−1 . also known as a sequential Monte Carlo ﬁlter. and then examines the ability of the DSGE models to predict the population characteristics. (2002) provide an excellent survey. . N .4. which approximate p st −1  Y t −1 . Gordon et al. The vector st has been deﬁned in Section 2. . ) p st −1  Y t −1 .
by generating one
. in period t we start with the particles N sti−1 i=1 . This prediction is evaluated under a loss function and the risk is calculated under an overall posterior distribution that averages the predictions of the DSGE models and the reference model according to their posterior probabilities. Note that p st  Y t −1 . . popularized in the calibration literature. the particle ﬁlter has been applied to analyze the stochastic volatility models by Pitt and Shephard (1999) and Kim et al. i = 1. (1993) and Kitagawa (1996) are early contributors to this literature. constructs a posterior distribution for population characteristics of interest such as autocovariances and impulse responses. (1998). NONLINEAR METHODS For a nonlinear/nonnormal state–space model. We follow their approach and use the particle ﬁlter for DSGE model 1 solved with a secondorder perturbation method. In economics. particle from p st  sti−1 .
˜
i N t i=1
j
and note that the importance sampler updated density p st  Y t . ) and hence is equally weighted.
i = 1. In the remainder of this section we will refer to these parameters as “true” values as opposed to “pseudotrue” values to
12 The sequential importance sampler (SIS) is a Monte Carlo method which utilized this aspect. Schorfheide
3) Filtering: The goal is to approximate p st  Y t . given by sti .
which amounts to updating the probability weights assigned to the particles N sti i=1 . . p st  Y t −1 . . ti i=1 so that ˜ Pr sti = sti =
i t. all but one particle will have negligible weight.160
S.
approximates12 the
N
4) Resampling: We now generate a new set of particles sti i=1 by resampling with replacement N times from an approximate discrete N ˜ representation of p st  Y t . We begin by computing the nonnormalized importance weights ˜ ˜ ˜ ti = p yt  sti . An and F.
.
. 5) Likelihood Evaluation: According to (70) the log likelihood function can be approximated by
T N
ln (  Y ) = ln p(y1  ) +
T t =2
ln p yt  Y
t −1
. (69)
p yt  Y t −1 . but it is well documented that it suffers from a degeneracy problem. where after a few iterations.
≈
t =1
1 N
N
˜ ti
i=1
To compare results from ﬁrst and secondorder accurate DSGE model solutions we simulated artiﬁcial data of 80 observations from the quadratic approximation of model 1 with parameter values reported in Table 3 under the heading 1 (Q ). = p yt  st . dst ≈ 1 N
N
˜ ti
i=1
(70)
Now deﬁne the normalized weights
i t
=
˜ ti N j =1
˜t sti . The denominator in (69) can be approximated by p yt  Y t −1 .N
The resulting sample is in fact an iid sample from the discretized density of p(st  Y t . p st  Y t −1 . . = p yt  st .
is set to 0. z .1. Second. Conﬁguration of the Particle Filter There are several issues concerning the practical implementation of the particle ﬁlter. we have to choose the number of particles. it is desirable to have many particles.33. 1/g is chosen as . which implies that the steadystate consumption amounts to 85% of output. gt . R . . data. interest rate. a measurement error is introduced to each measurement equation. the computation of p(st  Y t ) requires the solution of a system of quadratic equations.2 to match the average output growth rate.
13
. where x is the steadystate of xt . which implies a steadystate markup of 11% that is consistent with the estimates of Basu (1995). especially enough particles to capture the tails of
Without the measurement error two complications arise: since p(yt  st ) degenerates to a pointmass and the distribution of st is that of a multivariate noncentral 2 the evaluation of p(yt  Y t −1 ) becomes difﬁcult. and zt . t ] . 1. First. We generate x0 by drawing 0 from its unconditional distribution (the law of motion for t is linear) and setting x−1 = x.55.68. Moreover. it is straightforward to calculate the unconditional distribution of st associated with the vector autoregressive representation (33). and inﬂation between the artiﬁcial data and the real U. we need a scheme to draw from the initial state distribution. The implied quadratic adjustment cost coefﬁcient. t ] . whose standard deviation is set as 20% of that of each observation. which implies less price stickiness than in the linear case. . The degree of imperfect competition.1. and (A) are chosen as .85. where t is composed of the exogenous processes R .13 6. there are some exceptions. g . Other exogenous process parameters. g .0. is chosen to be .t . (Q ) . For the secondorder approximation we rewrite the state transition equation (37) as follows. These true parameter values by and large resemble the parameter values that have been used to generate data from the loglinear DSGE model speciﬁcations. We also make monetary policy less responsive to the output gap by setting 2 = 125. For computational convenience.S. For a good approximation of the prediction error distribution. However.S. The slope coefﬁcient of the Phillips curve. These values are different from the mean of the historical observations because in the nonlinear version of the DSGE model means differ from steady states. r (A) .Bayesian Analysis of DSGE Models
161
be deﬁned later.t can be expressed as xj . . The law of motion ˆ ˆ for the endogenous state variables xj . and 3. data. In the linear case. are chosen so that the second moments are matched to the U. is 53. p (s0  ).t =
(0) j
+
(1) j wt
+ wt
(2) j wt
(71)
where wt = [xt −1 . Decompose st = [xt . and z .
162
S. The ﬁltering step is implemented as FORTRAN mex library. we found that 40. In our application. The differences between the peaks are most pronounced for the steadystate parameters r (A) and (A) . Based on a sample of 80 observations we can generate 1. the number of particles affects the performance of the resampling algorithm. An and F. In our implementation. The parameter values for and 1/g are ﬁxed at their true values because they do not enter the linear likelihood. the quadratic loglikelihood peaks around the true parameter values.000 particles were enough to get stable evalutaions of the likelihood given the size of the measurement errors.000 particles in 1 h and 40 min (6. If the number of particles is too small. A Look at the Likelihood Function We will begin our analysis by studying proﬁles of likelihood functions. but it takes too much time to evaluate the likelihood function using the particle ﬁlter. more particles are needed to obtain an accurate approximation of the likelihood function and to ensure that the posterior weights ti do not assign probability one to a single particle. We implement most of the procedure in MATLAB (2005) so that we can exploit the symbolic toolbox to solve the DSGE model. For some parameters
. Two features are noticeable in this comparison. It is optimal in terms of variance in the class of unbiased resampling schemes. while the linear loglikelihood attains its maximum around the pseudotrue parameter values.0 sec per draw) on the AMD64 3000+ machine with 1 GB RAM. While in the linearized model steady states and longrun averages coincide. For brevity.000 observations.2. the particle ﬁlter can be readily implemented on a good desktop computer. they differ if the model is solved by a secondorder approximation. Schorfheide
p(st  Y t ). However.000 draws are generated in 24 sec in our Matlab routines and 3 sec in our GAUSS routines. First. which are obtained by ﬁnding the mode of the linear likelihood using 3. Linear methods are more than 200 times faster: 1. the stratiﬁed resampling scheme proposed by Kitagawa (1996) is applied. Figure 14 shows the loglikelihood proﬁles along each structural parameter dimension. we evaluate the linear likelihood around the pseudotrue parameter values. If the measurement errors in the conditional distribution p(yt  st ) are small. Moreover. It is natural to evaluate the quadratic likelihood function in the neighborhood of the true parameter values given in Table 3. the resampling will not work well.000 draws with 40. 6. Recall that we had introduced a reducedform Phillips curve parameter in (32) because and cannot be identiﬁed with the loglinearized DSGE model. we will refer to the likelihood obtained from the ﬁrstorder (secondorder) accurate solution of the DSGE model as linear (quadratic) likelihood. Even though the estimation procedure involves extensive random sampling. so we can call it as a function in MATLAB.
Bayesian Analysis of DSGE Models
163
FIGURE 14 Linear vs. Likelihood proﬁle for each parameter: 1 (L)/Kalman ﬁlter (dashed) and 1 (Q )/particle ﬁlter (solid). quadratic approximations: likelihood proﬁles. Vertical lines signify true (dotted) and pseudotrue (dasheddotted) values.000 particles are used. Data Set 1 (Q ). 40.
.
which implies that the nonlinear approach is able to extract more information on the structural parameters from the data. Unlike in the analysis of the likelihood function we now substitute the adjustment cost parameter with the Phillipscurve parameter in the quadratic approximation of the DSGE model. The right panel of Figure 15 indicates that the iso.1 and standard deviation 0. The Posterior We now compare the posterior distributions obtained from the linear and nonlinear analysis.lines intersect with the contours of the quadratic likelihood function. the linear likelihood is ﬂat for the values of and that imply a constant . Schorfheide
FIGURE 15 Linear vs. is constant along dashed lines.lines. which
. As noted before. which depicts the linear and quadratic likelihood contours in . This is illustrated in Figure 15. 40. g does not enter the linear version of the model and hence the loglikelihood proﬁle along this dimension is ﬂat. Data set 1 (Q ). The contours of the linear likelihood trace out iso. Large dot indicates true and pseudotrue parameter value. Moreover. A beta prior with mean . it appears that the monetary policy parameter such as 1 can be more precisely estimated with the quadratic approximation.05 is placed on . quadratic approximations: Likelihood contours.3.000 particles are used.space. 6. Contours of likelihood (solid) for 1 (L) and 1 (Q ). In the quadratic approximation. which suggests that and are potentially separately identiﬁable. the loglikelihood is slightly sloped in 1/g = c/y dimension.164
S. our sample of 80 observations is too short to obtain a sharp estimate of the two parameters. the quadratic approximation can identify some structural parameters that are unidentiﬁable under the loglinearized model. For instance.
the quadratic loglikelihood function is more concave than the linear one. However. however. In both cases we use the same prior distribution reported in Table 3. An and F.
The quadratic posterior mean of 1 is larger than the linear posterior mean and closer to the true value. The draws tend to be more concentrated as we move from prior to posterior. ﬁxed at their true values. and 1/g . This result suggests that 1 (Q ) exhibits nonlinearities that can be detected by the likelihoodbased estimation methods. The log marginal likelihoods are −416 06 for 1 (L) and −408 09 for 1 (Q ). which imply a Bayes factor of about e 8 in favor of the quadratic approximation.85 and standard deviation . Conditional on this parameter is not identiﬁable under the linear approximation and hence its posterior is not updated. This Hessian is used for both the linear and the nonlinear analysis. for most of our parameters. there is a clear negative correlation between and according to the quadratic posterior. We use scaling factors . there are a few exceptions. we evaluate the Hessian to be used in the RWM algorithm at the (linear) mode without ﬁxing and 1/g . the quadratic posterior of is markedly different from the prior as it concentrates around 0. linear posterior. Figures 16 and 17 depict the draws from prior. The parameter does. While. however. and hence a beta prior with mean 0. Moreover. Our simulation results with respect to linear versus nonlinear estimation are in line with the ﬁndings reported in FernándezVillaverde and RubioRamírez (2005) for a version of the neoclassical stochastic growth model. After that. both for simulated as well
. In their analysis the nonlinear solution combined with the particle ﬁlter delivers a substantially better ﬁt of the model to the data as measured by the marginal data density. We ﬁrst compute the posterior mode for the linear speciﬁcation with the additional parameters.000 draws from each distribution are generated and every 500th draw is plotted. Table 6 reports the log marginal data densities for our linear ( 1 (L)) and quadratic ( 1 (Q )) approximations of the DSGE model based on Data Set 1 (Q ). The quadratic posterior means for the steadystate parameters r (A) and (A) are also greater than the corresponding linear posterior means. and quadratic posterior distribution. respectively. which is consistent with the positive difference between “true” and “pseudotrue” values of these parameters. The Markov chains are initialized in the neighborhood of the pseudotrue and true parameter values. so that the inverse Hessian reﬂects the prior variance of the additional parameters. Note that 1 − 1/g is the steadystate government spending–output ratio.05.Bayesian Analysis of DSGE Models
165
roughly covers micro evidences of 10–15% markup. Now consider the parameter that determines the demand elasticity for the differentiated products.1 is used for 1/g . We use the RWM algorithm to generate draws from the posterior distributions associated with the linear and quadratic approximations of the DSGE model.25 (quadratic) to target an acceptance rate of about 30%. Indeed.5 (linear) and . linear and quadratic posteriors are quite similar. 100. affect the secondorder terms that arise in the quadratic approximation of the DSGE model.
40. 100. quadratic approximation I. An and F.166
S. Intersections of solid and dotted lines signify true and pseudotrue parameter values.000 particles are used. Every 500th draw is plotted.000 draws from the prior and posterior distributions. Data set 1 (Q ).
. Schorfheide
FIGURE 16 Posterior draws: linear vs.
.Bayesian Analysis of DSGE Models
167
FIGURE 17 Posterior draws: Linear vs. See Figure 16. quadratic approximation II.
7. Georg Strasser.
ln p(Y 
1)
Bayes factor vesus exp[7 97] 1. We also thank our discussants as well as Marco Del Negro. the authors point out that the differences in terms of point estimates. Thomas Lubik. Jesús FernándezVillaverde. seminar participants at Columbia University. An and F. Moreover. The views expressed here are those of the authors and do not necessarily reﬂect the views of the Federal Reserve Bank of Philadelphia or the Federal Reserve System. CONCLUSIONS AND OUTLOOK There exists by now a large and growing body of empirical work on the Bayesian estimation and evaluation of DSGE models. many challenges remain. Hence it is important to develop methods that incorporate potential model misspeciﬁcation in the measures of uncertainty constructed for forecasts and policy recommendations. We thank Herman van Dijk and Gary Koop for inviting us to write this survey paper for Econometric Reviews. linear solution 1 (L) Benchmark model. ACKNOWLEDGMENTS Part of the research was conducted while Schorfheide was visiting the Research Department of the Federal Reserve Bank of Philadelphia. may have important effects on the moments of the model. and
. Nevertheless. Schorfheide
1 (Q )
TABLE 6 Log marginal data densities based on Speciﬁcation Benchmark model. Lack of identiﬁcation of structural parameters is often difﬁcult to detect since the mapping from the structural form of the DSGE model into a state–space representation is highly nonlinear and creates a challenge for scientiﬁc reporting as audiences are typically interested in disentangling information provided by the data from information embodied in the prior. techniques to estimate DSGE models solved with nonlinear methods have become implementable thanks to advances in computing technology. This paper has surveyed the tools used in this literature and illustrated their performance in the context of artiﬁcial data. Juan RubioRamírez. While most of the existing empirical work is based on linearized models. for whose hospitality he is grateful. Model misspeciﬁcation is and will remain a concern in empirical work with DSGE models despite continuous efforts by macroeconomists to develop more adequate models.00
1 (Q )
−416 06 −408 09
as actual data. quadratic solution 1 (Q ) Notes: See Table 4. Model size and dimensionality of the parameter space pose a challenge for MCMC methods.168
S. although relatively small in magnitude.
On the indeterminacy of new keynesian economics. E. Some evidence on the importance of sticky prices. Journal of Monetary Economics 50(8):1751–1768. Canova. Laborsupply shifts and economic ﬂuctuations. Journal of Economic Dynamics and Control 25(6–7):979–999. The solution of linear difference models under rational expectations. Manuscript. IGIER and Bocconi University. (2004). Statistical inference in calibrated models. Box. FernándezVillaverde. (1995). O. J.Bayesian Analysis of DSGE Models
169
the participants of the 2006 EABCN Training School in Eltville for helpful comments and suggestions. Journal of Political Economy 113(1):1–45. Villani. (2004). Klenow.. Eichenbaum. American Economic Review 92(5):1498–1520. (2000). G. (1989). (2001). R. Accuracy of stochastic perturbation methods: the case of asset pricing models. A. Juillard. IEEE Transactions on Signal Processing 50(2):173–188. (2005). International Economic Review 30(4):889–920. Canova. M. J. Multivariate linear rational expectations models: characterization of the nature of the solutions and their fully recursive computation. (1992).. Journal of Economic Dynamics and Control 30(12):2477–2508. M. M.. F.. J. F.. S.. Arulampalam. H. L. Canova. NJ: Princeton University Press. Bierens.. Back to square one: identiﬁcation issues in DSGE models. American Statistician 49(4):327–335. Chang. Sloan Foundation. American Economic Review 85(3):512–531. N. (2005). M. Gordon. Basu. J. Manuscript. Pesaran.. M. Econometrica 48(5):1305–1312. Farmer. (2002). S.. S. I. M.. M. Schorfheide. Y. E. S. Econometric analysis of linearized singular dynamic stochastic general equilibrium models. Aruoba. Schorfheide gratefully acknowledges ﬁnancial support from the Alfred P. Bils. (2004). J. G. Federal Reserve Board of Governors. Christiano. Christiano. T. Evans. Jeliazkov. J. F. Lindé. REFERENCES
Adolfson. F. J. Chib.. (2007). forthcoming. Greenberg. Collard. Gomes. Current realbusiness cycle theories and aggregate labormarket ﬂuctuations. (1980). A reliable and computationally efﬁcient algorithm for imposing the saddle point property in dynamic models. Anderson. S.. M. Clapp. ECB Working Paper 323. Monetary policy and the evolution of the US economy: 19482002. L. Journal of the Royal Statistical Society Series A 143(4):383–430. Intermediate goods and business cycles: implications for productivity and welfare. Marginal likelihood from the metropolishastings output. A tutorial on particle ﬁlters for online nonlinear/nonGaussian Bayesian tracking.. (2002). L. Canova. (2001). P. L. Journal of Applied Econometrics 9:S123–S144. Altug. (2003).. Sala.. IGIER and Universitat Pompeu Fabra. Comparing solution methods for dynamic equilibrium economies. P.. Chib. Christiano. (2004). Schorfheide. Methods for Applied Macroeconomic Research. J. Journal of International Economics. M. American Economic Review 82(3):430–450. (2002). F. Nominal rigidities and the dynamic effects of a shock to monetary policy. F. Y. Timetobuild and aggregate ﬂuctuations: some new evidence. L. RubioRamírez. Beyer. Binder. (1994). Manuscript. Maskell. Sampling and Bayes inference in scientiﬁc modelling and robustness. F. (1997).. S.
. H. Journal of Political Economy 112(5):947–985.. Kahn. (2004). C. Chang. (2007). Learningbydoing as a propagation mechanism. S. Solving dynamic equilibrium models by a methods of undetermined coefﬁcients. B. S. C. F.. Journal of the American Statistical Association 96(453):270–281. J. Bayesian estimation of an open economy DSGE model with incomplete passthrough. Laséen... Understanding the metropolishastings algorithm.. (1980). Journal of Econometrics 136(2):595–627. (1995). Computational Economics 20(1–2):21–55. Blanchard. Econometric Theory 13(6):877–888. Eichenbaum.
.. J. International Economic Review 45(2):643–673. GAUSS (2004). New York: Chapman & Hall. E. Renault. Bayesian model choice: asymptotics and exact calculations. A method for taking models to the data. DYNARE (2005).. National Bank of Belgium. Contemporary Bayesian Econometrics and Statistics. Ireland. F. M... Rabanal. Del Negro. D. Schorfheide.. Ingram. Perturbation methods for general dynamic stochastic models. (1970).. F. Geweke. R. H. J. (2006). F. Biometrika 57(1):97–109. pp. (2002).. X. Federal Reserve Bank of Atlanta. C. H. B.. Wouters. FernándezVillaverde. RubioRamírez. DeJong. L. IEE ProceedingsF 140(2):107–113.aptech. Fear of ﬂoating: a structural estimation of monetary policy in Mexico. Manuscript.. Salmond. Asymptotic expansions of posterior expectations. An open economy DSGE model linking the Euro area and the U. M. G.. Annals of the Institute for Statistical Mathematics 40:297–309..170
S. Journal of Econometrics 136(2):397–430.S. (1989).. Galí. Rubin. (2004). H. (1999b).. (1996). Journal of Business and Economic Statistics 14(1):1–9. M. J. Hoboken: John Wiley & Sons. The cyclical behavior of skill acquisition. (2005). Hammersley. Monte Carlo sampling methods using markov chain and their applications. Priors from general equilibrium models for VARs. J. Accuracy in simulation. Bayesian Data Analysis. L. Software available from http://www. Novel approach to nonlinear and nongaussian Bayesian state estimation. distributions. Schorfheide
Crowder. K.com/. Heckman. NBER Macroeconomics Annual 2004. Whiteman. Ingram. Hoover Institute.. (2005).. Rational decisions. (2003). Cambridge. (2007). M. Ingram. MA: MIT Press. FernándezVillaverde. DeJong. J. Manuscript. I. Jin. Geweke. (1952). Journal of Applied Econometrics 20(7):891–910. economy.. Guay. J. J. Judd. J. Stern. Del Negro.. Journal of Economic Dynamics and Control 28(6):1205–1226. M. MA: MIT Press. (1995). M. B. J. Berkowitz. (1998). A. Journal of Monetary Economics 34(3):497–510. (2001). Estimating dynamic equilibrium economies: linear versus nonlinear likelihood. Comparing dynamic equilibrium models to data: aBayesian approach. Dynamic equilibrium economies: a framework for comparing models and data. Gelman.. Technology shocks and aggregate ﬂuctuations: how well does the RBC model ﬁt postwar U. C. (1994).. Manuscript. eds. W. A. Journal of Econometrics 98(2):203–223. Dridi. J. A. D. Judd. Supplanting the Minnesota prior: forecasting macroeconomic time series using real business cycle model priors. Numerical Methods in Economics. Cambridge. Geweke. Review of Economic Dynamics 4(3):536–561. R. Journal of Econometrics 123(1):153–187. J. K. DeJong. software available from http://cepremap.H.. (2000). Econometrica 57(6):1317–1339. (2004). D.cnrs.. M. 225–228. Hansen.. (1994). K. F. (1988). Review of Economic Studies 61(1):3–17. New York: John Wiley. P. F.. F. development and communication. Handscomb. Ohanian.S. RubioRamírez. D. F. L. (2004). J. E. J.. (1996). P. Whiteman. F. University of Iowa. F. Journal of the Royal Statistical Society Series B 14(1):107–114. E. Journal of Economic Perspectives 10(1):87–104. Schorfheide. N. Manuscript. (2005). Indirect inference and calibration of dynamic stochastic general equilibrium models. A. Gelfand. N.. J. L. (1964). C. Smets. H. Gordon. Smith. (1998).. Geweke. and densities for stochastic processes. Marcet. Whiteman. Dey. Bayesian inference in econometric models using Monte Carlo integration. N. Journal of Business and Economic Statistics. N. R. D. B..
. F. The empirical foundations of calibration. On the ﬁt and forecasting performance of new keynesian models. D. (1993). Review of Economic Studies 65(3):433–452. (1999a). Ingram.. B. J. A Bayesian approach to calibration. (2004). Carlin. data. A Bayesian approach to dynamic macroeconomics. J. K. Del Negro. Wouters. N. (1994). J. In: Gertler. Rogoff. K. Computational experiments and reality. Good. An and F.. forthcoming. F. Monte Carlo Methods. Hastings.fr/dynare/. Econometric Reviews 18(1):1–126. W. Diebold. Den Haan. Journal of the Royal Statistical Society Series B 56(3):501–514. Using simulation methods for bayesian econometric models: inference. P. D. A. J. de Walque.
J. G. (1998). P. Schorfheide. F. J. pp. dissertation. (2004). Journal of Computational and Graphical Statistics 5(1):1–25. K. Constructing and estimating a realistic optimizing model of monetary policy.. Laforte. Bayesian Comparison of Dynamic Macroeconomic Models. R. Manuscript. and the likelihood principle in nonstationary time series models. (1999). S. Shephard.. Bayes factors. The computational experiment: an econometric tool. N. Using the generalized Schur form to solve a multivariate linear rational expectations model. Econometrica 64(4):763–812. Journal of the American Statistical Association 94(446):590–599. R. Filtering via simulation: Auxiliary particle ﬁlters. Schaumburg. On measuring the welfare cost of business cycles. Do central banks respond to exchange rate ﬂuctuation – a structural investigation. S. Lubik. K. NBER Macroeconomics Annual 2005. eds. K. (2005). Preston. Rotemberg. Levin. Ph.. T. In: Pesaran. M. Lancaster. (1995).Bayesian Analysis of DSGE Models
171
Justiniano. Econometrica 66(2):359–380. Cambridge. Journal of Chemical Physics 21(6):1087–1092. Kim.. Kim.. Sims. J. M. Cambridge. A. A. American Economic Review 94(1):190–217. Prescott. Kim. (1994). Williams. Kydland. E... F. (1994). Rosenbluth. (1982). Rafterty. pp. H. (2004).. G. (1996). (1953). Schorfheide. (1998). Journal of Monetary Economics 47(1):61–92.. E. (1996)... M.P. Princeton University. E. P. Rogoff.. Testing for indeterminacy: an application to US monetary policy. C. Alexei O. Version 7. Kim. Empirical and policy performance of a forwardlooking monetary model.. Journal of Monetary Economics 45(2):329–359. Journal of Economic Dynamics and Control 24(10):1405–1423. Oxford: Basil Blackwell. Kim. A. Software available from http://www. A. Econometric analysis of calibrated macroeconomic models. W. N. Monte Carlo ﬁlter and smoother for nongaussian nonlinear state space models. Chib. International Monetary Fund and Columbia University. Onatski. Stochastic volatility: likelihood inference and comparison with ARCH Models. Prescott. In: Stanley. J. R. T. Otrok. Comparing monetary policy rules in an estimated equilibrium model of the US economy. F. (2000). Calculating and using second order accurate solutions of discrete time dynamic equilibirum models. W. G. Lubik. (2004). McGrattan. M. estimation. E. (2004). Manuscript. Equation of state calculations by fast computing machines. F. MA: MIT Press.
. MA: MIT Press. Kass. E. King. Williams. 356–390. C. The solution of singluar linear difference systems under rational expectations. Teller. Schorfheide. B. University of Minnesota. Federal Reserve Board of Governors. A. NBER Macroeconomics Annual 2005.. Journal of the American Statistical Association 90(430): 773–795. University of Pennsylvania. Journal of Economic Perspectives 10(1):69–85. J. A. MA: MIT Press. J. Kim. (1998). Journal of International Economics 60(2):471–500. N. Metropolis. In: Mark G. E. Pagan. A Bayesian look at new open economy macroeconomics. Phillips.. Teller.. Spurious welfare reversal in international business cycle models. (1998).. Manuscript. Econometrica 50(6):1345–1370. Sims. (2004). A. J. (1995). pp. N.. Kitagawa.com/. T. E. MATLAB (2005). A. Lubik.mathworks. N.. K. Kim. Econometric model determination. Large sample properties of posterior densities. pp.. A. Time to build and aggregate ﬂuctuations.. A. J. (2001).. C. bayesian information criterion. S... (2006). Cambridge... Manuscript. and model ﬁt. (2003). Klein. Manuscript. Rogoff.Y. (2006). Wickens.. Watson. Review of Economic Studies 65(3):361–393. J. In: Mark. A. E. C. Princeton University. Monetary policy under uncertainty in microfounded macroeconometric models. 81–118. Toward a modern macroeconomic model usable for policy analysis. C. Williams. E.D.. A. Shephard. Journal of Monetary Economics 33(3):573–601. eds.. 313–366... C. (2003). H. H.. N. A. (1996). Leeper. Small open economy Dsge models – speciﬁcation. E. 229–287. Rosenbluth. An Introduction to Modern Bayesian Econometrics. Kydland.. C. International Economic Review 39(4):1015–1026. eds. Oxford: Blackwell Publishing. Handbook of Applied Econometrics: Macroeconomics. (2000). S. NBER MAcroeconomics Annual 1994. LandonLane. F. F. Kim. H. B. The macroeconomic effects of distortionary taxation. T. eds. M. Pitt.
Journal of European Economic Association 1(5):1123–1175. P. (2003). Uribe. Revising beliefs in nonidentiﬁed models.. New York: Springer Verlag. B. Journal of Political Economy 101(6):1011–1041. HigherOrder perturbation solutions to dynamic. Solving linear rational expectations models. Computational Methods for the Study of Dynamic Economies. Optimal simple and implementable monetary and ﬁscal rules. R. (1993). (2006). A toolkit for analyzing nonlinear dynamic stochastic models easily.172
S. (2005a). On the asymptotic behaviour of posterior distributions. Journal of Political Economy 97(2):251–287. J. Two models of measurements and the investment accelerator. Rabanal. G.. Rotemberg. Computational Economics 20(1–2):1–20. G. A. (1997). M. Comparing new keynesian models in the Euro area: A bayesian approach. Schorfheide. Probability models for monetary policy decisions. Uhlig. Rabanal. M. Journal of Economic Perspectives 10(1):105–120. NJ: Princeton University Press. Oxford. H. CEPR Discussion Paper No. C. Smets. F. Federal Reserve Bank of San Fransisco. E. Sims. 5957. Uhlig. F. F. 30–61. pp. Anderson. (2005). M. Journal of Monetray Economics. (2005). Sims. Swanson. Princeton. (1996).. An estimated dynamic stochastic general equilibrium model of the Euro area. MA: MIT Press. J. An and F. SchmittGrohé. NBER Macroeconomics Annual 1997. A. H. Princeton University. RubioRamírez. Journal of Applied Econometrics 8:S63–S84. eds. J. Rabanal. (2005b). Loss functionbased evaluation of DSGE models. Journal of Applied Econometrics 20(2):161–183. Journal of Monetary Economics 52(6):1152–1166. A. Journal of Business and Economic Statistics 8(1):1–17. P. (1993). J. J.. Walker. Sims. EuroDollar real exchange rate dynamics in an estimated twocountry model: What is important and what is not. P. A. Smith. Solving nonlinear stochastic growth models: a comparison of alternative solution methods. Schorfheide
Poirier. Econometric Theory 14(4):483–509. White... W. A... (1989). J. Smets. Manuscript. Wouters. F. Sargent. Comparing shocks and frictions in US and Euro area business cycles: a Bayesian DSGE Approach. (1998). (1999). A. S. Tuesta. F.. (2003). Monte Carlo Statistical Methods. Manuscript. (1999). Macroeconomics and methodology. RubioRamírez. S. Wouters. A. Woodford.. Watson. Federal Reserve Bank of Atlanta Working Paper 2005–8. A.. Casella. Levin.. (1971). Taylor.. UK: Oxford University Press. Duke University. Maximum likelihood estimation of misspeciﬁed models. Review of Economic Dynamics 8(2):392–419.. M. (1990). Measures of ﬁt for calibrated models. Zeller. In: Marimón. (1982). A. In: Bernanke. J. Comparing new keynesian models of the business cycle: A Bayesian approach. Journal of the Royal Statistical Society Series B 31(1):80–88. (2003).. Rotemberg. Cambridge. Journal of Applied Econometrics 15(6):645–670. discretetime rational expectations models. M. R. Schorfheide. T. J. Learning and monetary policy shifts. V. (2000). Scott. pp. Estimating nonlinear timeseries models using simulated vector autoregressions. C. C. D. Interest and Prices: Foundations of a Theory of Monetary Policy. New York: John Wiley & Sons. H. (1969). (2006). (2005). F. Woodford. R. Introduction to Bayesian Inference in Econometrics. 297–346. C. (2002). B. eds. Econometrica 50(1):1–25. Robert. An optimizationbased econometric framework for the evaluation of monetary policy.
.