
Lecture 4: Multivariate Linear Time Series (Vector Autoregressions)

Prof. Massimo Guidolin and Dr. Francesco Rotondi

20192 – Financial Econometrics

Winter/Spring 2023
Overview
▪ Multivariate strong vs. weak stationarity

▪ Multivariate white noise and testing for it

▪ Vector autoregressions: reduced vs. structural forms

▪ From structural to reduced-form VARs and back: identification issues

▪ Recursive Choleski identification

▪ Stationarity and moments of VARs, model specification

▪ Impulse response functions and variance decompositions

▪ Granger causality tests


Motivation and Preliminaries
▪ Because markets and institutions are highly intercorrelated, in
financial applications we need to jointly model different time series
to study the dynamic relationship among them
▪ This requires multivariate time series analysis
▪ Instead of focusing on a single variable, we consider a set of
variables, 𝒚𝑡 ≡ [𝑦1,𝑡 𝑦2,𝑡 … 𝑦𝑁,𝑡 ]′ with t = 1, 2, …, T
o The resulting sequence is called an N-dimensional (discrete) vector stochastic process
▪ The most important example is the class of vector autoregressive (VAR) models
o Flexible models in which a researcher needs to know very little ex-
ante theoretical information about the relationship among the
variables to guide the specification of the model
o All variables are treated as a-priori endogenous
▪ But first we need to generalize the concept of (weak) stationarity to the case of N-dimensional vector time series and discuss how to compute the first two moments
Multivariate Weak vs. Strong Stationarity
▪ A vector process {y_t} is weakly (covariance) stationary if its first two moments exist and are time-invariant: E[y_t] = μ for all t, and the cross-covariance matrices Γ_h ≡ E[(y_t − μ)(y_{t−h} − μ)′] depend only on the lag h and not on t
Multivariate Weak vs. Strong Stationarity
▪ The object that appears in the definition is new, but it collects familiar objects: given a sample {y_t}, t = 1, 2, …, T, the cross-covariance matrix can be estimated by
  Γ̂_h = (1/T) Σ_{t=h+1}^T (y_t − ȳ)(y_{t−h} − ȳ)′,   h ≥ 0
o Here ȳ = (1/T) Σ_{t=1}^T y_t is the vector of sample means
o When h = 0 we have the sample covariance matrix Γ̂_0
o The cross-correlation matrix is ρ̂_h = D⁻¹ Γ̂_h D⁻¹, where D is the diagonal matrix that collects sample standard deviations on its main diagonal

Multivariate Weak vs. Strong Stationarity

[Figure: cross-sample correlogram, i.e., the off-diagonal elements of ρ̂_h for h = 0, 1, …, 24]

▪ Iff (i.e., if and only if) a series is stationary, all cross-serial correlations will decay to 0


▪ Strict stationarity has an identical definition, except that it now involves the joint multivariate PDF of the variables
o So we have both a time-series dimension, f(y_1), f(y_2), …, f(y_T), but also each of the f(y_t) is an N × 1 multivariate density
Multivariate White Noise and Portmanteau Tests
▪ Ljung and Box’s (1978) Q-statistic to jointly test whether several (m) consecutive autocorrelation coefficients equal zero can be generalized to the multivariate case, see Hosking (1980):
▪ H₀: ρ_1 = ρ_2 = … = ρ_m = 0 vs. H₁: ρ_i ≠ 0 for some i = 1, 2, …, m can be tested using:
  Q(m) = T² Σ_{h=1}^m (1/(T − h)) tr(Γ̂_h′ Γ̂_0⁻¹ Γ̂_h Γ̂_0⁻¹)
o tr(A) is the trace of a matrix, simply the sum of the elements on its main diagonal
o Q(m) has an asymptotic (large sample) χ²_{N²m} distribution (which may be a poor approximation in small samples)
o Note that the null hypothesis corresponds to y_t being a (multivariate) white noise, i.e., Γ_h = 0 for all h ≥ 1
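o As an illustration of the mechanics only, the following minimal sketch computes the statistic above for a T×N data array; the function name hosking_q and the simulated data are our own illustrative assumptions, not part of any library

```python
import numpy as np
from scipy import stats

def hosking_q(y, m):
    """Multivariate portmanteau statistic Q(m) (Hosking, 1980) and its chi-square p-value."""
    y = np.asarray(y, dtype=float)
    T, N = y.shape
    yc = y - y.mean(axis=0)                          # demeaned observations
    gamma = lambda h: yc[h:].T @ yc[:T - h] / T      # sample cross-covariance matrix at lag h
    g0_inv = np.linalg.inv(gamma(0))
    q = 0.0
    for h in range(1, m + 1):
        gh = gamma(h)
        q += np.trace(gh.T @ g0_inv @ gh @ g0_inv) / (T - h)
    q *= T ** 2
    return q, stats.chi2.sf(q, N ** 2 * m)           # asymptotic chi-square with N^2 m dof

# The test should not reject for simulated multivariate white noise
rng = np.random.default_rng(0)
print(hosking_q(rng.standard_normal((500, 2)), m=10))
```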

Vector Autoregressions: Reduced-Form vs. Structural
▪ A VAR is a system regression model that treats all the N variables as endogenous and allows each of them to depend on p lagged values of itself and of all the other variables, plus a vector of (serially) uncorrelated shocks:
  y_t = a_0 + A_1 y_{t−1} + A_2 y_{t−2} + … + A_p y_{t−p} + u_t
▪ For instance, when N = 2, yt = [xt zt ]’ or [R1,t R2,t]’, one example
concerning two asset returns may be:
  x_t = a_{10} + a_{11} x_{t−1} + a_{12} z_{t−1} + u_{1,t}
  z_t = a_{20} + a_{21} x_{t−1} + a_{22} z_{t−1} + u_{2,t}
Vector Autoregressions: Reduced-Form vs. Structural
o Σ_u ≡ Var[u_t] is the covariance matrix of the shocks
o When it is a full matrix, it implies that contemporaneous shocks to
one variable may produce effects on others that are not captured by
the VAR structure
▪ If the variables included on the RHS of each equation in the VAR
are the same then the VAR is called unrestricted and OLS can be
used equation-by-equation to estimate the parameters
o This means that estimation is very simple
▪ When the VAR includes restrictions, one should instead rely on system estimators, which in this case often take the form of Generalized Least Squares (GLS), Seemingly Unrelated Regressions (SUR), or full Maximum Likelihood (MLE)
▪ Because the VAR(p) model, yt = a0 + A1yt-1 + A2yt-2 + ... + Apyt-p+ ut
does not include contemporaneous effects, it is said to be in
standard or reduced form, to be opposed to a structural VAR
▪ In a structural VAR(p), the contemporaneous effects do not need to
go through the covariance matrix of the residuals, ut
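o To see how simple unrestricted reduced-form estimation is in practice, here is a minimal sketch using statsmodels’ VAR class; the simulated data and the lag length p = 2 are purely illustrative assumptions

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(1)
y = rng.standard_normal((500, 2))   # illustrative stand-in for two return series

results = VAR(y).fit(2)             # equation-by-equation OLS on the unrestricted VAR(2)
print(results.params)               # intercepts a_0 and slope matrices A_1, A_2
print(results.sigma_u)              # estimated covariance matrix of the reduced-form shocks u_t
```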
Vector Autoregressions: Reduced-Form vs. Structural
▪ What is the difference? Consider the simple N = 2, p = 1 case of
  x_t = b_{10} − b_{12} z_t + γ_{11} x_{t−1} + γ_{12} z_{t−1} + ε_{x,t}
  z_t = b_{20} − b_{21} x_t + γ_{21} x_{t−1} + γ_{22} z_{t−1} + ε_{z,t}
where both x_t and z_t are stationary, and ε_{x,t} and ε_{z,t} are uncorrelated white noise processes, also called structural errors
▪ Using matrices, this VAR(1) model may be re-written as:
  B y_t = Γ_0 + Γ_1 y_{t−1} + ε_t,   with B = [1  b_{12}; b_{21}  1]   (Structural VAR)
▪ Pre-multiplying both sides by B⁻¹ (this will be possible iff b_{12} b_{21} ≠ 1),
  y_t = a_0 + A_1 y_{t−1} + u_t,   where a_0 = B⁻¹ Γ_0, A_1 = B⁻¹ Γ_1, u_t = B⁻¹ ε_t   (Reduced-form VAR)
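o A quick numerical sketch of this pre-multiplication step (all parameter values below are made up purely for illustration):

```python
import numpy as np

b12, b21 = 0.4, 0.3                        # contemporaneous coefficients (illustrative)
B    = np.array([[1.0, b12], [b21, 1.0]])  # structural VAR: B y_t = G0 + G1 y_{t-1} + eps_t
G0   = np.array([0.1, 0.2])
G1   = np.array([[0.5, 0.1], [0.2, 0.4]])
Seps = np.diag([1.0, 2.0])                 # diagonal covariance of the structural shocks

Binv = np.linalg.inv(B)                    # invertible because b12 * b21 != 1
a0 = Binv @ G0                             # reduced-form intercept
A1 = Binv @ G1                             # reduced-form slope matrix
Su = Binv @ Seps @ Binv.T                  # covariance of the reduced-form errors u_t = B^{-1} eps_t
print(a0, A1, Su, sep="\n")                # Su has a nonzero off-diagonal unless b12 = b21 = 0
```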
Vector Autoregressions: Reduced-Form vs. Structural
▪ The “new” error terms are composites of the two primitive shocks:
  u_{1,t} = (ε_{x,t} − b_{12} ε_{z,t}) / (1 − b_{12} b_{21}),   u_{2,t} = (ε_{z,t} − b_{21} ε_{x,t}) / (1 − b_{12} b_{21})
▪ What are the properties of the reduced-form errors? Recall that ε_{x,t} and ε_{z,t} were uncorrelated white noise processes; then u_{1,t} and u_{2,t} have zero mean, constant variance, and are serially uncorrelated
Vector Autoregressions: Reduced-Form vs. Structural
▪ The reduced-form shocks u_{1,t} and u_{2,t} will be correlated even though the structural shocks are not:
  Cov(u_{1,t}, u_{2,t}) = −(b_{21} σ²_x + b_{12} σ²_z) / (1 − b_{12} b_{21})²
which shows that they are uncorrelated only if b_{12} = b_{21} = 0


▪ This is very important: unless the variables are contemporaneously uncorrelated in the structural VAR (b_{12} = b_{21} = 0), a reduced-form VAR will generally display correlated shocks
o If VARs are just extensions of AR models, under what conditions will they be stationary? Stay tuned...
Vector Autoregressions: Reduced-Form vs. Structural
▪ A structural VAR cannot be estimated directly by OLS: because of
the contemporaneous feedbacks, each contemporaneous variable
is correlated with its own error term
o Standard estimation requires that the regressors be uncorrelated
with the error term or a simultaneous equation bias will emerge
o An additional drawback of structural models is that contemporaneous terms cannot be used in forecasting, where VARs are popular
▪ However there is no such problem in estimating the VAR system in
its reduced form; OLS can provide estimates of a0 and A1, A2, …
o Moreover, from the residuals, it is possible to calculate estimates of
the variances of and of the covariances between elements of ut
o The issue is whether it is possible to recover all of the information
present in the structural primitive VAR
▪ Is the primitive system identifiable given OLS estimates?
▪ The lack of identification may be overcome if one is prepared to
impose appropriate restrictions on the primitive, structural system
Identifying Structural from Reduced-Form VARs
▪ The reason is clear if we compare the number of parameters of the
primitive system with the number recovered from the estimated
VAR model
o The structural (primitive) system contains 8 mean parameters (b_{10}, b_{20}, b_{12}, b_{21}, and the four γ_{ij}) + 2 variances of the structural shocks = 10 parameters
o The estimated reduced-form VAR yields 6 mean parameters (a_{10}, a_{20}, and the four a_{ij}) + 3 distinct elements of Σ_u (two variances and one covariance) = 9 parameters
o 9 vs. 10: unless one is willing to restrict one of the parameters, it is not possible to identify the primitive system and the structural VAR is under-identified
▪ One way to identify the model is to use the type of recursive
system proposed by Sims (1980): we speak of triangularizations
▪ In our example, it consists of imposing a restriction on the
primitive system such as, for example, b21 = 0
▪ As a result, while zt has a contemporaneous impact on xt, the
opposite is not true
Identifying Structural from Reduced-Form VARs
▪ In a sense, shocks to zt are more primitive, enjoy a higher rank, and
move the system also through a contemporaneous impact on xt
▪ The VAR(1) now acquires a triangular structure:
  x_t = a_{10} + a_{11} x_{t−1} + a_{12} z_{t−1} + u_{1,t}
  z_t = a_{20} + a_{21} x_{t−1} + a_{22} z_{t−1} + u_{2,t}
▪ This corresponds to imposing a Choleski decomposition on the
covariance matrix of the residuals of the VAR in its reduced form
▪ Indeed, now we can re-write the relationship between the pure shocks (from the structural VAR) and the regression residuals as
  u_{1,t} = ε_{x,t} − b_{12} ε_{z,t},   u_{2,t} = ε_{z,t}
Recursive Choleski Identification
▪ Working out the full algebra (with b_{21} = 0),
  a_{10} = b_{10} − b_{12} b_{20},   a_{20} = b_{20}
  a_{11} = γ_{11} − b_{12} γ_{21},   a_{12} = γ_{12} − b_{12} γ_{22},   a_{21} = γ_{21},   a_{22} = γ_{22}
  Var(u_1) = σ²_x + b²_{12} σ²_z,   Var(u_2) = σ²_z,   Cov(u_1, u_2) = −b_{12} σ²_z
o Notice that by estimating the 6 reduced-form mean parameters, it is possible to pin down the 6 structural mean parameters; the same holds for variances and covariances: 9 equations, 9 unknowns
▪ We say that the triangularized VAR(1) is exactly identified

Recursive Choleski Identification
▪ In fact, this method is quite general and extends well beyond this VAR(1), N = 2 example: in an N-variable VAR(p), B is an N×N matrix because there are N residuals and N structural shocks
▪ Exact identification requires (N² − N)/2 restrictions placed on the relationship btw. regression residuals and structural innovations
▪ Because a Choleski decomposition is triangular, it forces exactly (N² − N)/2 values of the B matrix to equal zero
o Because with N = 2, (2² − 2)/2 = 1, you see that b_{21} = 0 was sufficient in our example
▪ There are as many Choleski decompositions as possible orderings of the variables, a combinatorial factor of N!
o Because a Choleski identification scheme imposes a specific ordering, we are introducing a number of (potentially arbitrary) assumptions on the contemporaneous relationships among variables (see the sketch below)
o Choleski decompositions are deliberate in the restrictions they place, but these tend not to be based on theoretical assumptions
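o The mechanics can be sketched in a few lines: a lower-triangular factor of the reduced-form covariance matrix provides one admissible set of restrictions, and reordering the variables changes it (the covariance matrix below is illustrative)

```python
import numpy as np

Su = np.array([[1.0, 0.6],
               [0.6, 2.0]])          # illustrative reduced-form residual covariance matrix

W = np.linalg.cholesky(Su)           # lower-triangular W with W @ W.T = Su
print(W)                             # candidate impact matrix under the ordering (variable 1, variable 2)

P = np.array([[0, 1], [1, 0]])       # permutation that reverses the ordering of the two variables
W_rev = P @ np.linalg.cholesky(P @ Su @ P) @ P
print(W_rev)                         # upper-triangular in the original ordering: a different identification
```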
Stationarity of VAR Processes
▪ Alternative identification schemes are possible (but they are more
popular in macroeconomics than in finance)
▪ For a general VAR(p), algebra shows that
  y_t = μ + Σ_{i=0}^∞ Θ_i u_{t−i},   Θ_0 = I_N
which is the vector moving average (VMA) infinite representation
o The coefficient matrices Θ_1, Θ_2, Θ_3, … are complex functions of the original (reduced-form) coefficients
o μ = E[y_t] is the unconditional mean of the VAR process
▪ In a VAR(1), we have
  μ = (I_N − A_1)⁻¹ a_0   and   Θ_i = A_1^i, i.e., increasing powers of the A_1 matrix


▪ For such dependence to fade progressively away as the time distance between y_t and past innovations grows, Θ_i must converge to zero as i goes to infinity
o This means that all the N eigenvalues of the matrix A_1 must be less than 1 in modulus, in order to avoid that A_1^i either explodes or converges to a nonzero matrix as i goes to infinity
Stationarity and Moments of VAR Processes
o The requirement that all the eigenvalues of A_1 are less than one in modulus is a necessary and sufficient condition for y_t to be stable (and, thus, stationary, as stability implies stationarity), that is:
  det(I_N − A_1 z) = 0   ⟹   |z| > 1
▪ This is the multivariate extension of Wold’s representation theorem
▪ Assuming now stationarity, in a VAR(1) case:
  μ = E[y_t] = (I_N − A_1)⁻¹ a_0
  vec(Γ_0) = (I_{N²} − A_1 ⊗ A_1)⁻¹ vec(Σ_u)
▪ These hold if and only if (I_N − A_1) is non-singular, which requires that, again, all the eigenvalues of A_1 are less than 1 in modulus
▪ These unconditional moments must be contrasted with the conditional moments:
  E[y_t | ℱ_{t−1}] = a_0 + A_1 y_{t−1},   Var[y_t | ℱ_{t−1}] = Σ_u
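o A minimal sketch of these VAR(1) formulas (the parameter values are illustrative):

```python
import numpy as np

a0 = np.array([0.2, 0.1])
A1 = np.array([[0.5, 0.1],
               [0.2, 0.4]])                 # illustrative reduced-form parameters
Su = np.array([[1.0, 0.3],
               [0.3, 0.8]])                 # covariance matrix of the reduced-form shocks
N  = A1.shape[0]

print(np.abs(np.linalg.eigvals(A1)))        # stability: all eigenvalues of A1 must be < 1 in modulus

mu = np.linalg.solve(np.eye(N) - A1, a0)    # unconditional mean: (I - A1)^(-1) a0

# unconditional covariance: vec(Gamma_0) = (I - A1 kron A1)^(-1) vec(Sigma_u)
Gamma0 = np.linalg.solve(np.eye(N * N) - np.kron(A1, A1), Su.flatten()).reshape(N, N)
print(mu, Gamma0, sep="\n")
```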

Conditional vs. Unconditional Moments
o While the unconditional covariance matrix is a function of both the
covariance matrix of the residuals, u, and of the matrix A1,
conditioning on past information, the covariance matrix of yt is the
same as the covariance matrix of the residuals, u
▪ For instance, in the case of the US monthly interest rate data on 1-
month and 10-year Treasuries, we have (t-statistics in […]):
[Estimated coefficient matrices and residual covariance matrix not reproduced here]
o The conditional moments are obviously different, e.g., the conditional mean moves with the most recent observations, while the conditional covariance matrix equals Σ_u
Generalizations to VAR(p) Models
▪ At some cost of algebra complexity, these findings generalize
o If (I_N − A_1 − … − A_p) is non-singular, μ = E[y_t] = (I_N − A_1 − … − A_p)⁻¹ a_0, assuming the series is weakly stationary
o The conditional mean differs from the unconditional one, as E[y_t | ℱ_{t−1}] = a_0 + A_1 y_{t−1} + … + A_p y_{t−p}
o The expressions for conditional and unconditional covariance


matrices are harder to derive (think of Yule-Walker equations)
o As for stationarity, a VAR(p) model is stable (thus stationary) as long as
  det(I_N − A_1 z − … − A_p z^p) = 0   ⟹   |z| > 1
o The roots of the characteristic polynomial should all exceed one in modulus (i.e., they should lie outside the unit circle); equivalently, all eigenvalues of the companion matrix must lie inside the unit circle, as sketched below
▪ The typical estimation outputs,
o OLS equation-by-equation for unrestricted reduced-form models
o MLE/GLS otherwise (more complex estimators for structural VARs)
have standard structure but deal with N + N²p + N(N+1)/2 parameters
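o The stability condition for a VAR(p) can be checked numerically through the companion matrix, as in this minimal sketch (the function name and coefficient matrices are illustrative):

```python
import numpy as np

def is_stable(A_list):
    """True if the VAR(p) with slope matrices [A_1, ..., A_p] is stable."""
    N, p = A_list[0].shape[0], len(A_list)
    companion = np.zeros((N * p, N * p))
    companion[:N, :] = np.hstack(A_list)        # first block row: [A_1 A_2 ... A_p]
    companion[N:, :-N] = np.eye(N * (p - 1))    # identity blocks below the first block row
    return bool(np.all(np.abs(np.linalg.eigvals(companion)) < 1))

A1 = np.array([[0.5, 0.1], [0.2, 0.4]])
A2 = np.array([[0.1, 0.0], [0.0, 0.1]])
print(is_stable([A1, A2]))                      # True: this VAR(2) is stable
```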
VAR(p) Model Specification
▪ Increasing the order of a VAR reduces the (absolute) size of the
residuals and improves the fit, but damages its forecasting power
because of the risk of overfitting
o If the lag length is p, each of the N equations will contain Np coefficients plus the intercept
▪ How do we appropriately select or even test for p?
o Such a lag choice is then common to all equations, as opposed to restricted models in which each equation is specified separately (by t- or F-tests), which would however rule out simple equation-by-equation OLS estimation
▪ We can use the likelihood ratio (LR) test – want to test the
hypothesis that a set of variables was generated from a Gaussian
VAR with p0 lags against the alternative specification of p1 lags
▪ Under the assumption of normally distributed shocks,
  LRT(p₀, p₁) = (T − Np₀ − 1)[ln|Σ̃_u^{p₀}| − ln|Σ̃_u^{p₁}|]   ~asy   χ²_{N²(p₁−p₀)}
o |·| denotes the determinant of a matrix and N²(p₁ − p₀) is the number of restrictions that are tested
VAR(p) Model Specification
o Large values of the LRT trigger a rejection of the null hypothesis that p₀ lags are sufficient, as they correspond to a large difference ln|Σ̃_u^{p₀}| − ln|Σ̃_u^{p₁}|, an indication that increasing the number of lags reduces the residual sum of squares by a lot
▪ LR tests can only be used to perform a pairwise comparison of two
VARs, one obtained as a restricted version of the other (nested)
▪ The recipe is then to first specify the largest VAR and then proceed to pare it down until the null hypothesis can be rejected
o LRT is valid only under the assumption that errors are normally
distributed; without distributional assumptions, it is unclear
whether it may have any merit
▪ An alternative approach is to minimize a multivariate version of the information criteria, e.g.,
  AIC(p) = ln|Σ̂_u(p)| + 2K/T     SBIC(p) = ln|Σ̂_u(p)| + K ln(T)/T     H-Q(p) = ln|Σ̂_u(p)| + 2K ln(ln T)/T
where K = N²p + N is the total number of estimated mean parameters
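o Both routes can be sketched in a few lines: the pairwise LR test written above and the multivariate information criteria as computed by statsmodels’ select_order; the data, p₀, p₁ and the maximum lag are illustrative assumptions, and sigma_u (a degrees-of-freedom-adjusted estimate) is used as a stand-in for Σ̃_u

```python
import numpy as np
from scipy import stats
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(2)
y = rng.standard_normal((400, 3))            # illustrative (T, N) data set
T, N = y.shape
model = VAR(y)

# (1) LR test of p0 lags (null) against p1 > p0 lags (alternative)
p0, p1 = 1, 4
s0 = model.fit(p0).sigma_u                   # residual covariance under the null
s1 = model.fit(p1).sigma_u                   # residual covariance under the alternative
lrt = (T - N * p0 - 1) * (np.log(np.linalg.det(s0)) - np.log(np.linalg.det(s1)))
print(lrt, stats.chi2.sf(lrt, N ** 2 * (p1 - p0)))

# (2) Multivariate information criteria over lag orders 1, ..., 8
print(model.select_order(maxlags=8).selected_orders)
```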
Forecasting with VAR Models
o The various criteria suggest it would be prudent to estimate larger VAR models
▪ Loss functions that lead to the minimization of the mean squared forecast error (MSFE) are the most widely used
▪ The minimum-MSFE time-t prediction at a forecast horizon h is the conditional expected value:
  ŷ_{t+h|t} = E[y_{t+h} | ℱ_t]
Impulse Response Functions
o The formula can be used recursively to compute h-step-ahead predictions starting with h = 1:
  ŷ_{t+1|t} = a_0 + A_1 y_t + … + A_p y_{t−p+1}
  ŷ_{t+h|t} = a_0 + A_1 ŷ_{t+h−1|t} + … + A_p ŷ_{t+h−p|t},   with ŷ_{t+j|t} = y_{t+j} for j ≤ 0
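o A sketch of this recursion for a VAR(1), with illustrative parameter values and function name (statsmodels’ results.forecast implements the same logic for a general VAR(p)):

```python
import numpy as np

a0 = np.array([0.2, 0.1])
A1 = np.array([[0.5, 0.1],
               [0.2, 0.4]])              # illustrative, stable VAR(1)
y_T = np.array([1.0, -0.5])              # last observed value

def forecast_var1(a0, A1, y_last, h):
    """Recursive minimum-MSFE forecasts: y_hat(T+j|T) = a0 + A1 y_hat(T+j-1|T)."""
    path, y_hat = [], y_last
    for _ in range(h):
        y_hat = a0 + A1 @ y_hat
        path.append(y_hat)
    return np.array(path)

print(forecast_var1(a0, A1, y_T, h=5))   # forecasts converge towards the unconditional mean
```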

▪ In essence, the same results that apply to AR models generalize


▪ VAR models are used in practice with the goal of understanding
the dynamic relationships between the variables of interest

▪ Let’s use again a simple VAR(1) model:
  y_t = a_0 + A_1 y_{t−1} + u_t,   y_t = [y_{1,t}  y_{2,t}]′
▪ We know that a stationary VAR has a MA(∞) representation:
  y_t = μ + Σ_{i=0}^∞ A_1^i u_{t−i}
Impulse Response Functions
o The two error processes, u_{1,t} and u_{2,t}, can be represented in terms of the two sequences ε_{1,t} and ε_{2,t}, i.e., the structural innovations:
  u_t = B⁻¹ ε_t,  i.e.,  u_{1,t} = (ε_{1,t} − b_{12} ε_{2,t})/(1 − b_{12} b_{21}),   u_{2,t} = (ε_{2,t} − b_{21} ε_{1,t})/(1 − b_{12} b_{21})
▪ Therefore the model can be re-written as
  y_t = μ + Σ_{i=0}^∞ Φ_i ε_{t−i},   with Φ_i = A_1^i B⁻¹
▪ The coefficients in Φ_i (impact multipliers) can be used to generate the effects of shocks to the innovations ε_{1,t} and ε_{2,t} on the time path of y_{1,t} and y_{2,t}
▪ The cumulative effect of a one-unit shock (or impulse) to a structural innovation on an endogenous variable after H periods can then be obtained by computing the sum of the corresponding elements of Φ_0, Φ_1, …, Φ_H
▪ A VAR in reduced form cannot identify the structural form and
therefore we cannot compute the coefficients in 𝚽𝑖 from the OLS
estimates in its standard form unless we impose restrictions
Impulse Response Functions
▪ One method to place these restrictions consists of the application
of a Choleski decomposition to the covariance matrix of the reduced-form errors:
  Σ_u = W Σ_ε W′,   with W = B⁻¹ lower triangular
o Because of the triangular structure of W = B-1, a Choleski
decomposition allows only the shock to the first variable to
contemporaneously affect all the other variables in the system
o A shock to the second variable will produce a contemporaneous effect on all the variables in the system except the first one
o This may of course be impacted in the subsequent period, through
the transmission effects mediated by the autoregressive coefficients
o A shock to the third variable will affect all the variables in the system except the first two, and so on
▪ Therefore, a Choleski identification scheme forces a potentially
important identification asymmetry on the system
▪ A different ordering of the variables in the system would have
been possible, implying a reverse ordering of the shocks
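o In practice, Choleski-orthogonalized IRFs can be obtained from a fitted reduced-form VAR as in the sketch below; the ordering is simply the column ordering of the data array, so reordering the columns changes the results (the data and lag length are illustrative)

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(3)
y = rng.standard_normal((500, 3))   # illustrative (T, N) data, columns in the chosen ordering

results = VAR(y).fit(2)
irf = results.irf(24)               # impulse responses up to 24 periods ahead
print(irf.orth_irfs[1])             # orthogonalized (Choleski) responses one period after the shocks
irf.plot(orth=True)                 # responses to one-s.d. orthogonalized shocks, with error bands
```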
Impulse Response Functions
▪ IRFs are based on estimated coefficients: as each coefficient is
estimated with uncertainty, IRFs will contain sampling error
o Advisable to construct confidence intervals around IRFs
o An IRF is statistically significant if zero is not included in the confidence interval

[Figure: estimated IRFs with confidence bands, based on a VAR(2) for weekly series with the lag order selected by information criteria; the experiment is a tightening of short-term rates by the Fed]
Variance Decompositions
▪ Understanding the properties of forecast errors from VARs is
helpful in order to assess the interrelationships among variables
▪ Using the VMA representation of the errors, the h-step-ahead forecast error is
  y_{t+h} − E_t[y_{t+h}] = Σ_{i=0}^{h−1} Φ_i ε_{t+h−i}
o See lecture notes for the algebra of such representation
o Because the white noise shocks have constant variances, if we denote by σ²_{y1}(h) the h-step-ahead variance of the forecast of (say) y_1, we have:
  σ²_{y1}(h) = σ²_{ε1}[φ²_{11}(0) + φ²_{11}(1) + … + φ²_{11}(h−1)] + σ²_{ε2}[φ²_{12}(0) + φ²_{12}(1) + … + φ²_{12}(h−1)]
o Because all the squared coefficients in Φ_i are non-negative, the variance of the forecast error increases (weakly) as the forecast horizon h increases
o We decompose the h-step-ahead forecast error variance into the proportion due to each of the (structural) shocks; e.g., for y_1:
  σ²_{ε1}[φ²_{11}(0) + … + φ²_{11}(h−1)] / σ²_{y1}(h)   and   σ²_{ε2}[φ²_{12}(0) + … + φ²_{12}(h−1)] / σ²_{y1}(h)
▪ Such proportions due to each shock constitute a variance decomposition
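o A minimal sketch of how such a decomposition is obtained from a fitted reduced-form VAR (Choleski identification in the column ordering of the data; the data, lag length, and horizon are illustrative):

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(4)
y = rng.standard_normal((500, 3))   # illustrative (T, N) data, columns in the chosen Choleski ordering

results = VAR(y).fit(2)
fevd = results.fevd(12)             # forecast error variance decompositions for h = 1, ..., 12
fevd.summary()                      # prints, for each variable, the share of its FEV due to each shock
```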


Variance Decompositions
▪ Like in IRF analysis, variance decompositions of reduced-form
VARs require identification (because otherwise we would be
unable to go from the coefficients in 𝚯𝑖 to their counterparts in 𝚽𝑖 )
o Choleski decompositions are typically imposed
o Forecast error variance decomposition and IRF analyses both entail
similar information from the time series
o Example on weekly US Treasury yields, 1990-2016 sample:

[Figure: forecast error variance decompositions; Choleski ordering: 1M yield, 1Y yield, 5Y yield, 10Y yield]
Variance Decompositions and Granger Causality

[Figure: the same variance decompositions under the alternative Choleski ordering: 10Y yield, 1M yield, 1Y yield, 5Y yield]
▪ Unfortunately, the ordering may turn out to be crucial


o This occurs because the reduced-form residuals from the 4-variable VAR(2) system for US Treasury rates are highly positively correlated
▪ Variance decompositions may convey useful information when, in a given VAR(p), a subset of the N×1 vector (say 𝒙_t) forecasts its own future and all the remaining variables in 𝒚_t but…
▪ … such remaining variables fail to forecast 𝒙_t
Granger-Sims Causality
▪ We say that the sub-vector 𝒙𝑡 is (block) exogenous to 𝒚𝑡 or that 𝒙𝑡
is not Granger-caused by the remaining variables

▪ We also write 𝒙𝑡 ⟹𝐺𝐶 𝒚𝑡 but 𝒚𝑡 ⇏𝐺𝐶 𝒙𝑡


▪ When in an N×1 system [𝒙_t 𝒚_t]′, 𝒙_t ⟹_GC 𝒚_t and 𝒚_t ⟹_GC 𝒙_t, we say that there is a feedback system or two-way causality
o Consider the example:

Granger-Sims Causality

▪ If y1 is found to Granger-cause y2, but not vice versa, we say that variable y1 is strongly exogenous (in the equation for y2)
▪ If neither set of lags is statistically significant in the equation for the other variable, it would be said that y1 and y2 are linearly unrelated
o Practically, block-causality tests simply consist of LR or F-type tests
▪ The word “causality” is somewhat of a misnomer, for Granger-Sims
causality really means only a correlation between the current
value of one variable and the past values of others
▪ It does not mean that movements of one variable actually cause
movements of another
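o Block causality tests of this kind are easy to run on a fitted reduced-form VAR; a minimal sketch on simulated data (the variable names y1 and y2, the data, and the lag length are illustrative assumptions):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(5)
data = pd.DataFrame(rng.standard_normal((500, 2)), columns=["y1", "y2"])

results = VAR(data).fit(2)
# H0: the lags of y1 do not help predict y2 (y1 does not Granger-cause y2)
print(results.test_causality(caused="y2", causing=["y1"], kind="f").summary())
```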
One Example of Granger-Sims Causality Tests
▪ Use the VAR(2) model for the one-month, one-, five-, and ten-year
Treasury yields to test Granger causality
▪ The table considers one dependent variable at a time and tests
whether the lags of each of the other variables help to predict it
o The chi-square statistics refer to a test in which the null is that the
lagged coefficients of the “excluded” variable are equal to zero

