
Cointegration: An Applied
Perspective
Finding the Long Run Equilibrium Relationship Between Multiple
Variables
Relationship Between Two Time Series
Variables
• Two types of relationships are possible:
• Intrinsic (true relationship)
• Non-intrinsic (spurious relationships)
• There are two main sources of non-intrinsic or spurious relationships:
• Seasonality
• Trend
• Hence, to find a relationship between variables we have to remove the seasonal and
trend components from each of them; otherwise any relationship we find between
them may be spurious.
• Note that a non-stationary series is a “trending” process (its mean, variance
and covariance are time dependent). Such series should first be de-trended
(transformed into stationary processes) before any relationship between them
can be thought of.
• Hence stationarity/non-stationarity plays an important role in determining the
relationship between two variables.
Cointegration: The Basic Idea
• If two variables are I(1) and an OLS regression yields a resultant
error that is I(0), then the variables are cointegrated: they have a long run
equilibrium relationship between them. The error must be I(0); this is the
key.
• So, never estimate a time series regression equation without first
determining the order of integration of the associated variables
one by one.
• Removal of a stochastic trend through differencing often eliminates a
deterministic trend as well. So differencing can be a useful exercise if
you want to remove trend.
• Discussion will follow.
Steps in finding long run relationship
• Step 1: Test for the order of integration of the variables (a sketch in code follows this
slide). A long run relationship can traditionally be found in only two cases:
• All variables are I(1)
• Some variables are I(1), some I(0)
• If all variables are I(0), just run OLS, as all weighted averages of I(0) variables are themselves I(0).
• If variables are I(k), k > 1, then you can difference the variables till they become I(1). Warning:
variables often become meaningless due to differencing. Only first (growth) and rarely second
differences (rate of change) are sometimes meaningful. For such variables immediately conclude
that no long run relationship exists.
• Step 2:
• If all variables are I(1) and an LS regression yields an I(0) error, then they are cointegrated. For
single equations (like the PPP test) the standard method is the two step Granger procedure. For
simultaneous systems (like, say, the IS–LM model) the standard procedure is the Johansen test.
There are many other methods. The simultaneous estimation procedure is called Vector Auto
Regression (VAR).
• If some variables are I(1) and some I(0), then use the Auto Regressive Distributed Lag (ARDL)
procedure. This is, in fact, an older procedure for dealing with AR systems.
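Step 1 can be scripted. Below is a minimal sketch using statsmodels' adfuller; the series names x and y, the 5% level and the AIC lag choice are illustrative assumptions, not part of the slides.

```python
from statsmodels.tsa.stattools import adfuller

def order_of_integration(series, max_d=2, alpha=0.05):
    """Difference the series until the ADF test rejects a unit root."""
    s = series.dropna()
    for d in range(max_d + 1):
        p_value = adfuller(s, autolag="AIC")[1]   # ADF null: unit root
        if p_value < alpha:
            return d                              # series is I(d)
        s = s.diff().dropna()
    return max_d + 1                              # more persistent than I(max_d)

# x, y = ...  # two pandas Series
# print(order_of_integration(x), order_of_integration(y))
```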
Major Methods of Determining the
Cointegrating Vector(s)
• When All Variables are I(1):
• The Two-Step Granger Procedure (single equation method).
• The Johansen Procedure (Simultaneous Equation/Vector Autoregression
method)
• The Toda Yamamoto Procedure (Short Run Causality when cointegration does
not exist).
• When Some Variables are I(0) and Some I(1):
• Auto Regressive Distributed Lag (ARDL) Procedure.
Single Equation Method of
Cointegration and Causality
The Two Step Granger Procedure
• Step 1: Check whether all variables are I(1) (if not then you cannot use the procedure)
• Step 2: Run an OLS regression and save the residual. Do a unit root test on the residual
and check whether it is I(0). If so, then cointegration (a long run relationship) exists
between the variables.
• Example: Regression equation: 𝑋𝑡 = 𝛼 + 𝛽𝑌𝑡 + 𝑢𝑡, where both 𝑋𝑡 and 𝑌𝑡 are I(1). Suppose the
OLS result is: 𝑋𝑡 = 5 + .3𝑌𝑡
• t = (5.6) (11.9)
• Figures in brackets are t values.
• Further: ∆𝑢𝑡 = −.43 𝑢𝑡−1
• t = (−7.11)
where the Engle–Granger critical value at 1 percent is −2.58. Therefore the null of a unit
root is rejected at the 1 percent level, so 𝑢𝑡 is I(0). So 𝑋𝑡 and 𝑌𝑡 are cointegrated,
and the long run coefficient is .3.
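A minimal sketch of the two steps in statsmodels, assuming x and y are I(1) pandas Series (hypothetical data). Plain Dickey-Fuller critical values are not strictly valid for estimated residuals, which is why Engle-Granger tables (or statsmodels' coint, shown below) are used.

```python
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, coint

# Step 2: levels regression X_t = alpha + beta*Y_t + u_t, then test the residual
levels = sm.OLS(x, sm.add_constant(y)).fit()
u = levels.resid
print(levels.params)                     # the long run coefficient beta
print(adfuller(u, regression="n")[0])    # DF-type statistic on the residual

# Same test with Engle-Granger critical values built in:
stat, pvalue, crit = coint(x, y)
print(stat, pvalue, crit)
```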
The Error Correction Model (Rate at which
variables adjust to equilibrium relationship)
• Granger Representation Theorem: If X and Y are cointegrated then the short run
dynamics can be represented by an error correction model.
• If in the regression equation 𝑋𝑡 = 𝛼 + 𝛽𝑌𝑡 + 𝑢𝑡 both 𝑋𝑡 and 𝑌𝑡 are I(1) and cointegrated (𝑢𝑡
is I(0)), so that there is a long run equilibrium relationship between the variables,
then you can find the rate at which the variables adjust towards the equilibrium
relationship in the short run by estimating the error correction model.
• The error correction representation of the above equation is:
• $\Delta X_t = a + b \Delta Y_t + c\,(X_{t-1} - \alpha - \beta Y_{t-1}) + \varepsilon_t$
• Or, $\Delta X_t = a + b \Delta Y_t + c\,u_{t-1} + \varepsilon_t$
• Suppose the result is: ∆𝑋𝑡 = .2 + 4.6∆𝑌𝑡 − .06𝑢𝑡−1 + 𝜀𝑡
• t = (4.3) (5.2) (−.91)
• Then (1) 4.6 is the short run coefficient; (2) since the t statistic of 𝑢𝑡−1 is statistically
insignificant, 𝑋𝑡 adjusts towards 𝑌𝑡 instantaneously. If it were significant, the
adjustment rate would be 6 percent per period.
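A sketch of the corresponding ECM regression in statsmodels, assuming the same hypothetical I(1) Series x and y as above:

```python
import pandas as pd
import statsmodels.api as sm

u = sm.OLS(x, sm.add_constant(y)).fit().resid   # residual from the levels regression

ecm_data = pd.DataFrame({
    "dX": x.diff(),        # change in X_t
    "dY": y.diff(),        # change in Y_t
    "u_lag": u.shift(1),   # u_{t-1}, the error correction term
}).dropna()

ecm = sm.OLS(ecm_data["dX"],
             sm.add_constant(ecm_data[["dY", "u_lag"]])).fit()
print(ecm.summary())       # the coefficient on u_lag is the adjustment speed c
```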
Granger “Causality”
• Granger starts from the premise that the future cannot cause the present or the
past. If event A occurs after event B, we know that A cannot cause B. At the same
time, if A occurs before B, it does not necessarily imply that A causes B. For
instance, the weatherman’s prediction occurs before the rain. This does not mean
that the weatherman causes the rain. In practice, we observe A and B as time
series and we would like to know whether A precedes B, or B precedes A, or they
are contemporaneous. For instance, do movements in prices precede movements
in interest rates, or is it the opposite, or are the movements contemporaneous?
This is the purpose of Granger causality. So Granger causality is not really about
causality; it is about precedence. Nonetheless, the mainstream econometrics
literature does this all the time. For an introduction to actual causality read:
https://ftp.cs.ucla.edu/pub/stat_ser/r354-reprint-corrected.pdf.
• Most important economic application: Difference in Difference Model. In that
context a ppt lecture: http://jenni.uchicago.edu/econ312/Slides/Econ312_Econ-
Causality-HO-STATIC_2019-05-09a_jbb.pdf
Granger Causality Test
• Consider two time series processes 𝑋𝑡 and 𝑌𝑡. The series X fails to Granger
cause Y if, in the following regression,
• $Y_t = \sum_{i=1}^{k} \alpha_i Y_{t-i} + \sum_{i=1}^{k} \beta_i X_{t-i} + u_t$
• $\beta_i = 0$ for all i = 1, 2, ..., k. The lag length k can be determined by AIC, BIC, etc.
• Note that Granger causality is a purely time series econometrics concept;
it is used only here.
Practical Issues
• The equation is:
• $Y_t = \sum_{i=1}^{k} \alpha_i Y_{t-i} + \sum_{i=1}^{k} \beta_i X_{t-i} + u_t$
• 1. OLS cannot be performed unless both X and Y are I(0). Transform them if
necessary.
• 2. Autocorrelation should be absent.
• 3. Choose lags by AIC, BIC or the length of the data.
• 4. You can include an intercept in the above equation.
• 5. Note that if you only look at the t values of the coefficients of X for
significance in the above equation, then you will have to look at each t
value, and if some are significant your conclusion would be that X
causes Y. But this might be an erroneous conclusion. To avoid this, do a joint
F test.
The F Test for Granger Causality
• In this test:
• The restricted model is: $Y_t = \sum_{i=1}^{k} \alpha_i Y_{t-i} + u_t$ (1)
• The unrestricted model is:
• $Y_t = \sum_{i=1}^{k} \alpha_i Y_{t-i} + \sum_{i=1}^{k} \beta_i X_{t-i} + u_t$ (2)
• We know:
$\frac{(RSS_1 - RSS_2)/(v_2 - v_1)}{RSS_2/(n - v_2)}$ follows F with df $(v_2 - v_1,\; n - v_2)$
• where $v_1$ and $v_2$ are the numbers of parameters in (1) and (2) respectively,
and n is the total number of observations.
• If k = infinity then you are in sound territory.
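statsmodels packages exactly this restricted-versus-unrestricted F test; a minimal sketch (the series names and the choice maxlag=4 are illustrative assumptions):

```python
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

# Column order matters: the test asks whether the SECOND column
# Granger causes the FIRST column.
pair = pd.concat([y, x], axis=1).dropna()
results = grangercausalitytests(pair, maxlag=4)

# For lag length k, results[k][0]["ssr_ftest"] is (F, p-value, df_denom, df_num):
# the joint test that all beta_i on the lags of x are zero in the y equation.
```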
Two Way Granger Causality
• Causality from X to Y:
• $Y_t = \sum_{i=1}^{k} \alpha_i Y_{t-i} + u_t$ (Restricted)
• $Y_t = \sum_{i=1}^{k} \alpha_i Y_{t-i} + \sum_{i=1}^{k} \beta_i X_{t-i} + u_t$ (Unrestricted)
• Causality from Y to X:
• $X_t = \sum_{i=1}^{k} \beta_i X_{t-i} + u_t$ (Restricted)
• $X_t = \sum_{i=1}^{k} \alpha_i Y_{t-i} + \sum_{i=1}^{k} \beta_i X_{t-i} + u_t$ (Unrestricted)
• Note: If variables are cointegrated then there should be causality in at
least one direction.
Reduced Form Vector Auto
Regression (VAR)
Simultaneous Equation Method of Cointegration and Causality
Definition
• Two Variable Reduced Form VAR case:
• $X_t = \alpha_1 + \sum_{j=1}^{n} \beta_{1j} Y_{t-j} + \sum_{j=1}^{n} \gamma_{1j} X_{t-j} + u_{1t}$
• $Y_t = \alpha_2 + \sum_{j=1}^{n} \beta_{2j} Y_{t-j} + \sum_{j=1}^{n} \gamma_{2j} X_{t-j} + u_{2t}$
• The lag length n is chosen (a) by AIC, BIC, etc. (model fitting) and (b) by LR statistics (eliminating
autocorrelation). So a VAR shows the simultaneous evolution of variables over time.
• $u_{1t}$ and $u_{2t}$ are called impulses, innovations or shocks.
• Lags: Note that introducing too many lags risks loss of degrees of freedom and multicollinearity.
Too few lags lead to specification problems.
• Normalized cointegrating vectors: $[\alpha_1\ \beta_{11} \dots \beta_{1n}\ \gamma_{11} \dots \gamma_{1n}]$ and $[\alpha_2\ \beta_{21} \dots \beta_{2n}\ \gamma_{21} \dots \gamma_{2n}]$
• Note 1: Any two or more time series variables can be regressed in this vector form whether cointegrating
vectors exist or not (if no cointegrating vector exists we call these estimates short run coefficients; if they
exist we call them long run coefficients).
• Note 2: This is nothing but a simultaneous equation estimation procedure.
• Note 3: These equations extend easily to the many variable case. In that case a single equation is
written to represent the VAR and the individual variables are vectors.
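A reduced form VAR with lag choice by information criteria can be sketched as follows (df is an assumed DataFrame of suitably transformed series; maxlags=8 is illustrative):

```python
from statsmodels.tsa.api import VAR

model = VAR(df[["X", "Y"]])
print(model.select_order(maxlags=8).summary())  # AIC, BIC, HQIC, FPE by lag
res = model.fit(maxlags=8, ic="aic")            # refit at the AIC-chosen lag
print(res.summary())
```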
Vector Error Correction Model (VECM):
Representation 1
• First test for cointegration. Error correction makes sense only if a long run
equilibrium relationship exists.
• Suppose the cointegration model was:
• 𝑋𝑡 = 𝛼 + 𝛽𝑌𝑡 + 𝑢𝑡
• Then the VECM is:
• ∆𝑋𝑡 = 𝛾1 𝑢𝑡−1 + 𝜀1𝑡
• ∆𝑌𝑡 = 𝛾2 𝑢𝑡−1 + 𝜀2𝑡
Vector Error Correction Model (VECM):
Representation 2
• Two Variable VAR case:
• $X_t = \alpha_1 + \sum_{j=1}^{n} \beta_{1j} Y_{t-j} + \sum_{j=1}^{n} \gamma_{1j} X_{t-j} + u_{1t}$ (1)
• $Y_t = \alpha_2 + \sum_{j=1}^{n} \beta_{2j} Y_{t-j} + \sum_{j=1}^{n} \gamma_{2j} X_{t-j} + u_{2t}$ (2)
• Subtract the one period lagged value in both equations:
• $X_t - X_{t-1} = \alpha_1 - (1 - \gamma_{11}) X_{t-1} + \sum_{j=1}^{n} \beta_{1j} Y_{t-j} + \sum_{j=2}^{n} \gamma_{1j} X_{t-j} + u_{1t}$
• $Y_t - Y_{t-1} = \alpha_2 - (1 - \beta_{21}) Y_{t-1} + \sum_{j=2}^{n} \beta_{2j} Y_{t-j} + \sum_{j=1}^{n} \gamma_{2j} X_{t-j} + u_{2t}$
• Or
• $\Delta X_t = \alpha_1 - (1 - \gamma_{11}) X_{t-1} + \sum_{j=1}^{n} \beta_{1j} Y_{t-j} + \sum_{j=2}^{n} \gamma_{1j} X_{t-j} + u_{1t}$
• $\Delta Y_t = \alpha_2 - (1 - \beta_{21}) Y_{t-1} + \sum_{j=2}^{n} \beta_{2j} Y_{t-j} + \sum_{j=1}^{n} \gamma_{2j} X_{t-j} + u_{2t}$
• The VECM representation is:
• $\Delta X_t = \alpha_1 + \Pi_{11} X_{t-1} + \sum_{j=1}^{n} \Gamma_{1j} \Delta Y_{t-j} + \sum_{j=2}^{n} \Psi_{1j} \Delta X_{t-j} + u_{1t}$ (3)
• $\Delta Y_t = \alpha_2 + \Pi_{21} Y_{t-1} + \sum_{j=2}^{n} \Gamma_{2j} \Delta Y_{t-j} + \sum_{j=1}^{n} \Psi_{2j} \Delta X_{t-j} + u_{2t}$ (4)
• In (1), (2) and (3), (4) the coefficients are related. We will see their relations in the n variable VAR case on the next slide.
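statsmodels estimates this representation directly; a minimal sketch, assuming the levels data in df are I(1) with one cointegrating relation (k_ar_diff=2 and the deterministic term are illustrative choices):

```python
from statsmodels.tsa.vector_ar.vecm import VECM

vecm_res = VECM(df[["X", "Y"]], k_ar_diff=2, coint_rank=1,
                deterministic="co").fit()
print(vecm_res.beta)    # cointegrating vector(s)
print(vecm_res.alpha)   # adjustment ("loading") coefficients
```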
VECM in g I(1) Variables
• We turn the VAR of the form
$y_t = \beta_1 y_{t-1} + \beta_2 y_{t-2} + \dots + \beta_k y_{t-k} + u_t$
(where $y_t$ and $u_t$ are g×1 vectors and each $\beta_i$ is a g×g matrix)
into a VECM, which can be written as (we do not prove)
$\Delta y_t = \Pi y_{t-k} + \Gamma_1 \Delta y_{t-1} + \Gamma_2 \Delta y_{t-2} + \dots + \Gamma_{k-1} \Delta y_{t-(k-1)} + u_t$
where $\Pi = \left(\sum_{i=1}^{k} \beta_i\right) - I_g$ and $\Gamma_i = \left(\sum_{j=1}^{i} \beta_j\right) - I_g$.
• $\Pi$ is a long run coefficient matrix since in equilibrium all the $\Delta y_{t-i} = 0$.
Testing for and Estimating
Cointegrating Systems
The Johansen Technique of VARs
The Johansen Test and Eigenvalues
• Some properties of the eigenvalues of any square matrix A:
1. the sum of the eigenvalues is the trace
2. the product of the eigenvalues is the determinant
3. the number of non-zero eigenvalues is the rank
• The VECM representation of the VAR was
$\Delta y_t = \Pi y_{t-1} + \Gamma_1 \Delta y_{t-1} + \Gamma_2 \Delta y_{t-2} + \dots + \Gamma_{k-1} \Delta y_{t-(k-1)} + u_t$
• The test for cointegration between the y's is calculated by looking at the
rank of the $\Pi$ matrix via its eigenvalues. (To prove this requires some
technical intermediate steps.)
• The rank of a matrix is equal to the number of its characteristic roots
(eigenvalues) that are different from zero.
$\Pi$ and Cointegration
• Granger’s representation theorem asserts that if the coefficient
matrix $\Pi$ has reduced rank r < g, then $\Pi = \alpha\beta'$, so that $\beta' y_t$ is I(0)
(that is, the $y_t$'s cointegrate). r is the number of cointegrating relations (the
cointegrating rank) and each column of $\beta$ is a cointegrating vector.
The elements of $\alpha$ are known as the adjustment parameters in the
VEC model. Johansen’s method is to estimate the $\Pi$ matrix from an
unrestricted VAR and to test whether we can reject the restrictions
implied by the reduced rank of $\Pi$.
The Johansen Test Statistics
• The test statistics for cointegration are formulated as
$\lambda_{trace}(r) = -T \sum_{i=r+1}^{g} \ln(1 - \hat{\lambda}_i)$
and
$\lambda_{max}(r, r+1) = -T \ln(1 - \hat{\lambda}_{r+1})$
where $\hat{\lambda}_i$ is the estimated value of the ith ordered eigenvalue from
the $\Pi$ matrix.
• $\lambda_{trace}$ tests the null that the number of cointegrating vectors is less than
or equal to r against an unspecified alternative.
• $\lambda_{trace} = 0$ when all the $\lambda_i = 0$, so it is a joint test.
• $\lambda_{max}$ tests the null that the number of cointegrating vectors is r against
an alternative of r + 1.
Decomposition of the $\Pi$ Matrix
• For any 1 < r < g, $\Pi$ is defined as the product of two matrices:
$\underset{g \times g}{\Pi} = \underset{g \times r}{\alpha}\ \underset{r \times g}{\beta'}$
• $\beta$ contains the cointegrating vectors while $\alpha$ gives the “loadings” of
each cointegrating vector in each equation.
• For example, if g = 4 and r = 1, $\alpha$ and $\beta$ will both be 4×1, and $\Pi y_{t-k}$ will be
given by:
$\Pi y_{t-k} = \begin{pmatrix} \alpha_{11} \\ \alpha_{12} \\ \alpha_{13} \\ \alpha_{14} \end{pmatrix} \begin{pmatrix} \beta_{11} & \beta_{12} & \beta_{13} & \beta_{14} \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}_{t-k} = \begin{pmatrix} \alpha_{11} \\ \alpha_{12} \\ \alpha_{13} \\ \alpha_{14} \end{pmatrix} \left( \beta_{11} y_1 + \beta_{12} y_2 + \beta_{13} y_3 + \beta_{14} y_4 \right)_{t-k}$
Johansen Critical Values
• Johansen & Juselius (1990) provide critical values for the two statistics.
The distribution of the test statistics is non-standard. The critical values
depend on:
1. the value of g − r, the number of non-stationary components
2. whether a constant and/or trend are included in the regressions.
• If the test statistic is greater than the critical value from Johansen’s
tables, reject the null hypothesis that there are r cointegrating vectors in
favour of the alternative that there are more than r.
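A sketch of this comparison using statsmodels' Johansen implementation (df, det_order=0 and k_ar_diff=2 are illustrative assumptions):

```python
from statsmodels.tsa.vector_ar.vecm import coint_johansen

j = coint_johansen(df[["X", "Y"]], det_order=0, k_ar_diff=2)
print(j.lr1, j.cvt)   # trace statistics and their 90/95/99% critical values
print(j.lr2, j.cvm)   # max-eigenvalue statistics and critical values
# Reject H0: rank <= r whenever the statistic exceeds the chosen critical value.
```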
The Johansen Testing Sequence
• The testing sequence under the null is r = 0, 1, ..., g−1, so that the
hypotheses for $\lambda_{trace}$ are
H0: r = 0 vs H1: 0 < r ≤ g
H0: r = 1 vs H1: 1 < r ≤ g
H0: r = 2 vs H1: 2 < r ≤ g
... ... ...
H0: r = g−1 vs H1: r = g
• We keep increasing the value of r until we no longer reject the null.
Interpretation of Johansen Test Results
• But how does this correspond to a test of the rank of the $\Pi$ matrix?
• r is the rank of $\Pi$.
• $\Pi$ cannot be of full rank (g) since this would correspond to the original
$y_t$ being stationary.
• If $\Pi$ has zero rank, then by analogy to the univariate case, $\Delta y_t$ depends
only on $\Delta y_{t-j}$ and not on $y_{t-1}$, so that there is no long run relationship
between the elements of $y_{t-1}$. Hence there is no cointegration.
• For 1 < rank($\Pi$) < g, there are multiple cointegrating vectors.
Hypothesis Testing Using Johansen
• Engle–Granger did not allow us to do hypothesis tests on the cointegrating relationship
itself, but the Johansen approach does.
• If there exist r cointegrating vectors, only these linear combinations will be
stationary.
• You can test a hypothesis about one or more coefficients in the
cointegrating relationship by viewing the hypothesis as a restriction on the
$\Pi$ matrix.
• All linear combinations of the cointegrating vectors are also cointegrating
vectors.
• If the number of cointegrating vectors is large, and the hypothesis under
consideration is simple, it may be possible to recombine the cointegrating
vectors to satisfy the restrictions exactly.
Hypothesis Testing Using Johansen (cont’d)
• As the restrictions become more complex or more numerous, it will
eventually become impossible to satisfy them by renormalisation.
• After this point, if the restriction is not severe, then the cointegrating
vectors will not change much upon imposing the restriction.
• A test statistic to test this hypothesis is given by
$T \sum_{i=1}^{r} \left[ \ln(1 - \hat{\lambda}_i^*) - \ln(1 - \hat{\lambda}_i) \right] \sim \chi^2(m)$
where
$\hat{\lambda}_i^*$ are the characteristic roots of the restricted model,
$\hat{\lambda}_i$ are the characteristic roots of the unrestricted model,
r is the number of non-zero characteristic roots in the unrestricted model,
and m is the number of restrictions.
Cointegration Tests using Johansen:
Three Examples
Example 1: Hamilton (1994, p. 647)
• Does the PPP relationship hold for the US / Italian exchange rate –
price system?
• A VAR was estimated with 12 lags on 189 observations. The Johansen
test statistics were:
r    $\lambda_{max}$    critical value
0    22.12    20.8
1    10.19    14.0
• Conclusion: there is one cointegrating relationship.
Assignment
• Explain the movement of the daily Sensex data since October 2018 in
light of the covid crisis.
• The two possibilities discussed in the class
• Then confirmation or rejection of the hypotheses.
• Last date: 15th April.
AUTOREGRESSIVE DISTRIBUTED LAG (ADL)
MODEL

AUTOREGRESSIVE DISTRIBUTED LAG (ADL)
MODEL
• Augmenting an AR(p) with lags of an explanatory variable produces the ADL
model:
$Y_t = \delta + \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + \beta_1 X_{t-1} + \dots + \beta_q X_{t-q} + \varepsilon_t$
• p lags of Y, q lags of X ⇒ ADL(p, q).
Note: X and Y must have the same stationarity properties (either both must
be stationary or both must have a unit root). If not, use:
$\Delta Y_t = \delta + \phi_1 \Delta Y_{t-1} + \dots + \phi_{p-1} \Delta Y_{t-p+1} + \beta_1 \Delta X_{t-1} + \dots + \beta_{q-1} \Delta X_{t-q+1} + \varepsilon_t$
AUTOREGRESSIVE DISTRIBUTED LAG (ADL)
MODEL
• Estimation and interpretation of the ADL(p,q) model depends on
whether Y and X are stationary or have unit roots.
• Before you estimate an ADL model you should test both Y and X for
unit roots using the Dickey-Fuller test.
MULTICOLLINEARITY
• This form of the ADL model is less likely to run into multicollinearity problems:
$\Delta Y_t = \delta + \lambda Y_{t-1} + \phi_1 \Delta Y_{t-1} + \dots + \phi_{p-1} \Delta Y_{t-p+1} + \theta X_t + \beta_1 \Delta X_{t-1} + \dots + \beta_{q-1} \Delta X_{t-q+1} + \varepsilon_t$
• One thing researchers often calculate is the long run or total multiplier.
• Suppose that X and Y are in an equilibrium or steady state. Then X rises
(permanently) by one unit, affecting Y, which starts to change, settling down
in the long run to a new equilibrium value.
• The difference between the old and new equilibrium values for Y is the long run
effect of X on Y, called the long run multiplier. This multiplier is often of
great interest for policy makers who want to know the eventual effects of
their policy changes in various areas.
• For the ADL(p, q) model written this way, setting $\Delta Y = \Delta X = 0$ in the steady
state gives the long run multiplier $-\theta/\lambda$.
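A sketch for an ADL(1,1) in levels, assuming stationary pandas Series x and y (hypothetical data); note that for ADL(1,1) the multiplier $-\theta/\lambda$ above equals $\beta/(1-\phi)$ from the levels form. statsmodels also ships a ready-made statsmodels.tsa.ardl.ARDL class.

```python
import pandas as pd
import statsmodels.api as sm

adl = pd.DataFrame({"y": y, "y_l1": y.shift(1), "x_l1": x.shift(1)}).dropna()
res = sm.OLS(adl["y"], sm.add_constant(adl[["y_l1", "x_l1"]])).fit()

phi, beta = res.params["y_l1"], res.params["x_l1"]
lrm = beta / (1 - phi)   # long run multiplier: sum of X coefficients
print(lrm)               # divided by (1 - sum of Y coefficients)
```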
The Toda Yamamoto Procedure
• This method applies to variables with any order of integration. The
equations are:
$X_t = \alpha_1 + \sum_{i=1}^{k+d_{max}} \alpha_{2i} X_{t-i} + \sum_{i=1}^{k+d_{max}} \alpha_{3i} Y_{t-i} + u_{1t}$ (1)
$Y_t = \beta_1 + \sum_{i=1}^{k+d_{max}} \beta_{2i} Y_{t-i} + \sum_{i=1}^{k+d_{max}} \beta_{3i} X_{t-i} + u_{2t}$ (2)
where $d_{max}$ is the maximum order of integration of the variables and k is
the chosen VAR lag length. Causality is then assessed with a Wald test on the
first k lags only; the extra $d_{max}$ lags are included but left untested.
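A hedged sketch of one Toda-Yamamoto equation: fit k + dmax lags in levels, then Wald-test only the first k lags of X (k, dmax and the series names are illustrative assumptions):

```python
import pandas as pd
import statsmodels.api as sm

k, dmax = 2, 1
p = k + dmax
lags = {**{f"y_l{i}": y.shift(i) for i in range(1, p + 1)},
        **{f"x_l{i}": x.shift(i) for i in range(1, p + 1)}}
X = sm.add_constant(pd.DataFrame(lags)).dropna()
res = sm.OLS(y.loc[X.index], X).fit()

# H0: the coefficients on x_l1 ... x_lk are all zero; the extra dmax lags
# stay in the model but are deliberately left untested.
hypothesis = ", ".join(f"x_l{i} = 0" for i in range(1, k + 1))
print(res.wald_test(hypothesis, use_f=False))   # chi-square Wald test
```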
Structural VAR vs Reduced
Form VAR
Simultaneous Equations Models
• All the models we have looked at thus far have been single equation models of the
form y = Xβ + u.
• All of the variables contained in the X matrix are assumed to be EXOGENOUS.
• y is an ENDOGENOUS variable.
An example from economics to illustrate – the demand and supply of a good:
$Q_{dt} = \alpha + \beta P_t + \gamma S_t + u_t$ (1)
$Q_{st} = \lambda + \mu P_t + \kappa T_t + v_t$ (2)
$Q_{dt} = Q_{st}$ (3)
where $Q_{dt}$ = quantity of the good demanded, $Q_{st}$ = quantity of the good supplied,
$P_t$ = price of the good, $S_t$ = price of a substitute good, and
$T_t$ = some variable embodying the state of technology.
Simultaneous Equations Models:
The Structural Form
• Assuming that the market always clears, and dropping the time subscripts for
simplicity:
$Q = \alpha + \beta P + \gamma S + u$ (4)
$Q = \lambda + \mu P + \kappa T + v$ (5)
This is a simultaneous STRUCTURAL FORM of the model.
• The point is that price and quantity are determined simultaneously (price affects
quantity and quantity affects price).
• P and Q are endogenous variables, while S and T are exogenous.
• We can obtain REDUCED FORM equations corresponding to (4) and (5) by
solving equations (4) and (5) for P and for Q (separately).
Obtaining the Reduced Form
• Solving for Q:
$\alpha + \beta P + \gamma S + u = \lambda + \mu P + \kappa T + v$ (6)
• Solving for P:
$\frac{Q - \alpha - \gamma S - u}{\beta} = \frac{Q - \lambda - \kappa T - v}{\mu}$ (7)
• Rearranging (6):
$\beta P - \mu P = \lambda - \alpha + \kappa T - \gamma S + v - u$
$(\beta - \mu) P = (\lambda - \alpha) + \kappa T - \gamma S + (v - u)$
$P = \frac{\lambda - \alpha}{\beta - \mu} + \frac{\kappa}{\beta - \mu} T - \frac{\gamma}{\beta - \mu} S + \frac{v - u}{\beta - \mu}$ (8)
Obtaining the Reduced Form (cont’d)
• Multiplying (7) through by βμ:
$\mu Q - \mu\alpha - \mu\gamma S - \mu u = \beta Q - \beta\lambda - \beta\kappa T - \beta v$
$\mu Q - \beta Q = \mu\alpha - \beta\lambda - \beta\kappa T + \mu\gamma S + \mu u - \beta v$
$(\mu - \beta) Q = (\mu\alpha - \beta\lambda) - \beta\kappa T + \mu\gamma S + (\mu u - \beta v)$
$Q = \frac{\mu\alpha - \beta\lambda}{\mu - \beta} - \frac{\beta\kappa}{\mu - \beta} T + \frac{\mu\gamma}{\mu - \beta} S + \frac{\mu u - \beta v}{\mu - \beta}$ (9)
• (8) and (9) are the reduced form equations for P and Q.
40
Simultaneous Equations Bias
• But what would happen if we had estimated equations (4) and (5), i.e. the
structural form equations, separately using OLS?
• Both equations depend on P. One of the CLRM assumptions was that E(X′u) =
0, where X is a matrix containing all the variables on the RHS of the equation.
• It is clear from (8) that P is related to the errors in (4) and (5) – i.e. it is
stochastic.
• What would be the consequences for the OLS estimator, $\hat{\beta}$, if we ignore the
simultaneity?
Simultaneous Equations Bias (cont’d)
• Recall that $\hat{\beta} = (X'X)^{-1} X'y$ and $y = X\beta + u$.
• So that
$\hat{\beta} = (X'X)^{-1} X'(X\beta + u) = (X'X)^{-1} X'X\beta + (X'X)^{-1} X'u = \beta + (X'X)^{-1} X'u$
• Taking expectations,
$E(\hat{\beta}) = E(\beta) + E\left((X'X)^{-1} X'u\right) = \beta + (X'X)^{-1} E(X'u)$
• If the X’s are non-stochastic, E(X′u) = 0, which would be the case in a single
equation system, so that $E(\hat{\beta}) = \beta$, which is the condition for unbiasedness.
• But ... if the equation is part of a system, then E(X′u) ≠ 0, in general.
Simultaneous Equations Bias (cont’d)
• Conclusion: Application of OLS to structural equations which are part of a simultaneous system
will lead to biased coefficient estimates.
• Is the OLS estimator still consistent, even though it is biased?
• No – in fact the estimator is inconsistent as well.
• Hence it would not be possible to estimate equations (4) and (5) validly using OLS.
43
Avoiding Simultaneous Equations Bias
So What Can We Do?
• Taking equations (8) and (9), we can rewrite them as
$P = \pi_{10} + \pi_{11} T + \pi_{12} S + \varepsilon_1$ (10)
$Q = \pi_{20} + \pi_{21} T + \pi_{22} S + \varepsilon_2$ (11)
• We CAN estimate equations (10) & (11) using OLS since all the RHS
variables are exogenous.
• But ... we probably don’t care what the values of the π coefficients are;
what we wanted were the original parameters in the structural equations:
α, β, γ, λ, μ, κ.
Identification of Simultaneous Equations
Can We Retrieve the Original Coefficients from the π’s?
Short answer: sometimes.
• As well as simultaneity, we sometimes encounter another problem:
identification.
• Consider the following demand and supply equations:
$Q = \alpha + \beta P$ (Supply equation) (12)
$Q = \lambda + \mu P$ (Demand equation) (13)
We cannot tell which is which!
• Both equations are UNIDENTIFIED or NOT IDENTIFIED, or
UNDERIDENTIFIED.
• The problem is that we do not have enough information from the equations to
estimate 4 parameters. Notice that we would not have had this problem with
equations (4) and (5) since they have different exogenous variables.
What Determines whether an Equation is Identified
or not?
• We could have three possible situations:
1. An equation is unidentified
· like (12) or (13)
· we cannot get the structural coefficients from the reduced form estimates
2. An equation is exactly identified
· e.g. (4) or (5)
· can get unique structural form coefficient estimates
3. An equation is over-identified
· example given later
· more than one set of structural coefficients could be obtained from the reduced form.
What Determines whether an Equation is Identified
or not? (cont’d)
• How do we tell if an equation is identified or not?
• There are two conditions we could look at:
– The order condition – a necessary but not sufficient condition for an
equation to be identified.
– The rank condition – a necessary and sufficient condition for
identification. We specify the structural equations in a matrix form and
consider the rank of a coefficient matrix.
The Order Condition
Statement of the Order Condition (from Ramanathan 1995, p. 666)
• Let G denote the number of structural equations. An equation is just identified if
the number of variables excluded from it is G − 1.
• If more than G − 1 are absent, it is over-identified. If fewer than G − 1 are absent, it is
not identified.
Example
• In the following system of equations, the Y’s are endogenous, while the X’s are
exogenous. Determine whether each equation is over-, under-, or just-identified.
$Y_1 = \alpha_0 + \alpha_1 Y_2 + \alpha_3 Y_3 + \alpha_4 X_1 + \alpha_5 X_2 + u_1$ (14)
$Y_2 = \beta_0 + \beta_1 Y_3 + \beta_2 X_1 + u_2$ (15)
$Y_3 = \gamma_0 + \gamma_1 Y_2 + u_3$ (16)
The Order Condition (cont’d)
Solution
G = 3.
If # excluded variables = 2, the equation is just identified.
If # excluded variables > 2, the equation is over-identified.
If # excluded variables < 2, the equation is not identified.
Equation (14): Not identified
Equation (15): Just identified
Equation (16): Over-identified
Tests for Exogeneity
• How do we tell whether variables really need to be treated as endogenous or
not?
• Consider again equations (14)–(16). Equation (14) contains Y₂ and Y₃ – but do
we really need equations for them?
• We can formally test this using a Hausman test, which is calculated as follows:
1. Obtain the reduced form equations corresponding to (14)–(16). The reduced
forms turn out to be:
$Y_1 = \pi_{10} + \pi_{11} X_1 + \pi_{12} X_2 + v_1$ (17)
$Y_2 = \pi_{20} + \pi_{21} X_1 + v_2$ (18)
$Y_3 = \pi_{30} + \pi_{31} X_1 + v_3$ (19)
Estimate the reduced form equations (17)–(19) using OLS, and obtain the fitted
values $\hat{Y}_1, \hat{Y}_2, \hat{Y}_3$.
Tests for Exogeneity (cont’d)
2. Run the regression corresponding to equation (14).
3. Run the regression (14) again, but now also including the fitted values
$\hat{Y}_2, \hat{Y}_3$ as additional regressors:
$Y_1 = \alpha_0 + \alpha_1 Y_2 + \alpha_3 Y_3 + \alpha_4 X_1 + \alpha_5 X_2 + \lambda_2 \hat{Y}_2 + \lambda_3 \hat{Y}_3 + u_1$ (20)
4. Use an F-test to test the joint restriction that λ₂ = 0 and λ₃ = 0. If the null
hypothesis is rejected, Y₂ and Y₃ should be treated as endogenous.
Estimating Structural VAR
• Eviews demonstration (not to be covered in class) is available at:
https://www.youtube.com/watch?v=_nGkJzDlPY4
• Note that we
Variance Decomposition
A Basic Idea
What variance?
• The forecast error variance.
• What is forecast error?
• 𝜀𝑡 = 𝐴𝑡 − 𝐹𝑡, i.e. forecast error = actual − forecast, all in period t.
• Note the forecast is given at the beginning of period t and the actual can be
seen at the end of period t.
• Alternatively, you may have divided your time period into two parts. You do
your forecasting on the basis of the first half and the forecasts are defined
over the second half. Note that when you do this the second half data is
there but it has not been used in the forecast. You now find 𝜀𝑡 as above as
the difference between the actuals of the second half minus forecast for
the second half based on the first half.
• Illustration (hypothetical figures): estimate a linear trend $GDP_t = a + b \cdot t$ on the
first part of the sample, then forecast the rest by extrapolation:
Year (t)   GDP   n(t)
1960       234   1
1961       345   2
...        ...   ...
2019       562   60
2020             61
2021             62
2022             63
• Fitted on 1960 to 1990: $GDP_t = .3 + 2.3t$. Plugging in the later values of t gives
out-of-sample forecasts (the extrapolation method), which can then be compared with
the actuals; here the split is estimation over 1960 to 1990 and forecasting over 1991 to 2019.
What is forecast error variance?
• Variance of 𝜀𝑡 .
• The question is how much of this variance can be explained by each of the
variables in the system.
• For example, we write a two variable VAR on GDP and Consumption. On the
basis of the VAR estimates we find forecasts for a certain period and
calculate the forecast error. Now we change, say, GDP by one unit (called an
“innovation” in the GDP series) and try to calculate how much this change
in GDP explains (1) the forecast error in GDP and (2) the forecast error in
consumption. These percentages are recorded in the forecast error
variance decomposition matrix.
• So we decompose the total forecast error variance generated from the VAR into
the parts that can be explained by GDP itself and by consumption (the other
variables).
FEV in the Context of a VAR
• We write a 2 variable VAR(k):
$X_t = \alpha_1 + \sum_{j=1}^{k} \beta_{1j} Y_{t-j} + \sum_{j=1}^{k} \gamma_{1j} X_{t-j} + u_{1t}$
$Y_t = \alpha_2 + \sum_{j=1}^{k} \beta_{2j} Y_{t-j} + \sum_{j=1}^{k} \gamma_{2j} X_{t-j} + u_{2t}$
• Here X depends on past values of X and Y, and Y depends on past values of X
and Y.
• Now I divide my sample into a trial set and a test set, run the VAR on the
trial set and, based on the estimates of this VAR, forecast the values for the
test set, compare them to the actual values of the test set, and arrive at a
forecast error and a forecast error variance.
• What can be the source of this error? Noise.
• The noise will affect both X and Y. So the effect of the noise on the
estimate can come either through its effect on X or through its effect on Y,
plus an unexplained part.
• What does variance decomposition do? It divides the forecast variation, or the
forecast variance in, say, X, into the part caused by the effect on X itself
and the part caused by the effect on Y.
• How does it do it? It gives a shock to X and sees how the shock affects
the variance of X over time.
Basic Concept of Variance Decomposition
• The Forecast Error Variance Decomposition (FEVD) shows you how
much of the future uncertainty (captured by the forecast error) of one
time series is due to future shocks to the other time series in the
system. This evolves over time, so the shocks to time series x1 may
not be very important in the short run but very important in the long
run. For example, you could learn that oil price shocks account for
only 5% of the variance of future electricity price movements in the
next 7 days, but for 40% in the next 6 months.
Variance Decomposition and Impulse
Response
• Variance Decomposition gives the relative importance of each
innovation in affecting the forecast error variance of all response
variables in the system.
• In contrast, the impulse response function (IRF) traces the effects of
an innovation shock to one variable on the response of all variables in
the system.
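Both objects fall out of a fitted statsmodels VAR; a minimal sketch (df and the 10-step horizon are illustrative assumptions):

```python
from statsmodels.tsa.api import VAR

res = VAR(df[["X", "Y"]]).fit(maxlags=8, ic="aic")   # as in the earlier sketch

fevd = res.fevd(10)      # forecast error variance decomposition
fevd.summary()           # share of each variable's FEV attributed to each shock

irf = res.irf(10)        # impulse responses over 10 periods
irf.plot(orth=True)      # responses to one-s.d. orthogonalized shocks
```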
From Where Does the Idea Come?
• In order to understand the decomposition of variance, it is necessary to understand the law of
total variance. Assume that there are two variables: Y, the dependent or response variable,
and X, the independent variable or explanatory factor. In the general linear model, the relationship is
captured by the linear equation:
• $Y = a + bX + \varepsilon$ (1)
• Simply stated, for every change in X there is a corresponding change in Y. The focus of variance
decomposition is on the response variable Y; specifically on the variance of Y, which is given by:
• Var(Y) = E(Var[Y|X]) + Var(E[Y|X]) (2)
• In the relationship between X and Y, the variance of Y (the dependent variable) is comprised of (i) the
expected value of the conditional variance of Y given X, plus (ii) the variance of the conditional
expectation of Y given X. The two terms are interpreted as:
• Var(E[Y|X]) = explained variation, directly due to changes in X
• E(Var[Y|X]) = unexplained variation, coming from somewhere other than X
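A quick Monte Carlo check of the law (2) for the linear model (1); the coefficients and sample size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)

explained = (1.0 + 2.0 * x).var()         # Var(E[Y|X]) = b^2 * Var(X)
unexplained = 0.5 ** 2                    # E(Var[Y|X]) = Var(eps)
print(y.var(), explained + unexplained)   # the two numbers agree closely
```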
DECOMPOSITION
• The decomposition of variance is used when we are dealing with a
dynamic stochastic system, that is, a process whose value at each date is
random. This stochastic system may be defined as:
• Y(t) = value of the system at time t
• H(i, t) = historical values corresponding to t, where the history is
H(1, t), H(2, t), ..., H(c−1, t)
• Applying equation (2) with Y(t) in place of Y and the history in place of X,
the statement may be rewritten as:
• Var[Y(t)] = E(Var[Y(t) | H(1, t), ..., H(c−1, t)]) + Var(E[Y(t) | H(1, t), ..., H(c−1, t)]) (3)
MEANING OF THE RESULT
• Recall the two components:
• Var(E[Y|X]) = explained variation, directly due to changes in X; and
• E(Var[Y|X]) = unexplained variation, coming from somewhere other than X.
• The result helps the researcher appreciate that the response in Y has
variation, and that this variation is comprised of 2 components. When these
components are decomposed, they are one type of variation that is explained
by the changes of X (the independent variable) and another variance that is
completely due to chance, i.e. unexplained. Another way of putting this is
that E(Var[Y|X]) = randomness; after all, randomness is an unpredictable
pattern.
What is the Difference Between the Law of Total
Variance and Variance Decomposition Here?
• Only Var(E[Y|X]), the explained variation directly due to changes in X, is
reported.
• E(Var[Y|X]), the unexplained variation coming from somewhere other
than X, is not reported.
• So the estimated decomposition adds up to 100%.
IMPULSE RESPONSE
FUNCTION
IN THE CONTEXT OF A VAR
The unrestricted VAR
• We write a 2 variable VAR(k):
$X_t = \alpha_1 + \sum_{j=1}^{k} \beta_{1j} Y_{t-j} + \sum_{j=1}^{k} \gamma_{1j} X_{t-j} + u_{1t}$
$Y_t = \alpha_2 + \sum_{j=1}^{k} \beta_{2j} Y_{t-j} + \sum_{j=1}^{k} \gamma_{2j} X_{t-j} + u_{2t}$
• Suppose we reduce this to a VAR(1):
$X_t = \alpha_1 + \beta_{11} Y_{t-1} + \gamma_{11} X_{t-1} + u_{1t}$ (1)
$Y_t = \alpha_2 + \beta_{21} Y_{t-1} + \gamma_{21} X_{t-1} + u_{2t}$ (2)
• $u_{1t}$ and $u_{2t}$ are called the ‘innovations’. Why?
• Suppose $u_{1t}$ increases by one standard deviation (why one standard
deviation?). We will look at what this does to the VAR.
Effect of an Innovation on the VAR
$X_t = \alpha_1 + \beta_{11} Y_{t-1} + \gamma_{11} X_{t-1} + u_{1t}$ (1)
$Y_t = \alpha_2 + \beta_{21} Y_{t-1} + \gamma_{21} X_{t-1} + u_{2t}$ (2)
• Taking time one period ahead:
$X_{t+1} = \alpha_1 + \beta_{11} Y_t + \gamma_{11} X_t + u_{1,t+1}$ (3)
$Y_{t+1} = \alpha_2 + \beta_{21} Y_t + \gamma_{21} X_t + u_{2,t+1}$ (4)
• Taking time ahead by one more period:
$X_{t+2} = \alpha_1 + \beta_{11} Y_{t+1} + \gamma_{11} X_{t+1} + u_{1,t+2}$ (5)
$Y_{t+2} = \alpha_2 + \beta_{21} Y_{t+1} + \gamma_{21} X_{t+1} + u_{2,t+2}$ (6)
• Effect of an innovation in $X_t$ on $X_t$ and $Y_t$: let $u_{1t}$ rise by one σ. What
happens? The shock raises $X_t$ in (1); substituting into (3) and (4) moves
$X_{t+1}$ and $Y_{t+1}$, and so on through (5) and (6), tracing the shock through
the system period by period.
ARCH and GARCH
Modeling Volatility Dynamics

Modeling Unequal Variability
• Equal variability: homoscedasticity
• Unequal variability: heteroscedasticity
• This means any variability (around the mean) that is not
homoscedastic
• Models must be developed for specific cases

What Do These Acronyms Mean?
• ARCH: Autoregressive Conditional Heteroscedasticity
• GARCH: Generalized ARCH
Information in $e^2$
• Let $\varepsilon_t$ have mean 0 and conditional variance $\sigma_t^2$.
• Let $e_t$ be the residual of a fitted model.
• Then:
• $e_t$ estimates $\varepsilon_t$
• $e_t^2$ estimates the variance $\sigma_t^2$
ARCH Modeling of $\sigma_t^2$
• ARCH(1):
$\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2$
• ARCH as an AR(1) on $\varepsilon_t^2$: defining $\nu_t = \varepsilon_t^2 - \sigma_t^2$,
$\varepsilon_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \nu_t$
GARCH
• GARCH(1,1):
$\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$
• GARCH(1,1) as an ARMA(1,1) on $\varepsilon_t^2$: with $\nu_t = \varepsilon_t^2 - \sigma_t^2$ (a mean-zero error),
$\varepsilon_t^2 = \omega + (\alpha + \beta) \varepsilon_{t-1}^2 + \nu_t - \beta \nu_{t-1}$
Asymmetry in GARCH - TARCH
• TARCH(1,1):
$\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \gamma d_{t-1} \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$
where $d_t = 1$ if $\varepsilon_t < 0$, and $d_t = 0$ if $\varepsilon_t > 0$
Asymmetry in GARCH - EGARCH
• EGARCH(1,1):
$\log \sigma_t^2 = \omega + \alpha \left|\frac{\varepsilon_{t-1}}{\sigma_{t-1}}\right| + \gamma \frac{\varepsilon_{t-1}}{\sigma_{t-1}} + \beta \log \sigma_{t-1}^2$
• $\sigma_t^2 > 0$ is guaranteed whatever the parameter values, since the model is
specified in logs.
• $\gamma \ne 0$ gives an asymmetric effect.
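In Python the arch package covers these specifications; a minimal sketch, assuming r is a (percent) return series:

```python
from arch import arch_model

garch = arch_model(r, vol="GARCH", p=1, q=1).fit(disp="off")
print(garch.summary())

# Asymmetric variants: o=1 adds the threshold (GJR/TARCH-type) term,
# and vol="EGARCH" estimates the exponential specification.
gjr = arch_model(r, vol="GARCH", p=1, o=1, q=1).fit(disp="off")
egarch = arch_model(r, vol="EGARCH", p=1, o=1, q=1).fit(disp="off")
```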
Eviews Command
ARCH(p, q) series_name c
