
Chapter 14, 15, 16

Time Series
Regression

Copyright 2015 Pearson Education, Inc. All rights reserved.

1. Time Series Data: What's Different?


Time series data are data collected on the same
observational unit at multiple time periods;
$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + u_t$
Aggregate consumption and GDP for a country (for
example, 20 years of quarterly observations = 80
observations)
Yen/$, pound/$ and Euro/$ exchange rates (daily
data for 1 year = 365 observations)
Cigarette consumption per capita in California, by
year (annual data)

Some monthly U.S. macro and financial time series


Logarithm:


Monthly Percentage Change


Some uses of time series data


Forecasting (SW Ch. 14)-separate class Econ 373
Estimation of dynamic causal effects (SW Ch. 15)
If the Fed increases the Federal Funds rate now,
what will be the effect on the rates of inflation and
unemployment in 3 months? in 12 months?
What is the effect over time on cigarette
consumption of a hike in the cigarette tax?
Modeling risks, which is used in financial markets
(one aspect of this, modeling changing variances
and volatility clustering, is discussed in SW Ch.
16)

Time series data raises new technical issues


Time lags
Correlation over time (serial correlation, a.k.a.
autocorrelation which we encounter in panel data)
Calculation of standard errors when the errors are
serially correlated
A good way to learn about time series data is to
investigate it yourself! A great source for U.S. macro
time series data, and some international data, is the
Federal Reserve Bank of St. Louis's FRED database.

3. Time Series Data

Time series basics:


A. Notation
B. Lags, first differences, and growth rates
C. Autocorrelation (serial correlation)
D. Stationarity.


A. Notation
$Y_t$ = value of Y in period t.
Data set: $\{Y_1, \ldots, Y_T\}$ are T observations on
the time series variable Y
We consider only consecutive, evenly spaced observations (for example, monthly,
1960 to 1999, no missing months)
Missing and unevenly spaced data introduce
technical complications


B. Lags, first differences, and growth rates
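The body of this slide did not survive extraction, so the following is only a minimal sketch of the three transformations named in the title. The function names are illustrative, and the use of 100 times the log difference as an approximate percentage growth rate is a standard convention assumed here, not something stated on the slide.

```python
import math

def lag(series, k=1):
    """k-th lag: position t holds the value from period t-k (None where unavailable)."""
    return [None] * k + list(series[:len(series) - k])

def first_difference(series):
    """Delta Y_t = Y_t - Y_{t-1}; the first period has no predecessor."""
    return [None] + [series[t] - series[t - 1] for t in range(1, len(series))]

def pct_change(series):
    """Exact percentage change: 100 * (Y_t - Y_{t-1}) / Y_{t-1}."""
    return [None] + [100 * (series[t] - series[t - 1]) / series[t - 1]
                     for t in range(1, len(series))]

def log_difference(series):
    """Delta ln(Y_t) = ln(Y_t) - ln(Y_{t-1}); times 100 it approximates pct_change."""
    return [None] + [math.log(series[t]) - math.log(series[t - 1])
                     for t in range(1, len(series))]

# Hypothetical three-period series, for illustration only
y = [100.0, 102.0, 105.06]
print(lag(y))
print(first_difference(y))
print(pct_change(y))
print([None if v is None else 100 * v for v in log_difference(y)])
```

For small changes the exact percentage change and 100 times the log difference are close, which is why the two are often used interchangeably for monthly or quarterly data.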



AUTOCORRELATION
(Serial Correlation):
follows the laws of multiple
regressors heteroskedasticity

C. Autocorrelation (serial correlation)


The correlation of a series Yt with its own lagged
values is called autocorrelation or serial
correlation.
The first autocovariance of $Y_t$ is $\mathrm{cov}(Y_t, Y_{t-1})$
The first autocorrelation of $Y_t$ is $\mathrm{corr}(Y_t, Y_{t-1})$
Thus
$\mathrm{corr}(Y_t, Y_{t-1}) = \frac{\mathrm{cov}(Y_t, Y_{t-1})}{\sqrt{\mathrm{var}(Y_t)\,\mathrm{var}(Y_{t-1})}} = \rho_1$
These are population correlations: they describe the
population joint distribution of $(Y_t, Y_{t-1})$
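The population quantities above are estimated from data by their sample counterparts. A minimal sketch, assuming the common convention of using the full-sample mean and dividing both the autocovariance and the variance by T (the function name is illustrative):

```python
def sample_autocorrelation(y, j=1):
    """Sample j-th autocorrelation: sample autocovariance at lag j over sample variance."""
    n = len(y)
    mean = sum(y) / n
    # Both moments use the full-sample mean and the same divisor n
    autocov = sum((y[t] - mean) * (y[t - j] - mean) for t in range(j, n)) / n
    variance = sum((v - mean) ** 2 for v in y) / n
    return autocov / variance

# A steadily rising series is positively autocorrelated,
# a series that flips sign every period is negatively autocorrelated.
print(sample_autocorrelation([1, 2, 3, 4, 5]))
print(sample_autocorrelation([1, -1, 1, -1]))
```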

Pure Auto/Serial Correlation


Pure serial correlation occurs when the assumption of
uncorrelated observations of the error term is violated (in a
correctly specified equation!)
The most commonly assumed kind of serial correlation is
first-order serial correlation, in which the current value of
the error term is a function of the previous value of the error
term:
$\epsilon_t = \rho \epsilon_{t-1} + u_t$ (9.1)
where: $\epsilon$ = the error term of the equation in question
$\rho$ = the first-order autocorrelation coefficient
u = a classical (not serially correlated) error term
2011 Pearson Addison-Wesley. All rights reserved.


Pure Serial Correlation (cont.)

$\epsilon_t = \rho \epsilon_{t-1} + u_t$
The magnitude of $\rho$ indicates the strength of the
serial correlation:
If $\rho$ is zero, there is no serial correlation
As $\rho$ approaches one in absolute value, the previous
observation of the error term becomes more important in
determining the current value of $\epsilon_t$ and a high degree of
serial correlation exists
For $\rho$ to exceed one in absolute value is unreasonable, since the error term
effectively would explode

As a result of this, we can state that:
$-1 < \rho < +1$ (9.2)

Pure Serial Correlation (cont.)


The sign of $\rho$ indicates the nature of the serial correlation in an
equation:
Positive ($\rho > 0$):
implies that the error term tends to have the same sign from
one time period to the next
this is called positive serial correlation
Negative ($\rho < 0$):
implies that the error term has a tendency to switch signs
from negative to positive and back again in consecutive
observations
this is called negative serial correlation
Figures 9.1-9.3 illustrate several different scenarios
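The sign behaviour described above can be checked by simulation. This is a sketch under stated assumptions: errors follow Equation 9.1 with standard normal u, the sample length and seed are arbitrary, and `sign_change_rate` is an illustrative helper, not a standard statistic.

```python
import random

def simulate_ar1_errors(rho, n, seed=0):
    """Generate e_t = rho * e_{t-1} + u_t with u_t drawn from N(0, 1)."""
    rng = random.Random(seed)
    eps = [rng.gauss(0, 1)]
    for _ in range(n - 1):
        eps.append(rho * eps[-1] + rng.gauss(0, 1))
    return eps

def sign_change_rate(eps):
    """Share of consecutive pairs in which the error switches sign."""
    changes = sum(1 for a, b in zip(eps, eps[1:]) if a * b < 0)
    return changes / (len(eps) - 1)

for rho in (0.9, 0.0, -0.9):
    rate = sign_change_rate(simulate_ar1_errors(rho, 2000))
    print(f"rho = {rho:+.1f}: sign changes in {rate:.0%} of periods")
```

With positive serial correlation the error rarely changes sign, with no serial correlation it changes about half the time, and with negative serial correlation it changes in most periods.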

Positive or Negative Serial Correlation?


Figure 9.1b
Positive Serial Correlation


Figure 9.2
No Serial Correlation


Positive or
Negative Serial Correlation


Positive or
Negative Serial Correlation


Impure Serial Correlation


Impure serial correlation is serial correlation that is caused
by a specification error such as:
an omitted variable and/or
an incorrect functional form
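How does this happen? Just as with heteroskedasticity in cross-sectional data
As an example, suppose that the true equation is:
$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \epsilon_t$ (9.3)
where $\epsilon_t$ is a classical error term. As learned, if $X_2$ is accidentally omitted
from the equation (or if data for $X_2$ are unavailable), then:
$Y_t = \beta_0 + \beta_1 X_{1t} + \epsilon_t^*$ where $\epsilon_t^* = \beta_2 X_{2t} + \epsilon_t$ (9.4)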


Impure Serial Correlation (OV)

Instead, the error term is also a function of one of the
explanatory variables, $X_2$
As a result, the new error term, $\epsilon^*$, can be serially correlated
even if the true error term, $\epsilon$, is not
In particular, the new error term will tend to be serially
correlated when:
1. $X_2$ itself is serially correlated (this is quite likely in a
time series) and
2. the size of $\epsilon$ is small compared to the size of $\beta_2 \bar{X}_2$
Figure 9.4 illustrates 1., for the case of U.S. disposable
income

U.S. Disposable Income as a Function of Time


Impure Serial Correlation
(Incorrect Functional Form, IFF)
Turn now to the case of impure serial correlation caused by an
incorrect functional form
Suppose that the true equation is polynomial in nature:
$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{1t}^2 + \epsilon_t$ (9.7)
but that instead a linear regression is run:
$Y_t = \alpha_0 + \alpha_1 X_{1t} + \epsilon_t^*$ (9.8)
The new error term $\epsilon^*$ is now a function of the true error term $\epsilon$
and of the differences between the linear and the polynomial
functional forms
Figure 9.5 illustrates how these differences often follow fairly
autoregressive patterns

Figure 9.5a Incorrect Functional Form as a Source of Impure Serial Correlation


Incorrect Functional Form as a Source of Impure Serial Correlation


The Consequences of Serial Correlation
Serial correlation in the error term of an equation estimated
with OLS has at least three consequences:
1. Pure serial correlation does not cause bias in the coefficient
estimates
2. Serial correlation causes OLS to no longer be the minimum
variance estimator (of all the linear unbiased estimators): So what
doesn't it minimize anymore? R2
3. Serial correlation causes the OLS estimates of the SE to be
biased, leading to unreliable hypothesis testing. Typically the
bias in the SE estimate is negative, meaning that OLS
underestimates the standard errors of the coefficients (and thus
overestimates the t-scores). How does this compare to
heteroskedasticity?

The Durbin-Watson d Test
Two main ways to detect serial correlation:
Informal: observing a pattern in the residuals like we did in figures
Formal: testing for serial correlation using the Durbin-Watson d test
We will now go through the second of these in detail
First, it is important to note that the Durbin-Watson d test is only
applicable if the following three assumptions are met:
1. The model includes an intercept term: $Y_t = \beta_1 X_{1t} + \beta_2 X_{2t}$ is NOT ok
2. The serial correlation is first-order in nature:
$\epsilon_t = \rho \epsilon_{t-1} + u_t$ where $\rho$ is the autocorrelation coefficient and u is a
classical (normally distributed) error term
3. The regression model does not include a lagged dependent
variable as an independent variable, so this is NOT ok either:
$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \beta_3 Y_{t-1} + u_t$

The Durbin-Watson d Test (cont.)
The equation for the Durbin-Watson d statistic for T
observations is:
$d = \sum_{t=2}^{T} (e_t - e_{t-1})^2 \Big/ \sum_{t=1}^{T} e_t^2$ (9.10)
where the $e_t$s are the OLS residuals
There are three main cases:
1. Extreme positive serial correlation: $d = 0$
2. Extreme negative serial correlation: $d \approx 4$
3. No serial correlation: $d \approx 2$
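Equation 9.10 is easy to compute directly from a list of residuals. The sketch below uses made-up residual patterns purely to show the extreme cases; the function name is illustrative. A useful rule of thumb is that d is approximately $2(1 - \hat{\rho})$, where $\hat{\rho}$ is the first-order autocorrelation of the residuals, which is why d ranges from about 0 to about 4.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Illustrative residual patterns (not from any estimated equation)
print(durbin_watson([1, 1, 1, 1]))      # residuals never change: extreme positive case
print(durbin_watson([1, -1, 1, -1]))    # residuals flip every period: toward 4 as T grows
```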

The Durbin-Watson d Test (cont.)
To test for positive (note that we rarely, if ever, test for
negative!) serial correlation, the following steps are required:
1. Obtain the OLS residuals from the equation to be tested
and calculate the d statistic by using Equation 9.10
2. Determine the sample size and the number of
explanatory variables and then consult a Statistical
Table to find the upper critical d value, $d_U$, and the lower
critical d value, $d_L$, respectively

The Durbin-Watson d Test (cont.)
3. Set up the test hypotheses and decision rule:
H0: $\rho \le 0$ (no positive serial correlation)
HA: $\rho > 0$ (positive serial correlation)
if $d < d_L$: Reject H0
if $d > d_U$: Do not reject H0
if $d_L \le d \le d_U$: Inconclusive
In rare circumstances, perhaps first differenced equations, a
two-sided d test might be appropriate
In such a case, steps 1 and 2 are still used, but step 3 is now:

The Durbin-Watson d Test (cont.)
3. Set up the test hypotheses and decision rule:
H0: $\rho = 0$ (no serial correlation)
HA: $\rho \ne 0$ (serial correlation)
if $d < d_L$: Reject H0
if $d > 4 - d_L$: Reject H0
if $4 - d_U > d > d_U$: Do not reject H0
Otherwise: Inconclusive
Figure 9.6 gives an example of a one-sided Durbin-Watson d test
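The one-sided and two-sided decision rules can be written as a small function. This is a sketch only: `d_lower` and `d_upper` must be looked up in a Durbin-Watson table for the actual sample size and number of explanatory variables, and the values 1.0 and 1.5 used below are placeholders, not table entries.

```python
def dw_decision(d, d_lower, d_upper, two_sided=False):
    """Apply the Durbin-Watson decision rule for given critical values."""
    if d < d_lower:
        return "reject H0"
    if two_sided:
        if d > 4 - d_lower:
            return "reject H0"
        if d_upper < d < 4 - d_upper:
            return "do not reject H0"
        return "inconclusive"
    # One-sided test for positive serial correlation
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"

# Placeholder critical values, for illustration only
D_L, D_U = 1.0, 1.5
for d in (0.8, 1.2, 2.0, 2.8, 3.2):
    print(d, dw_decision(d, D_L, D_U), "|", dw_decision(d, D_L, D_U, two_sided=True))
```

Note how a large d such as 2.8 never rejects in the one-sided test for positive serial correlation, but can fall in the inconclusive or rejection region of the two-sided test.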

Figure 9.6 An Example of a One-Sided Durbin-Watson d Test


https://www3.nd.edu/~wevans1/econ30331/Durbin_Watson_tables.pdf


More in class practice and examples


A farmers' association hires you to predict inches of growth
for corn as a function of rain on a monthly basis (they
provide you with the data they have been collecting for
the past 14 months). You estimate the model:

$InGrwth_t = \beta_0 + \beta_1 InRain_t + \beta_2 Temp_t + u_t$
1. What sign do you expect each coefficient to have?
2. Your results are:
$InGrwth_t = 1.2 + .07\,InRain_t + .03\,Temp_t$, $R^2 = .48$
(.07) (.003) (.02)
Which are significant?
3. Interpret in words the findings for your employer.

More in class practice and examples


A farmers' association hires you to predict inches of growth
for corn as a function of rain on a monthly basis (they
provide you with the data they have been collecting for
the past 14 months). You estimate the model:
$InGrwth_t = 1.2 + .07\,InRain_t + .03\,Temp_t$, $R^2 = .48$
(.07) (.003) (.02)
4. How would you test whether your model suffers from
serial correlation?
5. You run the DW test and find: d = 2.8. Do you have
serial correlation?


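Remedy 1: Generalized Least Squares
Start with an equation that has first-order serial correlation:
$Y_t = \beta_0 + \beta_1 X_{1t} + \epsilon_t$ (9.15)
Which, if $\epsilon_t = \rho \epsilon_{t-1} + u_t$ (due to pure serial correlation), also
equals:
$Y_t = \beta_0 + \beta_1 X_{1t} + \rho \epsilon_{t-1} + u_t$ (9.16)
Multiply Equation 9.15 by $\rho$ and then lag the new equation
by one period, obtaining:
$\rho Y_{t-1} = \rho \beta_0 + \rho \beta_1 X_{1,t-1} + \rho \epsilon_{t-1}$ (9.17)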


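Generalized Least Squares (cont.)
Next, subtract Equation 9.17 from Equation 9.16,
obtaining:
$Y_t - \rho Y_{t-1} = \beta_0 (1 - \rho) + \beta_1 (X_{1t} - \rho X_{1,t-1}) + u_t$ (9.18)

Finally, rewrite equation 9.18 as:
$Y_t^* = \beta_0^* + \beta_1 X_{1t}^* + u_t$ (9.19)
where $Y_t^* = Y_t - \rho Y_{t-1}$, $X_{1t}^* = X_{1t} - \rho X_{1,t-1}$, and $\beta_0^* = \beta_0 - \rho \beta_0$ (9.20)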


Generalized Least Squares


Equation 9.19 is called a Generalized Least Squares
(or quasi-differenced) version of Equation 9.16.
Notice that:
1. The error term is not serially correlated
a. As a result, OLS estimation of Equation 9.19 will be minimum
variance
b. This is true if we know $\rho$ or if we accurately estimate $\rho$
2. The slope coefficient $\beta_1$ is the same as the slope
coefficient of the original serially correlated equation,
Equation 9.16. Thus coefficients estimated with GLS have
the same meaning as those estimated with OLS.

Generalized Least Squares (cont.)
3. The dependent variable has changed compared
to that in Equation 9.16:
This means that the GLS $\bar{R}^2$ is not directly comparable
to the OLS $\bar{R}^2$.
4. To forecast with GLS, adjustments discussed later
are required
Unfortunately, we cannot use OLS to estimate a GLS
model because GLS equations are inherently nonlinear
in the coefficients
Fortunately, there are at least two other methods
available:

1. The Cochrane-Orcutt Method
This is a two-step iterative technique that first produces an
estimate of $\rho$ and then estimates the GLS equation using that
estimate.
The two steps are:
1. Estimate $\rho$ by running a regression based on the residuals of the
equation suspected of having serial correlation:
$e_t = \rho e_{t-1} + u_t$ (9.21)
where the $e_t$s are the OLS residuals from the equation suspected
of having pure serial correlation and $u_t$ is a classical error term
2. Use this $\hat{\rho}$ to estimate the GLS equation by substituting $\hat{\rho}$ into
Equation 9.18
and using OLS to estimate Equation 9.18 with the adjusted data
These two steps are repeated (iterated) until further iteration results
in little change in $\hat{\rho}$
Once $\hat{\rho}$ has converged (usually in just a few iterations), the last
estimate of step 2 is used as a final estimate of Equation 9.18
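The two steps can be sketched in a few lines with NumPy. This is an illustrative implementation for a single explanatory variable, not the routine any particular software package uses; the tolerance, iteration cap, and simulated data (true slope 2, true $\rho$ = 0.7) are assumptions chosen for the demonstration.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def cochrane_orcutt(x, y, tol=1e-6, max_iter=50):
    """Iterate between estimating rho from residuals and re-estimating Equation 9.18."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = ols(X, y)                    # starting values: plain OLS
    rho = 0.0
    for _ in range(max_iter):
        e = y - X @ beta                # residuals of the original equation
        # Step 1: regress e_t on e_{t-1} (no intercept) to estimate rho
        rho_new = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])
        # Step 2: quasi-difference the data and estimate Equation 9.18 by OLS.
        # Using (1 - rho) as the constant column returns beta_0 itself.
        y_star = y[1:] - rho_new * y[:-1]
        x_star = x[1:] - rho_new * x[:-1]
        X_star = np.column_stack([np.full_like(x_star, 1.0 - rho_new), x_star])
        beta = ols(X_star, y_star)
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return beta, rho

# Simulated data with AR(1) errors (assumed values, for illustration)
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
u = rng.normal(size=n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.7 * eps[t - 1] + u[t]
y = 1.0 + 2.0 * x + eps

beta_hat, rho_hat = cochrane_orcutt(x, y)
print("beta:", beta_hat, "rho:", rho_hat)
```

Note that one observation is lost to the quasi-differencing, exactly as in Equation 9.18.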

2. The AR(1) Method
The AR(1) method estimates a GLS equation like Equation
9.18
by estimating $\beta_0$, $\beta_1$ and $\rho$ simultaneously with iterative
nonlinear regression techniques (that are well beyond the
scope of this class!)
The AR(1) method tends to produce the same coefficient
estimates as Cochrane-Orcutt
However, the estimated standard errors are smaller
This is why the AR(1) approach is recommended as long as
your software can support such nonlinear regression

Remedies for Serial Correlation
The place to start in correcting a serial correlation problem is to
considered

Remember we said there are two main remedies for pure
serial correlation:
1. Generalized Least Squares: we just learned it
2. Newey-West standard errors: what is this? And when
would we use this instead of GLS? Next!


Remedy 2: Newey-West Standard Errors
Not all corrections for pure serial correlation involve Generalized
Least Squares (GLS does not do well in small samples)
Newey-West standard errors take account of serial correlation
by correcting the standard errors without changing the
estimated coefficients
The logic behind Newey-West standard errors is powerful:
If serial correlation does not cause bias in the estimated
coefficients but does impact the standard errors, then it
makes sense to adjust the estimated equation in a way that
changes the standard errors but not the coefficients

Newey-West Standard Errors (cont.)
The Newey-West SEs are biased but generally more
accurate than uncorrected standard errors for large
samples in the face of serial correlation
As a result, Newey-West standard errors can be used for
t-tests and other hypothesis tests in most samples without
the errors of inference potentially caused by serial
correlation
Typically, Newey-West SEs are larger than OLS SEs, thus
producing lower t-scores
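To make "same coefficients, different standard errors" concrete, here is a minimal NumPy sketch of the Newey-West calculation. It assumes the usual Bartlett-kernel form with a user-chosen truncation lag and no small-sample correction; statistical packages may apply additional adjustments, so their numbers can differ slightly. The function name and the toy data are illustrative.

```python
import numpy as np

def newey_west_se(X, y, max_lag):
    """OLS coefficients with Newey-West (Bartlett kernel) standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # coefficients are plain OLS
    e = y - X @ beta
    Xe = X * e[:, None]                            # row t is x_t * e_t
    S = Xe.T @ Xe                                  # lag-0 term
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / (max_lag + 1.0)       # Bartlett weight declines with the lag
        G = Xe[lag:].T @ Xe[:-lag]                 # cross-products at this lag
        S += weight * (G + G.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    cov = XtX_inv @ S @ XtX_inv                    # "sandwich" covariance matrix
    return beta, np.sqrt(np.diag(cov))

# Toy data, for illustration only
X = np.column_stack([np.ones(5), np.arange(1.0, 6.0)])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
beta, se = newey_west_se(X, y, max_lag=2)
print("coefficients:", beta)
print("Newey-West SEs:", se)
```

Setting `max_lag=0` drops all the cross-period terms, which leaves only the heteroskedasticity-robust part of the correction.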


DYNAMIC MODELS


Dynamic Models: Distributed Lag Models


An (ad hoc) distributed lag model explains the
current value of Y as a function of current and past
values of X, thus distributing the impact of X over a
number of time periods
For example, we might be interested in the impact of a
change in the money supply (X) on GDP (Y) and model
this as:
$Y_t = \alpha_0 + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_p X_{t-p} + \epsilon_t$ (12.2)
Or, in our example:
$GDP_t = \alpha_0 + \beta_0 MS_t + \beta_1 MS_{t-1} + \beta_2 MS_{t-2} + \cdots + \beta_p MS_{t-p} + \epsilon_t$


Dynamic Models: Distributed Lag Models


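For example, we might be interested in the impact of a change in the money supply
(X) on GDP (Y) and model this as:
$Y_t = \alpha_0 + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_p X_{t-p} + \epsilon_t$ (12.2)
Or, in our example:
$GDP_t = \alpha_0 + \beta_0 MS_t + \beta_1 MS_{t-1} + \beta_2 MS_{t-2} + \cdots + \beta_p MS_{t-p} + \epsilon_t$
If we estimate such a model, what would we expect?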


What Is a Dynamic Model? (DLM)


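$Y_t = \alpha_0 + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_p X_{t-p} + \epsilon_t$ (12.2)
where:
$\beta_1 = \lambda \beta_0$
$\beta_2 = \lambda^2 \beta_0$
$\beta_3 = \lambda^3 \beta_0$
...
$\beta_p = \lambda^p \beta_0$ (12.8)
As long as $\lambda$ is between 0 and 1, these coefficients
will indeed smoothly decline, as shown in Figure 12.1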
Copyright 2015 Pearson Education, Inc. All rights reserved.

14-52
12-

Figure 12.1 Geometric Weighting Schemes for Various Dynamic Models


Potential issues from estimating Equation 12.2 with OLS:
$Y_t = \alpha_0 + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_p X_{t-p} + \epsilon_t$ (12.2)
$GDP_t = \alpha_0 + \beta_0 MS_t + \beta_1 MS_{t-1} + \beta_2 MS_{t-2} + \cdots + \beta_p MS_{t-p} + \epsilon_t$
1. The various lagged values of X are likely to be severely
multicollinear, making coefficient estimates
imprecise
There is no guarantee that the estimated coefficients
will follow the smoothly declining pattern that
economic theory would suggest
Instead, it's quite typical to get something like:


Potential issues from estimating Equation 12.2 with OLS:
$Y_t = \alpha_0 + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_p X_{t-p} + \epsilon_t$ (12.2)
$GDP_t = \alpha_0 + \beta_0 MS_t + \beta_1 MS_{t-1} + \beta_2 MS_{t-2} + \cdots + \beta_p MS_{t-p} + \epsilon_t$
2. The degrees of freedom tend to decrease,
sometimes substantially, since we have to:
estimate a coefficient for each lagged X, thus
increasing K and lowering the degrees of
freedom ($N - K - 1$)
decrease the sample size by one for each
lagged X, thus lowering the number of
observations, N, and therefore the degrees of
freedom (unless data for lagged Xs outside the

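If Ad Hoc Distributed Lag Models
$Y_t = \alpha_0 + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_p X_{t-p} + \epsilon_t$
$GDP_t = \alpha_0 + \beta_0 MS_t + \beta_1 MS_{t-1} + \beta_2 MS_{t-2} + \cdots + \beta_p MS_{t-p} + \epsilon_t$
have all these problems, how can we still
correctly estimate, say, the impact of a change in
the money supply on GDP?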

Ad Hoc DLM problem resolution


Because of the aforementioned problems with
an Ad Hoc Distributed Lag Model:
$Y_t = \alpha_0 + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_p X_{t-p} + \epsilon_t$
we always want to rewrite it as
$Y_t = \alpha_0 + \beta_0 X_t + \lambda Y_{t-1} + u_t$ (12.3)
$GDP_t = \alpha_0 + \beta_0 MS_t + \lambda GDP_{t-1} + u_t$
Note that Y is on the left-hand side as $Y_t$, and
on the right-hand side as $Y_{t-1}$
It's this difference in time period that
makes the equation dynamic
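Why a single lagged dependent variable can replace the whole string of lagged Xs: a short derivation, assuming the geometric weights $\beta_k = \lambda^k \beta_0$ of Equation 12.8 and letting the number of lags grow without bound. The relabeling of the intercept and error term in the last line is part of this sketch, not notation taken from the slides.

```latex
\begin{align*}
Y_t &= \alpha_0 + \beta_0 \sum_{k=0}^{\infty} \lambda^{k} X_{t-k} + \epsilon_t \\
\lambda Y_{t-1} &= \lambda \alpha_0 + \beta_0 \sum_{k=1}^{\infty} \lambda^{k} X_{t-k} + \lambda \epsilon_{t-1} \\
Y_t - \lambda Y_{t-1} &= (1 - \lambda)\alpha_0 + \beta_0 X_t + (\epsilon_t - \lambda \epsilon_{t-1}) \\
Y_t &= (1 - \lambda)\alpha_0 + \beta_0 X_t + \lambda Y_{t-1} + u_t,
\qquad u_t = \epsilon_t - \lambda \epsilon_{t-1}
\end{align*}
```

The result has the same form as Equation 12.3 once the constant $(1 - \lambda)\alpha_0$ is renamed. Only three coefficients need to be estimated, which removes the multicollinearity and degrees-of-freedom problems, but the new error term $u_t$ depends on $\epsilon_{t-1}$ and so may itself be serially correlated.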

What Is a Dynamic Model?


The simplest dynamic model is:
$Y_t = \alpha_0 + \beta_0 X_t + \lambda Y_{t-1} + u_t$ (12.3)
$GDP_t = \alpha_0 + \beta_0 MS_t + \lambda GDP_{t-1} + u_t$
Note that Y is on the left-hand side as $Y_t$,
and on the right-hand side as $Y_{t-1}$
It's this difference in time period that
makes the equation dynamic

Serial Correlation and Dynamic Models


Dynamic models:
Now serial correlation causes bias in the coefficients
produced by OLS
$Y_t = \alpha_0 + \beta_0 X_t + \lambda Y_{t-1} + u_t$
$GDP_t = \alpha_0 + \beta_0 MS_t + \lambda GDP_{t-1} + u_t$
Can we use the Durbin-Watson d test to detect this?
Why or why not?

Testing for Serial Correlation in Dynamic Models
$Y_t = \alpha_0 + \beta_0 X_t + \lambda Y_{t-1} + u_t$
Using the Lagrange Multiplier (LM) test for serial
correlation in a typical dynamic model involves
three steps:
1. Obtain the residuals of the estimated equation:
$e_t = Y_t - \hat{Y}_t = Y_t - \hat{\alpha}_0 - \hat{\beta}_0 X_t - \hat{\lambda} Y_{t-1}$
2. Use these residuals as the dependent variable in
an auxiliary regression that includes as
independent variables all those on the right-hand
side of the original equation as well as the lagged
residuals:
$e_t = a_0 + a_1 X_t + a_2 Y_{t-1} + a_3 e_{t-1} + u_t$ (12.18)

Testing for Serial Correlation in Dynamic Models


3. Estimate Eq 12.18
using OLS and then test the null hypothesis that $a_3 = 0$ with
the following test statistic:
$LM = N \cdot R^2$ (12.19)
where: N = the sample size
$R^2$ is the unadjusted coefficient of determination
For large samples, LM has a chi-square distribution with
degrees of freedom equal to the number of restrictions in the
null hypothesis (in this case, one).
If LM is greater than the critical chi-square value from
the corresponding Statistical Table, then we reject the
null hypothesis that $a_3 = 0$ and conclude that there is
indeed serial correlation in the original equation
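The three steps can be sketched with NumPy for the simple dynamic model with one X. This is an illustration under stated assumptions: the function name is made up, N is taken to be the number of observations available for the auxiliary regression, and the data are simulated with serially correlated errors (true $\rho$ = 0.8) so that the test has something to find.

```python
import numpy as np

def lm_serial_correlation(x, y):
    """LM = N * R^2 from the auxiliary regression of residuals on X_t, Y_{t-1}, e_{t-1}."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step 1: estimate Y_t = a0 + b0 X_t + lambda Y_{t-1} + u_t and keep the residuals
    Z = np.column_stack([np.ones(len(y) - 1), x[1:], y[:-1]])
    coef, *_ = np.linalg.lstsq(Z, y[1:], rcond=None)
    e = y[1:] - Z @ coef
    # Step 2: auxiliary regression of e_t on the original regressors plus e_{t-1}
    W = np.column_stack([Z[1:], e[:-1]])
    dep = e[1:]
    aux, *_ = np.linalg.lstsq(W, dep, rcond=None)
    resid = dep - W @ aux
    r2 = 1.0 - (resid @ resid) / ((dep - dep.mean()) @ (dep - dep.mean()))
    # Step 3: the test statistic
    return len(dep) * r2

# Simulated dynamic model with AR(1) errors (assumed values, for illustration)
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
v = rng.normal(size=n)
u = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + v[t]
    y[t] = 1.0 + 0.5 * x[t] + 0.3 * y[t - 1] + u[t]

lm_stat = lm_serial_correlation(x, y)
print("LM =", lm_stat, "(compare with the chi-square critical value, 1 degree of freedom)")
```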

More in class practice and examples


A farmers' association hires you to predict inches of growth
for corn as a function of rain on a monthly basis (they
provide you with the data they have been collecting for
the past 14 months). You estimate the model:

$InGrwth_t = \beta_0 + \beta_1 InRain_t + \beta_2 Temp_t + \beta_3 InGrwth_{t-1} + u_t$
1. What sign do you expect each coefficient to have?
2. Your results are:
$InGrwth_t = 1.3 + .11\,InRain_t + .19\,Temp_t - .01\,InGrwth_{t-1}$, $R^2 = .48$
(.07) (.003) (.02) (.003)
Which are significant?
3. Was introducing the lag of the dependent variable a
good idea or should you remove it?

More in class practice and examples


A farmers' association hires you to predict inches of growth for
corn as a function of rain on a monthly basis (they provide you
with the data they have been collecting for the past 14
months). You estimate the model:
$InGrwth_t = 1.3 + .11\,InRain_t + .19\,Temp_t - .01\,InGrwth_{t-1}$, $R^2 = .48$
(.07) (.003) (.02) (.003)
4. Interpret in words the findings for your employer.
5. How would you test whether your model suffers from
serial correlation?
6. You run the LM test and find: LM = _____
7. Do you have serial correlation?

More in class practice and examples


A farmers' association hires you to predict inches of growth for
corn as a function of rain on a monthly basis (they provide you
with the data they have been collecting for the past 14
months). You estimate the model:
$InGrwth_t = 1.3 + .11\,InRain_t + .19\,Temp_t - .01\,InGrwth_{t-1}$, $R^2 = .48$
(.07) (.003) (.02) (.003)
6. You run the LM test and find: $LM = N \cdot R^2 = 6.72$
7. Do you have serial correlation? (Use the chi-square table.)
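One way to check question 7 without a printed table, sketched under the assumption of a 5% significance level (the slide does not state one). With one restriction the statistic is compared with the chi-square critical value for 1 degree of freedom, which is 3.84 at 5%. For 1 degree of freedom the p-value also has a closed form through the complementary error function, because a chi-square(1) variable is a squared standard normal.

```python
import math

def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square(1) statistic: P(Z^2 > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2))

lm = 6.72                  # value given in the exercise
critical_5pct = 3.84       # chi-square critical value, 1 degree of freedom, 5% level
p_value = chi2_1df_pvalue(lm)

print("LM exceeds the 5% critical value:", lm > critical_5pct)
print("p-value:", p_value)
```

Since 6.72 exceeds 3.84, the null hypothesis of no serial correlation would be rejected at the 5% level. With only 14 months of data, though, the large-sample chi-square approximation should be treated with caution.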


Correcting for Serial Correlation in


Dynamic Models
There are essentially three strategies for attempting to rid a
dynamic model of serial correlation:
improving the specification:
Only relevant if the serial correlation is impure

instrumental variables:
substituting an instrument (a variable that is highly correlated with $Y_{t-1}$
but is uncorrelated with $u_t$) for $Y_{t-1}$ in the original equation effectively
eliminates the correlation between $Y_{t-1}$ and $u_t$
Problem: good instruments are hard to come by (more in Ch 12)

modified GLS:
Technique similar to the GLS procedure we learned
Potential issues: sample must be large and the standard


Then, are Ad Hoc Distributed Lag Models
$Y_t = \alpha_0 + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_p X_{t-p} + \epsilon_t$
$GDP_t = \alpha_0 + \beta_0 MS_t + \beta_1 MS_{t-1} + \beta_2 MS_{t-2} + \cdots + \beta_p MS_{t-p} + \epsilon_t$
useless or do they offer any
information to a researcher?
They can tell us if one time-series variable
consistently and predictably changes before
another one.

Granger Causality
Granger causality, or precedence, is a circumstance
in which one time series variable consistently and
predictably changes before another variable
A word of caution: even if one variable precedes
(Granger causes) another, this does not mean that the
first variable causes the other to change
There are several tests for Granger causality
They all involve distributed lag models in one form or
another, however
We'll discuss an expanded version of a test originally
developed by Granger


Granger Causality (cont.)


Granger suggested that to see if A Granger-caused Y, we
should run:
$Y_t = \beta_0 + \beta_1 Y_{t-1} + \cdots + \beta_p Y_{t-p} + \alpha_1 A_{t-1} + \cdots + \alpha_p A_{t-p} + \epsilon_t$ (12.20)
and test the null hypothesis that the coefficients of the
lagged As (the $\alpha$s) jointly equal zero
If we can reject this null hypothesis using the F-test,
then we have evidence that A Granger-causes Y
Note that if p = 1, Equation 12.20 is similar to the
dynamic model, Equation 12.3
Applications of this test involve running two Granger
tests, one in each direction

Granger Causality (cont.)


That is, run Equation 12.20:
$Y_t = \beta_0 + \beta_1 Y_{t-1} + \cdots + \beta_p Y_{t-p} + \alpha_1 A_{t-1} + \cdots + \alpha_p A_{t-p} + \epsilon_t$ (12.20)
and also run:
$A_t = \beta_0 + \beta_1 A_{t-1} + \cdots + \beta_p A_{t-p} + \alpha_1 Y_{t-1} + \cdots + \alpha_p Y_{t-p} + \epsilon_t$ (12.21)
testing for Granger causality in both directions by
testing the null hypothesis that the coefficients of the
lagged Ys (again, the $\alpha$s) jointly equal zero
If the F-test is significant for Equation 12.20 but not
for Equation 12.21, then we can conclude that
A Granger-causes Y
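The two-directional test can be sketched with NumPy by comparing a restricted regression (own lags only) with an unrestricted one (own lags plus lags of the other variable). The function name, lag length, and simulated data are assumptions for illustration; in the simulation A leads Y by one period and Y has no effect on A. The resulting F-statistics still have to be compared with critical values from an F table with p and N - 2p - 1 degrees of freedom.

```python
import numpy as np

def granger_f_stat(y, a, p):
    """F-statistic for H0: the p lags of `a` add nothing to a regression of y on its own p lags."""
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    n = len(y) - p                                   # usable observations
    target = y[p:]
    lags_y = np.column_stack([y[p - j:len(y) - j] for j in range(1, p + 1)])
    lags_a = np.column_stack([a[p - j:len(a) - j] for j in range(1, p + 1)])
    const = np.ones((n, 1))

    def ssr(X):
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ coef
        return resid @ resid

    ssr_restricted = ssr(np.hstack([const, lags_y]))
    ssr_unrestricted = ssr(np.hstack([const, lags_y, lags_a]))
    df_denominator = n - (2 * p + 1)
    return ((ssr_restricted - ssr_unrestricted) / p) / (ssr_unrestricted / df_denominator)

# Simulated data in which A leads Y (assumed values, for illustration)
rng = np.random.default_rng(0)
n = 300
a = rng.normal(size=n)
noise = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * a[t - 1] + noise[t]

f_a_to_y = granger_f_stat(y, a, p=1)   # Equation 12.20: do lagged As help explain Y?
f_y_to_a = granger_f_stat(a, y, p=1)   # Equation 12.21: do lagged Ys help explain A?
print("A -> Y:", f_a_to_y)
print("Y -> A:", f_y_to_a)
```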


STATIONARITY


Spurious Correlation and Nonstationarity

Independent variables can appear to be more significant than they
actually are if they have the same underlying trend as the
dependent variable
Example: In a country with rampant inflation almost any nominal
variable will appear to be highly correlated with all other
nominal variables
Why?
Nominal variables are unadjusted for inflation, so every nominal
variable will have a powerful inflationary component
Such a problem is an example of spurious correlation:
a strong relationship between two or more variables that is not caused by
a real underlying causal relationship
If you run a regression in which the dependent variable and one or more
independent variables are spuriously correlated, the result is a
spurious regression, and the t-scores and overall fit of such spurious
regressions are likely to be overstated and untrustworthy


What is a main cause of spurious correlation?
NONSTATIONARY TIME SERIES
Let's see what that means and how we can
correct for it


Spurious Correlation and Nonstationarity

Independent variables can appear to be more significant than they
actually are if they have the same underlying trend as the
dependent variable
Such a problem is an example of spurious correlation:
a strong relationship between two or more variables that is not caused by
a real underlying causal relationship
If you run a regression in which the dependent variable and one or more
independent variables are spuriously correlated, the result is a
spurious regression:
coefficients are biased: upward or downward?
the t-scores and overall fit of such spurious regressions
are likely to be overstated and untrustworthy


Stationary and Nonstationary Time Series


Stationary and Nonstationary Time Series


A time-series variable, $X_t$, is stationary if:
1. the mean of $X_t$ is constant over time,
2. the variance of $X_t$ is constant over time, and
3. the simple correlation coefficient between $X_t$
and $X_{t-k}$ depends on the length of the lag (k) but on no
other variable (for all k)
If one or more of these properties is not met, then $X_t$
is nonstationary
If a series is nonstationary, that problem is often
referred to as nonstationarity

Stationary and Nonstationary Time Series


A time-series variable, $X_t$, is stationary if:
1. the mean of $X_t$ is constant over time,
2. the variance of $X_t$ is constant over time, and
3. the simple correlation coefficient between $X_t$
and $X_{t-k}$ depends on the length of the lag (k) but on no
other variable (for all k)
What is real per capita output?
What is the growth rate for real per capita output?

Stationary and Nonstationary Time Series

To get a better understanding of these issues, consider the case
where $Y_t$ is generated by an equation that includes only past values
of itself (an autoregressive equation):
$Y_t = \gamma Y_{t-1} + v_t$ (12.22)
$GDP_t = \gamma GDP_{t-1} + v_t$
where $v_t$ is a classical error term
Can we see that if $|\gamma| < 1$, then the expected value of $Y_t$ will
eventually approach 0 (and therefore be stationary) as the sample
size gets bigger and bigger? (Remember, since $v_t$ is a classical error
term, its expected value = 0)
Similarly, can we see that if $|\gamma| > 1$, then the expected value of
$Y_t$ will continuously increase, making $Y_t$ nonstationary?
This is nonstationarity due to a trend, but it still can cause
spurious regression results

Stationary and Nonstationary Time Series


Most importantly, what about if $|\gamma| = 1$? In this case:
$Y_t = Y_{t-1} + v_t$ (12.23)
$GDP_t = GDP_{t-1} + v_t$
This is a random walk: the expected value of $Y_t$ does
not converge on any value, meaning that it is
nonstationary
This circumstance, where $\gamma = 1$ in Equation 12.23 (or
similar equations), is called a unit root
If a variable has a unit root, then Equation 12.23
holds, and the variable follows a random walk and is
nonstationary
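The contrast between $|\gamma| < 1$ and $\gamma = 1$ can be made concrete by substituting Equation 12.22 into itself: $Y_t = \gamma^t Y_0 + \sum_{j=0}^{t-1} \gamma^j v_{t-j}$. The sketch below computes the implied mean and variance of $Y_t$ given a starting value; the function name, the starting value of 10, and the error variance of 1 are assumptions for illustration.

```python
def ar1_mean_and_variance(gamma, y0, t, sigma2=1.0):
    """Mean and variance of Y_t given Y_0, for Y_t = gamma * Y_{t-1} + v_t, var(v_t) = sigma2."""
    mean = (gamma ** t) * y0
    # Each past shock v_{t-j} enters with weight gamma**j, and the shocks are uncorrelated
    variance = sigma2 * sum(gamma ** (2 * j) for j in range(t))
    return mean, variance

for gamma in (0.5, 1.0):
    for t in (10, 100):
        mean, variance = ar1_mean_and_variance(gamma, y0=10.0, t=t)
        print(f"gamma = {gamma}, t = {t}: mean = {mean:.4f}, variance = {variance:.4f}")
```

With $\gamma = 0.5$ the influence of the starting value dies out and the variance settles at a constant. With $\gamma = 1$ the mean stays at the starting value and the variance grows in proportion to t, so the variance is not constant over time and the series is nonstationary.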

The DickeyFuller Test


From the previous discussion of stationarity and unit
roots, it makes sense to estimate Equation 12.22:
$Y_t = \gamma Y_{t-1} + v_t$ (12.22)
$GDP_t = \gamma GDP_{t-1} + v_t$
and then determine if $|\gamma| < 1$ to see if Y is stationary
This is almost exactly how the Dickey-Fuller test
works:
1. Subtract $Y_{t-1}$ from both sides of Equation 12.22,
yielding:
$(Y_t - Y_{t-1}) = (\gamma - 1) Y_{t-1} + v_t$ (12.26)

The DickeyFuller Test


$(Y_t - Y_{t-1}) = (\gamma - 1) Y_{t-1} + v_t$
$GDP_t - GDP_{t-1} = (\gamma - 1) GDP_{t-1} + v_t$
If we define $\Delta Y_t = Y_t - Y_{t-1}$ then we have the simplest
form of the Dickey-Fuller test:
$\Delta Y_t = \beta_1 Y_{t-1} + v_t$ (12.27)
where $\beta_1 = \gamma - 1$
Note: alternative Dickey-Fuller tests additionally
include a constant and/or a constant and a trend term
2. Set up the test hypotheses:
H0: $\beta_1 = 0$ (unit root)
HA: $\beta_1 < 0$ (stationary)

The Dickey-Fuller Test (cont.)

3. Set up the decision rule:
If $\hat{\beta}_1$ is statistically significantly less than 0, then we can reject the null hypothesis of nonstationarity.
If $\hat{\beta}_1$ is not statistically significantly less than 0, then we cannot reject the null hypothesis of nonstationarity.
Note that the standard t-table does not apply to Dickey-Fuller tests.
For the case of no constant and no trend (Equation 12.27), the large-sample critical values $t_c$ are listed on the next slide.
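
The three steps can be sketched in a few lines of plain Python for the no-constant, no-trend case of Equation 12.27. This is a minimal illustration, not a full implementation: the function names are made up for the example, the standard error is the homoskedasticity-only one, and the default threshold of -1.95 is an assumed large-sample 5% critical value that should be checked against the Dickey-Fuller table you are using.

```python
import math
import random

def dickey_fuller_t(y):
    """t-statistic on beta_1 in dY_t = beta_1 * Y_{t-1} + v_t (no constant, no trend)."""
    lagged = y[:-1]
    diffs = [y[t] - y[t - 1] for t in range(1, len(y))]
    sxx = sum(x * x for x in lagged)
    beta1 = sum(x * d for x, d in zip(lagged, diffs)) / sxx
    ssr = sum((d - beta1 * x) ** 2 for x, d in zip(lagged, diffs))
    sigma2 = ssr / (len(diffs) - 1)          # one estimated coefficient
    return beta1 / math.sqrt(sigma2 / sxx)

def rejects_unit_root(y, critical_value=-1.95):
    """True if the unit-root null is rejected (critical_value is an assumption)."""
    return dickey_fuller_t(y) < critical_value

# Illustrative stationary series: Y_t = 0.5 * Y_{t-1} + v_t.
rng = random.Random(0)
stationary = [0.0]
for _ in range(500):
    stationary.append(0.5 * stationary[-1] + rng.gauss(0.0, 1.0))

print(round(dickey_fuller_t(stationary), 2), rejects_unit_root(stationary))
```

Note that the comparison is one-sided: only a sufficiently negative t-statistic counts as evidence against a unit root.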

Table 12.1 Large-Sample Critical Values for the Dickey-Fuller Test


Augmented Dickey-Fuller tests: what and when


When should you include a time trend in the DF test?
The decision to use the intercept-only DF test or the intercept & trend DF test depends on what the alternative is and what the data look like.
In the intercept-only specification, the alternative is that Y is stationary around a constant: no long-term growth in the series.
$\Delta Y_t = \beta_0 + \beta_1 Y_{t-1} + v_t$
In the intercept & trend specification, the alternative is that Y is stationary around a linear time trend: the series has long-term growth.
$\Delta Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 t + v_t$

$\Delta \ln(GDP_t) = 0.244 + 0.0002\,t - 0.030 \ln(GDP_{t-1}) + 0.269\,\Delta \ln(GDP_{t-1}) + 0.178\,\Delta \ln(GDP_{t-2})$

Standard errors, in the same order: (0.109), (0.0001), (0.014), (0.069), (0.070)

DF t-statistic = -2.18
Note that the standard t-table does not apply to Dickey-Fuller tests.
Don't compare this to -1.96; use the Dickey-Fuller table!

DF t-statistic = -2.18 (intercept and time trend):

t = -2.18 does not reject a unit root at the 10% level.
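
The comparison behind this conclusion can be written out explicitly. The sketch below is hedged: the critical values in `DF_CRITICAL_TREND` are the commonly tabulated large-sample values for the intercept-and-trend case and are an assumption here (the table itself did not survive in these notes), and `df_decision` is a hypothetical helper name.

```python
# Assumed large-sample Dickey-Fuller critical values (intercept and time trend).
DF_CRITICAL_TREND = {"10%": -3.12, "5%": -3.41, "1%": -3.96}

def df_decision(t_stat, critical_values):
    """Map each significance level to True if the unit-root null is rejected."""
    return {level: t_stat < cv for level, cv in critical_values.items()}

# Ratio of the rounded coefficient and standard error on ln(GDP_{t-1});
# it differs slightly from the reported -2.18 because of rounding.
t_from_rounded = -0.030 / 0.014

decision = df_decision(-2.18, DF_CRITICAL_TREND)
print(round(t_from_rounded, 2), decision)
```

Because -2.18 is less negative than even the 10% value, the null of a unit root is not rejected at any of the three levels, even though -2.18 would look significant against the usual -1.96.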


More in-class practice and examples

Let's check if there is nonstationarity:

1. Which coefficient do you have to test for significance?
2. Which of the three Dickey-Fuller tables would you use?
3. Do we have nonstationarity in our study or not?

Typical examples of spurious correlation


What was that again?
A strong relationship between two or more variables that is not caused by a real underlying causal relationship
What was its main cause?
Nonstationarity
Some more examples:
http://www.tylervigen.com/spurious-correlations


NONSTATIONARITY AND COINTEGRATION

Cointegration
If the Dickey-Fuller test reveals nonstationarity, what should we do?
The traditional approach has been to take first differences ($\Delta Y_t = Y_t - Y_{t-1}$ and $\Delta X_t = X_t - X_{t-1}$) and use them in place of $Y_t$ and $X_t$ in the regressions.
Issue: first-differencing basically throws away information about the possible equilibrium relationships between the variables.
Alternatively, one might want to test whether the time series are cointegrated, which means that even though individual variables might be nonstationary, it's possible for linear combinations of nonstationary variables to be stationary.

Cointegration (cont.)
To see how this works, consider Equation 12.24:
$Y_t = \alpha_0 + \beta_0 X_t + u_t$   (12.24)
Assume that both $Y_t$ and $X_t$ have a unit root.
Solving Equation 12.24 for $u_t$, we get:
$u_t = Y_t - \alpha_0 - \beta_0 X_t$   (12.30)
In Equation 12.30, $u_t$ is a function of two nonstationary variables, so $u_t$ might be expected also to be nonstationary.
Cointegration refers to the case where this does not happen: $Y_t$ and $X_t$ are both nonstationary, yet a linear combination of them, as given by Equation 12.24, is stationary.
How does this happen?
This could happen if economic theory supports Equation 12.24 as an equilibrium.

Cointegration (cont.)
We thus see that if $X_t$ and $Y_t$ are cointegrated, then OLS estimation of the coefficients in Equation 12.24 can avoid spurious results.
To determine if $X_t$ and $Y_t$ are cointegrated, we begin with OLS estimation of Equation 12.24 and calculate the OLS residuals:
$e_t = Y_t - \hat{\alpha}_0 - \hat{\beta}_0 X_t$   (12.31)
Next, perform a Dickey-Fuller test on the residuals.
Remember to use the critical values from the Dickey-Fuller table!
If we are able to reject the null hypothesis of a unit root in the residuals, we can conclude that $X_t$ and $Y_t$ are cointegrated and our OLS estimates are not spurious.
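
The two-step procedure (OLS in levels, then a Dickey-Fuller regression on the residuals) can be sketched as follows. Everything here is illustrative: the helper names, the simulated data, and the coefficients 2.0 and 0.5 are assumptions made for the example. The sketch returns only the test statistic and leaves the critical value to the reader, since many references tabulate separate, more negative critical values for tests run on estimated residuals.

```python
import math
import random

def ols_simple(x, y):
    """Intercept and slope from an OLS regression of y on a constant and x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - slope * mean_x, slope

def df_t_no_constant(series):
    """t-statistic on beta_1 in d(e_t) = beta_1 * e_{t-1} + v_t."""
    lagged = series[:-1]
    diffs = [series[t] - series[t - 1] for t in range(1, len(series))]
    sxx = sum(v * v for v in lagged)
    beta1 = sum(v * d for v, d in zip(lagged, diffs)) / sxx
    ssr = sum((d - beta1 * v) ** 2 for v, d in zip(lagged, diffs))
    return beta1 / math.sqrt(ssr / (len(diffs) - 1) / sxx)

def cointegration_stat(x, y):
    """Step 1: OLS of y on x. Step 2: Dickey-Fuller t-statistic on the residuals."""
    a0, b0 = ols_simple(x, y)
    residuals = [yi - a0 - b0 * xi for xi, yi in zip(x, y)]
    return df_t_no_constant(residuals)

# Simulated example: X_t is a random walk and Y_t is tied to it, so they are cointegrated.
rng = random.Random(42)
x = [0.0]
for _ in range(299):
    x.append(x[-1] + rng.gauss(0.0, 1.0))
y = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

print(round(cointegration_stat(x, y), 2))
```

A strongly negative statistic points toward stationary residuals, that is, toward cointegration.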

A Standard Sequence of Steps for Dealing with Nonstationary Time Series
1. Specify the model (lags vs. no lags, etc.)
2. Test all variables for nonstationarity (technically, unit roots) using the appropriate version of the Dickey-Fuller test
3. If the variables don't have unit roots, estimate the equation in its original units (Y and X)
4. If the variables have unit roots, test the residuals of the equation for cointegration using the Dickey-Fuller test
5. If the variables have unit roots but are not cointegrated, then change the functional form of the model to first differences ($\Delta X$ and $\Delta Y$) and estimate the equation
6. If the variables have unit roots and also are cointegrated, then estimate the equation in its original units
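
Steps 3 through 6 boil down to a small decision rule, which can be written as a function. The function name and the string labels are hypothetical; the test results are assumed to be supplied by the analyst as booleans.

```python
def choose_specification(has_unit_roots, cointegrated=None):
    """Return how to estimate the equation, given the unit-root and cointegration results."""
    if not has_unit_roots:
        return "levels"                      # step 3: original units
    if cointegrated is None:
        # step 4 has to be carried out before a choice can be made
        raise ValueError("unit roots found: test the residuals for cointegration first")
    # step 6 (cointegrated) versus step 5 (not cointegrated)
    return "levels" if cointegrated else "first differences"

print(choose_specification(True, cointegrated=False))
```

The only path that leads to first differences is the one where unit roots are present and cointegration is not found.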

More in-class practice and examples

Assume we are estimating the following model:
$GDP_t = \alpha_0 + \beta_0 MS_t + \epsilon_t$
1. We first check if each variable is nonstationary: how would you do that?
2. Assume we find out both are. Please write out step by step how you would check for cointegration.
3. If you find no evidence of cointegration, how can you still estimate your model correctly?

AUTOREGRESSION


4. Autoregressions
(SW Section 14.3)
A natural starting point for a forecasting model is to use past values of Y (that is, $Y_{t-1}, Y_{t-2}, \ldots$) to forecast $Y_t$.
An autoregression is a regression model in which $Y_t$ is regressed against its own lagged values.
The number of lags used as regressors is called the order of the autoregression.
In a first order autoregression, $Y_t$ is regressed against $Y_{t-1}$.
In a pth order autoregression, $Y_t$ is regressed against $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-p}$.

The First Order Autoregressive (AR(1)) Model


The population AR(1) model is
$Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t$
$\beta_0$ and $\beta_1$ do not have causal interpretations
If $\beta_1 = 0$, $Y_{t-1}$ is not useful for forecasting $Y_t$
The AR(1) model can be estimated by an OLS regression of $Y_t$ against $Y_{t-1}$ (mechanically, how would you run this regression?)
Testing $\beta_1 = 0$ v. $\beta_1 \neq 0$ provides a test of the hypothesis that $Y_{t-1}$ is not useful for forecasting $Y_t$
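
Mechanically, running this regression means pairing each observation with the one before it and regressing the first on the second. The sketch below shows one way to do that in plain Python. The function name `fit_ar1`, the simulated series, and the true coefficients 1.0 and 0.5 are assumptions for illustration, and the standard error is the simple homoskedasticity-only version, which may differ from the one used in the course.

```python
import math
import random

def fit_ar1(y):
    """OLS of Y_t on a constant and Y_{t-1}; returns (b0, b1, se_b1)."""
    x, target = y[:-1], y[1:]                # regressor is the series lagged one period
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(target) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, target)) / sxx
    b0 = mean_y - b1 * mean_x
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, target))
    se_b1 = math.sqrt(ssr / (n - 2) / sxx)   # homoskedasticity-only standard error
    return b0, b1, se_b1

# Simulated AR(1) data with assumed true values beta_0 = 1.0 and beta_1 = 0.5.
rng = random.Random(1)
series = [2.0]
for _ in range(2000):
    series.append(1.0 + 0.5 * series[-1] + rng.gauss(0.0, 1.0))

b0, b1, se_b1 = fit_ar1(series)
print(f"b0={b0:.2f}, b1={b1:.2f}, t={b1 / se_b1:.1f}")
```

One observation is lost to the lag, so a series of length T yields T - 1 usable pairs.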

Example: AR(1) model for the growth rate of GDP
Estimated using data from 1962:Q1 to 2012:Q4:

$\widehat{GDPGR}_t = 1.991 + 0.344\,GDPGR_{t-1}$, $R^2 = 0.11$
(standard errors: 0.349 and 0.075)

Is the lagged growth rate of GDP a useful predictor of the current growth rate of GDP?
1. t = 0.344/0.075 = 4.59 > 1.96 (in absolute value)
2. Reject $H_0$: $\beta_1 = 0$ at the 5% significance level
3. Yes, the lagged growth rate of GDP is a useful predictor of the current growth rate, but the $R^2$ is pretty low.
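
The arithmetic in step 1 is just the coefficient divided by its standard error. A tiny sketch, using a hypothetical helper name:

```python
def t_statistic(coefficient, standard_error, null_value=0.0):
    """t-statistic for testing coefficient = null_value."""
    return (coefficient - null_value) / standard_error

t = t_statistic(0.344, 0.075)
print(round(t, 2), abs(t) > 1.96)
```

The 1.96 threshold is the usual two-sided 5% critical value from the standard normal distribution, which is appropriate here because the test concerns forecasting usefulness, not a unit root.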

The AR(p) model: using multiple lags for forecasting
The pth order autoregressive model (AR(p)) is
$Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \cdots + \beta_p Y_{t-p} + u_t$
The AR(p) model uses p lags of Y as regressors
The AR(1) model is a special case
The coefficients do not have a causal interpretation
To test the hypothesis that $Y_{t-2}, \ldots, Y_{t-p}$ do not further help forecast $Y_t$, beyond $Y_{t-1}$, use an F-test
Use t- or F-tests to determine the lag order p
Or, better, determine p using an information criterion (more on this later)
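
Before an AR(p) model can be estimated, the series has to be arranged so that each row holds a constant and the p lagged values that go with one observation of $Y_t$. A minimal sketch of that bookkeeping, with a hypothetical function name and made-up data:

```python
def lagged_design(y, p):
    """Rows [1, Y_{t-1}, ..., Y_{t-p}] and targets Y_t for t = p, ..., len(y) - 1."""
    rows = [[1.0] + [y[t - k] for k in range(1, p + 1)] for t in range(p, len(y))]
    targets = [y[t] for t in range(p, len(y))]
    return rows, targets

rows, targets = lagged_design([10, 11, 13, 16, 20], p=2)
for row, target in zip(rows, targets):
    print(row, "->", target)
```

The first p observations are used up as lags, so the regression has len(y) - p rows. Any OLS routine can then be applied to `rows` and `targets`.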


Lag Length Selection Using Information Criteria
How to choose the number of lags p in an AR(p)?



Example: AR(2) model for the growth rate of GDP

$\widehat{GDPGR}_t = 1.63 + 0.28\,GDPGR_{t-1} + 0.17\,GDPGR_{t-2}$, $R^2 = 0.14$
(standard errors, in the same order: 0.40, 0.08, 0.08)

The t-statistic testing lag 2 is 2.27 (p-value = .02)
$R^2$ increased from .11 to .14 by adding lag 2
So, lag 2 helps to predict the growth of GDP.


Lag Length Selection Using Information Criteria
(SW Section 14.5)
How to choose the number of lags p in an AR(p)?
You can use sequential downward t- or F-tests, but the models chosen tend to be too large
A better way to determine lag lengths is to use an information criterion
Information criteria trade off bias (too few lags) vs. variance (too many lags)
Two information criteria are the Bayes (BIC) and Akaike (AIC)


The Bayes Information Criterion (BIC)


$BIC(p) = \ln\left(\frac{SSR(p)}{T}\right) + (p + 1)\frac{\ln T}{T}$

First term: always decreasing in p (larger p, better fit)
Second term: always increasing in p.
The variance of the forecast due to estimation error increases with p, so you don't want a forecasting model with too many coefficients. But what is too many?
This term is a penalty for using more parameters and thus increasing the forecast variance.
Minimizing BIC(p) trades off bias and variance to determine a best value of p for your forecast.
The result is that $\hat{p}^{BIC} \xrightarrow{p} p$! (SW, App. 14.5)
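
The BIC formula translates directly into code. In the sketch below, the function name and the sums of squared residuals in `ssr_by_lag` are hypothetical numbers invented to show the mechanics. In practice each SSR(p) would come from estimating the AR(p) model on the same sample of T observations.

```python
import math

def bic(ssr, T, p):
    """BIC(p) = ln(SSR(p) / T) + (p + 1) * ln(T) / T."""
    return math.log(ssr / T) + (p + 1) * math.log(T) / T

# Hypothetical SSR values for AR(0), AR(1), AR(2), all with T = 100 observations.
T = 100
ssr_by_lag = {0: 120.0, 1: 100.0, 2: 99.5}

best_p = min(ssr_by_lag, key=lambda p: bic(ssr_by_lag[p], T, p))
for p, ssr in ssr_by_lag.items():
    print(p, round(bic(ssr, T, p), 4))
print("BIC chooses p =", best_p)
```

Here the second lag lowers the SSR only slightly, so the penalty term outweighs the improvement in fit and the smaller model is chosen.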

Another information criterion: Akaike Information Criterion (AIC)

$AIC(p) = \ln\left(\frac{SSR(p)}{T}\right) + (p + 1)\frac{2}{T}$

$BIC(p) = \ln\left(\frac{SSR(p)}{T}\right) + (p + 1)\frac{\ln T}{T}$

The penalty term is smaller for AIC than BIC (2 < ln T)
AIC tends to select more lags (larger p) than the BIC
This might be desirable if you think longer lags might be important.
However, the AIC estimator of p isn't consistent: it can overestimate p because the penalty isn't big enough
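
The difference between the two criteria lies entirely in the penalty term, which the following sketch makes explicit. The function names are hypothetical.

```python
import math

def aic(ssr, T, p):
    """AIC(p) = ln(SSR(p) / T) + (p + 1) * 2 / T."""
    return math.log(ssr / T) + (p + 1) * 2 / T

def penalty_gap(T, p):
    """BIC penalty minus AIC penalty for a model with p lags and T observations."""
    return (p + 1) * (math.log(T) - 2) / T

print(round(aic(100.0, 100, 1), 4))
print(round(penalty_gap(100, 1), 4))   # positive: BIC penalizes more
print(round(penalty_gap(5, 1), 4))     # negative: ln(5) is below 2
```

The inequality 2 < ln T holds once T exceeds roughly 7.4 (that is, $e^2$), so for any realistic sample size the BIC penalty is the larger of the two.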

Example: AR model of GDP Growth, lags 0-6:

BIC chooses 2 lags, AIC chooses 2 lags.


Example: AR model of inflation, lags 0-6:

# Lags   BIC     AIC     R2
0        1.095   1.076   0.000
1        1.067   1.030   0.056
2        0.955   0.900   0.181
3        0.957   0.884   0.203
4        0.986   0.895   0.204
5        1.016   0.906   0.204
6        1.046   0.918   0.204

BIC chooses 2 lags, AIC chooses 3 lags.
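
Choosing the lag length from such a table is simply a matter of finding the row with the smallest criterion value. A short sketch using the BIC and AIC columns for the inflation example (the variable names are made up):

```python
# lags: (BIC, AIC), copied from the inflation table
criteria = {
    0: (1.095, 1.076),
    1: (1.067, 1.030),
    2: (0.955, 0.900),
    3: (0.957, 0.884),
    4: (0.986, 0.895),
    5: (1.016, 0.906),
    6: (1.046, 0.918),
}

best_bic = min(criteria, key=lambda p: criteria[p][0])
best_aic = min(criteria, key=lambda p: criteria[p][1])
print("BIC chooses", best_bic, "lags; AIC chooses", best_aic, "lags")
```

Note how close the BIC values for 2 and 3 lags are (0.955 vs. 0.957): the heavier BIC penalty is just enough to tip the choice toward the smaller model.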


Time Series Regression with Additional Predictors and the Autoregressive Distributed Lag (ADL) Model
Can you use lags of variables other than the dependent variable in your regression?
If so, how do you decide how many lags to use for those independent variables?

Time Series Regression with Additional Predictors and the Autoregressive Distributed Lag (ADL) Model
(SW Section 14.4)
So far we have considered models that use only past values of Y
It makes sense to add other variables (X) that might be useful predictors of Y, above and beyond the predictive value of lagged values of Y:
$Y_t = \beta_0 + \beta_1 Y_{t-1} + \cdots + \beta_p Y_{t-p} + \delta_1 X_{t-1} + \cdots + \delta_r X_{t-r} + u_t$
This is an autoregressive distributed lag model with p lags of Y and r lags of X: ADL(p,r).
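
Once the coefficients of an ADL(p,r) model are available, a one-step-ahead forecast is just the equation evaluated at the most recent values of Y and X. The sketch below uses a hypothetical function name and invented coefficients and data, chosen so the arithmetic is easy to follow by hand.

```python
def adl_forecast(beta0, betas, deltas, y_history, x_history):
    """One-step ADL forecast; histories are ordered from oldest to newest."""
    # betas[k-1] multiplies Y_{t-k}, deltas[k-1] multiplies X_{t-k}
    y_part = sum(b * y_history[-k] for k, b in enumerate(betas, start=1))
    x_part = sum(d * x_history[-k] for k, d in enumerate(deltas, start=1))
    return beta0 + y_part + x_part

# Hypothetical ADL(2,1): intercept 1.0, Y lags (0.5, 0.25), X lag (2.0).
forecast = adl_forecast(1.0, [0.5, 0.25], [2.0],
                        y_history=[4.0, 8.0], x_history=[3.0])
print(forecast)  # 1.0 + 0.5*8.0 + 0.25*4.0 + 2.0*3.0
```

Setting `deltas` to an empty list reduces the function to an AR(p) forecast, which mirrors the way the AR model is a special case of the ADL model.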


Example: interest rates and the term spread


ADL(2,2) Model (1962-2012):

$\widehat{GDPGR}_t = 0.97 + 0.24\,GDPGR_{t-1} + 0.18\,GDPGR_{t-2} - 0.14\,TSpread_{t-1} + 0.66\,TSpread_{t-2}$, $R^2 = 0.17$
(standard errors, in the same order: 0.48, 0.08, 0.08, 0.42, 0.43)

F-statistic for coefficients on lags of TSpread:
F = 4.43 (p-value = 0.01)

Generalization of BIC to multivariate (ADL) models
Let K = the total number of coefficients in the model (intercept, lags of Y, lags of X). The BIC is

$BIC(K) = \ln\left(\frac{SSR(K)}{T}\right) + K\frac{\ln T}{T}$

Can compute this over all possible combinations of lags of Y and lags of X (but this is a lot)!
Shortcut? Yes:
Require the same number of lags for each variable used (Y, X1, X2)
Alternatively, you might choose lags of Y by BIC, and decide whether or not to include X using a Granger causality test with a fixed number of lags (number depends on the data and application)
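
To see why searching over all lag combinations "is a lot", and how much the equal-lags shortcut helps, the candidates can simply be enumerated. The helper names, the maximum lag of 6, and the three-variable setup (Y, X1, X2) are assumptions for illustration.

```python
import math
from itertools import product

def bic_k(ssr, T, K):
    """BIC(K) = ln(SSR(K) / T) + K * ln(T) / T, with K = total number of coefficients."""
    return math.log(ssr / T) + K * math.log(T) / T

def candidate_lag_choices(max_lag, n_variables, same_lags=False):
    """Lag-length combinations for Y and the X's, each between 0 and max_lag."""
    if same_lags:
        return [(p,) * n_variables for p in range(max_lag + 1)]
    return list(product(range(max_lag + 1), repeat=n_variables))

full_search = candidate_lag_choices(6, 3)
shortcut = candidate_lag_choices(6, 3, same_lags=True)
print(len(full_search), "models without the shortcut,", len(shortcut), "with it")

# For any candidate, K is the intercept plus all lags, e.g. lags (2, 2, 1):
K = 1 + sum((2, 2, 1))
print(K)
```

Each candidate still requires its own regression to obtain SSR(K), and all candidates should be estimated on the same sample so that the BIC values are comparable.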