
Basic Time Series Analysis

1. The problem of spurious regression.

Statisticians working with time series data uncovered a serious problem with applying
standard econometric techniques to time series. OLS regressions between series that
contain a trend but are otherwise random (and unrelated to each other) often produce
statistically significant coefficients. This finding led to considerable work on how to
determine what properties a time series must possess before econometric techniques can
be used. The basic conclusion was that any time series used in econometric
applications must be stationary.
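To see the spurious regression problem concretely, here is a minimal Python sketch (an illustration, not STATA code). It regresses one simulated series on a second series generated independently of it and counts how often the conventional t-test calls the slope significant at the 5% level. The sample size of 176 mirrors the approval data; the seed, the number of trials and the N(0, 1) shocks are arbitrary assumptions.

```python
import math
import random


def random_walk(n, rng):
    """z_t = z_{t-1} + e_t with e_t ~ N(0, 1) and z_0 = 0."""
    z, path = 0.0, []
    for _ in range(n):
        z += rng.gauss(0.0, 1.0)
        path.append(z)
    return path


def white_noise(n, rng):
    """Independent N(0, 1) draws: stationary, with no trend."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def slope_t_stat(y, x):
    """Conventional t-statistic on b in the OLS regression y = a + b*x + e."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b / math.sqrt(rss / (n - 2) / sxx)


def rejection_rate(generator, trials=500, n=176, seed=1):
    """Share of regressions of one series on an independent series with |t| > 1.96."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y = generator(n, rng)
        x = generator(n, rng)  # drawn independently of y: the true slope is zero
        if abs(slope_t_stat(y, x)) > 1.96:
            hits += 1
    return hits / trials


print("white noise :", rejection_rate(white_noise))  # near the nominal 0.05
print("random walks:", rejection_rate(random_walk))  # typically far above 0.05
```

With white noise the rejection rate should stay near 5%; with two unrelated random walks it is typically far higher, which is exactly the spurious regression problem.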

2. Determining the properties of a series.

a. Graphing the series

It is good practice to produce a plot of the time series you are investigating. Does the
series appear to have a stable mean or a trend? Does the variance of the series appear
to be constant over time?

Example: presidential approval.

scatter approval qtr, c(l)

[Figure: Presidential Approval (vertical axis) by Quarter, 1947q3 to 1997q3]

b. Describing the series

(i) Stationary versus Nonstationary time series

Are the characteristics of a time series (its mean and variance) constant over time?
If the mean and variance are constant over time (and the covariance between two
observations depends only on how far apart they are), then the series is stationary. If
the mean or variance changes over time, then the series is nonstationary. How can you
know whether a series is stationary?
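A changing variance is easy to see in a small simulation, sketched here in Python under assumed N(0, 1) shocks and a starting value of zero. For white noise the variance of $z_t$ is the same at every date; for a random walk, $z_t = \varepsilon_1 + \dots + \varepsilon_t$, so its variance grows in proportion to t. The function name and replication count are hypothetical choices for illustration.

```python
import random
import statistics


def variance_at(kind, t, reps=2000, seed=7):
    """Variance of z_t across `reps` independently simulated series, at period t."""
    rng = random.Random(seed + t)  # separate random stream for each t
    values = []
    for _ in range(reps):
        if kind == "white_noise":
            z = rng.gauss(0.0, 1.0)  # z_t = e_t, whatever t is
        else:
            z = sum(rng.gauss(0.0, 1.0) for _ in range(t))  # z_t = e_1 + ... + e_t
        values.append(z)
    return statistics.variance(values)


for kind in ("white_noise", "random_walk"):
    print(kind, round(variance_at(kind, 10), 2), round(variance_at(kind, 100), 2))
```

Expect values near 1 at both dates for white noise, and near 10 and 100 for the random walk: the random walk's variance is not constant over time, so it is nonstationary.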

(ii) The Dickey Fuller test

A time series $z_t$ is described as a random walk if $z_t = z_{t-1} + \varepsilon_t$.

If $\varepsilon_t$ has a mean of zero and a variance of $\sigma^2$, then the best guess of
$z_{t+1}$ is $z_t$, and the standard deviation of the forecast error for $z_{t+1}$ is $\sigma$.

The equation may also include a constant, $\mu$, so that $z_t = z_{t-1} + \mu + \varepsilon_t$;
then the best guess for $z_{t+1}$ is $z_t + \mu$. This is designated a random walk with drift.

Rather than depending upon $z_{t-1}$, $z_t$ may simply be a function of a deterministic
trend: $z_t = \beta T + \varepsilon_t$, where $T$ is time. This is designated a
trend-stationary process.
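All three processes are special cases of one recursion, $z_t = \lambda z_{t-1} + \mu + \beta T + \varepsilon_t$: a random walk sets $\lambda = 1$, $\mu = 0$, $\beta = 0$; a random walk with drift sets $\lambda = 1$ and $\mu \neq 0$; the trend-stationary process sets $\lambda = 0$ and $\beta \neq 0$. The Python sketch below generates each one. The function name, the starting value $z_0 = 0$, the normal shocks and the parameter values are assumptions chosen for illustration.

```python
import random


def simulate(n, lam, mu, beta, sigma=1.0, seed=0):
    """Generate z_1..z_n from z_t = lam*z_{t-1} + mu + beta*T + e_t, with z_0 = 0.

    lam=1, mu=0, beta=0    -> random walk
    lam=1, mu!=0, beta=0   -> random walk with drift
    lam=0, beta!=0         -> trend-stationary process
    """
    rng = random.Random(seed)
    z_prev, path = 0.0, []
    for T in range(1, n + 1):
        z = lam * z_prev + mu + beta * T + rng.gauss(0.0, sigma)
        path.append(z)
        z_prev = z
    return path


walk = simulate(200, lam=1.0, mu=0.0, beta=0.0)
drift = simulate(200, lam=1.0, mu=0.5, beta=0.0)
trend = simulate(200, lam=0.0, mu=0.0, beta=0.1)
```

Setting `sigma=0.0` removes the shocks and leaves only the deterministic part (the drift path becomes 0.5, 1.0, 1.5, ...), which is a quick way to check the recursion.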

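We can describe each of these processes with a single equation

$$z_t = \lambda z_{t-1} + \mu + \beta T + \varepsilon_t$$

or, subtracting $z_{t-1}$ from both sides,

$$z_t - z_{t-1} = \mu + (\lambda - 1) z_{t-1} + \beta T + \varepsilon_t$$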

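We can use OLS to estimate the parameters of this equation. Notice STATA's shorthand
for lags and differences: d.variable indicates the first difference and l.variable
indicates the first lag. (These time-series operators work only after the data have been
declared as a time series with tsset.)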
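STATA's d. and l. operators line each observation up with the one before it, which is why `reg d.approval l.approval t` reports 175 observations from 176 quarters: the first quarter has no lag. A Python sketch of the same bookkeeping follows; the function names and the made-up values are hypothetical, and the trend variable t is assumed to run 1, 2, ..., n.

```python
def lag(z):
    """Like STATA's l.z: the previous period's value, missing (None) in period 1."""
    return [None] + z[:-1]


def diff(z):
    """Like STATA's d.z: z_t - z_{t-1}, missing (None) in period 1."""
    return [None] + [z[t] - z[t - 1] for t in range(1, len(z))]


def df_rows(z):
    """Complete rows (d.z, l.z, t) for `reg d.z l.z t`; t runs 1..n (an assumption)."""
    t_index = list(range(1, len(z) + 1))
    return [row for row in zip(diff(z), lag(z), t_index) if None not in row]


approval_like = [50.0, 53.0, 51.0, 55.0]  # made-up values, not the course data
print(df_rows(approval_like))  # [(3.0, 50.0, 2), (-2.0, 53.0, 3), (4.0, 51.0, 4)]
```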

Example: presidential approval

reg d.approval l.approval t


Source | SS df MS Number of obs = 175
-------------+------------------------------ F( 2, 172) = 7.66
Model | 547.813897 2 273.906949 Prob > F = 0.0006
Residual | 6148.40356 172 35.7465323 R-squared = 0.0818
-------------+------------------------------ Adj R-squared = 0.0711
Total | 6696.21746 174 38.4840084 Root MSE = 5.9788

------------------------------------------------------------------------------
D.approval | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
approval |
L1 | -.1642488 .042101 -3.90 0.000 -.24735 -.0811475
t | -.0133481 .0098681 -1.35 0.178 -.0328263 .0061301
_cons | 10.27947 2.869247 3.58 0.000 4.615999 15.94294
------------------------------------------------------------------------------

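If $\lambda - 1 = 0$, then the series is not stationary (the series contains what is called
a unit root). In the example above, the estimate of $\lambda - 1$ is $-0.164$, which is less
than zero.

If $\beta \neq 0$, then the series contains a trend. In the example above, the estimate of
$\beta$ ($-0.013$, p = 0.178) is not significantly different from zero, so we treat $\beta$ as 0.

If $\beta = 0$ and $\lambda - 1$ is negative rather than zero, then the series is stationary.
By these criteria the approval series is stationary. Keep in mind that when a series has a
unit root, the usual t-statistic on the lagged level does not follow a t distribution, so the
judgment about $\lambda - 1$ should rest on Dickey-Fuller critical values rather than on the
p-value in the regression table.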

We can use the same test to make sure that all of the variables in our model are
stationary.

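To run this test automatically in STATA, use the dfuller command. Notice that I use
both the regress and trend options: regress requests that the regression table be
included, and trend indicates that the trend variable should be included. One important
difference between the standard OLS output and the dfuller output is that dfuller
compares the test statistic, Z(t), with interpolated Dickey-Fuller critical values and
reports a MacKinnon approximate p-value, rather than relying on the usual t distribution.
Here Z(t) = -3.901 is more negative than the 5% critical value (-3.440) but not the 1%
critical value (-4.015), and the approximate p-value is 0.0121. We can conclude, at the
5% level, that the coefficient on lagged approval is not zero, so the series is stationary.

. dfuller approval, regress trend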

Dickey-Fuller test for unit root Number of obs = 175

---------- Interpolated Dickey-Fuller ---------
Test 1% Critical 5% Critical 10% Critical
Statistic Value Value Value
------------------------------------------------------------------------------
Z(t) -3.901 -4.015 -3.440 -3.140
------------------------------------------------------------------------------
* MacKinnon approximate p-value for Z(t) = 0.0121

------------------------------------------------------------------------------
D.approval | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
approval |
L1 | -.1642488 .042101 -3.90 0.000 -.24735 -.0811475
_trend | -.0133481 .0098681 -1.35 0.178 -.0328263 .0061301
_cons | 10.26612 2.862827 3.59 0.000 4.615321 15.91692
------------------------------------------------------------------------------
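Why not simply use the p-value from the regression table? Under the null hypothesis of a unit root, the t-statistic on the lagged level does not follow a t distribution, so its critical values must come from a different distribution. The Python sketch below is a Monte Carlo illustration (not how dfuller computes its values): it simulates pure random walks of 176 observations, runs the same regression with a constant and a trend, and reads off the 5% quantile of the t-statistic. The helper names, seed, number of replications and N(0, 1) shocks are assumptions.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a x = b (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def df_t_stat(z):
    """OLS t-statistic on z_{t-1} in: z_t - z_{t-1} = mu + (lambda - 1) z_{t-1} + beta*t + e_t."""
    y = [z[t] - z[t - 1] for t in range(1, len(z))]
    X = [[1.0, z[t - 1], float(t)] for t in range(1, len(z))]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    coef = solve(xtx, xty)
    rss = sum((yi - sum(c * x for c, x in zip(coef, r))) ** 2 for r, yi in zip(X, y))
    s2 = rss / (len(y) - k)
    var_factor = solve(xtx, [0.0, 1.0, 0.0])[1]  # (X'X)^{-1} entry for z_{t-1}
    return coef[1] / math.sqrt(s2 * var_factor)


def simulated_critical_value(reps=1000, n=176, level=0.05, seed=3):
    """Lower `level` quantile of df_t_stat when the data really are a random walk."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        z, path = 0.0, []
        for _ in range(n):
            z += rng.gauss(0.0, 1.0)
            path.append(z)
        stats.append(df_t_stat(path))
    stats.sort()
    return stats[int(level * reps)]


print(round(simulated_critical_value(), 2))  # near -3.4, far below the -1.65 of a t test
```

The simulated 5% critical value should land near the -3.440 that dfuller reports, well below the -1.65 a one-sided t-test would use. That is why Z(t) = -3.901, despite its "0.000" p-value in the regression table, rejects a unit root at 5% but not at 1%.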

c. Transforming the series: detrending or creating a first difference?

(i) detrending: remove a linear trend. Identify the linear trend (regress z on t), create
a predicted value, and subtract the predicted value from the original series. Challenge
(and why this is often not done): is it realistic to assume that the trend is linear?

(ii) first difference. In many cases, a series can be transformed from nonstationary to
stationary by taking the first difference. Rather than using $z_t$ as the dependent
variable, the dependent variable becomes $z_t - z_{t-1}$. A series that is stationary
without any transformation is designated I(0), or integrated of order 0. A series whose
first differences are stationary (but whose levels are not) is designated I(1), or
integrated of order 1.

(iii) Carefully consider the substantive implications of these transformations.
Take the presidential approval example, and suppose there is a positive link between
presidential approval and economic conditions, measured by consumer sentiment (MICS,
the Michigan Index of Consumer Sentiment).

Model 1 (raw series). The level of presidential approval is a function of the level of
MICS. When consumer sentiment is high, approval is high.

Model 2 (detrend). Presidential approval is a function of the departure of MICS from its
trend. If MICS keeps increasing at its trend rate, the level of presidential approval
does not change. If the rate of increase of MICS falls (an observation is below trend),
then presidential approval falls.

Model 3 (first difference). Levels of presidential approval are a function of changes
in MICS. If MICS falls, then presidential approval falls. If MICS falls from very
high to high, then presidential approval falls. If there is no change in consumer
sentiment (for example, it remains low), then approval is expected to remain at its mean.

The technical prescription (difference) may or may not coincide with what you think
happens substantively.
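The two transformations discussed above, detrending and first differencing, are each a few lines of arithmetic. This Python sketch uses hypothetical helper names and a made-up five-observation series: `detrend` returns each observation's departure from a fitted linear trend (the quantity behind Model 2), and `first_difference` returns period-to-period changes (the quantity behind Model 3).

```python
def detrend(z):
    """Residuals from regressing z on a constant and t = 1..n (removes a linear trend)."""
    n = len(z)
    t = list(range(1, n + 1))
    mt, mz = sum(t) / n, sum(z) / n
    slope = (sum((ti - mt) * (zi - mz) for ti, zi in zip(t, z))
             / sum((ti - mt) ** 2 for ti in t))
    intercept = mz - slope * mt
    return [zi - (intercept + slope * ti) for ti, zi in zip(t, z)]


def first_difference(z):
    """z_t - z_{t-1}; the result is one observation shorter than z."""
    return [z[i] - z[i - 1] for i in range(1, len(z))]


series = [2.0, 4.0, 7.0, 8.0, 10.0]  # rises 2 per period, third value 1 above that line
print([round(r, 2) for r in detrend(series)])  # [-0.2, -0.2, 0.8, -0.2, -0.2]
print(first_difference(series))                # [2.0, 3.0, 1.0, 2.0]
```

The unusual third observation shows up once in the detrended series but twice in the differenced one (a larger rise into period 3 and a smaller rise out of it), one concrete way the two transformations tell different substantive stories.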

Once you are confident the variables are stationary, then you can proceed to OLS.

reg approval mics

Source | SS df MS Number of obs = 176
-------------+------------------------------ F( 1, 174) = 72.56
Model | 7220.76528 1 7220.76528 Prob > F = 0.0000
Residual | 17316.2643 174 99.5187602 R-squared = 0.2943
-------------+------------------------------ Adj R-squared = 0.2902
Total | 24537.0296 175 140.211597 Root MSE = 9.9759

------------------------------------------------------------------------------
approval | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mics | .5935344 .0696798 8.52 0.000 .456008 .7310608
_cons | 4.298118 6.091941 0.71 0.481 -7.725493 16.32173
------------------------------------------------------------------------------

3. Diagnosing higher-order autocorrelation.

(a) Autocorrelation function or ACF

What is the correlation between the values of the error term at t and t-1, t and t-2,
t and t-3, and so on? STATA lets us visually inspect the correlation between residuals at
each lag; the resulting display is called a correlogram. Here the residuals come from the
OLS regression of approval on mics above.

predict res1, residual

corrgram res1

-1 0 1 -1 0 1
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]
-------------------------------------------------------------------------------
1 0.8108 0.8121 117.69 0.0000 |------ |------
2 0.6248 -0.0827 187.97 0.0000 |---- |
3 0.4673 -0.0269 227.52 0.0000 |--- |
4 0.3575 0.0318 250.79 0.0000 |-- |
5 0.2591 -0.0359 263.09 0.0000 |-- |
6 0.1722 -0.0362 268.55 0.0000 |- |
7 0.1130 0.0242 270.92 0.0000 | |
8 0.0508 -0.0721 271.4 0.0000 | |
9 0.0415 0.1188 271.72 0.0000 | |
10 0.0085 -0.1003 271.74 0.0000 | |
11 -0.0580 -0.1248 272.38 0.0000 | |
12 -0.0994 0.0322 274.26 0.0000 | |
13 -0.1583 -0.1327 279.08 0.0000 -| -|
14 -0.1642 0.0812 284.3 0.0000 -| |
15 -0.1130 0.1616 286.78 0.0000 | |-
16 -0.0398 0.0542 287.09 0.0000 | |
17 -0.0165 -0.0719 287.14 0.0000 | |
18 -0.0174 -0.0322 287.2 0.0000 | |
19 0.0167 0.0803 287.26 0.0000 | |
20 0.0369 0.0030 287.53 0.0000 | |
21 0.0484 -0.0360 288.01 0.0000 | |
22 0.0260 -0.0777 288.14 0.0000 | |
23 -0.0131 -0.0453 288.18 0.0000 | |
24 -0.0391 -0.0574 288.49 0.0000 | |
25 -0.0129 0.1483 288.53 0.0000 | |-
26 0.0151 -0.0053 288.57 0.0000 | |
27 0.0276 0.0582 288.74 0.0000 | |
28 0.0413 0.0673 289.1 0.0000 | |
29 0.0588 0.0675 289.83 0.0000 | |
30 0.0783 0.0837 291.15 0.0000 | |
31 0.1298 0.1569 294.79 0.0000 |- |-
32 0.1707 0.0391 301.12 0.0000 |- |
33 0.1709 0.0490 307.52 0.0000 |- |
34 0.1645 0.0425 313.49 0.0000 |- |
35 0.1878 0.0924 321.33 0.0000 |- |
36 0.1687 -0.1532 327.7 0.0000 |- -|
37 0.1425 -0.0394 332.28 0.0000 |- |
38 0.1135 0.0388 335.2 0.0000 | |
39 0.0952 0.0911 337.27 0.0000 | |
40 0.0542 -0.0576 337.95 0.0000 | |
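The AC column is the ordinary sample autocorrelation of the residuals at each lag, and the PAC column is the correlation at lag k after controlling for the shorter lags. In the correlogram above, AC decays gradually while PAC is large only at lag 1 (0.8121), the pattern usually associated with first-order autoregressive errors. The Python sketch below computes both on hypothetical residuals simulated as $e_t = 0.8e_{t-1} + v_t$ (not the course data). PAC is obtained here with the Durbin-Levinson recursion; STATA may compute partial autocorrelations differently (by default from a sequence of regressions), so its figures can differ slightly.

```python
import random


def acf(x, nlags):
    """Sample autocorrelations r_1..r_nlags, using deviations from the sample mean."""
    n = len(x)
    m = sum(x) / n
    d = [xi - m for xi in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t - k] for t in range(k, n)) / c0 for k in range(1, nlags + 1)]


def pacf(r):
    """Partial autocorrelations from r_1..r_K via the Durbin-Levinson recursion."""
    phi, out = [], []  # phi[j] holds the lag-(j+1) coefficient of the current AR fit
    for k in range(1, len(r) + 1):
        num = r[k - 1] - sum(phi[j] * r[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


# Hypothetical residuals following e_t = 0.8 e_{t-1} + v_t.
rng = random.Random(11)
e, resid = 0.0, []
for _ in range(176):
    e = 0.8 * e + rng.gauss(0.0, 1.0)
    resid.append(e)

r = acf(resid, 5)
for lag, (ac, pac) in enumerate(zip(r, pacf(r)), start=1):
    print(lag, round(ac, 4), round(pac, 4))
```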
(b) Q test

If the error is purely random ($e_t = v_t$, where $v_t$ has mean zero and variance
$\sigma^2$), then the autocorrelation function should satisfy $\rho_k = 0$ for all $k > 0$.
This is described as a white noise process. If there is no serial correlation in our
model, at lag 1 or at other lags, then the error term should look like white noise. STATA
implements a formal test for this: the Ljung-Box Q. The Q test statistic is a function of
the squared autocorrelation coefficients at each lag (up to some maximum lag) and the
number of observations in the sample. STATA reports whether the test is statistically
significant. If the test is significant, then the residuals are serially correlated.

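$$Q = n(n+2)\sum_{j=1}^{k}\frac{\hat{\rho}_j^{2}}{n-j}$$

where $n$ is the number of observations, $\hat{\rho}_j$ is the sample autocorrelation at lag
$j$, and $k$ is the number of lags tested. Under the null hypothesis of white noise, Q has
an approximate chi-squared distribution with $k$ degrees of freedom.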

wntestq res1

Portmanteau test for white noise
---------------------------------------
Portmanteau (Q) statistic = 337.9503
Prob > chi2(40) = 0.0000
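The Q formula is easy to check by hand. This Python sketch implements it and reproduces the first two entries of the Q column in the correlogram from the AC column (0.8108 and 0.6248, with n = 176); any small discrepancy would come from the rounding of the printed autocorrelations. It then computes Q(40) for simulated white noise (hypothetical data) for comparison with the 337.95 reported for the approval residuals. The function names are hypothetical.

```python
import random


def q_from_autocorrelations(r, n):
    """Ljung-Box Q = n(n+2) * sum_j r_j^2 / (n - j) over the supplied lags j = 1, 2, ..."""
    return n * (n + 2) * sum(rj ** 2 / (n - j) for j, rj in enumerate(r, start=1))


def ljung_box_q(x, lags):
    """Q statistic for a residual series x, using sample autocorrelations up to `lags`."""
    n = len(x)
    m = sum(x) / n
    d = [xi - m for xi in x]
    c0 = sum(v * v for v in d)
    r = [sum(d[t] * d[t - j] for t in range(j, n)) / c0 for j in range(1, lags + 1)]
    return q_from_autocorrelations(r, n)


# Check against the corrgram: AC = 0.8108 at lag 1 and 0.6248 at lag 2, n = 176.
print(round(q_from_autocorrelations([0.8108], 176), 2))          # 117.69
print(round(q_from_autocorrelations([0.8108, 0.6248], 176), 2))  # 187.97

# Simulated white noise: Q(40) should usually fall below about 55.76,
# the approximate 5% critical value of chi-squared with 40 df (standard tables).
rng = random.Random(2)
noise = [rng.gauss(0.0, 1.0) for _ in range(176)]
print(round(ljung_box_q(noise, 40), 2))
```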

4. Estimation strategies: ARIMA models

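1. AR(1): closely related to the Prais-Winsten estimator (STATA's prais command); it
assumes $e_t = \rho e_{t-1} + v_t$, but arima estimates the model by maximum likelihood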

arima approval mics, ar(1)

(setting optimization to BHHH)
Iteration 0: log likelihood = -558.30383
Iteration 1: log likelihood = -555.80467
Iteration 2: log likelihood = -555.7188
Iteration 3: log likelihood = -555.70873
Iteration 4: log likelihood = -555.70662
(switching optimization to BFGS)
Iteration 5: log likelihood = -555.70602
Iteration 6: log likelihood = -555.70576
Iteration 7: log likelihood = -555.70576

ARIMA regression

Sample: 1 to 176 Number of obs = 176
Wald chi2(2) = 529.56
Log likelihood = -555.7058 Prob > chi2 = 0.0000

------------------------------------------------------------------------------
| OPG
approval | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
approval |
mics | .3900807 .0814409 4.79 0.000 .2304594 .549702
_cons | 22.18424 7.137865 3.11 0.002 8.19428 36.1742
-------------+----------------------------------------------------------------
ARMA |
ar |
L1 | .8286119 .0421585 19.65 0.000 .7459827 .911241
-------------+----------------------------------------------------------------
/sigma | 5.669955 .2467909 22.97 0.000 5.186254 6.153656
------------------------------------------------------------------------------

2. MA(1): assumes $e_t = v_t + \theta v_{t-1}$

. arima approval mics, ma(1)

(setting optimization to BHHH)
Iteration 0: log likelihood = -614.7891
Iteration 1: log likelihood = -598.99993
Iteration 2: log likelihood = -595.73072
Iteration 3: log likelihood = -595.64723
Iteration 4: log likelihood = -595.64185
(switching optimization to BFGS)
Iteration 5: log likelihood = -595.63556
Iteration 6: log likelihood = -595.63516
Iteration 7: log likelihood = -595.63514

ARIMA regression

Sample: 1 to 176 Number of obs = 176
Wald chi2(2) = 188.74
Log likelihood = -595.6351 Prob > chi2 = 0.0000

------------------------------------------------------------------------------
| OPG
approval | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
approval |
mics | .5090958 .0745687 6.83 0.000 .3629438 .6552478
_cons | 11.64684 6.569249 1.77 0.076 -1.228655 24.52233
-------------+----------------------------------------------------------------
ARMA |
ma |
L1 | .680474 .0664946 10.23 0.000 .5501471 .810801
-------------+----------------------------------------------------------------
/sigma | 7.124995 .3504756 20.33 0.000 6.438076 7.811915
------------------------------------------------------------------------------

3. AR(1 4) MA(1 4): autoregressive and moving-average terms at lags 1 and 4

. arima approval mics, ma(1 4) ar(1 4)

(setting optimization to BHHH)
Iteration 0: log likelihood = -558.24089
Iteration 1: log likelihood = -554.74565
Iteration 2: log likelihood = -554.65487
Iteration 3: log likelihood = -554.64873
Iteration 4: log likelihood = -554.64796
(switching optimization to BFGS)
Iteration 5: log likelihood = -554.64778
Iteration 6: log likelihood = -554.64771
Iteration 7: log likelihood = -554.64771

ARIMA regression
Sample: 1 to 176 Number of obs = 176
Wald chi2(5) = 460.81
Log likelihood = -554.6477 Prob > chi2 = 0.0000

------------------------------------------------------------------------------
| OPG
approval | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
approval |
mics | .3720738 .0794335 4.68 0.000 .2163869 .5277607
_cons | 23.70178 6.94251 3.41 0.001 10.09471 37.30885
-------------+----------------------------------------------------------------
ARMA |
ar |
L1 | .7803232 .0683468 11.42 0.000 .646366 .9142804
L4 | .0101643 .072298 0.14 0.888 -.1315372 .1518659
ma |
L1 | .134136 .0971602 1.38 0.167 -.0562946 .3245665
L4 | .0211127 .0855824 0.25 0.805 -.1466257 .1888511
-------------+----------------------------------------------------------------
/sigma | 5.635385 .2446229 23.04 0.000 5.155932 6.114837
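One way to compare the three specifications is through their log likelihoods. The Python sketch below computes Akaike's information criterion, AIC = -2 log L + 2k, for each model, and a likelihood-ratio test of the AR(1) model against the AR(1 4) MA(1 4) model, in which it is nested (ar L4, ma L1 and ma L4 set to zero). The MA(1) model is not nested in the others, so only the AIC comparison applies to it. The log likelihoods are copied from the output above; the parameter counts k are an assumption (they include sigma, which adds the same amount to every model and so does not change the ranking). In STATA itself, estat ic after each arima, and lrtest after storing the estimates, should give comparable figures.

```python
# Log likelihoods copied from the three arima outputs above (176 observations each).
# k counts mics, _cons, the ARMA terms and sigma (an assumption about counting).
fits = {
    "AR(1)":           {"loglik": -555.70576, "k": 4},
    "MA(1)":           {"loglik": -595.63514, "k": 4},
    "AR(1 4) MA(1 4)": {"loglik": -554.64771, "k": 7},
}


def aic(loglik, k):
    """Akaike information criterion: smaller is better."""
    return -2.0 * loglik + 2.0 * k


for name, f in fits.items():
    print(f"{name:16s} AIC = {aic(f['loglik'], f['k']):.2f}")
# AR(1) 1119.41, MA(1) 1199.27, AR(1 4) MA(1 4) 1123.30

# AR(1) is nested in AR(1 4) MA(1 4) with 3 restrictions.
lr = 2.0 * (fits["AR(1 4) MA(1 4)"]["loglik"] - fits["AR(1)"]["loglik"])
CHI2_3_CRIT_05 = 7.815  # 5% critical value of chi-squared with 3 df (standard tables)
print(f"LR = {lr:.2f}; reject the AR(1) restrictions at 5%? {lr > CHI2_3_CRIT_05}")
```

On these numbers the AR(1) model has the lowest AIC, and the LR statistic (about 2.12) is below 7.815, consistent with the insignificant lag-4 and MA coefficients in the third model.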

Next week

Dynamic models (including lagged values of X or Y in the model)

Read Gujarati, Chapter 16, Sections 1-5; Kennedy, Chapter 18


PLS 692.
Assignment Number 3.

ARIMA models

Last week we used a remedy for serial correlation that assumed the special case of a "first
order autoregressive" error. This week we use a more general remedy and introduce new
tests for stationarity and higher-order autocorrelation. We will again use the Green et al.
data on presidential approval and macropartisanship. You can take the same approach to
the data as in assignment #2: model presidential approval as a function of other
variables in the data set.

Your assignment

1. What is the link between presidential approval and the variables you include in the
model? Describe your expectations.

2. Estimate the model implied by your expectations with OLS. Report the results.

3. Are the time series used in the analysis stationary?

Note: the augmented Dickey-Fuller test determines whether a series has a unit root.
Stationary series do not have a unit root, so if the series is stationary you would
observe a rejection of the null hypothesis (the test statistic is large in absolute value,
that is, more negative than the critical value). (An estimated p-value below 0.05 implies
the series is stationary.)

4. Is there a problem with serial correlation?

a. Report and interpret the Ljung-Box Q test statistic for higher-order serial
correlation (the default is to test up to lag 40). Note: the statistical test is a
Lagrange Multiplier test. If the value is high, then the null hypothesis (no higher-order
serial correlation) must be rejected. (An estimated p-value below 0.05 implies
serial correlation.)

b. Produce and interpret the correlogram. Are there strong relationships between
residuals at any lags?

5. Specify and estimate an ARIMA model and compare the results to the simple OLS results.

6. Is there still a problem with serial correlation? Repeat 4(a) and 4(b) above.