
Chapter 10: Basic Time Series Regression

1. Time series data have a temporal ordering. The past can affect the future, but not vice
versa. That is the biggest difference from cross-sectional data.

2. In time series data there is a variable that serves as the index for the time periods. That
variable also indicates the frequency of observations. For instance, we have yearly data
if one observation becomes available each year. The Stata command tset declares that
the data are time series.

3. A regression that uses time series can have a causal interpretation, provided that the
regressor series is uncorrelated with the error series. The regression result will be biased
when relevant series are omitted. More often, time series are used for forecasting
purposes.

4. Consider the causal study first.

(a) For simplicity a simple regression is considered

Yt = β0 + β1 Xt + ut (1)

where subscript t is used to emphasize that data are time series.


(b) Model (1) is a static model because it only captures the immediate or contem-
poraneous effect of X on Y. When Xt changes by one unit, it only has an effect on
Yt (in the same period). In other words, Yt+1 , Yt+2 , and so on are unaffected. In
this sense, model (1) is static.
(c) It is easy to account for lag effects by using lagged values. There are the first lag,
second lag, and so on. The table below is one example

t    Yt     Yt−1   Yt−2   ∆Yt
1    2.2    na     na     na
2    3      2.2    na     0.8
3    -2     3      2.2    -5

i. t is the index series. t = 1 corresponds to the first (or earliest) observation.


ii. Yt−1 is the first lag of Yt . When t = 2, Yt−1 = Y2−1 = Y1 = 2.2. So the second
observation of Yt−1 is the first observation of Yt . The first observation of Yt−1
is a missing value (denoted by na) because there is no data for Y0 . Essentially
you get the first lag by pushing the whole series one period down.
iii. Yt−2 is the second lag of Yt . You get the second lag by pushing the whole
series two periods down (and two missing values are generated).
iv. ∆Yt is the first difference. By definition,

∆Yt ≡ Yt − Yt−1 (2)

For example, when t = 2, ∆Y2 ≡ Y2 − Y2−1 = 3 − 2.2 = 0.8. ∆Y1 is missing
since Y0 is missing.
v. Exercise : Use the above table and find ∆Yt−1 for t = 1, 2, 3
vi. The stata commands to generate the first lag and second lag are
gen ylag1 = y[_n-1]
gen ylag2 = y[_n-2]

vii. Alternatively you can refer to the first and second lags by using the lag
operator L (after declaring time series using tset)

L.y, L2.y
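The lag and first-difference operations above can be sketched in plain Python (a hypothetical illustration mirroring Stata's L. operator and [_n-1] subscripting, not part of the notes):

```python
# Reproduce the table above: the series Y_t for t = 1, 2, 3
y = [2.2, 3.0, -2.0]

def lag(series, k):
    """k-th lag: push the whole series k periods down; the first k values are missing."""
    return [None] * k + series[:len(series) - k]

def diff(series):
    """First difference: Y_t - Y_{t-1}; the first value is missing."""
    return [None] + [b - a for a, b in zip(series, series[1:])]

ylag1 = lag(y, 1)   # [None, 2.2, 3.0]   -- 'na' becomes None
ylag2 = lag(y, 2)   # [None, None, 2.2]
dy = diff(y)        # [None, 0.8, -5.0] up to floating point rounding
```

Pushing the series down and padding with missing values is exactly why each extra lag costs one observation at the start of the sample.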

(d) The distributed lag (DL) model uses lagged values of X to account for the lag effect.
A DL model with two lags is

Yt = β0 + δ0 Xt + δ1 Xt−1 + δ2 Xt−2 + ut (3)

where the parameters δj are called (short-run) multipliers.


i. It follows that a change in Xt has an effect on Yt , Yt+1 and Yt+2 . After that,
the effect of Xt vanishes.
ii. Mathematically, we can show

dYt /dXt = δ0 (4)
dYt+1 /dXt = δ1 (5)
dYt+2 /dXt = δ2 (6)
dYt+j /dXt = 0, (j > 2) (7)

where dYt+1 /dXt can be obtained from the regression where Yt+1 is the dependent
variable.
iii. When we graph δj as a function of j, we obtain the lag distribution.
iv. The cumulative effect of Xt on Yt , Yt+1 and Yt+2 is called long run propensity
(LRP) or long run multiplier.

LRP ≡ δ0 + δ1 + δ2 (8)

There are two steps to obtain the estimate and standard error for LRP. First,
write θ = δ0 + δ1 + δ2 , which implies that δ0 = θ − δ1 − δ2 . Then equation (3)
becomes

Yt = β0 + δ0 Xt + δ1 Xt−1 + δ2 Xt−2 + ut (9)


= β0 + (θ − δ1 − δ2 )Xt + δ1 Xt−1 + δ2 Xt−2 + ut (10)
= β0 + θXt + δ1 (Xt−1 − Xt ) + δ2 (Xt−2 − Xt ) + ut (11)

The last equation suggests running the transformed regression of Yt onto Xt ,
(Xt−1 − Xt ) and (Xt−2 − Xt ). Then θ̂ is the estimate of LRP, se(θ̂) is its
standard error, and the confidence interval for θ is the confidence interval for LRP.
v. Due to multicollinearity, it can be difficult to obtain precise estimates of the
short-run multipliers δj . However, we can often get a good estimate of LRP.
vi. To mitigate multicollinearity, we can impose some restriction on the lag
distribution. For example, if we assume the lag distribution is flat, i.e.,
δ0 = δ1 = δ2 = δ, then model (3) reduces to

Yt = β0 + δZt + ut , (Zt ≡ Xt + Xt−1 + Xt−2 ) (12)
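The claim that the transformed regression recovers LRP can be checked numerically. The sketch below is hypothetical: it simulates a DL model with assumed multipliers δ0 = 0.5, δ1 = 0.3, δ2 = 0.2 (so the true LRP is 1.0) and verifies that θ̂ from the transformed regression equals δ̂0 + δ̂1 + δ̂2 from the original regression:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
u = rng.normal(scale=0.1, size=T)

# Simulate Yt = 1 + 0.5 Xt + 0.3 Xt-1 + 0.2 Xt-2 + ut (assumed values)
y = 1.0 + 0.5 * x + u
y[1:] += 0.3 * x[:-1]
y[2:] += 0.2 * x[:-2]

# Original DL regression (drop the two observations lost to lags)
Y = y[2:]
X = np.column_stack([np.ones(T - 2), x[2:], x[1:-1], x[:-2]])
b = np.linalg.lstsq(X, Y, rcond=None)[0]          # [b0, d0, d1, d2]
lrp_direct = b[1] + b[2] + b[3]

# Transformed regression: Yt on Xt, (Xt-1 - Xt), (Xt-2 - Xt)
Z = np.column_stack([np.ones(T - 2), x[2:], x[1:-1] - x[2:], x[:-2] - x[2:]])
c = np.linalg.lstsq(Z, Y, rcond=None)[0]
lrp_transformed = c[1]                             # theta-hat

assert abs(lrp_direct - lrp_transformed) < 1e-8    # identical up to rounding
```

The two design matrices span the same column space, so the fits are identical; only the parameterization changes, which is what makes the standard error of θ̂ directly available.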

(e) In theory, if the effect of X never dies out, we need to include infinitely many lags
of X, which is infeasible. Instead we may consider an autoregressive distributed
lag (ADL) model given by

Yt = β0 + ρYt−1 + δXt + ut (13)

By recursive substitution

Yt+1 = β0 + ρYt + δXt+1 + ut+1 (14)
= β0 + ρ(β0 + ρYt−1 + δXt + ut ) + δXt+1 + ut+1 (15)
= . . . + ρδXt + . . . (16)

we can show the effect of X on Y is

dYt /dXt = δ, dYt+1 /dXt = ρδ, . . . , dYt+j /dXt = ρ^j δ (17)

Alternatively you can apply the chain rule of calculus and get

dYt+1 /dXt = (dYt+1 /dYt )(dYt /dXt ) = ρδ (18)
dYt+2 /dXt = (dYt+2 /dYt+1 )(dYt+1 /dXt ) = ρ(ρδ) = ρ^2 δ (19)

The effect of X decreases exponentially if |ρ| < 1, but never becomes zero no
matter how large j is.
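A small numeric sketch of (17), with hypothetical values ρ = 0.8 and δ = 0.5 (not from the notes); since |ρ| < 1 the multipliers decay geometrically, and their sum, the long-run effect, is the geometric series δ/(1 − ρ):

```python
# Hypothetical ADL(1) parameters; |rho| < 1 so the effect dies out geometrically
rho, delta = 0.8, 0.5

def multiplier(j):
    """Dynamic multiplier dY_{t+j}/dX_t = rho**j * delta, as in equation (17)."""
    return rho ** j * delta

effects = [multiplier(j) for j in range(20)]   # shrinks but never exactly zero

# Long-run (cumulative) effect: sum of rho**j * delta over all j >= 0
long_run = delta / (1 - rho)
```

This makes the contrast with the DL model concrete: a two-lag DL model truncates the lag distribution at j = 2, while the ADL model lets it decay forever with a single extra parameter ρ.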

5. We get the autoregressive (AR) model (or autoregression) if X is excluded from the
ADL model. More explicitly, a first order autoregressive model is

Yt = β0 + ρYt−1 + ut (20)

(a) The AR model is more suitable for forecasting than the ADL and DL models since
we do not need to forecast X before forecasting Y. This is because in the AR model
the regressor is just the lagged dependent variable.
(b) The AR model can be used to test for serial correlation. A series is serially correlated
if ρ̂ in (20) is significantly different from 0 (i.e., its p-value is less than 0.05).
(c) (Optional) A time series has a unit root (and becomes non-stationary) if it is
extremely serially correlated (or highly persistent). In theory, a series has a unit
root if ρ = 1 in (20). In practice, ρ̂ ≈ 1 is suggestive of a unit root. We can show that
the variance of a unit root series goes to infinity as t rises. The conventional law
of large numbers and the central limit theorem both require finite variance, so neither
applies to a unit root series. That means in general we need to difference a unit
root series (to make it stationary) before using it in regression. One exception: you
do not need to difference unit root series if they are cointegrated. A unit root
series is also called a random walk.
(d) The AR model with one lag can be fitted by stata command

reg y L.y

(e) Then the one-period ahead forecast is computed as

Ŷt+1 = β̂0 + ρ̂Yt (21)

and the two-period ahead forecast is

Ŷt+2 = β̂0 + ρ̂Ŷt+1 (22)

In general the h-period ahead forecast is

Ŷt+h = β̂0 + ρ̂Ŷt+h−1 (h = 2, 3, . . .) (23)

In short all forecasts can be computed in an iterative (recursive) fashion.
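The recursion in (21)–(23) is a short loop; a sketch with a hypothetical helper, where b0 and rho stand for the estimated intercept and slope:

```python
def ar1_forecasts(b0, rho, y_last, h):
    """1- through h-step-ahead AR(1) forecasts, each computed from the
    previous one: Yhat_{t+j} = b0 + rho * Yhat_{t+j-1}, as in equation (23)."""
    forecasts = []
    y_hat = y_last
    for _ in range(h):
        y_hat = b0 + rho * y_hat   # the forecast feeds back in as the regressor
        forecasts.append(y_hat)
    return forecasts

# e.g. with b0 = 1.0, rho = 0.5 and last observation 10.0:
# ar1_forecasts(1.0, 0.5, 10.0, 2) -> [6.0, 4.0]
```

Only the first step uses actual data (Yt); every later step plugs in the previous forecast, which is exactly why forecast uncertainty grows with the horizon h.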


(f) The AR model with two lags can be fitted by stata command

reg y L.y L2.y

(g) We can add a third lag, a fourth lag, and so on until the new lag has an insignificant
coefficient.

6. If the time series is trending, we can add a linear trend (and quadratic trend) into the
AR model and run below regression:

Yt = β0 + ρYt−1 + ct + ut (24)

where t is the linear trend. According to the Frisch–Waugh theorem, the estimated
coefficient ρ̂ in (24) can be obtained by regressing the detrended Yt onto the detrended
Yt−1 , where the detrended series is the residual from regressing the series on the
trend. In short, ρ̂ in (24) measures the effect on the deviation of the series around its
trend.

7. The trend term can also be used in the DL model as a proxy for unobserved trending factors.

8. If we have monthly or quarterly data, there may be a regular up-and-down pattern at
specific intervals. This phenomenon is called seasonality. For instance, sales data
typically go up in November and December. It is easy to account for seasonality by
using a seasonal dummy that equals one during a particular season.
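Constructing seasonal dummies by hand is straightforward; a hypothetical sketch for monthly data (the variable names are illustrative, not from the notes):

```python
# Hypothetical monthly series: month index (1-12) of each observation
months = [9, 10, 11, 12, 1, 2]

# A seasonal dummy equals one during a particular season, zero otherwise
nov = [1 if m == 11 else 0 for m in months]
dec = [1 if m == 12 else 0 for m in months]
# Including nov and dec as regressors absorbs the year-end sales spike
```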

Example: Chapter 10
1. The data are in 311 fertil3.dta. See Examples 10.4 and 10.8 in the textbook for details.

2. The dependent variable is the general fertility rate (gfr). It is the number of children born
per 1000 women of childbearing age. The key independent variable is the personal
tax exemption (pe). We want to control for two variables: ww2 is a dummy variable that
equals 1 during World War II (years 1941-1945), and pill is another dummy variable that
equals one after 1963 (the year the birth control pill became available).

3. The data contain 72 yearly observations from 1913 to 1984. In 1913, for example,
gfr = 124.7, pe = 0 and pill = 0.

4. Stata command twoway line gfr pe year draws the time series plot of gfr (red line)
and pe (blue line). It shows that pe was upward trending before the mid-1940s. Starting
in 1950, both gfr and pe have overall been trending downward. We notice that the
biggest rise in pe happened around WWII.

5. Stata command reg gfr pe ww2 pill estimates the static model (10.18) in the text-
book. All variables are significant, and the signs of their coefficients are as expected. The
coefficient of pe indicates that a 12 (= 1/0.083) dollar increase in pe will increase gfr
by about one birth per 1000 women. Being in WWII causes 24 fewer births per
1000 women. Having access to the birth control pill on average causes 31 fewer births per
1000 women.

6. The static model has the drawback that it only captures the immediate or contempo-
raneous effect of pe on gfr. By contrast, the distributed lag model can capture both
immediate and lag effects. We cannot ignore the lag effect as long as it takes a while
for people to adjust. After generating the first and second lags of pe, the Stata command

reg gfr pe pelag1 pelag2 ww2 pill

reports the distributed lag model (10.19) in the textbook. The F test shows that pe and its
two lags are jointly significant, even though individually insignificant. The individual
insignificance is due to the correlation among pe and its lagged values.

7. Next we run the transformed regression (shown at the top of page 359 in the 5th edition)
in order to estimate the long run propensity (LRP). It is shown that the estimated
LRP is .1007191 and its standard error is .0298027.

8. We may worry that regression (10.18) produces a spurious (biased) result. From the
time series plot, it is obvious that gfr and pe tended to move together (both moved
down) after the 1950s. We may suspect that there is an omitted trending variable that
is correlated with both gfr and pe. In other words, we suspect the observed correlation
between gfr and pe is possibly due to that omitted variable, and so tells us nothing
about causality.

9. Then we include a linear trend as a proxy for the unobserved trending factors. The
coefficient of the trend is -1.149872 and significant. This confirms that gfr is overall down-
ward trending. After the trending factor is controlled for, the coefficient of pe becomes
.2788778, much bigger than without the trend. This new estimate measures the effect of
pe on detrended gfr. Put differently, pe explains the deviation of gfr around its trend
better than the level of gfr.

10. We obtain equation (10.35) in the textbook after using both the trend and squared trend as
proxies. The squared trend is significant, confirming the nonlinear trending behavior of
gfr.

11. So far we have used these time series for the purpose of inferring causality. In reality, the main
use of time series is forecasting. We cannot use the Stata time series commands until
we declare the data to be time series, which is done by the command tset year

12. Autoregression (AR) is one example of time series model. It is a popular forecasting
model because it uses a variable’s own history to predict its future. In other words, no
other variables are needed. Regressors in autoregression consist of lagged dependent
variables only.

13. Autoregression is rationalized by the fact that most time series are serially correlated.
That is, the current level of most series is correlated with previous or earlier level.
Autoregression utilizes the idea that history can repeat itself.

14. We start with the first order AR model for gfr. The stata command is

reg gfr L.gfr

where L.gfr refers to the first lag value gf rt−1 . By using the lag operator L, we can
skip generating the lagged value with [_n-1].

15. The estimated AR(1) model is

ĝfrt = 1.304937 + .9777202 gfrt−1

It is easy to obtain forecasts using the AR model. For example, let t = 1980 and
gfr1979 = 100; then the forecast of gfr in 1980 is ĝfr1980 = 1.304937 + .9777202(100) = 99.076957
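The forecast arithmetic in this example can be verified in a couple of lines of Python:

```python
# AR(1) estimates reported above (intercept and slope)
b0, rho = 1.304937, 0.9777202
gfr_1979 = 100
gfr_1980_hat = b0 + rho * gfr_1979   # one-period-ahead forecast, equation (21)
# agrees with the reported 99.076957 up to floating point rounding
```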

16. By contrast, if we use a distributed lag model, then we need to forecast X before fore-
casting Y. This is a harder job.

17. The AR(2) model (with two lags) can be estimated by command reg gfr L.gfr
L2.gfr where L2.gfr refers to the second lag gf rt−2 . Both the first and second lags
are significant, implying that the AR(1) model is mis-specified by omitting the second lag
term.

18. In theory we can keep adding lagged values until the new lag becomes insignificant.

19. Exercise : Use the lag operator L and rewrite the Stata command for regression (10.19)
in the textbook.

20. Exercise : Please estimate the following model

gf rt = β0 + β1 pet + β2 ww2t + β3 pillt + β4 gf rt−1 + ut

where the lagged dependent variable is used as a proxy for the unobserved factors. Compare
it to regression (10.34) in the textbook.

21. Exercise : Continue the above exercise. Can you reject the hypothesis H0 : β4 = 1?
How can you estimate the restricted model that imposes this hypothesis?

[Figure: time series plot, 1913 to 1984, of gfr (births per 1000 women 15−44) and pe (real value pers. exemption, $)]

Do File
* Do file for chapter 10
set more off
clear
capture log close
cd "I:\311"
log using 311log.txt, text replace
use 311_fertil3.dta, clear
list in 1/5
sum pe
* time plot for gfr and pe
twoway line gfr pe year
* generate dummy variable for World War II (1941-1945)
gen ww2 = 0
replace ww2 = 1 if year==1941 | year==1942 | year == 1943 | year == 1944 | year == 1945
* equation 10.18 in textbook, static model
reg gfr pe ww2 pill
* generate the first and second lagged value of pe
gen pelag1 = pe[_n-1]
gen pelag2 = pe[_n-2]
* equation 10.19 in textbook, finite distributed lag model
reg gfr pe pelag1 pelag2 ww2 pill
* F test for joint significance of pe and its lags
test pe pelag1 pelag2
* estimate LRP and its standard error based on transformed regression
gen dpe1 = pelag1 - pe
gen dpe2 = pelag2 - pe
reg gfr pe dpe1 dpe2 ww2 pill
* generate trend and squared trend
gen t = _n
gen tsq = t^2
* equation 10.34 in textbook
reg gfr pe ww2 pill t
* equation 10.35 in textbook

reg gfr pe ww2 pill t tsq
* declare time series data
tset year
* AR(1) regression
reg gfr L.gfr
* AR(2) regression
reg gfr L.gfr L2.gfr
log close
exit
