
Chapter Two

Introduction to Basic Regression Analysis with Time Series Data

2.1 The Nature of Time Series Data

 A time series data set consists of observations on a variable or several variables over time.
 One objective of analysing economic data is to predict or forecast the future values of economic
variables. Because past events can influence future events and lags in behaviour are prevalent in the
social sciences, time is an important dimension in a time series data set.
 The chronological ordering of observations in a time series conveys potentially important
information.
 Economic time series data can rarely be assumed to be independent across time.
 Economic data may be collected on daily, weekly, monthly, quarterly or annual basis.
 We assume that the observations are equally spaced in time.

2.2 Stationary and Non-Stationary Stochastic Processes

A random variable is a variable whose value is unknown until it is observed; its value is determined by the outcome of a chance experiment. A random variable is:
discrete if it can take only a countable number of values, which can be listed using the positive integers;
continuous if it can take any real value in an interval on the real number line.
The theory of random processes was developed to describe such chance fluctuations over time.
A random process is a collection of random variables defined on a given probability space; a collection of random variables ordered in time is called a stochastic or random process.
A random process is described by its expectation (mean), variance, covariance, and correlation functions.
 Just as we use sample data to draw inferences about a population in cross-sectional data, in time series
we use the realization to draw inferences about the underlying stochastic process.
A given series can be either stationary or non-stationary. The main difference between these two types of series is the degree of persistence of shocks.

2.2.1 Stationary Stochastic Processes

 A stochastic process is said to be stationary if its mean and variance are constant over time and the value
of the covariance between the two time periods depends only on the distance or gap or lag between the
two time periods and not the actual time at which the covariance is computed. Such a stochastic process
is known as weakly stationary, or covariance stationary. Such a time series will tend to return to its
mean (called mean reversion) and fluctuations around this mean (measured by its variance) will have
broadly constant amplitude.
 A time series is strictly stationary if all the moments of its probability distribution are invariant over
time. If, however, the stationary process is normal, the weakly stationary stochastic process is also
strictly stationary, for the normal stochastic process is fully specified by its two moments, the mean and
the variance.
 To explain weak stationarity, let Yt be a stochastic time series with these properties:
Mean: E(Yt) = µ
Variance: var (Yt) = E(Yt − µ)2 = σ2
Covariance: γk = Cov (Yt, Yt-k) = Cov (Yt, Yt+k) = E[(Yt − µ) (Yt+k − µ)]
 As the covariance (autocovariances) are not independent of the units in which the variables are
measured, it is common to standardize them by defining the autocorrelations ρk as
ρk = γk / γ0 = Cov(Yt, Yt-k) / var(Yt)
 Note that ρ0 = 1, while − 1 ≤ ρk ≤ 1.


 The correlation of a series with its own lagged values is called autocorrelation or serial correlation.
 The autocorrelations considered as a function of k are referred to as the autocorrelation function (ACF).
 From the ACF we can infer the extent to which one value of the process is correlated with previous
values and thus the length and strength of the memory of the process. It indicates how long (and how
strongly) a shock in the process (εt) affects the values of Yt.
A shock in an MA(q) process affects Yt in q + 1 periods only, while a shock in an AR(p) process affects all future observations with a decreasing effect.
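As an illustration (not part of the original notes), the following Python sketch simulates a stationary AR(1) series and computes its sample ACF with statsmodels; the coefficient, sample size, and seed are arbitrary assumptions.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(0)
T, alpha = 500, 0.7            # sample size and AR(1) coefficient (|alpha| < 1, so the series is stationary)
eps = rng.standard_normal(T)   # white-noise shocks
y = np.zeros(T)
for t in range(1, T):
    y[t] = alpha * y[t - 1] + eps[t]

rho = acf(y, nlags=10)         # sample autocorrelations rho_0, ..., rho_10
print(rho)                     # rho_0 = 1; the remaining values decay roughly like alpha**k
```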
 Why are stationary time series so important? Because if a time series is non-stationary, we can study its
behaviour only for the time period under consideration. Each set of time series data will therefore be for
a particular episode. As a consequence, it is not possible to generalize the results to other time periods. Therefore, for the purpose of forecasting, such (non-stationary) time series may be of little practical value. Besides, the classical t tests, F tests, etc. are based on the assumption of stationarity.

2.2.2 Non-Stationary Stochastic Processes

 A non-stationary time series will have a time-varying mean or a time-varying variance or both.
We call a stochastic process purely random (white noise) if it has zero mean, constant variance, and is serially uncorrelated.
A classic example of a non-stationary stochastic process is the random walk model (RWM). Stock prices or exchange rates are often said to follow a random walk: today's stock price is equal to yesterday's stock price plus a random shock.
 We distinguish two types of random walks: (1) random walk without drift (i.e., no constant term) and (2)
random walk with drift.
 Random Walk without Drift: Suppose ut is a white noise error term. Then the series Yt is
said to be a random walk if Yt = Yt-1 + ut, which is an AR(1) process with ρ = 1.
In general, if the process started at some time 0 with a value of Y0, we have
Yt = Y0 + ∑ut. Therefore, E(Yt) = E(Y0 + ∑ut) = Y0 and var (Yt) = tσ2.
 RWM is characterized by persistence of random shocks and that’s why it is said to have an
infinite memory.
 Random Walk with Drift: Yt = δ + Yt-1 + ut where δ is known as the drift parameter.
E(Yt) = Y0 + tδ and var (Yt) = tσ2.
 RWM, with or without drift, is a non-stationary stochastic process.
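A minimal simulation sketch of the two random walk models (the drift, sample size, and number of paths are illustrative assumptions): across many simulated paths, var(Yt) grows roughly like tσ2 and, with drift, E(Yt) grows like tδ.

```python
import numpy as np

rng = np.random.default_rng(1)
T, delta, n_paths = 200, 0.5, 2000
u = rng.standard_normal((n_paths, T))      # white-noise shocks
rw = np.cumsum(u, axis=1)                  # Yt = Y(t-1) + ut, with Y0 = 0
rw_drift = np.cumsum(delta + u, axis=1)    # Yt = delta + Y(t-1) + ut

# Across simulated paths: var(Yt) is roughly t * sigma^2, and E(Yt) is roughly t * delta with drift
print(rw.var(axis=0)[[9, 49, 199]])        # roughly 10, 50, 200
print(rw_drift.mean(axis=0)[[9, 49, 199]]) # roughly 5, 25, 100
```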
 Regression of one time series variable on one or more time series variables often can give
nonsensical results. This phenomenon is known as spurious/ meaningless regression. When Yt and
Xt are uncorrelated I(1) processes, one might expect the R2 from the regression of Y on X to tend to zero; in fact it does not, and the regression can appear significant even though the series are unrelated. Yule showed that (spurious) correlation could persist in non-stationary time series even if the sample is very large.
 The spurious regression can be easily seen from regressing the first differences of Yt (= ∆Yt) on the
first differences of Xt (= ∆Xt), where the R2 is practically zero, as expected for unrelated series. One way to guard against spurious regression is to find out
if the time series are cointegrated.
 The usual statistical results do not hold for spurious regression when all the regressors are I(1) and
not cointegrated.
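The spurious regression phenomenon can be reproduced with simulated data. The sketch below (illustrative, not from the notes) regresses two independent random walks on each other in levels and in first differences.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
T = 500
y = np.cumsum(rng.standard_normal(T))   # independent I(1) series
x = np.cumsum(rng.standard_normal(T))

levels = sm.OLS(y, sm.add_constant(x)).fit()                   # spurious levels regression
diffs = sm.OLS(np.diff(y), sm.add_constant(np.diff(x))).fit()  # regression in first differences
print(levels.rsquared)   # often looks "significant" despite no true relationship
print(diffs.rsquared)    # practically zero, as it should be for unrelated series
```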


2.3 Trend Stationary and Difference Stationary

 Based on the nature of trend, an economic time series can be trend stationary or difference stationary.
A trend stationary time series has a deterministic trend, whereas a difference stationary time series
has a variable, or stochastic, trend. The common practice of including the time or trend variable in a
regression model to detrend the data is justifiable only for trend stationary time series.
 If the trend in a time series is completely predictable and not variable, we call it a deterministic
trend, whereas if it is not predictable, we call it a stochastic trend.
 Consider the following model of the time series Yt
Yt = β1 + β2t + β3Yt-1 + ut , where ut is a white noise error term and where t is time
measured chronologically.
 Deterministic trend: If β1 ≠ 0, β2≠ 0, β3 = 0, we obtain Yt = β1 + β2t + ut which is called a trend
stationary process. Although the mean of Yt is β1 + β2t, which is not constant, its variance (= σ2) is. If
we subtract the mean of Yt from Yt, the resulting series will be stationary, hence the name trend
stationary. This procedure of removing the (deterministic) trend is called detrending.
 Random walk with drift and deterministic trend: If β1≠ 0, β2 ≠ 0, β3 = 1, we obtain: Yt = β1 + β2t
+ Yt-1 + ut, which can be seen if we write this equation as ∆Yt = β1 + β2t + ut; this means that Yt is
non-stationary.
 Deterministic trend with stationary AR (1) component: If β1≠ 0, β2≠ 0, β3< 1, then we get Yt = β1
+ β2t + β3Yt-1 + ut, which is stationary around the deterministic trend.
 Pure random walk: If β1 = 0, β2 = 0, β3 = 1, we get Yt = Yt-1 + ut which is nothing but a RWM
without drift and is therefore non-stationary. If we write this equation as ∆Yt = (Yt – Yt-1) = ut it
becomes stationary. Hence, a RWM without drift is a difference stationary process and we call the
RWM without drift integrated of order 1.
Random walk with drift: If β1 ≠ 0, β2 = 0, β3 = 1, we get Yt = β1 + Yt-1 + ut, which is a random walk with drift and is therefore non-stationary. If we write it as (Yt – Yt-1) = ∆Yt = β1 + ut, this means Yt will exhibit a positive (β1 > 0) or negative (β1 < 0) trend. Such a trend is called a stochastic trend. This is a difference stationary process because the non-stationarity in Yt can be eliminated by taking first differences of the time series.
If a non-stationary time series has to be differenced d times to make it stationary, that time series is said to be integrated of order d. A time series Yt integrated of order d is denoted as Yt ∼ I(d).

If a time series Yt is stationary to begin with (i.e., it does not require any differencing), it is said to be integrated of order zero, denoted by Yt ∼ I(0). Most economic time series are generally I(1).
 An I(0) series fluctuates around its mean with a finite variance that does not depend on time, while an
I(1) series wanders widely.
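A short illustrative sketch of the distinction (simulated data and parameters are assumptions): a trend stationary series is rendered stationary by detrending, while a difference stationary series (a random walk with drift) is rendered stationary by first differencing.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
T = 300
t = np.arange(T)

trend_stat = 1.0 + 0.05 * t + rng.standard_normal(T)   # Yt = b1 + b2*t + ut (trend stationary)
diff_stat = np.cumsum(0.05 + rng.standard_normal(T))   # Yt = b1 + Y(t-1) + ut (random walk with drift)

# Detrending works for the trend stationary series; differencing works for the I(1) series
detrended = trend_stat - sm.OLS(trend_stat, sm.add_constant(t)).fit().fittedvalues
differenced = np.diff(diff_stat)
print(detrended.mean(), detrended.std())      # roughly zero mean, stable variance
print(differenced.mean(), differenced.std())  # roughly the drift (0.05) and unit variance
```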

2.4 Tests of Stationarity: The Unit Root Test

 The random walk model is an example of what is known in the literature as a unit root process.
 How do we find out if a given time series is stationary?
 There are several tests of stationarity: graphical analysis, the correlogram test and the unit root test.
But we focus on the last one.
 One can allow for nonzero means by adding an intercept term to the model.
Let us write the RW without drift as Yt = ρYt-1 + ut, where −1 ≤ ρ ≤ 1.
 If ρ is 1, we face what is known as the unit root problem, that is, a situation of non-
stationarity. The name unit root is due to the fact that ρ = 1. Thus the terms non-stationarity,
random walk, and unit root can be treated as synonymous.
 If |ρ| < 1, then it can be shown that the time series Yt is stationary.
 The above equation can be rewritten as:
Yt – Yt-1 = ρYt-1 – Yt-1 + ut
= (ρ − 1)Yt-1 + ut
∆Yt = δYt-1 + ut, where δ = (ρ − 1).
 The null hypothesis now becomes δ = 0. If δ = 0, then ρ = 1, that is we have a unit root.
It may be noted that if δ = 0, ∆Yt = (Yt – Yt-1) = ut, and since ut is a white noise error term, it is stationary.
 If δ is zero, we conclude that Yt is nonstationary. But if it is negative, we conclude that Yt is
stationary.
Which test should we use to find out whether the estimated coefficient of Yt-1 is zero or not?
 Under the null hypothesis that δ = 0,the t value of the estimated coefficient of Yt-1 does not
follow the t distribution even in large samples; that is, it does not have an asymptotic normal
distribution. Hence, the usual t test cannot be used.
 Dickey and Fuller have shown that under the null hypothesis that δ = 0, the estimated t value
of the coefficient of Yt-1 follows the τ (tau) statistic. These authors have computed the critical

values of the tau statistic on the basis of Monte Carlo simulations.

Dickey–Fuller (DF) test

 The DF test is estimated in three different forms.


Yt is a random walk: ∆Yt = δYt-1 + ut
Yt is a random walk with drift: ∆Yt = β1 + δYt-1 + ut
Yt is a random walk with drift around a deterministic trend: ∆Yt = β1 + β2t + δYt-1 + ut, where t is the time or trend variable.
 In each case, the null hypothesis is that δ = 0; that is, there is a unit root.
 Estimate the above models by OLS; divide the estimated coefficient of Yt-1 in each case by its
standard error to compute the τ (tau) statistic.
If the computed absolute value of the tau statistic (|τ|) exceeds the absolute value of the DF or MacKinnon critical tau values, we reject the hypothesis that δ = 0, in which case the time series is stationary.
 Note that the critical values of the tau test to test the hypothesis that δ = 0, are different for each of
the preceding three specifications of the DF test.
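For concreteness, the sketch below computes the DF tau statistic "by hand" for the no-constant specification on a simulated random walk; the data are illustrative assumptions, and the resulting ratio must be compared with the DF/MacKinnon critical values rather than the usual t table.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
y = np.cumsum(rng.standard_normal(400))   # a simulated random walk, so delta = 0 in truth

dy = np.diff(y)                           # Delta Y_t
y_lag = y[:-1]                            # Y_{t-1}
res = sm.OLS(dy, y_lag).fit()             # no-constant specification: Delta Y_t = delta * Y_{t-1} + u_t
tau = res.params[0] / res.bse[0]          # estimated delta divided by its standard error
print(tau)                                # compare |tau| with the DF critical values, not the t table
```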
 Before we examine the results, we have to decide which of the three models may be appropriate. We
should rule out the first model if the estimated coefficient of GDPt-1 (that is, δ̂) is positive, implying that ρ > 1.
E.g. The U.S. GDP time series:
No constant: δ̂ = 0.00576 with τ = 5.7980. This model can be ruled out because in this case the GDP time series would be explosive (δ > 0 implies ρ > 1).
With constant: the implied ρ̂ = 0.9986. Our conclusion is that the GDP time series is not stationary.
With constant and trend: the implied ρ̂ = 0.9397. Again, the conclusion is that the GDP time series is not stationary.
The DF critical values of the τ statistic for the three specifications are:

                       1%        5%       10%
No constant         −2.5897   −1.9439   −1.6177
With constant       −3.5064   −2.8947   −2.5842
Constant and trend  −4.0661   −3.4614   −3.1567

The Augmented Dickey–Fuller (ADF) Test


 DF test assumed that the error term ut was uncorrelated. But in case the ut are correlated, Dickey and
Fuller have developed a test, known as the augmented Dickey–Fuller (ADF) test. This test is
conducted by “augmenting” the preceding three equations by adding the lagged values of the
dependent variable ∆Yt.

∆Yt = β1 + β2t + δYt-1 + ∑ αi ∆Yt-i + εt
where εt is a pure white noise error term and the sum runs over i = 1, …, m. The number of lagged difference terms must be large enough to make the error term serially uncorrelated.

 In ADF we still test whether δ = 0 and the ADF test follows the same asymptotic distribution as the
DF statistic, so the same critical values can be used.
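In practice the ADF test is usually run with a package routine. The sketch below uses statsmodels' adfuller on a simulated I(1) series, looping over the three specifications discussed above and letting an information criterion choose the number of lagged differences; the data are illustrative assumptions.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(5)
y = np.cumsum(rng.standard_normal(400))           # simulated I(1) series

for spec in ("n", "c", "ct"):                     # no constant, constant, constant + trend
    stat, pvalue, usedlag, nobs, crit, _ = adfuller(y, regression=spec, autolag="AIC")
    print(spec, round(stat, 3), round(pvalue, 3), crit)   # the unit root should not be rejected here
```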
Applying the ADF test to the U.S. GDP series leads to the same conclusion: the GDP series is still non-stationary.
In econometric modelling, the relationship between the dependent variable and the explanatory variables is defined either as a static relationship or as a dynamic relationship.
 A static relationship defines the dependent variable as a function of a set of explanatory variables at
the same point in time. This form of relation is also called “the long run” relationship.
 A dynamic relation involves the non-contemporaneous relationship between the variables. This
relationship defines “the short run” relationship.
 Time series modelling techniques can be classified into three:
 Box-Jenkins ARIMA models
 Box-Jenkins Multivariate Models
 Holt-Winters Exponential Smoothing (single, double, triple)

2.5 ARIMA Modelling of Time Series Data


 For forecasting,
 R2 matters (a lot!)

 Omitted variable bias isn’t a problem!
 We will not worry about interpreting coefficients in forecasting models
 External validity is paramount: the model estimated using historical data must hold into the
(near) future
 A natural starting point for a forecasting model is to use past values of Y (that is, Yt–1 ,Yt–2 ,…) to
forecast Yt .
 For psychological, technological, and institutional reasons, a regressand may respond to a
regressor(s) with a lapse of time. Regression models that take into account time lags are known as
dynamic or lagged regression models.
 There are two types of lagged models: distributed-lag and autoregressive. In the former, the current
and lagged values of regressors are explanatory variables. In the latter, the lagged value(s) of the
regressand appear as explanatory variables.
 An autoregression is a regression model in which Yt is regressed against its own lagged values.
 The number of lags used as regressors is called the order of the autoregression.
 In a first order autoregression, Yt is regressed against Yt–1
 In a pth order autoregression, Yt is regressed against Yt–1 ,Yt–2 ,…,Yt–p .
ARIMA methodology emphasizes analysing the probabilistic, or stochastic, properties of economic time series on their own, under the philosophy "let the data speak for themselves".
Box–Jenkins Strategy
i. First examine the series for stationarity. This step can be done by computing the
autocorrelation function (ACF) and the partial autocorrelation function (PACF) or by a
formal unit root analysis.
The joint distribution of all values of Yt is characterized by the so-called autocovariances, the covariances between Yt and one of its lags, Yt-k. The covariance between Yt and Yt-k depends on k only, not on time; this reflects the stationarity of the process.
 Partial autocorrelation is the correlation between Yt and Yt-k after removing the effect of the
intermediate Y’s.
ii. If the time series is not stationary, difference it one or more times to achieve stationarity.
iii. The ACF and PACF of the stationary time series are then computed to find out if the series is purely autoregressive (AR), purely of the moving average (MA) type, or a mixture of the two. At this stage the chosen ARMA(p, q) model is tentative.

 An autoregressive process of order p, an AR(p) process, is given by
yt = α1yt-1 + α2yt-2 + … + αpyt-p + εt, where εt is a white noise process and yt = Yt − µ.
 If there is only one lag, this model says that the forecast value of Y at time t is simply some
proportion (= α1) of its value at time (t − 1) plus a random shock at time t; again the Y values are
expressed around their mean values.
A moving average process of order q, an MA(q) process, is defined as yt = εt + θ1εt-1 + θ2εt-2 + … + θqεt-q. In short, a moving average process is simply a linear combination of white noise error terms.
 It is quite likely that Y has characteristics of both AR and MA and is therefore ARMA. Obviously, it
is possible to combine the autoregressive and moving average specification into an ARMA( p, q)
model, which consists of an AR part of order p and an MA part of order q: yt = α1yt-1 + … + αpyt-p + εt + θ1εt-1 + … + θqεt-q.
 There are no fundamental differences between autoregressive and moving average processes. The
choice is simply a matter of parsimony.
 If we have to difference a time series d times to make it stationary and then apply the ARMA (p, q)
model to it, we say that the original time series is ARIMA (p, d, q), that is, it is an autoregressive
integrated moving average time series, where p denotes the number of autoregressive terms, d the
number of times the series has to be differenced before it becomes stationary, and q the number of
moving average terms.
iv. The tentative model is then estimated.
v. The residuals from this tentative model are examined to find out if they are white noise. If
they are, the tentative model is probably a good approximation to the underlying stochastic
process. If they are not, the process is started all over again. Therefore, the Box–Jenkins
method is iterative.
vi. The model finally selected can be used for forecasting.
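A compact sketch of the Box–Jenkins cycle using statsmodels (the data, orders, and lag choices are illustrative assumptions): identify with the ACF/PACF of the differenced series, estimate a tentative ARIMA(p, d, q), then check the residuals for whiteness with a Ljung–Box test.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(6)
# Simulated ARIMA(1,1,0) data: the first difference follows an AR(1)
dy = np.zeros(400)
eps = rng.standard_normal(400)
for t in range(1, 400):
    dy[t] = 0.6 * dy[t - 1] + eps[t]
y = np.cumsum(dy)

# Step i-iii: identification on the differenced (stationary) series
print(acf(np.diff(y), nlags=5))
print(pacf(np.diff(y), nlags=5))

# Step iv: estimate the tentative model
model = ARIMA(y, order=(1, 1, 0)).fit()
print(model.summary())

# Step v: residual diagnostics; white-noise residuals support the tentative model
print(acorr_ljungbox(model.resid, lags=[10]))
```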

2.6 Multivariate Time Series Analysis

VAR

 According to Sims, if there is true simultaneity among a set of variables, they should all be treated on
an equal footing; there should not be any a priori distinction between endogenous and exogenous
variables. It is in this spirit that Sims developed his VAR model.
 It is a truly simultaneous system in that all variables are regarded as endogenous.

 The term autoregressive is due to the appearance of the lagged value of the dependent variable on the
right-hand side and the term vector is due to the fact that we are dealing with a vector of two (or
more) variables.
 In VAR modeling the value of a variable is expressed as a linear function of the past, or lagged,
values of that variable and all other variables included in the model.
 If each equation contains the same number of lagged variables in the system, it can be estimated by
OLS.

Yt = δ1 + ∑αiYt-i + ∑βjXt-j + ut

Xt = δ2 + ∑λiYt-i + ∑γjXt-j + vt

Where ut and vt are uncorrelated stochastic error terms and the sums run over lags 1, …, k.


 Before we estimate the above model, we have to decide on the maximum lag length, k.
 One way of deciding this question is to use a criterion like the Akaike or Schwarz information
criteria and choose the model that gives the lowest value of these criteria (smallest prediction errors). There
is no question that some trial and error is inevitable.
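The sketch below shows this lag-selection step with statsmodels' VAR on simulated two-variable data; the coefficients and the maximum lag considered are illustrative assumptions.

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(7)
T = 300
data = np.zeros((T, 2))
for t in range(1, T):
    data[t, 0] = 0.5 * data[t - 1, 0] + 0.2 * data[t - 1, 1] + rng.standard_normal()
    data[t, 1] = 0.3 * data[t - 1, 0] + 0.4 * data[t - 1, 1] + rng.standard_normal()

model = VAR(data)
print(model.select_order(maxlags=8).summary())   # AIC, BIC (Schwarz), HQIC, FPE by lag length
results = model.fit(maxlags=8, ic="aic")         # each equation is estimated by OLS
print(results.summary())
```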

2.7 Cointegration Analysis

 Cointegration means that despite being individually non-stationary, a linear combination of two or
more time series can be stationary.
 Cointegration of two (or more) time series suggests that there is a long-run, or equilibrium,
relationship between them.

2.7.1 Engle-Granger Test

Note that the EG test is based on a static (levels) regression.


 Be aware that the issue of efficient estimation of parameters in cointegrating relationships is quite a
different issue from the issue of testing for cointegration.
 Assume personal consumption expenditure (PCE) and personal disposable income (PDI) are
individually I(1) variables and we regress PCE on PDI.
PCEt = β1 + β2PDIt + ut
a) Estimate this regression by OLS and obtain the residuals: ût = PCEt − β̂1 − β̂2PDIt
b) Perform a unit root test on the residuals ût

The null hypothesis in the Engle-Granger procedure is no-cointegration and the alternative is
cointegration. If the test shows that ût is stationary [or I(0)], it means that the linear combination of PCE
and PDI is stationary. If you take consumption and income as two I(1) variables, savings defined as
(income − consumption) could be I(0) and the initial equation is meaningful. In this case we say that the
two variables are cointegrated. If PCE and PDI are not cointegrated, any linear combination of them will
be non-stationary and, therefore, the ut will also be non-stationary.
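A minimal sketch of the Engle–Granger two-step procedure on simulated cointegrated series (the variable names mimic PCE/PDI but are not the notes' data): estimate the static regression by OLS, then test the residuals for a unit root. statsmodels' coint() wraps the same idea with appropriate critical values.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, coint

rng = np.random.default_rng(8)
T = 400
pdi = np.cumsum(rng.standard_normal(T))             # an I(1) "income" series
pce = 0.9 * pdi + rng.standard_normal(T)            # cointegrated "consumption" series

step1 = sm.OLS(pce, sm.add_constant(pdi)).fit()     # static (long-run) regression
resid_tau = adfuller(step1.resid, regression="n")[0]
print(resid_tau)   # compare with Engle-Granger/MacKinnon cointegration critical values, not standard DF ones

print(coint(pce, pdi))   # (tau statistic, p-value, critical values) for the no-cointegration null
```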

E.g. Regressing PCE on PDI by OLS yields an estimated long-run (static) consumption function with a slope coefficient of 0.9672.

Since PCE and PDI are individually non-stationary, there is the possibility that this regression is spurious.

A unit root (DF) test on the residuals from this regression yields a computed τ statistic that exceeds, in absolute value, the Engle–Granger 1% critical τ value of −2.5899, so the residuals are I(0). Thus, this regression is not spurious; we call it the static or long-run consumption function, and 0.9672 represents the long-run, or equilibrium, marginal propensity to consume (MPC).

 We just showed that PCE and PDI are cointegrated; that is, there is a long-term relationship between
the two. Of course, in the short run there may be disequilibrium.
 The Granger representation theorem, states that if two variables Y and X are cointegrated, then the
relationship between the two can be expressed as an error correction model (ECM):
∆PCEt = α0 + α1∆PDIt + α2ut-1 + εt, where ut-1 = PCEt-1 − β1 − β2PDIt-1 is the lagged equilibrium error.
This ECM equation states that ∆PCE depends on ∆PDI and also on the equilibrium error term. If the latter is nonzero, then the model is out of equilibrium. If ∆PDI is zero and ut-1 is negative (i.e., PCE is below its equilibrium value), α2ut-1 will be positive (as α2 is expected to be negative), which will cause ∆PCEt to be positive, leading PCEt to rise in period t.
When this ECM is estimated for the PCE–PDI data, the following results are obtained.

Statistically, the equilibrium error term is zero, suggesting that PCE adjusts to changes in PDI in the
same time period (automatically). One can interpret 0.2906 as the short-run marginal propensity to
consume (MPC).
 The error correction mechanism (ECM) developed by Engle and Granger is a means of reconciling
the short-run behavior of an economic variable with its long-run behavior.The ECM links the long-
run equilibrium relationship implied by cointegration with the short-run dynamic adjustment
mechanism that describes how the variables react when they move out of long-run equilibrium.
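A minimal ECM sketch on simulated cointegrated data (illustrative names and parameters, not the notes' PCE–PDI series): the change in the dependent variable is regressed on the change in the regressor and the lagged equilibrium error from the static regression.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
T = 400
pdi = np.cumsum(rng.standard_normal(T))          # I(1) "income"
pce = 0.9 * pdi + rng.standard_normal(T)         # cointegrated "consumption"

static = sm.OLS(pce, sm.add_constant(pdi)).fit() # long-run (static) relationship
u_hat = static.resid                             # equilibrium error

X = sm.add_constant(np.column_stack([np.diff(pdi), u_hat[:-1]]))
ecm = sm.OLS(np.diff(pce), X).fit()
print(ecm.params)   # [constant, short-run MPC, adjustment coefficient (expected negative)]
```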

2.8 Granger Causality Test

Assume the following two-variable VAR system (as in Section 2.6):

Yt = δ1 + ∑αiYt-i + ∑βjXt-j + u1t
Xt = δ2 + ∑λiYt-i + ∑γjXt-j + u2t

Given this system, we can test the following null hypotheses about the sums of the lag coefficients:

a) Unidirectional causality: from X to Y if ∑βj ≠ 0 and ∑λi = 0, or from Y to X if ∑βj = 0 and ∑λi ≠ 0.
b) Bidirectional causality (feedback): if ∑βj ≠ 0 and ∑λi ≠ 0.
c) No causality (independence): if ∑βj = 0 and ∑λi = 0.
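These hypotheses are usually checked with F tests on the lagged terms. The sketch below uses statsmodels' grangercausalitytests on simulated data in which X helps predict Y but not vice versa; the coefficients and lag length are illustrative assumptions.

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(10)
T = 300
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.4 * y[t - 1] + 0.3 * x[t - 1] + rng.standard_normal()   # lagged x helps predict y

# The routine tests whether the SECOND column Granger-causes the FIRST column
data = np.column_stack([y, x])
grangercausalitytests(data, maxlag=2)   # prints F and chi-square tests for each lag order
```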

Critics of VAR

The VAR model is atheoretic because it uses little prior (theoretical) information.


 Because of its emphasis on forecasting, VAR models are less suited for policy analysis.
 Challenge in choosing the appropriate lag length.
 In an m-variable VAR model, all the m variables should be (jointly) stationary.

2.9 Diagnostic Tests

Lucas Critique: the estimated parameters are not invariant in the presence of policy changes; that is, the parameters estimated from an econometric model depend on the policy prevailing at the time the model was estimated and will change if there is a policy change.