

## CHAPTER TWENTY-ONE: TIME SERIES ECONOMETRICS: SOME BASIC CONCEPTS

21.2 Key Concepts: Page 796
1. Stochastic process
2. Stationary process
3. Purely random process
4. Nonstationary process
5. Integrated variables
6. Random walk models
7. Cointegration
8. Deterministic and stochastic trends
21.3. What is the meaning of a unit root?
21.4. If a time series is I(3), how many times would you have to difference it to make it stationary?
21.5. What are Dickey-Fuller (DF) and augmented DF tests?
21.6. What are Engle-Granger (EG) and augmented EG tests?
21.7. What is the meaning of cointegration?
21.8. What is the difference, if any, between tests of unit roots and tests of cointegration?
21.9. What is spurious regression?
21.10. What is the connection between cointegration and spurious regression?
21.11. What is the difference between a deterministic trend and a stochastic trend?
21.12. What is meant by a trend-stationary process (TSP) and a difference-stationary process (DSP)?
21.13. What is a random walk (model)?
21.14. For a random walk stochastic process, the variance is infinite. Do you agree? Why?
21.15. What is the error correction mechanism (ECM)? What is its relation with cointegration?
Problems.

21.16. Using the data given in Table 21.1, obtain sample correlograms up to 25 lags for the time series
PCE, PDI, Profits, and Dividends. What general pattern do you see? Intuitively, which one(s) of these
time series seem to be stationary?
21.17. For each of the time series of exercise 21.16, use the DF test to find out if these series contain a
unit root. If a unit root exists, how would you characterize such a time series?
21.18. Continue with exercise 21.17. How would you decide if the ADF test is more appropriate than the
DF test?
21.19. Consider the dividends and profits time series given in Table 21.1. Since dividends depend on
profits, consider the following simple model: $\text{Dividends}_t = \beta_1 + \beta_2\,\text{Profits}_t + u_t$
a. Would you expect this regression to suffer from the spurious regression phenomenon? Why?
b. Are the Dividends and Profits time series cointegrated? How do you test for this explicitly? If, after
testing, you find that they are cointegrated, would your answer in (a) change?
c. Employ the error correction mechanism (ECM) to study the short- and long-run behavior of dividends
in relation to profits.
d. If you examine the Dividends and Profits series individually, do they
exhibit stochastic or deterministic trends? What tests do you use?
*e. Assume Dividends and Profits are cointegrated. Then, instead of regressing dividends on profits, you
regress profits on dividends. Is
such a regression valid?

Stationary Stochastic Processes
A type of stochastic process that has received a great deal of attention and scrutiny by time series
analysts is the so-called stationary stochastic process. Broadly speaking, a stochastic process is said to be
stationary if its mean and variance are constant over time and the value of the covariance between the
two time periods depends only on the distance or gap or lag between the two time periods and not the
actual time at which the covariance is computed. In the time series literature, such a stochastic process
is known as a weakly stationary, or covariance stationary, or second-order stationary, or wide-sense
stationary, stochastic process. For the purpose of this chapter, and in most practical situations, this type of
stationarity often suffices. To explain weak stationarity, let $Y_t$ be a stochastic time series with these
properties:
Mean: $E(Y_t) = \mu$ (21.3.1)
Variance: $\operatorname{var}(Y_t) = E(Y_t - \mu)^2 = \sigma^2$ (21.3.2)
Covariance: $\gamma_k = E[(Y_t - \mu)(Y_{t+k} - \mu)]$ (21.3.3)
where $\gamma_k$, the covariance (or autocovariance) at lag k, is the covariance
between the values of $Y_t$ and $Y_{t+k}$, that is, between two Y values k periods
apart. If k = 0, we obtain $\gamma_0$, which is simply the variance of Y ($= \sigma^2$); if
k = 1, $\gamma_1$ is the covariance between two adjacent values of Y, the type of covariance we encountered in
Chapter 12 (recall the Markov first-order autoregressive scheme). Suppose we shift the origin of Y from
$Y_t$ to $Y_{t+m}$ (say, from the first quarter of 1970 to the first quarter of 1975 for our GDP data). Now if $Y_t$ is
to be stationary, the mean, variance, and autocovariances of $Y_{t+m}$ must be the same as those of $Y_t$.

Nonstationary Stochastic Processes
Although our interest is in stationary time series, one often encounters non-stationary time series, the
classic example being the random walk model (RWM). It is often said that asset prices, such as stock
prices or exchange rates, follow a random walk; that is, they are nonstationary. We distinguish two
types of random walks: (1) random walk without drift (i.e., no constant or intercept term) and (2)
random walk with drift (i.e., a constant term is present).

Random Walk without Drift. Suppose $u_t$ is a white noise error term with mean 0 and variance $\sigma^2$. Then
the series $Y_t$ is said to be a random walk if
$Y_t = Y_{t-1} + u_t$ (21.3.4)
In the random walk model, as (21.3.4) shows, the value of Y at time t is equal to its value at time $(t - 1)$
plus a random shock; thus it is an AR(1) model in the language of Chapters 12 and 17. We can think of
(21.3.4) as a regression of Y at time t on its value lagged one period. Believers in the efficient capital
market hypothesis argue that stock prices are essentially random and therefore there is no scope for
profitable speculation in the stock market: If one could predict tomorrow's price on the basis of today's
price, we would all be millionaires.
Now from (21.3.4) we can write
$Y_1 = Y_0 + u_1$
$Y_2 = Y_1 + u_2 = Y_0 + u_1 + u_2$
$Y_3 = Y_2 + u_3 = Y_0 + u_1 + u_2 + u_3$
In general, if the process started at some time 0 with a value of $Y_0$, we have
$Y_t = Y_0 + \sum_{i=1}^{t} u_i$ (21.3.5)
Therefore,
$E(Y_t) = E\left(Y_0 + \sum_{i=1}^{t} u_i\right) = Y_0$ (21.3.6)
In like fashion, it can be shown that
$\operatorname{var}(Y_t) = t\sigma^2$ (21.3.7)
As the preceding expression shows, the mean of Y is equal to its initial, or starting, value, which is
constant, but as t increases, its variance increases indefinitely, thus violating a condition of stationarity.
In short, the RWM without drift is a nonstationary stochastic process. In practice $Y_0$ is often set
at zero, in which case $E(Y_t) = 0$. An interesting feature of the RWM is the persistence of random shocks (i.e.,
random errors), which is clear from (21.3.5): $Y_t$ is the sum of the initial $Y_0$ plus the sum of random shocks. As
a result, the impact of a particular shock does not die away. For example, if $u_2 = 2$ rather than $u_2 = 0$,
then all $Y_t$'s from $Y_2$ onward will be 2 units higher and the effect of this shock never dies out. That is
why a random walk is said to have an infinite memory. As Kerry Patterson notes, a random walk
remembers the shock forever.
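The two properties just described, a variance that grows in proportion to t and a shock that never dies out, can be checked with a small simulation. The sketch below is only illustrative: the function names, the seed, and the number of replications are assumptions made for this example, not part of the original text.

```python
import random
import statistics

def random_walk(shocks, y0=0.0):
    """Return [Y_1, ..., Y_t] with Y_t = Y_{t-1} + u_t, as in (21.3.4)."""
    path, level = [], y0
    for u in shocks:
        level += u
        path.append(level)
    return path

def variance_at(t, replications=2000, sigma=1.0, seed=0):
    """Estimate var(Y_t) across many independent simulated walks (Y_0 = 0)."""
    rng = random.Random(seed)
    finals = []
    for _ in range(replications):
        shocks = [rng.gauss(0.0, sigma) for _ in range(t)]
        finals.append(random_walk(shocks)[-1])
    return statistics.pvariance(finals)

# Persistence of a shock: change u_2 from 0 to 2 and compare the two paths.
base = random_walk([1.0, 0.0, -1.0, 0.5])
shocked = random_walk([1.0, 2.0, -1.0, 0.5])
gaps = [b - a for a, b in zip(base, shocked)]  # 0 at t = 1, then 2 from t = 2 onward

print(gaps)
for t in (25, 100):
    # The estimate should be close to t * sigma^2, up to simulation noise.
    print(t, round(variance_at(t), 1))
```

The gap between the two paths stays at 2 from the second period onward, which is the infinite-memory property; the simulated variances should come out near 25 and 100, in line with (21.3.7).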

Random Walk with Drift. Let us modify (21.3.4) as follows:
$Y_t = \delta + Y_{t-1} + u_t$ (21.3.9)
where $\delta$ is known as the drift parameter. The name drift comes from the
fact that if we write the preceding equation as
$Y_t - Y_{t-1} = \Delta Y_t = \delta + u_t$ (21.3.10)
it shows that $Y_t$ drifts upward or downward, depending on $\delta$ being positive or negative. Note that
model (21.3.9) is also an AR(1) model.
Following the procedure discussed for random walk without drift, it can be shown that for the random
walk with drift model (21.3.9),
$E(Y_t) = Y_0 + t\delta$ (21.3.11)
$\operatorname{var}(Y_t) = t\sigma^2$ (21.3.12)
As you can see, for the RWM with drift the mean as well as the variance increases over time, again violating
the conditions of (weak) stationarity. In short, the RWM, with or without drift, is a nonstationary stochastic
process.
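The "it can be shown" step for the drift case follows the same recursive substitution used for the driftless walk. A short derivation, assuming $Y_0$ is a fixed (nonrandom) starting value and the shocks $u_i$ are mutually uncorrelated with mean 0 and variance $\sigma^2$:

```latex
\begin{aligned}
Y_t &= \delta + Y_{t-1} + u_t
     = Y_0 + t\,\delta + \sum_{i=1}^{t} u_i, \\
E(Y_t) &= Y_0 + t\,\delta + \sum_{i=1}^{t} E(u_i) = Y_0 + t\,\delta, \\
\operatorname{var}(Y_t) &= \sum_{i=1}^{t} \operatorname{var}(u_i) = t\,\sigma^2 .
\end{aligned}
```

The drift term shifts only the mean; the variance is the same as in the driftless case because $Y_0 + t\delta$ is nonrandom.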

To give a glimpse of the random walk with and without drift, we conducted two simulations as follows:
$Y_t = Y_0 + u_t$ (21.3.13)
where $u_t$ are white noise error terms such that each $u_t \sim N(0, 1)$; that is, each $u_t$ follows the standard
normal distribution. From a random number generator, we obtained 500 values of u and generated $Y_t$
as shown in (21.3.13). We assumed $Y_0 = 0$. Thus, (21.3.13) is an RWM without drift.
Now consider
$Y_t = \delta + Y_0 + u_t$ (21.3.14)
21.4 UNIT ROOT STOCHASTIC PROCESS
Let us write the RWM (21.3.4) as:
$Y_t = \rho Y_{t-1} + u_t \qquad -1 \le \rho \le 1$ (21.4.1)
This model resembles the Markov first-order autoregressive model that we discussed in the chapter on
autocorrelation. If $\rho = 1$, (21.4.1) becomes a RWM (without drift). If $\rho$ is in fact 1, we face what is known
as the unit root problem, that is, a situation of nonstationarity; we already know that in this case the
variance of $Y_t$ is not stationary. The name unit root is due to the fact that $\rho = 1$. Thus the terms
nonstationarity, random walk, and unit root can be treated as synonymous.
If, however, $|\rho| < 1$, that is, if the absolute value of $\rho$ is less than one, then it can be shown that the time
series $Y_t$ is stationary in the sense we have defined it. In practice, then, it is important to find out if a
time series possesses a unit root.
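One way to see why the value of the autoregressive coefficient matters is to trace the effect of a single one-unit shock through (21.4.1). The sketch below is illustrative only; the function names are made up for this example, and the long-run variance formula $\sigma^2/(1 - \rho^2)$ is the standard result for a stationary AR(1) process rather than something derived in the text above.

```python
def impulse_response(rho, horizon):
    """Effect on Y_{t+k}, k = 0..horizon, of a one-unit shock u_t
    in Y_t = rho * Y_{t-1} + u_t (equation 21.4.1)."""
    return [rho ** k for k in range(horizon + 1)]

def ar1_variance(rho, sigma2=1.0):
    """Long-run variance sigma^2 / (1 - rho^2); finite only when |rho| < 1."""
    if abs(rho) >= 1:
        raise ValueError("variance is not finite when |rho| >= 1")
    return sigma2 / (1.0 - rho ** 2)

unit_root = impulse_response(1.0, 4)    # the shock never decays
stationary = impulse_response(0.5, 4)   # the shock halves every period

print(unit_root)
print(stationary)
print(ar1_variance(0.5))
```

With a coefficient of 1 the shock is carried forward at full size forever, whereas with a coefficient of 0.5 it fades geometrically, which is why the stationary case has a finite, constant variance.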

21.5 TREND STATIONARY (TS) AND DIFFERENCE STATIONARY (DS) STOCHASTIC PROCESSES
The distinction between stationary and nonstationary stochastic processes (or time series) has a crucial
bearing on whether the trend (the slow long-run evolution of the time series under consideration)
observed in the constructed time series in Figures 21.3 and 21.4 or in the actual economic time series of
Figures 21.1 and 21.2 is deterministic or stochastic. Broadly speaking, if the trend in a time series is
completely predictable and not variable, we call it a deterministic trend, whereas if it is not predictable,
we call it a stochastic trend. To make the definition more formal, consider the following model of the
time series $Y_t$:
$Y_t = \beta_1 + \beta_2 t + \beta_3 Y_{t-1} + u_t$ (21.5.1)
Now we have the following possibilities:
Pure random walk: If in (21.5.1) $\beta_1 = 0$, $\beta_2 = 0$, $\beta_3 = 1$, we get
$Y_t = Y_{t-1} + u_t$ (21.5.2)
which is nothing but a RWM without drift and is therefore nonstationary.
But note that, if we write (21.5.2) as
$\Delta Y_t = (Y_t - Y_{t-1}) = u_t$ (21.3.8)
it becomes stationary, as noted before. Hence, a RWM without drift is a
difference stationary process (DSP).

Random walk with drift: If in (21.5.1) $\beta_1 \ne 0$, $\beta_2 = 0$, $\beta_3 = 1$, we get
$Y_t = \beta_1 + Y_{t-1} + u_t$ (21.5.3)
which is a random walk with drift and is therefore nonstationary. If we write
it as
$(Y_t - Y_{t-1}) = \Delta Y_t = \beta_1 + u_t$ (21.5.3a)
this means $Y_t$ will exhibit a positive ($\beta_1 > 0$) or negative ($\beta_1 < 0$) trend (see
Figure 21.4). Such a trend is called a stochastic trend. Equation (21.5.3a) is a DSP because the
nonstationarity in $Y_t$ can be eliminated by taking first differences of the time series.

Deterministic trend: If in (21.5.1) $\beta_1 \ne 0$, $\beta_2 \ne 0$, $\beta_3 = 0$, we obtain
$Y_t = \beta_1 + \beta_2 t + u_t$ (21.5.4)
which is called a trend stationary process (TSP). Although the mean of $Y_t$ is $\beta_1 + \beta_2 t$, which is not
constant, its variance ($= \sigma^2$) is. Once the values of $\beta_1$ and $\beta_2$ are known, the mean can be forecast
perfectly. Therefore, if we subtract the mean of $Y_t$ from $Y_t$, the resulting series will be stationary, hence
the name trend stationary. This procedure of removing the (deterministic) trend is called detrending.

Random walk with drift and deterministic trend: If in (21.5.1) $\beta_1 \ne 0$, $\beta_2 \ne 0$, $\beta_3 = 1$, we obtain
$Y_t = \beta_1 + \beta_2 t + Y_{t-1} + u_t$ (21.5.5)
that is, a random walk with drift and a deterministic trend, which can be
seen if we write this equation as
$\Delta Y_t = \beta_1 + \beta_2 t + u_t$ (21.5.5a)
which means that $Y_t$ is nonstationary.
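The practical difference between the two kinds of process is the transformation that removes the trend: subtracting a fitted trend line for a TSP, taking first differences for a DSP. A minimal sketch of both operations follows, using a noiseless trend line so the results are exact; the helper names are hypothetical and the least-squares fit is written out by hand rather than taken from any library.

```python
def ols_line(y):
    """Fit y_t = b1 + b2 * t by least squares, with t = 1, ..., n."""
    n = len(y)
    t = list(range(1, n + 1))
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    b2 = sxy / sxx
    b1 = y_bar - b2 * t_bar
    return b1, b2

def detrend(y):
    """Residuals from the fitted linear trend (the TSP transformation)."""
    b1, b2 = ols_line(y)
    return [yi - (b1 + b2 * ti) for ti, yi in enumerate(y, start=1)]

def first_difference(y):
    """Delta y_t = y_t - y_{t-1} (the DSP transformation)."""
    return [b - a for a, b in zip(y, y[1:])]

# A noiseless trend line y_t = 2 + 0.5 t for t = 1..6.
y = [2 + 0.5 * t for t in range(1, 7)]
b1, b2 = ols_line(y)
residuals = detrend(y)          # all zero: the trend explains everything
diffs = first_difference(y)     # all equal to the slope 0.5

print(b1, b2)
print(residuals)
print(diffs)
```

On real data the choice matters: detrending a series that actually has a unit root, or differencing one that is trend stationary, applies the wrong transformation, which is one reason the unit root tests later in the chapter are run before choosing.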

21.6 INTEGRATED STOCHASTIC PROCESSES
The random walk model is but a specific case of a more general class of stochastic processes known as
integrated processes. Recall that the RWM without drift is nonstationary, but its first difference, as
shown in (21.3.8), is stationary. Therefore, we call the RWM without drift integrated of order 1,
denoted as I(1).
Similarly, if a time series has to be differenced twice (i.e., take the first difference of the first differences)
to make it stationary, we call such a time series integrated of order 2. In general, if a (nonstationary)
time series has to be differenced d times to make it stationary, that time series is said to be integrated
of order d. A time series $Y_t$ integrated of order d is denoted as $Y_t \sim I(d)$. If a time series $Y_t$ is stationary to
begin with (i.e., it does not require any differencing), it is said to be integrated of order zero, denoted by
$Y_t \sim I(0)$. Thus, we will use the terms stationary time series and time series integrated of order zero
to mean the same thing.
Most economic time series are generally I(1); that is, they generally become stationary only after taking
their first differences.
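The relationship between the order of integration and the number of differences can be made concrete by building a series through repeated cumulative summation and then undoing it. This is a sketch under simplifying assumptions: a short list of fixed numbers stands in for a stationary I(0) series, and the function names are illustrative.

```python
from itertools import accumulate

def difference(series, d=1):
    """Apply the first-difference operator d times."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def integrate(series, d=1):
    """Cumulatively sum a series d times (the inverse of differencing)."""
    for _ in range(d):
        series = list(accumulate(series))
    return series

shocks = [1, -2, 3, 0, -1, 2]      # stands in for a stationary I(0) series
i1 = integrate(shocks, 1)          # summed once: behaves like an I(1) series
i2 = integrate(shocks, 2)          # summed twice: behaves like an I(2) series

print(i1, difference(i1, 1))
print(i2, difference(i2, 2))
```

Differencing the once-summed series once, or the twice-summed series twice, recovers the original shocks (minus the observations lost at the start), which is exactly what "integrated of order d" means. By the same logic, an I(3) series needs three rounds of differencing.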

21.7 THE PHENOMENON OF SPURIOUS REGRESSION
To see why stationary time series are so important, consider the following two
random walk models:
$Y_t = Y_{t-1} + u_t$ (21.7.1)
$X_t = X_{t-1} + v_t$ (21.7.2)
where we generated 500 observations of $u_t$ from $u_t \sim N(0, 1)$ and 500 observations of $v_t$ from $v_t \sim N(0, 1)$
and assumed that the initial values of both Y and X were zero. We also assumed that $u_t$ and $v_t$ are
serially uncorrelated as well as mutually uncorrelated. As you know by now, both these time series
are nonstationary; that is, they are I(1) or exhibit stochastic trends. Suppose we regress $Y_t$ on $X_t$. Since
$Y_t$ and $X_t$ are uncorrelated I(1) processes, the $R^2$ from the regression of Y on X should tend to zero; that
is, there should not be any relationship between the two variables. But wait till you see the regression
results:


Variable   Coefficient   Std. error   t statistic
C          -13.2556      0.6203       -21.36856
X            0.3376      0.0443         7.61223
$R^2$ = 0.1044    d = 0.0121    (d = Durbin-Watson statistic)
As you can see, the coefficient of X is highly statistically significant, and, although the $R^2$ value is low, it
is statistically significantly different from zero. From these results, you may be tempted to conclude that
there is a significant statistical relationship between Y and X, whereas a priori there should be none. This
is in a nutshell the phenomenon of spurious or nonsense regression, first discovered by Yule. Yule
showed that (spurious) correlation could persist in nonstationary time series even if the sample is very
large. That there is something wrong in the preceding regression is suggested by the extremely low
Durbin-Watson d value, which suggests very strong first-order autocorrelation. According to Granger
and Newbold, an $R^2 > d$ is a good rule of thumb to suspect that the estimated regression is spurious, as
in the example above.
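The experiment can be repeated in a few lines. The sketch below generates two independent random walks of 500 observations each and regresses one on the other by ordinary least squares; the seed, the function names, and the hand-written OLS formulas are choices made for this example, so the numbers will not match the table above. What should carry over is the pattern: a Durbin-Watson d far below 2, often alongside a t ratio that looks significant.

```python
import random

def random_walk(n, rng):
    """Generate n observations of Y_t = Y_{t-1} + u_t with u_t ~ N(0, 1), Y_0 = 0."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def simple_ols(y, x):
    """Regress y on a constant and x; return slope, t ratio, R^2 and Durbin-Watson d."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    rss = sum(e * e for e in resid)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    se_slope = (rss / (n - 2) / sxx) ** 0.5
    # Durbin-Watson: squared changes in the residuals relative to their squared level.
    dw = sum((b - a) ** 2 for a, b in zip(resid, resid[1:])) / rss
    return {"slope": slope, "t": slope / se_slope, "r2": 1 - rss / tss, "d": dw}

rng = random.Random(42)
y = random_walk(500, rng)   # the two walks share a generator but use different draws
x = random_walk(500, rng)
result = simple_ols(y, x)
print(result)
```

Rerunning with different seeds gives different slopes and $R^2$ values, but the d statistic stays close to zero, which is the warning sign the Granger-Newbold rule of thumb relies on.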
21.8 TESTS OF STATIONARITY
By now the reader probably has a good idea about the nature of stationary stochastic processes and
their importance. In practice we face two important questions: (1) How do we find out if a given time
series is stationary? (2) If we find that a given time series is not stationary, is there a way that it can be
made stationary? We take up the first question in this section and discuss the second question in
Section 21.10. Before we proceed, keep in mind that we are primarily concerned with weak, or
covariance, stationarity. Although there are several tests of stationarity, we discuss only those that are
prominently discussed in the literature. In this section we discuss two tests: (1) graphical analysis and
(2) the correlogram test. Because of the importance attached to it in the recent past, we discuss the unit
root test in the next section. We illustrate these tests with appropriate examples.
1. Graphical Analysis
As noted earlier, before one pursues formal tests, it is always advisable to plot the time series under
study, as we have done in Figures 21.1 and 21.2 for the data given in Table 21.1. Such a plot gives an
initial clue about the likely nature of the time series. Take, for instance, the GDP time series
shown in Figure 21.1. You will see that over the period of study GDP has been increasing, that is,
showing an upward trend, suggesting perhaps that the mean of the GDP has been changing. This
perhaps suggests that the GDP series is not stationary. This is also more or less true of the other U.S.
economic time series shown in Figure 21.2. Such an intuitive feel is the starting point of more formal
tests of stationarity.

2. Autocorrelation Function (ACF) and Correlogram
One simple test of stationarity is based on the so-called autocorrelation function (ACF). The ACF at lag k,
denoted by $\rho_k$, is defined as
$\rho_k = \gamma_k / \gamma_0$ (21.8.1)
= covariance at lag k / variance
where covariance at lag k and variance are as defined before. Note that if
k = 0, $\rho_0 = 1$ (why?)
Since both covariance and variance are measured in the same units of measurement, $\rho_k$ is a unitless, or
pure, number. It lies between $-1$ and $+1$, as any correlation coefficient does. If we plot $\rho_k$ against k, the
graph we obtain is known as the population correlogram. Since in practice we only have a realization
(i.e., sample) of a stochastic process, we can only compute the sample autocorrelation function (SACF),
$\hat{\rho}_k$. To compute this, we must first compute the sample covariance at lag k, $\hat{\gamma}_k$,
and the sample variance, $\hat{\gamma}_0$, which are defined as
$\hat{\gamma}_k = \sum (Y_t - \bar{Y})(Y_{t+k} - \bar{Y}) / n$ (21.8.2)
$\hat{\gamma}_0 = \sum (Y_t - \bar{Y})^2 / n$ (21.8.3)
where n is the sample size and $\bar{Y}$ is the sample mean. Therefore, the sample autocorrelation function
at lag k is
$\hat{\rho}_k = \hat{\gamma}_k / \hat{\gamma}_0$ (21.8.4)
which is simply the ratio of sample covariance (at lag k) to sample variance. A plot of $\hat{\rho}_k$ against k is
known as the sample correlogram.
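The sample autocorrelation function is straightforward to compute by hand. The sketch below assumes the divisor n for both the sample covariance and the sample variance (some texts use n - k for the covariance instead), and the function name is chosen for this example only.

```python
def sample_acf(y, max_lag):
    """Sample autocorrelations for lags 0..max_lag.

    gamma_hat_k = sum_t (Y_t - Ybar)(Y_{t+k} - Ybar) / n
    rho_hat_k   = gamma_hat_k / gamma_hat_0
    """
    n = len(y)
    y_bar = sum(y) / n
    dev = [v - y_bar for v in y]
    gamma0 = sum(d * d for d in dev) / n
    acf = []
    for k in range(max_lag + 1):
        # Only the n - k overlapping pairs enter the sum.
        gamma_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / n
        acf.append(gamma_k / gamma0)
    return acf

acf = sample_acf([1, 2, 3, 4, 5], 2)
print(acf)
```

For the five-point series the deviations from the mean are -2, -1, 0, 1, 2, so the sample variance is 2, the lag-1 covariance is 0.8, and the lag-1 autocorrelation is 0.4. Plotting the returned list against the lag gives the sample correlogram.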

How does a sample correlogram enable us to find out if a particular time series is stationary?

Figure: 21.8
Let us examine the correlogram of the GDP time series given in Table 21.1. The correlogram up to 25
lags is shown in Figure 21.8. The GDP correlogram up to 25 lags also shows a pattern similar to the
correlogram of the random walk model in Figure 21.7. The autocorrelation coefficient starts at a very
high value at lag 1 (0.969) and declines very slowly. Thus it seems that the GDP time series is
nonstationary. If you plot the correlograms of the other U.S. economic time series shown in Figures 21.1
and 21.2, you will also see a similar pattern, leading to the conclusion that all these time series are
nonstationary; they may be nonstationary in mean or variance or both.
Time series results: Data Table 21.1 Page 794

Included observations: 88

Autocorrelation Partial Correlation AC PAC Q-Stat Prob

. |******* . |******* 1 0.969 0.969 85.462 0.000
. |******* . | . | 2 0.935 -0.058 166.02 0.000
. |******| . | . | 3 0.901 -0.020 241.72 0.000
. |******| . | . | 4 0.866 -0.045 312.39 0.000
. |******| . | . | 5 0.830 -0.024 378.10 0.000
. |******| . | . | 6 0.791 -0.062 438.57 0.000
. |***** | . | . | 7 0.752 -0.029 493.85 0.000
. |***** | . | . | 8 0.713 -0.024 544.11 0.000
. |***** | . | . | 9 0.675 0.009 589.77 0.000
. |***** | . | . | 10 0.638 -0.010 631.12 0.000
. |**** | . | . | 11 0.601 -0.020 668.33 0.000
. |**** | . | . | 12 0.565 -0.012 701.65 0.000
. |**** | . | . | 13 0.532 0.020 731.56 0.000
. |**** | . | . | 14 0.500 -0.012 758.29 0.000
. |*** | . | . | 15 0.468 -0.021 782.02 0.000
. |*** | . | . | 16 0.437 -0.001 803.03 0.000
. |*** | . | . | 17 0.405 -0.041 821.35 0.000
. |*** | . | . | 18 0.375 -0.005 837.24 0.000
. |** | . | . | 19 0.344 -0.038 850.79 0.000
. |** | . | . | 20 0.313 -0.017 862.17 0.000
. |** | .*| . | 21 0.279 -0.066 871.39 0.000
. |** | . | . | 22 0.246 -0.019 878.65 0.000
. |** | . | . | 23 0.214 -0.008 884.22 0.000
. |*. | . | . | 24 0.182 -0.018 888.31 0.000
. |*. | . | . | 25 0.153 0.017 891.25 0.000
. |*. | . | . | 26 0.123 -0.024 893.19 0.000
. |*. | . | . | 27 0.095 -0.007 894.38 0.000
. | . | . | . | 28 0.068 -0.012 894.99 0.000
. | . | . | . | 29 0.043 -0.007 895.24 0.000
. | . | . | . | 30 0.019 -0.005 895.29 0.000
. | . | . | . | 31 -0.003 -0.002 895.29 0.000
. | . | . | . | 32 -0.026 -0.028 895.38 0.000
. | . | . | . | 33 -0.046 0.007 895.69 0.000
. | . | . | . | 34 -0.061 0.047 896.24 0.000
.*| . | . | . | 35 -0.075 0.004 897.08 0.000
.*| . | . | . | 36 -0.085 0.037 898.18 0.000

The Choice of Lag Length.
This is basically an empirical question. A rule of thumb is to compute the ACF up to one-third to one-quarter
the length of the time series. Since for our economic data we have 88 quarterly observations, by this
rule lags of 22 to 29 quarters will do. The best practical advice is to start with sufficiently large lags and
then reduce them by some statistical criterion, such as the Akaike or Schwarz information criterion that
we discussed in Chapter 13.
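The arithmetic behind the rule of thumb is simple enough to write down directly. In this small sketch the function name is hypothetical and rounding down is an assumption about how fractional lags are handled.

```python
def correlogram_lag_range(n_obs):
    """Return (low, high): one-quarter and one-third of the sample length, rounded down."""
    return n_obs // 4, n_obs // 3

low, high = correlogram_lag_range(88)
print(low, high)
```

For the 88 quarterly observations this gives the 22 to 29 lags quoted in the text.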

21.9 THE UNIT ROOT TEST
A test of stationarity (or nonstationarity) that has become widely popular over the past several years is
the unit root test. We will first explain it, then illustrate it, and then consider some limitations of this test.
The starting point is the unit root (stochastic) process that we discussed in Section 21.4. We start with
$Y_t = \rho Y_{t-1} + u_t \qquad -1 \le \rho \le 1$ (21.4.1)
where $u_t$ is a white noise error term.
We know that if $\rho = 1$, that is, in the case of the unit root, (21.4.1) becomes a random walk model
without drift, which we know is a nonstationary stochastic process. Therefore, why not simply regress
$Y_t$ on its (one-period) lagged value $Y_{t-1}$ and find out if the estimated $\rho$ is statistically equal to 1? If it is,
then $Y_t$ is nonstationary. This is the general idea behind the unit root test of stationarity.
For theoretical reasons, we manipulate (21.4.1) as follows:
Subtract $Y_{t-1}$ from both sides of (21.4.1) to obtain:
$Y_t - Y_{t-1} = \rho Y_{t-1} - Y_{t-1} + u_t$ (21.9.1)
$= (\rho - 1) Y_{t-1} + u_t$
which can be alternatively written as:
$\Delta Y_t = \delta Y_{t-1} + u_t$ (21.9.2)
where $\delta = (\rho - 1)$ and $\Delta$, as usual, is the first-difference operator. In practice, therefore, instead of
estimating (21.4.1), we estimate (21.9.2) and test the (null) hypothesis that $\delta = 0$. If $\delta = 0$, then $\rho = 1$,
that is, we have a unit root, meaning the time series under consideration is nonstationary.
Before we proceed to estimate (21.9.2), it may be noted that if $\delta = 0$, (21.9.2) will become
$\Delta Y_t = (Y_t - Y_{t-1}) = u_t$ (21.9.3)
Since $u_t$ is a white noise error term, it is stationary, which means that the first differences of a random
walk time series are stationary, a point we have already made before.
Now let us turn to the estimation of (21.9.2). This is simple enough; all we have to do is to take the first
differences of $Y_t$ and regress them on $Y_{t-1}$ and see if the estimated slope coefficient in this regression
($= \hat{\delta}$) is zero or not. If it is zero, we conclude that $Y_t$ is nonstationary. But if it is negative, we conclude
that $Y_t$ is stationary. The only question is which test we use to find out if the estimated coefficient of
$Y_{t-1}$ in (21.9.2) is zero or not.
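The regression just described can be sketched directly: difference the series, regress the differences on the lagged level without an intercept, and form the ratio of the estimated coefficient to its standard error. The code below is an illustration only; the function names, seeds, and sample sizes are assumptions, and the resulting ratio must be compared with Dickey-Fuller critical values, not with the usual t table.

```python
import random

def df_regression(y):
    """OLS of Delta Y_t on Y_{t-1} with no intercept, as in (21.9.2).

    Returns (delta_hat, tau), where tau = delta_hat / se(delta_hat)."""
    lagged = y[:-1]
    dy = [b - a for a, b in zip(y, y[1:])]
    sxx = sum(v * v for v in lagged)
    delta_hat = sum(v * d for v, d in zip(lagged, dy)) / sxx
    resid = [d - delta_hat * v for v, d in zip(lagged, dy)]
    s2 = sum(e * e for e in resid) / (len(dy) - 1)   # one estimated parameter
    return delta_hat, delta_hat / (s2 / sxx) ** 0.5

def simulate_ar1(rho, n, seed):
    """Generate Y_t = rho * Y_{t-1} + u_t with u_t ~ N(0, 1) and Y_0 = 0."""
    rng = random.Random(seed)
    level, path = 0.0, []
    for _ in range(n):
        level = rho * level + rng.gauss(0.0, 1.0)
        path.append(level)
    return path

delta_rw, tau_rw = df_regression(simulate_ar1(1.0, 500, seed=1))   # unit root
delta_st, tau_st = df_regression(simulate_ar1(0.5, 500, seed=1))   # stationary

print(delta_rw, tau_rw)
print(delta_st, tau_st)
```

For the simulated random walk the estimated coefficient sits near zero, while for the stationary series it is close to 0.5 - 1 = -0.5 with a large negative ratio.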

You might be tempted to say, why not use the usual t test? Unfortunately, under the null hypothesis
that $\delta = 0$ (i.e., $\rho = 1$), the t value of the estimated coefficient of $Y_{t-1}$ does not follow the t distribution
even in large samples; that is, it does not have an asymptotic normal distribution.

What is the alternative? Dickey and Fuller have shown that under the null hypothesis that $\delta = 0$, the
estimated t value of the coefficient of $Y_{t-1}$ in (21.9.2) follows the $\tau$ (tau) statistic. These authors
have computed the critical values of the tau statistic on the basis of Monte Carlo simulations. A sample
of these critical values is given in Appendix D, Table D.7. The table is limited, but MacKinnon has
prepared more extensive tables, which are now incorporated in several econometric packages. In the
literature the tau statistic or test is known as the Dickey-Fuller (DF) test, in honor of its discoverers.
Interestingly, if the hypothesis that $\delta = 0$ is rejected (i.e., the time series is stationary), we can use the
usual (Student's) t test.
The actual procedure of implementing the DF test involves several decisions. In discussing the nature of
the unit root process in Sections 21.4 and 21.5, we noted that a random walk process may have no drift,
or it may have drift, or it may have both deterministic and stochastic trends. To allow for the various
possibilities, the DF test is estimated in three different forms, that is, under three different null
hypotheses.
$Y_t$ is a random walk: $\Delta Y_t = \delta Y_{t-1} + u_t$ (21.9.2)
$Y_t$ is a random walk with drift: $\Delta Y_t = \beta_1 + \delta Y_{t-1} + u_t$ (21.9.4)
$Y_t$ is a random walk with drift around a deterministic trend: $\Delta Y_t = \beta_1 + \beta_2 t + \delta Y_{t-1} + u_t$ (21.9.5)
where t is the time or trend variable. In each case, the null hypothesis is that $\delta = 0$; that is, there is a unit
root and the time series is nonstationary. The alternative hypothesis is that $\delta$ is less than zero; that is, the
time series is stationary.


Dickey-Fuller Test:

Let us return to the U.S. GDP time series. For this series, the results of the three regressions (21.9.2),
(21.9.4), and (21.9.5) are as follows. The dependent variable in each case is $\Delta Y_t = \Delta \text{GDP}_t$.

$\widehat{\Delta \text{GDP}}_t = 0.00576\,\text{GDP}_{t-1}$ (21.9.6)
t = (5.7980)    $R^2$ = 0.0152    d = 1.34

$\widehat{\Delta \text{GDP}}_t = 28.2054 - 0.00136\,\text{GDP}_{t-1}$ (21.9.7)
t = (1.1576) (-0.2191)    $R^2$ = 0.00056    d = 1.35

$\widehat{\Delta \text{GDP}}_t = 190.3857 + 1.4776\,t - 0.0603\,\text{GDP}_{t-1}$ (21.9.8)
t = (1.8389) (1.6109) (-1.6252)
$R^2$ = 0.0305    d = 1.31
Our primary interest here is in the t ($= \tau$) value of the $\text{GDP}_{t-1}$ coefficient. The critical 1, 5, and 10
percent $\tau$ values for model (21.9.6) are -2.5897, -1.9439, and -1.6177, respectively, and are -3.5064,
-2.8947, and -2.5842 for model (21.9.7) and -4.0661, -3.4614, and -3.1567 for model (21.9.8).
As noted before, these critical values are different for the three models. Before we examine the results,
we have to decide which of the three models may be appropriate. We should rule out model (21.9.6)
because the coefficient of $\text{GDP}_{t-1}$, which is equal to $\delta$, is positive. But since $\delta = (\rho - 1)$, a positive $\delta$ would
imply that $\rho > 1$.

Although a theoretical possibility, we rule this case out because in this case the GDP time series would
be explosive. That leaves us with models (21.9.7) and (21.9.8). In both cases the estimated coefficient
$\delta$ is negative, implying that the estimated $\rho$ is less than 1. For these two models, the estimated $\rho$ values
are 0.9986 and 0.9397, respectively. The only question now is if these values are statistically significantly
below 1 for us to declare that the GDP time series is stationary. For model (21.9.7) the estimated $\tau$ value
is -0.2191, which in absolute value is below even the 10 percent critical value of -2.5842. Since, in
absolute terms, the former is smaller than the latter, our conclusion is that the GDP time series is not
stationary.

The story is the same for model (21.9.8). The computed $\tau$ value of -1.6252 is less than even the 10
percent critical value of -3.1567 in absolute terms. Therefore, on the basis of graphical analysis, the
correlogram, and the Dickey-Fuller test, the conclusion is that for the quarterly periods of 1970 to
1991, the U.S. GDP time series was nonstationary; i.e., it contained a unit root.
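The decision rule used in this example can be written out explicitly: the unit-root null is rejected only when the computed tau lies below (is more negative than) the critical value. The sketch below plugs in the figures quoted in the text, taking the tau values and critical values as negative, consistent with the signs shown in the ADF output later in these notes; the function and variable names are illustrative.

```python
def df_decision(tau, critical_values):
    """Map each significance level to True if the unit-root null is rejected."""
    return {level: tau < cv for level, cv in critical_values.items()}

def implied_rho(delta_hat):
    """Recover rho from delta = rho - 1."""
    return 1.0 + delta_hat

# Critical values quoted in the text (signs assumed negative).
cv_drift = {"1%": -3.5064, "5%": -2.8947, "10%": -2.5842}   # model with drift
cv_trend = {"1%": -4.0661, "5%": -3.4614, "10%": -3.1567}   # model with drift and trend

drift = df_decision(-0.2191, cv_drift)
trend = df_decision(-1.6252, cv_trend)
rho_drift = implied_rho(-0.00136)
rho_trend = implied_rho(-0.0603)

print(drift, round(rho_drift, 4))
print(trend, round(rho_trend, 4))
```

In neither model is the null rejected at any of the three levels, and the implied autoregressive coefficients round to the 0.9986 and 0.9397 reported above.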

The Augmented Dickey-Fuller (ADF) Test
In conducting the DF test as in (21.9.2), (21.9.4), or (21.9.5), it was assumed that the error term $u_t$ was
uncorrelated. But in case the $u_t$ are correlated, Dickey and Fuller have developed a test, known as the
augmented Dickey-Fuller (ADF) test. This test is conducted by augmenting the preceding three
equations by adding the lagged values of the dependent variable $\Delta Y_t$. To be specific, suppose we use
(21.9.5). The ADF test here consists of estimating the following regression:
$\Delta Y_t = \beta_1 + \beta_2 t + \delta Y_{t-1} + \sum_{i=1}^{m} \alpha_i \Delta Y_{t-i} + \varepsilon_t$ (21.9.9)
where $\varepsilon_t$ is a pure white noise error term and where $\Delta Y_{t-1} = (Y_{t-1} - Y_{t-2})$, $\Delta Y_{t-2} = (Y_{t-2} - Y_{t-3})$, etc.
The number of lagged difference terms to include is often determined empirically, the idea being to
include enough terms so that the error term in (21.9.9) is serially uncorrelated. In ADF we still test
whether $\delta = 0$ and the ADF test follows the same asymptotic distribution as the DF statistic, so the same
critical values can be used.
The Augmented Dickey-Fuller (ADF) test
Null Hypothesis: GDP has a unit root
Exogenous: Constant, Linear Trend
Lag Length: 1 (Automatic - based on SIC, maxlag=11)

t-Statistic Prob.*

Augmented Dickey-Fuller test statistic -2.214243049 0.4749408485712958
Test critical values: 1% level -4.068290085894107
5% level -3.462912333509468
10% level -3.157836346666039

*MacKinnon (1996) one-sided p-values.

(Level, trend intercept)

Augmented Dickey-Fuller Test Equation
Dependent Variable: D(GDP)

Method: Least Squares
Date: 11/06/12 Time: 09:21
Sample (adjusted): 1970Q3 1991Q4
Included observations: 86 after adjustments

Variable          Coefficient            Std. Error            t-Statistic          Prob.

GDP(-1)           -0.07866081105631163   0.03550817804841549   -2.215287164243049   0.02951326698462669
D(GDP(-1))        0.3557941184362571     0.1026909453890556    3.464707789847421    0.0008468952071084719
C                 234.9729141123378      98.58764442917724     2.383391098071484    0.01946520365805546
@TREND(1970Q1)    1.892198782738424      0.8791682647741285    2.152260106004348    0.03431687219264463

R-squared 0.1526149401000559 Mean dependent var 23.3453488372093
Adjusted R-squared 0.1216130476646922 S.D. dependent var 35.93794212242164
S.E. of regression 33.68186594142686 Akaike info criterion 9.917191453035755
Sum squared resid 93026.38365029257 Schwarz criterion 10.03134714123359
Log likelihood -422.4392324805375 Hannan-Quinn criter. 9.963133828831874

To give a glimpse of this procedure, we estimated (21.9.9) for the GDP series using one lagged difference
of GDP; the results were as follows:

$\widehat{\Delta \text{GDP}}_t = 234.9729 + 1.8921\,t - 0.0786\,\text{GDP}_{t-1} + 0.3557\,\Delta \text{GDP}_{t-1}$ (21.9.10)
t = (2.3833) (2.1522) (-2.2152) (3.4647)
$R^2$ = 0.1526    d = 2.0858
The t ($= \tau$) value of the $\text{GDP}_{t-1}$ coefficient ($= \delta$) is -2.2152, but this value in absolute terms is much less
than even the 10 percent critical value of -3.1570, again suggesting that even after taking care of
possible autocorrelation in the error term, the GDP series is nonstationary.

The Phillips-Perron (PP) Unit Root Tests:
An important assumption of the DF test is that the error terms $u_t$ are independently and identically
distributed. The ADF test adjusts the DF test to take care of possible serial correlation in the error terms
by adding the lagged difference terms of the regressand. Phillips and Perron use nonparametric
statistical methods to take care of the serial correlation in the error terms without adding lagged
difference terms. Since the asymptotic distribution of the PP test is the same as the ADF test statistic, we
will not pursue this topic here.

ARIMA Models and the Box-Jenkins Methodology

Topics to be covered:

1. An introduction to the time series econometrics
2. ARIMA models
3. Stationarity
4. Autoregressive Time series models
5. Moving average models
6. ARMA models
7. Integrated processes and the ARIMA models
8. Box-Jenkins models selection
9. Example: The Box-Jenkins Approach

Questions:

1. Explain the implications behind the AR and MA models, using
examples of each.
2. Define the concept of stationarity and state which conditions for stationarity need
to be present in the AR models.
3. Define and explain the concept of stationarity and explain why it is important in
the analysis of time series data. Present examples of stationarity and
nonstationarity.

The AR(1) Model:

The simplest purely statistical time series model is the autoregressive model of order one,
or AR(1), which is given below:

$Y_t = \phi Y_{t-1} + u_t$

This equation states that the behavior of $Y_t$ is largely determined by its own value in the
preceding period. So, what will happen in t is largely dependent on what happened in
t-1, or alternatively what will happen in t+1 will largely be determined by the behavior
of the series in the current time t.

The AR(p) Model: A generalization of the AR(1) model is the AR(p) model. It is an
autoregressive model of order p, and has p lagged terms, as in the following:

$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + u_t$

Or, using the summation symbol:

$Y_t = \sum_{i=1}^{p} \phi_i Y_{t-i} + u_t$

Properties of AR Models:

1. $E(Y_t) = E(Y_{t-1}) = E(Y_{t+1}) = 0$
2. Cov(
2
1
0 ) , ( o =
t t
Y Y

The MA(1) Model:

The simplest moving average model is that of order one, or MA(1), which
has the form:

$Y_t = u_t + \theta u_{t-1}$

The implication behind the MA(1) model is that $Y_t$ depends on the value of the
immediate past error, which is known at time t.
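Because an MA(1) series depends only on the current and the immediately preceding error, a shock affects the series for exactly two periods, and the autocorrelation function cuts off after lag 1. The sketch below illustrates both points. It relies on the standard result that the lag-1 autocorrelation of an MA(1) process is $\theta / (1 + \theta^2)$, which is not stated in the notes above, and the function names are made up for this example.

```python
def ma1_path(shocks, theta):
    """Build Y_t = u_t + theta * u_{t-1}, taking the pre-sample shock as zero."""
    path, previous = [], 0.0
    for u in shocks:
        path.append(u + theta * previous)
        previous = u
    return path

def ma1_acf(theta, max_lag):
    """Theoretical autocorrelations of an MA(1) process for lags 0..max_lag."""
    acf = [1.0]
    for k in range(1, max_lag + 1):
        acf.append(theta / (1.0 + theta ** 2) if k == 1 else 0.0)
    return acf

response = ma1_path([1.0, 0.0, 0.0], theta=0.5)   # a single unit shock at t = 1
acf = ma1_acf(0.5, 3)

print(response)
print(acf)
```

The single shock shows up at full size in the first period, at half size in the second, and then disappears. This is the opposite of the random walk's infinite memory, and it is the pattern the identification stage of the Box-Jenkins approach looks for in a correlogram.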

The MA(q) Model: A generalization of the MA(1) model is the MA(q) model. It is a
moving average model of order q, and has q lagged error terms, as in the following:

$Y_t = u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \cdots + \theta_q u_{t-q}$

Or, using the summation symbol:

$Y_t = u_t + \sum_{j=1}^{q} \theta_j u_{t-j}$

ARMA models:

The combination of AR(p) and MA(q) models is known as the ARMA(p,q) model. The general
form of the ARMA(p,q) model is:

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \dots + \theta_q u_{t-q}$$

Which can be written, using the summations, as:

$$Y_t = \sum_{i=1}^{p} \phi_i Y_{t-i} + u_t + \sum_{j=1}^{q} \theta_j u_{t-j}$$
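The ARMA(p,q) recursion can be written almost literally as code: an AR sum over past values of Y, the current shock, and an MA sum over past shocks. The sketch below is a minimal simulator under stated assumptions (Gaussian shocks, zero pre-sample values, a burn-in period that is discarded); the function name and all parameter values are hypothetical.

```python
import random

def simulate_arma(phi, theta, n, burn_in=200, seed=0):
    """Simulate an ARMA(p, q) series with AR coefficients phi and MA coefficients theta."""
    rng = random.Random(seed)
    p, q = len(phi), len(theta)
    y = [0.0] * p   # pre-sample values of Y, assumed zero
    u = [0.0] * q   # pre-sample shocks, assumed zero
    for _ in range(n + burn_in):
        u_t = rng.gauss(0.0, 1.0)
        ar_part = sum(phi[i] * y[-1 - i] for i in range(p))     # sum of phi_i * Y_{t-i}
        ma_part = sum(theta[j] * u[-1 - j] for j in range(q))   # sum of theta_j * u_{t-j}
        y.append(ar_part + u_t + ma_part)
        u.append(u_t)
    return y[-n:]   # keep the last n observations, after the burn-in

arma_11 = simulate_arma(phi=[0.7], theta=[-0.4], n=500, seed=3)
print(len(arma_11))
```

Setting `theta=[]` gives a pure AR(p) process and `phi=[]` a pure MA(q) process, so the same function covers all three model families.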

Box-Jenkins Model Selection:

A fundamental idea in the Box-Jenkins approach is the principle of parsimony.
Parsimony (meaning sparseness or stinginess) should come as second nature to
economists and financial analysts. Incorporating additional coefficients will necessarily
increase the fit of the regression equation (i.e. the value of R^2 will increase), but the
cost will be a reduction in the degrees of freedom. Box and Jenkins argue that
parsimonious models produce better forecasts than overparameterized models.

In general, Box and Jenkins popularized a three-stage method aimed at selecting an
appropriate (parsimonious) ARIMA model for the purpose of estimating and forecasting a
univariate time series. The three stages are:

1. Identification
2. Estimation
3. Diagnostic checking.

For details, please see the photocopied sheet.

Example: The Box-Jenkins Approach

File: ARIMA.wf1

Date: 11/12/12 Time: 22:17
Sample: 1980Q3 1998Q2
Included observations: 72

Autocorrelation Partial Correlation AC PAC Q-Stat Prob

. |******* . |******* 1 0.958 0.958 68.932 0.000
. |******* .*| . | 2 0.913 -0.067 132.39 0.000
. |******| . | . | 3 0.865 -0.050 190.23 0.000
. |******| . | . | 4 0.817 -0.030 242.57 0.000
. |******| . | . | 5 0.770 -0.013 289.73 0.000
. |***** | . | . | 6 0.723 -0.032 331.88 0.000
. |***** | . | . | 7 0.675 -0.024 369.26 0.000
. |***** | . | . | 8 0.629 -0.022 402.15 0.000
. |**** | . | . | 9 0.582 -0.030 430.77 0.000
. |**** | . | . | 10 0.534 -0.035 455.31 0.000

Date: 11/12/12 Time: 22:17
Sample: 1980Q3 1998Q2
Included observations: 72

Autocorrelation Partial Correlation AC PAC Q-Stat Prob

. |*** | . |*** | 1 0.463 0.463 16.112 0.000
. |*. | . | . | 2 0.206 -0.011 19.342 0.000
. |** | . |** | 3 0.289 0.252 25.814 0.000
. |** | . | . | 4 0.251 0.033 30.749 0.000
. |** | . |*. | 5 0.220 0.103 34.592 0.000
. |** | . | . | 6 0.225 0.061 38.671 0.000
. | . | .*| . | 7 0.027 -0.198 38.729 0.000
.*| . | .*| . | 8 -0.074 -0.102 39.187 0.000
. | . | .*| . | 9 -0.041 -0.068 39.327 0.000
. | . | . | . | 10 -0.041 -0.019 39.473 0.000
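The Q-Stat column in a correlogram like the one above can be approximately rebuilt from the AC column, which is a useful sanity check when reading such output. The sketch below assumes the statistic is the Ljung-Box form, Q = T(T+2) * sum of r_k^2 / (T-k), and uses the rounded AC values printed in the table, so small discrepancies from the reported figures are expected. The function name is hypothetical.

```python
def ljung_box(acs, n_obs):
    """Cumulative Ljung-Box Q statistics for autocorrelations at lags 1..m."""
    q_stats, running = [], 0.0
    for k, r in enumerate(acs, start=1):
        running += r ** 2 / (n_obs - k)
        q_stats.append(n_obs * (n_obs + 2) * running)
    return q_stats

# AC column of the second correlogram above (72 observations), as printed
ac_values = [0.463, 0.206, 0.289, 0.251, 0.220, 0.225, 0.027, -0.074, -0.041, -0.041]
q = ljung_box(ac_values, n_obs=72)
for lag, value in enumerate(q, start=1):
    print(lag, round(value, 2))
```

The computed values track the reported Q-Stat column to within rounding error. Because each Q is cumulative, a small p-value at lag 10 says only that the first ten autocorrelations are jointly non-zero, not that each one is.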

Dependent Variable: DLGDP
Method: Least Squares
Date: 11/12/12 Time: 22:20
Sample: 1980Q3 1998Q2
Included observations: 72
Convergence achieved after 14 iterations
MA Backcast: 1979Q4 1980Q2

Variable Coefficient Std. Error t-Statistic Prob.

C 0.006814 0.001547 4.403203 0.0000
AR(1) 0.714711 0.100576 7.106173 0.0000
MA(1) -0.452598 0.150094 -3.015439 0.0036
MA(2) -0.196976 0.128418 -1.533867 0.1298
MA(3) 0.293634 0.118360 2.480865 0.0156

R-squared 0.336044 Mean dependent var 0.005942
Adjusted R-squared 0.296405 S.D. dependent var 0.006687
S.E. of regression 0.005609 Akaike info criterion -7.461977
Sum squared resid 0.002108 Schwarz criterion -7.303875
Log likelihood 273.6312 Hannan-Quinn criter. -7.399036
F-statistic 8.477592 Durbin-Watson stat 1.890012
Prob(F-statistic) 0.000013

Inverted AR Roots .71
Inverted MA Roots .54+.43i .54-.43i -.62
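The inverted MA roots reported above can be checked against the estimated MA coefficients. The sketch below assumes the usual convention that the inverted roots are the zeros of z^q + theta_1 z^(q-1) + ... + theta_q, and it evaluates that polynomial at the two-decimal roots printed in the output, so the residuals are small rather than exactly zero. The helper name is hypothetical.

```python
theta = [-0.452598, -0.196976, 0.293634]   # MA(1), MA(2), MA(3) estimates from the output
reported_roots = [complex(0.54, 0.43), complex(0.54, -0.43), complex(-0.62, 0.0)]

def inverted_ma_poly(z, theta):
    """Evaluate z^q + theta_1 z^(q-1) + ... + theta_q at the point z."""
    q = len(theta)
    return z ** q + sum(th * z ** (q - 1 - j) for j, th in enumerate(theta))

residuals = [abs(inverted_ma_poly(z, theta)) for z in reported_roots]
moduli = [abs(z) for z in reported_roots]
print([round(r, 4) for r in residuals])   # all near zero
print([round(m, 3) for m in moduli])      # all below one
```

All three moduli lie inside the unit circle, which is the condition for the MA part to be invertible. The same reasoning applied to the single inverted AR root (.71) indicates a stationary AR part.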

Dependent Variable: DLGDP
Method: Least Squares
Date: 11/12/12 Time: 22:32
Sample (adjusted): 1980Q3 1998Q2
Included observations: 72 after adjustments
Convergence achieved after 9 iterations
MA Backcast: 1980Q2

Variable Coefficient Std. Error t-Statistic Prob.

C 0.006809 0.001464 4.650788 0.0000
AR(1) 0.742293 0.101179 7.336398 0.0000
MA(1) -0.471429 0.161392 -2.921010 0.0047

R-squared 0.279356 Mean dependent var 0.005942
Adjusted R-squared 0.258468 S.D. dependent var 0.006687
S.E. of regression 0.005758 Akaike info criterion -7.435603
Sum squared resid 0.002288 Schwarz criterion -7.340742
Log likelihood 270.6817 Hannan-Quinn criter. -7.397839
F-statistic 13.37388 Durbin-Watson stat 1.876207
Prob(F-statistic) 0.000012

Inverted AR Roots .74
Inverted MA Roots .47
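The two estimated models can be compared on parsimony grounds using their information criteria. The sketch below recomputes the Akaike and Schwarz criteria from the reported log likelihoods, assuming the per-observation definitions AIC = -2l/T + 2k/T and SIC = -2l/T + k ln(T)/T, which reproduce the printed figures. Here k counts the estimated coefficients including the constant, and the function name is hypothetical.

```python
import math

def info_criteria(loglik, n_obs, k):
    """Per-observation Akaike and Schwarz criteria (smaller is better)."""
    aic = -2.0 * loglik / n_obs + 2.0 * k / n_obs
    sic = -2.0 * loglik / n_obs + k * math.log(n_obs) / n_obs
    return aic, sic

# ARMA(1,3): C, AR(1), MA(1), MA(2), MA(3) -> k = 5
aic_13, sic_13 = info_criteria(loglik=273.6312, n_obs=72, k=5)
# ARMA(1,1): C, AR(1), MA(1) -> k = 3
aic_11, sic_11 = info_criteria(loglik=270.6817, n_obs=72, k=3)

print(round(aic_13, 6), round(sic_13, 6))
print(round(aic_11, 6), round(sic_11, 6))
```

On these numbers the two criteria disagree: the AIC is lower for the ARMA(1,3) model, while the Schwarz criterion, which penalizes extra coefficients more heavily, is lower for the ARMA(1,1) model. This is the parsimony trade-off in practice.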

Modeling the variance: ARCH-GARCH Models

Dependent Variable: R_FTSE
Method: Least Squares
Date: 11/13/12 Time: 10:30
Sample: 1/01/1990 12/31/1999
Included observations: 2610

Variable Coefficient Std. Error t-Statistic Prob.

C 0.000363 0.000184 1.975016 0.0484
R_FTSE(-1) 0.070612 0.019538 3.614090 0.0003

R-squared 0.004983 Mean dependent var 0.000391
Adjusted R-squared 0.004602 S.D. dependent var 0.009398
S.E. of regression 0.009376 Akaike info criterion -6.500477
Sum squared resid 0.229287 Schwarz criterion -6.495981
Log likelihood 8485.123 Hannan-Quinn criter. -6.498849
F-statistic 13.06165 Durbin-Watson stat 1.993272
Prob(F-statistic) 0.000307

Heteroskedasticity Test: ARCH

F-statistic 46.84671 Prob. F(1,2607) 0.0000
Obs*R-squared 46.05506 Prob. Chi-Square(1) 0.0000

Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Date: 11/13/12 Time: 10:47
Sample (adjusted): 1/02/1990 12/31/1999
Included observations: 2609 after adjustments

Variable Coefficient Std. Error t-Statistic Prob.

C 7.62E-05 3.76E-06 20.27023 0.0000
RESID^2(-1) 0.132858 0.019411 6.844466 0.0000

R-squared 0.017652 Mean dependent var 8.79E-05
Adjusted R-squared 0.017276 S.D. dependent var 0.000173
S.E. of regression 0.000171 Akaike info criterion -14.50709
Sum squared resid 7.64E-05 Schwarz criterion -14.50260
Log likelihood 18926.50 Hannan-Quinn criter. -14.50546
F-statistic 46.84671 Durbin-Watson stat 2.044481
Prob(F-statistic) 0.000000

Heteroskedasticity Test: ARCH

F-statistic 37.03529 Prob. F(6,2597) 0.0000
Obs*R-squared 205.2486 Prob. Chi-Square(6) 0.0000

Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Date: 11/13/12 Time: 10:58
Sample (adjusted): 1/09/1990 12/31/1999
Included observations: 2604 after adjustments

Variable Coefficient Std. Error t-Statistic Prob.

C 4.30E-05 4.46E-06 9.633006 0.0000
RESID^2(-1) 0.066499 0.019551 3.401305 0.0007
RESID^2(-2) 0.125443 0.019538 6.420328 0.0000
RESID^2(-3) 0.097259 0.019657 4.947847 0.0000
RESID^2(-4) 0.060954 0.019658 3.100789 0.0020
RESID^2(-5) 0.074990 0.019539 3.837926 0.0001
RESID^2(-6) 0.085838 0.019551 4.390579 0.0000

R-squared 0.078821 Mean dependent var 8.79E-05
Adjusted R-squared 0.076692 S.D. dependent var 0.000173
S.E. of regression 0.000166 Akaike info criterion -14.56581
Sum squared resid 7.16E-05 Schwarz criterion -14.55004
Log likelihood 18971.68 Hannan-Quinn criter. -14.56010
F-statistic 37.03529 Durbin-Watson stat 2.012275
Prob(F-statistic) 0.000000
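The Obs*R-squared figures in the two ARCH tests above are LM statistics: the number of observations in the auxiliary regression of squared residuals on their own lags, times that regression's R-squared. The sketch below recomputes them from the printed (rounded) values and compares them with 5% chi-square critical values, which are standard table values and not taken from the notes. The function name is hypothetical.

```python
def arch_lm_stat(n_obs, r_squared):
    """LM statistic of the ARCH test: observations times R-squared of the test equation."""
    return n_obs * r_squared

# 5% critical values of the chi-square distribution (standard tables)
CHI2_CRIT_5PCT = {1: 3.841, 6: 12.592}

lm_1 = arch_lm_stat(n_obs=2609, r_squared=0.017652)   # test with 1 lag
lm_6 = arch_lm_stat(n_obs=2604, r_squared=0.078821)   # test with 6 lags

print(round(lm_1, 3), lm_1 > CHI2_CRIT_5PCT[1])
print(round(lm_6, 3), lm_6 > CHI2_CRIT_5PCT[6])
```

Both statistics are far above their critical values, so the null hypothesis of no ARCH effects is rejected in each case, which motivates modeling the variance explicitly.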

Dependent Variable: R_FTSE
Method: ML - ARCH (Marquardt) - Normal distribution
Date: 11/13/12 Time: 11:01
Sample: 1/01/1990 12/31/1999
Included observations: 2610
Convergence achieved after 8 iterations
Presample variance: backcast (parameter = 0.7)
GARCH = C(3) + C(4)*RESID(-1)^2

Variable Coefficient Std. Error z-Statistic Prob.

C 0.000401 0.000178 2.257632 0.0240
R_FTSE(-1) 0.075196 0.019209 3.914518 0.0001

Variance Equation

C 7.39E-05 2.11E-06 35.07451 0.0000
ARCH(1) 0.161294 0.020232 7.972289 0.0000

R-squared 0.004944 Mean dependent var 0.000391
Adjusted R-squared 0.004563 S.D. dependent var 0.009398
S.E. of regression 0.009377 Akaike info criterion -6.524781
Sum squared resid 0.229296 Schwarz criterion -6.515789
Log likelihood 8518.839 Hannan-Quinn criter. -6.521523
Durbin-Watson stat 2.001997
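To see what the estimated ARCH(1) variance equation implies, the sketch below evaluates GARCH = C(3) + C(4)*RESID(-1)^2 at the reported estimates for a calm day and for a large surprise, and also computes the implied long-run variance omega / (1 - alpha). The 3% shock is a hypothetical input, and the names `omega` and `alpha` are labels chosen for this example.

```python
omega, alpha = 7.39e-05, 0.161294   # variance-equation estimates from the ARCH(1) output

def arch1_variance(prev_resid):
    """Conditional variance implied by the previous period's residual."""
    return omega + alpha * prev_resid ** 2

calm = arch1_variance(0.0)        # previous return exactly as expected
stressed = arch1_variance(0.03)   # hypothetical 3% surprise in the previous period
long_run = omega / (1.0 - alpha)  # unconditional variance, valid when alpha < 1

print(calm, stressed, long_run)
```

The implied long-run variance is close to the mean of the squared OLS residuals reported in the ARCH test equations (8.79E-05), which suggests the fitted variance equation is at least consistent with the average level of volatility in the sample.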

Dependent Variable: R_FTSE
Method: ML - ARCH (Marquardt) - Normal distribution
Date: 11/13/12 Time: 11:03
Sample: 1/01/1990 12/31/1999
Included observations: 2610
Convergence achieved after 15 iterations
Presample variance: backcast (parameter = 0.7)

GARCH = C(3) + C(4)*RESID(-1)^2 + C(5)*RESID(-2)^2 + C(6)*RESID(-3)^2
+ C(7)*RESID(-4)^2 + C(8)*RESID(-5)^2 + C(9)*RESID(-6)^2

Variable Coefficient Std. Error z-Statistic Prob.

C 0.000399 0.000162 2.455934 0.0141
R_FTSE(-1) 0.069681 0.019753 3.527547 0.0004

Variance Equation

C 3.52E-05 2.58E-06 13.65496 0.0000
ARCH(-1)^2 0.080467 0.014866 5.412946 0.0000
ARCH(-2)^2 0.131236 0.024881 5.274448 0.0000
ARCH(-3)^2 0.107555 0.022741 4.729569 0.0000
ARCH(-4)^2 0.081070 0.022648 3.579493 0.0003
ARCH(-5)^2 0.089833 0.022985 3.908289 0.0001
ARCH(-6)^2 0.123531 0.023890 5.170768 0.0000

R-squared 0.004968 Mean dependent var 0.000391
Adjusted R-squared 0.004586 S.D. dependent var 0.009398
S.E. of regression 0.009376 Akaike info criterion -6.610798
Sum squared resid 0.229290 Schwarz criterion -6.590567
Log likelihood 8636.092 Hannan-Quinn criter. -6.603469
Durbin-Watson stat 1.991464

GARCH Model

Dependent Variable: R_FTSE
Method: ML - ARCH (Marquardt) - Normal distribution
Date: 11/13/12 Time: 11:07
Sample: 1/01/1990 12/31/1999
Included observations: 2610
Convergence achieved after 13 iterations
Presample variance: backcast (parameter = 0.7)
GARCH = C(3) + C(4)*RESID(-1)^2 + C(5)*GARCH(-1)

Variable Coefficient Std. Error z-Statistic Prob.

C 0.000433 0.000158 2.732030 0.0063
R_FTSE(-1) 0.062548 0.020697 3.022112 0.0025

Variance Equation

C 8.22E-07 2.42E-07 3.392464 0.0007
RESID(-1)^2 0.050868 0.006659 7.639165 0.0000
GARCH(-1) 0.940258 0.007973 117.9339 0.0000

R-squared 0.004868 Mean dependent var 0.000391
Adjusted R-squared 0.004486 S.D. dependent var 0.009398
S.E. of regression 0.009377 Akaike info criterion -6.648590
Sum squared resid 0.229314 Schwarz criterion -6.637351
Log likelihood 8681.411 Hannan-Quinn criter. -6.644519
Durbin-Watson stat 1.977746
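The GARCH(1,1) estimates above can be summarized by a few derived quantities. The sketch below computes the persistence alpha + beta, the implied long-run standard deviation and the half-life of a volatility shock, and includes a small filter that applies the variance recursion to a list of residuals. The symbol names, the starting variance `h0` and the half-life formula ln(0.5) / ln(alpha + beta) are assumptions of this example, not part of the output.

```python
import math

omega, alpha, beta = 8.22e-07, 0.050868, 0.940258   # variance-equation estimates

persistence = alpha + beta
long_run_var = omega / (1.0 - persistence)          # requires persistence < 1
long_run_sd = math.sqrt(long_run_var)
half_life = math.log(0.5) / math.log(persistence)   # periods for a shock to decay by half

def garch11_filter(resids, h0):
    """Conditional variances h_t = omega + alpha * e_{t-1}^2 + beta * h_{t-1}, starting at h0."""
    h = [h0]
    for e in resids[:-1]:
        h.append(omega + alpha * e ** 2 + beta * h[-1])
    return h

print(round(persistence, 6), round(long_run_sd, 6), round(half_life, 1))
```

A persistence of about 0.99 means volatility shocks die out slowly, and the implied long-run standard deviation is of the same order as the sample standard deviation of the returns (0.009398).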

Dependent Variable: R_FTSE
Method: ML - ARCH (Marquardt) - Normal distribution
Date: 11/13/12 Time: 11:08
Sample: 1/01/1990 12/31/1999
Included observations: 2610
Convergence achieved after 40 iterations
Presample variance: backcast (parameter = 0.7)
GARCH = C(3) + C(4)*RESID(-1)^2 + C(5)*RESID(-2)^2 + C(6)*RESID(-3)^2
+ C(7)*RESID(-4)^2 + C(8)*RESID(-5)^2 + C(9)*RESID(-6)^2 + C(10)
*GARCH(-1) + C(11)*GARCH(-2) + C(12)*GARCH(-3) + C(13)*GARCH(
-4) + C(14)*GARCH(-5) + C(15)*GARCH(-6)

Variable Coefficient Std. Error z-Statistic Prob.

C 0.000434 0.000157 2.765434 0.0057
R_FTSE(-1) 0.068390 0.020316 3.366263 0.0008

Variance Equation

C 2.07E-06 1.93E-05 0.106965 0.9148
RESID(-1)^2 0.023716 0.016300 1.455038 0.1457
RESID(-2)^2 0.075183 0.127784 0.588359 0.5563
RESID(-3)^2 0.010935 0.432076 0.025308 0.9798
RESID(-4)^2 0.010192 0.265912 0.038327 0.9694
RESID(-5)^2 0.014333 0.168467 0.085076 0.9322
RESID(-6)^2 -0.011530 0.156800 -0.073533 0.9414
GARCH(-1) 0.510498 4.959440 0.102935 0.9180
GARCH(-2) -0.459335 0.194327 -2.363726 0.0181
GARCH(-3) 0.579821 2.185128 0.265349 0.7907
GARCH(-4) -0.234038 1.834222 -0.127595 0.8985
GARCH(-5) 0.843345 0.280605 3.005451 0.0027
GARCH(-6) -0.385907 4.042993 -0.095451 0.9240

R-squared 0.004922 Mean dependent var 0.000391
Adjusted R-squared 0.004541 S.D. dependent var 0.009398
S.E. of regression 0.009377 Akaike info criterion -6.648366
Sum squared resid 0.229301 Schwarz criterion -6.614647
Log likelihood 8691.117 Hannan-Quinn criter. -6.636151
Durbin-Watson stat 1.988911

Dependent Variable: R_FTSE
Method: ML - ARCH (Marquardt) - Normal distribution
Date: 11/13/12 Time: 11:13
Sample: 1/01/1990 12/31/1999
Included observations: 2610
Convergence achieved after 13 iterations
Presample variance: backcast (parameter = 0.7)
GARCH = C(3) + C(4)*RESID(-1)^2 + C(5)*GARCH(-1)

Variable Coefficient Std. Error z-Statistic Prob.

C 0.000433 0.000158 2.732030 0.0063
R_FTSE(-1) 0.062548 0.020697 3.022112 0.0025

Variance Equation

C 8.22E-07 2.42E-07 3.392464 0.0007
RESID(-1)^2 0.050868 0.006659 7.639165 0.0000
GARCH(-1) 0.940258 0.007973 117.9339 0.0000

R-squared 0.004868 Mean dependent var 0.000391
Adjusted R-squared 0.004486 S.D. dependent var 0.009398
S.E. of regression 0.009377 Akaike info criterion -6.648590
Sum squared resid 0.229314 Schwarz criterion -6.637351
Log likelihood 8681.411 Hannan-Quinn criter. -6.644519
Durbin-Watson stat 1.977746

Dependent Variable: R_FTSE
Method: ML - ARCH (Marquardt) - Normal distribution
Date: 11/13/12 Time: 11:11
Sample: 1/01/1990 12/31/1999
Included observations: 2610
Convergence achieved after 13 iterations
Presample variance: backcast (parameter = 0.7)
GARCH = C(4) + C(5)*RESID(-1)^2 + C(6)*GARCH(-1)

Variable Coefficient Std. Error z-Statistic Prob.

@SQRT(GARCH) 0.098607 0.080727 1.221495 0.2219
C -0.000347 0.000659 -0.526540 0.5985
R_FTSE(-1) 0.061636 0.020704 2.976991 0.0029

Variance Equation

C 8.68E-07 2.56E-07 3.392693 0.0007
RESID(-1)^2 0.052405 0.006845 7.655931 0.0000
GARCH(-1) 0.938191 0.008273 113.4009 0.0000

R-squared 0.005109 Mean dependent var 0.000391
Adjusted R-squared 0.004346 S.D. dependent var 0.009398
S.E. of regression 0.009378 Akaike info criterion -6.648400
Sum squared resid 0.229258 Schwarz criterion -6.634913
Log likelihood 8682.162 Hannan-Quinn criter. -6.643514
Durbin-Watson stat 1.976261
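The specification behind this last output, with @SQRT(GARCH) in the mean equation, appears to be a GARCH-in-mean model. Written out, it would look as follows; the symbols delta, gamma, omega, alpha, beta and h_t are notation chosen here, not taken from the output, and r_t stands for R_FTSE.

```latex
\begin{aligned}
r_t &= c + \delta \sqrt{h_t} + \gamma\, r_{t-1} + u_t, \qquad u_t \mid \Omega_{t-1} \sim N(0, h_t) \\
h_t &= \omega + \alpha\, u_{t-1}^2 + \beta\, h_{t-1}
\end{aligned}
```

On this reading, the reported estimates are delta = 0.098607, c = -0.000347 and gamma = 0.061636 in the mean equation, and omega = 8.68E-07, alpha = 0.052405 and beta = 0.938191 in the variance equation. The coefficient on the conditional standard deviation has a p-value of 0.2219, so the output gives no significant evidence that higher expected volatility raises the expected return.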

Granger Causality:

Date: 11/13/12 Time: 11:28
Sample: 1 40
Lags: 2

Null Hypothesis: Obs F-Statistic Prob.

M1 does not Granger Cause R 38 3.22343 0.0526
R does not Granger Cause M1 12.9266 7.E-05

Pairwise Granger Causality Tests
Date: 11/13/12 Time: 11:30
Sample: 1 40
Lags: 2

Null Hypothesis: Obs F-Statistic Prob.

R does not Granger Cause M1 38 12.9266 7.E-05
M1 does not Granger Cause R 3.22343 0.0526
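The Prob. column of the Granger tests can be reproduced from the F-statistics. The sketch below assumes each test regression uses a constant plus 2 lags of each variable, giving 38 - 5 = 33 denominator degrees of freedom, and relies on the closed-form tail probability of an F distribution with 2 numerator degrees of freedom. The function name is hypothetical and the formula does not generalize to other lag lengths.

```python
def f_pvalue_num_df2(f_stat, den_df):
    """P(F > f_stat) for an F(2, den_df) distribution; valid only for 2 numerator df."""
    return (1.0 + 2.0 * f_stat / den_df) ** (-den_df / 2.0)

n_obs, lags = 38, 2
den_df = n_obs - (2 * lags + 1)   # assumed: constant + 2 lags of each of the two variables

p_m1_to_r = f_pvalue_num_df2(3.22343, den_df)   # H0: M1 does not Granger cause R
p_r_to_m1 = f_pvalue_num_df2(12.9266, den_df)   # H0: R does not Granger cause M1

print(round(p_m1_to_r, 4), p_r_to_m1)
```

At the 5% level the first null hypothesis is (narrowly) not rejected, while the second is strongly rejected. The output therefore points to causality running from R to M1 rather than the reverse.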