
Jennifer L. Castle

Hilary 2008

This note considers non-stationary univariate time-series. For readings see Banerjee, Dolado, Galbraith and Hendry (1993) Ch. 3 & 4, and Patterson, K. (2000), Ch. 6. For implementation of the unit root test using OxMetrics see Hendry and Doornik (2001), Ch. 4.

1 Pure random walk

We have analysed the AR(1) process in a lot of detail:

y_t = α + ρ y_{t−1} + ε_t,   ε_t ~ WN(0, σ²)   (1)

A special case of interest occurs when we set α = 0 and ρ = 1. This is called a pure random walk:

y_t = y_{t−1} + ε_t,   ε_t ~ WN(0, σ²)   (2)

Likely values for ρ are 0 ≤ ρ ≤ 1 for most economic time series.

If ρ > 1 the process is explosive and it will grow without limit.
If ρ < 0 successive values of y_t tend to oscillate in sign.
If ρ = 0, y_t is a white noise process.
If 0 < ρ < 1, the process is a stationary AR(1) process.
If ρ = 1 the process is a random walk. If α = 0 it is a pure random walk and if α ≠ 0 the process is a random walk with drift.

Why is a random walk interesting? A random walk implies that the best guess of y_{t+1} given the information at time t is y_t. There is no predictive structure in the AR process or the error process. Hence, the random walk is often the baseline model for financial and foreign exchange markets. It implies that it is not possible to exploit the past history of y_t or ε_t to make profits by speculating on future realisations.


A random walk has persistent shocks. To see this, solve recursively:

y_t = y_{t−1} + ε_t
    = y_{t−2} + ε_{t−1} + ε_t
    = y_{t−3} + ε_{t−2} + ε_{t−1} + ε_t
    ⋮
    = y_0 + Σ_{i=1}^t ε_i   (3)

Hence, the random disturbances, or shocks, cumulate from the start of the process, and the impact of a shock does not die away. The random walk is said to have an "infinite memory": it remembers the shocks forever.

A characteristic of the pure random walk is that it can wander anywhere (either positive or negative) and it is not pulled back to a mean.

1.1 Calculating the first and second moments of a RW

To calculate the mean take expectations of (3):

E[y_t] = E[y_0] + E[Σ_{i=1}^t ε_i] = y_0 + Σ_{i=1}^t E[ε_i] = y_0   (4)

where y_0 is fixed. Hence, the mean is constant and finite.

To calculate the variance:

V[y_t] = γ_t(0) = E[(y_t − E[y_t])²]
       = E[(Σ_{i=1}^t ε_i)²]
       = Σ_{i=1}^t E[ε_i²]   {as E[ε_j ε_k] = 0, j ≠ k}
       = Σ_{i=1}^t σ² = t σ²   (5)


To calculate the autocovariances:

γ_t(1) = E[(y_t − E[y_t])(y_{t−1} − E[y_{t−1}])]
       = E[(Σ_{i=1}^t ε_i)(Σ_{i=1}^{t−1} ε_i)]
       = Σ_{i=1}^{t−1} E[ε_i²]   {as E[ε_j ε_k] = 0, j ≠ k}
       = Σ_{i=1}^{t−1} σ² = (t − 1) σ²   (6)

and in general:

γ_t(h) = E[(y_t − E[y_t])(y_{t−h} − E[y_{t−h}])]
       = E[(Σ_{i=1}^t ε_i)(Σ_{i=1}^{t−h} ε_i)]
       = Σ_{i=1}^{t−h} σ² = (t − h) σ²   (7)

Hence, the second moments depend on time, t.

The autocorrelation function is calculated as:

ρ_t(h) = γ_t(h) / √(γ_t(0) γ_{t−h}(0))
       = (t − h) σ² / √((t σ²)((t − h) σ²))
       = (t − h) / √(t (t − h))
       = √((t − h)² / (t (t − h)))
       = √(1 − h/t)   (8)
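These moments are easy to check by simulation. A minimal numpy sketch (the sample size, horizon and replication count are my own illustrative choices, not from the text):

```python
import numpy as np

# Simulate M pure random walks and check the moments derived in (5)-(8).
rng = np.random.default_rng(0)
M, t, h, sigma = 20000, 400, 100, 1.0

eps = rng.normal(0.0, sigma, size=(M, t))
y = eps.cumsum(axis=1)          # y_t = sum_{i=1}^t eps_i, with y_0 = 0

# V[y_t] = t * sigma^2, so the ratio below should be near sigma^2 = 1
var_ratio = y[:, -1].var() / t
print(var_ratio)

# rho_t(h) = sqrt(1 - h/t); compare the sample correlation to the formula
corr = np.corrcoef(y[:, -1], y[:, -1 - h])[0, 1]
print(corr, np.sqrt(1 - h / t))
```

Both printed pairs agree up to Monte Carlo error, confirming that the second moments grow with t exactly as equations (5) and (7)-(8) state.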

Recall the conditions for stationarity: {y_t} is weakly or covariance stationary if the first and second moments of the process exist and are time-invariant:

E[y_t] = μ < ∞   ∀ t ∈ T
E[(y_t − μ)(y_{t−h} − μ)] = γ(h) < ∞   ∀ t, h   (9)


Stationarity implies γ_t(h) = γ(h) and ρ_t(h) = ρ(h).

The stronger definition of strict stationarity states that {y_t} is strictly stationary if for any values of h_1, h_2, ..., h_n the joint distribution of (y_t, y_{t+h_1}, ..., y_{t+h_n}) depends only on the intervals h_1, h_2, ..., h_n and not on t:

f(y_t, y_{t+h_1}, ..., y_{t+h_n}) = f(y_τ, y_{τ+h_1}, ..., y_{τ+h_n})   ∀ t, τ   (10)

Strict stationarity ⇒ all moments are time invariant.

It is clear from the second moments that a random walk is non-stationary, as the variance and autocovariance increase with t.

It is often difficult to distinguish between a random walk and a near random walk, i.e. an AR(1) process with ρ close to but less than 1 (say 0.95). This is because the process has a lot of persistence and so over short samples can look similar to a random walk (as it can deviate from its mean for long periods). However, as the sample size increases you will observe the process returning to its mean more frequently and hence, there is more power to distinguish between the two series as t increases.

1.2 Order of integration

An integrated process is one that can be made stationary by differencing. A discrete process integrated of order d is one that can be made stationary by differencing d times, i.e. Δ^d y_t is stationary, where the differencing operator Δ^d is defined by (1 − L)^d.

Aside: If d is not an integer but a fractional value, the process is said to be fractionally integrated of order d. This results in a process that has "long memory", but we shall not cover such processes in this course.

A random walk is integrated of order one, denoted I(1), as the first difference Δy_t is a stationary process:

y_t = y_{t−1} + ε_t ~ I(1)
Δy_t = ε_t ~ I(0)   (11)

The first difference is a white noise process. Integrating a white noise process results in a random walk.
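Equation (11) is easy to see numerically; a minimal numpy sketch:

```python
import numpy as np

# Differencing an I(1) random walk recovers the I(0) white noise, and
# integrating (cumulating) white noise gives back the random walk.
rng = np.random.default_rng(1)
eps = rng.normal(size=500)       # white noise, I(0)

y = eps.cumsum()                 # random walk, I(1): y_t = y_{t-1} + eps_t
dy = np.diff(y)                  # first difference

print(np.allclose(dy, eps[1:]))  # True: dy_t = eps_t exactly
```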

2 Random walk with drift

The pure random walk can wander in any direction and so a priori we cannot predict which

way the process will go. If we add an intercept to the process we observe a systematic trend.

To see this solve the model recursively:


y_t = α + y_{t−1} + ε_t
    = α + (α + y_{t−2} + ε_{t−1}) + ε_t
    = 2α + (α + y_{t−3} + ε_{t−2}) + ε_{t−1} + ε_t
    ⋮
    = αt + y_0 + Σ_{i=1}^t ε_i   (12)

There is a deterministic trend component in the random walk, which induces a drift.

If the intercept is positive, the process will exhibit a positive trend, and if the intercept is

negative, there will be a downward trend.

2.1 Calculating the first and second moments of a RW with drift

To calculate the mean, take expectations of (12):

E[y_t] = E[αt] + E[y_0] + E[Σ_{i=1}^t ε_i] = αt + y_0   (13)

The variance and autocovariance will be as before:

V[y_t] = γ_t(0) = E[(y_t − E[y_t])²] = t σ²   (14)

γ_t(h) = E[(y_t − E[y_t])(y_{t−h} − E[y_{t−h}])] = (t − h) σ²   (15)

The first difference of a random walk with drift is a stationary process but it will have a non-zero constant mean determined by α:

y_t = α + y_{t−1} + ε_t ~ I(1)
Δy_t = α + ε_t ~ I(0)   (16)
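A quick numpy check of (16), with an illustrative drift value of my own choosing:

```python
import numpy as np

# Random walk with drift alpha: the level trends at rate alpha, while the
# first difference is stationary with mean alpha, as in equation (16).
rng = np.random.default_rng(2)
T, alpha = 10000, 0.5            # illustrative values

eps = rng.normal(size=T)
y = np.cumsum(alpha + eps)       # y_t = alpha*t + cumulated shocks, y_0 = 0

dy = np.diff(y)
print(dy.mean())                 # close to the drift alpha = 0.5
```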

2.2 Unit roots

Recall the AR(1) model:

y_t = α + ρ y_{t−1} + ε_t
(1 − ρL) y_t = α + ε_t   (17)


The characteristic equation is given by:

(1 − ρz) = 0   (18)

which is solved when z = 1/ρ. When ρ = 1 the root is z = 1 and the process is said to have a unit root. In this case, the first difference is stationary, as (1 − L) y_t = y_t − y_{t−1} = α + ε_t, which is stationary. Therefore, an I(1) series has 1 unit root and an I(0) series has no unit roots.

For higher order autoregressive processes it is possible to have more than one unit root. E.g.:

y_t = α + ρ_1 y_{t−1} + ρ_2 y_{t−2} + ε_t
(1 − ρ_1 L − ρ_2 L²) y_t = α + ε_t   (19)

Forming the characteristic equation:

1 − ρ_1 z − ρ_2 z² = 0
(1 − λ_1 z)(1 − λ_2 z) = 0   (20)

where λ_1 + λ_2 = ρ_1 and λ_1 λ_2 = −ρ_2.

The roots are given by z_1 = 1/λ_1 and z_2 = 1/λ_2. If these are both 1, there are 2 unit roots. Hence, the process needs to be differenced twice to be stationary and is denoted I(2).

Aside: see the problem set exercise for a case with two unit roots.

For the general AR(p) model we can write the model as:

y_t = α + Σ_{j=1}^p ρ_j y_{t−j} + ε_t
ρ(L) y_t = α + ε_t   (21)

where ρ(L) = 1 − Σ_{j=1}^p ρ_j L^j.

If a unit root exists, ρ(L) can be factored into:

ρ(L) = (1 − L) ρ*(L)   (22)

where ρ*(L) = 1 − Σ_{j=1}^{p−1} ρ*_j L^j,

such that (21) can be written as:

(1 − L) ρ*(L) y_t = α + ε_t
ρ*(L) Δy_t = α + ε_t
Δy_t = α + Σ_{j=1}^{p−1} ρ*_j Δy_{t−j} + ε_t   (23)

which is an AR(p−1) model in first differences.


Likewise, if there are 2 unit roots, the same factorization can be applied and the model would be an AR(p−2) process in second differences:

(1 − L)² ρ**(L) y_t = α + ε_t
Δ² y_t = α + Σ_{j=1}^{p−2} ρ**_j Δ² y_{t−j} + ε_t   (24)

where ρ**(L) = 1 − Σ_{j=1}^{p−2} ρ**_j L^j.

3 Difference stationary and trend stationary series

Many economic time series exhibit a positive trend. One characterization of such series is that they are random walks with drift. Hence, they are "difference-stationary", as taking the first difference results in a stationary process. In this case, the shocks have a persistent effect. An alternative explanation is that the data are "trend stationary". A trend stationary process has a deterministic trend, which accounts for the sustained increase in the series over time. In this case the shocks have a transitory effect as they do not continue through time but enter at only 1 point in time.

y_t = α + δt + u_t   Trend-stationary process   (25)

De-trending (removing the deterministic trend) results in a stationary process:

y_t − δt = α + u_t   (26)

Aside: We say that a trend stationary process is stationary despite its mean being a function of time because the stochastic properties of the trend stationary process are entirely determined by u_t, and u_t is clearly stationary.

The nesting model for both the trend stationary and difference stationary models is:

y_t = α + ρ y_{t−1} + δt + ε_t,   ε_t ~ WN(0, σ²)   (27)

The models nested include:

α ≠ 0; |ρ| < 1; δ ≠ 0: Deterministic trend with stationary AR(1) component
α ≠ 0; ρ = 1; δ ≠ 0: Random walk with drift and quadratic trend
α ≠ 0; ρ = 1; δ = 0: Random walk with drift
α ≠ 0; ρ = 0; δ ≠ 0: Deterministic trend
α = 0; ρ = 1; δ = 0: Pure random walk

It is important to establish whether data are trend stationary or difference stationary for

2 reasons. First, the persistence of shocks is very different between the 2 cases. Secondly, if

the data are difference stationary caution needs to be applied when analyzing non-stationary

data. Without appropriate transformations, spurious regressions could result. We shall look

at this phenomenon next.


3.1 Note on the quadratic trend in a unit root with deterministic trend model

If:

y_t = α + y_{t−1} + δt + ε_t ~ I(1)   (28)

then we know that the first difference will be I(0):

Δy_t = α + δt + ε_t ~ I(0)   (29)

i.e., a trend stationary process.

As Δy_t ~ I(0), this implies the cumulation of the series (i.e. the integral) would be I(1):

Σ_{i=1}^t Δy_i ~ I(1)   (30)

Therefore:

y_t = Σ_{i=1}^t (α + δi + ε_i)
    = Σ_{i=1}^t α + δ Σ_{i=1}^t i + Σ_{i=1}^t ε_i
    = αt + δ t(t+1)/2 + Σ_{i=1}^t ε_i   (31)

where t(t+1)/2 results from the summation of an arithmetic progression. Hence, the process has a quadratic trend in the level:

y_t = (α + δ/2) t + (δ/2) t² + Σ_{i=1}^t ε_i   (32)

This is intuitive as a random walk with drift has a trend and a trend stationary process has

a trend. Hence, a model with both components will have a quadratic trend. This is unlikely

to occur in practice, except perhaps locally.

4 Spurious regression

We are interested in investigating the relationship between dynamics and interdependence. In other words, what impact does non-stationarity have when looking at the relationships between variables? Non-stationary processes can lead to "nonsense regressions", where apparent correlations are found which are in fact false. The problem of nonsense regression was first analysed by Yule in 1926.


Let us assume that we have 2 unrelated I(1) series, y_t and x_t, where:

y_t = y_{t−1} + u_t   (33)
x_t = x_{t−1} + v_t   (34)

E(u_t v_s) = 0 ∀ t, s;   E(u_t u_{t−k}) = E(v_t v_{t−k}) = 0 ∀ k ≠ 0   (35)

y_0 = x_0 = 0

i.e. x_t and y_t are uncorrelated random walks.

We next specify the economic hypothesis. As an example let y be the cumulative number of murders in the UK and x be the population of the UK. We are interested to see if there is a relationship between the two variables:

y_t = β_0 + β_1 x_t + ε_t   (36)

where ∂y_t/∂x_t = β_1. Under conventional assumptions we would calculate the t-statistic for β_1, denoted t_{β_1}, and test the null hypothesis H_0: β_1 = 0. Under the null hypothesis, the conventional probability of the t-test of H_0 being significant is:

P(|t_{β_1 = 0}| ≥ 2.0 | H_0) = 0.05   (37)

This states that under the null hypothesis that y and x are unrelated we would expect to observe t-values greater than 2 in absolute value with a 5% probability. As the processes are unrelated we should find β̂_1 → 0 in probability.

However, when the data are non-stationary there is a balance problem when the null hypothesis is true. As y_t ~ I(1) and we assume ε_t ~ I(0), equation (36) can be well defined for non-zero β_1 as x_t ~ I(1). If β_1 = 0, the error term ε_t must be I(1) and there is a violation of the assumptions.

The asymptotic theory for the unit root case is very complicated. An alternative way of

examining the issues is through Monte Carlo - this is where we simulate the process many

times and plot the distributions that result.

4.1 Monte Carlo

Generate y_t and x_t from (33) and (34) for T = 100 observations. The errors are drawn using a random number generator for a standard normal distribution. The two series are independently generated consistent with (35). Then estimate equation (36) and record the estimated coefficient, standard error, t-statistic and R². This process is repeated M = 10,000 times, taking draws from the error distributions. This will give us 10,000 estimates of:

- the estimated coefficients β̂_{1,i}, i = 1, ..., M;
- the estimated coefficient standard errors SE[β̂_{1,i}], i = 1, ..., M;
- the estimated t-statistics t_{β̂_{1,i}}, i = 1, ..., M;
- the frequency of rejection of H_0: β_1 = 0;
- the sample correlation between y_t and x_t.
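The experiment above can be sketched in numpy as follows (with fewer replications than the note's M = 10,000, purely for speed):

```python
import numpy as np

# Nonsense regression Monte Carlo: regress one random walk on another,
# unrelated one, and record the t-statistic on the slope each time.
rng = np.random.default_rng(3)
T, M = 100, 2000

tstats = np.empty(M)
for i in range(M):
    y = rng.normal(size=T).cumsum()        # unrelated I(1) series, as (33)
    x = rng.normal(size=T).cumsum()        # unrelated I(1) series, as (34)
    X = np.column_stack([np.ones(T), x])   # regression (36) by OLS
    beta, ssr, *_ = np.linalg.lstsq(X, y, rcond=None)
    s2 = ssr[0] / (T - 2)
    se_b1 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    tstats[i] = beta[1] / se_b1

reject = (np.abs(tstats) > 2).mean()
print(reject)    # far above the nominal 5%
```

Even though the two series are independent by construction, |t| > 2 occurs in a large majority of replications.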

Let us first look at the estimate of β_1. The mean estimate of β_1 is given by:

β̄_1 = (1/M) Σ_{i=1}^M β̂_{1,i} = 0.008   (38)

with a standard error of 0.006. Hence, we cannot reject H_0: E[β̂_1] = 0.

The frequency distribution of β̂_1 is recorded in Figure 1. The shape of the distribution is reasonably normal but the sampling standard deviation, denoted MCSD, is large (MCSD = 0.62), i.e. the distribution is spread out, revealing that some of the estimates are large in absolute value.

Note the difference between MCSE and MCSD. The MCSD is what Monte Carlo reveals the correct value of the sampling standard deviation to be. The MCSE is the standard error of the Monte Carlo estimate of E[β̂_1]. For M = 10,000 replications, MCSE = MCSD/√M = MCSD/100.

The uncertainty is evident when looking at the distribution of the standard errors of the coefficient estimates. Figure 2 plots the distribution of SE[β̂_{1,i}] = (Σ_{t=1}^T x²_{t,i})^{−1/2} σ̂_i. There is a long right hand tail suggesting that there are some very large estimates of the standard error. This impacts on the t-test statistics, recorded in Figure 3. This figure also records the distribution for the same test on stationary data in the top panel for comparison. The shape is as we would expect for a t-distribution, but the draws are very spread out. The statistic calculated in each replication doesn't have the same variance as under the stationary case. Instead, very large values of the t-statistic are likely. For:

P({Reject H_0: β_1 = 0} | H_0) = 0.05   (39)

instead of the critical values being ±2, the values are ±14.8:

P(|t_{β_1 = 0}| ≥ 14.8 | H_0) ≈ 0.05   (40)

This is called a nonsense regression. It is easy to observe a very large t-statistic showing that the 2 series have a high correlation, when in fact they are unrelated.

The nonsense regression phenomenon does not disappear as T increases. Figure 4 records the estimated coefficients and standard errors recursively for T = 20, 21, ..., 100. The coefficient estimates are not biased but the standard deviation of the estimates is very large and does not decrease with T. The rejection frequency (which should be 5% under the null) increases with the sample size to 76%.


Figure 1: Frequency distribution of β̂_1

Figure 2: Frequency distribution of SE[β̂_1] (random walks)

For non-stationary data, β̂_1 does not converge to 0, but instead converges to a random variable:

β̂_1 = (T^{−2} Σ_{t=1}^T (x_t − x̄)²)^{−1} (T^{−2} Σ_{t=1}^T (x_t − x̄)(y_t − ȳ))   (41)

The numerator and denominator converge weakly to functionals of Brownian motion. See

Hamilton (1994) for further reading.

The reason for the over-rejection of the null is that when the data are non-stationary,

two series can be highly correlated merely because they wander in the same direction.


Figure 3: Frequency distribution of the t-test of H_0: β_1 = 0 (top panel: white noise; bottom panel: random walks)

Figure 4: Recursive estimates of β̂_1 (with ±2ESE and ±2MCSD bands) and rejection frequency, for the nonsense regression simulation

Figure 5 records the R²s for a pair of I(0) and I(1) processes. When both variables are I(1) the distribution is like a semi-ellipse, with excess frequency at both ends of the distribution. Hence, values of R² well away from 0 are more likely, resulting in over-rejection of the null.

As x_t ~ I(1), the sample variance is a function of T. Hence, the regression is imbalanced: under the null ε_t ~ I(1) and the sample variance will also be a function of T. β̂_0 and β̂_1 minimise σ̂², and V[β̂_1] = (Σ_{t=1}^T (x_t − x̄)²)^{−1} σ̂², which is a function of T.


Figure 5: Distribution of R² for unrelated I(0) and I(1) series

Therefore the t-test, t_{β_1} = β̂_1 / SE[β̂_1], diverges as T → ∞.

The nonsense regression phenomenon has serious implications for modelling economic

time-series. We need methods to test for non-stationary data, and then we need to think

about how to model non-stationary data in order to avoid this nonsense regression problem.

5 Testing for unit roots: Dickey-Fuller tests

We will first consider the simplest model and then move on to more complex models. Consider the AR(1) with no intercept:

y_t = ρ y_{t−1} + ε_t,   ε_t ~ WN(0, σ²)   (42)

We can rewrite this model as:

Δy_t = π y_{t−1} + ε_t   (43)

where π = ρ − 1. If the process has a unit root, ρ = 1 and π = 0. Hence, the test for a unit root would be a t-test of H_0: π = 0.
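The DF statistic is just the OLS t-ratio on y_{t−1} in regression (43); a minimal numpy sketch (the two test series are my own illustrative examples):

```python
import numpy as np

# Dickey-Fuller t-statistic: OLS t-ratio on y_{t-1} in the no-constant
# regression dy_t = pi * y_{t-1} + e_t.
def df_tstat(y):
    dy, ylag = np.diff(y), y[:-1]
    pi_hat = (ylag @ dy) / (ylag @ ylag)
    resid = dy - pi_hat * ylag
    s2 = (resid @ resid) / (len(dy) - 1)
    return pi_hat / np.sqrt(s2 / (ylag @ ylag))

rng = np.random.default_rng(4)
rw = rng.normal(size=200).cumsum()       # true unit root (pi = 0)
ar = np.zeros(200)
for t in range(1, 200):                  # stationary AR(1), rho = 0.5
    ar[t] = 0.5 * ar[t - 1] + rng.normal()

print(df_tstat(rw))   # a draw from the DF distribution, usually modest
print(df_tstat(ar))   # strongly negative: evidence against a unit root
```

The statistic for the stationary series is far out in the left tail, while the random walk gives an unremarkable value; the next subsection explains where the critical values for this comparison come from.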

5.1 Simulating the Dickey-Fuller distribution

The t-test of H_0: π = 0 doesn't have a standard distribution. Instead the critical values are simulated.

1. Generate the data of a given sample size T according to a specified DGP. For the pure random walk generate:

Δy_t = ε_t,   ε_t ~ NID(0, 1)   (44)

with y_0 = 0. The values of ε_t for t = 1, ..., T are taken from a random number generator with a distribution as specified (here a standard normal). Repeat the process M times to generate M samples of size T.

2. Estimate a regression model for each sample generated in step 1. The choice of regression model is important: it should match the DGP. So in this example the model estimated would not include a constant:

Δy_t = π y_{t−1} + ε_t   (45)

For each replication π̂ and its t-statistic are recorded; there is a distribution of π̂ due to random sampling. The t-statistic is plotted and we can calculate the critical values from this distribution when the DGP is Δy_t = ε_t and the maintained regression is Δy_t = π y_{t−1} + ε_t. The usual choice of significance level is used (e.g. 5% or 1% are most common).
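These two steps can be sketched directly in numpy (T and M are my own illustrative choices):

```python
import numpy as np

# Simulate the DF distribution for the no-constant case: generate random
# walks under the null (44), run regression (45), collect the t-ratios.
rng = np.random.default_rng(5)
T, M = 100, 5000

tstats = np.empty(M)
for i in range(M):
    y = rng.normal(size=T).cumsum()      # DGP: pure random walk, y_0 = 0
    dy, ylag = np.diff(y), y[:-1]
    pi_hat = (ylag @ dy) / (ylag @ ylag)
    resid = dy - pi_hat * ylag
    se = np.sqrt(resid @ resid / (len(dy) - 1) / (ylag @ ylag))
    tstats[i] = pi_hat / se

cv5 = np.quantile(tstats, 0.05)
print(cv5)    # well below the -1.64 that a N(0,1) would give
```

The simulated 5% quantile sits distinctly to the left of the standard normal's, which is exactly the shift visible in Figure 6.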

Note that the critical values will depend on the choice of null and alternative hypothesis. The null hypothesis is of a unit root, but the alternative could be a 1 or 2-sided test. If H_a: π ≠ 0, we would also be testing for an explosive process, as π > 0 implies ρ > 1. This is not chosen in general as an explosive process is unstable and is unlikely to occur in economic data. Hence, we can maximize power by computing a 1-sided test where H_a: π < 0. This tests the null of a unit root against a stationary process. The critical values will be negative and when sample values are more negative than the critical values we have a rejection of the null hypothesis in favour of a stationary process.

The null and maintained regressions that we have looked at exclude an intercept. If we wish to extend the model to test for a random walk with drift we would have the maintained regression:

Δy_t = α + π y_{t−1} + ε_t   (46)

where the null hypothesis is H_0: π = 0, i.e. a unit root, and the alternative hypothesis is H_a: π < 0.

Under the alternative, the process is stationary and will have a non-zero mean given by E[y_t] = α/(1 − ρ). This test will have a different distribution to the test for a pure random walk. This is because the test statistics depend on deterministic terms. Again, test statistics are obtained by simulation, but depend on the value of α in the DGP. We think of α as a nuisance parameter for inference on π, as the critical values for π depend on the generally unknown value of α.

An alternative to the t-test is to compute a joint F-test of significance of both α and π. In this case the joint null hypothesis is H_0: α = 0; π = 0. Under the null the series is a pure random walk without drift. Under the alternative there are 3 possibilities:

H_a: α ≠ 0; π = 0 ⇒ Random walk with drift
H_a: α ≠ 0; π ≠ 0 ⇒ Stationary AR process with non-zero mean
H_a: α = 0; π ≠ 0 ⇒ Stationary AR process with zero mean

We would not anticipate a random walk with drift, as if there was a noticeable drift in the series we would include a trend in the model (see next section). The F-test is calculated as standard but different distributions apply and these are simulated. A drawback of the joint test is that it is 2-sided and therefore does not maximise power in the likely direction of departure from the null hypothesis.

We can generalize the test further to include a trend, as we are concerned with testing between a trend stationary and difference stationary process. In the models we have looked at there is no mechanism for generating the trend under the alternative of stationarity. Hence, the maintained regression would be:

Δy_t = α + π y_{t−1} + δt + ε_t   (47)

where the null hypothesis is H_0: π = 0, i.e. a unit root, and the alternative hypothesis is H_a: π < 0.

Again, a different distribution is used to obtain the critical values and is simulated by

computer packages.

If a joint F-test of significance was computed the null hypothesis would be H_0: π = 0; δ = 0, i.e. a random walk with drift and no deterministic trend. Under the alternative there are 3 possibilities:

H_a: π = 0; δ ≠ 0 ⇒ Random walk with drift and deterministic trend (i.e. a quadratic trend)
H_a: π ≠ 0; δ ≠ 0 ⇒ Stationary AR process with deterministic trend
H_a: π ≠ 0; δ = 0 ⇒ Stationary AR process with no deterministic trend

A quadratic trend is unlikely, and would be observable in the data, ruling out the first hypothesis. The most likely alternative is a difference stationary process (as a trend was observed in the data). If the third hypothesis were true, no trend would be observed and you would apply the DF test to a model with just an intercept and not a trend.

Tests often have low power when the null and the alternative are close (i.e. a stationary AR process with a very high persistence parameter as opposed to a unit root). Various alternative tests have been proposed in the literature to try and improve on the power, although none are uniformly more powerful. This suggests that caution should be used against too rigid an application of the unit root test. In practice there is considerable uncertainty in determining the exact nature of the DGP. This is further complicated when there are structural breaks in the data. Hence, further evidence is often sought before being firm on the null and alternative hypothesis at the margin.

The distributions of the t-statistic vary depending on whether there is no intercept, an

intercept, or an intercept and trend in the model.

Summary

Three cases to consider:


Test model                           Hypothesis
Δy_t = π y_{t−1} + ε_t               H_0: π = 0
Δy_t = α + π y_{t−1} + ε_t           H_0: π = 0, or H_0: α = 0; π = 0
Δy_t = α + π y_{t−1} + δt + ε_t      H_0: π = 0, or H_0: π = 0; δ = 0

where the critical values depend on the specification of the null and alternative hypotheses. There are different distributions of the test statistic depending on the deterministic terms. A one-sided test is usually used to maximise power:

H_0: π = 0 versus H_a: π < 0   (48)

For examples of the critical values:

Distribution   2.5%    5%      10%     50%   90%    95%    97.5%
N(0, 1)        -1.96   -1.64   -1.28   0     1.28   1.64   1.96

with the corresponding rows for the DF distributions lying to the left of these. Figure 6 records the distribution of the DF tests with no intercept and intercept against a standard normal. The distributions are more negative. Also observe that for a given significance level, the critical values become more negative as more deterministic terms are included.

Figure 6: Distributions of the DF test against N(0,1): no deterministic terms (top panel) and intercept (bottom panel)


6 Augmented Dickey-Fuller test

So far we have considered an AR(1) process. However, this is a simple model that may not characterize actual economic data. A simple generalization is the AR(p) model:

y_t = α + ρ_1 y_{t−1} + ρ_2 y_{t−2} + ... + ρ_p y_{t−p} + ε_t   (49)

If an AR(1) model were fitted, say:

y_t = α + ρ_1 y_{t−1} + v_t   (50)

then:

v_t = ρ_2 y_{t−2} + ... + ρ_p y_{t−p} + ε_t   (51)

and the autocorrelations of v_t and v_{t−k} will be non-zero. The residuals will be autocorrelated and we fail the assumptions of the classical regression model.

The strategy for selecting the lag of an AR process is to start from a long lag and test downwards. This is called a "general-to-specific" strategy. You would specify a long lag and test the significance of the longest lag and compute the diagnostic tests. If the longest lag is insignificant, and the model passes the diagnostic tests, you would delete the lag and re-estimate the model with 1 fewer lag. Again you would test for the significance of the longest lag and check the diagnostic tests. You would repeat the procedure until either the longest lag was significant or deleting the lag led to a failure of a diagnostic test.

To compute the Augmented Dickey-Fuller test, consider an AR(2) process:

y_t = α + ρ_1 y_{t−1} + ρ_2 y_{t−2} + ε_t   (52)

We can rewrite this as:

y_t − y_{t−1} = α + ρ_1 y_{t−1} − y_{t−1} + [ρ_2 y_{t−1} − ρ_2 y_{t−1}] + ρ_2 y_{t−2} + ε_t
Δy_t = α + (ρ_1 + ρ_2 − 1) y_{t−1} − ρ_2 Δy_{t−1} + ε_t
Δy_t = α + π y_{t−1} + γ Δy_{t−1} + ε_t   (53)

where:

π = ρ_1 + ρ_2 − 1
γ = −ρ_2   (54)

The test statistic is as before a t-test on π, where H_0: π = 0 is a unit root and H_a: π < 0 is a stationary process. This is called an Augmented Dickey-Fuller test as the test is augmented by Δy_{t−1} to mop up any residual autocorrelation and pre-whiten the residuals.

The distributions of the test statistics are as before. They depend on whether an intercept and trend is included in the model but they don't depend on the augmentation of the lagged differences.

We can generalize the model to an AR(3) process:

y_t = α + ρ_1 y_{t−1} + ρ_2 y_{t−2} + ρ_3 y_{t−3} + ε_t   (55)


We can rewrite this as:

y_t − y_{t−1} = α + ρ_1 y_{t−1} − y_{t−1} + [ρ_2 y_{t−1} − ρ_2 y_{t−1}] + [ρ_3 y_{t−1} − ρ_3 y_{t−1}] + ρ_2 y_{t−2} + ρ_3 y_{t−3} + [ρ_3 y_{t−2} − ρ_3 y_{t−2}] + ε_t
Δy_t = α + (ρ_1 + ρ_2 + ρ_3 − 1) y_{t−1} − (ρ_2 + ρ_3) Δy_{t−1} − ρ_3 Δy_{t−2} + ε_t
Δy_t = α + π y_{t−1} + γ_1 Δy_{t−1} + γ_2 Δy_{t−2} + ε_t   (56)

where:

π = ρ_1 + ρ_2 + ρ_3 − 1
γ_1 = −(ρ_2 + ρ_3)
γ_2 = −ρ_3   (57)
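The reparameterisation in (56)-(57) is an exact algebraic identity, which a quick numerical check confirms (the coefficient values below are arbitrary choices of mine):

```python
import numpy as np

# Check that the ADF form (56)-(57) reproduces the AR(3) level equation (55).
rng = np.random.default_rng(6)
alpha, r1, r2, r3 = 0.2, 0.5, 0.2, 0.1   # arbitrary AR(3) coefficients

T = 50
y = np.zeros(T)
eps = rng.normal(size=T)
for t in range(3, T):
    y[t] = alpha + r1*y[t-1] + r2*y[t-2] + r3*y[t-3] + eps[t]

pi, g1, g2 = r1 + r2 + r3 - 1, -(r2 + r3), -r3     # equation (57)
dy = np.diff(y)
lhs = dy[2:]                                        # dy_t for t = 3..T-1
rhs = alpha + pi*y[2:-1] + g1*dy[1:-1] + g2*dy[:-2] + eps[3:]
identical = np.allclose(lhs, rhs)
print(identical)    # True: both forms give the same dy_t
```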

The general AR(p) model with intercept and trend is:

y_t = α + Σ_{j=1}^p ρ_j y_{t−j} + δt + ε_t
Δy_t = α + π y_{t−1} + Σ_{j=1}^{p−1} γ_j Δy_{t−j} + δt + ε_t   (58)

where

π = ρ_1 + ρ_2 + ... + ρ_p − 1
γ_1 = −(ρ_2 + ... + ρ_p)
⋮
γ_{p−1} = −ρ_p   (59)

6.1 A framework for unit root testing

1. Plot the data. Does it look stationary or non-stationary? Is there a trend evident? Is the mean non-zero?

2. Plot the ACF and PACF. How persistent is the data? How many lags look likely to be required to ensure the residual is white noise?

3. Start with a long lag and estimate an AR(p) model. Check the diagnostic tests. Reduce the model by eliminating the longest lag if it is statistically insignificant, whilst ensuring the residuals are not autocorrelated. Stop when the longest lag is significant, or deletion of the longest lag results in an autocorrelated error. This will determine the lag order for your AR(p) model.

4. Transform the model to the maintained Dickey-Fuller regression, i.e. the model in differences with the lagged level.


5. Decide whether to include an intercept and trend based on your graphical analysis.

If the data exhibit a trend, you would test a random walk with drift against a trend

stationary process. Hence, include an intercept and trend in the model. If there is no

trend in the data but the mean is non-zero, include an intercept.

6. Compute the ADF statistic and compare to the Dickey-Fuller distribution. Recall that the null hypothesis is non-stationarity against the alternative that the process is stationary.

7. If you cannot reject the null hypothesis, you would conclude that the data are likely to be non-stationary. The data could be integrated of order 1 or higher. To test whether the data are integrated of a higher order, apply the same test to the differences of the process, i.e.

Δy_t = α + Σ_{j=1}^p γ_j Δy_{t−j} + δt + ε_t
Δ²y_t = α + π Δy_{t−1} + Σ_{j=1}^{p−1} γ*_j Δ²y_{t−j} + δt + ε_t   (60)

The null hypothesis, H_0: π = 0, is for Δy_t to have a unit root, against the alternative H_a: π < 0, such that Δy_t is stationary. Hence, under the null, y_t is I(2) and under the alternative y_t is I(1).

8. This procedure can be applied recursively. If you nd the process to be non-stationary,

test for I(1) versus I(2), etc.
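Steps 3-7 of the framework can be sketched with a hand-rolled ADF regression in numpy. This is a teaching sketch: the lag length, sample size and intercept-only specification are my own choices, and in practice one would use a tested implementation such as adfuller in statsmodels.

```python
import numpy as np

# ADF regression (58), intercept-only case; returns the t-ratio on y_{t-1}.
def adf_tstat(y, lags=1):
    dy = np.diff(y)
    n = len(dy)
    cols = [y[:-1][lags:]]                  # y_{t-1}
    for j in range(1, lags + 1):            # lagged differences
        cols.append(dy[lags - j:n - j])
    cols.append(np.ones(n - lags))          # intercept
    X = np.column_stack(cols)
    ydep = dy[lags:]
    beta, *_ = np.linalg.lstsq(X, ydep, rcond=None)
    resid = ydep - X @ beta
    s2 = resid @ resid / (len(ydep) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return beta[0] / se

rng = np.random.default_rng(7)
y = rng.normal(size=300).cumsum()           # an I(1) series

t_level = adf_tstat(y, lags=2)              # compare to DF critical values
t_diff = adf_tstat(np.diff(y), lags=2)      # the I(1)-versus-I(2) step
print(t_level, t_diff)
```

Here the level cannot be rejected as a unit root while the first difference clearly can, so the recursive procedure would classify the series as I(1).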

6.2 Structural Breaks

One problem with testing for unit roots is that a structural break can lead to incorrect inference regarding unit roots. In this case an I(0) process with a break is difficult to distinguish from an I(1) process.

Consider an artificial process in which we generate data given by:

y_t = 0.5 y_{t−1} + ε_t         for t = 1, ..., 50
y_t = 4 + 0.5 y_{t−1} + ε_t     for t = 51, ..., 100   (61)

where ε_t ~ NID(0, 1) for t = 1, ..., T.

The process is a stationary AR process but with a mean shift at time 50.

If we fit an AR(1) model to the full sample we find:

ŷ_t = 0.96 y_{t−1} + 0.26
      (0.03)        (0.18)

with standard errors in parentheses. There is a coefficient of 0.96 on the lagged dependent variable. This is a near unit root, and a statistical test would not reject the null hypothesis of a unit root. This is clearly incorrect.


The reason for such a high coefficient is that the process needs a mechanism for getting up to the new mean and the only way to do this is to put a near unit coefficient on the lagged dependent variable. Figure 7 shows the properties of the estimated model. It is clearly misspecified and incorrect inference would result.
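The broken process (61) and the full-sample fit can be reproduced with a short numpy sketch (my own random draw, so the estimate differs slightly from the 0.96 reported above):

```python
import numpy as np

# Stationary AR(1) with rho = 0.5 and a mean shift at t = 50, as in (61),
# then a single AR(1) fitted over the full sample.
rng = np.random.default_rng(8)
T = 100
y = np.zeros(T)
for t in range(1, T):
    mu = 0.0 if t <= 50 else 4.0
    y[t] = mu + 0.5 * y[t - 1] + rng.normal()

X = np.column_stack([np.ones(T - 1), y[:-1]])
beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
rho_hat = beta[1]
print(rho_hat)    # near 1, even though the true rho is 0.5
```

The pooled regression inflates the lag coefficient because only a near-unit root can carry the fitted values from the old mean to the new one.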

Figure 7: Properties of the estimated AR(1) model: actual and fitted values, cross-plot of fitted against actual, scaled residuals, and residual ACF and PACF

7 Conclusion

Many economic time series have unit roots. A general rule of thumb is that often nominal

variables are I(2), real variables are I(1), and growth rates of real variables are I(0). We can

test for unit roots and next we shall consider how to model them.

8 References

Banerjee, A., Dolado, J. J., Galbraith, J. W., and Hendry, D. F. (1993). Co-integration, Error Correction and the Econometric Analysis of Non-Stationary Data. Oxford: Oxford University Press.

Hamilton, J. D. (1994). Time Series Analysis. Princeton: Princeton University Press.

Hendry, D. F., and Doornik, J. A. (2001). Empirical Econometric Modelling using PcGive: Volume I, 3rd edn. London: Timberlake Consultants Press.

Patterson, K. (2000). An Introduction to Applied Econometrics: A Time Series Approach. UK: Palgrave Macmillan.

