Dougherty
Introduction to Econometrics,
5th edition
Chapter 13: Introduction to
Nonstationary Time Series
[Figure: a realization of Xt = b2Xt–1 + et with b2 = 0.8, et ~ N(0, 1), and X0 = 0, plotted for t = 0 to 50; vertical axis from –10 to 10.]
In this slideshow we will define what is meant by a stationary time series process.
STATIONARY PROCESSES
We will begin with a very simple example, the AR(1) process Xt = b2Xt–1 + et, where |b2| < 1
and the innovation et is iid (independently and identically distributed), drawn from a normal
distribution with zero mean and finite variance.
As noted in Chapter 11, we make a distinction between the potential values {X1, ..., XT},
before the sample is generated, and a realization of actual values {x1, ..., xT}.
Statisticians write the potential values in upper case, and the actual values of a particular
realization in lower case, to emphasize the distinction.
The figure shows an example of a realization starting with X0 = 0, with b2 = 0.8 and the
innovation et being drawn randomly for each time period from a normal distribution with
zero mean and unit variance.
Because history cannot repeat itself, we will only ever see one realization of a time series
process. Nevertheless, it is meaningful to ask whether we can determine the potential
distribution of X at time t, given information at some earlier period, for example, time 0.
As usual, there are two approaches to answering this question: mathematical analysis and
simulation. We shall do both for the time series process represented by the figure, starting
with a simulation.
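The simulation can be sketched in a few lines of Python (this sketch is not part of the original text; the function and parameter names are illustrative). It generates many independent realizations of the process with b2 = 0.8 and X0 = 0, so that the cross-section of values at any time t approximates the distribution of Xt:

```python
import random

def simulate_ar1(beta2=0.8, x0=0.0, n_periods=50, n_realizations=1000, seed=1):
    """Generate independent realizations of X_t = beta2 * X_{t-1} + e_t,
    with e_t ~ N(0, 1), each realization starting from the same value x0."""
    random.seed(seed)
    paths = []
    for _ in range(n_realizations):
        x = x0
        path = [x]
        for _ in range(n_periods):
            x = beta2 * x + random.gauss(0.0, 1.0)  # innovation drawn each period
            path.append(x)
        paths.append(path)
    return paths

# Cross-sectional mean and variance at t = 20 should be close to the
# theoretical limiting values 0 and 1/(1 - 0.8**2) ≈ 2.78.
paths = simulate_ar1()
x20 = [p[20] for p in paths]
mean20 = sum(x20) / len(x20)
var20 = sum((v - mean20) ** 2 for v in x20) / len(x20)
```

With 1,000 realizations the sample moments at t = 20 land close to the theoretical values, illustrating that the initial effect of the common starting point has worn off.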
[Figure: 50 realizations of Xt = 0.8Xt–1 + et, et ~ N(0, 1), all starting at X0 = 0; vertical axis from –10 to 10, horizontal axis t = 0 to 50.]
For the first few periods, the distribution of the realizations at time t is affected by the fact
that they have a common starting point of 0. However, the initial effect soon becomes
unimportant and the distribution becomes stable from one period to the next.
[Figure: histogram of the values of X20 across the realizations, with unit-width bins from –5 to 5; relative frequencies up to about 0.30.]
The figure presents a histogram of the values of X20. Apart from the first few time points,
histograms for other time points would look similar. If the number of realizations were
increased, each histogram would converge to the normal distribution shown in Figure 13.3.
The AR(1) process Xt = b2Xt–1 + et, with |b2| < 1, is said to be stationary, the adjective
referring, not to Xt itself, but to the potential distribution of its realizations, ignoring
transitory initial effects.
Xt itself changes from period to period, but the potential distribution of its realizations at
any given time point does not.
A process is said to be weakly stationary if it satisfies three conditions:
1. The population mean of the distribution is independent of time. (In this example, it is
zero.)
2. The population variance of the distribution is independent of time.
3. The population covariance between its values at any two time points depends only on
the distance between those points, and not on time.
There is also a strong definition of stationarity, which requires the entire joint distribution
of the process to be independent of time. Our analysis will be unaffected by using the weak
definition, and in any case the distinction disappears when, as in the present example, the
limiting distribution is normal.
Xt = b2Xt–1 + et,   |b2| < 1
Xt–1 = b2Xt–2 + et–1
We will check that the process represented by Xt = b2Xt–1 + et, with |b2| < 1, satisfies the three
conditions for stationarity. First, if the process is valid for time period t, it is also valid for
time period t – 1.
Xt = b2²Xt–2 + b2et–1 + et
Substituting into the original model, one has Xt in terms of Xt–2, et, and et–1.
Xt = b2^t X0 + b2^{t–1}e1 + ... + b2²et–2 + b2et–1 + et
E(Xt) = b2^t X0
Lagging and substituting t – 1 times, one has Xt in terms of X0 and all the innovations e1, ...,
et from period 1 to period t.
Hence E(Xt) = b2tX0 since the expectation of every innovation is zero. In the special case X0
= 0, we then have E(Xt) = 0. Since the expectation is not a function of time, the first
condition is satisfied.
If X0 is nonzero, b2^t tends to zero as t becomes large, since |b2| < 1. Hence b2^t X0 will tend to
zero and the first condition will still be satisfied, apart from initial effects.
Next, we have to show that the variance is also not a function of time. The first term in the
expression for Xt, b2^t X0, can be dropped when taking the variance because it is a constant,
using variance rule 4 from the Review chapter.
The variance expression can be decomposed as the sum of the variances, using variance
rule 1 from the Review chapter and the fact that the covariances are all zero. (The
innovations are assumed to be generated independently.)
In the third line, the constants are squared when taken out of the variance expressions,
using variance rule 2.
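Putting these steps together, the algebra the slides describe can be written out (a reconstruction, writing b2 as β2, et as εt, and the innovation variance as σε²):

```latex
\begin{aligned}
\operatorname{var}(X_t)
  &= \operatorname{var}\!\left(\beta_2^{t}X_0 + \beta_2^{t-1}\varepsilon_1 + \dots + \beta_2\varepsilon_{t-1} + \varepsilon_t\right) \\
  &= \operatorname{var}\!\left(\beta_2^{t-1}\varepsilon_1\right) + \dots + \operatorname{var}\!\left(\beta_2\varepsilon_{t-1}\right) + \operatorname{var}(\varepsilon_t) \\
  &= \left(\beta_2^{2(t-1)} + \dots + \beta_2^{2} + 1\right)\sigma_\varepsilon^{2}
   = \frac{1-\beta_2^{2t}}{1-\beta_2^{2}}\,\sigma_\varepsilon^{2}.
\end{aligned}
```

The last step sums the finite geometric series with ratio β2².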
var(Xt) = (1 – b2^{2t}) σε² / (1 – b2²)
Given that |b2| < 1, b2^{2t} tends to zero as t becomes large. Thus, ignoring transitory initial
effects, the variance tends to a limit, σε²/(1 – b2²), that is independent of time.
Xt = b2Xt–1 + et
Xt+s = b2Xt+s–1 + et+s
It remains for us to demonstrate that the covariance between Xt and Xt+s is independent of
time. If the relationship is valid for time period t, it is also valid for time period t+s.
Xt+s = b2²Xt+s–2 + b2et+s–1 + et+s
Lagging and substituting, we can express Xt+s in terms of Xt+s–2 and the innovations et+s–1 and
et+s.
Xt+s = b2^s Xt + b2^{s–1}et+1 + ... + b2²et+s–2 + b2et+s–1 + et+s
Lagging and substituting s times, we can express Xt+s in terms of Xt and the innovations et+1,
..., et+s.
cov(Xt, Xt+s) = cov(Xt, b2^s Xt) + cov(Xt, b2^{s–1}et+1 + ... + b2²et+s–2 + b2et+s–1 + et+s)
= b2^s var(Xt)
Then the covariance between Xt and Xt+s is given by the expression shown. The second
term on the right side is zero because Xt is independent of the innovations after time t.
The first term can be written b2^s var(Xt). As we have just seen, var(Xt) is independent of t,
apart from a transitory initial effect. Hence the third condition for stationarity is also
satisfied.
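As a rough numerical check (a sketch, not part of the original text; the function and parameter names are illustrative), one can estimate cov(Xt, Xt+s) across simulated realizations and compare it with b2^s var(Xt), verifying that it depends on s but not on t:

```python
import random

def simulate_ar1_ensemble(beta2=0.8, x0=0.0, n_periods=30, n_realizations=2000, seed=42):
    """Independent realizations of X_t = beta2 * X_{t-1} + e_t, e_t ~ N(0, 1)."""
    random.seed(seed)
    paths = []
    for _ in range(n_realizations):
        x = x0
        path = [x]
        for _ in range(n_periods):
            x = beta2 * x + random.gauss(0.0, 1.0)
            path.append(x)
        paths.append(path)
    return paths

def ensemble_cov(paths, t, s):
    """Covariance of X_t and X_{t+s}, estimated across the realizations."""
    a = [p[t] for p in paths]
    b = [p[t + s] for p in paths]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

paths = simulate_ar1_ensemble()
# Theoretical value: 0.8**5 * 1/(1 - 0.8**2) ≈ 0.91, whatever the value of t.
cov_20 = ensemble_cov(paths, 20, 5)
cov_15 = ensemble_cov(paths, 15, 5)
```

Both estimates should be close to 0.91 and close to each other, in line with the third condition.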
Xt = b1 + b2Xt–1 + et
Suppose next that the process includes an intercept b1. How does this affect its
properties? Is it still stationary?
Xt = b1 + b2b1 + b2²Xt–2 + b2et–1 + et
Lagging and substituting, we can express Xt in terms of Xt–2 and the innovations et and et–1.
Xt = b2^t X0 + b1(1 + b2 + ... + b2^{t–1}) + b2^{t–1}e1 + ... + b2²et–2 + b2et–1 + et
= b2^t X0 + b1(1 – b2^t)/(1 – b2) + b2^{t–1}e1 + ... + b2²et–2 + b2et–1 + et
Lagging and substituting t times, we can express Xt in terms of X0 and the innovations e1, ...,
et.
E(Xt) = b2^t X0 + b1(1 – b2^t)/(1 – b2)
Taking expectations, E(Xt) tends to b1/(1 – b2) since the term b2^t tends to zero. Thus the
expectation is now nonzero, but it remains independent of time.
var(Xt) = var(b2^t X0 + b1(1 – b2^t)/(1 – b2) + b2^{t–1}e1 + ... + b2²et–2 + b2et–1 + et)
= var(b2^{t–1}e1 + ... + b2²et–2 + b2et–1 + et)
= (1 – b2^{2t}) σε² / (1 – b2²)
2
The variance is unaffected by the addition of a constant in the expression for Xt (variance
rule 4). Thus it remains independent of time, apart from initial effects.
Xt = b1 + b2Xt–1 + et
Xt+s = b1 + b2Xt+s–1 + et+s
Finally, we need to consider the covariance of Xt and Xt+s. If the relationship is valid for time
period t, it is also valid for time period t+s.
Xt+s = b1 + b2b1 + b2²Xt+s–2 + b2et+s–1 + et+s
Lagging and substituting, we can express Xt+s in terms of Xt+s–2, the innovations et+s–1 and
et+s, and a term involving b1.
Xt+s = b1(1 + b2 + ... + b2^{s–1}) + b2^s Xt + b2^{s–1}et+1 + ... + b2²et+s–2 + b2et+s–1 + et+s
Lagging and substituting s times, we can express Xt+s in terms of Xt, the innovations et+1, ...,
et+s, and a term involving b1.
The covariance of Xt and Xt+s is not affected by the inclusion of this term because it is a
constant. Hence the covariance is the same as before and remains independent of t.
Xt = b1 + b2Xt–1 + et
E(Xt) = b2^t X0 + b1(1 – b2^t)/(1 – b2)
var(Xt) = (1 – b2^{2t}) σε² / (1 – b2²)
2
We have seen that the process Xt = b1 + b2Xt–1 + et has a limiting ensemble distribution with
mean b1/(1 – b2) and variance σε²/(1 – b2²). However, the process exhibits transient time-
dependent initial effects associated with the starting point X0.
X0 = b1/(1 – b2) + e0/√(1 – b2²)
2
We can get rid of the transient effects by determining X0 as a random draw from the
ensemble distribution. e0 is a random draw from the distribution of e at time zero.
(Checking that X0 has the ensemble mean and variance is left as an exercise.)
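This initialization can be sketched in Python (not part of the original text; the function and parameter names are illustrative). Drawing X0 = b1/(1 – b2) + e0/√(1 – b2²) makes the ensemble mean and variance hold from t = 0 onwards:

```python
import math
import random

def simulate_stationary_ar1(beta1=1.0, beta2=0.8, n_periods=50,
                            n_realizations=2000, seed=7):
    """Realizations of X_t = beta1 + beta2 * X_{t-1} + e_t, e_t ~ N(0, 1),
    with X_0 drawn from the ensemble distribution, so there are no
    transient initial effects."""
    random.seed(seed)
    mean0 = beta1 / (1.0 - beta2)               # ensemble mean, here 5
    scale0 = 1.0 / math.sqrt(1.0 - beta2 ** 2)  # sd of X_0 (sigma_e = 1)
    paths = []
    for _ in range(n_realizations):
        x = mean0 + scale0 * random.gauss(0.0, 1.0)  # X_0 from ensemble distribution
        path = [x]
        for _ in range(n_periods):
            x = beta1 + beta2 * x + random.gauss(0.0, 1.0)
            path.append(x)
        paths.append(path)
    return paths

paths = simulate_stationary_ar1()
```

The cross-sectional mean and variance should be close to 5 and 1/(1 – 0.8²) ≈ 2.78 at every t, including t = 0.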
2
If we determine X0 in this way, the expectation and variance of the process both become
strictly independent of time.
Xt = b2^t (b1/(1 – b2) + e0/√(1 – b2²)) + b1(1 – b2^t)/(1 – b2) + b2^{t–1}e1 + ... + b2et–1 + et
Substituting for X0, Xt is equal to b1/(1 – b2) plus a linear combination of the innovations
e0, ..., et.
Xt = b1/(1 – b2) + b2^t e0/√(1 – b2²) + b2^{t–1}e1 + ... + b2et–1 + et
E(Xt) = b1/(1 – b2)
var(Xt) = var(b2^t e0/√(1 – b2²) + b2^{t–1}e1 + ... + b2et–1 + et)
= b2^{2t} σε²/(1 – b2²) + (1 – b2^{2t}) σε²/(1 – b2²)
= σε²/(1 – b2²)
The right side of the equation can be decomposed as the sum of the variances because all
the covariances are zero, the innovations being generated independently. As always
(variance rule 2), the multiplicative constants are squared in the decomposition.
The sum of the variances attributable to the innovations e1, ..., et has already been derived
above. Taking account of the variance of e0, the total is now strictly independent of time.
[Figure: 50 realizations of Xt = 1.0 + 0.8Xt–1 + et, et ~ N(0, 1), with X0 drawn from the ensemble distribution; vertical axis from –5 to 15, horizontal axis t = 0 to 50.]
The figure shows 50 realizations with X0 treated in this way. This is the counterpart of the
ensemble distribution shown near the beginning of this sequence, with b2 = 0.8 as in that
figure. As can be seen, the initial effects have disappeared.
The other difference in the figures results from the inclusion of a nonzero intercept. In the
earlier figure, b1 = 0. In this figure, b1 = 1.0 and the mean of the ensemble distribution is
b1 / (1 – b2) = 1 / (1 – 0.8) = 5.
Which is the more appropriate assumption: X0 fixed or X0 a random draw from the ensemble
distribution? If the process really has started at time 0, then X0 = 0 is likely to be the
obvious choice.
However, if the sample of observations is a time slice from a series that had been
established well before the time of the first observation, then it will usually make sense to
treat X0 as a random draw from the ensemble distribution.
As will be seen in another sequence, evaluation of the power of tests for nonstationarity
can be sensitive to the assumption regarding X0.
Copyright Christopher Dougherty 2016.
Individuals studying econometrics on their own who feel that they might benefit
from participation in a formal course should consider the London School of
Economics summer school course
EC212 Introduction to Econometrics
http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx
or the University of London International Programmes distance learning course
EC2020 Elements of Econometrics
www.londoninternational.ac.uk/lse.
2016.05.23