
Chapter 1

Introduction
The purpose of this course is to:
Teach econometric principles for the analysis of macroeconomic data.
Discuss the link between theory models and empirical macrodata.
Provide the tools to read applied (and theoretical) econometrics papers.
Develop the skills to read and use matrix algebra.
Provide the tools to critically evaluate empirical results (PcGive and
PcNaive).
Provide the tools to perform moderately demanding empirical analyses.
The guiding principle is learning by doing! It is, therefore, important to
actively participate in the exercises which are all based on real data sets.
Half of the exam questions will be related to the exercises.
1.1 The dynamic linear regression model for
time-series data
The aim of this section is to discuss how the analysis of cross-sectional data and panel data differs from the analysis of time-series data. It is important to understand that in time-series modelling we have to make a number of simplifying assumptions regarding the constancy of the mean and the covariances over time, some of which cannot be tested because we have just one realization per time period. This is contrary to panel data analysis, where several observations are available per time period. To test constancy of parameters (i.e. constancy of the covariances of the data over time) we often need quite long time series. This means that it is almost impossible to know whether some macroeconomic mechanisms have changed as a result of a change in regime, for example as a result of adopting the Euro. Thus, when interpreting the results from macroeconometric models it is important to have a realistic sense of the reliability of the results.
A time series is a random variable with one observation (realization) per time period.
Consecutive realizations of the random variable are usually strongly time dependent.
Simplifying assumptions regarding the mean, variance, and covariances are necessary for statistical inference.
The notation in econometrics is far from standardized and it is important from the outset to get used to the fact that different authors use different notations for the same concepts. Though there will be occasional exceptions, the following notation will generally be used during the course:
In MV (Marno Verbeek) $Y$ denotes a random variable and $y$ a realization of the random variable. For time series data we often use the notation $y_t$ both for a random variable and its realization. It is also customary to use capital letters (for example $C_t$) to denote a variable before the log transformation and lower case letters (for example $c_t$) to denote $\ln C_t$.
In the following we use the notation:
$y_t$ is the dependent/endogenous variable (or the regressand),
$x_{i,t}$ is an explanatory/exogenous variable (or a regressor),
$\beta_i$ is the theoretical regression coefficient,
$b_i$ or $\hat{\beta}_i$ is an estimate of $\beta_i$, whereas the formula (for example $b = (X'X)^{-1}X'y$) is called an estimator.
$\sigma^2_y$ or $\sigma_{yy}$ are used to denote the theoretical variance of $y$. The former is often used in connection with a single time series, and the latter in connection with a covariance matrix.
$s^2_y/\hat{\sigma}^2_y$ or $s_{yy}/\hat{\sigma}_{yy}$ are the corresponding estimates.
$\Sigma$ is often used to denote a matrix of theoretical variances and covariances.
$\hat{\Sigma}$ denotes the corresponding estimated variances and covariances.

1.1.1 A single time series process

To begin with we will look at a single variable observed over consecutive time points and discuss its time-series properties. Let $y_{j,t}$, $j = 1, ..., N$, $t = 1, ..., T$ describe realizations of a variable $y$ observed over $T$ time periods. When $N > 1$ the observations could, for example, come from a study based on panel data, or they could have been generated from a simulation study of a time series process $y_t$ in which the number of replications is $N$. Here we will focus on the case when $N = 1$, i.e. when there is just one realization $(y_1, ..., y_T)$ on the index set $T$. Since we have just one realization of the random variable $y_t$, we cannot make inference on the shape of the distribution or its parameter values without making simplifying assumptions. We illustrate the difficulties with two simple examples in Figures 1 and 2.
Figure 1. [Time graph of $y_1, ..., y_6$.] $E(y_t) = \mu$, $Var(y_t) = \sigma^2_y$, $t = 1, .., 6$.

Figure 2. [Time graph of $y_1, ..., y_6$ with time-varying means $\mu_1, ..., \mu_6$.] $E(y_t) = \mu_t$, $Var(y_t) = \sigma^2_y$, $t = 1, .., 6$.
In the two examples, the line connecting the realizations $y_t$ produces the graph of the time series. For instance, in Figure 1 we have assumed that the distribution, the mean value and the variance are the same for each $y_t$, $t = 1, ..., T$. In Figure 2 the distribution and the variance are identical, but the mean varies with $t$. Note that the observed time graph is the same in both cases, illustrating the fact that we often need rather long time series to be able to statistically distinguish between different hypotheses in time series models.
To be able to make statistical inference we need:
(i) a probability model for $y_t$, for example the normal model
(ii) a sampling model for $y_t$, for example dependent or independent drawings
For the normal distribution, the first two moments around the mean are sufficient to describe the variation in the data. Without simplifying assumptions on the time series process we have the general formulation for $t = 1, ..., T$:

$E(y_t) = \mu_t$
$Var(y_t) = E(y_t - \mu_t)^2 = \sigma_{t,t.0}$
$Cov(y_t, y_{t-h}) = E[(y_t - \mu_t)(y_{t-h} - \mu_{t-h})] = \sigma_{t,t-h.h}, \quad h = ..., -1, 1, ...$
$E[y] = E\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_T \end{bmatrix} = \mu$

$Cov[y] = E[y - E(y)][y - E(y)]' = \begin{bmatrix} \sigma_{11.0} & \sigma_{12.1} & \sigma_{13.2} & \cdots & \sigma_{1T.T-1} \\ \sigma_{21.1} & \sigma_{22.0} & \sigma_{23.1} & \cdots & \sigma_{2T.T-2} \\ \sigma_{31.2} & \sigma_{32.1} & \sigma_{33.0} & \cdots & \sigma_{3T.T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{T1.T-1} & \sigma_{T2.T-2} & \sigma_{T3.T-3} & \cdots & \sigma_{TT.0} \end{bmatrix} = \Sigma$

$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix} \sim N(\mu, \Sigma)$

Because there is just one realization of the process at each time $t$, there is not enough information to make statistical inference about the underlying functional form of the distribution of each $y_t$, $t \in T$, and we have to make simplifying assumptions to secure that the number of parameters describing the process is smaller than the number of observations available. A typical assumption in time series models is that each $y_t$ has the same distribution and that the functional form is approximately normal. Furthermore, given the normal distribution, it is frequently assumed that the mean is the same, i.e. $E(y_t) = \mu_y$ for $t = 1, ..., T$, and that the variance is the same, i.e. $E(y_t - \mu_y)^2 = \sigma^2_y$ for $t = 1, ..., T$.
1.1.2 A vector process

We will now move on to the more interesting case where we observe a variable $y_t$ (the endogenous variable) and $k$ explanatory variables $x_{i,t}$, $i = 1, ..., k$. In this case we need to discuss covariances between the variables $\{y_t, x_{i,t}\}$ at time $t$ as well as covariances between $t$ and $t-h$. The covariances contain information about static and dynamic relationships between the variables which we would like to uncover using econometrics. For notational simplicity $y_t, x_{i,t}$ will here denote both a random variable and its realization.
We consider the vector $z_t$:

$z_t = \begin{bmatrix} y_t \\ x_{1,t} \\ x_{2,t} \\ \vdots \\ x_{k,t} \end{bmatrix}, \quad t = 1, ..., T,$

and introduce the following notation:

$E[z_t] = \begin{bmatrix} \mu_{y,t} \\ \mu_{x_1,t} \\ \mu_{x_2,t} \\ \vdots \\ \mu_{x_k,t} \end{bmatrix} = \mu_t,$

$Cov[z_t, z_{t-h}] = \begin{bmatrix} \sigma_{y_t, y_{t-h}} & \sigma_{y_t, x_{1t-h}} & \sigma_{y_t, x_{2t-h}} & \cdots & \sigma_{y_t, x_{kt-h}} \\ \sigma_{x_{1t}, y_{t-h}} & \sigma_{x_{1t}, x_{1t-h}} & \sigma_{x_{1t}, x_{2t-h}} & \cdots & \sigma_{x_{1t}, x_{kt-h}} \\ \sigma_{x_{2t}, y_{t-h}} & \sigma_{x_{2t}, x_{1t-h}} & \sigma_{x_{2t}, x_{2t-h}} & \cdots & \sigma_{x_{2t}, x_{kt-h}} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{x_{kt}, y_{t-h}} & \sigma_{x_{kt}, x_{1t-h}} & \sigma_{x_{kt}, x_{2t-h}} & \cdots & \sigma_{x_{kt}, x_{kt-h}} \end{bmatrix} = \Sigma_{t.h}, \quad t = 1, ..., T,$
for the case when no simplifying assumptions have been made. We will now assume that the same distribution applies for all $z_t$ and that it is approximately normal, i.e. $z_t \sim N(\mu_t, \Sigma_t)$. Under the normality assumption the first two moments around the mean (central moments) are sufficient to describe the variation in the data. We introduce the notation:

$Z = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_T \end{bmatrix}, \quad E[Z] = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_T \end{bmatrix} = \mu \qquad (1.1)$

where $Z$ is a $(k+1)T \times 1$ vector. The covariance matrix $\Sigma$ is given by

$E[(Z-\mu)(Z-\mu)'] = \begin{bmatrix} \Sigma_{1.0} & \Sigma'_{2.1} & \cdots & \Sigma'_{T-1.T-2} & \Sigma'_{T.T-1} \\ \Sigma_{2.1} & \Sigma_{2.0} & \cdots & & \Sigma'_{T.T-2} \\ \vdots & \vdots & \ddots & & \vdots \\ \Sigma_{T-1.T-2} & & \cdots & \Sigma_{T-1.0} & \Sigma'_{T.1} \\ \Sigma_{T.T-1} & \Sigma_{T.T-2} & \cdots & \Sigma_{T.1} & \Sigma_{T.0} \end{bmatrix}_{T(k+1) \times T(k+1)}$
where $\Sigma_{t.h} = Cov(z_t, z_{t-h}) = E(z_t - \mu_t)(z_{t-h} - \mu_{t-h})'$. The above notation provides a completely general description of a multivariate vector time series process. Since there are far more parameters than observations available for estimation, it is of no practical use as it stands. Therefore, we have to make simplifying assumptions to reduce the number of parameters. Empirical models are typically based on the following assumptions:

$\Sigma_{t.h} = \Sigma_h$, for all $t \in T$, $h = ..., -1, 0, 1, ...$
$\mu_t = \mu$, for all $t \in T$.

These two assumptions are needed to secure parameter constancy in the dynamic regression model to be subsequently discussed. When the assumptions are satisfied we can write the mean and the covariances of the data matrix in the simplified form:

$\mu = \begin{bmatrix} \mu \\ \vdots \\ \mu \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_0 & \Sigma'_1 & \Sigma'_2 & \cdots & \Sigma'_{T-1} \\ \Sigma_1 & \Sigma_0 & \Sigma'_1 & \ddots & \vdots \\ \Sigma_2 & \Sigma_1 & \Sigma_0 & \ddots & \Sigma'_2 \\ \vdots & \ddots & \ddots & \ddots & \Sigma'_1 \\ \Sigma_{T-1} & \cdots & \Sigma_2 & \Sigma_1 & \Sigma_0 \end{bmatrix}$

The static regression model generally disregards the information contained in $\Sigma_i$, $i \neq 0$. Thus the static regression model is only based on the information in $\Sigma_0$.
The above two assumptions for infinite $T$ define a weakly stationary process:

Definition 1 Let $\{y_t\}$ be a stochastic process (an ordered series of random variables) for $t = ..., -1, 0, 1, 2, ...$ If

$E[y_t] = \mu < \infty$ for all $t$,
$E[y_t - \mu]^2 = \sigma^2 < \infty$ for all $t$,
$E[(y_t - \mu)(y_{t+h} - \mu)] = \sigma_{.h} < \infty$ for all $t$ and $h = 1, 2, ...$

then $\{y_t\}$ is said to be weakly stationary. Strict stationarity requires that the distribution of $(y_{t_1}, ..., y_{t_k})$ is the same as that of $(y_{t_1+h}, ..., y_{t_k+h})$ for $h = ..., -1, 1, 2, ...$
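To make the definition concrete, the following minimal sketch (not part of the original text; parameter values and variable names are illustrative) simulates a weakly stationary AR(1) process and a nonstationary random walk, and compares sample means and variances across two subsamples. For the stationary process the moments are roughly constant over time; for the random walk they are not.

# Sketch: constancy of sample moments for a stationary AR(1) vs. a random walk.
import numpy as np

rng = np.random.default_rng(0)
T = 400
eps = rng.normal(0.0, 1.0, T)

ar1 = np.zeros(T)          # y_t = 0.5*y_{t-1} + eps_t  (weakly stationary)
rw = np.zeros(T)           # y_t = y_{t-1} + eps_t      (nonstationary)
for t in range(1, T):
    ar1[t] = 0.5 * ar1[t - 1] + eps[t]
    rw[t] = rw[t - 1] + eps[t]

for name, y in [("AR(1)", ar1), ("random walk", rw)]:
    first, second = y[: T // 2], y[T // 2 :]
    print(name,
          "means:", round(first.mean(), 2), round(second.mean(), 2),
          "variances:", round(first.var(), 2), round(second.var(), 2))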
1.1.3 An illustration

The data set is defined by $[c^r_t, y^r_t, w^r_t, R_{b,t}, \Delta p_t, p_{h,t} - p_{c,t}]$, $t = 1973{:}1, ..., 2003{:}1$, where
$c^r_t = c_t - p_t$ is a measurement of real private consumption at time $t$, where $c_t$ is the log of nominal consumption expenditure in Denmark and $p_t$ is the log of the implicit consumption deflator,
$y^r_t$ is the log of real domestic expenditure, GNE,
$w^r_t$ is the log of real wealth,
$R_{b,t}$ is the 10 year government bond rate,
$\Delta p_t$ is the quarterly inflation rate measured by the implicit consumption deflator, and
$p_{h,t} - p_{c,t}$ is the log difference between the house price deflator and the consumption deflator.
Figures 3 and 4 show the graphs of the data in levels and in first differences.
Figure 3: The graphs of real consumption, real income, and real wealth in levels and differences. [Graphs omitted.]

A visual inspection reveals that neither the assumption of a constant mean nor that of a constant variance seems appropriate for the levels of the variables, whereas the differenced variables look more satisfactory in this respect. If the marginal processes are normal, the observations should lie symmetrically on both sides of the mean. This seems approximately to be the case for the differenced variables, although most of the variables seem to have a higher variance in the seventies and the beginning of the eighties than in the more recent EMS period. Moreover, there seem to be some outlier observations in most of the series, and the question is whether these observations are too far away from the mean to be considered realizations from a normal distribution. It is generally a good idea to have a look in the economic calendar to find out if the outlier observations can be related to some significant economic interventions or reforms.
For example, the outlier observation in real consumption, real income and the inflation rate appears at the same date as a temporary removal of the value added tax in 1975:3. Denmark had experienced stagnating domestic demand in the period after the first oil shock, and to boost aggregate activity the government decided to remove VAT for one quarter and gradually put it back again over the next two quarters. The outlier observation in the bond rate is related to the lifting of previous restrictions on capital movements and the start of the hard EMS in 1983.

Figure 4: The graphs of the bond rate, inflation rate, and relative house-consumption prices in levels and differences. [Graphs omitted.]

These are realistic examples that point at the need to include additional information on interventions and institutional reforms in the empirical model analysis. This can be done by including new variables measuring the effect of institutional reforms, or, if such variables are not available, by using dummy variables as a proxy for the change in institutions.
At the start of the empirical analysis it is not always possible to know whether an intervention was strong enough to produce an extraordinary effect or not. Essentially every single month, quarter, or year is subject to some kind of political intervention; most of them have a minor impact on the data and the model. Thus, if an ordinary intervention does not stick out as an outlier, it will be treated as a random shock for practical reasons. Major interventions, like removing restrictions on capital movements, joining the EMS, etc., are likely to have a much more fundamental impact on economic behavior and, hence, need to be included in the systematic part of the model. Ignoring this problem is likely to seriously bias all estimates of our model and result in invalid inference.
It is always a good idea to start with a visual inspection of the data and their time series properties as a first check of the assumptions of the linear regression model. Based on the graphs we can get a first impression of whether $x_{i,t}$ looks stationary with constant mean and variance, or whether this is the case for $\Delta x_{i,t}$. If the answer is negative to the first question, but positive to the next one, we can solve the problem by respecifying the model in error-correction form, as will be subsequently demonstrated. If the answer is negative to both questions, it is often a good idea to check the economic calendar to find out whether any significant departure from the constant mean and constant variance coincides with specific reforms or interventions. The next step is then to include this information in the model and find out whether the intervention or reform has changed the parameters of the model. In the latter case the intervention is likely to have caused a regime shift and the model would need to be re-specified allowing for the shift in the model structure. We will subsequently discuss procedures to check for parameter constancy over the chosen sample period.
1.1.4 Descriptive statistics: definitions and estimators

In this section we give the definitions of the statistics calculated by the package Descriptive Statistics in PcGive.
The mean of a random variable, $x_j$, is defined by

$E(x_j) = \sum_i f_i x_{j,i} = \mu_j$

where $f_i$ is the probability density function.
Given a sample of $T$ observations of the variable, $x_j$, the sample mean is calculated as:

$\bar{x}_j = \frac{1}{T}\sum_{t=1}^{T} x_{j,t}.$

In the case of a set of variables, $x_1, ..., x_k$, the $(k \times 1)$ vector of sample means is given by:

$\bar{x} = \begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_k \end{bmatrix}$

Note that the sum of the deviations from the mean is always zero. Note also that the sample mean is sensitive to outlying observations (observations which do not belong to the assumed probability distribution) and can in such cases be misleading.
The variance of $x_i$ is the expected value of the squared deviations from the mean:

$Var(x_i) = E(x_i - \mu_i)^2 = E(x_i^2) - [E(x_i)]^2 = \sum_i f_i (x_i - \mu_i)^2 = \sigma^2_i$

An unbiased estimator of the variance is given by:

$\hat{\sigma}^2_i = \frac{1}{T-1}\sum_{t=1}^{T}(x_{i,t} - \bar{x}_i)^2.$
The variance represents an average of the squared deviations. It can
be very sensitive to outlying observations.
The covariance of two variables, $x_k$ and $x_j$, is defined as:

$cov(x_k, x_j) = E[(x_k - \mu_k)(x_j - \mu_j)] = E(x_k x_j) - \mu_k \mu_j = \sum_i f_i (x_{k,i} - \mu_k)(x_{j,i} - \mu_j) = \sigma_{kj}$

and an unbiased estimator is given by:

$\widehat{cov}(x_k, x_j) = \hat{\sigma}_{kj} = s_{kj} = \frac{1}{T-1}\sum_{t=1}^{T}(x_{k,t} - \bar{x}_k)(x_{j,t} - \bar{x}_j)$

A positive (i.e., upward-sloping) linear relationship between the variables will give a positive covariance, and a negative (i.e., downward-sloping) linear relationship gives a negative covariance.
The variance-covariance matrix of a set of variables $\{x_i, x_j, x_k\}$ is given by:

$\Sigma_{x.0} = \begin{bmatrix} var(x_i) & cov(x_i, x_j) & cov(x_i, x_k) \\ cov(x_i, x_j) & var(x_j) & cov(x_j, x_k) \\ cov(x_i, x_k) & cov(x_j, x_k) & var(x_k) \end{bmatrix} = \begin{bmatrix} \sigma_{ii} & \sigma_{ij} & \sigma_{ik} \\ \sigma_{ji} & \sigma_{jj} & \sigma_{jk} \\ \sigma_{ki} & \sigma_{kj} & \sigma_{kk} \end{bmatrix}$

with the variances on the diagonal and the covariances off the diagonal. The subscript $0$ in $\Sigma_{x.0}$ shows that the covariances have been calculated based on current values (but not lagged values) of the variables, i.e. $h = 0$.
The sample standard deviation of the variable, $x_i$, is given by:

$\hat{\sigma}_i = \sqrt{\hat{\sigma}^2_i} = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}(x_{i,t} - \bar{x}_i)^2}$

The sample correlation coefficient between two variables, $x_i$ and $x_j$, is given by:

$r_{ij} = corr(x_i, x_j) = \frac{\sum_t (x_{it} - \bar{x}_i)(x_{jt} - \bar{x}_j)}{\sqrt{\sum_t (x_{it} - \bar{x}_i)^2}\,\sqrt{\sum_t (x_{jt} - \bar{x}_j)^2}}$

The relationship between the standard deviation, the covariance, and the correlation coefficient can be obtained by multiplying the numerator and the denominator by $1/(T-1)$, to give:

$r_{ij} = \frac{\frac{1}{T-1}\sum_{t=1}^{T}(x_{it} - \bar{x}_i)(x_{jt} - \bar{x}_j)}{\sqrt{\frac{1}{T-1}\sum_t (x_{it} - \bar{x}_i)^2}\,\sqrt{\frac{1}{T-1}\sum_t (x_{jt} - \bar{x}_j)^2}}$

which reduces to:

$r_{ij} = \frac{\hat{\sigma}_{ij}}{\hat{\sigma}_i \hat{\sigma}_j} \qquad (1.2)$
The correlation coefficient measures the strength of the linear relationship between the variables. Perfect negative and positive linear relationships are indicated by $r = -1$ and $r = 1$, respectively, and a value of $r = 0$ indicates no linear relationship. Its interpretation is strictly limited to linear relationships.
The estimated correlation matrix, i.e. the standardized covariance matrix $\hat{\Sigma}_0$, for the Danish consumption data is shown below (output from the PcGive Descriptive Statistics package):
Means, standard deviations and correlations (using consumption.in7)
The sample is 1973 (1) - 2003 (1)
Means
LrC LrY LrW Rb DLPc
6.1337 6.7752 8.2741 0.029070 0.013802
Standard deviations (using T-1)
LrC LrY LrW Rb DLPc
0.11663 0.13811 0.13517 0.011418 0.011389
Correlation matrix:
LrC LrY LrW Rb DLPc
LrC 1.0000 0.98299 0.97384 -0.86493 -0.70119
LrY 0.98299 1.0000 0.97986 -0.87192 -0.64968
LrW 0.97384 0.97986 1.0000 -0.88104 -0.67227
Rb -0.86493 -0.87192 -0.88104 1.0000 0.67992
DLPc -0.70119 -0.64968 -0.67227 0.67992 1.0000
Note that the non-standardized covariances can be derived using formula (1.2).
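The statistics above are straightforward to compute with standard software. The following sketch is not the PcGive implementation; it assumes the observations are already available in a numpy array called data (the file consumption.in7 is not read here), and simply illustrates the calling conventions.

# Sketch: means, standard deviations (using T-1) and the correlation matrix.
import numpy as np

def descriptive_statistics(data):
    means = data.mean(axis=0)
    stds = data.std(axis=0, ddof=1)          # divides by T-1, as in the PcGive output
    corr = np.corrcoef(data, rowvar=False)   # standardized covariance matrix
    return means, stds, corr

# Example call with simulated numbers, only to show the interface:
data = np.random.default_rng(1).normal(size=(121, 5))
means, stds, corr = descriptive_statistics(data)
print(means, stds, corr, sep="\n")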
1.2 Joint, marginal and conditional probabilities

First we review the simple multiplicative rule for calculating joint probabilities, and then the formulas for calculating the conditional and marginal mean and variance of a multivariate normal vector $z_t$.
As an illustration of the multiplicative rule for probability calculations, consider three dependent events, A, B, and C:

$P(A \cap B \cap C) = P(A|B \cap C)P(B \cap C) = P(A|B \cap C)P(B|C)P(C)$

Note that a multiplicative formulation has been achieved for the conditional events, even if the events themselves are not independent.
The general principle of the multiplicative rule for probability calculations will be applied in the derivation of conditional and marginal distributions. Consider first two normally distributed random variables $y_t$ and $x_t$ with the joint distribution:

$z_t \sim N(\mu, \Sigma)$

$z_t = \begin{bmatrix} y_t \\ x_t \end{bmatrix}, \quad E[z_t] = \begin{bmatrix} \mu_y \\ \mu_x \end{bmatrix}, \quad Cov\begin{bmatrix} y_t \\ x_t \end{bmatrix} = \begin{bmatrix} \sigma_{yy} & \sigma_{yx} \\ \sigma_{xy} & \sigma_{xx} \end{bmatrix}$

The marginal distributions for $y_t$ and $x_t$ are given by

$y_t \sim N(\mu_y, \sigma_{yy})$
$x_t \sim N(\mu_x, \sigma_{xx})$

The conditional distribution for $y_t|x_t$ is given by

$(y_t|x_t) \sim N(\mu_{y.x}, \sigma_{yy.x})$

where

$\mu_{y.x} = \mu_y + \sigma_{yx}\sigma_{xx}^{-1}(x_t - \mu_x) = (\mu_y - \sigma_{yx}\sigma_{xx}^{-1}\mu_x) + \sigma_{yx}\sigma_{xx}^{-1}x_t = \beta_0 + \beta_1 x_t \qquad (1.3)$

and

$\sigma_{yy.x} = \sigma_{yy} - \sigma_{yx}\sigma_{xx}^{-1}\sigma_{xy} \qquad (1.4)$

The joint distribution of $z_t$ can now be expressed as the product of the conditional and the marginal distribution:

$\underbrace{P(y_t, x_t; \theta)}_{\text{the joint distribution}} = \underbrace{P(y_t|x_t; \theta_1)}_{\text{the conditional distribution}} \cdot \underbrace{P(x_t; \theta_2)}_{\text{the marginal distribution}} \qquad (1.5)$
The linear regression model:

$y_t = \beta_0 + \beta_1 x_t + \varepsilon_t$

corresponds to the conditional expectation of $y_t$ for given values of $x_t$ (or, alternatively, when keeping $x_t$ fixed).
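The link between (1.3) and the regression coefficients can be checked numerically. The following sketch is illustrative only (the means, the covariance matrix and the seed are made-up numbers): the moment-based coefficients $\beta_1 = \sigma_{yx}/\sigma_{xx}$ and $\beta_0 = \mu_y - \beta_1\mu_x$ should be close to the OLS estimates from a large simulated sample.

# Sketch: conditional-expectation coefficients vs. OLS on simulated bivariate normal data.
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])                 # [[sigma_yy, sigma_yx], [sigma_xy, sigma_xx]]
y, x = rng.multivariate_normal(mu, Sigma, size=5000).T

beta1_moment = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
beta0_moment = y.mean() - beta1_moment * x.mean()

X = np.column_stack([np.ones_like(x), x])
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y
print(beta0_moment, beta1_moment)
print(beta_ols)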
Chapter 2

Estimating the Standard Linear Model

2.1 The Assumptions of the Linear Regression Model

The linear regression model, in matrix notation, can either be written as:

$y_t = \beta' x_t + \varepsilon_t, \quad t = 1, ..., T$

where $\beta$ is a $k \times 1$ vector of coefficients and $x_t$ is a $k \times 1$ vector of explanatory variables, including a constant, or in more compact form as:

$y = X\beta + \varepsilon \qquad (2.1)$

where $y$ is a $(T \times 1)$ vector, $X$ is a $(T \times k)$ matrix of which the first column is 1s, $\beta$ is a $(k \times 1)$ vector, and $\varepsilon$ is a $(T \times 1)$ vector.
Estimation of the standard linear model by the method of ordinary least squares (OLS) is motivated by the Gauss-Markov theorem, which states that the OLS estimators are best linear unbiased estimators (b.l.u.e.). Least-squares estimators are best in the sense that among the class of linear unbiased estimators they have the least variance under the following assumptions.

1. $E(\varepsilon_t) = 0$, $t = 1, ..., T$
2. $E(x'_t \varepsilon_t) = 0$, $t = 1, ..., T$, i.e. $\varepsilon_t$ and $x_t$ are uncorrelated.
3. $Var(\varepsilon_t) = \sigma^2_\varepsilon$, $t = 1, ..., T$, i.e. the error term has a constant variance.
4. $Cov(\varepsilon_t, \varepsilon_{t-h}) = 0$, $t = 1, ..., T$, $h \neq 0$.
5. $\varepsilon_t \sim Nid(0, \sigma^2_\varepsilon)$, or $\varepsilon \sim N(0, \sigma^2_\varepsilon I)$.
6. $\frac{1}{T}(X'_T X_T) \rightarrow \Sigma_{xx}$ as $T \rightarrow \infty$.

The normality assumption (5.) is not needed for the Gauss-Markov theorem to deliver estimators of minimum variance amongst the class of linear unbiased estimators, but it is required for tests of hypotheses (inference) in the estimated models unless the sample size is very large.
2.1.1 Derivation of the OLS Estimator

The idea of OLS estimation is to find an estimate of $\beta$, i.e. $\hat{\beta}$, with the property that the sum of the squared residuals from the estimated model:

$y_t = \hat{\beta}' x_t + e_t, \quad t = 1, ..., T$

or

$y = X\hat{\beta} + e$

is minimized, i.e. such that $\sum e^2_t = e'e$ is minimized. Based on the estimated value, $\hat{\beta}$, the estimated, or predicted, value for $y$ is:

$\hat{y} = X\hat{\beta}$

The residuals in the vector $e$ are then seen to be the deviations of the estimated $\hat{y}$s from the actual $y$s:

$e = \hat{y} - y.$

In order to find the OLS estimator we first express $e$ as:

$e = X\hat{\beta} - y$

and then, by squaring, obtain

$e'e = (X\hat{\beta} - y)'(X\hat{\beta} - y)$
$e'e = y'y - 2\hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta}$
To minimize the sum of squared residuals, take the derivative of $e'e$ with respect to $\hat{\beta}$ and set it equal to zero:

$\frac{\partial(e'e)}{\partial\hat{\beta}} = -2X'y + 2X'X\hat{\beta} = 0$

which yields

$(X'X)\hat{\beta} = X'y.$

If the matrix $X$ has full rank, the design matrix $X'X$ is invertible and we can find the OLS estimator as:

$\hat{\beta} = (X'X)^{-1}X'y. \qquad (2.2)$
To derive the variance of the OLS estimate $\hat{\beta}$ we insert the value of $y$ from (2.1) in (2.2):

$\hat{\beta} = (X'X)^{-1}X'(X\beta + \varepsilon) = (X'X)^{-1}(X'X)\beta + (X'X)^{-1}X'\varepsilon = \beta + (X'X)^{-1}X'\varepsilon \qquad (2.3)$

and

$E(\hat{\beta}) = \beta + E[(X'X)^{-1}X'\varepsilon] = \beta + (X'X)^{-1}E(X'\varepsilon).$

Under assumption A.2:

$E(X'\varepsilon) = 0$

the OLS estimator is unbiased.
To derive the standard error of the estimate $\hat{\beta}$ we first express the deviation of the OLS estimate from the true value (see (2.3)) as:

$(\hat{\beta} - \beta) = (X'X)^{-1}X'\varepsilon.$

The variance can then be expressed as:

$E\{(\hat{\beta} - \beta)(\hat{\beta} - \beta)'\} = E\{(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}\}$

By assumption A.5, $E(\varepsilon\varepsilon') = \sigma^2_\varepsilon I_T$ and we obtain:

$var(\hat{\beta}) = \Sigma_{\hat{\beta}} = (X'X)^{-1}X'\sigma^2_\varepsilon I_T X(X'X)^{-1} = \sigma^2_\varepsilon(X'X)^{-1}$

The normality assumption implies that $\hat{\beta}$ is a linear combination of normally distributed variables. Since we know its mean and variance, it can be concluded that

$\hat{\beta} \sim N(\beta, \sigma^2_\varepsilon(X'X)^{-1}).$
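A minimal numerical sketch of these formulas is given below. It is illustrative only (the simulated data, true coefficients and seed are made-up): it computes $\hat{\beta} = (X'X)^{-1}X'y$, the unbiased residual variance and the estimated covariance matrix $\hat{\sigma}^2_\varepsilon(X'X)^{-1}$.

# Sketch: OLS estimate, residual variance and standard errors following (2.2)-(2.3).
import numpy as np

def ols(y, X):
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                  # (2.2)
    e = y - X @ beta                          # residuals
    sigma2 = e @ e / (T - k)                  # unbiased residual variance
    cov_beta = sigma2 * XtX_inv               # estimated Var(beta-hat)
    se = np.sqrt(np.diag(cov_beta))
    return beta, se, sigma2

# Illustration on simulated data with true coefficients (1, 0.5):
rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=200)
X = np.column_stack([np.ones_like(x), x])
print(ols(y, X))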
2.1.2 The Estimated Residual Variance

The OLS residual, $e$, is connected to the error term, $\varepsilon$, in the following way:

$e'e = \varepsilon'M\varepsilon,$

where $M = I - X(X'X)^{-1}X'$ is an idempotent matrix of reduced rank, $(T - k)$.
An unbiased estimator of the residual error variance, $\hat{\sigma}^2_\varepsilon$, is:

$\hat{\sigma}^2_\varepsilon = \frac{\varepsilon'M\varepsilon}{T-k} = \frac{e'e}{T-k},$

or, equivalently,

$\hat{\sigma}^2_\varepsilon = \frac{1}{T-k}\sum_{t=1}^{T} e_t^2 = \frac{RSS}{T-k},$

where RSS stands for the residual sum of squares. Thus, note that

$RSS = (T-k)\,\hat{\sigma}^2_\varepsilon.$

The square root of the estimated residual variance is the standard error of the regression, $\hat{\sigma}_\varepsilon$, calculated by

$\hat{\sigma}_\varepsilon = \sqrt{\frac{RSS}{T-k}}$

Finally, as will be shown in the next chapter, the quadratic form $\varepsilon'M\varepsilon$ is $\chi^2$-distributed when $\varepsilon$ is normally distributed:

$\frac{\varepsilon'M\varepsilon}{\sigma^2_\varepsilon} = \frac{e'e}{\sigma^2_\varepsilon} \sim \chi^2(T-k)$

where the degrees of freedom, $T-k$, are equal to the rank of the idempotent matrix, $M$.
2.1.3 The Analysis of Variance

The total variation of the observed $y$s (TSS) can be decomposed into the explained sum of squares (ESS) and the residual sum of squares (RSS):

$\sum_{t=1}^{T}(y_t - \bar{y})^2 = \sum_{t=1}^{T}(\hat{y}_t - \bar{y})^2 + \sum_{t=1}^{T}(y_t - \hat{y}_t)^2$

or

$TSS = ESS + RSS.$

Thus:

$TSS = \sum_{t=1}^{T}(y_t - \bar{y})^2$
$ESS = \sum_{t=1}^{T}(\hat{y}_t - \bar{y})^2 = \hat{\beta}'\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y})$
$RSS = \sum_{t=1}^{T}(y_t - \hat{y}_t)^2 = \sum_{t=1}^{T} e_t^2$

Note that the OLS estimation procedure implies that the mean of $e$ is zero by construction.
To show the variance decomposition in matrix notation, we first transform the linear regression model into deviation form, so that the variables are expressed as deviations from their means, using the transformation matrix $A$:

$A = I_T - \frac{1}{T}ii'$

where $i$ is a vector of 1s, and $I_T$ is the $(T \times T)$ identity matrix. It is now useful to distinguish between the constant term and the remaining regressors $\beta_1 = [\beta_1, ..., \beta_k]'$ in the regression model:

$y = \beta_0 + X\beta_1 + e$

Multiplying the model through by $A$ gives:

$Ay = A\beta_0 + AX\beta_1 + Ae$

Because the mean of a constant is equal to the constant, $A\beta_0$ is zero and the first term drops out. Furthermore, $Ae = e$, because $\bar{e} = 0$, and the regression model becomes:

$Ay = AX\beta_1 + e$

Squaring the model in deviation form gives:

$y'Ay = \beta'_1 X'AX\beta_1 + e'e$

which gives the decomposition into:

$TSS = ESS + RSS$

With the variance decomposition we can form the ratio of the explained sum of squares to the total sum of squares, ESS/TSS, which is a measure of the goodness-of-fit of the linear regression model:

$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$
When the assumptions of the regression model are satisfied, a high value of $R^2$ is generally an indication of a high explanatory power of the model. However, $R^2$ can be completely misleading when assumption A.6 no longer holds, i.e. when the data are not stationary. (If the data are nonstationary, i.e. when the data are trending, assumption A.6 is violated, as the design matrix $\frac{1}{T}X'_T X_T$ does not converge to a constant matrix. This will be discussed later on in the course.) The reason is that $R^2$ essentially compares the fitted values of the regression model, $\hat{y}_t$, with the average value, $\bar{y}$. Since the average value is a very poor statistic as a description of a trending variable, adding almost any variable to the model is likely to increase ESS substantially. For example, a linear time trend will often give a high $R^2$ even if a time trend does not really explain anything.
When the data are strongly time dependent, $R^2$ can still be relevant as a measure of explanatory power, but only in a model explaining $\Delta y_t$ (for which a constant mean is often a more appropriate description) instead of $y_t$. This will be discussed in more detail later on in the course.
The value of TSS in terms of the variance of the dependent variable is given by:

$TSS = (T-1)\,\hat{\sigma}^2_y.$

(Note again that for this measure to be relevant, $y_t$ needs to be a stationary variable.) When adding more variables to the regression model the $R^2$ statistic tends to increase, irrespective of whether or not the new variables have significantly improved the explanatory power of the model. The adjusted $R^2$, or $\bar{R}^2$, corrects for the number of explanatory variables in the following way:

$\bar{R}^2 = 1 - \frac{RSS/(T-k)}{TSS/(T-1)} = 1 - \frac{\hat{\sigma}^2_\varepsilon}{\hat{\sigma}^2_y}.$
Both $R^2$ and $\bar{R}^2$ measure only the explanatory power of the information set as a whole and say nothing about the relative contribution of any individual explanatory variable. The squared partial correlation coefficient $r^2_{yx.z}$ measures the partial effect between the dependent variable, $y$, and one of the explanatory variables, $x$, when the effect of the remaining $k-1$ variables, $z$, has been corrected for.
To illustrate the meaning and the calculation of the partial correlation coefficient we assume the following regression model:

$y_t = \beta_{y0.123} + \beta_{y1.23}x_{1t} + \beta_{y2.13}x_{2t} + \beta_{y3.12}x_{3t} + \varepsilon_t.$

We would now like to know how much $x_{1t}$ is able to improve the explanation of $y_t$ when we have corrected for the influence of $x_{2t}$ and $x_{3t}$. To answer this we need to perform three auxiliary regressions:
$y_t = \hat{\beta}_{y0.23} + \hat{\beta}_{y2.3}x_{2t} + \hat{\beta}_{y3.2}x_{3t} + e_{1t},$

and

$x_{1t} = \hat{\beta}_{x_1 0.23} + \hat{\beta}_{x_1 2.3}x_{2t} + \hat{\beta}_{x_1 3.2}x_{3t} + e_{2t}.$
The residual from the first regression, $e_{1t}$, measures what is left of the variation of $y_t$ after having removed the variation explained by $x_{2t}$ and $x_{3t}$. In other words, we have cleaned $y_t$ of the effects of the other explanatory variables. The residual from the second regression does the same with $x_{1t}$: it cleans $x_{1t}$ of the effects of $x_{2t}$ and $x_{3t}$. In statistical jargon this is called concentrating out the effect of $x_{2t}$ and $x_{3t}$. The third regression takes the first auxiliary residual $e_{1t}$ and regresses it on the second auxiliary residual $e_{2t}$:

$e_{1t} = \hat{\beta}_0 + \hat{\beta}_{y1.23}e_{2t} + \varepsilon_t. \qquad (2.4)$
The squared partial correlation coefficient can now be defined as:

$r^2_{yx_1.x_2,x_3} = \frac{ESS_A}{TSS_A}$

where $ESS_A$ and $TSS_A$ are, respectively, the explained sum of squares and the total sum of squares from the third auxiliary regression (2.4). Adding explanatory variables to, or removing them from, the regression model is likely to change the squared partial correlation coefficient. If the coefficient falls when adding a variable, the two variables are probably collinear, i.e. the new variable is a substitute rather than a complement to the variables already included in the model.
The partial $r^2$ can also be calculated from the t-values of the coefficients in the original regression based on the following formula:

$\text{part. } r^2 = \frac{t_i^2}{t_i^2 + (T-k)},$

where $t_i$ is the t-value of the coefficient $\beta_{yx.z}$ in the original regression model:

$t_i = \frac{\hat{\beta}_i}{\hat{\sigma}_{\hat{\beta}_i}},$

where $\hat{\beta}_i$ is the OLS estimator of the $i$th coefficient, and $\hat{\sigma}_{\hat{\beta}_i}$ is the estimated standard error.
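The two routes to the partial correlation can be checked against each other numerically. The sketch below is illustrative only and uses a simplified setup with two regressors (a constant plus $x_2$ plays the role of the "remaining" variables $z$); the data, coefficients and seed are made-up. The squared correlation of the two auxiliary residuals should equal $t_1^2/(t_1^2 + (T-k))$ from the full regression.

# Sketch: partial correlation via auxiliary regressions (2.4) and via the t-value formula.
import numpy as np

rng = np.random.default_rng(4)
T = 150
x2 = rng.normal(size=T)
x1 = 0.7 * x2 + rng.normal(size=T)            # x1 and x2 are correlated
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=T)

def resid(v, X):
    b, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ b

Z = np.column_stack([np.ones(T), x2])
e1 = resid(y, Z)                              # y cleaned of the constant and x2
e2 = resid(x1, Z)                             # x1 cleaned of the constant and x2
r2_aux = np.corrcoef(e1, e2)[0, 1] ** 2       # ESS_A / TSS_A in regression (2.4)

X = np.column_stack([np.ones(T), x1, x2])     # full regression y on (1, x1, x2)
k = X.shape[1]
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
s2 = e @ e / (T - k)
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
t1 = b[1] / se[1]
r2_from_t = t1 ** 2 / (t1 ** 2 + (T - k))
print(r2_aux, r2_from_t)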
2.2 Multicollinearity and Omitted Variables

Multicollinearity and omitted variables are both concerned with the structure of the $X$ matrix. Though their effect on the obtained estimates can be serious, in general neither of them violates the classical assumptions of the standard linear model.

2.2.1 Multicollinearity

Initially, the term multicollinearity was used to refer to linear dependencies among three or more regressors, while collinearity meant a high correlation between a pair of them, but the term multicollinearity has come to cover both cases. A perfect linear relationship between some of the $x$ variables is seldom the case (except when certain transformations of the variables result in exact linear dependencies among them, for example when one has fallen into the dummy variable trap), and the multicollinearity problem is one of degree. Thus, when we speak of multicollinearity it usually means a high linear correlation between two or more of the regressors. One way of investigating multicollinearity is to examine the correlation coefficients of the variables. A correlation matrix can often highlight which (if any) of the regressors are highly collinear.
The effects of multicollinearity are not as serious as other conditions which violate the assumptions of the standard linear model. The coefficient estimates are not biased in the presence of multicollinearity, and the OLS estimators are still b.l.u.e. There is, however, an important effect on inference. The standard errors of the estimates usually increase in the presence of multicollinearity, lowering the t-values and making significant individual coefficient estimates appear insignificant. This raises the probability of accepting the null hypothesis of a zero coefficient when it should be rejected. Multicollinearity, therefore, means that the individual effects of the correlated variables cannot be disentangled. Thus, an F-test of the joint significance of the same variables may indicate significance even when the t-ratios do not. Therefore, one should be cautious about removing variables exclusively based on t-tests.
The reason for having a highly multicollinear model can be that we have included two regressors which essentially measure the same thing. In this case one of them could in fact be left out without affecting the explanatory power of the model. However, in most of the cases when multicollinearity is a problem the underlying cause is that Assumption A.6 is violated, i.e. the data are nonstationary instead of stationary. In this case the remedy is to transform the model so that stationarity is recovered. This will be discussed later on in the course.

2.2.2 Omitted Relevant Variables

The problem of omitted variables is more fundamental. If the omitted variable(s) are relevant for the explanation of the variation in the dependent variable, their effects will be contained in the residual. This is likely to result in inefficiency (in the sense that the estimated residual variance will be overestimated) and, if the omitted variables are not orthogonal to the included regressors, in omitted variables bias.
Assume the true model is

$y = X\beta_x + Z\beta_z + \varepsilon, \qquad (2.5)$

but the economist estimates the model

$y = X\beta_x + \varepsilon_1. \qquad (2.6)$

The OLS estimator is:

$\hat{\beta}_x = (X'X)^{-1}X'y = \beta_x + (X'X)^{-1}X'\varepsilon_1 \qquad (2.7)$

We now substitute $y$ in (2.7) with (2.5) and obtain an expression for the OLS estimator $\hat{\beta}_x$ as a function of the omitted variable(s) $Z$:

$\hat{\beta}_x = (X'X)^{-1}X'y = (X'X)^{-1}X'X\beta_x + (X'X)^{-1}X'Z\beta_z + (X'X)^{-1}X'\varepsilon = \beta_x + (X'X)^{-1}X'Z\beta_z + (X'X)^{-1}X'\varepsilon,$

and

$E(\hat{\beta}_x) = \beta_x + (X'X)^{-1}E(X'Z)\beta_z + (X'X)^{-1}E(X'\varepsilon) = \beta_x + (X'X)^{-1}E(X'Z)\beta_z$
i.e. even if assumption A.2, $E(X'\varepsilon) = 0$, holds true, the OLS estimate $\hat{\beta}_x$ will be biased if the omitted variables $Z$ are correlated with the included variables $X$, i.e. unless $E(X'Z) = 0$. This can clearly give rise to serious interpretational problems, as economic models usually only contain a subset of all relevant variables (because of the ceteris paribus assumption, everything else equal) and because economic variables are usually strongly correlated.
To demonstrate the effect of omitted variables on the residual error variance we first note that the residual sum of squares can be expressed as

$RSS = y'My,$

where $M = I - X(X'X)^{-1}X'$ is defined in terms of the economist's model (2.6). Substituting the true model (2.5) for $y$ gives

$RSS = (X\beta_x + Z\beta_z + \varepsilon)'M(X\beta_x + Z\beta_z + \varepsilon),$

which, since $MX = 0$, reduces to

$RSS = (Z\beta_z + \varepsilon)'M(Z\beta_z + \varepsilon) = \varepsilon'M\varepsilon + \beta'_z Z'MZ\beta_z + 2\beta'_z Z'M\varepsilon,$

which, upon taking expectations, is

$E(RSS) = E\{\varepsilon'M\varepsilon + \beta'_z Z'MZ\beta_z + 2\beta'_z Z'M\varepsilon\}.$

Because $E(M\varepsilon) = 0$ and $E(\varepsilon'M\varepsilon) = \sigma^2_\varepsilon(T-k)$, where $k$ is the number of variables in $X$,

$E(RSS) = \sigma^2_\varepsilon(T-k) + \beta'_z Z'MZ\beta_z,$

and

$E\left(\frac{RSS}{T-k}\right) = \sigma^2_{\varepsilon_1} = \sigma^2_\varepsilon + \frac{1}{T-k}\,\beta'_z Z'MZ\beta_z$

which shows that the residual variance of the economist's model (2.6) is larger than the residual variance of the true model as long as $\beta_z \neq 0$.
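The bias term $(X'X)^{-1}E(X'Z)\beta_z$ is easy to see in a simulation. The sketch below is illustrative only (the coefficients, correlation structure and seed are made-up): the true coefficient on $x$ is 1, $z$ is omitted, and because $x$ and $z$ are correlated the average estimate of the coefficient on $x$ is clearly off.

# Sketch: omitted-variable bias when the omitted variable is correlated with the included one.
import numpy as np

rng = np.random.default_rng(5)
T, reps = 200, 2000
bias_sample = []
for _ in range(reps):
    z = rng.normal(size=T)
    x = 0.8 * z + rng.normal(size=T)          # x correlated with the omitted z
    y = 1.0 * x + 1.0 * z + rng.normal(size=T)
    X = np.column_stack([np.ones(T), x])      # z is left out of the estimated model
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    bias_sample.append(b[1] - 1.0)
print("average bias of b_x:", np.mean(bias_sample))   # about 0.8/1.64, i.e. roughly 0.49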
Problems typical of time-series econometrics:
Only one realization per time unit
Strong time dependence
Many simplifying assumptions on the constancy of mean and covariances
Many observations are needed to check these assumptions
Economic mechanisms often change as a result of changes in regime
The Danish consumption function
Notation
Estimation in the linear regression model
The assumptions of the linear regression model
The properties of the OLS estimator
Statistical properties of estimators: unbiasedness, consistency (asymptotic property), efficiency
Problems: multicollinearity, time dependence
Aspects of econometric modelling:
Interpreting the coefficients of the regression model
The ceteris paribus assumption and the econometric analysis
Omitted variable bias
Omitted variable bias and orthogonality of regressors
Endogeneity and exogeneity in economics and in econometrics
Loglinear or linear models?
Data mining or data snooping
General-to-specific or specific-to-general
Model selection tests and strategies
Nested versus non-nested models, encompassing
Multicollinearity and omitted variable bias
Chapter 4

Evaluating the Standard Linear Model

The linear regression model is easy to estimate and is, therefore, one of the most used empirical models in economics. Though simplicity is a desirable property, the estimated results are not necessarily reliable unless the basic assumptions underlying the regression model are valid. Therefore, these assumptions should always be checked against the data. The final model specification is usually a compromise between what is feasible or not feasible (number of estimated parameters relative to data points available) and what is crucial or less crucial for reliable inference (robustness arguments). The table below gives a summary of the basic assumptions and why they are important:
1. A linear model of the form $y_t = x'_t\beta + \varepsilon_t$,
2. with disturbances of mean zero: $E(\varepsilon_t) = 0$,
3a. constant (homoscedastic) error variance (important for efficiency): $E(\varepsilon^2_t) = \sigma^2_\varepsilon$,
3b. independent errors (important for correct inference): $E(\varepsilon_t\varepsilon_s) = 0$, $t \neq s$,
4. the $X$ matrix has full rank (otherwise estimation breaks down): $rank(X) = k$; in addition $Cov(x_i, x_j)$ should preferably be as small as possible (important for precise estimates of $\beta_i$; see multicollinearity),
5. $X$ is non-stochastic, or stochastic but uncorrelated with the $\varepsilon$s (gives unbiased estimates): $E(X'\varepsilon) = 0$,
6. the errors are normal (important for inference): $\varepsilon_t \sim Niid(0, \sigma^2_\varepsilon)$,
7. the $x$ variables are stationary (important for consistency): $plim\,\frac{1}{T}(X'_T X_T) = \Sigma_{xx}$.
4.1 Specification and Misspecification Tests

It is customary to speak (interchangeably) about specification tests or misspecification tests. Though the distinction between the two is not always clear, one could nevertheless claim that it lies in the specificity of the alternative: in specification tests one model is tested against another, i.e., a model corresponding to the null hypothesis is tested against a model given by the alternative. For example, the t-test of whether a coefficient is different from zero is a specific test, and a Chow test of whether a coefficient changes at a specific point in time is also a specific test. In contrast, misspecification tests have the character of diagnostic model checking. For example, the evaluation of whether the linear form of the model is adequate or not, using a battery of different tests, belongs to this category. Often a misspecification test may tell us that our model does not satisfy (some of) the basic assumptions but does not necessarily give us precise information on how to respecify the model.
Clearly, testing for misspecification should logically precede specification testing, because we are examining empirically the adequacy of the assumptions of the standard linear model, on which the validity and power of inference depend. When the model is misspecified, the effect of misspecification can usually be seen in the estimated residuals, and most misspecification tests are about checking whether the residuals are approximately homoscedastic, are not autocorrelated, and are approximately normal. It should, however, be noted that rejection of normally distributed residuals, for example as a result of outliers, is likely to influence the test of residual heteroscedasticity and the test of no autocorrelation. Therefore, the order of the testing is not irrelevant. It is often a good idea to correct the specification of the model for the most obvious problems, like outliers, regime shifts, and so on, before testing for autocorrelation. That said, there does not necessarily exist an unambiguous order in which to perform the tests. For instance, a test for heteroscedastic disturbances is also a test of misspecification of the functional form, because systematic patterns in the variance of the estimated residuals may be caused by either one. Similarly, Chow tests of recursive residuals, to be discussed in the next chapter, are sensitive not only to instability in the coefficient estimates but also to heteroscedastic disturbances. The Durbin-Watson d statistic, originally constructed to test against first-order autocorrelation, can also be symptomatic of an inappropriate functional form, in particular a static instead of a dynamic formulation.
The reason for the popularity of the regression model can be traced back to the Gauss-Markov theorem stating that the OLS estimators are B.L.U.E. under assumptions 1-3 and 5 above. This chapter is primarily concerned with testing assumptions 3-4 and 6, whereas assumption 5 will be discussed later on in connection with the Instrumental Variables approach.
4.2 Heteroscedasticity

As stated, heteroscedastic residuals can be the result of other things besides heteroscedastic disturbances. The latter suggests a heteroscedastic process in the DGP or in the gathering of the data which the researcher has failed to model, in which case economic or statistical theory should be called upon to provide a basis for the existence of such a process, and an attempt should be made to identify the form of the heteroscedasticity and to explicitly include the process in the model. On the other hand, heteroscedastic residuals could just as easily arise on account of a misspecified functional form, or other misspecification. In other words, observed heteroscedastic residuals are as likely to mean that the model is misspecified as that a true heteroscedastic process is present.

4.2.1 The Effects of Heteroscedasticity

Under the null hypothesis of homoscedasticity the errors have a constant variance over the sample period, that is

$H_0: E\{\varepsilon\varepsilon'\} = \sigma^2_\varepsilon I, \quad t = 1, ..., T.$

Under the alternative hypothesis of heteroscedasticity the variance of the errors varies over time:

$E\{\varepsilon\varepsilon'\} = \Omega$

where $\Omega$ is a diagonal matrix with diagonal elements $\sigma^2_{\varepsilon,t}$, $t = 1, ..., T$.
Under the null hypothesis it was already shown that the estimated residual variance was derived from the equality

$e'e = \varepsilon'M\varepsilon,$

in which $M$ is the symmetric idempotent projection matrix $I - X(X'X)^{-1}X'$. In the case of homoscedastic residuals, the derivation of the estimator, after taking expectations, leads to

$E\{e'e\} = \sigma^2_\varepsilon\,trM = \sigma^2_\varepsilon(T-k).$

Thus, a homoscedastic residual variance can be estimated by

$\hat{\sigma}^2_\varepsilon = \frac{e'e}{T-k}, \qquad (4.1)$

which is an unbiased estimator of $\sigma^2_\varepsilon$:

$E(\hat{\sigma}^2_\varepsilon) = \sigma^2_\varepsilon.$
Under the alternative of heteroscedastic residuals the estimator (4.1) will yield an average residual error variance which does not account for the fact that in periods of high residual variance the predictions of $y_t$ are less precise than in periods of small residual variance. Therefore, the effect of residual heteroscedasticity is that it renders the OLS estimator, $\hat{\beta}$, inefficient. The variance-covariance matrix of $\hat{\beta}$ is given by

$Cov(\hat{\beta}) = (X'X)^{-1}X'\Omega X(X'X)^{-1}, \qquad (4.2)$

where

$E\{\varepsilon\varepsilon'\} = \Omega.$

It is easy to see that only when $\Omega = \sigma^2_\varepsilon I$ does the covariance matrix (4.2) collapse to the usual OLS expression:

$Cov(\hat{\beta}) = \sigma^2_\varepsilon(X'X)^{-1}. \qquad (4.3)$

In the case of heteroscedastic errors $\Omega = diag\{\omega_{11}, ..., \omega_{TT}\}$ and (4.3) will typically underestimate the true variance of $\hat{\beta}$. Although the effect of heteroscedasticity is that the OLS estimator is no longer B.L.U.E., the loss of efficiency is not necessarily very large in many practical situations.
4.2.2 Heteroscedasticity-Consistent Standard Errors

In the presence of heteroscedasticity it is possible to estimate heteroscedasticity-consistent standard errors (HCSE) for the OLS estimators. The method proposes an estimate $\hat{\Omega} = diag\{\hat{\varepsilon}^2_1, ..., \hat{\varepsilon}^2_T\}$, obtained by considering the individual elements of the residual vector to be proxies for the unobservable disturbances. That is, given that $\hat{\varepsilon}_t$ is a proxy for $\varepsilon_t$, the former may be considered a sample of size 1 from the distribution of the $t$th residual, and $\hat{\varepsilon}^2_t$ can be viewed as an estimator of $\sigma^2_{\varepsilon,t}$, the true variance of $\varepsilon_t$. Under standard regularity conditions this has been shown to lead to a consistent estimator of $X'\Omega X$. Letting $\hat{\Omega} = diag\{\hat{\varepsilon}^2_1, ..., \hat{\varepsilon}^2_T\}$, then

$X'\hat{\Omega}X = \sum_{i=1}^{T}\hat{\varepsilon}^2_i x_i x'_i,$

where $x'_i$ is the $(1 \times k)$ $i$th row of $X$. The true variance of the OLS estimator, $\hat{\beta}$, which is $(X'X)^{-1}X'\Omega X(X'X)^{-1}$, can then be consistently estimated despite the heteroscedasticity by

$\widehat{Cov}(\hat{\beta}) = (X'X)^{-1}\left[\sum_{i=1}^{T}\hat{\varepsilon}^2_i x_i x'_i\right](X'X)^{-1}. \qquad (4.4)$

Under the assumption of heteroscedastic errors the use of (4.4) instead of (4.3) yields asymptotically valid inference based on Student's t- and F-tests. Moreover, HCSEs act as a general test of heteroscedasticity, insofar as the residuals from a model in which the HCSEs do not significantly differ from the conventional estimates can be assumed to be homoscedastic. Note also that no specific form of heteroscedasticity is needed for the calculation of HCSE.
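A minimal sketch of (4.4) is given below; it is illustrative only (the data-generating process, with an error variance that depends on the regressor, is made-up) and is not the PcGive implementation. It compares the conventional standard errors (4.3) with the heteroscedasticity-consistent ones.

# Sketch: conventional vs. heteroscedasticity-consistent (4.4) standard errors.
import numpy as np

rng = np.random.default_rng(6)
T = 500
x = rng.normal(size=T)
eps = rng.normal(size=T) * (0.5 + np.abs(x))   # error variance depends on x
y = 1.0 + 0.5 * x + eps

X = np.column_stack([np.ones(T), x])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
k = X.shape[1]

conventional = (e @ e / (T - k)) * XtX_inv                 # sigma^2 (X'X)^{-1}, eq. (4.3)
meat = (X * (e ** 2)[:, None]).T @ X                       # sum_i e_i^2 x_i x_i'
hcse = XtX_inv @ meat @ XtX_inv                            # eq. (4.4)
print(np.sqrt(np.diag(conventional)))
print(np.sqrt(np.diag(hcse)))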
4.2.3 Testing for Heteroscedasticity

The Breusch-Pagan test and the White LM test for heteroscedastic errors are very similar in nature. They are based on the following very general assumption about the error term:

$y_t = \beta'x_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2_{\varepsilon,t}) \qquad (4.5)$

and

$\sigma^2_{\varepsilon,t} = \sigma^2_\varepsilon\,h(z'_t\gamma).$

The vector $z_t$ consists of known variables which under the alternative hypothesis are supposed to cause the time dependent variation in $\sigma^2_{\varepsilon,t}$. Under the null hypothesis $\sigma^2_{\varepsilon,t} = \sigma^2_\varepsilon$. This is a typical case where the model is very easy to estimate under the null, but quite difficult to estimate under the alternative. Thus, the LM procedure is a convenient test procedure. The practical procedure consists of the following steps:
1. Estimate the model with OLS and obtain the estimated residuals

$\hat{\varepsilon}_t = y_t - \hat{\beta}'_{(H_0)}x_t.$

2. Perform the auxiliary regression

$\left(\frac{\hat{\varepsilon}^2_t}{\hat{\sigma}^2_\varepsilon} - 1\right) = \sum_{i=1}^{k} b_i(x_{i,t} - \bar{x}_i) + \sum_{i=1}^{m} a_i(z_{i,t} - \bar{z}_i) + e_t, \quad t = 1, ..., T,$
where $b_i \simeq (\hat{\beta}_{(H_1)i} - \hat{\beta}_{(H_0)i})$ is needed to correct for the error in the OLS estimates, $\hat{\beta}_{(H_0)i}$, when the alternative is true. (Compare with the result (3.4) in Chapter 3.) This error can be explained by the fact that if we had estimated the true model under the alternative hypothesis, i.e. with a model which properly accounted for the heteroscedasticity in the error term, then the correctly estimated coefficients, $\hat{\beta}_{(H_1)i}$, would have deviated from the OLS estimates, $\hat{\beta}_{(H_0)i}$, which were based on the wrong assumption of homoscedastic errors. Even if the OLS estimates are inefficient when the errors are heteroscedastic, they need not be biased and, depending on the assumed type of heteroscedasticity $h(z'_t\gamma)$, they often are not. In the latter case the estimation error, $b_i$, does not behave in a systematic way. For example, if we simulated a model with heteroscedastic errors a large number of times, then the average of the estimated coefficients $\hat{b}_i$ would be approximately zero. This does not exclude the possibility that the correction $b_i$ in one specific model run can be (and often is) very large.
3. Calculate the LM test statistic, $TR^2$, where $R^2$ is a measure of goodness of fit in the auxiliary regression, which under the null is distributed as $\chi^2(m)$, or, alternatively:
4. Calculate the F-form of the LM test:

$\eta_H(m, T-k-m-1) = \left(\frac{T-2k-m}{m}\right)\left(\frac{R^2}{1-R^2}\right)$

where $k$ is the number of explanatory variables including the constant. The LM test is then distributed as $F(m, T-k-m-1)$.
In the Breusch-Pagan test $z'_t = [x_{1,t}, ..., x_{k,t}, x^2_{1,t}, ..., x^2_{k,t}]$, that is, $\hat{\varepsilon}^2_t$ is regressed on the explanatory variables and their squares, and in the White test $\hat{\varepsilon}^2_t$ is additionally regressed on their cross-products.
If heteroscedasticity is detected by these tests it is in most cases a sign of model misspecification, rather than a true property of the error term.
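The following sketch illustrates the LM statistic $TR^2$ of the Breusch-Pagan type with one regressor (the function name and data are made-up, and the auxiliary regression simply uses the regressor and its square, as described above). It is a sketch of the idea, not the PcGive implementation.

# Sketch: LM (Breusch-Pagan type) heteroscedasticity test using T*R^2 from the auxiliary regression.
import numpy as np

def lm_het_test(y, X):
    T, k = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    sigma2 = e @ e / T
    u = e ** 2 / sigma2 - 1.0                      # dependent variable of the auxiliary regression
    Z = np.column_stack([np.ones(T), X[:, 1:], X[:, 1:] ** 2])   # constant, x, x^2
    g = np.linalg.lstsq(Z, u, rcond=None)[0]
    fitted = Z @ g
    r2 = 1.0 - ((u - fitted) @ (u - fitted)) / ((u - u.mean()) @ (u - u.mean()))
    m = Z.shape[1] - 1
    return T * r2, m                               # compare T*R^2 with a chi2(m) critical value

rng = np.random.default_rng(7)
x = rng.normal(size=300)
y = 1.0 + 0.5 * x + rng.normal(size=300) * (0.5 + np.abs(x))
stat, m = lm_het_test(y, np.column_stack([np.ones(300), x]))
print("LM statistic:", stat, "degrees of freedom:", m)   # chi2(2) 5% critical value is 5.99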
4.2.4 Non-Normality

The Student's t-test, the F-test, and the $\chi^2$-test assume that the disturbances are normally distributed. However, if the sample size is quite large, the test procedures are reasonably robust to moderate deviations from normality as long as the distribution is relatively symmetrical. When the residuals are skewed, often as a result of outliers, inference based on standard test procedures can be unreliable and caution is needed. It is always a good idea to plot the residuals to get a first visual impression of possible deviations from the normality assumption. If some of the residuals fall outside the interval $\pm 3\hat{\sigma}_\varepsilon$ it might be an indication that an intervention or similar extraordinary event has taken place in that time period. A careful examination of the economic calendar is indispensable when judging whether a big residual is an innovation outlier or only an extraordinarily big residual. The former often needs to be accounted for using a dummy variable, as it corresponds to an actual institutional event, whereas the latter can just be the result of a misspecified model and may disappear when the problem has been corrected.
Similarly, it is useful to plot the frequency distribution of the estimated residuals together with the frequency distribution of a normal random variable. Any large deviations from the normal distribution can then be seen by inspection. Another way of illustrating the deviation of the residuals from the normal distribution is by plotting the QQ graph. In this case normally distributed residuals will lie on the 45° straight line, and deviations from this line indicate deviations from normality.
4.2.5 Skewness and Excess Kurtosis

The deviations from normality can be described by calculating the third and fourth moments of a standardized variable. Skewness yields information about the extent to which a distribution is skewed away from the mean, and kurtosis yields information about the extent to which the distribution has fat tails.
For a standardized variable the third central moment, skewness, is defined as:

$SK = E\left(\frac{x_t - \mu}{\sigma}\right)^3$

and the fourth central moment, kurtosis, as:

$K = E\left(\frac{x_t - \mu}{\sigma}\right)^4.$

Tests for normality are generally of a $\chi^2$ type, using the result that the estimated skewness and kurtosis are asymptotically normal. This implies that the sum of squared standardized skewness and squared standardized kurtosis can be considered asymptotically $\chi^2(2)$. However, in small and even moderately large samples, the estimated test statistic generally does not follow the $\chi^2$ distribution very closely. This has led to various small sample corrections of the normality test. As will be discussed below, PcGive reports the asymptotic $\chi^2$ test as well as a small sample corrected test. The derivation of the latter is quite complicated and we will, therefore, only report the calculations of the asymptotic test statistics here.
Based on a sample of $T$ observations on a normal variable $x_t$, skewness is calculated as:

$\widehat{SK} = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{x_t - \bar{x}}{\hat{\sigma}_x}\right)^3$

and kurtosis as:

$\hat{K} = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{x_t - \bar{x}}{\hat{\sigma}_x}\right)^4$

where the estimated variance of $x_t$ is:

$\hat{\sigma}^2_x = \frac{1}{T}\sum_{t=1}^{T}(x_t - \bar{x})^2.$

The estimated skewness is asymptotically distributed as:

$\widehat{SK} \overset{as}{\sim} N\left(0, \frac{6}{T}\right)$

and the kurtosis as:

$\hat{K} \overset{as}{\sim} N\left(3, \frac{24}{T}\right).$
The estimated residuals from an OLS regression model sum to zero by construction, given that the model contains a constant term. This can be seen from the OLS result:

$X'\hat{\varepsilon} = X'(I - X(X'X)^{-1}X')y = 0$

where

$X' = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_{1,1} & x_{1,2} & \cdots & x_{1,T} \\ \vdots & \vdots & \ddots & \vdots \\ x_{k,1} & x_{k,2} & \cdots & x_{k,T} \end{bmatrix}.$

Since the first element of $X'\hat{\varepsilon} = 0$ corresponds to $\sum\hat{\varepsilon}_t = 0$, the residuals from an OLS regression model with a constant will always have a zero mean and a zero sample correlation with the explanatory variables.
Using the result that $\bar{\hat{\varepsilon}} = 0$, the skewness of the $T$ estimated residuals from an OLS regression model with $k$ explanatory variables and a constant is calculated as:

$\widehat{SK} = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{\hat{\varepsilon}_t}{\hat{\sigma}_\varepsilon}\right)^3 = \frac{1}{\hat{\sigma}^3_\varepsilon}\left(\frac{1}{T}\sum_{t=1}^{T}\hat{\varepsilon}^3_t\right)$

and the kurtosis as:

$\hat{K} = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{\hat{\varepsilon}_t}{\hat{\sigma}_\varepsilon}\right)^4 = \frac{1}{\hat{\sigma}^4_\varepsilon}\left(\frac{1}{T}\sum_{t=1}^{T}\hat{\varepsilon}^4_t\right),$

where the estimated error variance is:

$\hat{\sigma}^2_\varepsilon = \frac{1}{T}\sum_{t=1}^{T}\hat{\varepsilon}^2_t.$

For a reasonably large sample the estimated residual skewness is distributed as:

$\widehat{SK} \overset{as}{\sim} N\left(0, \frac{6}{T}\right)$

and the kurtosis as:

$\hat{K} \overset{as}{\sim} N\left(3, \frac{24}{T}\right).$
Instead of kurtosis it is often customary to report excess kurtosis, EK, which is

$EK = K - 3.$

4.2.6 The Jarque-Bera Normality Test

Both the estimated skewness and excess kurtosis are used in the Jarque-Bera normality test. The asymptotic version of the test is valid for very large samples and the test statistic is calculated as

$JB_A = \frac{T}{6}\left(\widehat{SK}^2 + \frac{1}{4}\widehat{EK}^2\right)$

The test is asymptotically $\chi^2$ with 2 degrees of freedom, that is,

$JB_A \overset{as}{\sim} \chi^2(2).$
This is the normality test in PcGive called the asymptotic test. If the null hypothesis of normally distributed residuals is correct, then the test statistic should be small (for example, less than $\chi^2_{(.95)}(2) = 5.99$), otherwise normality is rejected. However, simulation studies have shown that even in moderately large samples the asymptotic test results are often not very good. This is partly due to the fact that we use the estimated instead of the true error variance $\sigma^2_\varepsilon$. A degrees-of-freedom corrected test is calculated as:

$JB = \left(\frac{T-k}{T}\right)JB_A.$

However, another problem with the Jarque-Bera test is that in finite samples the skewness and the kurtosis are generally not uncorrelated, which is a violation of the independence assumption behind the $\chi^2$ distribution. The second test statistic in PcGive contains a degrees-of-freedom correction as well as a correction for the sample dependence between skewness and kurtosis.
Jarque-Bera type normality tests are often standard output in regression packages. It is often hard to know exactly how they have been calculated. If in doubt, one possibility, though a time consuming one, is to try to replicate the results.
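As a starting point for such a replication exercise, the sketch below computes the asymptotic statistic $JB_A$ from a residual vector using the formulas above (with the $1/T$ variance estimator). It is illustrative only and does not reproduce PcGive's small sample corrected statistic; the simulated residuals are made-up.

# Sketch: asymptotic Jarque-Bera statistic from a residual vector.
import numpy as np

def jarque_bera(e):
    T = e.size
    s = np.sqrt(np.mean(e ** 2))                   # the formulas assume a zero residual mean
    sk = np.mean((e / s) ** 3)
    ek = np.mean((e / s) ** 4) - 3.0               # excess kurtosis
    jb = T / 6.0 * (sk ** 2 + 0.25 * ek ** 2)
    return jb                                      # compare with chi2(2), e.g. 5.99 at the 5% level

e = np.random.default_rng(8).normal(size=200)
print(jarque_bera(e))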
Econometrics 2. December 10, 2003
An Introduction to Monte Carlo Simulations and PcNaive
Heino Bohn Nielsen
Institute of Economics, University of Copenhagen

Monte Carlo Simulations
Many results in econometrics are asymptotic, i.e. they hold for $T \to \infty$.
1. It is often difficult to get a firm understanding of such results:
How should we think of repeated sampling?
What is the exact meaning of a central limit theorem?
2. How does a given estimator work in finite samples?
To answer these questions, simulation methods are often useful:
Get some intuition for asymptotic results.
Graphical representation of convergence, uncertainty, etc.
Analyze finite sample properties (often very difficult analytically).

Outline of the Lecture
1. The basic idea in Monte Carlo simulations.
2. Example 1: Sample mean (OLS) of iid normals.
3. Introduction to PcNaive.
4. Example 2: OLS in an AR(1) model: consistency of OLS; finite sample bias.
5. Example 3: Simultaneous equation model: inconsistency of OLS; consistency of IV; the idea of strong and weak instruments.

The Monte Carlo Idea
The Monte Carlo method replaces a difficult deterministic problem with a stochastic problem with the same solution (e.g. due to the LLN). If we can solve the stochastic problem by simulations, labour intensive work can be replaced by cheap capital intensive simulations.
What is the probability of success in the game Solitaire? A very difficult analytical problem. But a machine could play M times, and for $M \to \infty$ we could estimate the probability.
What is the finite sample distribution of an estimator? Very difficult in most situations. We could generate M samples and look at the empirical distribution of the estimates.
Note of Caution
The Monte Carlo method is a useful tool in econometrics.
But:
Simulations do not replace (asymptotic) theory.
Simulation can illustrate but not prove theorems.
Simulation results are not general.
Results are specific to the chosen setup.
They work like good examples.
Ex. 1: Mean of iid Normals
Consider the model

$y_t = \mu + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2), \quad t = 1, 2, ..., T. \qquad (1)$

The OLS (and ML) estimator $\hat{\mu}$ of $\mu$ is the sample mean

$\hat{\mu} = T^{-1}\sum_{t=1}^{T} y_t.$

Note that $\hat{\mu}$ is consistent, unbiased and (exactly) normally distributed:

$\hat{\mu} \sim N(\mu, T^{-1}\sigma^2).$

The standard deviation of the estimate can be calculated as

$\hat{\sigma}_{\hat{\mu}} = \sqrt{T^{-1}\hat{\sigma}^2}, \quad \hat{\sigma}^2 = T^{-1}\sum_{t=1}^{T}(y_t - \hat{\mu})^2.$

We call this the estimated standard error (ESE).
Ex. 1 (cont.): Illustration by Simulation
We can illustrate the results if we can generate data from (1). We need:
1. A fully specified Data Generating Process (DGP), e.g.

  y_t = μ + ε_t,   ε_t ~ N(0, σ²),   t = 1, 2, ..., T,     (2)

   with μ = 5 and σ² = 1;
   an algorithm for drawing random numbers from N(μ, σ²);
   and a sample length, e.g. T = 50.
2. An estimation model for y_t and an estimator. Consider OLS in

  y_t = μ + u_t.     (3)

Note that the statistical model (3) and the DGP (2) need not coincide.
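A minimal Python sketch of one such draw (not PcNaive/Ox code; the seed and sample size are illustrative) generates a sample from the DGP (2) and computes the estimate and its ESE:

import numpy as np

rng = np.random.default_rng(42)
T, mu, sigma = 50, 5.0, 1.0

y = mu + sigma * rng.standard_normal(T)   # one sample from the DGP (2)
mu_hat = y.mean()                         # OLS/ML estimate in model (3)
ese = y.std(ddof=0) / np.sqrt(T)          # estimated standard error of the mean
print(mu_hat, ese)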
Ex. 1 (cont.): Four Realizations
Suppose we draw ε_1, ..., ε_50 from N(0, 1) and construct a data set, y_1, ..., y_50, from (2). We then estimate the model

  y_t = μ + u_t,

using OLS to obtain the sample mean and the standard deviation in one realization,

  μ̂ = 4.9528,   ESE(μ̂) = 0.1477.

We can look at more realizations:

  Realization    μ̂         ESE(μ̂)
  1              4.98013    0.1477
  2              5.04104    0.1320
  3              4.99815    0.1479
  4              4.82347    0.1504
  Mean           4.96070    0.1445
[Figure: histograms of the four simulated samples; the panel titles report the realization means 4.98013, 5.04104, 4.99815 and 4.82347.]
Ex. 1 (cont.): More Replications
Now suppose we generate data from (2) M times,

  y^m_1, ..., y^m_50,   m = 1, 2, ..., M.

For each m we obtain a sample mean μ̂_m. We could then look at the mean estimate and the Monte Carlo standard deviation:

  MEAN = M^{-1} Σ_{m=1}^M μ̂_m
  MCSD = sqrt( M^{-1} Σ_{m=1}^M (μ̂_m - MEAN)² ).

For large M we would expect:
MEAN to be close to the true μ; the bias is BIAS = MEAN - μ.
MCSD to be close to the average ESE, depending on the properties of the ESE.
Example 1 (cont.): Results
Note that MEAN is also an estimator. The standard deviation of MEAN is the Monte Carlo standard error

  MCSE = M^{-1/2} · MCSD.

MCSD measures the uncertainty of μ̂; MCSE measures the uncertainty of the estimated MEAN in the simulation.
Consider the results for M = 1000:

  Replication    μ̂         ESE(μ̂)
  1              4.98013    0.1477
  2              5.04104    0.1320
  ...            ...        ...
  1000           4.92140    0.1254

  MEAN = 4.9909,  MEAN(ESE) = 0.14088,  MCSD = 0.1386,  MCSE = 0.0044.
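The full experiment is easy to mimic in a few lines of Python (a sketch of the same design, with an arbitrary seed, so the printed numbers will differ slightly from the table above):

import numpy as np

rng = np.random.default_rng(1)
M, T, mu, sigma = 1000, 50, 5.0, 1.0

mu_hats, eses = np.empty(M), np.empty(M)
for m in range(M):
    y = mu + sigma * rng.standard_normal(T)
    mu_hats[m] = y.mean()
    eses[m] = y.std(ddof=0) / np.sqrt(T)

MEAN = mu_hats.mean()
MCSD = mu_hats.std(ddof=0)
MCSE = MCSD / np.sqrt(M)
print(MEAN, eses.mean(), MCSD, MCSE)   # MEAN close to 5, MCSD close to the mean ESE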
Ex. 1 (cont.): More Results
[Figure: simulated densities of μ̂ for T = 5, T = 10 and T = 50, and a panel of estimates for different T; the distribution of μ̂ concentrates around μ = 5 as T grows.]
PcNaive
PcNaive is a menu-driven module in GiveWin. Technically, PcNaive generates Ox code, which is then executed by Ox. Output is returned in GiveWin.
Idea:
1. Set up the DGP (AR(1), Static, PcNaive, or General).
2. Specify the estimation model.
3. Choose estimators and test statistics to analyze.
4. Set specifications: M, T etc.
5. Select output to generate.
6. Save and run.
Ex. 2: OLS in an AR(1)
Consider the DGP

  y_t = 0.9·y_{t-1} + ε_t,   ε_t ~ N(0, 1),   t = 1, 2, ..., T.

We specify y_0 = 0 and discard 20 observations. The estimation model is given by

  y_t = ρ·y_{t-1} + u_t,

and we estimate ρ with OLS. Note that ρ̂ is consistent but biased:

  T      MEAN     BIAS      MCSE
  5      0.7590   -0.1410   0.00475
  10     0.7949   -0.1051   0.00310
  25     0.8410   -0.0590   0.00172
  50     0.8673   -0.0327   0.00108
  75     0.8779   -0.0221   0.00082
  100    0.8833   -0.0167   0.00069
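A small Python sketch of this experiment (the replication count, burn-in and seed are illustrative choices, and the OLS regression here has no intercept) reproduces the pattern of a downward finite-sample bias that shrinks with T:

import numpy as np

def ar1_mc_bias(T, rho=0.9, M=2000, burn=20, seed=0):
    """Monte Carlo estimate of the finite-sample bias of OLS for rho in an AR(1)."""
    rng = np.random.default_rng(seed)
    rho_hats = np.empty(M)
    for m in range(M):
        e = rng.standard_normal(T + burn)
        y = np.zeros(T + burn)
        for t in range(1, T + burn):
            y[t] = rho * y[t - 1] + e[t]
        y = y[burn:]                                          # discard burn-in observations
        rho_hats[m] = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])    # OLS without intercept
    return rho_hats.mean() - rho

for T in (10, 25, 50, 100):
    print(T, round(ar1_mc_bias(T), 4))                        # bias shrinks toward zero as T grows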
Ex. 3: Simultaneity Bias
Consider as a DGP the system of equations

  y_{a,t} = 0.5·y_{b,t} + 0.75·y_{a,t-1} + ε_{1t}
  y_{b,t} = 0.8·z_t + ε_{2t}
  z_t     = 0.75·z_{t-1} + ε_{3t}

for t = 1, 2, ..., 50, where

  (ε_{1t}, ε_{2t}, ε_{3t})' ~ N( 0, Ω ),   Ω = [ 1    0.2  0
                                                 0.2  1    0
                                                 0    0    1 ].

We set y_{a,0} = z_0 = 0 and discard 20 observations. Consider the OLS estimator in the estimation equation

  y_{a,t} = β_0 + β_1·y_{b,t} + β_2·y_{a,t-1} + u_t.

Note that cov(ε_{1t}, ε_{2t}) ≠ 0, so that OLS is inconsistent. z_t is a valid instrument for y_{b,t}, and IV is consistent.
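A compact Python sketch of this design (again illustrative, not the PcNaive setup itself; one observation is lost to the lag) simulates both estimators of the coefficient on y_b:

import numpy as np

rng = np.random.default_rng(3)
T, burn, M = 50, 20, 1000
cov = np.array([[1.0, 0.2, 0.0], [0.2, 1.0, 0.0], [0.0, 0.0, 1.0]])

ols_b1, iv_b1 = np.empty(M), np.empty(M)
for m in range(M):
    e = rng.multivariate_normal(np.zeros(3), cov, size=T + burn)
    ya, yb, z = np.zeros(T + burn), np.zeros(T + burn), np.zeros(T + burn)
    for t in range(1, T + burn):
        z[t] = 0.75 * z[t - 1] + e[t, 2]
        yb[t] = 0.8 * z[t] + e[t, 1]
        ya[t] = 0.5 * yb[t] + 0.75 * ya[t - 1] + e[t, 0]
    ya, yb, z = ya[burn:], yb[burn:], z[burn:]
    X = np.column_stack([np.ones(T - 1), yb[1:], ya[:-1]])     # regressors
    Z = np.column_stack([np.ones(T - 1), z[1:], ya[:-1]])      # instruments
    y = ya[1:]
    ols_b1[m] = np.linalg.solve(X.T @ X, X.T @ y)[1]
    iv_b1[m] = np.linalg.solve(Z.T @ X, Z.T @ y)[1]

print(ols_b1.mean(), iv_b1.mean())   # OLS mean drifts away from 0.5, IV mean stays close to it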
Ex. 3: Results, OLS vs. IV, T = 50
[Figure: simulated densities of the estimated coefficient on y_b: no simultaneity with OLS, simultaneity with OLS, and simultaneity with IV.]
Ex. 3: Results, OLS vs. IV, T = 500
[Figure: the same three densities for T = 500; the OLS density under simultaneity remains centred away from 0.5, while the IV density concentrates around it.]
Ex. 3: Results, Strength of Instruments
[Figure: IV densities with a medium instrument (y_b = 0.80·z + e), a weak instrument (y_b = 0.30·z + e), and a strong instrument (y_b = 1.30·z + e); the weaker the instrument, the more dispersed the IV estimates.]
Chapter 5
Autocorrelation and Lagged Variables
In the previous chapter we demonstrated that inference in the linear regression model (for example using t-tests and F-tests) can be totally misleading if the error term is not independent of previous errors, i.e. if the error term is autocorrelated. Therefore, it is very important always to test whether the residuals can be assumed autocorrelated or not. Such test procedures will be discussed in Section 5.1. Since economic time-series data as a rule are strongly time-dependent, the errors ε_t in the static linear regression model, y_t = β'x_t + ε_t, are almost always autocorrelated. In this case there are two possibilities: (i) one can adjust the OLS estimates by correcting for the detected autocorrelation, or (ii) one can reformulate the static regression model as a dynamic regression model. In most cases the latter procedure is likely to yield much better results. Section 5.2 demonstrates how this can be done using lagged variables.
5.1 Autoregressive Residuals
The assumption of independent errors, E(ε_t ε_s) = 0 for t ≠ s, is seldom satisfied in the static regression model. When this is the case we say that the OLS residuals are autocorrelated, which means that a residual observed at time t is likely to be correlated with the previously observed residuals. Because the notation ε_t is usually reserved for an independent error process we will here use the notation u_t for an autocorrelated error term in the regression model, i.e. y_t = β'x_t + u_t.
5.1.1 The Autocorrelation and Partial Autocorrelation Function
The combined residual autocorrelation function (autocorrelogram) and the partial autocorrelation function provide a first description of the pattern of time dependence. The correlation coefficient between û_t and û_{t-h} is calculated as

  r_h = Σ_{t=h+1}^T û_t û_{t-h} / sqrt( Σ_{t=h+1}^T û_t² · Σ_{t=h+1}^T û_{t-h}² )
      = Cov(û_t, û_{t-h}) / ( σ_{û_t} σ_{û_{t-h}} ),   t = 1+h, 2+h, ..., T.
If the residuals are independent over time, then the true autocorrelation coefficients are zero and the estimated values should be small. As a rule of thumb a correlation coefficient is significant if |r_h| > 2/√T, where 1/√T is the approximate standard error of r_h when the true value is zero, i.e. when the errors are uncorrelated at lag h. The autocorrelogram, or the autocorrelation function, consists of r_h for h = 1, 2, .... Plotting the autocorrelation function against the lag, h, gives a first visual impression of the magnitude of the autocorrelation problem. The autocorrelogram provides information about simple correlation coefficients and, when the errors are strongly time dependent, the autocorrelations tend to be fairly large even for large values of h.
The residual autoregression of order h is defined from

  û_t = ρ_{0,h} + ρ_{1,h} û_{t-1} + ... + ρ_{h,h} û_{t-h} + ε_t,   t = 1+h, 2+h, ..., T,

and yields the multiple correlation coefficient ρ_{i,h} between the residual at time t and the residual at time t - i. One can say that it differs from the simple autocorrelation coefficient, r_i, in the same way as a multiple regression coefficient differs from a single regression coefficient: the coefficient ρ_{i,h} measures the correlation between û_t and û_{t-i} when corrected for the remaining lagged residuals.
The partial autocorrelation function of order h is defined by ρ_{1,1}, ρ_{2,2}, ..., ρ_{h,h}, i.e. the last coefficient in each of the successive autoregressions AR(1), AR(2), ..., AR(h).
Table 5.1: Residual autocorrelations and autoregressions

  h:            1      2      3      4      5      6      7      8
  The residual autocorrelogram
  r_h           0.69   0.57   0.42   0.31   0.19   0.15   0.06  -0.08
  The residual autoregressions
  ρ̂_{i,8}      0.48   0.33   0.01  -0.02  -0.16   0.08   0.10  -0.21
  ρ̂_{i,7}      0.49   0.32   0.03  -0.02  -0.15   0.03  -0.02
  ρ̂_{i,6}      0.48   0.32   0.03  -0.02  -0.16  -0.02
  ρ̂_{i,5}      0.48   0.32   0.03  -0.01  -0.14
  ρ̂_{i,4}      0.49   0.33  -0.01  -0.09
  ρ̂_{i,3}      0.50   0.30  -0.05
  ρ̂_{i,2}      0.49   0.28
  ρ̂_{i,1}      0.68
  The partial autocorrelogram
  ρ̂_{i,P}      0.68   0.28  -0.05  -0.09  -0.14  -0.02  -0.02  -0.21
Note that the sample estimates can depend on how we treat the first lagged initial values. There are essentially three possibilities: (1) we do all calculations on the same sample t = 1, 2, ..., T and replace the initial values û_1, ..., û_h with zero, (2) we do all calculations on the same sample t = 1+h, 2+h, ..., T, and (3) we estimate ρ̂_{1,1} on the sample t = 2, 3, ..., T, ρ̂_{2,2} on the sample t = 3, 4, ..., T, and so on. The autoregressions and the partial autocorrelogram in Table 5.1 are based on alternative (2), which does not exactly reproduce the output from PcGive.
By running an OLS regression in PcGive and then selecting the test, one can obtain the estimated residual autocorrelation and partial autocorrelation function for a user defined lag length h.
As an illustration, the residual correlogram and the residual autoregressions of lag length 8 in Table 5.1 are calculated for the residuals from regressing real aggregate consumption on real domestic demand and real wealth in Denmark, 1973:1-2003:1.
It is now easy to see the difference between the two functions: while the autocorrelations remain high until the seventh lag, only the first two autoregressive coefficients are quite high. Thus, a two period lag dependence is probably sufficient to account for the autocorrelation in the OLS residuals. Note, however, that these residuals have been estimated from a misspecified model in the sense that the errors were supposed to be uncorrelated, but obviously were not. When testing for the significance of the residual autoregressions we need to account for this fact, as will be illustrated below.
5.1.2 An LM Test for Residual Autoregressive Errors
When testing the significance of the residual autocorrelations, we need to specify a model describing the dependence of u_t on its lagged values, against which we can test the null hypothesis of no autocorrelation. Although it is possible to choose between various kinds of ARMA (AutoRegressive-MovingAverage) models, the simple AR(m) model is by far the most popular in empirical applications. The reason is, besides its simplicity, that the test procedures for autocorrelated errors have been shown to work well for the AR model, independently of whether the true model is an ARMA. We will, therefore, only discuss the AR model here and leave the more detailed treatment of ARMA models to the end of the course. We specify the regression model with autoregressive residuals as:

  y_t = β'x_t + u_t                                        (5.1)

where

  u_t = ρ_1 u_{t-1} + ... + ρ_m u_{t-m} + ε_t.             (5.2)

The null hypothesis is

  H_0: ρ_1 = ... = ρ_m = 0,

which corresponds to the joint insignificance of the autoregression coefficients and, thus, to the absence of autocorrelation, whereas the alternative hypothesis is that at least one ρ_i ≠ 0. To estimate (5.1) subject to the restriction (5.2) is a highly nonlinear problem. Therefore, the type of test procedure to apply is obviously the LM test procedure. It has been shown that this amounts to running the usual auxiliary regression:

  û_t = ρ_1 û_{t-1} + ... + ρ_m û_{t-m} + (β_0 - β̂_0) + (β_1 - β̂_1)x_{1,t} + ... + (β_k - β̂_k)x_{k,t} + ε_t.

Note that we need to regress the estimated residuals not only on the hypothetical lag structure, but also on the original regressors. The latter have to be included to correct for any bias in the original OLS regression due to misspecification of the error term.
From the auxiliary regression we can calculate the LM test statistic

  η(m) = T·R²_aux,

distributed as χ²(m), or, alternatively, the F-form

  η_F(m, T-k-m-1) = ( ESS_aux / m ) / ( RSS_aux / (T-k-m-1) ) = ((T-k-m-1)/m) · R²_aux / (1 - R²_aux),

distributed as F(m, T-k-m-1). When deciding about the lag length m in (5.2) one can use the results from the partial autocorrelation function as a coarse guide. Because the residuals are likely to have been influenced by the misspecification bias, it is advisable not to take the results too literally.
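A minimal Python sketch of the T·R² version of this test (assuming y and X are plain arrays and X contains a constant; pre-sample lagged residuals are set to zero, not the exact PcGive implementation):

import numpy as np

def lm_autocorr_test(y, X, m):
    """LM test for AR(m) residual autocorrelation via the auxiliary regression, T*R^2 form."""
    T, k = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ beta                                       # OLS residuals
    # lagged residuals, pre-sample values replaced by zero
    U = np.column_stack([np.r_[np.zeros(j), u[:-j]] for j in range(1, m + 1)])
    W = np.column_stack([U, X])                            # lagged residuals plus original regressors
    g = np.linalg.solve(W.T @ W, W.T @ u)
    e = u - W @ g
    R2 = 1.0 - (e @ e) / (u @ u)                           # u has zero mean when X contains a constant
    return T * R2                                          # compare with chi2(m)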
As an illustration we calculate the LM test assuming a residual autoregressive process of order five. The estimated coefficients ρ̂_1, ..., ρ̂_5 are given below with their standard errors:

  i           1      2      3      4      5
  ρ̂_i        0.58   0.23   0.03   0.00   0.06
  se(ρ̂_i)    0.09   0.11   0.11   0.11   0.09

It appears that the estimated coefficients differ quite substantially from the estimated partial autocorrelation function, and the third autoregressive coefficient is, in fact, more significant than we might have expected. Based on RSS_aux = 0.0302818 and RSS_OLS = 0.06112145 we can derive the LM test statistics, η(5) = 61.05 [0.0000], and the F-form, η_F(5, 113) = 23.16 [0.0000].
Because the LM test is a joint test of all the coefficients, the power of the test will decrease if we choose m too large. Therefore, the choice of m is a compromise between not including too many insignificant coefficients, but at the same time not excluding any significant coefficients.
5.1.3 The Durbin-Watson Test for AR(1) Residuals
A commonly used test for detecting first-order autocorrelation in the residuals is the Durbin-Watson d-statistic. The statistic is defined as

  d = Σ_{t=2}^T (û_t - û_{t-1})² / Σ_{t=1}^T û_t²,

where û_t are the observed residuals from the regression model. The d-statistic has a range of values 0 < d < 4. Under the null of no first-order autocorrelation the distribution of d is centered around 2. In the presence of positive first-order autocorrelation, that is for ρ_1 > 0, the statistic tends towards zero. For negative first-order autocorrelation, it tends towards 4. Thus, when testing the null hypothesis

  H_0: ρ_1 = 0

against the alternative

  H_1: ρ_1 > 0,

the DW statistic should lie in the interval 0 < d < 2. When testing against the alternative of a negative coefficient,

  H_1: ρ_1 < 0,

the DW statistic should lie in the interval 2 < d < 4, and when comparing with the critical DW values the appropriate statistic is 4 - d. Note that the d-statistic is derived under the assumption that there is no lagged dependent variable in the model. If there is, the d-statistic is biased towards 2, resulting in acceptance of the null of no first-order residual autocorrelation too often. Thus, in this case it should not be used.
5.2 Lagged variables in the linear regression model
In the past, the tendency has been to consider observed autocorrelation in the residuals as a defect in the model to be corrected for. It is now understood that the presence of such systematic patterns in the residuals suggests that the model is dynamically misspecified. The usual implication of observed residual autocorrelation is that one or more lagged values of the dependent variable should be added to the regressor set, which allows the model to reflect the dynamic features of the data.
We first consider the following simple static model:

  y_t = β_0 + β_1 x_t + u_t,   where
  u_t = ρ u_{t-1} + ε_t,   ε_t ~ N(0, σ²_ε).

The parameter ρ is the coefficient of first-order autocorrelation, and ε_t is a normal independent error, also called a white-noise process (a term borrowed from engineering).
Noting that

  u_t = y_t - β_0 - β_1 x_t,

we can replace u_t to obtain

  y_t - β_0 - β_1 x_t = ρ( y_{t-1} - β_0 - β_1 x_{t-1} ) + ε_t

or,

  y_t = ρ y_{t-1} + β_1 x_t - ρβ_1 x_{t-1} + (1 - ρ)β_0 + ε_t
  (1 - ρL) y_t = β_1 (1 - ρL) x_t + (1 - ρ)β_0 + ε_t
  y_t = a_1 y_{t-1} + b_0 x_t + b_1 x_{t-1} + c_0 + ε_t                  (5.3)

where the lag operator L is defined by L^m x_t = x_{t-m}, and a_1 = ρ, b_0 = β_1, b_1 = -ρβ_1, and c_0 = (1 - ρ)β_0. The middle row of (5.3) illustrates what is often called the common factor dynamic regression model. Note that the GLS method of correcting for residual autocorrelation is based on the validity of the common factor model. Before using GLS one should, therefore, first test the validity of the common factor restrictions, which can be done directly with the common factor test in PcGive.
By re-specifying the model with a lagged dependent variable and a lag of the explanatory variable, as given by the third row of (5.3), we obtain a model whose residuals often are much closer to the OLS assumptions and a model that more faithfully reflects the systematic components of the data generating process.
The general dynamic model is specified as:

  A(L) y_t = c_0 + B(L) x_t + ε_t,   t = max(p, q) + 1, ..., T,   ε_t ~ N(0, σ²_ε),      (5.4)

where

  A(L) = 1 - a_1 L - a_2 L² - ... - a_p L^p
  B(L) = b_0 + b_1 L + b_2 L² + ... + b_q L^q.
If p = 1 and q = 1 we obtain the simple dynamic linear model discussed above:

  (1 - a_1 L) y_t = (b_0 + b_1 L) x_t + c_0 + ε_t
  y_t = a_1 y_{t-1} + b_0 x_t + b_1 x_{t-1} + c_0 + ε_t,   t = 2, ..., T,   ε_t ~ N(0, σ²_ε).

Let

  x_t = ( y_{t-1}, x_t, x_{t-1}, 1 )'   and   β = ( a_1, b_0, b_1, c_0 )'.
The OLS estimate of β in y = Xβ + ε is

  β̂_OLS = (X'X)^{-1} X'y
         = (X'X)^{-1} X'(Xβ + ε)
         = β + (X'X)^{-1} X'ε,

so that E(β̂_OLS) = β if E(X'ε) = 0.                                                     (5.5)

The condition E(X'ε) = 0 is generally satisfied if E(εε') = σ²_ε I, i.e. if ε_t is uncorrelated with ε_{t-h}. Therefore, if the residuals from the dynamic linear model are well-behaved, i.e.

  E(εε') = σ²_ε I,

the OLS estimates correspond to the maximum likelihood estimates. Therefore, the parameters of a linear model with lagged (dependent and/or explanatory) variables can very well be estimated with OLS, which renders consistent estimates as long as the residuals are not autocorrelated. However, if the explanatory variables x_t are not stationary, cf. the definition in KJ: Chapter 1, inference can be improved by properly accounting for the nonstationarity. This will be discussed in subsequent chapters.
5.2.1 Deriving the Steady-State Solution
The steady-state solution can be derived when there are lagged variables in the model. Example: the steady-state solution in a simple model with one explanatory variable and p = 1, q = 1, i.e. one lag in the model:

  y_t = a_1 y_{t-1} + b_0 x_t + b_1 x_{t-1} + c_0 + ε_t,   ε_t ~ N(0, σ²_ε).            (5.6)

The long-run steady-state solution defines the hypothetical value towards which the process would converge if one could switch off the errors. Thus, the solution is found by solving the dynamic model for y_t = y_{t-1} and x_t = x_{t-1}, i.e. by setting L = 1. For model (5.6) this yields the steady-state solution:

  (1 - a_1) y = (b_0 + b_1) x + c_0
  y = ( (b_0 + b_1)/(1 - a_1) ) x + c_0/(1 - a_1),

i.e.

  y* = β_1 x + β_0                                                                       (5.7)

where

  β_1 = (b_0 + b_1)/(1 - a_1)   and   β_0 = c_0/(1 - a_1).

Using (5.7) it is now easy to see that the steady-state solution of the dynamic linear model is closely related to the static linear model:

  y_t = β_0 + β_1 x_t + u_t,   t = 1, ..., T,                                            (5.8)

where u_t is an autocorrelated process.
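As a small worked example of (5.7), the following Python lines map hypothetical dynamic coefficient estimates into the long-run parameters (the numbers are purely illustrative):

# hypothetical estimates of (a1, b0, b1, c0) from the dynamic model (5.6)
a1, b0, b1, c0 = 0.7, 0.4, 0.1, 0.5

beta1 = (b0 + b1) / (1.0 - a1)   # long-run effect of x on y, see (5.7)
beta0 = c0 / (1.0 - a1)          # long-run intercept
print(beta1, beta0)              # both equal 5/3 with these numbers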
5.2.2 The Dynamic Linear Model in Equilibrium-Correction Form
It is often a good idea to reformulate the dynamic linear model in the equilibrium-correction form (also called error-correction form). Instead of lagged dependent and independent variables the model will contain current and lagged differences of the variables as well as the lagged deviation from steady-state (the lagged equilibrium error). We illustrate with the simple model (5.6):

  y_t = a_1 y_{t-1} + b_0 x_t + b_1 x_{t-1} + c_0 + ε_t
  Δy_t + y_{t-1} = a_1 y_{t-1} + b_0 Δx_t + b_0 x_{t-1} + b_1 x_{t-1} + c_0 + ε_t
  Δy_t = -y_{t-1} + a_1 y_{t-1} + b_0 Δx_t + b_0 x_{t-1} + b_1 x_{t-1} + c_0 + ε_t
  Δy_t = b_0 Δx_t - (1 - a_1) y_{t-1} + (b_0 + b_1) x_{t-1} + c_0 + ε_t
  Δy_t = b_0 Δx_t - (1 - a_1) [ y_{t-1} - ((b_0 + b_1)/(1 - a_1)) x_{t-1} - c_0/(1 - a_1) ] + ε_t
       = b_0 Δx_t - (1 - a_1) { y_{t-1} - β_1 x_{t-1} - β_0 } + ε_t.                    (5.9)

In words: the change in y_t, Δy_t, is explained by the short-run adjustment to changes in x_t (the term b_0 Δx_t), by the short-run adjustment to the previous deviation from the long-run steady-state value (the equilibrium error y_{t-1} - β_1 x_{t-1} - β_0, multiplied by the adjustment coefficient -(1 - a_1)), and by new shocks (ε_t).
Note that the ECM model (5.9) is equivalent to the distributed lag model (5.6) in the sense that they explain exactly as much of the variation of y_t, and the coefficients of the ECM model can be calculated from the coefficients of the distributed lag model and vice versa. However, the ECM formulation has a number of advantages:
1. Generally, the stationarity condition is much more applicable to the differenced variables than to the variables in levels (as illustrated by the graphs of the consumption data).
2. If the steady-state solution is statistically significant (which is a testable hypothesis) the deviation from steady-state (the equilibrium error) will generally also be a stationary variable.
3. The multicollinearity between the explanatory variables, which is a big problem in models with variables in levels, is much smaller in the transformed model.
Thus, many of the problems which were plaguing the static linear regression model can be solved by the ECM transformation:
1. The Student's t-test becomes applicable and provides a much more reliable tool for discriminating between important and less important explanatory variables.
2. The R² likewise becomes a reliable measure of the goodness of fit of the model.
3. The ECM model can be interpreted in a much richer framework, as it is possible to distinguish between short-run (transitory) effects of changes in the explanatory variables and the long-run adjustment towards steady-state.
5.3 R² and R² Relative to Differences and Seasonals
In principle, R² provides a measure of the residual variance in relation to the total variation to be explained. It is regarded as a test of the goodness of fit of a standard linear regression because a low value indicates that the model lacks explanatory power. However, the converse is not always true: a high R² does not necessarily mean that the model has explanatory power. If the data are trending (nonstationary), then a high R² calculated as

  R² = 1 - RSS/TSS = 1 - Σ û_t² / Σ (y_t - ȳ)²

is misleading, because the average, ȳ, is a poor baseline description of the variable y_t. In this case, a more appropriate statistic is R² relative to differences and seasonals. It is defined as

  R²_corr = 1 - Σ û_t² / Σ (Δy_t - Δȳ)²,

where Δȳ is the mean of the change in y_t, corrected for seasonal effects. Unlike R², R²_corr can be negative. In such a case the RSS from the regression is larger than the variance of the deviation of Δy_t from its average seasonal change, so that the variation of y_t is explained better by the average seasonal change than by the regressors.
Chapter 6
Testing for Structural Change
All tests discussed so far (Student's t tests, the χ² tests, and the F-tests) are based on the assumption that the parameters of the model are constant, i.e. that the covariances of the data have remained constant over the sample period. Quite frequently there is reason to believe that this is an unrealistic assumption and that the model can be improved by accounting for regime shifts in the economy which may have caused some of the model parameters to change. For example, periods of war and peace are very likely to be different in terms of macroeconomic behavior. Other examples are floating versus fixed exchange rate regimes and periods of regulation versus deregulation; changes between left and right wing governments might easily imply changes in policies and, thus, result in changes in the parameters of the model.
One can generally distinguish between two different test situations:
1. The case when we have good reasons to suspect a change in the parameters at a specific point in time, for example because we know there was a change in regime at this time point. In this case we usually perform a Chow test for a known structural break.
2. The case when we have no specific hypotheses about the time point for a structural break but, nevertheless, would like to test whether there could have been any over the sample period. In this case we do not exclude the possibility that there might have been changes in the model parameters, and we use diagnostic Chow tests to find out whether this might be the case and, if so, where the changes seem to be.
The first case is usually quite straightforward, as it is easy to test a well-specified hypothesis. The second case is more diffuse, partly because the available test procedures are not very good at finding the time points when the break happens, partly because estimated model parameters can be unstable as a result of other types of model misspecification. For example, the correct model is a loglinear model, but we have used levels of variables, or we estimated a static instead of a dynamic model.
6.1 Tests of a structural break at a known time point
In this case the sample is split into two sub-samples, one consisting of the observations up to the suspected break, t = 1, ..., T_1, and another sub-sample, t = T_1 + 1, ..., T, containing the rest of the observations, T - T_1 = T_2.
As an illustration we consider the regression model with two explanatory variables and a constant term,

  y_t = β_0 + β_1 x_{1t} + β_2 x_{2t} + ε_t,   t = 1, ..., T,   ε_t ~ NI(0, σ²_ε).

The null hypothesis is formulated as no change in the model parameters and the alternative hypothesis as at least one parameter has changed, i.e.:

  H_0: the parameters (β_0, β_1, β_2) are constant over the entire sample period.
  H_1: the parameters (β_0, β_1, β_2) change at time T_1 to (β_0 + δ_0, β_1 + δ_1, β_2 + δ_2), where at least one δ_i ≠ 0, i = 0, 1, 2, and remain constant until the end of the sample period.
Under the null of no structural change the sum (RSS_{T_1} + RSS_{T_2}) should not differ significantly from the RSS_T obtained from the full-sample model. This is the idea behind the structural change Chow test:

  F_Chow = [ ( RSS_T - (RSS_{T_1} + RSS_{T_2}) ) / k ] / [ (RSS_{T_1} + RSS_{T_2}) / (T - 2k) ]  ~  F(k, T - 2k),

where RSS_{T_1} and RSS_{T_2} are the residual sums of squares from the two sub-models, and RSS_T is the residual sum of squares from the single model over the entire sample. A significant F-statistic indicates that (RSS_{T_1} + RSS_{T_2}) is significantly smaller than RSS_T, suggesting the presence of a structural change. The degrees of freedom in the numerator are k, as we test the stability of k parameters (k = 3 in the example above), whereas in the denominator the degrees of freedom are T - 2k, because the model with k regressors is estimated twice.
The Chow test can easily be generalized to more than two sub-periods, corresponding to multiple structural breaks.
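A minimal Python sketch of the test (assuming y and X are plain arrays and T1 is the known break point) makes the construction above concrete:

import numpy as np

def chow_test(y, X, T1):
    """Chow test for a structural break at observation T1 (known break point)."""
    def rss(yy, XX):
        b = np.linalg.solve(XX.T @ XX, XX.T @ yy)
        e = yy - XX @ b
        return e @ e
    T, k = X.shape
    rss_full = rss(y, X)
    rss_1 = rss(y[:T1], X[:T1])
    rss_2 = rss(y[T1:], X[T1:])
    F = ((rss_full - (rss_1 + rss_2)) / k) / ((rss_1 + rss_2) / (T - 2 * k))
    return F                               # compare with F(k, T - 2k)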
A special case arises when one of the sub-periods has the same or a smaller number of observations than the number of parameters in the model, that is, when T_i ≤ k. In this case the OLS estimator cannot be calculated for the smaller sub-sample, and the Chow test is calculated as

  F_Chow = [ (RSS_T - RSS_{T_1}) / T_2 ] / [ RSS_{T_1} / (T_1 - k) ]  ~  F(T_2, T_1 - k).
The Chow test is an overall test that does not distinguish between whether all of the parameters have changed or whether only some subset of the parameters has done so. Another, more informative, way of testing for structural change is to use dummy variables to create new split-sample variables in the following way. First, define an indicator variable (a shift dummy variable)

  D_{T_1,t} = 0 for t = 1, ..., T_1 - 1,   and   D_{T_1,t} = 1 for t = T_1, ..., T.

Then create the new variables x*_{i,t} = D_{T_1,t} · x_{i,t}, i = 1, ..., k. For a regression model with two explanatory variables and a constant term the new data matrix will look as follows:
  X_{H_1} = [ X_{T_1}   0
              X_{T_2}   X_{T_2} ]

          = [ 1  x_{1,1}        x_{2,1}        0  0            0
              .  .              .              .  .            .
              1  x_{1,T_1-1}    x_{2,T_1-1}    0  0            0
              1  x_{1,T_1}      x_{2,T_1}      1  x_{1,T_1}    x_{2,T_1}
              .  .              .              .  .            .
              1  x_{1,T}        x_{2,T}        1  x_{1,T}      x_{2,T}   ]
Thus, under H_1 the regression model contains 2k regressors, which for the simple two-variable model above leads to the following regression model:

  y_t = β_0 + β_1 x_{1t} + β_2 x_{2t} + β*_0 D_{T_1,t} + β*_1 x*_{1t} + β*_2 x*_{2t} + ε_t.

It is now easy to test the overall hypothesis of parameter change at time T_1 as the joint hypothesis (β*_0, β*_1, β*_2) = 0. If rejected, at least one of the parameters has changed at time T_1. The testing can, for example, be done as a Wald test of Rβ = r, where

  [R, r] = [ 0 0 0 1 0 0 | 0
             0 0 0 0 1 0 | 0
             0 0 0 0 0 1 | 0 ].
The advantage of this procedure is that it is easy to test the significance of each coefficient with a Student's t-test, so that we can find out which coefficients have changed, if any. In most cases we have from the outset a prior hypothesis regarding which coefficients are likely to have changed and which not. For example, if we expect the regime shift to have changed exclusively the constant term in the model, i.e. to have caused a level shift in y_t at time T_1, then we would only add the shift dummy D_{T_1,t} to the model and then test the hypothesis β*_0 = 0. Thus, the dummy procedure allows us to be more specific regarding which parameters we believe have changed.
6.2 Diagnosing Structural Change
The Chow tests above have been designed to detect a regime shift in the economy at a specific point in time. Sometimes we do not know for sure whether there have been any regime shifts, or we might suspect such shifts to have occurred but do not know when. By the use of recursive methods, a sample can be successively split and the model re-estimated for each consecutive sample, thus providing a method of locating the point in the sample where the evidence for a structural break is the strongest. That said, it is often very difficult to diagnose a structural change based on the recursive graphs. This is so partly because adding one more observation belonging to a new regime may not be enough to produce a significant test statistic, partly because a significant test statistic need not always be evidence in favor of a regime shift; it can as well be the result of the model being more generally misspecified. For example, if the estimated model has autocorrelated errors, then the assumptions under which most of the test statistics are derived are not satisfied. Thus, the relevant distinction is one between a regime shift or structural change on the one hand, and the instability of the parameters that arises from a mis-specified model on the other.
6.2.1 Analysis of 1-Step Forecasts
In most cases the purpose of empirical modelling is partly to obtain a better understanding of the empirical mechanisms that have generated our data, partly to use this knowledge to obtain better forecasts for future periods. Thus the task is to estimate a good econometric model and then use it to produce a forecast of the variable, i.e. E_t(y_{t+h} | x_{t+h}), for the next period (h = 1), or possibly for several periods ahead (h = 1, 2, ...). In a realistic forecasting situation we do not know the future value of x_{t+h}. Thus, we should first calculate forecasts for the explanatory variables to be able to calculate forecasts for (y_{t+h} | x_{t+h}). This is called dynamic forecasting or ex ante forecasting.
Most people would believe that an econometric model with more explanatory power would generate better forecasts than a model with less explanatory power. Unfortunately this is not always true. In some cases simple models, like y_t = y_{t-1} + ε_t, with essentially no explanatory content, provide better forecasts than sophisticated explanatory models. When this happens it is often an indication either that our econometric model is misspecified or that there has been an unpredictable change in the model parameters over the forecast period. In the latter case the forecast failure can only be remedied ex post, that is, after the event that caused the forecast failure. This is because it is impossible to include information about unpredictable events in our forecast procedure (such events could, for example, be the tearing down of the Berlin wall).
But if the forecast failure is due to model misspecification, then the remedy is to improve the specification of our model. To find out whether our model is likely to suffer from forecast failure due to misspecification we can perform an ex post forecasting exercise. The idea is the following: assume that we had estimated our present model based on a sample 1, ..., T_1, where T_1 < T, and used it to forecast E_t(y_{T_1+h} | x_{T_1+h}), h = 1, ..., T_2, T_2 = T - T_1, where x_{T_1+h} is known. The following hypotheses are being tested:

  H_0: β(T_2) = β(T_1),  σ²_ε(T_2) = σ²_ε(T_1),  and  E(ε_t ε_{t-h}) = 0 for h ≠ 0
  H_1: β(T_2) ≠ β(T_1),  or  σ²_ε(T_2) ≠ σ²_ε(T_1),  or  E(ε_t ε_{t-h}) ≠ 0.

Ex post forecasts which are systematically incorrect, either too big or too small, or systematically positive or negative, will be interpreted as a sign of model misspecification.
6.2.2 The 1-Step Forecasts
The purpose of this section is to discuss ex post forecast analysis as a means to detect model misspecification. The idea is to divide the sample into an estimation base period, t = 1, ..., T_1, and a forecast period, t = T_1 + 1, ..., T. The 1-step forecasts, ŷ_{T_1+1}, ..., ŷ_T, are calculated from a model that has been estimated over the base period t = 1, ..., T_1, for given values of x_t, t = T_1 + 1, ..., T, i.e.:

  ŷ_{T_1+i} = x'_{T_1+i} β̂,   i = 1, ..., T_2,

where T_2 = T - T_1 and β̂ has been estimated on the first T_1 sample observations. The 1-step forecast error, ẽ_t, is defined as the departure of the prediction, ŷ_t, from the actual value, y_t, i.e.:

  ẽ_t = y_t - ŷ_t = y_t - x'_t β̂,   t = T_1 + 1, ..., T.        (6.1a)
Under the null of a correctly specified model we can substitute the true value y_t = x'_t β + ε_t into (6.1a):

  ẽ_t = x'_t β + ε_t - x'_t β̂ = x'_t (β - β̂) + ε_t,

and it appears that the forecast error is made up of two components, the first of which is due to the deviation of the estimated parameters from their true values, (β - β̂), and the second, ε_t, is due to an (unpredictable) random shock. The variance of the 1-step forecast error is given as

  var(ẽ_t) = x'_t Var(β̂) x_t + Var(ε_t)
           = σ̂²_ε x'_t (X'X)^{-1} x_t + σ̂²_ε
           = σ̂²_ε ( 1 + x'_t (X'X)^{-1} x_t ),

where σ̂²_ε is the estimated within-sample residual variance.
The forecast error t-value is the forecast error divided by the forecast standard error,

  t_f = ẽ_t / se(ẽ_t) = ( y_t - ŷ_t ) / ( σ̂_ε sqrt( 1 + x'_t (X'_{T_1} X_{T_1})^{-1} x_t ) ),   t = T_1 + 1, ..., T,

which is t-distributed with T_1 - k degrees of freedom:

  ẽ_t / se(ẽ_t) ~ t(T_1 - k).

This provides a test of the significance of a forecast error. The 1-step ahead forecasts can be evaluated graphically by plotting the actual and predicted values on the same graph. By plotting 95% confidence intervals around the forecast, ŷ_t ± 1.96·se(ẽ_t), one can illustrate whether the actual realization, y_t, would have fallen within the confidence bands of the model forecast.
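A minimal Python sketch of this calculation (assuming y and X are plain arrays and the base period has T1 observations) returns the sequence of forecast error t-values:

import numpy as np

def one_step_forecast_tvalues(y, X, T1):
    """1-step forecast errors and their t-values from a model estimated on the first T1 observations."""
    Xb, yb = X[:T1], y[:T1]
    beta = np.linalg.solve(Xb.T @ Xb, Xb.T @ yb)
    resid = yb - Xb @ beta
    s2 = resid @ resid / (T1 - X.shape[1])              # within-sample residual variance
    XtX_inv = np.linalg.inv(Xb.T @ Xb)
    tvals = []
    for t in range(T1, len(y)):
        xt = X[t]
        e = y[t] - xt @ beta                            # 1-step forecast error
        se = np.sqrt(s2 * (1.0 + xt @ XtX_inv @ xt))    # forecast standard error
        tvals.append(e / se)                            # compare with t(T1 - k)
    return np.array(tvals)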
6.2.3 Chow Type Forecast Tests
The t-test of individual forecast errors tells us whether at one specific period our model forecast would have failed or not, but does not say much about the joint forecasting performance of our model. If the model works adequately also in the forecast period, we would expect the forecast errors to be independent and to have approximately the same variance as the base period residuals. Therefore, a comparison of the base period residual variance, σ²_ε, with the forecast-period residual variance can be used to evaluate the joint performance of the forecasts.
Under the H_0 hypothesis of identically and independently distributed forecast errors, the sum of squared standardized forecast errors is distributed as χ²(T_2). Since we use the estimated error variance, σ̂²_ε, instead of the true one, the forecast χ² test is only approximately distributed as χ², i.e.:

  η_1 = Σ_{t=T_1+1}^T ẽ²_t / σ̂²_ε   approx. ~  χ²(T_2).

When η_1 > χ²_c(T_2) we reject the null hypothesis that our base sample model is correctly specified as a description of the behavior in the forecast period. The F-form becomes:

  η_2 = [ Σ_{t=T_1+1}^T ẽ²_t / T_2 ] / σ̂²_ε = η_1 / T_2   approx. ~  F(T_2, T_1 - k).
A similar test procedure, called the forecast Chow test, compares the difference between RSS_T, calculated for the full period, and RSS_{T_1}, calculated for the baseline period, with RSS_{T_1}, i.e. the residual sum of squares for the baseline period. The null hypothesis is H_0: β(T_1) = β(T_2) = β, σ²_ε(T_1) = σ²_ε(T_2) = σ²_ε, and E(ε_t ε_{t-h}) = 0, under which the Chow test F statistic is

  η_3 = [ (RSS_T - RSS_{T_1}) / T_2 ] / [ RSS_{T_1} / (T_1 - k) ]   approx. ~  F(T_2, T_1 - k).

The forecast Chow F-test is essentially a way of checking whether the baseline model would be appropriate over the entire sample, so it is more of a test for a structural break at or near period t = T_1.
6.3 Recursive Methods
Recursive estimation is used as a general diagnostic tool to detect signs of parameter nonconstancy over the sample period. It is primarily meant to be used as a graphical device to detect any nonconstancy problems and, after the problems have been solved, as a way to convince the reader that the empirical results can be trusted. Similarly to the 1-step forecast analysis, the sample needs to be split into a base period, t = 1, ..., T_1, and a recursive test period, t = T_1 + 1, ..., T. However, unlike the one-step forecast analysis, the model parameters are updated at each new observation in time.
6.3.1 Recursive parameter estimates
For a given base period the OLS parameter estimates and their variance can be calculated as

  β̂_{T_1} = (X'_{T_1} X_{T_1})^{-1} X'_{T_1} y_{T_1},
  Cov(β̂_{T_1}) = σ̂²_{ε,T_1} (X'_{T_1} X_{T_1})^{-1},

where the subscript indicates the length of the estimation period. Increasing the sample recursively by one observation at a time and re-estimating the parameters, {β̂_{T_1+j}, Cov(β̂_{T_1+j})}, j = 1, ..., T_2, yields a series of recursively calculated parameter estimates which can be plotted against time. It is customary to plot β̂_{i,T_1+j} ± 1.96·sqrt(Var(β̂_{i,T_1+j})), j = 1, ..., T_2, and to check whether any β̂_{i,T_1+j} falls outside the confidence bands given by the min_j { β̂_{i,T_1+j} + 1.96·sqrt(Var(β̂_{i,T_1+j})) } and the max_j { β̂_{i,T_1+j} - 1.96·sqrt(Var(β̂_{i,T_1+j})) }. When this happens it may be an indication of parameter nonconstancy around that date. Large variation in the recursive estimates is indicative of parameter instability and model mis-specification.
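A short Python sketch of the recursive estimation itself (not the PcGive implementation; y and X are assumed to be plain arrays) produces the series of estimates and standard errors that would be plotted:

import numpy as np

def recursive_ols(y, X, T1):
    """Recursive OLS: re-estimate beta on samples of length T1, T1+1, ..., T."""
    T, k = X.shape
    betas, ses = [], []
    for t in range(T1, T + 1):
        Xt, yt = X[:t], y[:t]
        b = np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)
        e = yt - Xt @ b
        s2 = e @ e / (t - k)
        V = s2 * np.linalg.inv(Xt.T @ Xt)
        betas.append(b)
        ses.append(np.sqrt(np.diag(V)))
    return np.array(betas), np.array(ses)   # plot betas[:, i] +/- 1.96 * ses[:, i] against time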
6.3.2 Recursive residuals and innovations
The recursive residuals are defined by

  ε̂_t = y_t - x'_t β̂_t,   t = T_1 + 1, ..., T,

and the standardized recursive residuals by

  ε̂^st_t = ε̂_t / σ̂_{ε,t},   t = T_1 + 1, ..., T.

When a standardized residual ε̂^st_t falls outside the ±1.96 confidence bands it may indicate that there has been an outlier observation at time t, and one should try to find out why the model was not able to explain this observation.
The recursive innovations are defined by

  v_t = y_t - x'_t β̂_{t-1},   t = T_1 + 1, ..., T,

which is similar to the 1-step forecast error discussed above, except that here the estimate of β is updated each period. Under the null of a correctly specified model with constant parameters, the innovations should be

  v_t ~ N(0, σ²_{v,t}),

where the innovation variance is given by

  σ²_{v,t} = σ²_ε ( 1 + x'_t (X'_{t-1} X_{t-1})^{-1} x_t ).

The estimated standardized innovations are defined as

  v^st_t = ( y_t - x'_t β̂_{t-1} ) / ( σ̂_ε sqrt( 1 + x'_t (X'_{t-1} X_{t-1})^{-1} x_t ) ) = v_t / σ̂_{v,t},   t = T_1 + 1, ..., T.    (6.2)

Under the null hypothesis of a correctly specified model, we expect these to be normally distributed with mean zero and variance one, i.e. v^st_t ~ Nid(0, 1).
The recursive residual sum of squares, or cumulative sum of squares (CUSUMSQ), is computed similarly. Starting from the base period residual sum of squares, RSS_{T_1}, the recursive RSS_{T_1+i} are defined as

  RSS_{T_1+i} = RSS_{T_1} + Σ_{t=T_1+1}^{T_1+i} ε̂²_t,   i = 1, ..., T_2,

where ε̂_t is the recursive residual, defined as ε̂_t = y_t - x'_t β̂_t for t = T_1 + 1, ..., T.
Under the null hypothesis of a correctly specified model for the whole sample, the cumulative sum of squared residuals should not deviate significantly from the straight 45-degree line. Deviations from the 45-degree line are an indication of changes in either β or σ²_ε, or both.
6.4 Recursive Chow Tests
Given the recursive residual sums of squares, a series of recursive Chow-type F-tests can be constructed. These tests are scaled by PcGive by the critical values to correct for the changing degrees of freedom as the recursion proceeds from T_1 + 1 to T. Thus, the critical values appear as a straight line at unity in all of the three Chow test graphs.
The first of these, the 1-step ahead Chow test, is calculated as

  Chow1_t = [ (RSS_t - RSS_{t-1}) / 1 ] / [ RSS_{t-1} / (t - 1 - k) ]  ~  F(1, t - 1 - k),   t = T_1 + 1, ..., T,

and tests recursively whether the observation at time t is likely to have been generated by the same model as was estimated on the sample up to t - 1. For example, if it happens that y_t is extraordinarily large even though the explanatory variables have not changed much or, alternatively, the explanatory variables have changed a lot but y_t is almost unchanged, then the Chow test is likely to be above the critical unit line, indicating that the model was not able to forecast y_t given the information at time t - 1. This may either be indicative of an intervention (or some kind of extraordinary event) at time t, or of permanent changes in the model parameters. In the first case the subsequent Chow test values will probably return back below the critical unit line; in the second case they will probably continue to stay above the line.
The second Chow test, sometimes denoted the break-point Chow test, is based on the following idea. Assume that there is one regime shift somewhere between T_1 and T and that the model parameters are significantly different between the two regimes. By recursively testing for a regime shift between the first t observations and the remaining T - t observations, one would expect the Chow test value to be at its highest exactly when the regime shift takes place. Therefore, this test should have good properties for detecting parameter changes when there is exactly one regime shift in the sample period and the shift takes place between T_1 and T. The break-point Chow test is calculated as

  Chow2_t = [ (RSS_T - RSS_t) / (T - t) ] / [ RSS_t / (t - k) ]  ~  F(T - t, t - k),   t = T_1 + 1, ..., T.

It is easy to see that the forecast period (T - t) shrinks as the recursion proceeds, until finally the last value of the break-point test is the same as the 1-step ahead Chow test.
Note, however, that if the break-point Chow test has shown a significant value at T_1 + m, then the remaining test values for a regime shift between (T_1 + m + j) and T, j = 1, 2, ..., are no longer valid, as we have already found that the period after T_1 + m contains information about two regimes.
The third Chow test, sometimes denoted the forecast Chow test, is based on the following idea. Say we are quite convinced that in the period t = 1, ..., T_1 there has been no regime shift, i.e. we believe the model parameters to be constant. However, we are less sure about parameter constancy between T_1 + 1 and T. Also, if there has been a shift, we expect it to have occurred shortly after T_1, which we have chosen to be the baseline sample. We now test the occurrence of a parameter shift by testing whether our baseline model can predict the next period ahead value, y_{T_1+1}, and, if the test does not reject, whether it can predict the next two periods ahead, and so on, until the test rejects for the first time at T_1 + m. After the test has rejected once, the remaining test values are no longer valid, based on the same arguments as for the second Chow test. Thus the third Chow test compares whether the baseline model can forecast at increasingly long forecast horizons, from 1 to T_2. Note, however, that if the shift in the parameters takes place far away from the end of the baseline period, this test procedure may not find the shift, as the effect of the first poor forecast is likely to drown among the many good forecasts. Therefore, for this procedure to have reasonably good power we should choose the baseline period as closely as possible to the date of a hypothetical regime shift. The forecast Chow test is calculated as

  Chow3_t = [ (RSS_t - RSS_{T_1}) / (t - T_1) ] / [ RSS_{T_1} / (T_1 - k) ]  ~  F(t - T_1, T_1 - k),   t = T_1 + 1, ..., T.

Note that the first value of this test is equivalent to the first value of the 1-step ahead Chow test.
Chapter 7
Maximum Likelihood Estimation and Testing
7.1 ML estimation
Given T sample observations on y_t and x'_t = [x_{1,t}, ..., x_{k,t}] and a hypothetical model y_t = β'x_t + ε_t, t = 1, ..., T, where ε_t ~ Niid(0, σ²_ε) and x_t is considered fixed or given, we have that E(y_t | x_t) = β'x_t and Var(y_t | x_t) = σ²_ε. Under these assumptions the density function of (y_t | x_t) is

  f(y_t | x_t; β, σ²_ε) = (1 / sqrt(2π σ²_ε)) exp( -(y_t - β'x_t)² / (2σ²_ε) ).
The density function gives the probability of observing different values of y_t - E(y_t | x_t) = ε_t. If the model and the normality assumption are correct, then it would be very unlikely to observe errors with |ε_t| > 3.3·σ_ε.
The joint density function is the product of all the T individual densities (given that the assumption of independent errors is correct):

  f(y_1, ..., y_T | x_1, ..., x_T; β, σ²_ε) = ( 1 / sqrt(2π σ²_ε) )^T · Π_{t=1}^T exp( -(y_t - β'x_t)² / (2σ²_ε) ).     (7.1)
The likelihood function is defined by

  L(β, σ²_ε | y_t, x_t) = f(y_1, ..., y_T | x_t; β, σ²_ε),

i.e. for known values of the parameters β and σ²_ε the likelihood function and the density function are identical. Consequently, the joint likelihood function is

  L(β, σ²_ε | y, X) = f(y_1, ..., y_T | x_t; β, σ²_ε).
Since we generally do not know the true parameter values β and σ²_ε, we use the likelihood function to find the estimates which maximise the joint likelihood function, given the observed data and the hypothetical model. The ML estimate θ̂_ML = (β̂_ML, σ̂²_{ε,ML}) is defined by the point where

  D log L ≡ ∂ log L / ∂θ = 0.

The function D log L is called the score function and measures how much the total likelihood would increase when changing the value of the parameter θ. At the point where the score function is zero it is no longer possible to increase the likelihood, and this value of θ corresponds to the maximum of the likelihood. Thus θ̂_ML is the parameter value which has the highest likelihood of describing the observed data.
Example: the linear regression model with Niid errors. The log likelihood function is

  log L(β, σ²_ε | y, X) = -(T/2) log(2π σ²_ε) - (1/(2σ²_ε)) Σ_{t=1}^T (y_t - β'x_t)²,

where Σ_{t=1}^T (y_t - β'x_t)² = y'y + β'(X'X)β - 2β'X'y. The first derivative w.r.t. β is

  ∂ log L / ∂β = -(1/(2σ²_ε)) (2X'Xβ - 2X'y) = 0,                       (7.2)

which gives the ML estimator β̂_ML = (X'X)^{-1}X'y, i.e. the ML estimator in the linear regression model is identical to the OLS estimator.
The derivative w.r.t. σ²_ε is

  ∂ log L / ∂σ²_ε = -(T/2)(1/σ²_ε) + (1/(2σ⁴_ε)) Σ_{t=1}^T (y_t - β'x_t)²
                  = (1/(2σ²_ε)) ( -T + (1/σ²_ε) Σ_{t=1}^T (y_t - β'x_t)² ) = 0,          (7.3)

which gives the ML estimator σ̂²_{ε,ML} = (1/T) Σ_{t=1}^T (y_t - β'x_t)², i.e. the ML estimator is similar to the OLS estimator except that we divide by T instead of T - k. Thus, the ML estimator of σ²_ε is consistent but not unbiased.
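A small numerical illustration in Python (with simulated data, so the numbers are illustrative) shows that the ML coefficient estimator coincides with OLS while the two variance estimators differ only in the divisor:

import numpy as np

rng = np.random.default_rng(7)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.standard_normal((T, k - 1))])
beta_true = np.array([1.0, 0.5, -0.3])
y = X @ beta_true + rng.standard_normal(T)

beta_ml = np.linalg.solve(X.T @ X, X.T @ y)   # ML estimator of beta = OLS estimator
resid = y - X @ beta_ml
sigma2_ml = resid @ resid / T                 # ML estimator, divides by T (biased downwards)
sigma2_ols = resid @ resid / (T - k)          # OLS estimator, divides by T - k (unbiased)
print(beta_ml, sigma2_ml, sigma2_ols)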


To be able to test hypotheses on θ we need to find the variance of θ̂ and its distribution. Based on the central limit theorem and under weak regularity conditions (which generally means independent errors and stationary variables) the ML estimators are:
1. consistent,
2. asymptotically efficient,
3. asymptotically normally distributed,

  √T (θ̂ - θ) ~ N(0, TV),

where V is the asymptotic variance of θ̂, defined as the inverse information matrix

  V = I^{-1}(θ),   where   I(θ) ≡ -E( ∂² log L(θ) / ∂θ ∂θ' ).
The information matrix is important because its inverse, I^{-1}(θ) = V, gives the minimum variance bound of an estimator θ̂. As the sample size increases, the information in the data increases, implying that the information matrix becomes increasingly large and the variance Var(θ̂) = V = I^{-1}(θ) increasingly small. Thus, the parameters become more and more precisely estimated. When T → ∞, the estimation error (θ̂ - θ) → 0, and V → 0. Therefore, when deriving asymptotic theory results one has to multiply the estimation error (θ̂ - θ) by √T and the variance V by T. In general TV becomes a constant matrix as T → ∞.
The information matrix is generally a function of the unknown parameters θ. Therefore, for practical purposes we need an approximation to the information matrix, I(θ̂), evaluated at a consistent estimator θ̂. The notation I^{-1}(θ̂) = V̂ will be used here to denote a consistent estimate of the variance of the ML estimate θ̂.
The properties of the ML estimators are usually based on asymptotic results, for which the information matrix plays a central role. We will now illustrate how the information matrix can be calculated in the simple linear regression model with Niid errors.
Based on (7.2) we find

  ∂² log L / ∂β ∂β' = -(X'X)/σ²_ε,

where θ' = [β', σ²_ε], so that the information is

  I_11(θ) = (1/σ²_ε) E(X'X) = (1/σ²_ε) T·M,

where I_11(θ) is the first block of the information matrix. When x_t is stationary, E(X'X) ≈ T·M, where the elements of M are defined by

  m_ij(T) = (1/T) Σ_{t=1}^T x_{i,t} x_{j,t}  →  m_ij   as T → ∞.

Thus, the variance of β̂, V_11 = σ²_ε (X'X)^{-1}, where V_11 is the first block of the variance matrix V, is identical to the OLS variance. The minimum variance bound for β̂ is σ²_ε (T·M)^{-1}. A consistent estimator of Var(β̂) is

  Var(β̂) = V̂_11(θ̂) = σ̂²_ε (X'X)^{-1}.
Example: based on (7.3) we find

  ∂² log L / ∂(σ²_ε)² = T/(2σ⁴_ε) - (1/σ⁶_ε) Σ_{t=1}^T (y_t - β'x_t)²
                      = T/(2σ⁴_ε) - (2T σ²_ε)/(2σ⁶_ε)
                      = -T/(2σ⁴_ε).

Thus, the information w.r.t. σ²_ε is

  I_22(θ) = T/(2σ⁴_ε),

and the minimum variance bound for σ̂²_ε is V_22 = 2σ⁴_ε/T. A consistent estimator of the variance of σ̂²_ε is

  Var(σ̂²_ε) = V̂_22 = 2σ̂⁴_ε/T.
The covariance between β̂ and σ̂²_ε can be found from

  ∂² log L / ∂β ∂σ²_ε = (1/(2σ⁴_ε)) (2X'Xβ - 2X'y).

By inserting β = (X'X)^{-1}X'y it is easy to see that ∂² log L / ∂β ∂σ²_ε = 0, i.e. the estimates of β and of σ²_ε are independent. Thus, the information matrix for the linear regression model is

  I(θ) = [ (1/σ²_ε)(X'X)    0
           0                T/(2σ⁴_ε) ]

and the asymptotic variance-covariance matrix is

  V = [ σ²_ε (X'X)^{-1}   0
        0                 2σ⁴_ε/T ].
7.2 Deriving the LR, Wald, and LM tests
It is useful from the beginning to think about the test procedures as testing a big model (the unrestricted model) against a small model (the restricted model). The three procedures differ from each other in the following way:
1. Likelihood Ratio test: both models are estimated.
2. Wald test: only the unrestricted model is estimated.
3. LM test: only the restricted model is estimated.
To distinguish between the two models we call them the model under H_1 (mostly the unrestricted model) and the model under H_0 (mostly the restricted model), and their parameters θ_1 and θ_0, respectively. We use the notation θ̂ for parameter estimates and θ for the true parameter values.
The LR procedure is used when both the restricted model H_0 and the unrestricted model H_1 are easy to estimate.
The Wald procedure is used when the unrestricted model H_1 is easy to estimate, but the restricted model H_0 is not, e.g. because the restrictions are nonlinear in the parameters. Only θ̂_1 is estimated in this case.
The LM procedure is used when the restricted model H_0 is easy to estimate, but the unrestricted H_1 is not. For example, when the restricted model is linear in the parameters but the unrestricted model is not, we only need the estimate θ̂_0 for the LM test.
Figure 1 illustrates:
7.3 The Likelihood Ratio procedure:
The likelihood ratio statistic is defined by LR = -2 log λ, where

  log λ = log( L(θ̂_0) / L(θ̂_1) ) = log L(θ̂_0) - log L(θ̂_1).          (7.4)
The log likelihood function is developed around θ̂_1 using a Taylor series expansion:

  log L(θ̂_0) ≈ log L(θ̂_1) + D log L(θ̂_1)·(θ̂_0 - θ̂_1) + (1/2)(θ̂_0 - θ̂_1)' D² log L(θ̂_1) (θ̂_0 - θ̂_1),

where D log L(θ̂_1) = 0 because θ̂_1 is the unrestricted ML estimate. Thus:

  log L(θ̂_0) - log L(θ̂_1) ≈ (1/2)(θ̂_0 - θ̂_1)' D² log L(θ̂_1) (θ̂_0 - θ̂_1).     (7.5)
Under H_0 the asymptotic distribution of the ML estimator is

  √T (θ̂_0 - θ) ~ N(0, TV),

where θ is the true parameter value and V is the asymptotic variance, so that (θ̂_0 - θ) ~ N(0, V).
The distribution of the likelihood ratio test statistic is derived under the assumption that H_0 is true, i.e. both θ̂_0 and θ̂_1 are estimates of the true parameter value θ. Therefore, we can approximate θ̂_1 with θ and find the asymptotic distribution of the likelihood ratio statistic

  η_LR = -2 log λ ≈ (θ̂_0 - θ̂_1)' I(θ̂_1) (θ̂_0 - θ̂_1)  ~as  χ²(m),            (7.6)

where I(θ̂_1) is evaluated at θ̂_1, which (because it is estimated under the unrestricted model H_1) is a consistent estimator of θ.
It is useful to compare the expression in (7.6) with the result for the likelihood ratio test as it was derived in Chapter 3:

  η_LR = (β̂_R - β̂)' (X'X) (β̂_R - β̂) / σ²_ε = (e'_R e_R - e'e) / σ²_ε,

where e'e/σ²_ε is the standardized RSS, which corresponds to the exponent in the likelihood function (7.1). It is easy to see that

  -2 { log L(θ̂_0) - log L(θ̂_1) } = (e'_R e_R - e'e) / σ²_ε,

i.e. the likelihood ratio test statistic for the linear regression model is the difference between the restricted and the unrestricted residual sum of squares divided by the error variance.
7.4 The Wald test procedure
In the Wald test procedure only the unrestricted model H_1(θ_1) needs to be estimated. It is useful to distinguish between the case of linear and of nonlinear restrictions:

  H_1:            θ_1 unrestricted
  alt. 1   H_0:   Rθ_1 = r          (m linear restrictions on θ_1)
  alt. 2   H_0:   r_j(θ_1) = 0      (m nonlinear restrictions on θ_1, j = 1, ..., m)

where θ_1 is k×1 and R is m×k (m ≤ k).

7.4.1 Tests of linear restrictions Rθ_1 = r
We know that

  √T (θ̂_1 - θ) ~ N(0, TV)

and, hence, under H_0, that

  √T R(θ̂_1 - θ) = √T (Rθ̂_1 - r) ~ N(0, R·TV·R').

Since sums of squared standardised normal variables are χ²-distributed, we calculate the quadratic form

  (Rθ̂_1 - r)' [ RVR' ]^{-1} (Rθ̂_1 - r)  ~as  χ²(m).

The ML estimate θ̂_1 is a consistent estimator of θ, so we can replace V with its consistent estimate V̂, yielding the Wald statistic

  W = (Rθ̂_1 - r)' [ RV̂R' ]^{-1} (Rθ̂_1 - r)  ~as  χ²(m).

It is easy to see that the above result is a generalization of the previously discussed result in Chapter 3 on testing linear restrictions with the Wald test:

  W = (Rβ̂ - r)' [ σ̂²_ε R(X'X)^{-1}R' ]^{-1} (Rβ̂ - r)  ~  χ²(q).
7.4.2 Tests of nonlinear restrictions r_j(θ_1) = 0, j = 1, ..., m
The idea behind the test is to approximate the nonlinear restriction r_j(θ_1) = 0 with a linear restriction R̃θ - r = 0 using a Taylor expansion. (The idea is illustrated in the figure below.)
The Wald test is derived under the assumption that we have estimated the unrestricted model H_1, obtained the estimator θ̂_1, and calculated r_j(θ̂_1). In order to linearize the nonlinear function r_j(·) we use a first order Taylor expansion around the true parameter value θ_1:

  r_j(θ̂_1) ≈ r_j(θ_1) + R̃·(θ̂_1 - θ_1),   j = 1, ..., m,

where r_j(θ̂_1) is the restriction evaluated at the parameter estimates of the unrestricted model and R̃ = ∂r_j(θ̂_1)/∂θ'_1. Under H_0, r_j(θ_1) = 0, so that

  r_j(θ̂_1) ≈ R̃·(θ̂_1 - θ_1),   j = 1, ..., m,

i.e. the nonlinear restriction has been replaced by a linear approximation.
The asymptotic distribution is

  √T R̃(θ̂_1 - θ_1) ~ N(0, R̃·TV·R̃'),

which implies

  [ R̃(θ̂_1 - θ_1) ]' [ R̃VR̃' ]^{-1} [ R̃(θ̂_1 - θ_1) ]  ~as  χ²(m),

where R̃(θ̂_1 - θ_1) ≈ r_j(θ̂_1).
The unrestricted ML estimator θ̂_1 is a consistent estimator of θ_1, and we can replace the inverse information matrix with its estimated version. The Wald statistic becomes

  η_W = r(θ̂_1)' [ R̃V̂R̃' ]^{-1} r(θ̂_1)  approx. ~  χ²(m).                (7.7)

Notice that the distribution of the Wald test is only approximately χ², since R̃(θ̂_1 - θ_1) is only approximately equal to r_j(θ̂_1), and this approximation does not disappear with increasing sample size.
Example: consider the dynamic linear regression model

  y_t = a_0 + a_1 y_{t-1} + b_0 x_t + b_1 x_{t-1} + ε_t                 (7.8)

and the static long-run solution

  y* = β_0 + β_1 x,                                                     (7.9)

where β_0 = a_0/(1 - a_1) and β_1 = (b_0 + b_1)/(1 - a_1).
Assume that we have estimated (7.8) and (7.9) and would now like to test the hypothesis

  H_0: r_1(θ_1) = (b_0 + b_1)/(1 - a_1) = β_1 = 0.

Since this is a nonlinear hypothesis on the parameters θ'_1 = [a_0, a_1, b_0, b_1], we have to find a linear approximation:

  r_1(θ_1) ≈ R̃_1 θ_1,   where   R̃_1 = ∂r_1(θ_1)/∂θ'_1 = ∂[ (b_0 + b_1)/(1 - a_1) ] / ∂θ'_1.
The vector R̃₁′ becomes:

R̃₁′ = [ ∂β₁/∂a₀, ∂β₁/∂a₁, ∂β₁/∂b₀, ∂β₁/∂b₁ ]
     = [ 0, (b₀ + b₁)/(1 − a₁)², 1/(1 − a₁), 1/(1 − a₁) ].
We can now find the Wald test statistic (7.7):

τ_W = β̂₁(R̃₁V̂R̃₁′)⁻¹β̂₁ ~ χ²(1),

where r₁(θ̂₁) = β̂₁, V̂ = σ̂²_ε(X′X)⁻¹ and X = [1, y_{−1}, x, x_{−1}].
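A small numpy sketch of this delta-method calculation is given below. The simulated DGP, the column ordering [1, y_{−1}, x, x_{−1}], and the analytical gradient follow the example above; under the chosen parameter values the true long-run coefficient is 0.6, so the statistic should be large.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 300
x = rng.normal(size=T + 1)
y = np.zeros(T + 1)
for t in range(1, T + 1):                      # DGP: a dynamic regression
    y[t] = 0.2 + 0.5 * y[t - 1] + 0.4 * x[t] - 0.1 * x[t - 1] + rng.normal()

# regressors ordered as [1, y_{t-1}, x_t, x_{t-1}]
X = np.column_stack([np.ones(T), y[:-1], x[1:], x[:-1]])
yt = y[1:]
theta, *_ = np.linalg.lstsq(X, yt, rcond=None)
a0, a1, b0, b1 = theta
e = yt - X @ theta
sigma2 = e @ e / (T - 4)
V = sigma2 * np.linalg.inv(X.T @ X)            # V-hat

beta1 = (b0 + b1) / (1 - a1)                   # long-run coefficient
R1 = np.array([0.0,                            # d beta1 / d a0
               (b0 + b1) / (1 - a1) ** 2,      # d beta1 / d a1
               1.0 / (1 - a1),                 # d beta1 / d b0
               1.0 / (1 - a1)])                # d beta1 / d b1
wald = beta1 ** 2 / (R1 @ V @ R1)              # ~ chi2(1) under beta1 = 0
print(beta1, wald)
```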
The static long-run solution becomes y* = β₀ + β₁x₁ + ... + β_k x_k when the dynamic model contains k explanatory variables. In this case we can find the covariance matrix of the long-run steady-state parameters

r(θ₁) = [ r₀(θ₁) = a₀/(1 − a₁) = β₀,
          r₁(θ₁) = (b₁₀ + b₁₁)/(1 − a₁) = β₁,
          ... ,
          r_k(θ₁) = (b_{k0} + b_{k1})/(1 − a₁) = β_k ]′

by first evaluating the restriction matrix

R̃ = [ R̃₀′, R̃₁′, ..., R̃_k′ ]′,   where R̃_i = ∂r_i(θ₁)/∂θ₁,

and then calculating the variance/covariance matrix

Var(β̂) = σ̂²_ε R̃(X′X)⁻¹R̃′.
7.5 The Lagrange Multiplier Test Procedure

Under this procedure we ask whether the log likelihood function, evaluated at the parameters estimated under the null hypothesis, θ̂₀, is close enough to its maximum. The closeness is investigated by checking whether D log L(θ̂₀) ≃ 0. The log likelihood function can be approximated by a Taylor expansion around the value θ₁:

log L(θ̂₀) ≃ log L(θ₁) + D log L(θ₁)(θ̂₀ − θ₁) + ½(θ̂₀ − θ₁)′D² log L(θ₁)(θ̂₀ − θ₁),

where the second term is zero because the log likelihood function attains its maximum at the parameter value θ₁ (i.e. D log L(θ₁) = 0). Differentiating log L with respect to θ̂₀ then yields

D log L(θ̂₀) ≃ (θ̂₀ − θ₁)′D² log L(θ₁).   (7.10)
The LM test statistic is derived under the assumption that H₀ is the true model, in which case θ₁ = θ₀ = θ. We have that

√T(θ̂₀ − θ) ~ N(0, TV),

where θ is the true parameter value. We also know that

E(−D² log L(θ)) = I(θ).

Therefore, under H₀,

D log L(θ̂₀) ≃ −(θ̂₀ − θ)′I(θ).

Hence, the asymptotic variance is

Avar(D log L(θ̂₀)) = I(θ)Avar(θ̂₀)I(θ) = I(θ)I⁻¹(θ)I(θ) = I(θ),

using Avar(θ̂₀) = V = I⁻¹(θ), i.e.

D log L(θ̂₀) ~as N(0, I(θ)).   (7.11)
We can now use (7.11) to construct the LM test statistic as the sum of squared standardized normal variables, i.e.

LM = D log L(θ̂₀) V D log L(θ̂₀)′,   (7.12)

which is asymptotically distributed as χ²(m), where m is the dimension of the vector D log L(θ̂₀). As usual, we substitute the information matrix I(θ) with I(θ̂₀), which under the null hypothesis is based on the consistent estimate θ̂₀. Thus, the LM statistic becomes

τ_LM = D log L(θ̂₀) V̂ D log L(θ̂₀)′ ~as χ²(m),   where V̂ = I⁻¹(θ̂₀).   (7.13)
7.5.1 Illustration: LM test of residual autocorrelation

We consider the simple linear model with first order residual autocorrelation as the alternative hypothesis:

y_t = β′x_t + u_t,   u_t = ρu_{t−1} + ε_t,   ε_t ~ Niid(0, σ²_ε)

H₀: ρ = 0
H₁: ρ ≠ 0

Under the null hypothesis there is no residual autocorrelation, so u_t = ε_t ~ Niid(0, σ²_ε). From this it follows that

u_t = y_t − β′x_t ~ Niid(0, σ²_ε).
Accordingly, we can write the log likelihood function as

log L(β, σ²_ε) = −(T/2) log 2π − (T/2) log σ²_ε − (1/(2σ²_ε)) Σ (y_t − β′x_t)².
Under the H₁ hypothesis there is first order residual autocorrelation, so ε_t = u_t − ρu_{t−1} with ε_t ~ Niid(0, σ²_ε). It follows that

ε_t = u_t − ρu_{t−1} = (y_t − β′x_t) − ρ(y_{t−1} − β′x_{t−1}) ~ Niid(0, σ²_ε).

Accordingly, the log likelihood function can be written as

log L(β, ρ, σ²_ε) = −(T/2) log 2π − (T/2) log σ²_ε − (1/(2σ²_ε)) Σ [ (y_t − β′x_t) − ρ(y_{t−1} − β′x_{t−1}) ]².
Since the log likelihood function under H₁ is nonlinear in the parameters (the product ρβ′ enters the sum of squared errors), the LM procedure is preferable for this kind of model misspecification testing. To find the LM statistic we need the derivatives of the log likelihood function of the unrestricted model (H₁) evaluated under the H₀ hypothesis. Because of the block diagonality of the information matrix, we only need the derivatives with respect to (β, ρ). It is useful to partition the parameters into two sets, θ₁ = (β, ρ) and θ₂ = σ²_ε.
To simplify the notation we use

D log L(θ) = [ D₁ log L(θ), D₂ log L(θ) ],

where D₁ and D₂ denote the derivatives with respect to θ₁ = (β, ρ) and θ₂ = σ²_ε, respectively. The LM statistic becomes

τ_LM = [D₁ log L(θ), D₂ log L(θ)]|_{H₀} ( [ I₁₁(θ)  I₁₂(θ) ; I₂₁(θ)  I₂₂(θ) ]|_{H₀} )⁻¹ [D₁ log L(θ), D₂ log L(θ)]′|_{H₀}
     = [D₁ log L(θ)]|_{H₀} ( [ I₁₁(θ) − I₁₂(θ)I₂₂(θ)⁻¹I₂₁(θ) ]|_{H₀} )⁻¹ [D₁ log L(θ)]′|_{H₀}
     = [ D₁ log L(θ) I₁₁(θ)⁻¹ D₁ log L(θ)′ ]|_{H₀},
where D₂ log L(θ)|_{H₀} = 0 because the score with respect to σ²_ε vanishes when it is evaluated at the ML estimate of the error variance, and I₁₂(θ) = I₂₁(θ)′ = 0 because the information matrix is block diagonal between σ²_ε and (β, ρ). The relevant derivatives are as follows:
∂log L/∂β = (1/σ²_ε) Σ (x_t − ρx_{t−1}) [ (y_t − β′x_t) − ρ(y_{t−1} − β′x_{t−1}) ] = (1/σ²_ε) Σ (x_t − ρx_{t−1}) ε_t

∂log L/∂ρ = (1/σ²_ε) Σ (y_{t−1} − β′x_{t−1}) [ (y_t − β′x_t) − ρ(y_{t−1} − β′x_{t−1}) ] = (1/σ²_ε) Σ u_{t−1} ε_t

∂²log L/∂β∂β′ = −(1/σ²_ε) Σ (x_t − ρx_{t−1})(x_t − ρx_{t−1})′

∂²log L/∂ρ² = −(1/σ²_ε) Σ (y_{t−1} − β′x_{t−1})² = −(1/σ²_ε) Σ u²_{t−1}

∂²log L/∂β∂ρ = −(1/σ²_ε) Σ [ x_{t−1} ε_t + (x_t − ρx_{t−1}) u_{t−1} ]
The information matrix for the relevant parameters evaluated under the H₀ hypothesis becomes

I₁₁(θ)|_{H₀} = ( −E[ ∂²log L/∂θ₁∂θ₁′ ] )|_{H₀}
            = (1/σ̂²_u) [ Σ x_t x_t′   Σ x_t û_{t−1} ; Σ û_{t−1} x_t′   Σ û²_{t−1} ]
            = [ (1/σ̂²_u)(X′X)   0 ; 0   T ].
Note that under the null ρ = 0, E(u_t) = E(ε_t) = 0 and E( Σ_{t=1}^T û²_{t−1}/σ̂²_u ) = T, since E( û²_{t−1}/σ̂²_u ) = 1.
Similarly, the first derivatives with respect to θ₁ are evaluated under the null:

(D₁ log L(θ))|_{H₀} = [ ∂log L/∂β ; ∂log L/∂ρ ]|_{H₀} = [ (1/σ̂²_u) Σ x_t û_t ; (1/σ̂²_u) Σ û_{t−1} û_t ].
Accordingly, the LM statistic for first order residual autocorrelation can be evaluated from the formula given above:

LM = (D₁ log L(θ̂₀))|_{H₀} V̂ (D₁ log L(θ̂₀))′|_{H₀},   where V̂ = I₁₁⁻¹(θ̂₀).   (7.14)
By inserting the above expressions into (7.14) we find

τ_LM = (Σ z_t û_t)′ (Σ z_t z_t′)⁻¹ (Σ z_t û_t) / σ̂²_u = TR² ~ χ²(1),

where z_t = (x_t′, û_{t−1})′ and R² is the coefficient of multiple correlation from the auxiliary regression

û_t = b̂₁′x_t + b̂₂û_{t−1} + e_t.
7.5.2 The L-M test of residual correlation: a practical procedure

Assume that we have estimated the standard linear model with k explanatory variables. The H₁ hypothesis is m'th order autocorrelation and the H₀ hypothesis is independent errors:

y_t = β′x_t + u_t,   u_t = ρ₁u_{t−1} + ... + ρ_m u_{t−m} + ε_t,   ε_t ~ Niid(0, σ²_ε)

H₀: ρ₁ = ... = ρ_m = 0
H₁: ρ_i ≠ 0 for at least one i

We first estimate the model under H₀ (that is, with ρ₁ = ... = ρ_m = 0) using OLS and obtain

û_t = y_t − β̂′x_t.

The following steps define the LM test procedure:
1. Estimate the auxiliary regression:

   û_t = b₁′x_t + γ₁û_{t−1} + ... + γ_m û_{t−m} + e_t.

2. Calculate the R² statistic from the auxiliary regression.

3. Calculate the χ² LM test statistic:

   τ_LM = TR² ~ χ²(m).

4. Or, alternatively, the F-form:

   τ_LM = [(T − n)/m] · R²/(1 − R²) ~ F(m, T − n),

   where n = m + k. The F-form statistic can be rewritten to yield

   τ_LM = [(RSS_{H₀} − RSS_{H₁})/m] / [RSS_{H₁}/(T − n)] ~ F(m, T − n).
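The sketch below implements these steps with numpy. It is a minimal illustration on simulated data: the regressor matrix, the lag order m = 2, and the convention of setting pre-sample residuals to zero in the auxiliary regression are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
T, k, m = 200, 2, 2
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=T)

def ols_resid(y, X):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ b

# Step 0: residuals from the model estimated under H0
u = ols_resid(y, X)

# Step 1: auxiliary regression of u_t on x_t and m lagged residuals
#         (missing pre-sample residuals are set to zero)
lags = np.column_stack([np.r_[np.zeros(i), u[:-i]] for i in range(1, m + 1)])
Z = np.column_stack([X, lags])
e = ols_resid(u, Z)

# Steps 2-3: R^2 of the auxiliary regression and the TR^2 statistic
r2 = 1.0 - (e @ e) / (u @ u)           # u has mean zero, so TSS = u'u
lm = T * r2                            # ~ chi2(m) under H0

# Step 4: F-form
n = m + k
lm_f = (T - n) / m * r2 / (1.0 - r2)   # ~ F(m, T - n) under H0
print(lm, lm_f)
```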
7.5.3 The LM test of residual heteroscedasticity

We consider here the linear model with multiplicative heteroscedasticity in the error term:

y_t = β′x_t + u_t,   u_t ~ N(0, σ²_t)

H₀: σ²_t = σ²
H₁: σ²_t = exp(w_t′α)

Thus the u_t's are independently, but not identically, distributed. A special case of the above model is found by setting w_t′ = (1, log x_t) and α′ = (log σ², γ). If γ = 0, then the model has homoscedastic errors, σ²_t = σ².
Since the restricted model under the null is easy to estimate, and the unrestricted model under the alternative is not, the LM procedure is a natural choice. The LM statistic involves the derivatives of the log likelihood function specified under H₁ and evaluated under H₀. The log likelihood function under H₁ is

log L(β, α) = −(T/2) log 2π − ½ Σ w_t′α − ½ Σ (y_t − β′x_t)² e^{−w_t′α}.

As in the previous example, the information matrix is block diagonal with respect to β and α, so the LM statistic is only a function of the derivatives of log L(θ̂₀) with respect to α. The relevant derivatives are
∂log L/∂α = −½ Σ w_t + ½ Σ w_t (y_t − β′x_t)² e^{−w_t′α} = ½ Σ w_t ( u²_t/σ²_t − 1 )

∂²log L/∂α∂α′ = −½ Σ w_t w_t′ (y_t − β′x_t)² e^{−w_t′α} = −½ Σ w_t w_t′ ( u²_t/σ²_t ).
To evaluate the LM statistic we need the first and the second derivative evaluated under the null:

(∂log L/∂α)|_{H₀} = ½ Σ w_t ( û²_t/σ̂²_u − 1 ) = ½ Σ w_t v̂_t

( −E[ ∂²log L/∂α∂α′ ] )|_{H₀} = ( ½ Σ w_t w_t′ (u²_t/σ²_t) )|_{H₀} = ½ Σ w_t w_t′,
where we have used v_t = (u²_t/σ²_u − 1). From these expressions we are now able to calculate the LM statistic

LM = ½ ( Σ w_t v̂_t )′ ( Σ w_t w_t′ )⁻¹ ( Σ w_t v̂_t ).

We notice that the LM statistic in this case becomes one half of the explained sum of squares (ESS) in a regression of v̂_t = (û²_t/σ̂²_u − 1) on w_t.
7.5.4 The L-M test of multiplicative heteroscedasticity: a practical procedure

Assume that we have estimated the standard linear model with k explanatory variables. The H₁ hypothesis is multiplicative heteroscedasticity and the H₀ hypothesis is homoscedastic errors:

y_t = β′x_t + u_t,   u_t ~ N(0, σ²_t)

H₁: σ²_t = exp(w_t′α), where w_t is m × 1.
H₀: σ²_t = σ²
1. Estimate the model under the null hypothesis, y_t = β′x_t + u_t, and save û_t and σ̂²_u.

2. Calculate

   v_t = û²_t/σ̂²_u − 1.

3. Regress v_t on w_t (the variables determining the heteroscedasticity under H₁) and calculate the ESS and R².

4. Calculate the LM statistic either from the ESS,

   τ_LM = ESS/2 ~ χ²(m),

   or from R²: τ_LM = TR² ~ χ²(m).

5. Alternatively, the F-form of the LM test can be calculated as

   τ_LM = [(T − n)/m] · R²/(1 − R²) ~ F(m, T − n),

   where n = m + k, and where k is the number of variables in the vector x_t.
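A compact numpy version of these steps is sketched below. The choice w_t = (1, x_t), the simulated heteroscedastic DGP, and the ML-style residual variance are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 300
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
sigma_t = np.exp(0.5 * x)                 # heteroscedastic DGP
y = X @ np.array([1.0, 0.5]) + sigma_t * rng.normal(size=T)

# Step 1: OLS under H0, save residuals and the residual variance
b, *_ = np.linalg.lstsq(X, y, rcond=None)
u = y - X @ b
sigma2_u = u @ u / T

# Step 2: standardized squared residuals
v = u ** 2 / sigma2_u - 1.0

# Step 3: regress v on w_t = (1, x_t), compute ESS and R^2
W = np.column_stack([np.ones(T), x])
g, *_ = np.linalg.lstsq(W, v, rcond=None)
fitted = W @ g
ess = ((fitted - v.mean()) ** 2).sum()
r2 = 1.0 - ((v - fitted) ** 2).sum() / ((v - v.mean()) ** 2).sum()

# Step 4: LM statistics
lm_ess = ess / 2.0   # ESS/2 form; chi2 with df = number of slope variables in w_t (here 1)
lm_tr2 = T * r2      # TR^2 form
print(lm_ess, lm_tr2)
```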
7.5.5 The Breusch-Pagan/White test: a practical procedure

Again, we consider the standard linear model with k explanatory variables. The H₁ hypothesis is heteroscedastic errors and the H₀ hypothesis is homoscedastic errors:

y_t = β′x_t + u_t,   u_t ~ N(0, σ²_t)

H₁: σ²_t = f(x_t, x²_t)
H₀: σ²_t = σ²

We perform the following steps:
1. Estimate the model under H₀ (i.e. with σ²_t = σ²) using OLS and calculate û_t and σ̂²_u.

2. Construct

   v_t = û²_t/σ̂²_u.

3. Estimate the following auxiliary regression model with OLS,

   v_t = a₀ + (a₁₁x_{1t} + a₁₂x_{2t} + ... + a_{1m}x_{mt}) + (a₂₁x²_{1t} + a₂₂x²_{2t} + ... + a_{2m}x²_{mt}) + ε_t,

   and calculate the R².

4. Calculate

   τ_LM = TR² ~ χ²(k).

5. Or, alternatively, the F-form:

   τ_LM = [(T − 2k)/2k] · R²/(1 − R²) ~ F(2k, T − 2k),

   where k is the number of variables in x_t including the constant.
7.5.6 Testing functional form misspecification

The tests of functional form misspecification are similar to the above tests of heteroscedastic errors, but differ in the sense that the alternative hypothesis is formulated in terms of omitted variables, and not in terms of the residual variance. Although omitted variables, if relevant, would also influence the residual variance when the model is estimated under the H₀ hypothesis, the remedy in this case would be to include the omitted variables in the regression model. In practice it is often difficult to distinguish between the case of functional form misspecification and the case of heteroscedastic errors.

We consider again the standard linear model, for which the squares and cross-products of the explanatory variables may have explanatory power. For the sake of simplicity we assume k = 2 explanatory variables, x_t′ = (x_{1t}, x_{2t}):

y_t = β′x_t + c₁x²_{1t} + c₂x²_{2t} + c₃x_{1t}x_{2t} + u_t,   u_t ~ Niid(0, σ²_u)

H₀: c₁ = c₂ = c₃ = 0
H₁: c_i ≠ 0 for at least one i.
With k = 2 explanatory variables there are m = 3 auxiliary terms. This model is nonlinear in the variables but not in the parameters, implying that the model is easy to estimate both under the H₀ and the H₁ hypothesis. Therefore, both the likelihood ratio and the Wald test procedure are easy to perform.

Functional form misspecification can also be tested with the LM procedure, as will be demonstrated below. The log likelihood function under H₁ is given by

log L(β, c₁, c₂, c₃, σ²_u) = −(T/2) log 2π − (T/2) log σ²_u − (1/(2σ²_u)) Σ ( y_t − β′x_t − c₁x²_{1t} − c₂x²_{2t} − c₃x_{1t}x_{2t} )².

From the log likelihood function we calculate the first derivative with respect to the parameters (β′, c₁, c₂, c₃) and evaluate this expression under the null, again taking advantage of the block diagonality of the information matrix and leaving aside σ²_u:
[D₁ log L(θ)]′|_{H₀} = [ Σ z_t′ ( y_t − β′x_t − c₁x²_{1t} − c₂x²_{2t} − c₃x_{1t}x_{2t} ) / σ²_u ]|_{H₀} = Σ z_t′ û_t / σ̂²_u,

where z_t′ = [x_{1t}, x_{2t}, x²_{1t}, x²_{2t}, x_{1t}x_{2t}]. From the first order derivative we find the first element of the information matrix and evaluate it under the null hypothesis:

I₁₁(θ)|_{H₀} = [ −E( D₁₁ log L(θ) ) ]|_{H₀} = ( Σ z_t z_t′ / σ²_u )|_{H₀} = Σ z_t z_t′ / σ̂²_u.

We are now in a position to evaluate the LM statistic based on the consistent estimate θ̂:

τ_LM = ( D₁ log L(θ̂) ) V̂ ( D₁ log L(θ̂) )′ = ( Σ z_t û_t )′ ( Σ z_t z_t′ )⁻¹ ( Σ z_t û_t ) / σ̂²_u,   where V̂ = I₁₁⁻¹(θ̂),

and

τ_LM = TR² ~ χ²(m).
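As a sketch of the corresponding TR² computation, the short numpy example below builds the auxiliary regressors z_t for the k = 2 case. The simulated data and the inclusion of only one genuinely omitted squared term are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 250
x1, x2 = rng.normal(size=T), rng.normal(size=T)
X = np.column_stack([np.ones(T), x1, x2])
y = X @ np.array([1.0, 0.5, -0.5]) + 0.3 * x1 ** 2 + rng.normal(size=T)

def ols_resid(y, X):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ b

u = ols_resid(y, X)                               # residuals under H0
# auxiliary regressors: original x's plus squares and the cross-product
Z = np.column_stack([X, x1 ** 2, x2 ** 2, x1 * x2])
e = ols_resid(u, Z)
r2 = 1.0 - (e @ e) / (u @ u)                      # u has mean zero (constant in X)
lm = T * r2                                        # ~ chi2(3) under H0
print(lm)
```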
Chapter 8

The Analysis of Non-Stationary Data

8.1 Dickey-Fuller tests of unit roots

Most students are familiar with the standard regression model and how to interpret its coefficients. When dynamics are introduced, inference and interpretation change fundamentally and interpretational mistakes are easy to make. We will use a simple univariate model to demonstrate how the interpretation of a linear time trend and a constant term is crucially related to the dynamics of the model, in particular to whether the dynamics contain a unit root or not.

We consider the following simple regression model for y_t containing a linear trend and a constant:

y_t = γt + γ₀ + u_t,   t = 1, ..., T,   (8.1)

where the residual u_t is a first order autoregressive process:

u_t = ε_t/(1 − ρL)   (8.2)

and u₀ is assumed fixed. Note that assumption (8.2) implies that (8.1) is a common factor model. As demonstrated below, such a model imposes nonlinear restrictions on the parameters of the AR model and is, therefore, a special case of the general autoregressive model. Nonetheless, (8.1)-(8.2) serves the purpose of providing a pedagogical illustration of the dual roles of deterministic components in dynamic models.
It is useful to see how the constant γ₀ is related to the initial value of y_t. Using (8.1) we have that y₀ = γ₀ + u₀. Since an economic variable is usually given in logs, the level contains information about the unit of measurement (the log of 100,000 euro, say). Therefore, the value of γ₀ is generally dominated by y₀. For practical purposes γ₀ ≃ y₀, and in the discussion below we will set γ₀ = y₀ to emphasize the role of the unit of measurement on the constant in a dynamic regression model.
By substituting (8.2) in (8.1) we get

y_t = γt + ε_t/(1 − ρL) + y₀,   (8.3)

and by multiplying through with (1 − ρL):

(1 − ρL)y_t = (1 − ρL)γt + (1 − ρL)y₀ + ε_t.   (8.4)

Rewriting (8.4) using Lx_t = x_{t−1} we get

y_t = ρy_{t−1} + γ(1 − ρ)t + ργ + (1 − ρ)y₀ + ε_t.   (8.5)

It is easy to see that the static regression model (8.1) is equivalent to the following dynamic regression model:

y_t = b₁y_{t−1} + b₂t + b₀ + ε_t   (8.6)

with

b₁ = ρ
b₂ = γ(1 − ρ)
b₀ = ργ + (1 − ρ)y₀.   (8.7)
We consider the following four cases:

Case 1. ρ = 1 and γ ≠ 0. It follows from (8.5) that Δy_t = γ + ε_t, for t = 1, ..., T, i.e. the random walk with drift model. Note that E(Δy_t) = γ ≠ 0 is equivalent to y_t having a linear trend, γt.

Case 2. ρ = 1 and γ = 0. It follows from (8.5) that Δy_t = ε_t, for t = 1, ..., T, i.e. the pure random walk model. In this case E(Δy_t) = 0 and y_t contains no linear trend.

Case 3. |ρ| < 1 and γ ≠ 0 gives (8.6), i.e. y_t is stationary around its mean Ey_t = a₁t + a₀. We will now show that a₁ = γ and a₀ = y₀:

Ey_t = ρEy_{t−1} + γ(1 − ρ)t + ργ + (1 − ρ)y₀
a₁t + a₀ = ρ(a₁(t − 1) + a₀) + γ(1 − ρ)t + ργ + (1 − ρ)y₀
a₁(1 − ρ)t + a₀(1 − ρ) + ρa₁ = γ(1 − ρ)t + ργ + (1 − ρ)y₀.
Hence:

a₁(1 − ρ)t = γ(1 − ρ)t  ⟹  a₁ = γ,

and:

a₀(1 − ρ) + ργ = ργ + (1 − ρ)y₀  ⟹  a₀ = y₀.

Thus, one should note that the coefficients in a dynamic regression model have to be interpreted with caution. For example, b₂ in (8.6) is not an estimate of the trend slope γ in y_t and b₀ is not an estimate of γ₀.

Case 4. |ρ| < 1 and γ = 0 gives us y_t = ρy_{t−1} + (1 − ρ)y₀ + ε_t, where Ey_t = y₀, i.e. the stationary autoregressive model with a constant term.
To summarize:

- in the static regression model (8.1) the constant term is essentially accounting for the unit of measurement of y_t,
- in the dynamic regression model (8.5) the constant term is a weighted average of the growth rate γ and the initial value y₀,
- in the differenced model (ρ = 1) the constant term is only measuring the growth rate, γ.
8.2 The Dickey-Fuller test procedure

Under the null of a unit root in u_t, and thus in y_t, the OLS estimator of ρ in (8.1) is no longer t-distributed. Dickey and Fuller (1979) showed that it has a nonstandard asymptotic distribution and simulated the quantiles of the distribution. The so-called Dickey-Fuller tables are calculated for three different assumptions on the deterministic components of model (8.1)¹:

i) b₀ = b₂ = 0, i.e. no constant, nor a trend in (8.6): Table τ_nc
ii) b₂ = 0, i.e. a constant but no trend in (8.6): Table τ_c
iii) b₀, b₂ ≠ 0, i.e. a constant and a trend in (8.6): Table τ_ct

¹Remember that this model is based on the common factor restriction, which may or may not be satisfied. The validity of the Dickey-Fuller tests is not restricted to the common factor model.
8.2.1 Case 1: A linear time trend and a constant in the data

Model (8.6),

y_t = b₁y_{t−1} + b₂t + b₀ + ε_t,   (8.8)

can be formulated in the equivalent form

Δy_t = c·y_{t−1} + b₂t + b₀ + ε_t,   c = −(1 − b₁),   (8.9)

where, as before,

b₁ = ρ
b₂ = γ(1 − ρ)
b₀ = ργ + (1 − ρ)y₀.
The Dickey-Fuller procedure:

1. Specify the null and the alternative hypothesis:

   H₀: c = 0 (i.e. ρ = 1) and b₂ = 0
   H₁: c ≠ 0 (i.e. ρ < 1) and b₂ ≠ 0

2. Estimate (8.9) with OLS.

3. Calculate the D-F test statistic:

   τ_DF = ĉ/σ̂_c.

4. Compare τ_DF with, for example, the 95% critical values of τ_ct in the Dickey-Fuller tables.

Under H₀: ρ = 1 the model becomes Δy_t = b₀ + ε_t, i.e. there is no linear time trend in the differenced model, but b₀ (the drift term) is an estimate of the slope coefficient γ of the linear trend in (8.1). Therefore, note that the role of the constant term and the role of the linear trend change at the point ρ = 1.

The correct null hypothesis is therefore a joint test of:

H₀: c = 0 and b₂ = 0.

However, when people refer to the D-F unit root test it is usually only the test of c = 0. The joint test is somewhat more complicated, as ĉ is D-F distributed but the distribution of b̂₂ is closer to Student's t-distribution.
8.2.2 Case 2: No linear time trend in the data

In this case the model becomes

y_t = b₁y_{t−1} + b₀ + ε_t,   (8.10)

or, in the equivalent form,

Δy_t = c·y_{t−1} + b₀ + ε_t,   c = −(1 − b₁),   (8.11)

where

b₁ = ρ
b₀ = (1 − ρ)y₀.

The Dickey-Fuller procedure:

1. Specify the null and the alternative hypothesis:

   H₀: c = 0 (i.e. ρ = 1) and b₀ = 0
   H₁: c ≠ 0 (i.e. ρ < 1) and b₀ ≠ 0

2. Estimate (8.11) with OLS.

3. Calculate the D-F test statistic:

   τ_DF = ĉ/σ̂_c.

4. Compare τ_DF with, for example, the 95% critical values of τ_c in the Dickey-Fuller tables.

Under H₀ the model becomes Δy_t = ε_t. Thus, the test of the null hypothesis is the joint test of:

H₀: c = 0 and b₀ = 0.

The case with no constant is derived in a similar way.
8.2.3 The Augmented Dickey-Fuller tests for unit roots

We consider now a more general dynamic specification of (8.1):

y_t = γt + y₀ + u_t,   u_t = ε_t/(1 − ρ₁L − ... − ρ_mL^m),

i.e.

y_t = ρ₁y_{t−1} + ... + ρ_my_{t−m} + γ(1 − ρ₁ − ... − ρ_m)t + γ(ρ₁ + 2ρ₂ + ... + mρ_m) + (1 − ρ₁ − ... − ρ_m)y₀ + ε_t,

or, in the equivalent form,

Δy_t = c·y_{t−1} + b̃t + b₀ + b₁Δy_{t−1} + ... + b_{m−1}Δy_{t−(m−1)} + ε_t,   c = −(1 − Σ_{i=1}^m ρ_i),   (8.12)

where

b̃ = γ(1 − ρ₁ − ρ₂ − ... − ρ_m)
b₀ = γ(ρ₁ + 2ρ₂ + ... + mρ_m) + (1 − ρ₁ − ρ₂ − ... − ρ_m)y₀
b_i = −Σ_{j=i+1}^m ρ_j.

The test of a unit root in y_t is similar to the cases described above. For example, when the data contain a trend and a constant:

1. Specify the null and the alternative hypothesis:

   H₀: c = 0 (i.e. ρ₁ + ... + ρ_m = 1) and b̃ = 0
   H₁: c ≠ 0 and b̃ ≠ 0

2. Estimate (8.12) with OLS.

3. Calculate the D-F test statistic:

   τ_DF = ĉ/σ̂_c.

4. Compare τ_DF with, for example, the 95% critical values of τ_ct in the Dickey-Fuller tables.

Again, the correct null hypothesis should be formulated as the joint hypothesis:

H₀: c = 0 and b̃ = 0.

Finally, note that the D-F test is derived under the assumption that the parameters in the D-F model are constant over time. This is often not the case, and the assumption should, therefore, always be checked before reporting the test.
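A numpy sketch of the ADF regression (8.12) with a constant and a trend is given below. It is illustrative only: the lag order (one lagged difference) and the pure random-walk DGP are assumptions, and the resulting statistic must be compared with the Dickey-Fuller τ_ct table, not with the normal or t distribution.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 200
y = np.cumsum(rng.normal(size=T))          # DGP: pure random walk (c = 0)

dy = np.diff(y)                            # Delta y_t
# regressors for (8.12): y_{t-1}, trend, constant, Delta y_{t-1}
Z = np.column_stack([y[1:-1],              # y_{t-1}
                     np.arange(2, T),      # t
                     np.ones(T - 2),       # constant
                     dy[:-1]])             # Delta y_{t-1}
lhs = dy[1:]                               # Delta y_t, t = 2, ..., T-1

b, *_ = np.linalg.lstsq(Z, lhs, rcond=None)
e = lhs - Z @ b
s2 = e @ e / (len(lhs) - Z.shape[1])
se = np.sqrt(s2 * np.diag(np.linalg.inv(Z.T @ Z)))
tau_df = b[0] / se[0]                      # compare with the tau_ct table
print(tau_df)
```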
Econometrics 2
December 10, 2003

An Introduction to Monte Carlo Simulations and PcNaive

Heino Bohn Nielsen
Institute of Economics, University of Copenhagen
Monte Carlo Simulations

Many results in econometrics are asymptotic, i.e. they hold for T → ∞.

1. It is often difficult to get a firm understanding of such results:
   How should we think of repeated sampling?
   What is the exact meaning of a central limit theorem?
2. How does a given estimator work in finite samples?

To answer these questions, simulation methods are often useful:
   Get some intuition for asymptotic results.
   Graphical representation of convergence, uncertainty, etc.
   Analyze finite sample properties (often very difficult analytically).
Outline of the Lecture

1. The basic idea in Monte Carlo simulations.
2. Example 1: Sample mean (OLS) of iid normals.
3. Introduction to PcNaive.
4. Example 2: OLS in an AR(1) model:
   Consistency of OLS.
   Finite sample bias.
5. Example 3: Simultaneous equation model:
   Inconsistency of OLS.
   Consistency of IV.
   The idea of strong and weak instruments.
The Monte Carlo Idea

The Monte Carlo method replaces a difficult deterministic problem with a stochastic problem with the same solution (e.g. due to the LLN). If we can solve the stochastic problem by simulations, labour intensive work can be replaced by cheap capital intensive simulations.

What is the probability of success in the game Solitaire?
   A very difficult analytical problem.
   But a machine could play M times, and for M → ∞ we could estimate the probability.

What is the finite sample distribution of an estimator?
   Very difficult in most situations.
   We could generate M samples and look at the empirical distribution of the estimates.
Note of Caution

The Monte Carlo method is a useful tool in econometrics. But:
   Simulations do not replace (asymptotic) theory.
   Simulation can illustrate but not prove theorems.
   Simulation results are not general:
      Results are specific to the chosen setup.
      They work like good examples.
Ex. 1: Mean of iid Normals

Consider the model

y_t = μ + ε_t,   ε_t ~ N(0, σ²),   t = 1, 2, ..., T.   (1)

The OLS (and ML) estimator μ̂ of μ is the sample mean

μ̂ = T⁻¹ Σ_{t=1}^T y_t.

Note that μ̂ is consistent, unbiased and (exactly) normally distributed,

μ̂ ~ N(μ, T⁻¹σ²).

The standard deviation of the estimate can be calculated as

σ̂_μ̂ = ( T⁻¹σ̂² )^{1/2},   where σ̂² = T⁻¹ Σ_{t=1}^T (y_t − μ̂)².

We call this the estimated standard error (ESE).
Ex. 1 (cont.): Illustration by Simulation

We can illustrate the results if we can generate data from (1). We need:

1. A fully specified Data Generating Process (DGP), e.g.

   y_t = μ + ε_t,   ε_t ~ N(0, σ²),   t = 1, 2, ..., T,   (2)

   with μ = 5 and σ² = 1;
   an algorithm for drawing random numbers from N(0, σ²);
   a specified sample length, e.g. T = 50.

2. An estimation model for y_t and an estimator. Consider OLS in

   y_t = μ + u_t.   (3)

Note that the statistical model (3) and the DGP (2) need not coincide.
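A minimal numpy sketch of this Monte Carlo experiment is shown below. The DGP, T = 50 and M = 1000 follow the example; the code itself is only an illustration, not PcNaive output.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, T, M = 5.0, 1.0, 50, 1000

est = np.empty(M)      # sample means, one per replication
ese = np.empty(M)      # estimated standard errors, one per replication
for m in range(M):
    y = mu + sigma * rng.normal(size=T)    # one realization from the DGP
    est[m] = y.mean()
    ese[m] = y.std(ddof=0) / np.sqrt(T)    # ESE of the sample mean

mean = est.mean()                 # MEAN
bias = mean - mu                  # BIAS = MEAN - mu
mcsd = est.std(ddof=0)            # Monte Carlo standard deviation
mcse = mcsd / np.sqrt(M)          # Monte Carlo standard error of MEAN
print(mean, bias, mcsd, mcse, ese.mean())
```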
Ex. 1 (cont.): Four Realizations

Suppose we draw ε₁, ..., ε₅₀ from N(0, 1) and construct a data set, y₁, ..., y₅₀, from (2). We then estimate the model

y_t = μ + u_t

using OLS to obtain the sample mean and the standard deviation in one realization,

μ̂ = 4.9528,   ESE(μ̂) = 0.1477.

We can look at more realizations:

Realization   μ̂         ESE(μ̂)
1             4.98013    0.1477
2             5.04104    0.1320
3             4.99815    0.1479
4             4.82347    0.1504
Mean          4.96070    0.1445
[Figure: four simulated realizations of y_t (T = 50), with sample means 4.98013, 5.04104, 4.99815 and 4.82347.]
Ex. 1 (cont.): More Replications

Now suppose we generate data from (2) M times,

y₁^(m), ..., y₅₀^(m),   m = 1, 2, ..., M.

For each m we obtain a sample mean μ̂_m. We could then look at the mean estimate and the Monte Carlo standard deviation,

MEAN = M⁻¹ Σ_{m=1}^M μ̂_m
MCSD = ( M⁻¹ Σ_{m=1}^M (μ̂_m − MEAN)² )^{1/2}.

For large M we would expect:
   MEAN to be close to the true μ. The bias is BIAS = MEAN − μ.
   MCSD to be close to the average ESE, depending on the properties of the ESE.
Example 1 (cont.): Results

Note that MEAN is itself an estimator. The standard deviation of MEAN is the Monte Carlo standard error,

MCSE = M^{−1/2}·MCSD.

MCSD measures the uncertainty of μ̂; MCSE measures the uncertainty of the estimated MEAN in the simulation.

Consider the results for M = 1000:

Replication   μ̂         ESE(μ̂)
1             4.98013    0.1477
2             5.04104    0.1320
...           ...        ...
1000          4.92140    0.1254

MEAN = 4.9909,  MEAN(ESE) = 0.14088,  MCSD = 0.1386,  MCSE = 0.0044.
Ex. 1 (cont.): More Results
3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
0.25
0.50
0.75
1.00
Density, T=5
3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
0.5
1.0
Density, T=10
3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
1
2
3
Density, T=50
0 50 100 150 200 250
4.50
4.75
5.00
5.25
5.50
Estimates, different T
12 of 18
PcNaive

PcNaive is a menu-driven module in GiveWin. Technically, PcNaive generates Ox code, which is then executed by Ox. Output is returned in GiveWin.

Idea:
1. Set up the DGP (AR(1), Static, or PcNaive General).
2. Specify the estimation model.
3. Choose the estimators and test statistics to analyze.
4. Set the specifications: M, T, etc.
5. Select the output to generate.
6. Save and run.
Ex. 2: OLS in an AR(1)

Consider the DGP

y_t = 0.9·y_{t−1} + ε_t,   ε_t ~ N(0, 1),   t = 1, 2, ..., T.

We specify y₀ = 0 and discard the first 20 observations. The estimation model is given by

y_t = δy_{t−1} + u_t,

and we estimate with OLS. Note that δ̂ is consistent but biased.

T     MEAN     BIAS      MCSE
5     0.7590   −0.1410   0.00475
10    0.7949   −0.1051   0.00310
25    0.8410   −0.0590   0.00172
50    0.8673   −0.0327   0.00108
75    0.8779   −0.0221   0.00082
100   0.8833   −0.0167   0.00069
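The table can be reproduced in spirit with a few lines of numpy. This is an illustrative sketch rather than the PcNaive setup; the burn-in of 20 observations, M = 2000 replications, and estimation without a constant are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(8)
M, burn = 2000, 20

def ols_ar1_mean(T):
    """Monte Carlo mean of the OLS estimate of the AR(1) coefficient."""
    est = np.empty(M)
    for m in range(M):
        e = rng.normal(size=T + burn)
        y = np.zeros(T + burn)
        for t in range(1, T + burn):
            y[t] = 0.9 * y[t - 1] + e[t]
        y = y[burn:]                                     # discard the burn-in
        est[m] = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])    # OLS without constant
    return est.mean()

for T in (10, 25, 50, 100):
    mean = ols_ar1_mean(T)
    print(T, round(mean, 4), round(mean - 0.9, 4))       # MEAN and BIAS
```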
Ex. 3: Simultaneity Bias

Consider as a DGP the system of equations

y_{a,t} = 0.5·y_{b,t} + 0.75·y_{a,t−1} + ε_{1t}
y_{b,t} = 0.8·z_t + ε_{2t}
z_t = 0.75·z_{t−1} + ε_{3t}

for t = 1, 2, ..., 50, where

(ε_{1t}, ε_{2t}, ε_{3t})′ ~ N( (0, 0, 0)′, [ 1 0.2 0 ; 0.2 1 0 ; 0 0 1 ] ).

We set y_{a,0} = z₀ = 0 and discard the first 20 observations. Consider the OLS estimator in the estimation equation

y_{a,t} = β₀ + β₁y_{b,t} + β₂y_{a,t−1} + u_t.

Note that
   cov(ε_{1t}, ε_{2t}) ≠ 0, so that OLS is inconsistent;
   z_t is a valid instrument for y_{b,t}, and IV is consistent.
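A numpy sketch of this comparison follows. It is a simplified stand-in for the PcNaive experiment; the number of replications, the burn-in, and the just-identified IV estimator using the instrument set (1, z_t, y_{a,t−1}) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(9)
M, T, burn = 2000, 50, 20
cov = np.array([[1.0, 0.2, 0.0],
                [0.2, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
chol = np.linalg.cholesky(cov)

b_ols = np.empty(M)
b_iv = np.empty(M)
for m in range(M):
    e = rng.normal(size=(T + burn, 3)) @ chol.T   # correlated errors
    ya, yb, z = np.zeros(T + burn), np.zeros(T + burn), np.zeros(T + burn)
    for t in range(1, T + burn):
        z[t] = 0.75 * z[t - 1] + e[t, 2]
        yb[t] = 0.8 * z[t] + e[t, 1]
        ya[t] = 0.5 * yb[t] + 0.75 * ya[t - 1] + e[t, 0]
    s = slice(burn, T + burn)                     # keep the last T observations
    y = ya[s]
    X = np.column_stack([np.ones(T), yb[s], ya[burn - 1:T + burn - 1]])
    W = np.column_stack([np.ones(T), z[s], ya[burn - 1:T + burn - 1]])
    b_ols[m] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    b_iv[m] = np.linalg.solve(W.T @ X, W.T @ y)[1]

# OLS is biased away from the true value 0.5; IV is centered close to it
print(b_ols.mean(), b_iv.mean())
```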
Ex. 3: Results, OLS vs. IV, T = 50

[Figure: simulated densities of the estimated coefficient on y_b with no simultaneity (OLS), with simultaneity (OLS), and with simultaneity (IV).]
Ex. 3: Results, OLS vs. IV, T = 500

[Figure: the corresponding densities for T = 500: no simultaneity (OLS), simultaneity (OLS), simultaneity (IV).]
Ex. 3: Results, Strength of Instruments

[Figure: densities of the IV estimate of the coefficient on y_b for a medium instrument (y_b = 0.80·z + e), a weak instrument (y_b = 0.30·z + e) and a strong instrument (y_b = 1.30·z + e).]