Econometrics

Michael Creel
Contents

1 About this document
   1.1 Licenses
   1.2 Obtaining the materials
   1.3 An easy way to use LyX and Octave today
2 Introduction: Economic and econometric models
3 Ordinary Least Squares
   3.1 The Linear Model
   3.2 Estimation by least squares
   3.3 Geometric interpretation of least squares estimation
       3.3.1 In X, Y Space
       3.3.2 In Observation Space
       3.3.3 Projection Matrices
   3.4 Influential observations and outliers
   3.5 Goodness of fit
   3.6 The classical linear regression model
   3.7 Small sample statistical properties of the least squares estimator
       3.7.1 Unbiasedness
       3.7.2 Normality
       3.7.3 The variance of the OLS estimator and the Gauss-Markov theorem
   3.8 Example: The Nerlove model
       3.8.1 Theoretical background
       3.8.2 Cobb-Douglas functional form
       3.8.3 The Nerlove data and OLS
   3.9 Exercises
4 Maximum likelihood estimation
   4.1 The likelihood function
       4.1.1 Example: Bernoulli trial
   4.2 Consistency of MLE
   4.3 The score function
   4.4 Asymptotic normality of MLE
       4.4.1 Coin flipping, again
   4.5 The information matrix equality
   4.6 The Cramér-Rao lower bound
   4.7 Exercises
5 Asymptotic properties of the least squares estimator
   5.1 Consistency
   5.2 Asymptotic normality
   5.3 Asymptotic efficiency
   5.4 Exercises
6 Restrictions and hypothesis tests
   6.1 Exact linear restrictions
       6.1.1 Imposition
       6.1.2 Properties of the restricted estimator
   6.2 Testing
       6.2.1 t-test
       6.2.2 F test
       6.2.3 Wald-type tests
       6.2.4 Score-type tests (Lagrange multiplier tests)
       6.2.5 Likelihood ratio-type tests
   6.3 The asymptotic equivalence of the LR, Wald and score tests
   6.4 Interpretation of test statistics
   6.5 Confidence intervals
   6.6 Bootstrapping
   6.7 Testing nonlinear restrictions, and the Delta Method
   6.8 Example: the Nerlove data
   6.9 Exercises
7 Generalized least squares
   7.3 Feasible GLS
   7.4 Heteroscedasticity (7.4.2 Detection, 7.4.3 Correction)
   7.5 Autocorrelation (7.5.1 Causes, 7.5.3 AR(1), 7.5.4 MA(1), 7.5.8 Examples)
8 Stochastic regressors
   8.1 Case 1
   8.2 Case 2
   8.3 Case 3
9 Data problems
   9.1 Collinearity (9.1.2 Back to collinearity)
   9.2 Measurement error
   9.3 Missing observations
16 Quasi-ML
17.7 Application: Limited dependent variables and sample selection
   17.7.1 Example: Labor Supply
18 Nonparametric inference
19 Simulation-based estimation
   19.1.3 Estimation of models specified in terms of stochastic differential equations
   19.2 Simulated maximum likelihood (SML) (19.2.1 Example: multinomial probit)
20 Parallel programming for econometrics
   20.1.2 ML
   20.1.3 GMM
24 Licenses
   24.1 The GPL
25 The attic
List of Figures

1.1 Octave
1.2 LYX
3.5 Uncentered R²
9.1 s(β) when there is no collinearity
9.2 s(β) when there is collinearity
List of Tables
Chapter 1

About this document

This document integrates lecture notes for a one year graduate level course with computer programs that illustrate and apply the methods that are studied. The immediate availability of executable (and modifiable) example programs when using the PDF version of the document is a distinguishing feature of these notes. If printed, the document is a somewhat terse approximation to a textbook. These notes are not intended to be a perfect substitute for a printed textbook. If you are a student of mine, please note that last sentence carefully. There are many good textbooks available. The chapters make reference to readings in articles and textbooks.

With respect to contents, the emphasis is on estimation and inference within the world of stationary data, with a bias toward microeconometrics. The second half of the document is more polished than the first half, since I have taught that course more often. If you take a moment to read the licensing information in the next section, you'll see that you are free to copy and modify the document. If anyone would like to contribute material that expands the contents, it would be very welcome. Error corrections and other additions are also welcome.
The integrated examples (they are on-line here and the support files are here) are an important part of these notes. GNU Octave (www.octave.org) has been used for the example programs, which are scattered through the document. This choice is motivated by several factors. The first is the high quality of the Octave environment for doing applied econometrics. Octave is similar to the commercial package Matlab®, and will run scripts for that language without modification.¹ The fundamental tools (manipulation of matrices, statistical functions, minimization, etc.) exist and are implemented in a way that makes extending them fairly easy. Second, an advantage of free software is that you don't have to pay for it. This can be an important consideration if you are at a university with a tight budget or if you need to run many copies, as can be the case if you do parallel computing (discussed in Chapter 20). Third, Octave runs on GNU/Linux, Windows and MacOS. Figure 1.1 shows a sample GNU/Linux work environment, with an Octave script being edited, and the results visible in an embedded shell window.

¹Matlab® is a trademark of The Mathworks, Inc. Octave will run pure Matlab scripts. If a Matlab script calls an extension, such as a toolbox function, then it is necessary to make a similar extension available to Octave. The examples discussed in this document call a number of functions, such as a BFGS minimizer, a program for ML estimation, etc. All of this code is provided with the examples, as well as on the PelicanHPC live CD image.
The main document was prepared using LyX (www.lyx.org). LyX is a free² "what you see is what you mean" word processor, basically working as a graphical frontend to LaTeX. It (with help from other applications) can export your work in LaTeX, HTML, PDF and several other forms. It will run on Linux, Windows, and MacOS systems. Figure 1.2 shows LyX editing this document.

²"Free" is used in the sense of "freedom", but LyX is also free of charge (free as in free beer).
1.1 Licenses

All materials are copyrighted by Michael Creel with the date that appears above. They are provided under the terms of the GNU General Public License, ver. 2, which forms Section 24.1 of the notes, or, at your option, under the Creative Commons Attribution-Share Alike 2.5 license, which forms Section 24.2 of the notes. The main thing you need to know is that you are free to modify and distribute these materials in any way you like, as long as you share your contributions in the same way the materials are made available to you. In particular, you must make available the source files, in editable form, for your modified version of the materials.
The PDF, together with all of the examples and the software needed to run them, is available on PelicanHPC. PelicanHPC is a live CD image. You can burn the PelicanHPC image to a CD and use it to boot your computer, if you like. When you shut down and reboot, you will return to your normal operating system. The need to reboot can be somewhat inconvenient. It is also possible to use PelicanHPC while running your normal operating system by using a virtualization platform such as Sun xVM VirtualBox®.³

The reason why these notes are integrated into a Linux distribution for parallel computing will be apparent if you get to Chapter 20. If you don't get that far or you're not interested in parallel computing, please just ignore the stuff on the CD that's not related to econometrics. If you happen to be interested in parallel computing but not econometrics, just skip ahead to Chapter 20.

³xVM VirtualBox® is a trademark of Sun, Inc. VirtualBox is free software (GPL v2). That, and the fact that it works very well, is the reason it is recommended here. There are a number of similar products available. It is possible to run PelicanHPC as a virtual machine, and to communicate with the installed operating system using a private network. Learning how to do this is not too difficult, and it is very convenient.
Chapter 2

Introduction: Economic and econometric models

Economic theory tells us that an individual's demand function for a good is something like:

$$x = x(p, m, z)$$

where $x$ is the quantity demanded, $p$ is a $G \times 1$ vector of prices, $m$ is income, and $z$ is a vector of other variables that may affect demand. Suppose we have a sample consisting of one observation on $n$ individuals' demands (this is a cross section, where $i = 1, 2, \ldots, n$ indexes the individuals in the sample):

$$x_i = x_i(p_i, m_i, z_i)$$

The model is not estimable as it stands, since some components of $z_i$ are not observable for individual $i$. For example, people don't eat the same lunch every day, and you can't tell what they will order just by looking at them. Suppose we can break $z_i$ into the observable components $w_i$ and a single unobservable component $\varepsilon_i$.

A step toward an estimable econometric model is to suppose that the model may be written as

$$x_i = \beta_1 + p_i'\beta_p + m_i\beta_m + w_i'\beta_w + \varepsilon_i$$
We have imposed a number of restrictions on the theoretical model: the functions $x_i(\cdot)$, which in principle may differ for all $i$, have been restricted to belong to the same parametric family, and of all parametric families of functions, we have restricted the model to the class of linear functions. For the $\beta$ coefficients to exist in a sense that has economic meaning, and in order to be able to use sample data to make reliable inferences about their values, we need to make additional assumptions. These additional assumptions have no theoretical basis: they are assumptions on top of those needed to prove the existence of a demand function. The validity of any results we obtain using this model will be contingent on these additional restrictions being at least approximately correct. For this reason, specification testing will be needed, to check that the model seems to be reasonable. Only when we are convinced that the model is at least approximately correct should we use it for economic analysis.
When testing a hypothesis using an econometric model, at least three factors can cause a statistical test to reject the null hypothesis:

1. the hypothesis is false
2. a type I error has occurred
3. the econometric model is not correctly specified, and thus the test does not have the assumed distribution

To be able to make scientific progress, we would like to ensure that the third reason is not contributing in a major way to rejections, so that rejection will be most likely due to either the first or second reasons. Hopefully the above example makes it clear that there are many possible sources of misspecification of econometric models. In the next few sections we will obtain results supposing that the econometric model is entirely correctly specified. Later we will examine the consequences of misspecification and see some methods for determining if a model is correctly specified. Later on, econometric methods that seek to minimize maintained assumptions are introduced.
Chapter 3

Ordinary Least Squares

3.1 The Linear Model

Consider approximating a variable $y$ using the variables $x_1, x_2, \ldots, x_k$. We can consider a model that is a linear approximation.

Linearity: the model is a linear function of the parameter vector $\beta^0$:

$$y = \beta_1^0 x_1 + \beta_2^0 x_2 + \cdots + \beta_k^0 x_k + \varepsilon$$

or, using vector notation:

$$y = x'\beta^0 + \varepsilon$$

The dependent variable $y$ is a scalar random variable, $x = (x_1\ x_2\ \cdots\ x_k)'$ is a $k$-vector of explanatory variables, and $\beta^0 = (\beta_1^0\ \beta_2^0\ \cdots\ \beta_k^0)'$. The superscript 0 means this is the "true value" of the unknown parameter; it will be suppressed when it is not necessary for clarity.

Suppose we want to use data to determine the best linear approximation to $y$ using the variables $x$. The data $\{(y_t, x_t)\},\ t = 1, 2, \ldots, n$ are obtained by some form of sampling. An individual observation is

$$y_t = x_t'\beta + \varepsilon_t$$

The $n$ observations can be written in matrix form as

$$y = X\beta + \varepsilon, \tag{3.1}$$

where $y = (y_1\ y_2\ \cdots\ y_n)'$ is $n \times 1$ and $X = (x_1\ x_2\ \cdots\ x_n)'$.

Linear models are more general than they might first appear, since one can employ nonlinear transformations of the variables.
[Figure: a typical data set, plotted along with the true line $\beta_1 + \beta_2 x_{t2}$.]

For example, consider the model

$$\varphi_0(z) = \beta_1\varphi_1(w) + \beta_2\varphi_2(w) + \cdots + \beta_k\varphi_k(w) + \varepsilon$$

where the $\varphi_i(\cdot)$ are known functions. Defining $y = \varphi_0(z)$, $x_1 = \varphi_1(w)$, etc. leads to a model in the linear form. For example, the Cobb-Douglas model

$$z = A w_2^{\beta_2} w_3^{\beta_3} e^{\varepsilon}$$

can be transformed logarithmically to obtain

$$\ln z = \ln A + \beta_2 \ln w_2 + \beta_3 \ln w_3 + \varepsilon.$$

If we define $y = \ln z$, $\beta_1 = \ln A$, etc., we can put the model in the linear form. The approximation is linear in the parameters, but not necessarily linear in the variables. This is the sense in which "linear" models have been understood historically.

Consider the simple model $y_t = \beta_1 + \beta_2 x_{t2} + \varepsilon_t$. The figure shows the data points $(x_{t2}, y_t)$, where $t$ indexes the observations, along with the true line $\beta_1 + \beta_2 x_{t2}$ (the green line). Exactly how the best fitting line is defined will become clear later. In practice, we only have the data, and we don't know where the green line lies. We need to gain information about the straight line that best fits the data points.
3.2 Estimation by least squares

The ordinary least squares (OLS) estimator is defined as the value of $\beta$ that minimizes the sum of squared residuals:

$$s(\beta) = \sum_{t=1}^{n}\left(y_t - x_t'\beta\right)^2 = (y - X\beta)'(y - X\beta) = y'y - 2y'X\beta + \beta'X'X\beta = \|y - X\beta\|^2$$

This last expression makes it clear how the OLS estimator is defined: it minimizes the Euclidean distance between $y$ and $X\beta$. The fitted OLS coefficients are those that give the best linear approximation to $y$ using $x$ as basis functions, where "best" means minimum Euclidean distance. One could think of other estimators based upon other metrics: for example, the minimum absolute distance (MAD) estimator minimizes $\sum_{t=1}^{n}|y_t - x_t'\beta|$. Which estimator is best in terms of their statistical properties, rather than in terms of the metrics that define them, depends upon the properties of $\varepsilon$, about which we have as yet made no assumptions.

To minimize $s(\beta)$, find the derivative with respect to $\beta$:

$$D_\beta s(\beta) = -2X'y + 2X'X\beta$$

Then setting it to zero gives

$$D_\beta s(\hat\beta) = -2X'y + 2X'X\hat\beta \equiv 0$$

so

$$\hat\beta = (X'X)^{-1}X'y.$$

To verify that this is a minimum, check the second order sufficient condition:

$$D^2_\beta s(\hat\beta) = 2X'X$$

Since $\rho(X) = K$, this matrix is positive definite (it is a quadratic form in a matrix of full column rank, with $K < n$), so $\hat\beta$ is in fact a minimizer. The fitted values are the vector $\hat y = X\hat\beta$. The residuals are the vector $\hat\varepsilon = y - X\hat\beta$.
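A minimal Octave sketch of this calculation (this is not the book's OlsFit.m; the data generating process and parameter values are arbitrary choices for illustration):

    % Generate data from a known DGP, compute the OLS estimator two
    % equivalent ways, and check that the residuals are orthogonal
    % to the regressors.
    n = 50; K = 3;
    X = [ones(n,1) randn(n,K-1)];   % regressors, first column a constant
    beta0 = [1; 2; -1];             % true parameter values (arbitrary)
    y = X*beta0 + randn(n,1);       % classical model with N(0,1) errors
    betahat = (X'*X)\(X'*y);        % solve the normal equations
    betahat2 = X\y;                 % Octave's backslash gives the same answer
    ehat = y - X*betahat;           % residuals
    disp([betahat betahat2])        % the two columns should be identical
    disp(X'*ehat)                   % numerically zero: X'ehat = 0

The backslash operator solves the least squares problem directly, which is numerically more stable than explicitly forming $(X'X)^{-1}$.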
[Figure: an example OLS fit, showing the true line and the fitted line.]

Note that the first order conditions can be written as

$$X'y - X'X\hat\beta = 0$$
$$X'\left(y - X\hat\beta\right) = 0$$
$$X'\hat\varepsilon = 0$$

so the residuals are orthogonal to the columns of $X$. The figure shows an example fit: the true line and the estimated line are different. This figure was created by running the Octave program OlsFit.m. You can experiment with changing the parameter values to see how this affects the fit, and to see how the fitted line will sometimes be close to the true line, and sometimes rather far away.
3.3 Geometric interpretation of least squares estimation

3.3.1 In X, Y Space

The previous figures give the fit in $X, Y$ space: the data points together with the fitted line. This kind of picture can only be drawn when there is a single regressor apart from the constant, i.e., it is not available when $K > 1$.

3.3.2 In Observation Space

If we want to visualize the fit with more regressors, it is useful to work in "observation space", where each axis corresponds to an observation. [Figure: observation space with $n = 2$: the vector $y$ is decomposed into $x\hat\beta = P_Xy$, lying in the space spanned by $x$, $S(x)$, and $\hat e = M_Xy$, orthogonal to it.]

We can decompose $y$ into two components: the orthogonal projection onto the $K$-dimensional space spanned by $X$, which is $X\hat\beta$, and the component that is the orthogonal projection onto the $n - K$ dimensional space that is orthogonal to the span of $X$, which is $\hat\varepsilon$. Since $\hat\beta$ is chosen to make $\hat\varepsilon$ as short as possible, $\hat\varepsilon$ will be orthogonal to the space spanned by $X$. Since $X$ is in this space, $X'\hat\varepsilon = 0$.

3.3.3 Projection Matrices

$X\hat\beta$ is the projection of $y$ onto the span of $X$, or

$$X\hat\beta = X(X'X)^{-1}X'y$$

Therefore, the matrix that projects $y$ onto the span of $X$ is

$$P_X = X(X'X)^{-1}X'$$

since $X\hat\beta = P_Xy$. The residual vector $\hat\varepsilon$ is the projection of $y$ onto the $N - K$ dimensional space that is orthogonal to the span of $X$. We have that

$$\hat\varepsilon = y - X\hat\beta = y - X(X'X)^{-1}X'y = \left(I_n - X(X'X)^{-1}X'\right)y.$$

So the matrix that projects $y$ onto the space orthogonal to the span of $X$ is

$$M_X = I_n - X(X'X)^{-1}X' = I_n - P_X.$$

We have $\hat\varepsilon = M_Xy$. Therefore

$$y = P_Xy + M_Xy = X\hat\beta + \hat\varepsilon.$$

These two projection matrices decompose the $n$-dimensional vector $y$ into two orthogonal components: the portion that lies in the $K$-dimensional space defined by $X$, and the portion that lies in the orthogonal $n - K$ dimensional space. Note that both $P_X$ and $M_X$ are symmetric and idempotent.

- A symmetric matrix $A$ is one such that $A = A'$.
- An idempotent matrix $A$ is one such that $A = AA$.
- The only nonsingular idempotent matrix is the identity matrix.
3.4 Influential observations and outliers

The OLS estimator of the $i^{th}$ element of the vector $\beta_0$ is simply

$$\hat\beta_i = \left[(X'X)^{-1}X'\right]_{i\cdot}\,y = c_i'y$$

This is how we define a linear estimator: it's a linear function of the dependent variable. Since it's a linear combination of the observations on the dependent variable, where the weights are determined by the observations on the regressors, some observations may have more influence than others.

To investigate this, let $e_t$ be an $n$-vector of zeros with a 1 in the $t^{th}$ position, i.e., the $t^{th}$ column of $I_n$. Define

$$h_t = (P_X)_{tt} = e_t'P_Xe_t$$

so $h_t$ is the $t^{th}$ element on the main diagonal of $P_X$. Note that

$$h_t = \|P_Xe_t\|^2$$

so

$$h_t \le \|e_t\|^2 = 1.$$

So $0 < h_t < 1$. Also,

$$\operatorname{Tr}P_X = K \Rightarrow \bar h = K/n.$$

So the average of the $h_t$ is $K/n$. The value $h_t$ is referred to as the leverage of the observation. If the leverage is much higher than average, the observation has the potential to affect the OLS fit importantly. However, an observation may also be influential due to the value of $y_t$, rather than the weight it is multiplied by, which only depends on the $x_t$'s.

To account for this, consider estimation of $\beta$ without using the $t^{th}$ observation (designate this estimator as $\hat\beta^{(t)}$). One can show (see Davidson and MacKinnon, pp. 32-5 for proof) that

$$\hat\beta^{(t)} = \hat\beta - \left(\frac{1}{1-h_t}\right)(X'X)^{-1}x_t'\hat\varepsilon_t$$

so the change in the $t^{th}$ observation's fitted value is

$$x_t\hat\beta - x_t\hat\beta^{(t)} = \left(\frac{h_t}{1-h_t}\right)\hat\varepsilon_t$$

While an observation may be influential if it doesn't affect its own fitted value, it certainly is influential if it does. A fast means of identifying influential observations is to plot $\left(\frac{h_t}{1-h_t}\right)\hat\varepsilon_t$ (which I will refer to as the own influence of the observation) as a function of $t$. [Figure 3.4: an example plot of data, fit, leverage and influence.] The Octave program is InfluentialObservation.m. If you re-run the program you will see that the leverage of the last observation (an outlying value of x) is always high, and the influence is sometimes high.

After influential observations are detected, one needs to determine why they are influential. Possible causes include:

- data entry error, which can easily be corrected once detected. Data entry errors are very common.
- special economic factors that affect some observations. These would need to be identified and incorporated in the model. This is the idea behind structural change: the parameters may not be constant across all observations.
- pure randomness may have caused us to sample a low-probability observation.

There exist robust estimation methods that are relatively insensitive to aberrant observations.
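A minimal sketch of these leverage and influence diagnostics (this is not the book's InfluentialObservation.m; the outlying x value is an arbitrary choice):

    % The last observation is given an outlying x value, so its
    % leverage should be high.
    n = 30;
    x = [randn(n-1,1); 10];              % one outlying regressor value
    X = [ones(n,1) x];
    y = X*[0; 1] + randn(n,1);
    betahat = X\y;
    ehat = y - X*betahat;
    PX = X*inv(X'*X)*X';                 % projection onto the span of X
    h = diag(PX);                        % leverage: h_t = (P_X)_tt
    influence = (h ./ (1 - h)) .* ehat;  % own influence of each observation
    printf("average leverage: %f (K/n = %f)\n", mean(h), columns(X)/n);
    plot(1:n, influence, "o");           % influential observations stand out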
3.5 Goodness of fit

The fitted model is

$$y = X\hat\beta + \hat\varepsilon$$

Take the inner product:

$$y'y = \hat\beta'X'X\hat\beta + 2\hat\beta'X'\hat\varepsilon + \hat\varepsilon'\hat\varepsilon$$

But the middle term of the RHS is zero since $X'\hat\varepsilon = 0$, so

$$y'y = \hat\beta'X'X\hat\beta + \hat\varepsilon'\hat\varepsilon \tag{3.2}$$

The uncentered $R^2_u$ is defined as

$$R^2_u = 1 - \frac{\hat\varepsilon'\hat\varepsilon}{y'y} = \frac{\hat\beta'X'X\hat\beta}{y'y} = \frac{\|P_Xy\|^2}{\|y\|^2} = \cos^2(\varphi),$$

where $\varphi$ is the angle between $y$ and the span of $X$. The uncentered $R^2$ changes if we add a constant to $y$, since this changes $\varphi$ (see Figure 3.5: a vector on the 45 degree line in observation space is a constant vector).

Another, more common definition measures the contribution of the variables, other than the constant term, to explaining the variation in $y$. Let $\iota = (1, 1, \ldots, 1)'$, a $n$-vector, so

$$M_\iota = I_n - \iota(\iota'\iota)^{-1}\iota' = I_n - \iota\iota'/n$$

$M_\iota y$ gives the vector of deviations of $y$ from its mean. In terms of deviations from the mean, equation 3.2 becomes

$$y'M_\iota y = \hat\beta'X'M_\iota X\hat\beta + \hat\varepsilon'M_\iota\hat\varepsilon$$

The centered $R^2_c$ is defined as

$$R^2_c = 1 - \frac{ESS}{TSS} = 1 - \frac{\hat\varepsilon'\hat\varepsilon}{y'M_\iota y}$$

where $ESS = \hat\varepsilon'\hat\varepsilon$ and $TSS = y'M_\iota y = \sum_{t=1}^{n}(y_t - \bar y)^2$.

Supposing that $X$ contains a column of ones,

$$X'\hat\varepsilon = 0 \Rightarrow \sum_t \hat\varepsilon_t = 0$$

so $M_\iota\hat\varepsilon = \hat\varepsilon$. In this case

$$y'M_\iota y = \hat\beta'X'M_\iota X\hat\beta + \hat\varepsilon'\hat\varepsilon$$

So

$$R^2_c = \frac{RSS}{TSS}$$

where $RSS = \hat\beta'X'M_\iota X\hat\beta$. Supposing that a column of ones is in the space spanned by $X$ ($P_X\iota = \iota$), then one can show that $0 \le R^2_c \le 1$.
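A minimal sketch computing the two fit measures from the formulas above, for simulated data (the DGP is an arbitrary illustration; it assumes $X$ contains a constant):

    n = 100;
    X = [ones(n,1) randn(n,2)];
    y = X*[1; 2; -1] + randn(n,1);
    betahat = X\y;
    ehat = y - X*betahat;
    R2u = 1 - (ehat'*ehat)/(y'*y);       % uncentered R^2
    TSS = sum((y - mean(y)).^2);         % y'M_iota y
    R2c = 1 - (ehat'*ehat)/TSS;          % centered R^2
    printf("uncentered R2 = %f, centered R2 = %f\n", R2u, R2c);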
3.6 The classical linear regression model

So far the model has only algebraic and some geometrical properties. There is no economic content to the model, and the regression parameters have no economic interpretation. For example, what is the partial derivative of $y$ with respect to $x_j$? The linear approximation is

$$y = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

The partial derivative is

$$\frac{\partial y}{\partial x_j} = \beta_j + \frac{\partial\varepsilon}{\partial x_j}$$

Up to now, there's no guarantee that $\partial\varepsilon/\partial x_j = 0$. For the $\beta$ to have an economic meaning, we need to make additional assumptions. The assumptions that are appropriate to make depend on the data under consideration. We'll start with the classical linear regression model, which incorporates some assumptions that are clearly not realistic for economic data. This is to be able to explain some concepts with a minimum of confusion and notational clutter. Later we'll adapt the results to what we can get with more realistic assumptions.

Linearity: the model is a linear function of the parameter vector $\beta^0$:

$$y = \beta_1^0 x_1 + \beta_2^0 x_2 + \cdots + \beta_k^0 x_k + \varepsilon \tag{3.3}$$

or, using vector notation, $y = x'\beta^0 + \varepsilon$.

Nonstochastic linearly independent regressors: $X$ is a fixed matrix of constants of rank $K$, and

$$\lim_{n\to\infty}\frac{1}{n}X'X = Q_X \tag{3.4}$$

where $Q_X$ is a finite positive definite matrix. This is needed to be able to identify the individual effects of the explanatory variables.

Independently and identically distributed errors:

$$\varepsilon \sim \mathrm{IID}(0, \sigma^2 I_n) \tag{3.5}$$

That is, the errors have mean zero,

$$E(\varepsilon_t) = 0, \forall t \tag{3.6}$$

constant variance, and are uncorrelated across observations:

$$E(\varepsilon_t\varepsilon_s) = 0,\ t \ne s \tag{3.7}$$

Normality (optional): we will sometimes assume that the errors are normally distributed:

$$\varepsilon \sim N(0, \sigma^2 I_n) \tag{3.8}$$
3.7 Small sample statistical properties of the least squares estimator

3.7.1 Unbiasedness

We have $\hat\beta = (X'X)^{-1}X'y$. By linearity,

$$\hat\beta = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon$$

By assumptions 3.4 and 3.5,

$$E(\hat\beta) = \beta + (X'X)^{-1}X'E(\varepsilon) = \beta,$$

so the OLS estimator is unbiased under these assumptions.

Figure 3.6 shows the results of a small Monte Carlo experiment where the OLS estimator was calculated for many samples from the classical model with $y = 1 + 2x + \varepsilon$, where $n = 20$ and the regressor is held fixed across samples. The histogram of $\hat\beta_2$ is centered over the true value, so there is no evidence of bias. The program that generates the plot is Unbiased.m, if you would like to experiment with this.

With time series data, the OLS estimator will often be biased. Figure 3.7 shows the results of a small Monte Carlo experiment where the OLS estimator was calculated for 1000 samples from the AR(1) model with $y_t = 0 + 0.9y_{t-1} + \varepsilon_t$, where $n = 20$ and $\sigma^2 = 1$. In this case, assumption 3.4 does not hold: the regressors are stochastic. We can see that the bias in the estimation of $\beta_2$ is about -0.2. The program that generates the plot is Biased.m, if you would like to experiment with this.
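A minimal Monte Carlo sketch in the spirit of Unbiased.m (the number of replications and the DGP details are arbitrary choices, not the book's):

    reps = 1000; n = 20;
    x = rand(n,1)*20;              % fixed regressor, constant across samples
    X = [ones(n,1) x];
    b2 = zeros(reps,1);
    for j = 1:reps
      y = X*[1; 2] + randn(n,1);   % y = 1 + 2x + epsilon
      betahat = X\y;
      b2(j) = betahat(2);
    end
    printf("mean of betahat_2 over %d samples: %f\n", reps, mean(b2));
    hist(b2, 30);                  % approximately centered on 2: no bias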
3.7.2 Normality

With the linearity assumption, we have $\hat\beta = \beta + (X'X)^{-1}X'\varepsilon$. Adding the assumption of normality (3.8, which implies strong exogeneity), then

$$\hat\beta \sim N\left(\beta, (X'X)^{-1}\sigma_0^2\right)$$

since a linear function of a normal random vector is also normally distributed. In Figure 3.6 you can see that the estimator appears to be normally distributed. It in fact is normally distributed, since the DGP (see the Octave program) has normal errors. Even when the data may be taken to be IID, the assumption of normality is often questionable or simply untenable. For example, if the dependent variable is the number of automobile trips per week, it is a count variable with a discrete distribution, and is thus not normally distributed. Many variables in economics can take on only nonnegative values, which, strictly speaking, rules out normality.²

²Normality may be a good model nonetheless, as long as the probability of a negative value occurring is negligible under the model. This depends upon the mean being large enough in relation to the variance.
3.7.3 The variance of the OLS estimator and the Gauss-Markov theorem

Now let's make all the classical assumptions except the assumption of normality. We have $\hat\beta = \beta + (X'X)^{-1}X'\varepsilon$ and we know that $E(\hat\beta) = \beta$. So

$$\operatorname{Var}(\hat\beta) = E\left[(\hat\beta - \beta)(\hat\beta - \beta)'\right] = E\left[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}\right] = (X'X)^{-1}\sigma_0^2$$

The OLS estimator is a linear estimator, which means that it is a linear function of the dependent variable, $y$:

$$\hat\beta = \left[(X'X)^{-1}X'\right]y = Cy$$

where $C$ is a function of the explanatory variables only, not the dependent variable. It is also unbiased under the present assumptions, as we proved above. One could consider other weights $W$ that are a function of $X$ that define some other linear estimator, $\tilde\beta = Wy$. If the estimator is unbiased, then we must have $WX = I_K$:

$$E(Wy) = E(WX\beta_0 + W\varepsilon) = WX\beta_0 = \beta_0 \Rightarrow WX = I_K$$

The variance of $\tilde\beta$ is

$$V(\tilde\beta) = WW'\sigma_0^2.$$

Define

$$D = W - (X'X)^{-1}X'$$

so

$$W = D + (X'X)^{-1}X'$$

Since $WX = I_K$, $DX = 0$, so

$$V(\tilde\beta) = \left(D + (X'X)^{-1}X'\right)\left(D + (X'X)^{-1}X'\right)'\sigma_0^2 = \left(DD' + (X'X)^{-1}\right)\sigma_0^2$$

So

$$V(\tilde\beta) \ge V(\hat\beta)$$

The inequality is a shorthand means of expressing, more formally, that $V(\tilde\beta) - V(\hat\beta)$ is a positive semi-definite matrix. This is a proof of the Gauss-Markov Theorem: the OLS estimator is the "best linear unbiased estimator" (BLUE).

- It is worth emphasizing again that we have not used the normality assumption in any way to prove the Gauss-Markov theorem, so it is valid if the errors are not normally distributed, as long as the other assumptions hold.

To illustrate the Gauss-Markov result, consider the estimator that results from splitting the sample into several equally-sized parts, estimating separately using each part, then averaging the resulting estimators. This estimator is linear and unbiased, but it is not OLS, so by the theorem it must be less efficient. A small Monte Carlo experiment illustrates this, comparing the OLS estimator and a 3-way split sample estimator. The data generating process follows the classical model, with n = 21 and a true slope of 2. [Figures 3.8 and 3.9: histograms of the OLS and split sample estimators.] The OLS estimator is more efficient, since the tails of its histogram are more narrow.

We have that $E(\hat\beta) = \beta$ and $\operatorname{Var}(\hat\beta) = (X'X)^{-1}\sigma_0^2$, but we still need to estimate the variance of $\varepsilon$, $\sigma_0^2$, in order to have an idea of the precision of the estimates of $\beta$. A commonly used estimator of $\sigma_0^2$ is

$$\widehat{\sigma_0^2} = \frac{1}{n-K}\hat\varepsilon'\hat\varepsilon$$

This estimator is unbiased:

$$\begin{aligned}
E(\widehat{\sigma_0^2}) &= \frac{1}{n-K}E(\hat\varepsilon'\hat\varepsilon)\\
&= \frac{1}{n-K}E(\varepsilon'M_X\varepsilon)\\
&= \frac{1}{n-K}E(\operatorname{Tr}\varepsilon'M_X\varepsilon)\\
&= \frac{1}{n-K}E(\operatorname{Tr}M_X\varepsilon\varepsilon')\\
&= \frac{1}{n-K}\operatorname{Tr}E(M_X\varepsilon\varepsilon')\\
&= \frac{1}{n-K}\sigma_0^2\operatorname{Tr}M_X\\
&= \frac{1}{n-K}\sigma_0^2(n-K)\\
&= \sigma_0^2
\end{aligned}$$

where we use the fact that $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$, and that $M_X$ is idempotent with trace $n-K$.
3.8 Example: The Nerlove model

3.8.1 Theoretical background

For a firm that takes input prices $w$ and the output level $q$ as given, the cost minimization problem is to choose the quantities of inputs $x$ to solve the problem

$$\min_x w'x \quad \text{subject to} \quad f(x) = q.$$

The solution is the vector of factor demands $x(w, q)$. The cost function is obtained by substituting the factor demands into the criterion function:

$$C(w, q) = w'x(w, q).$$

Monotonicity: increasing factor prices cannot decrease cost, so $\partial C(w, q)/\partial w \ge 0$. Remember that these derivatives give the conditional factor demands (Shephard's Lemma).

Homogeneity: the cost function is homogeneous of degree 1 in input prices: $C(tw, q) = tC(w, q)$, where $t$ is a scalar constant. This is because the factor demands are homogeneous of degree zero in factor prices: they only depend upon relative prices.

Returns to scale: the returns to scale parameter $\gamma$ is defined as the inverse of the elasticity of cost with respect to output:

$$\gamma = \left(\frac{\partial C(w, q)}{\partial q}\frac{q}{C(w, q)}\right)^{-1}$$

Constant returns to scale is the case where increasing production $q$ implies that cost increases in the proportion 1:1. If this is the case, then $\gamma = 1$.
3.8.2 Cobb-Douglas functional form

The Cobb-Douglas functional form is linear in the logarithms of the regressors and the dependent variable. For a cost function, if there are $g$ factors, the Cobb-Douglas cost function has the form

$$C = A w_1^{\beta_1}\cdots w_g^{\beta_g}q^{\beta_q}e^{\varepsilon}$$

What is the elasticity of $C$ with respect to $w_j$?

$$e^C_{w_j} = \left(\frac{\partial C}{\partial w_j}\right)\frac{w_j}{C} = \beta_j A w_1^{\beta_1}\cdots w_j^{\beta_j - 1}\cdots w_g^{\beta_g}q^{\beta_q}e^{\varepsilon}\,\frac{w_j}{A w_1^{\beta_1}\cdots w_g^{\beta_g}q^{\beta_q}e^{\varepsilon}} = \beta_j$$

This is one of the reasons the Cobb-Douglas form is popular: the coefficients are easy to interpret, since they are the elasticities of the dependent variable with respect to the explanatory variables. Note that in this case,

$$e^C_{w_j} = \left(\frac{\partial C}{\partial w_j}\right)\frac{w_j}{C} = x_j(w, q)\frac{w_j}{C} \equiv s_j(w, q),$$

the cost share of the $j^{th}$ input. So with a Cobb-Douglas cost function, $\beta_j = s_j(w, q)$: the cost shares are constants.

Note that after a logarithmic transformation we obtain

$$\ln C = \alpha + \beta_1\ln w_1 + \cdots + \beta_g\ln w_g + \beta_q\ln q + \varepsilon$$

where $\alpha = \ln A$. So we see that the transformed model is linear in the logs of the data.

One can verify that the property of homogeneity of degree 1 (HOD1) in input prices implies that

$$\sum_{i=1}^{g}\beta_i = 1,$$

in other words, the cost shares add up to 1. The hypothesis that the technology exhibits constant returns to scale (CRTS) implies that

$$\gamma = \frac{1}{\beta_q} = 1,$$

so $\beta_q = 1$. Likewise, monotonicity implies that the coefficients satisfy $\beta_i \ge 0,\ i = 1, \ldots, g$.
3.8.3 The Nerlove data and OLS

Nerlove's data on 145 electric utility companies contain observations on cost ($C$), output ($Q$) and the prices of labor ($P_L$), fuel ($P_F$) and capital ($P_K$). The Cobb-Douglas model to estimate is

$$\ln C = \beta_1 + \beta_2\ln Q + \beta_3\ln P_L + \beta_4\ln P_F + \beta_5\ln P_K + \varepsilon \tag{3.9}$$

using OLS. To do this yourself, you need the data file mentioned above, as well as Nerlove.m (the estimation program), and the library of Octave functions mentioned in the introduction.³ The results are

*********************************************************
OLS estimation results
Observations 145
R-squared 0.925955
Sigma-squared 0.153943

Results (Ordinary var-cov estimator)

            estimate    st.err.    t-stat.    p-value
constant    -3.527      1.774      -1.987     0.049
output       0.720      0.017      41.244     0.000
labor        0.436      0.291       1.499     0.136
fuel         0.427      0.100       4.249     0.000
capital     -0.220      0.339      -0.648     0.518
*********************************************************

³If you are running the bootable CD, you have all of this installed and ready to run.

While we will use Octave programs as examples in this document, since following the programming statements is a useful way of learning how theory is put into practice, you may be interested in a more user-friendly environment for doing econometrics. I heartily recommend Gretl, the Gnu Regression, Econometrics, and Time-Series Library. This is an easy to use program, available in English, French, and Spanish, and it comes with a lot of data ready to use. It even has an option to save output as LaTeX fragments, so that I can just include the results into this document, no muss, no fuss. Here are the results of the Nerlove model from GRETL:

              Coefficient    Std. Error    t-statistic    p-value
const         -3.5265        1.77437       -1.9875        0.0488
l_output       0.720394      0.0174664     41.2445        0.0000
l_labor        0.436341      0.291048       1.4992        0.1361
l_fuel         0.426517      0.100369       4.2495        0.0000
l_capital     -0.219888      0.339429      -0.6478        0.5182

Mean of dependent variable        1.72466
S.D. of dependent variable        1.42172
Sum of squared residuals         21.5520
Standard error of residuals       0.392356
Unadjusted R²                     0.925955
Adjusted R²                       0.923840
F(4, 140)                       437.686
Akaike information criterion    145.084
Schwarz Bayesian criterion      159.967

Fortunately, Gretl and my OLS program agree upon the results. Gretl is included in the bootable CD mentioned in the introduction. I recommend using GRETL to repeat the examples that are done using Octave.
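The core of the Octave calculation is short. A minimal sketch follows (this is not the book's Nerlove.m; the file name "nerlove.data" and the column layout [cost output labor fuel capital] are assumptions made for illustration - check the actual data file for its format):

    data = load("nerlove.data");              % hypothetical file name
    lnC = log(data(:,1)); lnQ = log(data(:,2));
    lnPL = log(data(:,3)); lnPF = log(data(:,4)); lnPK = log(data(:,5));
    n = rows(data);
    X = [ones(n,1) lnQ lnPL lnPF lnPK];
    betahat = X\lnC;                          % OLS estimates of eq. 3.9
    ehat = lnC - X*betahat;
    sig2hat = (ehat'*ehat)/(n - columns(X));  % estimate of sigma^2
    se = sqrt(diag(sig2hat*inv(X'*X)));       % ordinary var-cov estimator
    disp([betahat se betahat./se]);           % estimates, std. errors, t-stats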
The previous properties hold for finite sample sizes. Before considering the asymptotic properties of the OLS estimator it is useful to review the MLE estimator, since under the assumption of normal errors the two estimators coincide.

3.9 Exercises

1. Show that, if $X \sim N(\mu_x, \Sigma)$, then the linear transformation $AX + b$ is normally distributed with mean $A\mu_x + b$ and variance $A\Sigma A'$, where $A$ and $b$ are a conformable matrix and vector of constants.

2. Prove that $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$ for conformable matrices $A$ and $B$, where $\operatorname{Tr}$ is the trace operator.

3. For the simple linear regression model $y_t = \beta_1 + \beta_2 x_t + \varepsilon_t$, which satisfies the classical assumptions, prove that the variance of the OLS estimator declines to zero as the sample size increases.
Chapter 4

Maximum likelihood estimation

The maximum likelihood estimator is important since it is asymptotically efficient, as is shown below. For the classical linear model with normal errors, the ML and OLS estimators of $\beta$ are the same, so the following theory is presented without examples. In the second half of the course, nonlinear models with nonnormal errors are introduced, and examples may be found there.

4.1 The likelihood function

Suppose we have a sample of size $n$ of the random vectors $y$ and $z$. Suppose the joint density of $Y = (y_1\ \ldots\ y_n)$ and $Z = (z_1\ \ldots\ z_n)$ is characterized by a parameter vector $\psi_0$:

$$f_{YZ}(Y, Z, \psi_0).$$

This is the joint density of the sample. The likelihood function is just this joint density, evaluated at the sample data, considered as a function of the parameter:

$$L(Y, Z, \psi) = f(Y, Z, \psi),\ \psi \in \Psi,$$

where $\Psi$ is the parameter space. The maximum likelihood estimator of $\psi_0$ is the value of $\psi$ that maximizes the likelihood function.

Note that if the parameter can be partitioned as $\psi = (\theta'\ \rho')'$, where $\rho$ holds the parameters of the marginal density of $Z$ and shares no elements with $\theta$, then the maximizer of the conditional likelihood function $f_{Y|Z}(Y|Z, \theta)$ with respect to $\theta$ is the same as the maximizer of the overall likelihood function, for the elements of $\psi$ that correspond to $\theta$. In this case, the variables $Z$ are said to be exogenous for estimation of $\theta$, and we may more conveniently work with the conditional likelihood function $f_{Y|Z}(Y|Z, \theta)$.
If the $n$ observations are independent, the likelihood function can be written as

$$L(Y|Z, \theta) = \prod_{t=1}^{n} f(y_t|z_t, \theta)$$

where the $f_t$ are possibly of different form. If this is not possible, we can always factor the likelihood into contributions of observations, by using the fact that a joint density can be factored into the product of a marginal and a conditional (doing this iteratively):

$$L(Y, \theta) = f(y_1|z_1, \theta)f(y_2|y_1, z_2, \theta)f(y_3|y_1, y_2, z_3, \theta)\cdots f(y_n|y_1, y_2, \ldots, y_{n-1}, z_n, \theta)$$

To simplify notation, define $x_1 = z_1$, $x_2 = \{y_1, z_2\}$, etc., so that $x_t$ contains the conditioning variables of observation $t$. With this, the likelihood function can be written as

$$L(Y, \theta) = \prod_{t=1}^{n} f(y_t|x_t, \theta)$$

The criterion function can be defined as the average log-likelihood function:

$$s_n(\theta) = \frac{1}{n}\ln L(Y, \theta) = \frac{1}{n}\sum_{t=1}^{n}\ln f(y_t|x_t, \theta)$$

The maximum likelihood estimator may thus be defined equivalently as the maximizer of the average log-likelihood, since $\ln(\cdot)$ is a monotonic increasing function and dividing by $n$ has no effect on $\hat\theta$.
4.1.1 Example: Bernoulli trial

Suppose that we are flipping a coin that may be biased, so that the probability of a heads may not be 0.5. Let $y = 1(\text{heads})$ be a binary variable that indicates whether or not a heads is observed. The outcome of a toss is a Bernoulli random variable:

$$f_Y(y, p) = p^y(1-p)^{1-y}$$

and

$$\ln f_Y(y, p) = y\ln p + (1-y)\ln(1-p)$$

The derivative of this is

$$\frac{\partial\ln f_Y(y, p)}{\partial p} = \frac{y}{p} - \frac{(1-y)}{(1-p)} = \frac{y - p}{p(1-p)}$$

Averaging this over a sample of size $n$ gives

$$\frac{\partial s_n(p)}{\partial p} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i - p}{p(1-p)} \tag{4.1}$$

Setting to zero and solving gives

$$\hat p = \bar y$$

So it's easy to calculate the MLE of $p_0$ in this case.

Now imagine that we had a bag full of bent coins, each bent around a sphere of a different radius (with the head pointing to the outside of the sphere). We might suspect that the probability of a heads could depend upon the radius. Suppose that $p_i \equiv p(x_i, \beta) = \left(1 + \exp(-x_i'\beta)\right)^{-1}$, where $x_i = (1\ r_i)'$, so that $\beta$ is a 2×1 vector. Now

$$\frac{\partial p_i(\beta)}{\partial\beta} = p_i(1-p_i)\,x_i$$

so

$$\frac{\partial\ln f_Y(y, \beta)}{\partial\beta} = \frac{y - p_i}{p_i(1-p_i)}\,p_i(1-p_i)\,x_i = (y_i - p(x_i, \beta))\,x_i$$

So the derivative of the average log-likelihood function is now

$$\frac{\partial s_n(\beta)}{\partial\beta} = \frac{\sum_{i=1}^{n}(y_i - p(x_i, \beta))\,x_i}{n}$$

This is a set of two nonlinear equations in the two unknown elements in $\beta$. There is no explicit solution for the two elements that set the equations to zero. This is commonly the case with ML estimators: they are often nonlinear, and finding the value of the estimate often requires use of numeric methods to find solutions to the first order conditions. This possibility is explored further in the second half of these notes (see section 14.6).
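To give a flavor of what such numeric methods do, here is a minimal sketch that solves the first order conditions above by Newton's method, for simulated bent-coin data (the logistic form follows the text; the true parameter values and sample size are arbitrary choices):

    n = 200;
    r = rand(n,1)*2;                     % radii of the coins
    x = [ones(n,1) r];                   % x_i = (1, r_i)'
    beta0 = [0.5; -1];                   % true values (arbitrary)
    p = 1 ./ (1 + exp(-x*beta0));
    y = rand(n,1) < p;                   % Bernoulli(p_i) outcomes
    beta = zeros(2,1);                   % starting value
    for iter = 1:50
      p = 1 ./ (1 + exp(-x*beta));
      g = x'*(y - p)/n;                  % average score
      H = -x'*(x .* (p.*(1-p)))/n;       % Hessian of the average log-likelihood
      beta = beta - H\g;                 % Newton step
      if norm(g) < 1e-8, break; end
    end
    disp(beta')                          % should be near beta0 for large n

The second half of the notes uses more robust minimizers (such as BFGS), but the idea of iterating on the first order conditions is the same.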
4.2 Consistency of MLE

To show consistency of the MLE, we need to make explicit some assumptions.

Compact parameter space: $\theta \in \Theta$, an open bounded subset of $\Re^K$. Maximization is over $\bar\Theta$, which is compact.

Uniform convergence:

$$s_n(\theta) \xrightarrow{u.a.s.} \lim_{n\to\infty}E_{\theta_0}s_n(\theta) \equiv s_\infty(\theta, \theta_0),\ \forall\theta\in\bar\Theta.$$

We have suppressed $Y$ here for simplicity. This requires that almost sure convergence holds for all possible parameter values. For a given parameter value, an ordinary Law of Large Numbers will usually imply almost sure convergence to the limit of the expectation. Convergence for a single element of the parameter space, combined with the assumption of a compact parameter space, ensures uniform convergence.

Continuity: $s_n(\theta)$ is continuous in $\theta$, $\forall\theta\in\bar\Theta$. This implies that $s_\infty(\theta, \theta_0)$ is continuous in $\theta$.

Identification: $s_\infty(\theta, \theta_0)$ has a unique maximum in its first argument.

We will use these assumptions to show that $\hat\theta_n \xrightarrow{a.s.} \theta_0$. First, $\hat\theta_n$ certainly exists, since a continuous function has a maximum on a compact set. Second, for any $\theta \ne \theta_0$,

$$E\left(\ln\frac{L(\theta)}{L(\theta_0)}\right) \le \ln E\left(\frac{L(\theta)}{L(\theta_0)}\right)$$

by Jensen's inequality ($\ln(\cdot)$ is a concave function). Now, the expectation on the RHS is

$$E\left(\frac{L(\theta)}{L(\theta_0)}\right) = \int\frac{L(\theta)}{L(\theta_0)}L(\theta_0)\,dy = 1,$$

since $L(\theta_0)$ is the density function of the observations, and since the integral of any density is 1. Therefore, since $\ln(1) = 0$,

$$E\left(\ln\frac{L(\theta)}{L(\theta_0)}\right) \le 0,$$

or

$$s_\infty(\theta, \theta_0) - s_\infty(\theta_0, \theta_0) \le 0.$$

By the identification assumption there is a unique maximizer, so the inequality is strict if $\theta \ne \theta_0$:

$$s_\infty(\theta, \theta_0) - s_\infty(\theta_0, \theta_0) < 0,\ \forall\theta\ne\theta_0,\ a.s.$$

Suppose that $\theta^*$ is a limit point of $\hat\theta_n$ (any sequence from a compact set has at least one limit point). Since $\hat\theta_n$ is a maximizer, independent of $n$, we must have

$$s_\infty(\theta^*, \theta_0) - s_\infty(\theta_0, \theta_0) \ge 0.$$

These last two inequalities imply that $\theta^* = \theta_0$, a.s. Thus there is only one limit point, and it is equal to the true parameter value, with probability one. In other words,

$$\lim_{n\to\infty}\hat\theta = \theta_0,\ a.s.$$

This completes the proof of strong consistency of the MLE. One can use weaker assumptions to prove weak consistency (convergence in probability to $\theta_0$) of the MLE.
4.3 The score function

Assume that the log-likelihood is differentiable in a neighborhood $N(\theta_0)$ of $\theta_0$, at least when $n$ is large enough. Define the score vector (with dim $K \times 1$) as

$$g_n(Y, \theta) = D_\theta s_n(\theta) = \frac{1}{n}\sum_{t=1}^{n}D_\theta\ln f(y_t|x_t, \theta) \equiv \frac{1}{n}\sum_{t=1}^{n}g_t(\theta).$$

We drop $Y$ as an argument for clarity, but one should not forget that it is still there.

The ML estimator $\hat\theta$ sets the derivatives to zero:

$$g_n(\hat\theta) = \frac{1}{n}\sum_{t=1}^{n}g_t(\hat\theta) \equiv 0.$$

We will show that $E_\theta[g_t(\theta)] = 0,\ \forall t$. This is the expectation taken with respect to the density $f(\theta)$, not necessarily $f(\theta_0)$:

$$E_\theta[g_t(\theta)] = \int[D_\theta\ln f(y_t|x_t, \theta)]f(y_t|x_t, \theta)\,dy_t = \int\frac{1}{f(y_t|x_t, \theta)}[D_\theta f(y_t|x_t, \theta)]f(y_t|x_t, \theta)\,dy_t = \int D_\theta f(y_t|x_t, \theta)\,dy_t.$$

Given regularity conditions that allow switching the order of integration and differentiation, this is

$$E_\theta[g_t(\theta)] = D_\theta\int f(y_t|x_t, \theta)\,dy_t = D_\theta 1 = 0.$$

So $E_\theta(g_t(\theta)) = 0$: the expectation of the score vector is zero. This holds for all $t$, so it implies that $E_\theta[g_n(Y, \theta)] = 0$.
4.4 Asymptotic normality of MLE

Take a first order Taylor series expansion of $g(Y, \hat\theta)$ about the true value $\theta_0$:

$$0 \equiv g(\hat\theta) = g(\theta_0) + \left(D_{\theta'}g(\theta^*)\right)\left(\hat\theta - \theta_0\right)$$

or with appropriate definitions

$$H(\theta^*)\left(\hat\theta - \theta_0\right) = -g(\theta_0),$$

where $\theta^* = \lambda\hat\theta + (1-\lambda)\theta_0$, $0 < \lambda < 1$. Assume $H(\theta^*)$ is invertible (we'll justify this in a minute). So

$$\sqrt{n}\left(\hat\theta - \theta_0\right) = -H(\theta^*)^{-1}\sqrt{n}\,g(\theta_0)$$

Now consider $H(\theta^*)$. This is

$$H(\theta^*) = D_{\theta'}g(\theta^*) = D^2_\theta s_n(\theta^*) = \frac{1}{n}\sum_{t=1}^{n}D^2_\theta\ln f_t(\theta^*)$$

where the notation $D^2_\theta s_n(\theta)$ means $\frac{\partial^2 s_n(\theta)}{\partial\theta\,\partial\theta'}$. Given that this is an average of terms, it should usually be the case that this satisfies a strong law of large numbers (SLLN). Regularity conditions are a set of assumptions that guarantee that this will happen. For example, the $D^2_\theta\ln f_t(\theta^*)$ must not be too strongly dependent over time, and their variances must not become infinite. We don't assume any particular set here, since the appropriate assumptions will depend upon the particularities of a given model. However, we assume that a SLLN applies.

Also, since we know that $\hat\theta$ is consistent, and since $\theta^* = \lambda\hat\theta + (1-\lambda)\theta_0$, we have that $\theta^* \xrightarrow{a.s.} \theta_0$. Given this, and assuming $H(\theta)$ is continuous in $\theta$, we have

$$H(\theta^*) \xrightarrow{a.s.} \lim_{n\to\infty}E\left[D^2_\theta s_n(\theta_0)\right] = H_\infty(\theta_0) < \infty$$

Re-arranging orders of limits and differentiation, which is legitimate given regularity conditions, we get

$$H_\infty(\theta_0) = D^2_\theta\lim_{n\to\infty}E(s_n(\theta_0)) = D^2_\theta s_\infty(\theta_0, \theta_0)$$

We've already seen that $s_\infty(\theta, \theta_0) < s_\infty(\theta_0, \theta_0)$, i.e., $\theta_0$ maximizes the limiting objective function. Since there is a unique maximizer, and since $s_n(\theta)$ is twice continuously differentiable, $H_\infty(\theta_0)$ must be negative definite, and therefore of full rank. Therefore the previous inversion is justified, asymptotically, and we have

$$\sqrt{n}\left(\hat\theta - \theta_0\right) \xrightarrow{a.s.} -H_\infty(\theta_0)^{-1}\sqrt{n}\,g(\theta_0). \tag{4.2}$$

Now consider $\sqrt{n}\,g(\theta_0)$. This is

$$\sqrt{n}\,g_n(\theta_0) = \sqrt{n}\,D_\theta s_n(\theta_0) = \frac{\sqrt{n}}{n}\sum_{t=1}^{n}D_\theta\ln f_t(y_t|x_t, \theta_0) = \frac{1}{\sqrt{n}}\sum_{t=1}^{n}g_t(\theta_0)$$

We've already seen that $E_\theta[g_t(\theta)] = 0$. Note that $g_n(\theta_0) \xrightarrow{a.s.} 0$, by consistency; to avoid collapse to a degenerate (constant) random vector we scale by $\sqrt{n}$. A generic central limit theorem states that, for a random vector $X_n$ that satisfies certain conditions,

$$X_n - E(X_n) \xrightarrow{d} N(0, \lim V(X_n))$$

The "certain conditions" that $X_n$ must satisfy depend on the case at hand. Usually, $X_n$ will be of the form of an average, scaled by $\sqrt{n}$:

$$X_n = \sqrt{n}\,\frac{\sum_{t=1}^{n}X_t}{n}$$

This is the case for $\sqrt{n}\,g(\theta_0)$, for example. Then the properties of $X_n$ depend on the properties of the $X_t$. For example, if the $X_t$ have finite variances and are not too strongly dependent, then a CLT for dependent processes will apply. Supposing that a CLT applies, and noting that $E(\sqrt{n}\,g_n(\theta_0)) = 0$, we get

$$\mathcal{I}_\infty(\theta_0)^{-1/2}\sqrt{n}\,g_n(\theta_0) \xrightarrow{d} N[0, I_K]$$

where

$$\mathcal{I}_\infty(\theta_0) = \lim_{n\to\infty}E_{\theta_0}\left[n\,g_n(\theta_0)\,g_n(\theta_0)'\right] = \lim_{n\to\infty}V_{\theta_0}\left(\sqrt{n}\,g_n(\theta_0)\right)$$

This can also be written as

$$\sqrt{n}\,g_n(\theta_0) \xrightarrow{d} N[0, \mathcal{I}_\infty(\theta_0)] \tag{4.3}$$

$\mathcal{I}_\infty(\theta_0)$ is known as the information matrix. Combining this with equation 4.2, we get

$$\sqrt{n}\left(\hat\theta - \theta_0\right) \stackrel{a}{\sim} N\left[0, H_\infty(\theta_0)^{-1}\mathcal{I}_\infty(\theta_0)H_\infty(\theta_0)^{-1}\right]. \tag{4.4}$$

The MLE estimator is asymptotically normally distributed.

Definition (CAN): an estimator $\hat\theta$ of a parameter $\theta_0$ is $\sqrt{n}$-consistent and asymptotically normally distributed if

$$\sqrt{n}\left(\hat\theta - \theta_0\right) \xrightarrow{d} N(0, V_\infty)$$

where $V_\infty$ is a finite positive definite matrix. There do exist, in special cases, estimators that are consistent such that $\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{p} 0$. These are known as superconsistent estimators, since normally $\sqrt{n}$ is the highest factor that we can multiply by and still get convergence to a stable limiting distribution.

Definition (Asymptotic unbiasedness): an estimator $\hat\theta$ of a parameter $\theta_0$ is asymptotically unbiased if

$$\lim_{n\to\infty}E_\theta(\hat\theta) = \theta. \tag{4.5}$$
Estimators that are CAN are asymptotically unbiased, though not all consistent estimators are asymptotically unbiased. Such cases are unusual, though.

4.4.1 Coin flipping, again

In section 4.1.1 we saw that the MLE for the parameter of a Bernoulli trial, with i.i.d. data, is the sample mean: $\hat p = \bar y$ (equation 4.1). Now let's find the limiting variance of $\sqrt{n}(\hat p - p)$:

$$\begin{aligned}
\lim\operatorname{Var}\sqrt{n}(\hat p - p) &= \lim n\operatorname{Var}(\hat p - p)\\
&= \lim n\operatorname{Var}(\hat p)\\
&= \lim n\operatorname{Var}(\bar y)\\
&= \lim n\operatorname{Var}\left(\frac{\sum_t y_t}{n}\right)\\
&= \lim\frac{1}{n}\sum_t\operatorname{Var}(y_t)\quad\text{(by independence of obs.)}\\
&= \lim\frac{1}{n}\,n\operatorname{Var}(y)\quad\text{(by identically distributed obs.)}\\
&= p(1-p)
\end{aligned}$$
4.5 The information matrix equality

We will show that $H_\infty(\theta) = -\mathcal{I}_\infty(\theta)$. Let $f_t(\theta)$ be short for $f(y_t|x_t, \theta)$. We have

$$1 = \int f_t(\theta)\,dy,$$

so

$$0 = \int D_\theta f_t(\theta)\,dy = \int(D_\theta\ln f_t(\theta))f_t(\theta)\,dy$$

Now differentiate again:

$$0 = \int\left[D^2_\theta\ln f_t(\theta)\right]f_t(\theta)\,dy + \int[D_\theta\ln f_t(\theta)]\,[D_{\theta'}\ln f_t(\theta)]\,f_t(\theta)\,dy = E_\theta\left[D^2_\theta\ln f_t(\theta)\right] + E_\theta\left[g_t(\theta)\,g_t(\theta)'\right] \tag{4.6}$$

Now sum over $n$ and multiply by $\frac{1}{n}$:

$$E_\theta\left[\frac{1}{n}\sum_{t=1}^{n}H_t(\theta)\right] = -E_\theta\left[\frac{1}{n}\sum_{t=1}^{n}g_t(\theta)\,g_t(\theta)'\right]$$

The scores $g_t$ and $g_s$ are uncorrelated for $t \ne s$, since for $t > s$, $f(y_t|y_1, \ldots, y_{t-1}, \theta)$ has conditioned on prior information, so what was random in $s$ is fixed in $t$. (This forms the basis for a specification test proposed by White: if the scores appear to be correlated one may question the specification of the model.) This allows us to write, since all cross products between different periods expect to zero,

$$E_\theta[H(\theta)] = -E_\theta\left[n\,g(\theta)\,g(\theta)'\right]$$

Finally taking limits, we get

$$H_\infty(\theta) = -\mathcal{I}_\infty(\theta). \tag{4.7}$$

This holds for all $\theta$, in particular, for $\theta_0$. Using this,

$$\sqrt{n}\left(\hat\theta - \theta_0\right) \stackrel{a.s.}{\to} N\left[0, H_\infty(\theta_0)^{-1}\mathcal{I}_\infty(\theta_0)H_\infty(\theta_0)^{-1}\right]$$

simplifies to

$$\sqrt{n}\left(\hat\theta - \theta_0\right) \stackrel{a.s.}{\to} N\left[0, \mathcal{I}_\infty(\theta_0)^{-1}\right] \tag{4.8}$$

To estimate the asymptotic variance, we need estimators of $H_\infty(\theta_0)$ and $\mathcal{I}_\infty(\theta_0)$. We can use the outer product of the gradient,

$$\widehat{\mathcal{I}_\infty(\theta_0)} = \frac{1}{n}\sum_{t=1}^{n}g_t(\hat\theta)\,g_t(\hat\theta)'$$

and $\widehat{H_\infty(\theta_0)} = H(\hat\theta)$. Note, one can't use

$$\widehat{\mathcal{I}_\infty(\theta_0)} = n\left[g_n(\hat\theta)\right]\left[g_n(\hat\theta)\right]'$$

to estimate the information matrix, since $g_n(\hat\theta) \equiv 0$ by the first order conditions.

From this we see that there are alternative ways to estimate $V_\infty(\theta_0)$ that are all valid. These include

$$\widehat{V_\infty(\theta_0)} = -\widehat{H_\infty(\theta_0)}^{-1}$$
$$\widehat{V_\infty(\theta_0)} = \widehat{\mathcal{I}_\infty(\theta_0)}^{-1}$$
$$\widehat{V_\infty(\theta_0)} = \widehat{H_\infty(\theta_0)}^{-1}\,\widehat{\mathcal{I}_\infty(\theta_0)}\,\widehat{H_\infty(\theta_0)}^{-1}$$

These are known as the inverse Hessian, outer product of the gradient (OPG) and sandwich estimators, respectively. The sandwich form is the most robust, since it coincides with the covariance estimator of the quasi-ML estimator.
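A minimal sketch of the three estimators for the Bernoulli example of section 4.1.1, where $g_t(p) = (y_t - p)/(p(1-p))$ and $H_t(p) = -y_t/p^2 - (1-y_t)/(1-p)^2$ (the true $p_0$ and sample size are arbitrary choices; for a correctly specified model the three estimates should be close for large $n$, and all should be near $p(1-p)$):

    n = 1000; p0 = 0.3;
    y = rand(n,1) < p0;
    phat = mean(y);                            % the MLE
    g = (y - phat) / (phat*(1-phat));          % per-observation scores
    H = mean(-y/phat^2 - (1-y)/(1-phat)^2);    % average Hessian
    Ihat = mean(g.^2);                         % OPG estimate of the information
    Vinvhess = -1/H;                           % inverse Hessian form
    Vopg = 1/Ihat;                             % OPG form
    Vsand = (1/H) * Ihat * (1/H);              % sandwich form
    printf("invHess %f, OPG %f, sandwich %f, p(1-p) = %f\n", ...
           Vinvhess, Vopg, Vsand, p0*(1-p0));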
4.6 The Cramér-Rao lower bound

Theorem 3 (Cramér-Rao Lower Bound): The limiting variance of a CAN estimator of $\theta_0$, say $\tilde\theta$, minus the inverse of the information matrix is a positive semidefinite matrix.

Proof: Since the estimator is CAN, it is asymptotically unbiased, so

$$\lim_{n\to\infty}E_\theta(\tilde\theta - \theta) = 0$$

Differentiate wrt $\theta'$:

$$D_{\theta'}\lim_{n\to\infty}E_\theta(\tilde\theta - \theta) = \lim_{n\to\infty}\int D_{\theta'}\left[f(Y, \theta)\left(\tilde\theta - \theta\right)\right]dy = 0\quad\text{(this is a }K\times K\text{ matrix of zeros).}$$

Noting that $D_{\theta'}f(Y, \theta) = f(\theta)D_{\theta'}\ln f(\theta)$, we can write

$$\lim_{n\to\infty}\int\left(\tilde\theta - \theta\right)f(\theta)D_{\theta'}\ln f(\theta)\,dy + \lim_{n\to\infty}\int f(Y, \theta)D_{\theta'}\left(\tilde\theta - \theta\right)dy = 0.$$

Now note that $D_{\theta'}(\tilde\theta - \theta) = -I_K$, and $\int f(Y, \theta)(-I_K)\,dy = -I_K$, so

$$\lim_{n\to\infty}\int\left(\tilde\theta - \theta\right)f(\theta)D_{\theta'}\ln f(\theta)\,dy = I_K.$$

Playing with powers of $n$ we get

$$\lim_{n\to\infty}\int\sqrt{n}\left(\tilde\theta - \theta\right)\frac{1}{\sqrt{n}}\,n\left[\frac{1}{n}D_{\theta'}\ln f(\theta)\right]f(\theta)\,dy = I_K$$

Note that the bracketed part is just the transpose of the score vector, $g(\theta)$, so we can write

$$\lim_{n\to\infty}E_\theta\left[\sqrt{n}\left(\tilde\theta - \theta\right)\sqrt{n}\,g(\theta)'\right] = I_K$$

This means that the covariance of the score function with $\sqrt{n}(\tilde\theta - \theta)$, for any CAN estimator, is an identity matrix. Using this, suppose the variance of $\sqrt{n}(\tilde\theta - \theta)$ tends to $V_\infty(\tilde\theta)$. Therefore,

$$V_\infty\begin{bmatrix}\sqrt{n}\left(\tilde\theta - \theta\right)\\ \sqrt{n}\,g(\theta)\end{bmatrix} = \begin{bmatrix}V_\infty(\tilde\theta) & I_K\\ I_K & \mathcal{I}_\infty(\theta)\end{bmatrix}. \tag{4.9}$$

Since this is a covariance matrix, it is positive semi-definite. Therefore, for any $K$-vector $\alpha$,

$$\begin{bmatrix}\alpha' & -\alpha'\mathcal{I}_\infty(\theta)^{-1}\end{bmatrix}\begin{bmatrix}V_\infty(\tilde\theta) & I_K\\ I_K & \mathcal{I}_\infty(\theta)\end{bmatrix}\begin{bmatrix}\alpha\\ -\mathcal{I}_\infty(\theta)^{-1}\alpha\end{bmatrix} \ge 0.$$

This simplifies to

$$\alpha'\left[V_\infty(\tilde\theta) - \mathcal{I}_\infty^{-1}(\theta)\right]\alpha \ge 0.$$

Since $\alpha$ is arbitrary, $V_\infty(\tilde\theta) - \mathcal{I}_\infty^{-1}(\theta)$ is positive semidefinite. This concludes the proof. This means that $\mathcal{I}_\infty^{-1}(\theta)$ is a lower bound for the asymptotic variance of a CAN estimator.

Definition (Asymptotic efficiency): Given two CAN estimators of a parameter $\theta_0$, say $\tilde\theta$ and $\hat\theta$, $\hat\theta$ is asymptotically efficient with respect to $\tilde\theta$ if $V_\infty(\tilde\theta) - V_\infty(\hat\theta)$ is a positive semidefinite matrix.

A direct proof of asymptotic efficiency of an estimator is infeasible, but if one can show that the asymptotic variance is equal to the inverse of the information matrix, then the estimator is asymptotically efficient. In particular, the MLE is asymptotically efficient with respect to any other CAN estimator.

Summary of MLE:

- Consistent
- Asymptotically normal (CAN)
- Asymptotically efficient
- Asymptotically unbiased

This is for general MLE: we haven't specified the distribution or the linearity/nonlinearity of the estimator.
4.7 Exercises

1. Consider coin tossing with a single possibly biased coin. The density function for the random variable $y = 1(\text{heads})$ is $f_Y(y, p_0) = p_0^y(1-p_0)^{1-y}$, $y \in \{0, 1\}$. Suppose that we have a sample of size $n$. We know that the ML estimator is $\hat p_0 = \bar y$, and from the theory above,

$$\sqrt{n}\left(\bar y - p_0\right) \stackrel{a}{\sim} N\left[0, H_\infty(p_0)^{-1}\mathcal{I}_\infty(p_0)H_\infty(p_0)^{-1}\right]$$

a) find the analytic expression for $g_t(\theta)$ and show that $E_\theta[g_t(\theta)] = 0$
b) find the analytical expressions for $H_\infty(p_0)$ and $\mathcal{I}_\infty(p_0)$ for this problem
c) verify that the resulting asymptotic variance agrees with the limiting variance $p(1-p)$ found in section 4.4.1
d) Write an Octave program that does a Monte Carlo study that shows that $\sqrt{n}(\bar y - p_0)$ is approximately normally distributed when $n$ is large. Please give me histograms that show the sampling frequency of $\sqrt{n}(\bar y - p_0)$ for several values of $n$.

2. Consider the model $y_t = x_t'\beta + \alpha\varepsilon_t$ where the errors follow the Cauchy (Student-t with 1 degree of freedom) density:

$$f(\varepsilon_t) = \frac{1}{\pi\left(1 + \varepsilon_t^2\right)},\ -\infty < \varepsilon_t < \infty$$

The Cauchy density has a shape similar to a normal density, but with much thicker tails. Thus, extremely small and large errors occur much more frequently with this density than with the normal density. Find the score function $g_n(\theta)$, where $\theta = (\beta'\ \alpha)'$.

3. Consider the classical linear regression model $y_t = x_t'\beta + \varepsilon_t$, where $\varepsilon_t \sim IIN(0, \sigma^2)$. Find the score function $g_n(\theta)$, where $\theta = (\beta'\ \sigma)'$.

4. Compare the first order conditions that define the ML estimators of problems 2 and 3 and interpret the differences. Why are the first order conditions that define an efficient estimator different in the two cases?

5. Edit (to see what it does) then run the script mle_example.m. Interpret the results.
Chapter 5

Asymptotic properties of the least squares estimator

The OLS estimator under the classical assumptions is BLUE¹, for all sample sizes. Now let's see what happens when the sample size tends to infinity.

¹BLUE ≡ best linear unbiased estimator.

5.1 Consistency

$$\hat\beta = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta_0 + \varepsilon) = \beta_0 + (X'X)^{-1}X'\varepsilon = \beta_0 + \left(\frac{X'X}{n}\right)^{-1}\frac{X'\varepsilon}{n}$$

Consider the last two terms. By assumption $\lim_{n\to\infty}\frac{X'X}{n} = Q_X$, which implies $\lim_{n\to\infty}\left(\frac{X'X}{n}\right)^{-1} = Q_X^{-1}$, since the inverse of a nonsingular matrix is a continuous function of the elements of the matrix. Considering $\frac{X'\varepsilon}{n}$,

$$\frac{X'\varepsilon}{n} = \frac{1}{n}\sum_{t=1}^{n}x_t\varepsilon_t$$

Each $x_t\varepsilon_t$ has expectation zero, and variance

$$V(x_t\varepsilon_t) = x_tx_t'\sigma^2.$$

As long as these are finite, and given a technical condition², the Kolmogorov SLLN applies, so

$$\frac{1}{n}\sum_{t=1}^{n}x_t\varepsilon_t \xrightarrow{a.s.} 0.$$

Therefore $\hat\beta \xrightarrow{a.s.} \beta_0$. This is the property of strong consistency: the estimator converges almost surely to the true value. Note that this result does not require normally distributed errors.

²For application of LLN's and CLT's, of which there are very many to choose from, I'm going to avoid the technicalities. Basically, as long as terms that make up an average have finite variances and are not too strongly dependent, one will be able to find a LLN or CLT to apply. Which one it is doesn't matter, we only need the result.
5.2 Asymptotic normality

If the error distribution is unknown, we of course don't know the distribution of the estimator in small samples. However, we can obtain an asymptotic approximation:

$$\hat\beta = \beta_0 + (X'X)^{-1}X'\varepsilon \Rightarrow \hat\beta - \beta_0 = (X'X)^{-1}X'\varepsilon \Rightarrow \sqrt{n}\left(\hat\beta - \beta_0\right) = \left(\frac{X'X}{n}\right)^{-1}\frac{X'\varepsilon}{\sqrt{n}}$$

Now as before, $\left(\frac{X'X}{n}\right)^{-1} \to Q_X^{-1}$. Considering $\frac{X'\varepsilon}{\sqrt{n}}$, the limit of the variance is

$$\lim_{n\to\infty}V\left(\frac{X'\varepsilon}{\sqrt{n}}\right) = \lim_{n\to\infty}E\left(\frac{X'\varepsilon\varepsilon'X}{n}\right) = \sigma_0^2Q_X$$

The mean is of course zero. To get asymptotic normality, we need to apply a CLT. We assume one (for instance, the Lindeberg-Feller CLT) holds, so

$$\frac{X'\varepsilon}{\sqrt{n}} \xrightarrow{d} N\left(0, \sigma_0^2Q_X\right)$$

Therefore,

$$\sqrt{n}\left(\hat\beta - \beta_0\right) \xrightarrow{d} N\left(0, \sigma_0^2Q_X^{-1}\right)$$

In summary, the OLS estimator is normally distributed in small and large samples if $\varepsilon$ is normally distributed. If $\varepsilon$ is not normally distributed, $\hat\beta$ is asymptotically normally distributed when a CLT can be applied.

5.3 Asymptotic efficiency

The least squares objective function is

$$s(\beta) = \sum_{t=1}^{n}\left(y_t - x_t'\beta\right)^2$$

Supposing that $\varepsilon$ is normally distributed, the model is $y = X\beta_0 + \varepsilon$, $\varepsilon \sim N(0, \sigma_0^2I_n)$, so the density of $\varepsilon$ is

$$f(\varepsilon) = \prod_{t=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{\varepsilon_t^2}{2\sigma^2}\right)$$

The joint density for $y$ can be constructed using a change of variables. We have $\varepsilon = y - X\beta$, so $\frac{\partial\varepsilon}{\partial y'} = I_n$ and $\left|\frac{\partial\varepsilon}{\partial y'}\right| = 1$, so

$$f(y) = \prod_{t=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y_t - x_t'\beta)^2}{2\sigma^2}\right).$$

Taking logs,

$$\ln L(\beta, \sigma) = -n\ln\sqrt{2\pi} - n\ln\sigma - \sum_{t=1}^{n}\frac{(y_t - x_t'\beta)^2}{2\sigma^2}.$$

The first order conditions for $\beta$ are the same as the fonc for OLS (up to multiplication by a constant), so the estimators are the same, under the present assumptions. Therefore, their properties are the same. In particular, under the classical assumptions with normality, the OLS estimator is asymptotically efficient.

As we'll see later, it will be possible to use (iterated) linear estimation methods and still achieve asymptotic efficiency even if the assumption that $\operatorname{Var}(\varepsilon) \ne \sigma^2I_n$ holds, as long as $\varepsilon$ is still normally distributed. This is not the case if $\varepsilon$ is nonnormal: then it will be necessary to use nonlinear estimation methods to achieve asymptotically efficient estimation. That possibility is addressed in the second half of the notes.
5.4 Exercises

1. Write an Octave program that generates a histogram for $R$ Monte Carlo replications of $\sqrt{n}\left(\hat\beta_j - \beta_j\right)$, where $\hat\beta$ is the OLS estimator and $\beta_j$ is one of the slope parameters. $R$ should be a large number, at least 1000. The model used to generate data should follow the classical assumptions, except that the errors should not be normally distributed (try, for example, uniform or centered chi-square errors). Generate histograms for several values of $n$. Do you observe evidence of asymptotic normality?
Chapter 6

Restrictions and hypothesis tests

6.1 Exact linear restrictions

In many cases, economic theory suggests restrictions on the parameters of a model. For example, a demand function is supposed to be homogeneous of degree zero in prices and income. If we have a Cobb-Douglas (log-linear) model,

$$\ln q = \beta_0 + \beta_1\ln p_1 + \beta_2\ln p_2 + \beta_3\ln m + \varepsilon,$$

then we need that

$$k^0\ln q = \beta_0 + \beta_1\ln kp_1 + \beta_2\ln kp_2 + \beta_3\ln km + \varepsilon,$$

so

$$\beta_1\ln p_1 + \beta_2\ln p_2 + \beta_3\ln m = \beta_1\ln kp_1 + \beta_2\ln kp_2 + \beta_3\ln km = (\ln k)(\beta_1 + \beta_2 + \beta_3) + \beta_1\ln p_1 + \beta_2\ln p_2 + \beta_3\ln m.$$

The only way to guarantee this for arbitrary $k$ is to set

$$\beta_1 + \beta_2 + \beta_3 = 0,$$

which is a linear parameter restriction.

6.1.1 Imposition

The general formulation of linear equality restrictions is the model

$$y = X\beta + \varepsilon$$
$$R\beta = r$$

where $R$ is a $Q \times K$ matrix, $Q < K$, and $r$ is a $Q \times 1$ vector of constants.
We assume $R$ is of rank $Q$, so that there are no redundant restrictions, and that there exists a $\beta$ that satisfies the restrictions: they aren't infeasible.

The most obvious approach to estimation subject to the restrictions is to set up the Lagrangean

$$\min_\beta s(\beta) = \frac{1}{n}(y - X\beta)'(y - X\beta) + 2\lambda'(R\beta - r).$$

The Lagrange multipliers are scaled by 2, which makes things less messy. The fonc are

$$D_\beta s(\hat\beta, \hat\lambda) = -2X'y + 2X'X\hat\beta_R + 2R'\hat\lambda \equiv 0$$
$$D_\lambda s(\hat\beta, \hat\lambda) = R\hat\beta_R - r \equiv 0,$$

which can be written as

$$\begin{bmatrix}X'X & R'\\ R & 0\end{bmatrix}\begin{bmatrix}\hat\beta_R\\ \hat\lambda\end{bmatrix} = \begin{bmatrix}X'y\\ r\end{bmatrix}.$$

We get

$$\begin{bmatrix}\hat\beta_R\\ \hat\lambda\end{bmatrix} = \begin{bmatrix}X'X & R'\\ R & 0\end{bmatrix}^{-1}\begin{bmatrix}X'y\\ r\end{bmatrix}.$$

Maybe you're curious about how to invert a partitioned matrix? I can help you with that. Note that

$$\begin{bmatrix}(X'X)^{-1} & 0\\ -R(X'X)^{-1} & I_Q\end{bmatrix}\begin{bmatrix}X'X & R'\\ R & 0\end{bmatrix} \equiv AB = \begin{bmatrix}I_K & (X'X)^{-1}R'\\ 0 & -R(X'X)^{-1}R'\end{bmatrix} \equiv \begin{bmatrix}I_K & (X'X)^{-1}R'\\ 0 & -P\end{bmatrix} \equiv C,$$

where we define $P \equiv R(X'X)^{-1}R'$, and

$$\begin{bmatrix}I_K & (X'X)^{-1}R'P^{-1}\\ 0 & -P^{-1}\end{bmatrix}\begin{bmatrix}I_K & (X'X)^{-1}R'\\ 0 & -P\end{bmatrix} \equiv DC = I_{K+Q},$$

so

$$DAB = I_{K+Q} \Rightarrow DA = B^{-1}$$

$$B^{-1} = \begin{bmatrix}I_K & (X'X)^{-1}R'P^{-1}\\ 0 & -P^{-1}\end{bmatrix}\begin{bmatrix}(X'X)^{-1} & 0\\ -R(X'X)^{-1} & I_Q\end{bmatrix} = \begin{bmatrix}(X'X)^{-1} - (X'X)^{-1}R'P^{-1}R(X'X)^{-1} & (X'X)^{-1}R'P^{-1}\\ P^{-1}R(X'X)^{-1} & -P^{-1}\end{bmatrix}.$$

If you weren't curious about that, please start paying attention again. Using this inverse, the restricted estimator and the multipliers are

$$\begin{bmatrix}\hat\beta_R\\ \hat\lambda\end{bmatrix} = \begin{bmatrix}\hat\beta - (X'X)^{-1}R'P^{-1}\left(R\hat\beta - r\right)\\ P^{-1}\left(R\hat\beta - r\right)\end{bmatrix} = \begin{bmatrix}\left(I_K - (X'X)^{-1}R'P^{-1}R\right)\hat\beta + (X'X)^{-1}R'P^{-1}r\\ P^{-1}R\hat\beta - P^{-1}r\end{bmatrix}$$

The fact that $\hat\beta_R$ and $\hat\lambda$ are linear functions of $\hat\beta$ makes it easy to determine their distributions, since the distribution of $\hat\beta$ is already known. Recall that for $x$ a random vector, and for $A$ and $b$ a matrix and vector of constants, respectively, $\operatorname{Var}(Ax + b) = A\operatorname{Var}(x)A'$.

Though this is the obvious way to go about finding the restricted estimator, an easier way, if the number of restrictions is small, is to impose them by substitution. Write

$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon$$

$$\begin{bmatrix}R_1 & R_2\end{bmatrix}\begin{bmatrix}\beta_1\\ \beta_2\end{bmatrix} = r$$

where $R_1$ is $Q \times Q$ nonsingular. Supposing the restrictions are linearly independent, one can always make $R_1$ nonsingular by reorganizing the columns of $X$. Then

$$\beta_1 = R_1^{-1}r - R_1^{-1}R_2\beta_2.$$

Substitute this into the model:

$$y = X_1R_1^{-1}r - X_1R_1^{-1}R_2\beta_2 + X_2\beta_2 + \varepsilon$$
$$y - X_1R_1^{-1}r = \left(X_2 - X_1R_1^{-1}R_2\right)\beta_2 + \varepsilon$$

or with the appropriate definitions,

$$y_R = X_R\beta_2 + \varepsilon.$$

This model satisfies the classical assumptions, supposing the restriction is true. One can estimate it by OLS. The variance of $\hat\beta_2$ is as before

$$V(\hat\beta_2) = \left(X_R'X_R\right)^{-1}\sigma_0^2$$

and the estimator of the variance is

$$\hat V(\hat\beta_2) = \left(X_R'X_R\right)^{-1}\widehat{\sigma_0^2},$$

where one estimates $\sigma_0^2$ in the normal way, using the restricted model, i.e.,

$$\widehat{\sigma_0^2} = \frac{\left(y_R - X_R\hat\beta_2\right)'\left(y_R - X_R\hat\beta_2\right)}{n - (K - Q)}$$

To recover $\hat\beta_1$, use the restriction. To find the variance of $\hat\beta_1$, use the fact that it is a linear function of $\hat\beta_2$, so

$$V(\hat\beta_1) = R_1^{-1}R_2V(\hat\beta_2)R_2'\left(R_1^{-1}\right)' = R_1^{-1}R_2\left(X_2'X_2\right)^{-1}R_2'\left(R_1^{-1}\right)'\sigma_0^2$$

6.1.2 Properties of the restricted estimator

We have that

$$\hat\beta_R = \hat\beta - (X'X)^{-1}R'P^{-1}\left(R\hat\beta - r\right) = \hat\beta + (X'X)^{-1}R'P^{-1}\left[r - R\beta\right] - (X'X)^{-1}R'P^{-1}R(X'X)^{-1}X'\varepsilon + (X'X)^{-1}X'\varepsilon - \hat\beta + \beta$$

so the mean squared error is

$$MSE(\hat\beta_R) = E\left[\left(\hat\beta_R - \beta\right)\left(\hat\beta_R - \beta\right)'\right]$$

Working through the expectation (the cross terms involving $\varepsilon$ expect to zero), we obtain

$$MSE(\hat\beta_R) = (X'X)^{-1}\sigma^2 + (X'X)^{-1}R'P^{-1}\left[r - R\beta\right]\left[r - R\beta\right]'P^{-1}R(X'X)^{-1} - (X'X)^{-1}R'P^{-1}R(X'X)^{-1}\sigma^2$$

So, the first term is the OLS covariance. The second term is PSD, and the third term is NSD.

- If the restriction is true, the second term is 0, so we are better off: true restrictions improve efficiency of estimation.
- If the restriction is false, we may be better or worse off, in terms of MSE, depending on the magnitudes of $r - R\beta$ and $\sigma^2$.
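A minimal sketch of the restricted estimator via the Lagrangean solution derived above (the restriction $\beta_2 + \beta_3 = 1$ is an arbitrary illustration):

    n = 100; K = 3;
    X = [ones(n,1) randn(n,K-1)];
    y = X*[1; 0.6; 0.4] + randn(n,1);        % DGP satisfies the restriction
    R = [0 1 1]; r = 1;                      % R*beta = r
    betahat = X\y;                           % unrestricted OLS
    P = R*inv(X'*X)*R';
    lambda = P\(R*betahat - r);              % Lagrange multiplier
    betaR = betahat - inv(X'*X)*R'*lambda;   % restricted estimator
    disp(betaR'); disp(R*betaR - r);         % restriction holds exactly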
6.2 Testing

In many cases, one wishes to test economic theories. If theory suggests parameter restrictions, as in the above homogeneity example, one can test theory by testing parameter restrictions. A number of tests are available.

6.2.1 t-test

Suppose one has the model

$$y = X\beta + \varepsilon$$

and one wishes to test the single restriction $H_0: R\beta = r$ vs. $H_A: R\beta \ne r$. Under $H_0$, with normality of the errors,

$$R\hat\beta - r \sim N\left(0, R(X'X)^{-1}R'\sigma_0^2\right)$$

so

$$\frac{R\hat\beta - r}{\sqrt{R(X'X)^{-1}R'\sigma_0^2}} \sim N(0, 1).$$

The problem is that $\sigma_0^2$ is not observable. One could use the consistent estimator $\widehat{\sigma_0^2}$ in place of $\sigma_0^2$, but the test would only be valid asymptotically in this case.

Proposition 4. Under the classical assumptions with normality,

$$\frac{R\hat\beta - r}{\sqrt{\widehat{\sigma_0^2}\,R(X'X)^{-1}R'}} \sim t(n - K). \tag{6.1}$$

We need a few results on the $\chi^2$ and $t$ distributions to prove this.

Proposition. If $x \sim N(\mu, I_n)$ then

$$x'x \sim \chi^2(n, \lambda) \tag{6.2}$$

where $\lambda = \mu'\mu = \sum_i\mu_i^2$ is the noncentrality parameter. When a $\chi^2$ r.v. has the noncentrality parameter equal to zero, it is referred to as a central $\chi^2$ r.v., and its distribution is written as $\chi^2(n)$.

Proposition. If the $n$-dimensional random vector $x \sim N(0, V)$, then

$$x'V^{-1}x \sim \chi^2(n). \tag{6.3}$$

Proof: Factor $V^{-1}$ as $P'P$ (this can always be done, for example by Cholesky factorization). Then consider $y = Px$. We have $y \sim N(0, PVP')$, but

$$VP'P = I_n \Rightarrow PVP'P = P \Rightarrow PVP' = I_n$$

and thus $y \sim N(0, I_n)$. Thus $y'y \sim \chi^2(n)$, but

$$y'y = x'P'Px = x'V^{-1}x$$

and we get the result we wanted.

A more general proposition which implies this result is

Proposition 8. If the random vector (of dimension $n$) $x \sim N(0, I)$, and $B$ is idempotent with rank $r$, then

$$x'Bx \sim \chi^2(r). \tag{6.4}$$

Consider the random variable

$$\frac{\hat\varepsilon'\hat\varepsilon}{\sigma_0^2} = \frac{\varepsilon'M_X\varepsilon}{\sigma_0^2} = \left(\frac{\varepsilon}{\sigma_0}\right)'M_X\left(\frac{\varepsilon}{\sigma_0}\right) \sim \chi^2(n - K)$$

Proposition 9. If the random vector (of dimension $n$) $x \sim N(0, I)$, then $Ax$ and $x'Bx$ are independent if $AB = 0$.

Now consider (remember that we have only one restriction in this case)

$$\frac{\dfrac{R\hat\beta - r}{\sigma_0\sqrt{R(X'X)^{-1}R'}}}{\sqrt{\dfrac{\hat\varepsilon'\hat\varepsilon}{(n - K)\sigma_0^2}}} = \frac{R\hat\beta - r}{\sqrt{\widehat{\sigma_0^2}\,R(X'X)^{-1}R'}}$$
This is the ratio of a $N(0,1)$ r.v. and the square root of an independent central $\chi^2(n-K)$ r.v., divided by its degrees of freedom. The numerator and denominator are independent because $\hat\beta = \beta + (X'X)^{-1}X'\varepsilon$ and

$$(X'X)^{-1}X'M_X = 0,$$

so by Proposition 9,

$$\frac{R\hat\beta - r}{\sqrt{\widehat{\sigma_0^2}\,R(X'X)^{-1}R'}} = \frac{R\hat\beta - r}{\hat\sigma_{R\hat\beta}} \sim t(n - K)$$

In particular, for the commonly encountered test of significance of an individual coefficient, for which $H_0: \beta_i = 0$ vs. $H_A: \beta_i \ne 0$, the test statistic is

$$\frac{\hat\beta_i}{\hat\sigma_{\hat\beta_i}} \sim t(n - K)$$

- Note: the $t$-test is strictly valid only if the errors are actually normally distributed. If one has nonnormal errors, one could use the above asymptotic result to justify taking critical values from the $N(0, 1)$ distribution, since $t(n - K) \xrightarrow{d} N(0, 1)$ as $n \to \infty$.

6.2.2 F test

The F test allows testing several restrictions jointly.

Proposition. If $x \sim \chi^2(r)$ and $y \sim \chi^2(s)$ are independent, then

$$\frac{x/r}{y/s} \sim F(r, s).$$

Proposition. If the random vector (of dimension $n$) $x \sim N(0, I)$, then $x'Ax$ and $x'Bx$ are independent if $AB = 0$. (6.5)

Using these results, and previous results on the $\chi^2$ distribution, it is simple to show that the following statistic has the $F$ distribution:

$$F = \frac{\left(R\hat\beta - r\right)'\left(R(X'X)^{-1}R'\right)^{-1}\left(R\hat\beta - r\right)}{q\,\widehat{\sigma_0^2}} \sim F(q, n - K).$$

A numerically equivalent expression is

$$\frac{(ESS_R - ESS_U)/q}{ESS_U/(n - K)} \sim F(q, n - K).$$

- Note: the F test is strictly valid only if the errors are truly normally distributed. The following tests will be appropriate when one cannot assume normally distributed errors.
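A minimal sketch computing both statistics for simulated data (the hypothesis $\beta_2 = 0$ and the DGP are arbitrary illustrations; for a single restriction, $F = t^2$):

    n = 50;
    X = [ones(n,1) randn(n,2)]; K = columns(X);
    y = X*[1; 0; 0.5] + randn(n,1);          % H0: beta_2 = 0 is true here
    R = [0 1 0]; r = 0; q = 1;
    betahat = X\y; ehat = y - X*betahat;
    sig2hat = (ehat'*ehat)/(n-K);
    tstat = (R*betahat - r) / sqrt(sig2hat * R*inv(X'*X)*R');
    XR = X(:, [1 3]);                        % restricted model drops column 2
    eR = y - XR*(XR\y);
    F = ((eR'*eR - ehat'*ehat)/q) / (ehat'*ehat/(n-K));
    printf("t = %f, F = %f, t^2 = %f\n", tstat, F, tstat^2);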
6.2.3 Wald-type tests

The Wald principle is based on the idea that if a restriction is true, the unrestricted estimator should approximately satisfy it. Given that

$$\sqrt{n}\left(\hat\beta - \beta_0\right) \xrightarrow{d} N\left(0, \sigma_0^2Q_X^{-1}\right)$$

then under $H_0: R\beta_0 = r$, we have

$$\sqrt{n}\left(R\hat\beta - r\right) \xrightarrow{d} N\left(0, \sigma_0^2RQ_X^{-1}R'\right)$$

so by the result on quadratic forms in equation 6.3,

$$n\left(R\hat\beta - r\right)'\left(\sigma_0^2RQ_X^{-1}R'\right)^{-1}\left(R\hat\beta - r\right) \xrightarrow{d} \chi^2(q)$$

Note that $Q_X^{-1}$ and $\sigma_0^2$ are not observable. The test statistic we use substitutes the consistent estimators. Using $(X'X/n)^{-1}$ as the consistent estimator of $Q_X^{-1}$, there is a cancellation of $n$'s, and the statistic to use is

$$\left(R\hat\beta - r\right)'\left(\widehat{\sigma_0^2}\,R(X'X)^{-1}R'\right)^{-1}\left(R\hat\beta - r\right) \xrightarrow{d} \chi^2(q)$$

- The Wald test is a simple way to test restrictions without having to estimate the restricted model.
- Note that this formula is similar to one of the formulae provided for the F test.
6.2.4 Score-type tests (Lagrange multiplier tests)

In some cases, estimation of the unrestricted model is difficult while the restricted model is easy to estimate. For example, a model of the form $y = \varphi(X\beta, \gamma) + \varepsilon$ may be nonlinear in $\beta$ and $\gamma$, but linear in $\beta$ under $H_0: \gamma = 1$. Estimation of the unrestricted model is then a bit more complicated, so one might prefer to have a test based upon the restricted, linear model. The score test is useful in this situation.

Score-type tests are based upon the general principle that the gradient vector of the unrestricted model, evaluated at the restricted estimate, should be asymptotically normally distributed with mean zero, if the restrictions are true. The original development was for ML estimation, but the principle is valid for a wide variety of estimation methods.

We have seen that

$$\hat\lambda = \left(R(X'X)^{-1}R'\right)^{-1}\left(R\hat\beta - r\right) = P^{-1}\left(R\hat\beta - r\right)$$

so

$$\sqrt{n}P\hat\lambda = \sqrt{n}\left(R\hat\beta - r\right)$$

Given that

$$\sqrt{n}\left(R\hat\beta - r\right) \xrightarrow{d} N\left(0, \sigma_0^2RQ_X^{-1}R'\right)$$

under the null hypothesis, we obtain

$$\sqrt{n}P\hat\lambda \xrightarrow{d} N\left(0, \sigma_0^2RQ_X^{-1}R'\right)$$

So

$$\left(\sqrt{n}P\hat\lambda\right)'\left(\sigma_0^2RQ_X^{-1}R'\right)^{-1}\left(\sqrt{n}P\hat\lambda\right) \xrightarrow{d} \chi^2(q)$$

Noting that $\lim nP = RQ_X^{-1}R'$, we obtain

$$\hat\lambda'\,\frac{R(X'X)^{-1}R'}{\sigma_0^2}\,\hat\lambda \xrightarrow{d} \chi^2(q)$$

since the powers of $n$ cancel. To get a usable test statistic substitute a consistent estimator of $\sigma_0^2$.

This makes it clear why the test is sometimes referred to as a Lagrange multiplier test. It may seem that one needs the actual Lagrange multipliers to calculate this. If we impose the restrictions by substitution, these are not available. Note that the test can be written as

$$\frac{\left(R'\hat\lambda\right)'(X'X)^{-1}R'\hat\lambda}{\sigma_0^2} \xrightarrow{d} \chi^2(q)$$

However, we can use the fonc for the restricted estimator,

$$-X'y + X'X\hat\beta_R + R'\hat\lambda = 0,$$

to get that

$$R'\hat\lambda = X'\left(y - X\hat\beta_R\right) = X'\hat\varepsilon_R$$

Substituting this into the above, we get

$$\frac{\hat\varepsilon_R'X(X'X)^{-1}X'\hat\varepsilon_R}{\sigma_0^2} \xrightarrow{d} \chi^2(q),$$

but this is simply

$$\frac{\hat\varepsilon_R'\,P_X\,\hat\varepsilon_R}{\sigma_0^2} \xrightarrow{d} \chi^2(q).$$

To see why the test is also known as a score test, note that the fonc for restricted least squares,

$$-X'y + X'X\hat\beta_R + R'\hat\lambda = 0,$$

give us

$$R'\hat\lambda = X'y - X'X\hat\beta_R$$

and the rhs is simply the gradient (score) of the unrestricted model, evaluated at the restricted estimator. The scores evaluated at the unrestricted estimate are identically zero. The logic behind the score test is that the scores evaluated at the restricted estimate should be approximately zero, if the restriction is true. The test is also known as a Rao test, since P. Rao first proposed it in 1948.
6.2.5 Likelihood ratio-type tests

The likelihood ratio statistic is

$$LR = 2\left(\ln L(\hat\theta) - \ln L(\tilde\theta)\right)$$

where $\hat\theta$ is the unrestricted estimate and $\tilde\theta$ is the restricted estimate. To show that it is asymptotically $\chi^2$, take a second order Taylor series expansion of $\ln L(\tilde\theta)$ about $\hat\theta$:

$$\ln L(\tilde\theta) \simeq \ln L(\hat\theta) + \frac{n}{2}\left(\tilde\theta - \hat\theta\right)'H(\hat\theta)\left(\tilde\theta - \hat\theta\right)$$

(note, the first order term drops out since $D_\theta\ln L(\hat\theta) \equiv 0$ by the fonc, and we need to multiply the second order term by $n$ since $H(\theta)$ is defined in terms of $\frac{1}{n}\ln L(\theta)$), so

$$LR \simeq -n\left(\tilde\theta - \hat\theta\right)'H(\hat\theta)\left(\tilde\theta - \hat\theta\right)$$

As $n \to \infty$, $H(\hat\theta) \to H_\infty(\theta_0) = -\mathcal{I}(\theta_0)$, so

$$LR \stackrel{a}{=} n\left(\tilde\theta - \hat\theta\right)'\mathcal{I}_\infty(\theta_0)\left(\tilde\theta - \hat\theta\right)$$

We also have that, from the theory on the asymptotic normality of the MLE (equation 4.2 and the information matrix equality),

$$\sqrt{n}\left(\hat\theta - \theta_0\right) \stackrel{a}{=} \mathcal{I}_\infty(\theta_0)^{-1}n^{1/2}g(\theta_0).$$

An analogous result for the restricted estimator is (this is unproven here; to prove this set up the Lagrangean for MLE subject to $R\theta = r$, and manipulate the first order conditions):

$$\sqrt{n}\left(\tilde\theta - \theta_0\right) \stackrel{a}{=} \mathcal{I}_\infty(\theta_0)^{-1}\left(I_n - R'\left(R\mathcal{I}_\infty(\theta_0)^{-1}R'\right)^{-1}R\mathcal{I}_\infty(\theta_0)^{-1}\right)n^{1/2}g(\theta_0).$$

Combining the last two equations,

$$\sqrt{n}\left(\tilde\theta - \hat\theta\right) \stackrel{a}{=} -n^{1/2}\mathcal{I}_\infty(\theta_0)^{-1}R'\left(R\mathcal{I}_\infty(\theta_0)^{-1}R'\right)^{-1}R\mathcal{I}_\infty(\theta_0)^{-1}g(\theta_0)$$

so, substituting into the expression for $LR$,

$$LR \stackrel{a}{=} \left[n^{1/2}g(\theta_0)'\mathcal{I}_\infty(\theta_0)^{-1}R'\right]\left[R\mathcal{I}_\infty(\theta_0)^{-1}R'\right]^{-1}\left[R\mathcal{I}_\infty(\theta_0)^{-1}n^{1/2}g(\theta_0)\right]$$

But since

$$n^{1/2}g(\theta_0) \xrightarrow{d} N\left(0, \mathcal{I}_\infty(\theta_0)\right)$$

the linear function

$$R\mathcal{I}_\infty(\theta_0)^{-1}n^{1/2}g(\theta_0) \xrightarrow{d} N\left(0, R\mathcal{I}_\infty(\theta_0)^{-1}R'\right).$$

We can see that $LR$ is a quadratic form of this rv, with the inverse of its variance in the middle, so

$$LR \xrightarrow{d} \chi^2(q).$$
6.3 The asymptotic equivalence of the LR, Wald and score tests

We have seen that the three tests all converge to $\chi^2$ random variables. In fact, they all converge to the same $\chi^2$ rv, under the null hypothesis. We'll show that the Wald and LR tests are asymptotically equivalent. We have seen that the Wald test is asymptotically equivalent to

$$W \stackrel{a}{=} n\left(R\hat\beta - r\right)'\left(\sigma_0^2RQ_X^{-1}R'\right)^{-1}\left(R\hat\beta - r\right) \xrightarrow{d} \chi^2(q)$$

Using

$$\hat\beta - \beta_0 = (X'X)^{-1}X'\varepsilon$$

and

$$R\hat\beta - r = R\left(\hat\beta - \beta_0\right)$$

we get

$$\sqrt{n}R\left(\hat\beta - \beta_0\right) = \sqrt{n}R(X'X)^{-1}X'\varepsilon = R\left(\frac{X'X}{n}\right)^{-1}n^{-1/2}X'\varepsilon$$

Substitute this into the Wald statistic to get

$$W \stackrel{a}{=} n^{-1}\varepsilon'XQ_X^{-1}R'\left(\sigma_0^2RQ_X^{-1}R'\right)^{-1}RQ_X^{-1}X'\varepsilon \stackrel{a}{=} \frac{\varepsilon'X(X'X)^{-1}R'\left(R(X'X)^{-1}R'\right)^{-1}R(X'X)^{-1}X'\varepsilon}{\sigma_0^2} \stackrel{a}{=} \frac{\varepsilon'P_R\varepsilon}{\sigma_0^2}$$

where $P_R$ is the projection matrix formed by the matrix $X(X'X)^{-1}R'$:

$$P_R = A(A'A)^{-1}A',\quad A = X(X'X)^{-1}R'.$$

Note that this matrix is idempotent, and that $A$ has $q$ columns, so the projection matrix has rank $q$.

Now consider the likelihood ratio statistic

$$LR \stackrel{a}{=} n^{1/2}g(\theta_0)'\mathcal{I}(\theta_0)^{-1}R'\left(R\mathcal{I}(\theta_0)^{-1}R'\right)^{-1}R\mathcal{I}(\theta_0)^{-1}n^{1/2}g(\theta_0)$$

Under normality, we have seen that the log-likelihood function is

$$\ln L(\beta, \sigma) = -n\ln\sqrt{2\pi} - n\ln\sigma - \frac{1}{2}\frac{(y - X\beta)'(y - X\beta)}{\sigma^2}.$$

Using this,

$$g(\beta_0) \equiv D_\beta\frac{1}{n}\ln L(\beta, \sigma) = \frac{X'(y - X\beta_0)}{n\sigma^2} = \frac{X'\varepsilon}{n\sigma^2}$$

Also, by the information matrix equality:

$$\mathcal{I}(\beta_0) = -H_\infty(\beta_0) = \lim -D_{\beta'}g(\beta_0) = \lim -D_{\beta'}\frac{X'(y - X\beta_0)}{n\sigma^2} = \lim\frac{X'X}{n\sigma^2} = \frac{Q_X}{\sigma^2}$$

so

$$\mathcal{I}(\theta_0)^{-1} = \sigma^2Q_X^{-1}$$

Substituting these last expressions into the formula for $LR$, we get

$$LR \stackrel{a}{=} \frac{\varepsilon'X(X'X)^{-1}R'\left(R(X'X)^{-1}R'\right)^{-1}R(X'X)^{-1}X'\varepsilon}{\sigma_0^2} \stackrel{a}{=} \frac{\varepsilon'P_R\varepsilon}{\sigma_0^2} \stackrel{a}{=} W$$

This completes the proof that the Wald and LR tests are asymptotically equivalent. Similarly, one can show that, under the null hypothesis,

$$qF \stackrel{a}{=} W \stackrel{a}{=} LM \stackrel{a}{=} LR$$

- The $qF$ statistic can be thought of as a pseudo-LR statistic, in that it uses the value of the objective functions of the restricted and unrestricted models, but it doesn't require distributional assumptions.
- The presentation of the score and Wald tests has been done in the context of the linear model. This is readily generalizable to nonlinear models and/or other estimation methods.

6.4 Interpretation of test statistics

Though the tests are asymptotically equivalent, they are numerically different in small samples. The numeric values of the tests also depend upon how $\sigma^2$ is estimated, and we've already seen that there are several ways to do this. For example all of the following are consistent for $\sigma^2$ under $H_0$:

$$\frac{\hat\varepsilon'\hat\varepsilon}{n - k},\quad \frac{\hat\varepsilon'\hat\varepsilon}{n},\quad \frac{\hat\varepsilon_R'\hat\varepsilon_R}{n - k + q},\quad \frac{\hat\varepsilon_R'\hat\varepsilon_R}{n}$$

and in general, the denominator can be replaced with any quantity $a$ such that $\lim a/n = 1$.

It can be shown, for linear regression models subject to linear restrictions, and if $\frac{\hat\varepsilon'\hat\varepsilon}{n}$ is used to calculate the Wald test and $\frac{\hat\varepsilon_R'\hat\varepsilon_R}{n}$ is used for the score test, that

$$W > LR > LM.$$

For this reason, the Wald test will always reject if the LR test rejects, and in turn the LR test rejects if the LM test rejects. This is a bit problematic: there is the possibility that by careful choice of the statistic used, one can manipulate reported results to favor or disfavor a hypothesis. A conservative/honest approach would be to report all three test statistics when they are available. In the case of linear models with normal errors the F test is to be preferred, since asymptotic approximations are not an issue.
6.5 Confidence intervals

Confidence intervals for single coefficients are generated in the normal manner. Given the $t$ statistic

$$t(\beta) = \frac{\hat\beta - \beta}{\hat\sigma_{\hat\beta}},$$

a $100(1-\alpha)\%$ confidence interval for $\beta_0$ is defined by the bounds of the set of $\beta$ such that $t(\beta)$ does not reject $H_0: \beta_0 = \beta$, using a $\alpha$ significance level:

$$C_\alpha(\hat\beta) = \left\{\beta : -c_{\alpha/2} < \frac{\hat\beta - \beta}{\hat\sigma_{\hat\beta}} < c_{\alpha/2}\right\}$$

The set of such $\beta$ is the interval $\hat\beta \pm \hat\sigma_{\hat\beta}\,c_{\alpha/2}$.

A confidence ellipse for two coefficients jointly would be, analogously, the set of $\{\beta_1, \beta_2\}$ such that the $F$ (or some other test statistic) doesn't reject at the specified critical value. [Figure 6.1: joint and individual confidence regions.]

- The region is an ellipse, since the CI for an individual coefficient defines an (infinitely long) rectangle with total prob. mass $1-\alpha$, since the other coefficient is marginalized (e.g., can take on any value). Since the ellipse is bounded in both dimensions but also contains mass $1-\alpha$, it must extend beyond the bounds of the individual CI.
- From the picture we can see that:
  - Rejection of hypotheses individually does not imply that the joint test will reject.
  - Joint rejection does not imply individual tests will reject.
6.6 Bootstrapping

When we rely on asymptotic theory to use the normal distribution-based tests and confidence intervals, we're often at serious risk of making important errors. If the sample size is small and errors are highly nonnormal, the small sample distribution of $\sqrt{n}(\hat\beta - \beta_0)$ may be very different than its large sample distribution. Also, the distributions of test statistics may not resemble their limiting distributions at all. A means of trying to gain information on the small sample distribution of test statistics and estimators is the bootstrap. We'll consider a simple example, just to get the main idea.

Suppose that $y = X\beta_0 + \varepsilon$, $\varepsilon \sim \mathrm{IID}(0, \sigma_0^2)$, and $X$ is nonstochastic. Given that the distribution of $\varepsilon$ is unknown, the distribution of $\hat\beta$ will be unknown in small samples. However, since we have random sampling, we could generate artificial data. The steps are:

1. Draw $n$ observations from $\hat\varepsilon$ with replacement. Call this vector $\tilde\varepsilon^j$ (it's a $n \times 1$ vector).
2. Then generate the data by $\tilde y^j = X\hat\beta + \tilde\varepsilon^j$.
3. Now take this and estimate $\tilde\beta^j = (X'X)^{-1}X'\tilde y^j$.
4. Save $\tilde\beta^j$.
5. Repeat steps 1-4, until we have a large number, $J$, of $\tilde\beta^j$.

With this, we can use the replications to calculate the empirical distribution of $\tilde\beta^j$. One way to form a $100(1-\alpha)\%$ confidence interval for $\beta_0$ would be to order the $\tilde\beta^j$ from smallest to largest, and drop the first and last $J\alpha/2$ of the replications, and use the remaining endpoints as the limits of the CI. Note that this will not give the shortest CI if the empirical distribution is skewed.

- Suppose one was interested in the distribution of some function of $\hat\beta$, for example a test statistic. Simple: just calculate the transformation for each $j$, and work with the empirical distribution of the transformation.
- If the assumption of iid errors is too strong (for example if there is heteroscedasticity or autocorrelation, see below) one can work with a bootstrap defined by sampling from $(y, x)$ pairs with replacement.
- How to choose $J$: $J$ should be large enough that the results don't change with repetition of the entire bootstrap. This is easy to check.
- The bootstrap is based fundamentally on the idea that the empirical distribution of the sample data converges to the actual sampling distribution as $n$ becomes large, so statistics based on sampling from the empirical distribution should converge in distribution to statistics based on sampling from the actual sampling distribution.
- Bootstrapping can be used to test hypotheses: use the bootstrap to get an approximation to the empirical distribution of the test statistic under the alternative hypothesis, and use this to get critical values. Compare the test statistic calculated using the real data, under the null, to the bootstrap critical values.
r(0 ) = 0.
where
r()
is a
q -ve tor
valued fun tion. Write the derivative of the restri tion evaluated at
as
D r() = R()
0 ,
so that
(R()) = q
in a neighborhood of
0 .
r()
about
= r(0 ) + R( )( 0 )
r()
where
is a onvex ombination of
and
0 .
= R( )( 0 )
r()
Due to
onsisten
y of
we an repla e
by
=
nr()
0 ,
asymptoti ally, so
nR(0 )( 0 )
0 :
76
CHAPTER 6.
n( 0 ).
d
2
nr()
N 0, R(0 )Q1
X R(0 ) 0 .
1
r()
2 (q)
1
X)1 R()
R()(X
r()
r()
c2
0, QX
and
02 ,
the resulting
2 (q)
Sin
e this is a Wald test, it will tend to over-reje
t in nite samples. The s
ore and LR
tests are also possibilities, but they require estimation methods for nonlinear models,
whi
h aren't in the s
ope of this
ourse.
Note that this also gives a onvenient way to estimate nonlinear fun tions and asso iated
d
2
r(0 )
n r()
N 0, R(0 )Q1
X R(0 ) 0
(x) =
where
f (x)
is
x
f (x)
x
f (x)
y = x + .
The elasti
ities of
w.r.t.
are
(x) =
x
x
(note that this is the entire ve tor of elasti ities). The estimated elasti ities are
b(x) =
x
x
6.8.
77
To al ulate the estimated standard errors of all ve elasti ites, use
R() =
(x)
x1 0 0
.
0 x
.
.
2
.
..
..
.
0
0 0 xk
x.
1 x21
2 x22
.
.
.
..
(x )2
0
.
.
.
k x2k
In many
ases, nonlinear restri
tions
an also involve the data, not just the parameters.
For example,
onsider a model of expenditure shares. Let
is pri es and
si (p, m) =
goods is
pi xi (p, m)
, i = 1, 2, ..., G.
m
Now demand must be positive, and we assume that expenditures sum to in
ome, so we have
the restri
tions
G
X
0 si (p, m) 1, i
si (p, m)
i=1
i
si (p, m) = 1i + p pi + mm
+ i
It is fairly easy to write restri
tions su
h that the shares sum to one, but the restri
tion that
the shares lie in the
[0, 1]
0 si (p, m) 1
and
and
m.
m.
It
In su h
ases, one might onsider whether or not a linear model is a reasonable spe i ation.
*********************************************************
OLS estimation results
Observations 145
78
CHAPTER 6.
R-squared 0.925955
Sigma-squared 0.153943
Results (Ordinary var-
ov estimator)
onstant
output
labor
fuel
apital
estimate
-3.527
0.720
0.436
0.427
-0.220
st.err.
1.774
0.017
0.291
0.100
0.339
t-stat.
-1.987
41.244
1.499
4.249
-0.648
p-value
0.049
0.000
0.136
0.000
0.518
*********************************************************
Note that
sK = K < 0,
and that
L + F + K 6= 1.
L + F + K = 1.
Q = 1,
jointly. NerloveRestri
tions.m imposes and tests CRTS and then HOD1. From it we obtain
the results that follow:
onstant
output
labor
fuel
apital
estimate
-4.691
0.721
0.593
0.414
-0.007
st.err.
0.891
0.018
0.206
0.100
0.192
t-stat.
-5.263
41.040
2.878
4.159
-0.038
p-value
0.000
0.000
0.005
0.000
0.969
*******************************************************
Value
p-value
F
0.574
0.450
Wald
0.594
0.441
LR
0.593
0.441
S
ore
0.592
0.442
6.8.
79
onstant
output
labor
fuel
apital
st.err.
2.966
0.000
0.489
0.167
0.572
t-stat.
-2.539
Inf
0.040
4.289
0.132
p-value
0.012
0.000
0.968
0.000
0.895
*******************************************************
Value
p-value
F
256.262
0.000
Wald
265.414
0.000
LR
150.863
0.000
S
ore
93.771
0.000
Noti e that the input pri e oe ients in fa t sum to 1 when HOD1 is imposed. HOD1 is
e.g., = 0.10).
Also,
R2
the restri tion is imposed, ompared to the unrestri ted results. For CRTS, you should note
Q = 1 is
2
reje
ted by the test statisti
s at all reasonable signi
an
e levels. Note that R drops quite a
that
Q = 1,
bit when imposing CRTS. If you look at the unrestri
ted estimation results, you
an see that
a t-test for
Q = 1
From the point of view of neo
lassi
al e
onomi
theory, these results are not anomalous:
HOD1 is an impli
ation of the theory, but CRTS is not.
Exer ise 12 Modify the NerloveRestri tions.m program to impose and test the restri tions
jointly.
Sin e CRTS is reje ted, let's examine the possibilities more arefully. Re all
that the data is sorted by output (the third
olumn). Dene 5 subsamples of rms, with the
rst group being the 29 rms with the lowest output levels, then the next 29 rms, et
. The
ve subsamples
an be indexed by
j = 1, 2, ..., 5,
where
j = 1
for
t = 1, 2, ...29, j = 2
for
ln Ct = 1j + 2j ln Qt + 3j ln PLt + 4j ln PF t + 5j ln PKt + t
(6.6)
80
CHAPTER 6.
where
is a supers ript (not a power) that ini ates that the oe ients may be dierent
y1
X1 0
y2 0
.. ..
. = .
y5
0
where
y1
j is the
is 291,
29 1
X1
is 295,
X2
X3
X4
is the
51
.
+
.
5
5
X5
j th
(6.7)
subsample, and
th subsample.
ve
tor of errors for the j
The O
tave program Restri
tions/ChowTest.m estimates the above model. It also tests the
hypothesis that the ve subsamples share the same parameter ve
tor, or in other words, that
there is
oe
ient stability a
ross the ve subsamples. The null to test is that the parameter
ve
tors for the separate groups are all the same, that is,
1 = 2 = 3 = 4 = 5
This type of test, that parameters are
onstant a
ross dierent sets of data, is sometimes
referred to as a
Chow test.
There are 20 restri tions. If that's not lear to you, look at the O tave program.
The restri tions are reje ted at all onventional signi an e levels.
Sin
e the restri
tions are reje
ted, we should probably use the unrestri
ted model for analysis.
What is the pattern of RTS as a fun
tion of the output group (small to large)? Figure 6.2 plots
RTS. We
an see that there is in
reasing RTS for small rms, but that RTS is approximately
onstant for large rms.
ln Ci = 1j + 2j ln Qi + 3 ln PLi + 4 ln PF i + 5 ln PKi + i
(a) estimate this model by OLS, giving
(6.8)
statisti
s for tests of signi
an
e, and the asso
iated p-values. Interpret the results
in detail.
6.9.
81
EXERCISES
2.2
1.8
1.6
1.4
1.2
1
1
1.5
2.5
3.5
4.5
(b) Test the restri
tions implied by this model (relative to the model that lets all
oe
ients vary a
ross groups) using the F, qF, Wald, s
ore and likelihood ratio
tests. Comment on the results.
(
) Estimate this model but imposing the HOD1 restri
tion,
using an OLS
estimation
program. Don't use m
_olsr or any other restri
ted OLS estimation program. Give
estimated standard errors for all
oe
ients.
(d) Plot the estimated RTS parameters as a fun
tion of rm size. Compare the plot to
that given in the notes for the unrestri
ted model. Comment on the results.
2. For the simple Nerlove model, estimated returns to s
ale is
[
RT
S =
1
cq .
Apply the
delta method to al ulate the estimated standard error for estimated RTS. Dire tly test
3. Perform a Monte Carlo study that generates data from the model
y = 2 + 1x2 + 1x3 +
where the sample size is 30,
and
x2
and
x3
[0, 1]
IIN (0, 1)
(a) Compare the means and standard errors of the estimated
oe
ients using OLS
and restri
ted OLS, imposing the restri
tion that
2 + 3 = 2.
(b) Compare the means and standard errors of the estimated
oe
ients using OLS
and restri
ted OLS, imposing the restri
tion that
2 + 3 = 1.
82
CHAPTER 6.
Chapter 7
Generalized least squares
One of the assumptions we've made up to now is that
t IID(0, 2 ),
or o
asionally
t IIN (0, 2 ).
Now we'll investigate the
onsequen
es of nonidenti
ally and/or dependently distributed errors.
We'll assume xed regressors for now, relaxing this admittedly unrealisti
assumption later.
The model is
y = X +
E() = 0
V () =
where
in pla e of
to simplify
has the same number on the main diagonal but nonzero elements
o the main diagonal gives identi
ally (assuming higher moments are also the same)
dependently distributed errors. This is known as
auto orrelation.
The general ase ombines heteros edasti ity and auto orrelation.
This is known as
nonspheri
al disturban
es, though why this term is used, I have no idea. Perhaps it's
be
ause under the
lassi
al assumptions, a joint
onden
e region for
dimensional hypersphere.
83
would be an
84
CHAPTER 7.
= (X X)1 X y
= + (X X)1 X
The varian e of
is
i
h
E ( )( ) = E (X X)1 X X(X X)1
= (X X)1 X X(X X)1
(7.1)
isn't
any
for the
If
t,
2 ,
it doesn't exist as a feature of the true d.g.p. In parti ular, the formulas
F, 2 based tests given above do not lead to statisti s with these distributions.
is still onsistent, following exa tly the same argument given before.
testing hypotheses.
we still have
n
=
n(X X)1 X
1
XX
=
n1/2 X
n
Dene the limiting varian
e of
n1/2 X
lim E
so we obtain
Summary:
X X
n
d
1
n N 0, Q1
X QX
unbiased in the same ir umstan es in whi h the estimator is unbiased with iid errors
has a dierent varian e than before, so the previous test statisti s aren't valid
is onsistent
7.2.
85
is asymptoti
ally normally distributed, but with a dierent limiting
ovarian
e matrix.
Previous test statisti
s aren't valid in this
ase for this reason.
is ine
ient, as is shown below.
P P = 1
Here,
P P = In
so
P P P = P ,
whi
h implies that
P P = In
Consider the model
P y = P X + P ,
or, making the obvious denitions,
y = X + .
This varian
e of
= P
is
E(P P ) = P P
= In
Therefore, the model
y = X +
E( ) = 0
V ( ) = In
satises the
lassi
al assumptions.
formed model:
GLS
= (X X )1 X y
= (X P P X)1 X P P y
= (X 1 X)1 X 1 y
86
CHAPTER 7.
The GLS estimator is unbiased in the same
ir
umstan
es under whi
h the OLS estimator
is unbiased. For example, assuming
is nonsto hasti
E(GLS ) = E (X 1 X)1 X 1 y
= E (X 1 X)1 X 1 (X +
= .
GLS
an be al ulated using
= (X X )1 X y
= (X X )1 X (X + )
= + (X X )1 X
so
GLS GLS
E
= E (X X )1 X X (X X )1
= (X X )1 X X (X X )1
= (X X )1
= (X 1 X)1
Either of these last formulas
an be used.
All the previous results regarding the desirable properties of the least squares estimator
hold, when dealing with the transformed model, sin
e the transformed model satises
the
lassi
al assumptions..
an set it to
1.
in pla e of
X.
The GLS estimator is more e
ient than the OLS estimator. This is a
onsequen
e of
the Gauss-Markov theorem, sin
e the GLS estimator is based on a model that satises
the
lassi
al assumptions but the OLS estimator is not. To see this dire
tly, not that
= AA
where
i
h
A = (X X)1 X (X 1 X)1 X 1 .
AA
AA
is a quadrati form in a
As one an verify by al ulating fon , the GLS estimator is the solution to the mini-
7.3.
87
FEASIBLE GLS
mization problem
metri 1
The
: it's an
unique elements.
nn
matrix with
n2 n /2 + n = n2 + n /2
and in
reases faster than
n.
There's no way to devise an estimator that satises a LLN without adding restri tions.
form of
Suppose that we
and
where
may in lude
as well as
= (X, )
where
as long as
If we repla e
(X, )
we an onsistently estimate
p
b = (X, )
(X, )
b
,
The FGLS estimator shares the same asymptoti
properties as GLS. These are
1. Consisten
y
2. Asymptoti
normality
3. Asymptoti
e
ien
y
if
().
b = (X, )
1 ).
Pb = Chol(
P y = P X + P
5. Estimate using OLS on the transformed model.
88
CHAPTER 7.
E( ) =
is a diagonal matrix, so that the errors are un
orrelated, but have dierent varian
es. Heteros
edasti
ity is usually thought of as asso
iated with
ross se
tional data, though there is
absolutely no reason why time series data
annot also be heteros
edasti
. A
tually, the popular ARCH (autoregressive
onditionally heteros
edasti
) models expli
itly assume that a time
series is heteros
edasti
.
Consider a supply fun
tion
qi = 1 + p Pi + s Si + i
where
Pi
is pri e and
Si
ith
rm.
unobservable fa
tors (e.g., talent of managers, degree of
oordination between produ
tion
units,
et .)
i .
Si
low.
Another example, individual demand.
qi = 1 + p Pi + m Mi + i
where
is pri e and
are more possibilities for expression of preferen
es when one is ri
h, so it is possible that the
varian
e of
is high.
d
1
n N 0, Q1
X QX
lim E
n
This matrix has dimension
K K
X X
n
onsistently. The onsistent estimator, under heteros edasti ity but no auto orrelation is
X
b= 1
xt xt 2t
n t=1
7.4.
89
HETEROSCEDASTICITY
One
an then modify the previous test statisti
s to obtain tests that are valid when there is
heteros
edasti
ity of unknown form. For example, the Wald test for
be
n R r
X X
n
1
X X
n
1
!1
H0 : R r = 0
would
a
R r 2 (q)
Goldfeld-Quandt
n1 , n2
where
n3
and
observations,
1 M 1 1 d 2
1 1
(n1 K)
=
2
2
and
3 M 3 3 d 2
3 3
=
(n3 K)
2
2
so
1 1 /(n1 K) d
F (n1 K, n3 K).
3 3 /(n3 K)
The distributional result is exa
t if the errors are normally distributed. This test is a two-tailed
test. Alternatively, and probably more
onventionally, if one has prior ideas about the possible
magnitudes of the varian
es of the observations, one
ould order the observations a
ordingly,
from largest to smallest. In this
ase, one would use a
onventional one-tailed F-test.
pi ture.
Draw
Ordering the observations is an important step if the test is to have any power.
The motive for dropping the middle observations is to in
rease the dieren
e between the
average varian
e in the subsamples, supposing that there exists heteros
edasti
ity. This
an in
rease the power of the test. On the other hand, dropping too many observations
will substantially in
rease the varian
e of the statisti
s
1 1
and
3 3 .
A rule of thumb,
If one doesn't have any ideas about the form of the het. the test will probably have low
power sin
e a sensible data ordering isn't available.
White's test
When one has little idea if there exists heteros edasti ity, and no idea of its
potential form, the White test is a possibility. The idea is that if there is homos
edasti
ity,
then
E(2t |xt ) = 2 , t
so that
xt
or fun tions of
xt
E(2t ).
90
CHAPTER 7.
1. Sin e
instead.
2. Regress
2t = 2 + zt + vt
where
zt
is a
P -ve tor. zt
= 0.
ESSR = T SSU ,
The
qF
R2
P (ESSR ESSU ) /P
ESSU / (n P 1)
qF = (n P 1)
Note that this is the
as well as other
xt .
qF =
Note that
xt ,
xt ,
R2
1 R2
or the arti ial regression used to test for heteros edasti ity,
2
not the R of the original model.
An asymptoti
ally equivalent statisti
, under the null of no heteros
edasti
ity (so that
R2
nR2 2 (P ).
This doesn't require normality of the errors, though it does assume that the fourth moment
of
Question:
The White test has the disadvantage that it may not be very powerful unless the
zt ve tor
It also has the problem that spe i ation errors other than heteros edasti ity may lead
is hosen well, and this is hard to do without knowledge of the form of heteros edasti ity.
to reje tion.
(2t )
= h( +
zt ), where
h()
more general than is may appear from the regression that is used.
the observations are ordered a ording to the suspe ted form of the heteros edasti ity.
onsistently be determined.
()
be supplied, and
7.4.
91
HETEROSCEDASTICITY
().
yt = xt + t
t2 = E(2t ) = zt
but the other
lassi
al assumptions hold. In this
ase
2t = zt
and
vt
+ vt
and
onsistently,
2t in pla
e of
were t observable. The solution is to substitute the squared OLS residuals
we
an estimate 2
2t , sin
e it is
onsistent by the Slutsky theorem. On
e we have and ,
t
onsistently using
t2 = zt
t2 .
In the se ond step, we transform the model by dividing by the standard deviation:
x
t
yt
= t +
t
or
yt = x
t + t .
Asymptoti
ally, this model satises the
lassi
al assumptions.
This model is a bit
omplex in that NLS is required to estimate the model of the varian
e.
A simpler version would be
yt
xt + t
t2 = E(2t ) = 2 zt
where
zt
is a single variable. There are still two parameters to be estimated, and the
sear h method
an be used in this
ase to redu
e the estimation problem to repeated appli
ations of
OLS.
The regression
e.g.,
ztm .
2t = 2 ztm + vt
[0, 3].
92
CHAPTER 7.
m ,
so one an estimate
by OLS.
Can rene.
Works well when the parameter to be sear hed over is low dimensional, as in this ase.
mum
ESSm
ESSm .
as the estimate.
Draw pi ture.
series model.
It may be reasonable to presume that the varian e is onstant over time within
the
ross-se
tional units, but that it diers a
ross them (e.g., rms or
ountries of dierent
sizes...). The model is
yit = xit + it
E(2it ) = i2 , t
where
i = 1, 2, ..., G
t = 1, 2, ..., n
later.
E(it is ) = 0.
i2
i2 =
1X 2
it
n
t=1
1/n
nK
regressors, so
yit
x
it
= it +
i
Do this for ea
h
ross-se
tional group.
assumptions, asymptoti
ally.
7.4.
93
HETEROSCEDASTICITY
0.5
-0.5
-1
-1.5
0
20
40
60
80
100
120
140
160
Value
p-value
White's test
61.903
0.000
Value
p-value
GQ test
10.886
0.000
All in all, it is very
lear that the data are heteros
edasti
. That means that OLS estimation
is not e
ient, and tests of restri
tions that ignore heteros
edasti
ity are not valid.
The
previous tests (CRTS, HOD1 and the Chow test) were
al
ulated assuming homos
edasti
ity.
The O
tave program GLS/NerloveRestri
tions-Het.m uses the Wald test to
he
k for CRTS
and HOD1, but using a heteros
edasti
-
onsistent
ovarian
e estimator.
1
By the way, noti e that GLS/NerloveResiduals.m and GLS/HetTests.m use the restri ted LS estimator
dire
tly to restri
t the fully general model with all
oe
ients varying to the model with only the
onstant
and the output
oe
ient varying. But GLS/NerloveRestri
tions-Het.m estimates the model by substituting
94
CHAPTER 7.
Testing HOD1
Wald test
Value
6.161
p-value
0.013
Value
20.169
p-value
0.001
Testing CRTS
Wald test
We see that the previous
on
lusions are altered - both CRTS is and HOD1 are reje
ted at
the 5% level. Maybe the reje
tion of HOD1 is due to to Wald test's tenden
y to over-reje
t?
From the previous plot, it seems that the varian
e of
Suppose that the 5 size groups have dierent error varian es (heteros edasti ity by groups):
V ar(i ) = j2 ,
where
j = 1
if
i = 1, 2, ..., 29,
et .,
as before.
estimates the model using GLS (through a transformation of the model so that OLS
an be
applied). The estimation results are
*********************************************************
OLS estimation results
Observations 145
R-squared 0.958822
Sigma-squared 0.090800
Results (Het.
onsistent var-
ov estimator)
onstant1
onstant2
onstant3
onstant4
onstant5
output1
output2
output3
output4
output5
labor
fuel
estimate
-1.046
-1.977
-3.616
-4.052
-5.308
0.391
0.649
0.897
0.962
1.101
0.007
0.498
st.err.
1.276
1.364
1.656
1.462
1.586
0.090
0.090
0.134
0.112
0.090
0.208
0.081
t-stat.
-0.820
-1.450
-2.184
-2.771
-3.346
4.363
7.184
6.688
8.612
12.237
0.032
6.149
p-value
0.414
0.149
0.031
0.006
0.001
0.000
0.000
0.000
0.000
0.000
0.975
0.000
the restri
tions into the model. The methods are equivalent, but the se
ond is more
onvenient and easier to
understand.
7.4.
95
HETEROSCEDASTICITY
apital
-0.460
0.253
-1.818
0.071
*********************************************************
*********************************************************
OLS estimation results
Observations 145
R-squared 0.987429
Sigma-squared 1.092393
Results (Het.
onsistent var-
ov estimator)
estimate
-1.580
-2.497
-4.108
-4.494
-5.765
0.392
0.648
0.892
0.951
1.093
0.103
0.492
-0.366
onstant1
onstant2
onstant3
onstant4
onstant5
output1
output2
output3
output4
output5
labor
fuel
apital
st.err.
0.917
0.988
1.327
1.180
1.274
0.090
0.094
0.138
0.109
0.086
0.141
0.044
0.165
t-stat.
-1.723
-2.528
-3.097
-3.808
-4.525
4.346
6.917
6.474
8.755
12.684
0.733
11.294
-2.217
p-value
0.087
0.013
0.002
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.465
0.000
0.028
*********************************************************
Testing HOD1
Value
9.312
Wald test
p-value
0.002
The rst panel of output are the OLS estimation results, whi
h are used to
onsistently
estimate the
The
j2 .
R2
The se ond panel of results are the GLS estimation results. Some omments:
measures are not omparable - the dependent variables are not the same. The
measure for the GLS results uses the transformed dependent variable. One ould al ulate a omparable
R2
an
be in-
terpreted as eviden e of improved e ien y of GLS, sin e the OLS standard errors are
96
CHAPTER 7.
Note that the previously noted pattern in the output oe ients persists. The non onstant CRTS result is robust.
The oe ient on apital is now negative and signi ant at the 3% level. That seems to
indi ate some kind of problem with the model or the data, or e onomi theory.
Spe i ation
error in model?
7.5.1 Causes
Auto
orrelation is the existen
e of
orrelation a
ross the error term:
E(t s ) 6= 0, t 6= s.
Why might this o
ur? Plausible explanations in
lude
yt = xt + t ,
one
ould interpret
of observations.
equilibrium.
xt
One an interpret
xt
t+1
to be positive, onditional on
positive,
auto orrelation.
yt = 0 + 1 xt + 2 x2t + t
7.5.
97
AUTOCORRELATION
but we estimate
yt = 0 + 1 xt + t
The ee
ts are illustrated in Figure 7.2.
98
CHAPTER 7.
7.5.3 AR(1)
There are many types of auto
orrelation. We'll
onsider two examples. The rst is the most
ommonly en
ountered
ase: autoregressive order 1 (AR(1) errors. The model is
yt = xt + t
t = t1 + ut
ut iid(0, u2 )
E(t us ) = 0, t < s
We assume that the model satises the other
lassi
al assumptions.
|| < 1.
explodes as
t = t1 + ut
= (t2 + ut1 ) + ut
= 2 t2 + ut1 + ut
= 2 (t3 + ut2 ) + ut1 + ut
In the limit the lagged
t =
m 0
as
m ,
so we obtain
m utm
m=0
With this, the varian
e of
is found as
E(2t )
u2
u2
1 2
2m
m=0
so
V (t ) =
0th
u2
1 2
0 = V (t )
7.5.
99
AUTOCORRELATION
is
Cov(t , t1 ) = s = E((t1 + ut ) t1 )
=
V (t )
u2
1 2
s<t
Cov(t , ts ) = s =
The
t:
s u2
1 2
{t }
is
ovarian e stationary
ov(x, y)
se(x)se(y)
but in this ase, the two standard errors are the same, so the
s-order
auto orrelation
is
s = s
1 2
| {z }
.
.
.
n1
..
{z
n1
n2
.
.
..
.
1
}
So we have homos
edasti
ity, but elements o the main diagonal are not zero.
this depends only on two parameters,
we an apply FGLS.
It turns out that it's easy to estimate these
onsistently. The steps are
1. Estimate the model
yt = xt + t
by OLS.
t =
t1 + ut
Sin
e
t t ,
All of
2
and u . If we
an estimate these
onsistently,
t = t1 + ut
100
CHAPTER 7.
t =
Therefore,
ut ,
the estimator
u2 =
1X 2 p 2
(
ut ) u
n
t=2
u2
and
= (
, form
u2 , ) using the previous stru
ture
u2 /(1 2 ),
sin e it
1
1 X
1 y).
F GLS = X
(X
One an iterate the pro ess, by taking the rst FGLS estimator of
re-estimating
2
and u , et
. If one iterates to
onvergen
es it's equivalent to MLE (supposing normal
errors).
and Or utt.
it an be
y1 = y1
x1 = x1
1 2
1 2
1 .
See
Davidson and Ma
Kinnon, pg. 348-49 for more dis
ussion. Note that the varian
e of
is
u2 ,
y1
asymptoti ally, so we see that the transformed model will be homos edasti (and
u s
y s,
7.5.4 MA(1)
The linear regression model with moving average order 1 errors is
yt = xt + t
t = ut + ut1
ut iid(0, u2 )
E(t us ) = 0, t < s
7.5.
101
AUTOCORRELATION
In this ase,
i
h
V (t ) = 0 = E (ut + ut1 )2
=
u2 + 2 u2
u2 (1 + 2 )
Similarly
and
= u2
1 + 2
1 + 2
..
.
.
.
.
.
.
..
1 + 2
1 =
=1
1
0
(1 + 2 )
2
u
2 (1+2 )
u
and a minimum at
= 1,
Again the
ovarian
e matrix has a simple stru
ture that depends on only two parameters. The
problem in this
ase is that one
an't estimate
using OLS on
t = ut + ut1
be
ause the
ut
V (t ) = 2 = u2 (1 + 2 )
102
CHAPTER 7.
1X 2
c2 = 2 (1
\
2) =
t
u
n
t=1
u2
and
X
c2 (1 + b2 ) = 1
2
u
n t=1 t
However, this isn't su
ient to dene
onsistent estimators of the parameters, sin
e it's
unidentied.
and
t1
using
X
d2 = 1
d t , t1 ) =
Cov(
t t1
u
n t=2
This is a
onsistent estimator, following a LLN (and given that the epsilon hats are
onsistent for the epsilons). As above, this
an be interpreted as dening an unidentied
estimator:
X
c2 = 1
t t1
u
n
t=2
Now solve these two equations to obtain identied (and therefore
onsistent) estimators
of both
and
u2 .
c2 )
= (,
following the form we've seen above, and transform the model using the Cholesky de omposition. The transformed model satises the lassi al assumptions asymptoti ally.
7.5.5 Asymptoti
ally valid inferen
es with auto
orrelation of unknown form
See Hamilton Ch. 10, pp. 261-2 and 280-84.
When the form of auto
orrelation is unknown, one may de
ide to use the OLS estimator,
without
orre
tion. We've seen that this estimator has the limiting distribution
where, as before,
is
d
1
n N 0, Q1
X QX
= lim E
n
Dene
X X
n
mt = xt t
(re
all that
xt
is dened as a
K 1
7.5.
103
AUTOCORRELATION
X =
=
=
t=1
n
X
i
2
xn .
..
n
x1 x2
n
X
xt t
mt
t=1
so that
1
= lim E
n n
We assume that
n
X
mt
t=1
n
X
mt
t=1
!#
mt
and
mts
does
t).
not depend on
Dene the
mt
"
v th
mt
auto ovarian e of
as
v = E(mt mtv ).
Note that
mt
E(mt mt+v ) = v .
v = E(mt mtv ) 6= 0
Note that this auto
ovarian
e does not depend on
ontemporaneously orrelated (
t,
will in general
= i2
to base a parametri
spe
i
ation. Re
ent resear
h has fo
used on
onsistent nonparametri
estimators of
Now dene
1
n = E
n
We have (
"
n
X
t=1
mt
n
X
t=1
mt
!#
show that the following is true, by expanding sum and shifting rows to left)
n = 0 +
n2
n1
1
1 + 1 +
2 + 2 +
n1 + n1
n
n
n
104
CHAPTER 7.
is
n
X
cv = 1
m
tm
tv .
n
t=v+1
where
m
t = xt t
(note: one
ould put
of
would be
1/(n v)
1/n
instead of
c + n 2
c + + 1
[
c1 +
c2 +
n =
c0 + n 1
[
+
n1
n1
1
2
n
n
n
n1
Xnv
c .
c0 +
cv +
=
v
n
v=1
This estimator is in
onsistent in general, sin
e the number of parameters to estimate is more
than the number of observations, and in
reases more rapidly than
build up as
n .
q(n)
as
modied estimator
where
n,
n =
c0 +
q(n)
X
v=1
c ,
cv +
q(n)
tends to
The assumption that auto
orrelations die o is reasonable in many
ases. For example,
the AR(1) model with
|| < 1
nv
n
an be dropped be
ause it tends to one for
The term
A disadvantage of this estimator is that is may not be positive denite. This ould ause
v < q(n),
given that
q(n)
n.
n =
c0 +
q(n)
X
v=1
v
c .
cv +
1
v
q+1
This estimator is p.d. by onstru tion. The ondition for onsisten y is that
0.
q.
n1/4 q(n)
It is an example of a
kernel
estimator.
Finally, sin
e
has
as its limit,
p
n
We an now use
and
d
Q
X =
1
n X X to
onsistently estimate the limiting distribution of the OLS estimator under heteros edasti ity
7.5.
105
AUTOCORRELATION
and auto
orrelation of unknown form. With this, asymptoti
ally valid tests are
onstru
ted
in the usual way.
DW
=
=
Pn
(
t t1 )2
t=2P
n
2t
t=1
Pn
t t1
2t 2
t=2
Pn 2
t
t=1
+ 2t1
The null hypothesis is that the rst order auto orrelation of the errors is zero:
1 = 0.
HA : 1 6= 0.
H0 :
the errors are AR(1), sin
e many general patterns of auto
orrelation will have the rst
order auto
orrelation dierent than zero. For this reason the test is useful for dete
ting
auto
orrelation in general.
Under the null, the middle term tends to zero, and the other two tend to one, so
tends to
2,
so
= 1.
DW 0
DW 2.
2,
so
DW 4
= 1.
DW
X,
so tables
an't give exa
t
riti
al values. The give upper and lower bounds, whi
h
orrespond to
the extremes that are possible. See Figure 7.3. There are means of determining exa
t
riti
al values
onditional on
X.
is xed in repeated
pre
isely the
ontext where the test would have appli
ation. It is possible to relate the
DW test to other test statisti
s whi
h are valid without stri
t exogeneity.
106
CHAPTER 7.
This test uses an auxiliary regression, as does the White test for heteros
edasti
ity. The
regression is
t = xt + 1 t1 + 2 t2 + + P tP + vt
and the test statisti
is the
nR2
restri tions,
2
so the test statisti
is asymptoti
ally distributed as a (P ).
The intuition is that the lagged errors shouldn't
ontribute to explaining the
urrent
error if there is no auto
orrelation.
xt
if the
This test is valid even if the regressors are sto hasti and ontain lagged dependent
The alternative is not that the model is an AR(P), following the argument above. The
variables, so it is onsiderably more useful than the DW test for typi al time series data.
7.5.
107
AUTOCORRELATION
We've seen that the OLS estimator is
onsistent under auto
orrelation, as long as
This will be the
ase when
where
E(X ) = 0,
plim Xn = 0.
ontains lagged y s and the errors are auto orrelated. A simple example is the ase
of a single lag of the dependent variable with AR(1) errors. The model is
yt = xt + yt1 + t
t = t1 + ut
Now we
an write
E(yt1 t ) = E (xt1 + yt2 + t1 )(t1 + ut )
6= 0
plim
X
n
6= 0.
E(2t1 )
In this ase
Sin e
plim = + plim
E(X ) 6= 0,
and
X
n
the OLS estimator is in
onsistent in this
ase. One needs to estimate by instrumental variables
(IV), whi
h we'll get to later.
7.5.8 Examples
Nerlove model, yet again
The Nerlove model uses ross-se tional data, so one may not
ln C = 1 + 2 ln Q + 3 ln PL + 4 ln PF + 5 ln PK +
and the extended Nerlove model
ln C = 1j + 2j ln Q + 3 ln PL + 4 ln PF + 5 ln PK + .
We have seen eviden
e that the extended model is preferred.
model, the simple model is misspe
ied.
So if it is in fa t the proper
ln Q,
Value
34.930
p-value
0.000
The
108
CHAPTER 7.
Residuals
Quadratic fit to Residuals
1.5
0.5
-0.5
-1
0
??)
to see
Klein model
in the model explains onsumption (C ) as a fun tion of prots (P ), both urrent and lagged,
as well as the sum of wages in the private se tor (W ) and wages in the government se tor
(W ). Have a look at the README le for this data set. This gives the variable names and
other information.
Consider the model
10
7.5.
109
AUTOCORRELATION
The O
tave program GLS/Klein.m estimates this model by OLS, plots the residuals, and
performs the Breus
h-Godfrey test, using 1 lag of the residuals.
results are:
*********************************************************
OLS estimation results
Observations 21
R-squared 0.981008
Sigma-squared 1.051732
Results (Ordinary var-
ov estimator)
Constant
Profits
Lagged Profits
Wages
estimate
16.237
0.193
0.090
0.796
st.err.
1.303
0.091
0.091
0.040
t-stat.
12.464
2.115
0.992
19.933
p-value
0.000
0.049
0.335
0.000
*********************************************************
Value
p-value
Breus
h-Godfrey test
1.539
0.215
and the residual plot is in Figure 7.5. The test does not reje
t the null of nonauto
orrelatetd
errors, but we should remember that we have only 21 observations, so power is likely to be
fairly low. The residual plot leads me to suspe
t that there may be auto
orrelation - there are
some signi
ant runs below and above the x-axis. Your opinion may dier.
Sin
e it seems that there
may
The
O
tave program GLS/KleinAR1.m estimates the Klein
onsumption equation assuming that
the errors follow the AR(1) pattern. The results, with the Breus
h-Godfrey test for remaining
auto
orrelation are:
*********************************************************
OLS estimation results
Observations 21
R-squared 0.967090
Sigma-squared 0.983171
Results (Ordinary var-
ov estimator)
estimate
st.err.
t-stat.
p-value
110
CHAPTER 7.
Residuals
-1
-2
-3
0
10
15
20
25
7.6.
111
EXERCISES
Constant
Profits
Lagged Profits
Wages
16.992
0.215
0.076
0.774
1.492
0.096
0.094
0.048
11.388
2.232
0.806
16.234
0.000
0.039
0.431
0.000
*********************************************************
Value
p-value
Breus
h-Godfrey test
2.129
0.345
The test is farther away from the reje
tion region than before, and the residual plot is
a bit more favorable for the hypothesis of nonauto
orrelated residuals, IMHO. For this
reason, it seems that the AR(1)
orre
tion might have improved the estimation.
Nevertheless, there has not been mu
h of an ee
t on the estimated
oe
ients nor on
their estimated standard errors. This is probably be
ause the estimated AR(1)
oe
ient
is not very large (around 0.2)
The existen
e or not of auto
orrelation in this model will be important later, in the
se
tion on simultaneous equations.
V ar(GLS ) = AA
V ar()
Verify that this is true.
2. Show that the GLS estimator
an be dened as
where
d
1
n N 0, Q1
X QX ,
lim E
n
Explain why
X X
n
n
X
b= 1
xt xt 2t
n t=1
112
CHAPTER 7.
4. Dene the
as
Show that
E(mt mt+v ) = v .
ln C = 1j + 2j ln Q + 3 ln PL + 4 ln PF + 5 ln PK +
assume that
V (t |xt ) = j2 , j = 1, 2, ..., 5.
a) Apply White's test using the OLS residuals, to test for homos
edasti
ity
b) Cal
ulate the FGLS estimator and interpret the estimation results.
) Test the transformed model to
he
k whether it appears to satisfy homos
edasti
ity.
Chapter 8
Sto
hasti
regressors
Up to now we have treated the regressors as xed, whi
h is
learly unrealisti
. Now we will
assume they are random.
interested in an analysis
onditional
First, if we are
are sto
hasti
or not, sin
e
onditional on the values of they regressors take on, they are
nonsto
hasti
, whi
h is the
ase already
onsidered.
general, sin
e we may want to predi
t into the future many periods out, so we need to
onsider the behavior of
X.
The model we'll deal will involve a ombination of the following assumptions
Linearity:
0 :
yt = xt 0 + t ,
or in matrix form,
where
is
n 1, X =
x1 x2
y = X0 + ,
, where xt
xn
is
has rank
is sto hasti
limn Pr
K 1, and 0
and
are onformable.
with probability 1
1
nX X
= QX = 1,
where
QX
n1/2 X N (0, QX 02 )
(8.1)
114
CHAPTER 8.
STOCHASTIC REGRESSORS
xt
yt
(8.2)
xt : E(yt |xt ) = xt
given
8.1 Case 1
Normality of , strongly exogenous regressors
In this
ase,
= 0 + (X X)1 X
E(|X)
= 0 + (X X)1 X E(|X)
= 0
and sin
e this holds for all
= ,
X, E()
un onditional on
N , (X X)1 2
|X
0
If the density of
d(X),
is
onditional density by
density for
d(X)
X.
X.
Likewise,
in small samples.
However, onditional on
X,
t, F
and
distributions.
Summary: When
1.
is unbiased
2.
is nonnormally distributed
is normally distributed:
3. The usual test statisti
s have the same distribution as with nonsto
hasti
4. The Gauss-Markov theorem still holds, sin
e it holds
onditionally on
is true for all
X,
X.
and this
X.
8.2 Case 2
8.3.
115
CASE 3
Still, we have
= 0 + (X X)1 X
1
XX
X
= 0 +
n
n
Now
X X
n
1
Q1
X
by assumption, and
n1/2 X p
X
0
=
n
n
sin
e the numerator
onverges to a
N (0, QX 2 ) r.v.
p
0 .
Considering the asymptoti
distribution
1
XX
X
n 0
=
n
n
n
1
XX
n1/2 X
=
n
so
d
2
n 0 N (0, Q1
X 0 )
Sin e
the asymptoti
results on all test statisti
s only require this, all the previous asymptoti
results
on test statisti
s are also valid in this
ase.
normal or nonnormal,
has
the
properties:
1. Unbiasedness
2. Consisten
y
3. Gauss-Markov theorem holds, sin
e it holds in the previous
ase and doesn't depend
on normality.
4. Asymptoti
normality
5. Tests are asymptoti
ally valid
6. Tests are not valid in small samples if the error is normally distributed
8.3 Case 3
Weakly exogenous regressors
116
CHAPTER 8.
STOCHASTIC REGRESSORS
an impa
t on the
urrent value. A simple version of these models that
aptures the important
points is
yt = zt +
p
X
s yts + t
s=1
= xt + t
where now
xt
E(t |xt ) = 0, X
and
E(t1 xt ) 6= 0
sin
e
xt
ontains
yt1
t1 )
as an element.
This fa t implies that all of the small sample properties su h as unbiasedness, GaussMarkov theorem, and small sample validity of test statisti s
do not hold
in this ase.
Re
all Figure 3.7. This is a
ase of weakly exogenous regressors, and we see that the
OLS estimator is biased in this
ase.
Nevertheless, under the above assumptions, all asymptoti
properties
ontinue to hold,
using the same arguments as before.
1
nX X
= QX = 1,
1.
limn Pr
2.
n1/2 X N (0, QX 02 )
QX
The most
ompli
ated
ase is that of dynami
models, sin
e the other
ases
an be treated as
nested in this
ase. There exist a number of
entral limit theorems for dependent pro
esses,
many of whi
h are fairly te
hni
al.
if you're interested).
sequen e
{st } = {
1X
zt }
n t=1
zt
be
t.
8.5.
117
EXERCISES
Covarian e (weak) stationarity requires that the rst and se ond moments of this set
An example of a sequen e that doesn't satisfy this is an AR(1) pro ess with a unit root
not depend on
(a
t.
random walk):
xt = xt1 + t
t IIN (0, 2 )
xt
depends upon
stationary.
The series
sin t + t
has a rst moment that depends upon t, so it's not weakly stationary
either.
Stationarity prevents the pro
ess from trending o to plus or minus innity, and prevents
y
li
al behavior whi
h would allow
orrelations between far removed
zt
znd
zs
to be high.
In summary, the assumptions are reasonable when the sto
hasti
onditioning variables
have varian
es that are nite, and are not too strongly dependent. The AR(1) model
with unit root is an example of a
ase where the dependen
e is too strong for standard
asymptoti
s to apply.
The e
onometri
s of nonstationary pro
esses has been an a
tive area of resear
h in the
last two de
ades. The standard asymptoti
s don't apply in this
ase. This isn't in the
s
ope of this
ourse.
How
e.g., yt = 0 + 0.9yt1 + t
satisfy
118
CHAPTER 8.
STOCHASTIC REGRESSORS
Chapter 9
Data problems
In this se
tion well
onsider problems asso
iated with the regressor matrix:
ollinearity, missing
observation and measurement error.
9.1 Collinearity
Collinearity is the existen
e of linear relationships amongst the regressors.
We an always
write
1 x1 + 2 x2 + + K xK + v = 0
where
xi
is the
ith
X, and v
is an
n 1 ve tor.
relative and approximate are impre
ise, so it's di
ult to dene when
ollinearilty
exists.
so (X X)
< K,
so X X is not invertible and the OLS estimator is not uniquely dened. For
yt = 1 + 2 x2t + 3 x3t + t
x2t = 1 + 2 x3t
then we
an write
yt = 1 + 2 (1 + 2 x3t ) + 3 x3t + t
= 1 + 2 1 + 2 2 x3t + 3 x3t + t
= (1 + 2 1 ) + (2 2 + 3 ) x3t
= 1 + 2 x3t + t
The
119
120
CHAPTER 9.
are
unidentied
DATA PROBLEMS
Perfe
t
ollinearity is unusual, ex
ept in the
ase of an error in
onstru
tion of the
regressor matrix, su
h as in
luding the same regressor twi
e.
Another
ase where perfe
t
ollinearity may be en
ountered is with models with dummy
variables, if one is not
areful. Consider a model of rental pri
e
(yi )
of an apartment. This
Gi , Ti
and
Li
Bi = 1
yi = 1 + 2 Bi + 3 Gi + 4 Ti + 5 Li + xi + i
In this model,
Bi + Gi + Ti + Li = 1, i,
9.1.2 Ba
k to
ollinearity
The more
ommon
ase, if one doesn't make mistakes su
h as these, is the existen
e of inexa
t
linear relationships,
i.e., orrelations between the regressors that are less than one in absolute
value, but not zero. The basi problem is that when two (or more) variables move together, it
i.e.,
With e
onomi
data,
ollinearity is
ommonly en
ountered, and
is di
ult to determine their separate inuen
es. This is ree
ted in impre
ise estimates,
estimates with high varian
es.
When there is
ollinearity, the minimizing point of the obje
tive fun
tion that denes the
OLS estimator (s(), the sum of squared errors) is relatively poorly dened. This is seen in
Figures 9.1 and 9.2.
To see the ee
t of
ollinearity on varian
es, partition the regressor matrix as
X=
where
x W
isf we like, so
there's no loss of generality in
onsidering the rst
olumn). Now, the varian
e of
the
lassi
al assumptions, is
= X X
V ()
Using the partition,
XX=
"
x x
1
x W
W x W W
under
9.1.
121
COLLINEARITY
Figure 9.1:
s()
6
4
60
55
50
45
40
35
30
25
20
15
2
0
-2
-4
-6
-6
-4
-2
Figure 9.2:
s()
6
4
2
0
-2
-4
-6
-6
-4
-2
100
90
80
70
60
50
40
30
20
122
CHAPTER 9.
DATA PROBLEMS
X X
where by
ESSx|W
1
x x x W (W W )1 W x
1
=
x In W (W W ) 1 W x
1
= ESSx|W
1
1,1
x = W + v.
Sin
e
R2 = 1 ESS/T SS,
we have
ESS = T SS(1 R2 )
so the varian
e of the
oe
ient
orresponding to
V (x ) =
is
2
2
T SSx (1 Rx|W
)
We see three fa
tors inuen
e the varian
e of this
oe
ient. It will be high if
1.
is large
x.
well.
2
In this
ase, R
x|W will be
lose to 1.
1, V (x ) .
2
As R
x|W
an
R2 ,
Furthermore, this pro edure identies whi h parameters are ae ted.
9.1.
123
COLLINEARITY
An alternative is to examine the matrix of
orrelations between the regressors. High
orrelations are su
ient but not ne
essary for severe
ollinearity.
Also indi
ative of
ollinearity is that the model ts well (high
is signi
antly dierent from zero (e.g., their separate inuen
es aren't well determined).
In summary, the arti
ial regressions are the best approa
h if one wants to be
areful.
ollinearity.
is all the
R = r + v
where
and
y = X +
R = r + v
!
!
0
,
N
0
v
2 In 0nq
0qn
v2 Iq
This sort of model isn't in line with the
lassi
al interpretation of parameters as
onstants:
a
ording to this interpretation the left hand side of
R = r + v
random. This model does t the Bayesian perspe
tive: we
ombine information
oming from
the model and the data, summarized in
y = X +
N (0, 2 In )
with prior beliefs regarding the distribution of the parameter, summarized in
R N (r, v2 Iq )
Sin
e the sample is random it is reasonable to suppose that
E(v ) = 0,
of information in the spe i ation. How an you estimate using this model? The solution is
124
CHAPTER 9.
DATA PROBLEMS
"
y
r
"
X
R
"
2 6= v2 .
Dene the
This
expresses the degree of belief in the restri
tion relative to the variability of the data. Supposing
that we spe
ify
k,
"
y
kr
"
X
kR
"
kv
is homos
edasti
and
an be estimated by OLS. Note that this estimator is biased.
onsistent, however, given that
ontrast to the
ase of false exa
t restri
tions). To see this, note that there are
where
Q is the number
of rows of
It is
restri tions,
no weight
in the obje
tive fun
tion, so the estimator has the same limiting obje
tive fun
tion as the OLS
estimator, and is therefore
onsistent.
To motivate the use of sto
hasti
restri
tions,
onsider the expe
tation of the squared
length of
:
1
1
X
X
+ X X
+ X X
= + E X(X X)1 (X X)1 X
1 2
= + T r X X
= E
E( )
= + 2
K
X
i (the
i=1
so
min(X X)
values) and
is p.d.
min(X X)
X X
> +
E( )
where
X X
"
y
0
IK = 0 + v.
#
"
X
kIK
"
kv
9.2.
125
MEASUREMENT ERROR
ridge =
=
This is the ordinary
kIK
X X + k2 IK
ridge regression
2
add k IK , whi
h is nonsingular, to
be
omes worse and worse. As
"
1
X
kIK
#!1
X y
IK
"
y
0
k ,
= 0,
ridge = X X + k2 IK
so
ridge
ridge 0.
1
X y k2 IK
1
X y =
X y
0
k2
This is learly a false restri tion in the limit, if our original model is at al
sensible.
There should be some amount of shrinkage that is in fa
t a true restri
tion. The problem is
to determine the
su h that the restri tion is orre t. The interest in ridge regression enters
depends on
su h that
ridge
ridge
as a fun tion of
k,
that artisti ally seems appropriate (e.g., where the ee t of in reasing
pi ture here.
The
2
and , whi
h are unknown.
dies o ).
Draw
In summary, the ridge estimator oers some hope, but it is impossible to guarantee that
it will outperform the OLS estimator. Collinearity is a fa
t of life in e
onometri
s, and there
is no
lear solution to the problem.
126
CHAPTER 9.
DATA PROBLEMS
is presumed to be
y = X +
y = y + v
vt iid(0, v2 )
where
that
and
= X +
this, we have
y + v = X +
so
y = X + v
= X +
t iid(0, 2 + v2 )
As long as
is un orrelated with
X,
yt = x
t + t
xt = xt + vt
vt iid(0, v )
where
is a
K K
matrix.
Now
is independent of
yt = (xt vt ) + t
= xt vt + t
= xt + t
xt
and
t ,
E(xt t ) = E (xt + vt ) vt + t
= v
where
v = E vt vt .
sin e
y=
is
9.3.
127
MISSING OBSERVATIONS
Be
ause of this
orrelation, the OLS estimator is biased and in
onsistent, just as in the
ase of
auto
orrelated errors with lagged dependent variables. In matrix notation, write the estimated
model as
y = X +
We have that
X X
n
1
X y
n
and
plim
sin e
and
X X
n
1
= plim
(X + V ) (X + V )
n
= (QX + v )1
plim
V V
n
= lim E
= v
1X
vt vt
n
t=1
Likewise,
plim
X y
n
(X + V ) (X + )
n
= QX
= plim
so
plim = (QX + v )1 QX
So we see that the least squares estimator is in
onsistent when the regressors are measured
with error.
A potential solution to this problem is the instrumental variables (IV) estimator, whi
h
we'll dis
uss shortly.
missing observations on the dependent variable and missing observations on the regressors.
y = X +
128
CHAPTER 9.
"
or
where
y2
y1
y2
"
X1
X2
"
1
2
DATA PROBLEMS
y1 = X1 + 1
Sin
e these observations satisfy the
lassi
al assumptions, one
ould estimate by OLS.
The question remains whether or not one
ould somehow repla
e the unobserved
a predi
tor, and improve over OLS in some sense. Let
y2
("
# "
=
=
Re
all that the OLS fon
are
X1
X2
# "
X1
#)1 "
X1
y1
X2
X2
y2
1
X1 X1 + X2 X2
X1 y1 + X2 y2
y2 .
X X = X y
so if we regressed using only the rst (
omplete) observations, we would have
X1 X1 1 = X1 y1.
Likewise, an OLS regression using only the se
ond (lled in) observations would give
X2 X2 2 = X2 y2 .
Substituting these into the equation for the overall
ombined estimator gives
i
1 h
X1 X1 1 + X2 X2 2
1
1
X2 X2 2
X1 X1 1 + X1 X1 + X2 X2
= X1 X1 + X2 X2
X1 X1 + X2 X2
A1 + (IK A)2
where
1
X1 X1
A X1 X1 + X2 X2
and we use
X1 X1 + X2 X2
1
1
X1 X1 + X2 X2 X1 X1
X1 X1 + X2 X2
1
X1 X1
= IK X1 X1 + X2 X2
X2 X2 =
= IK A.
y2
by
Now
9.3.
129
MISSING OBSERVATIONS
Now,
= A + (IK A)E 2
E()
2 = .
if E
The
on
lusion is the this lled in observations alone would need to dene an unbiased
estimator. This will be the
ase only if
y2 = X2 + 2
where
of
y2 = y1
estimator.
One possibility that has been suggested (see Greene, page 275) is to estimate
using a
1 = (X1 X1 )1 X1 y1
then use this estimate,
1 ,to
predi t
y2
y2 = X2 1
= X2 (X1 X1 )1 X1 y1
Now, the overall estimate is a weighted average of
and
2 ,
2 = (X2 X2 )1 X2 y2
= (X2 X2 )1 X2 X2 1
= 1
This shows that this suggestion is
ompletely empty of
ontent: the nal estimator is
the same as the OLS estimator using only the
omplete observations.
yt = xt + t
130
CHAPTER 9.
DATA PROBLEMS
15
10
-5
-10
0
10
yt
yt
dened as
yt = yt
Or, in other words,
yt
if
yt 0
The dieren
e in this
ase is that the missing values are not random: they are
orrelated
with the
xt .
y = x +
with
V () = 25,
y > 0
to estimate.
Figure 9.3
"
y1
y2
X2
"
X1
X2
"
1
2
just estimate using the omplete observations, but it may seem frustrating to have to drop
X2 is
repla ed by some predi tion, X2 , then we are in the ase of errors of observation. As before,
this means that the OLS estimator is biased when X2 is used instead of X2 . Consisten
y is
observations simply be
ause of a single missing variable. In general, if the unobserved
salvaged, however, as long as the number of missing observations doesn't in rease with
ad ho
n.
values an be inter-
9.4.
131
EXERCISES
preted as introdu
ing false sto
hasti
restri
tions. In general, this introdu
es bias. It is
di
ult to determine whether MSE in
reases or de
reases. Monte Carlo studies suggest
that it is dangerous to simply substitute the mean, for example.
In the
ase that there is only one regressor other than the
onstant, subtitution of
the missing
xt
for
K > 2.
In summary, if one is strongly
on
erned with bias, it is best to drop observations that
have missing
omponents.
missing elements with intelligent guesses, but this ould also in rease MSE.
ln C = 1j + 2j ln Q + 3 ln PL + 4 ln PF + 5 ln PK +
When this model is estimated by OLS, some
oe
ients are not signi
ant. This may
be due to
ollinearity.
132
CHAPTER 9.
DATA PROBLEMS
Chapter 10
Fun
tional form and nonnested tests
Though theory often suggests whi
h
onditioning variables should be in
luded, and suggests
the signs of
ertain derivatives, it is usually silent regarding the fun
tional form of the relationship between the dependent variable and the regressors. For example,
onsidering a
ost
fun
tion, one
ould have a Cobb-Douglas model
c = Aw11 w22 q q e
This model, after taking logarithms, gives
ln c = 0 + 1 ln w1 + 2 ln w2 + q ln q +
where
0 = ln A.
1 + 2 = 1,
c = 0 when q = 0. Homogeneity
of degree one
q = 1.
c = 0 + 1 w1 + 2 w2 + q q +
and
ln(x)
regressors, and up to a linear transformation, so it may be di
ult to
hoose between these
models.
The basi
point is that many fun
tional forms are
ompatible with the linear-in-parameters
model, sin
e this model
an in
orporate a wide variety of nonlinear transformations of the
dependent variable and the regressors. For example, suppose that
and that
x()
is a
K ve tor-valued
xt = x(zt )
yt = xt + t
There may be
133
zt ,
regressors, where
134
CHAPTER 10.
P.
For example,
xt
zt .
Flexibility in this
K = 1 + P + P 2 P /2 + P
y = g(x) +
A se
ond-order Taylor's series expansion (with remainder term) of the fun
tion
point
x=0
is
x Dx2 g(0)x
+R
2
Use the approximation, whi h simply drops the remainder term, as an approximation to
x 0,
x Dx2 g(0)x
2
the approximation be omes more and more exa t, in the sense that
Dx gK (x) Dx g(x)
2
and Dx gK (x)
x = 0,
g(x) :
gK (x) g(x),
g(0), Dx g(0)
2
and Dx g(0) are all
onstants. If we treat them as parameters, the approximation will have
the se
ond order. The idea behind many exible fun
tional forms is to note that
x = 0.
g(x),
The model is
gK (x) = + x + 1/2x x
so the regression model to t is
y = + x + 1/2x x +
While the regression model has enough free parameters to be Diewert-exible, the question remains: is
plim
= g(0)?
Is
plim = Dx g(0)?
Is
= D 2 g(0)?
plim
x
10.1.
135
x,
so that
and
As before, the
Draw pi ture.
The
on
lusion is that exible fun
tional forms aren't really exible in a useful statisti
al sense, in that neither the fun
tion itself nor its derivatives are
onsistently estimated,
unless the fun
tion belongs to the parametri
family of the spe
ied fun
tional form. In
order to lead to
onsistent inferen
es, the regression model must be
orre
tly spe
ied.
y = ln(c)
z
x = ln
z
= ln(z) ln(
z)
y = + x + 1/2x x +
Note that
y
x
= + x
=
=
ln(c)
ln(z)
c z
z c
onstant)
so the
x is
z, x = 0,
y
=
x z=z
y = c(w, q)
136
CHAPTER 10.
where
x=
c(w, q)
w
s=
wx
c(w, q) w
=
c
w c
whi
h is simply the ve
tor of elasti
ities of
ost with respe
t to input pri
es. If the
ost fun
tion
is modeled using a translog fun
tion, we have
ln(c) = + x + z + 1/2
x z
"
11 12
12 22
#"
x
z
= + x + z + 1/2x 11 x + x 12 z + 1/2z 22
where
x = ln(w/w)
"
11 =
"
12 =
z = ln(q/
q ),
11 12
12 22
#
13
and
23
22 = 33 .
Note that symmetry of the se
ond derivatives has been imposed.
s=+
11 12
"
x
z
Therefore, the share equations and the
ost equation have parameters in
ommon. By pooling
the equations together and imposing the (true) restri
tion that the parameters of the equations
be the same, we
an gain e
ien
y.
x=
"
x1
x2
In this ase the translog model of the logarithmi ost fun tion is
ln c = + 1 x1 + 2 x2 + z +
11 2 22 2 33 2
x +
x +
z + 12 x1 x2 + 13 x1 z + 23 x2 z
2 1
2 2
2
10.1.
137
ln c
with respe t to
x1
and
x2 :
s1 = 1 + 11 x1 + 12 x2 + 13 z
s2 = 2 + 12 x1 + 22 x2 + 13 z
Note that the share equations and the
ost equation have parameters in
ommon.
One
an do a pooled estimation of the three equations at on
e, imposing that the parameters are
the same. In this way we're using more observations and therefore more information, whi
h
will lead to imporved e
ien
y. Note that this does assume that the
ost equation is
orre
tly
i.e.,
spe ied (
not an approximation), sin e otherwise the derivatives would not be the true
derivatives of the log
ost fun
tion, and would then be misspe
ied for the shares. To pool
the equations, write the model in matrix form (adding in error terms)
ln c
s1
s2
This is
1 x1 x2 z
= 0 1
0 0
one
x21
2
x22
2
z2
2
x1 x2
x2
x2 0
x1
0 x1 0
0 0
x1 z x2 z
z
0
0
z
11
+ 2
22
3
33
12
13
23
observation an be written as
yt = Xt + t
The overall model would sta
k
vations:
y1
X1
3n
obser-
y2 X2
+ .2
. = .
.
. .
.
. .
yn
Xn
n
1t
t = 2t
3t
First
onsider the
ovarian
e matrix of this ve
tor: the shares are
ertainly
orrelated sin
e
they must sum to one. (In fa
t, with 2 shares the varian
es are equal and the
ovarian
e is
-1 times the varian
e. General notation is used to allow easy extension to the
ase of more
138
CHAPTER 10.
than 2 inputs). Also, it's likely that the shares and the
ost equation have dierent varian
es.
Supposing that the model is
ovarian
e stationary, the varian
e of
t:
11 12 13
V art = 0 =
22 23
33
(SUR)
stru ture.
V ar .
=
..
n
0 0
= .
..
0
..
0
..
.
.
.
= In 0
where the symbol
and
is
a B ...
21
AB = .
..
apq B
.
.
.
apq B
So we need to estimate
2. Estimate
using
X
0 = 1
t t
n t=1
0 .
It an be shown that
will be
singular when the shares sum to one, so FGLS won't work. The solution is to drop one
10.1.
139
of the share equations, for example the se ond. The model be omes
"
ln c
s1
"
1 x1 x2 z
0 1
x21
2
x22
2
0 x1 0
z2
2
x1 x2
x2
#
x1 z x2 z
z
0
"
#
11
1
+
22
2
33
12
13
23
yt = Xt + t
and in sta
ked notation for all observations we have the
y1
X1
2n
observations:
y2 X2
+ .2
. = .
.
. .
.
. .
yn
Xn
n
or, nally in matrix notation for all observations:
y = X +
Considering the error
ovarian
e, we
an dene
0 = V ar
Dene
as the leading
22
= In
blo k of
"
1
2
, and form
= In
0 .
This is a onsistent estimator, following the onsisten y of OLS and applying a LLN.
1
0
P0 = Chol
(I am assuming this is dened as an upper triangular matrix, whi
h is
onsistent with
140
CHAPTER 10.
the way O
tave does it) and the Cholesky fa
torization of the overall
ovarian
e matrix
of the 2 equation model, whi
h
an be
al
ulated as
= In P0
P = Chol
5. Finally the FGLS estimator
an be
al
ulated by applying OLS to the transformed model
P y = P X + P
or by dire
tly using the GLS formula
F GLS =
1
1
1
0
0
y
X
X
X
P0 yy = P0 Xt + P0
and then apply OLS. This is probably the simplest approa
h.
A few last
omments.
1. We have assumed no auto
orrelation a
ross time. This is
learly restri
tive. It is relatively simple to relax this, but we won't go into it here.
2. Also, we have only imposed symmetry of the se
ond derivatives.
that the model should satisfy is that the estimated shares should sum to 1. This
an be
a
omplished by imposing
1 + 2 = 1
3
X
ij = 0, j = 1, 2, 3.
i=1
These are linear parameter restri
tions, so they are easy to impose and will improve
e
ien
y if they are true.
3. The estimation pro
edure outlined above
an be
iterated.
F GLS
as
= y X F GLS
These might be expe
ted to lead to a better estimate than the estimator based on
sin
e FGLS is asymptoti
ally more e
ient. Then re-estimate
OLS ,
error ovarian e. It an be shown that if this is repeated until the estimates don't hange
i.e., iterated to onvergen e) then the resulting estimator is the MLE. At any rate, the
10.2.
141
asymptoti
properties of the iterated and uniterated estimators are the same, sin
e both
are based upon a
onsistent estimator of the error
ovarian
e.
qF
yt = + xt + 1/2xt xt +
where the variables are in logarithms, while the Cobb-Douglas is
yt = + xt +
so a test of the Cobb-Douglas versus the translog is simply a test that
The situation is more
ompli
ated when we want to test
= 0.
non-nested hypotheses.
If the
two fun
tional forms are linear in the parameters, and use the same transformation of the
dependent variable, then they may be written as
M1 : y = X +
t iid(0, 2 )
M2 : y = Z +
iid(0, 2 )
We wish to test hypotheses of the form:
H0 : Mi
versus
One ould a ount for non-iid errors, but we'll suppress this for simpli ity.
E onometri a
HA : Mi
is
e.g.,
y = (1 )X + (Z) +
If the rst model is
orre
tly spe
ied, then the true value of
hand, if the se
ond model is
orre
tly spe
ied then
= 1.
M1 : yt = 1 + 2 x2t + 3 x3t + t
M2 : yt = 1 + 2 x2t + 3 x4t + t
142
CHAPTER 10.
test is to substitute
= 0.
in pla e of
supposing that the se
ond model is
orre
tly spe
ied. It will tend to a nite probability limit
even if the se
ond model is misspe
ied. Then estimate the model
y = (1 )X + (Z ) +
= X +
y+
where
y = Z(Z Z)1 Z y = PZ y.
In this model,
=0
t=
a
N (0, 1)
t ,
sin e
tends in probability to
1, while it's estimated standard error tends to zero. Thus the test will always reje
t the
false null model, asymptoti
ally, sin
e the statisti
will eventually ex
eed any
riti
al
value with probability one.
We an reverse the roles of the models, testing the se ond against the rst.
neither
model is orre tly spe ied. In this ase, the test will
still reje
t the null hypothesis, asymptoti
ally, if we use
riti
al values from the
distribution, sin
e as long as
N (0, 1)
when we swit h the roles of the models the other will also be reje ted asymptoti ally.
In summary, there are 4 possible out
omes when we test two models, ea
h against the
other. Both may be reje
ted, neither may be reje
ted, or one of the two may be reje
ted.
M1
is nonlinear.
The above presentation assumes that the same transformation of the dependent variable
10.2.
143
Journal of E onometri s,
Monte-Carlo eviden
e shows that these tests often over-reje
t a
orre
tly spe
ied model.
Can use bootstrap
riti
al values to get better-performing tests.
144
CHAPTER 10.
Chapter 11
Exogeneity and simultaneity
Several times we've en
ountered
ases where
orrelation between regressors and the error term
lead to biasedness and in
onsisten
y of the OLS estimator. Cases in
lude auto
orrelation with
lagged dependent variables and measurement error in the regressors. Another important
ase
is that of simultaneous equations. The
ause is dierent, but the ee
t is the same.
y = X +
where, for purposes of estimation we
an treat
we
ondition
on
X.
Demand:
"
1t
2t
qt
and
Supply:
1t 2t
pt
qt = 1 + 2 pt + 3 yt + 1t
q = 1 + 2 pt + 2t
#
!t
"
i
11 12
=
22
, t
yt
to see that we have orrelation between regressors and errors. Solving for
pt
1 + 2 pt + 3 yt + 1t = 1 + 2 pt + 2t
2 pt 2 pt = 1 1 + 3 yt + 1t 2t
3 yt
1t 2t
1 1
+
+
pt =
2 2 2 2
2 2
145
146
CHAPTER 11.
pt
is un orrelated with
1t :
1 1
3 yt
1t 2t
+
+
2 2 2 2
2 2
11 12
2 2
E(pt 1t ) = E
=
1t
Be
ause of this
orrelation, OLS estimation of the demand equation will be biased and in
onsistent. The same applies to the supply equation, for the same reason.
In this model,
the system.
yt
qt
is an
and
pt
endogenous
are the
exogenous
variable (exogs).
return to it in a minute. First, some notation. Suppose we group together
urrent endogs in
the ve
tor
Yt .
If there are
Et .
endogs,
Xt
Yt
is
, whi h is
G 1.
K 1.
Yt = Xt B + Et
Et N (0, ), t
E(Et Es ) = 0, t 6= s
We
an sta
k all
Y = XB + E
E(X E) = 0(KG)
vec(E) N (0, )
where
is
n G, X
is
n K,
Y1
X1
X2
Y2
.
,
X
=
Y =
.
..
.
.
Yn
Xn
and
is
n G.
E1
E
,E = . 2
.
En
This system is
Et
are individually
11.2.
147
EXOGENEITY
11 In 12 In 1G In
.
.
.
22 In
..
.
.
.
GG In
= In
may ontain lagged endogenous and exogenous variables. These variables are
termined.
prede-
We need to dene what is meant by endogenous and exogenous when
lassifying the
urrent period variables.
11.2 Exogeneity
The model denes a
Xt ,
Yt
and
is a G2 +GK + G2 G /2+G dimensional
parameter ve tor
Yt
and
Xt ,
whi h depends on a
ft (Yt , Xt |, It )
where
It
t.
Xt
Yt
Yt s
onditional on
Xt
and lagged
Xt
's of
and
148
CHAPTER 11.
Yt = Xt B + Et
Et N (0, ), t
E(Et Es ) = 0, t 6= s
Normality and la
k of
orrelation over time imply that the observations are independent of
one another, so we
an write the log-likelihood fun
tion as the sum of likelihood
ontributions
of ea
h observation:
ln L(Y |, It ) =
=
n
X
t=1
n
X
t=1
n
X
t=1
ln ft (Yt , Xt |, It )
ln (ft (Yt |Xt , 1 , It )ft (Xt |2 , It ))
ln ft (Yt |Xt , 1 , It ) +
n
X
t=1
ln ft (Xt |2 , It ) =
ve tor) if there is a mapping from to that is invariant to 2 . More formally, for an arbitrary
(1 , 2 ), () = (1 ).
This implies that
hange as
Supposing that
Xt
(1 , 2 ).
ln L(Y |X, , It ) =
n
X
t=1
ln ft (Yt |Xt , 1 , It )
1 .
Xt
Xt
is irrelevant, we an treat
Xt
1 , and
as xed in inferen e.
is
(1 ),and
from 1 .
Of ourse, we'll need to gure out just what this mapping is to re over
With la k of weak exogeneity, the joint and onditional likelihood fun tions maximize in
is the famous
This
Xt
11.3.
149
REDUCED FORM
Xt
Yt
Yt1 It .
Yt
Yt = Xt B + Et
V (Et ) =
This is the model in
Denition 16 (Stru tural form) An equation is in stru tural form when more than one
Yt = Xt B1 + Et 1
= Xt + Vt =
Now only one
urrent period endog appears in ea
h equation. This is the
redu ed form.
Denition 17 (Redu ed form) An equation is in redu ed form if only one urrent period
endog is in luded.
An example is our supply/demand system. The redu
ed form for quantity is obtained by
solving the supply equation for pri
e and substituting into demand:
qt =
2 qt 2 qt =
qt =
=
qt 1 2t
1 + 2
+ 3 yt + 1t
2
2 1 2 (1 + 2t ) + 2 3 yt + 2 1t
2 3 yt
2 1t 2 2t
2 1 2 1
+
+
2 2
2 2
2 2
11 + 21 yt + V1t
150
CHAPTER 11.
1 + 2 pt + 2t = 1 + 2 pt + 3 yt + 1t
2 pt 2 pt = 1 1 + 3 yt + 1t 2t
3 yt
1t 2t
1 1
+
+
pt =
2 2 2 2
2 2
= 12 + 22 yt + V2t
The interesting thing about the rf is that the equations individually satisfy the
lassi
al assumptions, sin
e
i=1,2,
t.
yt
is un orrelated with
1t
and
2t
The varian e of
"
V1t
V1t
V2t
"
2 1t 2 2t
2 2
1t 2t
2 2
E(yt Vit ) = 0,
is
2 1t 2 2t
2 1t 2 2t
V (V1t ) = E
2 2
2 2
2
2 11 22 2 12 + 2 22
=
(2 2 )2
Vt .
1t 2t
1t 2t
V (V2t ) = E
2 2
2 2
11 212 + 22
=
(2 2 )2
and the
ontemporaneous
ovarian
e of the errors a
ross equations is
1t 2t
2 1t 2 2t
E(V1t V2t ) = E
2 2
2 2
2 11 (2 + 2 ) 12 + 22
=
(2 2 )2
In summary the rf equations individually satisfy the
lassi
al assumptions, under the
assumtions we've made, but they are
ontemporaneously
orrelated.
Yt = Xt B1 + Et 1
= Xt + Vt
11.4.
151
IV ESTIMATION
so we have that
Vt
Vt = 1 Et N 0, 1 1 , t
are timewise independent (note that this wouldn't be the ase if the
Et
were
auto orrelated).
11.4 IV estimation
The IV estimator may appear a bit unusual at rst, but it will grow on you over time.
The simultaneous equations model is
Y = XB + E
Considering the rst equation (this is without loss of generality, sin
e we
an always reorder
the equations) we
an partition the
matrix as
Y =
y
y Y1 Y2
Y1
are the other endogenous variables that enter the rst equation
Y2
Similarly, partition
as
X=
X1
X2
X1 X2
E=
Assume that
E12
has ones on the main diagonal. These are normalization restri tions that
simply s
ale the remaining
oe
ients on ea
h equation, and whi
h s
ale the varian
es of the
error terms.
Given this s
aling and our partitioning, the
oe
ient matri
es
an be written as
12
= 1 22
0
32
"
#
1 B12
B =
0 B22
152
CHAPTER 11.
y = Y1 1 + X1 1 +
= Z +
The problem, as we've seen is that
is orrelated with
sin e
Y1
is formed of endogs.
Now, let's onsider the general problem of a linear regression model with orrelation between regressors and the error term:
y = X +
iid(0, In 2 )
E(X ) 6= 0.
The present
ase of a stru
tural equation from a system of equations ts into this notation,
but so do other problems, su
h as measurement error or lagged dependent variables with
auto
orrelated errors.
with
PW = W (W W )1 W
so that anything that is proje
ted onto the spa
e spanned by
by the denition of
W.
PW y = PW X + PW
or
y = X +
Now we have that
and
E(X ) = E(X PW
PW )
= E(X PW )
and
PW X = W (W W )1 W X
is the tted value from a regression of
W,
on
W.
y = X +
will lead to a
onsistent estimator, given a few more assumptions.
This is the
generalized
11.4.
153
IV ESTIMATION
IV = (X PW X)1 X PW y
from whi
h we obtain
IV
= (X PW X)1 X PW (X + )
= + (X PW X)1 X PW
so
IV = (X PW X)1 X PW
X W (W W )1 W X
=
Now we
an introdu
e fa
tors of
IV =
X W
n
to get
W W 1
n
!
n
W W
n
QW W ,
X W
n
QXW ,
W p
n
W X
n
1
X W (W W )1 W
!1
X W
n
W W
n
1
W
n
a nite pd matrix
(= ols(X) )
then the plim of the rhs is zero. This last term has plim 0 sin e we assume that
and
un orrelated, e.g.,
E(Wt t ) = 0,
Given these assumtions the IV estimator is
onsistent
p
IV .
Furthermore, s
aling by
n IV =
n,
we have
X W
n
W W
n
1
W X
n
!1
X W
n
W W
n
d
W
n
then we get
N (0, QW W 2 )
d
1 2
n IV N 0, (QXW Q1
W W QXW )
1
are
154
CHAPTER 11.
QXW
and
QW W
is
2 = 1 y X
IV .
IV
y
d
IV
n
This estimator is onsistent following the proof of onsisten y of the OLS estimator of
2 ,
V (IV ) =
The IV estimator is
X W
IV
W W
is
1
W X
1 d
2
IV
1. Consistent
2. Asymptoti
ally normally distributed
3. Biased in general, sin
e even though
1 and
zero, sin
e (X PW X)
the estimator.
W.
W1 .
W2
IV
depends upon
QXW
and
QW W ,
may not be
W1
and
W2
su h that
W1 W2 ,
then the IV
There are spe ial ases where there is no gain (simultaneous equations is an example of
The penalty for indis riminant use of instruments is that the small sample bias of the
IV estimator rises as the number of instruments in reases. The reason for this is that
PW X
IV estimation
an
learly be used in the
ase of simultaneous equations. The only issue
is whi
h instruments to use.
plim n1 W = 0.
This matrix is
1 2
V (IV ) = (QXW Q1
W W QXW )
11.5.
155
The ne essary and su ient ondition for identi ation is simply that this matrix be
For this matrix to be positive denite, we need that the onditions noted above hold:
These identi ation onditions are not that intuitive nor is it very obvious how to he k
positive denite, and that the instruments be (asymptoti ally) un orrelated with
QW W
QXW
).
them.
y = Z +
where
Z=
Notation:
Let
Let
K = cols(X1 )
Let
Y1 X1
G = cols(Y1 ) + 1
K = K K
be the
G = G G
be
Now the
X1
don't lead to an identied model then no other instruments will identify the model
either. Assuming this is true (we'll prove it in a moment), then a ne
essary
ondition
for identi
ation is that
must be used twi
e, so
cols(X2 ) cols(Y1 )
order ondition
the only identifying information is ex
lusion restri
tions on the variables that enter an
equation, then the number of ex
luded exogs must be greater than or equal to the number
of in
luded endogs, minus 1 (the normalized lhs endog), e.g.,
K G 1
156
CHAPTER 11.
To show that this is in fa t a ne essary ondition onsider some arbitrary set of instruments
W.
1
plim W Z
n
where
Z=
Re
all that we've partitioned the model
= K + G 1
Y1 X1
Y = XB + E
as
Y =
X=
y Y1 Y2
X1 X2
Y = X + V
we
an write the redu
ed form using the same partition
y Y1 Y2
X1 X2
"
11 12 13
21 22 23
v V1 V2
so we have
Y1 = X1 12 + X2 22 + V1
so
Be ause the
V1
i
h
1
1
W Z = W X1 12 + X2 22 + V1 X1
n
n
V1
and
i
h
1
1
plim W Z = plim W X1 12 + X2 22 X1
n
n
Sin
e the far rhs term is formed only of linear
ombinations of
olumns of
matrix
an never be greater than
than
K,
X,
has more
olumns we
have
G 1 + K > K
or noting that
K = K K ,
G 1 > K
In this ase, the limiting matrix is not of full olumn rank, and the identi ation ondition
11.5.
157
fails.
Yt = Xt B + Et
V (Et ) =
This leads to the redu
ed form
Yt = Xt B1 + Et 1
= Xt + Vt
V (Vt ) = 1 1
=
The redu ed form parameters are onsistently estimable, but none of them are known
a priori,
and there are no restri
tions on their values. The problem is that more than one stru
tural
form has the same redu
ed form, so knowledge of the redu
ed form parameters alone isn't
enough to determine the stru
tural parameters. To see this,
onsider the model
Yt F
= Xt BF + Et F
V (Et F ) = F F
where
GG
Yt = Xt BF (F )1 + Et F (F )1
= Xt BF F 1 1 + Et F F 1 1
= Xt B1 + Et 1
= Xt + Vt
Likewise, the
ovarian
e of the rf of the transformed model is
V (Et F (F )1 ) = V (Et 1 )
=
Sin
e the two stru
tural forms lead to the same rf, and the rf is all that is dire
tly estimable,
the models are said to be
observationally equivalent.
158
CHAPTER 11.
restri tions on
and
equations are to be identied). Take the oe ient matri es as partitioned before:
"
12
=
0
1
0
22
32
B12
B22
The
oe
ients of the rst equation of the transformed model are simply these
oe
ients
multiplied by the rst
olumn of
"
F.
#"
This gives
f11
F2
12
=
0
1
0
#
22 "
f11
32
F
2
B12
B22
For identi
ation of the rst equation we need that there be enough restri
tions so that the
only admissible
"
f11
F2
12
1
0
#
22 "
1
f11
=
32
F
0
B12
1
0
B22
"
32
B22
F2 =
"
0
0
"
32
B22
#!
= cols
"
32
B22
#!
=G1
then the only way this
an hold, without additional restri
tions on the model's parameters, is
if
F2
F2
1 12
"
f11
F2
= 1 f11 = 1
11.5.
159
Therefore, as long as
"
then
"
B22
#!
"
32
f11
F2
=G1
1
0G1
The rst equation is identied in this
ase, so the
ondition is su
ient for identi
ation. It is
also ne
essary, sin
e the
ondition implies that this submatrix must have at least
Sin
e this matrix has
G1
rows.
G + K = G G + K
rows, we obtain
G G + K G 1
or
K G 1
whi
h is the previously derived ne
essary
ondition.
The above result is fairly intuitive (draw pi
ture here). The ne
essary
ondition ensures
that there are enough variables not in the equation of interest to potentially move the other
equations, so as to tra
e out the equation of interest. The su
ient
ondition ensures that
those other equations in fa
t do move around as the variables
hange their values.
Some
points:
K = G 1,
is is
in that omission of an
K > G 1,
the equation is
overidentied,
When
an equation is overidentied we have more instruments than are stri
tly ne
essary for
onsistent estimation. Sin
e estimation by IV with more instruments is more e
ient
asymptoti
ally, one should employ overidentifying restri
tions if one is
ondent that
they're true.
We an repeat this partition for ea h equation in the system, to see whi h equations are
These results are valid assuming that the only identifying information omes from know-
ing whi
h variables appear in whi
h equations, e.g., by ex
lusion restri
tions, and through
the use of a normalization. There are other sorts of identifying information that
an be
used. These in
lude
160
CHAPTER 11.
4. Nonlinearities in variables
When these sorts of information are available, the above
onditions aren't ne
essary for
identi
ation, though they are of
ourse still su
ient.
Y = XB + E
where
system
This is a
triangular
y1 = XB1 + E1
Sin
e only exogs appear on the rhs, this equation is identied.
y2 = 21 y1 + XB2 + E2
This equation has
K = 0
G = 2
21 = 0,
E(y1t 2t ) = E (Xt B1 + 1t )2t = 0
This is known as a
fully re ursive
11.5.
161
Consumption:
Investment:
Wtp = 0 + 1 Xt + 2 Xt1 + 3 At + 3t
Private Wages:
Xt = Ct + It + Gt
Output:
Prots:
Capital Sto
k:
It = 0 + 1 Pt + 2 Pt1 + 3 Kt1 + 2t
1t
Pt = Xt Tt Wtp
Kt = Kt1 + It
11 12 13
2t IID 0 ,
0
3t
The other variables are the government wage bill,
ing,
Gt ,and
a time trend,
At .
Wtg ,
22 23
33
taxes,
Tt ,
Yt =
Ct It Wtp Xt Pt Kt
Xt
Wtg
The model assumes that the errors of the equations are
ontemporaneously
orrelated, by
nonauto
orrelated. The model written as
Y = XB + E
3 0
0
0
0
B=
1 0
0 0 0 0 0
3 0
2 2 0
0
3 0
1 0
0
0 1 0
0 0
0
0 0
0
0 0
1
0 0
0
0 0
1
0
1 0
1
0
0
1
1 0
1
0
1 1
1 1 0
gives
162
CHAPTER 11.
don't
32
and
B22 ,
the
the rows that have zeros in the rst olumn, and we need to drop the rst olumn. We get
"
32
B22
1 0
1 1
1 0
1 0
3 0
For example,
A=
0
0
3
0
0
3
0
0 0
0 1 0
0 0
0
0 0
1
1 0
This matrix is of full rank, so the su ient ondition for identi ation is met.
in luded endogs, G
= 3,
= 5,
Counting
so
K L = G 1
5L
L
=31
=3
The equation is over-identied by three restri
tions, a
ording to the
ounting rules,
whi
h are
orre
t when the only identifying information are the ex
lusion restri
tions.
However, there is additional information in this
ase.
Both
Wtp
and
Wtg
onsumption equation, and their oe ients are restri ted to be the same.
enter the
For this
11.6 2SLS
When we have no information regarding
ross-equation restri
tions or the stru
ture of the
error
ovarian
e matrix, one
an estimate the parameters of a single equation of the system
without regard to the other equations.
This isn't always e
ient, as we'll see, but it has the advantage that misspe
i
ations
in other equations will not ae
t the
onsisten
y of the estimator of the parameters of
11.6.
163
2SLS
Also, estimation of the equation won't be ae
ted by identi
ation problems in other
equations.
Y1
is regressed on
all
the
Y1 = X(X X)1 X Y1
= PX Y1
1
= X
Sin
e these tted values are the proje
tion of
Y1
1
ve
tor in this spa
e is un
orrelated with by assumption, Y
simply the redu
ed-form predi
tion, it is
orrelated with
X,
is un orrelated with
. Sin e Y1
is
the instruments be linearly independent. This should be the
ase when the order
ondition is
satised, sin
e there are more
olumns in
The se
ond stage substitutes
Y1
X2
than in
in pla e of
Y1 ,
Y1
in this ase.
is
y = Y1 1 + X1 1 +
= Z +
and the se
ond stage model is
y = Y1 1 + X1 1 + .
Sin
e
X1
X, PX X1 = X1 ,
as
y = PX Y1 1 + PX X1 1 +
PX Z +
The OLS estimator applied to this model is
= (Z PX Z)1 Z PX y
whi
h is exa
tly what we get if we estimate using IV, with the redu
ed form predi
tions of the
endogs used as instruments. Note that if we dene
Z = PX Z
h
i
=
Y1 X1
164
CHAPTER 11.
so that
Z,
then we an write
= (Z Z)1 Z y
Important note:
estimate of
However
, sin e we see that it's equivalent to IV using a parti ular set of instruments.
A tually, there is also a simpli ation of the general IV varian e formula. Dene
Z = PX Z
h
i
=
Y X
The IV
ovarian
e estimator would ordinarily be
1
1
2
= Z Z
V ()
Z Z Z Z
IV
However, looking at the last term in bra
kets
Z Z =
but sin
e
PX
Y1 X1
i h
Y1 X1
i h
Y1 X1
PX X = X,
Y1 X1
=
=
"
h
"
X1 X1
we an write
Y1 PX PX Y1 Y1 PX X1
X1 X1
X1 PX Y1
i h
i
Y1 X1
Y1 X1
= Z Z
Therefore, the se ond and last term in the varian e formula an el, so the 2SLS var ov estimator simplies to
1
2
= Z Z
V ()
IV
1
2
= Z Z
V ()
IV
Finally, re
all that though this is presented in terms of the rst equation, it is general sin
e
any equation
an be pla
ed rst.
Properties of 2SLS:
1. Consistent
2. Asymptoti
ally normal
11.7.
165
3. Biased when the mean esists (the existen
e of moments is a te
hni
al issue we won't go
into here).
4. Asymptoti ally ine ient, ex ept in spe ial ir umstan es (more on this later).
the model.
As su h, there is room for error here: one might erroneously lassify a variable as
exog when it is in fa
t
orrelated with the error term. A general test for the spe
i
ation on
the model
an be formulated as follows:
The IV estimator
an be
al
ulated by applying OLS to the transformed model, so the IV
obje
tive fun
tion at the minimized value is
s(IV ) = y X IV PW y X IV ,
but
IV
= y X IV
= y X(X PW X)1 X PW y
= I X(X PW X)1 X PW y
= I X(X PW X)1 X PW (X + )
= A (X + )
where
A I X(X PW X)1 X PW
so
Moreover,
A PW A
A PW A =
=
=
Furthermore,
s(IV ) = + X A PW A (X + )
I PW X(X PW X)1 X PW I X(X PW X)1 X PW
PW PW X(X PW X)1 X PW PW PW X(X PW X)1 X PW
I PW X(X PW X)1 X PW .
is orthogonal to
AX =
I X(X PW X)1 X PW X
= X X
= 0
166
CHAPTER 11.
so
s(IV ) = A PW A
Supposing the
2 ,
A PW A
s(IV )
=
2
2
is a quadrati
form of a
N (0, 1)
s(IV )
2 ((A PW A))
2
This isn't available, sin
e we need to estimate
Even if the
2 .
s(IV ) a 2
((A PW A))
c
2
The last
A PW A = PW PW X(X PW X)1 X PW
so
(A PW A) = T r PW PW X(X PW X)1 X PW
= T rW (W W )1 W KX
= T rW W (W W )1 KX
= KW KX
where
KW
and
KX
X.
The degrees of freedom of the test is simply the number of overidentifying restri
tions:
the number of instruments we have beyond the number that is stri
tly ne
essary for
onsistent estimation.
and
that the
y = Z +
and
This is a parti
ular
ase of the GMM
riterion test, whi
h is
overed in the se
ond half
of the
ourse. See Se
tion 15.8.
IV = A
11.7.
167
and
s(IV ) = A PW A
we
an write
s(IV )
c2
where
Ru2
W (W W )1 W W (W W )1 W
=
/n
= n(RSSIV |W /T SSIV )
= nRu2
is the un entered
instruments
W.
R2
IV
y = X +
and
so
W X
cols(W ) = cols(X),
PW y = PW X + PW
and the fon
are
X PW (y X IV ) = 0
The IV estimator is
IV = X PW X
Considering the inverse here
X PW X
IV
1
X PW y,
1
X PW y
X W (W W )1 W X
1
1
= (W X)1 X W (W W )1
1
= (W X)1 (W W ) X W
we obtain
= (W X)1 (W W ) X W
= (W X)1 (W W ) X W
= (W X)1 W y
1
1
X PW y
X W (W W )1 W y
168
CHAPTER 11.
s(IV ) =
=
=
=
=
y X IV PW y X IV
X PW y X IV
y PW y X IV IV
X PW y + IV
X PW X IV
y PW y X IV IV
X PW y + X PW X IV
y PW y X IV IV
y PW y X IV
by the fon for generalized IV. However, when we're in the just indentied ase, this is
s(IV ) = y PW y X(W X)1 W y
= y PW I X(W X)1 W y
= y W (W W )1 W W (W W )1 W X(W X)1 W y
= 0
The value of the obje
tive fun
tion of the IV estimator is zero in the just identied
ase.
makes sense, sin
e we've already shown that the obje
tive fun
tion after dividing by
asymptoti
ally
This
is
2 (0)
mean 0 and varian
e 0, e.g., it's simply 0. This means we're not able to test the identifying
restri
tions in the
ase of exa
t identi
ation.
Y = XB + E
E(X E) = 0(KG)
vec(E) N (0, )
Et
are individually
11.8.
169
11 In 12 In 1G In
.
.
.
22 In
..
.
.
.
GG In
= In
This means that the stru
tural equations are heteros
edasti
and
orrelated with one
another
In general, ignoring this will lead to ine
ient estimation, following the se
tion on GLS.
When equations are
orrelated with one another estimation should a
ount for the
orrelation in order to obtain e
ien
y.
Also, sin
e the equations are
orrelated, information about one equation is impli
itly information about all equations. Therefore, overidenti
ation restri
tions in any equation
improve e
ien
y for
all
Single equation methods
an't use these types of information, and are therefore ine
ient
(in general).
11.8.1 3SLS
Note: It is easier and more pra
ti
al to treat the 3SLS estimator as a generalized method of
moments estimator (see Chapter 15). I no longer tea
h the following se
tion, but it is retained
Another alternative is to use FIML (Subse
tion 11.8.2),
if you are willing to make distributional assumptions on the errors. This is
omputationally
feasible with modern
omputers.
Following our above notation, ea
h stru
tural equation
an be written as
yi = Yi 1 + Xi 1 + i
= Zi i + i
Grouping the
y1
y2
.
.
.
yG
or
Z1 0
0
=
.
.
.
0
.
.
.
Z2
..
0
ZG
y = Z +
.
..
2
+ .
.
.
G
170
CHAPTER 11.
E( ) =
= In
The 3SLS estimator is just 2SLS
ombined with a GLS
orre
tion that takes advantage of the
stru
ture of
Dene
as
X(X X)1 X Z1 0
Z = .
..
Y1 X1
= .
.
.
0
X(X X)1 X Z2
.
.
.
..
0
Y2 X2
0
0
.
.
.
..
unrestri ted
0
YG XG
X(X X)1 X ZG
the exogs. The distin tion is that if the model is overidentied, then
= B1
may be subje
t to some zero restri
tions, depending on the restri
tions on
does not impose these restri
tions.
and
B,
and
= (Z Z)1 Z y
as
an be veried by simple multipli
ation, and noting that the inverse of a blo
k-diagonal
matrix is just the matrix with the inverses of the blo
ks on the main diagonal.
This IV
estimator still ignores the
ovarian
e information. The natural extension is to add the GLS
transformation, putting the inverse of the error
ovarian
e into the formula, whi
h gives the
3SLS estimator
3SLS
1
Z ( In )1 Z
Z ( In )1 y
1 1
=
Z In y
Z 1 In Z
=
residuals:
i = yi Zi i,2SLS
11.8.
171
(IMPORTANT NOTE:
estimated by
ij =
Substitute
Zi ,
not
Zi ).
i, j
of
is
i j
n
Analogously to what we did in the
ase of 2SLS, the asymptoti
distribution of the 3SLS
estimator
an be shown to be
!
Z ( I )1 Z 1
a
n
n 3SLS N 0, lim E
n
A formula for estimating the varian
e of the 3SLS estimator in nite samples (
an
elling out
the powers of
n)
is
1
1 In Z
V 3SLS = Z
In the ase that all equations are just identied, 3SLS is numeri ally equivalent to 2SLS.
tion.
Proving this is easiest if we use a GMM interpretation of 2SLS and 3SLS. GMM is
presented in the next e
onometri
s
ourse. For now, take it on faith.
al ulated equation by
= (X X)1 X Y
whi h is simply
= (X X)1 X
all
y1 y2 yG
It may seem odd that we use OLS on the redu ed form, sin e the rf equations are orrelated:
Yt = Xt B1 + Et 1
= Xt + Vt
and
Vt = 1 Et N 0, 1 1 , t
= 1 1
172
CHAPTER 11.
y1
y2
.
.
.
yG
where
yi
is the
n1
X 0
0
=
.
.
.
.
.
0
.
G
X
.
.
.
X
..
th
olumn of
exogs, i is the i
ith
v1
v2
+ .
.
.
vG
endog,
th
olumn of
and vi is the i
V.
is the entire
nK
matrix of
y = X + v
to indi
ate the pooled model. Following this notation, the error
ovarian
e matrix is
V (v) = In
(SUR)
sin e the parameter ve tor of ea h equation is dierent. The equations are on-
Xi
for ea h
equation.
Note that ea h equation of the system individually satises the lassi al assumptions.
However, pooled estimation using the GLS orre tion is more e ient, sin e equationby-equation estimation is equivalent to pooled estimation, sin e
X is blo k diagonal,
but
Xi
X = In X.
1.
(A B)1 = (A1 B 1 )
2.
(A B) = (A B )
and
OLS.
11.8.
3.
SU R =
=
=
=
173
we get
1
(In X) ( In )1 (In X)
(In X) ( In )1 y
1 1
1 X (In X)
X y
(X X)1 1 X y
IG (X X)1 X y
.2
.
.
11.8.2 FIML
Full information maximum likelihood is an alternative estimation method.
FIML will be
asymptoti
ally e
ient, sin
e ML estimators based on a given information set are asymptoti
ally e
ient w.r.t. all other estimators that use the same information set, and in the
ase
of the full-information ML estimator we use the entire information set. The 2SLS and 3SLS
estimators don't require distributional assumptions, while FIML of
ourse does. Our model
is, re
all
Yt = Xt B + Et
Et N (0, ), t
E(Et Es ) = 0, t 6= s
The joint normality of
Et
g/2
(2)
The transformation from
Et
to
Yt
det
1 1/2
Et
1
exp Et 1 Et
2
| det
dEt
| = | det |
dYt
174
CHAPTER 11.
Yt
is
1/2
1
exp Yt Xt B 1 Yt Xt B
2
Given the assumption of independen e over time, the joint log-likelihood fun tion is
ln L(B, , ) =
nG
n
1X
ln(2)+n ln(| det |) ln det 1
Yt Xt B 1 Yt Xt B
2
2
2 t=1
This is a nonlinear in the parameters obje tive fun tion. Maximixation of this an be
It turns out that the asymptoti distribution of 3SLS and FIML are the same,
One an al ulate the FIML estimator by iterating the 3SLS estimator, thus avoiding
done using iterative numeri methods. We'll see how to do this in the next se tion.
assuming
1. Cal ulate
3SLS
2. Cal ulate
=B
3SLS
1 .
3SLS
and
3SLS
B
as normal.
This is new, we didn't estimate
This estimator may have some zeros in it. When Greene says iterated 3SLS doesn't
lead to FIML, he means this for a pro
edure that doesn't update
updates
and
and
If you update
Y = X
you
do
but only
onverge to FIML.
and al ulate
using
and
to get the
FIML is fully e
ient, sin
e it's an ML estimator that uses all information. This implies
that 3SLS is fully e
ient
Also, if ea h equation
is just identied and the errors are normal, then 2SLS will be fully e
ient, sin
e in this
ase 2SLS3SLS.
When the errors aren't normally distributed, the likelihood fun
tion is of
ourse dierent
than what's written above.
CONSUMPTION EQUATION
11.9.
175
*******************************************************
2SLS estimation results
Observations 21
R-squared 0.976711
Sigma-squared 1.044059
Constant
Profits
Lagged Profits
Wages
estimate
16.555
0.017
0.216
0.810
st.err.
1.321
0.118
0.107
0.040
t-stat.
12.534
0.147
2.016
20.129
p-value
0.000
0.885
0.060
0.000
*******************************************************
INVESTMENT EQUATION
*******************************************************
2SLS estimation results
Observations 21
R-squared 0.884884
Sigma-squared 1.383184
Constant
Profits
Lagged Profits
Lagged Capital
estimate
20.278
0.150
0.616
-0.158
st.err.
7.543
0.173
0.163
0.036
t-stat.
2.688
0.867
3.784
-4.368
p-value
0.016
0.398
0.001
0.000
*******************************************************
WAGES EQUATION
*******************************************************
2SLS estimation results
Observations 21
R-squared 0.987414
Sigma-squared 0.476427
Constant
Output
Lagged Output
Trend
estimate
1.500
0.439
0.147
0.130
st.err.
1.148
0.036
0.039
0.029
t-stat.
1.307
12.316
3.777
4.475
p-value
0.209
0.000
0.002
0.000
176
CHAPTER 11.
*******************************************************
The above results are not valid (spe
i
ally, they are in
onsistent) if the errors are auto
orrelated, sin
e lagged endogenous variables will not be valid instruments in that
ase. You
might
onsider eliminating the lagged endogenous variables as instruments, and re-estimating
by 2SLS, to obtain
onsistent parameter estimates in this more
omplex
ase. Standard errors
will still be estimated in
onsistently, unless use a Newey-West type
ovarian
e estimator. Food
for thought...
Chapter 12
Introdu
tion to the se
ond half
We'll begin with study of
on a sample of size
sn (Zn , )
Zn
n.
Let
over a set
is
We'll usually write the obje tive fun tion suppressing the dependen e on
Zn .
= (X X)1 X y.
arg max Ln () =
n
Y
(2)
yt IIN ( 0 , 1).
1/2
t=1
(yt )2
exp
2
(0, ),
as
same
n
X
(yt )2
t=1
.
= y
sup-
MLE estimators are asymptoti ally e ient (Cramr-Rao lower bound, Theorem3),
One an investigate the properties of an ML estimator supposing that the distributional
posing the strong distributional assumptions upon whi
h they are based are true.
assumptions are in
orre
t. This gives a
177
178
CHAPTER 12.
yt
from the
1 = 1 ( 0 )
is a
distribution.
Here,
is the
2 ( 0 )
i.e., 1 ( 0 ) .
moment-parameter equation.
1 ( 0 ) = 0 ,
though in general
the relationship may be more ompli ated. The sample rst moment is
c1 =
n
X
yt /n.
t=1
Dene
The method of moments prin iple is to hoose the estimator of the parameter to set the
m1 () = 1 ()
c1
0.
, i.e., m1 ()
Then
In this ase,
=
m1 ()
Sin
e
Pn
t=1 yt /n
n
X
yt /n = 0.
t=1
V (yt ) = E yt 0
Dene
m2 () = 2
2
2 ( 0 )
r.v. is
= 2 0 .
Pn
t=1 (yt
y)2
= 2
m2 ()
Pn
t=1 (yt
y)2
0.
Again, by the LLN, the sample varian e is onsistent for the true varian e, that is,
Pn
t=1 (yt
y)2
2 0 .
179
So,
Pn
y)2
t=1 (yt
2n
overidenti a-
tion, whi
h means that we have more information than is stri
tly ne
essary for
onsistent
estimation of the parameter.
The GMM
ombines information from the two moment-parameter equations to form a
new estimator whi
h will be
m1t (),
i.e.,
m1t () = yt .
m1 () = 1/n
=
n
X
t=1
n
X
m1 ()
is the sample
m1t ()
yt /n.
t=1
0.
0 , both E m1t ( 0 ) = 0 and E m1 ( 0 ) =
m2 () = 2
Again, it is
lear from the LLN that
either
=0
m1 ()
or
= 0.
m2 ()
Pn
t=1 (yt
a.s.
m2 ( 0 ) 0.
y)2
to set
simultaneously.
d(m()),
where
m() =
and hoosing
d(m) = m Am,
where
it's lear that the MM gives onsistent estimates if there is a one-to-one relationship between
180
CHAPTER 12.
parameters and moments, it's not immediately obvious that the GMM estimator is
onsistent.
(We'll see later that it is.)
These examples show that these widely used estimators may all be interpreted as the
solution of an optimization problem.
useful for its generality.
We will see that the general results extend smoothly to the more
general, we will study the GMM estimator, then QML and NLS. The reason we study GMM
rst is that LS, IV, NLS, MLE, QML and other well-known parametri
estimators may all be
interpreted as spe
ial
ases of the GMM estimator, so the general results on GMM
an simplify
and unify the treatment of these other estimators. Nevertheless, there are some spe
ial results
on QML and NLS, and both are important in empiri
al resear
h, whi
h makes fo
us on them
useful.
linear models aren't useful. Linear models are more general than they might rst appear, sin
e
one
an employ nonlinear transformations of the variables:
0 (yt ) =
For example,
0 + t
In spite of this generality, situations often arise whi
h simply
an not be
onvin
ingly represented by linear in the parameters models. Also, theory that applies to nonlinear models also
applies to linear models, so one may as well start o with the general
ase.
xi =
ith
of
goods is
v(p, y)/pi
.
v(p, y)/y
An expenditure share is
so ne essarily
si [0, 1],
and
PG
i=1 si
si pi xi /y,
= 1.
xi
or
si
with
a parameter spa
e that is dened independent of the data
an guarantee that either of these
onditions holds. These
onstraints will often be violated by estimated linear models, whi
h
alls into question their appropriateness in
ases of this sort.
181
binary response) models. Individuals are asked if they would pay an amount
a proje
t. Indire
t utility in the base
ase (no proje
t) is
1
utility is v (m, z)
i , i
1
0
1}
| {z
= 0 1 ,
Dene
y =1
let
olle t
<
= 1, 2,
to pay
A for provision of
et .
After provision,
v 1 (m A, z) v 0 (m, z)
{z
}
|
v(w, A)
and
agreement is
(12.1)
v 1 (m, z) = m
v 0 (m, z) = m
and
and
are i.i.d.
in
ome, preferen
es in both states are homotheti
, and a spe
i
distributional assumption is
made on the distribution of preferen
es in the population. With these assumptions (the details
are unimportant here, see arti
les by D. M
Fadden if you're interested) it
an be shown that
p(A, ) = ( + A) ,
where
(z) is
(z) = (1 + exp(z))1 .
This is the simple logit model:
is either
is
( + A)
. Thus, we an write
y = ( + A) +
E() = 0.
One
ould estimate this by (nonlinear) least squares
1
1X
(y ( + A))2
,
= arg min
n t
We assume here that responses are truthful, that is there is no strategi behavior and that individuals are
182
CHAPTER 12.
( + A)
A,
there are no
, (A)
su h that
( + A) = (A) , A
where
(A)
is a
1,
p-ve
tor
,
we an always nd a
and
is a
whi h is illogi al, sin e it is the expe tation of a 0/1 binary random variable.
Sin e this
sort of problem o
urs often in empiri
al work, it is useful to study NLS and other nonlinear
models.
After dis
ussing these estimation methods for parametri
models we'll briey introdu
e
f (xt )
onsistently when we are not willing to assume that a model of the form
yt = f (xt ) + t
an be restri
ted to a parametri
form
yt = f (xt , ) + t
Pr(t < z) = F (z|, xt )
,
where
f ()
and perhaps
F (z|, xt )
e
onomi
theory gives us general information about fun
tions and the signs of their derivatives,
but not about their spe
i
form.
Then we'll look at simulation-based methods in e
onometri
s. These methods allow us to
substitute
omputer power for mental power.
heap
ompared to mental eort, any e
onometri
ian who lives by the prin
iples of e
onomi
theory should be interested in these te
hniques.
Finally, we'll look at how e
onometri
omputations
an be done in parallel on a
luster of
omputers. This allows us to harness more
omputational power to work with more
omplex
models that
an be dealt with using a desktop
omputer.
Chapter 13
Numeri
optimization methods
Readings:
This se tion gives a very brief introdu tion to what is a large literature on nu-
meri
optimization methods. We'll
onsider a few well-known te
hniques, and one fairly new
te
hnique that may allow one to solve di
ult problems.
familiar with the issues, and to learn how to use the BFGS algorithm at the pra
ti
al level.
The general problem we
onsider is how to nd the maximizing element
of a fun
tion
s().
(a K
-ve tor)
This fun tion may not be ontinuous, and it may not be dierentiable.
s()
e.g.,
1
s() = a + b + C,
2
the rst order
onditions would be linear:
D s() = b + C
so the maximizing (minimizing) element would be
= C 1 b.
have with linear models estimated by OLS. It's also the
ase for feasible GLS, sin
e
onditional
on the estimate of the var
ov matrix, we have a quadrati
obje
tive fun
tion in the remaining
parameters.
More general problems will not have linear f.o.
., and we will not be able to solve for the
maximizer analyti
ally. This is when we need a numeri
optimization method.
13.1 Sear
h
The idea is to
reate a grid over the parameter spa
e and evaluate the fun
tion at ea
h point
on the grid. Sele
t the best point. Then rene the grid in the neighborhood of the best point,
and
ontinue until the a
ura
y is good enough. See Figure
183
??.
184
CHAPTER 13.
the grid is ne enough in relationship to the irregularity of the fun
tion to ensure that sharp
peaks are not missed entirely.
To
he
k
values in ea h dimension of a
q = 100
and
K = 10,
there would be
10010
points to he k. If 1000
9
points
an be
he
ked in a se
ond, it would take 3. 171 10 years to perform the
al
ulations,
whi h is approximately the age of the earth. The sear h method is a very reasonable hoi e if
is moderate or large.
k+1
given
k
and
hoosing the dire
tion of movement, d , whi
h is of the same dimension of
ak
,
(a s alar)
so that
(k+1) = (k) + ak dk .
s( + ad)
>0
a
d,
k
k
all be represented as Q g( ) where
the gradient at
Qk
g () = D s()
is
a0 = 0
we need g() d
> 0.
the
o(1)
Dening
term an be ignored. If
d = Qg(),
where
and
Qg()
13.2.
185
DERIVATIVE-BASED METHODS
13.1.
(k+1) = (k) + ak Qk g( k )
and we keep going until the gradient be
omes zero, so that there is no in
reasing dire
tion.
The problem is how to
hoose
Conditional on Q,
and
hoosing
Q.
a
is fairly straightforward.
is a s alar.
Q.
gradient provides the dire tion of maximum rate of hange of the obje tive fun tion.
186
CHAPTER 13.
Disadvantages:
This doesn't always work too well however (draw pi ture of banana
fun tion).
13.2.3 Newton-Raphson
The Newton-Raphson method uses information about the slope and
urvature of the obje
tive
fun
tion to determine whi
h dire
tion and how far to move from an initial point. Supposing
we're trying to maximize
sn ().
sn ()
k
about (an initial guess).
sn () sn ( k ) + g( k ) k + 1/2 k H( k ) k
To attempt to maximize
depends on
sn (),
i.e., we
an maximize
s() = g( k ) + 1/2 k H( k ) k
with respe t to
. This is a mu h
, so it has
D s() = g( k ) + H( k ) k
k+1 = k H( k )1 g( k )
This is illustrated in Figure
??.
sn ()
k+1 = k ak H( k )1 g( k )
A potential problem is that the Hessian may not be negative denite when we're far from
the maximizing point. So
H( k )1
H( k )1 g( k )
may not dene an in
reasing dire
tion of sear
h. This
an happen when the obje
tive
fun
tion has at regions, in whi
h
ase the Hessian matrix is very ill-
onditioned (e.g.,
is nearly singular), or when we're in the vi
inity of a lo
al minimum,
denite, and our dire
tion is a
de reasing
H( k )
is positive
puters are subje
t to large errors when the matrix is ill-
onditioned. Also, we
ertainly
don't want to go in the dire
tion of a minimum when we're maximizing. To solve this
problem,
Quasi-Newton
H()
to
is
where
13.2.
187
DERIVATIVE-BASED METHODS
is an order of magnitude (in the dimension of the parameter ve
tor) more
ostly than
al
ulation of the gradient. They
an be done to ensure that the approximation is p.d.
DFP and BFGS are two well-known examples.
Stopping
riteria
The last thing we need is to de
ide when to stop. A digital
omputer is subje
t to limited
ma
hine pre
ision and round-o errors. For these reasons, it is unreasonable to hope that a
program
an
jk jk1
jk1
| < 2 , j
|s( k ) s( k1 )| < 3
|gj ( k )| < 4 , j
Also, if we're maximizing, it's good to
he
k that the last round (real, not approximate)
Hessian is negative denite.
Starting values
The Newton-Raphson and related algorithms work well if the obje
tive fun
tion is
on
ave
(when maximizing), but not so well if there are
onvex regions and lo
al minima or multiple
lo
al maxima. The algorithm may
onverge to a lo
al minimum or to a lo
al maximum that
is not optimal. The algorithm may also have di
ulties
onverging at all.
The usual way to ensure that a global maximum has been found is to use many dierent
starting values, and
hoose the solution that returns the highest obje
tive fun
tion value.
sn ()
is ompli ated.
188
CHAPTER 13.
Possible solutions are to
al
ulate derivatives numeri
ally, or to use programs su
h as MuPAD
1
or Mathemati
a to
al
ulate analyti
derivatives. For example, Figure 13.2 shows MuPAD
al
ulating a derivative that I didn't know o the top of my head, and one that I did know.
Numeri
derivatives are less a
urate than analyti
derivatives, and are usually more
ostly to evaluate. Both fa
tors usually
ause optimization programs to be less su
essful
when numeri
derivatives are used.
One advantage of numeri
derivatives is that you don't have to worry about having made
an error in
al
ulating the analyti
derivative. When programming analyti
derivatives
it's a good idea to
he
k that they are
orre
t by using numeri
derivatives. This is a
lesson I learned the hard way when writing my thesis.
Numeri
se
ond derivatives are mu
h more a
urate if the data are s
aled so that the
elements of the gradient are of the same order of magnitude. Example: if the model is
yt = h(xt + zt ) + t ,
D sn () = 0.001.
D sn ()
= /1000;
and
D sn ()
xt
= 1000xt
D sn () = 1000
1000; zt
and
= zt /1000.
will both be 1.
MuPAD is not a freely distributable program, so it's not on the CD. You an download it from
http://www.mupad.de/download.shtml
13.3.
189
SIMULATED ANNEALING
In general, estimation programs always work better if data is s
aled in this way, sin
e
roundo errors are less likely to be
ome important.
There are algorithms (su
h as BFGS and DFP) that use the sequential gradient evaluations to build up an approximation to the Hessian. The iterations are faster for this
reason sin
e the a
tual Hessian isn't
al
ulated, but more iterations usually are required
for
onvergen
e.
sele
ts evaluation points, a
epts all points that yield an in
rease in the obje
tive fun
tion,
but also a
epts some points that de
rease the obje
tive fun
tion. This allows the algorithm
to es
ape from lo
al minima. As more and more points are tried, periodi
ally the algorithm
fo
uses on the best point so far, and redu
es the range over whi
h random points are generated. Also, the probability that a negative move is a
epted redu
es. The algorithm relies on
many evaluations, as in the sear
h method, but fo
uses in on promising areas, whi
h redu
es
fun
tion evaluations with respe
t to the sear
h method. It does not require derivatives to be
evaluated. I have a program to do this if you're interested.
13.4 Examples
This se
tion gives a few examples of how some nonlinear models may be estimated using
maximum likelihood.
y = g(x)
y = 1(y > 0)
P r(y = 1) = F [g(x)]
p(x, )
The log-likelihood fun
tion is
190
CHAPTER 13.
sn () =
1X
(yi ln p(xi , ) + (1 yi ) ln [1 p(xi , )])
n
i=1
For the logit model (see the
ontingent valuation example above), the probability has the
spe
i
form
p(x, ) =
1
1 + exp(x)
You should download and examine LogitDGP.m , whi
h generates data a
ording to the
logit model, logit.m , whi
h
al
ulates the loglikelihood, and EstimateLogit.m , whi
h sets
things up and
alls the estimation routine, whi
h uses the BFGS algorithm.
Here are some estimation results with
n = 100,
= (0, 1) .
***********************************************
Trial of MLE estimation of Logit model
MLE Estimation Results
BFGS
onvergen
e: Normal
onvergen
e
Average Log-L: 0.607063
Observations: 100
onstant
slope
estimate
0.5400
0.7566
st. err
0.2229
0.2374
t-stat
2.4224
3.1863
p-value
0.0154
0.0014
Information Criteria
CAIC : 132.6230
BIC : 130.6230
AIC : 125.4127
***********************************************
13.4.2 Count Data: The MEPS data and the Poisson model
Demand for health
are is usually thought of a a derived demand: health
are is an input to
a home produ
tion fun
tion that produ
es health, and health is an argument of the utility
fun
tion. Grossman (1972), for example, models health as a
apital sto
k that is subje
t to
depre
iation (e.g., the ee
ts of ageing). Health
are visits restore the sto
k. Under the home
produ
tion framework, individuals de
ide when to make health
are visits to maintain their
health sto
k, or to deal with negative sho
ks to the sto
k in the form of a
idents or illnesses.
13.4.
191
EXAMPLES
As su
h, individual demand will be a fun
tion of the parameters of the individuals' utility
fun
tions.
The MEPS health data le ,
meps1996.data,
of health
are usage. The data is from the 1996 Medi
al Expenditure Panel Survey (MEPS).
You
an get more information at
http://www.meps.ahrq.gov/.
are o
e-based visits (OBDV), outpatient visits (OPV), inpatient visits (IPV), emergen
y
room visits (ERV), dental visits (VDV), and number of pres
ription drugs taken (PRESCR).
These form
olumns 1 - 6 of
meps1996.data.
(PUBLIC), private insuran
e (PRIV), sex (SEX), age (AGE), years of edu
ation (EDUC), and
in
ome (INCOME). These form
olumns 7 - 12 of the le, in the order given here. PRIV and
PUBLIC are 0/1 binary variables, where a 1 indi
ates that the person has a
ess to publi
or
private insuran
e
overage. SEX is also 0/1, where 1 indi
ates that the person is female. This
data will be used in examples fairly extensively in what follows.
The program ExploreMEPS.m shows how the data may be read in, and gives some des
riptive information about variables, whi
h follows:
All of the measures of use are
ount data, whi
h means that they take on the values
0, 1, 2, ....
It might be reasonable to try to use this information by spe ifying the density as a
ount data density. One of the simplest ount data densities is the Poisson density, whi h is
exp()y
.
y!
fY (y) =
The Poisson average log-likelihood fun
tion is
1X
sn () =
(i + yi ln i ln yi !)
n
i=1
i = exp(xi )
xi = [1 P U BLIC P RIV SEX AGE EDU C IN C]
(13.1)
This ensures that the mean is positive, as is required for the Poisson model. Note that for this
parameterization
j =
/j
so
j xj = xj ,
the elasti
ity of the
onditional mean of
j th
onditioning variable.
The program EstimatePoisson.m estimates a Poisson model using the full data set. The
results of the estimation, using OBDV as the dependent variable are here:
192
CHAPTER 13.
OBDV
******************************************************
Poisson model, MEPS 1996 full data set
MLE Estimation Results
BFGS
onvergen
e: Normal
onvergen
e
Average Log-L: -3.671090
Observations: 4564
onstant
pub. ins.
priv. ins.
sex
age
edu
in
estimate
-0.791
0.848
0.294
0.487
0.024
0.029
-0.000
st. err
0.149
0.076
0.071
0.055
0.002
0.010
0.000
t-stat
-5.290
11.093
4.137
8.797
11.471
3.061
-0.978
p-value
0.000
0.000
0.000
0.000
0.000
0.002
0.328
Information Criteria
CAIC : 33575.6881
Avg. CAIC: 7.3566
BIC : 33568.6881
Avg. BIC:
7.3551
AIC : 33523.7064
Avg. AIC:
7.3452
******************************************************
For example, it may be the duration of a strike, or the time needed to nd a
job on
e one is unemployed. Su
h variables take on values on the positive real line, and are
referred to as duration data.
A
spell
event.
is the period of time between the o uren e of initial event and the on luding
For example, the initial event ould be the loss of a job, and the nal event is the
t0
t1
For simpli
ity, assume that time is measured in years. The random variable
of the spell,
D = t1 t0 .
D, fD (t),
is the duration
13.4.
193
EXAMPLES
Several questions may be of interest. For example, one might wish to know the expe
ted
time one has to wait to nd a job given that one has already waited
years is
years is
fD (t)
.
1 FD (s)
fD (t|D > s) =
The expe tan ed additional time required for the spell to end given that is has already lasted
E = E(D|D > s) s =
Z
fD (z)
z
dz
1 FD (s)
To estimate this fun tion, one needs to spe ify the density
s.
fD (t)
as a parametri density,
et .
To illustrate appli
ation of this model, 402 observations on the lifespan of mongooses in
Serengeti National Park (Tanzania) were used to t a Weibull model. The spell in this
ase
is the lifetime of an individual mongoose. The parameter estimates and standard errors are
= 0.559 (0.034)
and
= 0.867 (0.033)
Figure 13.3
presents tted life expe
tan
y (expe
ted additional years of life) as a fun
tion of age, with
95%
onden
e bands. The plot is a
ompanied by a nonparametri
Kaplan-Meier estimate of
life-expe
tan
y. This nonparametri
estimator simply averages all spell lengths greater than
age, and then subtra
ts age. This is
onsistent by the LLN.
In the gure one
an see that the model doesn't t the data well, in that it predi
ts life
expe
tan
y quite dierently than does the nonparametri
model. For ages 4-6, the nonparametri
estimate is outside the
onden
e interval that results from the parametri
model,
whi
h
asts doubt upon the parametri
model.
seem to have a lower life expe
tan
y than is predi
ted by the Weibull model, whereas young
mongooses that survive beyond infan
y have a higher life expe
tan
y, up to a bit beyond 2
years. Due to the dramati
hange in the death rate as a fun
tion of t, one might spe
ify
fD (t)
2
1
fD (t|) = e(1 t) 1 1 (1 t)1 1 + (1 ) e(2 t) 2 2 (2 t)2 1 .
The parameters
and
i , i = 1, 2
is
194
CHAPTER 13.
13.5.
195
and
=1
are not identied. It is possible to take this into a ount, but this topi is out of the
s
ope of this
ourse. Nevertheless, the improvement in the likelihood fun
tion is
onsiderable.
The parameter estimates are
Parameter
Estimate
St. Error
0.233
0.016
1.722
0.166
1.731
0.101
1.522
0.096
0.428
0.035
Note that the mixture parameter is highly signi
ant. This model leads to the t in Figure
13.4.
Note that the parametri and nonparametri ts are quite lose to one another, up
to around
years.
The disagreement after this point is not too important, sin e less than
5% of mongooses live more than 6 years, whi
h implies that the Kaplan-Meier nonparametri
estimate has a high varian
e (sin
e it's an average of a small number of observations).
Mixture models are often an ee
tive way to model
omplex responses, though they
an
suer from overparameterization. Alternatives will be dis
ussed later.
matePoisson.m, the data will not be s
aled, and the estimation program will have di
ulty
onverging (it seems to take an innite amount of time). With uns
aled data, the elements
of the s
ore ve
tor have very dierent magnitudes at the initial value of
this run Che
kS
ore.m. With uns
aled data, one element of the gradient is very large, and the
maximum and minimum elements are 5 orders of magnitude apart. This
auses
onvergen
e
problems due to serious numeri
al ina
ura
y when doing inversions to
al
ulate the BFGS
dire
tion of sear
h. With s
aled data, none of the elements of the gradient are very large, and
the maximum dieren
e in orders of magnitude is 3. Convergen
e is qui
k.
196
CHAPTER 13.
13.5.
197
Let's try to nd the true minimizer of minus 1 times the foggy mountain fun
tion (sin
e the
algoritms are set up to minimize). From the pi
ture, you
an see it's
lose to
(0, 0),
but let's
pretend there is fog, and that we don't know that. The program FoggyMountain.m shows that
poor start values
an lead to problems. It uses SA, whi
h nds the true global minimum, and
it shows that BFGS using a battery of random start values
an also nd the global minimum
help. The output of one run is here:
198
CHAPTER 13.
======================================================
BFGSMIN final results
Used numeri
gradient
-----------------------------------------------------STRONG CONVERGENCE
Fun
tion
onv 1 Param
onv 1 Gradient
onv 1
-----------------------------------------------------Obje
tive fun
tion value -0.0130329
Stepsize 0.102833
43 iterations
-----------------------------------------------------param
gradient
hange
15.9999 -0.0000
0.0000
-28.8119 0.0000
0.0000
The result with poor start values
ans =
16.000 -28.812
================================================
SAMIN final results
NORMAL CONVERGENCE
Fun
. tol. 1.000000e-10 Param. tol. 1.000000e-03
Obj. fn. value -0.100023
parameter
sear
h width
0.037419
0.000018
-0.000000
0.000051
================================================
Now try a battery of random start values and
a short BFGS on ea
h, then iterate to
onvergen
e
The result using 20 randoms start values
ans =
3.7417e-02
2.7628e-07
13.5.
199
In that run, the single BFGS run with bad start values
onverged to a point far from the
true minimizer, whi
h simulated annealing and BFGS using a battery of random start values
both found the true maximizer. Using a battery of random start values, we managed to nd
the global max. The moral of the story is to be
autious and don't publish your results too
qui
kly.
200
CHAPTER 13.
the le
3. Using logit.m and EstimateLogit.m as templates, write a fun
tion to
al
ulate the probit
log likelihood, and a s
ript to estimate a probit model. Run it using data that a
tually
follows a logit model (you
an generate it in the same way that is done in the logit
example).
4. Study
mle_results.m
mle_results.m
alls, and in turn the fun
tions that those fun
tions
all. Write a
omplete des
ription
of how the whole
hain works.
5. Look at the Poisson estimation results for the OBDV measure of health
are use and
give an e
onomi
interpretation. Estimate Poisson models for the other 5 measures of
health
are usage.
Chapter 14
Asymptoti
properties of extremum
estimators
Readings:
Hayashi (2000), Ch. 7; Gourieroux and Monfort (1995), Vol. 2, Ch. 24; Amemiya,
Ch. 4 se tion 4.1; Davidson and Ma Kinnon, pp. 591-96; Gallant, Ch. 3; Newey and M Fadden (1994), Large Sample Estimation and Hypothesis Testing, in
Vol. 4, Ch.
Handbook of E onometri s,
36.
sn ()h over
Zn =
a set
z1 z2 zn
where the
zt
are
sn (Zn , ) depend
p-ve tors
and
upon a
n p random
is nite.
sn (Zn , ) = 1/n
n
X
i=1
yi xi
2
= 1/n k Y X k2
14.2 Existen
e
If
sn (Zn , )
interest,
is ontinuous in
sn (Zn , )
and
fun tion, in whi h ase existen e will not be a problem, at least asymptoti ally.
201
202
CHAPTER 14.
14.3 Consisten
y
The following theorem is patterned on a proof in Gallant (1987) (the arti
le, ref. later), whi
h
we'll see in its original form later in the
ourse. It is interesting to
ompare the following proof
with Amemiya's Theorem 4.1.1, whi
h is done in terms of
onvergen
e in probability.
Theorem 19
[Consisten y of e.e.
Assume
1. Compa
tness: The parameter spa
e is an open bounded subset of Eu
lidean spa
e K .
So the
losure of , , is
ompa
t.
2. Uniform Convergen
e: There is a nonsto
hasti
fun
tion s () that is
ontinuous in
on su
h that
lim sup |sn () s ()| = 0, a.s.
n
Then n a.s.
0.
Proof:
Sele t a
{sn (, )}
Suppose that
one by
{nm } ({nm }
is
a limit point of
{n }.
There is a subsequen e
limm nm = .
By uniform
gen e implies
o
n
nm .
lim snm (t ) = s (t ).
m
Continuity of
s ()
implies that
lim s (t ) = s ()
t
sin
e the limit as
of
Next, by maximization
n o
t
is
14.3.
203
CONSISTENCY
However,
m
as seen above, and
lim snm ( 0 ) = s ( 0 )
m
by uniform
onvergen
e, so
s ( 0 ).
s ()
But by assumption (3), there is a unique global maximum of
= s ( 0 ),
s ()
have held
= 0 . Finally,
and
C
at
0,
so we must have
0
point, , ex
ept on a set
s ()
with
P (C) = 0.
Therefore
{n }
(2)
This proof relies on the identi ation assumption of a unique global maximum at
0 . An
Identi ation:
Any point
in
with
s () s ( 0 )
must be su h that
k 0 k= 0,
whi h mat hes the way we will write the assumption in the se tion on nonparametri inferen e.
We assume that
for
is in fa t a global maximum of
sn () . It is not required
to be unique
nite, though the identi ation assumption requires that the limiting obje tive
sn ()
may be a
non-trivial problem.
See Amemiya's Example 4.1.4 for a
ase where dis
ontinuity leads to breakdown of
onsisten
y.
is in the interior of
has not been used to prove
onsisten
y, so we
ould dire
tly assume that
element of a
ompa
t set
is simply an
this is ne
essary for subsequent proof of asymptoti
normality, and I'd like to maintain
a minimal set of simple assumptions, for
larity.
parameter set
ause theoreti
al di
ulties that we will not deal with in this
ourse. Just
note that
onventional hypothesis testing methods do not apply in this
ase.
Note that
sn ()
s ()
is.
In the se ond
gure, if the fun
tion is not
onverging around the lower of the two maxima, there is no
guarantee that the maximizer will be in the neighborhood of the global maximizer.
204
CHAPTER 14.
We need a uniform strong law of large numbers in order to verify assumption (2) of Theorem
19. The following theorem is from Davidson, pg. 337.
Let {Gn ()} be a sequen e of sto hasti real-valued fun tions on a totally-bounded metri spa e (, ). Then
Theorem 20
a.s.
if and only if
14.4.
205
The pointwise almost sure onvergen e needed for assuption (a) omes from one of the
Eu lidean norm.
usual SLLN's.
the obje
tive fun
tion is
ontinuous and bounded with probability one on the entire
parameter spa
e
These are reasonable
onditions in many
ases, and hen
eforth when dealing with spe
i
estimators we'll simply assume that pointwise almost sure
onvergen
e
an be extended
to uniform almost sure
onvergen
e in this way.
The more general theorem is useful in the
ase that the limiting obje
tive fun
tion
an be
ontinuous in
even if
sn ()
may be smoothed out as we take expe tations over the data. In the se tion on simlationbased estimation we will se a ase of a dis ontinuous obje tive fun tion.
(wt , t )
(y, w),
where
yt = 0 + 0 wt +t .
0
is
ompa
t. Let xt = (1, wt ) , so we
an write yt = xt + t . The sample obje
tive fun
tion
has the
ommon distribution fun
tion
is
sn () = 1/n
= 1/n
n
X
t=1
n
X
t=1
yt xt
xt 0
= 1/n
2
n
X
i=1
+ 2/n
xt 0 + t xt
n
X
t=1
2
n
X
2t
xt 0 t + 1/n
t=1
1/n
n
X
t=1
2
a.s.
2t
E() = 0
2 dW dE = 2 .
and
and
206
CHAPTER 14.
1/n
n
X
xt
t=1
=
=
2
a.s.
x 0
2
0 + 2 0 0
2
dW
wdW + 0
(14.1)
2
2
2
+ 2 0 0 E(w) + 0 E w2
0
w2 dW
Finally, the obje
tive fun
tion is
learly
ontinuous, and the parameter spa
e is assumed to
be
ompa
t, so the
onvergen
e is also uniform. Thus,
2
2
s () = 0 + 2 0 0 E(w) + 0 E w2 + 2
= 0 , = 0 .
Exer
ise 21 Show that in order for the above solution to be unique it is ne
essary that
E(w2 ) 6= 0.
of regressors.
Dis uss the relationship between this ondition and the problem of olinearity
This example shows that Theorem 19
an be used to prove strong
onsisten
y of the OLS
estimator. There are easier ways to show this, of
ourse - this is only an example of appli
ation
of the theorem.
Theorem 22
assume
(a) Jn () D2 sn() exists and is
ontinuous in an open,
onvex neighborhood of 0 .
(b) {Jn (n )} a.s.
J ( 0 ), a nite negative denite matrix, for any sequen
e {n } that
onverges almost surely to 0.
d
(
) nDsn ( 0 )
N 0, I ( 0 ) , where I ( 0 ) = limn V ar nD sn ( 0 )
d
Then n 0
N 0, J ( 0 )1 I ( 0 )J ( 0 )1
Proof:
By Taylor expansion:
D sn (n ) = D sn ( 0 ) + D2 sn ( ) 0
where
= + (1 ) 0 , 0 1.
14.5.
207
ASYMPTOTIC NORMALITY
will
D2 sn ()
Note that
Now the l.h.s. of this equation is zero, at least asymptoti ally, sin e
is a maximizer
and the f.o.
. must hold exa
tly sin
e the limiting obje
tive fun
tion is stri
tly
on
ave
in a neighborhood of
Also, sin e
0.
is between
and
0,
and sin e
a.s.
n 0
a.s.
D2 sn ( ) J ( 0 )
So
0 = D sn ( 0 ) + J ( 0 ) + op (1) 0
And
0=
Now
J ( 0 )
nD sn ( 0 ) + J ( 0 ) + op (1) n 0
op (1)
0
next to J ( ), so we
an write
a
0=
nD sn ( 0 ) + J ( 0 ) n 0
a
n 0 = J ( 0 )1 nD sn ( 0 )
Be ause of assumption ( ), and the formula for the varian e of a linear ombination of r.v.'s,
d
n 0 N 0, J ( 0 )1 I ( 0 )J ( 0 )1
Assumption (b) is not implied by the Slutsky theorem. The Slutsky theorem says that
a.s.
g(xn ) g(x)
depend on
if
xn x and g()
is ontinuous at
In our ase
x.
Jn (n ) is
g()
an't
n. A theorem whi h
To apply this to the se
ond derivatives, su
ient
onditions would be that the se
ond
derivatives be strongly sto
hasti
ally equi
ontinuous on a neighborhood of
an ordinary LLN applies to the derivatives when evaluated at
0,
and that
N ( 0 ).
Stronger onditions that imply this are as above: ontinuous and bounded se ond derivatives in a neighborhood of
0.
representable as an average of
sn ()
is
208
CHAPTER 14.
D2 sn ()
is also an average of
D2 sn ( 0 ),
(
):
nD sn
( 0 )
d
( 0 )
nD sn ( 0 ) = Op ()
n,
we'd have
D sn ( 0 ) = n 2 Op (1)
1
= Op n 2
is
en-
14.6 Examples
14.6.1 Coin ipping, yet again
Remember that in se
tion 4.4.1 we saw that the asymptoti
varian
e of the MLE of the
parameter of a Bernoulli trial, using i.i.d.
data, was
lim V ar n (
p p) = p (1 p).
Let's
verify this using the methods of this Chapter. The log-likelihood fun tion is
sn (p) =
so
1X
{yt ln p + (1 yt ) (1 ln p)}
n t=1
Esn (p) = p0 ln p + 1 p0 (1 ln p)
s (p) = p0 ln p + 1 p0 (1 ln p).
D2 sn (p)p=p0 Jn () =
n.
1
,
p0 )
p0 (1
1 (p0 ) = p0 1 p0
J
A bit
1 (p0 ).
lim V ar n p p0 = J
14.6.
209
EXAMPLES
y = x
y = 1(y > 0)
N (0, 1)
Here,
() =
(2)1/2 exp(
where
2
)d
2
x,
p(x, ) = (x ),
If
p(x, ) = (x ),
where
()
is the standard
is
the maximizer of
1X
(yi ln p(xi , ) + (1 yi ) ln [1 p(xi , )])
n
sn () =
i=1
n
1X
s(yi , xi , ).
n
sn ()
(14.2)
i=1
s(y, x, ).
to get
Ey|x {y ln p(x, ) + (1 y) ln [1 p(x, )]} = p(x, 0 ) ln p(x, ) + 1 p(x, 0 ) ln [1 p(x, )] .
s () =
where
(x)
p(x, 0 ) ln p(x, ) + 1 p(x, 0 ) ln [1 p(x, )] (x)dx,
(14.3)
is the support of
210
x)
CHAPTER 14.
p(x, )
x.
as long as
is ontinuous, and if the parameter spa e is ompa t we therefore have uniform almost
Z
X
p(x, )
s (), ,
p(x, 0 )
1 p(x, 0 )
p(x, )
p(x, ) (x)dx = 0
p(x, )
1 p(x, )
= 0.
is
onsistent. Question:
d
n 0 N 0, J ( 0 )1 I ( 0 )J ( 0 )1 .
I ( 0 ) = limn V ar nD sn ( 0 )
There's no need to subtra
t the mean, sin
e it's zero, following the f.o.
. in the
onsisten
y proof above and the fa
t that observations are i.i.d.
The terms in
1X
lim V ar nD
s( 0 )
n
n t
X
1
s( 0 )
= lim V ar D
n
n
t
X
1
D s( 0 )
= lim V ar
n n
t
lim V ar nD sn ( 0 ) =
lim V arD s( 0 )
= V arD s( 0 )
So we get
I ( ) = E
0
0
s(y, x, ) s(y, x, ) .
Likewise,
J ( 0 ) = E
Expe
tations are jointly over
x.
and
x,
2
s(y, x, 0 ).
onditional on
s(y, x, 0 ) = y ln p(x, 0 ) + (1 y) ln 1 p(x, 0 ) .
Now suppose that we are dealing with a orre tly spe ied logit model:
1
p(x, ) = 1 + exp(x )
.
x,
then over
14.6.
211
EXAMPLES
2
1 + exp(x )
exp(x )x
p(x, ) =
exp(x )
x
1 + exp(x )
= p(x, ) (1 p(x, )) x
= p(x, ) p(x, )2 x.
1
1 + exp(x )
So
s(y, x, 0 ) = y p(x, 0 ) x
2
s( 0 ) = p(x, 0 ) p(x, 0 )2 xx .
I ( ) =
=
where we use the fa
t that
then
(14.4)
gives
EY y 2 2p(x, 0 )p(x, 0 ) + p(x, 0 )2 xx (x)dx
p(x, 0 ) p(x, 0 )2 xx (x)dx.
EY (y) = EY (y 2 ) = p(x, 0 ).
J ( 0 ) =
(14.5)
(14.6)
Likewise,
p(x, 0 ) p(x, 0 )2 xx (x)dx.
(14.7)
Note that we arrive at the expe ted result: the information matrix equality holds (that is,
J ( 0 ) = I( 0 )).
simplies to
With this,
d
n 0 N 0, J ( 0 )1 I ( 0 )J ( 0 )1
d
n 0 N 0, J ( 0 )1
d
n 0 N 0, I ( 0 )1 .
On a nal note, the logit and standard normal CDF's are very similar - the logit distribution
is a bit more fat-tailed. While
oe
ients will vary slightly between the two models, fun
tions
of interest su
h as estimated probabilities
212
CHAPTER 14.
White,
1980 is an earlier
referen
e.
Suppose we have a nonlinear model
yi = h(xi , 0 ) + i
where
i iid(0, 2 )
The
estimator solves
1X
n = arg min
(yi h(xi , ))2
n
i=1
We'll study this more later, but for now it is
lear that the fo
for minimization will require
solving a set of nonlinear equations. A
ommon approa
h to the problem seeks to avoid this
di
ulty by
linearizing
the model. A rst order Taylor's series expansion about the point
yi = h(x0 , 0 ) + (xi x0 )
where
en ompasses both
x0
h(x0 , 0 )
+ i
x
is no longer a
= h(x0 , 0 ) x0
=
h(x0 , 0 )
x
h(x0 , 0 )
x
and
by applying OLS to
yi = + xi + i
Question, will
and
be onsistent for
and
and
(, ) .
n
= arg min sn () =
1X
(yi xi )2
n
i=1
u.a.s.
sn () s () = EX EY |X (y x)2
14.6.
and
213
EXAMPLES
onverges
a.s.
to the
that minimizes
s ():
EX EY |X y x
2
2
= EX EY |X h(x, 0 ) + x
2
= 2 + EX h(x, 0 ) x
drop out.
and
0
losest to the true regression fun
tion h(x, ) a
ording to the mean squared error
riterion.
This depends on both the shape of
h()
Tangent line
x
x
x
Fitted line
x
x
x_0
It is lear that the tangent line does not minimize MSE, sin e, for example, if
h(x, 0 )
is on ave, all errors between the tangent line and the true fun tion are negative.
Se
ond order and higher-order approximations suer from exa
tly the same problem,
though to a less severe degree, of
ourse. For this reason, translog, Generalized Leontiev
and other exible fun
tional forms based upon se
ond-order approximations in general
suer from bias and in
onsisten
y. The bias may not be too important for analysis of
onditional means, but it
an be very important for analyzing rst and se
ond derivatives.
e.g.,
elasti ities of
214
CHAPTER 14.
substitution) are often of interest, so in this
ase, one should be
autious of unthinking
appli
ation of models that impose stong restri
tions on se
ond derivatives.
This sort of linearization about a long run equilibrium is a
ommon pra
ti
e in dynami
ma
roe
onomi
models. It is justied for the purposes of theoreti
al analysis of a model
given the model's parameters, but it is not justiable for the estimation of the parameters
of the model using data.
obtaining
onsistent estimators of the parameters of dynami
ma
ro models that are too
omplex for standard methods of analysis.
14.7.
215
EXERCISES
xi
uniform(0,1), and
where
is iid(0,
2 ). Suppose we
0 and
yi = 1 x2i + i ,
2. Verify your results using O
tave by generating data that follows the above model, and
al
ulating the OLS estimator. When the sample size is very large the estimator should
be very
lose to the analyti
al results you obtained in question 1.
3. Use the asymptoti
normality theorem to nd the asymptoti
distribution of the ML
y = x 0 + , where
N (0, 1) and is independent of x.
2
s
()
n
0
0
This means nding
sn (), J ( ),
, and I( ). The expressions may involve
the unspe
ied density of x.
estimator of
1
Pr(y = 1|x) = 1 + exp( 0 x) .
0
mator of (this is a s
alar parameter).
(b) Now assume that
0
the ML estimator of .
(
) Comment on the results
216
CHAPTER 14.
Chapter 15
Generalized method of moments
Readings:
Hamilton Ch.
17 (see pg.
to appli
ations); Newey and M
Fadden (1994), "Large Sample Estimation and Hypothesis
Testing", in
36.
15.1 Denition
We've already seen one example of GMM in the introdu
tion, based upon the
Consider the following example based upon the t-distribution.
t-distributed r.v.
Yt
distribution.
is
fYt (yt , ) =
Given an iid sample of size
0 + 1 /2
( 0 )1/2 ( 0 /2)
n, one ould
estimate
arg max ln Ln () =
1 + yt2 / 0
(0 +1)/2
n
X
ln fYt (yt , )
t=1
This approa h is attra tive sin e ML estimators are asymptoti ally e ient.
This is
be
ause the ML estimator uses all of the available information (e.g., the distribution
is fully spe
ied up to a parameter). Re
alling that a distribution is
ompletely
hara
terized by its moments, the ML estimator is interpretable as a GMM estimator that
uses
all
estimate a
Kdimensional parameter.
moments to
fYt (yt , 0 )
and varian e
yt2 and
V (yt ) = 0 / 0 2
m1 () = 1/n
(for
0 > 2).
Pn
m1t () = / ( 2)
2
t=1 yt . As before, when evaluated
0 and E0 m1 ( 0 ) = 0.
Pn
218
CHAPTER 15.
Choosing
to
set
0
m1 ()
yields a MM estimator:
2
1
(15.1)
Pn 2
i yi
This estimator is based on only one moment of the distribution - it uses less information than
the ML estimator, so it is intuitively
lear that the MM estimator will be ine
ient relative
to the ML estimator.
An alternative MM estimator
ould be based upon the fourth moment of the t-distribution.
The fourth moment of a t-distributed r.v. is
4
provided that
0 > 4.
E(yt4 )
2
3 0
= 0
,
( 2) ( 0 4)
1X 4
3 ()2
yt
m2 () =
( 2) ( 4) n
t=1
to
set
0.
m2 ()
This estimator isn't e
ient either, sin
e it uses only one moment. A GMM estimator would
use the two moment
onditions together to estimate the single parameter. The GMM estimator
is overidentied, whi
h leads to an estimator whi
h is e
ient relative to the just identied
MM estimators (more on e
ien
y later).
As before, set
0
size. Note that m( )
whereas
The
m() = Op (1), 6= 0 ,
0
with parameter . This is the fundamental reason that GMM is
onsistent.
m() W
Wn
d (m()) =
m W
A popular hoi e
n m, and we minimize
sn () =
moment onditions, so
matrix.
d (m()).
m() is a g
is a
gg
For the purposes of this
ourse, the following denition of the GMM estimator is su
iently
general:
Denition 24 The GMM estimator of the K -dimensional parameter ve
tor 0 , arg min sn ()
P
mn () Wn mn (),
15.2.
219
CONSISTENCY
What's the reason for using GMM if MLE is asymptoti ally e ient?
Robustness: GMM is based upon a limited set of moment
onditions. For
onsisten
y,
only these moment
onditions need to be
orre
tly spe
ied, whereas MLE in ee
t
with respe
t to the MLE estimator. Keep in mind that the true distribution is not known
so if we erroneously spe
ify a distribution and estimate by MLE, the estimator will be
in
onsistent in general (not always).
Feasibility: in some
ases the MLE estimator is not available, be
ause we are not able
to dedu
e the likelihood fun
tion.
estimation. The GMM estimator may still be feasible even though MLE is not available.
15.2 Consisten
y
We simply assume that the assumptions of Theorem 19 hold, so the GMM estimator is strongly
onsistent. The only assumption that warrants additional
omments is that of identi
ation.
In Theorem 19, the third assumption reads:
0
maximum at ,
fun
tion
i.e., s
sn () = mn
( 0 )
() W
> s (), 6=
Identi ation:
( )
mn ().
a.s.
Sin e
E mn ( 0 ) = 0
Sin e
that
m () 6= 0
by assumption,
for
6=
m ( 0 ) = 0.
a.s.
Wn W , a nite positive
0
is asymptoti
ally identied.
assumption that
mn () m ().
gg
denite
gg
we need
Note that asymptoti identi ation does not rule out the possibility of la k of identi ation for a given data set - there may be multiple minimizing solutions in nite samples.
d
n 0 N 0, J ( 0 )1 I ( 0 )J ( 0 )1
where
2
sn () and
sn ( 0 ). We
I ( 0 ) = limn V ar n
need to determine the form of these matri es given the obje tive fun tion
sn () = mn () Wn mn ().
220
CHAPTER 15.
sn () = 2
mn () Wn mn ()
Dene the
K g
matrix
Dn ()
so:
m () ,
n
s() = 2D()W m () .
(Note that
sn (), Dn (), Wn
and
mn ()
(15.2)
n,
but it is omitted to
Di
be the
th row of
D().
2
s() =
i
2Di ()Wn m ()
= 2Di W D + 2m W
D
i
D()i
2m() W
at
0 , assume that
0 a.s.
2m( ) W
D( )i 0,
0
sin e
a.s.
m( 0 ) = op (1), W W .
lim
where we dene
rows of
D,
we get
sn ( 0 ) = J ( 0 ) = 2D W D
, a.s.,
lim D = D , a.s.,
With regard to
0
at (sin
e
and
lim W = W ,
Em( 0 )
=0
by assumption), we have
lim V ar n sn ( 0 )
n
I ( 0 ) =
m( 0 )
n.
Assuming this,
nm( 0 ) N (0, ),
15.4.
221
where
= lim E nm( 0 )m( 0 ) .
n
I ( 0 ) = 4D W W D
Using these results, the asymptoti
normality theorem gives us
h
i
d
1
1
n 0 N 0, D W D
D W W D
,
D W D
the asymptoti
distribution of the GMM estimator for arbitrary weighting matrix
that for
to be positive denite,
Wn .
Note
(D ) = k.
is a
weighting matrix, whi h determines the relative importan e of violations of the individ-
ual moment
onditions. For example, if we are mu
h more sure of the rst moment
ondition,
whi
h is based upon the varian
e, than of the se
ond, whi
h is based upon the fourth moment,
"
we ould set
W =
with
a 0
0 b
a mu h larger than b. In this ase, errors in the se ond moment ondition have less weight
Sin
e moments are not independent, in general, we should expe
t that there be a
orrelation between the moment
onditions, so it may not be desirable to set the o-diagonal
elements to 0.
of GMM estimators
dened by
Let
y = x + ,
P y = P X + P
1 ,
e.g,
where
N (0, ).
P P = 1 .
V (P ) = P V ()P = P P = P (P P )1 P =
P P 1 (P )1 P = In .
mn ().
MLE, we
(Note: we use
(AB)1 = B 1 A1
for
A, B
both nonsingular).
(y
X) 1 (y
X).
P y = P X + P
Interpreting
(y X) = ()
222
CHAPTER 15.
0 ),
This
result
arries over to GMM estimation. (Note: this presentation of GLS is not a GMM
estimator, be
ause the number of moment
onditions here is equal to the sample size,
n.
Later we'll see that GLS an be put into the GMM framework dened above).
a.s
W = 1
an
e of will be minimized by
hoosing Wn so that Wn
, where =
limn E nm( 0 )m( 0 ) .
Proof:
For
W = 1
,
D W D
simplies to
D 1
D
1
1
D W W D
D W D
1
W = 1
versus when
denite matrix:
D 1
D W D
D D W D D W W D
i
h
1/2
1
I 1/2
D W 1/2
= D 1/2
W D D W W D
D
as
an be veried by multipli
ation. The term in bra
kets is idempotent, whi
h is also easy
to
he
k by multipli
ation, and is therefore positive semidenite.
A quadrati form in a
positive semidenite matrix is also positive semidenite. The dieren
e of the inverses of the
varian
es is positive semidenite, whi
h implies that the dieren
e of the varian
es is negative
semidenite, whi
h proves the theorem.
The result
allows us to treat
h
i
d
1
n 0 N 0, D 1
D
N
where the
of
and
D 1
D
,
n
0
1 !
(15.3)
d
The obvious estimator of D is simply mn , whi
h is
onsistent by the
onsisten
y
assuming that m is
ontinuous in . Sto
hasti
equi
ontinuity results
an give
of ,
us this result even if
15.5.
223
In the
ase that we wish to use the optimal weighting matrix, we need an estimate of
the limiting varian
e-
ovarian
e matrix of
nmn ( 0 ).
paramet-
ri
ally, we in general have little information upon whi
h to base a parametri
spe
i
ation. In
general, we expe
t that:
mt
not depend on
ontemporaneously orrelated, sin e the individual moment onditions will not in general
2
= it
6= 0).
).
Sin
e we need to estimate so many
omponents if we are to take the parametri
approa
h, it
is unlikely that we would arrive at a
orre
t parametri
spe
i
ation. For this reason, resear
h
has fo
used on
onsistent nonparametri
estimators of
mt
mts
v = E(mt mts ).
t).
v th
Dene the
E(mt mt+s ) = v .
and
mt
Re all that
so for
!#
!
"
n
n
X
X
mt
1/n
mt
= E nm( 0 )m( 0 ) = E n 1/n
"
= E 1/n
n
X
mt
t=1
n
X
mt
t=1
!#
t=1
t=1
n2
n1
1
= 0 +
1 + 1 +
2 + 2 +
n1 + n1
n
n
n
nv
is
cv = 1/n
n
X
m
tm
tv .
t=v+1
would be
c + n 2
c + +
[
c1 +
c2 +
=
c0 + n 1
[
n1
1
2
n1
n
n
n1
Xnv
c .
c0 +
cv +
=
v
n
v=1
This estimator is in
onsistent in general, sin
e the number of parameters to estimate is more
than the number of observations, and in
reases more rapidly than
build up as
n .
n,
tends to
224
CHAPTER 15.
modied estimator
where
q(n)
as
=
c0 +
q(n)
X
c ,
cv +
v
v=1
nv
term
n
an be dropped be
ause
q(n)
must be
op (n).
q(n)
at a rate that satises a LLN. A disadvantage of this estimator is that it may not be positive
denite. This
ould
ause one to
al
ulate a negative
0,
then re-
iterations.
=
c0 +
q(n)
X
1
v=1
v
c .
cv +
v
q+1
This estimator is p.d. by
onstru
tion. The
ondition for
onsisten
y is that
that this is a very slow rate of growth for
no parametri
restri
tions on the form of
q.
kernel estimator.
(Review of E
onomi
Studies, 1994)
It is an example of a
whitening
n1/4 q 0. Note
use
pre-
before applying the kernel estimator. The idea is to t a VAR model to the moment
onditions. It is expe
ted that the residuals of the VAR model will be more nearly white noise,
so that the Newey-West
ovarian
e estimator might perform better with short lag lengths..
The VAR model is
m
t = 1 m
t1 + + p m
tp + ut
This is estimated, giving the residuals
u
t . Then the Newey-West
ovarian
e estimator is applied
c
c1 m
cp m
m
t =
t1 + +
tp
ut .
15.6.
225
X
EY |X Y =
Y f (Y |X)dY = 0
g(X)
of
is also zero.
EY g(X) =
Z Z
X
dX.
This
an be fa
tored into a
onditional expe
tation and an expe
tation w.r.t. the marginal
density of
X:
EY g(X) =
Sin e
g(X)
doesn't depend on
EY g(X) =
Z Z
X
Y g(X)f (Y |X)dY
f (X)dX.
Z Z
X
Y f (Y |X)dY
g(X)f (X)dX.
EY g(X) = 0
as
laimed.
This is important e
onometri
ally, sin
e models often imply restri
tions on
onditional
moments. Suppose a model tells us that the fun
tion
on the information set
It ,
equal to
K(yt , xt )
k(xt , ),
E K(yt , xt )|It = k(xt , ).
K(yt , xt ) = yt
so that
k(xt , ) =
xt .
t () = K(yt , xt ) k(xt , )
has
onditional expe
tation equal to zero
E t ()|It = 0.
yt = xt + t ,
we an set
226
CHAPTER 15.
(K > 1).
-dimensional parameter
However, the above result allows us to form various un onditional expe tations
mt () = Z(wt )t ()
where
information set
so as long as
It .
The
g>K
Z(wt ) are
wt
and
wt
instrumental variables.
moment onditions,
ng
matrix
Zg (w2 )
Z1 (w2 ) Z2 (w2 )
Zn =
.
..
.
.
.
Z1 (wn ) Z2 (wn ) Zg (wn )
Z1
Z2
Zn
With this we
an form the
moment onditions
mn () =
1 ()
2 ()
1
Zn
.
n
..
n ()
1 ()
2 ()
hn () =
..
.
n ()
1
Z hn ()
n n
n
1X
=
Zt ht ()
n
mn () =
t=1
n
1X
mt ()
=
n t=1
where
Z(t,)
is the
tth
row of
Zn .
An interesting question
15.6.
227
Z(wt )
to a hieve maximum
e
ien
y.
Note that with this
hoi
e of moment
onditions, we have that
matrix) is
Dn () =
=
Dn () =
Hn
is a
K n
m () (a
Kg
1
Zn hn ()
n
1
hn () Zn
n
whi h we an dene to be
where
Dn
1
Hn Zn .
n
its olumns. Likewise, dene the var- ov. of the moment onditions
n = E nmn ( 0 )mn ( 0 )
1
0
0
Z hn ( )hn ( ) Zn
= E
n n
1
0
0
= Zn E
hn ( )hn ( ) Zn
n
n
Zn
Zn
n
n = V hn ( 0 ) . Note that the dimension
with the sample size, so it is not
onsistently estimable without additional assumptions.
The asymptoti
normality theorem above says that the GMM estimator using the optimal
weighting matrix is distributed as
d
n 0 N (0, V )
where
V = lim
Hn Zn
n
Zn n Zn
n
1
Zn Hn
n
!1
(15.4)
Zn = 1
n Hn
auses the above var-
ov matrix to simplify to
V = lim
Hn 1
n Hn
n
1
(15.5)
and furthermore, this matrix is smaller that the limiting var-
ov for any other
hoi
e of instrumental variables. (To prove this, examine the dieren
e of the inverses of the var-
ov matri
es
with the optimal intruments and with non-optimal instruments. As above, you
an show that
228
CHAPTER 15.
Usually, estimation of
0 , and
where
Hn , whi h
Hn ( 0 ), sin e it
depends on
is
Hn
b = h ,
H
n
Estimation of
elements than
n,
It is an
nn
estimated
onsistently. Basi
ally, you need to provide a parametri
spe
i
ation of the
ovarian
es of the
ht () in
approximate this matrix parametri
ally to dene the instruments. Note that the simplied var-
ov matrix in equation 15.5 will not apply if approximately optimal instruments
are used - it will be ne
essary to use an estimator based upon equation 15.4, where the
term
n1 Zn n Zn
pro edure.
1
s() = 2
mn mn 0
or
0
1 mn ()
D()
Consider a Taylor expansion of
Multiplying by
1
D()
:
m()
= mn ( 0 ) + D ( 0 ) 0 + op (1).
m()
n
we obtain
= D()
1 m()
1 mn ( 0 ) + D()
1 D( 0 ) 0 + op (1)
D()
(15.6)
15.8.
229
A SPECIFICATION TEST
tends
to
and
tends to
we an write
0 a
1
0
D 1
m
(
)
=
D
D
n
or
a
1
0
n 0 = n D 1
D 1
D
mn ( )
With this, and taking into a ount the original expansion (equation 15.6), we get
=
nm()
nmn ( 0 )
Or
a
=
nm()
1/2
1/2
1 1
1/2
mn ( 0 )
n D D D
D
a
n1/2
m() =
Now
1
0
nD D 1
D 1
D
mn ( ).
0
1 1
1/2
1/2
n Ig 1/2
D
D
D
D
mn ( )
0
n1/2
mn ( ) N (0, Ig )
is idempotent of rank
tra
e) so
1 1
1/2
P = Ig 1/2
D
D D D
g K,
d
1/2
= nm()
1 m()
n1/2
m(
)
n
m(
)
2 (g K)
Sin e
onverges to
we also have
2 (g K)
1 m()
nm()
or
d
n sn ()
2 (g K)
supposing the model is
orre
tly spe
ied.
n,
2 (g K)
riti al
value. The test is a general test of whether or not the moments used to estimate are
orre
tly
spe
ied.
This won't work when the estimator is just identied. The f.o. . are
0.
1 m()
D sn () = D
230
CHAPTER 15.
and
0.
m()
So the moment
onditions are zero
regardless
= 0,
sn ()
down.
A note: this sort of test often over-reje
ts in nite samples. One should be
autious in
reje
ting a model when this test reje
ts.
y = X 0 + ,
and
a diagonal matrix.
= (),
the parameterization of
N (0, ),
where
where
is a nite dimensional
is orre t.
we an still estimate
= (X X)1
V ()
2
onsistently by
xt
(a
mt () = xt yt xt .
m() = 1/n
parameters and
mt = 1/n
X
t
xt yt 1/n
xt xt .
That is, sin
e the number of moment
onditions is identi
al to the number of parameters, the
fo
imply that
weighting matrix
in this ase, an identity matrix works just as well for the purpose of estimation. Therefore
X
t
xt xt
!1
X
t
xt yt = (X X)1 X y,
15.9.
d
D
231
is simply
m .
In this ase
d
D
= 1/n
X
t
d
b 1 d
D
D
1
Re all that
xt xt = X X/n.
is
=
c0 +
n1
X
v=1
c .
cv +
This is in general in onsistent, but in the present ase of nonauto orrelation, it simplies to
=
c0
will
a umulate, and
b =
c0 = 1/n
= 1/n
= 1/n
=
where
is an
nn
"
"
n
X
t=1
n
X
n
X
m
tm
t
t=1
xt xt
2
yt xt
xt xt 2t
t=1
X EX
2t
in the position
t, t.
!
)1
1 X X
X EX
X X
n
n
n
!
1
XX
X X 1
X EX
=
n
n
n
V
n
=
(
This is the var
ov estimator that White (1980) arrived at in an inuential arti
le. This estimator is
onsistent under heteros
edasti
ity of an unknown form. If there is auto
orrelation,
the Newey-West estimator
an be used to estimate
232
CHAPTER 15.
y = X 0 +
N (0, )
where
is a diagonal matrix.
is known, so that
X).
( 0 )
1 1
= X 1 X
X y)
= 1/n
m()
moment onditions
X xt yt
X xt x
t
1/n
0.
0)
0)
(
t
t
t
t
That is, the GLS estimator in this
ase has an obvious representation as a GMM estimator.
With auto
orrelation, the representation exists but it is a little more
ompli
ated. Nevertheless, the idea is the same. There are a few points:
The (feasible) GLS estimator is known to be asymptoti
ally e
ient in the
lass of linear
asymptoti
ally unbiased estimators (Gauss-Markov).
This means that it is more e ient than the above example of OLS with White's het-
This means that the hoi e of the moment onditions is important to a hieve e ien y.
15.9.3 2SLS
Consider the linear model
yt = zt + t ,
or
y = Z +
using the usual
onstru
tion, where
is
K1
and
Dene
xt
zt
(suppose that
xt
is
r 1).
1
X Z
15.9.
Sin
e
with
t
x, z
K -dimensional
so
moment ondition
m() = 1/n
233
Sin e we have
parameters and
zt zt
mt () = zt (yt zt )
and
zt yt zt .
must be un orrelated
W,
!1
so we have
X
t
1
Z
y
(
zt yt ) = Z
Z
This is the standard formula for 2SLS. We use the exogenous variables and the redu
ed form
predi
tions of the endogenous variables as instruments, and apply IV estimation. See Hamilton
pp. 420-21 for the var
ov formula (whi
h is the standard formula for 2SLS), and for how to
deal with
heterogeneous and dependent (basi ally, just use the Newey-West or some other
onsistent estimator of
Note that
y1t = f1 (zt , 10 ) + 1t
y2t = f2 (zt , 20 ) + 2t
.
.
.
0
yGt = fG (zt , G
) + Gt ,
or in
ompa
t notation
yt = f (zt , 0 ) + t ,
where
f ()
is a
We need to nd an
with
it .
0 ) .
0 = (10 , 20 , , G
P
G
i=1 Ai
.
.
zt ,
orthogonality onditions
234
CHAPTER 15.
trivial problem.
A note on e
ien
y: the sele
ted set of instruments has important ee
ts on the e
ien
y
of estimation.
mo-
than others, sin
e the moment
onditions
ould be highly
orrelated. A GMM estimator that
hose an optimal set of
moment onditions would be fully e ient. Here we'll see that the
yt
be a
Yt = (y1 , y2 , ..., yt ) .
Then at time
t, Yt1
has
been observed (refer to it as the information set, sin
e we assume the
onditioning variables
have been sele
ted to take advantage of all useful information). The likelihood fun
tion is the
joint density of the sample:
ln L() =
n
X
t=1
ln f (yt |Yt1 , ).
Dene
mt (Yt , ) D ln f (yt|Yt1 , )
as the
s ore
of the
tth
observation.
E onometri s):
1/n
n
X
t=1
= 1/n
mt (Yt , )
n
X
t=1
= 0,
D ln f (yt |Yt1 , )
15.9.
235
whi
h are pre
isely the rst order
onditions of MLE. Therefore, MLE
an be interpreted as
a GMM estimator. The GMM var
ov formula is
V = D 1 D
1
= 1/n
d
D2 ln f (yt |Yt1 , )
D
m(Y
,
)
=
t
t=1
Yts ,
mt
mts , s > 0
and
t.
mts
is a
Un onditional un orrelation
follows from the fa t that onditional un orrelation hold regardless of the realization of
Yt1 ,
Yt1
ML estimation, above). The fa t that the s ores are serially un orrelated implies that
b = 1/n
n
X
t (Yt , )
= 1/n
mt (Yt , )m
n h
ih
i
X
D ln f (yt |Yt1 , )
D ln f (yt |Yt1 , )
t=1
t=1
Re all from study of ML estimation that the information matrix equality (equation
??) states
that
n
o
= E D2 ln f (yt |Yt1 , 0 ) .
D ln f (yt |Yt1 , 0 ) D ln f (yt |Yt1 , 0 )
This result implies the well known (and already seeen) result that we an estimate
in any
of three ways:
nP
o
n
2 ln f (y |Y
D
,
)
t
t1
t=1
P h
ih
i 1
n
c
D ln f (yt |Yt1 , )
V = n
n
o
Pn
D2 ln f (yt|Yt1 , )
t=1
or the inverse of the negative of the Hessian (sin
e the middle and last term
an
el,
ex
ept for a minus sign):
"
Vc
= 1/n
n
X
t=1
#1
D2 ln f (yt |Yt1 , )
or the inverse of the outer produ
t of the gradient (sin
e the middle and last
an
el
ex
ept for a minus sign, and the rst term
onverges to minus the inverse of the middle
term, whi
h is still inside the overall inverse)
236
CHAPTER 15.
Vc
=
1/n
n h
X
t=1
ih
i
D ln f (yt |Yt1 , )
D ln f (yt |Yt1 , )
)1
This simpli
ation is a spe
ial result for the MLE estimator - it doesn't apply to GMM
estimators in general.
Asymptoti
ally, if the model is
orre
tly spe
ied, all of these forms
onverge to the same
limit. In small samples they will dier. In parti
ular, there is eviden
e that the outer produ
t of the gradient formula does not perform very well in small samples (see Davidson and
Ma
Kinnon, pg. 477). White's
negative of the Hessian. If they dier by too mu
h, this is eviden
e of misspe
i
ation of the
model.
whi
h
ase, the Poisson ML estimator would be in
onsistent, even if the
onditional mean
were
orre
tly spe
ied.
presented in equation 13.1, using Poisson ML (better thought of as quasi-ML), and IV
1
estimation . Both estimation methods are implemented using a GMM form. Running that
s
ript gives the output
OBDV
******************************************************
IV
GMM Estimation Results
BFGS
onvergen
e: Normal
onvergen
e
Obje
tive fun
tion value: 0.004273
Observations: 4564
No moment
ovarian
e supplied, assuming effi
ient weight matrix
X^2 test
1
Value
19.502
df
3.000
p-value
0.000
The validity of the instruments used may be debatable, but real data sets often don't ontain ideal instru-
ments.
15.10.
237
estimate
st. err
t-stat
p-value
onstant
-0.441
0.213
-2.072
0.038
pub. ins.
-0.127
0.149
-0.851
0.395
priv. ins.
-1.429
0.254
-5.624
0.000
sex
0.537
0.053
10.133
0.000
age
0.031
0.002
13.431
0.000
edu
0.072
0.011
6.535
0.000
in
0.000
0.000
4.500
0.000
******************************************************
******************************************************
Poisson QML
GMM Estimation Results
BFGS
onvergen
e: Normal
onvergen
e
Obje
tive fun
tion value: 0.000000
Observations: 4564
No moment
ovarian
e supplied, assuming effi
ient weight matrix
Exa
tly identified, no spe
. test
estimate
st. err
t-stat
p-value
onstant
-0.791
0.149
-5.289
0.000
pub. ins.
0.848
0.076
11.092
0.000
priv. ins.
0.294
0.071
4.136
0.000
sex
0.487
0.055
8.796
0.000
age
0.024
0.002
11.469
0.000
edu
0.029
0.010
3.060
0.002
in
-0.000
0.000
-0.978
0.328
******************************************************
Note how the Poisson QML results, estimated here using a GMM routine, are the same as
were obtained using the ML estimation routine (see subse
tion 13.4.2). This is an example of
how (Q)ML may be represented as a GMM estimator. Also note that the IV and QML results
are
onsiderably dierent.
oe
ient to
hange. Perhaps it is logi
al that people who own private insuran
e make fewer
visits, if they have to make a
o-payment. Note that in
ome be
omes positive and signi
ant
when PRIV is treated as endogenous.
238
CHAPTER 15.
0.14
0.12
0.1
0.08
0.06
0.04
0.02
0
2.28
2.3
2.32
2.34
2.36
2.38
Perhaps the dieren
e in the results depending upon whether or not PRIV is treated as
endogenous
an suggest a method for testing exogeneity. Onward to the Hausman test!
yt = xt + t .
form and the
hoi
e of regressors is
orre
t, but that the some of the regressors may be
orrelated with the error term, whi
h as you know will produ
e in
onsisten
y of
For example,
.
is auto orrelated.
To illustrate, the O
tave program biased.m performs a Monte Carlo experiment where errors
are
orrelated with regressors, and estimation is by OLS and IV. The true value of the slope
oe
ient used to generate the data is
= 2.
quite biased, while Figure 15.2 shows that the IV estimator is on average mu
h
loser to the
true value. If you play with the program, in
reasing the sample size, you
an see eviden
e that
the OLS estimator is asymptoti
ally biased, while the IV estimator is
onsistent.
We have seen that in
onsistent and the
onsistent estimators
onverge to dierent probability limits.
This is the idea behind the Hausman test - a pair of onsistent estimators
15.11.
239
Figure 15.2: IV
IV estimates
0.14
0.12
0.1
0.08
0.06
0.04
0.02
0
1.9
1.92
1.94
1.96
1.98
2.02
2.04
2.06
2.08
onverge to the same probability limit, while if one is onsistent and the other is not they
e.g.,
the dieren e between the estimators is signi antly dierent from zero.
et .),
why should we be
interested in testing - why not just use the IV estimator? Be
ause the OLS estimator
is more e
ient when the regressors are exogenous and the other
lassi
al assumptions
(in
luding normality of the errors) hold. When we have a more e
ient estimator that
relies on stronger assumptions (su
h as exogeneity) than the IV estimator, we might
prefer to use it, unless we have eviden
e that the assumptions are false.
Equation 4.7 is
a.s.
n 0 H (0 )1 ng(0 ).
H () = I ().
a.s.
n 0 I (0 )1 ng(0 ).
Also, equation 4.9 tells us that the asymptoti
ovarian
e between any CAN estimator and
240
CHAPTER 15.
# "
#
"
n
V ()
IK
.
=
V
IK
I ()
ng()
Now,
onsider
"
#
#"
n
n
a.s.
.
I ()1
n
ng()
IK
0K
0K
#"
"
#"
n
IK
I
0
V
(
)
I
K
K
K
=
V
1
0K
0K I ()
IK
I ()
n
#
"
V ()
I ()1
,
=
I ()1 I ()1
0K
I ()1
"
#
1
n
V
(
)
I
()
=
V
.
I ()1 V ()
n
So, the asymptoti
ovarian
e between the MLE and any other CAN estimator is equal to the
MLE asymptoti
varian
e (the inverse of the information matrix).
Now, suppose we with to test whether the the two estimators are in fa
t both
onverging
0 , versus the alternative hypothesis that the MLE estimator is not in fa
t
onsistent (the
is a maintained hypothesis). Under the null hypothesis that they are, we have
onsisten
y of
to
IK
IK
n 0
= n ,
n 0
So,
where
d
V ()
.
n N 0, V ()
1
d
V ()
n
V ()
2 (),
asymptoti distribution is
1
d
V ()
V ()
2 ().
This is the Hausman test statisti
, in its original form. The reason that this test has power
under the alternative hypothesis is that in that
ase the MLE estimator will not be
onsistent,
15.11.
241
asymptoti distribution
of ve tor
Note: if the test is based on a sub-ve
tor of the entire parameter ve
tor of the MLE,
it is possible that the in
onsisten
y of the MLE will not show up in the portion of the
ve
tor that has been used. If this is the
ase, the test may not have power to dete
t the
in
onsisten
y. This may o
ur, for example, when the
onsistent but ine
ient estimator
is not identied for all the parameters of the model.
The rank,
, of the dieren e of the asymptoti varian es is often less than the dimension
of the matri
es, and it may be di
ult to determine what the true rank is. If the true
rank is lower than what is taken to be true, the test will be biased against reje
tion of
the null hypothesis. The
ontrary holds if we underestimate the rank.
A solution to this problem is to use a rank 1 test, by
omparing only a single
oe
ient.
For example, if a variable is suspe
ted of possibly being endogenous, that variable's
oe
ients may be
ompared.
This simple formula only holds when the estimator that is being tested for onsisten y is
fully
e ient under the null hypothesis. This means that it must be a ML estimator or a
242
CHAPTER 15.
fully e
ient estimator that has the same asymptoti
distribution as the ML estimator.
This is quite restri
tive sin
e modern estimators su
h as GMM and QML are not in
general fully e
ient.
Following up on this last point, let's think of two not ne essarily e ient estimators,
and
2 ,
where one is assumed to be
onsistent, but the other may not be. We assume for expositional
simpli
ity that both
and
expressed as generalized method of moments (GMM) estimators. The estimators are dened
(suppressing the dependen
e upon data) by
i = arg min mi (i ) Wi mi (i )
i
where
mi (i )
is a
weighting matrix,
gi 1
i = 1, 2.
Wi
gi gi
is a
i
h
1 , 2 = arg min m1 (1 ) m2 (2 )
"
W1
0(g2 g1 )
0(g1 g2 )
W2
#"
positive denite
m1 (1 )
m2 (2 )
(15.7)
#)
(
"
m1 (1 )
= lim V ar
n
n
m2 (2 )
!
1 12
.
(15.8)
and
(or
subve
tors of the two) applied to the omnibus GMM estimator, but with the
ovarian
e of the
moment
onditions estimated as
b=
c1
0(g2 g1 )
0(g1 g2 )
c2
12
the test statisti
when one of the estimators is asymptoti
ally e
ient, as we have seen above,
and thus it need not be estimated.
The general solution when neither of the estimators is e
ient is
lear: the entire
must be estimated
onsistently, sin
e the
matrix
, e.g.,
the Newey-West estimator dis
ussed previously. The Hausman test using a proper estimator
of the overall
ovarian
e matrix will now have an asymptoti
A new test an be
15.12.
243
where
1 , 2 = arg min
m1 (1
m2 (2
i 1
e
"
m1 (1 )
m2 (2 )
(15.9)
of equation 15.8.
By
standard arguments, this is a more e
ient estimator than that dened by equation 15.7, so
the Wald test using this alternative is more powerful. See my arti
le in
Applied E onomi s,
2004, for more details, in
luding simulation results. The O
tave s
ript hausman.m
al
ulates
the Wald test
orresponding to the e
ient joint GMM estimator (the H2 test in my paper),
for a simple linear model.
Though GMM estimation has many appli
ations, appli
ation to rational expe
tations models is elegant, sin
e theory dire
tly suggests the moment
onditions. Hansen and Singleton's
1982 paper is also a
lassi
worth studying in itself. Though I strongly re
ommend reading
the paper, I'll use a simplied model with similar notation to Hamilton's.
We assume a representative
onsumer maximizes expe
ted dis
ounted utility over an innite horizon. Utility is temporally additive, and the expe
ted utility hypothesis holds. The
future
onsumption stream is the sto
hasti
sequen
e
X
s=0
The parameter
It
is the
indexed
s E (u(ct+s )|It ) .
(15.10)
information set
t
{ct }
t=0 .
at time
t,
and earlier.
ct
wt .
Suppose the
onsumer
an invest in a risky asset. A dollar invested in the asset yields a
gross return
(1 + rt+1 ) =
where
pt
dt
pt+1 + dt+1
pt
t.
The pri e of
ct
is normalized to
1.
Current wealth
wt = (1 + rt )it1 ,
where
it1
is investment in period
t 1.
So the
problem is to allo
ate
urrent wealth between
urrent
onsumption and investment to
nan
e future
onsumption:
wt = ct + it .
244
CHAPTER 15.
rt+s , s > 0
are
not known
in period
t:
A partial set of ne essary onditions for utility maximization have the form:
u (ct ) = E (1 + rt+1 ) u (ct+1 )|It .
(15.11)
To see that the ondition is ne essary, suppose that the lhs < rhs.
u (ct ),
sin e there is no
dis
ounting of the
urrent period. At the same time, the marginal redu
tion in
onsumption
nan
es investment, whi
h has gross return
period
by
t + 1.
(1 + rt+1 ) ,
This in rease in onsumption would ause the obje tive fun tion to in rease
Therefore, unless the ondition holds, the expe ted dis ounted
u(ct ) =
where
c1
1
t
1
u (ct ) = c
t
so the fo
are
o
n
c
t = E (1 + rt+1 ) ct+1 |It
n
o
E ct (1 + rt+1 ) c
|It = 0
t+1
ct
is stationary, even
though it is in real terms, and our theory requires stationarity. To solve this, divide though
by
c
t
E
(note that
ct
1-
(1 + rt+1 )
ct+1
ct
)!
|It = 0
1-
ht () dened above:
(1 + rt+1 )
ct+1
ct
It .
)
t).
Now
is analogous to
ct
zt
1 (1 + rt+1 )
ct+1
ct
zt mt ()
15.13.
245
represents
and
t, mts
0.
set. By rational expe tations, the auto ovarian es of the moment onditions other than
should be zero. The optimal weighting matrix is therefore the inverse of the varian
e of the
moment
onditions:
= lim E nm( 0 )m( 0 )
= 1/n
n
X
t ()
mt ()m
t=1
whi h an be obtained
After
we then minimize
1 m().
s() = m()
This pro
ess
an be iterated, e.g., use the new estimate to re-estimate
In prin iple, we ould use a very large number of moment onditions in estimation, sin e
ould be used in
xt . Sin e
will lead to a more (asymptoti
ally) e
ient estimator, one might be tempted to use
many instrumental variables. We will do a
omputer lab that will show that this may
not be a good idea with nite samples. This issue has been studied using Monte Carlos
(Tau
hen,
JBES, 1986).
Empiri
al papers that use this approa
h often have serious problems in obtaining pre
ise
estimates of the parameters. Note that we are basing everything on a single parial rst
order
ondition. Probably this f.o.
. is simply not informative enough. Simulation-based
estimation methods (dis
ussed below) are one means of trying to use more informative
moment
onditions to estimate this sort of model.
JBES,
c, p,
and
and
r,
as
246
CHAPTER 15.
******************************************************
Example of GMM estimation of rational expe
tations model
GMM Estimation Results
BFGS
onvergen
e: Normal
onvergen
e
Obje
tive fun
tion value: 0.000014
Observations: 94
X^2 test
Value
0.001
df
1.000
p-value
0.971
estimate
st. err
t-stat
p-value
beta
0.915
0.009
97.271
0.000
gamma
0.569
0.319
1.783
0.075
******************************************************
For two lags the estimation results are
******************************************************
Example of GMM estimation of rational expe
tations model
GMM Estimation Results
BFGS
onvergen
e: Normal
onvergen
e
Obje
tive fun
tion value: 0.037882
Observations: 93
X^2 test
Value
3.523
df
3.000
p-value
0.318
estimate
st. err
t-stat
p-value
beta
0.857
0.024
35.636
0.000
gamma
-2.351
0.315
-7.462
0.000
******************************************************
15.13.
247
problem here: poor instruments, or possibly a
onditional moment that is not very informative.
Moment
onditions formed from Euler
onditions sometimes do not identify the parameter of
a model. See Hansen, Heaton and Yarron, (1996)
haven't
he
ked it
arefully)?
JBES
248
CHAPTER 15.
Dn ,
mt (),
what is the e ient weight matrix, and show that the ovarian e matrix
gmm_results),
mle_results).
( ) The two estimators should oin ide. Prove analyti ally that the estimators oi ide.
1 m()
n m()
has a
2 (g K)
dis-
tribution. That is, show that the monster matrix is idempotent and has tra e equal to
g K.
5. For the portfolio example, experiment with the program using lags of 3 and 4 periods to
dene instruments
= (, )
and
to onvergen e.
(b) Comment on the results. Are the results sensitive to the set of instruments used?
Look at
as well as
Chapter 16
Quasi-ML
Quasi-ML is the estimator one obtains when a misspe
ied probability model is used to
al
ulate an ML estimator.
Given a sample of size
of a random ve tor
Y=
0 :
y
y1 . . . yn
onditional on X = x1 . . . xn
pY (Y|X, ), . The true joint density is asso
iated
x,
is a
with
pY (Y|X, 0 ).
X
doesn't depend on
0 ,
hara terizes the random hara teristi s of samples: i.e., it fully des ribes the probabilisti ally
Let
Yt1 =
L(Y|X, ) = pY (Y|X, ), .
y1 . . . yt1
Y0 = 0,
and let
Xt =
The likelihood
pt ().
Mistakenly, we
x1 . . . xt
L(Y|X, ) =
n
Y
t=1
n
Y
pt (yt |Yt1 , Xt , )
pt ()
t=1
1
1X
sn () = ln L(Y|X, ) =
ln pt ()
n
n
t=1
0
where there is no su
h that
(this
250
CHAPTER 16.
QUASI-ML
This setup allows for heterogeneous time series data, with dynami misspe i ation.
whi h we refer to as the quasi-log likelihood fun tion. This obje tive fun tion is
1X
ln ft (yt |Yt1 , Xt , 0 )
n t=1
sn () =
1X
ln ft ()
n
t=1
n = arg max sn ()
1X
ln ft () s ()
sn () lim E
n n
t=1
a.s.
We assume that this
an be strengthened to uniform
onvergen
e, a.s., following the previous
arguments. The pseudo-true value of
s():
0 = arg max s ()
lim n = 0 , a.s.
d
n 0 N 0, J ( 0 )1 I ( 0 )J ( 0 )1
where
J ( 0 ) = lim ED2 sn ( 0 )
n
and
I ( 0 ) = lim V ar nD sn ( 0 ).
n
Note that asymptoti normality only requires that the additional assumptions regarding
and
hold in a neighborhood of
for
and at
0,
for
I,
not throughout
In this
16.1.
251
J ( 0 )
is straightforward.
1X 2
1X 2
a.s.
D ln ft (n ) lim E
D ln ft ( 0 ) = J ( 0 ).
Jn (n ) =
n n
n t=1
t=1
Notation:
Let
I ( 0 )
in pla e of
0.
gt D ft ( 0 )
We need to estimate
I ( 0 ) =
lim V ar nD sn ( 0 )
n
1X
D ln ft ( 0 )
= lim V ar n
n
n
t=1
n
X
1
gt
V ar
n
t=1
( n
! n
! )
X
X
1
= lim E
(gt Egt )
(gt Egt )
n n
t=1
t=1
=
lim
1X
(Egt ) (Egt )
n n
t=1
lim
sin
e it requires
al
ulating an expe
tation using the true density under the d.g.p., whi
h is
unknown.
I ( 0 ) is
ross se tional data, for example. (Note: under i.i.d. sampling, the joint distribution of
With random sampling, the limiting obje tive fun tion is simply
s ( 0 ) = EX E0 ln f (y|x, 0 )
where
E0
density of
x.
y|x
and
EX
By the requirement that the limiting obje tive fun tion be maximized at
D EX E0 ln f (y|x, 0 ) = D s ( 0 ) = 0
we have
252
CHAPTER 16.
QUASI-ML
The dominated onvergen e theorem allows swit hing the order of expe tation and differentiation, so
D EX E0 ln f (y|x, 0 ) = EX E0 D ln f (y|x, 0 ) = 0
The CLT implies that
1 X
d
1X
ln ft ()
D ln ft ()D
Ib =
n
t=1
This is an important
ase where
onsistent estimation of the
ovarian
e matrix is possible.
Other
ases exist, even for dynami
ally misspe
ied time series models.
V[
(y) =
Pn
t=1
. Using the program PoissonVarian e.m, for OBDV and ERV, we get
ERV
Sample
38.09
0.151
Estimated
3.28
0.086
We see that even after
onditioning, the overdispersion is not
aptured in either
ase. There
is huge problem with OBDV, and a signi
ant problem with ERV. In both
ases the Poisson
model does not appear to be plausible. You
an
he
k this for the other use measures if you
like.
The two measures seem to exhibit extra-Poisson variation. To apture unobserved heterogeneity, a possibility is the
random parameters
approa h.
16.2.
253
exp() y
y!
= exp(x + )
fY (y|x, ) =
= exp(x) exp()
=
where
= exp(x )
and
= exp().
Now
an
fY (y|x) =
This density
exp[] y
fv (z)dz
y!
be used dire tly, perhaps using numeri al integration to evaluate the likeli-
hood fun
tion. In some
ases, though, the integral will have an analyti
solution. For example,
if
(y + )
fY (y|x, ) =
(y + 1)()
where
= (, ).
If
y
(16.1)
= /,
E(y|x) = ,
where
> 0,
= exp(x )
is parameterized.
then
V (y|x) = + .
Note that
is a fun tion of
x,
so
If
= 1/,
where
> 0,
then
V (y|x) = + 2 .
model.
So both forms of the NB model allow for overdispersion, with the NB-II model allowing for a
more radi
al form.
Testing redu
tion of a NB model to a Poisson model
annot be done by testing
= 0
using standard Wald or LR pro
edures. The
riti
al values need to be adjusted to a
ount for
the fa
t that
=0
suppose that the data were in fa t Poisson, so there is equidispersion and the true
= 0.
Then about half the time the sample data will be underdispersed, and about half the time
overdispersed. When the data is underdispersed, the MLE of
= 0. Thus, under
p
n(
) = n
of
will be
This program will do estimation using the NB model. Note how modelargs is used to sele
t
a NB-I or NB-II density. Here are NB-I estimation results for OBDV:
254
CHAPTER 16.
gradient
0.0000
-0.0000
-0.0000
0.0000
0.0000
0.0000
-0.0000
-0.0000
hange
-0.0000
0.0000
0.0000
-0.0000
-0.0000
-0.0000
0.0000
0.0000
******************************************************
Negative Binomial model, MEPS 1996 full data set
MLE Estimation Results
BFGS
onvergen
e: Normal
onvergen
e
Average Log-L: -2.185730
Observations: 4564
onstant
pub. ins.
priv. ins.
sex
age
edu
in
alpha
estimate
-0.523
0.765
0.451
0.458
0.016
0.027
0.000
5.555
Information Criteria
CAIC : 20026.7513
st. err
0.104
0.054
0.049
0.034
0.001
0.007
0.000
0.296
Avg. CAIC:
t-stat
-5.005
14.198
9.196
13.512
11.869
3.979
0.000
18.752
4.3880
p-value
0.000
0.000
0.000
0.000
0.000
0.000
1.000
0.000
QUASI-ML
16.2.
255
BIC : 20018.7513
Avg. BIC:
4.3862
AIC : 19967.3437
Avg. AIC:
4.3750
******************************************************
Note that the parameter values of the last BFGS iteration are dierent that those reported
in the nal results.
This ree ts two things - rst, the data were s aled before doing the
mle_results
>
= exp( )
= log
is used to
is used to dene
the log-likelihood fun
tion, sin
e the BFGS minimization algorithm does not do
ontrained
minimization. To get the standard error and t-statisti
of the estimate of
delta method. This is done inside
mle_results,
gradient
0.0000
-0.0000
0.0000
0.0000
0.0000
-0.0000
0.0000
-0.0000
hange
-0.0000
0.0000
-0.0000
-0.0000
0.0000
0.0000
-0.0000
0.0000
******************************************************
Negative Binomial model, MEPS 1996 full data set
256
CHAPTER 16.
QUASI-ML
onstant
pub. ins.
priv. ins.
sex
age
edu
in
alpha
st. err
0.161
0.095
0.081
0.050
0.002
0.009
0.000
0.055
t-stat
-6.622
11.611
5.880
11.166
12.240
3.106
-0.176
29.099
p-value
0.000
0.000
0.000
0.000
0.000
0.002
0.861
0.000
Information Criteria
CAIC : 20019.7439
Avg. CAIC: 4.3864
BIC : 20011.7439
Avg. BIC:
4.3847
AIC : 19960.3362
Avg. AIC:
4.3734
******************************************************
For the OBDV usage measurel, the NB-II model does a slightly better job than the NB-I
model, in terms of the average log-likelihood and the information
riteria (more on this
last in a moment).
Note that both versions of the NB model t mu
h better than does the Poisson model
(see 13.4.2).
The estimated
Pn
V[
(y) =
t )2
(
t=1 t +
. For OBDV and ERV (estimation results not reported), we get For OBDV, the
n
ERV
Sample
38.09
0.151
Estimated
30.58
0.182
overdispersion problem is signi
antly better than in the Poisson
ase, but there is still some
that is not
aptured. For ERV, the negative binomial model seems to
apture the overdispersion adequately.
16.2.
257
Subje tive,
self-reported measures may suer from the same problem, and may also not be exogenous
Finite mixture models are
on
eptually simple. The density is
p1
X
(i)
i=1
where
the
i > 0, i = 1, 2, ..., p, p = 1
Pp1
i=1
i ,
and
Pp
i=1 i
= 1.
1 2 p
i 6= j , i 6= j .
This is
The properties of the mixture density follow in a straightforward way from those of the
omponents. In parti
ular, the moment generating fun
tion is the same mixture of the
moment generating fun
tions of the
omponent densities, so, for example,
Pp
i (x)
ith
omponent density.
E(Y |x) =
Mixture densities may suer from overparameterization, sin
e the total number of parameters grows rapidly with the number of
omponent densities. It is possible to
onstrained
parameters a
ross the mixtures.
Testing for the number of omponent densities is a tri ky issue. For example, testing for
p=1
1 = 1,
1 = 1,
p=2
(a mixture of two
any value without ae
ting the density. Usual methods su
h as the likelihood ratio test
are not appli
able when parameters are on the boundary under the null hypothesis.
Information
riteria means of
hoosing the model (see below) are valid.
The following results are for a mixture of 2 NB-II models, for the OBDV data, whi
h you
an
repli
ate using this program .
OBDV
******************************************************
258
CHAPTER 16.
QUASI-ML
estimate
0.127
0.861
0.146
0.346
0.024
0.025
-0.000
1.351
0.525
0.422
0.377
0.400
0.296
0.111
0.014
1.034
0.257
st. err
0.512
0.174
0.193
0.115
0.004
0.016
0.000
0.168
0.196
0.048
0.087
0.059
0.036
0.042
0.051
0.187
0.162
t-stat
0.247
4.962
0.755
3.017
6.117
1.590
-0.214
8.061
2.678
8.752
4.349
6.773
8.178
2.634
0.274
5.518
1.582
p-value
0.805
0.000
0.450
0.003
0.000
0.112
0.831
0.000
0.007
0.000
0.000
0.000
0.000
0.008
0.784
0.000
0.114
Information Criteria
CAIC : 19920.3807
Avg. CAIC: 4.3647
BIC : 19903.3807
Avg. BIC:
4.3610
AIC : 19794.1395
Avg. AIC:
4.3370
******************************************************
It is worth noting that the mixture parameter is not signi
antly dierent from zero, but
also not that the
oe
ients of publi
insuran
e and age, for example, dier quite a bit between
the two latent
lasses.
16.2.
259
are
+ k(ln n + 1)
CAIC = 2 ln L()
+ k ln n
BIC = 2 ln L()
+ 2k
AIC = 2 ln L()
It
an be shown that the CAIC and BIC will sele
t the
orre
tly spe
ied model from a group
of models, asymptoti
ally. This doesn't mean, of
ourse, that the
orre
t model is ne
esarily
in the group. The AIC is not
onsistent, and will asymptoti
ally favor an over-parameterized
model over the
orre
tly spe
ied model. Here are information
riteria values for the models
we've seen, for OBDV. Pretty
learly, the NB models are better than the Poisson. The one
AIC
BIC
CAIC
Poisson
7.345
7.355
7.357
NB-I
4.375
4.386
4.388
NB-II
4.373
4.385
4.386
MNB-II
4.337
4.361
4.365
additional parameter gives a very signi
ant improvement in the likelihood fun
tion value.
Between the NB-I and NB-II models, the NB-II is slightly favored. But one should remember
that information
riteria values are statisti
s, with varian
es. With another sample, it may
well be that the NB-I model would be favored, sin
e the dieren
es are so small. The MNB-II
model is favored over the others, by all 3 information
riteria.
Why is all of this in the
hapter on QML? Let's suppose that the
orre
t model for OBDV
is in fa
t the NB-II model. It turns out in this
ase that the Poisson model will give
onsistent
estimates of the slope parameters (if a model is a member of the linear-exponential family
and the
onditional mean is
orre
tly spe
ied, then the parameters of the
onditional mean
will be
onsistently estimated).
is
onsistent for some parameters of the true model. The ordinary OPG or inverse Hessian
ML
ovarian
e estimators are however biased and in
onsistent, sin
e the information matrix
equality does not hold for QML estimators. But for i.i.d. data (whi
h is the
ase for the MEPS
data) the QML asymptoti
ovarian
e
an be
onsistently estimated, as dis
ussed above, using
the sandwi
h form for the ML estimator.
Poisson estimation results would be reliable for inferen
e even if the true model is the NB-I
or NB-II. Not that they are in fa
t similar to the results for the NB models.
However, if we assume that the
orre
t model is the MNB-II model, as is favored by the
information
riteria, then both the Poisson and NB-x models will have misspe
ied mean
fun
tions, so the parameters that inuen
e the means would be estimated with bias and
in
onsistently.
260
CHAPTER 16.
QUASI-ML
be a latent index of health status that has expe tation equal to unity.
We suspe t that
and
P RIV
is un orrelated
Poisson()
= exp(1 + 2 P U B + 3 P RIV +
(16.2)
Sin
e mu
h previous eviden
e indi
ates that health
are servi
es usage is overdispersed ,
this is almost
ertainly not an ML estimator, and thus is not e
ient. However, when
and
P RIV
and
P RIV
are orrelated,
Mullahy's (1997) NLIV estimator that uses the residual fun tion
=
where
y
1,
instruments we use all the exogenous regressors, as well as the
ross produ
ts of
with the variables in
As
PUB
W = {1 P U B Z P U B Z }.
(a) Cal
ulate the Poisson QML estimates.
(b) Cal
ulate the generalized IV estimates (do it using a GMM formulation - see the
portfolio example for hints how to do this).
(
) Cal
ulate the Hausman test statisti
to test the exogeneity of PRIV.
(d)
omment on the results
1
2
Chapter 17
Nonlinear least squares (NLS)
Readings:
yt = f (xt , 0 ) + t .
In general,
will be heteros edasti and auto orrelated, and possibly nonnormally dis-
tributed. However, dealing with this is exa
tly as in the
ase of linear models, so we'll
just treat the iid
ase here,
t iid(0, 2 )
If we sta
k the observations verti
ally, dening
y = (y1 , y2 , ..., yn )
f = (f (x1 , ), f (x1 , ), ..., f (x1 , ))
and
= (1 , 2 , ..., n )
we
an write the
observations as
y = f () +
Using this notation, the NLS estimator
an be dened as
1
1
arg min sn () = [y f ()] [y f ()] = k y f () k2
n
n
The estimator minimizes the weighted sum of squared errors, whi
h is the same as
minimizing the Eu
lidean distan
e between
261
and
f ().
262
CHAPTER 17.
sn () =
1
y y 2y f () + f () f () ,
n
Dene the
nK
In shorthand, use
0.
f () y +
f () f ()
matrix
D f ().
F()
in pla e of
F().
(17.1)
0,
y + F
f ()
F
or
h
i
0.
y f ()
F
(17.2)
This bears a good deal of similarity to the f.o.
. for the linear model - the derivative of the
predi
tion is orthogonal to the predi
tion error. If
X y X X = 0,
the usual 0LS f.o.
.
Note that the nonlinearity of the manifold leads to potential multiple lo
al maxima,
minima and saddlepoints: the obje
tive fun
tion
sn ()
( 0 )
< s (), 6=
17.2.
263
IDENTIFICATION
requires that
D2 s ( 0 )
sn () =
1X
[yt f (xt , )]2
n
t=1
=
=
n
2
1 X
f (xt , 0 ) + t ft (xt , )
n t=1
n
n
2 1 X
1 X
(t )2
ft ( 0 ) ft () +
n
n
t=1
t=1
n
2 X
ft ( 0 ) ft () t
n
t=1
As in example 14.4, whi
h illustrated the
onsisten
y of extremum estimators using OLS,
we
on
lude that the se
ond term will
onverge to a
onstant whi
h does not depend
upon
long as
f () and
gen e.
are un orrelated.
assume it holds.
Turning to the rst term, we'll assume a pointwise law of large numbers applies, so
n
2 a.s.
1 X
ft ( 0 ) ft ()
n t=1
where
(x)
For example if
2
f (z, 0 ) f (z, ) d(z),
x.
In many ases,
f (x, )
will
(17.3)
be bounded
a bounded
0.
toti
), the question is whether or not there may be some other minimizer. A lo
al
ondition
for identi
ation is that
2
2
s
()
=
be positive denite at
0.
2
f (x, 0 ) f (x, ) d(x)
2
f (x, ) f (x, ) d(x)
0
=2
0
D f (z, 0 )
D f (z, 0 ) d(z)
264
CHAPTER 17.
the expe tation of the outer produ t of the gradient of the regression fun tion evaluated at
0.
(Note: the uniform boundedness we have already assumed allows passing the derivative
denite (wp1) as long as the gradient ve
tor is of full rank (wp1). The tangent spa
e to the
regression manifold must span a
-dimensional parameter ve
tor. This is analogous to the requirement that there be no perfe
t
olinearity in a linear model. This is a ne
essary
ondition for identi
ation. Note that the
LLN implies that the above expe
tation is equal to
J ( 0 ) = 2 lim E
F F
n
17.3 Consisten
y
We simply assume that the
onditions of Theorem 19 hold, so the estimator is
onsistent.
Given that the strong sto
hasti
equi
ontinuity
onditions hold, as dis
ussed above, and given
the above identi
ation
onditions an a
ompa
t estimation spa
e (the
losure of the parameter
spa
e
),
where
J ( 0 )
d
n 0 N 0, J ( 0 )1 I ( 0 )J ( 0 )1 ,
is the almost sure limit of
2
sn () evaluated at
0,
and
I ( 0 ) = lim V ar nD sn ( 0 )
The obje
tive fun
tion is
sn () =
So
D sn () =
Evaluating at
1X
[yt f (xt , )]2
n t=1
0,
2X
[yt f (xt , )] D f (xt , ).
n t=1
n
2X
D sn ( ) =
t D f (xt , 0 ).
n t=1
0
and
xt
to al ulate the varian e, we an simply al ulate the se ond moment about zero. Also note
17.5.
265
that
n
X
t D f (xt , 0 ) =
t=1
0
f ( )
= F
I ( 0 ) = lim V ar nD sn ( 0 )
4
= lim nE 2 F ' F
n
F F
= 4 2 lim E
n
We've already seen that
J ( 0 ) = 2 lim E
F F
,
n
J ( 0 )
pressions for
get
and
I ( 0 ),
and
d
n 0 N
F F
0, lim E
n
1
F
n
where
!1
2,
(17.4)
2 =
i h
i
y f ()
y f ()
n
the obvious estimator. Note the lose orresponden e to the results for the linear model.
yt
onditional on
ount data
xt
A Poisson random
This sort
of model has been used to study visits to do
tors per year, number of patents registered by
businesses per year,
et .
f (yt ) =
The mean of
yt
is
t ,
exp(t )yt t
, yt {0, 1, 2, ...}.
yt !
266
CHAPTER 17.
mean is
0t = exp(xt 0 ),
whi
h enfor
es the positivity of
t .
Suppose we estimate
n
2
1X
yt exp(xt )
= arg min sn () =
T t=1
We
an write
sn () =
=
n
2
1X
exp(xt 0 + t exp(xt )
T t=1
n
n
n
2 1 X
1X
1X
2
0
t + 2
t exp(xt 0 exp(xt )
exp(xt exp(xt ) +
T
T
T
t=1
t=1
t=1
The last term has expe
tation zero sin
e the assumption that
that
implies
t . Applying
a strong LLN, and noting that the obje
tive fun
tion is
ontinuous on a
ompa
t parameter
spa
e, we get
2
s () = Ex exp(x 0 exp(x ) + Ex exp(x 0 )
where the last term
omes from the fa
t that the
onditional varian
e of
varian
e of
is learly minimized at
spe
i
forms
to verify that it
an be applied.
2
n 0 .
The Gauss-Newton optimization te
hnique is spe
i
ally designed for nonlinear least squares.
The idea is to linearize the nonlinear model, rather than the obje
tive fun
tion. The model is
y = f ( 0 ) + .
At some
0,
we have
y = f () +
where
0
rather than the true value . Take a rst order Taylor's series
17.6.
267
Dene
1 :
1 + + approximation
y = f ( 1 ) + D f 1
z y f ( 1 )
and
b ( 1 ).
error.
z = F( 1 )b + ,
where, as above,
F( 1 ) D f ( 1 ) is the nK
1
evaluated at , and
is
1.
Note that
Given
b,
is known, given
as
2 = b + 1 .
2
Taylor's series expansion around and repeat the pro
ess. Stop when
b = 0
(to within
To see why this might work,
onsider the above approximation, but evaluated at the NLS
estimator:
+ F()
+
y = f ()
b is
1 h
i
b = F
F y f () .
h
i
0
y f ()
F
by denition of the NLS estimator (these are the normal equations as in equation 17.2, Sin e
b 0
when we evaluate at
The Gauss-Newton method doesn't require se ond derivatives, as does the Newton-
as a
In
fa
t, a normal OLS program will give the NLS var
ov estimator dire
tly, sin
e it's just
the OLS var
ov estimator from the last iteration.
F() F(),
y = 1 + 2 xt 3 + t
268
CHAPTER 17.
2 0, 3
When evaluated at
F F
will be nearly
17.7 Appli
ation: Limited dependent variables and sample sele
tion
Readings:
not required for reading, and whi
h is a bit out-dated. Nevertheless it's a good pla
e to start
if you en
ounter sample sele
tion problems in your resear
h).
Sample sele
tion is a
ommon problem in applied resear
h.
observations used in estimation are sampled non-randomly, a ording to some sele tion s heme.
Oer wage:
Reservation wage:
s = x +
wo = z +
wr = q +
z + q +
w =
r +
We have the set of equations
s = x +
w = r + .
Assume that
"
"
0
0
# "
,
#!
17.7.
We assume that the oer wage and the reservation wage, as well as the latent variable
are
w = 1 [w > 0]
s = ws .
In other words, we observe whether or not a person is working. If the person is working, we
observe labor supply, whi
h is equal to latent labor supply,
s .
Otherwise,
s = 0 6= s .
Note
that we are using a simplifying assumption that individuals
an freely
hoose their weekly
hours of work.
Suppose we estimated the model
s = x + residual
using only observations for whi
h
whi h w
or equivalently,
<
r and
E | < r 6= 0,
and
elements of
an enter in
sin e
> 0,
s > 0.
r.
sin e
in
onsistent.
Consider more
arefully
E [| < r ] .
and
we
an
122)
= + ,
where
s = x + + .
If we
ondition this equation on
< r
we get
s = x + E(| < r ) +
whi
h may be written as
s = x + E(| > r ) +
z N (0, 1)
E(z|z > z ) =
where
()
and
()
(z )
,
(z )
are the standard normal density and distribution fun tion, respe -
270
CHAPTER 17.
IM R(z ) =
(z )
(z )
With this we
an write (making use of the fa
t that the standard normal density is
symmetri
about zero, so that
(a) = (a)):
(r )
+
(r )
#
"
i
(r )
s = x +
where
= .
regressors
(r )
(r )
(r )
(17.5)
+ .
(17.6)
He kman showed how one an estimate this in a two step pro edure where rst
is
estimated, then equation 17.6 is estimated by least squares using the estimated value of
to form the regressors. This is ine ient and estimation of the ovarian e is a tri ky
The model presented above depends strongly on joint normality. There exist many alternative models whi
h weaken the maintained assumptions. It is possible to estimate
onsistently without distributional assumptions. See Ahn and Powell,
metri s, 1994.
Journal of E ono-
Chapter 18

Nonparametric inference

18.1 Possible pitfalls of parametric inference: estimation

Readings: H. White (1980) "Using Least Squares to Approximate Unknown Regression Functions," International Economic Review, pp. 149-70.

In this section we consider a simple example, which illustrates both why nonparametric methods may in some cases be preferred to parametric methods.

We suppose that data is generated by random sampling of $(y, x)$, where $y = f(x) + \varepsilon$, $x$ is uniformly distributed on $(0, 2\pi)$, and $\varepsilon$ is a classical error. Suppose that
$$f(x) = 1 + \frac{3x}{2\pi} - \left( \frac{x}{2\pi} \right)^2.$$
The problem of interest is to estimate the elasticity of $f(x)$ with respect to $x$, throughout the range of $x$.

In general, the functional form of $f(x)$ is unknown. One idea is to take a Taylor's series approximation to $f(x)$ about some point $x_0$. Flexible functional forms such as the transcendental logarithmic (usually known as the translog) can be interpreted as second order Taylor's series approximations. We'll work with a first order approximation, for simplicity. Approximating about $x_0$:
$$h(x) = f(x_0) + D_x f(x_0)(x - x_0).$$
If the approximation point is $x_0 = 0$, we can write
$$h(x) = a + bx.$$
The coefficient $a$ is the value of the function at $x = 0$, and the slope $b$ is the value of the derivative at $x = 0$. These are of course not known. One might try estimation by ordinary least squares, minimizing
$$s_n(a, b) = 1/n \sum_{t=1}^{n} \left( y_t - h(x_t) \right)^2.$$
The limiting objective function, following the argument we used to get equations 14.1 and 17.3, is
$$s_\infty(a, b) = \int_0^{2\pi} \left( f(x) - h(x) \right)^2 \frac{1}{2\pi} \, dx + \sigma_\varepsilon^2.$$
The theorem regarding the consistency of extremum estimators (Theorem 19) tells us that $\hat a$ and $\hat b$ will converge almost surely to the values that minimize the limiting objective function. Solving the first order conditions (these results were obtained using the free computer algebra system (CAS) Maxima) reveals that $s_\infty(a, b)$ obtains its minimum at $a^0 = \frac{7}{6}$, $b^0 = \frac{1}{\pi}$. The estimated approximating function therefore tends almost surely to
$$h_\infty(x) = 7/6 + x/\pi.$$
In Figure 18.1 we see the true function and the limit of the approximation, which shows the asymptotic bias as a function of $x$. (The approximating model is the straight line; the true model has curvature.) Note that the approximating model is in general inconsistent, even at the approximation point. This shows that "flexible" functional forms based upon Taylor's series approximations do not in general lead to consistent estimation of functions.

[Figure 18.1: True function $f(x)$ and limiting first order approximation $h_\infty(x)$, for $x \in (0, 2\pi)$; the vertical axis runs from about 1.0 to 3.5.]

The approximating model seems to fit the true model fairly well, asymptotically. However, we are interested in the elasticity of the function. Recall that an elasticity is the marginal function divided by the average function:
$$\varepsilon(x) = x \, f'(x) / f(x).$$
Good approximation of the elasticity over the range of $x$ will require a good approximation of both $f(x)$ and $f'(x)$ over the range of $x$. The elasticity of the limiting approximation is
$$\eta(x) = x \, h_\infty'(x) / h_\infty(x).$$
In Figure 18.2 we see the true elasticity and the elasticity obtained from the limiting approximating model. The true elasticity is the line that has negative slope for large $x$. Visually we note that the elasticity is not approximated so well. Root mean squared error in the approximation of the elasticity is
$$\left( \int_0^{2\pi} \left( \varepsilon(x) - \eta(x) \right)^2 dx \right)^{1/2} = .31546.$$

[Figure 18.2: True elasticity $\varepsilon(x)$ and the elasticity of the limiting approximation $\eta(x)$; the vertical axis runs from 0.0 to about 0.7.]
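A quick numerical check of the limiting values $a^0 = 7/6$, $b^0 = 1/\pi$ can be done by OLS on a large simulated sample. This sketch uses only core Octave; the variable names are my own.

% check: OLS on a large sample converges to a0 = 7/6, b0 = 1/pi
n = 1e6;
x = 2*pi*rand(n,1);
f = 1 + 3*x/(2*pi) - (x/(2*pi)).^2;
y = f + randn(n,1);                 % classical error, variance 1
ab = [ones(n,1), x] \ y;            % should be close to [7/6; 1/pi]
disp([ab, [7/6; 1/pi]])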
Now suppose we use the leading terms of a trigonometric series as the approximating model. The reason for using a trigonometric series as an approximating model is motivated by the asymptotic properties of the Fourier flexible functional form (Gallant, 1981, 1982), which we will study in more detail below. Normally with this type of model the number of basis functions is an increasing function of the sample size. Here we hold the set of basis functions fixed. We will consider the asymptotic behavior of a fixed model, which we interpret as an approximation to the estimator's behavior in finite samples. Consider the set of basis functions:
$$Z(x) = \begin{bmatrix} 1 & x & \cos(x) & \sin(x) & \cos(2x) & \sin(2x) \end{bmatrix}.$$
The approximating model is
$$g_K(x) = Z(x)\alpha.$$
Maintaining these basis functions as the sample size increases, we find that the limiting objective function is minimized at
$$\left\{ a_1 = \frac{7}{6},\ a_2 = \frac{1}{\pi},\ a_3 = -\frac{1}{\pi^2},\ a_4 = 0,\ a_5 = -\frac{1}{4\pi^2},\ a_6 = 0 \right\}.$$
Substituting these values into $g_K(x)$ we obtain the almost sure limit of the approximation
$$g_\infty(x) = 7/6 + x/\pi + (\cos x)\left( -\frac{1}{\pi^2} \right) + (\sin x)\, 0 + (\cos 2x)\left( -\frac{1}{4\pi^2} \right) + (\sin 2x)\, 0. \tag{18.1}$$
In Figure 18.3 we have the approximation and the true function. Clearly the truncated trigonometric series model offers a better approximation, asymptotically, than does the linear model.

[Figure 18.3: True function $f(x)$ and the limiting trigonometric approximation $g_\infty(x)$; the vertical axis runs from about 1.0 to 3.5.]

In Figure 18.4 we have the more flexible approximation's elasticity and that of the true function. On average, the fit is better, though there is some implausible wavyness in the estimate.

[Figure 18.4: True elasticity and the elasticity of the limiting trigonometric approximation; the vertical axis runs from 0.0 to about 0.7.]

Root mean squared error in the approximation of the elasticity is
$$\left( \int_0^{2\pi} \left( \varepsilon(x) - \frac{g_\infty'(x)\, x}{g_\infty(x)} \right)^2 dx \right)^{1/2} = .16213,$$
about half that of the RMSE when the first order approximation is used. If the trigonometric series contained infinite terms, this error measure would be driven to zero, as we shall see.

18.2 Possible pitfalls of parametric inference: hypothesis testing
Consider means of testing for the hypothesis that consumers maximize utility. A consequence of utility maximization is that the Slutsky matrix $D_p^2 h(p, U)$, where $h(p, U)$ are the compensated demand functions, must be negative semi-definite. One approach to testing for utility maximization would estimate a set of normal demand functions $x(p, m)$. Estimation of these functions by normal parametric methods requires specification of the functional form of demand, for example
$$x(p, m) = x(p, m, \theta_0) + \varepsilon, \quad \theta_0 \in \Theta_0,$$
where $x(p, m, \theta_0)$ is a function of known form, and $\Theta_0$ is a finite dimensional parameter space. After estimation, one could test whether the implied Slutsky matrix is negative semi-definite.

The problem with this is that the reason for rejection of the theoretical proposition may be that our choice of functional form is incorrect. In the introductory section we saw that functional form misspecification leads to inconsistent estimation of the function and its derivatives.

Testing using parametric models always means we are testing a compound hypothesis. The hypothesis that is tested is 1) the economic proposition we wish to test, and 2) the model is correctly specified. Failure of either 1) or 2) can lead to rejection.

Varian's WARP allows one to test for utility maximization without specifying the form of the demand functions. The only assumptions used in the test are those directly implied by theory, so rejection of the hypothesis calls into question the theory.

Nonparametric inference in general allows direct testing of economic propositions. The cost is usually an increase in complexity, and a loss of power, compared to what one would get using a well-specified parametric model. The benefit is robustness against possible misspecification.
18.3 Estimation of regression functions

18.3.1 The Fourier functional form

Suppose we have a model
$$y = f(x) + \varepsilon,$$
where $f(x)$ is of unknown form, $x$ is a $P$-dimensional vector, and $\varepsilon$ is a classical error. Let us take as the object of interest the estimation of the vector of elasticities with typical element
$$\xi_{x_i} = \frac{x_i}{f(x)} \frac{\partial f(x)}{\partial x_i},$$
at an arbitrary point $x_i$.

The Fourier form, following Gallant (1982), but with a somewhat different parameterization, may be written as
$$g_K(x \mid \theta_K) = \alpha + x'\beta + 1/2\, x'Cx + \sum_{\alpha=1}^{A} \sum_{j=1}^{J} \left( u_{j\alpha} \cos(j k_\alpha' x) - v_{j\alpha} \sin(j k_\alpha' x) \right), \tag{18.2}$$
where the $K$-dimensional parameter vector is
$$\theta_K = \left\{ \alpha, \beta', \mathrm{vec}^*(C)', u_{11}, v_{11}, \ldots, u_{JA}, v_{JA} \right\}'. \tag{18.3}$$
We assume that the conditioning variables $x$ have each been transformed to lie in an interval that is shorter than $2\pi$. This is required to avoid periodic behavior of the approximation, which is desirable since economic functions aren't periodic. For example, subtract sample means, divide by the maxima of the conditioning variables, and multiply by $2\pi - eps$, where $eps$ is some positive number that is small relative to $2\pi$ in value.

The $k_\alpha$ are "elementary multi-indices": $P$-vectors formed of integers (negative, positive and zero). The $k_\alpha$, $\alpha = 1, 2, \ldots, A$ are required to be linearly independent, and we follow the convention that the first non-zero element be positive. For example,
$$\begin{bmatrix} 0 & 1 & -1 & 0 & 1 \end{bmatrix}'$$
is a potential multi-index to be used, but
$$\begin{bmatrix} 0 & -1 & -1 & 0 & 1 \end{bmatrix}'$$
is not, since its first non-zero element is negative. Nor is
$$\begin{bmatrix} 0 & 2 & -2 & 0 & 2 \end{bmatrix}'$$
a multi-index we would use, since it is a scalar multiple of the first multi-index.

We parameterize the matrix $C$ differently than does Gallant because it simplifies things in practice. The cost of this is that we are no longer able to test a quadratic specification using nested testing.

The derivatives of the approximating model are
$$D_x g_K(x \mid \theta_K) = \beta + Cx + \sum_{\alpha=1}^{A} \sum_{j=1}^{J} \left[ \left( -u_{j\alpha} \sin(j k_\alpha' x) - v_{j\alpha} \cos(j k_\alpha' x) \right) j k_\alpha \right] \tag{18.4}$$
$$D_x^2 g_K(x \mid \theta_K) = C + \sum_{\alpha=1}^{A} \sum_{j=1}^{J} \left[ \left( -u_{j\alpha} \cos(j k_\alpha' x) + v_{j\alpha} \sin(j k_\alpha' x) \right) j^2 k_\alpha k_\alpha' \right] \tag{18.5}$$
To define a compact notation for partial derivatives, let $\lambda$ be an $N$-dimensional multi-index with no negative elements, and define $|\lambda|^*$ as the sum of the elements of $\lambda$. If we have $N$ arguments $x$ of the (arbitrary) function $h(x)$, use $D^\lambda h(x)$ to indicate a certain partial derivative:
$$D^\lambda h(x) \equiv \frac{\partial^{|\lambda|^*}}{\partial x_1^{\lambda_1} \partial x_2^{\lambda_2} \cdots \partial x_N^{\lambda_N}} h(x).$$
When $\lambda$ is the zero vector, $D^\lambda h(x) \equiv h(x)$. Taking this definition and the derivatives above into account, it is possible to define a $(1 \times K)$ vector of functions $z^\lambda(x)$ so that
$$D^\lambda g_K(x \mid \theta_K) = z^\lambda(x)' \theta_K. \tag{18.6}$$
Both the approximating model and the derivatives of the approximating model are linear in the parameters. For the approximating model to the function (not derivatives), write
$$g_K(x \mid \theta_K) = z'\theta_K,$$
with obvious notation.
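To see what the regressor vector $z$ looks like in a simple case, here is an Octave sketch that builds the Fourier-form regressor matrix for scalar $x$ (so $P = 1$ and the only elementary multi-index is $k = 1$). The function name is my own; this is illustrative, not code from the example programs.

% Build the Fourier-form regressor matrix for scalar x, a sketch.
function Z = fourier_regressors(x, J)
  % x: n x 1, already scaled into an interval shorter than 2*pi
  Z = [ones(rows(x),1), x, 0.5*x.^2];     % constant, linear, quadratic terms
  for j = 1:J
    Z = [Z, cos(j*x), -sin(j*x)];         % trigonometric terms of eq. 18.2
  endfor
endfunction
% theta_K = Z \ y then gives g_K(x|theta_K) = Z*theta_K by OLS, as in the text.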
The following theorem can be used to prove the consistency of the Fourier form.

Theorem 28 [Gallant and Nychka, 1987] Suppose that $\hat{h}_n$ is obtained by maximizing a sample objective function $s_n(h)$ over $\mathcal{H}_{K_n}$, where $\mathcal{H}_K$ is a subset of some function space $\mathcal{H}$ on which is defined a norm $\| h \|$. Consider the following conditions:

(a) Compactness: The closure of $\mathcal{H}$ with respect to $\| h \|$ is compact in the relative topology defined by $\| h \|$.

(b) Denseness: $\cup_K \mathcal{H}_K$, $K = 1, 2, 3, \ldots$ is a dense subset of the closure of $\mathcal{H}$ with respect to $\| h \|$, and $\mathcal{H}_K \subset \mathcal{H}_{K+1}$.

(c) Uniform convergence: There is a point $h^*$ in $\mathcal{H}$ and there is a function $s_\infty(h, h^*)$ that is continuous in $h$ with respect to $\| h \|$ such that
$$\lim_{n \to \infty} \sup_{\mathcal{H}} \left| s_n(h) - s_\infty(h, h^*) \right| = 0$$
almost surely.

(d) Identification: Any point $h$ in the closure of $\mathcal{H}$ with $s_\infty(h, h^*) \geq s_\infty(h^*, h^*)$ must have $\| h - h^* \| = 0$.

Under these conditions $\lim_{n \to \infty} \| h^* - \hat{h}_n \| = 0$ almost surely, provided that $\lim_{n \to \infty} K_n = \infty$ almost surely.

The modification of the original statement of the theorem that has been made is to set the parameter space of Gallant and Nychka's (1987) original statement to a single point, and to state the theorem in terms of maximization rather than minimization.

This theorem is very similar in form to Theorem 19. The main differences are:

1. A generic norm $\| h \|$ is used in place of the Euclidean norm. This norm may be stronger than the Euclidean norm, so that convergence with respect to $\| h \|$ implies convergence w.r.t the Euclidean norm. Typically we will want to make sure that the norm is strong enough to imply convergence of all functions of interest.

2. The "estimation space" $\mathcal{H}$ is a function space. It plays the role of the parameter space in our discussion of parametric estimators. There is no restriction to a parametric family, only a restriction to a space of functions that satisfy certain conditions. This formulation is much less restrictive than the restriction to a parametric family.

3. There is a denseness assumption that was not present in the other theorem.

We will not prove this theorem (the proof is quite similar to the proof of theorem [19]; see Gallant, 1987) but we will discuss its assumptions, in relation to the Fourier form as the approximating model.
Sobolev norm. Since all of the assumptions involve the norm $\| h \|$, we need to make explicit what norm we wish to use. We need a norm that guarantees that the errors in approximation of the functions we are interested in are accounted for. Since we are interested in first-order elasticities in the present case, we need close approximation of both the function $f(x)$ and its first derivative $f'(x)$, throughout the range of $x$. Let $\mathcal{X}$ be an open set that contains all values of $x$ that we're interested in. The Sobolev norm is appropriate in this case. It is defined, making use of our notation for partial derivatives, as
$$\| h \|_{m, \mathcal{X}} = \max_{|\lambda^*| \leq m} \sup_{\mathcal{X}} \left| D^\lambda h(x) \right|.$$
To see whether or not the function $f(x)$ is well approximated by an approximating model $g_K(x \mid \theta_K)$, we would evaluate
$$\| f(x) - g_K(x \mid \theta_K) \|_{m, \mathcal{X}}.$$
We see that this norm takes into account errors in approximating the function and partial derivatives up to order $m$. Since we are interested in first-order elasticities throughout $\mathcal{X}$, the relevant choice would be $m = 1$. Furthermore, since we examine the sup over $\mathcal{X}$, convergence with respect to this norm is uniform convergence over $x$.

Compactness. Verifying compactness with respect to this norm is quite technical and unenlightening. It is proven by Elbadawi, Gallant and Souza, Econometrica, 1983. The basic requirement is that if we need consistency w.r.t. $\| h \|_{m, \mathcal{X}}$, then the functions of interest must belong to a Sobolev space which takes into account derivatives of order $m + 1$. A Sobolev space is the set of functions
$$\mathcal{W}_{m, \mathcal{X}}(D) = \left\{ h(x) : \| h(x) \|_{m, \mathcal{X}} < D \right\},$$
where $D$ is a finite constant. In plain words, the functions must have bounded partial derivatives of one order higher than the derivatives we seek to estimate.

The estimation space and the estimation subspace. Since in our case we're interested in consistent estimation of first-order elasticities, we'll define the estimation space as follows:

Definition 29 [Estimation space] The estimation space $\mathcal{H} = \mathcal{W}_{2, \mathcal{X}}(D)$. The estimation space is an open set, and we presume that $h^* \in \mathcal{H}$.

So we are assuming that the function to be estimated has bounded second derivatives throughout $\mathcal{X}$.

With seminonparametric estimators, we don't actually optimize over the estimation space. Rather, we optimize over a subspace, $\mathcal{H}_{K_n}$, defined as:

Definition 30 [Estimation subspace] The estimation subspace $\mathcal{H}_K$ is defined as
$$\mathcal{H}_K = \left\{ g_K(x \mid \theta_K) : g_K(x \mid \theta_K) \in \mathcal{W}_{2, \mathcal{Z}}(D), \theta_K \in \Re^K \right\},$$
where $g_K(x \mid \theta_K)$ is the Fourier form approximation as defined in Equation 18.2.

Denseness. The important point here is that $\mathcal{H}_K$ is a space of functions that is indexed by a finite dimensional parameter ($\theta_K$ has $K$ elements, as in equation 18.3). With $n$ observations, $n > K$, this parameter is estimable. Note that the true function $h^*$ is not necessarily an element of $\mathcal{H}_K$, so optimization over $\mathcal{H}_K$ may not lead to a consistent estimator. In order for optimization over $\mathcal{H}_K$ to be equivalent to optimization over $\mathcal{H}$, at least asymptotically, we need that:

1. The dimension of the parameter vector, $\dim \theta_{K_n} \to \infty$ as $n \to \infty$. This is achieved by making $A$ and $J$ in equation 18.2 increasing functions of $n$, the sample size. It is clear that $K$ will have to grow more slowly than $n$.

2. We need that the $\mathcal{H}_K$ be dense subsets of $\mathcal{H}$.

The estimation subspace $\mathcal{H}_K$, defined above, is a subset of the closure of the estimation space, $\mathcal{H}$. A set of subsets $\mathcal{A}_a$ of a set $\mathcal{A}$ is "dense" if the closure of the countable union of the subsets is equal to the closure of $\mathcal{A}$:
$$\overline{\cup_{a=1}^{\infty} \mathcal{A}_a} = \overline{\mathcal{A}}.$$
Use a picture here. The rest of the discussion of denseness is provided just for completeness: there's no need to study it in detail. To show that $\mathcal{H}_K$ is a dense subset of $\mathcal{H}$ with respect to $\| h \|_{1, \mathcal{X}}$, it is useful to apply Theorem 1 of Gallant (1982), who in turn cites Edmunds and Moscatelli (1977). We reproduce the theorem as presented by Gallant, with minor notational changes, for convenience of reference:

Theorem 31 [Edmunds and Moscatelli, 1977] Let the real-valued function $h^*(x)$ be continuously differentiable up to order $m$ on an open set containing the closure of $\mathcal{X}$. Then it is possible to choose a triangular array of coefficients $\theta_1, \theta_2, \ldots, \theta_K, \ldots$, such that for every $q$ with $0 \leq q < m$, and every $\varepsilon > 0$, $\| h^*(x) - h_K(x \mid \theta_K) \|_{q, \mathcal{X}} = o(K^{-m+q+\varepsilon})$ as $K \to \infty$.

In the present application, $q = 1$ and $m = 2$. By the definition of the estimation space, the elements of $\mathcal{H}$ satisfy the hypotheses, so the theorem is applicable. Closely following Gallant and Nychka (1987), let $\cup_\infty \mathcal{H}_K$ be the countable union of the $\mathcal{H}_K$. The implication of Theorem 31 is that there is a sequence of $\{h_K\}$ from $\cup_\infty \mathcal{H}_K$ such that
$$\lim_{K \to \infty} \| h^* - h_K \|_{1, \mathcal{X}} = 0,$$
for all $h^* \in \mathcal{H}$. Therefore,
$$\mathcal{H} \subset \overline{\cup_\infty \mathcal{H}_K}.$$
However,
$$\cup_\infty \mathcal{H}_K \subset \mathcal{H},$$
so
$$\overline{\cup_\infty \mathcal{H}_K} \subset \overline{\mathcal{H}}.$$
Therefore
$$\overline{\mathcal{H}} = \overline{\cup_\infty \mathcal{H}_K},$$
so $\cup_\infty \mathcal{H}_K$ is a dense subset of $\mathcal{H}$, with respect to the norm $\| h \|_{1, \mathcal{X}}$.
Uniform convergence. We estimate by OLS. The sample objective function stated in terms of maximization is
$$s_n(\theta_K) = -\frac{1}{n} \sum_{t=1}^{n} \left( y_t - g_K(x_t \mid \theta_K) \right)^2.$$
With random sampling, as in the case of Equations 14.1 and 17.3, the limiting objective function is
$$s_\infty(g, f) = -\int_{\mathcal{X}} \left( f(x) - g(x) \right)^2 d\mu(x) - \sigma_\varepsilon^2, \tag{18.7}$$
where the true function $f(x)$ takes the place of the generic function $h^*$ in the presentation of the theorem. Both $g(x)$ and $f(x)$ are elements of $\overline{\cup_\infty \mathcal{H}_K}$.

The pointwise convergence of the objective function needs to be strengthened to uniform convergence. We will simply assume that this holds, since the way to verify this depends upon the specific application. We also have continuity of the objective function in $g$, with respect to the norm $\| h \|_{1, \mathcal{X}}$, since
$$\lim_{\| g^1 - g^0 \|_{1, \mathcal{X}} \to 0} \left[ s_\infty\left( g^1, f \right) - s_\infty\left( g^0, f \right) \right] = \lim_{\| g^1 - g^0 \|_{1, \mathcal{X}} \to 0} \int_{\mathcal{X}} \left[ \left( g^1(x) - f(x) \right)^2 - \left( g^0(x) - f(x) \right)^2 \right] d\mu(x) = 0.$$
By the dominated convergence theorem (which applies since the finite bound $D$ used to define $\mathcal{W}_{2, \mathcal{Z}}(D)$ is dominated by an integrable function), the limit and the integral can be interchanged, so the limit is zero.

Identification. The identification condition requires that for any point $(g, f)$ in $\mathcal{H} \times \mathcal{H}$, $s_\infty(g, f) \geq s_\infty(f, f) \Rightarrow \| g - f \|_{1, \mathcal{X}} = 0$. This condition is clearly satisfied given that $g$ and $f$ are once continuously differentiable (by the assumption that defines the estimation space).

Review of concepts. For the example of estimation of first-order elasticities, the relevant concepts are:

- Estimation space $\mathcal{H} = \mathcal{W}_{2, \mathcal{X}}(D)$: the function space in the closure of which the true function must lie.
- Consistency norm $\| h \|_{1, \mathcal{X}}$. The closure of $\mathcal{H}$ is compact with respect to this norm.
- Estimation subspace $\mathcal{H}_K$: the subset of $\mathcal{H}$ that is representable by a Fourier form with parameter $\theta_K$. These are dense subsets of $\mathcal{H}$.
- Sample objective function $s_n(\theta_K)$, the negative of the sum of squares. By standard arguments this converges uniformly to the
- Limiting objective function $s_\infty(g, f)$, which is continuous in $g$ and has a global maximum in its first argument, over the closure of the infinite union of the estimation subspaces, at $g = f$.

As a result of these five points, $\hat{g}$ converges to $f$, and first-order elasticities
$$\frac{x_i}{f(x)} \frac{\partial f(x)}{\partial x_i}$$
are consistently estimated for all $x \in \mathcal{X}$.

Discussion
Consistency requires that the number of parameters used in the expansion increase with the sample size, tending to infinity. If parameters are added at a high rate, the bias tends relatively rapidly to zero. A basic problem is that a high rate of inclusion of additional parameters causes the variance to tend more slowly to zero. The issue of how to chose the rate at which parameters are added and which to add first is fairly complex. A problem is that the allowable rates for asymptotic normality to obtain (Andrews 1991; Gallant and Souza, 1991) are very strict. Supposing we stick to these rates, our approximating model is
$$g_K(x \mid \theta_K) = z'\theta_K.$$
Define $Z_K$ as the $n \times K$ matrix of regressors obtained by stacking observations. The LS estimator is
$$\hat\theta_K = \left( Z_K' Z_K \right)^{+} Z_K' y,$$
where $(\cdot)^{+}$ is the Moore-Penrose generalized inverse. This is used since $Z_K' Z_K$ may be singular for $K(n)$ large. The prediction, $z'\hat\theta_K$, of the unknown function $f(x)$ is asymptotically normally distributed:
$$\sqrt{n} \left( z'\hat\theta_K - f(x) \right) \xrightarrow{d} N(0, AV),$$
where
$$AV = \lim_{n \to \infty} E\left[ z' \left( \frac{Z_K' Z_K}{n} \right)^{+} z \, \hat\sigma^2 \right].$$
Formally, this is exactly the same as if we were dealing with a parametric linear model. I emphasize, though, that this is only valid if $K$ grows very slowly as $n$ grows. If we can't stick to acceptable rates, we should probably use some other method of approximating the small sample distribution. Bootstrapping is a possibility. We'll discuss this in the section on simulation.
18.3.2 Kernel regression estimators

Readings: Bierens, 1987, "Kernel estimators of regression functions," in Advances in Econometrics, Fifth World Congress, V. 1, Truman Bewley, ed., Cambridge.

An alternative method to the semi-nonparametric approach is a fully nonparametric method of estimation. Kernel regression estimation is an example (others are splines and nearest neighbor methods, for instance). We'll consider the Nadaraya-Watson kernel regression estimator in a simple case. Suppose we have an iid sample from the joint density $f(x, y)$, where $x$ is $k$-dimensional. The model is
$$y_t = g(x_t) + \varepsilon_t,$$
where
$$E(\varepsilon_t \mid x_t) = 0.$$
The conditional expectation of $y$ given $x$ is $g(x)$. By definition of the conditional expectation, we have
$$g(x) = \int y \frac{f(x, y)}{h(x)} dy = \frac{1}{h(x)} \int y f(x, y) dy,$$
where $h(x)$ is the marginal density of $x$:
$$h(x) = \int f(x, y) dy.$$
This suggests that we could estimate $g(x)$ by estimating $h(x)$ and $\int y f(x, y) dy$.

Estimation of the denominator. A kernel estimator of $h(x)$ has the form
$$\hat{h}(x) = \frac{1}{n} \sum_{t=1}^{n} \frac{K\left[ (x - x_t)/\gamma_n \right]}{\gamma_n^k},$$
where $n$ is the sample size and $k$ is the dimension of $x$. The function $K(\cdot)$ (the kernel) is absolutely integrable,
$$\int |K(x)| dx < \infty,$$
and $K(\cdot)$ integrates to 1:
$$\int K(x) dx = 1.$$
In this respect, $K(\cdot)$ is like a density function, but we do not necessarily restrict $K(\cdot)$ to be nonnegative. The window width parameter, $\gamma_n$, is a sequence of positive numbers that satisfies
$$\lim_{n \to \infty} \gamma_n = 0$$
$$\lim_{n \to \infty} n\gamma_n^k = \infty.$$
So, the window width must tend to zero, but not too quickly.
To show pointwise consistency of $\hat{h}(x)$ for $h(x)$, first consider the expectation of the estimator (since the estimator is an average of iid terms we only need to consider the expectation of a representative term):
$$E\left[ \hat{h}(x) \right] = \int \gamma_n^{-k} K\left[ (x - z)/\gamma_n \right] h(z) dz.$$
Change variables as $z^* = (x - z)/\gamma_n$, so $z = x - \gamma_n z^*$ and $\left| \frac{dz}{dz^{*\prime}} \right| = \gamma_n^k$; we obtain
$$E\left[ \hat{h}(x) \right] = \int \gamma_n^{-k} K(z^*) h(x - \gamma_n z^*) \gamma_n^k dz^* = \int K(z^*) h(x - \gamma_n z^*) dz^*.$$
Now, asymptotically,
$$\lim_{n \to \infty} E\left[ \hat{h}(x) \right] = \lim_{n \to \infty} \int K(z^*) h(x - \gamma_n z^*) dz^* = \int \lim_{n \to \infty} K(z^*) h(x - \gamma_n z^*) dz^* = \int K(z^*) h(x) dz^* = h(x) \int K(z^*) dz^* = h(x),$$
since $\gamma_n \to 0$ and $\int K(z^*) dz^* = 1$. The passage of the limit through the integral is a result of the dominated convergence theorem. For this to hold we need that $h(\cdot)$ be dominated by an absolutely integrable function. So the estimator is asymptotically unbiased.

Next, consider the variance of $\hat{h}(x)$. Again using the fact that the estimator is an average of iid terms,
$$n\gamma_n^k V\left[ \hat{h}(x) \right] = n\gamma_n^k \frac{1}{n^2} \sum_{t=1}^{n} V\left\{ \frac{K\left[ (x - x_t)/\gamma_n \right]}{\gamma_n^k} \right\} = \gamma_n^{-k} \frac{1}{n} \sum_{t=1}^{n} V\left\{ K\left[ (x - x_t)/\gamma_n \right] \right\},$$
and by the identical distribution of the terms,
$$n\gamma_n^k V\left[ \hat{h}(x) \right] = \gamma_n^{-k} V\left\{ K\left[ (x - z)/\gamma_n \right] \right\}.$$
Also, since $V(x) = E(x^2) - E(x)^2$, we have
$$n\gamma_n^k V\left[ \hat{h}(x) \right] = \gamma_n^{-k} E\left( K\left[ (x - z)/\gamma_n \right]^2 \right) - \gamma_n^{-k} \left\{ E\left( K\left[ (x - z)/\gamma_n \right] \right) \right\}^2$$
$$= \int \gamma_n^{-k} K\left[ (x - z)/\gamma_n \right]^2 h(z) dz - \gamma_n^k \left[ \int \gamma_n^{-k} K\left[ (x - z)/\gamma_n \right] h(z) dz \right]^2$$
$$= \int \gamma_n^{-k} K\left[ (x - z)/\gamma_n \right]^2 h(z) dz - \gamma_n^k E\left[ \hat{h}(x) \right]^2.$$
The second term
$$\gamma_n^k E\left[ \hat{h}(x) \right]^2 \to 0,$$
by the previous result regarding the expectation and the fact that $\gamma_n \to 0$. Therefore,
$$\lim_{n \to \infty} n\gamma_n^k V\left[ \hat{h}(x) \right] = \lim_{n \to \infty} \int \gamma_n^{-k} K\left[ (x - z)/\gamma_n \right]^2 h(z) dz.$$
Using exactly the same change of variables as before, this can be shown to be
$$\lim_{n \to \infty} n\gamma_n^k V\left[ \hat{h}(x) \right] = h(x) \int \left[ K(z^*) \right]^2 dz^*.$$
Since both $\int \left[ K(z^*) \right]^2 dz^*$ and $h(x)$ are bounded, this is bounded, and since $n\gamma_n^k \to \infty$ by assumption, we have that
$$V\left[ \hat{h}(x) \right] \to 0.$$
Since the bias and the variance both go to zero, we have pointwise consistency (convergence in quadratic mean implies convergence in probability).
Estimation of the numerator. To estimate $\int y f(x, y) dy$, we need an estimator of $f(x, y)$. The estimator has the same form as the estimator for $h(x)$, only with one dimension more:
$$\hat{f}(x, y) = \frac{1}{n} \sum_{t=1}^{n} \frac{K_*\left[ (y - y_t)/\gamma_n, (x - x_t)/\gamma_n \right]}{\gamma_n^{k+1}}.$$
The kernel $K_*(\cdot)$ is required to have mean zero,
$$\int y K_*(y, x) dy = 0,$$
and to marginalize to the previous kernel for $h(x)$:
$$\int K_*(y, x) dy = K(x).$$
With this kernel, we have
$$\int y \hat{f}(y, x) dy = \frac{1}{n} \sum_{t=1}^{n} y_t \frac{K\left[ (x - x_t)/\gamma_n \right]}{\gamma_n^k},$$
by marginalization of the kernel, so the kernel regression estimator is
$$\hat{g}(x) = \frac{1}{\hat{h}(x)} \int y \hat{f}(y, x) dy = \frac{\frac{1}{n} \sum_{t=1}^{n} y_t \frac{K\left[ (x - x_t)/\gamma_n \right]}{\gamma_n^k}}{\frac{1}{n} \sum_{t=1}^{n} \frac{K\left[ (x - x_t)/\gamma_n \right]}{\gamma_n^k}} = \frac{\sum_{t=1}^{n} y_t K\left[ (x - x_t)/\gamma_n \right]}{\sum_{t=1}^{n} K\left[ (x - x_t)/\gamma_n \right]}.$$
This is the Nadaraya-Watson kernel regression estimator.

Discussion:

- The kernel regression estimator for $g(x_t)$ is a weighted average of the $y_j$, $j = 1, 2, \ldots, n$, where higher weights are associated with points that are closer to $x_t$. The weights sum to 1.
- The window width parameter $\gamma_n$ imposes smoothness. The estimator is increasingly flat as $\gamma_n \to \infty$, since in this case each weight tends to $1/n$.
- A large window width reduces the variance (strong imposition of flatness), but increases the bias.
- A small window width reduces the bias, but makes very little use of information except points that are in a small neighborhood of $x_t$. Since relatively little information is used, the variance is large when the window width is small.
- The standard normal density is a popular choice for $K(.)$ and $K_*(y, x)$, though there are possibly better alternatives.
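Here is a minimal Octave sketch of the Nadaraya-Watson estimator with a product standard normal kernel. The function name and argument conventions are my own illustrative choices.

% Nadaraya-Watson kernel regression at evaluation points xeval (sketch).
function ghat = nw_regression(y, x, xeval, gamma)
  n = rows(x); m = rows(xeval);
  ghat = zeros(m,1);
  for i = 1:m
    u = (repmat(xeval(i,:), n, 1) - x) / gamma;   % (x - x_t)/gamma
    w = prod(exp(-0.5*u.^2)/sqrt(2*pi), 2);       % product normal kernel weights
    ghat(i) = sum(w .* y) / sum(w);               % weighted average of the y_t
  endfor
endfunction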
Choice of the window width: Cross-validation. The selection of the window width is usually done by cross-validation. This consists of splitting the sample into two parts (e.g., 50%-50%). The first part is the "in sample" data, which is used for estimation, and the second part is the "out of sample" data, used for evaluation of the fit though RMSE or some other criterion. The steps are (a code sketch follows the list):

1. Split the data. The out of sample data is $y^{out}$ and $x^{out}$.
2. Choose a window width $\gamma$.
3. With the in sample data, fit $\hat{y}_t^{out}$ corresponding to each $x_t^{out}$. This fitted value is a function of the in sample data, as well as the evaluation point $x_t^{out}$, but it does not involve $y_t^{out}$.
4. Repeat for all out of sample points.
5. Calculate RMSE($\gamma$).
6. Go to step 2, or to the next step if enough window widths have been tried.
7. Select the $\gamma$ that minimizes RMSE($\gamma$) (verify that a minimum has been found, for example by plotting RMSE as a function of $\gamma$).
8. Re-estimate using the best $\gamma$ and all of the data.
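The following Octave sketch implements the steps above, assuming the nw_regression sketch given earlier. The grid of candidate widths and the 50/50 split are my own illustrative choices.

% Cross-validated choice of window width (sketch).
function gamma_best = cv_window(y, x)
  n = rows(x); half = floor(n/2);
  yin = y(1:half);     xin = x(1:half,:);      % in sample data
  yout = y(half+1:n);  xout = x(half+1:n,:);   % out of sample data
  gammas = linspace(0.05, 2, 40);              % grid of candidate widths
  rmse = zeros(size(gammas));
  for i = 1:numel(gammas)
    yhat = nw_regression(yin, xin, xout, gammas(i));
    rmse(i) = sqrt(mean((yout - yhat).^2));
  endfor
  [dummy, i] = min(rmse);                      % check the minimum is interior
  gamma_best = gammas(i);
endfunction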
18.4 Density function estimation

18.4.1 Kernel density estimation

The previous discussion suggests that a kernel density estimator may easily be constructed. We have already seen how joint densities may be estimated. If the density of interest is that of $y$ conditional on $x$, it is simply the ratio of the joint and marginal estimators:
$$\hat{f}_{y \mid x} = \frac{\hat{f}(x, y)}{\hat{h}(x)} = \frac{\frac{1}{n} \sum_{t=1}^{n} \frac{K_*\left[ (y - y_t)/\gamma_n, (x - x_t)/\gamma_n \right]}{\gamma_n^{k+1}}}{\frac{1}{n} \sum_{t=1}^{n} \frac{K\left[ (x - x_t)/\gamma_n \right]}{\gamma_n^k}},$$
where we obtain the expressions for the joint and marginal densities from the section on kernel regression.
18.4.2 Semi-nonparametric maximum likelihood

Readings: Gallant and Nychka, Econometrica, 1987. For a Fortran program to do this and a useful discussion in the user's guide, see this link. See also Cameron and Johansson, Journal of Applied Econometrics, 1997.

MLE is the estimation method of choice when we are confident about specifying the density. Is is possible to obtain the benefits of MLE when we're not so confident about the specification? In part, yes. Suppose we're interested in the density of $y$ conditional on $x$. A reasonable starting approximation $f(y \mid x, \phi)$ can be reshaped by multiplying it by a squared polynomial:
$$g_p(y \mid x, \phi, \gamma) = \frac{h_p^2(y \mid \gamma) f(y \mid x, \phi)}{\eta_p(x, \phi, \gamma)},$$
where
$$h_p(y \mid \gamma) = \sum_{k=0}^{p} \gamma_k y^k$$
and $\eta_p(x, \phi, \gamma)$ is a normalizing factor to make the density integrate (sum) to one. Because $h_p^2(y \mid \gamma)/\eta_p(x, \phi, \gamma)$ is a homogenous function of $\gamma$ it is necessary to impose a normalization: $\gamma_0$ is set to 1. For a discrete (count) baseline density, the normalization factor $\eta_p(\phi, \gamma)$ is calculated using the raw moments of the reshaped density:
$$E(Y^r) = \sum_{y=0}^{\infty} y^r f_Y(y \mid \phi, \gamma) = \sum_{y=0}^{\infty} y^r \frac{\left[ h_p(y \mid \gamma) \right]^2}{\eta_p(\phi, \gamma)} f_Y(y \mid \phi)$$
$$= \sum_{y=0}^{\infty} \sum_{k=0}^{p} \sum_{l=0}^{p} y^r f_Y(y \mid \phi) \gamma_k \gamma_l y^k y^l / \eta_p(\phi, \gamma)$$
$$= \sum_{k=0}^{p} \sum_{l=0}^{p} \gamma_k \gamma_l \left\{ \sum_{y=0}^{\infty} y^{r+k+l} f_Y(y \mid \phi) \right\} / \eta_p(\phi, \gamma)$$
$$= \sum_{k=0}^{p} \sum_{l=0}^{p} \gamma_k \gamma_l m_{k+l+r} / \eta_p(\phi, \gamma).$$
By setting $r = 0$ and noting that $E(Y^0) = 1$, we get
$$\eta_p(\phi, \gamma) = \sum_{k=0}^{p} \sum_{l=0}^{p} \gamma_k \gamma_l m_{k+l}. \tag{18.8}$$
Recall that $\gamma_0$ is set to 1 to achieve identification. The $m_r$ in equation 18.8 are the raw moments of the baseline density. Gallant and Nychka (1987) give conditions under which such a density may be treated as correctly specified, asymptotically. Basically, the order of the polynomial must increase as the sample size increases. However, there are technicalities.

Similarly to Cameron and Johansson (1997), we may develop a negative binomial polynomial (NBP) density for count data. The negative binomial baseline density may be written (see equation 16.1) as
$$f_Y(y \mid \phi) = \frac{\Gamma(y + \psi)}{\Gamma(y + 1)\Gamma(\psi)} \left( \frac{\psi}{\psi + \lambda} \right)^\psi \left( \frac{\lambda}{\psi + \lambda} \right)^y,$$
where $\phi = \{\lambda, \psi\}$, $\lambda > 0$ and $\psi > 0$. The usual means of incorporating conditioning variables $x$ is the parameterization $\lambda = e^{x'\beta}$. When $\psi = \lambda/\alpha$ we have the negative binomial-I model (NB-I), for which $V(Y) = \lambda + \alpha\lambda$. When $\psi = 1/\alpha$ we have the negative binomial-II (NB-II) model, for which $V(Y) = \lambda + \alpha\lambda^2$. For both forms, $E(Y) = \lambda$.

The reshaped density, with normalization to sum to one, is
$$f_Y(y \mid \phi, \gamma) = \frac{\left[ h_p(y \mid \gamma) \right]^2}{\eta_p(\phi, \gamma)} \frac{\Gamma(y + \psi)}{\Gamma(y + 1)\Gamma(\psi)} \left( \frac{\psi}{\psi + \lambda} \right)^\psi \left( \frac{\lambda}{\psi + \lambda} \right)^y. \tag{18.9}$$
To get the normalization factor, we need the moment generating function:
$$M_Y(t) = \psi^\psi \left( \lambda - e^t \lambda + \psi \right)^{-\psi}. \tag{18.10}$$
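Equation 18.8 is easy to compute once the raw moments of the baseline density are in hand. Here is a short Octave sketch; the function name and argument layout are my own.

% Normalizing factor eta_p(phi,gamma) from eq. 18.8.
% gam: (p+1) x 1 with gam(1) = gamma_0 = 1 imposed;
% m:   (2p+1) x 1 raw moments of the baseline density, m(1) = m_0 = 1.
function eta = snp_eta(gam, m)
  p = numel(gam) - 1;
  eta = 0;
  for k = 0:p
    for l = 0:p
      eta = eta + gam(k+1)*gam(l+1)*m(k+l+1);  % gamma_k gamma_l m_{k+l}
    endfor
  endfor
endfunction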
To illustrate, Figure 18.5 shows calculation of the first four raw moments of the NB density, calculated using MuPAD, which is a Computer Algebra System that (used to be?) free for personal use. These are the moments you would need to use a second order polynomial ($p = 2$). MuPAD will output these results in the form of C code, which is relatively easy to edit to write the likelihood function for the model. This has been done in NegBinSNP.cc, which is a C++ version of this model that can be compiled to use with Octave using the mkoctfile command. Note the impressive length of the expressions when the degree of the expansion is 4 or 5! This is an example of a model that would be difficult to formulate without the help of a program like MuPAD.

It is possible to allow the reshaping to be more local, for example by allowing the $\gamma_k$ parameters to depend upon the conditioning variables.

Gallant and Nychka, Econometrica, 1987, prove that this sort of density can approximate a wide variety of densities arbitrarily well as the degree of the polynomial increases with the sample size. This approach is not without its drawbacks: the sample objective function can have an extremely large number of local maxima, which can lead to numeric difficulties. If someone could figure out how to do in a way such that the sample objective function was nice and smooth, they would probably get the paper published in a good journal. Any ideas?

Here's a plot of true and the limiting SNP approximations (with the order of the polynomial fixed) to four different count data densities, which variously exhibit over and underdispersion.

[Figure: True and limiting SNP approximations to four count data densities (Case 1 through Case 4).]
[Figure: Kernel regression fit of OBDV usage versus Age (ages 20 to 65).]

18.5 Examples

18.5.1 MEPS health care usage data

We'll use the MEPS OBDV data to illustrate kernel regression and semi-nonparametric maximum likelihood. The figure above shows a kernel regression fit of OBDV usage as a function of age. We also estimate an SNP density for the OBDV measure, reshaping a negative binomial baseline with a second order polynomial (the coefficients gam1 and gam2 in the output below). The estimation results are:
======================================================
BFGSMIN final results

Used numeric gradient

------------------------------------------------------
STRONG CONVERGENCE
Function conv 1  Param conv 1  Gradient conv 1
------------------------------------------------------
Objective function value 2.17061
Stepsize 0.0065
24 iterations
------------------------------------------------------

  param     gradient    change
  1.3826     0.0000    -0.0000
  0.2317    -0.0000     0.0000
  0.1839     0.0000     0.0000
  0.2214     0.0000    -0.0000
  0.1898     0.0000    -0.0000
  0.0722     0.0000    -0.0000
 -0.0002     0.0000    -0.0000
  1.7853    -0.0000    -0.0000
 -0.4358     0.0000    -0.0000
  0.1129     0.0000     0.0000

******************************************************
NegBin SNP model, MEPS full data set

MLE Estimation Results
BFGS convergence: Normal convergence

Average Log-L: -2.170614
Observations: 4564

              estimate   st. err    t-stat   p-value
constant        -0.147     0.126    -1.173     0.241
pub. ins.        0.695     0.050    13.936     0.000
priv. ins.       0.409     0.046     8.833     0.000
sex              0.443     0.034    13.148     0.000
age              0.016     0.001    11.880     0.000
edu              0.025     0.006     3.903     0.000
inc             -0.000     0.000    -0.011     0.991
gam1             1.785     0.141    12.629     0.000
gam2            -0.436     0.029   -14.786     0.000
lnalpha          0.113     0.027     4.166     0.000

Information Criteria
CAIC : 19907.6244   Avg. CAIC: 4.3619
BIC  : 19897.6244   Avg. BIC:  4.3597
AIC  : 19833.3649   Avg. AIC:  4.3456
******************************************************
Note that the CAIC and BIC are lower for this model than for the models presented in Table 16.3. This model fits well, still being parsimonious. You can play around trying other use measures, using a NP-II baseline density, and using other orders of expansions. Density functions formed in this way may have MANY local maxima, so you need to be careful before accepting the results of a casual run. To guard against having converged to a local maximum, one can try using multiple starting values, or one could try simulated annealing as an optimization method, though it involves a lot of computation. The chapter on parallel computations might be interesting to read before trying this.

18.5.2 Financial data and volatility

The data are the growth rates (100 times the difference of the logs) of the daily spot $/euro and $/yen exchange rates at New York, noon, from January 04, 1999 to February 12, 2008. There are 2291 observations. See the README file for details. The figures below show the data and their histograms.
[Figure 18.9: Kernel regression fitted conditional second moments, (a) Yen/Dollar and (b) Euro/Dollar.]

Two features of the data are noteworthy:

- At the center of the histograms, the bars extend above the normal density that best fits the data, and the tails are fatter than those of the best fit normal density. This feature of the data is known as leptokurtosis.
- In the series plots, we can see that the variance of the growth rates is not constant over time. Volatility clusters are apparent, alternating between periods of stability and periods of more wild swings. This is known as conditional heteroscedasticity. ARCH and GARCH are well-known models that are often applied to this sort of data.

Many structural economic models often cannot generate data that exhibits conditional heteroscedasticity without directly assuming shocks that are conditionally heteroscedastic. It would be nice to have an economic explanation for how conditional heteroscedasticity, leptokurtosis, and other (leverage, etc.) features of financial data result from the behavior of economic agents, rather than from a black box that provides shocks.

The example script fits $E(y_t^2 \mid y_{t-1}^2, y_{t-2}^2)$ by kernel regression, and generates the plots in Figure 18.9.

- From the point of view of learning the practical aspects of kernel regression, note how the data is compactified in the example script.
- In the Figure, note how current volatility depends on lags of the squared return rate: it is high when both of the lags are high, but drops off quickly when either of the lags is low.
- The fact that the plots are not flat suggests that this conditional moment contains information about the process that generates the data. Perhaps attempting to match this moment might be a means of estimating the parameters of the dgp. We'll come back to this later.
18.6 Exercises

1. In Octave, type edit kernel_example.
   (a) Look this script over, and describe in words what it does.
   (b) Run the script and interpret the output.
   (c) Experiment with different bandwidths, and comment on the effects of choosing small and large values.

2. In Octave, type help kernel_regression.

3. Using the Octave script OBDVkernel.m as a model, plot kernel regression fits for OBDV visits as a function of income and education.
Chapter 19

Simulation-based estimation

Readings: Gourieroux and Monfort (1996) Simulation-Based Econometric Methods (Oxford University Press). There are many articles. Some of the seminal papers are Gallant and Tauchen (1996), "Which Moments to Match?", ECONOMETRIC THEORY, Vol. 12, 1996, pages 657-681; Gourieroux, Monfort and Renault (1993), "Indirect Inference," J. Apl. Econometrics; Pakes and Pollard (1989) Econometrica; McFadden (1989) Econometrica.

19.1 Motivation

Simulation methods are of interest when the DGP is fully characterized by a parameter vector, so that simulated data can be generated, but the likelihood function and moments of the observable varables are not calculable, so that MLE or GMM estimation is not possible. Many moderately complex models result in intractible likelihoods or moments, as we will see. Simulation-based estimation methods open up the possibility to estimate truly complex models. The desirability of introducing a great deal of complexity may be an issue (remember that a model is an abstraction from reality, and abstraction helps us to isolate the important features of a phenomenon), but it least it becomes a possibility.

19.1.1 Example: Multinomial and/or dynamic discrete response models

Let $y_i^*$ be a latent random vector of dimension $m$. Suppose that
$$y_i^* = X_i\beta + \varepsilon_i, \tag{19.1}$$
where $X_i$ is $m \times K$. Suppose that
$$\varepsilon_i \sim N(0, \Omega).$$
Henceforth drop the $i$ subscript when it is not needed for clarity. Suppose that $y^*$ is not observed; rather, we observe a many-to-one mapping
$$y = \tau(y^*).$$
Define
$$A_i = A(y_i) = \left\{ y^* \mid y_i = \tau(y^*) \right\}.$$
Suppose random sampling of $(y_i, X_i)$. In this case the elements of $y_i$ may not be independent of one another (and clearly are not if $\Omega$ is not diagonal). However, $y_i$ is independent of $y_j$, $i \neq j$.

Let $\theta = (\beta', (\mathrm{vec}^*\Omega)')'$ be the vector of parameters of the model. The contribution of the $i^{th}$ observation to the likelihood function is
$$p_i(\theta) = \int_{A_i} n(y_i^* - X_i\beta, \Omega) \, dy_i^*,$$
where
$$n(\varepsilon, \Omega) = (2\pi)^{-m/2} \left| \Omega \right|^{-1/2} \exp\left( -\frac{\varepsilon'\Omega^{-1}\varepsilon}{2} \right)$$
is the multivariate normal density of an $m$-dimensional random vector. The log-likelihood function is
$$\ln L(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ln p_i(\theta),$$
and the MLE $\hat\theta$ solves the score equations
$$\frac{1}{n} \sum_{i=1}^{n} g_i(\hat\theta) = \frac{1}{n} \sum_{i=1}^{n} \frac{D_\theta p_i(\hat\theta)}{p_i(\hat\theta)} \equiv 0.$$
The problem is that evaluation of $L_i(\theta)$ and its derivative w.r.t. $\theta$ by standard methods of numeric integration such as quadrature is computationally infeasible when $m$ (the dimension of $y$) is high, as long as there are no restrictions on $\Omega$.

The mapping $\tau(y^*)$ has not been made specific so far. This setup is quite general: for different choices of $\tau(y^*)$ it nests the case of dynamic binary discrete choice models as well as the case of multinomial discrete choice (the choice of one out of a finite set of alternatives).

- Multinomial discrete choice is illustrated by a (very simple) job search model. We have cross sectional data on individuals' matching to a set of $m$ jobs that are available. The utility of alternative $j$ is
$$u_j = X_j\beta + \varepsilon_j.$$
Utilities of jobs, stacked in the vector $u_i$, are not observed. Rather, we observe the vector formed of elements
$$y_j = \mathbf{1}\left[ u_j > u_k, \forall k \leq m, k \neq j \right].$$
Only one of these elements is different than zero.

- Dynamic discrete choice is illustrated by repeated choices over time between two alternatives. Let alternative $j$ have utility
$$u_{jt} = W_{jt}\beta - \varepsilon_{jt}, \quad j \in \{1, 2\}, \quad t \in \{1, 2, \ldots, m\}.$$
Then we can define
$$y^* = u_2 - u_1 = (W_2 - W_1)\beta + \varepsilon_2 - \varepsilon_1 \equiv X\beta + \varepsilon.$$
Now the mapping is (element-by-element)
$$y = \mathbf{1}\left[ y^* > 0 \right],$$
that is, $y_{it} = 1$ if individual $i$ chooses the second alternative in period $t$, zero otherwise.

19.1.2 Example: Marginalization of latent variables

Economic data often presents substantial heterogeneity that may be difficult to model. A possibility is to introduce latent random variables. This can cause the problem that there may be no known closed form for the distribution of observable variables after marginalizing out the unobservable latent variables. For example, count data (that takes values $0, 1, 2, 3, \ldots$) is often modeled using the Poisson distribution
$$\Pr(y = i) = \frac{\exp(-\lambda)\lambda^i}{i!}.$$
The mean and variance of the Poisson distribution are both equal to $\lambda$:
$$E(y) = V(y) = \lambda.$$
Often, one parameterizes the conditional mean as
$$\lambda_i = \exp(X_i\beta).$$
This ensures that the mean is positive (as it must be). Estimation by ML is straightforward.

Often, count data exhibits "overdispersion", which simply means that $V(y) > E(y)$. A possibility is to introduce a latent variable that reflects heterogeneity into the specification:
$$\lambda_i = \exp(X_i\beta + \eta_i),$$
where $\eta_i$ has some specified density with support $S$ (this density may depend on additional parameters). Let $d\mu(\eta_i)$ be the density of $\eta_i$. In some cases, the marginal density of $y$,
$$\Pr(y = y_i) = \int_S \frac{\exp\left[ -\exp(X_i\beta + \eta_i) \right] \left[ \exp(X_i\beta + \eta_i) \right]^{y_i}}{y_i!} d\mu(\eta_i),$$
will have a closed-form solution (one can derive the negative binomial distribution in the way if $\eta$ has an exponential distribution - see equation 16.1), but often this will not be possible. In this case, simulation is a means of calculating $\Pr(y = i)$, which is then used to do ML estimation. This would be an example of Simulated Maximum Likelihood (SML) estimation.

- In this case, since there is only one latent variable, quadrature is probably a better choice. However, a more flexible model with heterogeneity would allow all parameters (not just the constant) to vary. For example
$$\Pr(y = y_i) = \int_S \frac{\exp\left[ -\exp(X_i\beta_i) \right] \left[ \exp(X_i\beta_i) \right]^{y_i}}{y_i!} d\mu(\beta_i)$$
entails a $K = \dim \beta_i$-dimensional integral, which will not be evaluable by quadrature when $K$ gets large.
19.1.3 Example: Estimation of models specified in terms of stochastic differential equations

It is often convenient to formulate models in terms of continuous time, using differential equations. A realistic model should account for exogenous shocks to the system, which can be done by assuming a random component. This leads to a model that is expressed as a system of stochastic differential equations. Consider the process
$$dy_t = g(\theta, y_t) dt + h(\theta, y_t) dW_t,$$
which is assumed to be stationary. $\{W_t\}$ is a standard Brownian motion (Wiener process), such that
$$W(T) = \int_0^T dW_t \sim N(0, T).$$
Brownian motion is a continuous-time stochastic process such that

- $W(0) = 0$;
- $\left[ W(s) - W(t) \right] \sim N(0, s - t)$;
- $\left[ W(s) - W(t) \right]$ and $\left[ W(j) - W(k) \right]$ are independent for $s > t > j > k$, that is, non-overlapping segments are independent.

To estimate a model of this sort, we typically have data that are assumed to be observations of $y_t$ in discrete points $y_1, y_2, \ldots, y_T$. To perform inference on $\theta$, direct ML or GMM estimation is not usually feasible, because one cannot, in general, deduce the transition density $f(y_t \mid y_{t-1}, \theta)$. This density is necessary to evaluate the likelihood function or to evaluate moment conditions (which are based upon expectations with respect to this density).

A typical solution is to "discretize" the model, by which we mean to find a discrete time approximation to the model. The discretized version of the model is an approximation, and the $\phi^0$ which defines the best approximation of the discretization to the actual (unknown) discrete time version of the model is not equal to $\theta^0$, the true parameter value. Estimation of the discretized model based upon this equation is in general biased and inconsistent for the original parameter. Nevertheless, the approximation shouldn't be too bad, which will be useful, as we will see.

- The important point about these three examples is that computational difficulties prevent direct application of ML, GMM, etc. Nevertheless the model is fully specified in probabilistic terms up to a parameter vector. This means that the model is simulable, conditional on the parameter vector.

19.2 Simulated maximum likelihood (SML)

For simplicity, consider cross-sectional data. An ML estimator solves
$$\hat\theta_{ML} = \arg\max s_n(\theta) = \frac{1}{n} \sum_{t=1}^{n} \ln p(y_t \mid X_t, \theta),$$
where $p(y_t \mid X_t, \theta)$ is the density function of the $t^{th}$ observation. When $p(y_t \mid X_t, \theta)$ does not have a known closed form, $\hat\theta_{ML}$ is an infeasible estimator. However, it may be possible to define a random function $f$ such that
$$E_\nu f(\nu, y_t, X_t, \theta) = p(y_t \mid X_t, \theta),$$
where the density of $\nu$ is known. If this is the case, the simulator
$$\tilde{p}(y_t, X_t, \theta) = \frac{1}{H} \sum_{s=1}^{H} f(\nu_{ts}, y_t, X_t, \theta)$$
is unbiased for $p(y_t \mid X_t, \theta)$. The SML estimator simply substitutes $\tilde{p}$ in place of $p$ in the log-likelihood function:
$$\hat\theta_{SML} = \arg\max s_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ln \tilde{p}(y_t, X_t, \theta).$$
19.2.1 Example: multinomial probit

Recall that the utility of alternative $j$ is
$$u_j = X_j\beta + \varepsilon_j$$
and the vector $y$ is formed of elements
$$y_j = \mathbf{1}\left[ u_j > u_k, k \leq m, k \neq j \right].$$
The problem is that $\Pr(y_j = 1 \mid \theta)$ can't be calculated when $m$ is large. However, it is easy to simulate this probability:

- Draw $\tilde\varepsilon_i$ from the distribution $N(0, \Omega)$.
- Calculate $\tilde{u}_i = X_i\beta + \tilde\varepsilon_i$ (where $X_i$ is the matrix formed by stacking the $X_{ij}$).
- Define $\tilde{y}_{ij} = \mathbf{1}\left[ \tilde{u}_{ij} > \tilde{u}_{ik}, \forall k \leq m, k \neq j \right]$.
- Repeat this $H$ times and define
$$\widetilde{e}_{ij} = \frac{\sum_{h=1}^{H} \tilde{y}_{ijh}}{H}.$$
- Define $\tilde{e}_i$ as the $m$-vector formed of the $\widetilde{e}_{ij}$. Each element of $\tilde{e}_i$ is between 0 and 1, and the elements sum to one.
- Now $\tilde{p}(y_i, X_i, \theta) = y_i'\tilde{e}_i$, and the SML multinomial probit log-likelihood function is
$$\ln L(\beta, \Omega) = \frac{1}{n} \sum_{i=1}^{n} y_i' \ln \tilde{p}(y_i, X_i, \theta).$$
This is to be maximized w.r.t. $\beta$ and $\Omega$.

Notes:

- The $H$ draws of $\tilde\varepsilon_i$ used to find $\tilde{e}_i$ are drawn only once and are used repeatedly during the iterations used to find $\hat\beta$ and $\hat\Omega$. The draws are different for each $i$. If the draws are re-drawn at every iteration the estimator will not converge.
- The log-likelihood function with this simulator is a discontinuous function of $\beta$ and $\Omega$. This does not cause problems from a theoretical point of view since it can be shown that $\ln L(\beta, \Omega)$ is stochastically equicontinuous. However, it does cause problems if one attempts to use a gradient-based optimization method.
- It may be the case, particularly if few simulations $H$ are used, that some elements of $\tilde{e}_i$ are zero. If the corresponding element of $y_i$ is equal to 1, there will be a $\log(0)$ problem. Two solutions to these difficulties are:

1) use an estimation method that doesn't require a continuous and differentiable objective function, for example, simulated annealing. This is computationally costly.

2) Smooth the simulated probabilities so that they are continuous functions of the parameters. For example, apply a kernel transformation such as replacing the indicator by
$$\tilde{y}_{ij} = \Phi\left( A \left[ u_{ij} - \max_{k \leq m, k \neq j} u_{ik} \right] \right),$$
where $A$ is a large positive number. This is very close to 1 if $u_{ij}$ is the maximum and very close to zero otherwise, and it is a continuous function of the parameters, so that $\tilde{p}_{ij}$ and therefore $\ln L(\beta, \Omega)$ will be continuous and differentiable. Consistency requires that $A(n) \to \infty$ as the sample size increases. There are alternative methods (e.g., Gibbs sampling) that may work better, but this is too technical to discuss here.

- To solve the $\log(0)$ problem, one possibility is to search the web for the slog function. Also, increase $H$ if this is a serious problem.

19.2.2 Properties

The properties of the SML estimator depend on how $H$ is set. The following is taken from Lee (1995), "Asymptotic Bias in Simulated Maximum Likelihood Estimation of Discrete Choice Models," Econometric Theory, pp. 437-83. Lee shows that the SML estimator is asymptotically biased if $H$ doesn't grow faster than $n^{1/2}$. The varcov is the typical inverse of the information matrix, so that as long as $H$ grows fast enough the estimator is consistent and fully asymptotically efficient.
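Here is a minimal Octave sketch of the frequency simulator described above, for one individual. The function name and the convention of passing pre-drawn fixed standard normal draws (to avoid "chattering", as discussed in the Notes) are my own illustrative choices.

% Frequency simulator for multinomial probit choice probabilities (sketch).
% X: m x K attributes; R = chol(Omega), so Omega = R'*R;
% eps_draws: m x H matrix of N(0,1) draws, drawn once and held fixed.
function e = mnp_freq_sim(X, beta, R, eps_draws)
  H = columns(eps_draws);
  u = X*beta + R' * eps_draws;      % m x H simulated utilities, cov(R'z) = Omega
  [umax, j] = max(u);               % index of chosen alternative, each draw
  e = zeros(rows(X),1);
  for h = 1:H
    e(j(h)) += 1;                   % count choices of each alternative
  endfor
  e = e / H;                        % simulated choice probabilities, sum to 1
endfunction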
19.3 Method of simulated moments (MSM)

Suppose we have a DGP $(y \mid x, \theta)$ which is simulable given $\theta$, but is such that the density of $y$ is not calculable. One could, in principle, base a GMM estimator upon the moment conditions
$$m_t(\theta) = \left[ K(y_t, x_t) - k(x_t, \theta) \right] z_t,$$
where
$$k(x_t, \theta) = \int K(y_t, x_t) p(y \mid x_t, \theta) dy,$$
$z_t$ is a vector of instruments drawn from the information set, and $p(y \mid x_t, \theta)$ is the density of $y$ conditional on $x_t$. The problem is that this density is not available. However, $k(x_t, \theta)$ is readily simulated using
$$\widetilde{k}(x_t, \theta) = \frac{1}{H} \sum_{h=1}^{H} K(\widetilde{y}_t^h, x_t).$$
By the law of large numbers, $\widetilde{k}(x_t, \theta) \xrightarrow{a.s.} k(x_t, \theta)$, as $H \to \infty$, which provides a clear intuitive basis for the estimator, though in fact we obtain consistency even for $H$ finite, since a law of large numbers is also operating across the $n$ observations of real data, so errors introduced by simulation cancel themselves out. This allows us to form the moment conditions
$$\widetilde{m}_t(\theta) = \left[ K(y_t, x_t) - \widetilde{k}(x_t, \theta) \right] z_t. \tag{19.2}$$
As usual, we then form the average
$$\widetilde{m}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \widetilde{m}_t(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left[ K(y_t, x_t) - \frac{1}{H} \sum_{h=1}^{H} k(\widetilde{y}_t^h, x_t) \right] z_t, \tag{19.3}$$
with which we form the GMM criterion and estimate as usual. Note that the unbiased simulator $k(\widetilde{y}_t^h, x_t)$ appears linearly within the sums.
19.3.1 Properties

Suppose that the optimal weighting matrix is used. McFadden (ref. above) and Pakes and Pollard (refs. above) show that the asymptotic distribution of the MSM estimator is very similar to that of the infeasible GMM estimator. In particular, for $H$ finite,
$$\sqrt{n}\left( \hat\theta_{MSM} - \theta^0 \right) \xrightarrow{d} N\left[ 0, \left( 1 + \frac{1}{H} \right) \left( D_\infty \Omega^{-1} D_\infty' \right)^{-1} \right], \tag{19.4}$$
where $\left( D_\infty \Omega^{-1} D_\infty' \right)^{-1}$ is the asymptotic variance of the infeasible GMM estimator.

- That is, the asymptotic variance is inflated by a factor $1 + 1/H$. For this reason the MSM estimator is not fully asymptotically efficient relative to the infeasible GMM estimator, for $H$ finite, but the efficiency loss is small and controllable, by setting $H$ reasonably large.
- The estimator is asymptotically unbiased even for $H = 1$. This is an advantage relative to SML.
- If one doesn't use the optimal weighting matrix, the asymptotic varcov is just the ordinary GMM varcov, inflated by $1 + 1/H$.
- The above presentation is in terms of a specific moment condition based upon the conditional mean. Simulated GMM can be applied to moment conditions of any form.
19.3.2 Comments

Why is SML inconsistent if $H$ is finite, while MSM is? The reason is that SML is based upon an average of logarithms of an unbiased simulator. To use the multinomial probit model as an example, the log-likelihood function is
$$\ln L(\beta, \Omega) = \frac{1}{n} \sum_{i=1}^{n} y_i' \ln p_i(\beta, \Omega).$$
The SML version is
$$\ln L(\beta, \Omega) = \frac{1}{n} \sum_{i=1}^{n} y_i' \ln \tilde{p}_i(\beta, \Omega).$$
The problem is that
$$E \ln(\tilde{p}_i(\beta, \Omega)) \neq \ln(E \tilde{p}_i(\beta, \Omega)),$$
in spite of the fact that
$$E \tilde{p}_i(\beta, \Omega) = p_i(\beta, \Omega),$$
due to the fact that $\ln(\cdot)$ is a nonlinear transformation. The only way for the two to be equal (in the limit) is if $H$ tends to infinite so that $\tilde{p}(\cdot)$ tends to $p(\cdot)$.

The reason that MSM does not suffer from this problem is that in this case the unbiased simulator appears linearly within every sum of terms, and it appears within a sum over $n$ (see equation [19.3]). Therefore the SLLN applies to cancel out simulation errors, from which we get consistency. That is, using simple notation for the random sampling case, the moment conditions
$$\widetilde{m}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left[ K(y_t, x_t) - \frac{1}{H} \sum_{h=1}^{H} k(\widetilde{y}_t^h, x_t) \right] z_t \tag{19.5}$$
$$= \frac{1}{n} \sum_{i=1}^{n} \left[ k(x_t, \theta^0) + \varepsilon_t - \frac{1}{H} \sum_{h=1}^{H} \left( k(x_t, \theta) + \tilde\varepsilon_{ht} \right) \right] z_t \tag{19.6}$$
converge almost surely to
$$\widetilde{m}_\infty(\theta) = \int \left( k(x, \theta^0) - k(x, \theta) \right) z(x) d\mu(x)$$
(note: $z_t$ is assumed to be made up of functions of $x_t$). The objective function converges to
$$s_\infty(\theta) = \widetilde{m}_\infty(\theta)' \Omega_\infty^{-1} \widetilde{m}_\infty(\theta),$$
which obviously has a minimum at $\theta^0$, henceforth consistency.

- If you look at equation 19.6 a bit, you will see why the variance inflation factor is $\left( 1 + \frac{1}{H} \right)$.

The drawback of the above approach to MSM is that the moment conditions used in estimation are selected arbitrarily. The asymptotic efficiency of the estimator may be low. A poor choice of moment conditions may lead to very inefficient estimators, and can even cause identification problems (as we've seen with the GMM problem set).
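A generic sketch of the simulated moment conditions of equation 19.3, in Octave, follows. The function names (msm_moments, simdraw) and the argument conventions are my own illustrative choices; simdraw is assumed to return one simulated value of $K(\widetilde{y}_t, x_t)$ for each observation, given fixed random draws $\nu$.

% Simulated moment conditions, eq. 19.3 (sketch).
% K: handle computing K(y,x), n x 1; z: n x q instruments;
% nu: n x H fixed random draws, drawn once.
function mbar = msm_moments(theta, y, x, z, K, simdraw, nu)
  H = columns(nu);
  ksim = zeros(rows(y),1);
  for h = 1:H
    ksim = ksim + simdraw(theta, x, nu(:,h));
  endfor
  ksim = ksim / H;                     % unbiased simulator of k(x,theta)
  mbar = mean((K(y,x) - ksim) .* z);   % 1 x q average moment conditions
endfunction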
19.4 Efficient method of moments (EMM)

The choice of which moments upon which to base a GMM estimator can have very pronounced effects upon the efficiency of the estimator.

- The asymptotically optimal choice of moments would be the score vector of the likelihood function,
$$m_t(\theta) = D_\theta \ln p_t(\theta \mid \mathcal{I}_t).$$
As before, this choice is unavailable.

The efficient method of moments (EMM) (see Gallant and Tauchen (1996), "Which Moments to Match?", ECONOMETRIC THEORY, Vol. 12, 1996, pages 657-681) seeks to provide moment conditions that closely mimic the score vector. If the approximation is very good, the resulting estimator will be very nearly fully efficient.

The DGP is characterized by random sampling from the density
$$p(y_t \mid x_t, \theta^0) \equiv p_t(\theta^0).$$
We can define an auxiliary model, called the "score generator", which simply provides a (misspecified) parametric density
$$f(y \mid x_t, \lambda) \equiv f_t(\lambda).$$

- This density is known up to a parameter $\lambda$. We assume that this density function is calculable, so quasi-ML estimation is possible:
$$\hat\lambda = \arg\max_\Lambda s_n(\lambda) = \frac{1}{n} \sum_{t=1}^{n} \ln f_t(\lambda).$$
- After determining $\hat\lambda$ we can calculate the score functions $D_\lambda \ln f(y_t \mid x_t, \hat\lambda)$.
- The important point is that even if the density is misspecified, there is a pseudo-true $\lambda^0$ for which the true expectation, taken with respect to the true but unknown density of $y$, $p(y \mid x_t, \theta^0)$, is zero:
$$\exists \lambda^0 : E_X E_{Y \mid X} \left[ D_\lambda \ln f(y \mid x, \lambda^0) \right] = \int_X \int_{Y \mid X} D_\lambda \ln f(y \mid x, \lambda^0) p(y \mid x, \theta^0) dy \, d\mu(x) = 0.$$
- We have seen in the section on QML that $\hat\lambda \xrightarrow{p} \lambda^0$. This suggests using the moment conditions
$$\overline{m}_n(\theta, \hat\lambda) = \frac{1}{n} \sum_{t=1}^{n} \int D_\lambda \ln f_t(\hat\lambda) \, p_t(\theta) \, dy. \tag{19.7}$$
- These moment conditions are not calculable, since $p_t(\theta)$ is not available, but they are simulable using
$$\widetilde{m}_n(\theta, \hat\lambda) = \frac{1}{n} \sum_{t=1}^{n} \frac{1}{H} \sum_{h=1}^{H} D_\lambda \ln f(\widetilde{y}_t^h \mid x_t, \hat\lambda),$$
where $\widetilde{y}_t^h$ is a draw from $DGP(\theta)$, holding $x_t$ fixed. By the LLN and the fact that $\hat\lambda$ converges to $\lambda^0$,
$$\widetilde{m}_\infty(\theta^0, \lambda^0) = 0.$$
This is not the case for other values of $\theta$, assuming that $\lambda^0$ is identified.
- The advantage of this procedure is that if $f(y_t \mid x_t, \lambda)$ closely approximates $p(y \mid x_t, \theta)$, then $\widetilde{m}_n(\theta, \hat\lambda)$ will closely approximate the optimal moment conditions which characterize maximum likelihood estimation, which is fully efficient.
- If one has prior information that a certain density approximates the data well, it would be a good choice for $f(\cdot)$.
- If one has no density in mind, there exist good ways of approximating unknown distributions parametrically, such as the (Gallant and Nychka, Econometrica, 1987) SNP density estimator which we saw before. Since the SNP density is consistent, the efficiency of the indirect estimator is the same as the infeasible ML estimator.
In practice, the number of simulations $H$ can be set very large. Gallant and Tauchen give the theory for the case of $H$ so large that it may be treated as infinite (the difference being irrelevant given the numerical precision of a computer). The theory for $H$ finite is presented here.

19.4.1 Asymptotic distribution

The moment condition $\overline{m}(\theta, \hat\lambda)$ depends on the pseudo-ML estimate $\hat\lambda$. We can apply Theorem 22 to conclude that
$$\sqrt{n}\left( \hat\theta - \theta^0 \right) \xrightarrow{d} N\left[ 0, \mathcal{J}(\theta^0)^{-1} \mathcal{I}(\theta^0) \mathcal{J}(\theta^0)^{-1} \right]. \tag{19.8}$$
If the density $f(y_t \mid x_t, \hat\lambda)$ were in fact the true density $p(y \mid x_t, \theta)$, then the estimator would be the maximum likelihood estimator, and $\mathcal{J}(\theta^0)^{-1}\mathcal{I}(\theta^0)$ would be an identity matrix, due to the information matrix equality. However, in the present case $f(y_t \mid x_t, \hat\lambda)$ is only an approximation to $p(y \mid x_t, \theta)$, so there is no cancellation.

Recall that $\mathcal{J}(\theta^0) \equiv p\lim \frac{\partial^2}{\partial\theta\partial\theta'} s_n(\theta^0)$. Comparing the definition of $s_n(\theta)$ with the definition of the moment condition, Equation 19.7, we see that
$$\mathcal{J}(\theta^0) = D_{\theta'} \overline{m}(\theta^0, \lambda^0).$$
As in Theorem 22,
$$\mathcal{I}(\theta^0) = \lim_{n \to \infty} E\left[ n \frac{\partial s_n(\theta)}{\partial\theta} \bigg|_{\theta^0} \frac{\partial s_n(\theta)}{\partial\theta'} \bigg|_{\theta^0} \right].$$
In this case, this is simply the asymptotic variance covariance matrix of the moment conditions, $\Omega$.

Now take a first order Taylor's series approximation to $\sqrt{n}\, \overline{m}_n(\theta^0, \hat\lambda)$ about $\lambda^0$:
$$\sqrt{n}\, \overline{m}_n(\theta^0, \hat\lambda) = \sqrt{n}\, \overline{m}_n(\theta^0, \lambda^0) + \sqrt{n}\, D_{\lambda'}\overline{m}(\theta^0, \lambda^0)\left( \hat\lambda - \lambda^0 \right) + o_p(1).$$
First consider $\sqrt{n}\, \overline{m}_n(\theta^0, \lambda^0)$. It is straightforward but somewhat tedious to show that the asymptotic variance of this term is $\frac{1}{H}\mathcal{I}_\infty(\lambda^0)$.

Next consider the second term. Note that $D_{\lambda'}\overline{m}_n(\theta^0, \lambda^0) \xrightarrow{a.s.} \mathcal{J}(\lambda^0)$, so we have
$$\sqrt{n}\, D_{\lambda'}\overline{m}(\theta^0, \lambda^0)\left( \hat\lambda - \lambda^0 \right) = \sqrt{n}\, \mathcal{J}(\lambda^0)\left( \hat\lambda - \lambda^0 \right), \ a.s.$$
But noting the asymptotic normality of the quasi-ML estimator $\hat\lambda$,
$$\sqrt{n}\, \mathcal{J}(\lambda^0)\left( \hat\lambda - \lambda^0 \right) \stackrel{a}{\sim} N\left[ 0, \mathcal{I}(\lambda^0) \right].$$
Now, combining the results for the first and second terms,
$$\sqrt{n}\, \overline{m}_n(\theta^0, \hat\lambda) \stackrel{a}{\sim} N\left[ 0, \left( 1 + \frac{1}{H} \right) \mathcal{I}(\lambda^0) \right].$$
Suppose that $\widehat{\mathcal{I}(\lambda^0)}$ is a consistent estimator of the asymptotic variance-covariance matrix of the moment conditions. This may be complicated if the score generator is a poor approximator, since the individual score contributions may not have mean zero in this case (see the section on QML). Even if this is the case, the individuals means can be calculated by simulation, so it is always possible to consistently estimate $\mathcal{I}(\lambda^0)$ when the model is simulable. On the other hand, if the score generator is taken to be correctly specified, the ordinary estimator of the information matrix is consistent. Combining this with the result on the efficient GMM weighting matrix, we see that defining $\hat\theta$ as
$$\hat\theta = \arg\min_\Theta m_n(\theta, \hat\lambda)' \left[ \left( 1 + \frac{1}{H} \right) \widehat{\mathcal{I}(\lambda^0)} \right]^{-1} m_n(\theta, \hat\lambda)$$
is the GMM estimator with the efficient choice of weighting matrix.

- If one has used the Gallant-Nychka ML estimator as the auxiliary model, the appropriate weighting matrix is simply the information matrix of the auxiliary model, since the scores are uncorrelated. (e.g., it really is ML estimation asymptotically, since the score generator can approximate the unknown density arbitrarily well).

The asymptotic distribution of the estimator is then
$$\sqrt{n}\left( \hat\theta - \theta^0 \right) \xrightarrow{d} N\left[ 0, \left( \mathcal{D} \left[ \left( 1 + \frac{1}{H} \right) \mathcal{I}(\lambda^0) \right]^{-1} \mathcal{D}' \right)^{-1} \right],$$
where
$$\mathcal{D} = \lim_{n \to \infty} E\left[ D_\theta \overline{m}_n'(\theta^0, \lambda^0) \right].$$
The matrix $\mathcal{D}$ can be consistently estimated by $\hat{\mathcal{D}} = D_\theta \overline{m}_n'(\hat\theta, \hat\lambda)$.

19.4.2 Diagnostic testing

The fact that
$$\sqrt{n}\, m_n(\hat\theta, \hat\lambda) \stackrel{a}{\sim} N\left[ 0, \left( 1 + \frac{1}{H} \right) \mathcal{I}(\lambda^0) \right]$$
implies that
$$n\, m_n(\hat\theta, \hat\lambda)' \left[ \left( 1 + \frac{1}{H} \right) \mathcal{I}(\hat\lambda) \right]^{-1} m_n(\hat\theta, \hat\lambda) \stackrel{a}{\sim} \chi^2(q),$$
where $q$ is $\dim(\lambda) - \dim(\theta)$, since without $\dim(\theta)$ moment conditions the model is not identified, so testing is impossible. One test of the model is simply based on this statistic: if it exceeds the $\chi^2(q)$ critical point, something may be wrong (the small sample performance of this sort of test would be a topic worth investigating).

- Information about what is wrong can be gotten from the pseudo-t-statistics
$$\left( \mathrm{diag}\left[ \left( 1 + \frac{1}{H} \right) \mathcal{I}(\hat\lambda) \right] \right)^{-1/2} \sqrt{n}\, m_n(\hat\theta, \hat\lambda),$$
which can be used to test which moments are not well modeled. Since these moments are related to parameters of the score generator, which are usually related to certain features of the model, this information can be used to revise the model. These aren't actually distributed as $N(0, 1)$, since $\sqrt{n}\, m_n(\theta^0, \hat\lambda)$ and $\sqrt{n}\, m_n(\hat\theta, \hat\lambda)$ have different distributions (that of $\sqrt{n}\, m_n(\hat\theta, \hat\lambda)$ is somewhat more complicated). It can be shown that the pseudo-t statistics are biased toward nonrejection. See Gourieroux et. al. for more details.
19.5 Examples

19.5.1 SML of a Poisson model with latent heterogeneity

We have seen (see equation 16.1) that a Poisson model with latent heterogeneity that follows an exponential distribution leads to the negative binomial model. To illustrate SML, we can integrate out the latent heterogeneity using Monte Carlo, rather than the analytical approach which leads to the negative binomial model. In actual practice, one would not want to use SML in this case, but it is a nice example since it allows us to compare SML to the actual ML estimator. The Octave function defined by PoissonLatentHet.m calculates the simulated log likelihood for a Poisson model where $\lambda = \exp(x_t'\beta + \sigma\eta)$, where $\eta \sim N(0, 1)$. This model is similar to the negative binomial model, except that the latent variable is normally distributed rather than gamma distributed. A companion script estimates this model using the MEPS OBDV data that has already been discussed. Note that simulated annealing is used to maximize the log likelihood function. Attempting to use BFGS leads to trouble. I suspect that the log likelihood is approximately non-differentiable in places, around which it is very flat, though I have not checked if this is true. If you run this script, you will see that it takes a long time to get the estimation results, which are:
******************************************************
Poisson Latent Heterogeneity model, SML estimation, MEPS 1996 full data set

MLE Estimation Results
BFGS convergence: Max. iters. exceeded

Average Log-L: -2.171826
Observations: 4564

              estimate   st. err    t-stat   p-value
constant        -1.592     0.146   -10.892     0.000
pub. ins.        1.189     0.068    17.425     0.000
priv. ins.       0.655     0.065    10.124     0.000
sex              0.615     0.044    13.888     0.000
age              0.018     0.002    10.865     0.000
edu              0.024     0.010     2.523     0.012
inc             -0.000     0.000    -0.531     0.596
lnalpha          0.203     0.014    14.036     0.000

Information Criteria
CAIC : 19899.8396   Avg. CAIC: 4.3602
BIC  : 19891.8396   Avg. BIC:  4.3584
AIC  : 19840.4320   Avg. AIC:  4.3472
******************************************************
If you compare these results to the results for the negative binomial model, given in subsection (16.2.1), you can see that the present model fits better according to the CAIC criterion. The present model is considerably less convenient to work with, however, due to the computational requirements. The chapter on parallel computing is relevant if you wish to use models of this sort.
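For concreteness, here is a minimal Octave sketch of how such a simulated log-likelihood can be computed. It follows the fixed-draws convention discussed earlier (one set of standard normal draws per observation, held constant across iterations to avoid "chattering"); the function name and details are my own illustrative choices, not a listing of PoissonLatentHet.m.

% Simulated log-likelihood for the Poisson model with lognormal
% heterogeneity: lambda = exp(x'beta + sigma*eta), eta ~ N(0,1).
% eta_draws (n x H) must be drawn once and held fixed across iterations.
function sll = poisson_latenthet_sll(theta, y, x, eta_draws)
  beta = theta(1:end-1);  sigma = theta(end);
  lam = exp(x*beta + sigma*eta_draws);          % n x H matrix of lambdas
  % Poisson density at y for each draw, averaged over the H draws
  p = mean(exp(-lam) .* lam.^y ./ gamma(y + 1), 2);
  sll = mean(log(max(p, realmin)));             % guard against log(0)
endfunction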
19.5.2 SMM

To be added in future.

19.5.3 SNM

To be added.

19.5.4 EMM estimation of a discrete choice model

In this section consider EMM estimation of a probit model. There is existing software by Gallant and Tauchen for this, but here we'll look at some simple, but hopefully didactic code. The file probitdgp.m generates data that follows the probit model. The file emm_moments.m defines EMM moment conditions, where the DGP and score generator can be passed as arguments. Thus, it is a general purpose moment condition for EMM estimation. This file is interesting enough to warrant some discussion. A listing appears in Listing 19.1. Line 3 defines the DGP, and the arguments needed to evaluate it are defined in line 4. The score generator is defined in line 5, and its arguments are defined in line 6. The QML estimate of the parameter of the score generator is read in line 7. Note in line 10 how the random draws needed to simulate data are passed with the data, and are thus fixed during estimation, to avoid "chattering". The simulated data is generated in line 16, and the derivative of the score generator using the simulated data is calculated in line 18. In line 20 we average the scores of the score generator, which are the moment conditions that the function returns.
[Listing 19.1: emm_moments.m. The listing is not reproduced here; see the emm_moments.m file among the example programs.]
(BFGSMIN iteration trace omitted; the gradient and the parameter change are zero to four decimal places at convergence.)

======================================================
Model results:
******************************************************
EMM example

GMM Estimation Results
BFGS convergence: Normal convergence

Objective function value: 0.000000
Observations: 1000
Exactly identified, no spec. test

      estimate   st. err    t-stat   p-value
p1       1.069     0.022    47.618     0.000
p2       0.935     0.022    42.240     0.000
p3       1.085     0.022    49.630     0.000
p4       1.080     0.022    49.047     0.000
p5       0.978     0.023    41.643     0.000
******************************************************

It might be interesting to compare the standard errors with those obtained from ML estimation, to check efficiency of the EMM estimator. This would make a nice Monte Carlo study.
Chapter 20

Parallel programming for econometrics

The following borrows heavily from Creel (2005).

Parallel computing can offer an important reduction in the time to complete computations. This is well-known, but it bears emphasis since it is the main reason that parallel computing may be attractive to users. To illustrate, the Intel Pentium IV (Willamette) processor, running at 1.5GHz, was introduced in November of 2000. The Pentium IV (Northwood-HT) processor, running at 3.06GHz, was introduced in November of 2002. An approximate doubling of the performance of a commodity CPU took place in two years. Extrapolating this rough snapshot of the evolution of the performance of commodity processors, one would need to wait more than 6.6 years and then purchase a new computer to obtain a 10-fold improvement in computational performance. The examples in this chapter show that a 10-fold improvement in performance can be achieved immediately, using distributed parallel computing on available computers.

There are recent (this is written in 2005) developments that may make parallel computing attractive to a broader spectrum of researchers who do computations. The first is the fact that setting up a cluster of computers for distributed parallel computing is not difficult. If you are using the ParallelKnoppix bootable CD that accompanies these notes, you are less than 10 minutes away from creating a cluster, supposing you have a second computer at hand and a crossover ethernet cable. See the ParallelKnoppix tutorial. A second development is the existence of extensions to some of the high-level matrix programming (HLMP) languages (by which I mean languages such as MATLAB (TM the Mathworks, Inc.), Ox (TM OxMetrics Technologies, Ltd.), and GNU Octave (www.octave.org), for example) that allow the incorporation of parallelism into programs written in these languages. A third is the spread of dual and quad-core CPUs, so that an ordinary desktop or laptop computer can be made into a mini-cluster. Those cores won't work together on a single problem unless they are told how to.

Following are examples of parallel implementations of several mainstream problems in econometrics. A focus of the examples is on the possibility of hiding parallelization from end users of programs. If programs that run in parallel have an interface that is nearly identical to the interface of equivalent serial versions, end users will find it easy to take advantage of parallel computing's performance.
316
CHAPTER 20.
et al.
(2004).
There are
also parallel pa
kages for Ox, R, and Python whi
h may be of interest to e
onometri
ians,
but as of this writing, the following examples are the most a
essible introdu
tion to parallel
programming for e
onometri
ians.
et al.
et. al.
al
ulates the tra
e test statisti
for the la
k of
ointegration of integrated time series. This
fun
tion is illustrative of the format that we adopt for Monte Carlo simulation of a fun
tion:
it re
eives a single argument of
ell type, and it returns a row ve
tor that holds the results of
one random simulation. The single argument in this
ase is a
ell array that holds the length
of the series in its rst position, and the number of series in the se
ond position. It generates
a random result though a pro
ess that is internal to the fun
tion, and it reports some output
in a row ve
tor (in this
ase the result is a s
alar).
m
_example1.m is an O
tave s
ript that exe
utes a Monte Carlo study of the tra
e test by
repeatedly evaluating the
tra etest.m
monte arlo.m
monte arlo.m.
exe utes serially on the omputer it is alled from. In line 10, there
is a fourth argument. When
alled with four arguments, the last argument is the number of
slave hosts to use. We see that running the Monte Carlo study on one or more pro
essors is
transparent to the user - he or she must only indi
ate the number of slave
omputers to be
used.
20.1.2 ML

For a sample $\{(y_t, x_t)\}_n$ of $n$ observations of a set of dependent and explanatory variables, the maximum likelihood estimator of the parameter $\theta$ can be defined as
$$\hat\theta = \arg\max s_n(\theta),$$
where
$$s_n(\theta) = \frac{1}{n} \sum_{t=1}^{n} \ln f(y_t \mid x_t, \theta).$$
Here, $y_t$ may be a vector of random variables, and the model may be dynamic since $x_t$ may contain lags of $y_t$. As Swann (2002) points out, this can be broken into sums over blocks of observations, for example two blocks:
$$s_n(\theta) = \frac{1}{n} \left\{ \left( \sum_{t=1}^{n_1} \ln f(y_t \mid x_t, \theta) \right) + \left( \sum_{t=n_1+1}^{n} \ln f(y_t \mid x_t, \theta) \right) \right\}.$$
Analogously, we can define up to $n$ blocks, each of which could be computed on a different machine. A code sketch of the blockwise split appears below.

An example script calculates the ML estimator of the parameter vector of a model where the dependent variable is distributed as a Poisson random variable, conditional on some explanatory variables. After the data is read, the name of the density function is provided in the variable model, and the initial value of the parameter vector is set. In line 5, the function mle_estimate performs ordinary serial calculation of the ML estimator, while in line 7 the same function is called with 6 arguments. The fourth and fifth arguments are empty placeholders where options to mle_estimate may be set, while the sixth argument is the number of slave computers to use for parallel execution, 1 in this case. A person who runs the program sees no parallel programming code - the parallelization is transparent to the end user, beyond having to select the number of slave computers. When executed, this script prints out the estimates theta_s and theta_p, which are identical.

It is worth noting that a different likelihood function may be used by making the model variable point to a different function. The likelihood function itself is an ordinary Octave function that is not parallelized. The mle_estimate function is a generic function that can call any likelihood function that has the appropriate input/output syntax for evaluation either serially or in parallel. Users need only learn how to write the likelihood function using the Octave language.
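The following Octave sketch shows the blockwise split of the average log-likelihood referred to above, evaluated serially; in the MPI implementation each block's sum would be computed on a different node and the partial sums combined. The function name and the assumed signature of model (returning the vector of log-likelihood contributions) are my own illustrative choices.

% Blockwise evaluation of the average log-likelihood (serial illustration).
function s = loglik_blockwise(theta, y, x, model, nblocks)
  n = rows(y);
  edges = round(linspace(0, n, nblocks+1));
  s = 0;
  for b = 1:nblocks
    idx = (edges(b)+1):edges(b+1);
    % each block could be computed on a different computer
    s = s + sum(feval(model, theta, y(idx), x(idx,:)));
  endfor
  s = s / n;
endfunction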
20.1.3 GMM

For a sample as above, the GMM estimator of the parameter $\theta$ can be defined as
$$\hat\theta \equiv \arg\min s_n(\theta),$$
where
$$s_n(\theta) = m_n(\theta)' W_n m_n(\theta)$$
and
$$m_n(\theta) = \frac{1}{n} \sum_{t=1}^{n} m_t(y_t \mid x_t, \theta).$$
Since $m_n(\theta)$ is an average, it can obviously be computed blockwise, using for example 2 blocks:
$$m_n(\theta) = \frac{1}{n} \left\{ \left( \sum_{t=1}^{n_1} m_t(y_t \mid x_t, \theta) \right) + \left( \sum_{t=n_1+1}^{n} m_t(y_t \mid x_t, \theta) \right) \right\}. \tag{20.1}$$
Likewise, we may define up to $n$ blocks, each of which could potentially be computed on a different machine.

gmm_example1.m is a script that illustrates how GMM estimation may be done serially or in parallel. When this is run, theta_s and theta_p are identical up to the tolerance for convergence of the minimization routine. The point to notice here is that an end user can perform the estimation in parallel in virtually the same way as it is done serially. Again, gmm_estimate, used in lines 8 and 10, is a generic function that will estimate any model specified by the moments variable - a different model can be estimated by changing the value of the moments variable. The function that moments points to is an ordinary Octave function that uses no parallel programming, so users can write their models using the simple and intuitive HLMP syntax of Octave. Whether estimation is done in parallel or serially depends only upon the seventh argument to gmm_estimate - when it is missing or zero, estimation is by default done serially with one processor. When it is positive, it specifies the number of slave nodes to use.
20.1.4 Kernel regression

The Nadaraya-Watson kernel regression estimator of a function $g(x)$ at a point $x$ is
$$\hat{g}(x) = \frac{\sum_{t=1}^{n} y_t K\left[ (x - x_t)/\gamma_n \right]}{\sum_{t=1}^{n} K\left[ (x - x_t)/\gamma_n \right]} \equiv \sum_{t=1}^{n} w_t y_t.$$
We see that the weight depends upon every data point in the sample. To calculate the fit at every point in a sample of size $n$, on the order of $n^2 k$ calculations must be done, where $k$ is the dimension of the vector of explanatory variables, $x$. Racine (2002) demonstrates that the calculation can be parallelized by computing the fits for portions of the sample on different computers. We follow this implementation here. kernel_example1.m is a script for serial and parallel kernel regression. Serial execution is obtained by setting the number of slaves equal to zero, in line 15. In line 17, a single slave is specified, so execution is in parallel on the master and slave nodes.

The example programs show that parallelization may be mostly hidden from end users. Users can benefit from parallelization without having to write or understand parallel code. The speedups one can obtain are highly dependent upon the specific problem at hand, as well as the size of the cluster, the efficiency of the network, etc. Some examples of speedups are presented in Creel (2005). Figure 20.1 reproduces speedups for some econometric problems on a cluster of 12 desktop computers. The speedup for $k$ nodes is the time to finish the problem on a single node divided by the time to finish the problem on $k$ nodes. Note that you can get 10X speedups, as claimed in the introduction. It's pretty obvious that much greater speedups could be obtained using a larger cluster, for the embarrassingly parallel problems.

[Figure 20.1: Speedups from parallelization on a cluster of 12 desktop computers, for the MONTECARLO, BOOTSTRAP, MLE, GMM and KERNEL example problems (speedup versus number of nodes).]
Bibliography

[1] Bruche, M. (2003) "A note on embarassingly parallel computation using OpenMosix and Ox," working paper, Financial Markets Group, London School of Economics.

[2] Creel, M. (2005) "User-friendly parallel computations with econometric examples," Computational Economics.

[3] Doornik, J.A., D.F. Hendry and N. Shephard (2002) "Computationally-intensive econometrics using a distributed matrix-programming language."

[4] Fernández Baldomero, J., et al. (2004) MPITB: MPI toolbox for Octave, atc.ugr.es/javier-bin/mpitb.

[5] Racine, Jeff (2002) "Parallel distributed kernel estimation," Computational Statistics & Data Analysis.

[6] Swann, C.A. (2002) "Maximum likelihood estimation using parallel computing: an introduction to MPI," Computational Economics.
Chapter 21

Final project: econometric estimation of a RBC model

THIS IS NOT FINISHED - IGNORE IT FOR NOW

In this last chapter we'll go through a worked example that combines a number of the topics we've seen. We'll do simulated method of moments estimation of a real business cycle model, similar to what Valderrama (2002) does.

21.1 Data

We'll develop a model for private consumption and real gross private investment. The data are obtained from the US Bureau of Economic Analysis (BEA) National Income and Product Accounts (NIPA), Table 11.1.5, Lines 2 and 6 (you can download quarterly data from 1947-I to the present). The data we use are in the file rbc_data.m. This data is real (constant dollars). The program plots.m will make a few plots, including Figures 21.1 though 21.3.

First looking at the plot for levels, we can see that real consumption and investment are clearly nonstationary (surprise, surprise). There appears to be somewhat of a structural change in the mid-1970's. Looking at growth rates, the series for consumption has an extended period of high growth in the 1970's, becoming more moderate in the 90's. The volatility of growth of consumption has declined somewhat, over time. Looking at investment, there are some notable periods of high volatility in the mid-1970's and early 1980's, for example. Since 1990 or so, volatility seems to have declined.

Economic models for growth often imply that there is no long term growth (!) - the data that the models generate is stationary and ergodic. Or, the data that the models generate needs to be passed through the inverse of a filter. We'll follow this, and generate stationary business cycle data by applying the bandpass filter of Christiano and Fitzgerald (1999). The filtered data is in Figure 21.3. We'll try to specify an economic model that can generate similar data. To get data that look like the levels for consumption and investment, we'd need to apply the inverse of the bandpass filter.

21.2 An RBC model

Consider a very simple stochastic growth model:
$$\max_{\{c_t, k_t\}} E_0 \sum_{t=0}^{\infty} \beta^t U(c_t)$$
$$c_t + k_t = (1 - \delta) k_{t-1} + \phi_t k_{t-1}^{\alpha}$$
$$\log \phi_t = \rho \log \phi_{t-1} + \epsilon_t$$
$$\epsilon_t \sim IIN(0, \sigma_\epsilon^2)$$

Assume that the utility function is of the constant relative risk aversion form
$$U(c_t) = \frac{c_t^{1-\gamma} - 1}{1 - \gamma},$$
which is logarithmic in the limit as $\gamma \to 1$.

- $\beta$ is the discount rate; $\delta$ is the depreciation rate of capital; $\alpha$ is the elasticity of output with respect to capital; and $\phi_t$ is a technology shock, which is observed in period $t$, before the period-$t$ decisions are made.
- Gross investment, $i_t$, is the change in the capital stock:
$$i_t = k_t - (1 - \delta) k_{t-1}.$$
- The initial condition $(k_0, \phi_0)$ is given.

We would like to estimate the parameters
$$\psi = \left( \beta, \gamma, \delta, \alpha, \rho, \sigma_\epsilon^2 \right)'$$
using the data that we have on consumption and investment. This problem is very similar to the GMM estimation of the portfolio model discussed in Sections 15.12 and 15.13. One can derive the Euler condition in the same way we did there, and use it to define a GMM estimator. That approach was not very successful, recall. Now we'll try to use some more informative moment conditions to see if we get better results.
if we get better results.
21.3.
325
Let
yt
be a
A ve tor
G-ve tor
of
vt = Rt t ,
where
t IIN (0, I2 ),
and
Rt
j=1,2,...,p, are
GG matri es of parameters.
is upper triangular. So
Let
You
an think of a VAR model as the redu
ed form of a dynami
linear simultaneous equations
model where all of the variables are treated as endogenous. Clearly, if all of the variables are
endogenous, one would need some form of additional information to identify a stru
tural
model.
But we already have a stru tural model, and we're only going to use the VAR to
help us estimate the parameters. A well-tting redu
ed form model will be adequate for the
purpose.
We're seen that our data seems to have episodes where the varian
e of growth rates and
ltered data is non-
onstant. This brings us to the general area of sto
hasti
volatility. Without going into details, we'll just
onsider the exponential GARCH model of Nelson (1991) as
presented in Hamilton (1994, pg. 668-669).
Dene
is a
31
ht = vec (Rt ),
Rt
n
o
p
log hjt = j + P(j,.) |vt1 | 2/ + (j,.) vt1 + G(j,.) log ht1
The varian
e of the VAR error depends upon its own past, as well as upon the past realizations
of the sho
ks.
The advantage of the EGARCH formulation is that the varian e is assuredly positive
The matrix
has dimension
3 2.
The matrix
has dimension
3 3.
The matrix
v, m
for lags of
h).
allows for
leverage,
2 2.
We will probably want to restri t these parameter matri es in some way. For instan e,
t IIN (0, I2 )
t = R1
t vt
and we know how to
al
ulate
Rt
and
vt ,
Thus, it is
The first order condition for the structural model is
$$c_t^{-\gamma} = \beta E_t \left[ c_{t+1}^{-\gamma} \left( 1 - \delta + \alpha \phi_{t+1} k_t^{\alpha - 1} \right) \right],$$
or
$$c_t = \left\{ \beta E_t \left[ c_{t+1}^{-\gamma} \left( 1 - \delta + \alpha \phi_{t+1} k_t^{\alpha - 1} \right) \right] \right\}^{-\frac{1}{\gamma}}.$$
The problem is that we cannot solve for $c_t$, since we do not know the solution for the expectation. The parameterized expectations approach replaces the expectation by a parametric function of variables realized in period $t$. As long as the parametric function is flexible enough, there exist parameter values that make the approximation as close to the true expectation as desired. We approximate
$$E_t \left[ c_{t+1}^{-\gamma} \left( 1 - \delta + \alpha \phi_{t+1} k_t^{\alpha - 1} \right) \right] \simeq \exp\left( \rho_0 + \rho_1 \log \phi_t + \rho_2 \log k_{t-1} \right).$$
For given values of the parameters of this approximating function, we can solve for $c_t$, and then for $k_t$ using the restriction that
$$c_t + k_t = (1 - \delta) k_{t-1} + \phi_t k_{t-1}^{\alpha}.$$
This allows us to generate a series $\{(c_t, k_t)\}$. The parameters of the approximation are then updated by fitting
$$c_{t+1}^{-\gamma} \left( 1 - \delta + \alpha \phi_{t+1} k_t^{\alpha - 1} \right) = \exp\left( \rho_0 + \rho_1 \log \phi_t + \rho_2 \log k_{t-1} \right) + \eta_t$$
by nonlinear least squares. The 2 step procedure of generating data and updating the parameters of the approximation to expectations is iterated until the parameters no longer change. When this is the case, the expectations function is the best fit to the generated data. As long as it is a rich enough parametric model to encompass the true expectations function, it can be made to be equal to the true expectations function by using a long enough simulation.

Thus, given the parameters of the structural model, $\psi = \left( \beta, \gamma, \delta, \alpha, \rho, \sigma_\epsilon^2 \right)'$, we can generate data $\{(c_t, k_t)\}$, and then investment using $i_t = k_t - (1 - \delta) k_{t-1}$. This can be used to do EMM estimation using the scores of the reduced form model to define moments, using the simulated data from the structural model.
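To make the simulation step concrete, here is a rough Octave sketch of one pass of the parameterized expectations iteration. The function name is my own; phi is assumed to be a pre-simulated path of the technology shock, and non-negativity safeguards for consumption and capital are omitted. This is a sketch under those assumptions, not a full implementation.

% One pass of the parameterized expectations iteration (sketch).
% rho = [rho0; rho1; rho2]; psi = [beta; gam; delta; alpha];
% phi: T x 1 simulated technology shocks; k0: initial capital.
function [c, k] = pea_simulate(rho, psi, phi, k0, T)
  beta = psi(1); gam = psi(2); delta = psi(3); alpha = psi(4);
  c = zeros(T,1); k = zeros(T,1); klag = k0;
  for t = 1:T
    Ehat = exp(rho(1) + rho(2)*log(phi(t)) + rho(3)*log(klag));
    c(t) = (beta*Ehat)^(-1/gam);                       % inverted Euler eq.
    k(t) = (1-delta)*klag + phi(t)*klag^alpha - c(t);  % resource constraint
    klag = k(t);
  endfor
endfunction
% rho would then be re-fit by NLS on the generated series, and the pass repeated.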
Bibliography

[1] Creel, M. (2005) A Note on Parallelizing the Parameterized Expectations Algorithm.

[2] den Haan, W. and Marcet, A. (1990) Solving the stochastic growth model by parameterized expectations, Journal of Business and Economic Statistics, 8, 31-34.

[3] Hamilton, J. (1994) Time Series Analysis, Princeton University Press.

[4] Maliar, L. and Maliar, S. (2003) Matlab code for Solving a Neoclassical Growth Model with a Parametrized Expectations Algorithm and Moving Bounds.

[5] Nelson, D. (1991) Conditional heteroscedasticity in asset returns: a new approach, Econometrica, 59, 347-370.

[6] ..., a challenge for the canonical RBC model, Economic Research, Federal Reserve Bank of San Francisco. http://ideas.repec.org/p/fip/fedfap/2002-13.html
Chapter 22

Introduction to Octave
Why is Octave being used here, since it's not that well-known by econometricians? Well, because it is a high quality environment that is easily extensible, uses well-tested and high performance numerical libraries, is licensed under the GNU GPL so you can get it for free and modify it if you like, and runs on GNU/Linux, Mac OSX and Windows systems. It's also quite easy to learn.
This will give you this same PDF file, but with all of the example programs ready to run. The editor is configured with a macro to execute the programs using Octave, which is of course installed. From this point, I assume you are running the CD (or sitting in the computer room across the hall from my office), or that you have configured your computer to be able to run the *.m example programs. Students of mine: your problem sets will include exercises that can be done by modifying the example programs in relatively minor ways. So study the examples!
Octave can be used interactively, or it can be used to run programs that are written using a text editor. We'll use this second method, preparing programs with NEdit, and calling Octave from within the editor.
To run an example program, open it in NEdit (e.g., by going to the /home/knoppix/Desktop/Econometrics folder and clicking on the icon) and then type CTRL-ALT-o, or use the Octave item in the Shell menu (see Figure 22.1).
Note that the output is not formatted in a pleasing way. That's because printf() doesn't automatically start a new line. Edit first.m so that the printf statement reads printf("hello world\n");
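For example, a one-line first.m along these lines (my guess at the example's content) prints the message followed by a newline:

    % first.m : print a greeting, ending with a newline
    printf("hello world\n");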
We need to know how to load and save data. The program second.m shows how. Once you have run this, you will find the file x in the current directory. You might have a look at it with NEdit to see Octave's default format for saving data. Basically, if you have data in an ASCII text file, named for example myfile.data, formed of numbers separated by spaces, just use the command load myfile.data.
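A minimal sketch of the idea (file and variable names are illustrative, not necessarily those of second.m):

    x = rand(5, 3);       % create some data: a 5 x 3 matrix
    save xfile x;         % write the variable x to the file "xfile"
    clear x;              % remove it from memory
    load xfile;           % restore the variable x from the file
    % a plain ASCII file of space-separated numbers works the same way:
    % load myfile.data;   % creates a variable named myfile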
To run the examples on your own computer:

- Get the collection of support programs and the examples, from the document home page. Put them somewhere, and tell Octave how to find them, e.g., by putting a link to the directory in a place where Octave looks for files.
- Make sure NEdit is installed and configured to run Octave and use syntax highlighting. Copy the file NeditConfiguration and save it in your $HOME directory with the name ".nedit". Not to put too fine a point on it, please note that there is a period in that name.
- Associate *.m files with NEdit so that they open up in the editor when you click on them.
Chapter 23
Notation and Review
All vectors will be column vectors, unless they have a transpose symbol (or I forget to apply this rule - your help catching typos and er0rors is much appreciated). For example, if $x_t$ is a $p \times 1$ vector, $x_t'$ is a $1 \times p$ vector. When I refer to a $p$-vector, I mean a column vector.
Let $s(\theta): \Re^p \rightarrow \Re$ be a real-valued function of the $p$-vector $\theta$. Then $\frac{\partial s(\theta)}{\partial \theta}$ is organized as a $p$-vector:
$$\frac{\partial s(\theta)}{\partial \theta} = \begin{bmatrix} \frac{\partial s(\theta)}{\partial \theta_1} \\ \frac{\partial s(\theta)}{\partial \theta_2} \\ \vdots \\ \frac{\partial s(\theta)}{\partial \theta_p} \end{bmatrix}$$
Following this convention, $\frac{\partial s(\theta)}{\partial \theta'}$ has dimension $1 \times p$, and $\frac{\partial^2 s(\theta)}{\partial \theta \partial \theta'}$ is a $p \times p$ matrix.

Exercise: For $a$ and $x$ both $p$-vectors, show that $\frac{\partial a'x}{\partial x} = a$.

Let $f(\theta): \Re^p \rightarrow \Re^n$ be an $n$-vector valued function of the $p$-vector $\theta$. Let $f(\theta)'$ be the $1 \times n$ valued transpose of $f$. Then $\left( \frac{\partial}{\partial \theta} f(\theta)' \right)' = \frac{\partial}{\partial \theta'} f(\theta)$.

Product rule: Let $f(\theta): \Re^p \rightarrow \Re^n$ and $h(\theta): \Re^p \rightarrow \Re^n$ be $n$-vector valued functions of the $p$-vector $\theta$. Then
$$\frac{\partial}{\partial \theta'} h(\theta)' f(\theta) = h' \left( \frac{\partial}{\partial \theta'} f \right) + f' \left( \frac{\partial}{\partial \theta'} h \right)$$
has dimension $1 \times p$. Applying the transposition rule we get
$$\frac{\partial}{\partial \theta} h(\theta)' f(\theta) = \left( \frac{\partial}{\partial \theta} f' \right) h + \left( \frac{\partial}{\partial \theta} h' \right) f$$
which has dimension $p \times 1$.

Exercise: For $A$ a $p \times p$ matrix and $x$ a $p \times 1$ vector, show that $\frac{\partial x'Ax}{\partial x} = \left( A + A' \right) x$.

Chain rule: Let $f(\cdot): \Re^p \rightarrow \Re^n$ be an $n$-vector valued function of a $p$-vector argument, and let $g(): \Re^r \rightarrow \Re^p$ be a $p$-vector valued function of an $r$-vector valued argument $\rho$. Then
$$\frac{\partial}{\partial \rho'} f \left[ g(\rho) \right] = \left. \frac{\partial}{\partial \theta'} f(\theta) \right|_{\theta = g(\rho)} \frac{\partial}{\partial \rho'} g(\rho)$$
has dimension $n \times r$.

Exercise: For $x$ and $\beta$ both $p \times 1$ vectors, show that $\frac{\partial \exp(x'\beta)}{\partial \beta} = \exp(x'\beta) x$.
We will consider several modes of convergence. The first three modes discussed are simply for background. The stochastic modes are those which will be used later in the course.

Definition 36 A sequence is a mapping from the natural numbers $\{1, 2, \ldots\} = \{n\}_{n=1}^{\infty} = \{n\}$ to some other set, so that the set is ordered according to the natural numbers associated with its elements.
Definition 37 [Convergence] A real-valued sequence $\{a_n\}$ converges to the real number $a$ if for any $\varepsilon > 0$ there exists an integer $N_\varepsilon$ such that $|a_n - a| < \varepsilon$ for all $n > N_\varepsilon$.

Now consider a sequence of real-valued functions $\{f_n(\omega)\}$, where $f_n: \Omega \rightarrow T \subseteq \Re$.

Definition 38 [Pointwise convergence] $\{f_n(\omega)\}$ converges pointwise on $\Omega$ to the function $f(\omega)$ if, for each $\omega \in \Omega$, $f_n(\omega) \rightarrow f(\omega)$; the $N_\varepsilon$ needed for a given accuracy depends upon $\omega$.

Definition 39 [Uniform convergence] $\{f_n(\omega)\}$ converges uniformly on $\Omega$ to the function $f(\omega)$ if $\lim_{n \rightarrow \infty} \sup_{\omega \in \Omega} |f_n(\omega) - f(\omega)| = 0$; here the same $N_\varepsilon$ works for all $\omega$.

(insert a diagram here showing the envelope around $f(\omega)$ in which $f_n(\omega)$ must lie)
Given a probability space $(\Omega, \mathcal{F}, P)$, recall that a random variable maps the sample space to the real line, i.e., $X(\omega): \Omega \rightarrow \Re$. A sequence of random variables $\{X_n(\omega)\}$ is a collection of such mappings, i.e., each $X_n(\omega)$ is a random variable with respect to the probability space $(\Omega, \mathcal{F}, P)$. For example, given the model $Y = X\beta^0 + \varepsilon$, the OLS estimator $\hat{\beta}_n = (X'X)^{-1}X'Y$, where $n$ is the sample size, can be used to form a sequence of random vectors $\{\hat{\beta}_n\}$. A number of modes of convergence are in use when dealing with sequences of random variables. Several such modes of convergence should already be familiar:
Definition 40 [Convergence in probability] Let $X_n(\omega)$ be a sequence of random variables, and let $X(\omega)$ be a random variable. Let $A_n = \{\omega : |X_n(\omega) - X(\omega)| > \varepsilon\}$. Then $\{X_n(\omega)\}$ converges in probability to $X(\omega)$ if $\lim_{n \rightarrow \infty} P(A_n) = 0$, $\forall \varepsilon > 0$. Convergence in probability is written as $X_n \stackrel{p}{\rightarrow} X$, or plim $X_n = X$.

Definition 41 [Almost sure convergence] Let $A = \{\omega : \lim_{n \rightarrow \infty} X_n(\omega) = X(\omega)\}$. Then $\{X_n(\omega)\}$ converges almost surely to $X(\omega)$ if $P(A) = 1$. In other words, $X_n(\omega) \rightarrow X(\omega)$ except on a set $C = \Omega - A$ such that $P(C) = 0$. Almost sure convergence is written as $X_n \stackrel{a.s.}{\rightarrow} X$. One can show that $X_n \stackrel{a.s.}{\rightarrow} X \Rightarrow X_n \stackrel{p}{\rightarrow} X$.

Definition 42 [Convergence in distribution] Let the r.v. $X_n$ have distribution function $F_n$ and the r.v. $X$ have distribution function $F$. If $F_n \rightarrow F$ at every continuity point of $F$, then $X_n$ converges in distribution to $X$, written $X_n \stackrel{d}{\rightarrow} X$.
For example, in the OLS case,
$$\hat{\beta}_n = \beta^0 + \left( \frac{X'X}{n} \right)^{-1} \left( \frac{X'\varepsilon}{n} \right)$$
If $\frac{X'\varepsilon}{n} \stackrel{a.s.}{\rightarrow} 0$ by a SLLN, and $\left( \frac{X'X}{n} \right)^{-1}$ converges a.s. to a finite limit, then $\hat{\beta}_n \stackrel{a.s.}{\rightarrow} \beta^0$. Note that the term $\frac{X'\varepsilon}{n}$ is not a function of the parameter $\beta$. This easy proof is a result of the linearity of the model, which allows us to express the estimator in a way that separates parameters from random functions. In general, this is not possible.
We often deal with the more complicated situation where the stochastic sequence depends on parameters in a manner that is not reducible to a simple sequence of random variables. In this case, we have a sequence of random functions that depend on $\theta$: $\{X_n(\omega, \theta)\}$, where each $X_n(\omega, \theta)$ is a random variable with respect to $(\Omega, \mathcal{F}, P)$ for all $\theta \in \Theta$.

Definition 43 [Uniform almost sure convergence] $\{X_n(\omega, \theta)\}$ converges uniformly almost surely in $\Theta$ to $X(\omega, \theta)$ if
$$\lim_{n \rightarrow \infty} \sup_{\theta \in \Theta} |X_n(\omega, \theta) - X(\omega, \theta)| = 0 \text{ (a.s.)}$$
Implicit is the assumption that all $X_n(\omega, \theta)$ and $X(\omega, \theta)$ are random variables with respect to $(\Omega, \mathcal{F}, P)$ for all $\theta \in \Theta$. We'll indicate uniform almost sure convergence by $\stackrel{u.a.s.}{\rightarrow}$ and uniform convergence in probability by $\stackrel{u.p.}{\rightarrow}$.

An equivalent definition, based on the fact that almost sure means with probability one, is
$$\Pr \left( \lim_{n \rightarrow \infty} \sup_{\theta \in \Theta} |X_n(\omega, \theta) - X(\omega, \theta)| = 0 \right) = 1$$
This has a form similar to that of the definition of a.s. convergence; the essential difference is the addition of the sup.
sup.
Denition 44
o(g(n))
means
[Little-o
Let f (n) and g(n) be two real-valued fun tions. The notation f (n) =
(n)
limn fg(n)
= 0.
[Big-O
{fn }
and
{gn }
f (n)
g(n) have a limit (it may u
tuate boundedly).
23.3.
339
f (n) p
g(n)
0.
Example 47 The least squares estimator $\hat{\theta} = (X'X)^{-1}X'Y = (X'X)^{-1}X'\left( X\theta^0 + \varepsilon \right) = \theta^0 + (X'X)^{-1}X'\varepsilon$. Since $(X'X)^{-1}X'\varepsilon \stackrel{p}{\rightarrow} 0$, we can write $(X'X)^{-1}X'\varepsilon = o_p(1)$ and $\hat{\theta} = \theta^0 + o_p(1)$. Asymptotically, the term $o_p(1)$ is negligible. This is just a way of indicating that the LS estimator is consistent.

Definition 48 [Big-$O_p$] The notation $f(n) = O_p(g(n))$ means there exists some $N_\varepsilon$ such that for $\varepsilon > 0$ and all $n > N_\varepsilon$,
$$P \left( \left| \frac{f(n)}{g(n)} \right| < K_\varepsilon \right) > 1 - \varepsilon,$$
where $K_\varepsilon$ is a finite constant.
Example 50 Consider a random sample of iid r.v.'s with mean 0 and variance $\sigma^2$. The estimator of the mean $\hat{\theta} = 1/n \sum_{i=1}^{n} x_i$ is asymptotically normally distributed, e.g., $n^{1/2} \hat{\theta} \stackrel{A}{\sim} N(0, \sigma^2)$. So $n^{1/2} \hat{\theta} = O_p(1)$, so $\hat{\theta} = O_p(n^{-1/2})$. Before we had $\hat{\theta} = o_p(1)$; now we have the stronger result that relates the rate of convergence to the sample size.

Example 51 Now consider a random sample of iid r.v.'s with mean $\mu$ and variance $\sigma^2$. The estimator of the mean $\hat{\theta} = 1/n \sum_{i=1}^{n} x_i$ is asymptotically normally distributed, e.g., $n^{1/2} \left( \hat{\theta} - \mu \right) \stackrel{A}{\sim} N(0, \sigma^2)$. So $n^{1/2} \left( \hat{\theta} - \mu \right) = O_p(1)$, so $\hat{\theta} - \mu = O_p(n^{-1/2})$, so $\hat{\theta} = O_p(1)$.

These two examples show that averages of centered (mean zero) quantities typically have plim 0, while averages of uncentered quantities have finite nonzero plims. Note that the definition of $O_p$ does not mean that $f(n)$ and $g(n)$ are of the same order. Asymptotic equality ensures that this is the case.

Definition 52 [Asymptotic equality] Two sequences of random variables $\{f_n\}$ and $\{g_n\}$ are asymptotically equal (written $f_n =_a g_n$) if
$$\text{plim} \, \frac{f(n)}{g(n)} = 1$$
Finally, analogous almost sure versions of $o_p$ and $O_p$ are defined in the obvious way.
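A quick Octave check of Example 50 (a sketch with $\sigma = 1$): the estimator shrinks toward zero while the scaled version $n^{1/2} \hat{\theta}$ stays bounded.

    % sqrt(n) * (mean of mean-zero iid data) is Op(1)
    for n = [100 10000 1000000]
        x = randn(n,1);                 % mean 0, variance 1
        thetahat = mean(x);
        printf("n = %7d   thetahat = %9.6f   sqrt(n)*thetahat = %7.3f\n", ...
               n, thetahat, sqrt(n)*thetahat);
    endfor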
Exercises

1. For $a$ and $x$ both $p \times 1$ vectors, show that $D_x a'x = a$.
2. For $A$ a $p \times p$ matrix and $x$ a $p \times 1$ vector, show that $D_x^2 x'Ax = A + A'$.
3. For $x$ and $\beta$ both $p \times 1$ vectors, show that $D_\beta^2 \exp(x'\beta) = \exp(x'\beta) x x'$.
4. Write an Octave program that verifies each of the previous results by taking numeric derivatives. For a hint, type help numgradient and help numhessian inside octave.
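For readers without the support programs numgradient and numhessian, here is a self-contained sketch of the check for the first result, using central finite differences:

    % verify d(a'x)/dx = a by central finite differences
    p = 4;  a = randn(p,1);  x = randn(p,1);  h = 1e-6;
    g = zeros(p,1);
    for i = 1:p
        e = zeros(p,1);  e(i) = h;
        g(i) = (a'*(x+e) - a'*(x-e)) / (2*h);
    endfor
    disp([g a])   % the two columns should agree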
Chapter 24

Licenses

This document and the associated examples and materials are copyright Michael Creel, under the terms of the GNU General Public License, ver. 2., or at your option, under the Creative Commons Attribution-Share Alike License, Version 2.5. The licenses follow.
this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.

To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it.

For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.

We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software.

Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations.

Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all.

The precise terms and conditions for copying, distribution and modification follow.
4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.
5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it.

6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License.
7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances.

It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice.

This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License.
make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs

... attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this when it starts in an interactive mode:

Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names:

Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker.
24.2. Creative Commons
this License. For the avoidance of doubt, where the Work is a musical composition or sound recording, the synchronization of the Work in timed-relation with a moving image ("synching") will be considered a Derivative Work for the purpose of this License.
3. "Li
ensor" means the individual or entity that oers the Work under the terms of this
Li
ense.
4. "Original Author" means the individual or entity who
reated the Work.
5.
"Work" means the opyrightable work of authorship oered under the terms of this
Li
ense.
6. "You" means an individual or entity exer
ising rights under this Li
ense who has not
previously violated the terms of this Li
ense with respe
t to the Work, or who has re
eived
express permission from the Li
ensor to exer
ise rights under this Li
ense despite a previous
violation.
7.
"Li ense Elements" means the following high-level li ense attributes as sele ted by
Li
ensor and indi
ated in the title of this Li
ense: Attribution, ShareAlike.
2. Fair Use Rights. Nothing in this license is intended to reduce, limit, or restrict any rights arising from fair use, first sale or other limitations on the exclusive rights of the copyright owner under copyright law or other applicable laws.

3. License Grant. Subject to the terms and conditions of this License, Licensor hereby grants You a worldwide, royalty-free, non-exclusive, perpetual (for the duration of the applicable copyright) license to exercise the rights in the Work as stated below:
1. to reproduce the Work, to incorporate the Work into one or more Collective Works, and to reproduce the Work as incorporated in the Collective Works;

2. to create and reproduce Derivative Works;

3. to distribute copies or phonorecords of, display publicly, perform publicly, and perform publicly by means of a digital audio transmission the Work including as incorporated in Collective Works;

4. to distribute copies or phonorecords of, display publicly, perform publicly, and perform publicly by means of a digital audio transmission Derivative Works.

5. For the avoidance of doubt, where the work is a musical composition:
1. Performance Royalties Under Blanket Licenses. Licensor waives the exclusive right to collect, whether individually or via a performance rights society (e.g. ASCAP, BMI, SESAC), royalties for the public performance or public digital performance (e.g. webcast) of the Work.

2. Mechanical Rights and Statutory Royalties. Licensor waives the exclusive right to collect, whether individually or via a music rights society or designated agent (e.g. Harry Fox Agency), royalties for any phonorecord You create from the Work ("cover version") and distribute, subject to the compulsory license created by 17 USC Section 115 of the US Copyright Act (or the equivalent in other jurisdictions).

6. Webcasting Rights and Statutory Royalties. For the avoidance of doubt, where the Work is a sound recording, Licensor waives the exclusive right to collect, whether individually or via a performance-rights society (e.g. SoundExchange), royalties for the public digital performance (e.g. webcast) of the Work, subject to the compulsory license created by 17 USC Section 114 of the US Copyright Act (or the equivalent in other jurisdictions).
The above rights may be exercised in all media and formats whether now known or hereafter devised. The above rights include the right to make such modifications as are technically necessary to exercise the rights in other media and formats. All rights not expressly granted by Licensor are hereby reserved.
4. Restrictions. The license granted in Section 3 above is expressly made subject to and limited by the following restrictions:

1. You may distribute, publicly display, publicly perform, or publicly digitally perform the Work only under the terms of this License, and You must include a copy of, or the Uniform Resource Identifier for, this License with every copy or phonorecord of the Work You distribute, publicly display, publicly perform, or publicly digitally perform. You may not offer or impose any terms on the Work that alter or restrict the terms of this License or the recipients' exercise of the rights granted hereunder. You may not sublicense the Work. You must keep intact all notices that refer to this License and to the disclaimer of warranties. You may not distribute, publicly display, publicly perform, or publicly digitally perform the Work with any technological measures that control access or use of the Work in a manner inconsistent with the terms of this License Agreement. The above applies to the Work as incorporated in a Collective Work, but this does not require the Collective Work apart from the Work itself to be made subject to the terms of this License. If You create a Collective Work, upon notice from any Licensor You must, to the extent practicable, remove from the Collective Work any credit as required by clause 4(c), as requested. If You create a Derivative Work, upon notice from any Licensor You must, to the extent practicable, remove from the Derivative Work any credit as required by clause 4(c), as requested.
2. You may distribute, publicly display, publicly perform, or publicly digitally perform a Derivative Work only under the terms of this License, a later version of this License with the same License Elements as this License, or a Creative Commons iCommons license that contains the same License Elements as this License (e.g. Attribution-ShareAlike 2.5 Japan). You must include a copy of, or the Uniform Resource Identifier for, this License or other license specified in the previous sentence with every copy or phonorecord of each Derivative Work You distribute, publicly display, publicly perform, or publicly digitally perform. You may not offer or impose any terms on the Derivative Works that alter or restrict the terms of this License or the recipients' exercise of the rights granted hereunder, and You must keep intact all notices that refer to this License and to the disclaimer of warranties. You may not distribute, publicly display, publicly perform, or publicly digitally perform the Derivative Work with any technological measures that control access or use of the Work in a manner inconsistent with the terms of this License Agreement. The above applies to the Derivative Work as incorporated in a Collective Work, but this does not require the Collective Work apart from the Derivative Work itself to be made subject to the terms of this License.
3. If you distribute, publicly display, publicly perform, or publicly digitally perform the Work or any Derivative Works or Collective Works, You must keep intact all copyright notices for the Work and provide, reasonable to the medium or means You are utilizing: (i) the name of the Original Author (or pseudonym, if applicable) if supplied, and/or (ii) if the Original Author and/or Licensor designate another party or parties (e.g. a sponsor institute, publishing entity, journal) for attribution in Licensor's copyright notice, terms of service or by other reasonable means, the name of such party or parties; the title of the Work if supplied; to the extent reasonably practicable, the Uniform Resource Identifier, if any, that Licensor specifies to be associated with the Work, unless such URI does not refer to the copyright notice or licensing information for the Work; and in the case of a Derivative Work, a credit identifying the use of the Work in the Derivative Work (e.g., "French translation of the Work by Original Author," or "Screenplay based on original Work by Original Author"). Such credit may be implemented in any reasonable manner; provided, however, that in the case of a Derivative Work or Collective Work, at a minimum such credit will appear where any other comparable authorship credit appears and in a manner at least as prominent as such other comparable authorship credit.
5. Representations, Warranties and Disclaimer
UNLESS OTHERWISE AGREED TO BY THE PARTIES IN WRITING, LICENSOR
OFFERS THE WORK AS-IS AND MAKES NO REPRESENTATIONS OR WARRANTIES
OF ANY KIND CONCERNING THE MATERIALS, EXPRESS, IMPLIED, STATUTORY
OR OTHERWISE, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF TITLE,
MERCHANTIBILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT,
OR THE ABSENCE OF LATENT OR OTHER DEFECTS, ACCURACY, OR THE PRESENCE OF ABSENCE OF ERRORS, WHETHER OR NOT DISCOVERABLE. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF IMPLIED WARRANTIES, SO
SUCH EXCLUSION MAY NOT APPLY TO YOU.
6. Limitation on Liability. EXCEPT TO THE EXTENT REQUIRED BY APPLICABLE
LAW, IN NO EVENT WILL LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY
FOR ANY SPECIAL, INCIDENTAL, CONSEQUENTIAL, PUNITIVE OR EXEMPLARY
DAMAGES ARISING OUT OF THIS LICENSE OR THE USE OF THE WORK, EVEN IF
LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
7. Termination

1. This License and the rights granted hereunder will terminate automatically upon any breach by You of the terms of this License. Individuals or entities who have received Derivative Works or Collective Works from You under this License, however, will not have their licenses terminated provided such individuals or entities remain in full compliance with those licenses. Sections 1, 2, 5, 6, 7, and 8 will survive any termination of this License.

2. Subject to the above terms and conditions, the license granted here is perpetual (for the duration of the applicable copyright in the Work). Notwithstanding the above, Licensor reserves the right to release the Work under different license terms or to stop distributing the Work at any time; provided, however that any such election will not serve to withdraw this License (or any other license that has been, or is required to be, granted under the terms of this License), and this License will continue in full force and effect unless terminated as stated above.
8. Miscellaneous

1. Each time You distribute or publicly digitally perform the Work or a Collective Work, the Licensor offers to the recipient a license to the Work on the same terms and conditions as the license granted to You under this License.

2. Each time You distribute or publicly digitally perform a Derivative Work, Licensor offers to the recipient a license to the original Work on the same terms and conditions as the license granted to You under this License.
3. If any provision of this License is invalid or unenforceable under applicable law, it shall not affect the validity or enforceability of the remainder of the terms of this License, and without further action by the parties to this agreement, such provision shall be reformed to the minimum extent necessary to make such provision valid and enforceable.

4. No term or provision of this License shall be deemed waived and no breach consented to unless such waiver or consent shall be in writing and signed by the party to be charged with such waiver or consent.

5. This License constitutes the entire agreement between the parties with respect to the Work licensed here. There are no understandings, agreements or representations with respect to the Work not specified here. Licensor shall not be bound by any additional provisions that may appear in any communication from You. This License may not be modified without the mutual written agreement of the Licensor and You.
Creative Commons is not a party to this License, and makes no warranty whatsoever in connection with the Work. Creative Commons will not be liable to You or any party on any legal theory for any damages whatsoever, including without limitation any general, special, incidental or consequential damages arising in connection to this license. Notwithstanding the foregoing two (2) sentences, if Creative Commons has expressly identified itself as the Licensor hereunder, it shall have all rights and obligations of Licensor.

Except for the limited purpose of indicating to the public that the Work is licensed under the CCPL, neither party will use the trademark "Creative Commons" or any related trademark or logo of Creative Commons without the prior written consent of Creative Commons. Any permitted use will be in compliance with Creative Commons' then-current trademark usage guidelines, as may be published on its website or otherwise made available upon request from time to time.

Creative Commons may be contacted at http://creativecommons.org/.
Chapter 25

The Attic

This holds material that is not really ready to be incorporated into the main body, but that I don't want to lose. Basically, ignore it, unless you'd like to help get it ready for inclusion.
The actual and fitted relative frequencies are
$$\text{Actual: } \hat{f}(y = j) = \sum_{i=1}^{n} 1(y_i = j)/n \qquad \text{Fitted: } \tilde{f}(y = j) = \sum_{i=1}^{n} f_Y(j|x_i, \hat{\theta})/n$$
Table 25.1: Actual and Poisson fitted frequencies

                 OBDV                 ERV
  Count    Actual   Fitted      Actual   Fitted
    0       0.32     0.06        0.86     0.83
    1       0.18     0.15        0.10     0.14
    2       0.11     0.19        0.02     0.02
    3       0.10     0.18        0.004    0.002
    4       0.052    0.15        0.002    0.0002
    5       0.032    0.10                 2.4e-5
We see that for the OBDV measure, there are many more actual zeros than predicted. For ERV, there are somewhat more actual zeros than fitted, but the difference is not too important.
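In Octave, the two frequencies can be computed along these lines, assuming y holds the observed counts and lambda the fitted Poisson means exp(x_i'betahat) for each observation; both names are illustrative:

    % actual vs. Poisson-fitted relative frequencies for counts 0 to 5
    for j = 0:5
        factual = mean(y == j);
        ffitted = mean(exp(-lambda) .* lambda.^j / factorial(j));
        printf("%d   %8.4f   %8.4f\n", j, factual, ffitted);
    endfor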
Why might OBDV not fit the zeros well? Perhaps the patient decides whether to make a first visit, while the doctor decides whether follow-up visits are needed. This is a principal/agent type situation, where the total number of visits depends upon the decision of both the patient and the doctor. Since different parameters may govern the two decision-makers' choices, we might expect that different parameters govern the probability of zeros versus the other counts. Let $\lambda_p$ be the parameters of the patient's demand for visits, and let $\lambda_d$ be the parameters of the doctor's decision. The patient will initiate visits according to a discrete choice model, for example, a logit model:
$$\Pr(Y = 0) = 1 - 1/\left[ 1 + \exp(-\lambda_p) \right] \qquad \Pr(Y > 0) = 1/\left[ 1 + \exp(-\lambda_p) \right]$$
The above probabilities are used to estimate the binary 0/1 hurdle process. Then, for the observations where visits are positive, a truncated Poisson density is estimated. This density is
$$f_Y(y, \lambda_d | y > 0) = \frac{f_Y(y, \lambda_d)}{\Pr(y > 0)} = \frac{f_Y(y, \lambda_d)}{1 - \exp(-\lambda_d)}$$
since, according to the Poisson model with the doctor's parameters,
$$\Pr(y = 0) = \frac{\exp(-\lambda_d) \lambda_d^0}{0!} = \exp(-\lambda_d).$$
Since the hurdle and truncated components of the overall density for $Y$ share no parameters, they may be estimated separately, which is computationally more efficient than estimating the overall model. (Recall that the BFGS algorithm, for example, will have to invert the approximated Hessian, which is of dimension $K \times K$, where $K$ is the number of parameters being estimated.)
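A minimal sketch of the truncated Poisson average log-likelihood for the positive-count observations, which could be passed to a maximization routine; X and y hold the regressors and counts of the observations with y > 0, and all names are illustrative:

    % truncated-at-zero Poisson average log-likelihood
    % (save as tpoisson_ll.m)
    function avg_ll = tpoisson_ll(theta, y, X)
        lambda = exp(X*theta);                  % conditional mean
        ll = -lambda + y .* log(lambda) - lgamma(y+1) ...
             - log(1 - exp(-lambda));           % divide by Pr(y > 0)
        avg_ll = mean(ll);
    endfunction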
Here are hurdle Poisson estimation results for OBDV, obtained from this estimation program:

**************************************************************************
MEPS data, OBDV
logit results
Strong convergence
Observations = 500
Function value   -0.58939

t-Stats
             params      t(OPG)     t(Sand.)   t(Hess)
constant    -1.5502     -2.5709    -2.5269    -2.5560
pub_ins      1.0519      3.0520     3.0027     3.0384
priv_ins     0.45867     1.7289     1.6924     1.7166
sex          0.63570     3.0873     3.1677     3.1366
age          0.018614    2.1547     2.1969     2.1807
edu          0.039606    1.0467     0.98710    1.0222
inc          0.077446    1.7655     2.1672     1.9601

Information Criteria
Consistent Akaike   639.89
Schwartz            632.89
Hannan-Quinn        614.96
Akaike              603.39
**************************************************************************
**************************************************************************
MEPS data, OBDV
tpoisson results
Strong convergence
Observations = 500
Function value   -2.7042

t-Stats
             params       t(OPG)     t(Sand.)   t(Hess)
constant     0.54254      7.4291     1.1747     3.2323
pub_ins      0.31001      6.5708     1.7573     3.7183
priv_ins     0.014382     0.29433    0.10438    0.18112
sex          0.19075     10.293      1.1890     3.6942
age          0.016683    16.148      3.5262     7.9814
edu          0.016286     4.2144     0.56547    1.6353
inc         -0.0079016   -2.3186    -0.35309   -0.96078

Information Criteria
Consistent Akaike   2754.7
Schwartz            2747.7
Hannan-Quinn        2729.8
Akaike              2718.2
**************************************************************************
Fitted and actual probabilities (NB-II fits are provided as well) are:

                        OBDV                              ERV
  Count   Actual  Fitted HP  Fitted NB-II    Actual  Fitted HP  Fitted NB-II
    0      0.32     0.32       0.34           0.86     0.86       0.86
    1      0.18     0.035      0.16           0.10     0.10       0.10
    2      0.11     0.071      0.11           0.02     0.02       0.02
    3      0.10     0.10       0.08           0.004    0.006      0.006
    4      0.052    0.11       0.06           0.002    0.002      0.002
    5      0.032    0.10       0.05                    0.0005     0.001
For the Hurdle Poisson models, the ERV fit is very accurate. The OBDV fit is not so good. Zeros are exact, but 1's and 2's are underestimated, and higher counts are overestimated. For the NB-II fits, performance is at least as good as the hurdle Poisson model, and one should recall that many fewer parameters are used. Hurdle versions of the negative binomial model are also widely used.
**************************************************************************
MEPS data, OBDV
mixnegbin results
Strong convergence
Observations = 500
Function value   -2.2312

t-Stats
                  params      t(OPG)     t(Sand.)   t(Hess)
constant          0.64852     1.3851     1.3226     1.4358
pub_ins          -0.062139   -0.23188   -0.13802   -0.18729
priv_ins          0.093396    0.46948    0.33046    0.40854
sex               0.39785     2.6121     2.2148     2.4882
age               0.015969    2.5173     2.5475     2.7151
edu              -0.049175   -1.8013    -1.7061    -1.8036
inc               0.015880    0.58386    0.76782    0.73281
ln_alpha          0.69961     2.3456     2.0396     2.4029
constant         -3.6130     -1.6126    -1.7365    -1.8411
pub_ins           2.3456      1.7527     3.7677     2.6519
priv_ins          0.77431     0.73854    1.1366     0.97338
sex               0.34886     0.80035    0.74016    0.81892
age               0.021425    1.1354     1.3032     1.3387
edu               0.22461     2.0922     1.7826     2.1470
inc               0.019227    0.20453    0.40854    0.36313
ln_alpha          2.8419      6.2497     6.8702     7.6182
logit_inv_mix     0.85186     1.7096     1.4827     1.7883

Information Criteria
Consistent Akaike   2353.8
Schwartz            2336.8
Hannan-Quinn        2293.3
Akaike              2265.2
**************************************************************************
Delta method for mix parameter st. err.
     mix       se_mix
   0.70096    0.12043

The 95% confidence interval for the mix parameter is perilously close to 1, which suggests that there may really be only one component density, rather than a mixture. Again, this is not a formal test - it is merely suggestive.
Education is interesting. For the subpopulation that is healthy, i.e., that makes relatively few visits, education seems to have a positive effect on visits. For the unhealthy group, education has a negative effect on visits. The other results are more mixed. A larger sample could help clarify things.

The following are results for a 2 component constrained mixture negative binomial model, where all the slope parameters in $\lambda_j = e^{x'\beta_j}$ are the same across the two components; the constants and overdispersion parameters are allowed to differ.
**************************************************************************
MEPS data, OBDV
mixnegbin results
Strong convergence
Observations = 500
Function value   -2.2441

t-Stats
                  params      t(OPG)     t(Sand.)   t(Hess)
constant         -0.34153    -0.94203   -0.91456   -0.97943
pub_ins           0.45320     2.6206     2.5088     2.7067
priv_ins          0.20663     1.4258     1.3105     1.3895
sex               0.37714     3.1948     3.4929     3.5319
age               0.015822    3.1212     3.7806     3.7042
edu               0.011784    0.65887    0.50362    0.58331
inc               0.014088    0.69088    0.96831    0.83408
ln_alpha          1.1798      4.6140     7.2462     6.4293
const_2           1.2621      0.47525    2.5219     1.5060
lnalpha_2         2.7769      1.5539     6.4918     4.2243
logit_inv_mix     2.4888      0.60073    3.7224     1.9693

Information Criteria
Consistent Akaike   2323.5
Schwartz            2312.5
Hannan-Quinn        2284.3
Akaike              2266.1
**************************************************************************
Delta method for mix parameter st. err.
     mix       se_mix
   0.92335    0.047318

The slope parameter estimates are pretty close to what we got with the NB-I model.
This section is based on Hamilton (1994). Up to now we've considered the behavior of the dependent variable $y_t$ as a function of other variables $x_t$. Pure time series methods consider the behavior of $y_t$ as a function only of its own lagged values, unconditional on other observable variables. One can think of this as modeling the behavior of $y_t$ after marginalizing out all other variables. While it's not immediately clear why a model that has other explanatory variables should marginalize to a linear in the parameters time series model, most time series work is done with linear models, though nonlinear time series is also a large and growing field. We'll stick with linear time
series models.

Definition 53 (Stochastic process) A stochastic process is a sequence of random variables, indexed by time:
$$\{Y_t\}_{t = -\infty}^{\infty} \tag{25.1}$$

Definition 54 (Time series) A time series is one observation of a stochastic process, over a specific interval:
$$\{y_t\}_{t=1}^{n} \tag{25.2}$$

So a time series is a sample of size $n$ from a stochastic process. It's important to keep in mind that conceptually, one could draw another sample, and that the values would be different.
Definition 55 (Autocovariance) The $j$th autocovariance of a stochastic process is
$$\gamma_{jt} = E(y_t - \mu_t)(y_{t-j} - \mu_{t-j}) \tag{25.3}$$
where $\mu_t = E(y_t)$.

Definition 56 (Covariance (weak) stationarity) A stochastic process is covariance stationary if it has time constant mean and autocovariances of all orders:
$$\mu_t = \mu, \; \forall t \qquad \gamma_{jt} = \gamma_j, \; \forall t$$
As we've seen, this implies that $\gamma_j = \gamma_{-j}$: the autocovariances depend only on the interval between observations, not on the time of the observations.

Definition 57 (Strong stationarity) A stochastic process is strongly stationary if the joint distribution of an arbitrary collection of the $\{Y_t\}$ doesn't depend on $t$.
What is the mean of $Y_t$? The time series is one sample from the stochastic process. One could think of $M$ repeated samples from the stochastic process, e.g., $\{y_t^m\}$. By a LLN, we would expect that
$$\lim_{M \rightarrow \infty} \frac{1}{M} \sum_{m=1}^{M} y_t^m \stackrel{p}{\rightarrow} E(Y_t)$$
The problem is, we have only one sample to work with, since we can't go back in time and collect another. How can $E(Y_t)$ be estimated then? It turns out that ergodicity is the needed property.
Definition 58 (Ergodicity) A stationary stochastic process is ergodic (for the mean) if the time average converges to the mean:
$$\frac{1}{n} \sum_{t=1}^{n} y_t \stackrel{p}{\rightarrow} \mu \tag{25.4}$$
A sufficient condition for ergodicity is that the autocovariances be absolutely summable:
$$\sum_{j=0}^{\infty} |\gamma_j| < \infty$$
This implies that the autocovariances die off, so that the $y_t$ are not so strongly dependent that a LLN fails to hold.

Definition 59 (Autocorrelation) The $j$th autocorrelation, $\rho_j$, is just the $j$th autocovariance divided by the variance:
$$\rho_j = \frac{\gamma_j}{\gamma_0} \tag{25.5}$$
Definition 60 (White noise) White noise is just the time series literature term for a classical error: mean zero, constant variance, and no serial correlation.

The $q$th order moving average (MA) process is
$$y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}$$
where $\varepsilon_t$ is white noise. The variance is
$$\gamma_0 = E(y_t - \mu)^2 = E(\varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q})^2 = \sigma^2 \left( 1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2 \right)$$
Similarly, the autocovariances are
$$\gamma_j = \left( \theta_j + \theta_{j+1} \theta_1 + \theta_{j+2} \theta_2 + \cdots + \theta_q \theta_{q-j} \right) \sigma^2, \; j \leq q$$
and $\gamma_j = 0$ for $j > q$. Therefore an MA(q) process is necessarily covariance stationary and ergodic, as long as $\sigma^2$ and all of the $\theta_j$ are finite.
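For instance, the variance formula is easy to verify by simulation in Octave (a sketch for an MA(2) with made-up coefficients):

    % check gamma_0 = sigma^2 (1 + theta1^2 + theta2^2) for an MA(2)
    n = 100000;  theta1 = 0.5;  theta2 = -0.3;  sigma = 1;
    e = sigma*randn(n,1);
    y = e(3:n) + theta1*e(2:n-1) + theta2*e(1:n-2);
    printf("sample gamma_0 = %f, theoretical = %f\n", ...
           var(y), sigma^2*(1 + theta1^2 + theta2^2));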
The $p$th order autoregressive (AR) process can be cast as a first order vector difference equation by stacking $y_t$ and $p - 1$ of its lags:
$$\begin{bmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{bmatrix} = \begin{bmatrix} c \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} \phi_1 & \phi_2 & \cdots & \phi_{p-1} & \phi_p \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} \begin{bmatrix} y_{t-1} \\ y_{t-2} \\ \vdots \\ y_{t-p} \end{bmatrix} + \begin{bmatrix} \varepsilon_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
or
$$Y_t = C + F Y_{t-1} + E_t$$
With this, we can recursively work forward in time:
$$Y_{t+1} = C + F Y_t + E_{t+1} = C + F \left( C + F Y_{t-1} + E_t \right) + E_{t+1} = C + FC + F^2 Y_{t-1} + F E_t + E_{t+1}$$
and
$$Y_{t+2} = C + F Y_{t+1} + E_{t+2} = C + FC + F^2 C + F^3 Y_{t-1} + F^2 E_t + F E_{t+1} + E_{t+2}$$
or in general
$$Y_{t+j} = C + FC + \cdots + F^j C + F^{j+1} Y_{t-1} + F^j E_t + F^{j-1} E_{t+1} + \cdots + E_{t+j}$$
Consider the impact of a shock in period $t$ on $y_{t+j}$. This is simply
$$\frac{\partial y_{t+j}}{\partial \varepsilon_t} = F^j_{(1,1)}$$
If the system is to be stationary, then as we move forward in time this impact must die off. Otherwise a shock causes a permanent change in the mean of $y_t$. Therefore, stationarity requires that
$$\lim_{j \rightarrow \infty} F^j_{(1,1)} = 0$$
Consider the eigenvalues of the matrix $F$. These are the $\lambda$ such that
$$|F - \lambda I_P| = 0$$
The determinant here can be expressed as a polynomial. For example, for $p = 1$, the matrix $F$ is simply $F = \phi_1$, so $|\phi_1 - \lambda| = 0$ can be written as $\phi_1 - \lambda = 0$. When $p = 2$, the matrix $F$ is
$$F = \begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix}$$
so
$$F - \lambda I_P = \begin{bmatrix} \phi_1 - \lambda & \phi_2 \\ 1 & -\lambda \end{bmatrix}$$
and
$$|F - \lambda I_P| = \lambda^2 - \lambda \phi_1 - \phi_2$$
So the eigenvalues are the roots of the polynomial
$$\lambda^2 - \lambda \phi_1 - \phi_2 = 0$$
which can be found using the quadratic equation. This generalizes. For a $p$th order AR process, the eigenvalues are the roots of
$$\lambda^p - \lambda^{p-1} \phi_1 - \lambda^{p-2} \phi_2 - \cdots - \lambda \phi_{p-1} - \phi_p = 0$$
Supposing that all of the roots of this polynomial are distinct, then the matrix $F$ can be factored as
$$F = T \Lambda T^{-1}$$
where $T$ is the matrix which has as its columns the eigenvectors of $F$, and $\Lambda$ is a diagonal matrix with the eigenvalues on the main diagonal. Using this decomposition, we can write
$$F^j = \left( T \Lambda T^{-1} \right) \left( T \Lambda T^{-1} \right) \cdots \left( T \Lambda T^{-1} \right)$$
where $T \Lambda T^{-1}$ is repeated $j$ times. This gives
$$F^j = T \Lambda^j T^{-1}$$
and
$$\Lambda^j = \begin{bmatrix} \lambda_1^j & 0 & \cdots & 0 \\ 0 & \lambda_2^j & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_p^j \end{bmatrix}$$
so $\lim_{j \rightarrow \infty} F^j_{(1,1)} = 0$ requires that
$$|\lambda_i| < 1, \; i = 1, 2, \ldots, p$$
e.g., the eigenvalues must be less than one in absolute value. It may be the case that some eigenvalues are complex-valued; the requirement is then that the eigenvalues be less than one in modulus, where the modulus of a complex number $a + bi$ is
$$\text{mod}(a + bi) = \sqrt{a^2 + b^2}$$
This leads to the famous statement that stationarity requires the roots of the determinantal polynomial to lie inside the complex unit circle.
Dynamic multipliers: $\partial y_{t+j} / \partial \varepsilon_t = F^j_{(1,1)}$ is a dynamic multiplier or an impulse-response function. Real eigenvalues lead to steady movements, whereas complex eigenvalues lead to oscillatory behavior. Of course, when there are multiple eigenvalues the overall effect can be a mixture.

(insert pictures here)
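In Octave, checking stationarity of an AR(p) amounts to forming the companion matrix F and examining the moduli of its eigenvalues; a sketch for an AR(2) with made-up coefficients:

    % stationarity check for y_t = 1.1 y_{t-1} - 0.18 y_{t-2} + e_t
    phi = [1.1, -0.18];
    F = [phi; 1, 0];            % companion matrix
    mods = abs(eig(F));         % abs() of a complex number is its modulus
    printf("eigenvalue moduli: %f  %f\n", mods(1), mods(2));
    stationary = all(mods < 1)  % 1 means stationary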
The lag operator $L$ is defined by
$$L y_t = y_{t-1}$$
The lag operator is defined to behave just as an algebraic quantity, e.g.,
$$L^2 y_t = L(L y_t) = L y_{t-1} = y_{t-2}$$
Using the lag operator, a mean-zero AR($p$) process can be written as
$$y_t \left( 1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p \right) = \varepsilon_t$$
Factor this polynomial as
$$1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p = (1 - \lambda_1 L)(1 - \lambda_2 L) \cdots (1 - \lambda_p L)$$
For the moment, just assume that the $\lambda_i$ are coefficients to be determined. Since $L$ is defined to operate as an algebraic quantity, determination of the $\lambda_i$ is the same as determination of the $\lambda_i$ such that the following two expressions are the same for all $z$:
$$1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p = (1 - \lambda_1 z)(1 - \lambda_2 z) \cdots (1 - \lambda_p z)$$
Multiply both sides by $z^{-p}$:
$$z^{-p} - \phi_1 z^{1-p} - \phi_2 z^{2-p} - \cdots - \phi_{p-1} z^{-1} - \phi_p = (z^{-1} - \lambda_1)(z^{-1} - \lambda_2) \cdots (z^{-1} - \lambda_p)$$
and now define $\lambda = z^{-1}$, so we get
$$\lambda^p - \phi_1 \lambda^{p-1} - \phi_2 \lambda^{p-2} - \cdots - \phi_{p-1} \lambda - \phi_p = (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_p)$$
The LHS is precisely the determinantal polynomial that gives the eigenvalues of $F$. Therefore, the $\lambda_i$ that are the coefficients of the factorization are simply the eigenvalues of the matrix $F$.
Consider the simple AR(1) model
$$(1 - \phi L) y_t = \varepsilon_t$$
with $|\phi| < 1$. Multiply both sides by $1 + \phi L + \phi^2 L^2 + \ldots + \phi^j L^j$ to get
$$\left( 1 + \phi L + \phi^2 L^2 + \ldots + \phi^j L^j \right) (1 - \phi L) y_t = \left( 1 + \phi L + \phi^2 L^2 + \ldots + \phi^j L^j \right) \varepsilon_t$$
or
$$\left( 1 + \phi L + \phi^2 L^2 + \ldots + \phi^j L^j - \phi L - \phi^2 L^2 - \ldots - \phi^j L^j - \phi^{j+1} L^{j+1} \right) y_t = \left( 1 + \phi L + \phi^2 L^2 + \ldots + \phi^j L^j \right) \varepsilon_t$$
so
$$\left( 1 - \phi^{j+1} L^{j+1} \right) y_t = \left( 1 + \phi L + \phi^2 L^2 + \ldots + \phi^j L^j \right) \varepsilon_t$$
or
$$y_t = \phi^{j+1} L^{j+1} y_t + \left( 1 + \phi L + \phi^2 L^2 + \ldots + \phi^j L^j \right) \varepsilon_t$$
Now as $j \rightarrow \infty$, $\phi^{j+1} L^{j+1} y_t \rightarrow 0$, since $|\phi| < 1$, so
$$y_t \cong \left( 1 + \phi L + \phi^2 L^2 + \ldots + \phi^j L^j \right) \varepsilon_t$$
and the approximation becomes better and better as $j$ increases arbitrarily. Also, $(1 - \phi L) y_t = \varepsilon_t$; substituting this into the above equation we have
$$y_t \cong \left( 1 + \phi L + \phi^2 L^2 + \ldots + \phi^j L^j \right) (1 - \phi L) y_t$$
so
$$\left( 1 + \phi L + \phi^2 L^2 + \ldots + \phi^j L^j \right) (1 - \phi L) \cong 1$$
Thus, for $|\phi| < 1$, define
$$(1 - \phi L)^{-1} = \sum_{j=0}^{\infty} \phi^j L^j$$
Recall that our mean-zero AR($p$) process
$$y_t \left( 1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p \right) = \varepsilon_t$$
can be written using the factorization
$$y_t (1 - \lambda_1 L)(1 - \lambda_2 L) \cdots (1 - \lambda_p L) = \varepsilon_t$$
where the $\lambda_i$ are the eigenvalues of $F$, and given stationarity, all the $|\lambda_i| < 1$. Therefore, we can invert each first order polynomial on the LHS to get
$$y_t = \left( \sum_{j=0}^{\infty} \lambda_1^j L^j \right) \left( \sum_{j=0}^{\infty} \lambda_2^j L^j \right) \cdots \left( \sum_{j=0}^{\infty} \lambda_p^j L^j \right) \varepsilon_t$$
The RHS is a product of infinite-order polynomials in $L$, which can be represented as
$$y_t = \left( 1 + \psi_1 L + \psi_2 L^2 + \cdots \right) \varepsilon_t$$
where the $\psi_i$ are real-valued and absolutely summable. The $\psi_i$ are formed of products of powers of the $\lambda_i$, which are in turn functions of the $\phi_i$. The $\psi_i$ are real-valued because any complex-valued $\lambda_i$ always occur in conjugate pairs. This means that if $a + bi$ is an eigenvalue of $F$, then so is $a - bi$. In multiplication
$$(a + bi)(a - bi) = a^2 - abi + abi - b^2 i^2 = a^2 + b^2$$
which is real-valued. This shows that an AR($p$) process is representable as an infinite-order MA($q$) process.
Recall the recursive expansion of the vector difference equation. If the process is mean zero, the terms involving $C$ drop out; lagging the expansion by $j$ periods gives
$$Y_t = F^{j+1} Y_{t-j-1} + F^j E_{t-j} + F^{j-1} E_{t-j+1} + \cdots + E_t$$
As $j \rightarrow \infty$, the lagged $Y$ on the RHS drops out. The $E_{t-s}$ are vectors of zeros except for their first element, so we see that the first equation here, in the limit, is just
$$y_t = \sum_{j=0}^{\infty} \left( F^j \right)_{1,1} \varepsilon_{t-j}$$
so the $\psi_j$ are simply the $(1,1)$ elements of $F^j$ (and they are formed of products of powers of the eigenvalues as well, recalling the previous factorization of $F^j$).
Assuming stationarity, $E(y_t) = \mu, \forall t$, so
$$\mu = c + \phi_1 \mu + \phi_2 \mu + \ldots + \phi_p \mu$$
so
$$\mu = \frac{c}{1 - \phi_1 - \phi_2 - \ldots - \phi_p}$$
and
$$c = \mu - \phi_1 \mu - \ldots - \phi_p \mu$$
The variance satisfies
$$\gamma_0 = \phi_1 \gamma_1 + \phi_2 \gamma_2 + \ldots + \phi_p \gamma_p + \sigma^2$$
The autocovariances of orders $j \geq 1$ follow the rule
$$\gamma_j = \phi_1 \gamma_{j-1} + \phi_2 \gamma_{j-2} + \ldots + \phi_p \gamma_{j-p}$$
Using the fact that $\gamma_{-j} = \gamma_j$, the $p + 1$ equations for $j = 0, 1, \ldots, p$ can be solved for the unknowns $\gamma_0, \gamma_1, \ldots, \gamma_p$ in terms of $\sigma^2$ and the $\phi_i$; the autocovariances for $j > p$ then follow from the recursion.
Now consider the MA($q$) process
$$y_t - \mu = \left( 1 + \theta_1 L + \ldots + \theta_q L^q \right) \varepsilon_t$$
As before, the polynomial on the RHS can be factored as
$$1 + \theta_1 L + \ldots + \theta_q L^q = (1 - \eta_1 L)(1 - \eta_2 L) \cdots (1 - \eta_q L)$$
and each of the $(1 - \eta_i L)$ can be inverted as long as $|\eta_i| < 1$. If this is the case, then we can write
$$\left( 1 + \theta_1 L + \ldots + \theta_q L^q \right)^{-1} (y_t - \mu) = \varepsilon_t$$
where $\left( 1 + \theta_1 L + \ldots + \theta_q L^q \right)^{-1}$ will be an infinite-order polynomial in $L$, so we get
$$\sum_{j=0}^{\infty} -\delta_j L^j (y_{t-j} - \mu) = \varepsilon_t$$
with $\delta_0 = -1$, or
$$y_t = c + \delta_1 y_{t-1} + \delta_2 y_{t-2} + \ldots + \varepsilon_t$$
where
$$c = \mu + \delta_1 \mu + \delta_2 \mu + \ldots$$
So we see that an MA($q$) has an infinite AR representation, as long as the $|\eta_i| < 1$, $i = 1, 2, \ldots, q$.
It turns out that one can always manipulate the parameters of an MA($q$) process to find an invertible representation. For example, the two MA(1) processes
$$y_t - \mu = (1 - \theta L) \varepsilon_t$$
and
$$y_t^* - \mu = (1 - \theta^{-1} L) \varepsilon_t^*$$
have exactly the same moments if
$$\sigma_{\varepsilon^*}^2 = \sigma_{\varepsilon}^2 \theta^2$$
For example, we've seen that $\gamma_0 = \sigma^2 (1 + \theta^2)$. Given the above relationships amongst the parameters,
$$\gamma_0^* = \sigma_{\varepsilon}^2 \theta^2 \left( 1 + \theta^{-2} \right) = \sigma^2 (1 + \theta^2)$$
so the variances are the same. It turns out that all the autocovariances will be the same, as is easily checked. This means that the two MA processes are observationally equivalent.
For a given MA($q$) process, it's always possible to manipulate the parameters to find an invertible representation. It's important to find an invertible representation, since it's the only representation that allows one to represent $\varepsilon_t$ as a function of past $y$'s.

Why is invertibility important? The most important reason is that it provides a justification for the use of parsimonious models. Since an AR(1) process has an MA($\infty$) representation, one can reverse the argument and note that at least some MA($\infty$) processes have an AR(1) representation. At the time of estimation, it's a lot easier to estimate the single AR(1) coefficient rather than the infinite number of coefficients associated with the MA representation.

This is the reason that ARMA models are popular. Combining low-order AR and MA models can usually offer a satisfactory representation of univariate time series data with a reasonable number of parameters.

Stationarity and invertibility of ARMA models is similar to what we've seen - we won't go into the details. Likewise, calculating moments is similar.
Exercise 61 Calculate the autocovariances of an ARMA(1,1) model: $(1 + \phi L) y_t = c + (1 + \theta L) \varepsilon_t$
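A simulation can be used to check an analytical answer to the exercise; the sketch below uses illustrative parameter values (and drops $c$, which only shifts the mean):

    % simulate (1 + phi*L) y_t = (1 + theta*L) e_t and estimate gamma_j
    n = 200000;  phi = -0.5;  theta = 0.3;
    e = randn(n,1);
    y = filter([1 theta], [1 phi], e);   % ARMA(1,1) recursion
    yd = y - mean(y);
    for j = 0:2
        gammaj = (yd(1+j:n)' * yd(1:n-j)) / n;
        printf("gamma_%d = %f\n", j, gammaj);
    endfor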
Bibliography

[1] Davidson, R. and J.G. MacKinnon (1993) Estimation and Inference in Econometrics, Oxford Univ. Press.

[3] Gallant, A.R. (1985)

[5] Hamilton, J. (1994) Time Series Analysis, Princeton Univ. Press.

[6] Hayashi, F. (2000) Econometrics, Princeton Univ. Press.

[7] Wooldridge (2003)
Index

ARCH, 294
R-squared, centered, 29
residuals, 23