WANCHUAN LIN
11/02/2007
Outline:
1. Math Review
1 Math Review
or equivalently
$$\lim_{n\to\infty}\Pr\left(\omega : \|x_n(\omega)-x(\omega)\| < \varepsilon\right) = 1,$$
we then write
$$x_n \xrightarrow{p} c \quad\text{or}\quad \operatorname*{plim}_{n\to\infty} x_n = c.$$
Convergence in Distribution
Definition 2-1: $x_n$ converges in distribution (or in law, or weakly) to a random variable $x$ with cdf $F(x)$ if and only if $F_n(x)$ converges to $F(x)$, i.e., $\lim_{n\to\infty}|F_n(x)-F(x)| = 0$, at each continuity point of $F(x)$. We then write
$$x_n \xrightarrow{d} x.$$
Definition 2-2: Let the cdf $F_n(x)$ of the random variable $x_n$ depend on $n$, $n = 1, 2, \ldots$. If $F(x)$ is a distribution function and if $\lim_{n\to\infty}F_n(x) = F(x)$ for every continuity point of $F(x)$, then the sequence of random variables $x_1, x_2, \ldots$ is said to converge in distribution to a random variable with distribution function $F(x)$. We sometimes call $F(x)$ the limiting distribution.
Definition 2-3: A sequence of random variables $x_1, x_2, \ldots$ converges in distribution to a random variable $x$ if, at all continuity points of $F_x(x)$,
$$\lim_{n\to\infty} F_{x_n}(x) = F_x(x).$$
2. Slutsky Theorem
If $x_n$ converges in distribution to a random variable $x$ and $y_n$ converges in probability to a constant $c$, then for a continuous function $g(\cdot,\cdot)$,
$$g(x_n, y_n) \xrightarrow{d} g(x, c).$$
(Footnote 1: where $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$.)
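As an illustration of how the WLLN, the CLT, and Slutsky's theorem combine, here is a minimal Python simulation sketch (not from the original notes; the exponential distribution and the sample sizes are illustrative choices): the t-ratio of non-normal data is asymptotically standard normal because the sample standard deviation in the denominator converges in probability to a constant.

```python
# Minimal simulation sketch (illustrative values): the t-ratio combines the
# CLT with Slutsky's theorem.  For exponential(1) data, mu = sigma = 1.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1_000, 5_000

x = rng.exponential(scale=1.0, size=(reps, n))  # non-normal data, mean 1, sd 1
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)                       # s ->p sigma = 1 by the WLLN

# CLT: sqrt(n)*(xbar - mu) ->d N(0, sigma^2); Slutsky: dividing by s, which
# converges in probability to the constant sigma, leaves an N(0, 1) limit.
t = np.sqrt(n) * (xbar - 1.0) / s
print(round(t.mean(), 2), round(t.std(), 2))    # ~ 0 and ~ 1
print(np.mean(np.abs(t) < 1.96))                # ~ 0.95
```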
The GLS estimator minimizes the generalized sum of squares $(y-X\beta)^T\Sigma^{-1}(y-X\beta)$, which is solved by
$$\hat\beta_{GLS} = \left(X^T\Sigma^{-1}X\right)^{-1}X^T\Sigma^{-1}y.$$
Since we can write $P^T = \Sigma^{-1/2}$ so that $\Sigma^{-1} = P^TP$, where $P$ is a triangular matrix obtained from the Cholesky decomposition, we have
$$(y-X\beta)^T\Sigma^{-1}(y-X\beta) = (y-X\beta)^TP^TP(y-X\beta) = (Py-PX\beta)^T(Py-PX\beta).$$
Transforming the regression equation,
$$y = X\beta+\varepsilon \quad\Longrightarrow\quad Py = PX\beta+P\varepsilon \quad\Longrightarrow\quad y^* = X^*\beta+\varepsilon^*.$$
So we have a new regression equation $y^* = X^*\beta+\varepsilon^*$, where if we examine the variance of the new errors $\varepsilon^*$, we have
$$\operatorname{Var}(\varepsilon^*) = \operatorname{Var}(P\varepsilon) = P\Sigma P^T = I.$$
Hence, the classical regression model applies to this transformed model. In the classical model, OLS is efficient.
$$\hat\beta^*_{OLS} = \left(X^{*T}X^*\right)^{-1}X^{*T}y^* = \left(X^TP^TPX\right)^{-1}X^TP^TPy = \left(X^T\Sigma^{-1}X\right)^{-1}X^T\Sigma^{-1}y = \hat\beta_{GLS}.$$
So you can view OLS as one kind of GLS, one which uses the weighting matrix $I$ instead of $P$.
Properties of GLS:
$\hat\beta_{GLS}$ is efficient: the GLS estimator is the minimum variance unbiased estimator in the generalized regression model.
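One way to check the algebra numerically is the following minimal sketch (hypothetical design: a known AR(1)-type covariance matrix $\Sigma$; all coefficient values illustrative). It verifies that OLS on the whitened data equals the GLS formula.

```python
# Minimal numerical sketch (hypothetical AR(1)-type Sigma): OLS on the whitened
# data P y = P X b + P e, with P'P = inv(Sigma), reproduces the GLS formula.
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
X = rng.normal(size=(n, k))
beta = np.array([1.0, -2.0, 0.5])

rho = 0.6                                      # illustrative AR(1) correlation
idx = np.arange(n)
Sigma = rho ** np.abs(np.subtract.outer(idx, idx))
L = np.linalg.cholesky(Sigma)                  # Sigma = L L'
y = X @ beta + L @ rng.normal(size=n)          # errors with Var(e) = Sigma

Sigma_inv = np.linalg.inv(Sigma)
b_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)

P = np.linalg.inv(L)                           # then P'P = inv(Sigma)
b_whitened = np.linalg.lstsq(P @ X, P @ y, rcond=None)[0]
print(np.allclose(b_gls, b_whitened))          # True
```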
2 OLS with Endogenous Variables

2.1 OLS under the Classical Assumptions

The Classical Linear Regression Model assumes:
1. $X$ is $n \times K$ with rank $K$;
2. $X$ is a nonstochastic matrix;
3. $E(\varepsilon|X) = 0$;
4. $\operatorname{Var}(\varepsilon|X) = \sigma^2 I$.
Property 1: $E\left(\hat\beta_{OLS}|X\right) = \beta$.
$$E\left(\hat\beta_{OLS}|X\right) = E\left(\left(X^TX\right)^{-1}X^Ty \,\middle|\, X\right) = \left(X^TX\right)^{-1}X^TE(y|X) = \left(X^TX\right)^{-1}X^TX\beta = \beta.$$
Property 2: $\operatorname{Var}\left(\hat\beta_{OLS}|X\right) = \sigma^2\left(X^TX\right)^{-1}$.
$$\begin{aligned}
\operatorname{Var}\left(\hat\beta_{OLS}|X\right) &= \operatorname{Var}\left(\left(X^TX\right)^{-1}X^Ty \,\middle|\, X\right) \\
&= \left(X^TX\right)^{-1}X^T\operatorname{Var}(y|X)\,X\left(X^TX\right)^{-1} \\
&= \left(X^TX\right)^{-1}X^T\left(\sigma^2 I_n\right)X\left(X^TX\right)^{-1} \\
&= \sigma^2\left(X^TX\right)^{-1}\left(X^TX\right)\left(X^TX\right)^{-1} \\
&= \sigma^2\left(X^TX\right)^{-1}.
\end{aligned}$$
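Both properties can be verified by simulation. Here is a minimal Monte Carlo sketch (hypothetical fixed design; all values illustrative), drawing fresh classical errors many times and comparing the sample moments of $\hat\beta_{OLS}$ to the formulas above.

```python
# Minimal Monte Carlo sketch (hypothetical fixed design): with exogenous X, the
# simulated mean and covariance of b_OLS match beta and sigma^2 (X'X)^{-1}.
import numpy as np

rng = np.random.default_rng(4)
n, sigma, reps = 100, 1.5, 20_000
X = rng.normal(size=(n, 2))                   # design held fixed (nonstochastic X)
beta = np.array([1.0, -1.0])
XtX_inv = np.linalg.inv(X.T @ X)

E = rng.normal(scale=sigma, size=(n, reps))   # classical errors, Var = sigma^2 I
Y = (X @ beta)[:, None] + E                   # one column of y per replication
B = (XtX_inv @ X.T @ Y).T                     # OLS estimates, one row per rep

print(B.mean(axis=0))                         # ~ beta               (Property 1)
print(np.cov(B.T))                            # ~ sigma^2 (X'X)^{-1} (Property 2)
print(sigma**2 * XtX_inv)
```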
2.2 Least Squares Estimator with Endogenous Regressors

When the regressors are endogenous, the third assumption of the Classical Linear Regression Model fails.
$$\hat\beta_{OLS} = \left(X^TX\right)^{-1}X^Ty = \left(X^TX\right)^{-1}X^T(X\beta+\varepsilon) = \beta + \left(X^TX\right)^{-1}X^T\varepsilon,$$
so
$$\operatorname*{plim}_{n\to\infty}\hat\beta_{OLS} = \beta + \left(\operatorname*{plim}_{n\to\infty}\frac{1}{n}X^TX\right)^{-1}\operatorname*{plim}_{n\to\infty}\frac{1}{n}X^T\varepsilon.$$
By the WLLN,
$$\operatorname*{plim}_{n\to\infty}\frac{1}{n}X^TX = \operatorname*{plim}_{n\to\infty}\frac{1}{n}\sum_{i=1}^n x_ix_i^T = E\left(x_ix_i^T\right) = \Sigma_{XX},$$
$$\operatorname*{plim}_{n\to\infty}\frac{1}{n}X^T\varepsilon = \operatorname*{plim}_{n\to\infty}\frac{1}{n}\sum_{i=1}^n x_i\varepsilon_i = E(x_i\varepsilon_i) = \Sigma_{X\varepsilon},$$
where $x_i^T$ is the $i$th row of the matrix $X$. But $E(x_i\varepsilon_i) \neq 0$, therefore
$$\operatorname*{plim}_{n\to\infty}\hat\beta_{OLS} = \beta + \Sigma_{XX}^{-1}\Sigma_{X\varepsilon} \neq \beta,$$
so it is as if we were using the wrong moment conditions. That is, the analogy principle does not hold: the moment conditions that OLS imposes in the sample have no valid population counterparts.
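To make the inconsistency concrete, here is a minimal simulation sketch (hypothetical data-generating process; the coefficient values are illustrative, not from the notes). The bias $\Sigma_{XX}^{-1}\Sigma_{X\varepsilon}$ does not shrink as $n$ grows.

```python
# Minimal sketch (hypothetical DGP): the regressor is built from the error, so
# E(x_i e_i) = 0.8 != 0 and OLS converges to beta + Sxx^{-1} Sxe, not to beta.
import numpy as np

rng = np.random.default_rng(1)
beta = 1.0
for n in (1_000, 1_000_000):
    e = rng.normal(size=n)
    x = 0.8 * e + rng.normal(size=n)   # endogenous: Cov(x, e) = 0.8
    y = beta * x + e
    b_ols = (x @ y) / (x @ x)
    # population limit: beta + E(xe)/E(x^2) = 1 + 0.8/1.64 ~ 1.488
    print(n, round(b_ols, 3))
```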
Examples of endogeneity:

1. Measurement Error.
Suppose the true model is
$$y_i = \alpha + \beta z_i + u_i,$$
but $z_i$ is not observed. Instead, we observe $x_i$, where
$$x_i = z_i + v_i.$$
Assume that $(u_i, v_i) \sim \text{iid}\,(0, \Sigma)$, i.e.,
$$\operatorname{Var}(u_i) = \sigma_{11}, \qquad \operatorname{Var}(v_i) = \sigma_{22}, \qquad \operatorname{Cov}(u_i, v_i) = \sigma_{12}.$$
Substituting $z_i = x_i - v_i$,
$$y_i = \alpha + \beta(x_i - v_i) + u_i = \alpha + \beta x_i + (u_i - \beta v_i) = \alpha + \beta x_i + \varepsilon_i,$$
where $\varepsilon_i = u_i - \beta v_i$. It follows that
$$E(x_i\varepsilon_i) = E\left((z_i+v_i)(u_i-\beta v_i)\right) = \sigma_{12} - \beta\sigma_{22} \neq 0,$$
so the observed regressor $x_i$ is endogenous (a simulation sketch of this attenuation effect appears after the examples).
2. Omitted Variables.
True model:
$$y_i = x_i^T\beta + z_i^T\gamma + \varepsilon_i, \qquad \varepsilon_i \sim \text{iid}\left(0, \sigma^2\right).$$
Hence, if $z_i$ is omitted and $y_i$ is regressed on $x_i$ alone, the error term becomes $\varepsilon_i^* = z_i^T\gamma + \varepsilon_i$, and
$$E\left(x_i\varepsilon_i^*\right) = E\left(x_iz_i^T\right)\gamma \neq 0$$
whenever the included and omitted regressors are correlated.
3. Simultaneous Equations (two quantities are jointly determined by one another's value; here, the joint determination of price and quantity).
Demand equation:
$$q_t = x_t^T\beta + \alpha p_t + \varepsilon_t \qquad (\alpha < 0)$$
Inverse supply equation:
$$p_t = z_t^T\gamma + \delta q_t + \eta_t \qquad (\delta > 0)$$
Assume that $E(\varepsilon_t) = E(\eta_t) = 0$ and that $(\varepsilon_t, \eta_t)$ are independent of $x_t$ and $z_t$.
For the demand equation, solving the two equations for $p_t$ shows that
$$E(p_t\varepsilon_t) = \frac{\delta\sigma_\varepsilon^2 + \sigma_{12}}{1-\alpha\delta} \neq 0,$$
where $\sigma_{12} = \operatorname{Cov}(\varepsilon_t, \eta_t)$, so price is endogenous in the demand equation.
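Here is the promised minimal simulation sketch of example 1 (hypothetical values: $\beta = 2$, $\operatorname{Var}(z) = 1$, $\sigma_{22} = 0.5$, $\sigma_{12} = 0$; none of these numbers are from the notes). The OLS slope converges to the attenuated value $\beta\operatorname{Var}(z)/(\operatorname{Var}(z)+\sigma_{22})$, not to $\beta$.

```python
# Minimal simulation sketch of the measurement-error example (illustrative
# values).  The OLS slope converges to
# (beta*Var(z) + sigma12) / (Var(z) + sigma22) = 2/1.5 = 4/3, not to beta = 2.
import numpy as np

rng = np.random.default_rng(2)
n, alpha, beta, s22 = 500_000, 0.5, 2.0, 0.5

z = rng.normal(size=n)                           # true but unobserved regressor
u = rng.normal(size=n)
v = rng.normal(scale=np.sqrt(s22), size=n)       # measurement error
x = z + v                                        # the observed regressor
y = alpha + beta * z + u

slope = np.cov(x, y, ddof=0)[0, 1] / x.var()
print(slope, beta / (1.0 + s22))                 # both ~ 1.333
```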
a. IV Estimator
We can easily get the IV estimator $\hat\beta_{IV}$ of $\beta$:
$$\hat\beta_{IV} = \left(Z^TX\right)^{-1}Z^Ty = \left(Z^TX\right)^{-1}Z^T(X\beta+\varepsilon) = \beta + \left(\frac{1}{n}Z^TX\right)^{-1}\frac{1}{n}Z^T\varepsilon.$$
So
$$\operatorname*{plim}_n\hat\beta_{IV} = \beta + \left(\operatorname*{plim}_n\frac{1}{n}Z^TX\right)^{-1}\operatorname*{plim}_n\frac{1}{n}Z^T\varepsilon.$$
By the WLLN,
$$\operatorname*{plim}_n\frac{1}{n}Z^TX = \operatorname*{plim}_n\frac{1}{n}\sum_i z_ix_i^T = E\left(z_ix_i^T\right) = \Sigma_{ZX},$$
$$\operatorname*{plim}_n\frac{1}{n}Z^T\varepsilon = \operatorname*{plim}_n\frac{1}{n}\sum_i z_i\varepsilon_i = E(z_i\varepsilon_i) = \Sigma_{Z\varepsilon} = 0.$$
Therefore,
$$\operatorname*{plim}_n\hat\beta_{IV} = \beta,$$
so the IV estimator is a consistent estimator of $\beta$.
Moreover, by the CLT,
$$\sqrt{n}\left(\hat\beta_{IV}-\beta\right) \xrightarrow{d} N\left(0,\; \Sigma_{ZX}^{-1}V_0\left(\Sigma_{ZX}^{-1}\right)^T\right),$$
where $V_0 = E\left(\varepsilon_i^2z_iz_i^T\right)$.
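A minimal sketch of the just-identified scalar case (hypothetical DGP; coefficients illustrative): the IV estimator is consistent while OLS is not, and a standard error follows from the limit above with sample-moment plug-ins for $\Sigma_{ZX}$ and $V_0$.

```python
# Minimal sketch (hypothetical DGP, just-identified case l = K = 1):
# b_IV = (Z'X)^{-1} Z'y is consistent, with standard error from
# N(0, Sigma_ZX^{-1} V0 Sigma_ZX^{-1}) and sample-moment plug-ins.
import numpy as np

rng = np.random.default_rng(3)
n, beta = 200_000, 1.0

z = rng.normal(size=n)                       # instrument: E(z e) = 0
e = rng.normal(size=n)
x = 0.5 * z + 0.8 * e + rng.normal(size=n)   # endogenous: E(x e) = 0.8
y = beta * x + e

b_iv = (z @ y) / (z @ x)                     # (Z'X)^{-1} Z'y
b_ols = (x @ y) / (x @ x)                    # inconsistent benchmark

eps = y - b_iv * x
S_zx = (z @ x) / n                           # plug-in for Sigma_ZX
V0_hat = np.mean(eps**2 * z**2)              # plug-in for V0 = E(e^2 z^2)
se = np.sqrt(V0_hat / S_zx**2 / n)
print(b_ols, b_iv, se)                       # b_ols ~ 1.42, b_iv ~ 1.0
```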
A consistent estimator for $\Sigma_{ZX}$ is
$$\widehat{\Sigma}_{ZX} = \frac{1}{n}\sum_i z_ix_i^T,$$
and, under homoskedasticity, a consistent estimator for $V_0$ is
$$\widehat{V}_0 = \hat\sigma^2\,\frac{1}{n}Z^TZ,$$
where $\hat\sigma^2$ is any consistent estimator of $\sigma^2$, e.g., the unbiased estimator $\frac{1}{n-K}\sum_i\hat\varepsilon_i^2$ or the biased one $\frac{1}{n}\sum_i\hat\varepsilon_i^2$.
a. Instrumental Variables
Let $z_i$ be an $l\times 1$ vector with
$$E(z_i\varepsilon_i) = 0, \qquad l > K,$$
and
$$E\left(z_ix_i^T\right) = \Sigma_{ZX},$$
where $\Sigma_{ZX}$ is an $l\times K$ matrix with $\operatorname{rank}(\Sigma_{ZX}) = K < l$; $\Sigma_{ZX}$ is not square and hence not invertible. Because the model is over-identified, the rank condition must be considered.
Note that by the CLT,
$$\frac{1}{\sqrt n}\sum_i z_i\varepsilon_i \xrightarrow{d} N(0, V_0), \qquad V_0 = E\left(\varepsilon_i^2z_iz_i^T\right).$$
Rank Review:
- The column rank of a matrix $A$ is the maximal number of linearly independent columns of $A$; likewise, the row rank is the maximal number of linearly independent rows of $A$.
- Since the column rank and the row rank are always equal, they are simply called the rank of $A$.
- The rank of an $m\times n$ matrix is at most $\min(m, n)$.
- A matrix that has as large a rank as possible is said to have full rank.
b. Weighted IV Estimator
1. Define
$$\hat\beta_{IV}(\widehat\Phi) = \left(\widehat\Phi^TZ^TX\right)^{-1}\widehat\Phi^TZ^Ty,$$
where $\widehat\Phi$ is an $l\times K$ weighting matrix with $\widehat\Phi \xrightarrow{p} \Phi_0$.
2. Comparing
$$\hat\beta_{OLS} = \left(X^TX\right)^{-1}X^Ty \quad\text{with}\quad \hat\beta_{IV}(\widehat\Phi) = \left(\widehat\Phi^TZ^TX\right)^{-1}\widehat\Phi^TZ^Ty,$$
it is as if, instead of using $X^T$ in the OLS formula, we now use $\widehat\Phi^TZ^T$ for $\hat\beta_{IV}$, where $\widehat\Phi$ is a weighting matrix with rank $K$.
The weighting matrix can select a subset of $K$ instrumental variables from the $l$ instrumental variables, or it might form $K$ linear combinations of these $l$ instruments (see the sketch after this list). The only requirement on the weighting matrix is that it have rank $K$.
3. The idea of the weighting matrix is to approximate $\widehat\Phi^TZ^T\varepsilon$ by zero. Note that we have $K$ unknowns but $l$ moment conditions; therefore, not all $l$ moment conditions can be exactly satisfied.
4. Since not all $l$ moments can be satisfied, a criterion function that weights them appropriately is used to improve the efficiency of the estimator.
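Here is the promised minimal sketch (hypothetical DGP with $l = 2$, $K = 1$; the symbol $\Phi$ follows the notation above). Any full-rank choice of the weighting matrix, whether it selects one instrument or combines both, yields a consistent estimator; the choices differ only in precision.

```python
# Minimal sketch (hypothetical DGP, l = 2 instruments, K = 1): any full-rank
# l x K weighting matrix Phi gives a consistent (Phi'Z'X)^{-1} Phi'Z'y.
import numpy as np

rng = np.random.default_rng(5)
n, beta = 100_000, 1.0

Z = rng.normal(size=(n, 2))                    # two valid instruments
e = rng.normal(size=n)
x = Z @ np.array([0.6, 0.3]) + 0.8 * e + rng.normal(size=n)
y = beta * x + e
X = x[:, None]

for Phi in (np.array([[1.0], [0.0]]),          # select the first instrument only
            np.array([[0.5], [0.5]]),          # an ad hoc linear combination
            np.linalg.inv(Z.T @ Z) @ Z.T @ X): # the 2SLS choice (Z'Z)^{-1}Z'X
    b = np.linalg.solve(Phi.T @ Z.T @ X, Phi.T @ Z.T @ y)
    print(b)                                   # all ~ 1.0, different precision
```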
The asymptotic distribution of the weighted IV estimator is $\sqrt n\left(\hat\beta_{IV}(\widehat\Phi)-\beta\right) \xrightarrow{d} N(0, \Omega)$, where
$$\Omega = \left(\Phi_0^T\Sigma_{ZX}\right)^{-1}\Phi_0^TV_0\Phi_0\left(\Sigma_{ZX}^T\Phi_0\right)^{-1}.$$
The optimal weighting matrix minimizes this variance:
$$\Phi_0^* = \operatorname*{arg\,min}_{\Phi_0}\;\left(\Phi_0^T\Sigma_{ZX}\right)^{-1}\Phi_0^TV_0\Phi_0\left(\Sigma_{ZX}^T\Phi_0\right)^{-1}.$$
This gives the IV estimator that has the smallest asymptotic variance among those that could be formed from the instruments $Z$ and a weighting matrix $\Phi$.
(3) The GMM Representation of Weighted IV

The Generalized Method of Moments estimator solves the sample analogue of the population moment conditions
$$0 = E\left[z_i\left(y_i - x_i^T\beta\right)\right].$$
Define the transformed variables
$$\tilde y = \frac{1}{\sqrt n}Z^Ty, \qquad \widetilde X = \frac{1}{\sqrt n}Z^TX, \qquad \tilde\varepsilon = \frac{1}{\sqrt n}Z^T\varepsilon.$$
Note that $\tilde\varepsilon \xrightarrow{d} N(0, V_0)$ by the CLT. Then GLS applied to the transformed equation $\tilde y = \widetilde X\beta + \tilde\varepsilon$ gives
$$\hat\beta_{GLS} = \left(\widetilde X^T\widehat V_0^{-1}\widetilde X\right)^{-1}\widetilde X^T\widehat V_0^{-1}\tilde y = \left(\frac{1}{n}X^TZ\widehat V_0^{-1}Z^TX\right)^{-1}\frac{1}{n}X^TZ\widehat V_0^{-1}Z^Ty,$$
which is a weighted IV estimator of the form
$$\hat\beta_{IV} = \left(\widehat\Phi^TZ^TX\right)^{-1}\widehat\Phi^TZ^Ty.$$
(5) 2SLS Interpretation of the (Optimal) Weighted IV Estimator

1. Calculate $\hat\beta_{2SLS}$ for the system
$$y = X\beta+\varepsilon, \qquad X = Z\Pi+V.$$
First stage: regress $X$ on $Z$ using OLS to get
$$\widehat\Pi = \left(Z^TZ\right)^{-1}Z^TX, \qquad \widehat X = Z\widehat\Pi = Z\left(Z^TZ\right)^{-1}Z^TX = H_ZX,$$
where $H_Z = Z\left(Z^TZ\right)^{-1}Z^T$ and $E(\varepsilon|Z) = 0$.
Second stage: regress $y$ on $\widehat X$, i.e.,
$$y_i = \hat x_i^T\beta + \varepsilon_i^*, \qquad \varepsilon_i^* = \varepsilon_i + \left(x_i-\hat x_i\right)^T\beta,$$
which yields
$$\hat\beta_{2SLS} = \left(X^TZ\left(Z^TZ\right)^{-1}Z^TX\right)^{-1}X^TZ\left(Z^TZ\right)^{-1}Z^Ty = \left(X^TH_ZX\right)^{-1}X^TH_Zy,$$
i.e., the LS regression of $y$ on $\widehat X$.
(c) We can also write
$$\hat\beta_{2SLS} = \left(\widehat X^T\widehat X\right)^{-1}\widehat X^T\hat y,$$
where $\hat y = H_Zy$; that is, the regression of $\hat y$ on $\widehat X$. (Since $H_Z$ is symmetric and idempotent, $\widehat X^T\widehat X = X^TH_ZX$ and $\widehat X^T\hat y = X^TH_Zy$.)
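These three ways of writing 2SLS are easy to confirm numerically. A minimal sketch (hypothetical DGP; all values illustrative) comparing the sandwich formula, the second-stage OLS of $y$ on $\widehat X$, and the regression of $\hat y$ on $\widehat X$:

```python
# Minimal numerical sketch (hypothetical DGP) checking three equivalent ways of
# computing 2SLS: the sandwich formula, OLS of y on Xhat, and yhat on Xhat.
import numpy as np

rng = np.random.default_rng(6)
n = 2_000
Z = rng.normal(size=(n, 3))                         # l = 3 instruments, K = 1
e = rng.normal(size=n)
X = Z @ np.array([[0.6], [0.3], [0.2]]) + 0.8 * e[:, None] + rng.normal(size=(n, 1))
y = X @ np.array([1.0]) + e

H = Z @ np.linalg.inv(Z.T @ Z) @ Z.T                # projection onto col(Z)
Xhat, yhat = H @ X, H @ y

b1 = np.linalg.solve(X.T @ H @ X, X.T @ H @ y)      # sandwich formula
b2 = np.linalg.lstsq(Xhat, y, rcond=None)[0]        # second-stage OLS
b3 = np.linalg.solve(Xhat.T @ Xhat, Xhat.T @ yhat)  # regression of yhat on Xhat
print(np.allclose(b1, b2), np.allclose(b2, b3))     # True True
```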
The asymptotic distribution is $\sqrt n\left(\hat\beta_{2SLS}-\beta\right) \xrightarrow{d} N\left(0, \Omega_{2SLS}\right)$, where:
(a) $\Omega_{2SLS} = \left(\Phi_0^T\Sigma_{ZX}\right)^{-1}\Phi_0^TV_0\Phi_0\left(\Sigma_{ZX}^T\Phi_0\right)^{-1}$;
(b) $\frac{1}{\sqrt n}Z^T\varepsilon \xrightarrow{d} N(0, V_0)$;
(c) $\Phi_0 = \operatorname*{plim}_n\left(Z^TZ\right)^{-1}Z^TX = \left(\operatorname*{plim}_n\frac{1}{n}Z^TZ\right)^{-1}\operatorname*{plim}_n\frac{1}{n}Z^TX = \Sigma_{ZZ}^{-1}\Sigma_{ZX}$.
Substituting (c) into (a),
$$\Omega_{2SLS} = \left(\Sigma_{XZ}\Sigma_{ZZ}^{-1}\Sigma_{ZX}\right)^{-1}\Sigma_{XZ}\Sigma_{ZZ}^{-1}V_0\Sigma_{ZZ}^{-1}\Sigma_{ZX}\left(\Sigma_{XZ}\Sigma_{ZZ}^{-1}\Sigma_{ZX}\right)^{-1}.$$
Under conditionally homoskedastic disturbances, $V_0 = \sigma^2\Sigma_{ZZ}$; thus
$$\Omega_{2SLS} = \sigma^2\left(\Sigma_{XZ}\Sigma_{ZZ}^{-1}\Sigma_{ZX}\right)^{-1}.$$
and
$$\hat\sigma^2 = \frac{1}{n}\left(y - X\hat\beta\right)^T\left(y - X\hat\beta\right) \xrightarrow{p} \sigma^2.$$
For the estimation of $\sigma^2$ we use $X$, NOT $\widehat X$; we use $\widehat X$ only in $\left(\widehat X^T\widehat X\right)^{-1}$.
Note that
$$\frac{1}{n}\widehat X^T\widehat X \xrightarrow{p} \Sigma_{XZ}\Sigma_{ZZ}^{-1}\Sigma_{ZX}.$$
Thus,
$$\hat\beta_{2SLS} \overset{A}{\sim} N\left(\beta,\; \hat\sigma^2\left(\widehat X^T\widehat X\right)^{-1}\right).$$
4. We know that
$$\hat\beta_{2SLS} = \left(X^TZ\left(Z^TZ\right)^{-1}Z^TX\right)^{-1}X^TZ\left(Z^TZ\right)^{-1}Z^Ty,$$
$$\hat\beta_{GLS} = \left(X^TZ\widehat V_0^{-1}Z^TX\right)^{-1}X^TZ\widehat V_0^{-1}Z^Ty.$$
The 2SLS formula is very similar to the GLS one: just replace $\left(Z^TZ\right)^{-1}$ in the 2SLS formula with $\widehat V_0^{-1}$.
5. In the homoskedastic case where $V_0 = \sigma^2\Sigma_{ZZ}$, the asymptotic variance of $\hat\beta_{2SLS}$ equals the asymptotic variance of $\hat\beta_{GLS}$. But, in general,
$$\operatorname{Var}\left(\hat\beta_{GLS}\right) \le \operatorname{Var}\left(\hat\beta_{2SLS}\right)$$
if
$$\frac{1}{\sqrt n}\sum_i z_i\varepsilon_i \xrightarrow{d} N(0, V_0), \qquad V_0 = E\left(\varepsilon_i^2z_iz_i^T\right) \neq \sigma^2E\left(z_iz_i^T\right) = \sigma^2\Sigma_{ZZ}.$$
That is to say, under heteroskedasticity the GLS estimator is more efficient than the 2SLS estimator, even though they are both consistent.
6. The optimal weighted IV estimator is
$$\hat\beta_{OptimalIV} = \left(\widehat\Phi^TZ^TX\right)^{-1}\widehat\Phi^TZ^Ty, \qquad \Phi_0 = V_0^{-1}\,\frac{1}{n}Z^TX.$$
Under homoskedasticity, $V_0 = \sigma^2\Sigma_{ZZ}$, so
$$\hat\beta_{OptimalIV} = \left(X^TZ\,\frac{1}{n\sigma^2}\Sigma_{ZZ}^{-1}Z^TX\right)^{-1}X^TZ\,\frac{1}{n\sigma^2}\Sigma_{ZZ}^{-1}Z^Ty = \left(X^TZ\Sigma_{ZZ}^{-1}Z^TX\right)^{-1}X^TZ\Sigma_{ZZ}^{-1}Z^Ty$$
$$= \left(X^TZ\left(Z^TZ\right)^{-1}Z^TX\right)^{-1}X^TZ\left(Z^TZ\right)^{-1}Z^Ty = \hat\beta_{2SLS},$$
where the last step replaces $\Sigma_{ZZ}$ with its sample counterpart $\frac{1}{n}Z^TZ$ (the scale factors cancel).
As point 6 shows, the 2SLS estimator is no longer best when the scalar covariance matrix assumption $E\left(\varepsilon\varepsilon^T\right) = \sigma^2I$ fails: in general, $\operatorname{Var}\left(\hat\beta_{GLS}\right) \le \operatorname{Var}\left(\hat\beta_{2SLS}\right)$. But under fairly general conditions 2SLS will remain consistent.
STEP I
Estimate $\hat\beta_{2SLS}$ and retrieve the residuals
$$\hat\varepsilon_i = y_i - X_i\hat\beta_{2SLS};$$
then use the $\hat\varepsilon_i$ from 2SLS to form
$$\widehat V = \frac{1}{n}\sum_i\hat\varepsilon_i^2z_iz_i^T.$$
STEP II
Find a Cholesky transformation $P$ satisfying $P\widehat VP^T = I$, then make the transformations
$$\tilde y = Py, \qquad \widetilde X = PX, \qquad \widetilde Z = \left(P^T\right)^{-1}Z,$$
and do a 2SLS regression of $\tilde y$ on $\widetilde X$ using $\widetilde Z$ as instruments. This yields
$$\hat\beta_{3SLS} = \left(X^TZ\widehat V^{-1}Z^TX\right)^{-1}X^TZ\widehat V^{-1}Z^Ty.$$
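A minimal sketch of this two-step procedure (hypothetical heteroskedastic DGP; all coefficient values illustrative), implementing STEP I as ordinary 2SLS and STEP II directly through the $\widehat V^{-1}$-weighted formula rather than the $P$-transformations:

```python
# Minimal two-step sketch (hypothetical heteroskedastic DGP).  STEP I: 2SLS and
# its residuals.  STEP II: efficient reweighting with the robust
# Vhat = (1/n) sum e_i^2 z_i z_i', i.e. (X'Z Vhat^{-1} Z'X)^{-1} X'Z Vhat^{-1} Z'y.
import numpy as np

rng = np.random.default_rng(7)
n = 50_000
Z = rng.normal(size=(n, 3))                            # l = 3 instruments
u = rng.normal(size=n) * np.sqrt(0.5 + Z[:, 0] ** 2)   # heteroskedastic error
X = Z @ np.array([[0.6], [0.3], [0.2]]) + 0.8 * u[:, None] + rng.normal(size=(n, 1))
y = X @ np.array([1.0]) + u                            # true beta = 1

ZtZ, ZtX, Zty = Z.T @ Z, Z.T @ X, Z.T @ y

# STEP I: 2SLS, written to avoid forming the n x n projection matrix
b_2sls = np.linalg.solve(ZtX.T @ np.linalg.solve(ZtZ, ZtX),
                         ZtX.T @ np.linalg.solve(ZtZ, Zty))
e = y - X @ b_2sls

# STEP II: robust Vhat from the first-step residuals, then efficient weighting
V_hat = (Z * (e**2)[:, None]).T @ Z / n
W = np.linalg.inv(V_hat)
b_3sls = np.linalg.solve(ZtX.T @ W @ ZtX, ZtX.T @ W @ Zty)
print(b_2sls, b_3sls)                                  # both ~ 1.0
```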