
ASYMPTOTIC PROPERTIES OF 2SLS

The population model is y = xβ + u, where x is a 1 × K vector of regressors that contains a constant (i.e., intercept) term. Some elements of x may be correlated with u.

Assumptions

i. We assume that a random sample {(y_i, x_i) : i = 1, 2, ..., N} is available from the population, and observations are independently and identically distributed.

ii. ∃ a 1 × L vector z such that E(z'u) = 0. [2SLS 1]

• z contains any exogenous elements of x, including unity. Unless x contains no endogenous regressors, z will also contain variables from outside the structural model, so these must also be part of the sample obtained, i.e., {(y_i, x_i, z_i) : i = 1, 2, ..., N}.

• A sufficient condition for the weak assumption 2SLS 1 is the strong assumption E(u|z) = 0. However, we stick to the weaker version.

iii. rank E(z'z) = L [2SLS 2a]

rank E(z'x) = K [2SLS 2b]

• 2SLS 2b is called the rank condition for 2SLS.


• The first assumption is innocuous, since it only ensures the ex-
ogenous variables included in z are linearly independent, which
only implies careful choice of instruments.

• The rank condition means that z is sufficiently linearly related to x so that E(z'x) has full column rank, i.e., the instruments are “good”.

• If z = x, 2SLS 1 and 2SLS 2 boil down to OLS 1 and OLS 2.

• Necessary for the rank condition is the order condition: L ≥ K, i.e., the number of instruments must be at least as large as the number of regressors. However, in single-equation 2SLS estimation, the order condition always holds.

Identification of β under 2SLS:

Assuming E(z'z) is nonsingular by 2SLS 2a, write the linear projection of x onto z as

x* = zΠ,

where the L × K matrix Π = [E(z'z)]^{-1}E(z'x), from the definition of a linear projection.

Now, x = x* + r, where E(z'r) = 0 and hence E(x*'r) = 0 [the linear projection error is uncorrelated with the RHS explanatory factors].

Now premultiply the structural equation by x*' to get

x*'y = x*'xβ + x*'u.

Taking expectations, E(x*'y) = E(x*'x)β, as E(x*'u) = E(Π'z'u) = Π'E(z'u) = 0.

Thus, β = [E(x*'x)]^{-1}E(x*'y), provided E(x*'x) is nonsingular.

But E(x*'x) = Π'E(z'x) = E(x'z)[E(z'z)]^{-1}E(z'x).

Recall that for positive definite B, a quadratic form A'BA has full rank if and only if A has full column rank. The RHS matrix is therefore nonsingular (full rank) iff E(z'x) has full column rank K, since the middle matrix, [E(z'z)]^{-1}, is positive definite. But rank E(z'x) = K is guaranteed by 2SLS 2b.

Hence under 2SLS 1 and 2SLS 2, β is identified.

Also note that x = x* + r. Therefore,

x*'x = x*'x* + x*'r
E[x*'x] = E[x*'x*] + E[x*'r].

But E[x*'r] = 0. Hence E[x*'x] = E[x*'x*].

Thus identification of β essentially requires rank E[x*'x*] = K.

Hence

β = {E(x'z)[E(z'z)]^{-1}E(z'x)}^{-1} E(x'z)[E(z'z)]^{-1}E(z'y).

Using the method of moments,

β̂_2SLS = [(N^{-1} Σ_i x_i'z_i)(N^{-1} Σ_i z_i'z_i)^{-1}(N^{-1} Σ_i z_i'x_i)]^{-1} (N^{-1} Σ_i x_i'z_i)(N^{-1} Σ_i z_i'z_i)^{-1}(N^{-1} Σ_i z_i'y_i).

Consistency of β̂_2SLS:

Theorem: Under assumptions 2SLS 1 and 2SLS 2, β̂_2SLS is consistent.

Proof:

β̂_2SLS = β + [(N^{-1} Σ_i x_i'z_i)(N^{-1} Σ_i z_i'z_i)^{-1}(N^{-1} Σ_i z_i'x_i)]^{-1} (N^{-1} Σ_i x_i'z_i)(N^{-1} Σ_i z_i'z_i)^{-1}(N^{-1} Σ_i z_i'u_i).

Apply the WLLN to each term in parentheses: N^{-1} Σ_i x_i'z_i →p E(x'z), etc.

Thus,

β̂_2SLS − β →p {E(x'z)[E(z'z)]^{-1}E(z'x)}^{-1} E(x'z)[E(z'z)]^{-1}E(z'u),

by Slutsky's theorem.

But by 2SLS 1, E(z'u) = 0.

Hence, β̂_2SLS →p β.
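To illustrate the theorem, here is a minimal Monte Carlo sketch reusing the hypothetical beta_2sls helper above (all data-generating parameters are illustrative choices, not from the notes); the slope estimate approaches its true value 2.0 as N grows, and the final check confirms numerically that β̂_2SLS coincides with second-stage OLS on the first-stage fitted values:

rng = np.random.default_rng(0)

for N in (100, 10_000, 1_000_000):
    z1 = rng.normal(size=N)               # outside instrument
    v = rng.normal(size=N)                # shock driving endogeneity
    u = 0.8 * v + rng.normal(size=N)      # u correlated with x, but E(z1*u) = 0
    x = 0.5 * z1 + v                      # first stage
    y = 1.0 + 2.0 * x + u                 # structural equation
    X = np.column_stack([np.ones(N), x])
    Z = np.column_stack([np.ones(N), z1])
    b = beta_2sls(y, X, Z)
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first-stage fitted values
    b2 = np.linalg.lstsq(Xhat, y, rcond=None)[0]      # second-stage OLS
    print(N, b, np.allclose(b, b2))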

Asymptotic Normality of 2SLS


Theorem: Under 2SLS 1 – 2SLS 3, √N(β̂_2SLS − β) is asymptotically normal with mean zero and variance–covariance matrix σ²{E(x'z)[E(z'z)]^{-1}E(z'x)}^{-1}.
Proof: x* = zΠ.

The sample counterpart is x̂_i = z_iΠ̂, where Π̂ is the OLS estimate of Π from the first-stage (reduced-form) regression. Note that Π̂_OLS is a consistent estimator of Π.

β̂_2SLS (Stage 2 OLS) = (Σ_i x̂_i'x̂_i)^{-1} Σ_i x̂_i'y_i

⇒ β̂_2SLS − β = (N^{-1} Σ_i x̂_i'x̂_i)^{-1} (N^{-1} Σ_i x̂_i'u_i),

using y_i = x_iβ + u_i and Σ_i x̂_i'x_i = Σ_i x̂_i'x̂_i (first-stage fitted values are orthogonal to first-stage residuals).
Now,

plim (N^{-1} Σ_i x̂_i'x̂_i)^{-1} = plim (N^{-1} Σ_i Π̂'z_i'z_iΠ̂)^{-1}
= [(plim Π̂)' (plim N^{-1} Σ_i z_i'z_i) (plim Π̂)]^{-1}
= [Π'E(z'z)Π]^{-1}, by the WLLN
= {E[(zΠ)'(zΠ)]}^{-1}
= [E(x*'x*)]^{-1}
= A^{-1}, say.

Thus, (N^{-1} Σ_i x̂_i'x̂_i)^{-1} − A^{-1} →p 0.

Hence, (N^{-1} Σ_i x̂_i'x̂_i)^{-1} = A^{-1} + o_p(1).

√N(β̂_2SLS − β) = [A^{-1} + o_p(1)] (N^{-1/2} Σ_i x̂_i'u_i).

Now consider N^{-1/2} Σ_i x̂_i'u_i. The CLT implies N^{-1/2} Σ_i x̂_i'u_i →d N(0, E(u²x*'x*)), i.e., N(0, B), say.

Assumption 2SLS 3: E(u²x*'x*) = σ²E(x*'x*), where E(u²) = σ².

So N^{-1/2} Σ_i x̂_i'u_i = O_p(1), by Lemma 5.

Hence,

√N(β̂_2SLS − β) = [A^{-1} + o_p(1)] (N^{-1/2} Σ_i x̂_i'u_i)
= A^{-1}(N^{-1/2} Σ_i x̂_i'u_i) + o_p(1)O_p(1)
= A^{-1}(N^{-1/2} Σ_i x̂_i'u_i) + o_p(1), by Lemma 2.

Thus, √N(β̂_2SLS − β) − A^{-1}(N^{-1/2} Σ_i x̂_i'u_i) →p 0.

Hence by the Asymptotic Equivalence Lemma, the asymptotic distribution of √N(β̂_2SLS − β) is the same as that of A^{-1}(N^{-1/2} Σ_i x̂_i'u_i).

   
But N^{-1/2} Σ_i x̂_i'u_i →d N(0, B) implies A^{-1}(N^{-1/2} Σ_i x̂_i'u_i) →d N(0, A^{-1}BA^{-1}).

Hence, √N(β̂_2SLS − β) ~a N(0, A^{-1}BA^{-1}).

Avar[√N(β̂_2SLS − β)] = A^{-1}BA^{-1}
= [E(x*'x*)]^{-1} σ²E(x*'x*) [E(x*'x*)]^{-1}
= σ²[E(x*'x*)]^{-1}
= σ²[Π'E(z'z)Π]^{-1}
= σ²{E(x'z)[E(z'z)]^{-1}[E(z'z)][E(z'z)]^{-1}E(z'x)}^{-1}
= σ²{E(x'z)[E(z'z)]^{-1}E(z'x)}^{-1}.


• To estimate this Avar[ N (β̂2SLS − β )], the matrix part may be
estimated using sample averages.

Estimation of σ² requires the following:

i. Define the 2SLS residuals as û_i = y_i − x_iβ̂_2SLS, i = 1, 2, ..., N. [Note that the 2SLS residuals are different from the second-stage residuals, which are y_i − x̂_iβ̂_2SLS.]

ii. A consistent (though not unbiased) estimator of σ² under 2SLS 1 – 2SLS 3 is

σ̂² ≡ (Σ_i û_i²)/(N − K).

• Finally, under assumptions 2SLS 1 – 2SLS 3, a valid estimator of the asymptotic variance of β̂_2SLS is

Âvar(β̂_2SLS) = σ̂² (Σ_{i=1}^N x̂_i'x̂_i)^{-1} = σ̂² (X̂'X̂)^{-1}.

• The asymptotic standard error of β̂_{j,2SLS} is the square root of the j-th diagonal element of Âvar(β̂_2SLS).
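A minimal sketch of this variance estimator, reusing the hypothetical beta_2sls helper above; as noted, σ̂² is computed from the 2SLS residuals y − Xβ̂, not from the second-stage residuals y − X̂β̂:

def se_2sls(y, X, Z):
    N, K = X.shape
    beta = beta_2sls(y, X, Z)
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]  # first-stage fitted values
    uhat = y - X @ beta                              # 2SLS residuals (use X, not Xhat)
    sigma2 = uhat @ uhat / (N - K)                   # sigma2_hat
    avar = sigma2 * np.linalg.inv(Xhat.T @ Xhat)     # sigma2_hat * (Xhat'Xhat)^{-1}
    return beta, np.sqrt(np.diag(avar))              # estimates and asymptotic s.e.'s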
Asymptotic Efficiency of 2SLS:

Theorem: Under 2SLS 1 – 2SLS 3, the 2SLS estimator is the most efficient in the class of all instrumental variables estimators using instruments linear in z.

Proof: Let β̃ be any IV estimator other than 2SLS that also uses instruments linear in z.

Let the instruments for β̃ be x̃ = zΓ, where Γ is a non-stochastic L × K matrix. Assume x̃ satisfies the rank condition.

Now, for β̂_2SLS, x* = zΠ, where Π = [E(z'z)]^{-1}E(z'x).


Under 2SLS 1 – 2SLS 3, Avar[√N(β̂_2SLS − β)] = σ²[E(x*'x*)]^{-1}, where x* = zΠ.
Again, β̃ = (N^{-1} Σ_i x̃_i'x_i)^{-1}(N^{-1} Σ_i x̃_i'y_i) (where x̃ is an instrument for the endogenous x – the usual IV estimator).

⇒ β̃ − β = (N^{-1} Σ_i x̃_i'x_i)^{-1}(N^{-1} Σ_i x̃_i'u_i).

Now, plim (N^{-1} Σ_i x̃_i'x_i) = E(x̃'x) = C, say.

√N(β̃ − β) = C^{-1}(N^{-1/2} Σ_i x̃_i'u_i) + o_p(1).

Also, N^{-1/2} Σ_i x̃_i'u_i →d N(0, D), where D = E(u²x̃'x̃) = σ²E(x̃'x̃).

⇒ C^{-1}(N^{-1/2} Σ_i x̃_i'u_i) →d N(0, C^{-1}D(C')^{-1}).


Hence Avar[√N(β̃ − β)] = σ²[E(x̃'x)]^{-1}E(x̃'x̃)[E(x'x̃)]^{-1}.
To show: E(x*'x*) − E(x'x̃)[E(x̃'x̃)]^{-1}E(x̃'x) is positive semidefinite.

Note: x = x* + r and E(z'r) = 0. So E(x̃'r) = 0.

Thus, x̃'x = x̃'x* + x̃'r

⇒ E[x̃'x] = E[x̃'x*].

So,

E(x*'x*) − E(x'x̃)[E(x̃'x̃)]^{-1}E(x̃'x)
= E(x*'x*) − E(x*'x̃)[E(x̃'x̃)]^{-1}E(x̃'x*) = E(s*'s*),

where s* = x* − L(x*|x̃) = x* − x̃[E(x̃'x̃)]^{-1}E(x̃'x*).


E(s∗0s∗) is p.s.d. Hence [E(x∗0x∗)] − [E(x̃0x)]−1E(x̃0x̃)[E(x0x̃)]−1 is
p.s.d.

Thus, Avar[√N(β̃ − β)] − Avar[√N(β̂_2SLS − β)] is p.s.d.

• When L = K (e.g., a single endogenous regressor with a single instrument), any nonsingular choice of Γ leads to the same IV estimator. Consequently, the above theorem is vacuous in that case (as the “class” is a singleton).

• When x is exogenous, the above theorem implies that under 2SLS 1 – 2SLS 3, OLS is the most efficient in the class of estimators using instruments linear in z. Note that x ⊂ z and hence L(x|z) = x.
• Asymptotically, the above theorem implies that we always do better by using as many instruments as possible, at least under homoskedasticity [2SLS 3]. However, it should be noted that 2SLS estimators based on many overidentifying restrictions (L ≫ K) can cause finite-sample problems.

Heteroskedasticity-Robust Inference with 2SLS: If we relax assumption 2SLS 3, we need a variance matrix estimator that is robust in the presence of heteroskedasticity of unknown form.

Under 2SLS 1 and 2SLS 2,

Âvar(β̂_2SLS) = (X̂'X̂)^{-1} (Σ_{i=1}^N û_i² x̂_i'x̂_i) (X̂'X̂)^{-1},

where û_i is the 2SLS residual described above.
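A minimal sketch of this robust estimator under the same hypothetical names: the (X̂'X̂)^{-1} "bread" sandwiches the Σ_i û_i² x̂_i'x̂_i "meat":

def robust_avar_2sls(y, X, Z):
    beta = beta_2sls(y, X, Z)
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first-stage fitted values
    uhat = y - X @ beta                               # 2SLS residuals
    bread = np.linalg.inv(Xhat.T @ Xhat)
    meat = Xhat.T @ (Xhat * (uhat ** 2)[:, None])     # sum_i uhat_i^2 * xhat_i' xhat_i
    return bread @ meat @ bread                       # robust Avar-hat(beta_hat_2SLS)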
Hypothesis Testing with 2SLS:

• Hypotheses involving a single linear restriction on the β_j's can be tested using the asymptotic t-statistic.

• To test multiple linear restrictions of the form H_0: Rβ = r, the Wald statistic is used, as in the OLS case, with V̂ replaced by Âvar(β̂_2SLS), i.e., σ̂²(X̂'X̂)^{-1}.

The Wald statistic, as before, has a limiting χ²_Q distribution (Q = number of linear restrictions being tested).
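A minimal sketch of the Wald statistic under the hypothetical names used above, where V is Âvar(β̂_2SLS) (either σ̂²(X̂'X̂)^{-1} or its robust version); the result is compared to a χ²_Q critical value:

def wald_stat(beta, V, R, r):
    diff = R @ beta - r                               # R beta_hat - r
    return diff @ np.linalg.solve(R @ V @ R.T, diff)  # (Rb - r)'[R V R']^{-1}(Rb - r)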

• Another way of testing multiple restrictions is the following.

y = x_1β_1 + x_2β_2 + u,

where x_1 (1 × K_1) and x_2 (1 × K_2) are partitions of the regressors with K_1 + K_2 = K.

We are testing the K_2 restrictions in the form of exclusion restrictions, i.e., H_0: β_2 = 0 against H_1: β_2 ≠ 0.

Both x_1 and x_2 may contain endogenous and exogenous variables. Let z denote the 1 × L vector of instruments, with L ≥ K_1 + K_2, and let the rank condition hold.

Let û_i be the 2SLS residuals from estimating the unrestricted model using z_i as instruments. Define the 2SLS unrestricted residual sum of squares as

SSR_UR = Σ_{i=1}^N û_i².
Let x̂_i1 be the 1 × K_1 fitted values from the first-stage regression of x_i1 on z_i. Similarly, let x̂_i2 be the 1 × K_2 fitted values from the first-stage regression of x_i2 on z_i.

SSR̂_UR is defined as the sum of squared residuals from the unrestricted second-stage regression of y on x̂_1, x̂_2.

SSR̂_R is the sum of squared residuals from the restricted second-stage regression of y on x̂_1 alone (setting β_2 = 0).

Under H_0: β_2 = 0 and 2SLS 1 – 2SLS 3,

F = [(SSR̂_R − SSR̂_UR)/K_2] / [SSR_UR/(N − K)] ~ F_{K_2, N−K}.
Whereas computer packages usually report SSR_UR by default, SSR̂_R and SSR̂_UR need to be computed directly from the second-stage regressions.
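A minimal sketch of this F statistic under the hypothetical names used above, with X1 and X2 the N × K_1 and N × K_2 regressor blocks; note that only the denominator uses the 2SLS residuals, while the numerator uses the second-stage SSRs:

def f_stat_2sls(y, X1, X2, Z):
    N = len(y)
    X = np.column_stack([X1, X2])
    K, K2 = X.shape[1], X2.shape[1]
    ssr_ur = np.sum((y - X @ beta_2sls(y, X, Z)) ** 2)     # SSR_UR, 2SLS residuals
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]        # first-stage fitted values
    X1hat = Xhat[:, :X1.shape[1]]
    resid_ur = y - Xhat @ np.linalg.lstsq(Xhat, y, rcond=None)[0]    # unrestricted stage 2
    resid_r = y - X1hat @ np.linalg.lstsq(X1hat, y, rcond=None)[0]   # restricted stage 2
    ssr2_ur, ssr2_r = resid_ur @ resid_ur, resid_r @ resid_r
    return ((ssr2_r - ssr2_ur) / K2) / (ssr_ur / (N - K))  # compare to F(K2, N-K)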
