
ASYMPTOTIC PROPERTIES OF 2SLS

The population model is y = xβ + u, where x is a 1 × K vector of regressors that contains a constant (i.e., intercept) term. Some elements of x may be correlated with u.

Assumptions

i. We assume that a random sample {(y_i, x_i) : i = 1, 2, ..., N} is available from the population, and observations are independently and identically distributed.

ii. ∃ a 1 × L vector z such that E(z'u) = 0. [2SLS 1]

• z contains any exogenous elements of x, including unity. Unless x contains no endogenous regressors, z will also contain variables from outside the structural model, so these must also be part of the sample obtained, i.e., {(y_i, x_i, z_i) : i = 1, 2, ..., N}.

• A sufficient condition for the weak assumption 2SLS 1 is the strong assumption E(u|z) = 0. However, we stick to the weaker version.

iii. rank E(z'z) = L [2SLS 2a]

rank E(z'x) = K [2SLS 2b]

• 2SLS 2b is called the rank condition for 2SLS.


• The first assumption is innocuous, since it only ensures the ex-
ogenous variables included in z are linearly independent, which
only implies careful choice of instruments.

• The rank condition means that z is sufficiently linearly related to x so that E(z'x) has full column rank, i.e., the instruments are “good”.

• If z = x, 2SLS 1 and 2SLS 2 boil down to OLS 1 and OLS 2.

• Necessary for the rank condition is the order condition: L ≥ K, i.e., the number of instruments must be at least as large as the number of regressors. However, in single-equation 2SLS estimation, the order condition always holds.

Identification of β under 2SLS:

Assuming E(z'z) is nonsingular by 2SLS 2a, write the linear projection of x onto z as

x* = zΠ,

where the L × K matrix Π = [E(z'z)]^{-1}E(z'x), from the definition of a linear projection.

Now, x = x* + r, where E(z'r) = 0 and hence E(x*'r) = 0 [the linear projection error is uncorrelated with the RHS explanatory factors].

Now premultiply the structural equation by x*' to get

x*'y = x*'xβ + x*'u.

Taking expectations, E(x*'y) = E(x*'x)β, as E(x*'u) = E(Π'z'u) = Π'E(z'u) = 0.

Thus, β = [E(x*'x)]^{-1}E(x*'y), provided E(x*'x) is nonsingular.

But E(x*'x) = Π'E(z'x) = E(x'z)[E(z'z)]^{-1}E(z'x).

Recall that for positive definite B, a quadratic form A'BA has full rank if and only if A has full column rank. The RHS matrix is therefore nonsingular (full rank) iff E(z'x) has full column rank K, since the middle matrix, [E(z'z)]^{-1}, is positive definite. But rank E(z'x) = K is guaranteed by 2SLS 2b.

Hence under 2SLS 1 and 2SLS 2, β is identified.

Also note that x = x* + r. Therefore,

x*'x = x*'x* + x*'r
E[x*'x] = E[x*'x*] + E[x*'r].

But E[x*'r] = 0. Hence E[x*'x] = E[x*'x*].

Thus identification of β essentially requires rank E[x*'x*] = K.

Hence

β = {E(x'z)[E(z'z)]^{-1}E(z'x)}^{-1} E(x'z)[E(z'z)]^{-1}E(z'y).

Using the method of moments,

β̂_2SLS = [(N^{-1} Σ_i x_i'z_i)(N^{-1} Σ_i z_i'z_i)^{-1}(N^{-1} Σ_i z_i'x_i)]^{-1} (N^{-1} Σ_i x_i'z_i)(N^{-1} Σ_i z_i'z_i)^{-1}(N^{-1} Σ_i z_i'y_i).

Consistency of β̂_2SLS:

Theorem: Under assumptions 2SLS 1 and 2SLS 2, β̂_2SLS is consistent.

Proof:

β̂_2SLS = β + [(N^{-1} Σ_i x_i'z_i)(N^{-1} Σ_i z_i'z_i)^{-1}(N^{-1} Σ_i z_i'x_i)]^{-1} (N^{-1} Σ_i x_i'z_i)(N^{-1} Σ_i z_i'z_i)^{-1}(N^{-1} Σ_i z_i'u_i).

Apply the WLLN to each term in parentheses: N^{-1} Σ_i x_i'z_i →p E(x'z), etc.

Thus,

β̂_2SLS − β →p {E(x'z)[E(z'z)]^{-1}E(z'x)}^{-1} E(x'z)[E(z'z)]^{-1}E(z'u),

by Slutsky's theorem.

But by 2SLS 1, E(z'u) = 0.

Hence, β̂_2SLS →p β.
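To illustrate the theorem, here is a minimal Monte Carlo sketch reusing the hypothetical beta_2sls helper above (all data-generating parameters are illustrative choices, not from the notes); the slope estimate approaches its true value 2.0 as N grows, and the final check confirms numerically that β̂_2SLS coincides with second-stage OLS on the first-stage fitted values:

rng = np.random.default_rng(0)

for N in (100, 10_000, 1_000_000):
    z1 = rng.normal(size=N)               # outside instrument
    v = rng.normal(size=N)                # shock driving endogeneity
    u = 0.8 * v + rng.normal(size=N)      # u correlated with x, but E(z1*u) = 0
    x = 0.5 * z1 + v                      # first stage
    y = 1.0 + 2.0 * x + u                 # structural equation
    X = np.column_stack([np.ones(N), x])
    Z = np.column_stack([np.ones(N), z1])
    b = beta_2sls(y, X, Z)
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first-stage fitted values
    b2 = np.linalg.lstsq(Xhat, y, rcond=None)[0]      # second-stage OLS
    print(N, b, np.allclose(b, b2))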

Asymptotic Normality of 2SLS


Theorem: Under 2SLS 1 – 2SLS 3, √N(β̂_2SLS − β) is asymptotically normal with mean zero and variance–covariance matrix σ²{E(x'z)[E(z'z)]^{-1}E(z'x)}^{-1}.
Proof: x* = zΠ.

The sample counterpart is x̂_i = z_iΠ̂, where Π̂ is the OLS estimate of Π from the first-stage (reduced-form) regression. Note that Π̂_OLS is a consistent estimator of Π.

β̂_2SLS (Stage 2 OLS) = (Σ_i x̂_i'x̂_i)^{-1} Σ_i x̂_i'y_i

⇒ β̂_2SLS − β = (N^{-1} Σ_i x̂_i'x̂_i)^{-1} (N^{-1} Σ_i x̂_i'u_i),

using y_i = x_iβ + u_i and Σ_i x̂_i'x_i = Σ_i x̂_i'x̂_i (first-stage fitted values are orthogonal to first-stage residuals).
Now,

plim (N^{-1} Σ_i x̂_i'x̂_i)^{-1} = plim (N^{-1} Σ_i Π̂'z_i'z_iΠ̂)^{-1}
= [(plim Π̂)' (plim N^{-1} Σ_i z_i'z_i) (plim Π̂)]^{-1}
= [Π'E(z'z)Π]^{-1}, by the WLLN
= {E[(zΠ)'(zΠ)]}^{-1}
= [E(x*'x*)]^{-1}
= A^{-1}, say.

Thus, (N^{-1} Σ_i x̂_i'x̂_i)^{-1} − A^{-1} →p 0.

Hence, (N^{-1} Σ_i x̂_i'x̂_i)^{-1} = A^{-1} + o_p(1).

√N(β̂_2SLS − β) = [A^{-1} + o_p(1)] (N^{-1/2} Σ_i x̂_i'u_i).

Now consider N^{-1/2} Σ_i x̂_i'u_i. The CLT implies N^{-1/2} Σ_i x̂_i'u_i →d N(0, E(u²x*'x*)), i.e., N(0, B), say.

Assumption 2SLS 3: E(u²x*'x*) = σ²E(x*'x*), where E(u²) = σ².

So N^{-1/2} Σ_i x̂_i'u_i = O_p(1), by Lemma 5.

Hence,

√N(β̂_2SLS − β) = [A^{-1} + o_p(1)] (N^{-1/2} Σ_i x̂_i'u_i)
= A^{-1}(N^{-1/2} Σ_i x̂_i'u_i) + o_p(1)O_p(1)
= A^{-1}(N^{-1/2} Σ_i x̂_i'u_i) + o_p(1), by Lemma 2.

Thus, √N(β̂_2SLS − β) − A^{-1}(N^{-1/2} Σ_i x̂_i'u_i) →p 0.

Hence by the Asymptotic Equivalence Lemma, the asymptotic distribution of √N(β̂_2SLS − β) is the same as that of A^{-1}(N^{-1/2} Σ_i x̂_i'u_i).

   
But N^{-1/2} Σ_i x̂_i'u_i →d N(0, B) implies A^{-1}(N^{-1/2} Σ_i x̂_i'u_i) →d N(0, A^{-1}BA^{-1}).

Hence, √N(β̂_2SLS − β) ~a N(0, A^{-1}BA^{-1}).

Avar[√N(β̂_2SLS − β)] = A^{-1}BA^{-1}
= [E(x*'x*)]^{-1} σ²E(x*'x*) [E(x*'x*)]^{-1}
= σ²[E(x*'x*)]^{-1}
= σ²[Π'E(z'z)Π]^{-1}
= σ²{E(x'z)[E(z'z)]^{-1}[E(z'z)][E(z'z)]^{-1}E(z'x)}^{-1}
= σ²{E(x'z)[E(z'z)]^{-1}E(z'x)}^{-1}.


• To estimate this Avar[ N (β̂2SLS − β )], the matrix part may be
estimated using sample averages.

Estimation of σ² requires the following:

i. Define the 2SLS residuals as û_i = y_i − x_iβ̂_2SLS, i = 1, 2, ..., N. [Note that the 2SLS residuals are different from the second-stage residuals, which are y_i − x̂_iβ̂_2SLS.]

ii. A consistent (though not unbiased) estimator of σ² under 2SLS 1 – 2SLS 3 is

σ̂² ≡ (Σ_i û_i²)/(N − K).

• Finally, under assumptions 2SLS 1 – 2SLS 3, a valid estimator of the asymptotic variance of β̂_2SLS is

Âvar(β̂_2SLS) = σ̂² (Σ_{i=1}^N x̂_i'x̂_i)^{-1} = σ̂² (X̂'X̂)^{-1}.

• The asymptotic standard error of β̂_{j,2SLS} is the square root of the j-th diagonal element of Âvar(β̂_2SLS).
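A minimal sketch of this variance estimator, reusing the hypothetical beta_2sls helper above; as noted, σ̂² is computed from the 2SLS residuals y − Xβ̂, not from the second-stage residuals y − X̂β̂:

def se_2sls(y, X, Z):
    N, K = X.shape
    beta = beta_2sls(y, X, Z)
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]  # first-stage fitted values
    uhat = y - X @ beta                              # 2SLS residuals (use X, not Xhat)
    sigma2 = uhat @ uhat / (N - K)                   # sigma2_hat
    avar = sigma2 * np.linalg.inv(Xhat.T @ Xhat)     # sigma2_hat * (Xhat'Xhat)^{-1}
    return beta, np.sqrt(np.diag(avar))              # estimates and asymptotic s.e.'s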
Asymptotic Efficiency of 2SLS:

Theorem: Under 2SLS 1 – 2SLS 3, the 2SLS estimator is the most efficient in the class of all instrumental variables estimators using instruments linear in z.

Proof: Let β̃ be any IV estimator other than 2SLS that also uses instruments linear in z.

Let the instruments for β̃ be x̃ = zΓ, where Γ is a non-stochastic L × K matrix. Assume x̃ satisfies the rank condition.

Now, for β̂_2SLS, x* = zΠ, where Π = [E(z'z)]^{-1}E(z'x).


Under 2SLS 1 – 2SLS 3, Avar[√N(β̂_2SLS − β)] = σ²[E(x*'x*)]^{-1}, where x* = zΠ.
Again, β̃ = (N^{-1} Σ_i x̃_i'x_i)^{-1}(N^{-1} Σ_i x̃_i'y_i) (where x̃ is an instrument for the endogenous x – the usual IV estimator).

⇒ β̃ − β = (N^{-1} Σ_i x̃_i'x_i)^{-1}(N^{-1} Σ_i x̃_i'u_i).

Now, plim (N^{-1} Σ_i x̃_i'x_i) = E(x̃'x) = C, say.

√N(β̃ − β) = C^{-1}(N^{-1/2} Σ_i x̃_i'u_i) + o_p(1).

Also, N^{-1/2} Σ_i x̃_i'u_i →d N(0, D), where D = E(u²x̃'x̃) = σ²E(x̃'x̃).

⇒ C^{-1}(N^{-1/2} Σ_i x̃_i'u_i) →d N(0, C^{-1}D(C')^{-1}).


Hence Avar[√N(β̃ − β)] = σ²[E(x̃'x)]^{-1}E(x̃'x̃)[E(x'x̃)]^{-1}.
To show: E(x*'x*) − E(x'x̃)[E(x̃'x̃)]^{-1}E(x̃'x) is positive semidefinite.

Note: x = x* + r and E(z'r) = 0. So E(x̃'r) = 0.

Thus, x̃'x = x̃'x* + x̃'r

⇒ E[x̃'x] = E[x̃'x*].

So,

E(x*'x*) − E(x'x̃)[E(x̃'x̃)]^{-1}E(x̃'x)
= E(x*'x*) − E(x*'x̃)[E(x̃'x̃)]^{-1}E(x̃'x*) = E(s*'s*),

where s* = x* − L(x*|x̃) = x* − x̃[E(x̃'x̃)]^{-1}E(x̃'x*).


E(s∗0s∗) is p.s.d. Hence [E(x∗0x∗)] − [E(x̃0x)]−1E(x̃0x̃)[E(x0x̃)]−1 is
p.s.d.

Thus, Avar[√N(β̃ − β)] − Avar[√N(β̂_2SLS − β)] is p.s.d.

• When L = K (e.g., a single endogenous regressor with a single instrument), any nonsingular choice of Γ leads to the same IV estimator. Consequently, the above theorem is vacuous in that case (as the “class” is a singleton).

• When x is exogenous, the above theorem implies that under 2SLS 1 – 2SLS 3, OLS is the most efficient in the class of estimators using instruments linear in z. Note that x ⊂ z and hence L(x|z) = x.
• Asymptotically, the above theorem implies that we always do better by using as many instruments as possible, at least under homoskedasticity [2SLS 3]. However, it should be noted that 2SLS estimators based on many overidentifying restrictions (L ≫ K) can cause finite-sample problems.

Heteroskedasticity-Robust Inference with 2SLS: If we relax assumption 2SLS 3, we need a variance matrix estimator that is robust in the presence of heteroskedasticity of unknown form.

Under 2SLS 1 and 2SLS 2,

Âvar(β̂_2SLS) = (X̂'X̂)^{-1} (Σ_{i=1}^N û_i² x̂_i'x̂_i) (X̂'X̂)^{-1},

where û_i is the 2SLS residual described above.
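A minimal sketch of this robust estimator under the same hypothetical names: the (X̂'X̂)^{-1} "bread" sandwiches the Σ_i û_i² x̂_i'x̂_i "meat":

def robust_avar_2sls(y, X, Z):
    beta = beta_2sls(y, X, Z)
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first-stage fitted values
    uhat = y - X @ beta                               # 2SLS residuals
    bread = np.linalg.inv(Xhat.T @ Xhat)
    meat = Xhat.T @ (Xhat * (uhat ** 2)[:, None])     # sum_i uhat_i^2 * xhat_i' xhat_i
    return bread @ meat @ bread                       # robust Avar-hat(beta_hat_2SLS)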
Hypothesis Testing with 2SLS:

• Hypotheses involving a single linear restriction on the β_j's can be tested using the asymptotic t-statistic.

• To test multiple linear restrictions of the form H_0: Rβ = r, the Wald statistic is used, as in the OLS case, with V̂ replaced by Âvar(β̂_2SLS), i.e., σ̂²(X̂'X̂)^{-1}.

The Wald statistic, as before, has a limiting χ²_Q distribution (Q = number of linear restrictions being tested).
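A minimal sketch of the Wald statistic under the hypothetical names used above, where V is Âvar(β̂_2SLS) (either σ̂²(X̂'X̂)^{-1} or its robust version); the result is compared to a χ²_Q critical value:

def wald_stat(beta, V, R, r):
    diff = R @ beta - r                               # R beta_hat - r
    return diff @ np.linalg.solve(R @ V @ R.T, diff)  # (Rb - r)'[R V R']^{-1}(Rb - r)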

• Another way of testing multiple restrictions is the following.

y = x_1β_1 + x_2β_2 + u,

where x_1 (1 × K_1) and x_2 (1 × K_2) are partitions of the regressors with K_1 + K_2 = K.

We are testing the K_2 restrictions in the form of exclusion restrictions, i.e., H_0: β_2 = 0 against H_1: β_2 ≠ 0.

Both x_1 and x_2 may contain endogenous and exogenous variables. Let z denote the 1 × L vector of instruments, with L ≥ K_1 + K_2, and let the rank condition hold.

Let û_i be the 2SLS residuals from estimating the unrestricted model using z_i as instruments. Define the 2SLS unrestricted residual sum of squares as

SSR_UR = Σ_{i=1}^N û_i².
Let x̂_i1 be the 1 × K_1 fitted values from the first-stage regression of x_i1 on z_i. Similarly, let x̂_i2 be the 1 × K_2 fitted values from the first-stage regression of x_i2 on z_i.

SSR̂_UR is defined as the sum of squared residuals from the unrestricted second-stage regression of y on x̂_1, x̂_2.

SSR̂_R is the sum of squared residuals from the restricted second-stage regression of y on x̂_1 alone (setting β_2 = 0).

Under H_0: β_2 = 0 and 2SLS 1 – 2SLS 3,

F = [(SSR̂_R − SSR̂_UR)/K_2] / [SSR_UR/(N − K)] ~ F_{K_2, N−K}.
Whereas computer packages usually report SSR_UR by default, SSR̂_R and SSR̂_UR need to be computed directly from the second-stage regressions.
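A minimal sketch of this F statistic under the hypothetical names used above, with X1 and X2 the N × K_1 and N × K_2 regressor blocks; note that only the denominator uses the 2SLS residuals, while the numerator uses the second-stage SSRs:

def f_stat_2sls(y, X1, X2, Z):
    N = len(y)
    X = np.column_stack([X1, X2])
    K, K2 = X.shape[1], X2.shape[1]
    ssr_ur = np.sum((y - X @ beta_2sls(y, X, Z)) ** 2)     # SSR_UR, 2SLS residuals
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]        # first-stage fitted values
    X1hat = Xhat[:, :X1.shape[1]]
    resid_ur = y - Xhat @ np.linalg.lstsq(Xhat, y, rcond=None)[0]    # unrestricted stage 2
    resid_r = y - X1hat @ np.linalg.lstsq(X1hat, y, rcond=None)[0]   # restricted stage 2
    ssr2_ur, ssr2_r = resid_ur @ resid_ur, resid_r @ resid_r
    return ((ssr2_r - ssr2_ur) / K2) / (ssr_ur / (N - K))  # compare to F(K2, N-K)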
