Assumptions (A)

- $E(u_i) = 0$ for all $i$
- $\mathrm{Var}(u_i) = \sigma^2$ for all $i$ (homoscedasticity)
- $\mathrm{Cov}(u_i, u_j) = 0$ for all $i \neq j$
- $x_i$ is a nonstochastic constant
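These moment conditions can be illustrated with a small Monte Carlo sketch. The sample size, $\sigma = 2$, and the normal error distribution below are made-up choices for illustration; Assumptions (A) do not require normality.

```python
# Monte Carlo sketch of Assumptions (A): errors drawn independently with
# mean 0 and common variance sigma^2. Normality and sigma = 2 are
# illustrative assumptions only.
import random
import statistics

random.seed(0)
sigma = 2.0
draws = [random.gauss(0.0, sigma) for _ in range(200_000)]

print(abs(statistics.fmean(draws)) < 0.05)                 # True: E(u_i) = 0, approximately
print(abs(statistics.pvariance(draws) - sigma**2) < 0.1)   # True: Var(u_i) = sigma^2, approximately
```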
The textbook uses Assumptions (B) (see p. 588 of the text):

Assumptions (B)

- $E(u_i \mid x_1, \ldots, x_n) = 0$ for all $i$
- $\mathrm{Var}(u_i \mid x_1, \ldots, x_n) = \sigma^2$ for all $i$ (homoscedasticity)
- $\mathrm{Cov}(u_i, u_j \mid x_1, \ldots, x_n) = 0$ for all $i \neq j$
If we use Assumptions (B), we need to use the law of iterated expectations in proving the BLUE property. With Assumptions (B), the BLUE property is given conditionally on $x_1, \ldots, x_n$.
Let us use Assumptions (A). The Gauss-Markov Theorem is stated in
the boxed statement below:
Gauss-Markov Theorem

Under Assumptions (A), the OLS estimators $\hat\beta_1$ and $\hat\beta_2$ are the Best Linear Unbiased Estimators (BLUE), that is:

1. Unbiased: $E\hat\beta_1 = \beta_1$ and $E\hat\beta_2 = \beta_2$.
2. Best: $\hat\beta_1$ and $\hat\beta_2$ have the smallest variances among the class of all linear unbiased estimators.
Real data seldom satisfy Assumptions (A) or Assumptions (B). Accordingly, we might think that the Gauss-Markov theorem holds only in never-never land. However, it is important to understand the Gauss-Markov theorem on two grounds:

1. We may treat the world of the Gauss-Markov theorem as equivalent to the world of perfect competition in microeconomic theory.
2. The mathematical exercises are good for your souls.
We shall prove the Gauss-Markov theorem using the simple regression model of equation (1). We can also prove the Gauss-Markov theorem using the multiple regression model

$$y_i = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + u_i, \quad i = 1, \ldots, n. \qquad (2)$$

To do so, however, we need to use vector and matrix language (linear algebra). Actually, once you learn linear algebra, the proof of the Gauss-Markov theorem is far more straightforward than the proof for the simple regression model of (1).
In the textbook the Gauss-Markov theorem is discussed on the following pages:

- 127
- 590-591
- 604-605
- 622-623
- 641-642

You should take a look at these pages.
Proving the Gauss-Markov Theorem

The unbiasedness of $\hat\beta_1$ and of $\hat\beta_2$ was shown in the Comments on the Midterm Examination and in the answers to Assignment #5, so we prove here the minimum-variance property. There are generally two ways to prove bestness: (i) using linear algebra, and (ii) using calculus. We prove bestness using linear algebra first, and we leave the proof using calculus to the Appendix. First we prove that $\hat\beta_1$ has the smallest variance among all linear unbiased estimators of $\beta_1$.
Proof that $\hat\beta_1$ is best

We need to re-express $\hat\beta_1$ first:

$$\hat\beta_1 = \bar y - \hat\beta_2 \bar x = \frac{1}{n}\sum y_i - \frac{\sum (x_i - \bar x) y_i}{s_{xx}}\,\bar x = \sum_{i=1}^n \left[\frac{1}{n} - \frac{(x_i - \bar x)\bar x}{s_{xx}}\right] y_i = \sum_{i=1}^n w_i y_i,$$

where $s_{xx} = \sum x_i^2 - n\bar x^2$ and $w_i = \dfrac{1}{n} - \dfrac{(x_i - \bar x)\bar x}{s_{xx}}$.
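As a quick numerical check of this re-expression, the two formulas for $\hat\beta_1$ can be compared on made-up data (the $x$ and $y$ values below are arbitrary illustrations):

```python
# Check that beta1_hat = ybar - beta2_hat*xbar equals sum(w_i * y_i)
# with w_i = 1/n - (x_i - xbar)*xbar/s_xx. Data are made up.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
s_xx = sum(xi**2 for xi in x) - n * xbar**2

beta2_hat = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / s_xx
beta1_hat = ybar - beta2_hat * xbar

# OLS intercept rewritten as a linear combination of the y_i:
w = [1.0 / n - (xi - xbar) * xbar / s_xx for xi in x]
print(abs(beta1_hat - sum(wi * yi for wi, yi in zip(w, y))) < 1e-12)  # True
```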
The BLUE property only looks at linear estimators of $\beta_1$. The linear estimators are defined by

$$\tilde\beta_1 = \sum_{i=1}^n a_i y_i.$$

In passing we notice that if $a_i = w_i$ for all $i = 1, \ldots, n$, then $\tilde\beta_1 = \hat\beta_1$.
We have to make $\tilde\beta_1$ unbiased. To take the expectation of $\tilde\beta_1$ we first substitute equation (1), $y_i = \beta_1 + \beta_2 x_i + u_i$, for $y_i$:

$$\tilde\beta_1 = \sum_{i=1}^n a_i y_i = \sum_{i=1}^n a_i(\beta_1 + \beta_2 x_i + u_i) = \beta_1 \sum a_i + \beta_2 \sum a_i x_i + \sum a_i u_i.$$

Hence

$$E\tilde\beta_1 = \beta_1 \sum a_i + \beta_2 \sum a_i x_i + \sum a_i E u_i = \beta_1 \sum a_i + \beta_2 \sum a_i x_i,$$
since $Eu_i = 0$ for all $i$. We see that

$$E\tilde\beta_1 = \beta_1 \iff \sum a_i = 1 \text{ and } \sum a_i x_i = 0.$$

($\iff$ means "if and only if.")
We take the variance of $\tilde\beta_1$:

$$\begin{aligned}
\mathrm{Var}(\tilde\beta_1) &\equiv E(\tilde\beta_1 - E\tilde\beta_1)^2 = E(\tilde\beta_1 - \beta_1)^2 \qquad \text{since } E\tilde\beta_1 = \beta_1 \\
&= E\Big(\sum a_i u_i\Big)^2 \\
&= a_1^2 Eu_1^2 + \cdots + a_n^2 Eu_n^2 + 2a_1 a_2 Eu_1 u_2 + \cdots + 2a_{n-1} a_n Eu_{n-1} u_n \\
&= \sigma^2(a_1^2 + \cdots + a_n^2) = \sigma^2 \sum a_i^2,
\end{aligned}$$
since $Eu_i^2 = \sigma^2$ and $Eu_i u_j = 0$ for $i \neq j$. The variance of the OLS estimator, $\mathrm{Var}(\hat\beta_1)$, is given by

$$\mathrm{Var}(\hat\beta_1) = \sigma^2 \sum w_i^2.$$
We see

$$\mathrm{Var}(\tilde\beta_1) \ge \mathrm{Var}(\hat\beta_1) \iff \sum_{i=1}^n a_i^2 \ge \sum_{i=1}^n w_i^2.$$
Since $a_i$ is an arbitrary nonstochastic constant, we can rewrite $a_i$ as

$$a_i = w_i + d_i.$$
Earlier we saw that $\tilde\beta_1$ is unbiased if and only if $\sum a_i = 1$ and $\sum a_i x_i = 0$. So,

$$\sum a_i = \sum w_i + \sum d_i = 1, \qquad \sum a_i x_i = \sum w_i x_i + \sum d_i x_i = 0.$$
But

$$\sum w_i = \sum \left[\frac{1}{n} - \frac{(x_i - \bar x)\bar x}{s_{xx}}\right] = 1 - \frac{\bar x}{s_{xx}} \sum (x_i - \bar x) = 1,$$

$$\sum w_i x_i = \sum \left[\frac{1}{n} - \frac{(x_i - \bar x)\bar x}{s_{xx}}\right] x_i = \frac{1}{n}\sum x_i - \frac{\sum x_i^2 - n\bar x^2}{s_{xx}}\,\bar x = \bar x - \bar x = 0.$$
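The two weight identities are easy to confirm numerically on arbitrary (made-up) regressor values:

```python
# Verify the weight identities sum(w_i) = 1 and sum(w_i * x_i) = 0
# for arbitrary made-up regressor values.
x = [0.3, 1.7, 2.2, 4.9, 5.5, 8.0]
n = len(x)
xbar = sum(x) / n
s_xx = sum(xi**2 for xi in x) - n * xbar**2
w = [1.0 / n - (xi - xbar) * xbar / s_xx for xi in x]

print(abs(sum(w) - 1.0) < 1e-12)                           # True
print(abs(sum(wi * xi for wi, xi in zip(w, x))) < 1e-12)   # True
```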
Hence

$$\sum d_i = 0 \quad \text{and} \quad \sum d_i x_i = 0.$$
We square $a_i$ and sum with respect to $i = 1, \ldots, n$:

$$\sum a_i^2 = \sum (w_i + d_i)^2 = \sum w_i^2 + \sum d_i^2 + 2\sum w_i d_i = \sum w_i^2 + \sum d_i^2,$$
since the cross-product term is zero:

$$\sum w_i d_i = \sum \left[\frac{1}{n} - \frac{(x_i - \bar x)\bar x}{s_{xx}}\right] d_i = \frac{1}{n}\sum d_i - \frac{\bar x}{s_{xx}}\Big(\sum d_i x_i - \bar x \sum d_i\Big) = 0.$$
Hence

$$\sum a_i^2 = \sum w_i^2 + \sum d_i^2 \ge \sum w_i^2,$$

and this concludes the proof.
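The inequality can be illustrated numerically: take any nonzero $d$ orthogonal to $(1, \ldots, 1)$ and $(x_1, \ldots, x_n)$, so that $a_i = w_i + d_i$ stays unbiased, and compare the sums of squares. The $d$ vector below was built by hand for these made-up $x$ values:

```python
# For any unbiased a_i = w_i + d_i with sum(d_i) = 0 and sum(d_i * x_i) = 0,
# sum(a_i^2) exceeds sum(w_i^2). Data and d are made up.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(x)
xbar = sum(x) / n
s_xx = sum(xi**2 for xi in x) - n * xbar**2
w = [1.0 / n - (xi - xbar) * xbar / s_xx for xi in x]

# Hand-built nonzero d satisfying both constraints for these x values:
d = [1.0, -2.0, 0.0, 2.0, -1.0]
print(abs(sum(d)) < 1e-12)                                  # True: sum(d_i) = 0
print(abs(sum(di * xi for di, xi in zip(d, x))) < 1e-12)    # True: sum(d_i x_i) = 0

a = [wi + di for wi, di in zip(w, d)]
print(sum(ai**2 for ai in a) > sum(wi**2 for wi in w))      # True
```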
Proof that $\hat\beta_2$ is best

We re-express $\hat\beta_2$:

$$\hat\beta_2 = \frac{\sum (x_i - \bar x) y_i}{s_{xx}} = \sum \left[\frac{x_i - \bar x}{s_{xx}}\right] y_i = \sum v_i y_i, \qquad \text{where } v_i = \frac{x_i - \bar x}{s_{xx}}.$$

We shall use the facts that $\sum v_i = 0$ and $\sum v_i^2 = 1/s_{xx}$.
The variance of $\hat\beta_2$, $\mathrm{Var}(\hat\beta_2)$, is given by

$$\mathrm{Var}(\hat\beta_2) = \sigma^2 \sum v_i^2.$$
Let $\tilde\beta_2$ be a linear estimator of $\beta_2$:

$$\tilde\beta_2 = \sum b_i y_i.$$
We need to find the conditions that make $\tilde\beta_2$ unbiased. Taking the expectation we have

$$E\tilde\beta_2 = E\sum b_i(\beta_1 + \beta_2 x_i + u_i) = \beta_1 \sum b_i + \beta_2 \sum b_i x_i,$$

and thus

$$E(\tilde\beta_2) = \beta_2 \iff \sum b_i = 0 \text{ and } \sum b_i x_i = 1.$$
The variance of $\tilde\beta_2$, $\mathrm{Var}(\tilde\beta_2)$, is

$$\mathrm{Var}(\tilde\beta_2) = \sigma^2 \sum b_i^2.$$
Let

$$b_i = v_i + c_i;$$
then

$$\sum b_i = \sum v_i + \sum c_i = \sum c_i = 0,$$

$$\sum b_i x_i = \sum v_i x_i + \sum c_i x_i = 1 + \sum c_i x_i = 1 \implies \sum c_i x_i = 0,$$
since

$$\sum v_i x_i = \frac{1}{s_{xx}} \sum (x_i - \bar x) x_i = \frac{\sum x_i^2 - n\bar x^2}{s_{xx}} = 1.$$
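The three facts about $v_i$ used in this proof can be checked on made-up regressor values:

```python
# Verify sum(v_i) = 0, sum(v_i * x_i) = 1, and sum(v_i^2) = 1/s_xx
# for made-up regressor values.
x = [0.5, 1.5, 2.0, 3.5, 4.0]
n = len(x)
xbar = sum(x) / n
s_xx = sum(xi**2 for xi in x) - n * xbar**2
v = [(xi - xbar) / s_xx for xi in x]

print(abs(sum(v)) < 1e-12)                                       # True
print(abs(sum(vi * xi for vi, xi in zip(v, x)) - 1.0) < 1e-12)   # True
print(abs(sum(vi**2 for vi in v) - 1.0 / s_xx) < 1e-12)          # True
```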
So the variance of $\tilde\beta_2$ becomes

$$\mathrm{Var}(\tilde\beta_2) = \sigma^2 \sum b_i^2 = \sigma^2 \sum (v_i + c_i)^2 = \sigma^2\Big(\sum v_i^2 + \sum c_i^2 + 2\sum v_i c_i\Big) = \sigma^2 \sum v_i^2 + \sigma^2 \sum c_i^2 \ge \mathrm{Var}(\hat\beta_2),$$
since

$$\sum v_i c_i = \frac{1}{s_{xx}} \sum (x_i - \bar x) c_i = \frac{1}{s_{xx}}\Big(\sum x_i c_i - \bar x \sum c_i\Big) = 0.$$
Appendix: Proving Bestness Using Calculus

Another way to prove that the OLS estimators $\hat\beta_1$ and $\hat\beta_2$ are best is to use calculus to find the minimum variance. Since the variance is a quadratic function, it is twice differentiable, and thus we may use calculus to find the minimum.
Proving that $\hat\beta_1$ is best

The variance of a linear unbiased estimator is given by $\sigma^2 \sum a_i^2$ with two linear constraints, $\sum a_i = 1$ and $\sum a_i x_i = 0$. Hence we may form the following minimization problem subject to the linear constraints:

$$\min_{a_1, \ldots, a_n} \; \sigma^2 \sum a_i^2 \quad \text{subject to} \quad \begin{cases} \sum a_i = 1 \\ \sum a_i x_i = 0 \end{cases}$$
We form the Lagrangian

$$\mathcal{L} = \sigma^2 \sum a_i^2 - \lambda_1\Big(\sum a_i - 1\Big) - \lambda_2 \sum a_i x_i.$$
The first-order conditions are

$$\begin{aligned}
\frac{\partial \mathcal{L}}{\partial a_1} &= 2\sigma^2 a_1 - \lambda_1 - x_1 \lambda_2 = 0 & (1) \\
\frac{\partial \mathcal{L}}{\partial a_2} &= 2\sigma^2 a_2 - \lambda_1 - x_2 \lambda_2 = 0 & (2) \\
&\;\;\vdots \\
\frac{\partial \mathcal{L}}{\partial a_n} &= 2\sigma^2 a_n - \lambda_1 - x_n \lambda_2 = 0 & (n) \\
\frac{\partial \mathcal{L}}{\partial \lambda_1} &= -\sum a_i + 1 = 0 & (n+1) \\
\frac{\partial \mathcal{L}}{\partial \lambda_2} &= -\sum a_i x_i = 0 & (n+2)
\end{aligned}$$
Adding the left-hand and right-hand sides of equations (1)-(n), we have

$$2\sigma^2 \sum a_i - n\lambda_1 - \lambda_2 \sum x_i = 0.$$

Since $\sum a_i = 1$,

$$2\sigma^2 - n\lambda_1 - \lambda_2 n\bar x = 0. \qquad (*)$$
Multiplying the left-hand and right-hand sides of equations (1)-(n) by $x_1, x_2, \ldots, x_n$ respectively and adding up, we have

$$2\sigma^2 \sum a_i x_i - \lambda_1 \sum x_i - \lambda_2 \sum x_i^2 = 0.$$

Since $\sum a_i x_i = 0$, we have

$$n\bar x \lambda_1 + \lambda_2 \sum x_i^2 = 0. \qquad (**)$$
Equations (*) and (**) form a linear equation system in $\lambda_1$ and $\lambda_2$:

$$\begin{cases} n\lambda_1 + \lambda_2 n\bar x = 2\sigma^2 \\ n\bar x \lambda_1 + \lambda_2 \sum x_i^2 = 0 \end{cases}$$
Solving for $\lambda_1$ and for $\lambda_2$, we have

$$\lambda_1 = \frac{2\sigma^2}{n s_{xx}} \sum x_i^2, \quad \text{and} \quad \lambda_2 = -\frac{2\sigma^2 \bar x}{s_{xx}}.$$
From equations (1)-(n) we have

$$2\sigma^2 a_i = \lambda_1 + \lambda_2 x_i, \quad i = 1, \ldots, n.$$
Substituting for $\lambda_1$ and for $\lambda_2$, we obtain

$$2\sigma^2 a_i = \frac{2\sigma^2}{n s_{xx}} \sum x_i^2 - \frac{2\sigma^2 \bar x x_i}{s_{xx}}, \quad i = 1, \ldots, n,$$
or

$$a_i = \frac{\sum x_i^2}{n s_{xx}} - \frac{\bar x x_i}{s_{xx}} = \frac{s_{xx} + n\bar x^2}{n s_{xx}} - \frac{\bar x x_i}{s_{xx}} \qquad \text{since } \sum x_i^2 = s_{xx} + n\bar x^2,$$

$$= \frac{1}{n} - \frac{(x_i - \bar x)\bar x}{s_{xx}} = w_i, \quad i = 1, \ldots, n.$$

Here $w_i$ is the weight of the OLS estimator of $\beta_1$, $\hat\beta_1$.
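A small numerical sketch (the $x$ values and $\sigma^2$ below are made up) confirms that the multipliers derived above reproduce the OLS weights:

```python
# Plug the derived multipliers lambda1 and lambda2 into the first-order
# conditions and confirm a_i = (lambda1 + lambda2*x_i)/(2*sigma^2)
# equals the OLS weight w_i. Data and sigma^2 are made up.
sigma2 = 2.0
x = [1.0, 3.0, 4.0, 7.0]
n = len(x)
xbar = sum(x) / n
s_xx = sum(xi**2 for xi in x) - n * xbar**2

lam1 = 2 * sigma2 * sum(xi**2 for xi in x) / (n * s_xx)
lam2 = -2 * sigma2 * xbar / s_xx

a = [(lam1 + lam2 * xi) / (2 * sigma2) for xi in x]
w = [1.0 / n - (xi - xbar) * xbar / s_xx for xi in x]
print(all(abs(ai - wi) < 1e-12 for ai, wi in zip(a, w)))  # True
```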
The second-order conditions are

$$\frac{\partial^2 \mathcal{L}}{\partial a_1^2} = 2\sigma^2, \;\ldots,\; \frac{\partial^2 \mathcal{L}}{\partial a_n^2} = 2\sigma^2, \qquad \frac{\partial^2 \mathcal{L}}{\partial \lambda_1^2} = 0, \qquad \frac{\partial^2 \mathcal{L}}{\partial \lambda_2^2} = 0,$$

and the cross-derivatives are

$$\frac{\partial^2 \mathcal{L}}{\partial a_1 \partial a_2} = 0, \;\ldots,\; \frac{\partial^2 \mathcal{L}}{\partial a_n \partial a_{n-1}} = 0,$$

$$\frac{\partial^2 \mathcal{L}}{\partial \lambda_1 \partial a_i} = -1, \quad \frac{\partial^2 \mathcal{L}}{\partial \lambda_2 \partial a_i} = -x_i, \quad i = 1, \ldots, n.$$
Hence the bordered Hessian becomes

$$H = \begin{pmatrix}
\dfrac{\partial^2 \mathcal{L}}{\partial a_1^2} & \cdots & \dfrac{\partial^2 \mathcal{L}}{\partial a_1 \partial a_n} & \dfrac{\partial^2 \mathcal{L}}{\partial a_1 \partial \lambda_1} & \dfrac{\partial^2 \mathcal{L}}{\partial a_1 \partial \lambda_2} \\
\vdots & \ddots & \vdots & \vdots & \vdots \\
\dfrac{\partial^2 \mathcal{L}}{\partial a_n \partial a_1} & \cdots & \dfrac{\partial^2 \mathcal{L}}{\partial a_n^2} & \dfrac{\partial^2 \mathcal{L}}{\partial a_n \partial \lambda_1} & \dfrac{\partial^2 \mathcal{L}}{\partial a_n \partial \lambda_2} \\
\dfrac{\partial^2 \mathcal{L}}{\partial \lambda_1 \partial a_1} & \cdots & \dfrac{\partial^2 \mathcal{L}}{\partial \lambda_1 \partial a_n} & \dfrac{\partial^2 \mathcal{L}}{\partial \lambda_1^2} & \dfrac{\partial^2 \mathcal{L}}{\partial \lambda_1 \partial \lambda_2} \\
\dfrac{\partial^2 \mathcal{L}}{\partial \lambda_2 \partial a_1} & \cdots & \dfrac{\partial^2 \mathcal{L}}{\partial \lambda_2 \partial a_n} & \dfrac{\partial^2 \mathcal{L}}{\partial \lambda_2 \partial \lambda_1} & \dfrac{\partial^2 \mathcal{L}}{\partial \lambda_2^2}
\end{pmatrix}.$$

This becomes

$$H = \begin{pmatrix}
2\sigma^2 & 0 & \cdots & 0 & -1 & -x_1 \\
0 & 2\sigma^2 & \cdots & 0 & -1 & -x_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & 2\sigma^2 & -1 & -x_n \\
-1 & -1 & \cdots & -1 & 0 & 0 \\
-x_1 & -x_2 & \cdots & -x_n & 0 & 0
\end{pmatrix},$$
and it can be shown that $H$ satisfies the second-order conditions for a constrained minimum; hence the solutions $a_1 = w_1, a_2 = w_2, \ldots, a_n = w_n$ yield the minimum variance.
Proving that $\hat\beta_2$ is best

The constrained minimization problem becomes

$$\min_{b_1, \ldots, b_n} \; \sigma^2 \sum b_i^2 \quad \text{subject to} \quad \begin{cases} \sum b_i = 0 \\ \sum b_i x_i = 1 \end{cases}$$
The first-order conditions are

$$\begin{aligned}
\frac{\partial \mathcal{L}}{\partial b_1} &= 2\sigma^2 b_1 - \lambda_1 - x_1 \lambda_2 = 0 & (1) \\
\frac{\partial \mathcal{L}}{\partial b_2} &= 2\sigma^2 b_2 - \lambda_1 - x_2 \lambda_2 = 0 & (2) \\
&\;\;\vdots \\
\frac{\partial \mathcal{L}}{\partial b_n} &= 2\sigma^2 b_n - \lambda_1 - x_n \lambda_2 = 0 & (n) \\
\frac{\partial \mathcal{L}}{\partial \lambda_1} &= -\sum b_i = 0 & (n+1) \\
\frac{\partial \mathcal{L}}{\partial \lambda_2} &= -\sum b_i x_i + 1 = 0 & (n+2)
\end{aligned}$$
We proceed just in the same way as we did before and obtain

$$\begin{cases} \lambda_1 + \bar x \lambda_2 = 0 \\ n\bar x \lambda_1 + \Big(\sum x_i^2\Big)\lambda_2 = 2\sigma^2 \end{cases}$$
Solving for $\lambda_1$ and for $\lambda_2$, we obtain

$$\lambda_1 = -\frac{2\sigma^2 \bar x}{s_{xx}}, \qquad \lambda_2 = \frac{2\sigma^2}{s_{xx}}.$$
Substituting for $\lambda_1$ and for $\lambda_2$, we obtain

$$2\sigma^2 b_i = \lambda_1 + \lambda_2 x_i = -\frac{2\sigma^2 \bar x}{s_{xx}} + \frac{2\sigma^2 x_i}{s_{xx}} = 2\sigma^2 \left[\frac{x_i - \bar x}{s_{xx}}\right].$$
Hence

$$b_i = \frac{x_i - \bar x}{s_{xx}} = v_i, \quad i = 1, \ldots, n.$$
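The same kind of numerical sketch works for the slope problem (again with made-up $x$ values and $\sigma^2$):

```python
# With lambda1 = -2*sigma^2*xbar/s_xx and lambda2 = 2*sigma^2/s_xx,
# b_i = (lambda1 + lambda2*x_i)/(2*sigma^2) should equal
# v_i = (x_i - xbar)/s_xx. Data and sigma^2 are made up.
sigma2 = 1.5
x = [2.0, 4.0, 5.0, 9.0]
n = len(x)
xbar = sum(x) / n
s_xx = sum(xi**2 for xi in x) - n * xbar**2

lam1 = -2 * sigma2 * xbar / s_xx
lam2 = 2 * sigma2 / s_xx

b = [(lam1 + lam2 * xi) / (2 * sigma2) for xi in x]
v = [(xi - xbar) / s_xx for xi in x]
print(all(abs(bi - vi) < 1e-12 for bi, vi in zip(b, v)))  # True
```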
The second-order conditions are obtained in a similar way, and the bordered Hessian again satisfies the second-order conditions for a constrained minimum.