
Gauss-Markov Theorem

The Gauss-Markov Theorem is stated for the following regression model and assumptions.

The regression model:

    $y_i = \beta_1 + \beta_2 x_i + u_i, \qquad i = 1, \ldots, n \qquad (1)$
Assumptions (A) or Assumptions (B):
Assumptions (A)
  (i)   $E(u_i) = 0$ for all $i$;
  (ii)  $\mathrm{Var}(u_i) = \sigma^2$ for all $i$ (homoscedasticity);
  (iii) $\mathrm{Cov}(u_i, u_j) = 0$ for all $i \neq j$;
  (iv)  $x_i$ is a nonstochastic constant.
The textbook uses Assumptions (B) (see p. 588 of the text):

Assumptions (B)
  (i)   $E(u_i \mid x_1, \ldots, x_n) = 0$ for all $i$;
  (ii)  $\mathrm{Var}(u_i \mid x_1, \ldots, x_n) = \sigma^2$ for all $i$ (homoscedasticity);
  (iii) $\mathrm{Cov}(u_i, u_j \mid x_1, \ldots, x_n) = 0$ for all $i \neq j$.
If we use Assumptions (B), we need to use the law of iterated expectations in proving the BLUE. With Assumptions (B), the BLUE is given conditionally on $x_1, \ldots, x_n$.
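To make the setting concrete, the following minimal Python sketch (an illustration added here, not part of the textbook treatment) generates data from model (1) under Assumptions (A), with fixed $x_i$ and uncorrelated, homoscedastic errors, and computes the OLS estimates from the usual formulas; the parameter values are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    beta1, beta2, sigma = 2.0, 0.5, 1.0   # illustrative true values
    x = np.linspace(0, 10, n)             # nonstochastic regressor, held fixed
    u = rng.normal(0.0, sigma, size=n)    # E u_i = 0, Var(u_i) = sigma^2, uncorrelated
    y = beta1 + beta2 * x + u             # model (1)

    s_xx = np.sum(x**2) - n * x.mean()**2         # s_xx = sum x_i^2 - n xbar^2
    b2_hat = np.sum((x - x.mean()) * y) / s_xx    # OLS slope
    b1_hat = y.mean() - b2_hat * x.mean()         # OLS intercept
    print(b1_hat, b2_hat)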
Let us use Assumptions (A). The Gauss-Markov Theorem is stated in
the boxed statement below:
Gauss-Markov Theorem
Under Assumptions (A), the OLS estimators $\hat\beta_1$ and $\hat\beta_2$ are the Best Linear Unbiased Estimators (BLUE), that is:
1. Unbiased: $E\hat\beta_1 = \beta_1$ and $E\hat\beta_2 = \beta_2$.
2. Best: $\hat\beta_1$ and $\hat\beta_2$ have the smallest variances among the class of all linear unbiased estimators.
Real data seldom satisfy Assumptions (A) or Assumptions (B). Accordingly, we might be tempted to think that the Gauss-Markov theorem only holds in never-never land. However, it is important to understand the Gauss-Markov theorem on two grounds:
1. We may treat the world of the Gauss-Markov theorem as equivalent to the world of perfect competition in microeconomic theory.
2. The mathematical exercises are good for your souls.
We shall prove the Gauss-Markov theorem using the simple regression model of equation (1). We can also prove the Gauss-Markov theorem using the multiple regression model

    $y_i = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + u_i, \qquad i = 1, \ldots, n. \qquad (2)$

To do so, however, we need to use vector and matrix language (linear algebra). Actually, once you learn linear algebra, the proof of the Gauss-Markov theorem is far more straightforward than the proof for the simple regression model of (1).
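For reference, model (2) and its OLS estimator are usually written in matrix form; the display below is the standard notation and is not developed further in this note:

    $y = X\beta + u, \qquad E(u) = 0, \qquad \mathrm{Var}(u) = \sigma^2 I_n, \qquad \hat\beta = (X'X)^{-1}X'y,$

where $y$ is the $n \times 1$ vector of observations, $X$ is the $n \times k$ matrix of regressors (first column of ones), and the Gauss-Markov argument shows that $\hat\beta$ is BLUE in this setting as well.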
In the textbook the Gauss-Markov theorem is discussed on the following pages: 127, 590-591, 604-605, 622-623, and 641-642. You should take a look at these pages.
Proving the Gauss-Markov Theorem
The unbiasedness of $\hat\beta_1$ and of $\hat\beta_2$ is shown in the Comments on the Midterm Examination and in the answers to Assignment #5. So, we prove here the minimum variance properties. There are generally two ways to prove bestness: (i) using linear algebra, and (ii) using calculus. We prove bestness using linear algebra first, and we leave the proof using calculus to the Appendix. First we prove that $\hat\beta_1$ has the smallest variance among all linear unbiased estimators of $\beta_1$.
Proof that $\hat\beta_1$ is best

We need to re-express $\hat\beta_1$ first:

    $\hat\beta_1 = \bar y - \hat\beta_2 \bar x
       = \frac{1}{n}\sum y_i - \frac{\sum (x_i - \bar x) y_i}{s_{xx}}\,\bar x
       = \sum_{i=1}^{n} \left[ \frac{1}{n} - \frac{(x_i - \bar x)\,\bar x}{s_{xx}} \right] y_i
       = \sum w_i y_i,$

where

    $s_{xx} = \sum x_i^2 - n\bar x^2
     \quad\text{and}\quad
     w_i = \frac{1}{n} - \frac{(x_i - \bar x)\,\bar x}{s_{xx}}.$
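As a quick numerical check of this re-expression, the sketch below (continuing the simulation snippet above) verifies that $\sum w_i y_i$ reproduces $\hat\beta_1$ and that the weights satisfy $\sum w_i = 1$ and $\sum w_i x_i = 0$:

    w = 1.0 / n - (x - x.mean()) * x.mean() / s_xx   # w_i = 1/n - (x_i - xbar) xbar / s_xx
    print(np.allclose(np.sum(w * y), b1_hat))        # beta_1 hat equals sum w_i y_i
    print(np.isclose(w.sum(), 1.0), np.isclose(np.sum(w * x), 0.0))  # sum w_i = 1, sum w_i x_i = 0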
The BLUE property only concerns linear estimators of $\beta_1$. The linear estimators are defined by

    $\tilde\beta_1 = \sum_{i=1}^{n} a_i y_i.$

In passing we notice that if $a_i = w_i$ for all $i = 1, \ldots, n$, then $\tilde\beta_1 = \hat\beta_1$.
We have to make $\tilde\beta_1$ unbiased. To take the expectation of $\tilde\beta_1$ we first substitute equation (1), $y_i = \beta_1 + \beta_2 x_i + u_i$, for $y_i$:

    $\tilde\beta_1 = \sum_{i=1}^{n} a_i y_i
       = \sum_{i=1}^{n} a_i (\beta_1 + \beta_2 x_i + u_i)
       = \beta_1 \sum a_i + \beta_2 \sum a_i x_i + \sum a_i u_i.$
    $E\tilde\beta_1 = \beta_1 \sum a_i + \beta_2 \sum a_i x_i + \sum a_i E u_i
       = \beta_1 \sum a_i + \beta_2 \sum a_i x_i,$

since $E u_i = 0$ for all $i$. We see that

    $E\tilde\beta_1 = \beta_1
     \;\Longleftrightarrow\;
     \sum a_i = 1 \ \text{ and } \ \sum a_i x_i = 0,$

where $\Longleftrightarrow$ means "if and only if".
We take the variance of $\tilde\beta_1$:

    $\mathrm{Var}(\tilde\beta_1) \equiv E(\tilde\beta_1 - E\tilde\beta_1)^2
       = E(\tilde\beta_1 - \beta_1)^2 \qquad \text{since } E\tilde\beta_1 = \beta_1$
    $= E\Bigl(\sum a_i u_i\Bigr)^2$
    $= a_1^2 E u_1^2 + \cdots + a_n^2 E u_n^2
       + 2 a_1 a_2 E u_1 u_2 + \cdots + 2 a_{n-1} a_n E u_{n-1} u_n$
    $= \sigma^2 (a_1^2 + \cdots + a_n^2) = \sigma^2 \sum a_i^2,$

since $E u_i^2 = \sigma^2$ and $E u_i u_j = 0$ for $i \neq j$. The variance of the OLS estimator, $\mathrm{Var}(\hat\beta_1)$, is given by

    $\mathrm{Var}(\hat\beta_1) = \sigma^2 \sum w_i^2.$
We see that

    $\mathrm{Var}(\tilde\beta_1) \geq \mathrm{Var}(\hat\beta_1)
     \;\Longleftrightarrow\;
     \sum_{i=1}^{n} a_i^2 \geq \sum_{i=1}^{n} w_i^2.$
Since $a_i$ is an arbitrary nonstochastic constant, we can rewrite $a_i$ as

    $a_i = w_i + d_i.$

Earlier we saw that $\tilde\beta_1$ is unbiased if and only if $\sum a_i = 1$ and $\sum a_i x_i = 0$.
So,

    $\sum a_i = \sum w_i + \sum d_i = 1,$
    $\sum a_i x_i = \sum w_i x_i + \sum d_i x_i = 0.$
But

    $\sum w_i = \sum \left[ \frac{1}{n} - \frac{(x_i - \bar x)\,\bar x}{s_{xx}} \right]
       = 1 - \frac{\bar x}{s_{xx}} \sum (x_i - \bar x) = 1,$

    $\sum w_i x_i = \sum \left[ \frac{1}{n} - \frac{(x_i - \bar x)\,\bar x}{s_{xx}} \right] x_i
       = \frac{1}{n}\sum x_i - \frac{\sum x_i^2 - n\bar x^2}{s_{xx}}\,\bar x
       = \bar x - \bar x = 0.$

Hence $\sum d_i = 0$ and $\sum d_i x_i = 0$.
We square $a_i$ and sum with respect to $i = 1, \ldots, n$:

    $\sum a_i^2 = \sum (w_i + d_i)^2
       = \sum w_i^2 + \sum d_i^2 + 2 \sum w_i d_i
       = \sum w_i^2 + \sum d_i^2,$

since the cross-product term is zero:

    $\sum w_i d_i = \sum \left[ \frac{1}{n} - \frac{(x_i - \bar x)\,\bar x}{s_{xx}} \right] d_i
       = \frac{1}{n}\sum d_i - \frac{\bar x}{s_{xx}} \Bigl( \sum d_i x_i - \bar x \sum d_i \Bigr) = 0.$
Hence

    $\sum a_i^2 = \sum w_i^2 + \sum d_i^2 \geq \sum w_i^2,$

and this concludes the proof.
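The decomposition $\sum a_i^2 = \sum w_i^2 + \sum d_i^2$ can also be illustrated numerically. The sketch below (continuing the earlier snippets) builds an arbitrary unbiased weight vector $a_i = w_i + d_i$ by making a random $d$ orthogonal to the constant and to $x$, and confirms the variance ranking:

    # Build d with sum d_i = 0 and sum d_i x_i = 0 by projecting a random vector
    # onto the orthogonal complement of span{1, x}.
    Z = np.column_stack([np.ones(n), x])
    d = rng.normal(size=n)
    d = d - Z @ np.linalg.lstsq(Z, d, rcond=None)[0]   # residual is orthogonal to 1 and x
    a = w + d                                          # still unbiased weights
    print(np.isclose(a.sum(), 1.0), np.isclose(a @ x, 0.0))   # unbiasedness constraints hold
    print(np.sum(a**2) >= np.sum(w**2))                # so sigma^2 sum a_i^2 >= sigma^2 sum w_i^2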
Proof that $\hat\beta_2$ is best

    $\hat\beta_2 = \frac{\sum (x_i - \bar x) y_i}{s_{xx}}
       = \sum \left( \frac{x_i - \bar x}{s_{xx}} \right) y_i
       = \sum v_i y_i,
     \quad\text{where}\quad
     v_i = \frac{x_i - \bar x}{s_{xx}}.$

We shall use the facts that $\sum v_i = 0$ and $\sum v_i^2 = 1/s_{xx}$.
The variance of $\hat\beta_2$, $\mathrm{Var}(\hat\beta_2)$, is given by

    $\mathrm{Var}(\hat\beta_2) = \sigma^2 \sum v_i^2.$

Let $\tilde\beta_2$ be a linear estimator of $\beta_2$:

    $\tilde\beta_2 = \sum b_i y_i.$
We need to find the conditions that make $\tilde\beta_2$ unbiased. Taking the expectation we have

    $E\tilde\beta_2 = E \sum b_i (\beta_1 + \beta_2 x_i + u_i)
       = \beta_1 \sum b_i + \beta_2 \sum b_i x_i,$

and thus

    $E(\tilde\beta_2) = \beta_2
     \;\Longleftrightarrow\;
     \sum b_i = 0 \ \text{ and } \ \sum b_i x_i = 1.$
The variance of $\tilde\beta_2$, $\mathrm{Var}(\tilde\beta_2)$, is

    $\mathrm{Var}(\tilde\beta_2) = \sigma^2 \sum b_i^2.$

Let

    $b_i = v_i + c_i.$

Then

    $\sum b_i = \sum v_i + \sum c_i = \sum c_i = 0,$
    $\sum b_i x_i = \sum v_i x_i + \sum c_i x_i = 1 + \sum c_i x_i = 1
     \;\Longrightarrow\; \sum c_i x_i = 0,$
since

    $\sum v_i x_i = \frac{1}{s_{xx}} \sum (x_i - \bar x) x_i
       = \frac{\sum x_i^2 - n\bar x^2}{s_{xx}} = 1.$
So the variance of $\tilde\beta_2$ becomes

    $\mathrm{Var}(\tilde\beta_2) = \sigma^2 \sum b_i^2
       = \sigma^2 \sum (v_i + c_i)^2
       = \sigma^2 \Bigl( \sum v_i^2 + \sum c_i^2 + 2 \sum v_i c_i \Bigr)
       = \sigma^2 \sum v_i^2 + \sigma^2 \sum c_i^2
       \geq \mathrm{Var}(\hat\beta_2),$

since

    $\sum v_i c_i = \frac{1}{s_{xx}} \sum (x_i - \bar x) c_i
       = \frac{1}{s_{xx}} \Bigl( \sum x_i c_i - \bar x \sum c_i \Bigr) = 0.$
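The same numerical illustration works for the slope (continuing the earlier snippets): any $c_i$ orthogonal to the constant and to $x$ leaves $b_i = v_i + c_i$ unbiased, and the sum of squares can only grow.

    v = (x - x.mean()) / s_xx                    # OLS slope weights v_i
    print(np.allclose(np.sum(v * y), b2_hat))    # beta_2 hat equals sum v_i y_i
    b = v + d                                    # d from the previous sketch: sum d_i = 0, sum d_i x_i = 0
    print(np.isclose(b.sum(), 0.0), np.isclose(b @ x, 1.0))   # unbiasedness constraints for the slope
    print(np.sum(b**2) >= np.sum(v**2))          # Var(tilde beta_2) >= Var(hat beta_2)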
Appendix: Proving Bestness Using Calculus

Another way to prove that the OLS estimators, $\hat\beta_1$ and $\hat\beta_2$, are best is to use calculus to find the minimum variance. Since the variance is a quadratic function, it is twice differentiable and thus we may use calculus to find the minimum.

Proving that $\hat\beta_1$ is best
The variance of a linear unbiased estimator is given by $\sigma^2 \sum a_i^2$ with two linear constraints, $\sum a_i = 1$ and $\sum a_i x_i = 0$. Hence we may form the following minimization problem subject to the linear constraints:

    $\min_{a_1, \ldots, a_n} \ \sigma^2 \sum a_i^2
     \quad \text{subject to} \quad
     \sum a_i = 1 \ \text{ and } \ \sum a_i x_i = 0.$
We form the Lagrangian

    $\mathcal{L} = \sigma^2 \sum a_i^2 - \lambda_1 \Bigl( \sum a_i - 1 \Bigr) - \lambda_2 \sum a_i x_i.$
The first-order conditions are

    $\dfrac{\partial \mathcal{L}}{\partial a_1} = 2\sigma^2 a_1 - \lambda_1 - \lambda_2 x_1 = 0 \qquad (1)$
    $\dfrac{\partial \mathcal{L}}{\partial a_2} = 2\sigma^2 a_2 - \lambda_1 - \lambda_2 x_2 = 0 \qquad (2)$
    $\qquad \vdots$
    $\dfrac{\partial \mathcal{L}}{\partial a_n} = 2\sigma^2 a_n - \lambda_1 - \lambda_2 x_n = 0 \qquad (n)$
    $\dfrac{\partial \mathcal{L}}{\partial \lambda_1} = -\sum a_i + 1 = 0 \qquad (n+1)$
    $\dfrac{\partial \mathcal{L}}{\partial \lambda_2} = -\sum a_i x_i = 0 \qquad (n+2)$

Adding the left-hand and right-hand sides of equations (1)-(n) we have

    $2\sigma^2 \sum a_i - n\lambda_1 - \lambda_2 \sum x_i = 0.$

Since $\sum a_i = 1$,

    $2\sigma^2 - n\lambda_1 - \lambda_2\, n\bar x = 0. \qquad (*)$
Multiplying the left-hand and right-hand sides of equations (1)-(n) by $x_1, x_2, \ldots, x_n$ respectively and adding up, we have

    $2\sigma^2 \sum a_i x_i - \lambda_1 \sum x_i - \lambda_2 \sum x_i^2 = 0.$

Since $\sum a_i x_i = 0$, we have

    $n\bar x\,\lambda_1 + \lambda_2 \sum x_i^2 = 0. \qquad (**)$
Equations (*) and (**) form a linear equation system in $\lambda_1$ and $\lambda_2$:

    $n\lambda_1 + \lambda_2\, n\bar x = 2\sigma^2,$
    $n\bar x\,\lambda_1 + \lambda_2 \sum x_i^2 = 0.$

Solving for $\lambda_1$ and $\lambda_2$ (the determinant of the system is $n\sum x_i^2 - n^2\bar x^2 = n\,s_{xx}$) we have

    $\lambda_1 = \frac{2\sigma^2}{n\,s_{xx}} \sum x_i^2,
     \quad\text{and}\quad
     \lambda_2 = -\frac{2\sigma^2\,\bar x}{s_{xx}}.$

From equations (1)-(n) we have

    $2\sigma^2 a_i = \lambda_1 + \lambda_2 x_i, \qquad i = 1, \ldots, n.$
Substituting for $\lambda_1$ and $\lambda_2$ we obtain

    $2\sigma^2 a_i = \frac{2\sigma^2}{n\,s_{xx}} \sum x_i^2 - \frac{2\sigma^2\,\bar x\, x_i}{s_{xx}},
     \qquad i = 1, \ldots, n,$

or

    $a_i = \frac{\sum x_i^2}{n\,s_{xx}} - \frac{\bar x\, x_i}{s_{xx}}
       = \frac{s_{xx} + n\bar x^2}{n\,s_{xx}} - \frac{\bar x\, x_i}{s_{xx}}
       = \frac{1}{n} - \frac{(x_i - \bar x)\,\bar x}{s_{xx}} = w_i,
       \qquad i = 1, \ldots, n,$

since $\sum x_i^2 = s_{xx} + n\bar x^2$. Thus $w_i$ is exactly the weight of the OLS estimator $\hat\beta_1$ of $\beta_1$.
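Since the objective is quadratic and the constraints are linear, the first-order conditions form one linear system in $(a_1, \ldots, a_n, \lambda_1, \lambda_2)$. The sketch below (continuing the earlier snippets) solves that system numerically and confirms that the minimizer is the vector of OLS weights $w_i$ and that the multipliers match the formulas above:

    # Assemble the first-order-condition system:
    #   2 sigma^2 a_i - lambda_1 - lambda_2 x_i = 0,  sum a_i = 1,  sum a_i x_i = 0.
    K = np.zeros((n + 2, n + 2))
    K[:n, :n] = 2 * sigma**2 * np.eye(n)
    K[:n, n] = -1.0
    K[:n, n + 1] = -x
    K[n, :n] = 1.0
    K[n + 1, :n] = x
    rhs = np.zeros(n + 2)
    rhs[n] = 1.0
    sol = np.linalg.solve(K, rhs)
    print(np.allclose(sol[:n], w))                                    # minimizer equals the OLS weights
    print(np.allclose(sol[n], 2 * sigma**2 * np.sum(x**2) / (n * s_xx)),
          np.allclose(sol[n + 1], -2 * sigma**2 * x.mean() / s_xx))   # lambda_1, lambda_2 match the text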
The second-order conditions are

    $\dfrac{\partial^2 \mathcal{L}}{\partial a_1^2} = 2\sigma^2, \ \ldots, \
     \dfrac{\partial^2 \mathcal{L}}{\partial a_n^2} = 2\sigma^2,
     \qquad
     \dfrac{\partial^2 \mathcal{L}}{\partial \lambda_1^2} = 0, \quad
     \dfrac{\partial^2 \mathcal{L}}{\partial \lambda_2^2} = 0,$

and the cross-derivatives are

    $\dfrac{\partial^2 \mathcal{L}}{\partial a_1 \partial a_2} = 0, \ \ldots, \
     \dfrac{\partial^2 \mathcal{L}}{\partial a_n \partial a_{n-1}} = 0,$
    $\dfrac{\partial^2 \mathcal{L}}{\partial \lambda_1 \partial a_i} = -1, \qquad i = 1, \ldots, n,$
    $\dfrac{\partial^2 \mathcal{L}}{\partial \lambda_2 \partial a_i} = -x_i, \qquad i = 1, \ldots, n.$
Hence the bordered Hessian becomes

    $H = \begin{pmatrix}
       \dfrac{\partial^2 \mathcal{L}}{\partial a_1^2} & \cdots & \dfrac{\partial^2 \mathcal{L}}{\partial a_1 \partial a_n} & \dfrac{\partial^2 \mathcal{L}}{\partial a_1 \partial \lambda_1} & \dfrac{\partial^2 \mathcal{L}}{\partial a_1 \partial \lambda_2} \\
       \vdots & \ddots & \vdots & \vdots & \vdots \\
       \dfrac{\partial^2 \mathcal{L}}{\partial a_n \partial a_1} & \cdots & \dfrac{\partial^2 \mathcal{L}}{\partial a_n^2} & \dfrac{\partial^2 \mathcal{L}}{\partial a_n \partial \lambda_1} & \dfrac{\partial^2 \mathcal{L}}{\partial a_n \partial \lambda_2} \\
       \dfrac{\partial^2 \mathcal{L}}{\partial \lambda_1 \partial a_1} & \cdots & \dfrac{\partial^2 \mathcal{L}}{\partial \lambda_1 \partial a_n} & \dfrac{\partial^2 \mathcal{L}}{\partial \lambda_1^2} & \dfrac{\partial^2 \mathcal{L}}{\partial \lambda_1 \partial \lambda_2} \\
       \dfrac{\partial^2 \mathcal{L}}{\partial \lambda_2 \partial a_1} & \cdots & \dfrac{\partial^2 \mathcal{L}}{\partial \lambda_2 \partial a_n} & \dfrac{\partial^2 \mathcal{L}}{\partial \lambda_2 \partial \lambda_1} & \dfrac{\partial^2 \mathcal{L}}{\partial \lambda_2^2}
     \end{pmatrix}.$

This becomes

    $H = \begin{pmatrix}
       2\sigma^2 & 0 & \cdots & 0 & -1 & -x_1 \\
       0 & 2\sigma^2 & \cdots & 0 & -1 & -x_2 \\
       \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
       0 & 0 & \cdots & 2\sigma^2 & -1 & -x_n \\
       -1 & -1 & \cdots & -1 & 0 & 0 \\
       -x_1 & -x_2 & \cdots & -x_n & 0 & 0
     \end{pmatrix},$

and it can be shown that $H$ satisfies the second-order (bordered Hessian) conditions for a constrained minimum, so the solutions $a_1 = w_1, a_2 = w_2, \ldots, a_n = w_n$ yield the minimum variance.
Proving that $\hat\beta_2$ is best

The constrained minimization problem becomes

    $\min_{b_1, \ldots, b_n} \ \sigma^2 \sum b_i^2
     \quad \text{subject to} \quad
     \sum b_i = 0 \ \text{ and } \ \sum b_i x_i = 1.$
The first-order conditions, from the Lagrangian $\mathcal{L} = \sigma^2 \sum b_i^2 - \lambda_1 \sum b_i - \lambda_2 \bigl( \sum b_i x_i - 1 \bigr)$, are

    $\dfrac{\partial \mathcal{L}}{\partial b_1} = 2\sigma^2 b_1 - \lambda_1 - \lambda_2 x_1 = 0 \qquad (1)$
    $\dfrac{\partial \mathcal{L}}{\partial b_2} = 2\sigma^2 b_2 - \lambda_1 - \lambda_2 x_2 = 0 \qquad (2)$
    $\qquad \vdots$
    $\dfrac{\partial \mathcal{L}}{\partial b_n} = 2\sigma^2 b_n - \lambda_1 - \lambda_2 x_n = 0 \qquad (n)$
    $\dfrac{\partial \mathcal{L}}{\partial \lambda_1} = -\sum b_i = 0 \qquad (n+1)$
    $\dfrac{\partial \mathcal{L}}{\partial \lambda_2} = -\sum b_i x_i + 1 = 0 \qquad (n+2)$

We proceed just in the same way as we did before and obtain

    $\lambda_1 + \bar x\,\lambda_2 = 0,$
    $n\bar x\,\lambda_1 + \Bigl( \sum x_i^2 \Bigr) \lambda_2 = 2\sigma^2.$
Solving for $\lambda_1$ and $\lambda_2$ we obtain

    $\lambda_1 = -\frac{2\sigma^2\,\bar x}{s_{xx}},
     \qquad
     \lambda_2 = \frac{2\sigma^2}{s_{xx}}.$

Substituting for $\lambda_1$ and $\lambda_2$ we obtain

    $2\sigma^2 b_i = \lambda_1 + \lambda_2 x_i
       = -\frac{2\sigma^2\,\bar x}{s_{xx}} + \frac{2\sigma^2 x_i}{s_{xx}}
       = 2\sigma^2 \left( \frac{x_i - \bar x}{s_{xx}} \right).$

Hence

    $b_i = \frac{x_i - \bar x}{s_{xx}} = v_i, \qquad i = 1, \ldots, n.$

The second-order conditions are obtained in a similar way, and the bordered Hessian again satisfies the conditions for a constrained minimum.
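As a closing illustration (continuing the earlier snippets, and assuming SciPy is available), the slope problem can also be handed to a generic constrained optimizer instead of solving the first-order conditions by hand; the minimizer should again be the OLS weights $v_i$.

    from scipy.optimize import minimize

    res = minimize(
        fun=lambda b: sigma**2 * np.sum(b**2),                  # objective: sigma^2 sum b_i^2
        x0=np.zeros(n),
        method="SLSQP",
        constraints=[
            {"type": "eq", "fun": lambda b: np.sum(b)},             # sum b_i = 0
            {"type": "eq", "fun": lambda b: np.sum(b * x) - 1.0},   # sum b_i x_i = 1
        ],
    )
    print(np.max(np.abs(res.x - v)))   # should be numerically close to zero: minimizer matches v_i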