
Today’s Lesson 1

Basic Econometrics (1):
Ordinary Least Squares
I Representations of Classical Linear Regression Equations and the Classical Assumptions
I.1 Four Representations of Classical Linear Regression Equations
There are four conventional ways to write the classical linear regression equation, as follows.
(i) Scalar form
$$y_t = \beta_1 + x_{2t}\beta_2 + x_{3t}\beta_3 + \varepsilon_t, \qquad \varepsilon_t \sim \text{iid}(0, \sigma^2)$$
(ii) Vector form for each observation
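$$y_t = \begin{bmatrix} 1 & x_{2t} & x_{3t} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix} + \varepsilon_t = x_t'\beta + \varepsilon_t, \qquad \varepsilon_t \sim \text{iid}(0, \sigma^2)$$
where $x_t = \begin{bmatrix} 1 & x_{2t} & x_{3t} \end{bmatrix}'$ and $\beta = \begin{bmatrix} \beta_1 & \beta_2 & \beta_3 \end{bmatrix}'$.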
(iii) Vector form for each variable
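$$\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix}}_{Y} = \underbrace{\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}}_{X_1} \beta_1 + \underbrace{\begin{bmatrix} x_{21} \\ x_{22} \\ \vdots \\ x_{2T} \end{bmatrix}}_{X_2} \beta_2 + \underbrace{\begin{bmatrix} x_{31} \\ x_{32} \\ \vdots \\ x_{3T} \end{bmatrix}}_{X_3} \beta_3 + \underbrace{\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_T \end{bmatrix}}_{\varepsilon}$$
where $\varepsilon \sim (0, \sigma^2 I_T)$.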
(iv) Matrix form
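$$\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix}}_{Y} = \underbrace{\begin{bmatrix} 1 & x_{21} & x_{31} \\ 1 & x_{22} & x_{32} \\ \vdots & \vdots & \vdots \\ 1 & x_{2T} & x_{3T} \end{bmatrix}}_{X} \underbrace{\begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}}_{\beta} + \underbrace{\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_T \end{bmatrix}}_{\varepsilon}$$
that is, $Y = X\beta + \varepsilon$.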
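The four representations describe the same data-generating equation. The following is a minimal sketch, assuming NumPy and purely illustrative values for the sample size, the coefficients, and the simulated regressors and errors (none of these numbers come from the notes); it builds the dependent variable in each of the four ways and checks that the results coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5                                   # illustrative sample size (assumption)
beta = np.array([1.0, 2.0, -0.5])       # illustrative (beta_1, beta_2, beta_3)
x2, x3 = rng.normal(size=T), rng.normal(size=T)
eps = rng.normal(size=T)

# (i) scalar form, one observation at a time
y_scalar = np.array(
    [beta[0] + x2[t] * beta[1] + x3[t] * beta[2] + eps[t] for t in range(T)]
)

# (ii) vector form for each observation: y_t = x_t' beta + eps_t
y_obs = np.array(
    [np.array([1.0, x2[t], x3[t]]) @ beta + eps[t] for t in range(T)]
)

# (iii) vector form for each variable: Y = X1*b1 + X2*b2 + X3*b3 + eps
X1 = np.ones(T)
y_var = X1 * beta[0] + x2 * beta[1] + x3 * beta[2] + eps

# (iv) matrix form: Y = X beta + eps
X = np.column_stack([X1, x2, x3])
y_mat = X @ beta + eps

same = (
    np.allclose(y_scalar, y_obs)
    and np.allclose(y_obs, y_var)
    and np.allclose(y_var, y_mat)
)
```

The matrix form (iv) is the one used in the derivations that follow, since it keeps the algebra compact.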
I.2 The Classical Assumptions
The classical assumptions are assumptions about the explanatory variables and the
stochastic error terms.
A.1 $X$ is strictly exogenous and has full column rank (no multicollinearity).
A.2 The disturbances are mutually independent and the variance is constant at each sample point, which can be combined in the single statement $\varepsilon \mid X \sim (0, \sigma^2 I_T)$.
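The full column rank condition can be checked numerically. The sketch below assumes NumPy; the helper name `has_full_column_rank` and the small data set are hypothetical and only serve to contrast a well-behaved regressor matrix with one whose third column is an exact linear combination of the other two.

```python
import numpy as np

def has_full_column_rank(X):
    """Return True when the columns of X are linearly independent."""
    return np.linalg.matrix_rank(X) == X.shape[1]

const = np.ones(6)
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x3 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])

X_ok = np.column_stack([const, x2, x3])
# Third column equals 2*x2 + const, so the columns are perfectly collinear
X_bad = np.column_stack([const, x2, 2.0 * x2 + const])
```

When the rank condition fails, the cross-product matrix of the regressors is singular and cannot be inverted.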
Given these assumptions, we make two additional assumptions: (i) $X$ is nonstochastic, and (ii) the error terms are normally distributed, i.e. $\varepsilon \sim N(0, \sigma^2 I_T)$. These assumptions may look very restrictive, but the properties of the estimators derived under the stronger assumptions are extremely useful for understanding the large-sample properties of the estimators under the classical assumptions.
II OLS Estimator
The OLS estimator $\hat{\beta}$ minimizes the residual sum of squares $u'u$, where $u = Y - Xb$. Namely,
$$\hat{\beta} = \arg\min_{b} RSS, \qquad RSS = u'u.$$
Then,
$$\begin{aligned}
RSS &= u'u \\
&= (Y - Xb)'(Y - Xb) \\
&= Y'Y - b'X'Y - Y'Xb + b'X'Xb \\
&= Y'Y - 2b'X'Y + b'X'Xb
\end{aligned}$$
since the transpose of a scalar is the scalar itself and thus $b'X'Y = Y'Xb$. The first-order conditions are
$$\frac{\partial RSS}{\partial b} = -2X'Y + 2X'Xb = 0 \quad \Longrightarrow \quad (X'X)\,b = X'Y,$$
which gives the OLS estimator
$$\hat{\beta} = (X'X)^{-1}X'Y.$$
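As a hedged illustration of the normal equations, the sketch below solves $(X'X)b = X'Y$ on simulated data and compares the result with NumPy's general least-squares routine. The function name `ols`, the sample size, the coefficients and the noise scale are all assumptions made for the example.

```python
import numpy as np

def ols(X, Y):
    """Solve the normal equations (X'X) b = X'Y for b."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

rng = np.random.default_rng(1)
T = 200                                              # illustrative sample size
X = np.column_stack([np.ones(T), rng.normal(size=T), rng.normal(size=T)])
beta = np.array([1.0, 2.0, -0.5])                    # illustrative true coefficients
Y = X @ beta + rng.normal(scale=0.3, size=T)

beta_hat = ols(X, Y)
beta_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]    # library solver, for comparison

# First-order condition: the gradient -2X'Y + 2X'Xb vanishes at beta_hat
gradient = -2 * X.T @ Y + 2 * X.T @ X @ beta_hat
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly, which is usually the numerically safer choice; the explicit inverse is only needed later for the variance formula.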
Remark 1 Let $M_X = I_T - X(X'X)^{-1}X'$. It follows that $M_X Y$ is the vector of residuals when $Y$ is regressed on $X$. Also note that $\hat{\varepsilon} = M_X Y = M_X(X\beta + \varepsilon) = M_X \varepsilon$. $M_X$ is a symmetric ($M_X' = M_X$) and idempotent ($M_X M_X = M_X$) matrix with the properties that $M_X X = 0$ (a $T \times k$ matrix of zeros, where $k$ is the number of columns of $X$) and $M_X \hat{\varepsilon} = M_X Y = \hat{\varepsilon}$, since $M_X X = 0$.
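The properties of the residual-maker matrix in Remark 1 can be verified numerically. This is a small sketch on simulated data, assuming NumPy; the dimensions and random draws are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
T, k = 8, 3                                          # illustrative dimensions
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
Y = rng.normal(size=T)

# Residual-maker matrix M_X = I_T - X (X'X)^{-1} X'
M = np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)

# Residuals computed directly from the OLS fit, for comparison with M @ Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat

is_symmetric = np.allclose(M, M.T)
is_idempotent = np.allclose(M @ M, M)
annihilates_X = np.allclose(M @ X, 0)
makes_residuals = np.allclose(M @ Y, resid)
```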
Remark 2 The trace of an $n \times n$ square matrix $G$, denoted by $\operatorname{tr}(G)$, is defined to be the sum of the diagonal elements of $G$. Then, by definition, $\operatorname{tr}(c) = c$ for any scalar $c$.
Remark 3 $\operatorname{tr}(ABC) = \operatorname{tr}(BCA) = \operatorname{tr}(CAB)$.
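A quick numerical check of the cyclic property of the trace, assuming NumPy and arbitrary conformable matrices chosen for illustration. Note that the three products have different dimensions even though their traces agree.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 2))

tr_ABC = np.trace(A @ B @ C)   # trace of a 2 x 2 product
tr_BCA = np.trace(B @ C @ A)   # trace of a 3 x 3 product
tr_CAB = np.trace(C @ A @ B)   # trace of a 4 x 4 product
```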
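Remark 4 (Estimator of $\sigma^2$) Letting $k$ denote the number of regressors, $\hat{\sigma}^2 = \hat{\varepsilon}'\hat{\varepsilon}/(T - k)$ is an unbiased estimator of $\sigma^2$, since
$$\begin{aligned}
E[\hat{\varepsilon}'\hat{\varepsilon}] &= E[\varepsilon' M_X' M_X \varepsilon] \\
&= E[\varepsilon' M_X \varepsilon] && \text{since } M_X \text{ is idempotent and symmetric} \\
&= E[\operatorname{tr}(\varepsilon' M_X \varepsilon)] && \text{by Remark 2} \\
&= E[\operatorname{tr}(\varepsilon\varepsilon' M_X)] && \text{by Remark 3} \\
&= \sigma^2 \operatorname{tr}(M_X) && \text{since } E[\varepsilon\varepsilon'] = \sigma^2 I_T \\
&= \sigma^2 \operatorname{tr}(I_T) - \sigma^2 \operatorname{tr}\left[X(X'X)^{-1}X'\right] \\
&= \sigma^2 T - \sigma^2 \operatorname{tr}\left[(X'X)^{-1}X'X\right] \\
&= \sigma^2 (T - k)
\end{aligned}$$
and hence we have
$$E\left[\frac{\hat{\varepsilon}'\hat{\varepsilon}}{T - k}\right] = \sigma^2.$$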
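The unbiasedness argument can be illustrated by simulation. The sketch below, assuming NumPy and normally distributed errors, holds the regressor matrix fixed, checks that the trace of the residual-maker matrix equals the number of observations minus the number of regressors, and averages the variance estimator over many simulated samples. The sample size, error variance and number of replications are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
T, k, sigma2 = 30, 3, 2.0                            # illustrative values
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
M = np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)

trace_M = np.trace(M)                                # equals T - k up to rounding

# Monte Carlo: average sigma2_hat over many error draws, holding X fixed
n_rep = 20000
eps = rng.normal(scale=np.sqrt(sigma2), size=(n_rep, T))
Y = X @ beta + eps                                   # each row is one simulated sample
resid = Y @ M                                        # M is symmetric, so row r is (M Y_r)'
sigma2_hat = (resid ** 2).sum(axis=1) / (T - k)
mean_sigma2_hat = sigma2_hat.mean()
```

The simulated average should be close to the true error variance; dividing by the number of observations instead would bias the estimate downward.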

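By making an additional assumption that $\varepsilon$ is normally distributed, we have
$$\begin{aligned}
\hat{\beta} &= (X'X)^{-1}X'Y \\
&= (X'X)^{-1}X'(X\beta + \varepsilon) \\
&= \beta + (X'X)^{-1}X'\varepsilon \\
&\sim N\left(\beta, \sigma^2 (X'X)^{-1}\right)
\end{aligned}$$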
For each individual regression coefficient,
$$\hat{\beta}_i \sim N\left(\beta_i, \sigma^2 (X'X)^{-1}_{ii}\right)$$
where $\sigma^2 (X'X)^{-1}_{ii}$ is the $(i, i)$ element of $\sigma^2 (X'X)^{-1}$. In general, however, $\sigma^2$ is unknown, and thus the estimated variance of $\hat{\beta}$ is
$$\widehat{Var}(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1}.$$
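A minimal sketch of how the estimated covariance matrix and the standard errors of the coefficients might be computed, assuming NumPy and simulated data with illustrative coefficients. The variable names are choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(5)
T, k = 100, 3                                        # illustrative dimensions
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=T)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / (T - k)                 # unbiased estimator of the error variance

var_hat = sigma2_hat * XtX_inv                       # estimated covariance matrix of beta_hat
std_err = np.sqrt(np.diag(var_hat))                  # square roots of the (i, i) elements
```

The standard errors are the square roots of the diagonal elements; the off-diagonal elements are the estimated covariances between coefficient estimates.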
II.1 Gauss-Markov Theorem
The OLS estimator is a linear combination of the observations on the dependent variable and hence, apart from the constant $\beta$, a linear combination of the error terms. As seen above, it belongs to the class of linear unbiased estimators. Its distinguishing feature is that its sampling variance is the minimum that can be achieved by any linear unbiased estimator under the classical assumptions. The Gauss-Markov Theorem is the fundamental least-squares theorem. It states that, under the classical assumptions, no other linear unbiased estimator of $\beta$ can have a smaller sampling variance than the least-squares estimator. The OLS estimator is therefore said to be BLUE (best linear unbiased estimator).
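To make the theorem concrete, the sketch below compares OLS with one particular competing linear unbiased estimator, a weighted least-squares estimator with arbitrary positive weights. The weights are hypothetical and chosen only for illustration; under homoskedastic errors both estimators are unbiased, and the difference of their covariance matrices should be positive semidefinite. NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(6)
T, k, sigma2 = 40, 3, 1.0                            # illustrative values
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])

# OLS as a linear estimator: beta_hat = A_ols @ Y
A_ols = np.linalg.solve(X.T @ X, X.T)

# A competing linear estimator beta_tilde = A_alt @ Y with A_alt @ X = I (unbiased)
W = np.diag(rng.uniform(0.5, 2.0, size=T))           # arbitrary positive weights (hypothetical)
A_alt = np.linalg.solve(X.T @ W @ X, X.T @ W)

# With Var(eps) = sigma2 * I, a linear estimator A @ Y has variance sigma2 * A A'
var_ols = sigma2 * A_ols @ A_ols.T                   # equals sigma2 * (X'X)^{-1}
var_alt = sigma2 * A_alt @ A_alt.T

# Gauss-Markov: var_alt - var_ols is positive semidefinite
min_eig = np.linalg.eigvalsh(var_alt - var_ols).min()
```

This checks a single competitor, not the theorem itself, which covers every linear unbiased estimator.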
III Measures of Goodness of Fit
The vector of the dependent variable $Y$ can be decomposed into the part explained by the regressors and the unexplained part:
$$Y = \hat{Y} + \hat{\varepsilon}, \qquad \text{where } \hat{Y} = X\hat{\beta}.$$
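Remark 5 $\hat{Y}$ and $\hat{\varepsilon}$ are orthogonal because
$$\begin{aligned}
\hat{Y}'\hat{\varepsilon} &= \hat{\beta}'X'\hat{\varepsilon} \\
&= \hat{\beta}'X'M_X\varepsilon \\
&= 0 && \text{since } X'M_X = 0_{k \times T}.
\end{aligned}$$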
Then it follows from Remark 5 that if $\bar{Y}$ is the sample mean of the dependent variable, then
$$\underbrace{Y'Y - T\bar{Y}^2}_{TSS} = \underbrace{\hat{Y}'\hat{Y} - T\bar{Y}^2}_{ESS} + \underbrace{\hat{\varepsilon}'\hat{\varepsilon}}_{RSS}.$$
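The orthogonality of fitted values and residuals, and the resulting sum-of-squares decomposition, can be checked on simulated data. This is a sketch assuming NumPy, with illustrative coefficients and a regressor matrix that includes a constant column.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 50                                               # illustrative sample size
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])   # includes a constant
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ beta_hat                                 # explained part
resid = Y - Y_hat                                    # unexplained part

inner_product = Y_hat @ resid                        # zero up to rounding

Y_bar = Y.mean()
TSS = Y @ Y - T * Y_bar ** 2                         # total sum of squares
ESS = Y_hat @ Y_hat - T * Y_bar ** 2                 # explained sum of squares
RSS = resid @ resid                                  # residual sum of squares
```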
The coefficient of multiple correlation $R$ is defined as the positive square root of
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}.$$
$R^2$ is the proportion of the total variation of $Y$ explained by the linear combination of the regressors. This value never decreases when additional regressors are included, even if the added regressors are irrelevant to the dependent variable. However, the adjusted $R^2$, denoted by $\bar{R}^2$, may decrease with the addition of variables of low explanatory power:
$$\bar{R}^2 = 1 - \frac{RSS}{TSS} \cdot \frac{T - 1}{T - k}.$$
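The behaviour of the two measures when an irrelevant regressor is added can be explored with a short sketch. It assumes NumPy; the helper `r_squared` and the simulated data are hypothetical. The unadjusted measure cannot fall when a column is added, whereas the adjusted one may go either way depending on the draw.

```python
import numpy as np

def r_squared(X, Y):
    """Return (R^2, adjusted R^2) for a regression of Y on X; X should contain a constant."""
    T, k = X.shape
    resid = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    RSS = resid @ resid
    TSS = ((Y - Y.mean()) ** 2).sum()                # same as Y'Y - T * Ybar^2
    r2 = 1 - RSS / TSS
    r2_adj = 1 - (RSS / TSS) * (T - 1) / (T - k)
    return r2, r2_adj

rng = np.random.default_rng(8)
T = 60                                               # illustrative sample size
X = np.column_stack([np.ones(T), rng.normal(size=T)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=T)
X_big = np.column_stack([X, rng.normal(size=T)])     # adds a regressor unrelated to Y

r2_small, r2_adj_small = r_squared(X, Y)
r2_big, r2_adj_big = r_squared(X_big, Y)
```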
Two of the frequently used criteria for comparing the fit of various specifications are the Schwarz criterion
$$SC = \ln\frac{\hat{\varepsilon}'\hat{\varepsilon}}{T} + \frac{k}{T}\ln T$$
and the Akaike information criterion
$$AIC = \ln\frac{\hat{\varepsilon}'\hat{\varepsilon}}{T} + \frac{2k}{T}.$$
It is important to note that these criteria favor a model with a smaller sum of squared residuals, but each criterion adds a penalty for model complexity, measured by the number of regressors. Most statistical software packages routinely report these criteria.
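Both criteria can be computed directly from the residuals. The sketch below assumes NumPy and uses the log of the average squared residual plus the respective penalty term, with the number of observations and regressors read from the shape of the regressor matrix. The helper `info_criteria` and the simulated data are hypothetical; statistical packages may report these criteria under a different scaling.

```python
import math
import numpy as np

def info_criteria(X, Y):
    """Return (SC, AIC): ln(RSS / T) plus a penalty that grows with the number of regressors k."""
    T, k = X.shape
    coef = np.linalg.lstsq(X, Y, rcond=None)[0]
    resid = Y - X @ coef
    log_avg_rss = math.log(resid @ resid / T)
    sc = log_avg_rss + (k / T) * math.log(T)         # Schwarz criterion
    aic = log_avg_rss + 2 * k / T                    # Akaike information criterion
    return sc, aic

rng = np.random.default_rng(9)
T = 60                                               # illustrative sample size
X = np.column_stack([np.ones(T), rng.normal(size=T)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=T)
X_big = np.column_stack([X, rng.normal(size=T)])     # adds a regressor unrelated to Y

sc_small, aic_small = info_criteria(X, Y)
sc_big, aic_big = info_criteria(X_big, Y)
```

Lower values indicate a preferred specification. Because the log of the sample size exceeds 2 once the sample has more than about seven observations, the Schwarz criterion penalizes extra regressors more heavily than the Akaike criterion in most applications.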