
# III 1

James B. McDonald
Brigham Young University
9/29/2010

## III. Classical Normal Linear Regression Model Extended to the Case of k Explanatory Variables

A. Basic Concepts

## 1. The expected value of y is defined by

$$E(y) = \begin{pmatrix} E(y_1) \\ E(y_2) \\ \vdots \\ E(y_n) \end{pmatrix}$$
2. The variance of the vector y is defined by

$$\mathrm{Var}(y) = \begin{pmatrix} \mathrm{Var}(y_1) & \mathrm{Cov}(y_1,y_2) & \cdots & \mathrm{Cov}(y_1,y_n) \\ \mathrm{Cov}(y_2,y_1) & \mathrm{Var}(y_2) & \cdots & \mathrm{Cov}(y_2,y_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(y_n,y_1) & \mathrm{Cov}(y_n,y_2) & \cdots & \mathrm{Var}(y_n) \end{pmatrix}$$

NOTE: Let μ = E(y); then

$$\mathrm{Var}(y) = E\left[(y-\mu)(y-\mu)'\right] = E\begin{pmatrix} y_1-\mu_1 \\ \vdots \\ y_n-\mu_n \end{pmatrix}(y_1-\mu_1,\ldots,y_n-\mu_n)$$

$$= \begin{pmatrix} E(y_1-\mu_1)^2 & E(y_1-\mu_1)(y_2-\mu_2) & \cdots & E(y_1-\mu_1)(y_n-\mu_n) \\ E(y_2-\mu_2)(y_1-\mu_1) & E(y_2-\mu_2)^2 & \cdots & E(y_2-\mu_2)(y_n-\mu_n) \\ \vdots & \vdots & \ddots & \vdots \\ E(y_n-\mu_n)(y_1-\mu_1) & E(y_n-\mu_n)(y_2-\mu_2) & \cdots & E(y_n-\mu_n)^2 \end{pmatrix}$$

$$= \begin{pmatrix} \mathrm{Var}(y_1) & \mathrm{Cov}(y_1,y_2) & \cdots & \mathrm{Cov}(y_1,y_n) \\ \mathrm{Cov}(y_2,y_1) & \mathrm{Var}(y_2) & \cdots & \mathrm{Cov}(y_2,y_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(y_n,y_1) & \mathrm{Cov}(y_n,y_2) & \cdots & \mathrm{Var}(y_n) \end{pmatrix}$$

## 3. The n x 1 vector of random variables, y, is said to be distributed as a multivariate normal with mean vector μ and variance-covariance matrix Σ (denoted y ~ N(μ, Σ)) if the probability density function of y is given by

$$f(y;\mu,\Sigma) = \frac{e^{-\frac{1}{2}(y-\mu)'\Sigma^{-1}(y-\mu)}}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}.$$

Special case (n = 1): y = (y₁), μ = (μ₁), Σ = (σ²).

$$f(y_1;\mu_1,\sigma^2) = \frac{e^{-\frac{1}{2}(y_1-\mu_1)(\sigma^2)^{-1}(y_1-\mu_1)}}{(2\pi)^{1/2}(\sigma^2)^{1/2}} = \frac{e^{-(y_1-\mu_1)^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}}.$$

## 4. Useful theorems

a. If y ~ N(μ_y, Σ_y), then z = Ay ~ N(μ_z = Aμ_y, Σ_z = AΣ_yA') where A is a matrix of constants.

b. If y ~ N(0, I) and A is a symmetric idempotent matrix, then y'Ay ~ χ²(m) where m = Rank(A) = trace(A).

c. If y ~ N(0, I) and L is a k x n matrix of rank k, then Ly and y'Ay are independently distributed if LA = 0.

d. If y ~ N(0, I), then the idempotent quadratic forms y'Ay and y'By are independently distributed χ² variables if AB = 0.

NOTE:

(1) Proof of (a):

E(z) = E(Ay) = AE(y) = Aμ_y

Var(z) = E[(z − E(z))(z − E(z))']
= E[(Ay − Aμ_y)(Ay − Aμ_y)']
= E[A(y − μ_y)(y − μ_y)'A']
= AE[(y − μ_y)(y − μ_y)']A'
= AΣ_yA' = Σ_z
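The proof above can be checked numerically. The following NumPy sketch (all numbers are made up for illustration; the notes' own computations use Stata) simulates y ~ N(μ_y, Σ_y), forms z = Ay, and compares the sample mean and covariance of z with Aμ_y and AΣ_yA':

```python
import numpy as np

# Monte Carlo check of useful theorem (a): if y ~ N(mu_y, Sigma_y) and z = Ay,
# then E(z) = A mu_y and Var(z) = A Sigma_y A'.
# mu_y, Sigma_y, and A below are arbitrary illustrative values.
rng = np.random.default_rng(0)

mu_y = np.array([1.0, -2.0, 0.5])
Sigma_y = np.array([[2.0, 0.5, 0.0],
                    [0.5, 1.0, 0.3],
                    [0.0, 0.3, 1.5]])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])          # 2x3 matrix of constants

draws = rng.multivariate_normal(mu_y, Sigma_y, size=200_000)
z = draws @ A.T                           # each row is one draw of z = Ay

print(z.mean(axis=0))                     # ~ A mu_y = [-3, -2.5]
print(np.cov(z.T))                        # ~ A Sigma_y A'
print(A @ mu_y)
print(A @ Sigma_y @ A.T)
```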

Example: Suppose y₁, …, y_n are independent draws from N(μ, σ²), i.e.,

$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \sim N\left(\begin{pmatrix}\mu \\ \vdots \\ \mu\end{pmatrix},\; \begin{pmatrix}\sigma^2 & & 0 \\ & \ddots & \\ 0 & & \sigma^2\end{pmatrix}\right).$$

The "useful" theorem 4.a implies that

$$\bar{y} = \frac{1}{n}y_1 + \cdots + \frac{1}{n}y_n = \left(\frac{1}{n},\ldots,\frac{1}{n}\right)y \sim N\left(\mu,\; \sigma^2/n\right).$$

Verify that

(a) $\left(\frac{1}{n},\ldots,\frac{1}{n}\right)\begin{pmatrix}\mu \\ \vdots \\ \mu\end{pmatrix} = \mu$

(b) $\left(\frac{1}{n},\ldots,\frac{1}{n}\right)\sigma^2 I\begin{pmatrix}\frac{1}{n} \\ \vdots \\ \frac{1}{n}\end{pmatrix} = \sigma^2/n.$
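Both identities are easy to confirm for a concrete n. A small NumPy check (n = 5, μ = 3, σ² = 4 are arbitrary illustrative values):

```python
import numpy as np

# Check (a) and (b): the weight vector w = (1/n, ..., 1/n) applied to a
# constant mean vector returns mu, and w (sigma^2 I) w' equals sigma^2/n.
n, mu, sigma2 = 5, 3.0, 4.0
w = np.full(n, 1.0 / n)

mean_of_ybar = w @ np.full(n, mu)              # (a): equals mu
var_of_ybar = w @ (sigma2 * np.eye(n)) @ w     # (b): equals sigma2/n

print(mean_of_ybar, var_of_ybar)
```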

## (2) yt = β1 + β2xt2 + . . . + βkxtk + εt.

Note that βi can be interpreted as the marginal impact of a unit increase in xi on the

expected value of y.

(3) Stacking the n observations, the model can be written in matrix form as

y = Xβ + ε

where

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}_{(n\times 1)}, \qquad X = \begin{pmatrix} x_{11} & \cdots & x_{1k} \\ x_{21} & \cdots & x_{2k} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nk} \end{pmatrix}_{(n\times k)},$$

$$\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}, \qquad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$

Each column of X contains the n observations on one of the k individual variables; each row may represent the observations at a given point in time. The model can then be written compactly as

y = Xβ + ε.

(A.5)' The x_{tj}'s are nonstochastic and

$$\lim_{n\to\infty} \frac{X'X}{n} = \Sigma_x \text{ is nonsingular.}$$

C. Estimation

We will derive the least squares, MLE, BLUE and instrumental variables estimators in

this section.

1. Least Squares:

The basic model can be written as

y = Xβ + ε = Xβ̂ + e = Ŷ + e

where Ŷ = Xβ̂ is an n×1 vector of predicted values for the dependent variable and e denotes the vector of residuals or estimated errors.

The sum of squared errors is defined by

$$SSE(\hat\beta) = \sum_{t=1}^{n} e_t^2 = (e_1, e_2, \ldots, e_n)\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} = e'e$$

$$= (y - X\hat\beta)'(y - X\hat\beta) = y'y - \hat\beta'X'y - y'X\hat\beta + \hat\beta'X'X\hat\beta = y'y - 2\hat\beta'X'y + \hat\beta'X'X\hat\beta.$$

The least squares estimator of β is defined as the β̂ which minimizes SSE(β̂). A necessary condition for SSE(β̂) to be a minimum is that

$$\frac{dSSE(\hat\beta)}{d\hat\beta} = -2X'y + 2X'X\hat\beta = 0$$

(see Appendix A for how to differentiate a real-valued function with respect to a vector), which yields the normal equations X'Xβ̂ = X'y and hence

$$\hat\beta = (X'X)^{-1}X'y.$$
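The normal equations can be solved directly. A brief NumPy sketch on simulated data (all numbers illustrative; the notes' own computations use Stata) compares (X'X)⁻¹X'y with the answer from a library least squares routine:

```python
import numpy as np

# Solve the normal equations X'X beta_hat = X'y on a small made-up dataset
# and compare with numpy's least squares routine; both give the same beta_hat.
rng = np.random.default_rng(1)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)      # (X'X)^{-1} X'y
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))          # True
```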


2. Maximum Likelihood Estimation (MLE)

Likelihood Function: (Recall y ~ N(Xβ, Σ = σ²I))

$$L(y;\beta,\Sigma=\sigma^2 I) = \frac{e^{-\frac{1}{2}(y-X\beta)'\Sigma^{-1}(y-X\beta)}}{(2\pi)^{n/2}|\Sigma|^{1/2}} = \frac{e^{-\frac{1}{2\sigma^2}(y-X\beta)'(y-X\beta)}}{(2\pi)^{n/2}|\sigma^2 I|^{1/2}} = \frac{e^{-(y-X\beta)'(y-X\beta)/2\sigma^2}}{(2\pi)^{n/2}(\sigma^2)^{n/2}}.$$

The natural log of the likelihood function,

$$\ell = \ln L = -\frac{(y-X\beta)'(y-X\beta)}{2\sigma^2} - \frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \sigma^2$$

$$= -\frac{1}{2\sigma^2}\left(y'y - 2\beta'X'y + \beta'X'X\beta\right) - \frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \sigma^2,$$

is known as the log-likelihood function. ℓ is a function of β and σ².

The MLE of β and σ² are defined by the two equations (necessary conditions for a maximum):

$$\frac{\partial \ell}{\partial \beta} = -\frac{1}{2\sigma^2}\left(-2X'y + 2(X'X)\beta\right) = 0$$

$$\frac{\partial \ell}{\partial \sigma^2} = \frac{(y-X\beta)'(y-X\beta)}{2(\sigma^2)^2} - \frac{n}{2}\frac{1}{\sigma^2} = 0$$

i.e.,

$$\tilde\beta = (X'X)^{-1}X'y$$

$$\tilde\sigma^2 = \frac{1}{n}(y - X\tilde\beta)'(y - X\tilde\beta) = \frac{e'e}{n} = \frac{\sum e_t^2}{n}.$$

NOTE: (1) β̃ = β̂, the least squares estimator.

(2) σ̃² is a biased estimator of σ²; whereas

$$s^2 = \frac{1}{n-k}e'e = \frac{(y - X\hat\beta)'(y - X\hat\beta)}{n-k} = \frac{SSE}{n-k}$$

is unbiased.

A proof of the unbiasedness of s² is given in Appendix B.

Only n − k of the estimated residuals are independent: the necessary conditions for least squares impose k restrictions on the estimated residuals (e). The restrictions are summarized by the normal equations X'Xβ̂ = X'y, or equivalently

X'e = 0.

(3) Substituting σ̃² = SSE/n back into ℓ yields the concentrated log-likelihood function

$$\ell = -\frac{n}{2}\left[1 + \ln(2\pi) + \ln\frac{SSE}{n}\right],$$

which makes clear the equivalence of maximizing ℓ and minimizing SSE.
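The restriction X'e = 0 can be verified numerically. A short NumPy sketch (simulated data, illustrative only) checks that the least squares residuals are orthogonal to every column of X, and hence sum to zero when X contains an intercept column:

```python
import numpy as np

# The least squares residuals satisfy the k normal-equation restrictions
# X'e = 0 exactly (up to floating-point error); in particular, with an
# intercept column the residuals sum to zero.
rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

print(X.T @ e)        # ~ [0, 0, 0]
print(e.sum())        # ~ 0 because of the intercept
```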


3. BLUE estimators of β, β̄.

We now show that the best (least variance) linear unbiased estimator (BLUE) of β is the least squares estimator. We first consider the desired properties and then derive the associated estimator.

Linear: β̄ = Ay where A is a k×n matrix of constants.

Unbiased: E(β̄) = AE(y) = AXβ. We note that E(β̄) = AXβ = β requires AX = I.

Minimum variance:

$$\mathrm{Var}(\bar\beta_i) = A_i\,\mathrm{Var}(y)\,A_i' = \sigma^2 A_i A_i'$$

where A_i is the ith row of A and β̄_i = A_i y.

Thus, the construction of the BLUE is equivalent to selecting the matrix A so that the rows of A solve

min A_iA_i' (i = 1, 2, …, k) subject to AX = I,

or equivalently, min Var(β̄_i) s.t. AX = I (unbiasedness). The solution is

A = (X'X)⁻¹X'; hence, the BLUE of β is given by β̄ = Ay = (X'X)⁻¹X'y.

The details of this derivation are contained in Appendix C.

NOTE: (1) β̄ = β̃ = β̂ = (X'X)⁻¹X'y, i.e., the BLUE, MLE, and least squares estimators coincide.

(2) AX = (X'X)⁻¹X'X = I; thus β̄ is unbiased.

## 4. Method of Moments Estimation

Method of moments parameter estimators are selected to equate sample and corresponding theoretical moments. The open question is which theoretical moments should be considered and what the corresponding sample moments are. With the regression model we might consider the following theoretical moments, which follow from the underlying theoretical assumptions:

(A.2) E(ε_t) = 0

(A.5) Cov(X_{it}, ε_t) = 0.

The sample moment associated with (A.2) is

$$\sum_{t=1}^{n} e_t / n = \bar{e} = 0.$$

The sample covariances can be written as

$$\sum_{t=1}^{n}\left(X_{it}-\bar X_i\right)\left(e_t-\bar e\right)/n = \sum_{t=1}^{n}\left(X_{it}-\bar X_i\right)e_t/n = \sum_{t=1}^{n} X_{it}e_t/n = 0.$$

These sample moments can be summarized using matrix notation as follows:

$$\begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_{12} & x_{22} & \cdots & x_{n2} \\ \vdots & \vdots & & \vdots \\ x_{1k} & x_{2k} & \cdots & x_{nk} \end{pmatrix}\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}\Big/ n = X'e/n = 0,$$

which is equivalent to X'e = 0. These are also known as the normal equations in the OLS framework and yield the OLS estimator by solving

$$X'e = X'(Y - X\hat\beta) = 0$$

for β̂.

5. Instrumental Variables (IV) Estimation

For the model y = Xβ + ε, consider the solution of the modified normal equations

$$Z'Y = Z'X\hat\beta_z; \quad \text{hence, } \hat\beta_z = (Z'X)^{-1}Z'y,$$

where Z is a matrix of instrumental variables. Instrumental variables can be very useful if the variables on the right-hand side include "endogenous" variables or in the case of measurement error. In these cases OLS will yield biased and inconsistent estimators, whereas instrumental variables can yield consistent estimators.
NOTE: (1) The motivation for the selection of the instruments (Z) is that they should be correlated with X but uncorrelated with the error term. Thus Z'Y = Z'(Xβ + ε) = Z'Xβ + Z'ε ≈ Z'Xβ.

(2) If $\lim_{n\to\infty} \frac{Z'X}{n}$ is nonsingular and $\lim_{n\to\infty} \frac{Z'\varepsilon}{n} = 0$, then β̂_z is a consistent estimator of β.

(3) Many calculate an R² after instrumental variables estimation using the formula R² = 1 − SSE/SST. Since this can be negative, there is no natural interpretation of R² for instrumental variables estimators. Further, the R² can't be used to construct F-statistics for IV estimators.

(4) If Z includes "weak" instruments (weakly correlated with the X's), then the variances of the IV estimator can be large, and the corresponding asymptotic biases can be large if the Z and error are correlated. This can be seen by noting that the bias of the instrumental variables estimator is given by

$$E\left[(Z'X/n)^{-1}(Z'\varepsilon/n)\right].$$
Δ
(5) As a special case, if Z = X, then β̂_z = β̂ = β̃ = β̄, i.e., the IV estimator reduces to OLS.

(6) If Z is an n x k* matrix where k < k* (Z contains more variables than X), then the IV estimator defined above must be modified. The most common approach in this case is to replace Z in the "IV" equation by the projections** of X on the columns of Z, i.e., X̂ = Z(Z'Z)⁻¹Z'X. This substitution yields the IV estimator

$$\hat\beta_{IV} = \left(\hat X'X\right)^{-1}\hat X'Y = \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1} X'Z(Z'Z)^{-1}Z'Y,$$

which yields estimates for k ≤ k*.
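The equality of the "two-step" form (X̂'X)⁻¹X̂'Y and the closed-form expression above can be checked numerically. A NumPy sketch with simulated data (one endogenous regressor, k = 2, k* = 3; all numbers are illustrative, not from the notes):

```python
import numpy as np

# Sketch of the overidentified IV (2SLS) estimator, checked two ways:
# beta_IV = (X_hat'X)^{-1} X_hat'y with X_hat = Z (Z'Z)^{-1} Z'X,
# versus the closed form [X'Z(Z'Z)^{-1}Z'X]^{-1} X'Z(Z'Z)^{-1}Z'y.
rng = np.random.default_rng(3)
n = 500
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # n x k* instruments
u = rng.normal(size=n)
x2 = Z[:, 1] + 0.5 * Z[:, 2] + 0.8 * u       # endogenous regressor
X = np.column_stack([np.ones(n), x2])        # n x k
y = X @ np.array([1.0, 2.0]) + u

P = Z @ np.linalg.solve(Z.T @ Z, Z.T)        # projection onto columns of Z
X_hat = P @ X
beta_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
beta_direct = np.linalg.solve(X.T @ P @ X, X.T @ P @ y)

print(beta_2sls)
print(np.allclose(beta_2sls, beta_direct))   # True
```

With these (fairly strong) instruments, the slope estimate lands near its true value of 2, while OLS would be biased upward by the correlation between x2 and u.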

The Stata command for the instrumental variables estimator is given by

ivregress 2sls depvar (varlist_1 = varlist_iv) [varlist_2]

where the estimator may be 2sls, gmm, or liml; 2sls is the default estimator.

A specific example is given by:

ivregress 2sls y1 (y2 = z1 z2 z3) x1 x2 x3

Identical results could be obtained with the command

ivregress 2sls y1 (y2 x1 x2 x3 = z1 z2 z3)

which is equivalent to regressing all of the right-hand-side variables on the set of instrumental variables, the first stage of two-stage least squares.

**The projections of X on Z can be obtained by estimating Π in the "reduced form" equation X = ZΠ + V to yield Π̂ = (Z'Z)⁻¹Z'X; hence, the estimate of X is given by

$$\hat X = Z\hat\Pi = Z(Z'Z)^{-1}Z'X.$$
D. Distribution of β̂, β̃, β̄

Recall that under the assumptions (A.1)–(A.5), y ~ N(Xβ, Σ = σ²I) and

β̂ = β̃ = β̄ = (X'X)⁻¹X'y;

hence, by useful theorem (II'.A.4.a), we conclude that

$$\hat\beta = \tilde\beta = \bar\beta \sim N\left(A\mu_y,\; A\Sigma_y A'\right) = N\left[AX\beta,\; \sigma^2AA'\right]$$

where A = (X'X)⁻¹X'.

The desired derivations can be simplified by noting that

AXβ = (X'X)⁻¹X'Xβ = β

σ²AA' = σ²(X'X)⁻¹X'((X'X)⁻¹X')'
= σ²(X'X)⁻¹X'X((X'X)⁻¹)'
= σ²((X'X)⁻¹)'
= σ²((X'X)')⁻¹
= σ²(X'X)⁻¹.

Therefore

$$\hat\beta = \tilde\beta = \bar\beta \sim N\left(\beta,\; \sigma^2(X'X)^{-1}\right).$$

NOTE: (1) σ²(X'X)⁻¹ can be shown to be the Cramer-Rao matrix, the matrix of lower bounds for the variances of unbiased estimators.

(2) β̂, β̃, β̄ are

unbiased

consistent

minimum variance of all (linear and nonlinear) unbiased estimators

normally distributed

(3) The variance-covariance matrix can be estimated by s²(X'X)⁻¹.


(4) The estimated variance-covariance matrix s²(X'X)⁻¹ is reported by standard regression programs. In Stata:

. reg y x

. estat vce

(5) Distribution of the variance estimator

$$\frac{(n-k)s^2}{\sigma^2} \sim \chi^2(n-k)$$

NOTE: This can be proven using theorem (II'.A.4(b)) and noting that

$$(n-k)s^2 = e'e = (Y - X\hat\beta)'(Y - X\hat\beta) = (X\beta + \varepsilon)'\left(I - X(X'X)^{-1}X'\right)(X\beta + \varepsilon) = \varepsilon'\left(I - X(X'X)^{-1}X'\right)\varepsilon.$$

Therefore,

$$\frac{(n-k)s^2}{\sigma^2} = \left(\frac{\varepsilon}{\sigma}\right)'\left(I - X(X'X)^{-1}X'\right)\left(\frac{\varepsilon}{\sigma}\right) = \left(\frac{\varepsilon}{\sigma}\right)' M \left(\frac{\varepsilon}{\sigma}\right) \sim \chi^2(n-k)$$

because ε/σ ~ N(0, I) and M is idempotent with rank and trace equal to n − k.
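The properties of M that drive this result are easy to confirm numerically. A NumPy sketch with a simulated X (n = 30, k = 4, illustrative values):

```python
import numpy as np

# Numerical check that M = I - X(X'X)^{-1}X' is symmetric and idempotent
# with trace (= rank) equal to n - k, which is what drives the chi-square
# distribution of (n-k)s^2/sigma^2.
rng = np.random.default_rng(4)
n, k = 30, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(M, M.T))       # symmetric: True
print(np.allclose(M @ M, M))     # idempotent: True
print(round(np.trace(M)))        # n - k = 26
```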


E. Statistical Inference

1. Ho: β2 = β3 = . . . = βk = 0

This hypothesis tests for the statistical significance of overall explanatory power
of the explanatory variables by comparing the model with all variables included to
the model without any of the explanatory variables, i.e., yt = β1 + εt (all non-
intercept coefficients = 0). Recall that the total sum of squares (SST) can be
partitioned as follows:
$$\sum_{t=1}^{n}\left(y_t - \bar y\right)^2 = \sum_{t=1}^{n}\left(y_t - \hat y_t\right)^2 + \sum_{t=1}^{n}\left(\hat y_t - \bar y\right)^2, \quad\text{or}$$

SST = SSE + SSR.

Dividing both sides of the equation by σ² yields quadratic forms, each having a chi-square distribution:

$$\frac{SST}{\sigma^2} = \frac{SSE}{\sigma^2} + \frac{SSR}{\sigma^2}$$

$$\chi^2(n-1) = \chi^2(n-k) + \chi^2(k-1).$$

This result provides the basis for using

$$F = \frac{SSR/(k-1)}{SSE/(n-k)} \sim F(k-1,\; n-k)$$

to test the hypothesis that β₂ = β₃ = … = β_k = 0.

NOTE: (1)

$$\frac{SSR}{SSE} = \frac{SSR}{SST - SSR} = \frac{SSR/SST}{1 - SSR/SST} = \frac{R^2}{1 - R^2};$$

hence, the F-statistic for this hypothesis can also be rewritten as

$$F = \frac{R^2/(k-1)}{(1 - R^2)/(n-k)} = \frac{n-k}{k-1}\cdot\frac{R^2}{1-R^2} \sim F(k-1,\; n-k).$$
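The equivalence of the two forms of the overall F statistic can be confirmed numerically. A NumPy sketch on simulated data (illustrative only):

```python
import numpy as np

# The two expressions for the overall F statistic,
# F = [SSR/(k-1)] / [SSE/(n-k)] and F = [R^2/(k-1)] / [(1-R^2)/(n-k)],
# agree, because R^2 = SSR/SST and SST = SSR + SSE under OLS.
rng = np.random.default_rng(5)
n, k = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.8, -0.4]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
SSE = np.sum((y - y_hat) ** 2)
SSR = np.sum((y_hat - y.mean()) ** 2)
SST = np.sum((y - y.mean()) ** 2)
R2 = SSR / SST

F_ss = (SSR / (k - 1)) / (SSE / (n - k))
F_r2 = (R2 / (k - 1)) / ((1 - R2) / (n - k))
print(F_ss, F_r2, np.isclose(F_ss, F_r2))
```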


(2) These results are often summarized in an ANOVA table as follows:

| Source of Variation | SS | d.f. | MSE |
|---|---|---|---|
| Model | SSR | K − 1 | SSR/(K − 1) |
| Error | SSE | n − K | SSE/(n − K) = s² |
| Total | SST | n − 1 | |

where K = the number of coefficients in the model, and the ratio of the model and error MSE's yields the F statistic just discussed.
Additionally, remember that the adjusted R² (R̄²), defined by

$$\bar R^2 = 1 - \frac{\left(\sum e_t^2\right)/(n-K)}{\sum\left(Y_t - \bar Y\right)^2/(n-1)},$$

will only increase with the addition of a new variable if the t-statistic associated with the new variable is greater than 1 in absolute value. This result follows from the equation

$$\bar R^2_{New} = \bar R^2_{Old} + \frac{n-1}{(n-K)(n-K-1)}\cdot\frac{SSE_{New}}{SST}\left(\frac{\hat\beta^2_{New\_var}}{s^2_{\hat\beta_{New\_var}}} - 1\right),$$

where the last term in the product is t² − 1, K denotes the number of coefficients in the "old" regression model, and the "new" regression model includes K + 1 coefficients.

The Lagrangian multiplier (LM) and likelihood ratio (LR) tests can also be used to test this hypothesis, where

$$LM = NR^2 \overset{a}{\sim} \chi^2(k-1)$$

$$LR = -N\ln\left(1 - R^2\right) \overset{a}{\sim} \chi^2(k-1).$$

## 2. Testing hypotheses involving individual βi's

Recall that

$$\hat\beta \sim N\left(\beta,\; \sigma^2(X'X)^{-1}\right)$$

where

$$\sigma^2(X'X)^{-1} = \begin{pmatrix} \sigma^2_{\hat\beta_1} & \sigma_{\hat\beta_1\hat\beta_2} & \cdots & \sigma_{\hat\beta_1\hat\beta_k} \\ \sigma_{\hat\beta_2\hat\beta_1} & \sigma^2_{\hat\beta_2} & \cdots & \sigma_{\hat\beta_2\hat\beta_k} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{\hat\beta_k\hat\beta_1} & \sigma_{\hat\beta_k\hat\beta_2} & \cdots & \sigma^2_{\hat\beta_k} \end{pmatrix}$$

is estimated by

$$s^2(X'X)^{-1} = \begin{pmatrix} s^2_{\hat\beta_1} & s_{\hat\beta_1\hat\beta_2} & \cdots & s_{\hat\beta_1\hat\beta_k} \\ s_{\hat\beta_2\hat\beta_1} & s^2_{\hat\beta_2} & \cdots & s_{\hat\beta_2\hat\beta_k} \\ \vdots & \vdots & \ddots & \vdots \\ s_{\hat\beta_k\hat\beta_1} & s_{\hat\beta_k\hat\beta_2} & \cdots & s^2_{\hat\beta_k} \end{pmatrix}.$$

Then for testing H₀: β_i = β_i⁰,

$$\frac{\hat\beta_i - \beta_i^0}{s_{\hat\beta_i}} \sim t(n-k).$$

The validity of this distributional result follows from

$$\frac{N(0,1)}{\sqrt{\chi^2(d)/d}} \sim t(d)$$

since $\frac{\hat\beta_i - \beta_i}{\sigma_{\hat\beta_i}} \sim N(0,1)$ and $\frac{(n-k)}{\sigma^2_{\hat\beta_i}}\, s^2_{\hat\beta_i} \sim \chi^2(n-k)$.

A linear combination of the βi's can be written as

$$\sum_{i=1}^{k}\delta_i\beta_i = (\delta_1,\ldots,\delta_k)\begin{pmatrix}\beta_1\\ \vdots\\ \beta_k\end{pmatrix} = \delta'\beta.$$

We now consider testing hypotheses of the form

H₀: δ'β = γ.

Recall that

$$\hat\beta \sim N\left(\beta,\; \sigma^2(X'X)^{-1}\right);$$

therefore,

$$\delta'\hat\beta \sim N\left(\delta'\beta,\; \delta'\sigma^2(X'X)^{-1}\delta\right);$$

hence,

$$\frac{\delta'\hat\beta - \delta'\beta}{\sqrt{\delta' s^2(X'X)^{-1}\delta}} = \frac{\delta'\hat\beta - \gamma}{s_{\delta'\hat\beta}} \sim t(n-k).$$

The t-test of a hypothesis involving a linear combination of the coefficients involves running one regression and estimating the variance of δ'β̂ from s²(X'X)⁻¹.

a. Introduction

We have considered tests of the overall explanatory power of the regression model (H₀: β₂ = β₃ = … = β_k = 0), tests involving individual parameters (e.g., H₀: β₃ = 6), and tests of the validity of a linear constraint on the coefficients
(Ho: δ’β = γ). In this section we will consider how more general tests can be

performed. The testing procedures will be based on the Chow and Likelihood

ratio (LR) tests. The hypotheses may be of many different types and involve the

previous tests as special cases. Other examples might include joint hypotheses of

## the form: Ho: β2 + 6 β5 = 4, β3 = β7 = 0. The basic idea is that if the hypothesis is

really valid, then goodness of fit measures such as SSE, R², and log-likelihood values (ℓ) will not be significantly impacted by imposing the valid hypothesis in estimation. Hence, the SSE, R², or ℓ values will not be significantly different for constrained (via the hypothesis) and unconstrained estimation of the underlying regression model y = Xβ + ε. The tests of the validity of the hypothesis are based on comparing these measures with and without imposing the constraints on the β vector.

The Chow and likelihood ratio tests for testing Ho: g(β) = 0 can be

constructed from the output obtained from estimating the two following

regression models.


The two regressions are: (1) the unconstrained model, yielding SSE, R², ℓ, and degrees of freedom (n − k); and (2) the constrained model, estimated imposing the restrictions specified by the hypothesis (H₀: g(β) = 0) in the estimation process. Let the associated sum of squared errors, R², log-likelihood value, and degrees of freedom be denoted by SSE*, R²*, ℓ*, and (n − k)*, respectively.

b. Chow test

The Chow test is defined by the following statistic:

$$\frac{(SSE^* - SSE)/r}{SSE/(n-k)} \sim F(r,\; n-k)$$

where r = (n − k)* − (n − k) is the number of independent restrictions imposed on β by the hypothesis. For example, if the hypothesis were H₀: β₂ + 6β₅ = 4, β₃ = β₇ = 0, then the numerator degrees of freedom (r) is equal to 3. In applications where the SST is unaltered by imposing the restrictions, we can divide the numerator and denominator by SST to rewrite the Chow test in terms of the change in the R² between the constrained and unconstrained regressions:

$$F = \frac{\left(R^2 - R^{2*}\right)/r}{\left(1 - R^2\right)/(n-k)} = \frac{R^2 - R^{2*}}{1 - R^2}\cdot\frac{n-k}{r} \sim F(r,\; n-k).$$

Note that if the hypothesis (H₀: g(β) = 0) is valid, then we would expect R² (SSE) and R²* (SSE*) to not be significantly different from each other. Thus, it is only large values (greater than the critical value) of F which provide the basis for rejecting the hypothesis. Again, the R² form of the Chow test is only valid if the dependent variable is the same in the constrained and unconstrained regressions.
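The mechanics of the Chow statistic can be sketched in NumPy (simulated data; the hypothetical restriction here is a single exclusion, β₃ = 0, so r = 1):

```python
import numpy as np

# Chow test sketch: compare SSE from the unconstrained regression with
# SSE* from the regression with the restriction imposed. Dropping the
# third regressor imposes beta_3 = 0, so r = 1.
rng = np.random.default_rng(6)
n, k = 80, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)  # restriction is true

def sse(X, y):
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return e @ e

SSE = sse(X, y)               # unconstrained
SSE_star = sse(X[:, :2], y)   # constrained (beta_3 = 0 imposed)
r = 1
F = ((SSE_star - SSE) / r) / (SSE / (n - k))
print(F)                      # compare with the F(1, 77) critical value
```

Because imposing a restriction can never lower the SSE, SSE* ≥ SSE and F ≥ 0, matching the one-sided nature of the test.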

References:

## (1) Chow, G. C., "Tests of Equality Between Subsets of Coefficients in Two

Linear Regressions," Econometrica, 28(1960), 591-605.

(2) Fisher, F. M., "Tests of Equality Between Sets of Coefficients in Two Linear Regressions," Econometrica.

c. Likelihood ratio (LR) test

The LR test is a common method of statistical inference in classical statistics. The motivation behind the LR test is similar to that of the Chow test, except that it is based on determining whether there has been a significant reduction in the value of the log-likelihood as a result of imposing the hypothesized constraints on β in the estimation process. The LR test statistic is defined to be twice the difference between the unconstrained and constrained log-likelihood values, 2(ℓ − ℓ*), and, under fairly general regularity conditions, is asymptotically distributed as a chi-square with degrees of freedom equal to the number of independent restrictions (r) imposed by the hypothesis. This may be summarized as follows:

$$LR = 2(\ell - \ell^*) \overset{a}{\sim} \chi^2(r).$$

The LR test is more general than the Chow test, and for the case of independent and identically distributed normal errors with known σ²,

$$LR = \frac{SSE^* - SSE}{\sigma^2}.$$

Recall that s² = SSE/(n − k) appears in the denominator of the Chow test statistic and that for large values of (n − k), s² is "close" to σ²; hence, we can see the close relationship between LR = 2(ℓ − ℓ*) and the Chow statistic.


If the hypothesis H₀: β₂ = β₃ = … = β_k = 0 is being tested in the classical normal linear regression model, then SSE* = SST and LR can be rewritten in terms of the R² as follows:

$$LR = n\ln\left[\frac{1}{1-R^2}\right] = -n\ln\left[1-R^2\right] \overset{a}{\sim} \chi^2(k-1).$$

In this case, the Chow test is identical to the F test for overall explanatory power discussed earlier.

Thus the Chow test and LR test are similar in structure and purpose. The LR test is more general than the Chow test; however, its distribution is only known asymptotically.

(a) Estimate the unconstrained model to obtain SSE = Σe_t² = (n − 4)s², R²,

$$\ell = -\frac{n}{2}\left[1 + \ln(2\pi) + \ln\frac{SSE}{n}\right],$$

and n − k = n − 4.


(c) Construct the test statistics

$$\text{Chow} = \frac{(SSE^* - SSE)/\left[(n-k)^* - (n-k)\right]}{SSE/(n-k)} = \frac{(SSE^* - SSE)/2}{SSE/(n-4)} = \frac{n-4}{2}\cdot\frac{SSE^* - SSE}{SSE}$$

$$= \frac{R^2 - R^{2*}}{1-R^2}\cdot\frac{n-4}{2} \sim F(2,\; n-4)$$

$$LR = 2(\ell - \ell^*) \overset{a}{\sim} \chi^2(2).$$

4. Testing the equality of coefficients in two regression models.

(a) Estimate the unconstrained model, which allows separate coefficient vectors in the two samples:

$$(1)'\quad y = \begin{pmatrix} y^{(1)} \\ y^{(2)} \end{pmatrix} = \begin{pmatrix} X^{(1)} & 0 \\ 0 & X^{(2)} \end{pmatrix}\begin{pmatrix} \beta^{(1)} \\ \beta^{(2)} \end{pmatrix} + \begin{pmatrix} \varepsilon^{(1)} \\ \varepsilon^{(2)} \end{pmatrix}$$

Estimate (1)' using least squares and determine SSE, R², ℓ, and (n − k) = n₁ + n₂ − 2k.

(b) Now impose the hypothesis that β^(1) = β^(2) = β and write (1)' as

$$(2)'\quad y = \begin{pmatrix} y^{(1)} \\ y^{(2)} \end{pmatrix} = \begin{pmatrix} X^{(1)} \\ X^{(2)} \end{pmatrix}\beta + \begin{pmatrix} \varepsilon^{(1)} \\ \varepsilon^{(2)} \end{pmatrix}$$


Estimate (2)' using least squares and determine the associated sum of squared errors (SSE*), R²*, ℓ*, and (n − k)* = n₁ + n₂ − k.

(c) Construct the test statistics

$$\text{Chow} = \frac{(SSE^* - SSE)/\left[(n-k)^* - (n-k)\right]}{SSE/(n-k)} = \frac{R^2 - R^{2*}}{1-R^2}\cdot\frac{n_1 + n_2 - 2k}{k} \sim F(k,\; n_1 + n_2 - 2k)$$

$$LR = 2(\ell - \ell^*) \overset{a}{\sim} \chi^2(k).$$

## 5. Testing Hypotheses using Stata

a. Stata reports the log-likelihood values when the command

estat ic

is entered following estimation.

b. Consider a regression of Y on X2, X3, and X4 with the hypotheses:

H1: β₂ = 1

H2: β₃ = 0

H3: β₃ + β₄ = 1

H4: β₃β₄ = 1

H5: β₂ = 1 and β₃ = 0


These hypotheses can be tested with Stata commands following estimation of the unconstrained model.


reg Y X2 X3 X4 (estimates the unconstrained model)

test X2 = 1 (Tests H1)

test X3 = 0 (Tests H2)

test X3 + X4 = 1 (Tests H3)

testnl _b[X3]*_b[X4] = 1 (Tests H4. The "testnl" command is for testing nonlinear hypotheses. The prefix "_b", along with the brackets, must be used when testing nonlinear hypotheses.)

## 95% confidence intervals on coefficient estimates are automatically calculated in

Stata. To change the confidence level, use the “level” option as follows:

## reg Y X2 X3 X4, level(90) (changes the confidence level

to 90%)

F. Stepwise Regression

Stepwise regression adds or removes variables in the model solely as determined by their statistical significance and not according to any theoretical reason. While stepwise regression can be a convenient exploratory device, its results should therefore be interpreted with caution.

## A stepwise regression may use forward selection or backward selection. Using

forward selection, a stepwise regression will add one independent variable at a time to see

if it is significant. If the variable is significant, it is kept in the model and another variable

## is added. If the variable is not significant, or if a previously added variable becomes

insignificant, it is not included in the model. This process continues until no additional variables qualify for entry or removal. Backward selection works in reverse: it begins with all candidate variables included and removes insignificant variables one at a time.


The general form of the Stata command is

stepwise, pr(#) pe(#): regress depvar forced_indepvars other_indepvars

where the "#" in "pr(#)" is the significance level at which variables are removed, such as 0.051, and the "#" in "pe(#)" is the significance level at which variables are entered or added to the model. If pr(#1) and pe(#2) are both included in a stepwise regression command, #1 must be greater than #2. Also, "depvar" represents the dependent variable,

“forced_indepvars” represent the independent variables which the user wishes to remain

in the model no matter what their significance level may be, and “other_indepvars”

represents the other independent variables which the stepwise regression will consider

including or excluding. Forward and backward stepwise regression may yield different

results.

G. Forecasting

Let yt = F(Xt, β) + εt

denote the stochastic relationship between the variable yt and the vector of variables Xt

## where Xt = (xt1,..., xtk). β represents a vector of unknown parameters.

Forecasts are generally made by estimating the vector of parameters β (denoted β̂) and then evaluating

$$\hat y_t = F(X_t, \hat\beta).$$


## 1. Incorrect functional form (This is an example of specification error and will be

discussed later.)

(Figure: data on Y_t plotted against X_t, fitted with an incorrect functional form.)

2. If β (and hence the population regression line) were known with certainty, the forecast error would be

FE = y_t − ŷ_t = y_t − F(X_t, β) = ε_t

$$\sigma^2_{FE} = \mathrm{Var}(FE) = \mathrm{Var}(\varepsilon_t) = \sigma^2.$$

In this case confidence intervals for y_t would be obtained from

$$\Pr\left[F(X_t,\beta) - t_{(\alpha/2)}\,\sigma < y_t < F(X_t,\beta) + t_{(\alpha/2)}\,\sigma\right] = 1-\alpha$$

## which could be visualized as follows for the linear case:

(Figure: a confidence band of constant width around the population regression line, Y_t against X_t.)

3. Assume F(X_t, β) = X_tβ in the model y_t = F(X_t, β) + ε_t; then the predicted value of y_t for a given value of X_t is given by

$$\hat y_t = X_t\hat\beta,$$

and the variance of ŷ_t (the sample regression line), σ²_{ŷ_t}, is given by

$$\sigma^2_{\hat y_t} = X_t\,\mathrm{Var}(\hat\beta)\,X_t',$$

with the variance of the forecast error (actual y) given by

$$\sigma^2_{FE} = \sigma^2 + \sigma^2_{\hat y_t}.$$

Note that σ²_{FE} takes account of the uncertainty associated with both the unknown regression line and the error term, and can be used to construct confidence intervals for the actual value of Y rather than just the regression line. Unbiased sample estimators of σ²_{ŷ_t} and σ²_{FE} can be easily obtained by replacing σ² with its unbiased estimator s².

Confidence intervals for Y_t:

$$\Pr\left[X_t\hat\beta - t_{(\alpha/2)}\,s_{FE} < Y_t < X_t\hat\beta + t_{(\alpha/2)}\,s_{FE}\right] = 1-\alpha$$
(Figure: confidence bands for the actual value of Y_t plotted against X_t.)

## 4. A comparison of confidence intervals.

Some students have found that the following table facilitates their understanding of the different confidence intervals for the population regression line and the actual value of Y. The column for the estimated coefficients is only included for comparison.

| | β̂_i | Ŷ_t = X_tβ̂ (sample regression line) | FE (forecast error) |
|---|---|---|---|
| Statistic | β̂ = (X'X)⁻¹X'Y | predicted Y value corresponding to X_t | FE = Y_t − Ŷ_t = Y_t − X_tβ̂ |
| Distribution | N(β, σ²(X'X)⁻¹) | N(X_tβ, σ²_{Ŷ_t} = σ²X_t(X'X)⁻¹X_t') | N(0, σ²_{FE} = σ² + σ²_{Ŷ_t}) |
| t-stat | (β̂_i − β_i)/s_{β̂_i} ~ t | (X_tβ̂ − X_tβ)/s_{Ŷ_t} ~ t | FE/s_{FE} ~ t |
| C.I. | β̂_i ± t_{α/2} s_{β̂_i} | X_tβ̂ ± t_{α/2} s_{Ŷ_t} | X_tβ̂ ± t_{α/2} s_{FE} |

where s_{Ŷ_t} is used to compute confidence intervals for the regression line (E[Y_t | X_t]) and s_{FE} is used in the calculation of confidence intervals for the actual value of Y. Recall that s²_{FE} = s² + s²_{Ŷ_t}; hence, s²_{FE} > s²_{Ŷ_t} and the confidence intervals for Y are larger than for the population regression line.

5. Uncertainty about X. In many situations the value of the independent variable also

needs to be predicted along with the value of y. Not surprisingly, a “poor” estimate of

Xt will likely result in a poor forecast for y. This can be represented graphically as

follows:

(Figure: the forecast Ŷ_t based on a predicted value X̂_t of the explanatory variable; an error in predicting X_t translates into an additional error in the forecast of Y_t.)

## 6. Hold out samples and a predictive test.

One way to explore the predictive ability of a model is to estimate the model on a subset of the data and then use the estimated model to predict the known outcomes that were held out of the estimation sample.

## 7. Example: ŷ_t = 10 + 2.5 G_t + 6 M_t = β̂₁ + β̂₂G_t + β̂₃M_t

where yt, Gt, Mt denote GDP, government expenditure, and money supply.

Assume that

$$s^2(X'X)^{-1} = \begin{pmatrix} 10 & 5 & 2 \\ 5 & 20 & 3 \\ 2 & 3 & 15 \end{pmatrix}\times 10^{-3}, \qquad s^2 = 10.$$

a. Forecast y_t for G_t = 100, M_t = 200, i.e., X_t = (1, 100, 200):

$$\hat y_t = X_t\hat\beta = (1, 100, 200)\begin{pmatrix} 10 \\ 2.5 \\ 6 \end{pmatrix} = 10 + 250 + 1200 = 1460.$$

b. Evaluate s²_{ŷ_t} and s²_{FE} corresponding to the X_t in question (a):

$$s^2_{\hat y_t} = X_t\left(s^2(X'X)^{-1}\right)X_t' = (1, 100, 200)\begin{pmatrix} 10 & 5 & 2 \\ 5 & 20 & 3 \\ 2 & 3 & 15 \end{pmatrix}\begin{pmatrix} 1 \\ 100 \\ 200 \end{pmatrix}\times 10^{-3} = 921.81$$

$$s_{\hat y_t} = 30.36$$

$$s^2_{FE} = s^2 + s^2_{\hat y_t} = 10 + 921.81 = 931.81$$

$$s_{FE} = 30.53$$
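These computations are easy to reproduce; a NumPy check of the numbers above:

```python
import numpy as np

# Verify the forecast example: given s^2 (X'X)^{-1}, s^2, beta_hat, and
# X_t = (1, 100, 200), reproduce y_hat, s^2_yhat, and s_FE.
V = np.array([[10, 5, 2],
              [5, 20, 3],
              [2, 3, 15]]) * 1e-3      # s^2 (X'X)^{-1}
s2 = 10.0
beta_hat = np.array([10.0, 2.5, 6.0])
Xt = np.array([1.0, 100.0, 200.0])

y_hat = Xt @ beta_hat                  # 1460.0
s2_yhat = Xt @ V @ Xt                  # 921.81
s2_FE = s2 + s2_yhat                   # 931.81
print(y_hat, round(s2_yhat, 2), round(np.sqrt(s2_FE), 2))  # 1460.0 921.81 30.53
```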

## 8. Forecasting—basic Stata commands

a) The data file should include values for the explanatory variables for the periods to be forecast.

b) Estimate the regression:

reg Y X1 . . . XK, [options]

c) Use the predict command, picking the name you want for the predictions:

predict yhat, xb (predicts Ŷ)

predict e, resid (predicts the residuals, e)

predict sfe, stdf (predicts the standard error of the forecast, s_FE)

predict syhat, stdp (predicts the standard error of the prediction, s_Ŷ)

list y yhat sfe (lists the indicated variables)

These commands result in the calculation and reporting of Y, Ŷ, e, s_FE and s_Ŷ for observations 1 through n₂. The predictions will show up in the Data Editor of Stata under the variable names you picked (in this case yhat, e, sfe and syhat). Note that

$$s^2_{\hat Y_t} = s^2_{FE} - s^2.$$

## Problem Set 3.1

Theory

OBJECTIVE: The objective of problems 1 & 2 is to demonstrate that the matrix equations and
summation equations for the estimators and variances of the estimators are equivalent.
Remember that $\sum_{t=1}^{N} X_t = N\bar X$, and don't get discouraged!!

## 1. BACKGROUND: Consider the model (1) Y_t = β₁ + β₂X_t + ε_t (t = 1, …, N) or, equivalently,

$$(1)'\quad \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

(1)'' Y = Xβ + ε.

The least squares estimator of β is

$$\hat\beta = \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = (X'X)^{-1}X'Y.$$

If (A.1)–(A.5) (see class notes) are satisfied, then

$$\mathrm{Var}(\hat\beta) = \begin{pmatrix} \mathrm{Var}(\hat\beta_1) & \mathrm{Cov}(\hat\beta_1,\hat\beta_2) \\ \mathrm{Cov}(\hat\beta_2,\hat\beta_1) & \mathrm{Var}(\hat\beta_2) \end{pmatrix} = \sigma^2(X'X)^{-1}.$$

QUESTIONS: Verify the following:

*Hint: It might be helpful to work backwards on parts c and e.

a. $X'X = \begin{pmatrix} N & N\bar X \\ N\bar X & \sum X_t^2 \end{pmatrix}$ and $X'Y = \begin{pmatrix} N\bar Y \\ \sum_{t=1}^{N} X_tY_t \end{pmatrix}$

b. $\hat\beta_2 = \left(\sum X_tY_t - N\bar X\bar Y\right) \big/ \left(\sum X_t^2 - N\bar X^2\right)$

c. $\hat\beta_1 = \bar Y - \hat\beta_2\bar X$

d. $\mathrm{Var}(\hat\beta_2) = \sigma^2 \big/ \left(\sum X_t^2 - N\bar X^2\right)$

e. $\mathrm{Var}(\hat\beta_1) = \sigma^2\left[\dfrac{1}{n} + \dfrac{\bar X^2}{\sum X_t^2 - N\bar X^2}\right] = \mathrm{Var}(\bar Y) + \bar X^2\,\mathrm{Var}(\hat\beta_2)$

f. $\mathrm{Cov}(\hat\beta_1,\hat\beta_2) = -\bar X\,\mathrm{Var}(\hat\beta_2)$

(JM II'-A, JM Stats)

2. Consider the model Y_t = βX_t + ε_t (a regression through the origin).

a. Show that this model is equivalent to Y = Xβ + ε, where

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix},\qquad X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix},\qquad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$

b. Using the matrices in 2(a), evaluate (X'X)⁻¹X'Y and compare your answer with the results obtained in question 4 in Problem Set 2.1.

c. Evaluate σ²(X'X)⁻¹.

(JM II'-A)

Applied

## price = β0 + β1sqrft + β2bdrms + u

where price is the house price measured in thousands of dollars, sqrft is
the floorspace measured in square feet, and bdrms is the number of bedrooms.

## a. Write out the results in equation form.


b. What is the estimated increase in price for a house with one more bedroom, holding
square footage constant?
c. What is the estimated increase in price for a house with an additional bedroom that is 140 square feet in size? Compare this to your answer in part (b).
d. What percentage variation in price is explained by square footage and number of
bedrooms?
e. The first house in the sample has sqrft = 2,438 and bdrms = 4. Find the predicted selling
price for this house from the OLS regression line.
f. The actual selling price of the first house in the sample was \$300,000 (so price = 300).
Find the residual for this house. Does it suggest that the buyer underpaid or overpaid for
the house?

Theory

1. The R² (coefficient of determination) is defined by

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

where SSE = Σe_t², SST = Σ(Y_t − Ȳ)², and SSR = Σ(Ŷ_t − Ȳ)².

Given that SST = SSR + SSE when using OLS,

a. Demonstrate that 0 ≤ R² ≤ 1.

b. Demonstrate that n = k implies R² = 1. (Hint: n = k implies that X is square. Be careful! Show Y = Ŷ = Xβ̂.)

c. If an additional independent variable is included in the regression equation, will the R² increase, decrease, or remain unaltered? (Hint: What is the effect upon SST, SSE?)

d. The adjusted R², R̄², is defined by

$$\bar R^2 = 1 - \frac{SSE/(n-k)}{SST/(n-1)}.$$

Demonstrate that

$$\frac{1-k}{n-k} \le \bar R^2 \le R^2 \le 1,$$

i.e., the adjusted R² can be negative.

$$\left(\text{Hint: } 1 - \bar R^2 = \frac{SSE}{SST}\cdot\frac{n-1}{n-k} = \left(1 - R^2\right)\frac{n-1}{n-k}\right)$$

e. Verify that

$$LR = \frac{SSE^* - SSE}{\sigma^2} \quad\text{if } \sigma^2 \text{ is known,}$$

where SSE* denotes the restricted SSE.

f. For the hypothesis H₀: β₂ = … = β_k = 0, verify that the corresponding LR statistic can be written as

$$LR = n\ln\left[\frac{1}{1-R^2}\right] = -n\ln\left(1 - R^2\right).$$

FYI: The corresponding Lagrangian multiplier (LM) test statistic for this hypothesis can be written in terms of the coefficient of determination as LM = NR².

(JM II-B)

2. Demonstrate that

a. X'e = 0 is equivalent to the normal equations X'Xβ̂ = X'Y.

b. X'e = 0 implies that the sum of the estimated error terms will equal zero if the regression equation includes an intercept.

Remember: e = Y − Ŷ = Y − Xβ̂.

(JM II-B)

Applied

3. The following model can be used to study whether campaign expenditures affect election
outcomes:

## voteA = β0 + β1ln(expendA) + β2 ln(expendB) + β3 prtystrA + u

where voteA is the percent of the vote received by Candidate A, expendA and expendB are
campaign expenditures by Candidates A and B, and prtystrA is a measure of party
strength for Candidate A (the percent of the most recent presidential vote that went to A's
party).
i) What is the interpretation of β1?
ii) In terms of the parameters, state the null hypothesis that a 1% increase in A's
expenditures is offset by a 1% increase in B's expenditures.
iii) Estimate the model above using the data in VOTE1.RAW and report the results in
the usual form. Do A's expenditures affect the outcome? What about B's
expenditures? Can you use these results to test the hypothesis in part (ii)?
iv) Estimate a model that directly gives the t statistic for testing the hypothesis in part (ii). What do you conclude? (Use a two-sided alternative.) A possible approach: let D = β₁ + β₂ and test H₀: D = 0. Substitute β₁ = D − β₂ into the model and simplify to obtain a reparameterized regression, and test the hypothesis that the coefficient, D, of ln(expendA) is 0.

You can check your results by constructing the “high-tech” t-test or by using the
Stata command, test ln(expendA) + ln(expendB) =0 following the estimation of
the unconstrained regression model.
(Wooldridge C. 4.1)

1 40.26 64.63 133.14

2 40.84 66.30 139.24
3 42.83 65.27 141.64
4 43.89 67.32 148.77
5 46.10 67.20 151.02
6 44.45 65.18 143.38
7 43.87 65.57 148.19
8 49.99 71.42 167.12
9 52.64 77.52 171.33
10 57.93 79.46 176.41

The Cobb-Douglas production function is defined by

$$(1)\quad Y_t = e^{\beta_1 + \beta_2 t}\,L_t^{\beta_3} K_t^{\beta_4}\,\varepsilon_t$$

where (β₂t) takes account of changes in output for any reason other than a change in L_t or K_t; ε_t denotes a random disturbance having the property that ln ε_t is distributed N(0, σ²). Labor's share $\left(\frac{\text{total wage receipts}}{\text{total sales receipts}}\right)$ is given by β₃ if β₃ + β₄ (the returns to scale) is equal to one. β₂ is frequently referred to as the rate of technological change, $\frac{dY_t}{dt}\big/ Y_t$ for fixed L and K. Taking the natural logarithm of equation (1), we obtain

$$(2)\quad \ln Y_t = \beta_1 + \beta_2 t + \beta_3\ln L_t + \beta_4\ln K_t + \ln\varepsilon_t.$$

## a. Estimate equation (2) using the technique of least squares.

b. Corresponding to equation (2)
1) Test the hypothesis Ho: β2 = β3 = β4 = 0. Explain the implications of this
hypothesis. (95% confidence level)
2) Perform and interpret individual tests of significance of β₂, β₃, and β₄, i.e., test H₀: βᵢ = 0, α = .05.
3) test the hypothesis of constant returns to scale, i.e., Ho: β3 + β4 = 1, using
a. a t-test for general linear hypothesis, let restrictions δ= (0,0,1,1);
b. a Chow test;
c. a LR test.
c. Estimate equation (3) and test the hypothesis that labor’s share is equal to .75, i.e., β3 =
.75.
d. Re-estimate the model (equation 2) with the first nine observations and check to see if the actual
log(output) for the 10th observation lies in the 95% forecast confidence interval.
(JM II)

5. The translog production function is defined by

$$\ln(Y) = \beta_1 + \beta_2 t + \beta_3\ln(L) + \beta_4\ln(K) + \beta_5\left(\ln(L)\right)^2 + \beta_6\left(\ln(K)\right)^2 + \beta_7\left(\ln(L)\right)\left(\ln(K)\right) + \ln(\varepsilon_t)$$

a. What restrictions on the translog production function result in a Cobb-Douglas
production function?
b. Estimate the translog production function using the data in problem 5 and use the Chow and
LR tests to determine whether it provides a statistically significant improved fit to the data,
relative to the Cobb-Douglas function.
(JM II)

## 6. The transcendental production function corresponding to the data in problem 5 is defined by

$$Y = e^{\beta_1 + \beta_2 t + \beta_3 L + \beta_4 K}\, L^{\beta_5} K^{\beta_6}$$
a. What restrictions on the transcendental production function result in a Cobb-Douglas
production function?
b. Estimate the transcendental production function using the data in problem 5 and use the Chow and LR tests to compare it with the Cobb-Douglas production function.
(JM II)

APPENDIX A
Some important derivatives:

Let $X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$, $a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$, $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ (symmetric: a₁₂ = a₂₁ = a).

1. $\dfrac{d(a'X)}{dX} = \dfrac{d(X'a)}{dX} = a$

2. $\dfrac{d(X'AX)}{dX} = 2AX$

Proof of $\dfrac{d(X'a)}{dX} = a$:

Note: a'X = X'a = a₁x₁ + a₂x₂

$$\frac{d(X'a)}{dX} = \begin{pmatrix} \partial(X'a)/\partial x_1 \\ \partial(X'a)/\partial x_2 \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = a.$$

Proof of $\dfrac{d(X'AX)}{dX} = 2AX$:

Note: X'AX = a₁₁x₁² + (a₁₂ + a₂₁)x₁x₂ + a₂₂x₂²

$$\frac{d(X'AX)}{dX} = \begin{pmatrix} \partial(X'AX)/\partial x_1 \\ \partial(X'AX)/\partial x_2 \end{pmatrix} = \begin{pmatrix} 2a_{11}x_1 + 2a x_2 \\ 2a x_1 + 2a_{22}x_2 \end{pmatrix} = 2\begin{pmatrix} a_{11}x_1 + a x_2 \\ a x_1 + a_{22}x_2 \end{pmatrix} = 2\begin{pmatrix} a_{11} & a \\ a & a_{22} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 2AX.$$
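Both rules can be checked against finite-difference gradients; a NumPy sketch with arbitrary illustrative values:

```python
import numpy as np

# Finite-difference check of the matrix-derivative rules above:
# d(a'x)/dx = a and d(x'Ax)/dx = 2Ax for symmetric A.
a = np.array([1.0, -2.0])
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])             # symmetric
x = np.array([0.7, -1.3])
h = 1e-6

def grad(f, x):
    # central-difference approximation to the gradient of f at x
    g = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = h
        g[i] = (f(x + d) - f(x - d)) / (2 * h)
    return g

print(grad(lambda v: a @ v, x))        # ~ a
print(grad(lambda v: v @ A @ v, x))    # ~ 2 A x
print(2 * A @ x)
```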

APPENDIX B

An unbiased estimator of σ² is given by

$$s^2 = \frac{1}{n-k}\left(y'\left(I - X(X'X)^{-1}X'\right)y\right) = SSE/(n-k).$$

Recall the following properties of the trace, $\mathrm{tr}(A) = \sum_{i} a_{ii}$:

1) tr(I) = n

4) tr(AB) = tr(BA) if both AB and BA are defined

5) tr(ABC) = tr(CAB)

6) tr(kA) = k tr(A)

Now, remember that

$$\hat\sigma^2 = \frac{1}{n}e'e \qquad\text{and}\qquad s^2 = \frac{1}{n-k}e'e,$$

where

$$e = y - X\hat\beta = y - X(X'X)^{-1}X'y = My = M(X\beta + \varepsilon) = MX\beta + M\varepsilon = M\varepsilon,$$

with M = I − X(X'X)⁻¹X' (so that MX = 0). Note that M is symmetric and idempotent (problem set R.2). So

$$\hat\sigma^2 = \frac{1}{n}e'e = \frac{1}{n}\varepsilon'M'M\varepsilon = \frac{1}{n}\varepsilon'MM\varepsilon = \frac{1}{n}\varepsilon'M\varepsilon \qquad\text{and}\qquad s^2 = \frac{1}{n-k}\varepsilon'M\varepsilon.$$

Then

$$E(\hat\sigma^2) = \frac{1}{n}E\left(\varepsilon'M\varepsilon\right) = \frac{1}{n}E\left(\mathrm{tr}(\varepsilon'M\varepsilon)\right) \qquad\text{(a scalar equals its trace)}$$

$$= \frac{1}{n}E\,\mathrm{tr}\left(M\varepsilon\varepsilon'\right) = \frac{1}{n}\mathrm{tr}\left(M\,E(\varepsilon\varepsilon')\right) = \frac{1}{n}\mathrm{tr}\left(M\sigma^2 I\right) = \frac{\sigma^2}{n}\mathrm{tr}(M)$$

$$= \frac{\sigma^2}{n}\mathrm{tr}\left(I - X(X'X)^{-1}X'\right) = \frac{\sigma^2}{n}\left(n - \mathrm{tr}\left(X(X'X)^{-1}X'\right)\right) = \frac{\sigma^2}{n}\left(n - \mathrm{tr}\left(X'X(X'X)^{-1}\right)\right)$$

$$= \frac{\sigma^2}{n}\left(n - \mathrm{tr}(I_k)\right) = \frac{\sigma^2}{n}(n-k) = \frac{n-k}{n}\sigma^2,$$

so $E(s^2) = \frac{n}{n-k}E(\hat\sigma^2) = \sigma^2$. Therefore σ̂² is biased, but s² is unbiased.
n-k

APPENDIX C

β̄ = AY = (X'X)⁻¹X'Y is BLUE.

Proof: Let β̄ᵢ = AᵢY, where Aᵢ denotes the ith row of the matrix A. Since the result will be symmetric for each β̄ᵢ (hence, for each Aᵢ), denote Aᵢ' by a, where a is an (n × 1) vector. Unbiasedness (AX = I) requires X'a = i, where i denotes the corresponding column of the k × k identity matrix, so the problem is to

min a'Ia subject to X'a = i.

The necessary conditions for a solution of the Lagrangian 𝓛 = a'Ia + λ'(X'a − i) are:

$$\frac{\partial \mathcal{L}}{\partial a} = 2a'I + \lambda'X' = 0$$

$$\frac{\partial \mathcal{L}}{\partial \lambda} = (X'a - i)' = 0.$$

This implies

$$a' = (-1/2)\lambda'X'.$$

Now substitute a = (−1/2)Xλ into the expression ∂𝓛/∂λ = 0 and we obtain

$$(-1/2)X'X\lambda = i$$

$$\lambda = -2(X'X)^{-1}i$$

$$a' = (-1/2)(-2)\,i'(X'X)^{-1}X' = i'(X'X)^{-1}X' = A_i,$$

which implies

$$A = (X'X)^{-1}X';$$

hence, β̄ = (X'X)⁻¹X'y.