
CHAPTER THREE

THE MULTIPLE LINEAR REGRESSION (MLR)

3.1 Introduction: The MLR
3.2 Assumptions of the MLR
3.3 Estimation: The Method of OLS
3.4 Properties of OLS Estimators
3.5 Coefficients of Determination
3.6 Statistical Inferences in MLR
3.1 Introduction: The Multiple Linear Regression

A relationship between a dependent variable and two or more independent variables that is linear in the parameters.

Population regression (β0 = population Y-intercept; β1, …, βK = population slopes; εi = random error):

Yi = β0 + β1 X1i + β2 X2i + ⋯ + βK XKi + εi

Sample regression (Yi = dependent/response variable; X1i, …, XKi = independent/explanatory variables; ei = residual):

Yi = β̂0 + β̂1 X1i + β̂2 X2i + ⋯ + β̂K XKi + ei
3.1 Introduction: The Multiple Linear Regression

What changes as we move from simple to multiple regression?
1. Potentially more explanatory power with more variables;
2. The ability to control for other variables (and the interaction of various explanatory variables: correlations and multicollinearity);
3. It is harder to visualize: instead of drawing a line in two dimensions, we are now fitting through a space of three or more (n) dimensions;
4. The R² is no longer simply the square of the correlation coefficient between Y and X.
3.1 Introduction: The Multiple Linear Regression

Slope (βj): Ceteris paribus, Y changes by βj units for every 1-unit change in Xj, on average.

Y-Intercept (β0): The average value of Y when all the Xj's are zero. (This may not always be meaningful.)

An MLR model is linear in the parameters, but it need not be linear in the regressors. Thus, the definition of MLR includes polynomial regression, e.g.

Yi = β0 + β1 X1i + β2 X2i + β3 X1i² + β4 X1i X2i + εi
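Because the model stays linear in the parameters, the ordinary OLS machinery handles such polynomial and interaction terms: the transformed regressors simply become extra columns of the design matrix. A minimal sketch in Python/numpy (the data values below are made up purely for illustration):

```python
import numpy as np

# Made-up data for the two raw regressors and the response (illustration only).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0, 2.0])
y  = np.array([3.1, 4.0, 9.8, 9.5, 17.2, 15.0])

# Design matrix for Yi = b0 + b1*X1i + b2*X2i + b3*X1i^2 + b4*X1i*X2i + error.
# The squared and interaction columns are nonlinear in X, but the model is
# still linear in the b's, so it is estimated by ordinary least squares.
X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2])

b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b_hat)  # one estimated coefficient per column of X
```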
3.2 Assumptions of the Multiple Linear Regression

Assumptions from Chapter Two + 1 new.
1. E(εi|Xji) = 0 (for all i = 1, 2, …, n; j = 1, …, K).
2. var(εi|Xji) = σ². (Homoscedastic errors)
3. cov(εi, εs|Xji, Xjs) = 0 for i ≠ s. (No autocorrelation)
4. cov(εi, Xji) = 0. (Errors are orthogonal to the X's)
5. Xj is non-stochastic and must take different values.
6. n > K+1. (Number of observations > number of parameters estimated; the number of parameters is K+1 in this case: β0, β1, …, βK)
7. εi ~ N(0, σ²). (Normally distributed errors)
3.2 Assumptions of the Multiple Linear Regression

Additional Assumption:
8. No perfect multicollinearity: that is, no exact linear relation exists between any subset of the explanatory variables.
In the presence of a perfect (deterministic) linear relationship between/among any set of the Xj's, the impact of a single variable (βj) cannot be identified.
More on multicollinearity in Chapter 4!
3.3 Estimation: The Method of OLS

The Case of K Explanatory Variables

Yi = β0 + β1 X1i + β2 X2i + ⋯ + βK XKi + εi

The number of parameters is K+1. Writing the fitted equation for each of the n observations:

Y1 = β̂0 + β̂1 X11 + β̂2 X21 + ⋯ + β̂K XK1 + e1
Y2 = β̂0 + β̂1 X12 + β̂2 X22 + ⋯ + β̂K XK2 + e2
Y3 = β̂0 + β̂1 X13 + β̂2 X23 + ⋯ + β̂K XK3 + e3
⋮
Yn = β̂0 + β̂1 X1n + β̂2 X2n + ⋯ + β̂K XKn + en
3.3 Estimation: The Method of OLS (Matrix Approach)

Stacking the n equations in matrix form: Y = Xβ̂ + e, where

Y = [Y1 Y2 Y3 ⋯ Yn]'   (n × 1)

X = [ 1  X11  X21  X31  ⋯  XK1
      1  X12  X22  X32  ⋯  XK2
      1  X13  X23  X33  ⋯  XK3
      ⋮   ⋮    ⋮    ⋮    ⋱   ⋮
      1  X1n  X2n  X3n  ⋯  XKn ]   (n × (K+1))

β̂ = [β̂0 β̂1 β̂2 ⋯ β̂K]'   ((K+1) × 1),   e = [e1 e2 ⋯ en]'   (n × 1)
3.3 Estimation: The Method of OLS

The vector of residuals is obtained by subtracting the fitted values Xβ̂ from Y:

e = Y − Xβ̂

i.e., ei = Yi − (β̂0 + β̂1 X1i + β̂2 X2i + ⋯ + β̂K XKi) for each observation i.
3.3 Estimation: The Method of OLS

RSS = e1² + e2² + ⋯ + en² = ∑ei² = e'e

RSS = (Y − Xβ̂)'(Y − Xβ̂) = Y'Y − Y'Xβ̂ − β̂'X'Y + β̂'X'Xβ̂

Since Y'Xβ̂ is a scalar (1 × 1), Y'Xβ̂ = (Y'Xβ̂)' = β̂'X'Y.

⇒ RSS = Y'Y − 2β̂'X'Y + β̂'(X'X)β̂

F.O.C.: ∂(RSS)/∂β̂ = 0  ⇒  −2X'Y + 2X'Xβ̂ = 0  ⇒  −2X'(Y − Xβ̂) = 0
3.3 Estimation: The Method of OLS

⇒ X'e = 0, a (K+1) × 1 vector of zeros. Written out, this says:

1. ∑ei = 0        2. ∑ei Xji = 0   (j = 1, 2, …, K)

X'e = X'(Y − Xβ̂) = 0  ⇒  X'Xβ̂ = X'Y   (the normal equations)

⇒ β̂ = (X'X)⁻¹ X'Y
3.3 Estimation: The Method of OLS

β̂ = [β̂0 β̂1 ⋯ β̂K]'

X'X = [ n      ∑X1       ⋯  ∑XK
        ∑X1    ∑X1²      ⋯  ∑X1XK
        ⋮      ⋮         ⋱  ⋮
        ∑XK    ∑XKX1     ⋯  ∑XK²  ]

X'Y = [ ∑Y
        ∑YX1
        ∑YX2
        ⋮
        ∑YXK ]
3.3 Estimation: The Method of OLS

β̂ = (X'X)⁻¹(X'Y):

[ β̂0 ]     [ n      ∑X1      ∑X2      ⋯  ∑XK    ]⁻¹  [ ∑Y   ]
[ β̂1 ]     [ ∑X1    ∑X1²     ∑X1X2    ⋯  ∑X1XK  ]    [ ∑YX1 ]
[ β̂2 ]  =  [ ∑X2    ∑X2X1    ∑X2²     ⋯  ∑X2XK  ]    [ ∑YX2 ]
[ ⋮  ]     [ ⋮      ⋮        ⋮        ⋱  ⋮      ]    [ ⋮    ]
[ β̂K ]     [ ∑XK    ∑XKX1    ∑XKX2    ⋯  ∑XK²   ]    [ ∑YXK ]

(K+1) × 1              (K+1) × (K+1)                 (K+1) × 1

You may also apply Cramer's Rule to (X'X)β̂ = X'Y and solve for β̂.
3.3 Estimation: The Method of OLS

Numerical Example:

Y (Salary in '000 Dollars)   X1 (Years of post-high-school education)   X2 (Years of experience)
30                           4                                          10
20                           3                                           8
36                           6                                          11
24                           4                                           9
40                           8                                          12
ƩY = 150                     ƩX1 = 25                                   ƩX2 = 50
3.3 Estimation: The Method of OLS

X1Y          X2Y           X1²          X2²          X1X2          Y²
120          300           16           100          40            900
60           160            9            64          24            400
216          396           36           121          66           1296
96           216           16            81          36            576
320          480           64           144          96           1600
ƩX1Y = 812   ƩX2Y = 1552   ƩX1² = 141   ƩX2² = 510   ƩX1X2 = 262   ƩY² = 4772
3.3 Estimation: The Method of OLS

For two regressors, β̂ = (X'X)⁻¹X'Y becomes:

[ β̂0 ]     [ n      ∑X1      ∑X2    ]⁻¹  [ ∑Y   ]
[ β̂1 ]  =  [ ∑X1    ∑X1²     ∑X1X2  ]    [ ∑YX1 ]
[ β̂2 ]     [ ∑X2    ∑X1X2    ∑X2²   ]    [ ∑YX2 ]

Plugging in the sums from the tables:

[ β̂0 ]     [  5    25    50 ]⁻¹ [  150 ]     [ 40.825   4.375   -6.25 ] [  150 ]     [ -23.75 ]
[ β̂1 ]  =  [ 25   141   262 ]   [  812 ]  =  [  4.375   0.625   -0.75 ] [  812 ]  =  [  -0.25 ]
[ β̂2 ]     [ 50   262   510 ]   [ 1552 ]     [ -6.25   -0.75     1    ] [ 1552 ]     [   5.5  ]

Ŷ = −23.75 − 0.25 X1 + 5.5 X2
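As a check on the hand calculation, the same normal equations can be solved in Python/numpy with the data from the tables above; a minimal sketch:

```python
import numpy as np

y  = np.array([30, 20, 36, 24, 40], dtype=float)   # salary in '000 dollars
x1 = np.array([ 4,  3,  6,  4,  8], dtype=float)   # years of post-high-school education
x2 = np.array([10,  8, 11,  9, 12], dtype=float)   # years of experience

X = np.column_stack([np.ones_like(y), x1, x2])     # n x (K+1) design matrix

# Solve the normal equations (X'X) beta_hat = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # approximately [-23.75, -0.25, 5.5]
```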
3.3 Estimation: The Method of OLS

Interpretation:

One more year of experience, after controlling for years of education, results in a $5,500 rise in salary, on average.

Or, if we consider two persons (A and B) with the same level of education, the one with one more year of experience (A) is expected to have a salary $5,500 higher than that of B.

Similarly, for two people (C and D) with the same level of experience, the one with one more year of education (D) is expected to have a salary $250 lower than that of C.

Experience looks far more important than education (which has a negative sign).
3.3 Estimation: The Method of OLS

The constant term, −23.75, is the salary someone with no experience and no education would get. But a negative salary is impossible. Then, what is wrong?

1. The sample must have been drawn from a sub-group: we have persons with experience ranging from 8 to 12 years (and post-high-school education ranging from 3 to 8 years). So we should not extrapolate the results too far out of this sample range.

2. Model specification: is our model correctly specified (variables, functional form)? Does our data set satisfy the underlying assumptions?
3.4 Properties of OLS Estimators

Given the assumptions of the CLRM (in Section 3.2), the OLS estimators of the partial regression coefficients are BLUE: linear, unbiased, and with minimum variance in the class of all linear unbiased estimators (the Gauss-Markov Theorem).

In cases where the small-sample desirable properties (BLUE) may not be found, we look for asymptotic (large-sample) properties like consistency, asymptotic efficiency, and asymptotic normality (CLT).

The OLS estimators are consistent:

plim (β̂ − β) = 0   and   plim var(β̂) = 0   as n → ∞
3.5 Partial Correlations & Coefficients of Determination

In the multiple regression equation with 2 regressors (X1 & X2), Yi = β̂0 + β̂1 X1i + β̂2 X2i + ei, we can talk of:
  the joint effect of X1 and X2 on Y, and
  the partial effect of X1 or X2 on Y.
The partial effect of X1 is measured by β̂1 and the partial effect of X2 is measured by β̂2.
Partial effect: holding the other variable constant, or after eliminating the effect of the other variable.
Thus, β̂1 is interpreted as measuring the effect of X1 on Y after eliminating the effect of X2 from X1.
3.5 Partial Correlations and Coefficients of Determination

Similarly, β̂2 measures the effect of X2 on Y after eliminating the effect of X1 from X2.

Thus, we can derive β̂1 (the estimator of β1) by estimating two separate regressions:

Step 1: Regress X1 on X2 (an auxiliary regression to eliminate the effect of X2 from X1). Let the regression equation be:
X1 = a + b12 X2 + e12, or, in deviation form: x1 = b12 x2 + e12.

Then, b12 = ∑x1x2 / ∑x2²

e12 is the part of X1 free from the influence of X2.
3.5 Partial Correlations and Coefficients of Determination

Step 2: Regress Y on e12 (residualized X1). Let the regression equation (in deviation form) be: y = bye e12 + v.

Then, bye = ∑y e12 / ∑e12²

bye is the same as β̂1 in the multiple regression y = β̂1 x1 + β̂2 x2 + e, i.e., bye = β̂1.
3.5 Partial Correlations and Coefficients of Determination

Alternatively, we can derive β̂1 (the estimator of β1) as follows:

Step 1: Regress Y on X2 and save the residuals, ey2.
  (1) y = by2 x2 + ey2        [ey2 = residualized Y]
Step 2: Regress X1 on X2 and save the residuals, e12.
  (2) x1 = b12 x2 + e12       [e12 = residualized X1]
Step 3: Regress ey2 (the part of Y cleared of the influence of X2) on e12 (the part of X1 cleared of the influence of X2).
  (3) ey2 = α12 e12 + u

Then α12 in regression (3) = β̂1 in y = β̂1 x1 + β̂2 x2 + e!
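This partialling-out result (the Frisch-Waugh idea) is easy to verify numerically; a sketch in Python/numpy reusing the salary data from Section 3.3 (the helper name resid is illustrative):

```python
import numpy as np

y  = np.array([30, 20, 36, 24, 40], dtype=float)
x1 = np.array([ 4,  3,  6,  4,  8], dtype=float)
x2 = np.array([10,  8, 11,  9, 12], dtype=float)

def resid(dep, regressor):
    """Residuals from an OLS regression of dep on a constant and one regressor."""
    Z = np.column_stack([np.ones_like(regressor), regressor])
    coef = np.linalg.solve(Z.T @ Z, Z.T @ dep)
    return dep - Z @ coef

e_y2 = resid(y, x2)    # part of Y free of the influence of X2
e_12 = resid(x1, x2)   # part of X1 free of the influence of X2

alpha_12 = (e_12 @ e_y2) / (e_12 @ e_12)   # slope from regressing e_y2 on e_12
print(alpha_12)   # -0.25, the same as beta_1_hat from the full regression
```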
3.5 Partial Correlations and Coefficients of Determination

Suppose we have a dependent variable, Y, and two regressors, X1 and X2.

Suppose also that ry1² and ry2² are the squares of the simple correlation coefficients between Y & X1 and Y & X2, respectively. Then:

ry1² = proportion of TSS that X1 alone explains.
ry2² = proportion of TSS that X2 alone explains.

On the other hand, R²y·12 is the proportion of the variation in Y that X1 & X2 jointly explain.

We would also like to measure something else.
3.5 Partial Correlations and Coefficients of Determination

For instance:
a) How much does X2 explain after X1 is already included in the regression model? Or,
b) How much does X1 explain after X2 is included?

These are measured by coefficients of partial determination: r²y2·1 and r²y1·2, respectively.

r²y2·1 measures the strength of the mutual relationship between Y & X2 after the influence of X1 is eliminated from both Y & X2.

Partial correlations are important in deciding whether or not to include more regressors.
3.5 Partial Correlations and Coefficients of Determination

e.g. Suppose we have two regressors (X1 & X2) with:
ry2² = 0.95, and
r²y2·1 = 0.01.

To explain Y, X2 alone can do a good job (high simple correlation coefficient between Y & X2).

But after X1 is already included, X2 does not add much: X1 has done the job of X2 (very low partial correlation coefficient between Y & X2).
3.5 Partial Correlations and Coefficients of Determination

If we regress Y on X1 alone, then RSSSIMP = (1 − R²y·1) ∑yi², i.e., of the total variation in Y, an amount equal to (1 − R²y·1) ∑yi² remains unexplained (by X1 alone).

If we regress Y on X1 & X2, the variation in Y (TSS) that would be left unexplained is:

RSSMULT = (1 − R²y·12) ∑yi²

Adding X2 to the model reduces the RSS by:

RSSSIMP − RSSMULT = (1 − R²y·1) ∑yi² − (1 − R²y·12) ∑yi² = (R²y·12 − R²y·1) ∑yi²
3.5 Partial Correlations and Coefficients of Determination

If we now regress the part of Y freed from the effect of X1 (residualized Y) on the part of X2 freed from the effect of X1 (residualized X2), we will be able to explain the following proportion of RSSSIMP:

r²y2·1 = (R²y·12 − R²y·1) ∑yi² / [(1 − R²y·1) ∑yi²] = (R²y·12 − R²y·1) / (1 − R²y·1)

This is the coefficient of partial determination (the square of the partial correlation coefficient).

We include X2 if the reduction in RSS (or the increase in ESS) is significant. But when exactly? We will see later!
3.5 Partial Correlations and Coefficients of Determination

Coefficient of Determination (in SLR):

R² = β̂ ∑xy / ∑y²    or    R² = β̂² ∑x² / ∑y²

Coefficient of Multiple Determination:

R²y·12 = (β̂1 ∑x1y + β̂2 ∑x2y) / ∑y²    and, in general,    R² = R²y·12…K = ∑j=1..K { β̂j ∑i xji yi } / ∑i yi²

Coefficients of Partial Determination:

r²y2·1 = (R²y·12 − R²y·1) / (1 − R²y·1)    and    r²y1·2 = (R²y·12 − R²y·2) / (1 − R²y·2)
3.5 Partial Correlations and Coefficients of Determination

The coefficient of multiple determination (R²) measures the proportion of the variation in the dependent variable explained by (the set of all the regressors in) the model.

R² can be used to compare the goodness-of-fit of alternative regression equations, but only if the regression models satisfy two conditions.

1) The models must have the same dependent variable.
Reason: TSS, ESS & RSS depend on the units in which the regressand (Y) is measured. For instance, the TSS for Y is not the same as the TSS for ln(Y).

2) The models must have the same number of regressors & parameters (same value of K+1).
3.5 Partial Correlations and Coefficients of Determination

Reason: Adding a variable to a model never raises the RSS (equivalently, never lowers the ESS or R²), even if the new variable is not very relevant.

The adjusted R-squared, R̄², attaches a penalty to adding more variables. It is modified to account for changes/differences in degrees of freedom (df) due to differences in the number of regressors (K) and/or the sample size (n).

If adding a variable raises R̄² for a regression, then this is a better indication that it has improved the model than if the addition merely raises R².
3.5 Partial Correlations and Coefficients of Determination

R² = ∑ŷ² / ∑y² = 1 − ∑e² / ∑y²

R̄² = 1 − [∑e² / (n − (K+1))] / [∑y² / (n − 1)]

(Dividing the RSS and the TSS by their respective degrees of freedom; K+1 is the number of parameters to be estimated.)

R̄² = 1 − [∑e² / ∑y²] · [(n − 1) / (n − K − 1)]

R̄² = 1 − (1 − R²) · (n − 1) / (n − K − 1),   i.e.,   1 − R̄² = (1 − R²) · (n − 1) / (n − K − 1)

As long as K ≥ 1, 1 − R̄² > 1 − R², so R̄² < R²; in general, R̄² ≤ R². As n grows larger (relative to K), R̄² → R².
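A small helper that applies this adjustment (the function name adjusted_r2 is illustrative):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared: 1 - (1 - R^2)(n - 1)/(n - k - 1),
    where k is the number of regressors, excluding the constant."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Wage example from this chapter: R^2 = 0.9945, n = 5, k = 2
print(adjusted_r2(0.9945, 5, 2))   # about 0.989
```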
3.5 Partial Correlations and Coefficients of Determination

1. While R² is always non-negative, R̄² can be positive or negative.
2. R̄² can be used to compare the goodness-of-fit of two or more regression models only if the models have the same regressand.
3. Including more regressors reduces both the RSS and the df; R̄² rises only if the first effect dominates.
4. R̄² or R² should never be the sole criterion for choosing between/among models. In addition to R̄², one should also:
   consider the expected signs & values of coefficients, and
   look for consistency with economic theory or reasoning (possible explanations).
3.5 Partial Correlations and Coefficients of Determination

1. TSS = ∑y² = ∑Y² − nȲ² = 4772 − 5(30)² ⇒ TSS = 272

2. ESS = ∑ŷ² = ∑(β̂1 x1 + β̂2 x2)²
   ESS = β̂1² ∑x1² + β̂2² ∑x2² + 2 β̂1 β̂2 ∑x1x2
   ESS = β̂1² (∑X1² − nX̄1²) + β̂2² (∑X2² − nX̄2²) + 2 β̂1 β̂2 (∑X1X2 − nX̄1X̄2)
   ESS = (−0.25)² [141 − 5(5)²] + (5.5)² [510 − 5(10)²] + 2(−0.25)(5.5) [262 − 5(5)(10)]
   ⇒ ESS = 270.5

   OR: ESS = β̂1 ∑yx1 + β̂2 ∑yx2 = β̂1 (∑YX1 − nX̄1Ȳ) + β̂2 (∑YX2 − nX̄2Ȳ)
   ⇒ ESS = −0.25(62) + 5.5(52) = 270.5
3.5 Partial Correlations and Coefficients of Determination

3. RSS = TSS − ESS = 272 − 270.5 ⇒ RSS = 1.5

4. R² = ESS/TSS = 270.5/272 ⇒ R² = 0.9945
   Our model (education & experience together) explains about 99.45% of the wage differential.

5. R̄² = 1 − [RSS/(n − K − 1)] / [TSS/(n − 1)] = 1 − (1.5/2)/(272/4) ⇒ R̄² = 0.9890

Regressing Y on X1 alone:

   β̂y1 = ∑yx1 / ∑x1² = (∑YX1 − nX̄1Ȳ) / (∑X1² − nX̄1²) = 62/16 = 3.875
3.5 Partial Correlations and Coefficients of Determination

6. R²y·1 = ESSSIMP/TSS = β̂y1 ∑yx1 / ∑y² = (3.875 × 62)/272 = 0.8833
   RSSSIMP = (1 − 0.8833)(272) = 0.1167(272) = 31.75
   X1 (education) alone explains about 88.33% of the differences in wages, and leaves about 11.67% (= 31.75) unexplained.

7. R²y·12 − R²y·1 = 0.9945 − 0.8833 = 0.1112
   (R²y·12 − R²y·1) ∑y² = 0.1112(272) = 30.25
   X2 (experience) enters the wage equation with an extra (marginal) contribution, explaining about 11.12% (= 30.25) of the total variation in wages. Note that this is the contribution of the part of X2 which is not related to (free from the influence of) X1.
3.5 Partial Correlations and Coefficients of Determination

8. r²y2·1 = (R²y·12 − R²y·1) / (1 − R²y·1) = (0.9945 − 0.8833)/(1 − 0.8833) = 0.9528
   That is, X2 (experience) explains about 95.28% (= 30.25) of the wage differential that X1 has left unexplained (= 31.75).
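These goodness-of-fit figures can be reproduced from the raw data; a sketch in Python/numpy (the helper name r_squared is illustrative):

```python
import numpy as np

y  = np.array([30, 20, 36, 24, 40], dtype=float)
x1 = np.array([ 4,  3,  6,  4,  8], dtype=float)
x2 = np.array([10,  8, 11,  9, 12], dtype=float)

def r_squared(y, *regressors):
    """R^2 from an OLS regression of y on a constant and the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (e @ e) / tss

R2_y1  = r_squared(y, x1)        # ~0.8833  (X1 alone)
R2_y12 = r_squared(y, x1, x2)    # ~0.9945  (X1 and X2 jointly)

r2_y2_1 = (R2_y12 - R2_y1) / (1.0 - R2_y1)   # coefficient of partial determination
print(R2_y1, R2_y12, r2_y2_1)                # the last is ~0.9528
```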
3.6 Statistical Inferences in Multiple Linear Regression

The case of two regressors (X1 & X2), with εi ~ N(0, σ²):

β̂1 ~ N(β1, var(β̂1)),   var(β̂1) = σ² / [∑x1i² (1 − r12²)]

β̂2 ~ N(β2, var(β̂2)),   var(β̂2) = σ² / [∑x2i² (1 − r12²)]

β̂0 ~ N(β0, var(β̂0)),   var(β̂0) = σ²/n + X̄1² var(β̂1) + X̄2² var(β̂2) + 2 X̄1 X̄2 cov(β̂1, β̂2)
3.6 Statistical Inferences in Multiple Linear Regression

cov(β̂1, β̂2) = −σ² r12² / [∑x1i x2i (1 − r12²)],   where   r12² = (∑x1i x2i)² / (∑x1i² ∑x2i²)

∑x1i² (1 − r12²) is the RSS from regressing X1 on X2.
∑x2i² (1 − r12²) is the RSS from regressing X2 on X1.

σ̂² = RSS / (n − 3) is an unbiased estimator of σ² (with two regressors).

In general, var-cov(β̂) = σ² (X'X)⁻¹, where X'X is the same matrix of sums shown earlier:

[ n      ∑X1      ⋯  ∑XK
  ∑X1    ∑X1²     ⋯  ∑X1XK
  ⋮      ⋮        ⋱  ⋮
  ∑XK    ∑XKX1    ⋯  ∑XK²  ]
3.6 Statistical Inferences in Multiple Linear Regression

The estimated variance-covariance matrix replaces σ² with σ̂²:

est. var-cov(β̂) = σ̂² (X'X)⁻¹

Note that:
(a) (X'X)⁻¹ is the same matrix we use to derive the OLS estimates, and
(b) σ̂² = RSS/(n − 3) in the case of two regressors.

In the general case of K explanatory variables, σ̂² = RSS / (n − K − 1) is an unbiased estimator of σ².
3.6 Statistical Inferences in Multiple Linear Regression

Ceteris paribus, the higher the correlation coefficient between X1 & X2 (r12), the less precise the estimates β̂1 & β̂2 will be, i.e., the CIs for the parameters β1 & β2 will be wider.

Ceteris paribus, the higher the degree of variation in the Xj's, the more precise the estimates will be (narrower CIs for the parameters).

The above two points are contained in:

var(β̂j) = σ² / [(1 − r²j·123…) ∑x²ji]

where r²j·123… is the R² from an auxiliary regression of Xj on all the other (K − 1) X's and a constant.
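This formula can be checked against the direct matrix computation; a sketch in Python/numpy with the wage data, where the auxiliary R² reduces to r12² because there is only one other regressor:

```python
import numpy as np

y  = np.array([30, 20, 36, 24, 40], dtype=float)
x1 = np.array([ 4,  3,  6,  4,  8], dtype=float)
x2 = np.array([10,  8, 11,  9, 12], dtype=float)

# Full regression: sigma^2 is estimated by RSS/(n - K - 1)
X = np.column_stack([np.ones_like(y), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)
rss = np.sum((y - X @ b) ** 2)
n, k = len(y), 2
sigma2_hat = rss / (n - k - 1)               # 1.5/2 = 0.75

# Auxiliary regression of X1 on X2 (in deviation form) gives r12^2
x1d, x2d = x1 - x1.mean(), x2 - x2.mean()
r12_sq = (x1d @ x2d) ** 2 / ((x1d @ x1d) * (x2d @ x2d))   # 0.9

var_b1 = sigma2_hat / ((1.0 - r12_sq) * (x1d @ x1d))
print(var_b1)   # 0.46875, the var(beta1_hat) entry of sigma2_hat * inv(X'X)
```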
3.6 Statistical Inferences in Multiple Linear Regression

We use the t-statistic to test hypotheses about single parameters and single linear functions of parameters.

To test hypotheses about, and construct intervals for, an individual βj, use:

(β̂j − βj*) / sê(β̂j) ~ t(n − K − 1),   for all j = 0, 1, …, K,

where βj* is the hypothesized value of the parameter βj.

Tests of several parameters and several linear functions of parameters use the F-statistic.
3.6 Statistical Inferences in Multiple Linear Regression

Procedures for Conducting F-tests:
1. Compute the RSS from regressing Y on all the Xj's (URSS = Unrestricted RSS).
2. Compute the RSS from the regression with the hypothesized values of the parameters (β's) imposed (RRSS = Restricted RSS).
3. Under H0 (if the restriction is correct):

[(RRSS − URSS)/K] / [URSS/(n − K − 1)] = [(R²U − R²R)/K] / [(1 − R²U)/(n − K − 1)] ~ F(K, n − K − 1)

where K = the number of restrictions imposed.
4. If F-calculated > F-tabulated, then RRSS is (significantly) greater than URSS; thus we reject the null.
3.6 Statistical Inferences in Multiple Linear Regression

A special F-test of common interest is the test of the null that none of the X's influence Y (i.e., that our regression is useless!):

Test H0: β1 = β2 = … = βK = 0   vs.   H1: H0 is not true.

URSS = (1 − R²) ∑yi² = ∑yi² − ∑j=1..K { β̂j ∑i xji yi }

RRSS = ∑yi²

⇒ [(RRSS − URSS)/K] / [URSS/(n − K − 1)] = [R²/K] / [(1 − R²)/(n − K − 1)] ~ F(K, n − K − 1)
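A sketch of this overall-significance test in Python, with scipy used only for the critical value and p-value:

```python
from scipy import stats

R2, n, k = 0.9945, 5, 2                           # wage example
F = (R2 / k) / ((1.0 - R2) / (n - k - 1))         # ~180.8
F_crit = stats.f.ppf(0.95, dfn=k, dfd=n - k - 1)  # F(2, 2) at 5% ~ 19
p_value = stats.f.sf(F, dfn=k, dfd=n - k - 1)

print(F, F_crit, p_value)  # reject H0 if F > F_crit (equivalently p < 0.05)
```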
3.6 Statistical Inferences in Multiple Linear Regression

With reference to our example on wages, test the following at the 5% level of significance:
a) β1 = 0
b) β2 = 0
c) β0 = 0
d) the overall significance of the model
e) β1 = β2
3.6 Statistical Inferences in Multiple Linear Regression

est. var-cov(β̂) = σ̂² (X'X)⁻¹

          [  5    25    50 ]⁻¹    [ 40.825   4.375   -6.25 ]
(X'X)⁻¹ = [ 25   141   262 ]   =  [  4.375   0.625   -0.75 ]
          [ 50   262   510 ]      [ -6.25   -0.75     1    ]

σ² is estimated by σ̂² = RSS/(n − K − 1) = 1.5/2 = 0.75

                    [ 40.825   4.375   -6.25 ]     [ 30.61875   3.28125   -4.6875 ]
var-cov(β̂) = 0.75 × [  4.375   0.625   -0.75 ]  =  [  3.28125   0.46875   -0.5625 ]
                    [ -6.25   -0.75     1    ]     [ -4.6875   -0.5625     0.75   ]
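The same computation in Python/numpy, reproducing the estimated variance-covariance matrix and the standard errors used in the t-tests that follow; a sketch:

```python
import numpy as np

XtX = np.array([[ 5.0,  25.0,  50.0],
                [25.0, 141.0, 262.0],
                [50.0, 262.0, 510.0]])
sigma2_hat = 0.75                        # RSS/(n - K - 1) = 1.5/2

var_cov = sigma2_hat * np.linalg.inv(XtX)
std_errors = np.sqrt(np.diag(var_cov))   # se(b0), se(b1), se(b2)
print(var_cov)
print(std_errors)   # approximately [5.533, 0.685, 0.866]
```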
3.6 Statistical Inferences in Multiple Linear Regression

The estimated variance-covariance matrix (upper triangle shown; the matrix is symmetric):

[ var(β̂0)   cov(β̂0, β̂1)   cov(β̂0, β̂2) ]     [ 30.61875    3.28125   -4.6875 ]
[            var(β̂1)       cov(β̂1, β̂2) ]  =  [             0.46875   -0.5625 ]
[                           var(β̂2)     ]     [                        0.75   ]

a) tc = (β̂1 − 0)/sê(β̂1) = −0.25/√0.46875 ≈ −0.37;   ttab = t(0.025, 2) = 4.30
   |tcal| ≤ ttab ⇒ we do not reject the null.

b) tc = (β̂2 − 0)/sê(β̂2) = 5.5/√0.75 ≈ 6.35
   |tcal| > ttab ⇒ reject the null.

c) tc = (β̂0 − 0)/sê(β̂0) = −23.75/√30.61875 ≈ −4.29
   |tcal| ≤ ttab ⇒ we do not reject the null!!!
3.6 Statistical Inferences in Multiple Linear Regression

d) Fc = [R²/K] / [(1 − R²)/(n − K − 1)] = (0.9945/2)/(0.0055/2) ≈ 180.82
   Ftab = F(0.05; 2, 2) ≈ 19.   Fcal > Ftab ⇒ reject the null.

e) From Ŷi = β̂0 + β̂1 X1i + β̂2 X2i, URSS = 1.5.
   Now impose β1 = β2 (= β) and run Ŷi = β̂0 + β̂(X1i + X2i) ⇒ RRSS = 12.08
   Fc = [(RRSS − URSS)/1] / [URSS/(n − K − 1)] = [(12.08 − 1.5)/1] / (1.5/2) ≈ 14.11
   Ftab = F(0.05; 1, 2) ≈ 18.51.   Fcal ≤ Ftab ⇒ we do not reject the null.
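A sketch of test (e) in Python, comparing the restricted and unrestricted residual sums of squares (the helper name rss is illustrative):

```python
import numpy as np
from scipy import stats

y  = np.array([30, 20, 36, 24, 40], dtype=float)
x1 = np.array([ 4,  3,  6,  4,  8], dtype=float)
x2 = np.array([10,  8, 11,  9, 12], dtype=float)

def rss(y, X):
    """Residual sum of squares from an OLS regression of y on X."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return e @ e

X_u = np.column_stack([np.ones_like(y), x1, x2])    # unrestricted model
X_r = np.column_stack([np.ones_like(y), x1 + x2])   # restricted: beta1 = beta2
urss, rrss = rss(y, X_u), rss(y, X_r)               # 1.5 and ~12.08

q, df = 1, len(y) - 3                                # 1 restriction, n - K - 1 df
F = ((rrss - urss) / q) / (urss / df)                # ~14.1
F_crit = stats.f.ppf(0.95, dfn=q, dfd=df)            # ~18.5, so do not reject
print(F, F_crit)
```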
3.6 Statistical Inferences in Multiple Linear Regression

Note that we can use a t-test to test the single restriction β1 = β2 (or equivalently, β1 − β2 = 0):

(β̂1 − β̂2 − 0) / sê(β̂1 − β̂2) = (β̂1 − β̂2) / √[vâr(β̂1) + vâr(β̂2) − 2côv(β̂1, β̂2)] ~ t(n − K − 1)

tc = −5.75 / √[0.46875 + 0.75 − 2(−0.5625)] ≈ −3.76

ttab = t(0.025, 2) ≈ 4.30

|tcal| < ttab ⇒ do not reject the null.

The same result as the F-test (note that tc² ≈ 14.11 = Fc), but the F-test is easier to handle.
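The equivalent t-test in Python, pulling the variances and covariance from the estimated var-cov matrix computed earlier; a sketch:

```python
import numpy as np
from scipy import stats

# Entries of the estimated var-cov(beta_hat) matrix from the wage example
var_b1, var_b2, cov_b12 = 0.46875, 0.75, -0.5625
b1_hat, b2_hat = -0.25, 5.5

se_diff = np.sqrt(var_b1 + var_b2 - 2 * cov_b12)   # ~1.531
t = (b1_hat - b2_hat) / se_diff                    # ~ -3.76
t_crit = stats.t.ppf(0.975, df=2)                  # ~4.30 with n - K - 1 = 2 df
print(t, t_crit)   # |t| < t_crit: do not reject; note t**2 ~ 14.1 = F from before
```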
3.6 Statistical Inferences in Multiple Linear Regression

To sum up, assuming that our model is correctly specified and that all the assumptions are satisfied:

Education, after controlling for experience, does not have a significant influence on wages.
In contrast, experience (after controlling for education) is a significant predictor of wages.
The intercept parameter is also insignificant (though only at the margin); this is less important.
Overall, the model explains a significant portion of the observed wage pattern.
We cannot reject the claim that the coefficients of the two regressors are equal.
