
CHAPTER THREE

THE MULTIPLE LINEAR REGRESSION (MLR)

3.1 Introduction: The MLR
3.2 Assumptions of the MLR
3.3 Estimation: The Method of OLS
3.4 Properties of OLS Estimators
3.5 Coefficients of Determination
3.6 Statistical Inferences in MLR
3.1 Introduction: The Multiple Linear Regression

A relationship between a dependent variable and two or more independent variables that is linear in the parameters.

Population regression (β0 = population Y-intercept; β1, …, βK = population slopes; εi = random error):

Yi = β0 + β1 X1i + β2 X2i + ⋯ + βK XKi + εi

Sample regression (Yi = dependent/response variable; X1i, …, XKi = independent/explanatory variables; ei = residual):

Yi = β̂0 + β̂1 X1i + β̂2 X2i + ⋯ + β̂K XKi + ei
3.1 Introduction: The Multiple Linear Regression

What changes as we move from simple to multiple regression?
1. Potentially more explanatory power with more variables;
2. The ability to control for other variables (and the interaction of various explanatory variables: correlations and multicollinearity);
3. It is harder to visualize: instead of drawing a line in two dimensions, we are now fitting through a space of three or more (n) dimensions;
4. The R² is no longer simply the square of the correlation coefficient between Y and X.
3.1 Introduction: The Multiple Linear Regression

Slope (βj): Ceteris paribus, Y changes by βj units for every 1-unit change in Xj, on average.

Y-Intercept (β0): The average value of Y when all the Xj's are zero. (This may not always be meaningful.)

An MLR model is linear in the parameters, but it need not be linear in the regressors. Thus, the definition of MLR includes polynomial regression, e.g.

Yi = β0 + β1 X1i + β2 X2i + β3 X1i² + β4 X1i X2i + εi
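Because the model stays linear in the parameters, the ordinary OLS machinery handles such polynomial and interaction terms: the transformed regressors simply become extra columns of the design matrix. A minimal sketch in Python/numpy (the data values below are made up purely for illustration):

```python
import numpy as np

# Made-up data for the two raw regressors and the response (illustration only).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0, 2.0])
y  = np.array([3.1, 4.0, 9.8, 9.5, 17.2, 15.0])

# Design matrix for Yi = b0 + b1*X1i + b2*X2i + b3*X1i^2 + b4*X1i*X2i + error.
# The squared and interaction columns are nonlinear in X, but the model is
# still linear in the b's, so it is estimated by ordinary least squares.
X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2])

b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b_hat)  # one estimated coefficient per column of X
```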
3.2 Assumptions of the Multiple Linear Regression

Assumptions from Chapter Two + 1 new.
1. E(εi|Xji) = 0 (for all i = 1, 2, …, n; j = 1, …, K).
2. var(εi|Xji) = σ². (Homoscedastic errors)
3. cov(εi, εs|Xji, Xjs) = 0 for i ≠ s. (No autocorrelation)
4. cov(εi, Xji) = 0. (Errors are orthogonal to the X's)
5. Xj is non-stochastic and must take different values.
6. n > K+1. (Number of observations > number of parameters estimated; the number of parameters is K+1 in this case: β0, β1, …, βK)
7. εi ~ N(0, σ²). (Normally distributed errors)
3.2 Assumptions of the Multiple Linear Regression

Additional Assumption:
8. No perfect multicollinearity: that is, no exact linear relation exists between any subset of the explanatory variables.
In the presence of a perfect (deterministic) linear relationship between/among any set of the Xj's, the impact of a single variable (βj) cannot be identified.
More on multicollinearity in Chapter 4!
3.3 Estimation: The Method of OLS

The Case of K Explanatory Variables

Yi = β0 + β1 X1i + β2 X2i + ⋯ + βK XKi + εi

The number of parameters is K+1. Writing the fitted equation for each of the n observations:

Y1 = β̂0 + β̂1 X11 + β̂2 X21 + ⋯ + β̂K XK1 + e1
Y2 = β̂0 + β̂1 X12 + β̂2 X22 + ⋯ + β̂K XK2 + e2
Y3 = β̂0 + β̂1 X13 + β̂2 X23 + ⋯ + β̂K XK3 + e3
⋮
Yn = β̂0 + β̂1 X1n + β̂2 X2n + ⋯ + β̂K XKn + en
3.3 Estimation: The Method of OLS (Matrix Approach)

Stacking the n equations in matrix form: Y = Xβ̂ + e, where

Y = [Y1 Y2 Y3 ⋯ Yn]'   (n × 1)

X = [ 1  X11  X21  X31  ⋯  XK1
      1  X12  X22  X32  ⋯  XK2
      1  X13  X23  X33  ⋯  XK3
      ⋮   ⋮    ⋮    ⋮    ⋱   ⋮
      1  X1n  X2n  X3n  ⋯  XKn ]   (n × (K+1))

β̂ = [β̂0 β̂1 β̂2 ⋯ β̂K]'   ((K+1) × 1),   e = [e1 e2 ⋯ en]'   (n × 1)
3.3 Estimation: The Method of OLS

The vector of residuals is obtained by subtracting the fitted values Xβ̂ from Y:

e = Y − Xβ̂

i.e., ei = Yi − (β̂0 + β̂1 X1i + β̂2 X2i + ⋯ + β̂K XKi) for each observation i.
3.3 Estimation: The Method of OLS

RSS = e1² + e2² + ⋯ + en² = ∑ei² = e'e

RSS = (Y − Xβ̂)'(Y − Xβ̂) = Y'Y − Y'Xβ̂ − β̂'X'Y + β̂'X'Xβ̂

Since Y'Xβ̂ is a scalar (1 × 1), Y'Xβ̂ = (Y'Xβ̂)' = β̂'X'Y.

⇒ RSS = Y'Y − 2β̂'X'Y + β̂'(X'X)β̂

F.O.C.: ∂(RSS)/∂β̂ = 0  ⇒  −2X'Y + 2X'Xβ̂ = 0  ⇒  −2X'(Y − Xβ̂) = 0
3.3 Estimation: The Method of OLS

⇒ X'e = 0, a (K+1) × 1 vector of zeros. Written out, this says:

1. ∑ei = 0        2. ∑ei Xji = 0   (j = 1, 2, …, K)

X'e = X'(Y − Xβ̂) = 0  ⇒  X'Xβ̂ = X'Y   (the normal equations)

⇒ β̂ = (X'X)⁻¹ X'Y
3.3 Estimation: The Method of OLS

β̂ = [β̂0 β̂1 ⋯ β̂K]'

X'X = [ n      ∑X1       ⋯  ∑XK
        ∑X1    ∑X1²      ⋯  ∑X1XK
        ⋮      ⋮         ⋱  ⋮
        ∑XK    ∑XKX1     ⋯  ∑XK²  ]

X'Y = [ ∑Y
        ∑YX1
        ∑YX2
        ⋮
        ∑YXK ]
3.3 Estimation: The Method of OLS

β̂ = (X'X)⁻¹(X'Y):

[ β̂0 ]     [ n      ∑X1      ∑X2      ⋯  ∑XK    ]⁻¹  [ ∑Y   ]
[ β̂1 ]     [ ∑X1    ∑X1²     ∑X1X2    ⋯  ∑X1XK  ]    [ ∑YX1 ]
[ β̂2 ]  =  [ ∑X2    ∑X2X1    ∑X2²     ⋯  ∑X2XK  ]    [ ∑YX2 ]
[ ⋮  ]     [ ⋮      ⋮        ⋮        ⋱  ⋮      ]    [ ⋮    ]
[ β̂K ]     [ ∑XK    ∑XKX1    ∑XKX2    ⋯  ∑XK²   ]    [ ∑YXK ]

(K+1) × 1              (K+1) × (K+1)                 (K+1) × 1

You may also apply Cramer's Rule to (X'X)β̂ = X'Y and solve for β̂.
3.3 Estimation: The Method of OLS

Numerical Example:

Y (Salary in '000 Dollars)   X1 (Years of post-high-school education)   X2 (Years of experience)
30                           4                                          10
20                           3                                           8
36                           6                                          11
24                           4                                           9
40                           8                                          12
ƩY = 150                     ƩX1 = 25                                   ƩX2 = 50
3.3 Estimation: The Method of OLS

X1Y          X2Y           X1²          X2²          X1X2          Y²
120          300           16           100          40            900
60           160            9            64          24            400
216          396           36           121          66           1296
96           216           16            81          36            576
320          480           64           144          96           1600
ƩX1Y = 812   ƩX2Y = 1552   ƩX1² = 141   ƩX2² = 510   ƩX1X2 = 262   ƩY² = 4772
3.3 Estimation: The Method of OLS

For two regressors, β̂ = (X'X)⁻¹X'Y becomes:

[ β̂0 ]     [ n      ∑X1      ∑X2    ]⁻¹  [ ∑Y   ]
[ β̂1 ]  =  [ ∑X1    ∑X1²     ∑X1X2  ]    [ ∑YX1 ]
[ β̂2 ]     [ ∑X2    ∑X1X2    ∑X2²   ]    [ ∑YX2 ]

Plugging in the sums from the tables:

[ β̂0 ]     [  5    25    50 ]⁻¹ [  150 ]     [ 40.825   4.375   -6.25 ] [  150 ]     [ -23.75 ]
[ β̂1 ]  =  [ 25   141   262 ]   [  812 ]  =  [  4.375   0.625   -0.75 ] [  812 ]  =  [  -0.25 ]
[ β̂2 ]     [ 50   262   510 ]   [ 1552 ]     [ -6.25   -0.75     1    ] [ 1552 ]     [   5.5  ]

Ŷ = −23.75 − 0.25 X1 + 5.5 X2
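As a check on the hand calculation, the same normal equations can be solved in Python/numpy with the data from the tables above; a minimal sketch:

```python
import numpy as np

y  = np.array([30, 20, 36, 24, 40], dtype=float)   # salary in '000 dollars
x1 = np.array([ 4,  3,  6,  4,  8], dtype=float)   # years of post-high-school education
x2 = np.array([10,  8, 11,  9, 12], dtype=float)   # years of experience

X = np.column_stack([np.ones_like(y), x1, x2])     # n x (K+1) design matrix

# Solve the normal equations (X'X) beta_hat = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # approximately [-23.75, -0.25, 5.5]
```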
3.3 Estimation: The Method of OLS

Interpretation:

One more year of experience, after controlling for years of education, results in a $5,500 rise in salary, on average.

Or, if we consider two persons (A and B) with the same level of education, the one with one more year of experience (A) is expected to have a salary $5,500 higher than that of B.

Similarly, for two people (C and D) with the same level of experience, the one with one more year of education (D) is expected to have a salary $250 lower than that of C.

Experience looks far more important than education (which has a negative sign).
3.3 Estimation: The Method of OLS

The constant term, −23.75, is the salary someone with no experience and no education would get. But a negative salary is impossible. Then, what is wrong?

1. The sample must have been drawn from a sub-group: we have persons with experience ranging from 8 to 12 years (and post-high-school education ranging from 3 to 8 years). So we should not extrapolate the results too far out of this sample range.

2. Model specification: is our model correctly specified (variables, functional form)? Does our data set satisfy the underlying assumptions?
3.4 Properties of OLS Estimators

Given the assumptions of the CLRM (in Section 3.2), the OLS estimators of the partial regression coefficients are BLUE: linear, unbiased, and with minimum variance in the class of all linear unbiased estimators (the Gauss-Markov Theorem).

In cases where the small-sample desirable properties (BLUE) may not be found, we look for asymptotic (large-sample) properties like consistency, asymptotic efficiency, and asymptotic normality (CLT).

The OLS estimators are consistent:

plim (β̂ − β) = 0   and   plim var(β̂) = 0   as n → ∞
3.5 Partial Correlations & Coefficients of Determination

In the multiple regression equation with 2 regressors (X1 & X2), Yi = β̂0 + β̂1 X1i + β̂2 X2i + ei, we can talk of:
  the joint effect of X1 and X2 on Y, and
  the partial effect of X1 or X2 on Y.
The partial effect of X1 is measured by β̂1 and the partial effect of X2 is measured by β̂2.
Partial effect: holding the other variable constant, or after eliminating the effect of the other variable.
Thus, β̂1 is interpreted as measuring the effect of X1 on Y after eliminating the effect of X2 from X1.
3.5 Partial Correlations and Coefficients of Determination

Similarly, β̂2 measures the effect of X2 on Y after eliminating the effect of X1 from X2.

Thus, we can derive β̂1 (the estimator of β1) by estimating two separate regressions:

Step 1: Regress X1 on X2 (an auxiliary regression to eliminate the effect of X2 from X1). Let the regression equation be:
X1 = a + b12 X2 + e12, or, in deviation form: x1 = b12 x2 + e12.

Then, b12 = ∑x1x2 / ∑x2²

e12 is the part of X1 free from the influence of X2.
3.5 Partial Correlations and Coefficients of Determination

Step 2: Regress Y on e12 (residualized X1). Let the regression equation (in deviation form) be: y = bye e12 + v.

Then, bye = ∑y e12 / ∑e12²

bye is the same as β̂1 in the multiple regression y = β̂1 x1 + β̂2 x2 + e, i.e., bye = β̂1.
3.5 Partial Correlations and Coefficients of Determination

Alternatively, we can derive β̂1 (the estimator of β1) as follows:

Step 1: Regress Y on X2 and save the residuals, ey2.
  (1) y = by2 x2 + ey2        [ey2 = residualized Y]
Step 2: Regress X1 on X2 and save the residuals, e12.
  (2) x1 = b12 x2 + e12       [e12 = residualized X1]
Step 3: Regress ey2 (the part of Y cleared of the influence of X2) on e12 (the part of X1 cleared of the influence of X2).
  (3) ey2 = α12 e12 + u

Then α12 in regression (3) = β̂1 in y = β̂1 x1 + β̂2 x2 + e!
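This partialling-out result (the Frisch-Waugh idea) is easy to verify numerically; a sketch in Python/numpy reusing the salary data from Section 3.3 (the helper name resid is illustrative):

```python
import numpy as np

y  = np.array([30, 20, 36, 24, 40], dtype=float)
x1 = np.array([ 4,  3,  6,  4,  8], dtype=float)
x2 = np.array([10,  8, 11,  9, 12], dtype=float)

def resid(dep, regressor):
    """Residuals from an OLS regression of dep on a constant and one regressor."""
    Z = np.column_stack([np.ones_like(regressor), regressor])
    coef = np.linalg.solve(Z.T @ Z, Z.T @ dep)
    return dep - Z @ coef

e_y2 = resid(y, x2)    # part of Y free of the influence of X2
e_12 = resid(x1, x2)   # part of X1 free of the influence of X2

alpha_12 = (e_12 @ e_y2) / (e_12 @ e_12)   # slope from regressing e_y2 on e_12
print(alpha_12)   # -0.25, the same as beta_1_hat from the full regression
```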
3.5 Partial Correlations and Coefficients of Determination

Suppose we have a dependent variable, Y, and two regressors, X1 and X2.

Suppose also that ry1² and ry2² are the squares of the simple correlation coefficients between Y & X1 and Y & X2, respectively. Then:

ry1² = proportion of TSS that X1 alone explains.
ry2² = proportion of TSS that X2 alone explains.

On the other hand, R²y·12 is the proportion of the variation in Y that X1 & X2 jointly explain.

We would also like to measure something else.
3.5 Partial Correlations and Coefficients of Determination

For instance:
a) How much does X2 explain after X1 is already included in the regression model? Or,
b) How much does X1 explain after X2 is included?

These are measured by coefficients of partial determination: r²y2·1 and r²y1·2, respectively.

r²y2·1 measures the strength of the mutual relationship between Y & X2 after the influence of X1 is eliminated from both Y & X2.

Partial correlations are important in deciding whether or not to include more regressors.
3.5 Partial Correlations and Coefficients of Determination

e.g. Suppose we have two regressors (X1 & X2) with:
ry2² = 0.95, and
r²y2·1 = 0.01.

To explain Y, X2 alone can do a good job (high simple correlation coefficient between Y & X2).

But after X1 is already included, X2 does not add much: X1 has done the job of X2 (very low partial correlation coefficient between Y & X2).
3.5 Partial Correlations and Coefficients of Determination

If we regress Y on X1 alone, then RSSSIMP = (1 − R²y·1) ∑yi², i.e., of the total variation in Y, an amount equal to (1 − R²y·1) ∑yi² remains unexplained (by X1 alone).

If we regress Y on X1 & X2, the variation in Y (TSS) that would be left unexplained is:

RSSMULT = (1 − R²y·12) ∑yi²

Adding X2 to the model reduces the RSS by:

RSSSIMP − RSSMULT = (1 − R²y·1) ∑yi² − (1 − R²y·12) ∑yi² = (R²y·12 − R²y·1) ∑yi²
3.5 Partial Correlations and Coefficients of Determination

If we now regress the part of Y freed from the effect of X1 (residualized Y) on the part of X2 freed from the effect of X1 (residualized X2), we will be able to explain the following proportion of RSSSIMP:

r²y2·1 = (R²y·12 − R²y·1) ∑yi² / [(1 − R²y·1) ∑yi²] = (R²y·12 − R²y·1) / (1 − R²y·1)

This is the coefficient of partial determination (the square of the partial correlation coefficient).

We include X2 if the reduction in RSS (or the increase in ESS) is significant. But when exactly? We will see later!
3.5 Partial Correlations and Coefficients of Determination

Coefficient of Determination (in SLR):

R² = β̂ ∑xy / ∑y²    or    R² = β̂² ∑x² / ∑y²

Coefficient of Multiple Determination:

R²y·12 = (β̂1 ∑x1y + β̂2 ∑x2y) / ∑y²    and, in general,    R² = R²y·12…K = ∑j=1..K { β̂j ∑i xji yi } / ∑i yi²

Coefficients of Partial Determination:

r²y2·1 = (R²y·12 − R²y·1) / (1 − R²y·1)    and    r²y1·2 = (R²y·12 − R²y·2) / (1 − R²y·2)
3.5 Partial Correlations and Coefficients of Determination

The coefficient of multiple determination (R²) measures the proportion of the variation in the dependent variable explained by (the set of all the regressors in) the model.

R² can be used to compare the goodness-of-fit of alternative regression equations, but only if the regression models satisfy two conditions.

1) The models must have the same dependent variable.
Reason: TSS, ESS & RSS depend on the units in which the regressand (Y) is measured. For instance, the TSS for Y is not the same as the TSS for ln(Y).

2) The models must have the same number of regressors & parameters (same value of K+1).
3.5 Partial Correlations and Coefficients of Determination

Reason: Adding a variable to a model never raises the RSS (equivalently, never lowers the ESS or R²), even if the new variable is not very relevant.

The adjusted R-squared, R̄², attaches a penalty to adding more variables. It is modified to account for changes/differences in degrees of freedom (df) due to differences in the number of regressors (K) and/or the sample size (n).

If adding a variable raises R̄² for a regression, then this is a better indication that it has improved the model than if the addition merely raises R².
3.5 Partial Correlations and Coefficients of Determination

R² = ∑ŷ² / ∑y² = 1 − ∑e² / ∑y²

R̄² = 1 − [∑e² / (n − (K+1))] / [∑y² / (n − 1)]

(Dividing the RSS and the TSS by their respective degrees of freedom; K+1 is the number of parameters to be estimated.)

R̄² = 1 − [∑e² / ∑y²] · [(n − 1) / (n − K − 1)]

R̄² = 1 − (1 − R²) · (n − 1) / (n − K − 1),   i.e.,   1 − R̄² = (1 − R²) · (n − 1) / (n − K − 1)

As long as K ≥ 1, 1 − R̄² > 1 − R², so R̄² < R²; in general, R̄² ≤ R². As n grows larger (relative to K), R̄² → R².
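A small helper that applies this adjustment (the function name adjusted_r2 is illustrative):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared: 1 - (1 - R^2)(n - 1)/(n - k - 1),
    where k is the number of regressors, excluding the constant."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Wage example from this chapter: R^2 = 0.9945, n = 5, k = 2
print(adjusted_r2(0.9945, 5, 2))   # about 0.989
```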
3.5 Partial Correlations and Coefficients of Determination

1. While R² is always non-negative, R̄² can be positive or negative.
2. R̄² can be used to compare the goodness-of-fit of two or more regression models only if the models have the same regressand.
3. Including more regressors reduces both the RSS and the df; R̄² rises only if the first effect dominates.
4. R̄² or R² should never be the sole criterion for choosing between/among models. In addition to R̄², one should also:
   consider the expected signs & values of coefficients, and
   look for consistency with economic theory or reasoning (possible explanations).
3.5 Partial Correlations and Coefficients of Determination

1. TSS = ∑y² = ∑Y² − nȲ² = 4772 − 5(30)² ⇒ TSS = 272

2. ESS = ∑ŷ² = ∑(β̂1 x1 + β̂2 x2)²
   ESS = β̂1² ∑x1² + β̂2² ∑x2² + 2 β̂1 β̂2 ∑x1x2
   ESS = β̂1² (∑X1² − nX̄1²) + β̂2² (∑X2² − nX̄2²) + 2 β̂1 β̂2 (∑X1X2 − nX̄1X̄2)
   ESS = (−0.25)² [141 − 5(5)²] + (5.5)² [510 − 5(10)²] + 2(−0.25)(5.5) [262 − 5(5)(10)]
   ⇒ ESS = 270.5

   OR: ESS = β̂1 ∑yx1 + β̂2 ∑yx2 = β̂1 (∑YX1 − nX̄1Ȳ) + β̂2 (∑YX2 − nX̄2Ȳ)
   ⇒ ESS = −0.25(62) + 5.5(52) = 270.5
3.5 Partial Correlations and Coefficients of Determination

3. RSS = TSS − ESS = 272 − 270.5 ⇒ RSS = 1.5

4. R² = ESS/TSS = 270.5/272 ⇒ R² = 0.9945
   Our model (education & experience together) explains about 99.45% of the wage differential.

5. R̄² = 1 − [RSS/(n − K − 1)] / [TSS/(n − 1)] = 1 − (1.5/2)/(272/4) ⇒ R̄² = 0.9890

Regressing Y on X1 alone:

   β̂y1 = ∑yx1 / ∑x1² = (∑YX1 − nX̄1Ȳ) / (∑X1² − nX̄1²) = 62/16 = 3.875
3.5 Partial Correlations and Coefficients of Determination

6. R²y·1 = ESSSIMP/TSS = β̂y1 ∑yx1 / ∑y² = (3.875 × 62)/272 = 0.8833
   RSSSIMP = (1 − 0.8833)(272) = 0.1167(272) = 31.75
   X1 (education) alone explains about 88.33% of the differences in wages, and leaves about 11.67% (= 31.75) unexplained.

7. R²y·12 − R²y·1 = 0.9945 − 0.8833 = 0.1112
   (R²y·12 − R²y·1) ∑y² = 0.1112(272) = 30.25
   X2 (experience) enters the wage equation with an extra (marginal) contribution, explaining about 11.12% (= 30.25) of the total variation in wages. Note that this is the contribution of the part of X2 which is not related to (free from the influence of) X1.
3.5 Partial Correlations and Coefficients of Determination

8. r²y2·1 = (R²y·12 − R²y·1) / (1 − R²y·1) = (0.9945 − 0.8833)/(1 − 0.8833) = 0.9528
   That is, X2 (experience) explains about 95.28% (= 30.25) of the wage differential that X1 has left unexplained (= 31.75).
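These goodness-of-fit figures can be reproduced from the raw data; a sketch in Python/numpy (the helper name r_squared is illustrative):

```python
import numpy as np

y  = np.array([30, 20, 36, 24, 40], dtype=float)
x1 = np.array([ 4,  3,  6,  4,  8], dtype=float)
x2 = np.array([10,  8, 11,  9, 12], dtype=float)

def r_squared(y, *regressors):
    """R^2 from an OLS regression of y on a constant and the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (e @ e) / tss

R2_y1  = r_squared(y, x1)        # ~0.8833  (X1 alone)
R2_y12 = r_squared(y, x1, x2)    # ~0.9945  (X1 and X2 jointly)

r2_y2_1 = (R2_y12 - R2_y1) / (1.0 - R2_y1)   # coefficient of partial determination
print(R2_y1, R2_y12, r2_y2_1)                # the last is ~0.9528
```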
3.6 Statistical Inferences in Multiple Linear Regression

The case of two regressors (X1 & X2), with εi ~ N(0, σ²):

β̂1 ~ N(β1, var(β̂1)),   var(β̂1) = σ² / [∑x1i² (1 − r12²)]

β̂2 ~ N(β2, var(β̂2)),   var(β̂2) = σ² / [∑x2i² (1 − r12²)]

β̂0 ~ N(β0, var(β̂0)),   var(β̂0) = σ²/n + X̄1² var(β̂1) + X̄2² var(β̂2) + 2 X̄1 X̄2 cov(β̂1, β̂2)
3.6 Statistical Inferences in Multiple Linear Regression

cov(β̂1, β̂2) = −σ² r12² / [∑x1i x2i (1 − r12²)],   where   r12² = (∑x1i x2i)² / (∑x1i² ∑x2i²)

∑x1i² (1 − r12²) is the RSS from regressing X1 on X2.
∑x2i² (1 − r12²) is the RSS from regressing X2 on X1.

σ̂² = RSS / (n − 3) is an unbiased estimator of σ² (with two regressors).

In general, var-cov(β̂) = σ² (X'X)⁻¹, where X'X is the same matrix of sums shown earlier:

[ n      ∑X1      ⋯  ∑XK
  ∑X1    ∑X1²     ⋯  ∑X1XK
  ⋮      ⋮        ⋱  ⋮
  ∑XK    ∑XKX1    ⋯  ∑XK²  ]
3.6 Statistical Inferences in Multiple Linear Regression

The estimated variance-covariance matrix replaces σ² with σ̂²:

est. var-cov(β̂) = σ̂² (X'X)⁻¹

Note that:
(a) (X'X)⁻¹ is the same matrix we use to derive the OLS estimates, and
(b) σ̂² = RSS/(n − 3) in the case of two regressors.

In the general case of K explanatory variables, σ̂² = RSS / (n − K − 1) is an unbiased estimator of σ².
3.6 Statistical Inferences in Multiple Linear Regression

Ceteris paribus, the higher the correlation coefficient between X1 & X2 (r12), the less precise the estimates β̂1 & β̂2 will be, i.e., the CIs for the parameters β1 & β2 will be wider.

Ceteris paribus, the higher the degree of variation in the Xj's, the more precise the estimates will be (narrower CIs for the parameters).

The above two points are contained in:

var(β̂j) = σ² / [(1 − r²j·123…) ∑x²ji]

where r²j·123… is the R² from an auxiliary regression of Xj on all the other (K − 1) X's and a constant.
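This formula can be checked against the direct matrix computation; a sketch in Python/numpy with the wage data, where the auxiliary R² reduces to r12² because there is only one other regressor:

```python
import numpy as np

y  = np.array([30, 20, 36, 24, 40], dtype=float)
x1 = np.array([ 4,  3,  6,  4,  8], dtype=float)
x2 = np.array([10,  8, 11,  9, 12], dtype=float)

# Full regression: sigma^2 is estimated by RSS/(n - K - 1)
X = np.column_stack([np.ones_like(y), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)
rss = np.sum((y - X @ b) ** 2)
n, k = len(y), 2
sigma2_hat = rss / (n - k - 1)               # 1.5/2 = 0.75

# Auxiliary regression of X1 on X2 (in deviation form) gives r12^2
x1d, x2d = x1 - x1.mean(), x2 - x2.mean()
r12_sq = (x1d @ x2d) ** 2 / ((x1d @ x1d) * (x2d @ x2d))   # 0.9

var_b1 = sigma2_hat / ((1.0 - r12_sq) * (x1d @ x1d))
print(var_b1)   # 0.46875, the var(beta1_hat) entry of sigma2_hat * inv(X'X)
```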
3.6 Statistical Inferences in Multiple Linear Regression

We use the t-statistic to test hypotheses about single parameters and single linear functions of parameters.

To test hypotheses about, and construct intervals for, an individual βj, use:

(β̂j − βj*) / sê(β̂j) ~ t(n − K − 1),   for all j = 0, 1, …, K,

where βj* is the hypothesized value of the parameter βj.

Tests of several parameters and several linear functions of parameters use the F-statistic.
3.6 Statistical Inferences in Multiple Linear Regression

Procedures for Conducting F-tests:
1. Compute the RSS from regressing Y on all the Xj's (URSS = Unrestricted RSS).
2. Compute the RSS from the regression with the hypothesized values of the parameters (β's) imposed (RRSS = Restricted RSS).
3. Under H0 (if the restriction is correct):

[(RRSS − URSS)/K] / [URSS/(n − K − 1)] = [(R²U − R²R)/K] / [(1 − R²U)/(n − K − 1)] ~ F(K, n − K − 1)

where K = the number of restrictions imposed.
4. If F-calculated > F-tabulated, then RRSS is (significantly) greater than URSS; thus we reject the null.
3.6 Statistical Inferences in Multiple Linear Regression

A special F-test of common interest is the test of the null that none of the X's influence Y (i.e., that our regression is useless!):

Test H0: β1 = β2 = … = βK = 0   vs.   H1: H0 is not true.

URSS = (1 − R²) ∑yi² = ∑yi² − ∑j=1..K { β̂j ∑i xji yi }

RRSS = ∑yi²

⇒ [(RRSS − URSS)/K] / [URSS/(n − K − 1)] = [R²/K] / [(1 − R²)/(n − K − 1)] ~ F(K, n − K − 1)
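A sketch of this overall-significance test in Python, with scipy used only for the critical value and p-value:

```python
from scipy import stats

R2, n, k = 0.9945, 5, 2                           # wage example
F = (R2 / k) / ((1.0 - R2) / (n - k - 1))         # ~180.8
F_crit = stats.f.ppf(0.95, dfn=k, dfd=n - k - 1)  # F(2, 2) at 5% ~ 19
p_value = stats.f.sf(F, dfn=k, dfd=n - k - 1)

print(F, F_crit, p_value)  # reject H0 if F > F_crit (equivalently p < 0.05)
```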
3.6 Statistical Inferences in Multiple Linear Regression

With reference to our example on wages, test the following at the 5% level of significance:
a) β1 = 0
b) β2 = 0
c) β0 = 0
d) the overall significance of the model
e) β1 = β2
3.6 Statistical Inferences in Multiple Linear Regression

est. var-cov(β̂) = σ̂² (X'X)⁻¹

          [  5    25    50 ]⁻¹    [ 40.825   4.375   -6.25 ]
(X'X)⁻¹ = [ 25   141   262 ]   =  [  4.375   0.625   -0.75 ]
          [ 50   262   510 ]      [ -6.25   -0.75     1    ]

σ² is estimated by σ̂² = RSS/(n − K − 1) = 1.5/2 = 0.75

                    [ 40.825   4.375   -6.25 ]     [ 30.61875   3.28125   -4.6875 ]
var-cov(β̂) = 0.75 × [  4.375   0.625   -0.75 ]  =  [  3.28125   0.46875   -0.5625 ]
                    [ -6.25   -0.75     1    ]     [ -4.6875   -0.5625     0.75   ]
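The same computation in Python/numpy, reproducing the estimated variance-covariance matrix and the standard errors used in the t-tests that follow; a sketch:

```python
import numpy as np

XtX = np.array([[ 5.0,  25.0,  50.0],
                [25.0, 141.0, 262.0],
                [50.0, 262.0, 510.0]])
sigma2_hat = 0.75                        # RSS/(n - K - 1) = 1.5/2

var_cov = sigma2_hat * np.linalg.inv(XtX)
std_errors = np.sqrt(np.diag(var_cov))   # se(b0), se(b1), se(b2)
print(var_cov)
print(std_errors)   # approximately [5.533, 0.685, 0.866]
```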
3.6 Statistical Inferences in Multiple Linear Regression

The estimated variance-covariance matrix (upper triangle shown; the matrix is symmetric):

[ var(β̂0)   cov(β̂0, β̂1)   cov(β̂0, β̂2) ]     [ 30.61875    3.28125   -4.6875 ]
[            var(β̂1)       cov(β̂1, β̂2) ]  =  [             0.46875   -0.5625 ]
[                           var(β̂2)     ]     [                        0.75   ]

a) tc = (β̂1 − 0)/sê(β̂1) = −0.25/√0.46875 ≈ −0.37;   ttab = t(0.025, 2) = 4.30
   |tcal| ≤ ttab ⇒ we do not reject the null.

b) tc = (β̂2 − 0)/sê(β̂2) = 5.5/√0.75 ≈ 6.35
   |tcal| > ttab ⇒ reject the null.

c) tc = (β̂0 − 0)/sê(β̂0) = −23.75/√30.61875 ≈ −4.29
   |tcal| ≤ ttab ⇒ we do not reject the null!!!
3.6 Statistical Inferences in Multiple Linear Regression

d) Fc = [R²/K] / [(1 − R²)/(n − K − 1)] = (0.9945/2)/(0.0055/2) ≈ 180.82
   Ftab = F(0.05; 2, 2) ≈ 19.   Fcal > Ftab ⇒ reject the null.

e) From Ŷi = β̂0 + β̂1 X1i + β̂2 X2i, URSS = 1.5.
   Now impose β1 = β2 (= β) and run Ŷi = β̂0 + β̂(X1i + X2i) ⇒ RRSS = 12.08
   Fc = [(RRSS − URSS)/1] / [URSS/(n − K − 1)] = [(12.08 − 1.5)/1] / (1.5/2) ≈ 14.11
   Ftab = F(0.05; 1, 2) ≈ 18.51.   Fcal ≤ Ftab ⇒ we do not reject the null.
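A sketch of test (e) in Python, comparing the restricted and unrestricted residual sums of squares (the helper name rss is illustrative):

```python
import numpy as np
from scipy import stats

y  = np.array([30, 20, 36, 24, 40], dtype=float)
x1 = np.array([ 4,  3,  6,  4,  8], dtype=float)
x2 = np.array([10,  8, 11,  9, 12], dtype=float)

def rss(y, X):
    """Residual sum of squares from an OLS regression of y on X."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return e @ e

X_u = np.column_stack([np.ones_like(y), x1, x2])    # unrestricted model
X_r = np.column_stack([np.ones_like(y), x1 + x2])   # restricted: beta1 = beta2
urss, rrss = rss(y, X_u), rss(y, X_r)               # 1.5 and ~12.08

q, df = 1, len(y) - 3                                # 1 restriction, n - K - 1 df
F = ((rrss - urss) / q) / (urss / df)                # ~14.1
F_crit = stats.f.ppf(0.95, dfn=q, dfd=df)            # ~18.5, so do not reject
print(F, F_crit)
```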
3.6 Statistical Inferences in Multiple Linear Regression

Note that we can use a t-test to test the single restriction β1 = β2 (or equivalently, β1 − β2 = 0):

(β̂1 − β̂2 − 0) / sê(β̂1 − β̂2) = (β̂1 − β̂2) / √[vâr(β̂1) + vâr(β̂2) − 2côv(β̂1, β̂2)] ~ t(n − K − 1)

tc = −5.75 / √[0.46875 + 0.75 − 2(−0.5625)] ≈ −3.76

ttab = t(0.025, 2) ≈ 4.30

|tcal| < ttab ⇒ do not reject the null.

The same result as the F-test (note that tc² ≈ 14.11 = Fc), but the F-test is easier to handle.
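The equivalent t-test in Python, pulling the variances and covariance from the estimated var-cov matrix computed earlier; a sketch:

```python
import numpy as np
from scipy import stats

# Entries of the estimated var-cov(beta_hat) matrix from the wage example
var_b1, var_b2, cov_b12 = 0.46875, 0.75, -0.5625
b1_hat, b2_hat = -0.25, 5.5

se_diff = np.sqrt(var_b1 + var_b2 - 2 * cov_b12)   # ~1.531
t = (b1_hat - b2_hat) / se_diff                    # ~ -3.76
t_crit = stats.t.ppf(0.975, df=2)                  # ~4.30 with n - K - 1 = 2 df
print(t, t_crit)   # |t| < t_crit: do not reject; note t**2 ~ 14.1 = F from before
```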
3.6 Statistical Inferences in Multiple Linear Regression

To sum up, assuming that our model is correctly specified and that all the assumptions are satisfied:

Education, after controlling for experience, does not have a significant influence on wages.
In contrast, experience (after controlling for education) is a significant predictor of wages.
The intercept parameter is also insignificant (though only at the margin); this is less important.
Overall, the model explains a significant portion of the observed wage pattern.
We cannot reject the claim that the coefficients of the two regressors are equal.
