
Hyunchul Kim
Department of Economics, Sungkyunkwan University
Undergraduate Econometrics

Lecture Note 4
Multiple Regression Model

This note is based on the lecture note series by Kyoo il Kim.

1 Model

• Natural extension of the two-variable model:

Yi = β1 + β2 X2i + · · · + βk Xki + εi , i = 1, 2, ..., n

where

- Y is the dependent variable
- X2 , . . . , Xk are the independent variables
- ε is the error term
- β1 is the intercept (note that we implicitly define X1i = 1)
- βj measures the marginal response of Y to a unit increase in Xj , holding the others constant

2 Example: Regression of Wage Equation

    \widehat{log(wage)} = .284 + .092 educ + .0041 exper + .022 tenure
                          (.104)  (.007)     (.0017)       (.003)

    n = 526,  R² = 0.316

• Degrees of freedom: 526 − 4 = 522

• Standard errors in parentheses

• Interpret individual coefficients: e.g., holding exper and tenure fixed, an additional year of education is associated with approximately a 9.2% higher wage

• Critical values: the 5% two-sided critical value is 1.960 and the 1% two-sided critical value is 2.576

    t_exper = 0.0041/0.0017 ≈ 2.41 > t_{α/2}(522) = 1.960 with α = 0.05, and thus β̂_exper is statistically significant at the 5% significance level (see the sketch below)

3 Is the Linearity Assumption Restrictive?

• Often not too restrictive

• Linearity in parameters is often recovered after a transformation

• This is because many models are inherently linear: they can be expressed in a form that is linear in parameters by an appropriate transformation of the variables

• On the other hand, there do exist inherently nonlinear models for which no such transformation exists

• Examples

  Obvious examples:

    Y = β1 + β2 X2 + β3 X2² + ε
    ln Y = β1 + β2 ln X2 + β3 X3 + ε
    Y = β1 + β2 log X2 + ε
    Y = β1 + β2 X2 + β3 X3 + β4 X2 X3 + ε

  Nontrivial examples:

    Y = γ1 X2^γ2 X3^γ3 ε*  ⟺  log Y = log γ1 + γ2 log X2 + γ3 log X3 + log ε*

    Y = exp(β1 + β2 X2 + β3 X3) ε*  ⟺  log Y = β1 + β2 X2 + β3 X3 + log ε*

    Y = 1/(β1 + β2 X2 + β3 X3 + ε)  ⟺  1/Y = β1 + β2 X2 + β3 X3 + ε

  Cobb-Douglas production function:

    Q = A L^α K^β  ⟺  ln Q = ln A + α ln L + β ln K

  Nonlinear model: no transformation is possible to recover linearity

    Y = γ1 X2^γ2 X3^γ3 + ε
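As a quick illustration (not part of the original note), the following Python sketch simulates Cobb-Douglas data and recovers the parameters by running OLS on the log-transformed equation; all numbers and variable names are made up:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    L = rng.uniform(1, 10, n)                  # simulated labor input
    K = rng.uniform(1, 10, n)                  # simulated capital input
    A, alpha, beta = 2.0, 0.6, 0.3
    Q = A * L**alpha * K**beta * np.exp(rng.normal(0, 0.1, n))   # multiplicative error

    # Linear in parameters after taking logs: ln Q = ln A + alpha*ln L + beta*ln K + error
    X = np.column_stack([np.ones(n), np.log(L), np.log(K)])
    coef, *_ = np.linalg.lstsq(X, np.log(Q), rcond=None)
    print(coef)                                # approximately [ln 2, 0.6, 0.3]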

4 Least Squares Estimation of Multiple Regression Model

• Consider the following multiple regression function

Yi = β1 + β2 X2i + · · · + βk Xki + εi , i = 1, 2, ..., n

• Objective:

    min_{β̂1, β̂2, . . . , β̂k}  RSS = Σi (Yi − β̂1 − β̂2 X2i − . . . − β̂k Xki)²

• Why least squares? BLUE property by the Gauss-Markov theorem, as in the two-variable case

• Normal equations (k equations in k unknowns):

    Σi ei = 0
    Σi ei X2i = 0
    ⋮
    Σi ei Xki = 0

• Derivation of the normal equations (first-order conditions), where ei = Yi − β̂1 − β̂2 X2i − . . . − β̂k Xki denotes the residual:

    ∂/∂β̂1 Σi (Yi − β̂1 − β̂2 X2i − . . . − β̂k Xki)² = Σi 2ei · ∂/∂β̂1 (Yi − β̂1 − β̂2 X2i − . . . − β̂k Xki) = −2 Σi ei = 0

    ∂/∂β̂2 Σi (Yi − β̂1 − β̂2 X2i − . . . − β̂k Xki)² = Σi 2ei · ∂/∂β̂2 (Yi − β̂1 − β̂2 X2i − . . . − β̂k Xki) = −2 Σi ei X2i = 0

    ⋮

    ∂/∂β̂k Σi (Yi − β̂1 − β̂2 X2i − . . . − β̂k Xki)² = Σi 2ei · ∂/∂β̂k (Yi − β̂1 − β̂2 X2i − . . . − β̂k Xki) = −2 Σi ei Xki = 0

• OLS estimates are obtained as solutions to this system of normal equations

• A closed-form expression for the OLS estimator exists in matrix algebra (see the sketch at the end of this section)

• Sample Analogy Principle: roughly speaking, properties that hold in the population should also hold in the sample

  – The OLS estimator can be obtained from this principle

  – Classical assumptions: E[εi] = 0, E[εi X2i] = 0, . . . , E[εi Xki] = 0 imply that

        E[εi] = 0      ⟹  Ê[ei] = 0       ⟹  (1/n) Σi ei = 0
        E[εi X2i] = 0  ⟹  Ê[ei X2i] = 0   ⟹  (1/n) Σi ei X2i = 0
        ⋮
        E[εi Xki] = 0  ⟹  Ê[ei Xki] = 0   ⟹  (1/n) Σi ei Xki = 0

    where Ê[·] denotes the sample analog of the expectation, which is the sample mean

• The OLS estimator is obtained by mimicking in the sample the model structure laid out in the assumptions for the population
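A minimal sketch (simulated data, numpy assumed) of the matrix-algebra closed form β̂ = (X'X)⁻¹X'Y, which solves the k normal equations above; it also checks that the residuals satisfy Σ ei = 0 and Σ ei Xji = 0:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    true_beta = np.array([1.0, 2.0, -0.5])            # beta1 (intercept), beta2, beta3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ true_beta + rng.normal(size=n)

    # Solve the normal equations X'X b = X'y, i.e. b = (X'X)^(-1) X'y
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

    e = y - X @ beta_hat
    print(beta_hat)                                   # close to true_beta
    print(X.T @ e)                                    # ~ zero vector: the normal equations hold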

5 Regression Statistics

• An unbiased estimator of σ² is provided by

    s² = (1/(n − k)) Σi ei²

  where ei is the OLS residual

• The variances and covariances of the OLS estimators depend on σ². Because we usually do not know σ², they are estimated using the estimated value s² of σ². These estimated variances and covariances are summarized in the variance-covariance matrix of the computer output:

    [ V̂ar(β̂1) = se(β̂1)²   Ĉov(β̂1, β̂2)         ···   Ĉov(β̂1, β̂k)        ]
    [                        V̂ar(β̂2) = se(β̂2)²   ···   Ĉov(β̂2, β̂k)        ]
    [                                               ⋱     ⋮                  ]
    [                                                     V̂ar(β̂k) = se(β̂k)² ]

  (only the upper triangle is shown; the matrix is symmetric)

• Observe that se(β̂j) is the square root of the j-th diagonal element of this matrix. The computer reports it as the standard error of β̂j

• Under the assumption that the errors are independently and normally distributed, we can apply the t-test and construct t-distribution-based confidence intervals because

    (β̂j − βj) / se(β̂j) ∼ t(n − k)

  Note that the degrees of freedom equal the denominator used in the computation of s²:

    d.f. = (the sample size) − (the number of parameters estimated) = n − k

• The most commonly applied null hypothesis takes the form H0 : βj = 0 (does the j-th independent variable explain the dependent variable?). Observe that the t-statistic in this case should be

    t = β̂j / se(β̂j) ∼ t(n − k)

  The computer usually reports this ratio for each j; it is what is called the 't-statistic' in the usual computer output. Note that, if your null hypothesis does not take the default form H0 : βj = 0, then the t-statistic should be calculated by hand or by using a special option in your software, as in the sketch below
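For instance, a hand calculation for a non-default null such as H0 : βeduc = 0.08 in the wage equation of Section 2 might look like the sketch below (the hypothesized value 0.08 is made up for illustration):

    from scipy import stats

    b_educ, se_educ, df = 0.092, 0.007, 522    # estimate and standard error from Section 2
    b_null = 0.08                              # hypothesized value under H0 (illustrative)

    t_stat = (b_educ - b_null) / se_educ       # ≈ 1.71
    p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-sided p-value ≈ 0.09
    print(t_stat, p_value)                     # fail to reject H0 at the 5% level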

• As in the two-variable case, we have the decomposition

    Σi (Yi − Ȳ)² = Σi (Ŷi − Ȳ)² + Σi (Yi − Ŷi)²

    TSS = ESS + RSS

  We define R² as before:

    R² = ESS / TSS

• Again, R² measures the fraction of the variation in Y that is explained by the set of explanatory variables. It is often used informally as a goodness-of-fit statistic.

• The problem with R² is that it increases with k, the number of explanatory variables, regardless of their quality: even if nonsensical independent variables are added, R² never decreases. Therefore, R² may mislead us into believing that the set of explanatory variables as a whole explains the dependent variable well. Indeed, with k = n we may have R² = 1

• The adjusted R², R̄², partially corrects this problem. It is formally defined by

    R̄² = 1 − (1 − R²) · (n − 1)/(n − k)

  Observe that (holding R² fixed)

    k ↑ ⟹ n − k ↓ ⟹ (n − 1)/(n − k) ↑ ⟹ (1 − R²)(n − 1)/(n − k) ↑ ⟹ R̄² = 1 − (1 − R²)(n − 1)/(n − k) ↓

  Therefore, R̄² penalizes a large number of independent variables.

• Note that as more independent variables are added, R² itself increases but so does k. Therefore, depending on the magnitude of the increase in R² (i.e., the quality of the additional regressors), R̄² may increase or decrease.

• A reasonable hypothesis to entertain: none of the explanatory variables helps to explain the variation of Y around its mean, which can formally be written as

    H0 : β2 = · · · = βk = 0

  against

    H1 : βj ≠ 0 for at least one j ∈ {2, . . . , k}

  The test statistic for testing this particular hypothesis is

    [R² / (1 − R²)] · [(n − k) / (k − 1)] ∼ F(k − 1, n − k)

  which follows an F(k − 1, n − k) distribution under the null. This is the 'F-statistic' reported by the computer (see the sketch below)
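A minimal sketch (simulated data) computing R², R̄², and the overall F-statistic directly from these definitions:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n, k = 100, 4                                        # k includes the intercept
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    y = X @ np.array([1.0, 0.5, 0.0, -0.3]) + rng.normal(size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat

    TSS = np.sum((y - y.mean())**2)
    RSS = np.sum(e**2)
    R2 = 1 - RSS / TSS
    R2_adj = 1 - (1 - R2) * (n - 1) / (n - k)

    F = (R2 / (1 - R2)) * (n - k) / (k - 1)              # overall-significance F-statistic
    p_value = stats.f.sf(F, k - 1, n - k)
    print(R2, R2_adj, F, p_value)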

6 Joint Hypothesis Testing

We use F-tests for hypotheses involving more than one parameter, i.e., joint tests on several regression coefficients.

• Given a linear regression model

Yi = β1 + β2 X2,i + · · · + βk Xk,i + εi

it may be of interest to test the hypothesis

H0 : βk−q+1 = · · · = βk = 0 (1)

Under the null, the model simplifies to

Yi = β1 + β2 X2,i + · · · + βk−q Xk−q,i + εi

The null hypothesis consists of q hypotheses:

βk−q+1 = 0, βk−q+2 = 0, · · · , βk = 0

Therefore, the usual t-test cannot be used.

• Note that the subscripts on the β's above are just labels, so the hypotheses in (1) can nest examples such as

H0 : β2 = β4 = β6 = 0

H0 : β2 = β5 = β7 = β11 = 0

• Use of the F-test for testing the joint hypotheses in (1):

– Estimate the restricted model (plugging in values from the hypotheses):

Yi = β1 + β2 X2,i + · · · + βk−q Xk−q,i + εi

– Obtain the RSS of the restricted model, and call it RSSR

– Estimate the unrestricted model (the original model):

Yi = β1 + β2 X2,i + · · · + βk Xk,i + εi

– Obtain the RSS of the unrestricted model, and call it RSSU

– The statistic is obtained by

    F = [(RSSR − RSSU)/q] / [RSSU/(n − k)]

– Reject the null hypothesis if the test statistic exceeds Fα(q, n − k)

• The above test is called the F-test, and the statistic is called the F-statistic

• RSSR > RSSU: least squares estimates are obtained by minimizing the RSS. With more restrictions, the minimization is constrained and the resulting RSS is bound to be at least as large as (and typically larger than) the one without any restriction

• The test statistic has an F(q, n − k) distribution under the null

• When q = 1, the result of the F-test is identical to that of the two-sided t-test

• Alternatively, we can write

    F = [(RU² − RR²)/q] / [(1 − RU²)/(n − k)],    (2)

  where RU² is the R² from the unrestricted model and RR² is the R² from the restricted model. Note that RU² ≥ RR². (This form is valid only when the restricted and unrestricted models have the same dependent variable, so that TSS is common to both; see the sketch below.)

Tests Involving Linear Functions of Regression Coefficients

The intuition behind constructing an F-test as in (2) above can be extended to testing hypotheses defined by linear functions of the regression coefficients. We illustrate this with the following example.

• Consider a Cobb-Douglas production function

log Q = β1 + β2 log K + β3 log L + ε

where K and L denote the capital input and labor input, respectively

– We may want to ask whether there are constant returns to scale: H0 : β2 + β3 = 1

– We may want to ask whether H0 : β2 = β3

• Write the model as

Y = β1 + β2 X2 + β3 X3 + ε,

where Y = log Q, X2 = log K, and X3 = log L

• For the first hypothesis, note that under H0 we have β3 = 1 − β2, so

    Y = β1 + β2 X2 + (1 − β2) X3 + ε
    Y − X3 = β1 + β2 (X2 − X3) + ε

Therefore, the F -test is conducted as follows:

– Estimate the restricted model and compute RSSR

Y − X3 = β1 + β2 (X2 − X3 ) + ε

– Estimate the unrestricted model and compute RSSU

Y = β1 + β2 X2 + β3 X3 + ε

– Note that q = 1

– Then,

    F = [(RSSR − RSSU)/1] / [RSSU/(n − 3)]

• For the second hypothesis, note that under H0 (β2 = β3) the model becomes

    Y = β1 + β2 (X2 + X3) + ε

Therefore, the F -test is conducted as follows:

– Estimate the restricted model and compute RSSR

Y = β1 + β2 (X2 + X3 ) + ε

– Estimate the unrestricted model and compute RSSU

Y = β1 + β2 X2 + β3 X3 + ε

– Note that q = 1

– Then,

    F = [(RSSR − RSSU)/1] / [RSSU/(n − 3)]
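A sketch of the constant-returns-to-scale test above on simulated Cobb-Douglas data (coefficient values and variable names are illustrative only):

    import numpy as np
    from scipy import stats

    def rss(X, y):
        b = np.linalg.solve(X.T @ X, X.T @ y)
        return np.sum((y - X @ b)**2)

    rng = np.random.default_rng(4)
    n = 300
    logK = rng.normal(size=n)
    logL = rng.normal(size=n)
    logQ = 0.5 + 0.4 * logK + 0.6 * logL + 0.2 * rng.normal(size=n)   # CRS holds: 0.4 + 0.6 = 1

    # Unrestricted: log Q = b1 + b2 log K + b3 log L + e
    RSS_U = rss(np.column_stack([np.ones(n), logK, logL]), logQ)

    # Restricted under H0 (b2 + b3 = 1): log Q - log L = b1 + b2 (log K - log L) + e
    RSS_R = rss(np.column_stack([np.ones(n), logK - logL]), logQ - logL)

    F = ((RSS_R - RSS_U) / 1) / (RSS_U / (n - 3))        # q = 1 restriction
    print(F, stats.f.sf(F, 1, n - 3))                    # typically a large p-value: do not reject CRS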

7 Multicollinearity

• An important assumption of the multiple regression model is that there is no exact linear relationship among the independent variables. If such an exact relation exists, we say that (exact) multicollinearity exists

• The linear regression coefficient β2 measures the marginal response of the dependent variable to a unit increase in X2 with all other variables held constant. Suppose that X3 = 4X2. Then a unit increase in X2 is always accompanied by a 4-unit increase in X3, and the associated change in Y is β2 + 4β3. There is no way to separate out β2

• When there is high correlation among some set of independent variables, we say there exists near multicollinearity, or a multicollinearity problem

• A typical symptom of a multicollinearity problem is that the standard errors of individual coefficients are quite large while R² is reasonably high.

• Example: note that, for the three-variable (including the constant term) regression model,

    Var(β̂2) = σ² / [Σ(X2i − X̄2)² (1 − r23²)]

  where r23 is the sample correlation coefficient between X2 and X3:

    r23 = Σ(X2i − X̄2)(X3i − X̄3) / [√(Σ(X2i − X̄2)²) √(Σ(X3i − X̄3)²)]

  Var(β̂2) goes to infinity as r23 → 1, i.e., as X2 and X3 become more highly correlated. (This implies that the t-statistic is insignificant and the confidence interval is wide; see the sketch at the end of this section.)

  This may also coexist with a significant F-statistic for the overall significance of the coefficients, i.e., rejecting H0 : β2 = · · · = βk = 0.

  Note that the F-statistic in this case is given by

    F = [R²/(k − 1)] / [(1 − R²)/(n − k)]

  The F-statistic can then be large even if the individual t-statistics are small. That is, under multicollinearity the explanatory variables can be jointly significant even if each of them is individually insignificant.

• In this case, we also observe that dropping one or more variables from the equation lowers the standard errors of the remaining variables while R² changes little

• The existence of multicollinearity (some degree of correlation among the independent variables) is itself not a problem and is indeed essential. Our concern is its degree or magnitude
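A minimal sketch (simulated data) of how the variance formula above blows up as the correlation between X2 and X3 approaches 1; here σ² is set to 1 rather than estimated:

    import numpy as np

    rng = np.random.default_rng(5)
    n, sigma2 = 200, 1.0
    x2 = rng.normal(size=n)

    for rho in [0.0, 0.9, 0.99]:
        # construct x3 with (population) correlation rho with x2
        x3 = rho * x2 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        r23 = np.corrcoef(x2, x3)[0, 1]
        var_b2 = sigma2 / (np.sum((x2 - x2.mean())**2) * (1 - r23**2))
        print(f"r23 = {r23:.3f}   Var(b2_hat) = {var_b2:.5f}")   # grows rapidly as r23 -> 1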

8 Standardized Coefficient

• Note that an elasticity measures the percentage change in the dependent variable in response to a one percent change in an independent variable. Therefore, an elasticity is unit-free while βj is not

• Relying on standardized coefficients is another way to obtain unit-free results. Formally, we define

    β̂j* = β̂j · (sXj / sY)

  as the standardized coefficient estimate, where sXj and sY are the standard deviations of Xj and Y, respectively

• It can be estimated from the regression of standardized variables:

    (Yi − Ȳ)/sY = β2* (X2,i − X̄2)/sX2 + · · · + βk* (Xk,i − X̄k)/sXk + εi*

  where εi* = εi/sY

• A standardized coefficient of 0.7 implies that a one-standard-deviation change in the independent variable leads to a 0.7-standard-deviation change in the dependent variable
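A minimal sketch (simulated data) verifying that scaling the OLS slope by sX/sY reproduces the slope from the regression on standardized variables:

    import numpy as np

    rng = np.random.default_rng(6)
    n = 300
    x2 = rng.normal(0, 2, n)
    y = 1.0 + 0.5 * x2 + rng.normal(size=n)

    X = np.column_stack([np.ones(n), x2])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    b2_std = b[1] * x2.std(ddof=1) / y.std(ddof=1)        # beta2_hat * sX / sY

    # Regression on standardized variables (no intercept needed after centering)
    zx = (x2 - x2.mean()) / x2.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    b2_from_z = np.sum(zx * zy) / np.sum(zx**2)
    print(b2_std, b2_from_z)                              # identical up to rounding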

9 Testing the Functional Form of Regression

• Example: the MWD (MacKinnon-White-Davidson) test

10 Example of Output by Software: STATA

• Consider a regression model of term GPA:

    TermGPAi = β1 + β2 Atndratei + β3 HWratei + β4 SATi + εi

  where TermGPA, Atndrate, HWrate, and SAT denote the term GPA of the i-th student, the percentage of classes attended (out of 32 per semester), the percentage of homework turned in, and the SAT score of the student, respectively.

• STATA output (output table not reproduced here)

• Number of variables = 4 (including the constant)

• Number of observations = 674

• Degrees of freedom = 674 − 4 = 670

• V̂ar(β̂j) = s²/RSSj, where RSSj is the residual sum of squares from an auxiliary regression of Xj on a constant and the other independent variables (i.e., partitioned regression; see the sketch at the end of this section)

• Standard errors are given by se(β̂j) = √V̂ar(β̂j)

• t-statistics are calculated under the null hypothesis H0 : βj = 0 for each j and are given by

    t = β̂j / √V̂ar(β̂j) = β̂j / se(β̂j)

• p-values in the fifth column of the table (P > |t|) are obtained from the t distribution with n − k degrees of freedom

• The definition of the adjusted R-squared is

    R̄² = 1 − [RSS/(n − k)] / [TSS/(n − 1)] = 1 − [Sum of Squared Resid/(n − k)] / (S.D. of dependent variable)²

  while the R-squared is given by

    R² = 1 − RSS/TSS = 1 − Sum of Squared Resid / [(S.D. of dependent variable)² × (n − 1)]

• S.D. of dependent variable: sY = √[Σ(Yi − Ȳ)²/(n − 1)] = √[TSS/(n − 1)]

• S.E. of regression: s = √(s²) = √[RSS/(n − k)]

• The F-statistic is obtained under the null hypothesis H0 : β2 = · · · = βk = 0. The definition of the F-statistic is

    F = [(RSSR − RSSU)/q] / [RSSU/(n − k)] = [(RU² − RR²)/q] / [(1 − RU²)/(n − k)]

  where the subscript 'R' denotes the restricted model, the subscript 'U' denotes the unrestricted model, and q is the number of restrictions. Under H0, RR² = 0 and q = k − 1, which gives the usual formula

    F = [(TSS − RSS)/(k − 1)] / [RSS/(n − k)] = [ESS/(k − 1)] / [RSS/(n − k)] = [R²/(k − 1)] / [(1 − R²)/(n − k)]

• The p-value for the F-statistic (Prob > F) is obtained from the F distribution with (k − 1, n − k) degrees of freedom
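A minimal sketch (simulated data standing in for the GPA regressors, which are not reproduced here) verifying that s²/RSSj from the auxiliary (partitioned) regression equals the usual variance estimate taken from the diagonal of s²(X'X)⁻¹:

    import numpy as np

    def ols(X, y):
        return np.linalg.solve(X.T @ X, X.T @ y)

    rng = np.random.default_rng(7)
    n, k = 674, 4                                      # same n and k as the GPA example
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # made-up regressors
    y = X @ np.array([1.0, 0.01, 0.01, 0.002]) + rng.normal(0, 0.5, n)

    b = ols(X, y)
    e = y - X @ b
    s2 = np.sum(e**2) / (n - k)                        # (S.E. of regression)^2

    j = 1                                              # pick any slope coefficient
    others = np.delete(X, j, axis=1)
    ej = X[:, j] - others @ ols(others, X[:, j])       # residual of the auxiliary regression
    se_bj_aux = np.sqrt(s2 / np.sum(ej**2))            # s^2 / RSS_j route

    se_bj_mat = np.sqrt(s2 * np.linalg.inv(X.T @ X)[j, j])   # (X'X)^(-1) diagonal route
    print(se_bj_aux, se_bj_mat)                        # identical up to rounding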

