
Hyunchul Kim
Department of Economics, Sungkyunkwan University
Undergraduate Econometrics

Lecture Note 4
Multiple Regression Model

This note is based on the lecture note series by Kyoo il Kim.

1 Model

• Natural extension of the two-variable model:

Yi = β1 + β2 X2i + · · · + βk Xki + εi , i = 1, 2, ..., n

where

- Y is the dependent variable
- X2 , . . . , Xk are the independent variables
- ε is the error term
- β1 is the intercept (note that we implicitly define X1i = 1)
- βj measures the marginal response of Y to a unit increase in Xj , holding the others constant

2 Example: Regression of Wage Equation

    \widehat{log(wage)} = .284 + .092 educ + .0041 exper + .022 tenure
                          (.104)  (.007)     (.0017)       (.003)

    n = 526,  R² = 0.316

• Degrees of freedom: 526 − 4 = 522

• Standard errors in parentheses

• Interpret individual coefficients: e.g., holding exper and tenure fixed, an additional year of education is associated with approximately a 9.2% higher wage

• Critical values: the 5% two-sided critical value is 1.960 and the 1% two-sided critical value is 2.576

    t_exper = 0.0041/0.0017 ≈ 2.41 > t_{α/2}(522) = 1.960 with α = 0.05, and thus β̂_exper is statistically significant at the 5% significance level (see the sketch below)

3 Is the Linearity Assumption Restrictive?

• Often not too restrictive

• Linearity in parameters is often recovered after a transformation

• This is because many models are inherently linear: they can be expressed in a form that is linear in parameters by an appropriate transformation of the variables

• On the other hand, there do exist inherently nonlinear models for which no such transformation exists

• Examples

  Obvious examples:

    Y = β1 + β2 X2 + β3 X2² + ε
    ln Y = β1 + β2 ln X2 + β3 X3 + ε
    Y = β1 + β2 log X2 + ε
    Y = β1 + β2 X2 + β3 X3 + β4 X2 X3 + ε

  Nontrivial examples:

    Y = γ1 X2^γ2 X3^γ3 ε*  ⟺  log Y = log γ1 + γ2 log X2 + γ3 log X3 + log ε*

    Y = exp(β1 + β2 X2 + β3 X3) ε*  ⟺  log Y = β1 + β2 X2 + β3 X3 + log ε*

    Y = 1/(β1 + β2 X2 + β3 X3 + ε)  ⟺  1/Y = β1 + β2 X2 + β3 X3 + ε

  Cobb-Douglas production function:

    Q = A L^α K^β  ⟺  ln Q = ln A + α ln L + β ln K

  Nonlinear model: no transformation is possible to recover linearity

    Y = γ1 X2^γ2 X3^γ3 + ε
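As a quick illustration (not part of the original note), the following Python sketch simulates Cobb-Douglas data and recovers the parameters by running OLS on the log-transformed equation; all numbers and variable names are made up:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    L = rng.uniform(1, 10, n)                  # simulated labor input
    K = rng.uniform(1, 10, n)                  # simulated capital input
    A, alpha, beta = 2.0, 0.6, 0.3
    Q = A * L**alpha * K**beta * np.exp(rng.normal(0, 0.1, n))   # multiplicative error

    # Linear in parameters after taking logs: ln Q = ln A + alpha*ln L + beta*ln K + error
    X = np.column_stack([np.ones(n), np.log(L), np.log(K)])
    coef, *_ = np.linalg.lstsq(X, np.log(Q), rcond=None)
    print(coef)                                # approximately [ln 2, 0.6, 0.3]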

4 Least Squares Estimation of Multiple Regression Model

• Consider the following multiple regression function

Yi = β1 + β2 X2i + · · · + βk Xki + εi , i = 1, 2, ..., n

• Objective:

    min_{β̂1, β̂2, . . . , β̂k}  RSS = Σi (Yi − β̂1 − β̂2 X2i − . . . − β̂k Xki)²

• Why least squares? BLUE property by the Gauss-Markov theorem, as in the two-variable case

• Normal equations (k equations in k unknowns):

    Σi ei = 0
    Σi ei X2i = 0
    ⋮
    Σi ei Xki = 0

• Derivation of the normal equations (first-order conditions), where ei = Yi − β̂1 − β̂2 X2i − . . . − β̂k Xki denotes the residual:

    ∂/∂β̂1 Σi (Yi − β̂1 − β̂2 X2i − . . . − β̂k Xki)² = Σi 2ei · ∂/∂β̂1 (Yi − β̂1 − β̂2 X2i − . . . − β̂k Xki) = −2 Σi ei = 0

    ∂/∂β̂2 Σi (Yi − β̂1 − β̂2 X2i − . . . − β̂k Xki)² = Σi 2ei · ∂/∂β̂2 (Yi − β̂1 − β̂2 X2i − . . . − β̂k Xki) = −2 Σi ei X2i = 0

    ⋮

    ∂/∂β̂k Σi (Yi − β̂1 − β̂2 X2i − . . . − β̂k Xki)² = Σi 2ei · ∂/∂β̂k (Yi − β̂1 − β̂2 X2i − . . . − β̂k Xki) = −2 Σi ei Xki = 0

• OLS estimates are obtained as solutions to this system of normal equations

• A closed-form expression for the OLS estimator exists in matrix algebra (see the sketch at the end of this section)

• Sample Analogy Principle: roughly speaking, properties that hold in the population should also hold in the sample

  – The OLS estimator can be obtained from this principle

  – Classical assumptions: E[εi] = 0, E[εi X2i] = 0, . . . , E[εi Xki] = 0 imply that

        E[εi] = 0      ⟹  Ê[ei] = 0       ⟹  (1/n) Σi ei = 0
        E[εi X2i] = 0  ⟹  Ê[ei X2i] = 0   ⟹  (1/n) Σi ei X2i = 0
        ⋮
        E[εi Xki] = 0  ⟹  Ê[ei Xki] = 0   ⟹  (1/n) Σi ei Xki = 0

    where Ê[·] denotes the sample analog of the expectation, which is the sample mean

• The OLS estimator is obtained by mimicking in the sample the model structure laid out in the assumptions for the population
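A minimal sketch (simulated data, numpy assumed) of the matrix-algebra closed form β̂ = (X'X)⁻¹X'Y, which solves the k normal equations above; it also checks that the residuals satisfy Σ ei = 0 and Σ ei Xji = 0:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    true_beta = np.array([1.0, 2.0, -0.5])            # beta1 (intercept), beta2, beta3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ true_beta + rng.normal(size=n)

    # Solve the normal equations X'X b = X'y, i.e. b = (X'X)^(-1) X'y
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

    e = y - X @ beta_hat
    print(beta_hat)                                   # close to true_beta
    print(X.T @ e)                                    # ~ zero vector: the normal equations hold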

5 Regression Statistics

• An unbiased estimator of σ² is provided by

    s² = (1/(n − k)) Σi ei²

  where ei is the OLS residual

• The variances and covariances of the OLS estimators depend on σ². Because we usually do not know σ², they are estimated using the estimated value s² of σ². These estimated variances and covariances are summarized in the variance-covariance matrix of the computer output:

    [ V̂ar(β̂1) = se(β̂1)²   Ĉov(β̂1, β̂2)         ···   Ĉov(β̂1, β̂k)        ]
    [                        V̂ar(β̂2) = se(β̂2)²   ···   Ĉov(β̂2, β̂k)        ]
    [                                               ⋱     ⋮                  ]
    [                                                     V̂ar(β̂k) = se(β̂k)² ]

  (only the upper triangle is shown; the matrix is symmetric)

• Observe that se(β̂j) is the square root of the j-th diagonal element of this matrix. The computer reports it as the standard error of β̂j

• Under the assumption that the errors are independently and normally distributed, we can apply the t-test and construct t-distribution-based confidence intervals because

    (β̂j − βj) / se(β̂j) ∼ t(n − k)

  Note that the degrees of freedom equal the denominator used in the computation of s²:

    d.f. = (the sample size) − (the number of parameters estimated) = n − k

• The most commonly applied null hypothesis takes the form H0 : βj = 0 (does the j-th independent variable explain the dependent variable?). Observe that the t-statistic in this case should be

    t = β̂j / se(β̂j) ∼ t(n − k)

  The computer usually reports this ratio for each j; it is what is called the 't-statistic' in the usual computer output. Note that, if your null hypothesis does not take the default form H0 : βj = 0, then the t-statistic should be calculated by hand or by using a special option in your software, as in the sketch below
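For instance, a hand calculation for a non-default null such as H0 : βeduc = 0.08 in the wage equation of Section 2 might look like the sketch below (the hypothesized value 0.08 is made up for illustration):

    from scipy import stats

    b_educ, se_educ, df = 0.092, 0.007, 522    # estimate and standard error from Section 2
    b_null = 0.08                              # hypothesized value under H0 (illustrative)

    t_stat = (b_educ - b_null) / se_educ       # ≈ 1.71
    p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-sided p-value ≈ 0.09
    print(t_stat, p_value)                     # fail to reject H0 at the 5% level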

• As in the two-variable case, we have the decomposition

    Σi (Yi − Ȳ)² = Σi (Ŷi − Ȳ)² + Σi (Yi − Ŷi)²

    TSS = ESS + RSS

  We define R² as before:

    R² = ESS / TSS

• Again, R² measures the fraction of the variation in Y that is explained by the set of explanatory variables. It is often used informally as a goodness-of-fit statistic.

• The problem with R² is that it increases with k, the number of explanatory variables, regardless of their quality: even if nonsensical independent variables are added, R² never decreases. Therefore, R² may mislead us into believing that the set of explanatory variables as a whole explains the dependent variable well. Indeed, with k = n we may have R² = 1

• The adjusted R², R̄², partially corrects this problem. It is formally defined by

    R̄² = 1 − (1 − R²) · (n − 1)/(n − k)

  Observe that (holding R² fixed)

    k ↑ ⟹ n − k ↓ ⟹ (n − 1)/(n − k) ↑ ⟹ (1 − R²)(n − 1)/(n − k) ↑ ⟹ R̄² = 1 − (1 − R²)(n − 1)/(n − k) ↓

  Therefore, R̄² penalizes a large number of independent variables.

• Note that as more independent variables are added, R² itself increases but so does k. Therefore, depending on the magnitude of the increase in R² (i.e., the quality of the additional regressors), R̄² may increase or decrease.

• A reasonable hypothesis to entertain: none of the explanatory variables helps to explain the variation of Y around its mean, which can formally be written as

    H0 : β2 = · · · = βk = 0

  against

    H1 : βj ≠ 0 for at least one j ∈ {2, . . . , k}

  The test statistic for testing this particular hypothesis is

    [R² / (1 − R²)] · [(n − k) / (k − 1)] ∼ F(k − 1, n − k)

  which follows an F(k − 1, n − k) distribution under the null. This is the 'F-statistic' reported by the computer (see the sketch below)
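A minimal sketch (simulated data) computing R², R̄², and the overall F-statistic directly from these definitions:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n, k = 100, 4                                        # k includes the intercept
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    y = X @ np.array([1.0, 0.5, 0.0, -0.3]) + rng.normal(size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat

    TSS = np.sum((y - y.mean())**2)
    RSS = np.sum(e**2)
    R2 = 1 - RSS / TSS
    R2_adj = 1 - (1 - R2) * (n - 1) / (n - k)

    F = (R2 / (1 - R2)) * (n - k) / (k - 1)              # overall-significance F-statistic
    p_value = stats.f.sf(F, k - 1, n - k)
    print(R2, R2_adj, F, p_value)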

6 Joint Hypothesis Testing

We use F-tests for hypotheses involving more than one parameter, i.e., joint tests on several regression coefficients.

• Given a linear regression model

Yi = β1 + β2 X2,i + · · · + βk Xk,i + εi

it may be of interest to test the hypothesis

H0 : βk−q+1 = · · · = βk = 0 (1)

Under the null, the model simplifies to

Yi = β1 + β2 X2,i + · · · + βk−q Xk−q,i + εi

The null hypothesis consists of q hypotheses:

βk−q+1 = 0, βk−q+2 = 0, · · · , βk = 0

Therefore, the usual t-test cannot be used.

• Note that the subscripts on the β's above are just labels, so the hypotheses in (1) can nest examples such as

H0 : β2 = β4 = β6 = 0

H0 : β2 = β5 = β7 = β11 = 0

• Use of the F-test for testing the joint hypotheses in (1):

– Estimate the restricted model (plugging in values from the hypotheses):

Yi = β1 + β2 X2,i + · · · + βk−q Xk−q,i + εi

– Obtain the RSS of the restricted model, and call it RSSR

– Estimate the unrestricted model (the original model):

Yi = β1 + β2 X2,i + · · · + βk Xk,i + εi

– Obtain the RSS of the unrestricted model, and call it RSSU

– The statistic is obtained by

    F = [(RSSR − RSSU)/q] / [RSSU/(n − k)]

– Reject the null hypothesis if the test statistic exceeds Fα(q, n − k)

• The above test is called the F-test, and the statistic is called the F-statistic

• RSSR > RSSU: least squares estimates are obtained by minimizing the RSS. With more restrictions, the minimization is constrained and the resulting RSS is bound to be at least as large as (and typically larger than) the one without any restriction

• The test statistic has an F(q, n − k) distribution under the null

• When q = 1, the result of the F-test is identical to that of the two-sided t-test

• Alternatively, we can write

    F = [(RU² − RR²)/q] / [(1 − RU²)/(n − k)],    (2)

  where RU² is the R² from the unrestricted model and RR² is the R² from the restricted model. Note that RU² ≥ RR². (This form is valid only when the restricted and unrestricted models have the same dependent variable, so that TSS is common to both; see the sketch below.)

Tests Involving Linear Functions of Regression Coefficients

The intuition behind constructing an F-test as in (2) above can be extended to testing hypotheses defined by linear functions of the regression coefficients. We illustrate this with the following example.

• Consider a Cobb-Douglas production function

log Q = β1 + β2 log K + β3 log L + ε

where K and L denote the capital input and labor input, respectively

– We may want to ask whether there are constant returns to scale: H0 : β2 + β3 = 1

– We may want to ask whether H0 : β2 = β3

• Write the model as

Y = β1 + β2 X2 + β3 X3 + ε,

where Y = log Q, X2 = log K, and X3 = log L

• For the first hypothesis, note that under H0 we have β3 = 1 − β2, so

    Y = β1 + β2 X2 + (1 − β2) X3 + ε
    Y − X3 = β1 + β2 (X2 − X3) + ε

Therefore, the F -test is conducted as follows:

– Estimate the restricted model and compute RSSR

Y − X3 = β1 + β2 (X2 − X3 ) + ε

– Estimate the unrestricted model and compute RSSU

Y = β1 + β2 X2 + β3 X3 + ε

– Note that q = 1

– Then,

    F = [(RSSR − RSSU)/1] / [RSSU/(n − 3)]

• For the second hypothesis, note that under H0 (β2 = β3) the model becomes

    Y = β1 + β2 (X2 + X3) + ε

Therefore, the F -test is conducted as follows:

– Estimate the restricted model and compute RSSR

Y = β1 + β2 (X2 + X3 ) + ε

– Estimate the unrestricted model and compute RSSU

Y = β1 + β2 X2 + β3 X3 + ε

– Note that q = 1

– Then,

    F = [(RSSR − RSSU)/1] / [RSSU/(n − 3)]
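A sketch of the constant-returns-to-scale test above on simulated Cobb-Douglas data (coefficient values and variable names are illustrative only):

    import numpy as np
    from scipy import stats

    def rss(X, y):
        b = np.linalg.solve(X.T @ X, X.T @ y)
        return np.sum((y - X @ b)**2)

    rng = np.random.default_rng(4)
    n = 300
    logK = rng.normal(size=n)
    logL = rng.normal(size=n)
    logQ = 0.5 + 0.4 * logK + 0.6 * logL + 0.2 * rng.normal(size=n)   # CRS holds: 0.4 + 0.6 = 1

    # Unrestricted: log Q = b1 + b2 log K + b3 log L + e
    RSS_U = rss(np.column_stack([np.ones(n), logK, logL]), logQ)

    # Restricted under H0 (b2 + b3 = 1): log Q - log L = b1 + b2 (log K - log L) + e
    RSS_R = rss(np.column_stack([np.ones(n), logK - logL]), logQ - logL)

    F = ((RSS_R - RSS_U) / 1) / (RSS_U / (n - 3))        # q = 1 restriction
    print(F, stats.f.sf(F, 1, n - 3))                    # typically a large p-value: do not reject CRS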

7 Multicollinearity

• An important assumption of the multiple regression model is that there is no exact linear relationship among the independent variables. If such an exact relation exists, we say that (exact) multicollinearity exists

• The linear regression coefficient β2 measures the marginal response of the dependent variable to a unit increase in X2 with all other variables held constant. Suppose that X3 = 4X2. Then a unit increase in X2 is always accompanied by a 4-unit increase in X3, and the associated change in Y is β2 + 4β3. There is no way to separate out β2

• When there is high correlation among some set of independent variables, we say there exists near multicollinearity, or a multicollinearity problem

• A typical symptom of a multicollinearity problem is that the standard errors of individual coefficients are quite large while R² is reasonably high.

• Example: note that, for the three-variable (including the constant term) regression model,

    Var(β̂2) = σ² / [Σ(X2i − X̄2)² (1 − r23²)]

  where r23 is the sample correlation coefficient between X2 and X3:

    r23 = Σ(X2i − X̄2)(X3i − X̄3) / [√(Σ(X2i − X̄2)²) √(Σ(X3i − X̄3)²)]

  Var(β̂2) goes to infinity as r23 → 1, i.e., as X2 and X3 become more highly correlated. (This implies that the t-statistic is insignificant and the confidence interval is wide; see the sketch at the end of this section.)

  This may also coexist with a significant F-statistic for the overall significance of the coefficients, i.e., rejecting H0 : β2 = · · · = βk = 0.

  Note that the F-statistic in this case is given by

    F = [R²/(k − 1)] / [(1 − R²)/(n − k)]

  The F-statistic can then be large even if the individual t-statistics are small. That is, under multicollinearity the explanatory variables can be jointly significant even if each of them is individually insignificant.

• In this case, we also observe that dropping one or more variables from the equation lowers the standard errors of the remaining variables while R² changes little

• The existence of multicollinearity (some degree of correlation among the independent variables) is itself not a problem and is indeed essential. Our concern is its degree or magnitude
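A minimal sketch (simulated data) of how the variance formula above blows up as the correlation between X2 and X3 approaches 1; here σ² is set to 1 rather than estimated:

    import numpy as np

    rng = np.random.default_rng(5)
    n, sigma2 = 200, 1.0
    x2 = rng.normal(size=n)

    for rho in [0.0, 0.9, 0.99]:
        # construct x3 with (population) correlation rho with x2
        x3 = rho * x2 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        r23 = np.corrcoef(x2, x3)[0, 1]
        var_b2 = sigma2 / (np.sum((x2 - x2.mean())**2) * (1 - r23**2))
        print(f"r23 = {r23:.3f}   Var(b2_hat) = {var_b2:.5f}")   # grows rapidly as r23 -> 1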

8 Standardized Coefficient

• Note that an elasticity measures the percentage change in the dependent variable in response to a one percent change in an independent variable. Therefore, an elasticity is unit-free while βj is not

• Relying on standardized coefficients is another way to obtain unit-free results. Formally, we define

    β̂j* = β̂j · (sXj / sY)

  as the standardized coefficient estimate, where sXj and sY are the standard deviations of Xj and Y, respectively

• It can be estimated from the regression of standardized variables:

    (Yi − Ȳ)/sY = β2* (X2,i − X̄2)/sX2 + · · · + βk* (Xk,i − X̄k)/sXk + εi*

  where εi* = εi/sY

• A standardized coefficient of 0.7 implies that a one-standard-deviation change in the independent variable leads to a 0.7-standard-deviation change in the dependent variable
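A minimal sketch (simulated data) verifying that scaling the OLS slope by sX/sY reproduces the slope from the regression on standardized variables:

    import numpy as np

    rng = np.random.default_rng(6)
    n = 300
    x2 = rng.normal(0, 2, n)
    y = 1.0 + 0.5 * x2 + rng.normal(size=n)

    X = np.column_stack([np.ones(n), x2])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    b2_std = b[1] * x2.std(ddof=1) / y.std(ddof=1)        # beta2_hat * sX / sY

    # Regression on standardized variables (no intercept needed after centering)
    zx = (x2 - x2.mean()) / x2.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    b2_from_z = np.sum(zx * zy) / np.sum(zx**2)
    print(b2_std, b2_from_z)                              # identical up to rounding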

9 Testing the Functional Form of Regression

• Example: the MWD (MacKinnon-White-Davidson) test

10 Example of Output by Software: STATA

• Consider a regression model of term GPA:

    TermGPAi = β1 + β2 Atndratei + β3 HWratei + β4 SATi + εi

  where TermGPA, Atndrate, HWrate, and SAT denote the term GPA of the i-th student, the percentage of classes attended (out of 32 per semester), the percentage of homework turned in, and the SAT score of the student, respectively.

• STATA output (output table not reproduced here)

• Number of variables = 4 (including the constant)

• Number of observations = 674

• Degrees of freedom = 674 − 4 = 670

• V̂ar(β̂j) = s²/RSSj, where RSSj is the residual sum of squares from an auxiliary regression of Xj on a constant and the other independent variables (i.e., partitioned regression; see the sketch at the end of this section)

• Standard errors are given by se(β̂j) = √V̂ar(β̂j)

• t-statistics are calculated under the null hypothesis H0 : βj = 0 for each j and are given by

    t = β̂j / √V̂ar(β̂j) = β̂j / se(β̂j)

• p-values in the fifth column of the table (P > |t|) are obtained from the t distribution with n − k degrees of freedom

• The definition of the adjusted R-squared is

    R̄² = 1 − [RSS/(n − k)] / [TSS/(n − 1)] = 1 − [Sum of Squared Resid/(n − k)] / (S.D. of dependent variable)²

  while the R-squared is given by

    R² = 1 − RSS/TSS = 1 − Sum of Squared Resid / [(S.D. of dependent variable)² × (n − 1)]

• S.D. of dependent variable: sY = √[Σ(Yi − Ȳ)²/(n − 1)] = √[TSS/(n − 1)]

• S.E. of regression: s = √(s²) = √[RSS/(n − k)]

• The F-statistic is obtained under the null hypothesis H0 : β2 = · · · = βk = 0. The definition of the F-statistic is

    F = [(RSSR − RSSU)/q] / [RSSU/(n − k)] = [(RU² − RR²)/q] / [(1 − RU²)/(n − k)]

  where the subscript 'R' denotes the restricted model, the subscript 'U' denotes the unrestricted model, and q is the number of restrictions. Under H0, RR² = 0 and q = k − 1, which gives the usual formula

    F = [(TSS − RSS)/(k − 1)] / [RSS/(n − k)] = [ESS/(k − 1)] / [RSS/(n − k)] = [R²/(k − 1)] / [(1 − R²)/(n − k)]

• The p-value for the F-statistic (Prob > F) is obtained from the F distribution with (k − 1, n − k) degrees of freedom
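A minimal sketch (simulated data standing in for the GPA regressors, which are not reproduced here) verifying that s²/RSSj from the auxiliary (partitioned) regression equals the usual variance estimate taken from the diagonal of s²(X'X)⁻¹:

    import numpy as np

    def ols(X, y):
        return np.linalg.solve(X.T @ X, X.T @ y)

    rng = np.random.default_rng(7)
    n, k = 674, 4                                      # same n and k as the GPA example
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # made-up regressors
    y = X @ np.array([1.0, 0.01, 0.01, 0.002]) + rng.normal(0, 0.5, n)

    b = ols(X, y)
    e = y - X @ b
    s2 = np.sum(e**2) / (n - k)                        # (S.E. of regression)^2

    j = 1                                              # pick any slope coefficient
    others = np.delete(X, j, axis=1)
    ej = X[:, j] - others @ ols(others, X[:, j])       # residual of the auxiliary regression
    se_bj_aux = np.sqrt(s2 / np.sum(ej**2))            # s^2 / RSS_j route

    se_bj_mat = np.sqrt(s2 * np.linalg.inv(X.T @ X)[j, j])   # (X'X)^(-1) diagonal route
    print(se_bj_aux, se_bj_mat)                        # identical up to rounding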

