Lecture Note 4
Multiple Regression Model
1 Model
Yi = β1 + β2 X2i + · · · + βk Xki + εi,  i = 1, . . . , n

where
- Y is the dependent variable
- X1 , . . . , Xk are the independent variables
- ε is the error term
- β1 is the intercept (note that we implicitly define X1i = 1)
- βj measures the marginal response of Y to a unit increase in Xj, holding the other variables constant
2 Example

log(wage) = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure
n = 526, R² = 0.316
Critical values: The 5% two-sided critical value is 1.960 and the 1% two-sided critical
value is 2.576
t_exper = 0.0041/0.0017 ≈ 2.41 > tα/2 (522) = 1.960 with α = 0.05, and thus β̂_exper is statistically significant at the 5% significance level (though not at the 1% level, since 2.41 < 2.576)
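As a quick check, this computation can be reproduced in a few lines of Python (a minimal sketch; the coefficient and standard error are taken from the fitted equation above, and scipy is assumed to be available):

from scipy import stats

# Figures taken from the fitted wage equation above
beta_exper = 0.0041   # coefficient on exper
se_exper = 0.0017     # its standard error
df = 522              # n - k = 526 - 4

t_stat = beta_exper / se_exper                 # ~ 2.41
crit_5 = stats.t.ppf(1 - 0.05 / 2, df)         # ~ 1.96
crit_1 = stats.t.ppf(1 - 0.01 / 2, df)         # ~ 2.59
p_val = 2 * stats.t.sf(abs(t_stat), df)        # two-sided p-value

print(t_stat > crit_5, t_stat > crit_1, p_val) # True, False, ~0.016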
3 Is the Linearity Assumption Restrictive?
Not especially: many models are inherently linear, in the sense that they can be expressed in a form that is linear in the parameters by an appropriate transformation of the variables. On the other hand, there do exist inherently nonlinear models for which no such transformation exists, as the examples below show.
Obvious examples (each is linear in parameters, possibly after transforming variables):

Y = β1 + β2 X2 + β3 X2² + ε
ln Y = β1 + β2 ln X2 + β3 X3 + ε
Y = β1 + β2 log X2 + ε
Y = β1 + β2 X2 + β3 X3 + β4 X2 X3 + ε

Nontrivial examples:

Q = A·L^α·K^β ⟺ ln Q = ln A + α ln L + β ln K
Y = γ1·X2^γ2·X3^γ3 + ε

The first can be linearized by taking logs; the second is inherently nonlinear, because the additive error term prevents a log transformation.
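To make the transformation idea concrete, the following is a minimal numpy sketch, with simulated data and hypothetical parameter values, that estimates the Cobb-Douglas model by OLS after taking logs:

import numpy as np

rng = np.random.default_rng(0)
n = 200
L = rng.uniform(1, 10, n)   # labor input (simulated)
K = rng.uniform(1, 10, n)   # capital input (simulated)
Q = 2.0 * L**0.6 * K**0.3 * np.exp(rng.normal(0, 0.1, n))  # multiplicative error

# After taking logs the model is linear in parameters:
# ln Q = ln A + alpha ln L + beta ln K + error
X = np.column_stack([np.ones(n), np.log(L), np.log(K)])
coef, *_ = np.linalg.lstsq(X, np.log(Q), rcond=None)
print(coef)   # roughly [ln 2 ~ 0.69, 0.6, 0.3]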
4 Least Squares Estimation

Objective: choose β̂1 , β̂2 , . . . , β̂k to minimize

RSS = Σi (Yi − β̂1 − β̂2 X2i − · · · − β̂k Xki)²
The first-order conditions are

∂/∂β̂1 Σi (Yi − β̂1 − β̂2 X2i − · · · − β̂k Xki)² = Σi 2ei · ∂/∂β̂1 (Yi − β̂1 − β̂2 X2i − · · · − β̂k Xki) = −2 Σi ei = 0

∂/∂β̂2 Σi (Yi − β̂1 − β̂2 X2i − · · · − β̂k Xki)² = Σi 2ei · ∂/∂β̂2 (Yi − β̂1 − β̂2 X2i − · · · − β̂k Xki) = −2 Σi ei X2i = 0

⋮

∂/∂β̂k Σi (Yi − β̂1 − β̂2 X2i − · · · − β̂k Xki)² = Σi 2ei · ∂/∂β̂k (Yi − β̂1 − β̂2 X2i − · · · − β̂k Xki) = −2 Σi ei Xki = 0

where ei = Yi − β̂1 − β̂2 X2i − · · · − β̂k Xki denotes the i-th residual
– The OLS estimator can be obtained from this principle
– Classical assumptions: E[εi] = 0, E[εi X2i] = 0, . . . , E[εi Xki] = 0 imply that

E[εi] = 0 ⟹ Ê[ei] = 0 ⟹ (1/n) Σ ei = 0
E[εi X2i] = 0 ⟹ Ê[ei X2i] = 0 ⟹ (1/n) Σ ei X2i = 0
⋮
E[εi Xki] = 0 ⟹ Ê[ei Xki] = 0 ⟹ (1/n) Σ ei Xki = 0

where Ê[·] denotes the sample analogue of expectation, i.e. the sample mean

The OLS estimator is thus obtained by mimicking, in the sample, the moment conditions that the assumptions impose on the population, as the sketch below illustrates
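The following is a minimal numpy sketch of this principle (simulated data; all names are illustrative): it solves the normal equations and checks that the residuals satisfy the sample moment conditions above.

import numpy as np

rng = np.random.default_rng(1)
n = 500
X2, X3 = rng.normal(size=(2, n))
Y = 1.0 + 2.0 * X2 - 1.5 * X3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), X2, X3])     # X1i = 1 (intercept)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)  # solves the normal equations
e = Y - X @ beta_hat                          # residuals

# Sample analogues of the moment conditions (all ~ 0 by construction):
print(e.mean())                               # (1/n) * sum of e_i
print((e * X2).mean())                        # (1/n) * sum of e_i X2i
print((e * X3).mean())                        # (1/n) * sum of e_i X3i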
5 Regression Statistics
Observe that se(β̂j) is the square root of the j-th diagonal element of the estimated variance-covariance matrix of β̂. The computer reports this as the standard error of β̂j
Under the assumption that the errors are independently and normally distributed, we can apply the t-test and construct t-distribution-based confidence intervals, because

(β̂j − βj) / se(β̂j) ∼ t(n − k)

Note that the degrees of freedom equal the denominator used in the computation of s² = RSS/(n − k)
The most commonly applied null hypothesis takes the form H0 : βj = 0 (does the j-th independent variable explain the dependent variable?). Observe that the t-statistic for this case should be

t = β̂j / se(β̂j) ∼ t(n − k)

The computer usually reports this ratio for each j; it is what is called the 't-statistic' in the usual computer output. Note that, if your null hypothesis does not take the default form H0 : βj = 0, then the t-statistic should be calculated by hand or by using a special option in your software
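A numpy sketch of these computations (simulated data; the non-default null value used at the end is just an illustration):

import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta_hat
s2 = e @ e / (n - k)                    # s^2 = RSS / (n - k)
var_hat = s2 * np.linalg.inv(X.T @ X)   # estimated var-cov matrix of beta_hat
se = np.sqrt(np.diag(var_hat))          # se(beta_j) = sqrt of j-th diagonal

t_default = beta_hat / se               # t-statistics for H0: beta_j = 0
t_custom = (beta_hat[1] - 0.5) / se[1]  # by hand, for the non-default H0: beta_2 = 0.5
print(t_default, t_custom)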
TSS = ESS + RSS

We define R² as before:

R² = ESS/TSS
Adding independent variables never decreases R², so a large R² by itself does not indicate a good model. The adjusted R² corrects this problem. It is formally defined by

R̄² = 1 − (1 − R²) · (n − 1)/(n − k)

Observe that (holding R² fixed)

k ↑ ⟹ n − k ↓ ⟹ (n − 1)/(n − k) ↑ ⟹ (1 − R²) · (n − 1)/(n − k) ↑ ⟹ 1 − (1 − R²) · (n − 1)/(n − k) ↓
Therefore, R̄² penalizes a large number of independent variables. Note that as more independent variables are added, R² itself increases, but so does k. Therefore, depending on the magnitude of the increase in R² (i.e., the quality of the additional regressors), R̄² may increase or decrease, as the simulation below illustrates.
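A small simulation (hypothetical data) illustrating the point: adding an irrelevant regressor raises R² mechanically but typically lowers R̄².

import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

def r2_and_adj(X, y):
    # R^2 and adjusted R^2 for a regression with an intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    r2 = 1 - (e @ e) / np.sum((y - y.mean()) ** 2)
    n_obs, k = X.shape
    return r2, 1 - (1 - r2) * (n_obs - 1) / (n_obs - k)

X_small = np.column_stack([np.ones(n), x])
X_big = np.column_stack([X_small, rng.normal(size=n)])  # add an irrelevant regressor
print(r2_and_adj(X_small, y))
print(r2_and_adj(X_big, y))   # R^2 rises mechanically; adjusted R^2 typically falls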
To test the overall significance of the regression, consider

H0 : β2 = · · · = βk = 0

against

H1 : βj ≠ 0 for some j ∈ {2, . . . , k}

The test statistic

(R²/(1 − R²)) · ((n − k)/(k − 1))

follows an F(k − 1, n − k) distribution under the null. This is the 'F-statistic' reported by the computer.
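For instance, the overall F-statistic of the wage equation above can be computed directly from its reported R² (a sketch assuming scipy is available):

from scipy import stats

# Overall F-test using the wage equation figures above: R2 = 0.316, n = 526, k = 4
r2, n, k = 0.316, 526, 4
F = (r2 / (1 - r2)) * ((n - k) / (k - 1))
p_val = stats.f.sf(F, k - 1, n - k)   # Prob > F
print(F, p_val)                       # F ~ 80.4, p-value ~ 0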
6 F-Test

We use F-tests for hypotheses involving more than one parameter, i.e. joint tests on several regression coefficients.
Consider the model

Yi = β1 + β2 X2,i + · · · + βk Xk,i + εi

and the null hypothesis

H0 : βk−q+1 = · · · = βk = 0    (1)
Under the null, βk−q+1 = 0, βk−q+2 = 0, · · · , βk = 0, so the model simplifies to the restricted model

Yi = β1 + β2 X2,i + · · · + βk−q Xk−q,i + εi
Note that the subscripts of the β's above are just labels, so the hypotheses in (1) nest, for example,

H0 : β2 = β4 = β6 = 0
H0 : β2 = β5 = β7 = β11 = 0
To carry out the test, estimate the unrestricted model

Yi = β1 + β2 X2,i + · · · + βk Xk,i + εi

and compute its residual sum of squares RSSU; then estimate the restricted model, with the q restrictions in (1) imposed, and compute its residual sum of squares RSSR. The test statistic

F = ((RSSR − RSSU)/q) / (RSSU/(n − k))

has an F(q, n − k) distribution under the null. The above test is called the F-test, and the statistic is called the F-statistic.

Note that RSSR > RSSU: least squares estimates are obtained by minimizing the RSS, and with more restrictions the minimization is constrained, so the resulting RSS is bound to be larger than the one without any restrictions.

Equivalently, in terms of R²,

F = ((RU² − RR²)/q) / ((1 − RU²)/(n − k)),    (2)

where RU² is the R² from the unrestricted model and RR² is the R² from the restricted model. Note that RU² > RR².
The intuition behind the construction of the F-test in (2) can be extended to testing hypotheses defined by linear functions of the regression coefficients. We illustrate this with the following example.
Consider a Cobb-Douglas production function Q = A·L^β2·K^β3, where K and L denote the capital input and labor input, respectively. Taking logs and writing Y = ln Q, X2 = ln L, X3 = ln K gives the unrestricted model

Y = β1 + β2 X2 + β3 X3 + ε,

and constant returns to scale corresponds to the restriction H0 : β2 + β3 = 1. Substituting β3 = 1 − β2,

Y = β1 + β2 X2 + (1 − β2)X3 + ε

or, after rearranging, the restricted model

Y − X3 = β1 + β2 (X2 − X3) + ε
– Estimate the restricted model Y − X3 = β1 + β2 (X2 − X3) + ε and compute RSSR
– Estimate the unrestricted model Y = β1 + β2 X2 + β3 X3 + ε and compute RSSU
– Note that q = 1
– Then, F = ((RSSR − RSSU)/1) / (RSSU/(n − 3)) ∼ F(1, n − 3) under the null

As another example, consider testing H0 : β2 = β3 in the same model; imposing the restriction gives the restricted model Y = β1 + β2 (X2 + X3) + ε, and the test proceeds in the same way (see the sketch after this list):

– Estimate the restricted model Y = β1 + β2 (X2 + X3) + ε and compute RSSR
– Estimate the unrestricted model Y = β1 + β2 X2 + β3 X3 + ε and compute RSSU
– Note that q = 1
– Then, F = ((RSSR − RSSU)/1) / (RSSU/(n − 3)) ∼ F(1, n − 3) under the null
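A numpy sketch of the first test, using simulated data in which the restriction β2 + β3 = 1 holds by construction, so the F-statistic should be small:

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 300
X2, X3 = rng.normal(size=(2, n))
Y = 0.5 + 0.7 * X2 + 0.3 * X3 + rng.normal(size=n)  # beta2 + beta3 = 1 by construction

def rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

# Unrestricted: Y = b1 + b2 X2 + b3 X3 + e
RSS_U = rss(np.column_stack([np.ones(n), X2, X3]), Y)
# Restricted (b2 + b3 = 1): Y - X3 = b1 + b2 (X2 - X3) + e
RSS_R = rss(np.column_stack([np.ones(n), X2 - X3]), Y - X3)

q, k = 1, 3
F = ((RSS_R - RSS_U) / q) / (RSS_U / (n - k))
print(F, stats.f.sf(F, q, n - k))   # H0 is true here, so F should be small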
7 Multicollinearity
The linear regression coefficient β2 measures the marginal response of the dependent variable to a unit increase in X2 with all other variables held constant. Suppose that X3 = 4X2. Then a unit increase in X2 is always accompanied by a 4-unit increase in X3, and together they are associated with a change in Y of β2 + 4β3. There is no way to separate out β2; this is perfect multicollinearity.

When there is high correlation among some set of independent variables, we say there exists near multicollinearity, or a multicollinearity problem.
Example: for the three-variable (including the constant term) regression model,

Var(β̂2) = σ² / [Σ(X2i − X̄2)² · (1 − r23²)]

where r23 is the sample correlation coefficient between X2 and X3:

r23 = Σ(X2i − X̄2)(X3i − X̄3) / [√(Σ(X2i − X̄2)²) · √(Σ(X3i − X̄3)²)]

Var(β̂2) goes to infinity as r23 → 1, i.e. as X2 and X3 become more correlated (this implies that the t-statistic tends to be insignificant and the confidence interval wide)
Recall the F-statistic for overall significance:

F = (R²/(k − 1)) / ((1 − R²)/(n − k))

This F-statistic can be large even if the individual t-statistics are small. That is, under multicollinearity, the explanatory variables can be jointly significant even if each of them is individually insignificant. In this case, we also observe that dropping one or more variables from the equation lowers the standard errors of the remaining variables while R² changes little.
The existence of multicollinearity (some degree of correlation among the independent variables) is not itself a problem and is indeed essential; our concern is its degree or magnitude, as the simulation below illustrates.
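A simulation sketch (hypothetical data) of the variance formula above: as the correlation between X2 and X3 approaches one, the standard error of β̂2 blows up.

import numpy as np

rng = np.random.default_rng(5)
n = 200

for r in [0.0, 0.9, 0.99, 0.999]:
    X2 = rng.normal(size=n)
    X3 = r * X2 + np.sqrt(1 - r**2) * rng.normal(size=n)  # corr(X2, X3) ~ r
    Y = 1.0 + X2 + X3 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), X2, X3])
    e = Y - X @ np.linalg.solve(X.T @ X, X.T @ Y)
    s2 = e @ e / (n - 3)
    se_b2 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    print(r, se_b2)   # se(beta2_hat) grows as r -> 1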
8 Standardized Coefficient
Note that elasticity measures the percentage change of the dependent variable in response to a one percent change in an independent variable. Therefore, elasticity is unit-free while βj is not.

A standardized coefficient is likewise unit-free: it is the slope from the regression run on standardized (z-scored) variables, i.e. β̂j scaled by sXj/sY. A standardized coefficient of 0.7 implies that a one-standard-deviation change in the independent variable leads to a 0.7-standard-deviation change in the dependent variable.
Example: consider a regression of TermGPA on Atndrate, HWrate, and SAT, where TermGPA, Atndrate, HWrate, and SAT denote the term GPA of the i-th student, the percentage of classes attended (out of 32 per semester), the percentage of homework turned in, and the SAT score of the student, respectively.
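Since the original data set is not reproduced here, the following sketch uses simulated data to show how a standardized coefficient is computed, both by z-scoring and by rescaling the raw slope:

import numpy as np

rng = np.random.default_rng(6)
n = 150
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)

# Standardized coefficient: regress the z-scored y on the z-scored x ...
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
b_std = np.linalg.lstsq(np.column_stack([np.ones(n), zx]), zy, rcond=None)[0][1]

# ... or equivalently rescale the raw slope by s_x / s_y
slope = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0][1]
print(b_std, slope * x.std(ddof=1) / y.std(ddof=1))   # the two agree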
STATA output

– V̂ar(β̂j) = s²/RSSj, where RSSj is the residual sum of squares from an auxiliary regression of Xj on a constant and the other independent variables (i.e., partitioned regression)
– Standard errors are given by √V̂ar(β̂j)
– t-statistics are calculated under the null hypothesis H0 : βj = 0 for each j and are given by β̂j / √V̂ar(β̂j)
– p-values in the fifth column of the table (P > |t|) are obtained from the t distribution with n − k degrees of freedom
– S.D. of dependent variable: sY = √(Σ(Yi − Ȳ)²/(n − 1)) = √(TSS/(n − 1))
– S.E. of regression: √s² = √(RSS/(n − k))
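The auxiliary-regression formula for V̂ar(β̂j) can be verified numerically against the j-th diagonal element of s²(X′X)⁻¹ (a sketch with simulated data):

import numpy as np

rng = np.random.default_rng(7)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

beta = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta
s2 = e @ e / (n - k)

j = 1                                        # check the first slope coefficient
others = X[:, [0, 2]]                        # constant and the remaining regressor
g, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
RSS_j = np.sum((X[:, j] - others @ g) ** 2)  # RSS of the auxiliary regression

print(s2 / RSS_j)                            # auxiliary-regression formula
print(s2 * np.linalg.inv(X.T @ X)[j, j])     # j-th diagonal of s^2 (X'X)^{-1}; identical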
The F-statistic is obtained under the null hypothesis H0 : β2 = · · · = βk = 0. The definition of the F-statistic is

F = ((RSSR − RSSU)/q) / (RSSU/(n − k)) = ((RU² − RR²)/q) / ((1 − RU²)/(n − k))

where the subscript 'R' denotes the restricted model, the subscript 'U' denotes the unrestricted model, and q is the number of restrictions. Under H0, RR² = 0 and q = k − 1, which gives the usual formula

F = (R²/(k − 1)) / ((1 − R²)/(n − k))

The p-value for the F-statistic (Prob > F) is obtained from the F distribution with (k − 1, n − k) degrees of freedom