
Econometrics

Dong Xuan Bach

Data Science program - NEU

2023/2024



Multiple Regression Model



The need for multiplicity?

Usually, studying an economic relationship requires many independent variables.
More flexible and better-suited functional forms.
Better regression and prediction.



Model

Model with k explanatory variables

In Population (PRF):
    E(yi) = β0 + β1 xi1 + · · · + βk xik
    yi = β0 + β1 xi1 + · · · + βk xik + ui

In Sample (SRF):
    ŷi = β̂0 + β̂1 xi1 + · · · + β̂k xik
    yi = β̂0 + β̂1 xi1 + · · · + β̂k xik + ei

Intercept: β0 = E(y | x1 = 0, . . . , xk = 0)
Slope: βj = ∂E(y)/∂xj
If β1 = · · · = βk = 0, the model is overall insignificant.



Matrix form

For each observation i = 1, . . . , n:

    yi = β0 + β1 xi1 + · · · + βk xik + ui

In matrix form, with y = (y1, . . . , yn)′, u = (u1, . . . , un)′, β = (β0, β1, . . . , βk)′, and X the n × (k + 1) matrix whose i-th row is (1, xi1, . . . , xik):

    y = Xβ + u
    ŷ = X β̂
    y = X β̂ + e



Interpret the coefficients

How do we interpret the equations below?

    \widehat{colGPA} = 1.29 + 0.453 hsGPA + 0.0094 ACT
    \widehat{colGPA} = 2.40 + 0.0271 ACT

Ceteris paribus interpretations.
Changing more than one independent variable simultaneously.



OLS estimation



OLS estimation
Find β̂j, j = 0, . . . , k, that minimize

    RSS = Σ ei² = Σ (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik)²    (sums over i = 1, . . . , n)

In matrix form,

    min over β̂ of S(β̂) = ∥y − X β̂∥²

We have S(β̂) = (y − X β̂)′(y − X β̂) = y′y − 2 β̂′X′y + β̂′X′X β̂. Taking the first-order condition, one gets

    X′X β̂ = X′y

If X has full column rank (no perfect multicollinearity among x1, . . . , xk), then

    β̂OLS = (X′X)⁻¹ X′y    (1)



OLS fitted values

The sample average of the residuals is zero, and so ȳ equals the average of the fitted values ŷi.
The sample covariance between each independent variable and the OLS residuals is zero. Consequently, the sample covariance between the OLS fitted values and the OLS residuals is zero.
The point (x̄1, x̄2, . . . , x̄k, ȳ) is always on the OLS regression line.



Partialling Out interpretation

We focus on β̂1. We have

    β̂1 = ( Σ r̂i1 yi ) / ( Σ r̂i1² )    (sums over i = 1, . . . , n)

The residuals r̂i1 come from the regression of x1 on x2, . . . , xk.
So we can then run a simple regression of y on r̂1 to obtain β̂1.
β̂1 then measures the effect of x1 on y after x2, . . . , xk have been partialled or netted out, as the sketch below illustrates.
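A minimal R sketch of this partialling-out result; the data are simulated, and all names and parameter values are illustrative.

set.seed(1)
n  <- 100
x2 <- rnorm(n); x3 <- rnorm(n)
x1 <- 0.5*x2 - 0.3*x3 + rnorm(n)
y  <- 1 + 2*x1 + x2 - x3 + rnorm(n)
r1 <- resid(lm(x1 ~ x2 + x3))       # residuals of x1 on the other regressors
coef(lm(y ~ r1))["r1"]              # simple regression of y on the residuals
coef(lm(y ~ x1 + x2 + x3))["x1"]    # identical to the multiple-regression slope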



Simple and Multiple Regression Estimates

Consider k = 2. In the simple regression of y on x1, we have ỹ = β̃0 + β̃1 x1. In the multiple regression, ŷ = β̂0 + β̂1 x1 + β̂2 x2.
We have β̃1 = β̂1 + β̂2 δ̃1, where δ̃1 is the slope coefficient from the simple regression of x2 on x1.
The simple and multiple regression estimates are equal if
    the partial effect of x2 on ŷ is zero, i.e., β̂2 = 0, or
    x1 and x2 are uncorrelated, i.e., δ̃1 = 0.
Simple and multiple regression estimates are almost never identical, but we can use the formula above to characterize why they might be either very different or quite similar (see the sketch below).
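A minimal R sketch of the identity β̃1 = β̂1 + β̂2 δ̃1, again on simulated, illustrative data.

set.seed(2)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.6*x1 + rnorm(n)                 # x1 and x2 are correlated
y  <- 1 + 2*x1 + 3*x2 + rnorm(n)
b_tilde1 <- coef(lm(y ~ x1))["x1"]      # simple regression slope
b_hat    <- coef(lm(y ~ x1 + x2))       # multiple regression slopes
d_tilde1 <- coef(lm(x2 ~ x1))["x1"]     # slope of x2 on x1
b_tilde1                                # equals the line below, exactly
b_hat["x1"] + b_hat["x2"]*d_tilde1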



Geometric interpretation of OLS

We have

    0 = X′y − X′X β̂ = X′e

which means that e is perpendicular to every column vector of the matrix X, i.e., to the vector space spanned by the columns of X.
The condition X′e = 0 is called the system of normal equations.
Notice that ŷ = X β̂ = PX y, where PX = X(X′X)⁻¹X′ is the orthogonal projector onto the vector space spanned by X.
Letting MX = I − PX be the orthogonal projector onto the orthogonal complement of that space, e = MX y.
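A small sketch of PX, MX, and the normal equations; the four observations are borrowed from the wage example later in the deck.

X <- cbind(1, c(1, 2, 2, 3), c(13, 12, 16, 11))  # intercept, exp, edu (4 obs.)
y <- c(6, 6, 12, 6)                               # wage
P <- X %*% solve(t(X) %*% X) %*% t(X)             # projector onto col(X)
M <- diag(4) - P                                  # projector onto the complement
e <- M %*% y                                      # residual vector e = M_X y
round(t(X) %*% e, 10)                             # normal equations: X'e = 0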



Estimator of σ

In Var(β̂), the variance of the random error, σ², is unknown; it is estimated by

    s² = e′e / (n − (k + 1)) = Σ ei² / (n − k − 1)

Estimated variance-covariance matrix:

    V̂ar(β̂) = s² (X′X)⁻¹

Standard error of an estimated coefficient:

    Se(β̂j) = √V̂ar(β̂j)



Interpret the coefficients

How do we compare the results below?

    \widehat{colGPA} = 1.29 + 0.453 hsGPA + 0.0094 ACT
    \widehat{colGPA} = 2.40 + 0.0271 ACT



Goodness-of-fit



Sum of squares

Let ȳ = Σ yi / n.

Total sum of squares: TSS = Σ (yi − ȳ)², df = n − 1.
Explained/Regression sum of squares: ESS = Σ (ŷi − ȳ)², df = k.
Residual sum of squares: RSS = Σ (yi − ŷi)² = Σ ei², df = n − 1 − k.
TSS = ESS + RSS (all sums over i = 1, . . . , n).



Goodness-of-fit

R-squared is the squared correlation between y and ŷ:

    R² = ESS/TSS = 1 − RSS/TSS

It is interpreted as the proportion of the sample variation in y that is explained by the OLS regression line.
Adding a new explanatory variable, even an irrelevant one, artificially increases R².
Adjusted R-squared:

    Ra² = 1 − (1 − R²)(n − 1)/(n − k − 1) = 1 − [RSS/(n − k − 1)] / [TSS/(n − 1)] = 1 − s²/sy²

For models with different numbers of explanatory variables, only Ra² can be compared (see the sketch below).
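A quick sketch, on simulated data, of how R² and Ra² react when an irrelevant regressor is added.

set.seed(3)
n     <- 30
x1    <- rnorm(n)
noise <- rnorm(n)                          # irrelevant variable
y     <- 1 + 2*x1 + rnorm(n)
s1 <- summary(lm(y ~ x1))
s2 <- summary(lm(y ~ x1 + noise))
c(s1$r.squared,     s2$r.squared)          # R^2 never decreases
c(s1$adj.r.squared, s2$adj.r.squared)      # adjusted R^2 typically falls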



Remark

If there is no constant in the model, R² has no meaning, because the way it is computed requires a constant term.
R² and adjusted R² are valid only for comparing models with the same dependent variable, so they are inappropriate for comparing two models with y and log(y) as the dependent variable.
The (adjusted) R² is not enough to assess the relevance of a regression: we will need statistical tests.



OLS properties



Gauss - Markov assumptions

Gauss - Markov assumptions

1. (Linearity) y = β0 + β1 x1 + · · · + βk xk + u
2. (Zero mean) E(u) = 0
3. (Random sampling) We have a random sample of n observations, {(xi1, xi2, . . . , xik, yi) : i = 1, . . . , n}.
4. (No perfect collinearity) Rank(X) = k + 1.
5. (Homoskedasticity) Var(ui) = σ² for all i = 1, . . . , n.
6. (No autocorrelation) Cov(ui, uj) = 0 for all i ≠ j.
7. (Normality) u ∼ N(0, σ²I).



Properties of OLS estimator

OLS estimator
Under Assumptions 1-4, the OLS estimator is unbiased: E(β̂OLS) = β.
Under Assumptions 1-6, Var(β̂OLS) = σ² (X′X)⁻¹.

Moreover,

    Var(β̂j,OLS) = σ² / [TSSj (1 − Rj²)]

where TSSj = Σi (xij − x̄j)² is the total sample variation in xj, and Rj² is the R-squared from regressing xj on all other independent variables (including an intercept). A numerical check follows.
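A check of this variance formula on simulated data; vcov() extracts the estimated covariance matrix from lm(), and the identity holds exactly once σ² is replaced by s².

set.seed(4)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.7*x1 + rnorm(n)
y  <- 1 + x1 + x2 + rnorm(n)
fit  <- lm(y ~ x1 + x2)
s2   <- summary(fit)$sigma^2               # s^2, the estimate of sigma^2
TSS1 <- sum((x1 - mean(x1))^2)             # total sample variation in x1
R1sq <- summary(lm(x1 ~ x2))$r.squared     # R^2 of x1 on the other regressor
s2 / (TSS1 * (1 - R1sq))                   # the formula, with sigma^2 -> s^2
vcov(fit)["x1", "x1"]                      # the same value from lm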



Properties of OLS estimator

BLUE
Under Assumptions 1-6, the OLS estimator β̂OLS is the best linear unbiased estimator (BLUE) of β.

As n → ∞, β̂OLS converges in probability to β (consistency). Moreover, under Assumptions 1-6, the estimators are asymptotically normal.

MVUE
Under Assumptions 1-7, the OLS estimator β̂OLS is also the minimum variance unbiased estimator (MVUE) of β.



Including Irrelevant Variables

One (or more) of the independent variables is included in the model even though it has no partial effect on y in the population.
Suppose we specify the model as

    y = β0 + β1 x1 + β2 x2 + β3 x3 + u

and this model satisfies Assumptions 1-4. However, x3 has no effect on y after x1 and x2 have been controlled for: β3 = 0.
There is no effect on the unbiasedness of any coefficient.
However, if x1 and x3 are highly correlated, then R1² is high, which leads to a large variance of β̂1.



Omitted Variables

For example, suppose we should regress y on x1 and x2, but instead we regress y on x1 only. Then the coefficient of x1 is generally biased.
The omitted variable bias is β2 δ̃1, where δ̃1 is the slope coefficient from the simple regression of x2 on x1.
The direction of the bias depends on the signs of β2 and δ̃1, as the simulation below illustrates.
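A Monte Carlo sketch of the bias; the true values β1 = 2, β2 = 3, and δ1 = 0.8 are illustrative choices, so the expected bias is β2 δ1 = 2.4.

set.seed(5)
err <- replicate(1000, {
  n  <- 100
  x1 <- rnorm(n)
  x2 <- 0.8*x1 + rnorm(n)          # population slope of x2 on x1 is 0.8
  y  <- 1 + 2*x1 + 3*x2 + rnorm(n)
  coef(lm(y ~ x1))["x1"] - 2       # estimation error of the short regression
})
mean(err)                          # close to beta2 * delta1 = 3 * 0.8 = 2.4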



Omitted Variables

With more than two independent variables, this is more problematic. For example, suppose the population model

    y = β0 + β1 x1 + β2 x2 + β3 x3 + u

satisfies Assumptions 1-4, but we omit x3 and estimate the model as

    y = β̃0 + β̃1 x1 + β̃2 x2 + ũ

Suppose that x1 is correlated with x3.
Clearly, β̃1 is probably biased (for the same reason as before).
Moreover, β̃2 is also biased, even if x2 is uncorrelated with x3.
It is usually difficult to obtain the direction of the bias in β̃1 and β̃2.
Nevertheless, if we assume that x1 and x2 are uncorrelated, then we can study the direction of the bias.



Omitted Variables

Now we compare two estimators of β1. One comes from

    ŷ = β̂0 + β̂1 x1 + β̂2 x2

and the other from

    ỹ = β̃0 + β̃1 x1

When β2 ≠ 0, β̃1 is biased, β̂1 is unbiased, and Var(β̃1) < Var(β̂1).
When β2 = 0, both β̃1 and β̂1 are unbiased, and Var(β̃1) < Var(β̂1).
Which should we choose, β̃1 or β̂1?



Inference statistics



Inference with T-distribution

We know the distribution of each β̂j, but one quantity is still unknown: σ.
The unbiased estimator of σ² in the general multiple regression case is

    σ̂² = RSS / (n − k − 1)

The standard error of β̂j is

    Se(β̂j) = σ̂ / √[TSSj (1 − Rj²)]

Using what we know about the distributions of β̂j (normal) and σ̂² (χ²), we get:

    (β̂j − βj) / Se(β̂j) ∼ t(n−k−1)



Inference with T-distribution

The statistic is t = (β̂j − βj*) / Se(β̂j).

    Hypothesis pair                     Reject H0 if              P-value
    H0: βj = βj*  vs  H1: βj ≠ βj*      |t| > t(n−k−1)α/2         2 P(T > |tobs|)
    H0: βj = βj*  vs  H1: βj > βj*      t > t(n−k−1)α             P(T > tobs)
    H0: βj = βj*  vs  H1: βj < βj*      t < −t(n−k−1)α            P(T < tobs)

Important t-test
H0: βj = 0 vs H1: βj ≠ 0, for j = 1, . . . , k.
If |t| = |β̂j| / Se(β̂j) > t(n−k−1)α/2, reject H0: the coefficient is significant.


Inference of Coefficients

Confidence interval for a single coefficient:

    β̂j ± t(n−k−1)α/2 Se(β̂j)

Inference on two coefficients, say β1 ± β2:
Testing H0: β1 ± β2 = β*:

    t = [(β̂1 ± β̂2) − β*] / Se(β̂1 ± β̂2)

Confidence interval: (β̂1 ± β̂2) ± t(n−k−1)α/2 Se(β̂1 ± β̂2), where

    Se(β̂1 ± β̂2) = √[Se²(β̂1) + Se²(β̂2) ± 2 Cov(β̂1, β̂2)]

A sketch of this computation in R follows.
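The sketch uses the 12-employee wage data from the examples later in the deck; the vectors are repeated here so the block runs on its own.

exp  <- c(1,2,2,3,4,5,7,10,10,12,15,16)
edu  <- c(13,12,16,11,15,15,10,15,13,11,13,15)
male <- c(1,1,0,0,1,0,1,0,0,1,1,0)
wage <- c(6,6,12,6,11,8,8,10,11,10,15,13)
fit <- lm(wage ~ exp + edu + male)
V   <- vcov(fit)                                   # estimated Var-Cov matrix
d   <- coef(fit)["exp"] - coef(fit)["edu"]         # beta1_hat - beta2_hat
se  <- sqrt(V["exp","exp"] + V["edu","edu"] - 2*V["exp","edu"])
d/se                                               # t statistic, df = 8
d + c(-1, 1)*qt(0.975, df = 8)*se                  # 95% confidence interval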



The test procedure : an example

Consider the following model, estimated over N individuals:

    income = a + b × height + c × education

Suppose that the estimated parameter b̂ is close to zero.
We thus infer that the variable height could be irrelevant: the correlation between income and height could (should) be zero.
The "true" b should be zero.
But even if that is the case, it is very unlikely that we get exactly b̂ = 0 (because of sampling variation).
Given the computed b̂, there should be a way to assess whether the "true" b is in fact zero or not.



The test procedure : an example

Let's call H0 the hypothesis b = 0, and H1 the hypothesis b ≠ 0.
Should we consider H0 as true?
We know that for this model, (b̂ − b)/Se(b̂) ∼ t(N−3).
Is the latter still plausible if we take H0 for granted?
Taking H0 for granted means that we assume b = 0, so the statistic becomes t = b̂ / Se(b̂).
If, under H0, we find this value t to be unlikely to belong to a t(N−3) distribution, then we will say that H0 was wrong.
Rejecting H0 ⟺ parameter b is significant.
Not rejecting H0 ⟺ parameter b is not significant.



Example

Regression results for a sample of 12 employees, in which wage depends on experience (exp: years), education (edu: years), and a male dummy (standard errors in parentheses):

    \widehat{wage}_i = −4.9 + 0.41 expi + 0.83 edui + 1.2 malei,   R² = 0.7575
                      (4.38)  (0.098)    (0.299)     (1.125)

(a) Interpret the estimated slopes and the coefficient of determination.
(b) At the 5% level, test the significance of each slope.
(c) Give 95% confidence intervals for the significant slopes.
(d) Test the hypothesis that the slope of experience equals one.
(e) Test the hypothesis that the slope of experience is less than the slope of education, and give the confidence interval for the difference at the 5% level, knowing that the covariance of the two estimated slopes is 0.001.



Some remarks

α = Type I error = P(H0 rejected | H0 is true)
β = Type II error = P(H0 accepted | H1 is true)
α is the significance level; what is the intuition behind α?
1 − β is the power of a test: it indicates how powerful the test is at finding deviations from the null hypothesis H0.
Lowering α ⟹ increasing β. Why?
Since we cannot minimize both, we fix α (e.g., 5%) and try to find the test that minimizes β for this given α.



Some remarks

Dropping a useful variable can lead to inconsistent estimates, while keeping an unimportant variable only leads to a loss in precision.
Say we set α = 0.01 with a small sample size: estimates are then likely to have a large variance.
So even if the true parameter is not zero, its t-statistic is likely to be small, thus failing to reject H0 although it is false.
In that case, we might remove a relevant variable from the analysis simply because we have been too stringent about the size of the test.



Example

Suppose we are testing the hypothesis b = 0, while the true value is b = 0.1.
The probability that we reject the null (H0) depends on the standard error of b̂, and thus on the sample size.
The larger the sample, the smaller the standard error, so the more likely we are to reject H0.
Type II errors thus become increasingly unlikely as the sample size increases.
We can then decrease the size of the test α, e.g., to 1%.
Similarly, we can choose a size of 10% in small samples.



Correlation and Estimated Coefficient

Model: y = β0 + β1 x1 + · · · + βk xk + u
The correlation between xk and y and the estimated β̂k may have different signs.
Added-variable plot:
    Regress y = β0 + β1 x1 + · · · + βk−1 xk−1 + u1, obtaining residuals e1.
    Regress xk = α0 + α1 x1 + · · · + αk−1 xk−1 + u2, obtaining residuals e2.
    Plot e1 against e2 → the added-variable plot, which shows the relationship of y versus xk with the other variables netted out.
Partial correlation:

    r(y, xk | x≠k) = t(β̂k) / √[t(β̂k)² + n − k − 1]

A sketch in R follows.
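The sketch again uses the 12-employee wage data, cross-checking the formula against the correlation of the added-variable residuals.

exp  <- c(1,2,2,3,4,5,7,10,10,12,15,16)
edu  <- c(13,12,16,11,15,15,10,15,13,11,13,15)
male <- c(1,1,0,0,1,0,1,0,0,1,1,0)
wage <- c(6,6,12,6,11,8,8,10,11,10,15,13)
fit <- lm(wage ~ exp + edu + male)
t_k <- summary(fit)$coefficients["exp", "t value"]
t_k / sqrt(t_k^2 + 12 - 3 - 1)          # formula from the slide
e1 <- resid(lm(wage ~ edu + male))      # added-variable residuals for y
e2 <- resid(lm(exp  ~ edu + male))      # ... and for exp
cor(e1, e2)                             # same value: the partial correlation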



Prediction Interval

 
Forecast at x1 = x1*, . . . , xk = xk*, i.e., at the vector x* = (1, x1*, . . . , xk*)′.

Point estimate: ŷ* = β̂0 + β̂1 x1* + · · · + β̂k xk* = x*′ β̂
Standard error: Se(pred) = s √(1 + x*′ (X′X)⁻¹ x*)
Prediction interval:

    ŷ* ± t(n−k−1)α/2 Se(pred)
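A sketch using R's built-in predict(), which computes exactly this interval; the forecast point exp = 8, edu = 14, male = 1 is hypothetical.

exp  <- c(1,2,2,3,4,5,7,10,10,12,15,16)
edu  <- c(13,12,16,11,15,15,10,15,13,11,13,15)
male <- c(1,1,0,0,1,0,1,0,0,1,1,0)
wage <- c(6,6,12,6,11,8,8,10,11,10,15,13)
fit <- lm(wage ~ exp + edu + male)
new <- data.frame(exp = 8, edu = 14, male = 1)     # hypothetical point
predict(fit, newdata = new, interval = "prediction", level = 0.95)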



Inference with F-distribution
Testing for reducing the model
Full model:

    y = β0 + β1 x1 + · · · + βk xk + u

Reduced model, after removing p explanatory variables:

    y = β0 + β1 x1 + · · · + βk−p xk−p + u

Hypotheses:
    H0: βk−p+1 = · · · = βk = 0: the reduced model is correct.
    H1: not H0: the reduced model is not correct.
Statistic:

    Fstat = [(RSSReduced − RSSFull)/p] / [RSSFull/(n − k − 1)] = (RSSReduced − RSSFull) / (p · s²Full)

If Fstat > f(p,n−k−1)α, then reject H0.


Overall Significance Test

Another formula for the reduction test (valid if the dependent variable is unchanged):

    Fstat = [(R²Full − R²Reduced)/p] / [(1 − R²Full)/(n − k − 1)]

The most important F-test is for all slopes, i.e., p = k:
    H0: β1 = · · · = βk = 0: the model is overall insignificant.
    H1: not H0: the model is overall significant.

    Fstat = (R²Full/k) / [(1 − R²Full)/(n − k − 1)]

If Fstat > f(k,n−k−1)α, then reject H0. A numerical sketch follows.
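A numerical sketch using R² = 0.7575 from the example later in the deck (n = 12, k = 3).

R2 <- 0.7575
F  <- (R2/3) / ((1 - R2)/(12 - 3 - 1))   # overall F statistic, about 8.33
F > qf(0.95, 3, 8)                        # TRUE: reject H0 at 5%
1 - pf(F, 3, 8)                           # P-value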



Linear Hypothesis Testing

A joint hypothesis such as H0: (β1 = 2 and β2 = 3) cannot be tested with a t-test.
This is called 2 restrictions; in matrix form, C β = d with

    C = (1 0; 0 1),  β = (β1, β2)′,  d = (2, 3)′

The general linear hypothesis (set of restrictions) on the coefficients is C β = d.
Hypothesis pair:
    H0: C β = d
    H1: C β ≠ d
The number of "=" signs is the number of restrictions, p.


Linear Hypothesis Testing

Under H0, the full model reduces to a restricted model.
Full model:

    y = β0 + β1 x1 + β2 x2 + β3 x3 + u

Under the hypothesis β1 = 2 and β2 = 3, the reduced model is

    y − 2 x1 − 3 x2 = β0 + β3 x3 + u

Fstat = [(RSSReduced − RSSFull)/p] / [RSSFull/(n − k − 1)], with critical value f(p,n−k−1)α.



T-test and F-test

The t-test is for one restriction only:
    H0 contains one "=";
    H1 can be ≠, >, or <.
The F-test is for p restrictions, where p can be larger than 1:
    H1 contains ≠ only.
If the t-test and the F-test apply to the same hypothesis, then
    Fstat = (tstat)²,
    fcrit = (tcrit)²,
and the two tests have the same P-value, as the sketch below verifies.
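A sketch verifying Fstat = (tstat)² on the wage data, dropping the single variable male (one restriction).

exp  <- c(1,2,2,3,4,5,7,10,10,12,15,16)
edu  <- c(13,12,16,11,15,15,10,15,13,11,13,15)
male <- c(1,1,0,0,1,0,1,0,0,1,1,0)
wage <- c(6,6,12,6,11,8,8,10,11,10,15,13)
full    <- lm(wage ~ exp + edu + male)
reduced <- lm(wage ~ exp + edu)                   # drops male: one restriction
anova(reduced, full)$F[2]                         # F statistic of the reduction test
summary(full)$coefficients["male", "t value"]^2   # identical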



Example

Regression results for a sample of 12 observations:

    \widehat{wage}_i = −4.9 + 0.41 expi + 0.83 edui + 1.2 malei
    R² = 0.7575, RSS = 22.95, s = 1.694

At the 5% significance level:
1. Test the overall significance of the model.
2. Removing the variable male gives R² = 0.723, RSS = 26.202. Test for removing male.
3. Regressing wage on exp alone gives R² = 0.52, RSS = 45.423. Test this reduced model.
4. Test the hypothesis that the sum of the coefficients of exp and edu is 1, given that the restricted model has R² = 0.6883, RSS = 24.597.

A sketch of the computations follows.
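The sketch computes parts 2-4 from the reported RSS values (part 1 uses the overall-F formula from the earlier slide); here n = 12 and k = 3.

RSS_full <- 22.95
F2 <- ((26.202 - RSS_full)/1) / (RSS_full/8)   # part 2: remove male,   p = 1
F3 <- ((45.423 - RSS_full)/2) / (RSS_full/8)   # part 3: remove 2 vars, p = 2
F4 <- ((24.597 - RSS_full)/1) / (RSS_full/8)   # part 4: 1 restriction, p = 1
c(F2, F3, F4)
qf(0.95, c(1, 2, 1), 8)                        # critical values at 5%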



Example in R

Data on 12 employees: exp: experience (years); edu: education (years); male: = 1 for male, = 0 otherwise; wage.

    exp   1  2  2  3  4  5  7 10 10 12 15 16
    edu  13 12 16 11 15 15 10 15 13 11 13 15
    male  1  1  0  0  1  0  1  0  0  1  1  0
    wage  6  6 12  6 11  8  8 10 11 10 15 13



Matrix calculation

exp <- c(1,2,2,3,4,5,7,10,10,12,15,16)
edu <- c(13,12,16,11,15,15,10,15,13,11,13,15)
male <- c(1,1,0,0,1,0,1,0,0,1,1,0)
wage <- c(6,6,12,6,11,8,8,10,11,10,15,13)
intercept <- rep(1, 12)
explanatory <- data.frame(intercept, exp, edu, male)
X <- data.matrix(explanatory)                  # design matrix (12 x 4)
y <- data.matrix(wage)                         # response vector
beta <- solve(t(X) %*% X) %*% (t(X) %*% y)     # OLS: (X'X)^(-1) X'y
beta



Matrix calculation

fitted <- X %*% beta                   # fitted value vector
resid <- y - fitted                    # residual vector
resid.SS <- t(resid) %*% resid         # residual SS (1 x 1 matrix)
resid.SS <- as.vector(resid.SS)        # convert to a scalar
s.sq <- resid.SS/8                     # regression variance, df = n - k - 1 = 8
cov.beta <- s.sq * solve(t(X) %*% X)   # covariance matrix of coefficients
var.beta <- diag(cov.beta)             # variances of the coefficients
se.beta <- sqrt(var.beta)              # standard errors
t.beta <- beta/se.beta                 # t-statistics
p.beta <- 2*(1 - pt(abs(t.beta), 8))   # P-values of the t-tests
TSS <- sum((wage - mean(wage))^2)      # total SS
R2 <- 1 - resid.SS/TSS                 # R-squared
f <- (R2/3)/((1 - R2)/8)               # F-statistic
p.ftest <- 1 - pf(f, 3, 8)             # P-value of the F-test



Output

#output
reg1 <-lm(wage ~ exp + edu + male)
summary(reg1)

#variance-covariance matrix
round(vcov(reg1),4)



Linear hypothesis testing
Install the AER package

install.packages("AER")
library(AER)

Test hypothesis: βexp = 1

linearHypothesis(reg1,"exp = 1")

Test hypothesis: βexp + βedu = 1

linearHypothesis(reg1, "exp + edu = 1")

Testing for deleting 2 variables edu and male

reg1 <- lm(wage ~ exp + edu + male)


reg2 <- lm(wage ~ exp)
anova(reg1, reg2)

