Lecture 5 Regression
ET2013 Introduction to Econometrics
Giacomo Pasini
Lecture overview
1 Basics of Regression
2 Hypothesis testing
3 Interpreting results
4 Model specification
5 Robust standard errors
6 Penalized regression
Basics of Regression
Univariate regression
Regression is the most common way of estimating the relationship between two variables while controlling for others, allowing you to close back doors with those controls.
OLS fits a function that is linear in the parameters in order to explain Y with X:
Y = β₀ + β₁X
OLS estimates β̂₀ and β̂₁ that minimize the sum of squared residuals
We can interpret β₁ as a slope. So, a one-unit increase in X is associated with a β₁ increase in Y.
With only one regressor, β̂₁ = cov(X, Y) / var(X)
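A minimal sketch of this formula (simulated data; numpy assumed): the slope is the covariance-variance ratio, and the intercept follows from the means.

```python
# Sketch: with one regressor, the OLS slope equals cov(X, Y) / var(X).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=500)
Y = 2.0 + 1.5 * X + rng.normal(size=500)   # true beta0 = 2, beta1 = 1.5

beta1_hat = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)  # slope from cov/var
beta0_hat = Y.mean() - beta1_hat * X.mean()         # intercept from the means
print(beta0_hat, beta1_hat)                         # close to 2 and 1.5
```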
Multivariate regression

We estimate
Y = β₀ + β₁X + β₂Z + ε
The error term is the difference between Y and the true best-fit line in the population:
ε = Y − (β₀ + β₁X + β₂Z)
The residual is the difference between Y and the fitted values from the estimated model:
ε̂ = Y − Ŷ = Y − (β̂₀ + β̂₁X + β̂₂Z)
Univariate regression Y = β₀ + β₁X + ε:

β̂₁ ∼ N( β₁ , σ² / (n·var(X)) )
So how can we make an OLS estimate’s sampling variation small? (All three levers appear in the simulation sketch below.)
1 We could shrink the standard deviation of the error term σ, i.e., make the model predict Y more accurately
2 We could pick an X that varies a lot: more variation in X makes it easier to see whether Y changes with it
3 We could use a big sample, so that n gets big
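A simulation sketch (all numbers are arbitrary): drawing many samples and re-estimating β̂₁ each time recovers the theoretical standard error σ/√(n·var(X)).

```python
# Simulation sketch: the spread of beta1_hat across repeated samples
# shrinks with a smaller error sd, a more variable X, or a larger n.
import numpy as np

def beta1_hat(n, sigma, x_sd, rng):
    X = rng.normal(scale=x_sd, size=n)
    Y = 1.0 + 0.5 * X + rng.normal(scale=sigma, size=n)
    return np.cov(X, Y)[0, 1] / np.var(X, ddof=1)

rng = np.random.default_rng(0)
draws = np.array([beta1_hat(n=200, sigma=1.0, x_sd=1.0, rng=rng)
                  for _ in range(2000)])
print(draws.std())                    # simulated standard error of beta1_hat
print(1.0 / np.sqrt(200 * 1.0**2))    # theory: sigma / sqrt(n * var(X))
```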
Multivariate regression Y = β₀ + β₁X + β₂Z + ε:

(β̂₁, β̂₂)′ ∼ N( (β₁, β₂)′ , (σ²/n)(W′W)⁻¹ )

W contains both X and Z, so we divide by the variances AND covariances of X and Z
The standard deviations of β̂₁ and β̂₂ are the square roots of the diagonal elements of (σ²/n)(W′W)⁻¹
Jargon: std deviation of a sampling distribution is often referred to as standard error
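A sketch of the same computation in code (simulated data): the slide’s (σ²/n)(W′W)⁻¹, with W′W normalized by n, is the same object as the textbook form σ̂²(W′W)⁻¹ used below.

```python
# Sketch: standard errors as square roots of the diagonal of the
# coefficient covariance matrix, with W the design matrix (constant, X, Z).
import numpy as np

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=n)
Z = 0.5 * X + rng.normal(size=n)          # X and Z correlated on purpose
Y = 1.0 + 2.0 * X - 1.0 * Z + rng.normal(size=n)

W = np.column_stack([np.ones(n), X, Z])
beta_hat = np.linalg.solve(W.T @ W, W.T @ Y)
resid = Y - W @ beta_hat
sigma2_hat = resid @ resid / (n - W.shape[1])   # dof-adjusted error variance
cov_beta = sigma2_hat * np.linalg.inv(W.T @ W)
print(np.sqrt(np.diag(cov_beta)))               # se(b0), se(b1), se(b2)
```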
Hypothesis testing
Why did we want to know the OLS coefficient distribution? So that we can use what we observe to conclude that certain theoretical distributions are unlikely.
Thus, given the assumptions we’ve made, we can use our estimate β̂₁ to say that certain population parameters β₁ are very unlikely
Example: “it’s unlikely that the effect of X on Y is 13.2”, or (MUCH) more often, “it’s unlikely that the effect of X on Y is 0”
In the previous example, at the 95% significance level (α = 0.05) we accept the null even if it is wrong (type II error, false negative)
False positives, type I error: the null is rejected even if it’s true

                       Test result
  True value is    accept            reject
  true             ✓                 type I error
  false            type II error     ✓

With a lower α (0.05 < 0.10), i.e. a higher significance level (95% > 90%):
wider confidence interval for the null, i.e. wider acceptance region
higher probability of wrongly failing to reject the null hypothesis (type II error)
lower probability of wrongly rejecting the null hypothesis (type I error)
Type I and Type II error cannot be minimized simultaneously!
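As a sketch of the mechanics (the estimate, standard error, and degrees of freedom below are purely illustrative, not from a real regression): a t-statistic and two-sided p-value for H₀: β₁ = 0.

```python
# Sketch: a t-test of H0: beta1 = 0 from an estimate and its standard error.
from scipy import stats

beta1_hat, se, dof = -0.019, 0.004, 27000   # illustrative numbers only
t_stat = (beta1_hat - 0.0) / se             # distance from the null, in se units
p_value = 2 * stats.t.sf(abs(t_stat), dof)  # two-sided p-value
print(t_stat, p_value)                      # reject H0 if p_value < alpha
```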
Interpreting results
Regression tables
Coefficient estimates
In parentheses, the standard error of the coefficient (or, less frequently, the t-statistic)
Significance stars
Number of observations
Goodness of fit measures
Significance stars
These let you know at a glance whether the coefficient is statistically significantly
different from a null-hypothesis value of 0.
They’re a representation of the p-value (the probability of being as far away from the null-hypothesis value as our estimate actually is, or farther)
If the p-value is below α, that’s statistical significance (we reject the null of the coefficient being equal to 0).
The coefficient of Number of Locations has ***: that means that if we had decided that α = 0.01 (or higher), we would reject the null of β₁ = 0
Measures of the share of the dependent variable’s variance that is predicted by the model
First column: R² = 0.065, so 6.5% of the variation in Inspection Score is predicted by the Number of Locations.
If we were to predict Inspection Score with Number of Locations, we’d be left with a residual variable that has 1 − 0.065 = 0.935, i.e. 93.5% of the variance of Inspection Score.
Problem: adding any variable to a model always makes the R 2 go up by some small
amount, even if the variable doesn’t make any sense.
Adjusted R 2 : it only counts the variance explained above and beyond what you’d
get by just adding a random variable to the model.
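A minimal sketch of both measures (the helper name r2_stats is ours):

```python
# Sketch: R2 and adjusted R2 from residuals; the adjustment penalizes
# each extra regressor.
import numpy as np

def r2_stats(Y, Y_hat, k):
    """R2 and adjusted R2; k = number of estimated coefficients (incl. constant)."""
    ss_res = np.sum((Y - Y_hat) ** 2)
    ss_tot = np.sum((Y - Y.mean()) ** 2)
    n = len(Y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)   # penalizes extra regressors
    return r2, adj_r2
```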
We have the “F-statistic”: the test statistic for the null that all the coefficients in the model (except the intercept/constant) are zero at once. This is pretty useless...
RMSE: estimate of the standard deviation of the error term.
1 Take predicted Inspection Score based on OLS estimates and subtract from the actual
values to get a residual.
2 Calculate the standard deviation of that residual
3 Make a slight adjustment for the “degrees of freedom” (number of observations in the
data minus number of coefficients in the model).
4 If RMSE is big, the average errors in prediction for the model are big.
Several other measures (AIC, BIC...)
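A sketch of steps 1-3 above in code (the helper name rmse is ours):

```python
# Sketch: residuals, then a degrees-of-freedom-adjusted standard deviation.
import numpy as np

def rmse(Y, Y_hat, n_coef):
    resid = Y - Y_hat                          # step 1: residuals
    dof = len(Y) - n_coef                      # step 3: n minus coefficients
    return np.sqrt(np.sum(resid ** 2) / dof)   # step 2 with the adjustment
```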
These are measures of how well your dependent variable is predicted by your OLS
model
If you are after the causal effect of one variable on another, you are interested in estimating specific coefficients well
If you are confident you’ve included the variables in your model necessary to identify
your treatment (i.e., you closed all back doors), then R 2 is of little importance.
Take home message: do NOT fixate on R 2 , you do not have to maximize it!
If, based on what we learned in the first part of the book, we think we’ve identified the causal effect of number of locations on inspection score, we can word it accordingly:
“a one-unit increase in number of locations decreases inspection score by 0.019.”
If you’re not sure you identified a causal relation, then regression coefficients are
partial correlations and measure associations
Model specification
Binary variables
Binary variables (also called dummy variables) are common in social science: are
you a man or a woman? Are you Catholic or not? Are you married or not?
Especially important for causal analysis: the causes we tend to be interested in are
binary in nature. Did you get the treatment or not?
Binary variables can be included in regression models just like any other variable. So we can still be working with
Y = β0 + β1 X + β2 Z + ε
but now X or Z (or both) can only take two values: 0 (“are not”/false) or 1
(“are”/true).
If the binary variable is a control variable, we’re just shutting off back doors that go
through the variable.
β̂ 0 = 15: it is the expected mean of Sales when all the variables are zero: when
Winter is 0, we’re in Not Winter, and average sales in Not Winter is 15.
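A sketch of this logic with made-up numbers (only the Not Winter mean of 15 comes from the example): with a single dummy, OLS reproduces the two group means.

```python
# Sketch: with a single dummy, the intercept is the "dummy = 0" group mean
# and the dummy coefficient is the difference in group means.
import numpy as np

winter = np.array([0, 0, 0, 1, 1, 1])
sales = np.array([14., 15., 16., 21., 22., 23.])   # Not Winter mean = 15

beta0 = sales[winter == 0].mean()                          # 15.0
beta1 = sales[winter == 1].mean() - sales[winter == 0].mean()
print(beta0, beta1)   # OLS of sales on winter gives exactly these numbers
```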
Categorical variables
Categorical variables (e.g. Country you live in) can be recoded as a set of binary
variables
you live in Italy yes/no, etc
You need to exclude one category to avoid perfect multicollinearity with the
constant
e.g., you exclude US
Coefficients’ estimates are the difference between each category and the reference
category
Difference in average Y between Italy and US
If you want to know whether a categorical variable has a significant effect as a whole, you look at all the category coefficients jointly. This takes the form of a “joint F test”
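A sketch of the recoding with pandas (the country list is made up): a categorical variable becomes a set of dummies, with one reference category dropped.

```python
# Sketch: recode a categorical variable into binary columns, dropping one
# reference category (here US) to avoid perfect multicollinearity.
import pandas as pd

df = pd.DataFrame({"country": ["US", "Italy", "France", "Italy", "US"]})
dummies = pd.get_dummies(df["country"], prefix="country")
dummies = dummies.drop(columns="country_US")   # US becomes the reference
print(dummies)
# Each remaining column's coefficient would be the difference in average Y
# between that country and the US.
```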
Polynomials
Y = β₀ + β₁X + β₂X² + β₃X³

∂Y/∂X = β₁ + 2β₂X + 3β₃X²

The effect of X on Y depends on the specific value of X, i.e. it varies with X
Example Polynomials
For a restaurant with just one branch, adding a second one would be associated with a change in inspection score of −0.0802 + 2 × 0.0001(1) ≈ −0.08, i.e. a 0.08 reduction
But for a chain with 1000 restaurants, adding one more would increase the score by −0.0802 + 2 × 0.0001(1000) = 0.1198
In these cases, two things to keep in mind:
What is the support of X? I.e., are all values of X equally plausible?
Compute se(∂Y/∂X): if the effect is both positive and negative over the support of X, it cannot be significantly different from zero everywhere!
In this case, at the min (1) the marginal effect is negative, at the max (646) it is positive
Still, at the 75th percentile it is negative: for most values of X the effect is negative
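A sketch of the computation (the evaluation points other than the min of 1 and the max of 646 are illustrative):

```python
# Sketch: the marginal effect beta1 + 2*beta2*X evaluated over the support
# of X, using the quadratic coefficients from the example above.
import numpy as np

beta1, beta2 = -0.0802, 0.0001
X = np.array([1, 100, 401, 646])      # min, two interior points, max
marg = beta1 + 2 * beta2 * X          # effect of one more location at X
print(dict(zip(X.tolist(), marg.round(4))))
# Negative at 1, sign flips around X = 401, positive at 646.
```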
We saw that a variable is “right skewed” if there are a whole lot of observations with
low values and just a few observations with really high values. Income is an example
In this case, you might want to reduce the skew and run Y = β 0 + β 1 log(X ) + ε
Another reason you might want to use log(X ) rather than X in the regression is
because your DGP has logs in it
We model firms’ behavior. A common choice is to use a Cobb-Douglas production function:
Y = αK^β L^γ
A natural nonlinear econometric model meant to bring it to the data is
Y = αK^β L^γ + u
Writing the error in multiplicative form, u ≡ (αK^β L^γ)ε, gives
Y = (αK^β L^γ)(1 + ε) ≈ αK^β L^γ e^ε
Taking logs of both sides we obtain the loglinear regression model, which is linear in the parameters and in the logs of all the variables:
log Y = log α + β log K + γ log L + ε
Elasticities
In the loglinear model, β is the elasticity of Y with respect to K:
(∂Y/∂K) · (K/Y) = ∂ log Y / ∂ log K = β
In the linear model the elasticity is given by
(∂Y/∂K) · (K/Y) = β · (K/Y)
The linear model implies non-constant elasticities, while the loglinear model
imposes constant elasticities. This is ok if the underlying model is a Cobb-Douglas,
but it is not always desirable.
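A sketch with simulated Cobb-Douglas data: regressing log Y on log K and log L recovers the constant elasticities.

```python
# Sketch: loglinear regression recovers the Cobb-Douglas elasticities.
import numpy as np

rng = np.random.default_rng(0)
n = 500
K = rng.lognormal(size=n)
L = rng.lognormal(size=n)
Y = 0.8 * K**0.3 * L**0.6 * np.exp(rng.normal(scale=0.1, size=n))

W = np.column_stack([np.ones(n), np.log(K), np.log(L)])
coef = np.linalg.lstsq(W, np.log(Y), rcond=None)[0]
print(coef)   # close to log(0.8), 0.3, 0.6; the slopes are constant elasticities
```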
Semi–Elasticities
If the log of the dependent variable is regressed on a right hand side variable
expressed in levels, its coefficient measures the expected relative change in Y due
to an absolute change in X . This is a semi-elasticity.
For example, if X is a dummy for males, β = 0.10 tells us that the relative wage differential between men and women is (approximately) 10%; exactly, it is e^0.10 − 1 ≈ 10.5%.
Interaction Terms
What if the relationship between Y and X differs based on the value of a different
variable Z ?
Example: what’s the relationship between the price of gas and how much an individual chooses to drive?
For people who own a car, that relationship might be quite strong and negative.
For people who don’t own a car, that relationship is probably near zero.
For people who own a car but mostly get around by bike, that relationship might be
quite weak.
Solution: include the interaction X × Z and also Z in the regression:
Y = β 0 + β 1 X + β 2 Z + β 3 XZ + ε
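A sketch of the driving example (simulated data; statsmodels’ formula interface is our choice): X*Z in a formula expands to X + Z + X:Z, so β₃ lets the slope of X vary with Z.

```python
# Sketch: an interaction between gas price and car ownership.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({"gas_price": rng.normal(size=300),
                   "owns_car": rng.integers(0, 2, size=300)})
df["driving"] = (-1.5 * df["gas_price"] * df["owns_car"]
                 + rng.normal(size=300))        # effect only for car owners

fit = smf.ols("driving ~ gas_price * owns_car", data=df).fit()
print(fit.params)   # gas_price near 0; gas_price:owns_car near -1.5
```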
Robust standard errors
Sampling distribution of β̂
Y = β0 + β1 X + ε
We obtain β̂ 1 and se ( β̂ 1 )
If what we assume on ε is correct, we then know the sampling distribution of β̂ 1 ,
and we can do inference (i.e. do hypothesis testing and say something on the
population, not only on the sample)
Assumptions on ε distribution are crucial!
We assumed that:
ε is Normally distributed
ε is Independently and Identically Distributed (IID)
Normality
If ε is not normally distributed, we can invoke the Law of Large Numbers (LLN)
and the Central Limit Theorem (CLT)
If ε is IID but not normally distributed, then asymptotically (i.e., if the sample is large enough) β̂₁ is normally distributed.
So, normality is not really crucial
IID
ε must be Independent:
unrelated to the error terms of other observations
unrelated to the other variables for the same observation
...and Identically Distributed:
the error term might be different from observation to observation but it will always
have been drawn from the same distribution.
Autocorrelation
Error terms are correlated with each other in some way.
This commonly pops up with data over multiple time periods (temporal
autocorrelation) or data that’s geographically clustered (spatial autocorrelation).
Temporal autocorrelation: you’re regressing the US unemployment rate growth on
year.
Heteroskedasticity
the variance of the error term’s distribution is related to the variables in the model.
Example: you regress how many Instagram followers someone has on the amount of
time spent posting daily.
If the IID assumption is not respected, OLS estimates of the parameters β̂ are still normally distributed in big samples (i.e., the LLN and CLT still apply)
But se(β̂₁) is no longer σ/√(n·var(X))
In multivariate regression, the variance-covariance matrix of the vector of OLS coefficient estimates is no longer (σ²/n)(W′W)⁻¹
Solution: instead of σ/√(n·var(X)) we use a “sandwich estimator”: the individual values of X that go into the var(X) calculation are scaled up or down, or even multiplied by other observations’ X values
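One implementation sketch (simulated heteroskedastic data): statsmodels exposes sandwich estimators through its cov_type argument; HC1 is a common heteroskedasticity-robust variant of White’s estimator.

```python
# Sketch: classical vs. heteroskedasticity-robust ("sandwich") standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=500)
Y = 1 + 2 * X + rng.normal(size=500) * (1 + np.abs(X))  # error sd grows with X

W = sm.add_constant(X)
ols = sm.OLS(Y, W).fit()                   # classical (IID) standard errors
robust = sm.OLS(Y, W).fit(cov_type="HC1")  # sandwich standard errors
print(ols.bse, robust.bse)                 # they differ once IID fails
```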
Research question: the effect of providing laptops to students on their test scores.
A given classroom of students isn’t just given laptops or not, they are also in the
same classroom with the same teacher.
The test scores of classmates will be similar based on having laptops (captured by
the regressor), and because of the similar environments they face (in the error term)
Their errors will be correlated with each other: these errors are clustered within
classroom.
Liang-Zeger standard errors:
explicitly specify a grouping, such as classrooms.
Then, let X values interact with other X values in the same group
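A sketch of the classroom example with cluster-robust (Liang-Zeger) standard errors, again via statsmodels’ cov_type argument (simulated data):

```python
# Sketch: treatment assigned at the classroom level, errors clustered
# within classroom, standard errors clustered accordingly.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
classroom = np.repeat(np.arange(50), 20)          # 50 classrooms of 20 pupils
laptop = rng.integers(0, 2, size=50)[classroom]   # treatment varies by class
class_effect = rng.normal(size=50)[classroom]     # shared classroom shock
score = 60 + 5 * laptop + class_effect + rng.normal(size=1000)

W = sm.add_constant(laptop.astype(float))
fit = sm.OLS(score, W).fit(cov_type="cluster",
                           cov_kwds={"groups": classroom})
print(fit.bse)   # larger than IID se's because errors cluster within class
```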
Penalized regression
It’s not uncommon to end up with Causal Diagrams that leave us with far too
many candidates for inclusion as controls.
What do we do if we have thirty, fifty, a thousand potential control variables that
might help us close a back door?
Including all of them would be a statistical mess
We probably want to drop some of those controls. But which ones?
We’d like some sort of model selection procedure that would do the choosing for us.
LASSO regression
LASSO picks the coefficients by still minimizing the SSR, but it also makes a function of the β’s small:

argmin_β { Σ (Y − Ŷ)² + λ Σ |β| }
Effect of penalization:
in LASSO, β̂ is big only if it really helps to reduce the SSR
it sends a lot of coefficients to zero, dropping those controls
Choice of λ
LASSO is good for selecting the regressors; to estimate the coefficients, it is better to run OLS on the subset of selected controls
If your Causal Diagram tells you a control closes a very important back door and LASSO drops it... well, put it back into the regression!
Since the size of the coefficients is sensitive to the scale of the variables, standardize them before running LASSO for selection; in the OLS regression you can then use the original ones, if you prefer:

(X − mean(X)) / sd(X)
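A sketch of the whole pipeline (simulated data; sklearn’s Lasso for selection, then OLS on the kept controls; sklearn’s alpha plays the role of λ):

```python
# Sketch: standardize, run LASSO to select controls, then re-estimate
# the selected specification with OLS on the original (unscaled) variables.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p = 300, 50
X = rng.normal(size=(n, p))
Y = 1 + 2 * X[:, 0] - 1 * X[:, 1] + rng.normal(size=n)  # only 2 real controls

X_std = StandardScaler().fit_transform(X)   # (X - mean(X)) / sd(X)
lasso = Lasso(alpha=0.1).fit(X_std, Y)      # alpha plays the role of lambda
keep = np.flatnonzero(lasso.coef_)          # controls LASSO did not zero out

post = sm.OLS(Y, sm.add_constant(X[:, keep])).fit()   # post-LASSO OLS
print(keep, post.params)
```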