
Economics 2P91 Fall 2022: Practice problem solutions

Question 1
Consider the following multiple regression model:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \qquad (1)$$

(a) What is the minimum number of observations (sample size) required to obtain the OLS
estimates β̂0, β̂1, β̂2 and β̂3? To estimate the model using OLS, we need more
observations than the number of coefficients (betas) in the model. In other words, it
must be that n > k. Since we have k = 4 coefficients in this model (β0, β1, β2 and β3),
we need at least n = 5 observations.

(b) Suppose that the regressors X1i and X3i are related as follows: X1i = 2X3i. Briefly
discuss the problem which this presents to the OLS estimation of this model. Since
there is an exact linear relationship between X1i and X3i, we have perfect multicollinearity:
OLS cannot separate the effect of X1i from the effect of X3i, so there is no unique set of
OLS estimates and we cannot estimate the model using OLS.
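
To see why there is no unique answer, here is a minimal sketch with made-up toy data (the lists x1, x2, x3, y and the helper ssr are hypothetical, chosen only for illustration). Because β1·X1i + β3·X3i = (2β1 + β3)·X3i when X1i = 2X3i, raising β1 by some amount d and lowering β3 by 2d leaves every fitted value, and therefore the sum of squared residuals, unchanged, so OLS has no single minimizer.

```python
# Hypothetical toy data: X1 is forced to be an exact multiple of X3.
x3 = [2, 1, 3, 5, 4, 6]
x1 = [2 * v for v in x3]          # X1i = 2 * X3i
x2 = [1, 4, 2, 5, 3, 7]
y = [10, 12, 15, 22, 18, 30]

def ssr(b0, b1, b2, b3):
    """Sum of squared residuals for one candidate set of coefficients."""
    return sum(
        (yi - (b0 + b1 * a + b2 * b + b3 * c)) ** 2
        for yi, a, b, c in zip(y, x1, x2, x3)
    )

# Shift b1 up by d and b3 down by 2*d: the combination 2*b1 + b3 is unchanged,
# so the fitted values (and the SSR) are identical for every choice of d.
d = 3
ssr_a = ssr(1, 2, 1, 1)
ssr_b = ssr(1, 2 + d, 1, 1 - 2 * d)
print(ssr_a, ssr_b, ssr_a == ssr_b)
```

Integer data are used so the two sums of squares compare exactly, without floating-point noise.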

(c) What is the interpretation of the slope coefficients in multiple regression models? Slope
coefficients tell us the effect of a one-unit change in whichever independent variable
we’re interested in on Y , holding other regressors constant or controlling for the other
regressors.

Question 2
We saw that one way to account for an omitted variable is to estimate our regression of
interest for each value (subsample) of the omitted variable (like we did for socioeconomic
status in lecture 5). We also saw that we can do the same thing by including the omitted variable
as another independent variable. What are the advantages of using multiple regression over
the subsample approach?

There are two main advantages. First, we end up with a single estimate for the effect of
interest instead of one for each subsample. Second, the subsampling approach would be much
more difficult and tedious if our omitted variable were continuous (there would be far too
many subsamples), whereas multiple regression handles a continuous control directly.

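The "single estimate" advantage can be made concrete with a minimal sketch on made-up data (the dict groups and the helper centered_sums are hypothetical). When the omitted variable D is binary, the Frisch-Waugh-Lovell theorem implies that the coefficient on X from regressing Y on X and D (with an intercept) equals the pooled within-group slope. That pooled slope is a weighted average of the two subsample slopes, with weights proportional to each group's sum of squared deviations of X.

```python
import statistics

# Hypothetical (x, y) pairs for each value of a binary omitted variable D.
groups = {
    0: [(1, 2.0), (2, 2.9), (3, 4.2), (4, 4.8)],
    1: [(2, 5.1), (4, 7.4), (5, 7.9), (7, 10.6)],
}

def centered_sums(pairs):
    """Return (Sxy, Sxx), computed around the group's own means."""
    mx = statistics.fmean(x for x, _ in pairs)
    my = statistics.fmean(y for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy, sxx

# Subsample approach: one slope estimate per value of D.
subsample_slopes = {}
for d, pairs in groups.items():
    sxy, sxx = centered_sums(pairs)
    subsample_slopes[d] = sxy / sxx

# Multiple regression of y on x and D: the coefficient on x pools the
# within-group variation of both subsamples into a single estimate.
total_sxy = sum(centered_sums(p)[0] for p in groups.values())
total_sxx = sum(centered_sums(p)[1] for p in groups.values())
pooled_slope = total_sxy / total_sxx
print(subsample_slopes, pooled_slope)
```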
Question 3
Imagine that you are working with a police department and are asked to investigate the
relationship between intimate partner violence (IPV) and whether or not their local hockey
team lost. You collect data for cities (i) with a hockey team in 2011 and are thinking about
the following regression model:

$$\text{IPV}_i = \beta_0 + \beta_1 \, \text{teamlost}_i + u_i \qquad (2)$$

(a) Your police liaison suggests that alcohol consumption is often a factor in IPV. Do you
think that alcohol consumption per capita is an omitted variable? Explain why or why
not. (HINT: What two things do we need for omitted variable bias to be a
concern? Are they present here?) In order to be worried about alcohol consumption
being an omitted variable, it would need to be correlated with the outcome (IPV) and
the independent variable of interest (whether or not the local hockey team lost). For
our Y variable, one possibility is that alcohol consumption inhibits self-control, which
makes it difficult to de-escalate situations and find non-violent resolutions. There are
many other possibilities; what I want in an answer to a question like this is a plausible
reason for why our outcome and this possible omitted variable are correlated.
Then we need to think about whether there is a plausible explanation for why
alcohol consumption is correlated with whether or not your local sports team loses. In
this case, when the local team loses people may be upset and therefore are more likely to
drink alcohol to “drown their sorrows”.
Now we have established that there is reason to believe that alcohol consumption is
correlated with both our Y and our X variables. Therefore the conditions for an
omitted variable are met, and we should be concerned that leaving it out causes omitted
variable bias.

(b) Would excluding it result in a positive or negative bias for your estimate of β1 ? Explain.
(HINT: Have a look at slides 42-44 in Lecture 5. Think about the signs
(positive or negative) of β2 and cov(teamlost, alcoholconsumption) and explain
why those signs go in the directions they do.)
To determine the sign of the bias, we need to think about the direction of the effect
of our omitted variable on our outcome (which is what β2 tells us in the slides) and
the direction of the correlation between the omitted variable and our main variable of
interest. In the explanation I gave above, alcohol consumption and IPV are positively
correlated and the local hockey team losing and alcohol consumption are also positively
correlated. Since both of these are positive, our estimate will be positively biased and
we will be overestimating the effect of the local hockey team losing on IPV.
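
The sign argument can be checked with a small simulation. This is a minimal sketch with made-up parameters (beta1, beta2 and delta are hypothetical values, not estimates from any data): alcohol consumption rises when the team loses (delta > 0) and raises IPV (beta2 > 0). In large samples the slope from the short regression that omits alcohol is close to β1 + β2·cov(teamlost, alcohol)/var(teamlost), which here is β1 + β2·delta, so it overstates β1.

```python
import random
import statistics

random.seed(2022)

# Hypothetical "true" parameters, chosen only for illustration.
beta1, beta2, delta = 1.0, 2.0, 0.5
n = 100_000

team_lost = [random.randint(0, 1) for _ in range(n)]
# Omitted variable: alcohol consumption is higher when the team loses (delta > 0).
alcohol = [delta * x + random.gauss(0, 1) for x in team_lost]
# Outcome depends on both the loss and alcohol consumption (beta2 > 0).
ipv = [beta1 * x + beta2 * z + random.gauss(0, 1)
       for x, z in zip(team_lost, alcohol)]

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

short = ols_slope(team_lost, ipv)   # regression of IPV on teamlost, omitting alcohol
predicted = beta1 + beta2 * delta   # beta1 + beta2 * cov(X, Z) / var(X)
print(round(short, 2), predicted)
```

Flipping the sign of either beta2 or delta (but not both) makes the short-regression slope fall below beta1, which is the negative-bias case.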

(c) What are some additional omitted variables you might want to control for? There are
many things that we might want to control for: average income and average education
level might be negatively related to IPV but positively related to alcohol consumption
(since if alcohol is a normal good, you might expect consumption to rise with income) and
poverty rates might have the opposite relationships with IPV (perhaps because lack of
income causes stress in the household which might lead to IPV) and alcohol consumption
(less income could suggest lower consumption of alcohol again using the normal good
argument). Of course, we could come up with explanations for why these relationships
might go in the opposite directions (more impoverished households might be more likely
to face depression or mental health challenges that could turn to alcohol consumption
as a coping mechanism, for example).
Another variable could be the betting odds, which capture whether or not fans expected
the team to lose. This could be linked to IPV through the negative emotional shock that
happens when the team is “supposed” to win but instead loses (which would be a positive
correlation, since we would expect more IPV when the betting odds were more favourable to
the local team before it lost). The betting odds are more favourable when the team is more
likely to win, so there would be a negative correlation between this omitted variable and
our main variable of interest.
For questions like these, what I would like is for you to think about a variable that again
meets the conditions for an omitted variable and explain why this variable would be
correlated with Y and with X.

Question 4
Suppose that you are investigating which factors determine someone’s wages. You want
to know how the length of time a worker has been with the same firm (something we call
‘tenure’ and which here is measured in months) and how a firm’s size (number of employees)
are related to wages. That is, you are interested in the following regression model:

$$\text{wages}_i = \beta_0 + \beta_1 \, \text{tenure}_i + \beta_2 \, \text{firmsize}_i + u_i \qquad (3)$$


and your estimates give you the following table:

Model 1
(Intercept) 15.568
(0.131)
tenure 0.044
(0.001)
firmsize 1.928
(0.044)
Num.Obs. 49 882
R2 0.123
F 3677.146

Table 1: Heteroskedasticity-robust standard errors in parentheses.

(a) What is the interpretation of β1 and β2 ? β1 is the average effect of an additional month
of tenure on wages, holding firm size constant. β2 is the average effect of an additional
employee at an individual’s firm on wages, holding tenure constant.
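
A quick worked example of the "holding firm size constant" interpretation, using the Model 1 estimates from Table 1. The two workers and their tenure and firm-size values are hypothetical, and the units of wages are whatever units the data use (the table does not say).

```python
# Estimated coefficients from Table 1 (Model 1).
b0, b1, b2 = 15.568, 0.044, 1.928

def predicted_wage(tenure_months, firm_size):
    """Fitted wage from Model 1 (hypothetical helper for illustration)."""
    return b0 + b1 * tenure_months + b2 * firm_size

# Two hypothetical workers at firms of the same size; one has 12 more months of tenure.
worker_a = predicted_wage(tenure_months=24, firm_size=10)
worker_b = predicted_wage(tenure_months=36, firm_size=10)
diff = worker_b - worker_a
print(round(diff, 3))   # 12 * 0.044 = 0.528; firm size cancels out
```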

(b) How many degrees of freedom do we have in this regression? The degrees of freedom are
n − k, where k is the number of betas we have (including β0). Here we have n = 49,882 and
k = 3, so we have 49,882 − 3 = 49,879 degrees of freedom.

(c) What is the t-critical value you would use to test

$$H_0: \beta_1 = 0 \qquad (4)$$
$$H_1: \beta_1 \neq 0 \qquad (5)$$

at the 5% significance level? Is this a one-tailed or two-tailed test? Because the
alternative hypothesis is β1 ≠ 0, this is a two-sided (two-tailed) test. We have
many more degrees of freedom than the ∼1,000 needed to reasonably use the rule of thumb
t-critical = 1.96 for a two-tailed test at the 5% significance level (95% confidence level).
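
To see why 1.96 is a safe rule of thumb here, the sketch below compares the standard normal critical value with an approximate Student-t critical value at 49,879 degrees of freedom. As an assumption made to keep the sketch standard-library only, the t quantile is approximated by the first two terms of its Cornish-Fisher expansion around the normal quantile, standing in for an exact routine such as SciPy's scipy.stats.t.ppf.

```python
from statistics import NormalDist

n, k = 49_882, 3        # observations and number of betas (Table 1, part (b))
df = n - k              # 49,879 degrees of freedom

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed: alpha/2 in each tail

# Approximate t quantile: first two Cornish-Fisher correction terms,
# which shrink towards zero as the degrees of freedom grow.
t_crit = (z
          + (z**3 + z) / (4 * df)
          + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2))
print(round(z, 4), round(t_crit, 4))
```

With this many degrees of freedom the correction is on the order of 0.00005, so the t and normal critical values agree to the precision reported in the table.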

(d) What is the 95% confidence interval for β1 ? The 95% confidence interval is (0.044 −
1.96 ∗ 0.001, 0.044 + 1.96 ∗ 0.001) which is (0.042, 0.046) when rounded.
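
The same arithmetic as a short sketch, taking the tenure estimate and its robust standard error from Table 1 and the 1.96 rule of thumb from part (c):

```python
beta1_hat, se_beta1 = 0.044, 0.001   # tenure row of Table 1
t_crit = 1.96                        # rule-of-thumb critical value from part (c)

lower = beta1_hat - t_crit * se_beta1
upper = beta1_hat + t_crit * se_beta1
print(round(lower, 3), round(upper, 3))
```

Since the interval does not contain 0, it agrees with rejecting H0: β1 = 0 at the 5% level.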

(e) You want to know if either of these two variables influence wages. Write out the null
and alternative hypothesis you would be testing.

$$H_0: \beta_1 = 0 \text{ and } \beta_2 = 0 \qquad (6)$$

$$H_1: \beta_1 \neq 0 \text{ and/or } \beta_2 \neq 0 \qquad (7)$$

(f) After conducting the F-test, you compute a p-value of 0.00000000000000022. What can
you conclude about the joint effect of tenure and firm size on wages? This is a very,
very small p-value. We can compare it to the 0.05 threshold for the 5% significance level
(95% confidence level) and easily reject the null hypothesis that tenure and firm size are
unrelated to wages. In fact, we can also do so at the 1% significance level (99% confidence
level), where the p-value threshold would be 0.01. This means that we reject the null
hypothesis that the coefficients on tenure and firm size are jointly zero at the 1% level.
This tells us that the effect on wages of at least one of tenure and firm size is different
from zero, but it does not tell us which one.
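
The decision rule is simply "reject H0 when the p-value is below the significance level". The sketch below applies it and then adds a rough cross-check that rests on an assumption: that the F = 3677.146 reported in Table 1 is the statistic for this joint test. With q = 2 restrictions and about 50,000 denominator degrees of freedom, q·F is approximately chi-squared with 2 degrees of freedom, whose upper-tail probability is exp(−q·F/2) = exp(−F).

```python
import math

p_value = 0.00000000000000022   # p-value reported in part (f)
for alpha in (0.05, 0.01):
    decision = "reject" if p_value < alpha else "fail to reject"
    print(f"alpha = {alpha}: {decision} H0")

# ASSUMPTION: Table 1's F is the statistic for H0: beta1 = 0 and beta2 = 0.
# Large-df approximation: q * F ~ chi-squared(q); for q = 2 the tail is exp(-q*F/2).
F, q = 3677.146, 2
log10_p = -(q * F / 2) / math.log(10)
print(round(log10_p))            # the p-value is roughly 10 ** log10_p
print(math.exp(-q * F / 2))      # too small for double precision: underflows to 0.0
```

Under that assumption the exact p-value is far smaller than the one reported, which is consistent with the reported value being a tiny lower limit of what the software displays rather than an exact figure.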
