
14.74x Education Unit


Homework Assignment


Overview

We have provided this PDF copy of the assignment so that you can print and work through the assignment
offline. You can also go directly to complete the assignment online. If you choose to work through the
assignment using this PDF, please go back to the online platform to submit your answers based on the output
produced.

In R, you will need to run the following code to install and use the required packages for this assignment.

install.packages("dummies")
install.packages("AER")
library(dummies)
library(AER)

Unit 1: Instrumental Variables and the Return to Education

This problem deals with the INPRES school construction program discussed by Duflo (2001) in “Schooling and
Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy
Experiment.” Please familiarize yourself with the paper before you begin. A link to the paper is available
through the course website. This question will cover many of the non-experimental methods commonly used
in economics. You will need to download the data inpres_data.csv (csv file) from the course website.

1) Run a regression of log monthly earnings on education. What is the estimated impact of an extra year
of education on log monthly earnings?
a. 11.40, significant at the 5% level
b. 2.72, significant at the 5% level
c. 0.7, not significant at the 5% level
d. 0.077, significant at the 5% level
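A minimal sketch of how such a regression could be run in R is shown here. It uses simulated data rather than inpres_data.csv, and the column names `education` and `log_wage` are hypothetical; with the real file, the data would be loaded with read.csv() and the actual column names substituted.

```r
# Simulated stand-in for inpres_data.csv. The column names `education` and
# `log_wage` are hypothetical; substitute the names used in the real file.
set.seed(1)
n <- 1000
sim <- data.frame(education = sample(0:16, n, replace = TRUE))
sim$log_wage <- 10 + 0.08 * sim$education + rnorm(n, sd = 0.5)

# OLS of log monthly earnings on years of education
fit <- lm(log_wage ~ education, data = sim)

# Estimate, standard error, t value and p value for the education slope
slope <- coef(summary(fit))["education", ]
print(round(slope, 4))
```

The slope is the estimated change in log earnings associated with one more year of education, and the p value indicates whether it is significant at the 5% level.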

2) True or false: This is a causal relationship.


a. True
b. False

Duflo estimates the causal effect of education on earnings in Indonesia by looking at the impact of a school
building policy. She uses a difference-in-differences strategy to find the effect of the policy on education and
earnings, and then uses the policy as an instrumental variable to estimate the causal effect of education on
earnings. We will replicate these steps. (You will get slightly different numbers because we are not worrying
about survey weights).

3) First we will estimate the impact of the program on educational attainment using difference in
differences. People born in 1968 or later were exposed to the program. Generate a dummy variable
that takes the value 1 if the person was born in these years and 0 otherwise. We will construct a
difference-in-differences table with four cells across low/high program intensity and old/young
(unexposed/exposed). Fill in the table with the average education level for that group. Your table
should be similar to Table 3 Panel A in the paper. For now, do not worry about standard errors. (Please
round to the nearest hundredth. For instance, 67.89 would be accepted if the correct answer is
67.8912.)

                              Old (Born       Young (Born
                              before 1968)    1968 or later)    Difference
Low Intensity (Control)       A               B                 C
High Intensity (Treatment)    D               E                 F
Difference                    G               H                 I
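One way to build a table of this shape in base R is sketched below. The data are simulated, and the variable names `young`, `high`, and `educ` are assumptions for illustration, not the column names in inpres_data.csv.

```r
# Simulated data; `young`, `high` and `educ` are hypothetical names.
set.seed(2)
n <- 2000
sim <- data.frame(young = rbinom(n, 1, 0.5),   # 1 = born 1968 or later
                  high  = rbinom(n, 1, 0.5))   # 1 = high-intensity region
sim$educ <- 8 + 0.5 * sim$young - 1 * sim$high +
  0.3 * sim$young * sim$high + rnorm(n)

# 2 x 2 matrix of cell means: rows = intensity (0/1), columns = cohort (0/1)
cells <- tapply(sim$educ, list(high = sim$high, young = sim$young), mean)

col_diff <- cells[, "1"] - cells[, "0"]    # young minus old, per row (C, F)
row_diff <- cells["1", ] - cells["0", ]    # high minus low, per column (G, H)
dd <- col_diff["1"] - col_diff["0"]        # difference in differences (I)

# Assemble the full 3 x 3 table in the same layout as the assignment
tab <- rbind(cbind(cells, Difference = col_diff),
             Difference = c(row_diff, dd))
dimnames(tab) <- list(c("Low", "High", "Difference"),
                      c("Old", "Young", "Difference"))
print(round(tab, 2))
```

The bottom-right cell can be reached either by differencing the last column or by differencing the last row; both routes give the same number.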

4) Consider only the regions where many schools were built due to INPRES (high intensity). What is the
average difference in outcomes between people who were young enough to take advantage of the
schools and those who were too old to benefit from the schools? (Please round to the nearest
hundredth).

Thought exercise: Under what assumptions would this be an unbiased estimate of the causal effect of
the program on education? Do you think this is an unbiased estimate in this case?


5) Now consider only people who were young enough to be exposed to the INPRES program. What is the
average difference in education between such people in regions with high and low program intensity?
(Please round to the nearest hundredth).

Thought exercise: Under what assumptions would this be an unbiased estimate of the causal effect of
the program on education? Do you think this is an unbiased estimate in this case?

6) What is the DD estimate of the effect of the program on education? (Please round to the nearest
hundredth).

Thought exercise: Under what assumptions would this be an unbiased estimate of the causal effect of
the program on education? Do you think this is an unbiased estimate in this case?







7) Now create a similar table, this time for log earnings instead of education. What is the DD estimate of
the effect of the program on log earnings? (Please round to the nearest thousandth. For example,
67.891 would be accepted if the correct answer is 67.8912.)

                              Old (Born       Young (Born
                              before 1968)    1968 or later)    Difference
Low Intensity (Control)       A               B                 C
High Intensity (Treatment)    D               E                 F
Difference                    G               H                 I



8) Table 3 Panel A presents DD tables similar to those created in questions 3 and 7, comparing those
aged 2 to 6 in 1974 to those aged 12 to 17 in 1974. Panel B of Table 3 performs the same exercise
comparing those aged 12 to 17 in 1974 to those aged 18 to 24 in 1974. Which of the following can be
concluded from the fact that the DD estimates in Panel B are much closer to 0?
a. That the assumption of parallel trends among different populations holds up and DD is a valid
estimation strategy
b. That the findings from panel A were most likely found by chance
c. That the findings from panel A are not persistent across time
d. That the findings from panel A would be different if looking at a different and older population


9) Recall that DD can also be computed in regression form. Which of the following regression
specifications would provide the same estimate for the impact of the program on log earnings?
a. Log earnings = α + β*Young + γ*High Intensity + ε , where β + γ gives the impact estimate
b. Log earnings = α + β*Young + γ*High Intensity + ε , where γ gives the impact estimate
c. Log earnings = α + β*Young + γ*High Intensity + δ*Young*High Intensity + ε , where δ gives the
impact estimate
d. Log earnings = α + β*Young + γ*High Intensity + δ*Young*High Intensity + ε , where β + γ gives
the impact estimate


10) Recall the Wald estimator for instrumental variables, which is


Run the DD regression for estimating the impact of the program on education. Run the DD regression
for estimating the impact of the program on log earnings. Using the resulting output, what is the Wald
estimate of the effect of education on log earnings, using INPRES exposure as the instrument? (Please
round to three decimal places.)

Thought exercise: What additional assumptions (in addition to those mentioned in question 6) are
required to interpret the Wald estimate as a causal relationship? Do you think this is an unbiased
estimate in this case?
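For reference, a common way to write the Wald estimator with a binary instrument is sketched here. The notation is an assumption and may differ from the course notes: Y is log earnings, S is years of schooling, and Z is the binary instrument. In the DD setting, the numerator and denominator are the two DD estimates (the reduced form and the first stage).

```latex
\[
\hat{\beta}_{\text{Wald}}
  = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}
         {E[S \mid Z = 1] - E[S \mid Z = 0]}
\]
\[
\hat{\beta}_{\text{Wald}}^{DD}
  = \frac{\widehat{DD}_{\text{log earnings}}}
         {\widehat{DD}_{\text{education}}}
\]
```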


11) We can also compute the Wald estimate using IV regression. In an IV-DD setup, the two main effect
terms (i.e. program intensity and age of exposure) are included as controls, and the variable
representing the treatment (i.e. the interaction term) is used as the instrument. That is, the two stages
of the regression are:



where δ2 is the first-stage effect of the program on schooling and δ1 is our IV estimate of the causal
effect of schooling on log earnings. Hint: use ivreg() in R.

Run the regression corresponding to this Wald estimate and confirm that it returns the same value.
What is the coefficient, and what is the standard error? (Please provide your answer to three decimal
places.)
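A hedged sketch of this IV-DD setup on simulated data follows. The variable names (`young`, `high`, `z`, `educ`, `lwage`) are hypothetical. The block computes the point estimate by manual two-stage least squares in base R, compares it with the ratio of the reduced-form and first-stage coefficients, and calls ivreg() from AER only if that package is installed.

```r
# Simulated data; all variable names are hypothetical.
set.seed(4)
n <- 5000
sim <- data.frame(young = rbinom(n, 1, 0.5), high = rbinom(n, 1, 0.5))
sim$z     <- sim$young * sim$high   # interaction term, used as the instrument
sim$educ  <- 8 + 0.4 * sim$young - 0.8 * sim$high + 0.5 * sim$z + rnorm(n)
sim$lwage <- 10 + 0.1 * sim$educ + 0.05 * sim$young - 0.02 * sim$high +
  rnorm(n, sd = 0.5)

# First stage: schooling on the main effects and the instrument
stage1 <- lm(educ ~ young + high + z, data = sim)
sim$educ_hat <- fitted(stage1)

# Second stage on fitted schooling. This gives the 2SLS point estimate only;
# the standard errors reported by this lm() are not valid 2SLS standard errors.
stage2 <- lm(lwage ~ young + high + educ_hat, data = sim)
iv_manual <- unname(coef(stage2)["educ_hat"])

# Ratio of reduced-form to first-stage coefficient on the instrument
reduced <- lm(lwage ~ young + high + z, data = sim)
wald <- unname(coef(reduced)["z"] / coef(stage1)["z"])

print(c(manual_2sls = iv_manual, wald = wald))

# ivreg() reports the same coefficient with appropriate standard errors.
if (requireNamespace("AER", quietly = TRUE)) {
  iv_fit <- AER::ivreg(lwage ~ educ + young + high | young + high + z,
                       data = sim)
  print(summary(iv_fit)$coefficients["educ", ])
}
```

In the ivreg() formula, the part after `|` lists the instruments: the excluded instrument `z` plus the controls `young` and `high`, which serve as their own instruments.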


12) Thought exercise (no credit given): Duflo improves the precision of this estimate in two ways. First, she
takes advantage of the fact that there is more variation in the number of schools built than just
high/low intensity. Instead, she uses the full distribution of the number of schools in her instrumental
variable. Second, she controls for differences in cohort of birth, region of birth, and the number of
children in the region at the time of school building. Why might each of these actions improve the
precision of the result?


13) Next we will run an IV regression with a continuous experiment and controls as in the table titled
“Effect of education on labor market outcomes. OLS and 2SLS estimates” in the paper. (Table 7 in the
AER version, Table 6 in the version linked from the course webpage).

To do so, first generate the instrument by interacting the number of new schools built in one’s district
with the dummy for being of age to be exposed. Next generate control variables to remove unwanted
variation from the estimation. Generate the following control variables:
i. Dummy variables for each possible year of birth
ii. Dummy variables for each possible district of birth
iii. A variable interacting the dummy for age exposure with the number of children in the
district in 1971.

Run the IV regression on schooling with the instrument and the controls you have created (you do not
need to include any other variables). What is the coefficient? (Please round to three decimal places).

Thought exercise: What does the result tell us about the effect of an extra year of schooling on
earnings? Is it statistically significant? How does it compare to the relationship found in question 1?


Unit 2: Returns to education and education decisions

1) Suppose that the returns to education are given by the equation:

yi = ai * Si

Which of the following lines represents the returns to schooling, where returns are on the Y-axis and
years of schooling are on the X-axis for any given ability ai?





2) Imagine that the marginal cost of school increases with the number of years you are in school. Which
of the following sets of indifference curves represents this graphically, where the arrow indicates the
direction of increasing utility?






3) Which of the following are reasons that the cost of education may increase with years of education?
(Check all that apply)
a. Older children have a higher opportunity cost due to their ability to work
b. It may be more costly to teach more advanced subjects and/or older students
c. Governments are more likely to subsidize lower levels of education than higher levels of
education


4) Imagine that the cost of school as a function of years of schooling is given by:

If there are no financing constraints (i.e. parents can spend as much as they want on school, and if they
have to borrow they can pay it back when the child earns the returns), what level of schooling will
parents choose? Call this value Si*.


a. Si* = 2*ai
b. Si* = ai / 2
c. Si* = ai – 2Si
d. Si* = ai^2
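As a sketch of the reasoning behind this question, suppose (an assumption about the setup) that unconstrained parents choose schooling to maximize returns net of cost, with returns equal to ability times years of schooling as in question 1, and with a generic increasing, convex cost function c standing in for the one given in the question. The optimum then equates the marginal return to the marginal cost:

```latex
\[
\max_{S_i} \; a_i S_i - c(S_i)
\quad\Longrightarrow\quad
a_i = c'(S_i^{*})
\]
```

Substituting the cost function from the question into this condition gives the optimal schooling level in terms of ability.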


5) Suppose a parent is constrained to spend no more than ӯ on schooling. What will be her choice of
schooling when:



where Si* is the optimal amount of schooling chosen?

a. Si = ӯ
b. Si < ӯ
c. Si = ӯ


6) Now suppose that parents believe that the returns to education are in fact given by a piecewise
function:



Which of the following lines represents the perceived returns to schooling?


7) If parents are unconstrained, what level of schooling will they choose with these beliefs? How does
this relate to ability level?
a. Fewer years of schooling than before, regardless of ability level
b. Fewer years of schooling than before, as long as ability levels are not too low, in which case
zero years of schooling
c. The same level of schooling as before, regardless of ability level
d. The same level of schooling as before, as long as ability levels are not too low, in which case
fewer years of schooling


8) Now imagine parents face the spending constraints as in question 5 and have beliefs as defined in
question 6. Additionally, ai is sufficiently high such that Si* > S . What will be their choice of schooling
when ӯ < S2?
a. Parents will select Si*
b. Parents will optimize, spending as close to their optimum as they can
c. Si = 0


9) Consider a parent with beliefs as described in question 6, and two children of abilities a1 and a2 where
a1 > a2. Assume that a1 and a2 are both sufficiently large such that Si* > S for each child. Suppose this
parent faces no spending constraints. True or false: Both children will be educated equally.
a. True
b. False


10) Now imagine this parent faces spending constraints. In particular, assume she cannot spend more than
ӯ where S2 < ӯ < 2 * S2. True or false: Both children will receive at least some education, but child 1 will
receive more years of education than child 2.
a. True
b. False
