Problem Set 2 Solutions

Q1.
a) For column 1:
The first point estimate indicates that the proportion of pupils with moderate to heavy infection
is 25 percentage points lower in Group 1 schools than Group 2 schools in early 1999 and this
effect is statistically significant at 99 percent confidence.
The second point estimate indicates that each additional thousand pupils attending Group 1
schools located within three kilometers of a school is associated with 26 percentage points fewer
moderate-to-heavy infections, and this coefficient estimate is significantly different than zero at
99 percent confidence.
The third point estimate indicates that each additional thousand pupils attending a Group 1
school located between three and six kilometers away is associated with 14 percentage points
fewer moderate-to-heavy infections. This is smaller than the effect of pupils within three
kilometers, as expected, and is significantly different than zero at 95 percent confidence.
The fourth point estimate indicates that each additional thousand pupils attending schools in
all groups (both treatment and control) located within three kilometers of a school is associated
with 11 percentage points greater moderate-to-heavy infections and is significantly different
than zero at 99 percent confidence.
The fifth point estimate indicates that each additional thousand pupils attending schools in
all groups (both treatment and control) located between three and six kilometers away is
associated with 13 percentage points greater moderate-to-heavy infections and is significantly
different than zero at 95 percent confidence.
b) For column 2:
The first point estimate indicates that within schools in Group 1, that is those schools receiving
deworming treatment, the within-school externality effect is a 12 percentage point reduction in
the proportion of moderate-to-heavy infections. In other words, within schools in Group 1 those
children who for various reasons did not receive deworming medication had a rate of
moderate-to-heavy infection 12 percentage points lower than they would have had without the
intervention. This estimate is significant at the 90% confidence level.
The second point estimate indicates that for each additional thousand pupils attending Group 1
schools located within three kilometers of a school, the within-school externality effect (i.e.
within schools in Group 1 those children who for various reasons did not receive deworming
medication) is associated with 26 percentage points fewer moderate-to-heavy infections and this
coefficient estimate is significantly different than zero at 95 percent confidence.
The third point estimate indicates that for each additional thousand pupils attending Group 1
schools located between three and six kilometers away, the within-school externality
effect (i.e. within schools in Group 1 those children who for various reasons did not receive
deworming medication) is associated with 13 percentage points fewer moderate-to-heavy
infections, and this coefficient estimate is significantly different than zero at 95 percent
confidence.
The fourth point estimate indicates that for each additional thousand pupils attending schools in
all groups (both treatment and control) located within three kilometers of a school, the
within-school externality effect (i.e. within schools those children who for various reasons did
not receive deworming medication) is associated with 11 percentage points greater
moderate-to-heavy infections and is significantly different than zero at 90 percent confidence.
The fifth point estimate indicates that for each additional thousand pupils attending schools in
all groups (both treatment and control) located between three and six kilometers away, the
within-school externality effect (i.e. within schools those children who for various reasons did
not receive deworming medication) is associated with 13 percentage points greater
moderate-to-heavy infections and is significantly different than zero at 95 percent confidence.
The sixth point estimate indicates that if you offer pills to a school, the kids who will end up
accepting the pill tend to have already lower infection rates by 6 percentage points even in an
environment where nobody in the school had been offered the pill, and this is significantly
different than zero at 90 percent confidence.
The last point estimate indicates the additional direct effect of the deworming treatment: within
treatment schools, the kids who do end up taking the pill have infection rates that are a further
14 percentage points lower, and this is significantly different than zero at 90 percent
confidence.
c) Using the results in column 1, the average effect of the intervention in treatment schools after
the first year, without counting spillover effects from neighboring schools, is a reduction of 25
percentage points: the point estimate indicates that the proportion of pupils with moderate-to-heavy
infection is 25 percentage points lower in Group 1 schools than in Group 2 schools in early
1999.
Moderate-to-heavy helminth infections among children in this area were 23 percentage
points (standard error 7 percentage points) lower on average in early 1999 as a result of health
spillovers across schools, which is over forty percent of the overall moderate-to-heavy infection
rate in Group 2 schools. The average spillover gain is the average number of Group 1 pupils located
within three kilometers divided by 1000 ($N_{03}$) times the average effect of an additional 1000
Group 1 pupils located within three kilometers on infection rates ($\gamma_{03}$), plus the analogous
spillover effect due to schools located between three and six kilometers away from the school.
Based on the externality estimates in column 1, the estimated average cross-school
externality reduction in moderate-to-heavy helminth infections is
$\gamma_{03} N_{03} + \gamma_{36} N_{36} = [0.26 \times 454 + 0.14 \times 802]/1000 = 0.23$,
or 23 percentage points (plugging in the numbers mentioned in the paper, with the coefficients
written as magnitudes of the reductions).
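As a quick check of the arithmetic in part c), the following R sketch recomputes the average cross-school spillover. The variable names are illustrative (they are not taken from the paper), and the coefficients are entered as magnitudes of the reductions, as in the calculation in the text.

```r
# Magnitudes of the column 1 externality coefficients
# (change in infection rate per additional 1000 Group 1 pupils nearby)
gamma_03 <- 0.26  # within 3 km
gamma_36 <- 0.14  # between 3 and 6 km

# Average number of Group 1 pupils nearby, as quoted in part c)
n_03 <- 454
n_36 <- 802

# Average cross-school spillover, as a proportion
spillover <- (gamma_03 * n_03 + gamma_36 * n_36) / 1000
print(round(spillover, 2))
```

The unrounded value is 0.23032, which rounds to the 23 percentage points reported in the answer.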
d) It was important for the authors to know which children in Group 2 would accept the
treatment when it was offered to them in the next school year because the kids who end
up taking the pill are not on average the same as those kids who don't end up taking the pill.
Knowing this helps the authors understand which types of kids are endogenously selecting into
taking the pill. The results indicate that if you offer pills to a school, the kids who will end up
accepting the pill already tend to have infection rates that are 6 percentage points lower, even in
an environment where nobody in the school has been offered the pill. The authors need to know
this because take-up is not randomly assigned, so a simple comparison of treated and untreated
children within a school would not isolate the effect of the pill.
Q2.
a) The authors need to use an instrument for trade openness, instead of simply regressing a
measure of income on a measure of trade openness since it is hard to establish a causal
relationship between the two as countries whose incomes are high for reasons other than trade
may trade more. This will result in a bias in the OLS regression since there is a positive
correlation between trade and the error term. It is important to use an instrument for trade
openness as this instrument will have important effects on trade while being plausibly
uncorrelated with other determinants of income, and hence can be used to determine a causal
relationship between trade openness and income.
b) I would expect the coefficient to be an overestimate of the causal impact of trade on income.
There are four main reasons for this. First, countries that adopt free-trade policies are likely to
adopt other policies that raise income. Second, countries that are wealthy for reasons other than
trade are likely to have better transportation and infrastructure systems. Third, countries that
are poor for reasons other than low trade may lack the institutions and resources needed to tax
domestic economic activity, and may thus have to rely on tariffs to finance government
spending. Lastly, increases in income from sources other than trade may increase the variety of
goods that households demand and shift the composition of their demand away from basic
commodities towards more processed, lighter weight goods.
c) The authors use a country’s geographical proximity to other countries as an instrument for trade
openness. To construct this instrumental variable and test for the causal effect of trade on
income, the authors first estimate a bilateral trade equation and then aggregate the fitted values
of the equation to estimate the geographic component of a country’s overall trade. In contrast to
conventional gravity equations for bilateral trade, the trade equation includes only geographic
characteristics: countries’ sizes, their distances from each other, whether they share a border,
and whether they are landlocked. This ensures that the instrument depends only on countries’
geographic characteristics, not on their incomes or actual trading patterns. These geographical
characteristics are important determinants of countries’ overall trade.
d) The assumptions that need to hold true are as follows:

i) Instrument Relevance: The instrumental variable needs to be correlated with the
independent variable. This means that the countries’ geographical characteristics need
to be correlated with overall trade.
ii) Instrument Exogeneity: The instrumental variable cannot be correlated with anything
unobserved (which we cannot control for!) that affects the dependent variable. This
restriction implies that the error term of the income equation cannot be correlated with
the instrumental variable. This means that countries’ geographical characteristics cannot
be correlated with unobserved determinants of the countries’ income.
iii) Instrument Exclusion Restriction: This restriction implies that the dependent variable is
only affected by the instrument through its effect on the independent variable. Put
differently, the instrumental variable cannot have a direct effect on the dependent
variable once we control for the effect through the independent variable. This implies
that countries’ income is only affected by the countries’ geographical characteristics
through their effect on the countries’ trade.
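The roles of these assumptions can be illustrated with simulated data. The sketch below is not the authors' procedure: all variable names and parameter values are made up for illustration, with z standing in for a geography-based instrument, x for trade, u for an unobserved determinant of income, and the true effect of x on y set to 1.

```r
set.seed(1)
n <- 5000

z <- rnorm(n)                       # instrument, independent of u (exogeneity)
u <- rnorm(n)                       # unobserved determinant of y
x <- 0.8 * z + 0.6 * u + rnorm(n)   # relevance: x depends on z; endogeneity: x depends on u
y <- 1.0 * x + 1.0 * u + rnorm(n)   # exclusion: z affects y only through x

# OLS is biased because x is correlated with the error term (u)
ols_est <- coef(lm(y ~ x))[["x"]]

# Manual two-stage least squares: regress x on z, then y on the fitted values
x_hat <- fitted(lm(x ~ z))
tsls_est <- coef(lm(y ~ x_hat))[["x_hat"]]

# With one instrument, 2SLS equals the ratio cov(y, z) / cov(x, z)
wald_est <- cov(y, z) / cov(x, z)

print(c(ols = ols_est, tsls = tsls_est, wald = wald_est))
```

In this setup OLS is pushed above 1 because x and u are positively correlated, while the IV estimate stays close to 1. The standard errors from the manual second stage are not valid, so a dedicated IV routine should be used for inference.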
e) Figure 1 shows that geographic variables account for a major part of the variation in overall
trade. The correlation between $T$ and $\hat{T}$ is 0.62. Table 1 shows that distance has a large and
overwhelmingly significant negative impact on bilateral trade; the estimated elasticity of trade
with respect to distance is -0.85. Trade between country i and country j is strongly increasing in
j’s size; the elasticity with respect to j’s population is about 0.6. In addition, trade (as a fraction of
i’s GDP) is decreasing in i’s size (elasticity of about -0.24) and in j’s area (about -0.19). And if one
of the countries is landlocked, trade is lower (coefficient of about -0.36). Because only a small
fraction of country pairs share a border, the coefficients on the common border variables are not
estimated precisely.
Nonetheless, the point estimates imply that sharing a border has a considerable effect on trade.
Evaluated at the mean value of the variables conditional on sharing a border, the estimates imply
that a common border raises trade by a factor of 2.2. The estimates also imply that the presence
of a common border alters the effects of the other variables substantially. For example, the
estimated elasticity with respect to country j’s population across a shared border is 0.47 rather
than 0.61, and the estimated elasticity with respect to distance is -0.70 rather than -0.85. Most
importantly, Table 1 shows that geographic variables are major determinants of bilateral trade.
The $R^2$ of the regression is 0.36, which supports the assumption of instrument relevance.
f) Table 3 reports the regressions. Column (1) is an OLS regression of log income per person on a
constant, the trade share, and the two size measures. The regression shows a statistically and
economically significant relationship between trade and income. The t-statistic on the trade
share is 3.5; the point estimate implies that an increase in the share of one percentage point is
associated with an increase of 0.9 percent in income per person. The regression also suggests
that, controlling for international trade, there is a positive (though only marginally significant)
relation between country size and income per person; this supports the view that within-country
trade is beneficial. The point estimates imply that increasing both population and area by one
percent raises income per person by 0.1 percent.
Column (2) reports the IV estimates of the same equation. The trade share is treated as
endogenous, and the constructed trade share is used as an instrument. The coefficient on
trade rises sharply. That is, the point estimate suggests that examining the link between trade
and income using OLS understates rather than overstates the effect of trade. The estimates now
imply that a one-percentage-point increase in the trade share raises income per person by 2.0
percent. In addition, the hypothesis that the IV coefficient is zero is marginally rejected at
conventional levels (t = 2.0). The coefficient is much less precisely estimated under IV than under
OLS, however. As a result, the hypothesis that the IV and OLS estimates are equal cannot be
rejected (t = 1.2). Moving from OLS to IV also increases the estimated impact of country size. The
estimated effect of raising both population and area by one percent is now to increase income
per person by almost 0.3 percent. This estimate is marginally significantly different from zero (t =
1.8). One interesting aspect of the results concerning size is that the coefficient on area is
positive. One might expect increased area, controlling for population, to reduce within-country
trade and thus lower income. One possibility is that the positive coefficient is due to sampling
error: the t-statistic on area is slightly less than one. Another is that greater area has a negative
impact via decreased within-country trade, but a larger positive impact via increased natural
resources. Because of this possibility, the authors focus on the sum of the coefficients on log
population and log area in their discussion. This sum shows the effects of
increased size with population density held constant. In addition, the authors show that using
population alone to measure size has no major impact on the results.
g) The point estimate suggests that examining the link between trade and income using OLS
understates rather than overstates the effect of trade, which is the opposite of my answer in part
b). The IV estimates of trade’s impact on income are much larger than the OLS estimates, and are
marginally significantly different from zero. There are two leading explanations of the fact that
the IV estimate of trade’s impact exceeds the OLS estimate. The first is that it is due to sampling
variation. That is, although there is no reason to expect systematic correlation between the
instrument and the residual, it could be that by chance they are positively correlated. The
principal evidence supporting this possibility is that the differences between the IV and OLS
estimates, though quantitatively large, are well within the range that can arise from sampling
error. In the baseline regressions [columns (1) and (2) of Table 3], the t-statistic for the null
hypothesis that the OLS and IV estimates are equal is 1.2 (p = 0.25). In the variations on the
baseline regression that the authors consider, this t-statistic never exceeds 2, and it is almost
always less than 1.5. Moreover, in a few cases it is essentially zero. Thus, if one believes that theory
provides strong grounds for believing that OLS estimates are biased up, the IV estimates do not
provide a compelling reason for changing this belief. The second candidate explanation of the finding that
the IV estimates exceed the OLS estimates is that OLS is in fact biased down. The literal shipping
of goods between countries does not raise income. Rather, trade is a proxy for the many ways in
which interactions between countries raise income—specialization, spread of ideas, and so on.
Trade is likely to be highly, but not perfectly, correlated with the extent of such interactions.
Thus, trade is an imperfect measure of income-enhancing interactions among countries. And
since measurement error leads to downward bias, this would mean that OLS would lead to an
understatement of the effect of income-enhancing interactions.
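The measurement-error argument can be illustrated with a small simulation. This is a sketch under made-up assumptions, not an analysis of the paper's data: x_true stands for the unobserved income-enhancing interactions, x_obs for a noisy proxy such as measured trade, and the true effect is set to 2.

```r
set.seed(42)
n <- 10000

x_true <- rnorm(n)              # true (unobserved) regressor, variance 1
y <- 2 * x_true + rnorm(n)      # true effect of x_true on y is 2
x_obs <- x_true + rnorm(n)      # proxy with classical measurement error, noise variance 1

slope_true <- coef(lm(y ~ x_true))[["x_true"]]
slope_obs <- coef(lm(y ~ x_obs))[["x_obs"]]

# Classical attenuation factor: var(signal) / (var(signal) + var(noise)), about 0.5 here
reliability <- var(x_true) / var(x_obs)

print(c(slope_true = slope_true, slope_obs = slope_obs, reliability = reliability))
```

The slope on the noisy proxy is roughly the true slope times the reliability ratio, so OLS on the mismeasured regressor understates the effect, which is the direction of bias described in the second explanation.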
Q3.
All code at the end of the document
a) We learn that there are a total of 1153 Mexican municipalities in the sample for the year
2000. Average education levels and household incomes vary widely across municipalities.
Average education ranges from approximately 2.5 years to about 12 years (mean of about 7
years), while monthly household income ranges from about $300 to $14,800 (mean of about
$2,583).
b) The coefficient of 0.183 on educ_years in the regression of log income means that one additional
year of average education is associated with approximately 18.3% higher local monthly
household income. The t-statistic is 26.14 and we can conclude that the
coefficient is statistically significant at the 1% level.
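The t-statistic and the percentage interpretation can be reproduced from the rounded coefficient and robust standard error shown in the regression output. The sketch below uses only those two rounded numbers, so the results are approximate; the variable names are illustrative.

```r
beta_educ <- 0.183  # coefficient on educ_years in the log-income regression
se_educ <- 0.007    # robust (HC1) standard error from the output

# t-statistic for the null hypothesis that the coefficient is zero
t_stat <- beta_educ / se_educ

# In a log-linear model, 100 * beta is an approximation;
# the exact percentage change is 100 * (exp(beta) - 1)
approx_pct <- 100 * beta_educ
exact_pct <- 100 * (exp(beta_educ) - 1)

print(round(c(t_stat = t_stat, approx_pct = approx_pct, exact_pct = exact_pct), 2))
```

The approximation (18.3%) and the exact figure (about 20.1%) differ because the coefficient is not very small, which is worth keeping in mind when quoting the effect.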
c) Three plausible reasons why the point estimate in b) could be biased upwards or
downwards relative to the true causal effect of education on monthly household income are
the following:
a. Omitted variable bias: Omitted variable bias is the bias in the OLS estimator that arises when
the regressor is correlated with an omitted variable and the omitted variable is a determinant of
the dependent variable. For example, suppose workers have some unobservable "ability". We
would expect ability to be positively related to income and we would expect more able workers
to choose more education. This would bias the OLS estimate upwards.
b. Reverse causality: While changes in education can change income, changes in income can also
cause changes in average years of education, for example if richer municipalities can afford
more schooling. This would also bias the OLS estimate upwards.
c. Attenuation bias: If average years of education are measured with error, the OLS estimate is
biased towards zero, that is, downwards.
d) The assumptions that need to hold true for the proportion of the population speaking an
indigenous language to be used as a valid instrumental variable are as follows:
a. Instrument Relevance: The instrumental variable needs to be correlated with the independent
variable. In this case, we expect that a higher proportion of indigenous-language speakers is
associated with fewer years of education on average.
b. Instrument Exogeneity: The instrumental variable cannot be correlated with anything
unobserved that affects the dependent variable. This restriction implies that the error term of
the income equation cannot be correlated with the instrumental variable. This means that the
proportion of the population speaking an indigenous language cannot be correlated with
unobserved determinants of monthly household income.
c. Instrument Exclusion Restriction: This restriction implies that the dependent variable is only
affected by the instrument through its effect on the independent variable. Put differently, the
instrumental variable cannot have a direct effect on the dependent variable once we control for
the effect through the independent variable. This implies that the proportion of the population
speaking an indigenous language cannot have a direct effect on the monthly household income
once we control for the effect through the years of education on average.
My concern is that the assumptions of instrument exogeneity and exclusion restriction
may not hold. The proportion of indigenous-language speakers is likely correlated with other
variables, such as discrimination in the labor market, that are in turn correlated with income.
Relatedly, being indigenous probably affects income through channels other than education.
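e) Yes, the assumption of instrument relevance is satisfied. The first-stage coefficient on ind_lang
is -2.421 (robust standard error 0.149): a municipality where everyone speaks an indigenous
language has on average about 2.4 fewer years of schooling than one where nobody does, and
the coefficient is statistically significant at the 1% level.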
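A common rule of thumb treats a first-stage F-statistic above 10 as evidence against a weak instrument. With a single instrument, the F-statistic is the square of the first-stage t-statistic, so it can be approximated from the rounded coefficient and robust standard error in the first-stage output. The sketch below is an approximation from those rounded values, not output from the original analysis.

```r
first_stage_coef <- -2.421  # coefficient on ind_lang in the first stage
first_stage_se <- 0.149     # robust (HC1) standard error

# t-statistic on the instrument in the first stage
t_first <- first_stage_coef / first_stage_se

# With one instrument, the first-stage F-statistic equals t^2
f_first <- t_first^2

print(round(c(t_first = t_first, f_first = f_first), 1))
```

The implied F-statistic is far above 10, which is consistent with the conclusion that the relevance assumption holds.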
f) We can interpret the IV regression such that one additional year of education is associated
with an increase in local monthly household income of about 31% on average. We can conclude
that the coefficient is statistically significant at the 1% level. This is not what we expected: we
were expecting the "true" causal effect of education on income to be smaller than the OLS
estimate of 18.3%, but in this case it is larger.
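With a single instrument, the IV estimate equals the reduced-form coefficient (log income on ind_lang) divided by the first-stage coefficient (education on ind_lang). The sketch below backs out the implied reduced form from the rounded estimates reported in this document; the variable names are illustrative and the result is only approximate.

```r
iv_est <- 0.31         # IV coefficient on educ_years, from part f)
first_stage <- -2.421  # first-stage coefficient on ind_lang, from part e)
ols_est <- 0.183       # OLS coefficient on educ_years, from part b)

# IV = reduced form / first stage, so reduced form = IV * first stage
reduced_form <- iv_est * first_stage

# How much larger the IV estimate is than the OLS estimate
iv_to_ols <- iv_est / ols_est

print(round(c(reduced_form = reduced_form, iv_to_ols = iv_to_ols), 2))
```

If ind_lang also lowers income directly, which would violate the exclusion restriction, the reduced form is more negative than education alone would produce and the IV estimate overstates the return to education.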
g) We can interpret the IV regression with controls such that one additional year of education is
associated with an increase in local monthly household income of about 29.8% on average. We
can conclude that the coefficient is statistically significant at the 1% level. The concerns raised in
d) remain relevant, because the proportion speaking an indigenous language is likely correlated
with other determinants of income. The results suggest that the instrumental variable is not
valid, as the IV estimate remains far above the OLS estimate (29.8% versus 18.3%) even when
accounting for controls.
ps2
data = read.csv("Mexico_PS2.csv")
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
library(lmtest)
## Loading required package: zoo
##
## Attaching package: ’zoo’
## The following objects are masked from ’package:base’:

##
## as.Date, as.Date.numeric
library(sandwich)
library(car)
## Loading required package: carData
library(ivreg)
stargazer(data, type="text",
          title="Summary Statistics",
          summary.stat = c("n", "mean", "sd", "min","max"),
          out="Summary Table.txt")
##
## Summary Statistics
## ===========================================================
## Statistic     N      Mean     St. Dev.     Min      Max
## -----------------------------------------------------------
## year        1,153  2,000.000       0.000   2,000      2,000
## municode    1,153 18,168.720   8,045.480   1,001     32,056
## inc_m       1,153  2,583.081   1,257.101 299.757 14,792.910
## ind_lang    1,153      0.114       0.221   0.000      0.984
## educ_years  1,153      7.031       1.354   2.488     11.870
## sales_hotel 1,153 27,120.140 201,033.400       1  5,229,616
## logtemp     1,153      5.283       0.212   4.658      5.671
## logprecip   1,153      4.304       0.562   1.790      5.746
## dist_us_km  1,153    696.171     278.301   6.609  1,348.003
## -----------------------------------------------------------
data$l_inc=log(data$inc_m)
reg_qa = lm(l_inc ~ educ_years , data=data)

reg_qa_robust = coeftest(reg_qa , vcov = vcovHC(reg_qa, type="HC1"))
stargazer(reg_qa_robust,
          type = "text",
          omit.stat=c("LL","ser","f","adj.rsq"))
##
## ======================================
## Dependent variable:
## ---------------------------
##
## --------------------------------------
## educ_years 0.183***
## (0.007)
##
## Constant 6.485***
## (0.052)
##
## ======================================
## ======================================
## Note: *p<0.1; **p<0.05; ***p<0.01
# First stage: average years of education on the share speaking an indigenous language
reg_qe = lm(educ_years ~ ind_lang, data= data)

reg_qe_robust = coeftest(reg_qe, vcov = vcovHC(reg_qe, type="HC1"))
stargazer(reg_qe_robust,
          type = "text",
          omit.stat=c("LL","ser","f","adj.rsq"))
##
## ====================================
## Dependent variable:
## ---------------------------
##
## ------------------------------------
## ind_lang -2.421***
## (0.149)
##
## Constant
## (0.042)
##
## ====================================
## ====================================
## Note: *p<0.1; **p<0.05; ***p<0.01
# IV regression: log income on education, instrumented by ind_lang
reg_qf = ivreg(l_inc ~ educ_years | ind_lang, data = data)

reg_qf_robust = coeftest(reg_qf, vcov = vcovHC(reg_qf, type="HC1"))
stargazer(reg_qf_robust,
          type = "text",
          omit.stat=c("LL","ser","f","adj.rsq"))
##
## ======================================
## Dependent variable:
## ---------------------------
##
## --------------------------------------
## educ_years
## (0.021)
##
## Constant
## (0.148)
##
## ======================================
## ======================================
## Note: *p<0.1; **p<0.05; ***p<0.01
# IV regression with controls. The tail of this call is cut off in the source;
# it is assumed here that the exogenous controls instrument for themselves.
reg_qg_controls = ivreg(l_inc ~ educ_years + dist_us_km + logtemp + logprecip + sales_hotel |
                          ind_lang + dist_us_km + logtemp + logprecip + sales_hotel, data = data)

reg_qg_controls_robust = coeftest(reg_qg_controls, vcov = vcovHC(reg_qg_controls, type="HC1"))
stargazer(reg_qg_controls_robust,
          type = "text",
          omit.stat=c("LL","ser","f","adj.rsq"))
##
## =======================================
## Dependent variable:
## ---------------------------
##
## ---------------------------------------
## educ_years
## (0.032)
##
## dist_us_km -0.0002***
## (0.00005)
##
## logtemp 0.102*
## (0.057)
##
## logprecip 0.005
## (0.032)
##
## sales_hotel -0.00000
## (0.00000)
##
## Constant
## (0.499)
##
## =======================================
## =======================================
## Note: *p<0.1; **p<0.05; ***p<0.01
