Professional Documents
Culture Documents
What you will learn: This project will give you experience in handling data, take
economics to the data and presenting data. You are also going to apply OLS regression as
well as instrumental variable technique. We encourage you to use your knowledge of
economics to interpret the results.
While this document contains a number of questions, these should not be answered
individually but integrated into a coherent empirical part of an applied economics paper.
Please also include a short introduction that motivates your research. In addition add a part
on your empirical strategy, a description of the data, a discussion of results and tables and
a short conclusion.
All analysis should be done in Stata or R and documented in the do-file / R-script that you
should include as an appendix to your project document.
Background reading: Start by reading the paper on which the project is based on:
Alan Krueger estimates the effect of class size on student achievement using data from the
Tennessee Student-Teacher Achievement Ratio experiment. Over a four-year period,
starting in the fall of 1985, Project STAR randomly assigned elementary school students
and teachers to classes of different sizes.
2
Data: We provide you with two datasets that are part of the original data used in Krueger
(1999). The data set does not include all the information of the original data set and we
therefore needed to impute a few variables. Note that your project focuses on kindergarten
and first grade pupils such that a number of variables in this data set are not important for
your work. In particular, we only have reading and math scores as well as imputed class
identifiers and imputed class size. Note the particularity of the data concerning first grade
children: One part of grade one pupils are new STAR participants and on those we don’t
have information what they did in kindergarten. Another part of first grade pupils were
new STAR participants in kindergarten and staid in STAR the next year and therefore we
also have information for first grade.
There are two data sets:
star_id.dta:
- The identifier data set contains the following variables:
1. Data Preparation
a) Merge the two data sets star_id.dta and star_student_teacher in order create one data
set that contains the joint information on pupils, their teachers and the identifiers
concerning school and class. Think about what constitutes the unit of observation in
the data sets before merging. Save the data set after merging. Describe the data
structure which should cover the following topics:
o What is the format of the datasets (unit of observations, cross-sectional, panel
data etc.)
o How many observations and variables are there? Are there any missing
values? Note that you can concentrate on the kindergarten and first grade.
2. Descriptive Statistics
a) Create the following variables from the merged data: receipt of free/reduced price
lunch in kindergarten and first grade, white or Asian student race (indicator), age in
1985, an attrition indicator (as described in note d to Table 1) in kindergarten and first
grade and the average math and reading percentile test score (as described in footnote
11) in kindergarten and first grade. Present summary statistics for the full sample.
When producing the variables make sure to treat missing values correctly.
b) Calculate means of these six variables and class size by treatment status (small class,
regular class, and regular class with aide) for new STAR participants only in
kindergarten and first grade. Your answers should be consistent with those presented
in Table 1 of Krueger (1999).] Comment on the table.
c) Test the null hypothesis that there are not significant differences in the means of these
variables by treatment status. Give p-values in your answer. [Hint: First, generate
dummies for treatment status. In Stata, you would use the command regress on two
dummies for the treatment status and then the command test].Again, focus only on
new STAR participants. Comment on the results.
d) Create density graphs of the average math and reading percentile test score for small
and regular class sizes imitating Figure 1 of Krueger. Comment on the results.
3. Random Assignment
4
a) Some researcher claims that the tests conducted in (1) support the assertion that the
treatment was randomly assigned to students and teachers. Explain. If you do not
agree, give a short discussion.
b) In fact, the treatment was randomly assigned to students and teachers within schools.
For each of the variables employed in (2), test the null hypothesis that, conditional on
school of attendance, there are no significant differences across treatment status. Once
again, give the p-values in your answer. [Hint: In Stata, you would once again use the
command regress. xi is useful in creating dummy variables. In R you can use the
as.factor()function. Your results should be consistent with those presented in
Table 2 of Krueger.] Note, that you do not need to produce this table in an automated
way.
c) Are the results in (b) consistent with the fact that the treatment was randomly assigned
conditional on school of attendance? Explain.
4. Regression Estimates
a) Using as a dependent variable the average percentile math/reading test score
constructed in (2), produce regression results similar to those given in columns (1) to
(3) of Table 5 in Krueger (1999) for kindergarten and first grade. Note that for first
grade not only the new star participants should be considered but also those who
already participated in kindergarten. In your regressions, assume that the error terms
are iid. [Use the xi and regress commands or as.factor() function in R]. You
should not report the results on the school dummys.
b) Interpret the coefficients on the small class and regular/aide class indicators in the first
specification.
c) How do the coefficients on the small class and regular/aide class indicators change as
more covariates are added to the model? Would you say that there is strong evidence
of selection on observables?
5. Error structure
In (1) and (2), you (should have) conducted hypothesis tests under the assumption that
regression errors were independent and identically distributed.
5
Redo the regressions in (3a), but cluster the standard errors on teacher (classidk
classid1) [Use the cluster option to regress or coeftest() in R.] Are the
standard errors higher or lower than before? Is this what you would have expected?