You are on page 1of 5

1

ECONM1008 Applied Economics Project Option 1


Class Size and Student Achievement
2019

What you will learn: This project will give you experience in handling data, take
economics to the data and presenting data. You are also going to apply OLS regression as
well as instrumental variable technique. We encourage you to use your knowledge of
economics to interpret the results.

While this document contains a number of questions, these should not be answered
individually but integrated into a coherent empirical part of an applied economics paper.
Please also include a short introduction that motivates your research. In addition add a part
on your empirical strategy, a description of the data, a discussion of results and tables and
a short conclusion.

All analysis should be done in Stata or R and documented in the do-file / R-script that you
should include as an appendix to your project document.

Background reading: Start by reading the paper on which the project is based on:

- Krueger, Alan B. (1999): Experimental Estimates of Education Production


Functions, The Quarterly Journal of Economics, 114, 497-532.

Alan Krueger estimates the effect of class size on student achievement using data from the
Tennessee Student-Teacher Achievement Ratio experiment. Over a four-year period,
starting in the fall of 1985, Project STAR randomly assigned elementary school students
and teachers to classes of different sizes.
2

Data: We provide you with two datasets that are part of the original data used in Krueger
(1999). The data set does not include all the information of the original data set and we
therefore needed to impute a few variables. Note that your project focuses on kindergarten
and first grade pupils such that a number of variables in this data set are not important for
your work. In particular, we only have reading and math scores as well as imputed class
identifiers and imputed class size. Note the particularity of the data concerning first grade
children: One part of grade one pupils are new STAR participants and on those we don’t
have information what they did in kindergarten. Another part of first grade pupils were
new STAR participants in kindergarten and staid in STAR the next year and therefore we
also have information for first grade.
There are two data sets:
star_id.dta:
- The identifier data set contains the following variables:

newid new student id for public star datasets


sysidkn sch(ool) system id-k (new) kindergarten
sysid1n sch system id-g1 (new) 1st grade
classidk ID of kindergarten class (imputed)
classid1 ID of grade 1 class (imputed)
star_student_teacher.dta:
- The student and teacher data set contains among others:
newid new student id for public star datasets
ssex student sex
srace student race
sbirthy year of students birth
stark attend project star class in kindergarten
star1 attend project star class in 1st grade
cltypek classroom type in kindergarten
cltype1 classroom type in first grade
sesk socio-economic status in kindergarten (measured by free
lunch)
ses1 socio-economic status in 1st grade (measured by fee
lunch)
3

1. Data Preparation

a) Merge the two data sets star_id.dta and star_student_teacher in order create one data
set that contains the joint information on pupils, their teachers and the identifiers
concerning school and class. Think about what constitutes the unit of observation in
the data sets before merging. Save the data set after merging. Describe the data
structure which should cover the following topics:
o What is the format of the datasets (unit of observations, cross-sectional, panel
data etc.)
o How many observations and variables are there? Are there any missing
values? Note that you can concentrate on the kindergarten and first grade.

2. Descriptive Statistics

a) Create the following variables from the merged data: receipt of free/reduced price
lunch in kindergarten and first grade, white or Asian student race (indicator), age in
1985, an attrition indicator (as described in note d to Table 1) in kindergarten and first
grade and the average math and reading percentile test score (as described in footnote
11) in kindergarten and first grade. Present summary statistics for the full sample.
When producing the variables make sure to treat missing values correctly.
b) Calculate means of these six variables and class size by treatment status (small class,
regular class, and regular class with aide) for new STAR participants only in
kindergarten and first grade. Your answers should be consistent with those presented
in Table 1 of Krueger (1999).] Comment on the table.
c) Test the null hypothesis that there are not significant differences in the means of these
variables by treatment status. Give p-values in your answer. [Hint: First, generate
dummies for treatment status. In Stata, you would use the command regress on two
dummies for the treatment status and then the command test].Again, focus only on
new STAR participants. Comment on the results.
d) Create density graphs of the average math and reading percentile test score for small
and regular class sizes imitating Figure 1 of Krueger. Comment on the results.

3. Random Assignment
4

a) Some researcher claims that the tests conducted in (1) support the assertion that the
treatment was randomly assigned to students and teachers. Explain. If you do not
agree, give a short discussion.
b) In fact, the treatment was randomly assigned to students and teachers within schools.
For each of the variables employed in (2), test the null hypothesis that, conditional on
school of attendance, there are no significant differences across treatment status. Once
again, give the p-values in your answer. [Hint: In Stata, you would once again use the
command regress. xi is useful in creating dummy variables. In R you can use the
as.factor()function. Your results should be consistent with those presented in
Table 2 of Krueger.] Note, that you do not need to produce this table in an automated
way.
c) Are the results in (b) consistent with the fact that the treatment was randomly assigned
conditional on school of attendance? Explain.

4. Regression Estimates
a) Using as a dependent variable the average percentile math/reading test score
constructed in (2), produce regression results similar to those given in columns (1) to
(3) of Table 5 in Krueger (1999) for kindergarten and first grade. Note that for first
grade not only the new star participants should be considered but also those who
already participated in kindergarten. In your regressions, assume that the error terms
are iid. [Use the xi and regress commands or as.factor() function in R]. You
should not report the results on the school dummys.
b) Interpret the coefficients on the small class and regular/aide class indicators in the first
specification.
c) How do the coefficients on the small class and regular/aide class indicators change as
more covariates are added to the model? Would you say that there is strong evidence
of selection on observables?

5. Error structure
In (1) and (2), you (should have) conducted hypothesis tests under the assumption that
regression errors were independent and identically distributed.
5

Redo the regressions in (3a), but cluster the standard errors on teacher (classidk
classid1) [Use the cluster option to regress or coeftest() in R.] Are the
standard errors higher or lower than before? Is this what you would have expected?

6. Intent to Treat and Instrumental Variables


One problem with focusing on actual class type is that there were transitions between
class types after the first year of the program.
a) Krueger argues initial class assignment is both random (conditional on school) and
highly correlated with actual class assignment in later years. The former was
addressed in question (3). For individuals who entered the program for the first time in
kindergarten, show that initial class assignment to small classes is a good predictor of
actual class assignment in first grade.
b) The conditions given in (a) are those necessary for initial class assignment to be a
valid instrument for actual class assignment. We will now only focus on the difference
between small classes as opposed to regular and regular with aid for grade one. Using
the Stata command ivregress 2sls or ivreg() in R, do a two-stage least
squares regression of average percentile math/reading test score on small class type
dummy, using initial small class type dummy as instruments. Present results for the
same three specifications as in (4a) (and (5)); cluster the standard errors. How do the
results compare to those in found in (5))? Do they suggest that non-random transitions
were a problem? Note that this analysis can only be conducted on the subgroup of
individuals in grade 1 that participated in STAR already in kindergarten.

You might also like