Take-home test
answer guidelines
Table 1 describes a dataset taken from a survey of students sampled over 3 semesters –
spring and fall 2013 and spring 2014 – taking a Statistics course at NYU. The data was
analysed using a linear regression model that explored the relationship between lecture
attendance and student performance. Table 2 reports the results of the regression analysis.
Questions:
The model predicts that, controlling for all other variables, a 1-point change in the
satmath score is associated with a 0.011 pp change (in the same direction) in the student’s
grade. Controlling for all other variables in the model, male students are predicted to get
a grade 0.736 pp higher than female students.
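As a rough illustration of how these coefficients are read, the sketch below multiplies each quoted coefficient by a change in the corresponding variable. The 100-point satmath gap is a hypothetical comparison chosen for illustration, and the helper name is not from the original analysis.

```python
# Coefficients as quoted in the answer above.
BETA_SATMATH = 0.011  # pp change in grade per 1-point change in satmath
BETA_MALE = 0.736     # pp difference in grade, male vs. female students


def predicted_grade_change(beta: float, delta_x: float) -> float:
    """Predicted change in grade (pp) when one regressor changes by delta_x
    and all other regressors are held fixed."""
    return beta * delta_x


# Hypothetical comparison: two otherwise identical students whose
# satmath scores differ by 100 points.
satmath_gap = predicted_grade_change(BETA_SATMATH, 100)

# The male dummy switches from 0 to 1, so the change is the coefficient itself.
male_gap = predicted_grade_change(BETA_MALE, 1)

print(f"satmath +100 points: {satmath_gap:.2f} pp")
print(f"male vs. female:     {male_gap:.3f} pp")
```

Because the model is linear, the predicted effect scales in proportion to the size of the change in the regressor.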
(b) Compute the standard error for variable male. Is this variable statistically significant
(assume 5% significance level)? Cite the relevant statistics to justify your answer.
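s.e. = estimated beta / t-stat = 0.736/0.8 = 0.92
The t-statistic of 0.8 is well below the 5% critical value of about 1.96, so male is not
statistically significant.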
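The calculation can be checked with a short sketch. It assumes the large-sample normal approximation for the 5% two-sided critical value; the exact t critical value depends on the degrees of freedom, which are not given here, but it is never smaller than the normal value.

```python
from statistics import NormalDist

# Values quoted in the answer above.
beta_male = 0.736
t_stat_male = 0.8

# Since t = beta / s.e., the standard error is beta / t.
se_male = beta_male / t_stat_male

# Two-sided 5% critical value under the normal approximation (assumption).
critical_value = NormalDist().inv_cdf(0.975)

# The coefficient is significant only if |t| exceeds the critical value.
is_significant = abs(t_stat_male) > critical_value

print(f"s.e. = {se_male:.2f}")
print(f"critical value = {critical_value:.2f}")
print(f"significant at 5%: {is_significant}")
```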
(c) Why is variable hoursstudy included in the model if the objective is to assess the
impact of attendance on grades?
hoursstudy is included to control for the amount of effort that a student puts into studying
for the course.
Further, if a student has missed some lectures (e.g. because of illness) they could make
up for it by increasing the amount of private study.
(d) According to the model, does lecture attendance improve students’ grades? Justify
your answer.
Starting with the main variables of interest, we can see that skip12 and skip34 are not
statistically significant, but that skip56 and skip 9+ are significant (at 5% sig. level),
and skip78 is borderline significant (t-statistic of -1.83; p-value 0.068).
The results tell us that if you skip five or more classes your grade will suffer. For
example, the coefficient on skip56 is -3.228, which can be interpreted as saying that
with 5 or 6 absences the student’s grade would be expected to fall by 3.23 percentage
points (compared to a student who did not miss any class).
In essence, statistically speaking, it does not matter if you miss a small number of
classes (four or fewer). But it matters if you skip a lot of classes.
This result seems plausible. Missing a few classes does not make a huge impact but
missing a lot of the course is usually a problem.
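The borderline result for skip78 can be reproduced approximately from its t-statistic. The sketch below uses the normal approximation, which is an assumption: the reported p-value of 0.068 presumably comes from the t distribution with the model's degrees of freedom, so the figure here will differ slightly.

```python
from statistics import NormalDist


def two_sided_p_value(t_stat: float) -> float:
    """Two-sided p-value for a test statistic, using the standard normal
    distribution as a large-sample approximation to the t distribution."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


# t-statistic for skip78 as quoted in the answer above.
p_skip78 = two_sided_p_value(-1.83)

print(f"approximate p-value for skip78: {p_skip78:.3f}")
print(f"significant at 5%:  {p_skip78 < 0.05}")
print(f"significant at 10%: {p_skip78 < 0.10}")
```

The approximate p-value falls between 0.05 and 0.10, which is why skip78 is described as borderline rather than significant at the 5% level.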
(e) How well does the model explain the data? Can you think of any possible omitted
variables?
The adjusted R² for this model is 0.435, which means that 43.5% of the variance in the
dependent variable is explained by the model. This means that there are a number of
variables missing from the model that help to explain the variation in student grades
for a course of this type.
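For reference, the usual adjustment of R² for the number of regressors can be sketched as follows. The sample size and regressor count in the example are hypothetical, since neither is stated here; only the 0.435 figure comes from the answer above.

```python
def adjusted_r2(r2: float, n_obs: int, n_regressors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1), where n is the
    number of observations and k the number of regressors (excluding the
    intercept)."""
    dof = n_obs - n_regressors - 1
    if dof <= 0:
        raise ValueError("need more observations than regressors plus one")
    return 1 - (1 - r2) * (n_obs - 1) / dof


# Share of the variance left unexplained, using the reported adjusted value.
REPORTED_ADJ_R2 = 0.435
unexplained_share = 1 - REPORTED_ADJ_R2

# Hypothetical illustration only: these inputs are NOT from the test.
example = adjusted_r2(r2=0.45, n_obs=500, n_regressors=12)

print(f"unexplained share: {unexplained_share:.3f}")
print(f"hypothetical adjusted R^2: {example:.3f}")
```

The adjustment penalises extra regressors, so with at least one regressor and an R² below 1 the adjusted value is always lower than the unadjusted one.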
Table 1: Data description