Research Methods II Mock Exam

Research Methods II
Mock Exam

name: ____________________________________________

surname: ____________________________________________

signature: ____________________________________________

Notice:

- Phrase your answers in a clear and concise fashion.
- Make sure your handwriting is easily readable and all your graphs are
  appropriately annotated.
- Clearly indicate how solutions to technical problems were computed (do not only
  provide final results).
- Only use the distributed test scripts; ask the exam supervisor for additional
  copies if needed.

Points Available and Obtained:

Problem Set     Maximum points   Obtained points
Question 1      10
Question 2      10
Question 3      10
Question 4      10
Question 5      15
Question 6      15
Total Points    70

Question 1 (10 points):

Suppose that you are asked to conduct a study to determine whether smaller class sizes lead to
improved student performance of fourth graders.
a) If you could conduct any experiment you want, what would you do? Be specific.
b) More realistically, suppose you can collect observational data on several thousand
fourth graders in a given state. You can obtain the size of their fourth-grade class and
a standardized test score taken at the end of fourth grade. Why might you expect a
negative correlation between class size and test score?
c) Would a negative correlation necessarily show that smaller class sizes cause better
performance? Explain.

Solution:

a. Ideally, we could randomly assign students to classes of different sizes. That is, each
student is assigned to a class size without regard to any student characteristics
such as ability and family background. We would like substantial variation in class sizes
(subject, of course, to ethical considerations and resource constraints).
b. A negative correlation means that a larger class size is associated with lower performance.
We might find a negative correlation because a larger class size actually hurts
performance. However, with observational data, there are other reasons we might find a
negative relationship. For example, children from more affluent families might be more
likely to attend schools with smaller class sizes, and affluent children generally might score
better on standardized tests. Another possibility is that, within a school, a principal might
assign the better students to smaller classes. Or, some parents might insist that their
children be placed in smaller classes, and these same parents tend to be more involved in
their children’s education.
c. Given the potential for confounding factors – some of which are listed in b) – finding a
negative correlation would not be strong evidence that smaller class sizes actually lead to
better performance. Some way of controlling for the confounding factors is needed, and
this is the subject of multiple regression analysis.
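
The confounding argument in b) and c) can be illustrated with a small simulation. This is only a sketch under invented assumptions: the variable `affluence` and every numeric value below are hypothetical and do not come from any real data. Class size has no causal effect on the score by construction, yet the two are negatively correlated because both depend on affluence.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is reproducible
class_size, score = [], []
for _ in range(5000):
    affluence = rng.gauss(0, 1)  # hypothetical confounder
    # Assumption: more affluent families end up in smaller classes.
    class_size.append(25 - 3 * affluence + rng.gauss(0, 2))
    # Assumption: affluence raises scores; class size does NOT enter here.
    score.append(500 + 20 * affluence + rng.gauss(0, 10))

r = pearson(class_size, score)
print(f"correlation(class size, score) = {r:.2f}")
```

The printed correlation is clearly negative even though shrinking a class would change nothing in this simulated world, which is why the correlation alone cannot establish causation.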

Question 2 (10 points):

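In the linear consumption function

$\widehat{cons} = \hat{\beta}_0 + \hat{\beta}_1\,inc,$

the (estimated) marginal propensity to consume (MPC) out of income is simply the slope, $\hat{\beta}_1$.
Using observations for 100 families on annual income and consumption (both measured in
Euro), the following equation is obtained:

$\widehat{cons} = -124.84 + 0.853\,inc, \quad n = 100, \quad R^2 \approx 0.70$
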
a. Interpret the estimated coefficient for inc. How does consumption vary with income?
b. Interpret the intercept in this equation, and comment on its sign and magnitude.
c. What is the predicted consumption when family income is €30,000?
d. How well does the model explain the relationship between consumption and income? What
might be some additional factors influencing consumption not considered in the model?

Solution:

a. Given that this is a level-level model, an increase of 1 Euro in income is associated with
an increase of 85.3 cents in consumption.
b. The intercept implies that when inc = 0, cons is predicted to be negative €124.84. This, of
course, cannot be true, and reflects the fact that this consumption function might be a poor
predictor of consumption at very low income levels. Only a few observations have income
at or near 0, so the OLS regression does a poor job of predicting consumption in this
income range. On the other hand, on an annual basis, €124.84 is not so far from zero.
c. Just plug 30,000 into the equation: $\widehat{cons}$ = -124.84 + 0.853(30,000) = 25,465.16 Euro.
d. The variation in the independent variable income explains about 70% of the variation in the
dependent variable consumption. Thus, the goodness of fit is fairly high. Other factors not
considered in the model might be different levels of wealth (such as ownership of stocks and
real estate); differences in education and age might also play a role.
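
The arithmetic in a) to c) can be checked with a few lines of Python. This is a minimal sketch that uses only the two coefficients quoted in the solution; the function name `predict_consumption` is made up for illustration.

```python
# Coefficients of the fitted consumption function quoted in the solution.
INTERCEPT = -124.84  # predicted consumption (Euro) when income is 0
MPC = 0.853          # estimated marginal propensity to consume

def predict_consumption(inc: float) -> float:
    """Predicted annual consumption (Euro) for a given annual income (Euro)."""
    return INTERCEPT + MPC * inc

# a) Effect of one additional Euro of income, in cents.
extra_cents = 100 * (predict_consumption(1) - predict_consumption(0))

# b) Prediction at zero income (the intercept).
cons_at_zero = predict_consumption(0)

# c) Prediction at an income of 30,000 Euro.
cons_30k = predict_consumption(30_000)

print(f"One extra Euro of income -> {extra_cents:.1f} cents more consumption")
print(f"Predicted consumption at inc = 0: {cons_at_zero:.2f} Euro")
print(f"Predicted consumption at inc = 30,000: {cons_30k:.2f} Euro")
```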

Question 3 (10 points)


For these questions, assume an econometrician has a data set containing these variables for 935
men:

Suppose the econometrician has produced the following R Output, some of which is blanked
out:

Call:
lm(formula = lwage ~ educ + exper + black + south, data = wage2)

Residuals:
     Min       1Q   Median       3Q      Max
-1.94620 -0.22666  0.02388  0.25437  1.25937

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  5.685801   0.112097
educ         0.069563   0.006518
exper        0.018953   0.003220
black       -0.185584   0.039064
south       -0.114457   0.027237
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.383 on 930 degrees of freedom


Multiple R-squared: 0.1766, Adjusted R-squared: 0.1731
F-statistic: 49.88 on 4 and 930 DF, p-value: < 2.2e-16

a) Write down the population model being estimated, in terms of the unknown
parameters β0, β1, β2, β3, and β4.

b) The econometrician is not sure whether the variable black belongs in the model.
Assuming a two-sided alternative, write down the null and alternative hypotheses in terms
of the unknown parameter. At the 5% level, can you reject the null hypothesis that black
does not belong in the model?

c) In general, if you include a variable that does not belong, what will happen to your
estimated coefficients? To your estimated standard errors?

d) Give a careful interpretation of the coefficient on south.

e) Suppose the econometrician wanted to control for heavy drinking, but did not have
information on alcohol use. Suppose that heavy drinking is negatively correlated with
earnings and with experience, positively correlated with education and is not correlated
with race or region. Are all of the coefficients estimated above unbiased? Why or why
not?

Solutions:

a) lwage = β0 + β1educ + β2exper + β3black + β4south + u


b) H0: β3 = 0 against H1: β3 ≠ 0. The t statistic is -.1856/.0391 = -4.75. The critical value
for a two-sided t test with 930 degrees of freedom is 1.96. Since |t| > c, we can reject the
null at the 5% level.
c) If you include a variable that does not belong, the coefficient estimates will remain unbiased.
Recall that the expected value of the estimate of β1 is the same in the multiple and in the
simple regression if β2 = 0. The standard errors are likely to increase with the addition of an
irrelevant variable, since the R² from a regression of x1 on all other x’s cannot fall, and
typically rises, when another x is added. Since 1 - R² is in the denominator of the formula for
the standard error, the standard error would increase.
d) Because the dependent variable is in logs, we can consider the coefficients to be
approximately percent effects. Holding education, experience, and race constant, men living in
the south earn approximately 11.4% less than men in the rest of the country, on average.
e) Excluding a variable for heavy drinking can lead to omitted variable bias. Given that it is
correlated with both education and experience, the estimates of β1 and β2 will be
affected by omitted variable bias. Since heavy drinking is not correlated with race or
region, the estimates of β3 and β4 remain unbiased as long as black and south are themselves
uncorrelated with educ and exper; otherwise part of the bias carries over to them as well.
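
The blanked-out t values in the R output can be recovered by dividing each estimate by its standard error. The sketch below does this for the four slope coefficients and, as an extra illustration that is not part of the answer key, converts the coefficient on south into an exact percentage difference via exp(b) - 1. The critical value of 1.96 is the large-sample value used in solution b).

```python
import math

# (estimate, standard error) pairs copied from the R output.
coefs = {
    "educ": (0.069563, 0.006518),
    "exper": (0.018953, 0.003220),
    "black": (-0.185584, 0.039064),
    "south": (-0.114457, 0.027237),
}
CRITICAL_VALUE = 1.96  # two-sided test at the 5% level, large df

# t statistic for H0: coefficient = 0.
t_stats = {name: est / se for name, (est, se) in coefs.items()}
for name, t in t_stats.items():
    verdict = "reject H0" if abs(t) > CRITICAL_VALUE else "fail to reject H0"
    print(f"{name:6s} t = {t:6.2f} -> {verdict}")

# Exact percentage difference implied by a log-level coefficient.
south_coef = coefs["south"][0]
south_exact_pct = 100 * (math.exp(south_coef) - 1)
print(f"south: log approximation {100 * south_coef:.1f}%, exact {south_exact_pct:.1f}%")
```

The exact figure for south is slightly smaller in absolute value than the log approximation, which is why the interpretation in d) is worded as approximate.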

Question 4 (10 points)

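An econometrician estimates the effect of age on minutes of sleep per week,

$sleep = \beta_0 + \beta_1\,age + \beta_2\,agesq + u,$

and obtains the following results, where agesq = age*age:

$\widehat{sleep} = 3608.03 - 21.49\,age + 0.30\,agesq, \quad R^2 = 0.015$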

a) What is the marginal impact of age on sleep?


b) How many hours per week does a 10-year-old sleep based on this model? How many
does a 30-year-old sleep?
c) At what age is sleep minimized? How many hours does a person of this age sleep?
d) How well does the model explain the data? How reliable are the answers to parts a)
to c)?

Solutions:

a) Marginal effect of age on sleep:
$\frac{\partial\, sleep}{\partial\, age} = -21.49 + 2 \cdot 0.30 \cdot age = -21.49 + 0.6 \cdot age$
b) $E(sleep \mid age = 10) = 3608.03 - 21.49 \cdot 10 + 0.3 \cdot 10^2 = 3423.13$ minutes
$E(sleep \mid age = 30) = 3608.03 - 21.49 \cdot 30 + 0.3 \cdot 30^2 = 3233.33$ minutes

A 10-year-old sleeps approximately 57 hours per week; a 30-year-old sleeps about 53.9
hours per week.
c) Sleep is minimized at about 35.8 years (obtained by solving
$-21.49 + 0.6 \cdot age = 0$). A person of this age sleeps approximately 3223 minutes, or
about 53.7 hours, per week.
d) The R squared is only 1.5% which is fairly low. There are many other variables related to
both sleep and age such as health status or education that might severely bias the
estimated coefficients in the model. Thus, the responses are not very reliable.
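
The calculations in a) to c) can be reproduced as follows. This is a minimal sketch that uses only the rounded coefficients appearing in the solution; the function names are made up for illustration.

```python
# Rounded coefficients of the fitted quadratic used in the solution:
# sleep_hat = B0 + B1 * age + B2 * age**2 (sleep in minutes per week)
B0, B1, B2 = 3608.03, -21.49, 0.30

def sleep_minutes(age: float) -> float:
    """Predicted minutes of sleep per week at a given age."""
    return B0 + B1 * age + B2 * age ** 2

def marginal_effect(age: float) -> float:
    """Derivative of predicted sleep with respect to age."""
    return B1 + 2 * B2 * age

# Age at which the marginal effect is zero (minimum, since B2 > 0).
age_min = -B1 / (2 * B2)

for age in (10, 30, age_min):
    minutes = sleep_minutes(age)
    print(f"age {age:5.1f}: {minutes:8.2f} minutes = {minutes / 60:5.2f} hours per week")
```

Because the coefficient on agesq is positive, the parabola opens upward, so the stationary point is a minimum and not a maximum.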

Question 5 (15 points)

a) The following figure shows a scatterplot. Describe the plot in terms of strength and
direction of association, approximate mean values and standard deviations of both x and
y. (5 points)

b) Consider the following regression results. Interpret the output in terms of model fit as
well as size and significance of the impact of x on y. (10 points)

Solution:

5.a) The scatterplot shows a strong positive association between the variables. Hence the
correlation coefficient is close to +1. The means of both x and y are close
to 20; the standard deviation of x is about 20, whereas the standard deviation of y is about 1.

5.b) The presented results describe a log-log regression. The mean of the response y increases
with increasing x: if x increases by 1 percent, y can be expected to change by about 0.29
percent. However, the effect of x is not significant at the 5 percent level (the p-value is larger
than 0.05). The R-squared is about 2%, which means that 2 percent of the variation in the log of
y is explained by the variation in the log of x.
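
The elasticity reading of a log-log slope can be verified numerically. The sketch below assumes only the slope of 0.29 quoted in the solution; the function name is hypothetical. It shows that the "percent for percent" interpretation is nearly exact for small changes in x and an approximation for larger ones.

```python
ELASTICITY = 0.29  # slope on log(x) quoted in the solution

def pct_change_in_y(pct_change_in_x: float, elasticity: float = ELASTICITY) -> float:
    """Exact percent change in predicted y for a given percent change in x
    in the model log(y) = b0 + elasticity * log(x)."""
    ratio = (1 + pct_change_in_x / 100) ** elasticity
    return 100 * (ratio - 1)

for pct in (1, 10, 50):
    exact = pct_change_in_y(pct)
    approx = ELASTICITY * pct
    print(f"x up {pct:2d}% -> y up {exact:.2f}% (rule of thumb: {approx:.2f}%)")
```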

Question 6 (15 points)

The following model is a simplified version of the multiple regression model used by Biddle and
Hamermesh (1990) to study the tradeoff between time spent sleeping and working and to look
at other factors affecting sleep:

$sleep = \beta_0 + \beta_1\,totwrk + \beta_2\,educ + \beta_3\,age + u,$

where sleep and totwrk (total work) are measured in minutes per week and educ and age are
measured in years. Using the data in SLEEP75, the estimated equation (with standard errors in
brackets) is

a. Is either educ or age individually significant at the 5% level against a two-sided alternative?
b. State the null hypothesis that both education and age do not affect the amount of sleep.
What are k and q in this example?
c. Dropping educ and age from the equation gives

Are educ and age jointly significant in the original equation at the 5% level?
d. Does including educ and age in the model greatly affect the estimated tradeoff between
sleeping and working?
e. Suppose that the sleep equation contains multicollinearity. What does this mean for the
tests computed in part a)?

Solutions:

a. With df = 706 - 4 = 702, we use the standard normal critical value (df = 1000 in the table of
the t distribution), which is 1.96 for a two-tailed test at the 5% level. Now t_educ = -11.13/5.88
≈ -1.89, so |t_educ| = 1.89 < 1.96, and we fail to reject H0: β2 = 0 at the 5% level. Also,
t_age ≈ 1.52, so age is also statistically insignificant at the 5% level.
b. H0: β2 = β3 = 0; k = 3; q = 2
c. We need to compute the R-squared form of the F statistic for joint significance:
F = [(.113 - .103)/(1 - .113)](702/2) ≈ 3.96. The 5% critical value in the F(2, 702) distribution
can be obtained from the table of the F distribution with df1 = 2 (numerator degrees of freedom)
and df2 = 1000 (denominator degrees of freedom): cv = 3.00. Therefore, educ and age are jointly
significant at the 5% level (3.96 > 3.00).
d. Not really. These variables are jointly significant, but including them only changes the coefficient on
totwrk from –.151 to –.148.
e. Multicollinearity is present when the independent variables are highly correlated. As long as
there is no perfect collinearity (i.e. one variable is a linear transformation of another), this
phenomenon does not affect the unbiasedness of the estimated coefficients. The tests are
thus still valid, but the inflated standard errors might make it hard to obtain significant
p-values.
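
The test statistics in a) and c) can be recomputed from the figures quoted in the solutions. This is a minimal sketch; the critical values 1.96 and 3.00 are the table values used above, not computed here, and the variable names are made up for illustration.

```python
# Figures quoted in the solutions.
n, k, q = 706, 3, 2                       # observations, regressors, restrictions
r2_unrestricted, r2_restricted = 0.113, 0.103
df_resid = n - k - 1                      # 702 residual degrees of freedom

# a) Absolute t statistic for educ: |coefficient| / standard error.
abs_t_educ = 11.13 / 5.88
T_CRITICAL = 1.96                         # two-sided, 5% level
educ_significant = abs_t_educ > T_CRITICAL

# c) R-squared form of the F statistic for q exclusion restrictions.
f_stat = ((r2_unrestricted - r2_restricted) / q) / ((1 - r2_unrestricted) / df_resid)
F_CRITICAL = 3.00                         # 5% level, (2, large) degrees of freedom
jointly_significant = f_stat > F_CRITICAL

print(f"|t_educ| = {abs_t_educ:.2f}, individually significant: {educ_significant}")
print(f"F = {f_stat:.2f}, jointly significant: {jointly_significant}")
```

The two results do not contradict each other: variables that are individually insignificant can still be jointly significant, because the F test evaluates both restrictions at once.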
