
Department of Economics Econ 131

Yale University Econometrics & Data Analysis I


Fall 2014

Final Exam — Fall 2014

GENERAL INSTRUCTIONS:

• Write your name and all of your answers in your blue books.

• You can use any notes you have written (or printed) on a single sheet of paper.

• You may use a calculator for basic calculations (adding, subtracting,
multiplying, dividing, and taking square roots).

• You may not use a cell phone or the internet, make phone calls, or send text
messages during the exam.

• Show your work.

• Please write as neatly as you can.

• When you are done with the exam, hand in both this exam and your blue books.

• If you need to make assumptions, please clearly state them in your answer.

• If you have questions during the exam, ask me.

Question 1. Final Survey (2 points)
You will remember that in class we analyzed the relationship between midterm
exam scores and various self-reported measures of background and class
participation. We worried about unobserved variables (like IQ) confounding our
analysis. One way around this problem is to see how changes in individual
behavior are correlated with changes in outcomes. For example, do students who
increase their studying in the second half of the semester do better on the
final exam relative to how they did on the midterm? In order to do this
analysis, I need the following measures of post-midterm class participation. In
return for answering these questions, you will get two points on your exam, and
I will email you the results of the analysis. Wouldn’t we all like to know the
causal effect of studying and lecture attendance on performance?

(1) How much of the material we covered after the midterm had you seen
before?
a. none
b. some
c. almost all or all

(2) After the midterm exam, we had 11 classroom lectures. How many did
you attend in person?
a. none
b. 1-5
c. 6-8
d. 9-10
e. 11

(3) Between the midterm and the due date of the last problem set, and not
including attending or watching lecture, how much time did you spend on
this class each week doing things like reading or working on problem sets?
a. less than one hour
b. 1-2 hours
c. 3-4 hours
d. 5-6 hours
e. 7+ hours

(4) How much time did you spend studying for the final exam?
a. less than one hour
b. 1-4 hours
c. 4-8 hours
d. 9+ hours

(5) Did you mostly study for the midterm by yourself or with other students
in the class?
a. mostly by yourself
b. mostly with friends
c. about half by yourself and half with friends

(6) What did you do to study for the final exam? (Choose all that apply)
a. Read the book
b. Re-watch portions of the video lectures
c. Go over your notes
d. Study the problem sets and solutions
e. Re-work problems from the problem sets
f. Work new sample problems

Question 2. Egg Russian Roulette (13 points)
A professor and a graduate student play a game with a carton of four eggs. Two
of the eggs are raw and two of the eggs are hard-boiled, and it is impossible
to tell which is which without cracking the eggs. The two players take turns
choosing eggs randomly and cracking them over their own heads. The first
player to crack two raw eggs over their own head loses. The graduate student
goes first. The game ends when both raw eggs are cracked over someone’s head,
and it may end in a draw if each player got one raw egg.

(a) (4 points) What is the probability that the game ends after two eggs are
cracked?

(b) (4 points) What is the probability that all four eggs are cracked when the
game ends?

(c) (5 points) Is going first an advantage or disadvantage? Specifically, what
is the probability of losing if you go first vs. second?
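
The game in Question 2 is small enough to check by brute-force enumeration. The sketch below is one possible reading of the rules, not an official solution: it assumes that every ordering of the four eggs is equally likely, that the first player cracks eggs 1 and 3 while the second cracks eggs 2 and 4, and that play stops as soon as the second raw egg has been cracked. The function name and dictionary keys are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def egg_roulette_probabilities():
    """Enumerate every equally likely cracking order of 2 raw (R) and 2 boiled (H) eggs."""
    orders = sorted(set(permutations("RRHH")))  # 6 distinct orderings
    counts = {
        "ends_after_two": 0,
        "ends_after_four": 0,
        "first_player_loses": 0,
        "second_player_loses": 0,
    }
    for order in orders:
        # 1-based positions at which the raw eggs are cracked
        raw_positions = [i + 1 for i, egg in enumerate(order) if egg == "R"]
        game_end = raw_positions[-1]  # assumed: game stops at the second raw egg
        if game_end == 2:
            counts["ends_after_two"] += 1
        if game_end == 4:
            counts["ends_after_four"] += 1
        # Odd positions belong to the first player, even positions to the second
        if all(p % 2 == 1 for p in raw_positions):
            counts["first_player_loses"] += 1
        if all(p % 2 == 0 for p in raw_positions):
            counts["second_player_loses"] += 1
    return {name: Fraction(c, len(orders)) for name, c in counts.items()}


if __name__ == "__main__":
    for name, prob in egg_roulette_probabilities().items():
        print(f"{name}: {prob}")
```

Using exact fractions rather than floats makes it easy to compare the enumeration against a hand calculation with conditional probabilities.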

Question 3. Firm Revenue (10 points)
Suppose you are interested in how firm revenue changes with the age of the
firm. Suppose you don’t know the true model underlying your data, but it is:

log(revenue) = 3 + 0.1 × age + 0.5 × Dtech + ε

where Dtech = 1 when the firm is in the tech sector, and Dtech = 0 otherwise.
Firm age is measured in years, and firm age and Dtech are uncorrelated.

(a) (2 points) If you regress log(revenue) on age and age² only, what will you
get on average as your estimates for the two coefficients?

(b) (2 points) What will you get on average if you regress log(revenue) on age,
Dtech , and their interaction?

(c) (2 points) Suppose you only observe firms that are exactly one, two, and
three years old, and you regress log(revenue) on dummy variables for age=2
and age=3. What will you get on average as coefficient estimates on the
two dummy variables?

(d) (2 points) Suppose you regress log revenue on firm age measured in months.
What will you get on average for your slope coefficient?

(e) (2 points) Suppose tech firms are actually on average younger (i.e., Dtech
is negatively correlated with a firm’s age), and you regress log(revenue) on
age (in years) by itself. Will you on average get an estimate that is smaller
than 0.1, larger than 0.1, or exactly 0.1?
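
Parts (d) and (e) can be explored with a small simulation of the stated model. The sketch below is an illustration under assumptions that are not in the exam: firm ages are drawn uniformly from 1 to 20 years, the error term is standard normal, and the probability of being a tech firm falls linearly with age (0.9 - 0.04 × age), which makes Dtech negatively correlated with age as in part (e). All function names are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_firms(n=20000, seed=0):
    """Draw firms from log(revenue) = 3 + 0.1*age + 0.5*Dtech + eps."""
    rng = random.Random(seed)
    ages, log_revenue = [], []
    for _ in range(n):
        age = rng.randint(1, 20)  # assumed age range in years
        # Assumed: younger firms are more likely to be tech firms
        d_tech = 1 if rng.random() < 0.9 - 0.04 * age else 0
        eps = rng.gauss(0, 1)  # assumed error distribution
        ages.append(age)
        log_revenue.append(3 + 0.1 * age + 0.5 * d_tech + eps)
    return ages, log_revenue


ages, log_revenue = simulate_firms()
slope_years = ols_slope(ages, log_revenue)
slope_months = ols_slope([12 * a for a in ages], log_revenue)

if __name__ == "__main__":
    print(f"slope on age in years (Dtech omitted):  {slope_years:.4f}")
    print(f"slope on age in months (Dtech omitted): {slope_months:.5f}")
```

Rescaling the regressor by 12 divides the slope by 12, and omitting a regressor that is negatively correlated with age and has a positive coefficient pulls the estimated age slope below 0.1.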

Question 4. MedPharma (10 points)
Suppose MedPharma Corporation invents a pill that if taken daily may prevent
people from catching colds. They want to evaluate the pill using a randomized
controlled trial and gather a random sample of adults and assign half to the
treatment group and half to the control group.

(a) (5 points) First, suppose all trial participants comply with the program
(by taking the pill daily) and one year later MedPharma collects data on
everyone to see how many colds they had since the start of the trial. Using
their data, they run the following regression:

Number of colds in past year = β0 + β1 × T + ε

where T is equal to one for individuals in the treatment group and zero for
individuals in the control group. Is T endogenous? Why or why not? Does
this mean your estimate of β1 is or is not a valid estimate of the causal
effect of the pill?

(b) (5 points) Now suppose some participants didn’t comply with the treatment
protocol of taking the pill every day. So MedPharma runs the following
regression:

Number of colds in past year = α0 + α1 × C + u

where C is equal to one for individuals who complied with the treatment
protocol and zero otherwise. Note that C is necessarily zero for everyone in
the control group because they didn’t get the opportunity to comply. Now
assume that treatment group participants over age 60 were far more likely
to comply with the treatment protocol. Is it possible that C is endogenous?
Why or why not? Is your estimate of α1 a valid causal estimate of the
effect of taking the pill every day?

Question 5. Spurring Innovation (10 points)
Suppose Massachusetts decides to reduce taxes for all firms based on the number
of patents they were granted in the previous year. You want to evaluate the
policy’s impact using a difference-in-differences approach. You have two
cross-sectional data sets that include firms from Massachusetts and Connecticut
in the year before the reform was implemented in Massachusetts and the year
after it was implemented. Your data includes the number of patents each firm
was granted during the year.

(a) (5 points) What regression equation would you estimate to get the
diff-in-diff estimate of the program effect on the fraction of firms granted at
least one patent? Clearly define your variables. What coefficient in the model
is the estimate of the program effect?

(b) (5 points) Suppose Connecticut increased its retail sales tax at the same
time Massachusetts implemented its patent incentive policy. Is the above
diff-in-diff estimate still valid? Why or why not?
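
For part (a), one common way to write a difference-in-differences regression is sketched below. The variable names (Patent, MA, Post) are illustrative assumptions, not notation taken from the exam: Patent equals one if firm i was granted at least one patent during the year, MA equals one for Massachusetts firms, and Post equals one for observations from the year after the reform.

```latex
\[
\text{Patent}_i = \beta_0 + \beta_1\,\text{MA}_i + \beta_2\,\text{Post}_i
  + \beta_3\,(\text{MA}_i \times \text{Post}_i) + \varepsilon_i
\]
```

In this setup the coefficient on the interaction term, \beta_3, is the diff-in-diff estimate: the change in the Massachusetts fraction minus the change in the Connecticut fraction.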

Question 6. Foreign Aid and Economic Growth (10 points)
Much recent research tries to estimate the causal effect of foreign aid on
economic growth. Suppose you have observations of countries across time and
estimate the following regression:

GDPgrowth = β0 + β1 × Aid + ε

where GDPgrowth is average annual growth of per capita GDP and Aid is
amount of foreign aid per capita given in the same year.

(a) (2 points) What variables might be omitted from the regression that could
affect GDP growth and be correlated with amount of foreign aid? Would
these omitted variables mean an OLS estimate of β1 might be a poor causal
estimate of the effect of concurrent foreign aid?

An influential 2002 paper by Hansen and Tarp uses foreign aid in the previous
year as an instrument for foreign aid in the current year.

(b) (2 points) What two regressions would you need to estimate to get the IV
estimate of the effect of concurrent foreign aid using Hansen and Tarp’s
instrument?

(c) (2 points) Do you expect foreign aid in the previous year to be correlated
with foreign aid in the current year? Why or why not? How would you
check to see if it was?

(d) (2 points) Might foreign aid in the previous year affect current economic
growth other than through its effect on current foreign aid?

(e) (2 points) Based on your answers to (c) and (d), do you think previous
year’s foreign aid is a good instrument in this context?

Question 7. Three Observations (15 points)
Following are the year-end closing values for the Nasdaq Composite and S&P
500 stock market indices for the last three years, as well as the average
closing value over the three years:

Year    Nasdaq    S&P 500
2011      2605       1260
2012      2960       1402
2013      4177       1848
Avg       3247       1503

(a) (3 points) What is the sample variance of the S&P 500?

(b) (3 points) What is the sample covariance between the Nasdaq index and
S&P 500?

(c) (3 points) Suppose we regressed the Nasdaq index on the S&P 500 using
the data above. (i.e., Nasdaq is the dependent variable.) What would the
estimated slope coefficient be?

(d) (3 points) What would the estimated intercept be?

(e) (3 points) What would the R² be?
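
The quantities asked for in Question 7 follow from the standard sample formulas, which the sketch below applies to the three observations in the table. It is a minimal check, not an official answer key. It uses the unrounded means (the table's averages are rounded to whole numbers, so a hand calculation with 3247 and 1503 will differ slightly), and the helper names are illustrative.

```python
nasdaq = [2605, 2960, 4177]  # year-end closes, 2011-2013
sp500 = [1260, 1402, 1848]


def mean(values):
    return sum(values) / len(values)


def sample_covariance(x, y):
    """Sample covariance with the n - 1 divisor; equals the sample variance when x is y."""
    mean_x, mean_y = mean(x), mean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


var_sp500 = sample_covariance(sp500, sp500)
var_nasdaq = sample_covariance(nasdaq, nasdaq)
cov_nasdaq_sp500 = sample_covariance(nasdaq, sp500)

# Simple regression of Nasdaq (dependent variable) on the S&P 500
slope = cov_nasdaq_sp500 / var_sp500
intercept = mean(nasdaq) - slope * mean(sp500)
# With one regressor, R-squared equals the squared sample correlation
r_squared = cov_nasdaq_sp500 ** 2 / (var_sp500 * var_nasdaq)

if __name__ == "__main__":
    print(f"sample variance of S&P 500: {var_sp500:.2f}")
    print(f"sample covariance:          {cov_nasdaq_sp500:.2f}")
    print(f"slope:                      {slope:.4f}")
    print(f"intercept:                  {intercept:.2f}")
    print(f"R-squared:                  {r_squared:.4f}")
```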

Question 8. Nutrition and Parental Education (15 points)
Nutritionists argue that a child’s nutritional status can be summarized in their
height, given age. An economic model of the household would suggest that child
height (given age) should depend on long-run household resource availability
and the efficiency with which household members turn those resources into
improved child health. The production of health at home is likely to depend on
parents’ education and the experience of the mother. We shall use her age as a
proxy for her experience.

Data from a Demographic Health Survey conducted in Brazil in 1987 have been
used to estimate the regressions reported below in Table 1. The dependent
variable in all four regressions is child height measured as a proportion of the
height of a well-nourished child of the same age and sex. Standard errors are
reported in brackets below each coefficient estimate. You may assume we have
enough observations to invoke the Central Limit Theorem.

(a) (3 points) Based on the results in columns (2) and (3), are mother’s and
father’s education positively or negatively correlated? Why?

(b) (4 points) Based on the results in column (3), test the null hypothesis that
father’s education does not affect child height for age. Clearly state the
null and alternative hypotheses. Construct the test statistic and either
accept or reject the null with a 5% significance level.

(c) (4 points) Suppose you were to use an LR test to test the hypothesis that
mother’s and father’s education have the same effect on child height. If the
model estimated in (3) is the unrestricted model, what would the restricted
model be?

(d) (4 points) With a 5% significance level, test the hypothesis that father’s
education has no effect on height in the model estimated in column (4).
That is, are the linear and quadratic terms jointly significant? Compute
the test statistic and compare it to the appropriate critical value.
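
The test statistics in parts (b) and (d) can be computed directly from the numbers in Table 1. The sketch below is a hedged illustration rather than an answer key. It assumes that column (2) is the restricted model for part (d), that is, column (4) without the two father's education terms, and that column (4) has seven estimated parameters including the intercept. The critical values are the usual large-sample 5% values (1.96 for a two-sided t test, about 3.00 for an F test with 2 numerator degrees of freedom).

```python
# Numbers read from Table 1
COEF_FATHER_COL3 = 0.13   # father's education coefficient, column (3)
SE_FATHER_COL3 = 0.06     # its standard error
RSS_RESTRICTED = 31555    # column (2): assumed restricted model for part (d)
RSS_UNRESTRICTED = 31327  # column (4)
N_OBS = 1320
K_UNRESTRICTED = 7        # intercept plus six regressors in column (4)
N_RESTRICTIONS = 2        # father's education and its square

T_CRITICAL_5PCT = 1.96    # two-sided, large-sample normal approximation
F_CRITICAL_5PCT = 3.00    # approximate value for F(2, large df)


def t_statistic(coef, se, null_value=0.0):
    """t statistic for H0: coefficient equals null_value."""
    return (coef - null_value) / se


def f_statistic(rss_restricted, rss_unrestricted, q, n, k):
    """F statistic for q linear restrictions, based on residual sums of squares."""
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / (n - k))


t_father = t_statistic(COEF_FATHER_COL3, SE_FATHER_COL3)
f_father = f_statistic(RSS_RESTRICTED, RSS_UNRESTRICTED,
                       N_RESTRICTIONS, N_OBS, K_UNRESTRICTED)

if __name__ == "__main__":
    print(f"t = {t_father:.3f}, reject at 5%: {abs(t_father) > T_CRITICAL_5PCT}")
    print(f"F = {f_father:.3f}, reject at 5%: {f_father > F_CRITICAL_5PCT}")
```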

Question 9. Hurricane Evacuation (15 points)
In August of 1998, as Hurricane Bonnie approached the North Carolina coast,
many residents had to make decisions about whether to evacuate. Whitehead
et al. (2000) analyze post-storm telephone survey data to see how these residents
made their decisions. Their variables are described in Table 2 and their results
are reported in Table 3.

(a) (3 points) How different are the odds of a household evacuating if they are
given a voluntary order to evacuate relative to getting no evacuation order
at all?

(b) (3 points) Are wealthier households significantly more likely to evacuate
than poorer ones?

(c) (3 points) How different are the odds of evacuation for a respondent with
a four-year college degree (16 years of schooling) relative to a respondent
with just a high school education (12 years of schooling)?

(d) (3 points) Are female survey respondents significantly more or less likely
to evacuate than male respondents?

(e) (3 points) What does the model predict is the probability of evacuation
for a household when they do not receive any evacuation order, they make
$50,000 per year, they live in a non-mobile home, there is no wind or flood
risk, they have no pets, and the respondent is a black male with 16 years
of education?
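
Questions about odds and predicted probabilities in a logistic regression reduce to two small formulas: the odds ratio for a change in a regressor is exp(coefficient × change), and the predicted probability is the logistic function of the linear index. The sketch below shows both. The coefficient values in it are hypothetical placeholders, not the estimates from Table 3, and the function and variable names are illustrative.

```python
import math


def odds_ratio(coefficient, change=1.0):
    """Multiplicative change in the odds when a regressor increases by `change`."""
    return math.exp(coefficient * change)


def predicted_probability(coefficients, values):
    """Logistic probability for one observation.

    `coefficients` and `values` are dicts keyed by variable name; the
    intercept is included by setting values["intercept"] = 1.
    """
    index = sum(coefficients[name] * values.get(name, 0.0) for name in coefficients)
    return 1.0 / (1.0 + math.exp(-index))


# Hypothetical placeholder numbers, NOT the estimates from Table 3
example_coefficients = {"intercept": -1.0, "education": 0.05}
example_household = {"intercept": 1.0, "education": 16}

if __name__ == "__main__":
    # Odds ratio for 16 versus 12 years of schooling under the placeholder coefficient
    print(f"odds ratio: {odds_ratio(example_coefficients['education'], 16 - 12):.3f}")
    print(f"probability: {predicted_probability(example_coefficients, example_household):.3f}")
```

For a dummy variable such as an evacuation order, the change is 1, so the odds ratio is simply the exponential of its coefficient.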

Table 1: Height for age of children age 0-6 in Brazil

                                 (1)       (2)       (3)       (4)
Intercept                      96.25     94.20     94.09     93.39
                              (0.66)    (0.68)    (0.68)    (0.72)

Household income                5.88      2.38      1.89      2.54
                              (0.62)    (0.72)    (0.75)    (0.79)

Years of education
  of mother                               0.41      0.34      0.58
                                        (0.05)    (0.06)    (0.13)

  of father                                         0.13      0.20
                                                  (0.06)    (0.12)

  of father squared                                          -0.01
                                                            (0.01)

Age of mother                  -0.07     -0.03     -0.03     -0.02
                              (0.02)    (0.02)    (0.02)    (0.02)

(1) if child is male           -0.41     -0.56     -0.56     -0.56
                              (0.28)    (0.27)    (0.27)    (0.27)

Residual sum of squares       33,437    31,555    31,439    31,327

# observations                 1,320
Σ(yi − ȳ)²                    36,082

Table 2: Description of variables used in analysis of evacuation decision

Table 3: Logistic regression of evacuation decision (dependent variable = evacuate)

