
IRDAP 2020 Exam

EC220
Introduction to Econometrics

Suitable for all resit/deferral candidates

Instructions to candidates

This paper contains FOUR questions, divided into two sections. Section A contains ONE question
related to Michaelmas Term and Section B contains THREE questions related to Lent Term. You
should answer all questions from Section A and all questions from Section B.

If at any point in this exam you feel that anything is unclear, please make additional assumptions that
you feel are necessary and state them clearly.

For Section A: Please type your answer using word-processing software on a computer (e.g. Word).
You may combine the typed document with scanned or photographed hand-drawn diagrams and
computations. The maximum word count is 1500 words, beyond which nothing will be marked. There
is no minimum word count and concise answers will be rewarded.
For Section B: Please use pen and paper and scan (or photograph) your answers. You may also use
an iPad or a tablet. There is no maximum word count for Section B.
The answers must then be converted to PDF and uploaded to Moodle as ONE individual file together
with the Coversheet. Please make sure every single scanned page is legible and properly ordered.
The file will be run through Turnitin to ensure academic integrity.

Time Allowed: Submit PDF with answers within 24 hours after official start of the exam
Estimated Effort: Reading Time: 15 minutes
Answering Time: 3 hours

You are supplied with: Lindley & Scott Cambridge Statistical Tables
Table A5 Durbin-Watson d-statistic
Additional Materials: This is an open book examination

Calculators: Calculators are allowed in this examination

© LSE ST 2020/EC220R IRDAP Page 1 of 9


Section A
(Answer all questions. This section carries 1/3 of the overall mark.)

Question 1
[33.34 marks]
Charitable giving is an increasingly important component of the economy. Yet relatively little is known
about what motivates people to give to charities. Fundraisers are interested in assessing the
effectiveness of door-to-door fundraising campaigns, which are normally expensive and time-consuming.
A consultancy firm in London has access to a dataset on various characteristics (home address, age,
gender, occupation, whether the household has donated before, ...) of 170,000 households.
In summer 2020, the firm sends ask-for-donation flyers to 100,000 households. The flyer only contains
the exact time of a door-to-door charity solicitation in the neighbourhood. The firm codes $\text{Flyer}_i = 1$
if household $i$ is sent a flyer, and 0 otherwise. The team then visits the 100,000 households, but is
only able to speak with 60,000 of them. The firm codes $\text{Visited}_i = 1$ if the door-to-door solicitor
could speak to household $i$, and 0 otherwise. After the flyer campaign, the firm records
$\text{Donate}_i = 1$ if household $i$ donates to a charity, and 0 otherwise.

(a) The consultancy firm claims that the 60,000 successful visits should constitute a random sample
from the intended population of 100,000 households who receive flyers. Critically discuss
the claim. Could the team statistically test whether or not the claim holds? If yes, carefully
describe one suitable test and necessary assumptions. If not, carefully explain why. [5 marks]

(b) The firm claims that the 100,000 households were randomly selected from the original 170,000
addresses. Assume that the claim is true. The firm investigates the causal effect of the
ask-for-donation flyer campaign on giving behaviour in London.
(i) Could the firm use the available information to answer the causal question? If yes, carefully
describe and interpret the estimation regression that you would run. If not, explain why and
describe the additional information and assumptions you would require. [3.34 marks]

(ii) A critic suggests pensioners are more likely to donate when they receive an ask-for-donation
flyer. Let $\text{Pensioner}_i = 1$ if household $i$'s head is a pensioner, 0 otherwise. Could you test
this claim? If yes, carefully describe how. If not, explain why. [4 marks]

(iii) The firm now would like to estimate the causal effect of a door-to-door solicitation visit
on giving behaviour. Could it be done using the available information? If yes, describe the
regression and critically discuss your approach. If not, explain why. [7 marks]

(c) Another research team in Birmingham hypothesises that including photographs that elicit emotion
in ask-for-donation flyers would increase the amount of donations. The team has a dataset
of 100 potential donors with information on various household characteristics. In Spring 2020,
they send an ask-for-donation flyer with information on different charities in the city to 40
households randomly selected from the sample. The team adds additional photographs that evoke
emotion to the flyers sent to the remaining 60 households (coded as $\text{Photographs Included}_i = 1$,
0 otherwise). Denote $\text{Amount of Donations}_i$ as the amount in GBP raised from household $i$
after the flyer campaign and $\text{Female-headed}_i = 1$ if an adult female is the main decision maker
of household $i$. The team reports the results from OLS regressions in the table below.



Dependent Variable: $\text{Amount of Donations}_i$ (£)

Regressor                            (1)         (2)         (3)

$\text{Photographs Included}_i$     15.751      15.367      15.424
                                    (7.222)     (4.102)     (5.193)

$\text{Female-headed}_i$                         1.124       1.139
                                                (0.151)     (0.061)

$\text{Pensioner}_i$                                         1.188
                                                            (0.152)

Constant                            80.523      74.623      77.611
                                   (12.244)     (6.112)     (9.189)

Observations                         100         100         100

(i) What are the average donations from all the households in the sample after the campaign?
Carefully explain your answer. If you cannot derive the answer, clearly indicate any further
information or assumptions necessary for your calculation. [2 marks]

(ii) A critic cites prior research that pensioners are just more likely to donate than working
households, all else equal. The critic suggests that the team must include the variable
$\text{Pensioner}_i$, which indicates whether household $i$'s head is a pensioner, in the estimation to
avoid omitted variable bias. Critically evaluate the critic's suggestion. [5 marks]

(iii) Another critic interprets Column (2) as causal evidence for women being more altruistic
than men. His rationale is that, because gender is assigned randomly at birth, Column (2) captures
the causal effect of having a female household head on the amount of donations. Carefully
explain whether the critic is right or wrong. [3 marks]

(iv) Carefully explain why the coefficients for $\text{Photographs Included}_i$ are different in Column (1),
Column (2), and Column (3). Clearly state any assumptions you make. [4 marks]



Section B
(Answer all questions. This section carries 2/3 of the overall mark.)

Question 2
Consider the bivariate regression model without intercept

$$y_i = \beta x_i + u_i,$$

for $i = 1, \ldots, n$. We impose the following assumptions.

SLR.1 The population model is $y = \beta x + u$.
SLR.2 We have a random sample of size $n$, $\{(y_i, x_i) : i = 1, \ldots, n\}$, following the population model
in SLR.1.
SLR.3 The sample outcomes on $\{x_i : i = 1, \ldots, n\}$ are not all the same value.
SLR.4 The error term $u$ satisfies $E(u|x) = 0$ for any value of $x$.
SLR.5 The error term $u$ satisfies $Var(u|x) = \sigma^2$ for any value of $x$ (homoskedasticity).
Let $\hat{\beta}$ be the OLS estimator for the regression of $y$ on $x$ without intercept, that is

$$\hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}.$$

[22.33 marks]
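
As a purely illustrative aside, the estimator defined above can be computed directly from its formula. The sketch below assumes NumPy is available; the function name `ols_through_origin` and the four data points are made up for illustration and are not part of the question.

```python
import numpy as np

def ols_through_origin(x, y):
    """Return sum(x_i * y_i) / sum(x_i^2), the no-intercept OLS slope."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxx = np.sum(x ** 2)
    if sxx == 0.0:
        # The ratio is undefined when every x_i is zero.
        raise ValueError("sum of squared x values is zero")
    return float(np.sum(x * y) / sxx)

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

beta_hat = ols_through_origin(x, y)
residuals = y - beta_hat * x  # the residuals referred to in part (a)
print(beta_hat)
```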
(a) Explain whether the following statement is true or false: $\sum_{i=1}^{n} \hat{u}_i = 0$ for the OLS residuals
$\hat{u}_i = y_i - \hat{\beta} x_i$ for $i = 1, \ldots, n$. [If it is true, prove this statement. Otherwise, explain the reason.]
[3 marks]
(b) Show that $\hat{\beta}$ is a consistent estimator for $\beta$ under SLR.1-4. [4 marks]
(c) Under SLR.1-5, derive $E[\hat{\beta}^2 | X]$, where $X = (x_1, \ldots, x_n)$. [4 marks]
(d) Now suppose another random sample of size $n$, $\{(y_i^{(a)}, x_i^{(a)}) : i = 1, \ldots, n\}$, is available.
[Here "(a)" is a superscript to signify another sample.] Suppose $\{(y_i^{(a)}, x_i^{(a)}) : i = 1, \ldots, n\}$
is independent of the original sample $\{(y_i, x_i) : i = 1, \ldots, n\}$, and we impose the following
assumptions.
SLR.1a The population model is $y^{(a)} = \beta^{(a)} x^{(a)} + u^{(a)}$. [This model may be different from the
one in SLR.1.]
SLR.2a We have a random sample of size $n$, $\{(y_i^{(a)}, x_i^{(a)}) : i = 1, \ldots, n\}$, following the
population model in SLR.1a.
SLR.3a The sample outcomes on $\{x_i^{(a)} : i = 1, \ldots, n\}$ are not all the same value.
SLR.4a The error term $u^{(a)}$ satisfies $E(u^{(a)} | x^{(a)}) = 0$ for any value of $x^{(a)}$.
SLR.5a The error term $u^{(a)}$ satisfies $Var(u^{(a)} | x^{(a)}) = (\sigma^{(a)})^2$ for any value of $x^{(a)}$
(homoskedasticity).
Let $\hat{\beta}^{(a)}$ be the OLS estimator for the regression from $y^{(a)}$ on $x^{(a)}$, that is

$$\hat{\beta}^{(a)} = \frac{\sum_{i=1}^{n} x_i^{(a)} y_i^{(a)}}{\sum_{i=1}^{n} (x_i^{(a)})^2}.$$

Show that $E[\hat{\beta} - \hat{\beta}^{(a)}] = \beta - \beta^{(a)}$ under SLR.1-4 and SLR.1a-4a. [3.33 marks]
(e) Under SLR.1-5 and SLR.1a-5a, derive the (conditional) variance $Var(\hat{\beta} - \hat{\beta}^{(a)} | X, X^{(a)})$, where
$X^{(a)} = (x_1^{(a)}, \ldots, x_n^{(a)})$. [4 marks]



(f) In addition to SLR.1-5 and SLR.1a-5a, assume $\sigma = \sigma^{(a)}$ and
SLR.6 The error term $u$ is independent of $x$ and is normally distributed with mean zero and
variance $\sigma^2$.
SLR.6a The error term $u^{(a)}$ is independent of $x^{(a)}$ and is normally distributed with mean zero
and variance $\sigma^2$.
Explain how to test the null hypothesis $H_0 : \beta = \beta^{(a)}$ against the two-sided alternative
$H_1 : \beta \neq \beta^{(a)}$. [Hint: In this setup, an unbiased estimator of $\sigma^2$ is obtained as

$$s^2 = \frac{1}{2n-2} \left( \sum_{i=1}^{n} \hat{u}_i^2 + \sum_{i=1}^{n} (\hat{u}_i^{(a)})^2 \right),$$

where $\hat{u}_i^{(a)} = y_i^{(a)} - \hat{\beta}^{(a)} x_i^{(a)}$.] [4 marks]
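
For illustration only, the pooled variance estimator in the hint can be evaluated numerically as sketched below. This assumes NumPy; the function name `pooled_error_variance` and the two tiny samples are hypothetical and chosen only so the arithmetic is easy to follow. The sketch stops at $s^2$ and does not construct the test itself.

```python
import numpy as np

def pooled_error_variance(x, y, xa, ya):
    """s^2 = (SSR + SSR_a) / (2n - 2) for two no-intercept regressions."""
    x, y, xa, ya = (np.asarray(v, dtype=float) for v in (x, y, xa, ya))
    n = x.size
    if not (y.size == n and xa.size == n and ya.size == n):
        raise ValueError("both samples must have the same size n")
    # No-intercept OLS slopes for each sample.
    b = np.sum(x * y) / np.sum(x ** 2)
    ba = np.sum(xa * ya) / np.sum(xa ** 2)
    # Sum of squared residuals from each regression.
    ssr = np.sum((y - b * x) ** 2)
    ssr_a = np.sum((ya - ba * xa) ** 2)
    return float((ssr + ssr_a) / (2 * n - 2))

# Hypothetical samples with n = 2, for illustration only.
s2 = pooled_error_variance([1.0, 2.0], [1.0, 3.0], [1.0, 1.0], [0.0, 2.0])
print(s2)
```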



Question 3
(a) Answer the following questions. [11.33 marks]
(i) Consider two scalar random variables $x$ and $u$, where $E(u) = 0$. Compare three concepts:
(1) $E(u|x) = 0$, (2) $Cov(x, u) = 0$, and (3) $x$ and $u$ are independent. [5.33 marks]
(ii) Consider the regression model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u, \qquad E(u | x_1, x_2) = 0.$$

Suppose that the error term is heteroskedastic (i.e., $Var(u | x_1, x_2)$ varies with $x_1$ and $x_2$).

(ii-1) Explain how to test the null hypothesis $H_{0a} : \beta_1 = \beta_2$ against the one-sided alternative
hypothesis $H_{1a} : \beta_1 > \beta_2$. [3 marks]

(ii-2) Suppose we want to test the null hypothesis $H_{0b} : \beta_1 = \beta_2 = 0$. Write down a test for
this hypothesis under homoskedasticity. Then explain the problem with this testing procedure
under the current setup. [3 marks]



(b) [11 marks]
It is postulated that a reasonable demand-supply model for the wine industry in Australia, under
a market clearing assumption, would be given by

$$Q_t = \alpha_0 + \alpha_1 P_t^w + \alpha_2 P_t^b + \alpha_3 Y_t + \alpha_4 A_t + u_{1t} \qquad \text{(demand)}$$

$$Q_t = \beta_0 + \beta_1 P_t^w + \beta_2 S_t + u_{2t} \qquad \text{(supply)}$$

where $Q_t$ = real per capita consumption of wine, $P_t^w$ = price of wine relative to CPI, $P_t^b$ = price
of beer relative to CPI, $Y_t$ = real per capita disposable income, $A_t$ = real per capita advertising
expenditure, and $S_t$ = storage cost at time $t$. CPI is the Consumer Price Index at time $t$. The
endogenous variables in this model are $Q$ and $P^w$, and the exogenous variables are $P^b$, $Y$, $A$
and $S$. The variances of $u_{1t}$ and $u_{2t}$ are, respectively, $\sigma_1^2$ and $\sigma_2^2$, and $Cov(u_{1t}, u_{2t}) = \sigma_{12} \neq 0$.
The errors do not exhibit any correlation over time.

(i) Derive the reduced form for $P_t^w$.
[2 marks]
(ii) The OLS estimation of the demand function, based on annual data from 1955-1975 ($T = 20$),
gave the following results (all variables are in logs and figures in parentheses are t-ratios).

$$\hat{Q}_t = \underset{(-6.04)}{-23.651} + \underset{(4.0)}{1.158}\, P_t^w - \underset{(-0.45)}{0.275}\, P_t^b + \underset{(4.5)}{3.212}\, Y_t - \underset{(-1.3)}{0.603}\, A_t$$

All the coefficients except that of $Y$ have the wrong signs. The coefficient of $P^w$ (price
elasticity of demand, $\alpha_1$) not only has the wrong sign but also appears significant.

Explain why the OLS parameter estimator may give rise to these counter-intuitive results.
You are expected to use your results in answer (a) to support your answer.
[3 marks]
(iii) The supply equation is overidentified. Clearly explain this terminology. What distinguishes
overidentification from exact identification and underidentification? Provide one set of
assumptions that would render the supply equation exactly identified.
[3 marks]
(iv) Discuss how you should estimate the supply equation in light of the overidentification.
Discuss the benefit of using overidentification conditions. [3 marks]



Question 4
(a) [14 marks]
Consider the following time series model

$$crime_t = \alpha_0 + \rho\, crime_{t-1} + \alpha_1 clearup_t + \alpha_2 clearup_{t-1} + \alpha_3 clearup_{t-2} + e_t, \tag{4.1}$$

$$|\rho| < 1, \qquad t = 3, \ldots, T,$$

where $E(e_t | crime_{t-1}, clearup_t, clearup_{t-1}, \ldots) = 0$
and $Var(e_t | crime_{t-1}, clearup_t, clearup_{t-1}, \ldots) = \sigma^2$. The errors exhibit no autocorrelation.

(i) Discuss the following statement: "Even though we can estimate the parameters consistently
by OLS, for inference it is important to use HAC standard errors". Support your answers with
clear arguments. In your answer, briefly explain the difference between unbiasedness and
consistency of parameter estimates.
[5 marks]
(ii) Show that when you omit the relevant variable $crime_{t-1}$ in the above model, you will get
evidence of autocorrelation in the errors. Explain the result.

Hint: You are expected to reformulate your model as

$$crime_t = \beta_0 + \beta_1 clearup_t + \beta_2 clearup_{t-1} + \beta_3 clearup_{t-2} + v_t. \tag{4.2}$$

[4 marks]
(iii) Let us assume that the true model is displayed in (4.2), where $v_t$ exhibits autocorrelation of
unknown form that displays weak dependence, and $E(v_t | clearup_t, clearup_{t-1}, clearup_{t-2}) = 0$.
You are asked to test whether the long run effect of clear-up rates on the crime rate is
significant. Discuss how you can obtain the standard error of the long run effect required
to conduct the test. [5 marks]



(b) [8 marks]
Stevenson and Wolfers (2008) amongst others have analysed happiness using data collected
in the General Social Survey. Here we are interested in explaining the binary variable vhappy, a
dummy variable that denotes whether an individual considers him/herself "very happy" or not
(1 = yes, 0 = no). The following socio-demographic variables are considered: occattend and
regattend (which are dummy variables indicating whether the individual occasionally or regularly
attends church, where the excluded dummy indicates that the individual never attends church),
income (family income in thousands of US dollars), unemp10 (dummy indicating whether the individual has
been unemployed in the last 10 years), and educ (years of education completed). A random
sample of observations from the US is available.

Advised that there are benefits to using the Probit model over the Linear Probability Model (you
are not asked to discuss this), you obtain the following results:

(i) Discuss how you can obtain the predicted probability of an individual who regularly attends
church, whose lincome equals 5, has not been unemployed in the last 10 years and has 13
years of education.
[3 marks]
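
As a hedged illustration of the mechanics only: in a Probit model the predicted probability is the standard normal CDF evaluated at the linear index. The sketch below uses only the Python standard library. The coefficient values are entirely hypothetical placeholders, since the estimation results are not reproduced in this text, and the function names are made up for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_probability(coefs, values):
    """P(y = 1 | x) = Phi(const + sum_k coef_k * x_k)."""
    index = coefs["const"] + sum(coefs[name] * values[name] for name in values)
    return normal_cdf(index)

# HYPOTHETICAL coefficients: placeholders, not the estimates from the exam.
coefs = {"const": -1.0, "occattend": 0.1, "regattend": 0.3,
         "lincome": 0.1, "unemp10": -0.2, "educ": 0.01}

# Individual described in the question: regular attender, lincome = 5,
# not unemployed in the last 10 years, 13 years of education.
person = {"occattend": 0, "regattend": 1, "lincome": 5, "unemp10": 0, "educ": 13}

p_hat = probit_probability(coefs, person)
print(p_hat)
```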
(ii) You want to test the joint significance of the church attendance variables occattend and
regattend. How would you conduct this test, and what additional information would you
require to implement it? Given the results presented in the table, what do you expect the
outcome of this test to be? Briefly explain your answer. [5 marks]

END OF PAPER
