
Econometrics II

Exercises

Exercise 1
Consider the following data generating process:
yi = α + βxi + ui
Given a random sample with N observations, a researcher wants to estimate the following
sample regression function, for i = 1, ..., N :
ŷi = α̂ + β̂xi

a) Show that the OLS estimators for α and β are given by:


\[
\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}, \qquad
\hat{\beta} = \frac{\sum_{i=1}^{N} (y_i - \bar{y})\, x_i}{\sum_{i=1}^{N} (x_i - \bar{x})\, x_i}
\]

A complete answer requires stating the objective function, the first order conditions and all key
steps in the proof.
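Before working through the algebra, it can help to see the formulas in action. The sketch below (Python; the data are made up purely for illustration) computes α̂ and β̂ from the closed-form expressions and checks that perturbing either coefficient only increases the sum of squared residuals, i.e. that the first order conditions do locate a minimum.

```python
# Numerical check of the closed-form OLS solutions (illustrative data only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# beta_hat = sum_i (y_i - ybar) x_i / sum_i (x_i - xbar) x_i
beta_hat = sum((y - ybar) * x for x, y in zip(xs, ys)) \
         / sum((x - xbar) * x for x in xs)
alpha_hat = ybar - beta_hat * xbar

def ssr(a, b):
    """Objective function: sum of squared residuals."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

# Small perturbations of the estimates should never lower the objective.
assert ssr(alpha_hat, beta_hat) <= ssr(alpha_hat + 0.01, beta_hat)
assert ssr(alpha_hat, beta_hat) <= ssr(alpha_hat, beta_hat + 0.01)
print(alpha_hat, beta_hat)
```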

b) Show that:
\[
\sum_{i=1}^{N} (y_i - \bar{y})\, x_i = \sum_{i=1}^{N} (y_i - \bar{y})(x_i - \bar{x})
\]

c) Show that:
\[
\sum_{i=1}^{N} (x_i - \bar{x})\, x_i = \sum_{i=1}^{N} (x_i - \bar{x})^2
\]

d) Use the results from b) and c) to show that the OLS estimator for β can be expressed as:
\[
\hat{\beta} = \frac{Cov(x, y)}{Var(x)}
\]

e) Calculate the Hessian matrix and use it to show that OLS solutions for the problem in a)
are indeed a minimum.

f) Show that the OLS estimator for β is unbiased. Assume the Gauss-Markov assumptions
hold.

g) Show that the variance of the OLS estimator for β is given by:
\[
Var(\hat{\beta}) = \frac{\sigma^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2}
\]

Assume the Gauss-Markov assumptions hold.

Exercise 2
A researcher is interested in estimating the β of a given stock. According to the CAPM model:
yt = α + βxt + µt ,

where yt stands for the return on the stock at time t and xt equals the excess market return (rmkt − rfree) at time t. The researcher begins by assuming α = 0, thus estimating:

ŷt = β̃xt , t = 1, ..., T

a) Derive the OLS estimator β̃.

Assume the Gauss-Markov Assumptions hold for parts b) and c).

b) Is the estimator in a) unbiased? State which assumptions need to hold for unbiasedness
of the OLS estimator.

c) Derive an expression for the variance of β̃. How does it compare to the variance of β̂ you
obtained in question g) of Exercise 1?

d) Based on what you have learned from these exercises, do you think the researcher made
a good decision? Discuss under which conditions you would support such a decision and under
which conditions you would not.
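For c), the key comparison is between the two variance denominators: the through-the-origin estimator β̃ divides by Σxᵢ², while β̂ divides by Σ(xᵢ − x̄)², and Σxᵢ² = Σ(xᵢ − x̄)² + N x̄² ≥ Σ(xᵢ − x̄)². A small sketch (Python; the excess-return data are made up for illustration):

```python
# Compare the variance denominators of the no-intercept estimator
# (beta_tilde) and the usual OLS estimator (beta_hat). Illustrative data.
xs = [0.5, -1.2, 2.3, 0.8, -0.4, 1.6]
ys = [0.7, -1.0, 2.9, 1.1, -0.2, 1.8]

n = len(xs)
xbar = sum(xs) / n

beta_tilde = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

den_tilde = sum(x * x for x in xs)          # Var(beta_tilde) = sigma^2 / den_tilde
den_hat = sum((x - xbar) ** 2 for x in xs)  # Var(beta_hat)   = sigma^2 / den_hat

# sum x^2 = sum (x - xbar)^2 + n * xbar^2  >=  sum (x - xbar)^2,
# so beta_tilde is (weakly) more precise -- but only if alpha truly is 0.
assert den_tilde >= den_hat
print(beta_tilde, den_tilde, den_hat)
```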

Exercise 3
A researcher estimated a model to investigate the drivers of CEO salary, using data on 177
firms. Consider the results of such model in Table 3.1, where the dependent variable is the (log)
wage of the CEO, sales is the firm’s sales, mktval is the market value of the firm, prof marg is
the profit as percentage of sales, ceoten is years as CEO with the current company and comten
is total years with the company.
Table 3.1

Dependent variable: Ln(Salary)

                      (1)         (2)         (3)
Ln(sales)           0.224       0.158       0.188
                   (0.027)     (0.040)     (0.040)
Ln(mktval)            -         0.112       0.100
                               (0.050)     (0.049)
profmarg              -        -0.0023     -0.0022
                               (0.0022)    (0.0021)
ceoten                -           -         0.0171
                                           (0.0055)
comten                -           -        -0.0092
                                           (0.0033)
Constant             4.94       4.628       4.57
                   (0.020)     (0.025)     (0.025)

Observations         177         177         177
R2                  0.281       0.304       0.353

Notes: Standard errors in parentheses. Columns (1), (2) and (3) report the 1st, 2nd and 3rd model specifications, respectively.

a) Comment on the effect of prof marg on the CEO salary.

b) Does market value have a significant effect? Explain.

c) Interpret the coefficients on ceoten and comten. Are these explanatory variables statisti-
cally significant?

d) Find the 95% confidence interval for the parameter on Ln(sales) for the 1st specification.
(Use the standard normal approximation)
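The interval in d) needs only the point estimate and standard error from column (1); a sketch of the calculation using the 1.96 normal critical value:

```python
# 95% CI for the Ln(sales) coefficient in specification (1) of Table 3.1,
# using the standard normal approximation.
beta_hat, se = 0.224, 0.027
z = 1.96  # standard normal 97.5% quantile
lo, hi = beta_hat - z * se, beta_hat + z * se
print(round(lo, 4), round(hi, 4))  # approximately [0.1711, 0.2769]
```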

e) Can you reject the hypothesis H0 : βceoten = 0.02 against a two-sided alternative at a 5%
significance level? Define the null and the alternative hypothesis, the rejection rule and show
your result graphically.

Exercise 4
A dependent variable is regressed on K independent variables using n observations. Let RSSu
denote this regression’s residual sum of squares and Ru2 the coefficient of determination for
this estimated regression. We want to test the null hypothesis that K1 of these independent
variables, taken together, do not linearly affect the dependent variable, given that the other
dependent variables (K − K1 ) are also to be used.
The regression is re-estimated with the K1 independent variables of interest excluded. Let RSSr denote this second regression's residual sum of squares and Rr2 its coefficient of determination.

a) State the null and the alternative hypothesis of the test described.

b) Show that the statistic for testing the null hypothesis can be expressed as:
\[
\frac{(RSS_r - RSS_u)/K_1}{RSS_u/(n - K - 1)} = \frac{R_u^2 - R_r^2}{1 - R_u^2} \times \frac{n - K - 1}{K_1}
\]
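The key step in the proof is that both regressions share the same dependent variable and hence the same total sum of squares, TSS, so RSS = TSS(1 − R²) in each case:

```latex
% Substituting RSS_u = TSS(1 - R_u^2) and RSS_r = TSS(1 - R_r^2),
% the common factor TSS cancels:
\frac{(RSS_r - RSS_u)/K_1}{RSS_u/(n-K-1)}
  = \frac{TSS\left[(1 - R_r^2) - (1 - R_u^2)\right]/K_1}{TSS\,(1 - R_u^2)/(n-K-1)}
  = \frac{R_u^2 - R_r^2}{1 - R_u^2} \times \frac{n-K-1}{K_1}
```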

Exercise 5
Consider the following model:
\[
i3_t = \beta_0 + \beta_1 inf_t + \beta_2 def_t + \beta_3 def_t^2 + \mu_t
\]
where i3t is the three-month treasury bill, inf is the annual inflation rate and def is the federal
budget deficit as a percentage of GDP.

a) Interpret the coefficients on inf , def and def 2 . Are the signs of these coefficients reason-
able?

b) Given the information in Table 5.1, how would you test the significance of inf? State the hypotheses (H0 and Ha) and your conclusion. Show all calculations.

c) Given the information in Tables 5.1 and 5.2, how would you test the significance of def ?
State the hypotheses (H0 and Ha ) and your conclusion. Provide all necessary calculations.
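A sketch of the computation for c), assuming def and def² are excluded together (q = 2 restrictions) with n = 56 observations and k = 3 slopes in the unrestricted model:

```python
# F test for the joint significance of def and def^2, built from the
# residual sums of squares in Tables 5.1 (unrestricted) and 5.2 (restricted).
rss_u, rss_r = 179.2380, 243.8621
n, k, q = 56, 3, 2
f_stat = ((rss_r - rss_u) / q) / (rss_u / (n - k - 1))
print(round(f_stat, 2))  # ~9.37, above the 5% critical value F(2, 52) of about 3.18
```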

d) Based on the DW statistic provided in Table 5.1 what can you conclude? Indicate the
critical values used in your analysis.

Table 5.1

Coefficient Std. Error t-Statistic Prob.

const 1.763288 0.439463 4.012369 0.0002


Inf 0.611952 0.083673 ??? ???
Def 0.563922 0.158571 3.556278 0.0008
Def 2 -0.019484 0.040037 -0.486632 0.6286

R2 0.603872 Mean Dep Var 4.908214


Adjusted R2 0.581018 SD Dep Var 2.868242
SE of Regression 1.856579 Akaike info criterion 4.144097
Sum Squared Resid 179.2380 Schwarz criterion 4.288765
Log likelihood -112.0347 Hannan-Quinn criter. 4.200185
F-statistic 26.42353 Durbin-Watson stat 0.687610
Prob(F-stat) 0.000000
Dep Var: i3
Method: Least Squares
Sample: 1948 - 2003
Observations: 56

Table 5.2

Coefficient Std. Error t-Statistic Prob.

const 2.420319 0.463285 5.224258 0.0000


Inf 0.640561 0.094247 6.796651 0.0000

R2 0.461048 Mean Dep Var 4.908214


Adjusted R2 0.451067 SD Dep Var 2.868242
SE of Regression 2.125080 Akaike info criterion 4.380557
Sum Squared Resid 243.8621 Schwarz criterion 4.452891
Log likelihood -120.6556 Hannan-Quinn criter. 4.408601
F-statistic 46.19446 Durbin-Watson stat 0.578943
Prob(F-stat) 0.000000
Dep Var: i3
Method: Least Squares
Sample: 1948 - 2003
Observations: 56

Consider now Table 5.3, where RESID2 is the squared residual μ̂t² from the estimation shown in Table 5.1.

Table 5.3

Coefficient Std. Error t-Statistic Prob.

const 1.226407 1.396131 0.878433 0.3842


Inf 0.415005 0.622316 0.666872 0.5081
Inf 2 -0.013228 0.052785 -0.250596 0.8032
Inf *Def -0.151082 0.238863 -0.632504 0.5301
Inf *Def 2 0.078789 0.072658 1.084383 0.2837
Def -1.580769 1.030910 -1.533372 0.1319
Def 2 0.296508 0.463201 0.640129 0.5252
Def *Def 2 0.146518 0.079208 1.859795 0.0706
(Def 2 )2 -0.030543 0.012564 -2.430974 0.0189

R2 0.302074 Mean Dep Var 3.200679


Adjusted R2 0.183278 SD Dep Var 4.545535
SE of Regression 4.107920 Akaike info criterion 5.809935
Sum Squared Resid 793.1253 Schwarz criterion 6.135438
Log likelihood -153.6782 Hannan-Quinn criter. 5.93132
F-statistic ??? Durbin-Watson stat 1.638263
Prob(F-stat) ???
Dep Var: RESID2
Method: Least Squares
Sample: 1948 - 2003
Observations: 56
Collinear test regressors dropped from specification

e) Given the information in Table 5.3, what test is being considered? State the hypotheses
(H0 and Ha ) for this test and explain the test’s importance.

f) Given the information in Table 5.3, what can you conclude? Show all necessary calcula-
tions.
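One common way to answer f) is the LM form of the White test, LM = n·R² from the auxiliary regression, compared with a chi-squared distribution with as many degrees of freedom as there are auxiliary regressors (8 here, excluding the constant). A sketch:

```python
# White-test LM statistic from the auxiliary regression in Table 5.3.
n, r2, df = 56, 0.302074, 8
lm = n * r2
print(round(lm, 2))  # ~16.92 vs the chi-squared(8) 5% critical value of about 15.51
```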

Table 5.4

Coefficient Std. Error t-Statistic Prob.

const 0.161277 0.339069 0.475646 0.6364


Inf -0.012146 0.064391 -0.188630 0.8511
Def 0.127326 0.123758 1.028827 0.3084
Def 2 -0.052835 0.032001 -1.651028 0.1049
RESID(−1) 0.705372 0.116134 6.073792 0.0000

R2 0.419735 Mean Dep Var -5.00E-16


Adjusted R2 0.374224 SD Dep Var 1.805235
SE of Regression 1.428049 Akaike info criterion 3.635540
Sum Squared Resid ??? Schwarz criterion 3.816375
Log likelihood -96.59513 Hannan-Quinn criter. 3.705650
F-statistic ??? Durbin-Watson stat 1.738181
Prob(F-stat) ???
Dep Var: RESID
Method: Least Squares
Sample: 1948 - 2003
Observations: 55
Presample missing value lagged residuals set to zero.

Table 5.5

Coefficient Std. Error t-Statistic Prob.

const 0.003310 0.188058 0.017599 0.9860


RESID(−1) 0.648408 0.108750 ??? ???

R2 0.401467 Mean Dep Var 0.048614


Adjusted R2 0.390174 SD Dep Var 1.784495
SE of Regression 1.393536 Akaike info criterion 3.537252
Sum Squared Resid 102.9230 Schwarz criterion 3.610246
Log likelihood -95.27444 Hannan-Quinn criter. 3.565480
F-statistic ??? Durbin-Watson stat 1.609208
Prob(F-stat) ???
Dep Var: RESID
Method: Least Squares
Sample: 1949 - 2003
Observations: 55 after adjustments

g) Given the information in Tables 5.4 and 5.5, what test is being considered? State the
hypotheses (H0 and Ha ) for this test. Discuss the difference between Tables 5.4 and 5.5.

h) Given the information in Table 5.4, what can you conclude regarding the test that you
are performing? Provide all necessary calculations.
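If Table 5.4 is read as the auxiliary regression of a Breusch-Godfrey test for first-order autocorrelation, one common form of the statistic is LM = n·R², compared with chi-squared(1); the t statistic on RESID(−1) gives an equivalent answer. A sketch:

```python
# Breusch-Godfrey LM statistic from the auxiliary regression in Table 5.4.
n, r2 = 55, 0.419735
lm = n * r2
print(round(lm, 2))  # ~23.08 vs the chi-squared(1) 5% critical value of 3.84
```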

i) Discuss the consequences of the results of the test in g).

Exercise 6
Consider the output in Table 6.1, where Inf and U nem are annual inflation and unemployment
rates, respectively. Compute the V ar(β̂U nem ) under the H0 : βU nem = 0.
Table 6.1

Coefficient Std. Error t-Statistic Prob.

const 1.053565 1.547957 0.680617 0.4990


U nem 0.502378 ???? ??? ???

R2 0.062154 Mean Dep Var 3.883929


Adjusted R2 0.044786 SD Dep Var 3.040381
SE of Regression 2.971518 Akaike info criterion 5.051084
Sum Squared Resid 476.8157 Schwarz criterion 5.123418
Log likelihood -139.4304 Hannan-Quinn criter. 5.079128
F-statistic 3.578726 Durbin-Watson stat 0.801482
Prob(F-stat) ???
Dep Var: Inf
Method: Least Squares
Sample: 1948 - 2003
Observations: 56
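With a single regressor, the reported F statistic is the squared t statistic for H0: βUnem = 0, so the missing variance can be backed out as β̂²/F. A sketch of the calculation (assuming the reported F statistic tests exactly this hypothesis):

```python
# Back out Var(beta_hat) from the coefficient and the F statistic in Table 6.1.
beta_hat, f_stat = 0.502378, 3.578726
var_beta = beta_hat ** 2 / f_stat  # t^2 = F  =>  se^2 = beta_hat^2 / F
se_beta = var_beta ** 0.5
print(round(var_beta, 5), round(se_beta, 4))  # ~0.07052 and ~0.2656
```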

Exercise 7
Consider the following bivariate regression model
yi = α + βxi + µi

estimated on a sample of data i = 1, ..., N, where yi is an observed dependent variable, xi is an observed exogenous regressor, µi is an unobserved disturbance, and α and β are unknown parameters.

a) Derive the least squares estimator of β.

b) Under what assumptions about µi will these least squares estimators be Best Linear Unbiased?

c) Explain the meaning of a Best Linear Unbiased estimator.

d) Explain what exogenous means.

Exercise 8
Using annual data from 1930 to 1978, a researcher has estimated by OLS the following model for cigarette demand, reported in Table 8.1:
Table 8.1

Coefficient Std. Error t-Statistic Prob.

const 1.972900 1.760509 ??? 0.2685


LN Y 1.236227 0.083966 ??? 0.0000
LN P -0.609198 0.324479 ??? 0.0671
LN A 0.049176 0.069216 ??? 0.4812
D64 -0.277694 0.077386 ??? 0.0008

SE of Regression 0.124051 SD Dep Var 0.391376

Dep Var: LN C
Method: Least Squares
Sample: 1930 - 1978
Observations: 49

where LN C is the log of per capita cigarette consumption, LN Y is the log of per capita
income, LN P is the log of the cigarette price, LN A is the log of the stock of per capita
advertising and D64 is a dummy, equal to 1 from 1964 (starting year of an aggressive anti-
smoking government policy) to 1978. t indexes time.

a) Calculate the F-test for zero slopes.
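Table 8.1 does not report R², but it can be recovered from the SE of regression and the SD of the dependent variable. A sketch, assuming the usual degrees-of-freedom conventions, RSS = s²(n − K − 1) and TSS = SD²(n − 1), with K = 4 slopes:

```python
# F test for zero slopes, reconstructing R^2 from Table 8.1.
n, K = 49, 4
ser, sd_y = 0.124051, 0.391376
rss = ser ** 2 * (n - K - 1)   # residual sum of squares
tss = sd_y ** 2 * (n - 1)      # total sum of squares
r2 = 1 - rss / tss
f_stat = (r2 / K) / ((1 - r2) / (n - K - 1))
print(round(r2, 3), round(f_stat, 1))  # roughly 0.908 and 108
```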

b) Which estimated coefficients are statistically significant? Justify.

c) Provide an economic interpretation for the coefficient on D64.

Exercise 9
A researcher has estimated with OLS the linear regression model reported in Table 9.1, where the oil price, Log(P_WTI), is regressed on the percentage change and level of world oil demand, D(Log(Q_TOT)) and Log(Q_TOT) respectively; the 3-month oil futures price and its percentage change, Log(F3_WTI) and D(Log(F3_WTI)) respectively; and the percentage change and level of industry oil stocks, D(Log(Q_IS)) and Log(Q_IS) respectively. All variables are expressed in logs and have a quarterly frequency, from the second quarter of 1993 to the third quarter of 2005.
Table 9.1

Coefficient Std. Error t-Statistic Prob.

const 9.101826 0.869823 ??? 0.0000


D(Log(Q_TOT)) 0.429238 0.157646 ??? 0.0093
Log(Q_TOT) 0.011383 0.076692 ??? 0.8827
Log(F3_WTI) 0.965532 0.015095 ??? 0.0000
D(Log(F3_WTI)) 0.089740 0.028369 ??? 0.0029
D(Log(Q_IS)) 0.777185 0.160660 ??? 0.0000
Log(Q_IS) -1.149918 0.111427 ??? 0.0000

R2 ??? Mean Dep Var 3.174285


SD Dep Var 0.368448
SE of Regression 0.018954 Akaike info criterion -4.964459
Sum Squared Resid 0.015447 Schwarz criterion -4.964459
Log likelihood 131.1115 F-statistic 3078.935
Durbin-Watson stat 1.674077 Prob(F-stat) 0.000000
Dep Var: Log(P_WTI)
Method: Least Squares
Sample: 1993Q2 - 2005Q3
Observations: 50

a) Complete the t-statistic column and indicate which coefficients are statistically different
than 0 at the 1% significance level.

b) Provide an economic interpretation of the coefficients of Log(Q_TOT), Log(F3_WTI) and Log(Q_IS).

c) Consider the model:


\[
\hat{\mu}_t = \phi\, \hat{\mu}_{t-1} + \epsilon_t
\]
where μ̂t denotes the residual of the model estimated in Table 9.1. What is the statistical meaning of the coefficient φ?

d) Using the results reported in Table 9.1, calculate the model’s R2 .
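d) can be answered from the reported Sum Squared Resid and the SD of the dependent variable; a sketch, assuming TSS = SD²(n − 1):

```python
# Recover R^2 from the summary statistics in Table 9.1.
n = 50
ssr, sd_y = 0.015447, 0.368448
tss = sd_y ** 2 * (n - 1)
r2 = 1 - ssr / tss
print(round(r2, 4))  # ~0.9977
```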

Exercise 10
The expected return of stock i can be represented as:
\[
R_{i,t} - rf_t = \beta_0 + \beta_1 (R_{mkt,t} - rf_t) + \beta_2 SMB_t + \beta_3 HML_t + u_t
\]
In this example we will consider monthly returns on IBM stock for the period between 1988 and September 2007. The variables used are:
– IBM: monthly returns on IBM stocks
– Mkt: monthly returns on the market index
– rf : monthly risk-free rate
– SMB: Fama-French size risk factor
– HML: book-to-market risk factor

a) What properties does ut need to satisfy in order for the equation above to be appropriately
estimated using OLS?

Table 1: Results

                      Regression 1      Regression 2
Dependent Variable    IBMt              IBMt
Observations          231               231
SSR                   3.891271          4.149464

Indep Vars
Constant              0.032596          0.027265
                     (0.008759)        (0.008877)
Mkt                   1.171063          1.636793
                     (0.246988)        (0.221752)
SMB                  -0.000964             -
                     (0.002724)
HML                  -0.012403             -
                     (0.003346)

Notes: Standard errors in parentheses. SSR is the sum of squared residuals.

b) Considering the output of regression 1 in Table 1 what can you conclude regarding the
statistical significance of SM Bt and HM Lt ?

c) Based on the results of regression 1 in Table 1 test the null hypothesis that the parameter
associated with the excess return of the market is smaller than one.

d) Using the information on both regressions, test the hypothesis of whether SM Bt and
HM Lt are jointly significant.

e) Using the results of regression 1 in Table 1, compute a 95% confidence interval for the
parameter estimate of M ktt .
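Sketches for d) and e), using only the numbers in Table 1 (n = 231, with 3 slopes in the unrestricted regression, so q = 2 restrictions and n − k − 1 = 227 degrees of freedom):

```python
# d) Joint F test for SMB and HML from the two SSRs in Table 1.
ssr_u, ssr_r = 3.891271, 4.149464
n, k, q = 231, 3, 2
f_stat = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k - 1))

# e) 95% confidence interval for the Mkt coefficient (regression 1).
beta_mkt, se_mkt = 1.171063, 0.246988
lo, hi = beta_mkt - 1.96 * se_mkt, beta_mkt + 1.96 * se_mkt
print(round(f_stat, 2), round(lo, 3), round(hi, 3))  # ~7.53 and ~[0.687, 1.655]
```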

Exercise 11
A researcher has data for 100 workers of a large organization. His database contains information
on hourly earnings, Earnings, the skill level of the worker, Skill, and a measure of worker’s
intelligence, IQ.
log(Earnings) = β1 + β2 Skill + µ    (11.1)

Skill = α1 + α2 IQ + ε    (11.2)

where µ and ε are disturbance terms. The researcher is not sure whether µ and ε are distributed independently of each other.

a) Briefly explain whether each variable is endogenous or exogenous, and derive the reduced
form equations for the endogenous variables.

b) Explain why the researcher could use ordinary least squares (OLS) to fit equation (11.1) if µ and ε are distributed independently of each other.

c) Explain how the researcher could use instrumental variables (IV) to obtain a consistent
estimate of β2 .

d) Explain the advantages and disadvantages of using IV rather than OLS to estimate β2, given that the researcher is not sure whether µ and ε are distributed independently of each other.

e) Describe in general terms a test that might help the researcher decide whether to use OLS or IV. What are the limitations of such a test?

f) Can the researcher fit equation (11.2) and obtain consistent estimates? Explain.

Exercise 12
Consider monthly data on the short-term interest rate (the three-month Treasury Bill rate)
and on the AAA corporate bond yield in the USA. As Treasury Bills and AAA bonds can be
seen as alternative ways of investment in low-risk securities, it may be expected that the AAA
bond rate is positively related to the interest rate. It may further be that this relation holds
more tightly for lower than higher levels of the rates, as for higher rates there may be more
possibilities for speculative gains.
Consider the relation between changes in the AAA corporate bond rate (yi ) and in the
short-term interest rate (3-month T-Bill rate), (xi ).
yi = α + βxi + εi,    i = 1, ..., n    (12.1)

a) It may very well be that the factors εi which cause changes in the AAA bond rate reflect general financial conditions that also affect the Treasury Bill rate. What is the problem alluded to here, and what are its implications for the OLS estimates?

b) If financial markets are efficient, all past information is incorporated in the current price. In this case, the current value of εi is uncorrelated with the past values of both yi−j and xi−j for all j ≥ 1. What does this mean in terms of a possible solution to the potential problem detected in a)?

c) Given the output in Tables 12.1 and 12.2, provide a detailed explanation of what is being
done.
Table 12.1

Coefficient Std. Error t-Statistic Prob.

const -0.026112 0.039009 -0.669400 0.5039


xi−1 0.358145 0.062307 5.748060 0.0000
xi−2 -0.282601 0.062266 -4.538625 0.0000

R2 0.151651 Sum Squared Res 86.3046

Panel 1 Dep Var: xi


Method: Least Squares
Sample: 1980:01 - 1999:12
Observations: 240

Table 12.2

Coefficient Std. Error t-Statistic Prob.

const -0.008453 0.020060 -0.421374 0.6739


x̂i 0.169779 0.078626 2.159311 0.0318

R2 0.019214 Sum Squared Res 22.69959

Panel 2 Dep Var: yi


Method: Least Squares
Sample: 1980:01 - 1999:12
Observations: 240

d) In order to (in)validate the use of OLS to estimate equation (12.1) one could use the
Hausman test. Explain how the Hausman test is used and state the null and alternative
hypotheses which are being tested.

e) Consider the following output:


Table 12.3

Coefficient Std. Error t-Statistic Prob.

const -0.003895 0.015359 -0.253610 0.800


xi -0.136674 0.060199 -2.270359 0.0241
RESID 0.161106 0.065359 2.464945 0.0144

R2 0.024996

Dep Var: RES OLS


Method: Least Squares
Sample: 1980:01 - 1999:12
Observations: 240

where RES OLS are the residuals of a regression such as in (12.1) and RESID are the
residuals of a regression as in Panel 1 of exercise c), i.e., of a regression of xi on xi−1 , xi−2 and
a constant. Using this output, compute the Hausman test. What can you conclude regarding
the nature of xi ?

f) In this context, the Sargan test also plays an important role. Explain why.

g) Using the following output, which corresponds to the second step of Sargan's procedure, compute the Sargan test. What do you conclude regarding xi−1 and xi−2?
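Both statistics can be computed as n·R² from the reported auxiliary regressions (a common LM form: the regression-based Hausman variant from Table 12.3, and the Sargan test from Table 12.4 with one overidentifying restriction):

```python
# Regression-based Hausman test (Table 12.3) and Sargan test (Table 12.4),
# both in the n * R^2 form, each compared with chi-squared(1).
n = 240
hausman = n * 0.024996  # R^2 from Table 12.3
sargan = n * 0.000135   # R^2 from Table 12.4
print(round(hausman, 2), round(sargan, 4))
# ~6.0 (exogeneity rejected at 5%), ~0.032 (instrument validity not rejected)
```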
Table 12.4

Coefficient Std. Error t-Statistic Prob.

const -0.000156 0.016525 -0.009431 0.9925


xi−1 -0.002218 0.026395 -0.084042 0.9331
xi−2 -0.003387 0.026378 -0.128395 0.8979

R2 0.000135

Dep Var: RES IV


Method: Least Squares
Sample: 1980:01 - 1999:12
Observations: 240

Exercise 13

We want to estimate the following model:

yi = α + βxi + µi

We know that yi is a measure of sound health and xi is an indicator of eating habits (xi = 1 if
the individual i consumes fast food and xi = 0 if the individual does not). Data on yi and xi is
collected randomly in a city around Nowhere. Surprisingly, the data shows that xi and yi are
positively correlated in this sample. You notice that due to extremely high taxes on fast food
and a shortage of health personnel, both fast food and health services are rather expensive in
this city around Nowhere. You suspect that only people with higher income consume fast food
in this city around Nowhere and therefore xi might be endogenous. In addition, you do not
know the income of the citizens.
You discover that, as part of an aggressive marketing campaign, a fast food chain randomly selected individuals from this city around Nowhere and offered each of them 100 vouchers for fast food meals. You observe the binary variable zi, which is equal to 1 if the individual was randomly selected by the fast food chain and 0 otherwise.
You observe a total of 70 individuals, 20 of which have xi = 1 and zi = 0 and the average
outcome for that group is ȳi = 1.5. 10 individuals have both xi = 1 and zi = 1 and the average
outcome for that group is ȳi = 1.2. You observe 30 individuals with xi = 0 and zi = 0 and an
average group outcome of ȳi = 1.0. Finally, the remaining individuals show xi = 0 and zi = 1
and an average outcome of ȳi = 0.8.

a) State the necessary conditions for a good instrument. Do you think zi is a good instrument
for xi ? Justify your position.

b) Calculate and interpret OLS estimators for α and β.

c) Calculate the IV estimator for β. Is there a difference between βOLS and βIV ? Explain.
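The four group counts and means are all that is needed for b) and c). A sketch that rebuilds the sample and computes both estimators (replacing each individual outcome by its group mean is harmless here: with binary xi and zi, both estimators depend on the data only through group counts and group means):

```python
# OLS and IV (Wald) estimators from the grouped data of Exercise 13.
groups = [  # (x, z, count, group mean of y)
    (1, 0, 20, 1.5),
    (1, 1, 10, 1.2),
    (0, 0, 30, 1.0),
    (0, 1, 10, 0.8),
]
xs, zs, ys = [], [], []
for x, z, cnt, ybar in groups:
    xs += [x] * cnt
    zs += [z] * cnt
    ys += [ybar] * cnt

n = len(xs)
mx, mz, my = sum(xs) / n, sum(zs) / n, sum(ys) / n

def cross(a, ma, b, mb):
    """Sum of cross-products of deviations from the means."""
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

beta_ols = cross(xs, mx, ys, my) / cross(xs, mx, xs, mx)
alpha_ols = my - beta_ols * mx
beta_iv = cross(zs, mz, ys, my) / cross(zs, mz, xs, mx)  # Cov(z,y)/Cov(z,x)
print(round(alpha_ols, 3), round(beta_ols, 3), round(beta_iv, 3))
# alpha ~ 0.95, beta_OLS ~ 0.45, beta_IV ~ -2.0
```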
d) Calculate a 95% confidence interval for the IV estimator of β. Assume Var(µi | zi) = 1/10.

e) Test H0 : β = 0. Can you reject the null hypothesis at a 5% significance level?

(Hint: Let y, x, z be the n × 1 vectors with entries yi, xi, zi. Let 1n be an n × 1 vector with entries equal to 1. Define X = (1n, x) and Z = (1n, z), and recall that Var(β̂2SLS) = σ²[(X′Z)(Z′Z)⁻¹(Z′X)]⁻¹.)

Exercise 14
Consider the linear regression model:
yt = x′1tβ1 + x′2tβ2 + εt,    t = 1, 2, ..., T    (14.1)
where x1t is a K1 × 1 vector, x2t is a K2 × 1 vector, while xt = (x′1t, x′2t)′ and β = (β′1, β′2)′ are both K × 1 vectors with K = K1 + K2. Assume that yt and xt are stationary and weakly dependent, so that the usual results hold.

a) State the minimal conditions for consistency of the OLS estimator in the regression model (14.1). How is this related to the interpretation of (14.1) as a conditional expectation, E[yt | xt] = x′tβ?
Discuss how this assumption can be used to construct moment conditions,

g(β) = E[f (yt , xt , β)] = 0


for estimating β. Write the corresponding sample moment conditions
\[
g_T(\beta) = \frac{1}{T} \sum_{t=1}^{T} f(y_t, x_t, \beta) = 0
\]

and derive the OLS estimator, β̂OLS .

Now assume that the K2 variables in x2t are endogenous, such that
E[x2t εt] ≠ 0

How does that affect the properties of OLS?

b) Now assume the existence of K2 new instrumental variables z2t , which satisfy
E[z2t εt] = 0    (14.2)

Should the new instruments z2t fulfill other requirements besides (14.2) for being a valid and
relevant instrument?

Define the K × 1 vector of instruments zt = (x′1t, z′2t)′. State the population moment conditions for the instrumental variables (IV) estimator β̂IV in this model. Write the corresponding sample moment conditions and derive the IV estimator.

Discuss why the simple IV estimator does not work if the number of instruments is larger than the number of parameters.

c) Now assume that the number of instruments in zt , R, is larger than the number of
parameters, K. Explain the intuition for the GMM estimator by referring to the quadratic
form
QT(β) = gT(β)′ WT gT(β)    (14.3)
What is the role of the weight matrix WT , and how should it be optimally chosen?
State the sample moments, gT(β), for the case R > K. Insert the moment conditions in (14.3) and derive the GMM estimator for a given weight matrix, β̂GMM(WT), as the solution to:
\[
\frac{\partial Q_T(\beta)}{\partial \beta} = 0
\]

Can you think of any difficulties in implementing GMM estimation in practice?

Exercise 15
Suppose that yi ∼ N(0, σ²). We observe an iid sample yi, i = 1, 2, ..., n. We want to do inference on the parameter σ > 0. Furthermore, we know that E(yi⁴) = 3σ⁴, E[(yi² − σ²)²] = 2σ⁴, E[(yi² − σ²)(yi⁴ − 3σ⁴)] = 12σ⁶ and E[(yi⁴ − 3σ⁴)²] = 96σ⁸.

From general GMM theory we know that, for a vector of moment functions g(yi, σ) that satisfies E[g(yi, σ)] = 0 at the true parameter σ, the corresponding GMM estimator with optimal weight matrix has asymptotic variance-covariance matrix
\[
AsyVar(\sqrt{n}\,\hat{\sigma}_{GMM}) = \left[ G' \left( Var[g(y_i, \sigma)] \right)^{-1} G \right]^{-1},
\]
where G = E[∂g(yi, σ)/∂σ]. In general G is a matrix, but in this case, where σ is a scalar parameter, G is simply a vector. You can use this general result on the asymptotic variance without proof, and assume all required regularity conditions are satisfied.

a) Consider the moment condition E(yi2 ) = σ 2 . Write down g(yi , σ) that corresponds to that
moment condition. Provide a formula for the corresponding GMM estimator as a function of
the observed y1 , y2 , ..., yn . Compute the asymptotic variance-covariance matrix of this GMM
estimator.

b) Is the choice of weight matrix important for the GMM estimator in a)? Justify your
answer.

c) Consider now the following moment conditions E(yi2 ) = σ 2 and E(yi4 ) = 3σ 4 . Write down
g(yi , σ) that correspond to these moment conditions. Write down the corresponding GMM
objective function and explain how the GMM estimator can be obtained from that objective
function.

d) Is the choice of weight matrix important for the GMM estimator in c)? Calculate the
optimal weight matrix knowing that the weight matrix which minimizes the asymptotic variance
of the GMM estimator is given by W = (V ar[g(yi , σ)])−1 .

e) Calculate the asymptotic variance-covariance matrix of the GMM estimator in c) that


is obtained when using the optimal weight matrix. Comparing to the result in a), is there an
efficiency gain from using the additional moment condition E(yi4 ) = 3σ 4 ?
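A numerical check of d) and e) (Python with NumPy; σ = 1 is used since every entry scales with a power of σ, so the comparison is unaffected):

```python
import numpy as np

sigma = 1.0

# Var[g(y_i, sigma)] for g = (y^2 - sigma^2, y^4 - 3 sigma^4)',
# using the moments given in the exercise.
V = np.array([[2 * sigma**4, 12 * sigma**6],
              [12 * sigma**6, 96 * sigma**8]])
G = np.array([-2 * sigma, -12 * sigma**3])  # derivatives of g w.r.t. sigma

avar_one = (2 * sigma**4) / (2 * sigma) ** 2  # a): first moment condition only
avar_two = 1.0 / (G @ np.linalg.inv(V) @ G)   # e): both conditions, optimal W
print(avar_one, avar_two)  # both equal sigma^2 / 2: no efficiency gain here
```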

Exercise 16
Consider the simple linear regression model with a constant and one additional regressor:

yi = α + βxi + ui

Assume that ui ∼ N(0, σ²), which implies yi ∼ N(α + βxi, σ²). We observe independent random draws yi, i = 1, ..., n. The likelihood function reads:
\[
L(\alpha, \beta, \sigma) = \prod_{i=1}^{n} f(y_i \mid \alpha + \beta x_i, \sigma^2)
= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(y_i - \alpha - \beta x_i)^2}{2\sigma^2}\right]
\]
The parameters of this model are θ = (α, β, σ)′, where α, β ∈ R and σ > 0.

a) Write the log-likelihood function for this model. State the optimization problem you need
to solve to derive the maximum likelihood estimators (MLE) for α, β and σ.

b) Derive the maximum likelihood estimator for α, β and σ.

c) Find the expected Hessian matrix H for this model.

d) Show a general formula for the asymptotic variance of θ̂ = (α̂, β̂, σ̂), i.e. the matrix AVarMLE which satisfies √n(θ̂ − θ) ⇒ N(0, AVarMLE) as n → ∞. You do not need to solve for the final result.

Exercise 17
[Adapted from Stock Watson Chapter 11] Consider the following output, where P ass refers to
passing a driving test:
Table 17.1

Dependent variable: Pass

               Probit     Logit      LPM     Probit     Logit      LPM
                 (1)       (2)       (3)      (4)        (5)       (6)
Experience      0.031     0.040     0.006      -          -         -
               (0.009)   (0.016)   (0.002)
Male              -         -         -     -0.333     -0.622    -0.071
                                            (0.161)    (0.303)   (0.034)
Const           0.712     1.059     0.774    1.282      2.197     0.900
               (0.126)   (0.221)   (0.034)  (0.124)    (0.242)   (0.022)

Answer questions a) to c) using, in turn, the results from column (1), (2) and (3).

a) Does the probability of passing the test depend on Experience?

b) Matthew has 10 years of driving experience. What is the probability that he will pass the test?

c) Christopher has no driving experience. What is the probability that he passes the test?
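For b) and c), each model maps the fitted index into a probability in its own way: Φ(·) for the probit, the logistic function for the logit, and the index itself for the LPM. A sketch using the column (1)-(3) estimates (only the standard library is needed; Φ is built from math.erf):

```python
import math

def norm_cdf(x):  # standard normal CDF, Phi(x)
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(experience):
    p_probit = norm_cdf(0.712 + 0.031 * experience)  # column (1)
    p_logit = logistic(1.059 + 0.040 * experience)   # column (2)
    p_lpm = 0.774 + 0.006 * experience               # column (3)
    return p_probit, p_logit, p_lpm

p10 = predict(10)  # Matthew: roughly (0.85, 0.81, 0.83)
p0 = predict(0)    # Christopher: roughly (0.76, 0.74, 0.77)
print(p10, p0)
```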

d) The sample included values of Experience between 0 and 40 years, and only four people in the sample had more than 30 years of driving experience. Jed is 95 years old and has been driving since he was 15. What is column (1)'s model prediction for the probability that Jed will pass the test? Do you think such a prediction is reliable? Justify.

Consider columns (4) to (6).

e) Compute the estimated probability of passing the test for men and for women.

f) Are the models in (4)-(6) different? Why or why not?

Exercise 18

Sports betting is a popular activity around the world. This exercise examines the relationship between the outcomes of professional American football matches in the USA and a pregame indicator of who will win a particular match. The indicator is known as the "point spread" and is well known among gamblers.
The point spread is an estimate of how much a team will lose by; if the point spread is negative, the spread is the team's predicted margin of victory. A point spread of -20 for a team predicts that the team will win by 20 points. By the same token, a point spread of 20 predicts the team will lose by 20 points. Gamblers often ask themselves: "How likely is it that the home team wins if the point spread against the home team is 20 points?"
Using data on all game outcomes for home teams in the 2001-2002 NFL’s regular season, the
following linear probability model for the probability of winning given spread was estimated:

Ŵini = 0.5000 − 0.0252 Spreadi
       (0.0188)  (0.0031)

a) What can you conclude based on the value of the intercept alone?

b) What does the negative sign of on the spread variable indicate?

c) What is the predicted probability of the home team winning, considering the average
spread of 5.88?

d) What is the predicted probability of home team winning if the spread was 20 points?

Now consider that the following Logit model was estimated:

Ŵini = 6.55E−17 − 0.110 Spreadi
       (0.0825)    (0.015)

e) What can you conclude based on the value of the intercept only? Justify.

f) What is the predicted probability of the home team winning, considering the same average
spread as in c)?

g) What is the marginal effect of Spreadi in this model?
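Sketches for c), f) and g). In the LPM the prediction is the fitted value itself; in the logit it is Λ(index), and the marginal effect of Spread is the slope times Λ(1 − Λ), evaluated here at the chosen spread:

```python
import math

spread = 5.88  # sample average spread

# Linear probability model prediction (part c).
p_lpm = 0.5000 - 0.0252 * spread

# Logit prediction (part f) and marginal effect (part g) at the same spread.
index = 6.55e-17 - 0.110 * spread
p_logit = 1.0 / (1.0 + math.exp(-index))
me_spread = -0.110 * p_logit * (1.0 - p_logit)
print(round(p_lpm, 3), round(p_logit, 3), round(me_spread, 4))
# roughly 0.352, 0.344 and -0.0248
```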

Exercise 19

Suppose you have a sample of n independent observations from an exponential distribution with density function:

\[
f(y_i; \theta) =
\begin{cases}
\dfrac{1}{\theta} \exp\left(-\dfrac{y_i}{\theta}\right) & \text{if } y_i \geq 0,\ \theta > 0 \\
0 & \text{if } y_i < 0
\end{cases}
\qquad i = 1, \ldots, n
\]

where E(yi) = θ and Var(yi) = θ².

a) Write the log-likelihood function and compute the score.

b) Derive the maximum likelihood estimator of θ.

c) Compute the expectation of the score.

d) Derive the Hessian matrix and the information matrix.

e) Show that the variance of the score equals the information matrix.

f) Derive the variance of the maximum likelihood estimator.

Exercise 20

Data was gathered for a study conducted on consumer attitude in the use of discount codes for
a digital transport platform. Using such data, model (1) was estimated by OLS:

ŷi = −1.387 + 0.096 xi,    R2 = 0.97    (1)
     (−120.345) (122.213)

where xi corresponds to the discount (in cents) and the dependent variable Pi is the proportion of consumers that used the codes.
Besides the OLS model, a Logit (2) and a Probit (3) model were also estimated using
Maximum Likelihood.

Ŷi∗ = −1.957 + 0.115 xi,    L0 = −3529.02,  Lu = −3089.23    (2)
      (−20.36)  (24.22)

and

Ŷi∗ = −1.098 + 0.069 xi,    L0 = −3529.02,  Lu = −3089.35    (3)
      (−21.74)  (26.00)

where Yi∗ is an unobservable variable such that Yi = 1 if Yi∗ > 0 (consumer i uses the
discount code) and Yi = 0 if Yi∗ < 0 (consumer i does not use the code).

a) Estimate the marginal contribution of x, using as reference the sample average of 15 cents.

b) Compute the value of the discount for which it is estimated that 70% of the consumers
use the code.
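A sketch for both parts using the logit (2) and probit (3) estimates. For a), the logit marginal effect at x = 15 is 0.115·P(1 − P); for b), invert each link at P = 0.7 (the probit uses the standard normal 70% quantile, approximately 0.5244):

```python
import math

# a) Logit marginal effect at the sample average x = 15 cents.
p15 = 1.0 / (1.0 + math.exp(-(-1.957 + 0.115 * 15)))
me15 = 0.115 * p15 * (1.0 - p15)

# b) Discount at which 70% of consumers are estimated to use the code.
x_logit = (math.log(0.7 / 0.3) + 1.957) / 0.115   # logit: index = ln(0.7/0.3)
x_probit = (0.5244 + 1.098) / 0.069               # probit: index = Phi^{-1}(0.7)
print(round(me15, 4), round(x_logit, 1), round(x_probit, 1))
# roughly 0.0284, 24.4 cents and 23.5 cents
```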

Exercise 21

Suppose that we observe n independent random draws yi , i = 1, ..., n of a two component vector
yi = (y1i, y2i)′ ∈ R², whose mean and variance-covariance matrix are given by:
\[
E(y_i) = \begin{pmatrix} \theta_0 \\ 2\theta_0 \end{pmatrix}, \qquad
Var(y_i) = \begin{pmatrix} 1 & 4/5 \\ 4/5 & 1 \end{pmatrix},
\]
where θ0 ∈ R is an unknown parameter of interest.

a) Let θ̂1 be the method of moments estimator for θ0 based on the moment condition E(y1i − θ0) = 0, and let θ̂2 be the method of moments estimator for θ0 based on the moment condition E(y2i − 2θ0) = 0. Show that √n(θ̂1 − θ0) ⇒ N(0, V1) and √n(θ̂2 − θ0) ⇒ N(0, V2), as n → ∞.

Now, consider the GMM estimator
\[
\hat{\theta}_{GMM} = \arg\min_{\theta \in \mathbb{R}} \left[ \frac{1}{n} \sum_{i=1}^{n} g(y_i, \theta) \right]' W \left[ \frac{1}{n} \sum_{i=1}^{n} g(y_i, \theta) \right],
\]
where
\[
g(y_i, \theta) = y_i - \begin{pmatrix} \theta \\ 2\theta \end{pmatrix}.
\]
Remember that, in general, the optimal weight matrix satisfies Wopt = Var(g(yi, θ))⁻¹, evaluated at the true θ. Finally, recall that:
\[
AsyVar(\hat{\theta}_{GMM}) = \left[ \left( \frac{\partial g(y_i, \theta_0)}{\partial \theta} \right)' Var(g(y_i, \theta_0))^{-1} \left( \frac{\partial g(y_i, \theta_0)}{\partial \theta} \right) \right]^{-1}
\]

b) Find an optimal GMM weight matrix that minimizes the asymptotic variance of θ̂GM M .
For that matrix, what is the asymptotic variance of θ̂GM M ? Compare to the asymptotic vari-
ances found in a).

c) In addition to the two moment conditions stated previously, it is suggested to use the
third moment condition E(y2i − y1i − θ0 ) = 0, which is a consequence of the above assumptions.
Would you expect the efficient GMM estimator based on all three moment conditions to be
more efficient than any of the three estimators discussed so far? Explain.
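A numerical check for b) (Python with NumPy). With g(yi, θ) = yi − (θ, 2θ)′, the derivative vector is constant, G = (−1, −2)′, and the optimal weight is Var(yi)⁻¹:

```python
import numpy as np

V = np.array([[1.0, 0.8], [0.8, 1.0]])  # Var(y_i) = Var(g(y_i, theta))
G = np.array([-1.0, -2.0])              # dg/dtheta

W_opt = np.linalg.inv(V)                 # optimal weight matrix
avar_gmm = 1.0 / (G @ W_opt @ G)         # asymptotic variance with W_opt

avar_1 = 1.0   # V1 = Var(y_1i), first moment condition alone
avar_2 = 0.25  # V2 = Var(y_2i) / 4, second moment condition alone
print(avar_gmm, avar_1, avar_2)  # efficient GMM gives 0.2, below both
```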

Exercise 22

Consider the linear regression model

yi = β1 + β2 hi + ui,

where yi is the logarithm of the wage of individual i, hi is a dummy variable which indi-
cates whether individual i graduated from high-school (hi = 1) or not (hi = 0), and ui is an
unobserved error. Assume that E(ui | hi) = 0. Let xi = (1, hi) and β = (β1, β2)′. We observe an iid sample of (yi, hi), i = 1, ..., n. In total, we observe n = 80 individuals, of which n0 = Σi (1 − hi) = 60 have no high-school degree and n1 = Σi hi = 20 have a high-school degree. The average log wage in the subpopulation with hi = 0 is (1/n0) Σi (1 − hi) yi = 0.2 and in the subpopulation with hi = 1 it is (1/n1) Σi hi yi = 2.
i=1

a) Use the model to calculate E(yi |hi = 0) and E(yi |hi = 1). Use the information from the
samples to obtain estimators for E(yi |hi = 0) and E(yi |hi = 1). Combine both to calculate
estimators for β1 and β2 .
b) Calculate the 2 × 2 matrix Σi xi′xi, the 2 × 1 vector Σi xi′yi, and the OLS estimator β̂ = (Σi xi′xi)⁻¹ Σi xi′yi. Compare β̂ to your results in a).
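The sums in b) depend only on the group counts and group means, since xi = (1, hi) with binary hi. A sketch (Python with NumPy):

```python
import numpy as np

n, n1 = 80, 20           # total sample and number of high-school graduates
n0 = n - n1
ybar0, ybar1 = 0.2, 2.0  # average log wages in the two groups

# sum x_i' x_i = [[n, n1], [n1, n1]],  sum x_i' y_i = (sum y_i, sum h_i y_i)'
XtX = np.array([[n, n1], [n1, n1]], dtype=float)
Xty = np.array([n0 * ybar0 + n1 * ybar1, n1 * ybar1])

beta_hat = np.linalg.solve(XtX, Xty)
print(beta_hat)  # matches a): beta1 = mean for h=0, beta2 = difference in means
```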

c) We now also assume homoscedasticity, with E(ui² | hi) = 15. Use this information to calculate standard errors for β̂1 and β̂2. Recall that Var(β̂) = σ² (Σi xi′xi)⁻¹.

For the following question assume you calculated β̂2 = 0.9 and se(β̂2) = 0.5 (these are not the values that you should have actually obtained).

d) Test β2 ≤ 0. State the hypothesis, the test statistic and the decision rule. (Use large
sample values for the critical values)
