
# Part 1: Multiple Choice Questions (40 points)

Two points per question for correct answers.
1) One of the following statements is not true. In Probit and Logit models
a. the t-statistic should still be used for testing a single restriction.
b. you can include binary variables as explanatory variables.
c. you use Maximum Likelihood estimation.
d. F-statistics should not be used, since the models are nonlinear.

2) Consider the Probit model $\Pr(Y = 1 \mid X) = \Phi(\beta_0 + \beta_1 X)$, where $X$ is a female dummy variable. The marginal effect ($ME$) of being female (as opposed to male) on $\Pr(Y = 1)$ is given by
a. $ME = \Phi(\hat{\beta}_0 + \hat{\beta}_1) - \Phi(\hat{\beta}_0)$.
b. $ME = \hat{\beta}_1$.
c. $ME = \Phi(\hat{\beta}_0 + \hat{\beta}_1 \bar{X}) - \Phi(\hat{\beta}_0)$ (where $\bar{X}$ denotes the mean of $X$).
d. $ME = \Phi'(\cdot)\hat{\beta}_1$ (where $\Phi'(\cdot)$ denotes the derivative of $\Phi(\cdot)$).

3) In the regression model $Y_i = \beta_0 + \beta_1 C_i + \beta_2 F_i + \beta_3 (C_i \times F_i) + u_i$, where $Y$ denotes earnings, $C$ a dummy variable for having a college degree, and $F$ a gender dummy variable, $\beta_2$
a. is the gender difference in earnings for someone with a college degree.
b. is the gender difference in earnings for someone without a college degree.
c. is the difference in earnings between those with and without a college degree when $F_i = 0$.
d. cannot be estimated since $F_i$ and $(C_i \times F_i)$ are perfectly collinear when $F_i = 0$.

4) In multiple regression, the $R^2$ increases whenever an explanatory variable is
a. added unless the coefficient on the added variable is exactly zero.
c. added unless there is heteroskedasticity.
d. added unless the added variable is not statistically significant at the 5%-level.

# Analytical Questions

1. Suppose that crop yield $y$ is determined by fertilizer application $x_2$ and rainfall $x_3$:

$$y_i = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + \varepsilon_i.$$
Suppose that data on rainfall is unavailable so the researcher is forced to estimate:
$$y_i = b_1 + b_2 x_{2i} + e_i.$$
a. What are the implications of the omission of rainfall for the properties of $b_2$?
b. Can you assess the qualitative impact that this misspecification may have on the estimates of $b_2$?
c. For the true model to be identified, rainfall must exhibit variation. Suppose that, for the period under study, all fields have the same rainfall. Does this affect your conclusion in (a)? Support your answer by explaining what happens to your algebraic result in this special case.
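
As a rough illustration of the setting in this question, the sketch below simulates a model with two regressors and then drops one of them. All numerical values (sample size, coefficients, the 0.6 link between `x2` and `x3`) are invented for illustration and are not part of the exercise; `ols` is a hypothetical helper. It only shows how the short-regression slope relates in-sample to the long-regression estimates, and is not a substitute for the algebraic argument the question asks for.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# ASSUMPTION: illustrative data-generating process, not taken from the exercise.
x2 = rng.normal(size=n)                 # "fertilizer"
x3 = 0.6 * x2 + rng.normal(size=n)      # "rainfall", correlated with x2
y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)


def ols(dep, *regressors):
    """Return OLS coefficients (intercept first) of dep on the regressors."""
    X = np.column_stack([np.ones(len(dep)), *regressors])
    coef, *_ = np.linalg.lstsq(X, dep, rcond=None)
    return coef


long_fit = ols(y, x2, x3)    # includes rainfall
short_fit = ols(y, x2)       # rainfall omitted
aux_fit = ols(x3, x2)        # auxiliary regression of x3 on x2

# In-sample identity: short slope = long slope on x2
# + (long slope on x3) * (slope of x3 on x2).
implied_short_slope = long_fit[1] + long_fit[2] * aux_fit[1]

print("slope on x2, long regression :", long_fit[1])
print("slope on x2, short regression:", short_fit[1])
print("implied by the identity      :", implied_short_slope)
```

Setting the 0.6 coefficient to zero, or replacing `x3` with a constant, is one way to explore part (c) numerically.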
2. Consider the following Cobb-Douglas production function:
$$Q = \gamma K^{\alpha} L^{\beta}$$
a. Show how this can be converted to the form of a multiple regression in which, given data series $Q_i$, $K_i$, $L_i$, the parameters $\alpha$ and $\beta$ are estimable as slope coefficients.
b. Use the property that elasticities are logarithmic derivatives to derive convenient expressions for the elasticities of output $Q$ with respect to (i) capital $K$, and (ii) labor $L$.
c. What parameter restriction corresponds to the hypothesis of constant returns to scale (CRTS)? Show how your multiple regression can be rewritten to incorporate the CRTS restriction in such a way that the parameter $\beta$ is eliminated.
2- Define the dummy-variable trap and give a concrete example.
Consider the following regression results (t ratios are in parentheses)*:
$$\hat{Y}_i = 1286 + 104.97X_{2i} - 0.026X_{3i} + 1.20X_{4i} + 0.69X_{5i} - 19.47X_{6i} + 266.06X_{7i} - 118.64X_{8i} - 110.61X_{9i}$$
$t$ = (4.67) (3.70) (-3.80) (0.24) (0.08) (-0.40) (6.94) (-3.04) (-6.14)
$R^2 = 0.383$, $n = 1543$
where Y = wife’s annual desired hours of work, calculated as usual hours of work per year plus weeks looking for work
X2 = after-tax real average hourly earnings of wife
X3 = husband’s previous year after-tax real annual earnings
X4 = wife’s age in years
X5 = years of schooling completed by wife
X6 = attitude variable, 1 = if respondent felt that it was all right for a woman to work if she desired and her husband agrees, 0 = otherwise
X7 = attitude variable, 1 = if the respondent’s husband favored his wife’s working, 0 = otherwise
X8 = number of children less than 6 years of age
X9 = number of children in age groups 6 to 13
a. Do the signs of the coefficients of the various nondummy regressors make economic sense? Justify your answer.
b. How would you interpret the dummy variables, X6 and X7? Are these dummies statistically significant? Since the
sample is quite large, you may use the “2-t” rule of thumb to answer the question.
c. Why do you think that age and education variables are not significant factors in a woman’s labor force participation
decision in this study?
State with brief reason whether the following statements are true, false, or uncertain:
a. In the presence of heteroscedasticity OLS estimators are biased as well as inefficient.
b. If heteroscedasticity is present, the conventional t and F tests are invalid.
c. In the presence of heteroscedasticity the usual OLS method always overestimates the standard errors of estimators.
d. If residuals estimated from an OLS regression exhibit a systematic pattern, it means heteroscedasticity is present in the
data.
e. There is no general test of heteroscedasticity that is free of any assumption about which variable the error term is
correlated with.
f. If a regression model is mis-specified (e.g., an important variable is omitted), the OLS residuals will show a distinct
pattern.
g. If a regressor that has nonconstant variance is (incorrectly) omitted from a model, the (OLS) residuals will be
heteroscedastic.
11.21. You are given the following data:
RSS1 based on the first 30 observations = 55, df = 25
RSS2 based on the last 30 observations = 140, df = 25
Carry out the Goldfeld–Quandt test of heteroscedasticity at the 5 percent level of significance.
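
The arithmetic behind the Goldfeld–Quandt statistic can be sketched as follows. The RSS and df values are those given in the exercise; the 5 percent critical value of roughly 1.96 for F(25, 25) is an assumption read from standard F tables and should be checked against the table you are using.

```python
# Goldfeld-Quandt F ratio from the residual sums of squares given in the exercise.
rss1, df1 = 55.0, 25    # first 30 observations
rss2, df2 = 140.0, 25   # last 30 observations

# The second group has the larger residual variance, so it goes in the numerator.
f_stat = (rss2 / df2) / (rss1 / df1)

# ASSUMPTION: 5% critical value of F(25, 25), roughly 1.96 from standard tables.
f_crit_5pct = 1.96

reject_homoscedasticity = f_stat > f_crit_5pct
print(f"F = {f_stat:.3f}; reject equal variances at 5%: {reject_homoscedasticity}")
```

Because both groups have the same degrees of freedom, the ratio reduces to RSS2 / RSS1.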
In a regression of average wages (W, \$) on the number of employees (N)
for a random sample of 30 firms, the following regression results were
obtained*:
$$\hat{W} = 7.5 + 0.009N \qquad (1)$$
$t$ = n.a. (16.10), $R^2 = 0.90$
$$\hat{W}/N = 0.008 + 7.8(1/N) \qquad (2)$$
$t$ = (14.43) (76.58), $R^2 = 0.99$
a. How do you interpret the two regressions?
b. What is the author assuming in going from Eq. (1) to (2)? Was he worried about heteroscedasticity? How do you
know?
c. Can you relate the slopes and intercepts of the two models?
d. Can you compare the $R^2$ values of the two models? Why or why not?
The following table gives data on median salaries of full professors in statistics in research universities in the year 2000–2001.
a. Plot median salaries against years in rank (as a measure of years of experience). For plotting purposes, assume that the median salaries refer to the midpoint of years in rank. Thus, the salary \$74,050 in the range 4–5 refers to 4.5 years in the rank, and so on. For the last group, assume that the range is 33–35.
b. Consider the following regression models:
$$Y_i = \alpha_1 + \alpha_2 X_i + u_i \qquad (1)$$
$$Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + v_i \qquad (2)$$
where $Y$ = median salary, $X$ = years in rank (measured at the midpoint of the range), and $u$ and $v$ are the error terms. Can you argue why model (2) might be preferable to model (1)? From the data given, estimate both models.
c. If you observe heteroscedasticity in model (1) but not in model (2), what conclusion would you draw? Show the
necessary computations.
d. If heteroscedasticity is observed in model (2), how would you transform the data so that there is no heteroscedasticity in the transformed model?
| Years in rank | Count | Median salary |
|---|---|---|
| 0 to 1 | 11 | \$69,000 |
| 2 to 3 | 20 | \$70,500 |
| 4 to 5 | 26 | \$74,050 |
| 6 to 7 | 33 | \$82,600 |
| 8 to 9 | 18 | \$91,439 |
| 10 to 11 | 26 | \$83,127 |
| 12 to 13 | 31 | \$84,700 |
| 14 to 15 | 15 | \$82,601 |
| 16 to 17 | 22 | \$93,286 |
| 18 to 19 | 23 | \$90,400 |
| 20 to 21 | 13 | \$98,200 |
| 22 to 24 | 29 | \$100,000 |
| 25 to 27 | 22 | \$99,662 |
| 28 to 32 | 22 | \$116,012 |
| 33 or more | 11 | \$85,200 |
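
One possible way to set up the estimation of models (1) and (2) from this table is sketched below, using NumPy's `polyfit`. It takes the midpoint of each range as $X$ (with the last group treated as 33–35, as the exercise instructs) and fits both models by unweighted least squares. Ignoring the count column is a simplifying assumption of this sketch, and the helper name `fit` is hypothetical.

```python
import numpy as np

# (low, high) years-in-rank ranges from the table; "33 or more" is taken as 33-35.
ranges = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9), (10, 11), (12, 13), (14, 15),
          (16, 17), (18, 19), (20, 21), (22, 24), (25, 27), (28, 32), (33, 35)]
salary = np.array([69000, 70500, 74050, 82600, 91439, 83127, 84700, 82601,
                   93286, 90400, 98200, 100000, 99662, 116012, 85200], dtype=float)

# X = midpoint of each range, e.g. 4.5 for the range 4-5.
x = np.array([(low + high) / 2 for low, high in ranges])


def fit(degree):
    """Unweighted least-squares polynomial fit; coefficients come highest power first."""
    coef = np.polyfit(x, salary, degree)
    resid = salary - np.polyval(coef, x)
    return coef, resid


coef1, resid1 = fit(1)   # model (1): linear in X
coef2, resid2 = fit(2)   # model (2): adds the X^2 term

rss1 = float(resid1 @ resid1)
rss2 = float(resid2 @ resid2)
print("model (1) [slope, intercept]:", coef1)
print("model (2) [X^2, X, intercept]:", coef2)
print("RSS model (1):", rss1, " RSS model (2):", rss2)
```

The residual arrays `resid1` and `resid2` can then be plotted against `x` as a first informal check for the heteroscedasticity discussed in parts (c) and (d).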

12.1. State whether the following statements are true or false. Briefly justify your answer.
a. When autocorrelation is present, OLS estimators are biased as well as inefficient.
b. The Durbin–Watson d test assumes that the variance of the error term ut is homoscedastic.
c. The first-difference transformation to eliminate autocorrelation assumes that the coefficient of autocorrelation ρ is -1.
d. The $R^2$ values of two models, one involving regression in the first difference form and another in the level form, are not directly comparable.
e. A significant Durbin–Watson d does not necessarily mean there is autocorrelation of the first order.
f. In the presence of autocorrelation, the conventionally computed variances and standard errors of forecast values are
inefficient.
g. The exclusion of an important variable(s) from a regression model may give a significant d value.
h. In the AR(1) scheme, a test of the hypothesis that ρ = 1 can be made by the Berenblutt–Webb g statistic as well as the
Durbin–Watson d statistic.
i. In the regression of the first difference of Y on the first differences of X, if there is a constant term and a linear trend
term, it means in the original model there is a linear as well as a quadratic trend term.
In studying the movement in the production workers’ share in the value added (i.e., labor’s share), the following models
were considered by
Model A: $Y_t = \beta_0 + \beta_1 t + u_t$
Model B: $Y_t = \alpha_0 + \alpha_1 t + \alpha_2 t^2 + u_t$
where Y = labor’s share and t = time. Based on annual data for 1949–1964, the following results were obtained for the
primary metal industry:
Model A: $\hat{Y}_t = 0.4529 - 0.0041t$, $R^2 = 0.5284$, $d = 0.8252$
t ratio: (-3.9608)
Model B: $\hat{Y}_t = 0.4786 - 0.0127t + 0.0005t^2$, $R^2 = 0.6629$, $d = 1.82$
t ratios: (-3.2724) (2.7777)

where the figures in parentheses are t ratios.

a. Is there serial correlation in model A? In model B?
b. What accounts for the serial correlation?
c. How would you distinguish between “pure” autocorrelation and specification bias?
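
For reference, the Durbin–Watson statistic reported above is computed from OLS residuals as the sum of squared successive differences divided by the sum of squared residuals. The sketch below implements that formula, together with the common large-sample approximation $d \approx 2(1 - \hat{\rho})$, applied to the two reported $d$ values. The function names are hypothetical, and a formal conclusion still requires the tabulated lower and upper bounds for the sample size and number of regressors, which are not reproduced here.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def implied_rho(d):
    """Approximate first-order autocorrelation implied by d, using d ~ 2(1 - rho)."""
    return 1 - d / 2


# d values reported for the two models in the exercise.
rho_a = implied_rho(0.8252)   # Model A
rho_b = implied_rho(1.82)     # Model B
print(f"Model A: implied rho is about {rho_a:.4f}")
print(f"Model B: implied rho is about {rho_b:.4f}")
```

A value of $d$ near 2 corresponds to an implied $\hat{\rho}$ near zero, while values well below 2 point toward positive first-order autocorrelation.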
Consider the following “true” (Cobb–Douglas) production function:
$$\ln Y_i = \alpha_0 + \alpha_1 \ln L_{1i} + \alpha_2 \ln L_{2i} + \alpha_3 \ln K_i + u_i$$
where $Y$ = output; $L_1$ = production labor; $L_2$ = nonproduction labor; $K$ = capital.
But suppose the regression actually used in empirical investigation is
$$\ln Y_i = \beta_0 + \beta_1 \ln L_{1i} + \beta_2 \ln K_i + u_i$$
On the assumption that you have cross-sectional data on the relevant variables,
a. Will $E(\hat{\beta}_1) = \alpha_1$ and $E(\hat{\beta}_2) = \alpha_3$?
b. Will the answer in (a) hold if it is known that $L_2$ is an irrelevant input in the production function? Show the necessary derivations.