Exercise 1 solution

Applied Econometric Methods I (Aarhus Universitet)

Exercise 1 (Week 37)


1. Exercise 3.7 (exercise 7 in chapter 3) of Wooldridge.

Which of the following can cause OLS estimators to be biased?

• (i) Heteroskedasticity.

• (ii) Omitting an important variable.

• (iii) A sample correlation coefficient of .95 between two independent variables both
included in the model.

Only (ii), omitting an important variable, can cause bias, and this is true only when the
omitted variable is correlated with the included explanatory variables. The
homoskedasticity assumption, MLR.5, played no role in showing that the OLS estimators are
unbiased. (Homoskedasticity was used to obtain the usual variance formulas for the β̂j.)
Further, the degree of collinearity between the explanatory variables in the sample, even
if it is reflected in a correlation as high as .95, does not affect the Gauss-Markov
assumptions. Only if there is a perfect linear relationship among two or more explanatory
variables is MLR.3 violated.
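
The three cases can also be checked with a small Monte Carlo experiment. The sketch below is only an illustration on synthetic data: the true parameters (1, 2, 3), the sample size, the form of the heteroskedasticity and the function names are all assumptions made for this example, not part of the exercise.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients of y on X (X already contains a constant)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def simulate(n_reps=1000, n=200, seed=0):
    """Average estimate of beta1 across replications for three scenarios."""
    rng = np.random.default_rng(seed)
    beta0, beta1, beta2 = 1.0, 2.0, 3.0  # hypothetical true parameters
    estimates = {"heteroskedastic": [], "collinear": [], "omitted": []}
    for _ in range(n_reps):
        x1 = rng.normal(size=n)
        # x2 has population correlation 0.95 with x1
        x2 = 0.95 * x1 + np.sqrt(1 - 0.95**2) * rng.normal(size=n)
        const = np.ones(n)
        X_full = np.column_stack([const, x1, x2])

        # (i) error variance depends on x1, full model estimated
        y_het = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n) * np.exp(0.5 * x1)
        estimates["heteroskedastic"].append(ols(X_full, y_het)[1])

        # (iii) homoskedastic errors, highly correlated regressors, full model
        y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
        estimates["collinear"].append(ols(X_full, y)[1])

        # (ii) same data, but x2 is omitted from the regression
        estimates["omitted"].append(ols(np.column_stack([const, x1]), y)[1])
    return {name: float(np.mean(vals)) for name, vals in estimates.items()}

means = simulate()
print(means)
```

Under these assumptions the average estimate of β1 should stay close to 2 in the heteroskedastic and collinear cases, and move towards 2 + 3 × 0.95 = 4.85 when x2 is omitted.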

2. Exercise 3.9 (exercise 9 in chapter 3) of Wooldridge.

The following equation describes the median housing price in a community in terms of
amount of pollution (nox for nitrous oxide) and the average number of rooms in houses in
the community (rooms):

log(price) = β0 + β1 log(nox) + β2 rooms + u

• (i) What are the probable signs of β1 and β2 ? What is the interpretation of β1 ?
Explain.

We expect β1 < 0 because more pollution can be expected to lower housing values; note
that β1 is the elasticity of price with respect to nox. β2 is probably positive because rooms
roughly measures the size of a house. (However, it does not allow us to distinguish homes
where each room is large from homes where each room is small.)

• (ii) Why might nox [or more precisely, log(nox)] and rooms be negatively correlated?
If this is the case, does the simple regression of log(price) on log(nox) produce an
upward or a downward biased estimator of β1?

If we assume that rooms increases with quality of the home, then log(nox) and rooms are
negatively correlated when poorer neighborhoods have more pollution, something that is
often true. We can use Table 3.2 to determine the direction of the bias. If β2 > 0 and
Corr(x1, x2) < 0, the simple regression estimator β̃1 has a downward bias. But because
β1 < 0, this means that the simple regression, on average, overstates the importance of
pollution. (E(β̃1) is more negative than β1.)
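
The sign argument can be written out with the omitted variable bias formula from chapter 3 of Wooldridge. In the sketch below, x1 stands for log(nox) and x2 for rooms, and the expectation is taken conditional on the sample values of the regressors:

```latex
E(\tilde{\beta}_1) = \beta_1 + \beta_2 \tilde{\delta}_1,
\qquad
\tilde{\delta}_1 = \frac{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)\, x_{i2}}{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2},
\qquad
\operatorname{Bias}(\tilde{\beta}_1) = \beta_2 \tilde{\delta}_1 < 0
\quad \text{when } \beta_2 > 0 \text{ and } \tilde{\delta}_1 < 0 .
```

Since δ̃1 has the same sign as the sample correlation between x1 and x2, a negative correlation combined with β2 > 0 gives a negative bias.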

• (iii) Using the data in HPRICE2.RAW, the following equations were estimated:

\widehat{log(price)} = 11.71 − 1.043 log(nox),    n = 506, R² = 0.264

\widehat{log(price)} = 9.23 − 0.718 log(nox) + 0.306 rooms,    n = 506, R² = 0.514

• (iv) Is the relationship between the simple and multiple regression estimates of the
elasticity of price with respect to nox what you would have predicted, given your
answer in part (ii)? Does this mean that −0.718 is definitely closer to the true
elasticity than −1.043?

This is what we expect from the typical sample based on our analysis in part (ii). The
simple regression estimate, −1.043, is more negative (larger in magnitude) than the
multiple regression estimate, −0.718. As those estimates are only for one sample, we can
never know which is closer to β1. However, if this is a “typical” sample, the true β1 is
closer to −0.718.

3. The file CEO.dat contains data on 447 chief executive officers and can be used to
examine the effects of firm performance on CEO salary.

• (i) Estimate a model relating annual salary to firm sales and assets value. Make the
model of the constant elasticity variety for both independent variables. Write the
results out in equation form.
\widehat{log(salary)} = 4.254 + 0.193 log(sales) + 0.156 log(assets),    n = 447, R² = 0.245
The coefficient on sales implies that every time sales increase by 1%, salary increases
by about 0.193%. Put differently, under the same linear approximation (which becomes less
accurate for changes this large), a 100% increase in sales raises salary by roughly 19.3%.
The coefficient on assets implies that every time assets increase by 1%, salary increases
by about 0.156%.
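
Reading a log-log coefficient as "percent change per percent change" is a small-change approximation. The short sketch below compares it with the exact change implied by the model, using the sales elasticity of 0.193 as interpreted above; the function name is made up for this illustration.

```python
def pct_change_loglog(elasticity, pct_change_x):
    """Exact % change in y implied by a log-log model for a given % change in x."""
    return 100.0 * ((1.0 + pct_change_x / 100.0) ** elasticity - 1.0)

small = pct_change_loglog(0.193, 1.0)    # 1% increase in sales
large = pct_change_loglog(0.193, 100.0)  # sales double

print(f"1% increase in sales:   {small:.3f}% change in salary")
print(f"100% increase in sales: {large:.2f}% change in salary")
```

For a 1% change the exact figure is essentially the elasticity itself, while for a doubling of sales it comes out at roughly 14.3%, noticeably below the 19.3% given by the linear approximation.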
• (ii) Add profits to the model from part (i). Why can this variable not be included
in logarithmic form? Would you say that these firm performance variables explain
most of the variation in CEO salaries?
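\widehat{log(salary)} = 4.674 + 0.151 log(sales) + 0.146 log(assets) + 0.0000436 profits,
n = 447, R² = 0.252
Profits cannot be included in logarithmic form because profits can be zero or negative,
and the logarithm is then undefined. The coefficient on profits is very small. Here,
profits are measured in millions, so if profits increase by $1 billion (profits rises by
1,000, a huge change), predicted salary increases by only about 4.36%. However, remember
that we are holding sales and assets fixed. With R² = 0.252, these firm performance
variables explain only about 25% of the variation in log(salary), so most of the
variation is left unexplained.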
• (iii) Add the variable tenure to the model in part (ii). What is the estimated percent-
age return for another year of CEO tenure, holding other factors fixed?
\widehat{log(salary)} = 4.491 + 0.159 log(sales) + 0.149 log(assets) + 0.000041 profits
+ 0.0128846 tenure,    n = 447, R² = 0.280

The coefficient on tenure means that each additional year of tenure as CEO is associated
with an increase in predicted salary of about 0.0129 × 100 ≈ 1.3%, holding the other
factors fixed.
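
For variables that enter in levels (profits, tenure) while the dependent variable is in logs, 100 × coefficient × change is again an approximation. A minimal sketch of the exact calculation, using the coefficients reported above (0.0000436 on profits in part (ii), 0.0128846 on tenure in part (iii)); the function name is hypothetical:

```python
import math

def pct_change_loglevel(coef, delta_x):
    """Exact % change in y implied by a log-level model when x changes by delta_x."""
    return 100.0 * (math.exp(coef * delta_x) - 1.0)

tenure_effect = pct_change_loglevel(0.0128846, 1)      # one more year of tenure
profits_effect = pct_change_loglevel(0.0000436, 1000)  # profits up by $1 billion

print(f"one more year of tenure: {tenure_effect:.2f}%")
print(f"$1 billion more profits: {profits_effect:.2f}%")
```

Both exact figures (about 1.30% and 4.46%) are close to the approximate ones, because the implied changes in log(salary) are small.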
• (iv) Find the sample correlation coefficient between the variables log(sales) and
profits. Are these variables highly correlated? What does this say about the OLS
estimators?

The sample correlation between log(sales) and profits is about 0.582, which is fairly high.
As we know, this causes no bias in the OLS estimators, although it can cause their
variances to be large. Given the fairly substantial correlation between log(sales) and firm
profits, it is not too surprising that the latter adds nothing to explaining CEO salaries.
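
The link between correlation among regressors and large variances is visible in the usual variance formula under the Gauss-Markov assumptions, where R_j² is the R-squared from regressing x_j on the other explanatory variables. As a purely illustrative calculation, assume log(sales) and profits were the only two regressors, so that R_j² equals the squared correlation:

```latex
\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{SST_j \,(1 - R_j^2)},
\qquad
\frac{1}{1 - R_j^2} = \frac{1}{1 - 0.582^2} \approx 1.51 .
```

In that simplified case the variance would be about 1.5 times what it would be with uncorrelated regressors. In the actual model, which has more regressors, R_j² is at least as large, so the inflation factor is at least this big.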

4. The file Kc_house_data.dat contains house sale prices for King County, which includes
Seattle. It covers homes sold between May 2014 and May 2015.

• (i) Confirm the partialling-out interpretation of the OLS estimates by explicitly doing
the partialling out. Regress log(price) on log(sqft_living), log(sqft_lot) and
floors.

• (ii) Regress log(sqft_living) on log(sqft_lot) and floors, and save the residuals,
which we can call \widetilde{log(sqft_living)}.

• (iii) Now regress log(price) on \widetilde{log(sqft_living)}. Can you confirm that the
coefficient on \widetilde{log(sqft_living)} is the same as the one on log(sqft_living)
in the regression in (i)? What about the standard errors: are they the same as in (i)?

• (iv) Run a regression that also gives you the same standard errors. (Hint: you need to
remove the part of the dependent variable that is explained by log(sqft_lot) and
floors.)
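
The steps in (i) to (iv) can be sketched as follows. Since the data file is not reproduced here, the sketch uses synthetic stand-ins for log(price), log(sqft_living), log(sqft_lot) and floors; the data-generating numbers and the helper `ols` are assumptions made for illustration only, so the estimates themselves mean nothing. The point is the mechanics of partialling out.

```python
import numpy as np

def ols(X, y):
    """OLS of y on X (X must already contain a constant column).

    Returns coefficients, conventional standard errors and residuals.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se, resid

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-ins (NOT the King County data)
lot = rng.normal(9.0, 0.9, n)                       # log(sqft_lot)
floors = rng.choice([1.0, 1.5, 2.0, 3.0], size=n)   # floors
living = 4.0 + 0.3 * lot + 0.4 * floors + rng.normal(0, 0.3, n)  # log(sqft_living)
price = 7.0 + 0.8 * living - 0.05 * lot + 0.1 * floors + rng.normal(0, 0.4, n)

const = np.ones(n)
controls = np.column_stack([const, lot, floors])

# (i) full regression
b_full, se_full, _ = ols(np.column_stack([const, living, lot, floors]), price)

# (ii) residuals of log(sqft_living) after removing the controls
_, _, living_res = ols(controls, living)

# (iii) log(price) on the residualized regressor: same slope, different SE
X_res = np.column_stack([const, living_res])
b_iii, se_iii, _ = ols(X_res, price)

# (iv) residualize log(price) as well: same slope, and the same SE once the
# degrees of freedom are corrected from n - 2 to n - 4
_, _, price_res = ols(controls, price)
b_iv, se_iv, _ = ols(X_res, price_res)
se_iv_adj = se_iv[1] * np.sqrt((n - 2) / (n - 4))

print("slopes:", b_full[1], b_iii[1], b_iv[1])
print("standard errors:", se_full[1], se_iii[1], se_iv[1], se_iv_adj)
```

In this sketch the slope is identical in all three regressions. The standard error in (iii) is larger because the residual variance still contains the variation in log(price) explained by the controls. In (iv) the only remaining difference is the degrees-of-freedom correction, which is negligible in a sample as large as the housing data.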

5. Use Kc_house_data.dat again; this time we look at omitted variable bias.

• (i) Run a simple regression of log(sqft_living) on log(sqft_above) to obtain the
slope coefficient, δ̃1.

We obtain an estimate of δ̃1 = 0.85966.

• (ii) Run a simple regression of log(price) on log(sqft_living) to obtain the slope
coefficient, β̃1.

We obtain an estimate of β̃1 = 0.8367.

• (iii) Run a multiple regression of log(price) on log(sqft_living) and log(sqft_above)
to obtain the slope coefficients, β̂1 and β̂2.

We obtain estimates of β̂1 = 0.8271 and β̂2 = 0.0110.

• (iv) Verify that β̃1 = β̂1 + β̂2 δ̃1.

We can verify that 0.8271 + 0.0110 × 0.8597 ≈ 0.8366, which matches β̃1 = 0.8367 up to
rounding.
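
The identity in (iv) is algebraic, so it can be checked on any data set. The sketch below uses synthetic stand-ins for the three variables (the numbers and the helper `slopes` are assumptions for illustration). Note that in Wooldridge's derivation δ̃1 is the slope from regressing the second regressor (here the stand-in for log(sqft_above)) on the first (the stand-in for log(sqft_living)); with that definition the identity holds exactly, up to floating-point error.

```python
import numpy as np

def slopes(y, *regressors):
    """OLS slope coefficients of y on the given regressors plus a constant."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

rng = np.random.default_rng(1)
n = 1000
# Synthetic stand-ins (NOT the King County data)
x1 = rng.normal(7.5, 0.4, n)                   # log(sqft_living)
x2 = 1.0 + 0.85 * x1 + rng.normal(0, 0.2, n)   # log(sqft_above)
y = 6.0 + 0.8 * x1 + 0.05 * x2 + rng.normal(0, 0.4, n)  # log(price)

delta1 = slopes(x2, x1)[0]            # x2 regressed on x1
beta_tilde1 = slopes(y, x1)[0]        # simple regression
beta_hat1, beta_hat2 = slopes(y, x1, x2)  # multiple regression

gap = beta_tilde1 - (beta_hat1 + beta_hat2 * delta1)
print("gap on synthetic data:", gap)

# Same check with the rounded estimates reported in the text
reported_gap = 0.8367 - (0.8271 + 0.0110 * 0.85966)
print("gap with reported, rounded estimates:", reported_gap)
```

On the synthetic data the gap is numerically zero, while the reported estimates leave a gap in the fourth decimal place.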

6. Exercise C3.1 (exercise C1 in chapter 3) of Wooldridge.

A problem of interest to health officials (and others) is to determine the effects of smoking
during pregnancy on infant health. One measure of infant health is birth weight; a birth
weight that is too low can put an infant at risk for contracting various illnesses. Since
factors other than cigarette smoking that affect birth weight are likely to be correlated
with smoking, we should take those factors into account. For example, higher income
generally results in access to better prenatal care, as well as better nutrition for the mother.
An equation that recognizes this is:

bwght = β0 + β1 cigs + β2 faminc + u

• (i) What is the most likely sign for β2 ?

Probably β2 > 0, as more income typically means better nutrition for the mother and better
prenatal care.

• (ii) Do you think cigs and faminc are likely to be correlated? Explain why the
correlation might be positive or negative.

On the one hand, an increase in income generally increases the consumption of a good,
and cigs and faminc could be positively correlated. On the other, family incomes are also
higher for families with more education, and more education and cigarette smoking tend to
be negatively correlated. The sample correlation between cigs and faminc is about −0.173,
indicating a negative correlation.

• (iii) Now, estimate the equation with and without faminc, using the data in
BWGHT.RAW. Report the results in equation form, including the sample size and R-squared.
Discuss your results, focusing on whether adding faminc substantially changes the
estimated effect of cigs on bwght.

The effect of cigarette smoking is slightly smaller when faminc is added to the regression,
but the difference is not great. This is because cigs and faminc are only weakly
correlated, and the coefficient on faminc is practically small. (The variable faminc is
measured in thousands of dollars, so $10,000 more in 1988 income increases predicted birth
weight by only 0.93 ounces.)
