Prof. Dr. Martin Biewen
University of Tübingen

Applied Econometrics: Problem Set 3

Problem 1 (Wooldridge C4.8)

The data set 401KSUBS.DTA contains information on net financial wealth (nettfa), age of the
survey respondent (age), annual family income (inc), family size (fsize), and participation in
certain pension plans for people in the United States. The wealth and income variables are both
recorded in thousands of dollars. For this question, use only the data for single-person households
(so fsize = 1).

1. How many single-person households are there in the data set?

2. Use OLS to estimate the model

nettfa = β0 + β1 inc + β2 age + u,

and report the results using the usual format. Be sure to use only the single-person
households in the sample. Interpret the slope coefficients. Are there any surprises in the slope
estimates?

3. Does the intercept from the regression in part 2. have an interesting meaning? Explain.

4. Find the p-value for the test H0 : β2 = 1 against H1 : β2 < 1. Do you reject H0 at the
1% significance level?

5. If you do a simple regression of nettfa on inc, is the estimated coefficient on inc much
different from the estimate in part 2.? Why or why not?
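
Part 4 is a one-sided t test of H0 : β2 = 1. As a minimal sketch, the statistic and a left-tail p-value could be computed as below. The inputs are made-up placeholders, not estimates from 401KSUBS.DTA, and the p-value uses a standard normal approximation to the t distribution, which is an assumption that is reasonable only when the residual degrees of freedom are large.

```python
from statistics import NormalDist


def t_statistic(estimate, std_error, null_value):
    """t statistic for H0: coefficient = null_value."""
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    return (estimate - null_value) / std_error


def left_tail_p_value(t_stat):
    """P(Z <= t_stat) for a standard normal Z (large-sample approximation)."""
    return NormalDist().cdf(t_stat)


# Hypothetical values for illustration only (NOT estimates from the data set).
beta2_hat = 0.8
se_beta2 = 0.1

t = t_statistic(beta2_hat, se_beta2, null_value=1.0)
p = left_tail_p_value(t)
print(f"t = {t:.3f}, one-sided p-value = {p:.4f}")
```

H0 is rejected at the 1% level against H1 : β2 < 1 only if this p-value is below 0.01.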

Problem 2 (Wooldridge C5.4, 4th ed.)

Several statistics are commonly used to detect nonnormality in underlying population distributions.
Here we will study one that measures the amount of skewness in a distribution. Recall that any
normally distributed random variable is symmetric about its mean; therefore, if we standardize a
symmetrically distributed random variable, say z = (y − µy)/σy, where µy = E(y) and σy = sd(y),
then z has mean zero, variance one, and E(z^3) = 0. Given a sample of data {y_i : i = 1, ..., n},
we can standardize y_i in the sample by using z_i = (y_i − µ̂y)/σ̂y, where µ̂y is the sample mean and
σ̂y is the sample standard deviation. (We ignore the fact that these are estimates based on the
sample.) A sample statistic that measures skewness is n^{-1} Σ_{i=1}^{n} z_i^3, or where n is replaced with
(n − 1) as a degrees-of-freedom adjustment. If y has a normal distribution in the population, the
skewness measure in the sample for the standardized values should not differ significantly from
zero.
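
The skewness measure described above can be sketched in a few lines. This is an illustration only: the function name and the two small test samples are hypothetical, and the sample standard deviation is assumed to use the (n − 1) divisor.

```python
import math


def skewness(values, dof_adjust=False):
    """Average of cubed standardized values: (1/n) * sum(z_i**3).

    With dof_adjust=True the sum is divided by (n - 1) instead of n.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    # Sample standard deviation with the (n - 1) divisor (an assumption).
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0:
        raise ValueError("values have zero variance")
    z = [(v - mean) / sd for v in values]
    divisor = n - 1 if dof_adjust else n
    return sum(zi ** 3 for zi in z) / divisor


# Hypothetical samples, not taken from any of the course data sets.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 1.0, 2.0, 10.0]

print(skewness(symmetric))
print(skewness(right_skewed))
print(skewness([math.log(v) for v in right_skewed]))
```

A value near zero is what symmetry implies; a long right tail pushes the measure above zero.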

1. First we use the data set 401KSUBS.DTA, keeping only observations with fsize = 1. Find
the skewness measure for inc. Do the same for log(inc). Which variable has more skewness
and therefore seems less likely to be normally distributed?

2. Next use BWGHT2.DTA. Find the skewness measures for bwght and log(bwght). What do
you conclude?

3. Evaluate the following statement: “The logarithmic transformation always makes a positive
variable look more normally distributed.”

4. If we are interested in the normality assumption in the context of regression, should we be
evaluating the unconditional distributions of y and log(y)? Explain.

Problem 3 (Wooldridge 6.4)

The following model allows the return to education to depend upon the total amount of both
parents’ education, called pareduc:

log(wage) = β0 + β1 educ + β2 educ · pareduc + β3 exper + β4 tenure + u.

1. Show that, in decimal form, the return to another year of education in this model is

∆ log(wage)/∆educ = β1 + β2 pareduc.

What sign do you expect for β2 ? Why?

2. Using the data in WAGE2.DTA, the estimated equation is

\widehat{log(wage)} = 5.65 + .047 educ + .00078 educ · pareduc + .019 exper + .010 tenure
                      (.13)  (.010)      (.00021)                (.004)       (.003)
n = 722, R² = .169.

(Only 722 observations contain full information on parents’ education.) Interpret the
coefficient on the interaction term. It might help to choose two specific values for pareduc
(for example, pareduc = 32 if both parents have a college education, or pareduc = 24 if both
parents have a high school education) and to compare the estimated return to educ.

3. When pareduc is added as a separate variable to the equation, we get:

\widehat{log(wage)} = 4.94 + .097 educ + .033 pareduc − .0016 educ · pareduc
                      (.38)  (.027)      (.017)         (.0012)
                    + .020 exper + .010 tenure
                      (.004)       (.003)
n = 722, R² = .174.

Does the estimated return to education now depend positively on parent education? Test
the null hypothesis that the return to education does not depend on parent education.
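
The comparison suggested in part 2 and the t ratio needed in part 3 are plain arithmetic on the reported numbers. A minimal sketch, using only the coefficients and standard errors printed in the two estimated equations (the function and variable names are illustrative):

```python
def return_to_educ(b_educ, b_interaction, pareduc):
    """Estimated return to one more year of education: b1 + b2 * pareduc."""
    return b_educ + b_interaction * pareduc


# Coefficients reported in the estimated equation of part 2.
B_EDUC = 0.047
B_INTERACTION = 0.00078

college = return_to_educ(B_EDUC, B_INTERACTION, 32)      # both parents college
high_school = return_to_educ(B_EDUC, B_INTERACTION, 24)  # both parents high school
gap = college - high_school

# Part 3: t ratio on educ * pareduc once pareduc enters separately.
t_interaction = -0.0016 / 0.0012

print(f"return at pareduc = 32: {college:.5f}")
print(f"return at pareduc = 24: {high_school:.5f}")
print(f"difference:             {gap:.5f}")
print(f"t ratio (part 3):       {t_interaction:.2f}")
```

The difference equals 8 × .00078, the change in the estimated return associated with eight more years of combined parental education.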

Problem 4 (Wooldridge C6.3)

Consider a model where the return to education depends upon the amount of work experience
(and vice versa):

log(wage) = β0 + β1 educ + β2 exper + β3 educ · exper + u.

1. Show that the return to another year of education (in decimal form), holding exper fixed,
is β1 + β3 exper.

2. State the null hypothesis that the return to education does not depend on the level of
exper. What do you think is the appropriate alternative?

3. Use the data in WAGE2.DTA to test the null hypothesis in 2. against your stated alternative.

4. Let θ1 denote the return to education (in decimal form), when exper = 10: θ1 = β1 + 10β3 .
Obtain θ̂1 and a 95% confidence interval for θ1 . (Hint: Write β1 = θ1 − 10β3 and plug this
into the equation; then rearrange. This gives the regression for obtaining the confidence
interval for θ1 .)
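
The substitution in the hint to part 4 can be written out as follows. This is a sketch of the algebra only, using the symbols of the model above:

```latex
\begin{align*}
\log(wage) &= \beta_0 + (\theta_1 - 10\beta_3)\,educ + \beta_2\,exper
              + \beta_3\,educ \cdot exper + u \\
           &= \beta_0 + \theta_1\,educ + \beta_2\,exper
              + \beta_3\,educ \cdot (exper - 10) + u .
\end{align*}
```

Under this rewriting, regressing log(wage) on educ, exper, and educ · (exper − 10) gives θ̂1 as the coefficient on educ, and the standard error reported for that coefficient can be used to build the confidence interval.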

Problem 5 (Wooldridge C6.10)

Use the data in BWGHT2.DTA for this exercise.

1. Estimate the equation

log(bwght) = β0 + β1 npvis + β2 npvis² + u

by OLS, and report the results in the usual way. Is the quadratic term significant?

2. Show that, based on the equation from 1., the number of prenatal visits that maximizes
log(bwght) is estimated to be about 22. How many women had at least 22 prenatal visits
in the sample?

3. Does it make sense that birth weight is actually predicted to decline after 22 prenatal visits?
Explain.

4. Add mother’s age to the equation, using a quadratic functional form. Holding npvis fixed,
at what mother’s age is the birth weight of the child maximized? What fraction of women
in the sample are older than the “optimal” age?

5. Would you say that mother’s age and number of prenatal visits explain a lot of the variation
in log(bwght)?

6. Using quadratics for both npvis and age, decide whether using the natural log or the level
of bwght is better for predicting bwght.
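
Parts 2 and 4 both rely on the turning point of a quadratic. For y = β0 + β1 x + β2 x², setting the derivative β1 + 2 β2 x to zero gives x* = −β1/(2 β2), which is a maximum when β2 < 0. A minimal sketch with made-up coefficients (hypothetical values, not estimates from BWGHT2.DTA):

```python
def turning_point(b1, b2):
    """x value where b0 + b1*x + b2*x**2 has zero slope: -b1 / (2*b2)."""
    if b2 == 0:
        raise ValueError("b2 must be nonzero for a quadratic")
    return -b1 / (2.0 * b2)


def is_maximum(b2):
    """The turning point is a maximum when the quadratic opens downward."""
    return b2 < 0


# Hypothetical coefficients for illustration only.
b1_example = 0.04
b2_example = -0.001

x_star = turning_point(b1_example, b2_example)
print(f"turning point: {x_star:.2f}, maximum: {is_maximum(b2_example)}")
```

The same function applies to the quadratic in mother's age once its two coefficients have been estimated.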

Problem 6 (Wooldridge 7.10, 4th ed.)

For a child i living in a particular school district, let voucher_i be a dummy variable equal to
one if the child is selected to participate in a school voucher program, and let score_i be that child’s
score on a subsequent standardized exam. Suppose that the participation variable, voucher_i, is
completely randomized in the sense that it is independent of both observed and unobserved
factors that can affect the test score.

1. If you run a simple regression of score_i on voucher_i using a random sample of size n, does the
OLS estimator provide an unbiased estimator of the effect of the voucher program?

2. Suppose you can collect additional background information, such as family income, family
structure (e.g., whether the child lives with both parents), and parents’ education levels.
Do you need to control for these factors to obtain an unbiased estimator of the effects of
the voucher program? Explain.

3. Why should you include the family background variables in the regression? Is there a
situation in which you would not include the background variables?

Problem 7 (Wooldridge C7.6, 4th ed.)

Use the data in SLEEP75.DTA for this exercise. The equation of interest is

sleep = β0 + β1 totwrk + β2 educ + β3 age + β4 age² + β5 yngkid + u.

1. Estimate this equation separately for men and women and report the results in the usual
form. Are there notable differences in the two estimated equations?

2. Compute the Chow test for equality of the parameters in the sleep equation for men
and women. Use the form of the test that adds male and the interaction terms male ·
totwrk, . . . , male · yngkid and uses the full set of observations. What are the relevant df for
the test? Should you reject the null at the 5% level?

3. Now, allow for a different intercept for males and females and determine whether the
interaction terms involving male are jointly significant.

4. Given the results from parts 2. and 3., what would be your final model?
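
The Chow test in part 2 is an F test that compares the pooled (restricted) regression with the fully interacted (unrestricted) one. With k slope coefficients, the interacted form adds k + 1 terms (male plus k interactions), so the degrees of freedom are k + 1 and n − 2(k + 1). A minimal sketch of that bookkeeping follows. The sample size and sums of squared residuals are hypothetical placeholders, not values from SLEEP75.DTA, and the helper names are illustrative.

```python
def chow_df(n_obs, k_slopes):
    """Numerator and denominator df for the Chow test with two groups."""
    q = k_slopes + 1                    # male + k interaction terms
    df_denominator = n_obs - 2 * (k_slopes + 1)
    return q, df_denominator


def f_statistic(ssr_restricted, ssr_unrestricted, q, df_denominator):
    """F = [(SSR_r - SSR_ur) / q] / [SSR_ur / df_denominator]."""
    if q <= 0 or df_denominator <= 0:
        raise ValueError("degrees of freedom must be positive")
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_denominator
    return numerator / denominator


# Hypothetical inputs for illustration only; k = 5 matches the sleep equation.
q, df_den = chow_df(n_obs=100, k_slopes=5)
f_value = f_statistic(ssr_restricted=120.0, ssr_unrestricted=100.0,
                      q=q, df_denominator=df_den)
print(f"df = ({q}, {df_den}), F = {f_value:.3f}")
```

For part 3, male stays in both models, so the same `f_statistic` applies with q equal to the number of interaction terms only, while the denominator df is unchanged.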
