
Homework (5) - Stata Homework

1.) Use WAGE2.DTA to answer the following questions.

Carefully look at the variables in the data set and their measurement units.

i. Estimate the following model to study the wage differences between black and non-black individuals.

(i.a.) Write the results in equation form. Which variable captures the wage differences that you are interested in?
(i.b.) Looking at your Stata output, which coefficient estimates are statistically different from zero?
(i.c.) Which information do you use to answer question (i.b.)? What is the relationship between the t values and the confidence intervals reported in the Stata output?
(i.d.) What is the economic meaning of the intercept?
(i.e.) What is the zero conditional mean assumption in this case? Is it likely to hold for the effect of black on wages? What does the R-squared imply for the zero conditional mean assumption?

ii. Now you think that the effect of age on wages might be non-linear.
(ii.a.) How can you re-estimate the model so that you capture this non-linearity without taking ln transformations of any variables? Run this estimation and write the results in equation form.
(ii.b.) Do you find evidence for this non-linearity, or does the effect of age on wages seem linear?
(ii.c.) Which information do you use to answer question (ii.b.)?
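
Note on (i.b.) and (i.c.): a regression output table typically reports a t value and a 95% confidence interval for each coefficient, and both are built from the same coefficient estimate and standard error. The following is a minimal Python sketch of that arithmetic, not Stata code. The function names and the numbers are hypothetical and are not estimates from WAGE2.DTA. The critical value 1.96 is a large-sample approximation and is an assumption here; the exact value depends on the degrees of freedom.

```python
# Sketch only: all numbers below are hypothetical, NOT results from WAGE2.DTA.

CRIT_95 = 1.96  # assumed large-sample critical value for a two-sided 5% test


def t_statistic(coef, se):
    """t value for the null hypothesis that the coefficient is zero."""
    return coef / se


def confidence_interval(coef, se, crit=CRIT_95):
    """Interval coef +/- crit * se, returned as (lower, upper)."""
    half_width = crit * se
    return (coef - half_width, coef + half_width)


def excludes_zero(interval):
    """True when zero lies strictly outside the interval."""
    lower, upper = interval
    return lower > 0 or upper < 0


# Two hypothetical (coefficient, standard error) pairs.
for coef, se in [(-150.0, 40.0), (10.0, 8.0)]:
    t = t_statistic(coef, se)
    ci = confidence_interval(coef, se)
    # Both checks use the same critical value, so they always agree.
    print(f"coef={coef}, se={se}, |t|>crit: {abs(t) > CRIT_95}, "
          f"CI excludes zero: {excludes_zero(ci)}")
```

With the same critical value, the interval excludes zero exactly when the absolute t value exceeds that critical value, so the two columns of the output carry the same test decision.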

2.) Use the data in GPA2.dta for this exercise.

(i) Consider the following equation

Where
Colgpa: Cumulative college grade point average
Hsize: Size of high school graduating class (in hundreds)
Hsperc: Academic percentile in graduating class
Sat: Combined SAT score
Female: A binary gender variable
Athlete: A binary variable equal to one for student-athletes

(ii) What are your expectations for the coefficients in this equation? Which ones are you unsure about?
(iii) Estimate the equation in part (i) and report the results in equation form. What is the estimated GPA differential between athletes and non-athletes? Is it statistically significant?
(iv) Interpret the R-squared.
(v) Drop sat from the model in part (i) and re-estimate the equation. Now, what is the estimated effect of being an athlete? Discuss why the estimate is different from the one obtained from the model in part (i).
(vi) Let's assume that the homoskedasticity assumption is violated in this estimation. What does this tell you about the coefficient estimates that you obtain for the model in part (i)?
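
Note on reporting results in equation form: one common layout writes the fitted equation with each standard error in parentheses after its coefficient, followed by the sample size and the R-squared. The following is a minimal Python sketch of that layout, not Stata code. The helper name `equation_form`, the placeholder variable names `y`, `x1`, `x2`, and all numbers are hypothetical and are not estimates from GPA2.dta.

```python
# Sketch only: names and numbers are hypothetical, NOT results from GPA2.dta.

def equation_form(depvar, intercept, slopes, n, r2):
    """Format regression results as one line of text.

    intercept: (coefficient, standard error)
    slopes:    list of (variable name, coefficient, standard error)
    """
    b0, se0 = intercept
    parts = [f"{depvar}_hat = {b0:.3f} ({se0:.3f})"]
    for name, coef, se in slopes:
        # Put the sign between terms so negative slopes read naturally.
        sign = "-" if coef < 0 else "+"
        parts.append(f"{sign} {abs(coef):.3f} ({se:.3f}) {name}")
    return " ".join(parts) + f", n = {n}, R-squared = {r2:.3f}"


line = equation_form(
    "y",
    (1.5, 0.2),                                   # hypothetical intercept and s.e.
    [("x1", 0.25, 0.05), ("x2", -0.1, 0.04)],     # hypothetical slopes and s.e.
    n=50,
    r2=0.3,
)
print(line)
```

Standard errors are shown here rather than t values; either convention works as long as the write-up states which one is in parentheses.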
