Department of Economics

Columbia University

W3412
Fall 2015

Problem Set 4
Introduction to Econometrics
Profs. Seyhan Erden and Miikka Rokkanen
for all sections.

Part I.
True, False, Uncertain with Explanation:
(a) One can still use a linear regression framework even if the relation between a regressor and
the dependent variable is not linear.
(b) Including an interaction term between two independent variables, $X_1$ and $X_2$, allows for the
measurement of the effect of a unit increase in $X_1$ and $X_2$, above and beyond the sum of the
individual effects of a unit increase in the two variables alone.
(c) To decide whether $Y_i = \beta_0 + \beta_1 X_i + u_i$ or $\ln(Y_i) = \beta_0 + \beta_1 X_i + u_i$ fits the data better,
you should examine the regression $R^2$.
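
For statement (b), it may help to write out what a simultaneous unit increase in both regressors does in a model with an interaction term. The derivation below is a sketch under an assumed specification: the coefficient names $\beta_0, \ldots, \beta_3$ and the product term $X_1 \times X_2$ are notation introduced here for illustration, not taken from the problem set.

```latex
\begin{align*}
Y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \times X_2) + u \\
\Delta Y &= \beta_1 + \beta_2 + \beta_3\left[(X_1 + 1)(X_2 + 1) - X_1 X_2\right] \\
         &= \beta_1 + \beta_2 + \beta_3 (X_1 + X_2 + 1)
\end{align*}
```

Here $\Delta Y$ is the change in $Y$ when both $X_1$ and $X_2$ rise by one unit with $u$ held fixed; the last term is the part that goes beyond the sum of the two individual effects $\beta_1 + \beta_2$.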

Part II.
1. Consider the following multiple regression model:
$Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + u_i$

$E[u_i \mid X_{1,i}, X_{2,i}] = 0$

(a) Suppose $X_{2,i} = X_{1,i} + 2$. Can you compute the OLS coefficients? Explain.
(b) Assume again that $X_{2,i} = 2 + X_{1,i}$. Can you write a single variable model $Y_i = \beta_0 + \beta_1 X_{1,i} + u_i$
which is equivalent to the multiple regression model above? Can you compute the OLS
coefficients of this single variable model? What is the intuition here?
(c) Consider the alternative model: $Y_i = \beta_1 X_{1,i} + \beta_2 X_{2,i} + u_i$, where again $X_{2,i} = X_{1,i} + 2$. Can you
compute the OLS coefficients in this model? Explain.
(d) Assume again $X_{2,i} = X_{1,i} + 2$. Can you write a single variable model $Y_i = \beta_0 + \beta_1 X_{1,i} + u_i$
equivalent to the multiple regression model in (c)? Can you compute the OLS coefficients of
this single variable model? What is the intuition here?
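
One way to explore Question 1 numerically is to check whether the matrix X'X is invertible, since OLS coefficients can be computed only when it is. The Python sketch below assumes the relation in parts (a) to (d) is X2 = X1 + 2. The data values and the helper names `gram` and `det` are made up for illustration.

```python
def gram(columns):
    """Return X'X for a design matrix supplied as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


def det(m):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, value in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * det(minor)
    return total


# Hypothetical regressor values; X2 = X1 + 2 is the assumed relation.
x1 = [1, 2, 3, 4, 5]
x2 = [x + 2 for x in x1]
ones = [1] * len(x1)  # the intercept column

# With an intercept, X2 is an exact linear combination of the constant and X1.
det_with_intercept = det(gram([ones, x1, x2]))

# Without an intercept, X2 is no longer an exact multiple of X1.
det_without_intercept = det(gram([x1, x2]))

print("det(X'X) with intercept:   ", det_with_intercept)
print("det(X'X) without intercept:", det_without_intercept)
```

A zero determinant means X'X cannot be inverted, so the OLS coefficients are not uniquely determined; a nonzero determinant means they are.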
2. Use Table 2 to answer the following questions. Table 2 presents the results of four
regressions, one in each column. Estimate the indicated regressions and fill in the values
(you may either handwrite or type the entries in; if you choose to type up the table, an
electronic copy of Table 2 in .doc format is available on the course Web site). For example,
to fill in column (2), estimate the regression with colGPA as the dependent variable and
hsGPA and skipped as the independent variables, using the robust option, and fill in the
estimated coefficients.

(a) Fill in the table with the necessary numbers; some will appear in the STATA output and some you will
need to calculate yourself.
(b) Common sense predicts that your high school GPA (hsGPA) and the number of classes you
skipped (skipped) are determinants of your college GPA (colGPA). Use regression (2) to test
the hypothesis (at the 5% significance level) that the coefficients on these two economic
variables are both zero, against the alternative that at least one coefficient is nonzero.
(c) Find the F-statistic for regression (3) and explain what it is testing.
(d) Find the F-statistic for regression (4) and explain what it is testing.
(e) Are bgfriend (whether you have a boy/girlfriend) and campus (whether you live on campus)
jointly significant determinants of college GPA? Use regressions (2) and (4) to test your
hypothesis (i.e., use the homoskedasticity-only F-statistic formula, eq. 7.14 in the book, instead of
testing directly with STATA).
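
The homoskedasticity-only F-statistic referred to in part (e) compares the R-squared of the unrestricted and restricted regressions. The Python sketch below shows the arithmetic only. The function name `homoskedastic_f`, the two R-squared values, and the number of regressors are hypothetical placeholders, not results from GPA4.dta; only n = 141 comes from Table 1.

```python
def homoskedastic_f(r2_unrestricted, r2_restricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic computed from two R-squared values.

    q              -- number of restrictions being tested
    n              -- number of observations
    k_unrestricted -- number of regressors (excluding the intercept)
                      in the unrestricted regression
    """
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1.0 - r2_unrestricted) / (n - k_unrestricted - 1)
    return numerator / denominator


# Hypothetical inputs: replace the R-squared values and k_unrestricted
# with the numbers from your own regressions.
f_stat = homoskedastic_f(r2_unrestricted=0.25, r2_restricted=0.20,
                         q=2, n=141, k_unrestricted=4)
print(round(f_stat, 4))
```

The resulting statistic would be compared with the critical value of the F distribution with q numerator degrees of freedom; for q = 2 in large samples the 5% critical value is 3.00.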

Table 1
Definitions of Variables in GPA4.dta (data are from the Wooldridge textbook)

Variable    Definition
colGPA      Cumulative college grade point average of a sample of 141 students at Michigan State University in 1994.
hsGPA       High school GPA of students.
skipped     Average number of classes skipped per week.
PC          = 1 if the student owns a personal computer; = 0 otherwise.
bgfriend    = 1 if the student answered yes to the boy/girlfriend question; = 0 otherwise.
campus      = 1 if the student lives on campus; = 0 otherwise.

Table 2
College GPA Results
Dependent variable: colGPA
Regressor            (1)        (2)        (3)        (4)
hsGPA

skipped

(
PC

__

bgfriend

__

__

campus

__

__

(
__

Intercept

F-statistics testing the hypothesis that the population coefficients on the indicated regressors are
all zero:
hsGPA, skipped

hsGPA, skipped, PC

__

hsGPA, skipped, PC, bgfriend,

__

__

bgfriend, campus

__

__

(
__

Regression summary statistics

$R^2$
Adjusted $R^2$

Regression RMSE
n

Notes: Heteroskedasticity-robust standard errors are given in parentheses under estimated
coefficients, and p-values are given in parentheses under F-statistics. The F-statistics are
heteroskedasticity-robust.

3. The TeachingRatings data set contains data on course evaluations, course characteristics, and
professor characteristics for 463 courses for the academic years 2000-2002 at the University
of Texas at Austin. These data were provided by Professor Daniel Hamermesh of the
University of Texas at Austin and were used in his paper with Amy Parker, "Beauty in the
Classroom: Instructors' Pulchritude and Putative Pedagogical Productivity," Economics of
Education Review, August 2005, Vol. 24, No. 4, pp. 369-376.
Course_eval: Course overall teaching evaluation score, on a scale of 1 (very
unsatisfactory) to 5 (excellent)
Beauty: Rating of instructor physical appearance by a panel of six students, averaged across
the six panelists, shifted to have mean zero.
Female = 1 if the instructor is female, 0 if the instructor is male
Minority = 1 if the instructor is non-White, 0 if the instructor is White
NNenglish = 1 if the instructor is not a native English speaker, 0 if the instructor is a native
English speaker
Intro = 1 if the course is introductory (mainly large Freshman and Sophomore courses), 0 if
the course is not introductory
Onecredit = 1 if the course is a single-credit elective (yoga, aerobics, dance, etc.), 0 otherwise
Age: Professor's age
(a) Regress Course_eval on Beauty and female. Test whether the population coefficients are
jointly significant at the 5% significance level.
(b) Regress Course_eval on Beauty, female, minority and age. Test whether the population
coefficients are jointly significant at the 5% significance level.
(c) Now test whether minority and age are jointly significant at the 1% significance level using the results
from parts (a) and (b).
(d) Consider the various control variables in the data set. Which do you think should be included
in the regression? Using a table like Table 3, examine the effect of Beauty on Course_eval.
(Hint: Stata does not list the adjusted $R^2$ under the robust option. The command to see the adjusted $R^2$ is
ereturn list r2_a)

Table 3
Teaching Ratings
Dependent variable: Course_eval
Regressor (standard error below)      (1)        (2)        (3)        (4)
beauty

female
minority

__

nnenglish

__

intro

__

__

onecredit

__

__

age

__

__

__

__

__

__

intercept

F-statistics testing the null hypothesis that the population coefficients on the
following regressors are all zero (p-value below):
beauty, female

(
beauty, female, minority

__

intro, onecredit

__

intro, age

__

beauty, female,
minority, nnenglish

minority, age

__
__

__
__
__
__

__
__
__

Regression summary statistics

$R^2$
Adjusted $R^2$
Regression RMSE
n

Notes: Heteroskedasticity-robust standard errors are given in parentheses under
estimated coefficients, and p-values are given in parentheses under F-statistics. The
F-statistics are heteroskedasticity-robust.

4. The Lawsch85 data set was collected by Kelly Barnett, an MSU economics student, for use in a
term project. The data come from two sources: The Official Guide to U.S. Law Schools,
1986, Law School Admission Services, and The Gourman Report: A Ranking of Graduate
and Professional Programs in American and International Universities, 1995, Washington,
D.C.
(a) Regress salary on north, south, east and west to analyze the effects of region on the salaries of
law school graduates. What is wrong with this regression? Why can you not do this?
(b) How would you correct the problem in part (a)?
(c) Interpret the coefficient of east under your correction strategy in part (b).
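
To build intuition for Question 4, the Python sketch below uses made-up salary figures (not the Lawsch85 data) and assumes every school belongs to exactly one of the four regions. It checks that the four region dummies add up to one for every observation. It then relies on a standard property of OLS: in a regression on an intercept and dummies for all but one category, the intercept equals the mean of the omitted category and each dummy coefficient equals the difference in means relative to it. Omitting west is one possible choice of base category, picked here only for illustration.

```python
from statistics import mean

# Hypothetical salaries by region; these are not values from Lawsch85.
salaries = {
    "north": [30000, 34000],
    "south": [28000, 30000],
    "east": [40000, 44000],
    "west": [35000, 37000],
}

# One (region, salary) pair per hypothetical school.
rows = [(region, s) for region, values in salaries.items() for s in values]

# For each school, add up its four region dummies.
dummy_sums = [sum(1 if region == r else 0 for r in salaries)
              for region, _ in rows]

# Using west as the omitted (base) category, the OLS intercept is the
# west mean and each coefficient is a difference in group means.
base = "west"
intercept = mean(salaries[base])
coef = {r: mean(v) - intercept for r, v in salaries.items() if r != base}

print("dummy sums:", dummy_sums)
print("intercept (mean salary in west):", intercept)
print("coefficient on east:", coef["east"])
```

Under these assumptions the coefficient on east reads as the difference in average salary between the east and the omitted region, not as the average salary in the east itself.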
5. Does the separation of corporate control from corporate ownership lead to inflated executive
salaries and worse firm performance? George Stigler and Claire Friedland have addressed
these questions empirically using a sample of firms [1]. A subset of their data is in the file
execcomp.dta. The variables in the file are described in Table 4.

Table 4
Definitions of Variables in execcomp.dta

Variable    Definition
ecomp       Average total compensation, in thousands of dollars, of a firm's top three executives.
assets      Firm's assets in millions of dollars.
profits     Firm's annual profits in millions of dollars.
mcontrol    A dummy variable indicating management control of the firm: = 1 for management-controlled firms; = 0 for ownership-controlled firms.

(a) Regress executives' compensation on the firm's assets and profits, the control dummy, and
an intercept term. What proportion of the variation in top executives' compensation in this
sample is accounted for by these variables?
(b) If the firm's profits rise by one million dollars, by how much do you estimate the top
executives' average compensation will change, if assets and the form of control remain
fixed?
(c) What is the estimated difference between the expected average compensation of top
executives in management-controlled firms and those in ownership-controlled firms, if
assets and profits remain fixed?
(d) Regress firm profits on firm assets and the management-control dummy. How much of the
variation in the firms' profits in this sample can be accounted for by the variation in the firms'
assets and the form of control?
(e) Are the empirical results in (a) and (d) consistent with the claim that management control
hurts firm performance and leads to higher pay for executives?

[1] George J. Stigler and Claire Friedland, "The Literature of Economics: The Case of Berle and Means," Journal of Law
and Economics 26, no. 2 (June 1983): 237-268.

6. Consider the following STATA output on college distances. This dataset contains data from a
random sample of high school seniors interviewed in 1980 and re-interviewed in 1986. In
this exercise you will use these data to investigate the relationship between the number of
completed years of education for young adults and the distance from each student's high
school to the nearest four-year college. The variable ed corresponds to years of education and
dist is the distance to the nearest college and it is measured in tens of miles (For example dist
= 3 means that the high school of the senior is 30 miles from the nearest college).
. reg ed dist, robust

Linear regression                               Number of obs     =       3796
                                                F(  1,  3794)     =      29.83
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0074
                                                Root MSE          =     1.8074

------------------------------------------------------------------------------
             |               Robust
          ed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dist |  -.0733727   .0134334    -5.46   0.000    -.0997101   -.0470353
       _cons |   13.95586   .0378112   369.09   0.000     13.88172    14.02999
------------------------------------------------------------------------------

(a) A student's high school was 18 miles from the nearest college. Estimate the number of
years of schooling completed.
(b) Compute the 99% confidence interval for the difference in the predicted years of
education between a high school senior who is 93 miles from the nearest college and another
student who attends a high school that shares a campus with a college. Explain what your
solution means in one sentence.
(c) Does distance to the nearest college explain a lot of the variation in educational
attainment? Explain.
(d) Suppose distance was measured in kilometers such that 10 miles = 16 kilometers.
Replicate the entire STATA output.
(e) Interpret the coefficient of tuition below, where the dependent variable, led, is the natural
logarithm of years of education. Give one good explanation for your answer. (Note that
tuition is given in $1000.)
Linear regression                               Number of obs     =       3796
                                                F(  3,  3792)     =     151.91
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1001
                                                Root MSE          =     .12236

------------------------------------------------------------------------------
             |               Robust
         led |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     tuition |   .0158511   .0069175     2.29   0.022     .0022887    .0294135
     momcoll |   .0474716   .0063938     7.42   0.000      .034936    .0600071
     dadcoll |   .0749874   .0055234    13.58   0.000     .0641583    .0858164
       _cons |   2.582142   .0065834   392.22   0.000     2.569234    2.595049
------------------------------------------------------------------------------
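
As a check on the arithmetic behind parts (a), (b) and (e) of Question 6, the Python sketch below uses only the coefficients and standard errors printed in the two STATA outputs. It makes three assumptions: dist is in tens of miles (so 18 miles is dist = 1.8 and 93 miles is dist = 9.3), a school that shares a campus with a college has dist = 0, and the 99% interval uses the large-sample normal critical value. The variable names are illustrative.

```python
from math import exp
from statistics import NormalDist

# Estimates copied from the first STATA output (reg ed dist, robust).
b_cons, b_dist, se_dist = 13.95586, -0.0733727, 0.0134334

# (a) Predicted years of education at 18 miles, i.e. dist = 1.8.
ed_hat = b_cons + b_dist * 1.8

# (b) Difference in predictions between dist = 9.3 and dist = 0.
# The intercept cancels, so only the slope and its standard error matter.
delta = 9.3
diff_hat = b_dist * delta
se_diff = se_dist * delta
z_99 = NormalDist().inv_cdf(0.995)  # assumed normal critical value
ci_99 = (diff_hat - z_99 * se_diff, diff_hat + z_99 * se_diff)

# (e) Log-linear model: tuition coefficient from the second output.
b_tuition = 0.0158511
approx_pct = 100 * b_tuition             # usual approximation, in percent
exact_pct = 100 * (exp(b_tuition) - 1)   # exact percentage change

print("predicted ed at 18 miles:", round(ed_hat, 4))
print("99% CI for the difference:", tuple(round(x, 4) for x in ci_99))
print("percent change in ed per $1000 of tuition:",
      round(approx_pct, 3), "(approx.),", round(exact_pct, 3), "(exact)")
```

Because the intercept drops out of the difference in part (b), the interval is the slope's interval scaled by 9.3, which is why no covariance between the intercept and slope estimates is needed.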

The following questions will not be graded; they are for practice and will be discussed at
the recitation:
7. SW Empirical Exercise 6.3
8. SW Exercise 7.1
9. SW Exercise 7.4
10. SW Empirical Exercises 7.1