20123 2013

Econometrics

Final term paper
Title: The Determinants of College Professor Salary

(Name): Michael Glas

(ID): 12D62382

1 Introduction (Background and Purpose)


The data used for this paper concern salary and other characteristics of all faculty in a small
Midwestern college. They were collected in the early 1980s for presentation in legal proceedings in
which discrimination against women in salaries was at issue. All persons in the data hold tenured or
tenure-track positions (temporary faculty are not included). The data were collected from personnel
files and consist of the quantities described in task 2.1 of this paper.
The outcome of this process was that salary is not significantly influenced by the sex of the
person. When reading the book I was personally very surprised by this, since normally the average
woman earns less than a man in the same position (which is definitely not a good thing!). For this
reason I was very interested in which other factors influence salary. That is why I chose this dataset
to analyze for the final term paper in the econometrics course.

2 Multiple regression
2.1 Data source
The salary data used in this paper (see the reference below) consist of observations on six variables
for 52 tenure-track professors in a small college, as already mentioned. The variables are:
sx = Sex, coded 1 for female and 0 for male
rk = Rank, coded 1 for assistant professor, 2 for associate professor, and 3 for full professor
yr = Number of years in current rank
dg = Highest degree, coded 1 if doctorate, 0 if master's
yd = Number of years since highest degree was earned
sl = Academic year salary, in dollars

Reference: S. Weisberg (1985). Discrimination in Salaries. New York: John Wiley and Sons. Page 194.

2.2 Model specification


First, a regression with all variables is run:

In the first step the variable dg is removed because, according to the t-test, it is not significant
at the 10% level.

Now sx is removed for the same reason as dg in the step before: according to the t-test it is not
significant at the 10% level.
This decision also took into account the change in the remaining coefficients, which does not suggest
collinearity in this case. The same applies to the standard errors.
At the same time R² stays almost the same, which means that the variables still in the model explain
nearly the same share of the variation in salary.

Now yd is removed for the same reason as sx and dg in the steps before: according to the t-test it is
not significant at the 10% level. The remaining coefficients and standard errors also show no
notable change.
At the same time R² stays almost the same, which means that the variables still in the model explain
nearly the same share of the variation in salary.

All the remaining variables are now highly significant at the 1% level (indeed, at any conventional level).

2.3 Result report


- The estimated equation is:
sl^ = 11336.67 + 4731.256 rk + 376.4993 yr
      (858.8681)  (450.0083)     (70.45792)
n = 52, R² = 0.8436

- Interpretation of coefficients:
4731.256 rk This means that, holding yr constant, an increase in rank of 1 unit (for example from
rank 1 to rank 2) increases the estimated salary by 4,731.26 USD.
376.4993 yr This means that, holding rk constant, one additional year in current rank increases the
estimated salary by 376.50 USD.

- Significance of explanatory variables: All the explanatory variables are highly significant at the 1% level.

- Meaning of the R-squared and F value shown in the Stata output:

R² = 0.8436 This means that 84.36% of the variation in salary can be explained by the variables rk and yr.
Prob > F = 0.0000 This means that the model as a whole is highly significant: rk and yr are jointly
significant at the 1% level.
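
As a rough cross-check of the significance statements above, the t statistics and the overall F statistic can be recomputed from the reported numbers alone. The sketch below is written in Python rather than Stata and its variable names are illustrative. The F value is derived from the rounded R², so it may differ slightly from the value in the Stata output.

```python
# Reported estimates from task 2.3: (coefficient, standard error).
estimates = {
    "const": (11336.67, 858.8681),
    "rk": (4731.256, 450.0083),
    "yr": (376.4993, 70.45792),
}
n, k, r2 = 52, 2, 0.8436  # observations, regressors, R-squared

# t statistic = coefficient / standard error
t_stats = {name: b / se for name, (b, se) in estimates.items()}

# Overall F statistic written in terms of R-squared:
# F = (R2 / k) / ((1 - R2) / (n - k - 1))
f_overall = (r2 / k) / ((1 - r2) / (n - k - 1))

for name, t in t_stats.items():
    print(f"t({name}) = {t:.2f}")
print(f"F({k}, {n - k - 1}) = {f_overall:.2f}")
```

With 49 degrees of freedom the two-sided 1% critical value of the t distribution is roughly 2.68, so both slope t statistics exceed it by a wide margin.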

2.4 Conclusion
The data used in this paper consist of observations on six variables for 52 tenure-track professors in
a small college. The purpose was to find out which factors have a significant influence on the salary
of those professors. First, the insignificant variables were removed step by step. Interestingly, sex,
highest degree earned and number of years since the highest degree was earned do not play a
significant role in the model. Only rk (rank) and yr (number of years in current rank) remain, and
both are highly significant.

3 Omitted variable bias


3.1
The explanatory variable yr is dropped from the original model. The estimation results are as follows:

When the explanatory variable yr is dropped, both the coefficient and the standard error of rk change
noticeably. The coefficient changes from 4731.256 to 5952.779 and the standard error changes
from 450.0083 to 482.7553. This points to a considerable omitted variable bias.
R² falls from 0.8436 to 0.7525, which is also a substantial change. Adjusted R² falls as well. If yr
could be omitted from the equation, adjusted R² would instead stay roughly the same or even
increase.
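
The change in the rk coefficient can be related to the standard omitted-variable formula for OLS: the coefficient in the short regression equals the coefficient in the long regression plus the coefficient on the omitted variable times the slope from regressing the omitted variable on the included one. The slope of yr on rk is not reported in this paper, so the sketch below (Python, illustrative names) only backs out the value implied by the reported estimates. It also recomputes adjusted R² from the rounded R² values.

```python
n = 52

# Reported estimates: model with rk and yr vs. model with rk only.
b_rk_long, b_yr_long = 4731.256, 376.4993
b_rk_short = 5952.779
r2_long, r2_short = 0.8436, 0.7525

# Omitted-variable formula: b_rk_short = b_rk_long + b_yr_long * delta,
# where delta is the slope from regressing yr on rk (implied, not reported).
bias = b_rk_short - b_rk_long
delta_implied = bias / b_yr_long

def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k regressors plus a constant."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

adj_long = adjusted_r2(r2_long, n, k=2)
adj_short = adjusted_r2(r2_short, n, k=1)

print(f"bias in rk coefficient: {bias:.3f}")
print(f"implied slope of yr on rk: {delta_implied:.3f}")
print(f"adjusted R2: {adj_long:.4f} (with yr) vs {adj_short:.4f} (without yr)")
```

A positive implied slope means that higher-ranked professors tend to have spent more years in their current rank, which is why dropping yr pushes the rk coefficient upward.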

3.2
As discussed in class, it is very interesting to look at a variable for which it is not possible to
obtain data. For this data and case, it would be really interesting to know the impact of years of
practical experience outside of university. If that explanatory variable were included, the likely
impact on the estimated coefficients would be that the coefficients of both rk and yr decrease,
especially in an environment where practical experience is seen as a positive contributor to success
in teaching.

4 t test, F test (LM test does not need to be performed by bachelor students)


4.1
- t test:

As already explained in task 2.3, all the explanatory variables are highly significant at the 1% level.
For rk, for example, the reported p-value is practically zero, so the probability of wrongly rejecting
the null hypothesis that rk has no effect on salary is negligible.

- F test
F test (of the variable rk):

F test (of the variable yr):

The F tests for both variables show significance at the 1% level. Since the reported p-value is 0.0000,
the null hypothesis can be rejected at any conventional significance level.

- F (in terms of R²):


This test was performed in Excel and gives the same result as the test before. This is the case
because the estimated equation has only two explanatory variables.
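
The F test in terms of R² can be reproduced from the reported values. The paper does not show which restriction was tested in Excel, so the sketch below assumes it is the exclusion of yr and uses the R² of 0.7525 from task 3.1 as the restricted value. With a single restriction the F statistic equals the square of the corresponding t statistic, up to rounding in the reported numbers. The code is Python and the names are illustrative.

```python
n, k = 52, 2                  # observations, regressors in the unrestricted model
q = 1                         # number of restrictions (yr excluded; an assumption)
r2_ur, r2_r = 0.8436, 0.7525  # unrestricted and restricted R-squared

# F = ((R2_ur - R2_r) / q) / ((1 - R2_ur) / (n - k - 1))
f_stat = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))

# Squared t statistic of yr from the reported coefficient and standard error.
t_yr = 376.4993 / 70.45792

print(f"F = {f_stat:.2f}, t^2 = {t_yr ** 2:.2f}")
```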

4.2

a = 1000: This means that the null hypothesis can be rejected at any conventional level.

4.3
This only makes sense with more than two explanatory variables, which is not the case here.

5 Functional form
5.1
1.) log-level:

If Δyr = 1 year, then Δlog(sl) = 0.0147362, which means Δsl/sl ≈ 1.47%

Interpretation of estimated coefficients: This means that one additional year in current rank
increases the salary by about 1.47%.

2.) level-log:

If Δlog(yr) = 1 (Δyr/yr = 100%), then Δsl = 2259.63 dollars

So if Δyr/yr = 1%, then Δsl = 22.60 dollars
Interpretation of estimated coefficients: This means that a 1% increase in the number of years in
current rank increases the salary by about 22.60 dollars.

3.) log-log:

If Δlog(yr) = 1 (Δyr/yr = 100%), then Δlog(sl) = 0.0151044 (Δsl/sl = 1.51044%)

So if Δyr/yr = 1%, then Δsl/sl = 0.0151044%
Interpretation of estimated coefficients: This means that a 1% increase in the number of years in
current rank increases the salary by 0.0151044%.
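
The three interpretations can be checked numerically from the reported coefficients. In the log-level model, reading the coefficient times 100 as a percentage is an approximation; the exact percentage change is 100 * (exp(b) - 1). The sketch below is in Python and its names are illustrative.

```python
import math

b_log_level = 0.0147362  # log(sl) regressed on yr
b_level_log = 2259.63    # sl regressed on log(yr)
b_log_log = 0.0151044    # log(sl) regressed on log(yr)

# log-level: one more year in rank, approximate vs. exact percentage change
approx_pct = 100 * b_log_level
exact_pct = 100 * (math.exp(b_log_level) - 1)

# level-log: a 1% increase in yr changes sl by about b / 100 dollars
dollars_per_pct = b_level_log / 100

# log-log: the coefficient is the elasticity (% change in sl per 1% change in yr)
elasticity = b_log_log

print(f"log-level: {approx_pct:.2f}% (approx), {exact_pct:.2f}% (exact)")
print(f"level-log: {dollars_per_pct:.2f} dollars per 1% increase in yr")
print(f"log-log:   {elasticity:.4f}% per 1% increase in yr")
```

For a coefficient this small the approximate and exact percentage changes in the log-level model are almost identical.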

5.2
As requested, three different functional forms (log-level, level-log and log-log) were analyzed.
Only sl and yr were considered: rk is a categorical variable with three levels rather than a
continuous one, so it does not make sense to use a log model for this variable.
Among the above models the first model (log-level) is the best, because the R² increases and
the t test shows a higher significance. However, as agreed with the teacher, this model is not used
in task 2.2!

6 Prediction

θ0 = E(sl | rk=1, yr=1)

After generating two new variables c1 and c2 (see screenshot), the 95% confidence interval for θ0 is
15308.22 to 17580.63.
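
The point prediction can be verified from the estimated equation in task 2.3. A confidence interval built from a t distribution is symmetric around the point estimate, so the midpoint of the reported interval should equal the prediction. A small sketch in Python, with illustrative names:

```python
# Estimated equation from task 2.3.
b0, b_rk, b_yr = 11336.67, 4731.256, 376.4993

def predict_salary(rk, yr):
    """Predicted salary in dollars for a given rank and years in rank."""
    return b0 + b_rk * rk + b_yr * yr

theta0 = predict_salary(rk=1, yr=1)

# Reported 95% confidence interval for the expected salary.
ci_low, ci_high = 15308.22, 17580.63
midpoint = (ci_low + ci_high) / 2
half_width = (ci_high - ci_low) / 2

print(f"point prediction: {theta0:.2f}")
print(f"CI midpoint: {midpoint:.2f}, half-width: {half_width:.2f}")
```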

7 Heteroskedasticity
7.1 Perform BP test (hint: estat hettest)

This means that the null hypothesis of constant variance (homoskedasticity) can be rejected at the 5%
level, which implies heteroskedasticity.

7.2 Report estimated equation with heteroskedasticity-robust standard error (hint: reg y x, robust)

The estimated equation is:

sl^ = 11336.67 + 4731.256 rk + 376.4993 yr
      (732.9604)  (506.1465)     (63.01736)
(robust standard errors in parentheses)
n = 52, R² = 0.8436

The coefficients are identical to those in task 2.3; only the standard errors differ, because the
robust standard errors allow for the heteroskedasticity found in task 7.1.
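
To see what the robust standard errors mean for inference, the t statistics can be recomputed with both sets of reported standard errors. This is a sketch in Python using only the numbers reported in tasks 2.3 and 7.2; the names are illustrative.

```python
coefs = {"const": 11336.67, "rk": 4731.256, "yr": 376.4993}
se_usual = {"const": 858.8681, "rk": 450.0083, "yr": 70.45792}
se_robust = {"const": 732.9604, "rk": 506.1465, "yr": 63.01736}

# Compare usual and robust t statistics and the ratio of the standard errors.
t_usual = {name: b / se_usual[name] for name, b in coefs.items()}
t_robust = {name: b / se_robust[name] for name, b in coefs.items()}

for name in coefs:
    ratio = se_robust[name] / se_usual[name]
    print(f"{name}: t = {t_usual[name]:.2f} (usual), "
          f"{t_robust[name]:.2f} (robust), SE ratio = {ratio:.2f}")
```

The robust standard error is larger for rk and smaller for yr, but both slopes remain far above conventional critical values, so the conclusions on significance do not change.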