
20123 2013

Econometrics

Final term paper

Title: The Determinants of College Professor Salary

(ID): 12D62382

The data used for this paper concern the salary and other characteristics of all faculty in a small Midwestern college. They were collected in the early 1980s for presentation in legal proceedings in which discrimination against women in salaries was at issue. All persons in the data hold tenured or tenure-track positions (temporary faculty are not included). The data were collected from personnel files and consist of the quantities described in task 2.1 of this paper.

The outcome of these proceedings was that salary is not significantly influenced by the sex of the person. When reading the book I was personally very surprised by this, since the average woman normally earns less than a man in the same position (which is definitely not a good thing!). This made me curious about which other factors influence salary, which is why I chose this dataset for the final term paper in the econometrics course.

2 Multiple regression

2.1 Data source

The salary data used in this paper (see the reference below) consist of observations on six variables for 52 tenure-track professors in a small college, as already mentioned. The variables are:

sx = Sex, coded 1 for female and 0 for male

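rk = Rank, coded 1 for assistant professor, 2 for associate professor, and 3 for full professor

yr = Number of years in current rank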

dg = Highest degree, coded 1 if doctorate, 0 if masters

yd = Number of years since highest degree was earned

sl = Academic year salary, in dollars.

Reference: S. Weisberg (1985). Discrimination in Salaries. New York: John Wiley and Sons. Page 194.

First, a regression with all variables is run:

In the first step the variable dg is removed because, according to the t-test, it is not significant at the 10% level.

Next, sx is removed for the same reason as dg in the previous step: according to the t-test, it is not significant at the 10% level.

This decision also took into account the changes in the remaining coefficients and their standard errors, neither of which suggests collinearity in this case.

At the same time R2 stays almost the same, which means that nearly the same share of the variance in salary is explained by the variables still in the model.

Next, yd is removed for the same reason as sx and dg in the previous steps: according to the t-test, it is not significant at the 10% level. The remaining coefficients and standard errors again show no substantial change.

R2 again stays almost the same, so nearly the same share of the variance in salary is still explained by the variables remaining in the model.

All remaining variables are now highly significant at the 1% level.
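The step-by-step removal described above can be sketched in code. This is only a minimal illustration in Python (the paper itself appears to rely on Stata output), and it runs on made-up data, not on the salary data. The function names, the synthetic variables x1, x2 and z, and the cutoff of 1.68 (roughly the two-sided 10% critical value of the t distribution with about 50 degrees of freedom) are all assumptions made for this sketch.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_t_stats(X, y):
    """Return the OLS t statistic of every column of X (X includes the constant)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    t_stats = []
    for j in range(k):
        # j-th diagonal element of (X'X)^-1, obtained by solving against a unit vector
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        var_j = sigma2 * solve(xtx, unit)[j]
        t_stats.append(beta[j] / var_j ** 0.5)
    return t_stats

def backward_eliminate(data, y, names, t_crit=1.68):
    """Drop the regressor with the smallest |t| until every |t| is at least t_crit."""
    keep = list(names)
    while keep:
        X = [[1.0] + [row[v] for v in keep] for row in data]
        t = ols_t_stats(X, y)[1:]  # skip the constant
        weakest = min(range(len(keep)), key=lambda i: abs(t[i]))
        if abs(t[weakest]) >= t_crit:
            break
        keep.pop(weakest)
    return keep

# Made-up data: y depends on x1 and x2, while z is pure noise.
random.seed(0)
data, y = [], []
for _ in range(60):
    row = {"x1": random.gauss(0, 1), "x2": random.gauss(0, 1), "z": random.gauss(0, 1)}
    data.append(row)
    y.append(2.0 + 3.0 * row["x1"] - 2.0 * row["x2"] + random.gauss(0, 0.5))

kept = backward_eliminate(data, y, ["x1", "x2", "z"])
print(kept)
```

As in the paper, one variable is removed per step and the model is re-estimated before the next decision, because the t statistics of the remaining variables can change once a regressor is dropped.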

- The estimated equation is:

sl^ = 11336.67 + 4731.256 rk + 376.4993 yr
      (858.8681) (450.0083)    (70.45792)

(standard errors in parentheses)

n = 52, R2 = 0.8436

- Interpretation of coefficients:

4731.256 rk This means that, holding yr constant, an increase in rank by one unit (for example from rank 1 to rank 2) increases the estimated salary by 4,731.26 USD.

376.4993 yr This means that, holding rk constant, one additional year in the current rank increases the estimated salary by 376.50 USD.

- Significance of explanatory variables: All the explanatory variables are highly significant at the 1% level.

R2 = 0.8436 This means that 84.36% of the variation in salary can be explained by the explanatory variables.

Prob > F = 0.0000 This means that the model as a whole is highly significant at the 1% level.
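The significance statements above can be checked by hand from the reported numbers. The following Python sketch recomputes the t statistics (coefficient divided by standard error) and the overall F statistic from R2, n and the number of regressors. The variable and function names are chosen for illustration only, and the results may differ slightly from the original software output because the inputs are rounded.

```python
# Reported estimates and standard errors of the final model.
coef = {"const": 11336.67, "rk": 4731.256, "yr": 376.4993}
se = {"const": 858.8681, "rk": 450.0083, "yr": 70.45792}
n, k, r2 = 52, 2, 0.8436  # observations, regressors, R-squared

# t statistic of each coefficient: estimate divided by its standard error.
t_stats = {name: coef[name] / se[name] for name in coef}

# Overall F statistic for H0: all slope coefficients are zero.
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))

def predicted_salary(rk, yr):
    """Estimated salary in dollars for a given rank and years in current rank."""
    return coef["const"] + coef["rk"] * rk + coef["yr"] * yr

for name, t in t_stats.items():
    print(name, round(t, 2))
print("F", round(f_stat, 2))
# Moving up one rank with yr fixed changes the prediction by the rk coefficient.
print(predicted_salary(2, 5) - predicted_salary(1, 5))
```

Both slope t statistics are far above the usual critical values of about 2 to 2.7, which is consistent with the statement that the variables are significant at the 1% level.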

2.4 Conclusion

The data used in this paper consist of observations on six variables for 52 tenure-track professors in a small college. The purpose was to find out which factors have a significant influence on the salary of those professors. First, the insignificant variables were removed in a step-by-step process. Interestingly, sex, highest degree earned and number of years since the highest degree was earned do not play a significant role in the model. Only rk (rank) and yr (number of years in current rank) remain, and both are highly significant.

3.1

The explanatory variable yr is dropped from the original model. The estimation results are as follows:

When the explanatory variable yr is dropped, both the coefficient and the standard error of rk change noticeably. The coefficient changes from 4731.256 to 5952.779 and the standard error changes from 450.0083 to 482.7553. This suggests considerable omitted variable bias.

R2 falls from 0.8436 to 0.7525, which is also a substantial change. The adjusted R2 falls as well. If yr could be omitted from the equation without loss, the adjusted R2 would instead stay the same or even increase.
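The two arguments in this task can be made concrete with a short calculation. The Python sketch below computes the adjusted R2 of both models from the reported R2 values, and it uses the textbook omitted-variable identity (coefficient in the short model = coefficient in the long model + coefficient of the omitted variable times the slope from regressing the omitted variable on the included one) to back out the implied slope of yr on rk. The names are illustrative, and the adjusted R2 values are recomputed here rather than taken from the original output.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (plus a constant)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 52
adj_full = adjusted_r2(0.8436, n, k=2)   # model with rk and yr
adj_short = adjusted_r2(0.7525, n, k=1)  # model with rk only

# Omitted-variable identity: b_rk_short = b_rk_full + b_yr_full * delta,
# where delta is the slope from regressing yr on rk.
b_rk_full, b_yr_full, b_rk_short = 4731.256, 376.4993, 5952.779
implied_delta = (b_rk_short - b_rk_full) / b_yr_full

print(round(adj_full, 4), round(adj_short, 4))
print(round(implied_delta, 3))
```

A positive implied slope means that higher-ranked professors tend to have more years in their current rank in this sample, so leaving out yr pushes the rk coefficient upward.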

3.2

As discussed in class, it is very interesting to consider a variable for which it is not possible to obtain data. For this dataset, it would be really interesting to know the impact of years of practical experience outside the university. If that explanatory variable were included, the coefficients of both rk and yr would probably decrease, especially in an environment where practical experience is seen as a positive contributor to success in teaching.

4.1

- t test:

As already explained in task 2.3, all the explanatory variables are highly significant at the 1% level. With a reported p-value of 0.0000, the probability of wrongly concluding that rk has an effect on salary is practically zero.

- F test

F test (of the variable rk):

The F tests for both variables show significance at the 1% level. Since the reported p-value is 0.0000, the null hypothesis can be rejected at any conventional significance level.

This test was performed in Excel and gives the same result as the t test above. This is the case because each F test involves only a single restriction, for which the F statistic equals the square of the t statistic.
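The agreement between the two tests can be written out for rk. The following is a worked calculation from the coefficient and standard error reported in task 2, not a copy of the original test output, so small rounding differences are possible.

```latex
t_{rk} = \frac{\hat{\beta}_{rk}}{\operatorname{se}(\hat{\beta}_{rk})}
       = \frac{4731.256}{450.0083} \approx 10.51,
\qquad
F_{rk} = t_{rk}^{2} \approx 110.5
```

Because an F test of one restriction is the squared t test, both tests have the same p-value and lead to the same decision.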

4.2

a = 1000 This means that the null hypothesis can be rejected at any level.

4.3

This only makes sense with more than 2 variables, which is not the case here.

5 Functional form

5.1

1.) log-level:

Interpretation of estimated coefficients: This means that one additional year in the current rank increases the salary by about 1.47%.

2.) level-log:

So for Δyr/yr = 1%, Δsl = 22.60 dollars

Interpretation of estimated coefficients: This means that a 1% increase in the number of years in the current rank increases the salary by about 22.60 dollars.

3.) log-log:

So for Δyr/yr = 1%, Δsl/sl = 0.0151044%

Interpretation of estimated coefficients: This means that a 1% increase in the number of years in the current rank increases the salary by about 0.0151044%.
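The three interpretations above follow the standard approximation rules for logarithmic models. As a compact reference, and writing the slope of each model generically as beta_1 (the estimated values themselves are in the regression output and are not repeated here):

```latex
\begin{aligned}
\text{log-level:}\quad & \log(sl) = \beta_0 + \beta_1\, yr
  &&\Rightarrow\quad \%\Delta sl \approx (100\,\beta_1)\,\Delta yr \\
\text{level-log:}\quad & sl = \beta_0 + \beta_1 \log(yr)
  &&\Rightarrow\quad \Delta sl \approx \frac{\beta_1}{100}\,\%\Delta yr \\
\text{log-log:}\quad & \log(sl) = \beta_0 + \beta_1 \log(yr)
  &&\Rightarrow\quad \%\Delta sl \approx \beta_1\,\%\Delta yr
\end{aligned}
```

These are approximations for small changes. In the log-level model the exact percentage change for one additional year is 100(e^{beta_1} - 1), which is close to 100 beta_1 when beta_1 is small.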

5.2

As requested, three different functional forms (log-log, log-level and level-log) were analyzed. Only sl and yr were considered, because rk is not a continuous variable, so it does not make sense to use a log model for it.

Among the above models the first model (log-level) is the best, because R2 increases and the t test also shows higher significance. However, as agreed with the teacher, this model will not be used in task 2.2!

6 Prediction

After generating two new variables for c1 and c2 (see screenshot), the 95% confidence interval for the prediction is 15308.22 to 17580.63.
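The reported interval can be cross-checked against the estimated equation sl^ = 11336.67 + 4731.256 rk + 376.4993 yr. The values of c1 and c2 are not stated in the text, so the Python sketch below assumes c1 = 1 (rank) and c2 = 1 (years in current rank), which is a guess that happens to reproduce the midpoint of the interval. The critical value 2.0096 is the approximate two-sided 95% value of the t distribution with 49 degrees of freedom and is also entered by hand.

```python
# Coefficients of the final model and the reported 95% confidence interval.
coef_const, coef_rk, coef_yr = 11336.67, 4731.256, 376.4993
ci_low, ci_high = 15308.22, 17580.63

midpoint = (ci_low + ci_high) / 2
half_width = (ci_high - ci_low) / 2

# Assumption: c1 = 1 and c2 = 1 (not given in the source).
c1, c2 = 1, 1
point_prediction = coef_const + coef_rk * c1 + coef_yr * c2

# Assumption: t critical value for 52 - 2 - 1 = 49 degrees of freedom.
T_CRIT_49 = 2.0096
implied_se = half_width / T_CRIT_49

print(round(point_prediction, 2), round(midpoint, 2))
print(round(half_width, 3), round(implied_se, 1))
```

The point prediction sits at the centre of the interval, as it should for a symmetric t-based confidence interval, and the half width divided by the critical value gives the implied standard error of the prediction.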

7 Heteroskedasticity

7.1 Perform BP test (hint: estat hettest)

This means that the null hypothesis of homoskedasticity can be rejected at the 5% level, which implies heteroskedasticity.

7.2 Report estimated equation with heteroskedasticity-robust standard error (hint: reg y x, robust)

sl^ = 11336.67 + 4731.256 rk + 376.4993 yr
      (732.9604) (506.1465)    (63.01736)

(robust standard errors in parentheses)

n = 52, R2 = 0.8436

The coefficient estimates are unchanged; only the standard errors differ, because the robust standard errors account for the heteroskedasticity found above.
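To show what the two steps of this task do, here is a minimal Python sketch for a regression with a single regressor on made-up data (not the salary data). It computes the n times R-squared form of the Breusch-Pagan statistic, which may differ from the statistic that estat hettest reports, and it compares the conventional standard error of the slope with the basic White (HC0) robust standard error, whereas Stata's robust option additionally applies a small-sample adjustment. All function names are illustrative.

```python
def simple_ols(x, y):
    """Return intercept, slope and residuals of the regression y = a + b x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid

def r_squared(y, resid):
    """Share of the variation in y that is explained by the regression."""
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sum(e * e for e in resid) / sst

def slope_standard_errors(x, resid):
    """Return the conventional and the HC0 robust standard error of the slope."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sigma2 = sum(e * e for e in resid) / (n - 2)
    conventional = (sigma2 / sxx) ** 0.5
    robust = sum((xi - xbar) ** 2 * e * e for xi, e in zip(x, resid)) ** 0.5 / sxx
    return conventional, robust

def breusch_pagan_lm(x, resid):
    """LM statistic n * R-squared from regressing the squared residuals on x."""
    u2 = [e * e for e in resid]
    _, _, aux_resid = simple_ols(x, u2)
    return len(x) * r_squared(u2, aux_resid)

# Made-up data: the size of the error grows with x, so the errors are heteroskedastic.
x = [float(i) for i in range(1, 21)]
y = [2.0 + 0.5 * xi + (-1) ** i * 0.1 * xi for i, xi in enumerate(x, start=1)]

intercept, slope, resid = simple_ols(x, y)
se_conventional, se_robust = slope_standard_errors(x, resid)
lm_stat = breusch_pagan_lm(x, resid)
print(slope, se_conventional, se_robust, lm_stat)
```

With one regressor in the auxiliary regression, the LM statistic is compared with the chi-square distribution with 1 degree of freedom, whose 5% critical value is about 3.84. The slope estimate is the same in both cases; only the standard error attached to it changes, which mirrors the comparison between the two estimated equations in this paper.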

