
Empirical Project Assignment 1

Ehit Karim (12573183),
Oliver van Douveren (12965545),
Sem Hermans (12587087)
and Ariel Levi (12871184)
June 2021

Abstract
Earnings have risen over the years, yet the wage gap between men and women has persisted.
In this paper we examine this gap using the Mincer equation. After controlling for other
variables, we investigate whether the Mincer equation can be improved and whether it helps
to explain the gender wage gap.

1 Introduction
Looking back over past decades, Heckman et al. (2003) found that the general population has
started to earn more, but the picture becomes more complicated when controlling for gender.
Blau and Kahn (1992b) showed that women, in general, earn less than men. Even when
controlling for education and a multitude of other factors, a gap in wages persists. Lemieux (2006)
found that the Mincer equation is a reliable way of modelling earnings, one that has stood the
test of time. Björklund and Kjellström (2002) went further and showed that the Mincer equation
remains useful even when the assumptions of the model are violated.

2 Theoretical background and method


Lemieux (2006) concluded that the Mincer wage equation is an effective model for
approximating earnings. In this paper we use the same model:

$$\log(y) = \beta_0 + \beta_1 s + \beta_2 x + \beta_3 x^2 + \varepsilon \qquad (1)$$
In which y is earnings, s is years of schooling and x is years of labour market experience.
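
As a short worked step (added here for illustration, not taken from the cited papers), differentiating equation (1) shows how the coefficients are usually read: β1 is the approximate proportional return to one more year of schooling, and, if β3 is negative, the experience profile is concave with a peak at x*:

```latex
\frac{\partial \log(y)}{\partial s} = \beta_1,
\qquad
\frac{\partial \log(y)}{\partial x} = \beta_2 + 2\beta_3 x,
\qquad
x^{*} = -\frac{\beta_2}{2\beta_3} \quad (\beta_3 < 0)
```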

Blau and Kahn (1992a) also add the schooling of the parents as regressors. However, our
data set contains neither the schooling of the participants' parents nor any other variable
that could serve as a relevant instrument. It would be beneficial to conduct further research
with these variables included, to investigate the possibility of omitted variable bias.
In the meantime, let us look at which variables the data does include.

3 Data
The data used in this research originate from cross sections of the American Current Population
Survey (CPS) from 2003 to 2014. Besides hourly wage, the data set contains the age, gender,
years of education, labour market experience and ethnicity of each participant.

Table 1: Summary of hourly wages

Min.    1st Qu.   Median   Mean    Std. Dev   3rd Qu.   Max
1.00    11.50     15.00    17.75   9.63       21.00     90.00

Not all observations in the complete data set were used. Observations with a recorded hourly
wage of −0.01 could not be used in the Mincer equation, since its dependent variable is the
logarithm of the hourly wage, which is undefined for zero or negative wages. The data set was
therefore limited to individuals with an hourly wage greater than 0.
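
The sample restriction can be sketched as follows. This is a minimal illustration only: the function name `log_wages` and the sample values are hypothetical and are not taken from the CPS data.

```python
import math

def log_wages(hourly_wages):
    """Keep strictly positive wages and return their natural logarithms."""
    return [math.log(w) for w in hourly_wages if w > 0]

# Hypothetical records; -0.01 stands for the unusable wage value described above.
sample = [15.0, -0.01, 11.5, 21.0, -0.01, 90.0]
logs = log_wages(sample)

# Only the four positive wages survive the restriction.
print(len(logs), [round(v, 4) for v in logs])
```

Filtering before taking the logarithm avoids a domain error, since `math.log` raises a `ValueError` for non-positive input.
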
Imposing this restriction on the data set leaves 3649 observations for the analysis. The
variables are summarised in Table 2.
Table 2: Means and standard deviations (in parentheses) of the variables

Variable                    Entire sample        Male                 Female
Logarithm of hourly wage    2.7648 (0.4595)      2.8091 (0.4711)      2.7126 (0.4400)
Years of Schooling          13.1380 (2.5669)     12.8865 (2.6895)     13.4341 (2.3816)
  -6                        0.0238 (0.1507)      0.0319 (0.1758)      0.0143 (0.1188)
  7-8                       0.0071 (0.0838)      0.0081 (0.0897)      0.0060 (0.0770)
  9                         0.0151 (0.1209)      0.0203 (0.1409)      0.0089 (0.0942)
  10                        0.0123 (0.1097)      0.0137 (0.1162)      0.0107 (0.1031)
  11                        0.0162 (0.1251)      0.0167 (0.1282)      0.0155 (0.1236)
  12                        0.3902 (0.3809)      0.4237 (0.4941)      0.3508 (0.4772)
  13                        0.1959 (0.3559)      0.1946 (0.3959)      0.1975 (0.3981)
  14                        0.1368 (0.3192)      0.1191 (0.3240)      0.1575 (0.3643)
  15-17                     0.1573 (0.3342)      0.1358 (0.3426)      0.1826 (0.3863)
  18                        0.0340 (0.1781)      0.0223 (0.1477)      0.0477 (0.2132)
  19+                       0.0112 (0.1048)      0.0137 (0.1162)      0.0084 (0.0910)
Work experience             22.7642 (13.8120)    22.4445 (13.6121)    23.1405 (14.0384)

Note that although women on average have more years of schooling and more work experience,
they have a lower average log hourly wage. This gives a first indication of the gender wage
gap.

4 Results and preferred empirical model


To investigate the impact of gender, we extended model (1) with a dummy variable $D_f$ that
equals 1 if the participant is female and 0 otherwise.

$$\log(y) = \beta_0 + \beta_1 s + \beta_2 x + \beta_3 x^2 + \beta_4 D_f + \varepsilon \qquad (2)$$

Based on the coefficient of the female dummy variable, the hourly wage is about 13.77% lower
for females according to this model (reading the coefficient with the usual log-point
approximation). The coefficient is also highly statistically significant, with a t-value of 10.68
in absolute value (under the null hypothesis β4 = 0 with α = 5%).

Table 3: Regression coefficients of Mincer equation

Variable                                Est. Coefficient    Std. Error
Constant                                 1.5743745          0.04180
Years of schooling                       0.0724990          0.00270
Years of work experience                 0.0242197          0.00169
Years of work experience squared/100    -0.0003528          0.00003
Female                                  -0.1377452          0.01367
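
The point estimates in Table 3 can be turned into more directly interpretable quantities. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that the experience-squared coefficient multiplies experience squared measured in years. If the regressor was in fact scaled by 100, the peak calculation would have to be rescaled accordingly.

```python
import math

# Point estimates copied from Table 3.
BETA_EXPERIENCE = 0.0242197
BETA_EXPERIENCE_SQ = -0.0003528  # assumption: multiplies experience^2 in years
BETA_FEMALE = -0.1377452

def exact_percent_effect(log_coefficient):
    """Exact percentage change in y implied by a dummy coefficient in a log(y) model."""
    return 100.0 * (math.exp(log_coefficient) - 1.0)

def experience_peak(beta_x, beta_x2):
    """Experience level at which the quadratic log-wage profile reaches its maximum."""
    return -beta_x / (2.0 * beta_x2)

female_gap = exact_percent_effect(BETA_FEMALE)
peak_years = experience_peak(BETA_EXPERIENCE, BETA_EXPERIENCE_SQ)

print(f"exact female wage gap: {female_gap:.2f}%")
print(f"experience profile peaks after about {peak_years:.1f} years")
```

Because exp(β) − 1 is closer to zero than β itself for a negative β, the exact gap is slightly smaller in magnitude than the coefficient read directly as a percentage.
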

After extending the model with a dummy variable for each major ethnic group, we see that
hardly any ethnic group has a statistically significant coefficient relative to the
‘White-only’ reference group. Only the negative coefficients of the ‘Black-only’ and ‘Asian’ groups
have a p-value (under the null that β = 0) smaller than 0.1 (see Table 6). This lack of statistical
significance is likely a result of the small number of observations in these ethnic groups.
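
The dummy coding described above can be sketched as follows. The group names follow the variable names in Table 6; the label `"white"` for the reference group and the function name are assumptions made for this illustration. The reference group gets no dummy of its own, so the indicators are not perfectly collinear with the constant.

```python
GROUPS = ("black", "asian", "other")  # non-reference groups, named as in Table 6
REFERENCE = "white"                   # hypothetical label for the 'White-only' group

def ethnicity_dummies(ethnicity):
    """Return one 0/1 indicator per non-reference group."""
    if ethnicity != REFERENCE and ethnicity not in GROUPS:
        raise ValueError(f"unknown ethnicity label: {ethnicity!r}")
    return {group: int(ethnicity == group) for group in GROUPS}

# The reference group is represented by all indicators being zero.
print(ethnicity_dummies("white"))
print(ethnicity_dummies("asian"))
```

With this coding, each estimated coefficient measures the difference in the dependent variable relative to the reference group, holding the other regressors fixed.
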

5 Conclusion
Whilst the Mincer model is easy to implement for comparing log earnings between men and
women, its simplicity also leads to many of its shortcomings. Applying the Ramsey RESET test
returns a p-value of 0.0475, implying that it might be necessary to include higher-order terms
as well as cross terms to improve our model. However, there is a trade-off between accuracy and
simplicity to be made here. A Breusch-Pagan LM test, with a p-value below $10^{-4}$, further
indicates that the data suffer from heteroskedasticity. This can be addressed by using robust
standard errors and is therefore not much of an issue. As for the specification of our dependent
variable, the Box-Cox transformation supports the use of log(hwage), as it does not reject
H0 : λ = 0. This supports the Mincer specification within our data set.
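
To illustrate why a Box-Cox parameter near zero supports the logarithmic specification, the sketch below implements the standard Box-Cox transform and compares it with the logarithm. The function name is hypothetical, the value of λ is the "lambda one" entry of Table 7 (taken here to be the estimated λ), and the wage used is the median from Table 1.

```python
import math

def box_cox(y, lam, tol=1e-12):
    """Box-Cox transform of a positive value y; the limit for lam -> 0 is log(y)."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if abs(lam) < tol:
        return math.log(y)
    return (y ** lam - 1.0) / lam

LAMBDA_HAT = -0.0188375  # "lambda one" as reported in Table 7
MEDIAN_WAGE = 15.00      # median hourly wage from Table 1

# For lambda close to zero the transform stays close to the plain logarithm.
print(box_cox(MEDIAN_WAGE, LAMBDA_HAT), math.log(MEDIAN_WAGE))
```

This only shows that the two transformations nearly coincide at the reported estimate; the formal statement that λ = 0 is not rejected comes from the test in the text.
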

The biggest issue with the model, however, might be the multiple sources of endogeneity. One
example is the schooling variable, which is closely related to natural ability, a factor we do
not observe. Age does not seem to be a relevant instrument for schooling, and without any other
data (such as the aforementioned schooling of the parents) we have to accept the bias that this
introduces.

A place where age could resolve endogeneity is the experience variable, which according to the
data set is directly determined by age and years of schooling (experience = age − 6 − years of
schooling). As we include both experience and years of schooling in the Mincer equation (and in
other models), using this data set for these kinds of models can be problematic for statistical
inference unless suitable instrumental variables are used.
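
The construction of the experience variable can be made explicit with a small sketch. The function name and the example rows are hypothetical; only the identity experience = age − 6 − years of schooling comes from the text.

```python
def potential_experience(age, years_of_schooling, school_start_age=6):
    """Experience as constructed in the data set: age - 6 - years of schooling."""
    return age - school_start_age - years_of_schooling

# Hypothetical (age, years of schooling) pairs, for illustration only.
people = [(30, 12), (45, 16), (22, 10)]

for age, schooling in people:
    experience = potential_experience(age, schooling)
    # Age is an exact linear combination of experience, schooling and a constant.
    print(age, schooling, experience, age - experience - schooling)
```

Since the last column is the same constant for every row, age carries no variation beyond schooling and experience once both are in the model, and adding it as a further regressor would produce perfect collinearity.
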

References
A. Björklund and J. Kjellström. Estimating the return to investments in education: how useful
is the standard Mincer equation? Economics of Education Review, 21(3):195–210, 2002. doi:
https://doi.org/10.1016/S0272-7757(01)00003-6.
F. D. Blau and L. M. Kahn. Sem do this. The American Economic Review, 82(2):533–538, 1992a.
doi: https://www.jstor.org/stable/2117457.
F. D. Blau and L. M. Kahn. The gender earnings gap: Learning from international comparisons. The
American Economic Review, 82(2):533–538, 1992b. doi: https://www.jstor.org/stable/2117457.
J. J. Heckman, L. J. Lochner, and P. E. Todd. Fifty years of Mincer earnings regressions. National
Bureau of Economic Research, (9732), 2003. doi: http://www.nber.org/papers/w9732.
T. Lemieux. The “Mincer equation” thirty years after Schooling, Experience, and Earnings. In
S. Grossbard (ed.), Jacob Mincer: A Pioneer of Modern Labor Economics, 2006. doi:
http://link.springer.com/chapter/10.1007/0-387-29175-X_11#page-1.

A Appendix

Figure 1: Density plot of the hourly wage (horizontal axis: hourly wage, 0 to 75; vertical axis: density, 0.00 to 0.06)

Table 4: Frequency of Years of Schooling

Years of Schooling    -6     7   7.5     9    10    11    12    13    14    15    16    17    18   19-
Frequency             87     0    26    55    45    59  1424   715   499     0   574     0   124    41

Table 5

Years of Schooling    -6   7-8     9    10    11    12    13    14  15-17   18   19-
Frequency             87    26    55    45    59  1424   715   499    574  124    41

Table 6: Regression results with ethnicity dummies

                  (1)
VARIABLES         hwage

educ              -0.0309***
                  (0.00672)
lexp              -0.00412
                  (0.00562)
lexp2             0.000226**
                  (0.000112)
black             -0.134*
                  (0.0697)
asian             -0.139
                  (0.0931)
other             -0.202*
                  (0.119)
Constant          1.582***
                  (0.118)

Observations      54,676
R-squared         0.001

Standard errors in parentheses
Reference group is the ‘White-only’ ethnic group
*** p<0.01, ** p<0.05, * p<0.1

Table 7: Box-Cox estimates of lambda

Ramsey RESET:
F(3, 1964) = 2.65
Prob > F   = 0.0475

Box-Cox Estimates:
constant                         2.028581
7-8                              .086829
9                                .0233997
10                               .1961737
11                               .1009005
12                               .321082
13                               .3851343
14                               .4740249
15-17                            .5695451
18                               .7659569
over 19                          1.098054
work experience                  .0270459
work experience squared/100      -.0412195
lambda one                       -.0188375
log likelihood                   -6638.4053
