GROUP 10 PROJECT
Vu Ha Phuong – 1811140094
Table of Contents
Part b: Introduction
Part m: Reference
Part n: Appendices
Part a: Statement of research question
Part b: Introduction
People tend to believe that you can only be successful and affluent if you study for a long
time. The length of education draws considerable attention from recruiters at large companies
and can be a key factor in making or breaking a job application. According to a survey conducted
by CareerBuilder, 32% of employers have increased their demand for new hires with at least a
master’s degree over the past 5 years (CBIA, 2020). Generally, it is agreed that longer education
results in better academic skills, and it is understandable that employers want their personnel to
be as skilled as possible; the standard way to increase one’s knowledge is to spend more time in
school and lectures. However, some of the most successful people in the world did not spend much
time on campus. Steve Jobs and Steve Wozniak quit school, Bill Gates was a Harvard dropout, Michael
Dell left his university in his first year, Mark Zuckerberg never earned a master’s degree, and the
rest is history. Thus, there are indeed cases suggesting that education time may not have such a
great impact on income. Since neither view is clearly more convincing than the other, the purpose
of this paper is to use regression analysis to answer whether education time actually has an
effect on a person’s hourly income.
Having a thorough understanding of this relationship is important. On the one hand, students
can use this type of information to decide how long they should stay on campus and whether they
should continue their education beyond the undergraduate level. On the other hand, policymakers
can refer to the findings of this paper to see whether the education system is actually doing its
job well. For example, if a longer education time really had a negative effect on average income,
an argument calling for structural reforms could be made.
In order to construct a linear regression model, the first thing that needs to be done is
As stated above, we want to estimate the effect of education time on average income; therefore,
education time (“educ”) will be the independent variable, and average hourly earnings (“wage”)
will be the dependent variable.
In the equation, it can be seen that there are two other independent variables aside from
“educ”: “female” and “exper”.
“Female” here is a binary variable for gender, which takes a value of 1 if the person is
female and 0 if the person is male. “Exper” measures the years of potential work experience a
person has at a given point in time. Since more experienced people understand how their jobs work
better than others, they are likely to earn a higher income by drawing on their accumulated
experience, so a positive relationship between “exper” and “wage” is expected (b3 > 0). By
contrast, because of the glass ceiling that women often face in the workplace, we anticipate a
negative relationship between “female” and “wage”.
No matter how well we conduct our research, education time alone may not be sufficient to
explain all of the variance in wage, and omitted variable bias is likely to arise if we limit the
model to just one independent variable. Omitted variable bias occurs when “the omitted variable is
correlated with the included regressor, and when the omitted variable is a determinant of the
dependent variable.”
Here, we can see that both extra variables, “female” and “exper”, can affect the wages
received by a person, as explained above. In addition, they correlate with our main independent
variable “educ” in one way or another. Firstly, women have typically had fewer opportunities for
extended education than men because of lasting prejudice toward women in society. For example,
many people are still of the (sexist) opinion that women do not need much academic learning since
their job is to stay at home, which limits their chances to stay in school for longer periods of
time. Secondly, the longer a person studies, the fewer years of work experience he or she is likely
to accumulate along the way, since each extra year of schooling carries an opportunity cost: the
person could have shortened their education, started working sooner, and begun to gather
experience through interaction with
In short, we choose to include “exper” and “female” because those would be the most likely
Our data are cross-sectional, collected from a sample of 526 employed individuals in the
The main variables, “wage” and “educ”, are measured in dollars per hour and years, respectively,
while the extra variable “exper” is also measured in years. Since “female” is a dummy variable,
Variables                               Mean     Standard Deviation   Minimum   Maximum
Average hourly earnings (Wage, $/hour)  5.896    3.693                0.530     24.980
Education (years)                       12.563   2.769                0         18
Experience (years)                      17.017   13.572               1         51
Female                                  0.479    0.500                0         1
The scatterplot suggests a somewhat weak positive linear relationship between the main
variables “educ” and “wage”. There also appear to be some outliers above the regression line,
which may be due to measurement errors or to the sampling method. However, since the outliers are
not too far from the regression line, we have decided to keep them in the analysis.
Table 2 (excerpt): Adjusted R2 values: 0.163, 0.222 and 0.305. Experience: 0.064***.
function is a function of the dependent variable and our main independent variable, “wage” and
coefficient of “educ” is 0.541. This means for each year of increased education, the average
However, when we introduce another regressor, “exper”, into our model, the estimated effect
increases: for each additional year of education, average hourly income is expected to rise by
$0.644. Much the same can be said when the third and final regressor, “female”, is added, as the
coefficient changes again, this time downward, to 0.602. This result in column 3 states that,
with the inclusion of both extra variables “exper” and “female”, the amount of money a person can
earn for an additional year spent on campus
The t-statistics obtained for the “educ” coefficient in columns 1, 2 and 3 are 8.837, 9.883, and
9.388, respectively. These numbers are all greater than the 1% critical value of 2.58; therefore,
we can comfortably reject the null hypothesis that each of these coefficients is equal to zero.
The values of the intercept across the three functions are negative, and since wage cannot be
All in all, the relationship between our two main variables is consistent with what we
predicted in Part b: education time has a positive linear relationship with average hourly
earnings.
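As a quick check on the significance test above, the sketch below recomputes the two-sided 1% critical value and the p-values implied by the reported t-statistics. It assumes the large-sample standard normal approximation (reasonable with 526 observations) and uses only Python's standard library; the names in it are illustrative, not taken from the paper.

```python
import math
from statistics import NormalDist

# t-statistics for the "educ" coefficient in columns 1, 2 and 3, as reported in the text
T_STATS = {1: 8.837, 2: 9.883, 3: 9.388}
ALPHA = 0.01  # two-sided test at the 1% significance level

# Large-sample approximation: under H0 (coefficient = 0) the t-statistic is ~ N(0, 1)
critical_value = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 2.576, the "2.58" in the text


def two_sided_p_value(t: float) -> float:
    """P(|Z| > |t|) for a standard normal Z.

    Uses erfc instead of 1 - cdf so that very small p-values do not round to 0.
    """
    return math.erfc(abs(t) / math.sqrt(2))


results = {
    col: {"t": t, "p": two_sided_p_value(t), "reject_h0": abs(t) > critical_value}
    for col, t in T_STATS.items()
}

if __name__ == "__main__":
    print(f"1% two-sided critical value: {critical_value:.3f}")
    for col, r in results.items():
        print(f"column {col}: t = {r['t']:.3f}, p = {r['p']:.1e}, reject H0: {r['reject_h0']}")
```

Every p-value lands far below 0.01, which agrees with the conclusion reached by comparing each t-statistic with 2.58.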
The coefficient of the first extra variable, “exper”, in column 2 of table 2 is 0.07, which
means that for each additional year of work experience, average hourly earnings would increase by
$0.07. However, in column 3, where the other extra variable “female” is also introduced, this
effect decreases to 0.064, indicating that each additional year of experience is associated with
slightly less extra income once gender is controlled for. Perhaps the most unfortunate finding
here is our model’s suggestion of gender discrimination in the workplace:
Omitted variable bias, as stated, happens when an omitted variable is a determinant of the
dependent variable and is correlated with at least one included regressor. While the number of
omitted variables in a regression model can be infinite, not all of them cause “bias”. However,
the results in columns 2 and 3 of table 2 indicate that we are indeed seeing the bias in action:
the noticeable change in the slope coefficient of the main variable “educ” observed when we add a
new variable implies there are changes in their explanatory power as
In Part c, we speculated that the “exper” variable has a negative correlation with the
independent variable “educ” and a positive correlation with the dependent variable “wage”;
therefore, we expected the slope coefficient for “educ” to be underestimated when “exper” is
omitted. As it turns out, the slope coefficient for “educ” has indeed increased as a result of
adding “exper”, suggesting that our model is going in the right direction. Furthermore, our
postulation that “female” is negatively correlated with both “educ” and “wage” appears to be
appropriate as well: omitting such a variable should lead to an overestimated slope coefficient,
and the slope does indeed decrease from column 2 to column 3, where the “female” variable is
added.
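The sign reasoning in the paragraph above follows the standard textbook omitted variable bias formula for a regression with two regressors. The block below writes it out in that form; the formula is our addition rather than something stated in the paper, b3 from the text is written as β3, β2 is our own label for the “female” coefficient, and δ̃1 denotes the slope from an auxiliary regression of “exper” on “educ”. For the “female” step, the relevant relationship is strictly the one between “female” and “educ” holding “exper” fixed, so that line is a heuristic.

```latex
\begin{aligned}
&\text{Long model:}  && \text{wage}_i = \beta_0 + \beta_1\,\text{educ}_i + \beta_3\,\text{exper}_i + u_i \\
&\text{Short model:} && \text{wage}_i = \tilde\beta_0 + \tilde\beta_1\,\text{educ}_i + \tilde u_i \\
&\text{Auxiliary:}   && \text{exper}_i = \tilde\delta_0 + \tilde\delta_1\,\text{educ}_i + v_i \\
&\text{Bias:}        && \operatorname{E}[\tilde\beta_1] = \beta_1 + \beta_3\,\tilde\delta_1 \\
&\text{exper:}       && \beta_3 > 0,\ \tilde\delta_1 < 0 \;\Rightarrow\; \operatorname{E}[\tilde\beta_1] < \beta_1 \quad \text{(underestimate)} \\
&\text{female:}      && \beta_2 < 0,\ \text{female negatively related to educ} \;\Rightarrow\; \text{bias} > 0 \quad \text{(overestimate)}
\end{aligned}
```

In a sample, OLS satisfies the relation exactly with estimated coefficients (short slope = long slope + long exper coefficient × δ̃1). Plugging in columns 1 and 2 gives 0.541 = 0.644 + 0.07 × δ̃1, so δ̃1 ≈ −1.47. This is a negative relationship between “educ” and “exper”, as speculated, though the figure is approximate because the reported coefficients are rounded.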
In essence, by adding these omitted variables and treating them as independent variables
alongside “educ”, we can reasonably expect that the bias associated with those two variables has
been removed.
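To see the bias mechanism in isolation, the sketch below simulates synthetic data (not the paper's sample) in which “exper” falls as “educ” rises while raising “wage”. Every parameter is hypothetical: a true educ slope of 0.6, an exper slope of 0.07, a −1.5 link from educ to exper, the noise levels, the sample size and the seed. OLS is implemented by hand via the normal equations so that the example needs only the standard library.

```python
import random


def ols(regressors, y):
    """Return OLS coefficients [intercept, b1, b2, ...] by solving X'X b = X'y.

    `regressors` is a list of equal-length lists (one per regressor); an
    intercept column is added automatically. Gaussian elimination with
    partial pivoting is adequate for the handful of regressors used here.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in regressors] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Forward elimination with partial pivoting
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[pivot] = A[pivot], A[p]
        for r in range(p + 1, k):
            factor = A[r][p] / A[p][p]
            for c in range(p, k + 1):
                A[r][c] -= factor * A[p][c]
    # Back substitution
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def simulate(n=20_000, seed=42):
    """Hypothetical data-generating process; all numbers are illustrative."""
    rng = random.Random(seed)
    educ, exper, wage = [], [], []
    for _ in range(n):
        e = rng.gauss(12.5, 2.8)                       # years of schooling
        x = 40 - 1.5 * e + rng.gauss(0, 8)             # more schooling -> less experience
        w = -1 + 0.6 * e + 0.07 * x + rng.gauss(0, 3)  # true effect of educ is 0.6
        educ.append(e)
        exper.append(x)
        wage.append(w)
    return educ, exper, wage


educ, exper, wage = simulate()
short_slope = ols([educ], wage)[1]        # "exper" omitted: biased downward
long_slope = ols([educ, exper], wage)[1]  # "exper" included: close to the true 0.6

if __name__ == "__main__":
    print(f"educ slope without exper: {short_slope:.3f}")
    print(f"educ slope with exper:    {long_slope:.3f}")
```

With these made-up parameters the short-regression slope should land near 0.6 + 0.07 × (−1.5) ≈ 0.495, below the long-regression slope, which mirrors the jump from column 1 to column 2 of table 2.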
In our final regression function, the R-squared and adjusted R-squared are 0.309 and 0.305,
respectively. Since the difference between these two measures of fit is small, the penalty for
including extra regressors is minor, and we can say that about 30-31% of the variance in average
hourly earnings is explained by our regression model.
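The closeness of the two measures can be checked with the usual adjusted R² formula, 1 − (1 − R²)(n − 1)/(n − k − 1). The sketch below plugs in R² = 0.309, n = 526 and k = 3 regressors from the text; the function name is illustrative.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k regressors (intercept not counted)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than regressors + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Values from the final regression (educ, exper, female) reported in the text
R2_FINAL = 0.309
N_OBS = 526
K_REGRESSORS = 3

adj_r2 = adjusted_r2(R2_FINAL, N_OBS, K_REGRESSORS)
penalty_factor = (N_OBS - 1) / (N_OBS - K_REGRESSORS - 1)  # 525 / 522

if __name__ == "__main__":
    print(f"adjusted R^2 = {adj_r2:.3f} (penalty factor {penalty_factor:.4f})")
```

With 526 observations and only 3 regressors, the penalty factor 525/522 is barely above 1, so the adjusted R² (0.305) sits just below R² (0.309), matching the reported values.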
The SER’s value, 3.078, indicates that the typical prediction error of the regression model is
about 3.08 dollars of hourly income. The reduction in SER across the three regression functions
suggests that predictions about average earnings are more precise if they are made using the
regression with all 3 variables than if they are made using the regression with
Although the fit of the linear regression in Part h is reasonable, it has been suggested that we
should include natural log terms of “educ” and “wage” to test whether their relationship is
actually linear. In addition, log terms allow elasticities to be measured directly, which may be
useful in further analysis. The results of the nonlinear regression with
Regressor             Coefficient   (Standard error)
ln(Educ)              0.971***      (0.100)
Experience (Exper)    0.010***      (0.001)
Female                -0.373***     (0.380)
Intercept             -0.793***     (0.265)

Summary statistics
SER                   0.435
Adjusted R2           0.329
Part j: Interpretation of Nonlinear Model(s)
The results of the log-log regression function are statistically significant at the 1% level,
since the p-values for all the coefficients are lower than 1%.
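Because the dependent variable in this model is in logs, the coefficients translate into percentage effects rather than dollar amounts. The sketch below assumes the dependent variable is ln(wage), as the log-log label indicates. It treats the ln(Educ) coefficient as an elasticity, while the Experience and Female coefficients are changes in ln(wage) whose exact percentage equivalent is 100·(e^b − 1). The variable names are illustrative.

```python
import math

# Coefficients from the log-log regression table (dependent variable assumed to be ln(wage))
B_LN_EDUC = 0.971   # elasticity of wage with respect to education time
B_EXPER = 0.010     # change in ln(wage) per additional year of experience
B_FEMALE = -0.373   # difference in ln(wage) for women relative to men


def exact_pct_change(delta_log: float) -> float:
    """Exact % change in wage implied by a change of `delta_log` in ln(wage)."""
    return 100 * (math.exp(delta_log) - 1)


# A 1% increase in education time multiplies wage by 1.01 ** B_LN_EDUC
educ_effect_pct = 100 * (1.01 ** B_LN_EDUC - 1)

# Approximation (100 * b) versus the exact conversion for the level regressors
exper_approx, exper_exact = 100 * B_EXPER, exact_pct_change(B_EXPER)
female_approx, female_exact = 100 * B_FEMALE, exact_pct_change(B_FEMALE)

if __name__ == "__main__":
    print(f"+1% education time -> {educ_effect_pct:+.3f}% hourly earnings")
    print(f"+1 year experience -> {exper_approx:+.1f}% approx, {exper_exact:+.2f}% exact")
    print(f"female             -> {female_approx:+.1f}% approx, {female_exact:+.1f}% exact")
```

The 100·b approximation is close for small coefficients such as 0.010 but drifts for larger ones. For Female, the exact conversion gives a gap of roughly 31% rather than 37.3%.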
For each 1% increase in education time, average hourly earnings are estimated to increase by
about 0.97%. Females, on average, earn approximately 37.3% less per hour than males. An additional year
Education and income concern almost every individual on the planet. Understanding the mechanics
behind their relationship may be crucial for career choices and policy decision making. This paper
presents estimates of a possible linear relationship between these two variables using data
collected from workers across the U.S. However, while the results from our models are
significant, they cannot, and should not, be considered “complete”, since the estimator may still
be subject to further bias:
Firstly, internal validity may not have been attained. Omitted variable bias may still exist:
“exper” and “female” are by no means the only omitted variables that can distort the model. One
might argue that “wealth” may also cause bias, since wealthy families can use their money to place
their children in high-wage positions, and wealthy families typically enjoy better education than
others. People may also misreport their education time and experience to save face, making our
data susceptible to errors-in-variables bias. Simultaneous causality may pose a problem as well,
since a person’s average hourly income can be a determinant of his or her education time. For
instance, the person in question may be working to pay for a master’s degree or to pay off
school-related debts.
Our scatterplot in Part d suggests a weak linear relationship, so our results may suffer from
functional form misspecification. The log-log function discussed above appears to be a more
suitable model, given the nonlinear spread of the data in the scatterplot. The presence of outliers also
Secondly, it is difficult to claim that our model is externally valid. To be externally valid,
the coefficients should not vary greatly when we apply this exact model to another country with a
background and setting similar to those of the U.S. However, countries differ in their political
structures, social norms, legal responsibilities, geographical conditions, etc. Given the time
constraints we faced when constructing the model, testing external validity was beyond our scope.
Despite these imperfections, our analysis may still be useful as reference material for
researchers interested in this topic. They may be able to learn from our setbacks and improve the
analysis so that the results can be internally and externally valid. All in all, any academic work
has some value, be it large or small, for those interested.
Part l: Bibliography
Part m: Reference
Employers Raising Their Educational Requirements. (2016, March 06). CBIA. Retrieved
requirements/
Stock, J. H., & Watson, M. W. (2011). Linear Regression with Multiple Regressors.
Part n: Appendices