Econ 335 Group 10

FOREIGN TRADE UNIVERSITY
COLORADO STATE UNIVERSITY
-----------------***-----------------
GROUP 10 PROJECT
SUBJECT: INTRODUCTION TO ECONOMETRICS
Members: Nguyen Kim Son – 1810140058
Tran Hong Quan – 1810140056
Nguyen Xuan Thanh – 1811140095
Ho Thien Huong – 1811140084
Vu Ha Phuong – 1811140094
Do Thi Phuong Thao – 1811140096
Nguyen Quynh Chi – 1810140011
Class: ECON-335 (FTU)
Professor: Anita Pena

Honor Pledge: “We did not give, receive, or use any unauthorized assistance on this project.”
Signed: Son, Quan, Thanh, Huong, Phuong, Thao, Chi
Table of Contents
Part a: Statement of research question
Part b: Introduction
Part c: Formulation of the baseline linear model
Part d: Data description
Part e: Estimation of linear models
Part f: Interpretations of linear model
Part g: Discussion of Single versus Multivariate Regressions
Part h: Discussion on the measures of fit
Part i: Empirical Results from a Nonlinear Model
Part j: Interpretation of Nonlinear Model(s)
Part k: Summary and Discussion
Part m: Reference
Part n: Appendices
Part a: Statement of research question
What is the effect of education time on average hourly earnings?
Part b: Introduction
People tend to believe that you can only become successful and affluent if you study for a long time. The length of a candidate's education draws considerable attention from recruiters at large companies and can make or break a job application. According to a study conducted by CareerBuilder, 32% of employers have increased their demand for new hires with at least a master's degree over the past five years (CBIA, 2016). It is generally agreed that longer education results in better academic skills, and it is understandable that employers want their personnel to be as capable as possible; the standard way to increase one's knowledge is to attend more school and more lectures. However, some of the most successful people in the world did not spend much time on campus. Steve Jobs and Steve Wozniak quit school, Bill Gates was a Harvard dropout, Michael Dell left university in his first year, Mark Zuckerberg never earned a master's degree, and the rest is history. Thus, there are cases suggesting that education time does not have such a large impact on income. Since neither view is clearly more persuasive than the other, the purpose of this paper is to use regression analysis to examine whether education time actually has an effect on a person's hourly earnings.
Understanding this relationship thoroughly is important. On the one hand, students can use this type of information to decide how long to stay on campus and whether to continue their education beyond the undergraduate level. On the other hand, policymakers can refer to the findings of this paper to judge whether the education system is actually doing its job well. For example, if a longer education really had a negative effect on average income, an argument for structural reform could be made.
Part c: Formulation of the baseline linear model
To construct a linear regression model, we first need to identify the dependent and independent variables.
As stated above, we want to estimate the effect of education time on average hourly earnings. Education time (“educ”) is therefore the main independent variable, and average hourly earnings (“wage”) is the dependent variable of our model:
$$\text{Wage}_i = \beta_0 + \beta_1 \text{Educ}_i + \beta_2 \text{Exper}_i + \beta_3 \text{Female}_i + u_i$$
Besides “educ”, the equation contains two other independent variables: “exper” and “female”.
“Female” is a binary variable that equals 1 if the person is female and 0 if the person is male. “Exper” measures the years of potential work experience a person has at a given point in time. Because more experienced people understand how their jobs work better than others and can draw on their accumulated experience to perform their duties, they are likely to earn more, so we expect a positive relationship between “exper” and “wage” ($\beta_2 > 0$). Conversely, because of the glass ceiling that women often face in the workplace, we anticipate a negative relationship between “female” and “wage” ($\beta_3 < 0$), ceteris paribus.
No matter how carefully we conduct our research, education time alone is unlikely to explain all of the variation in wages, and omitted variable bias is likely to arise if we restrict the model to a single independent variable. Omitted variable bias occurs when “the omitted variable is correlated with the included regressor, and when the omitted variable is a determinant of the dependent variable” (Stock & Watson, 2011, p. 183).
Here, both additional variables, “female” and “exper”, can affect the wage a person receives, as explained above. In addition, each is correlated with our main independent variable, “educ”. First, women typically get fewer opportunities for long periods of education than men because of persistent prejudice against women in society. For example, many people still hold the (sexist) opinion that women do not need much academic learning because their job is to stay at home in the kitchen, which limits their chances of staying in school longer. Second, the longer a person studies, the less likely he or she is to accumulate years of work experience along the way, since each extra year of schooling carries an opportunity cost: people could have shortened their education, started working sooner, and begun gathering experience through interaction with the workplace, employers, coworkers, etc.
In short, we include “exper” and “female” because they are the variables most likely to bias our results if they were omitted.
Part d: Data description
Our data are cross-sectional, collected from a sample of 526 employed individuals in the United States, and are available on the Gretl program’s website.
The main variables, “wage” and “educ”, are measured in dollars per hour and years, respectively, and the additional variable “exper” is also measured in years. Since “female” is a dummy variable, it has no unit of measurement.
Table 1: Summary statistics for variables, 526 observations

Variable                          Mean      Std. Dev.   Minimum   Maximum
Average hourly earnings (wage)    5.896     3.693       0.530     24.980
Education (educ)                  12.563    2.769       0         18
Experience (exper)                17.017    13.572      1         51
Female                            0.479     0.500       0         1
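The “female” row of Table 1 can be sanity-checked without the raw data: for a 0/1 dummy, the mean is the share of ones, and the standard deviation is fully determined by that share, √(p(1 − p)), or that value times √(n/(n − 1)) for the sample version. The short sketch below uses only the reported mean 0.479 and n = 526; since the mean is rounded, the implied head count is approximate.

```python
import math

n = 526        # observations (Table 1)
p = 0.479      # mean of "female" = share of women in the sample

# SD of a 0/1 variable computed from its mean alone, then the n-1 (sample) version.
sd_pop = math.sqrt(p * (1 - p))
sd_sample = sd_pop * math.sqrt(n / (n - 1))

print(f"implied number of women: about {round(p * n)}")
print(f"implied SD: {sd_pop:.4f} (population), {sd_sample:.4f} (sample)")
```

Both versions round to the 0.500 reported in Table 1, so the row is internally consistent.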
Scatterplot for “educ” and “wage”:
The scatterplot suggests a fairly weak positive linear relationship between the main variables “educ” and “wage”. There also appear to be some outliers above the regression line, which may be due to measurement error or the sampling method. However, since these outliers are not very far from the rest of the data, we keep them in the sample and check whether they actually cause irregularities in the results.
Part e: Estimation of linear models
Table 2: Linear Regression Models of Personal Income
$$\text{Wage}_i = \beta_0 + \beta_1 \text{Educ}_i + \beta_2 \text{Exper}_i + \beta_3 \text{Female}_i + u_i$$
Dependent variable: average hourly earnings (wage); number of observations: 526

Regressor                   (1)          (2)          (3)
Education (Educ)            0.541***     0.644***     0.602***
                            (0.061)      (0.065)      (0.064)
Experience (Exper)                       0.070***     0.064***
                                         (0.011)      (0.010)
Female                                                -2.155***
                                                      (0.259)
Intercept                   -0.905       -3.340***    -1.734**
                            (0.725)      (0.865)      (0.858)

F-statistics and Summary Statistics
SER                         3.378        3.257        3.078
Adjusted R2                 0.163        0.222        0.305
F-statistic: 48.8245

Notes: standard errors in parentheses; *** and ** denote significance at the 1% and 5% levels.

Part f: Interpretations of linear model
The first regression relates the dependent variable, “wage”, to our main independent variable, “educ”, alone. According to column 1 of Table 2, the coefficient on “educ” is 0.541. This means that each additional year of education is associated with an expected increase of $0.541 in average hourly earnings.
However, when we introduce a second regressor, “exper”, the estimated effect increases: each additional year of education is now associated with an expected increase of $0.644 in average hourly earnings. The coefficient changes again when the third and final regressor, “female”, is added, this time falling to 0.602. Column 3 therefore indicates that, holding experience and gender constant, an additional year on campus is associated with approximately $0.602 more per hour.
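To make the column 3 estimates concrete, the sketch below plugs two hypothetical workers into the fitted equation. The worker profiles (16 versus 12 years of education, 10 years of experience, male) are illustrative assumptions, not observations from the sample; because the two workers differ only in schooling, the gap in fitted wages is exactly 4 × 0.602 dollars per hour.

```python
# Column (3) estimates from Table 2.
INTERCEPT, B_EDUC, B_EXPER, B_FEMALE = -1.734, 0.602, 0.064, -2.155


def predicted_wage(educ: float, exper: float, female: int) -> float:
    """Fitted average hourly earnings (dollars) from the column (3) regression."""
    return INTERCEPT + B_EDUC * educ + B_EXPER * exper + B_FEMALE * female


# Two hypothetical male workers who differ only in years of schooling.
college = predicted_wage(educ=16, exper=10, female=0)
high_school = predicted_wage(educ=12, exper=10, female=0)
print(f"${college:.3f} vs ${high_school:.3f}: difference ${college - high_school:.3f}/hour")
```

Holding the other regressors fixed is what the phrase "ceteris paribus" means in practice: only the “educ” term changes between the two calls.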
The t-statistics for the “educ” coefficient in columns 1, 2, and 3 are 8.837, 9.883, and 9.388, respectively. All of them exceed 2.58, the two-sided critical value at the 1% significance level, so we can comfortably reject the null hypothesis that the coefficient on “educ” equals zero in each specification. In other words, the estimated effect of education is statistically significant.
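The test above amounts to dividing each coefficient by its standard error and comparing the result with 2.58. The sketch below reproduces that arithmetic from the rounded values in Table 2, so its t-statistics differ slightly from the 8.837, 9.883, and 9.388 obtained from unrounded estimates. The p-values use the standard normal approximation, an assumption that is reasonable with at least 522 degrees of freedom.

```python
import math

# Coefficient on "educ" and its standard error in columns (1)-(3) of Table 2.
estimates = {"(1)": (0.541, 0.061), "(2)": (0.644, 0.065), "(3)": (0.602, 0.064)}
CRITICAL_1PCT = 2.58  # two-sided 1% critical value of the standard normal

t_stats = {column: coef / se for column, (coef, se) in estimates.items()}

for column, t in t_stats.items():
    # Two-sided p-value under the normal approximation: P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    p_value = math.erfc(abs(t) / math.sqrt(2))
    print(f"{column}: t = {t:.3f}, p = {p_value:.1e}, reject H0 at 1%: {abs(t) > CRITICAL_1PCT}")
```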
The intercepts are negative in all three regressions. Since wages cannot be negative, and the intercept describes a worker with zero years of education (and, in columns 2 and 3, zero experience), it is best not to interpret the intercepts here.
All in all, the relationship between our two main variables is consistent with what we anticipated in Part b: education time has a positive linear relationship with average hourly earnings.
Part g: Discussion of Single versus Multivariate Regressions
The coefficient on the first additional variable, “exper”, in column 2 of Table 2 is 0.070, which means that each additional year of work experience is associated with an increase of $0.07 in average hourly earnings. However, in column 3, where the other additional variable, “female”, is also introduced, this effect falls slightly to 0.064, indicating a somewhat smaller return to each year of experience. Perhaps the most unfortunate finding here is the model’s suggestion of gender discrimination in the workplace: holding education and experience constant, women on average earn $2.155 less per hour than men.
Omitted variable bias, as stated, arises when an omitted variable is a determinant of the dependent variable and is correlated with at least one included independent variable. While a regression model can omit countless variables, not all of them cause bias. However, the results in columns 2 and 3 of Table 2 indicate that we are indeed seeing this bias in action: the noticeable change in the slope coefficient on “educ” each time a new variable is added implies that the shorter regressions attributed to education part of the effect of the variables they left out.
In Part c, we speculated that “exper” is negatively correlated with “educ” and positively correlated with “wage”; omitting it should therefore cause the slope coefficient on “educ” to be underestimated. As it turns out, the coefficient on “educ” does increase when “exper” is added, which supports this reasoning. Likewise, our postulation that “female” is negatively correlated with both “educ” and “wage” implies that the coefficient on “educ” in column 2 is overestimated, and the coefficient indeed falls from column 2 to column 3, where “female” is included.
In essence, by adding these previously omitted variables as regressors alongside “educ”, we can reasonably assume that the bias caused by omitting those two variables has been eliminated.
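The sign argument above can be checked with simulated data. The sketch below uses a hypothetical data-generating process (not the Gretl sample) in which extra schooling reduces experience and experience raises wages. Omitting “exper” then pulls the “educ” slope down, and the short-regression slope equals the long-regression slope plus the “exper” coefficient times the slope from regressing “exper” on “educ”, which is the in-sample version of the omitted variable bias formula.

```python
import numpy as np


def ols(y, *regressors):
    """OLS coefficients [intercept, slopes...] of y on the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(42)
n = 200_000
# Hypothetical process: more schooling means less experience; both raise wages.
educ = rng.normal(12.5, 2.8, n)
exper = 40 - 1.5 * educ + rng.normal(0, 10, n)
wage = -3.0 + 0.6 * educ + 0.07 * exper + rng.normal(0, 3.0, n)

b_short = ols(wage, educ)[1]                   # "exper" omitted
_, b1_long, b2_long = ols(wage, educ, exper)   # "exper" included
delta = ols(exper, educ)[1]                    # slope of exper on educ (negative here)

# Short slope = long slope + (coefficient on exper) * delta, so the bias is negative.
print(f"short {b_short:.3f}, long {b1_long:.3f}, long + b2*delta {b1_long + b2_long * delta:.3f}")
```

With these assumed parameters the short regression understates the true 0.6 by roughly 0.07 × 1.5, mirroring the increase in the “educ” coefficient from column 1 to column 2.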
Part h: Discussion on the measures of fit
In our final regression (column 3), the R-squared and adjusted R-squared are 0.309 and 0.305, respectively. Since these two measures of fit differ very little, we can say that about 30 to 31% of the variance in average hourly earnings is explained by our regression model.
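The adjusted R² penalizes the R² for the number of regressors k relative to the sample size n, $\bar{R}^2 = 1 - \frac{n-1}{n-k-1}(1 - R^2)$, which is why the two measures are so close here. The short check below uses n = 526 and k = 3 from column 3 and recovers the reported 0.305 from the reported R² of 0.309.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k regressors (excluding the intercept)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


print(f"{adjusted_r2(0.309, n=526, k=3):.4f}")
```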
The SER of 3.078 indicates that the typical error made by the regression model is about $3.08 of hourly earnings. The SER falls across the three regressions (3.378, 3.257, and 3.078), which suggests that predictions of average earnings are more precise when made using the regression with all three variables than when made using the regression with only “educ” as a regressor.
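The SER is the square root of the sum of squared residuals divided by the degrees of freedom, n − k − 1. The sketch below implements that formula for an arbitrary residual vector; the toy residuals are hypothetical, since the fitted residuals are not reproduced here. It also computes the relative fall in SER between columns 1 and 3 of Table 2.

```python
import numpy as np


def ser(residuals: np.ndarray, k: int) -> float:
    """Standard error of the regression: sqrt(SSR / (n - k - 1)), k = number of regressors."""
    n = residuals.size
    return float(np.sqrt(np.sum(residuals ** 2) / (n - k - 1)))


# Hypothetical residuals: SSR = 10 and n - k - 1 = 2, so the SER is sqrt(5).
print(ser(np.array([1.0, -1.0, 2.0, -2.0]), k=1))

# Relative fall in SER from column (1) to column (3) of Table 2.
print(f"{(3.378 - 3.078) / 3.378:.1%}")
```

A fall of roughly 9% is a meaningful but moderate gain in precision.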
Part i: Empirical Results from a Nonlinear Model
Although the linear regression in Part h fits reasonably well, it has been suggested that we include the natural logarithms of “educ” and “wage” to test whether their relationship is actually linear. In addition, log terms allow the coefficient on ln(educ) to be interpreted as an elasticity. The results of the nonlinear regression with log terms are shown in Table 3 below:
Table 3: Non-Linear Regression Model of Personal Income
$$\ln(\text{Wage}_i) = \beta_0 + \beta_1 \ln(\text{Educ}_i) + \beta_2 \text{Exper}_i + \beta_3 \text{Female}_i + u_i$$
Dependent variable: natural log of average hourly earnings, ln(wage); number of observations: 526

Regressor                   Coefficient
ln(Educ)                    0.971***
                            (0.100)
Experience (Exper)          0.010***
                            (0.001)
Female                      -0.373***
                            (0.380)
Intercept                   -0.793***
                            (0.265)

Summary Statistics
SER                         0.435
Adjusted R2                 0.329
Part j: Interpretation of Nonlinear Model(s)
The results of the log-log regression are statistically significant at the 1% level, since the p-values for all the coefficients are below 1%.
A 1% increase in education time is associated with an estimated 0.97% increase in average hourly earnings (an elasticity of 0.971). Holding education and experience constant, women earn about 37.3% less per hour than men using the usual approximation 100 × $\beta_3$; the exact implied gap, $100\,(e^{-0.373} - 1)$, is about 31.1%. An additional year of accumulated experience is associated with approximately a 0.95% increase in average hourly earnings.
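Because the dependent variable in Table 3 is in logs, the coefficients translate into percentage effects in two different ways. The sketch below uses the rounded coefficients from Table 3: the ln(educ) coefficient is read directly as an elasticity, while for “exper” and “female” it compares the common approximation 100 × β with the exact percentage change 100 × (e^β − 1). With the rounded 0.010, the experience effect comes out at about 1.0%, rather than the 0.95% stated in the text, which presumably reflects the unrounded estimate.

```python
import math

# Rounded coefficients from Table 3.
b_ln_educ, b_exper, b_female = 0.971, 0.010, -0.373

# Coefficient on ln(educ) is an elasticity: % change in wage per 1% change in educ.
print(f"1% more education -> about {b_ln_educ:.3f}% higher hourly earnings")

# For regressors entered in levels (exper) or as dummies (female), 100*b only
# approximates the % effect; the exact effect on wage is 100*(exp(b) - 1).
for name, b in [("exper (+1 year)", b_exper), ("female", b_female)]:
    approx = 100 * b
    exact = 100 * (math.exp(b) - 1)
    print(f"{name}: approx {approx:.2f}%, exact {exact:.2f}%")
```

The approximation is close for small coefficients such as “exper” but noticeably overstates the gap for a coefficient as large as the one on “female”.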
Part k: Summary and Discussion
Education and income are concerns of almost every individual on the planet. Understanding the mechanics behind their relationship may be crucial for career, job, and policy decisions. This paper presents estimates of a possible linear relationship between these two variables using data collected from workers across the U.S. However, although the results from our models are statistically significant, they cannot, and should not, be considered “complete”, since the estimator may still suffer from further bias:
First, internal validity may not have been attained. Omitted variable bias may still exist: “exper” and “female” are by no means the only omitted variables that can distort the model. One might argue that “wealth” may also cause bias, since wealthy families can use their money to place their children in high-paying positions, and children from wealthy families typically receive better education than others. People may also misreport their education and experience to save face, making our data susceptible to errors-in-variables bias. Simultaneous causality may pose a problem as well, since a person’s hourly earnings can also determine his or her education time. For instance, the person in question may be working to pay for a master’s degree or to repay school-related debts.
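Errors-in-variables bias can be illustrated with a small simulation. The sketch below assumes classical measurement error (random reporting noise unrelated to true education), under which the OLS slope shrinks toward zero by the factor var(educ) / (var(educ) + var(noise)). All parameter values are hypothetical; face-saving misreporting of the kind described above may be systematic rather than classical, in which case the size and even the direction of the bias can differ.

```python
import numpy as np


def ols_slope(x: np.ndarray, y: np.ndarray) -> float:
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


rng = np.random.default_rng(0)
n = 200_000
true_beta = 0.6  # hypothetical return to a year of education

educ = rng.normal(12.5, 2.8, n)                 # true years of education (hypothetical)
wage = 1.0 + true_beta * educ + rng.normal(0, 3.0, n)
reported_educ = educ + rng.normal(0, 1.5, n)    # classical reporting error

slope_true = ols_slope(educ, wage)
slope_noisy = ols_slope(reported_educ, wage)
reliability = 2.8**2 / (2.8**2 + 1.5**2)        # predicted attenuation factor

print(f"slope with true educ:     {slope_true:.3f}")
print(f"slope with reported educ: {slope_noisy:.3f}")
print(f"predicted noisy slope:    {true_beta * reliability:.3f}")
```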
The scatterplot shown in Part d suggests only a weak linear relationship, so our linear results may rest on an incorrect functional form. The log-log model discussed above appears to be more suitable, given the nonlinear spread of the data in the scatterplot. The presence of outliers also suggests the possibility of sample selection bias.
Second, it is difficult to claim that our model is externally valid. For the model to be externally valid, the coefficients should not vary greatly when the same model is applied to another country with a background and setting similar to the U.S. However, places differ drastically in political structures, social norms, legal responsibilities, geographical conditions, etc. Given the time constraints we faced when constructing the model, testing for external validity was beyond the scope of this project.
Despite these imperfections, our analysis may still serve as reference material for researchers interested in this topic. They may be able to learn from our limitations and improve the analysis so that the results are internally and externally valid. All in all, any academic work has some value, large or small, for those interested.
Part l: Bibliography
Part m: Reference
CBIA. (2016, March 6). Employers Raising Their Educational Requirements. Retrieved November 22, 2020, from https://www.cbia.com/news/hr-safety/employers-raising-education-requirements/
Stock, J. H., & Watson, M. W. (2011). Linear Regression with Multiple Regressors. In Introduction to Econometrics (3rd ed., p. 183). Pearson.
Part n: Appendices