
FOREIGN TRADE UNIVERSITY

COLORADO STATE UNIVERSITY

-----------------***-----------------

GROUP 10 PROJECT

SUBJECT: INTRODUCTION TO ECONOMETRICS

Members: Nguyen Kim Son – 1810140058

Tran Hong Quan – 1810140056

Nguyen Xuan Thanh – 1811140095

Ho Thien Huong – 1811140084

Vu Ha Phuong – 1811140094

Do Thi Phuong Thao – 1811140096

Nguyen Quynh Chi – 1810140011

Class: ECON-335 (FTU)

Professor: Anita Pena


Honor Pledge: “We did not give, receive, or use any unauthorized assistance on this project.”

Signed: Son, Quan, Thanh, Huong, Phuong, Thao, Chi

Table of Contents

Part a: Statement of research question

Part b: Introduction

Part c: Formulation of the baseline linear model

Part d: Data description

Part e: Estimation of linear models

Part f: Interpretations of linear model

Part g: Discussion of Single versus Multivariate Regressions

Part h: Discussion on the measures of fit

Part i: Empirical Results from a Nonlinear Model

Part j: Interpretation of Nonlinear Model(s)

Part k: Summary and Discussion

Part m: Reference

Part n: Appendices

Part a: Statement of research question

What is the effect of education time on average hourly earnings?

Part b: Introduction

People tend to believe that you can only be successful and affluent if you study for a long

time. The length of education draws considerable attention from recruiters of large companies

and can be a key factor in making or breaking a job application. According to a study conducted

by CareerBuilder, 32% of employers, in the past 5 years, have increased their demand for

newcomers with at least a master’s degree (CBIA, 2016). Generally, it can be agreed upon that

longer education time results in better academic skills, and it is understandable that employers

would want their personnel to be as intelligent as possible, and the standard way to increase your

knowledge is by attending more schools and lectures. However, some of the most successful

people in the world did not spend much time on campus. Steve Jobs and Steve Wozniak

quit school, Bill Gates was a Harvard dropout, Michael Dell left his university in his first

year, Mark Zuckerberg never earned a master’s degree, and the rest is history. Thus, there are indeed cases

which suggest that education time does not have such a great impact on income. Since neither way of

thinking is clearly more persuasive than the other, the purpose of this paper is to try to answer whether

education time actually has some form of effect on a person’s hourly income by the use of

regression analysis.

Having a thorough understanding of this relationship is important. On the one hand, students

can use this type of information to make decisions on how long they should stay on campus

and whether they should continue their education beyond the undergraduate level. On the other

hand, our policymakers can also refer to the data of this paper to see whether the education

system is actually doing its job well. For example, if a longer education time really

had a negative effect on average income, an argument calling for structural reforms could be

made.

Part c: Formulation of the baseline linear model

In order to construct a linear regression model, the first thing that needs to be done is

identifying the independent variables and dependent variables.

As stated above, we want to estimate the effect of education time on average hourly

earnings. Therefore, education time (“educ”) will be the independent variable, and average

hourly earnings (“wage”) will be the dependent variable of our model:

$Wage = \beta_1 Educ + \beta_2 Exper + \beta_3 Female + u_i$

In the equation, it can be seen that there are two other independent variables aside from

“educ”, which are “female” and “exper”.

“Female” here is a binary variable for the gender which takes a value of 1 if the person is

female, and a value of 0 if the person is male. “Exper” is a variable which measures the years of

potential work experience that the person may have at one point in time. As more experienced

people know the way their jobs work better than others, they are likely to receive higher income

as they use their accumulated experience to perform their duties. Thus, a positive relationship

between “exper” and “wage” is expected (β2 > 0). On the contrary, because of the glass ceiling

that women usually face in the workplace, we anticipate a negative relationship between

“female” and “wage” (β3 < 0), ceteris paribus.

No matter how well we conduct our research, education time alone may not be sufficient to

explain all of the variance in wage, and omitted variable bias is likely to arise if we limit the model to

just one independent variable. Omitted variable bias occurs when “the omitted variable is

correlated with the included regressor, and when the omitted variable is a determinant of the

dependent variable” (Stock & Watson, 2011, p. 183).
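
The two conditions in this definition can be written as a single expression. The block below is a sketch of the standard large-sample result for one omitted variable, using generic symbols that are not in the original text: Y is the outcome, X the included regressor, Z the omitted variable, and \tilde{\beta}_1 the slope from the regression of Y on X alone. It assumes u is uncorrelated with X and Z, and the amsmath package for typesetting.

```latex
Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i
\quad\Longrightarrow\quad
\tilde{\beta}_1 \xrightarrow{\;p\;} \beta_1 + \beta_2 \, \frac{\operatorname{Cov}(X_i, Z_i)}{\operatorname{Var}(X_i)}
```

The second term is zero unless Z both affects Y (\beta_2 \neq 0) and is correlated with X, which are exactly the two conditions quoted above. Its sign is the product of the signs of \beta_2 and Cov(X, Z).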

Here, we can see that both of the extra variables, “female” and “exper” can affect the wages

received by a person, as explained above. In addition, they correlate with our main independent

variable “educ” in one way or another. Firstly, females typically get fewer opportunities for long

education than males, because of certain lasting prejudices towards women in society. For

example, many people are still of the (sexist) opinion that females do not have to learn much

academically since their job is to stay at home in the kitchen, thereby limiting their chances to go

to school for longer periods of time. Secondly, the longer a person studies at school, the less

likely he or she is to accumulate years of working experience on the way, since the extra year of

schooling can be understood as a form of opportunity cost: people could have shortened their

education time to start working sooner, and begin to gather experience through interaction with

the workplace, the employers, coworkers, etc.

In short, we choose to include “exper” and “female” because omitting them would be the most

likely to bias our model’s results.

Part d: Data description

Our data are cross-sectional, collected from a sample of 526 employed individuals in the

United States, and are available on the Gretl program’s website.

The main variables, “wage” and “educ”, are measured in dollars per hour and years, respectively,

while the extra variable “exper” is also measured in years. Since “female” is a dummy variable,

it does not have a unit of measurement.

Table 1: Summary statistics for variables, 526 observations

Variables                            Mean     Standard Deviation   Minimum   Maximum
Average hourly earnings (Wage, $)    5.896    3.693                0.530     24.980
Education (Educ, years)              12.563   2.769                0         18
Experience (Exper, years)            17.017   13.572               1         51
Female (dummy)                       0.479    0.500                0         1
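
The four statistics reported in Table 1 can be computed with a few lines of code. The sketch below uses only the Python standard library. The function name `summarize` and the five data points are hypothetical illustrations, not values from the 526-observation sample, and the sample (n - 1) standard deviation is assumed.

```python
import statistics


def summarize(values):
    """Return mean, sample standard deviation, minimum and maximum of a sequence."""
    return {
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),  # sample formula, n - 1 in the denominator
        "min": min(values),
        "max": max(values),
    }


# Hypothetical values for illustration only (NOT the paper's data).
sample = {
    "wage": [3.0, 4.0, 5.5, 6.0, 8.0],
    "educ": [8, 10, 12, 14, 16],
}

summary = {name: summarize(values) for name, values in sample.items()}
for name, stats in summary.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

Applying the same function to each column of the real data set would give the rows of a table like Table 1.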

Scatterplot for “educ” and “wage”:

The scatterplot suggests a somewhat weak positive linear relationship between the main

variables “educ” and “wage”. There also appear to be some outliers above the regression line,

which may be due to errors in measurement or sampling. However, since

the outliers are not too far away from the rest of the data, we have decided to keep them in the analysis

to see whether they actually cause irregularities.

Part e: Estimation of linear models

Table 2: Linear Regression Models of Personal Income

$Wage = \beta_1 Educ + \beta_2 Exper + \beta_3 Female + u_i$

Dependent variable: average hourly earnings (wage); number of observations: 526

Regressor                    (1)         (2)         (3)
Educational level (Educ)     0.541***    0.644***    0.602***
                             (0.061)     (0.065)     (0.064)
Experience (Exper)                       0.070***    0.064***
                                         (0.011)     (0.010)
Female (Female)                                      -2.155***
                                                     (0.259)
Intercept                    -0.905      -3.340***   -1.734**
                             (0.725)     (0.865)     (0.858)

F-statistics and Summary Statistics
SER                          3.378       3.257       3.078
Adjusted R2                  0.163       0.222       0.305
F-statistic: 48.8245

Standard errors are given in parentheses below the coefficients.

Part f: Interpretations of linear model

The first regression function relates the dependent variable to our main independent variable, “wage” and

“educ”, respectively. According to the information obtained in column 1 of table 2, the

coefficient of “educ” is 0.541. This means for each year of increased education, the average

hourly income received is expected to increase by $0.541.

However, when we introduce another regressor, “exper”, into our model, the estimated effect

increases: for each additional year of education, average hourly income is expected to

increase by $0.644. The coefficient changes again when the third and final

regressor, “female”, is added, this time downward,

to 0.602. This result in column 3 states that, holding “exper”

and “female” constant, the expected gain from an additional year on campus

is approximately $0.602/hour.

The t-statistics obtained for the “educ” coefficients in columns 1, 2, and 3 are 8.837, 9.883, and

9.388, respectively. These numbers are all greater than the two-sided 1% critical value, which is 2.58.

Therefore, we can comfortably reject the null hypothesis that each of these coefficients is equal

to zero. In other words, our results are statistically significant.
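
The significance test described above is a ratio of each coefficient to its standard error, compared with the critical value. The sketch below repeats that arithmetic using the rounded “educ” coefficients and standard errors printed in Table 2. Because those inputs are rounded to three decimals, the resulting t-statistics are close to, but not identical to, the reported 8.837, 9.883, and 9.388. The names `t_statistic` and `educ_estimates` are illustrative.

```python
# "educ" coefficients and standard errors as printed in Table 2 (rounded to 3 decimals).
educ_estimates = {
    "column 1": (0.541, 0.061),
    "column 2": (0.644, 0.065),
    "column 3": (0.602, 0.064),
}

CRITICAL_VALUE_1PCT = 2.58  # two-sided 1% critical value, large-sample normal approximation


def t_statistic(coefficient, standard_error):
    """t-statistic for the null hypothesis that the coefficient equals zero."""
    return coefficient / standard_error


t_stats = {name: t_statistic(b, se) for name, (b, se) in educ_estimates.items()}
rejected = {name: abs(t) > CRITICAL_VALUE_1PCT for name, t in t_stats.items()}

for name in educ_estimates:
    print(f"{name}: t = {t_stats[name]:.2f}, reject H0 at 1%: {rejected[name]}")
```

All three ratios are far above 2.58, so the conclusion does not depend on the rounding.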

The values of the intercept across the three functions are negative, and since wage cannot be

negative, it is best not to interpret the intercepts here.

All in all, the relationship between our 2 variables is consistent with what we

predicted in Part b: education time has a positive linear relationship with average hourly

earnings.
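
The column (1) estimate discussed in this part comes from a regression with a single regressor, for which the least-squares slope is the sample covariance of the two variables divided by the sample variance of the regressor. The sketch below shows that calculation on five hypothetical data points (not the paper's sample). The function name `simple_ols` is illustrative, and the robust standard errors a full analysis would report are omitted.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squares of deviations from the means.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must vary across observations")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical years of education and hourly wages, for illustration only.
educ = [8, 10, 12, 14, 16]
wage = [3.0, 4.0, 5.5, 6.0, 8.0]

intercept, slope = simple_ols(educ, wage)
print(f"wage = {intercept:.2f} + {slope:.2f} * educ")
```

For these made-up numbers the slope is 0.6 and the intercept is -1.9, which also illustrates how a negative intercept can arise even though no observed wage is negative.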

Part g: Discussion of Single versus Multivariate Regressions

The coefficient of the first extra variable, “exper”, as seen in column 2 of table 2, is 0.07,

which means that for each additional year of working experience, average hourly earnings would

increase by $0.07. Nevertheless, when we move our attention to column 3, where the other extra

variable “female” is also introduced, this effect decreases to 0.064, indicating that we would

earn slightly less for each year spent gathering more experience. Perhaps the most

unfortunate finding here is our model’s suggestion of gender discrimination in the workplace:

holding education and experience constant, females earn $2.155 less per hour than males on average.

Omitted variable bias, as stated, happens when the omitted variables are determinants of the

dependent variable and are correlated with at least one included independent variable. While the

number of omitted variables in a regression model can be infinite, not all of them cause “bias”.

However, the results reported in columns 2 and 3 of table 2 indicate that we are indeed seeing the

bias in action: the noticeable change in the slope coefficient of the main variable “educ”

observed each time we add a new variable implies that the earlier estimates were picking up part of

the effect of the omitted variables.

In part c, we speculated that the “exper” variable has a negative correlation with the

independent variable “educ” and a positive correlation with the dependent variable “wage”.

Therefore, we expected the slope coefficient for “educ” to be underestimated when “exper” is omitted. As it turns out in the

results of the model, the slope coefficient for “educ” has, indeed, increased as a result of adding

“exper”, suggesting that our model is going in the right direction. Furthermore, our postulation

that “female” is negatively correlated with both “educ” and “wage” appears to be appropriate as

well: omitting it should lead us to overestimate the slope coefficient, and there is indeed a decrease in the value

of the slope from column 2 to column 3, where the “female” variable is integrated.
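
The reasoning in this paragraph follows a simple sign rule: the direction of the bias from one omitted variable is the product of two signs, its correlation with the included regressor and its effect on the dependent variable. The sketch below encodes that rule. The function name `ovb_direction` is hypothetical, and the rule is exact only when a single variable is omitted at a time, which is a simplification of the three-variable model used here.

```python
def ovb_direction(sign_corr_with_regressor, sign_effect_on_outcome):
    """Direction of the bias in the included regressor's coefficient.

    Both arguments must be +1 or -1:
    - sign_corr_with_regressor: sign of the correlation between the omitted
      variable and the included regressor (here "educ")
    - sign_effect_on_outcome: sign of the omitted variable's effect on the
      dependent variable (here "wage")
    """
    if sign_corr_with_regressor not in (1, -1) or sign_effect_on_outcome not in (1, -1):
        raise ValueError("signs must be +1 or -1")
    product = sign_corr_with_regressor * sign_effect_on_outcome
    if product > 0:
        return "upward (coefficient overestimated)"
    return "downward (coefficient underestimated)"


# Signs as argued in the text: "exper" is negatively correlated with "educ"
# and raises "wage"; "female" is negatively correlated with "educ" and lowers "wage".
exper_bias = ovb_direction(-1, +1)
female_bias = ovb_direction(-1, -1)

print("omitting exper :", exper_bias)
print("omitting female:", female_bias)
```

These directions match the movements in Table 2: the “educ” coefficient rises from 0.541 to 0.644 once “exper” is added and falls from 0.644 to 0.602 once “female” is added.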

In essence, by adding the omitted variables and treating them as independent variables

alongside “educ”, we can reasonably assume that the bias associated with those two variables has been

removed.

Part h: Discussion on the measures of fit

In our final regression function, the R-squared and adjusted R-squared are 0.309 and 0.305,

respectively. Since the difference between these 2 measures of fit is small, we can say

that about 30-31% of the variance in average hourly earnings is explained by our regression

model.
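
The link between the two measures of fit can be checked by hand. The sketch below applies the usual degrees-of-freedom adjustment to the reported R-squared of 0.309, with n = 526 observations and k = 3 regressors. The function name `adjusted_r_squared` is illustrative, and the formula assumed is the standard one with n - k - 1 in the denominator.

```python
def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """Adjusted R-squared: 1 - (n - 1) / (n - k - 1) * (1 - R^2)."""
    if n_obs - n_regressors - 1 <= 0:
        raise ValueError("not enough observations for this number of regressors")
    return 1.0 - (n_obs - 1) / (n_obs - n_regressors - 1) * (1.0 - r_squared)


# Values reported for the final regression (column 3 of Table 2).
adj = adjusted_r_squared(r_squared=0.309, n_obs=526, n_regressors=3)
print(round(adj, 3))
```

With 526 observations and only 3 regressors the penalty factor (n - 1)/(n - k - 1) is very close to 1, which is why the two measures differ only in the third decimal place.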

The SER’s value, 3.078, indicates that the typical prediction error made by the regression

model is about $3.078 of hourly income. The reduction in SER across the three regression

functions suggests that predictions about average earnings are somewhat more precise if they

are made using the regression with all 3 variables than if they are made using the regression with

only “educ” as a regressor.
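
For reference, the SER discussed here is assumed to follow the degrees-of-freedom-adjusted definition used in Stock and Watson, where \hat{u}_i are the OLS residuals, n = 526, and k is the number of regressors (1, 2, or 3 across the three columns):

```latex
SER = \sqrt{\frac{SSR}{n - k - 1}}, \qquad SSR = \sum_{i=1}^{n} \hat{u}_i^{2}
```

Because SER is measured in the units of the dependent variable, the value 3.078 is read in dollars per hour.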

Part i: Empirical Results from a Nonlinear Model

Although the fit of the linear regression in part h is reasonable, it has been suggested

that we should include natural log terms of “educ” and “wage” in order to test whether their

relationship is actually linear. In addition, log terms allow for the measurement of

elasticity, which would be helpful in future work. The results of the nonlinear regression with

log terms can be seen in Table 3 below:

Table 3: Non-Linear Regression Models of Personal Income

$\ln(Wage) = \beta_1 \ln(Educ) + \beta_2 Exper + \beta_3 Female + u_i$

Dependent variable: natural log of average hourly earnings (ln(wage)); number of observations: 526

Regressor                (1)
ln(Educ)                 0.971***
                         (0.100)
Experience (Exper)       0.010***
                         (0.001)
Female (Female)          -0.373***
                         (0.380)
Intercept                -0.793***
                         (0.265)

Summary Statistics
SER                      0.435
Adjusted R2              0.329

Part j: Interpretation of Nonlinear Model(s)

The coefficients of the log-log regression function are all statistically significant at the 1% level,

since the p-values for all the coefficients are lower than 1%.

For each 1% increase in education time, average hourly earnings are estimated to

increase by about 0.97%, holding the other regressors constant. Females, on average, earn approximately 37.3% less per hour than males. An additional year

of accumulated experience is associated with a 0.95% increase in average hourly earnings.
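
Reading a coefficient on a level regressor as a percentage change is an approximation that works well for small coefficients and less well for large ones. The sketch below converts the rounded Table 3 coefficients into exact percentage changes, using 100 * (exp(b) - 1) for the level regressors and the elasticity formula for ln(Educ). The function names are illustrative, and the inputs are the rounded values printed in the table rather than the unrounded estimates.

```python
import math


def percent_change_from_log_coefficient(coefficient, delta=1.0):
    """Exact % change in y when a level regressor rises by `delta` and the dependent variable is ln(y)."""
    return 100.0 * (math.exp(coefficient * delta) - 1.0)


def percent_change_from_elasticity(elasticity, percent_change_in_x):
    """Exact % change in y when x rises by `percent_change_in_x` percent in a log-log relation."""
    return 100.0 * ((1.0 + percent_change_in_x / 100.0) ** elasticity - 1.0)


# Rounded coefficients as printed in Table 3.
female_gap = percent_change_from_log_coefficient(-0.373)
exper_effect = percent_change_from_log_coefficient(0.010)
educ_effect = percent_change_from_elasticity(0.971, 1.0)

print(f"female: {female_gap:.1f}%")
print(f"one more year of experience: {exper_effect:.2f}%")
print(f"1% more education: {educ_effect:.2f}%")
```

For the “female” coefficient the exact figure is roughly a 31% gap rather than 37.3%, while for the small “exper” coefficient and the education elasticity the approximate and exact readings almost coincide.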

Part k: Summary and Discussion

Education and income are concerns of almost every individual on planet Earth.

Understanding the mechanics behind their relationship may be crucial for one’s career, job, and

policy decision making. This paper presents some estimates of a possible linear relationship

between these two variables by looking at data collected from workers across the U.S. However,

while the results received from our models are significant, they cannot, and should not, be

considered “complete”, since there are still possibilities for further bias regarding the estimator:

Firstly, internal validity may not have been attained. Omitted variable bias may still exist.

“Exper” and “female” are by no means the only omitted variables that can distort the model. One

might argue that “wealth” may also cause bias, since families with a high level of wealth can use

their money to get their children into good positions with high wages, and wealthy families

typically enjoy better education than others. People may lie about their education time and

experience to save face, making our variables’ data susceptible to errors-in-variables bias.

Simultaneous causality may also pose a problem here, since a person’s average hourly income

can be a determinant of his/her education time. For instance, the person in question may be

someone working for money to spend on his/her master’s degree, or to pay school-related debts.

Our scatterplot shown in Part d suggests a weak linear relationship; therefore, our model may have

been specified with the wrong functional form. The log-log function discussed appears to be a more

suitable model, given the nonlinear spread of data in the scatterplot. The presence of outliers also

suggests the possibility of sample selection bias.

Secondly, it is challenging to claim that our model is externally valid. To be externally valid,

the coefficients should not vary greatly when we apply this exact model to another country with

similar backgrounds and settings as the U.S. As we know, each place has drastically different

political structures, social norms, legal responsibilities, geographical conditions, etc. Given the

time constraints we had when constructing the model, external validity may be too much to test

for.

Despite its imperfections, our analysis may still be useful as reference material for

researchers interested in this topic. They may be able to learn from our setbacks and improve the

analysis, so that the results can be internally and externally valid. All in all, any academic work

will have a certain type of value, be it large or small, for those interested.

Part l: Bibliography

Part m: Reference

Employers Raising Their Educational Requirements. (2016, March 06). CBIA. Retrieved

November 22, 2020, from

https://www.cbia.com/news/hr-safety/employers-raising-education-requirements/

Stock, J. H., & Watson, M. W. (2011). Linear Regression with Multiple Regressors. In

Introduction to Econometrics (3rd ed., p. 183). Pearson.

Part n: Appendices
