
Regression Analysis of Student Performance in High School Examination (Evidence from USA)

Student: Shahzad Munir ID: 27720170155975


Home Work (2)
Micro-Econometrics and Application

The purpose of this study is to analyze the marks secured by high school students in the United States
using linear regression. The data were obtained from Kaggle
(https://www.kaggle.com/spscientist/students-performance-in-exams). The data contain eight variables:
Gender, Race/Ethnicity, Parental Level of Education, Lunch, Test Preparation Course, Math Score,
Reading Score and Writing Score. Gender, Race/Ethnicity, Parental Level of Education, Lunch and
Test Preparation Course are categorical and are treated as independent variables. Math Score,
Reading Score and Writing Score are treated as dependent variables. The analysis was carried out in R,
and the code is given in the Appendix.

1 Regression Analysis

1.1 Effect of Gender on Scores

In this section I analyze the effect of gender on the scores obtained by the students. For this purpose I
regress the marks obtained in mathematics, reading and writing on the gender variable (male or female).
The Ordinary Least Squares (OLS) results are given in Table 1.
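The regression described here can be written out explicitly. The notation below (Score, Male, the beta coefficients and the error term u) is introduced for illustration and does not appear in the original text; it is a sketch of the specification implied by regressing each score on the gender variable.

```latex
\text{Score}_i = \beta_0 + \beta_1\,\text{Male}_i + u_i,
\qquad
\text{Male}_i =
\begin{cases}
1 & \text{if student } i \text{ is male},\\
0 & \text{if student } i \text{ is female}.
\end{cases}
```

Under this coding, beta_0 is the average score of female students and beta_1 is the difference between the male and female averages, measured in score points.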

Table 1:

                                          Dependent variable:
                                          `Math Score`    `Reading Score`   `Writing Score`
                                          (1)             (2)               (3)

Gendermale                                5.095***        -7.135***         -9.156***
                                          (0.946)         (0.896)           (0.917)

Constant                                  63.633***       72.608***         72.467***
                                          (0.657)         (0.622)           (0.637)

Note: *p<0.1; **p<0.05; ***p<0.01

The results in Table 1 show that male students scored on average 5.095 points more in mathematics
than female students, but 7.135 and 9.156 points less in reading and writing respectively. The gender
effect is statistically significant at the 1% level for all three subjects. Table 2 reports the gender
effect after including parental level of education as a control variable.
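With a single two-level regressor, the OLS slope is simply the difference between the two group means, which is why the Table 1 coefficients read as point differences. The sketch below illustrates this on synthetic data; the data frame `toy`, its column names and the simulated values are hypothetical and are not the Kaggle data.

```r
# Synthetic data for illustration only (NOT the Kaggle file)
set.seed(1)
n <- 200
gender <- factor(sample(c("female", "male"), n, replace = TRUE))
math <- 64 + 5 * (gender == "male") + rnorm(n, sd = 15)
toy <- data.frame(gender, math)

# Regress the score on the gender factor; "female" is the reference level
fit <- lm(math ~ gender, data = toy)

# Plain group means for comparison
group_means <- sapply(split(toy$math, toy$gender), mean)

# Intercept = female mean; slope = male mean minus female mean
coef(fit)
group_means["male"] - group_means["female"]
```

The same logic applies to each column of Table 1: the constant is the female average and the Gendermale coefficient is the male average minus the female average.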

Table 2:

                                          Dependent variable:
                                          `Math Score`    `Reading Score`   `Writing Score`
                                          (1)             (2)               (3)

Gendermale                                5.366***        -6.836***         -8.778***
                                          (0.933)         (0.881)           (0.890)

`Parental Education`bachelor's degree     1.568           1.994             3.385**
                                          (1.677)         (1.583)           (1.600)

`Parental Education`high school           -5.975***       -5.930***         -7.071***
                                          (1.444)         (1.363)           (1.377)

`Parental Education`master's degree       2.333           3.846*            5.012**
                                          (2.158)         (2.037)           (2.059)

`Parental Education`some college          -0.757          -1.465            -1.052
                                          (1.391)         (1.313)           (1.327)

`Parental Education`some high school      -4.462***       -3.893***         -4.884***
                                          (1.479)         (1.396)           (1.411)

Constant                                  65.321***       74.192***         74.088***
                                          (1.084)         (1.023)           (1.034)

Note: *p<0.1; **p<0.05; ***p<0.01

The gender effect remains statistically significant after parental level of education is included. On
average, male students scored higher in mathematics and lower in reading and writing than female
students. Students whose parents hold a master's or bachelor's degree scored higher than those whose
parents hold an associate's degree (the reference category), although only some of these differences are
statistically significant. Students whose parents have some college, high school or some high school
education scored lower than those whose parents hold an associate's degree; the high school and some
high school coefficients are significant at the 1% level, while the some college coefficients are not
significant.
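As a worked reading of Table 2, fitted scores follow from adding the relevant coefficients to the constant. The arithmetic below assumes that female students and an associate's degree are the omitted reference categories, which is how the coefficient names in Table 2 read.

```latex
\begin{aligned}
\widehat{\text{Math}}(\text{male, master's degree}) &= 65.321 + 5.366 + 2.333 = 73.020,\\
\widehat{\text{Math}}(\text{female, high school}) &= 65.321 - 5.975 = 59.346.
\end{aligned}
```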

1.2 Effect of Race/Ethnicity on Scores

This section discusses the impact of Race/Ethnicity on students' marks. The students are divided into
five groups: A, B, C, D and E, with Group A as the reference category. For this purpose I regress the
math, reading and writing scores on the Race/Ethnicity variable. The results are given in Table 3.

Table 3:

                                          Dependent variable:
                                          `Math Score`    `Reading Score`   `Writing Score`
                                          (1)             (2)               (3)

`Race/Ethnicity`group B                   1.823           2.678             2.926
                                          (1.897)         (1.858)           (1.928)

`Race/Ethnicity`group C                   2.835           4.429**           5.153***
                                          (1.770)         (1.734)           (1.800)

`Race/Ethnicity`group D                   5.733***        5.356***          7.471***
                                          (1.812)         (1.775)           (1.842)

`Race/Ethnicity`group E                   12.192***       8.354***          8.733***
                                          (2.002)         (1.961)           (2.035)

Constant                                  61.629***       64.674***         62.674***
                                          (1.565)         (1.533)           (1.591)

Note: *p<0.1; **p<0.05; ***p<0.01

From Table 3, students from Group B, Group C and Group D performed better on average than students
from Group A, the reference category. Students from Group E secured the highest marks, with the largest
coefficients in all three subjects. The coefficients for Group D and Group E are statistically significant
for all three subjects, those for Group C are significant for reading and writing only, and those for
Group B are not significant.
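The significance stars in Table 3 can be checked by hand from the reported coefficients and standard errors. The ratios below use the large-sample normal critical values 1.645 (10%) and 1.96 (5%) as an approximation; that choice of critical values is an assumption made here for illustration.

```latex
t = \frac{\hat\beta}{\operatorname{se}(\hat\beta)}:
\qquad
t_{\text{C, math}} = \frac{2.835}{1.770} \approx 1.60,
\qquad
t_{\text{C, reading}} = \frac{4.429}{1.734} \approx 2.55,
\qquad
t_{\text{B, writing}} = \frac{2.926}{1.928} \approx 1.52.
```

The Group C math ratio falls below 1.645, so it carries no star, while the Group C reading ratio exceeds 1.96 and is starred at the 5% level.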

1.3 Effect of Test Preparation Course on Scores

The objective of this section is to analyze the impact of the Test Preparation Course on the scores
obtained by students. The test preparation variable has two groups: students who completed the
preparation course and students who did not. The OLS results are given below.

Table 4:

                                          Dependent variable:
                                          `Math Score`    `Reading Score`   `Writing Score`
                                          (1)             (2)               (3)

`Test Preparation Course`none             -5.618***       -7.360***         -9.914***
                                          (0.985)         (0.935)           (0.952)

Constant                                  69.696***       73.894***         74.419***
                                          (0.789)         (0.749)           (0.763)

Note: *p<0.1; **p<0.05; ***p<0.01

From Table 4, students who did not complete the test preparation course scored lower than students
who completed it: on average 5.618, 7.360 and 9.914 points lower in Math, Reading and Writing
respectively. The coefficients of Test Preparation Course (none) are statistically significant for all
subjects. In Table 5 I include Lunch as a control variable and check the impact of the test preparation
course on students' scores. Lunch is a categorical variable with two groups: students who received
free/reduced lunch and students who received lunch at the standard price. The OLS results are given in
Table 5.

Table 5:

                                          Dependent variable:
                                          `Math Score`    `Reading Score`   `Writing Score`
                                          (1)             (2)               (3)

`Test Preparation Course`none             -5.808***       -7.481***         -10.050***
                                          (0.919)         (0.908)           (0.919)

lunchstandard                             11.212***       7.128***          7.972***
                                          (0.921)         (0.910)           (0.921)

Constant                                  62.586***       69.374***         69.364***
                                          (0.940)         (0.928)           (0.940)

Note: *p<0.1; **p<0.05; ***p<0.01

The results for Test Preparation Course (none) in Table 5 are close to those in Table 4. Table 5 also
shows that students who received lunch at the standard price performed better than students who
received free/reduced lunch.
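The comparison between Table 4 and Table 5 amounts to fitting the same regression with and without a control and checking how the coefficient of interest moves. A minimal sketch on synthetic data follows; the data frame `toy`, the variable names `prep`, `lunch` and `score`, and the simulated effect sizes are hypothetical, and the two regressors are generated independently of each other.

```r
# Synthetic data for illustration only (NOT the Kaggle file)
set.seed(2)
n <- 300
prep  <- factor(sample(c("completed", "none"), n, replace = TRUE))
lunch <- factor(sample(c("free/reduced", "standard"), n, replace = TRUE))
score <- 62 - 6 * (prep == "none") + 11 * (lunch == "standard") + rnorm(n, sd = 14)
toy <- data.frame(score, prep, lunch)

# Same outcome, without and with the lunch control
short <- lm(score ~ prep, data = toy)
long  <- lm(score ~ prep + lunch, data = toy)

# Estimate and standard error of the prep coefficient in both models
rbind(
  short = summary(short)$coefficients["prepnone", c("Estimate", "Std. Error")],
  long  = summary(long)$coefficients["prepnone", c("Estimate", "Std. Error")]
)
```

When the control is roughly unrelated to the regressor of interest, as in this simulation, the estimate changes little while the standard error tends to fall because the control absorbs residual variation. Tables 4 and 5 show the same pattern for the math score (coefficient -5.618 versus -5.808, standard error 0.985 versus 0.919).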

1.4 Combine Effect

In this section I include all the independent variables (Gender, Parental Level of Education,
Race/Ethnicity, Lunch and Test Preparation Course) in a single regression for each dependent variable
(Math Score, Reading Score and Writing Score). The OLS results are given in Table 6.

In Table 6, Race/Ethnicity, Parental Level of Education, Lunch and Test Preparation Course are treated
as control variables, and the effect of Gender on exam scores is analyzed. It is observed that male

Table 6:

                                          Dependent variable:
                                          `Math Score`    `Reading Score`   `Writing Score`
                                          (1)             (2)               (3)

Gendermale                                4.995***        -7.071***         -9.096***
                                          (0.839)         (0.823)           (0.795)

`Race/Ethnicity`group B                   2.041           1.326             1.220
                                          (1.700)         (1.667)           (1.610)

`Race/Ethnicity`group C                   2.470           2.274             2.413
                                          (1.592)         (1.561)           (1.508)

`Race/Ethnicity`group D                   5.341***        4.106**           5.931***
                                          (1.624)         (1.592)           (1.539)

`Race/Ethnicity`group E                   10.135***       5.514***          5.137***
                                          (1.802)         (1.766)           (1.707)

`Parental Education`bachelor's degree     1.966           2.156             3.485**
                                          (1.502)         (1.473)           (1.423)

`Parental Education`high school           -4.803***       -4.900***         -5.814***
                                          (1.297)         (1.272)           (1.229)

`Parental Education`master's degree       2.888           4.205**           5.183***
                                          (1.938)         (1.900)           (1.836)

`Parental Education`some college          -0.583          -1.280            -0.920
                                          (1.247)         (1.223)           (1.181)

`Parental Education`some high school      -4.249***       -4.049***         -5.322***
                                          (1.333)         (1.307)           (1.263)

lunchstandard                             10.877***       7.246***          8.203***
                                          (0.873)         (0.856)           (0.827)

`Test Preparation Course`none             -5.495***       -7.362***         -10.059***
                                          (0.876)         (0.859)           (0.830)

Constant                                  57.631***       71.278***         71.914***
                                          (1.872)         (1.836)           (1.774)

Note: *p<0.1; **p<0.05; ***p<0.01

students performed better than female students in mathematics, scoring 4.995 points higher on average.
On the other hand, female students scored better in Reading and Writing: male students scored 7.071
and 9.096 points lower in Reading and Writing respectively. The gender coefficients are statistically
significant at the 1% level. The results also show that students who completed the Test Preparation
Course achieved better scores than those who did not complete or did not take it, and that students
who received lunch at the standard price performed better than those who received free/reduced lunch.
Students whose parents' highest education is high school or some high school scored significantly
lower than students whose parents hold an associate's degree, the reference category. The findings in
Table 6 support all of the findings above.
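Fitting one specification to several outcomes, as done for Table 6, can be expressed compactly by building the formula once per outcome. The sketch below uses synthetic data with a reduced set of regressors; the data frame `toy`, its column names and the simulated effects are hypothetical and only illustrate the pattern.

```r
# Synthetic data for illustration only (NOT the Kaggle file)
set.seed(3)
n <- 250
toy <- data.frame(
  gender = factor(sample(c("female", "male"), n, replace = TRUE)),
  lunch  = factor(sample(c("free/reduced", "standard"), n, replace = TRUE)),
  prep   = factor(sample(c("completed", "none"), n, replace = TRUE))
)
toy$math    <- 60 + 5 * (toy$gender == "male") + 10 * (toy$lunch == "standard") + rnorm(n, sd = 13)
toy$reading <- 70 - 7 * (toy$gender == "male") +  7 * (toy$lunch == "standard") + rnorm(n, sd = 13)
toy$writing <- 70 - 9 * (toy$gender == "male") +  8 * (toy$lunch == "standard") + rnorm(n, sd = 13)

outcomes   <- c("math", "reading", "writing")
regressors <- c("gender", "lunch", "prep")

# Fit the same right-hand side for each outcome
fits <- lapply(outcomes, function(y) {
  lm(reformulate(regressors, response = y), data = toy)
})
names(fits) <- outcomes

# One column of coefficients per outcome, rows in the same order
coef_table <- sapply(fits, coef)
coef_table
```

Passing `data = toy` keeps every model tied to the same data frame, which avoids relying on attached variables.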

2 Bootstrap

In this section I re-estimate the regression coefficients of Table 6 using the bootstrap. The R code for
the bootstrap is given in the Appendix. The bootstrap estimates are given in Table 7, Table 8 and
Table 9 for Math Score, Reading Score and Writing Score respectively. Each bootstrap uses 1000
replications. In each table, row 1 is the constant and rows 2 to 13 follow the order of the regressors in
Table 6. The bootstrap estimates are very close to the original coefficients in Table 6: the bootstrap
bias is at most 0.13 in absolute value.
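The bootstrap described here resamples rows with replacement, refits the regression on each resample, and summarises the resulting coefficients. A minimal sketch with the boot package on synthetic data follows. The data frame `toy`, the function name `coef_fn` and the smaller replication count are hypothetical, and the summary columns assume that bootBias, bootSE and bootMed denote the mean deviation from the original estimate, the standard deviation and the median of the replicates.

```r
library(boot)  # recommended package distributed with R

# Synthetic data for illustration only (NOT the Kaggle file)
set.seed(4)
n <- 200
toy <- data.frame(
  gender = factor(sample(c("female", "male"), n, replace = TRUE)),
  lunch  = factor(sample(c("free/reduced", "standard"), n, replace = TRUE))
)
toy$math <- 60 + 5 * (toy$gender == "male") + 10 * (toy$lunch == "standard") + rnorm(n, sd = 13)

# Statistic: refit the regression on the resampled rows and return its coefficients
coef_fn <- function(data, index) {
  coef(lm(math ~ gender + lunch, data = data[index, ]))
}

bs <- boot(toy, coef_fn, R = 500)

# Summaries analogous to the columns of the bootstrap tables
boot_summary <- data.frame(
  original = bs$t0,
  bootBias = colMeans(bs$t) - bs$t0,
  bootSE   = apply(bs$t, 2, sd),
  bootMed  = apply(bs$t, 2, median)
)
boot_summary
```

Each row of `bs$t` holds the coefficients from one resample, so the column-wise standard deviation serves as the bootstrap standard error.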

Table 7: Bootstrap Statistics for Math Score

R original bootBias bootSE bootMed
1 1000.00 57.63 -0.06 1.85 57.57
2 1000.00 5.00 0.04 0.83 5.04
3 1000.00 2.04 -0.04 1.66 2.01
4 1000.00 2.47 0.05 1.50 2.57
5 1000.00 5.34 -0.01 1.51 5.37
6 1000.00 10.13 0.09 1.78 10.25
7 1000.00 1.97 0.06 1.50 2.04
8 1000.00 -4.80 0.00 1.32 -4.81
9 1000.00 2.89 0.00 1.81 2.84
10 1000.00 -0.58 -0.00 1.27 -0.58
11 1000.00 -4.25 0.05 1.42 -4.09
12 1000.00 10.88 0.01 0.90 10.91
13 1000.00 -5.49 -0.01 0.84 -5.52

Table 8: Bootstrap Statistics for Reading Score

R original bootBias bootSE bootMed
1 1000.00 71.28 0.02 1.89 71.28
2 1000.00 -7.07 -0.03 0.84 -7.08
3 1000.00 1.33 -0.01 1.74 1.30
4 1000.00 2.27 0.02 1.61 2.38
5 1000.00 4.11 0.03 1.65 4.16
6 1000.00 5.51 0.01 1.89 5.50
7 1000.00 2.16 -0.07 1.54 2.13
8 1000.00 -4.90 -0.01 1.22 -4.92
9 1000.00 4.21 0.01 1.86 4.23
10 1000.00 -1.28 -0.07 1.19 -1.31
11 1000.00 -4.05 0.05 1.41 -3.99
12 1000.00 7.25 -0.02 0.85 7.20
13 1000.00 -7.36 0.00 0.87 -7.36

Table 9: Bootstrap Statistics for Writing Score


R original bootBias bootSE bootMed
1 1000.00 71.91 -0.05 1.81 71.82
2 1000.00 -9.10 0.03 0.78 -9.04
3 1000.00 1.22 0.08 1.66 1.27
4 1000.00 2.41 0.06 1.56 2.51
5 1000.00 5.93 0.13 1.55 5.98
6 1000.00 5.14 0.02 1.78 5.13
7 1000.00 3.48 -0.04 1.37 3.47
8 1000.00 -5.81 0.02 1.23 -5.81
9 1000.00 5.18 -0.11 1.70 5.04
10 1000.00 -0.92 0.00 1.15 -0.90
11 1000.00 -5.32 -0.04 1.29 -5.39
12 1000.00 8.20 -0.00 0.85 8.20
13 1000.00 -10.06 -0.02 0.79 -10.10

References

1. https://www.kaggle.com/spscientist/students-performance-in-exams.

2. https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf.

3. https://github.com/kjhealy/latex-custom-kjh/blob/master/needs-listings/example.tex.

4. https://www.statmethods.net/advstats/bootstrapping.html.

5. https://www.datacamp.com/community/tutorials/linear-regression-R.

6. https://tex.stackexchange.com/questions/2832/how-can-i-have-two-tables-side-by-side.

7. https://tex.stackexchange.com/questions/297564/why-is-my-table-before-the-section-title.

Appendix
# Clear the workspace and load the required packages
rm(list = ls())
library(AER)
library(boot)
library(sandwich)
library(readxl)
library(stargazer)
library(xtable)

# Load the data and make its columns available by name
StudentsPerformance <- read_excel("F:/XIAMEN UNIVERSITY/COURSE WORK/4/Micro-Econometrics/Data/StudentsPerformance.xlsx")
attach(StudentsPerformance)
head(StudentsPerformance)

# Table 1: effect of gender
model1 <- lm(`Math Score` ~ Gender)
model2 <- lm(`Reading Score` ~ Gender)
model3 <- lm(`Writing Score` ~ Gender)
stargazer(model1, model2, model3, table.placement = "htbp!")

# Table 2: gender with parental education as a control
model4 <- lm(`Math Score` ~ Gender + `Parental Education`)
model5 <- lm(`Reading Score` ~ Gender + `Parental Education`)
model6 <- lm(`Writing Score` ~ Gender + `Parental Education`)
stargazer(model4, model5, model6, table.placement = "htbp!")

# Table 3: effect of race/ethnicity
model7 <- lm(`Math Score` ~ `Race/Ethnicity`)
model8 <- lm(`Reading Score` ~ `Race/Ethnicity`)
model9 <- lm(`Writing Score` ~ `Race/Ethnicity`)
stargazer(model7, model8, model9, table.placement = "htbp!")

# Table 4: effect of the test preparation course
model10 <- lm(`Math Score` ~ `Test Preparation Course`)
model11 <- lm(`Reading Score` ~ `Test Preparation Course`)
model12 <- lm(`Writing Score` ~ `Test Preparation Course`)
stargazer(model10, model11, model12, table.placement = "htbp!")

# Table 5: test preparation course with lunch as a control
model13 <- lm(`Math Score` ~ `Test Preparation Course` + lunch)
model14 <- lm(`Reading Score` ~ `Test Preparation Course` + lunch)
model15 <- lm(`Writing Score` ~ `Test Preparation Course` + lunch)
stargazer(model13, model14, model15, table.placement = "htbp!")

# Table 6: all regressors together
model16 <- lm(`Math Score` ~ Gender + `Race/Ethnicity` + `Parental Education` + lunch + `Test Preparation Course`)
model17 <- lm(`Reading Score` ~ Gender + `Race/Ethnicity` + `Parental Education` + lunch + `Test Preparation Course`)
model18 <- lm(`Writing Score` ~ Gender + `Race/Ethnicity` + `Parental Education` + lunch + `Test Preparation Course`)
stargazer(model16, model17, model18, table.placement = "htbp!")

# Bootstrap statistics: refit the Table 6 model on the resampled rows and return its coefficients
beta1_fn <- function(data, index) {
  coef(lm(`Math Score` ~ Gender + `Race/Ethnicity` + `Parental Education` + lunch + `Test Preparation Course`,
          data = data[index, ]))
}
beta2_fn <- function(data, index) {
  coef(lm(`Reading Score` ~ Gender + `Race/Ethnicity` + `Parental Education` + lunch + `Test Preparation Course`,
          data = data[index, ]))
}
beta3_fn <- function(data, index) {
  coef(lm(`Writing Score` ~ Gender + `Race/Ethnicity` + `Parental Education` + lunch + `Test Preparation Course`,
          data = data[index, ]))
}

# Tables 7 to 9: 1000 bootstrap replications per outcome
beta1 <- boot(StudentsPerformance, beta1_fn, R = 1000)
beta2 <- boot(StudentsPerformance, beta2_fn, R = 1000)
beta3 <- boot(StudentsPerformance, beta3_fn, R = 1000)

# Summarise each bootstrap as a data frame (R, original, bootBias, bootSE, bootMed).
# Assumption: the summary() method for boot objects comes from car, which AER attaches.
beta1 <- summary(beta1)
beta2 <- summary(beta2)
beta3 <- summary(beta3)
xtable(beta1)
xtable(beta2)
xtable(beta3)
