
MAE 301

Testing the Significance of Government Spending on High School Graduation Rates
Garrett Knapp
This study uses per-student spending, student-teacher ratios, and teacher salaries to model
high school dropout rates from the years 1980-2006. The study tests two hypotheses:
first, linear regressions on one variable at a time were conducted to test whether
high school dropout rates can be modeled by a single variable; second, all
three variables were combined using multiple linear regression to test whether
dropout rates can be modeled by a linear combination of these
variables. For each hypothesis, significance was tested at an alpha level of 0.05.

MAE 301 Final Project Dec. 5, 2014


Introduction:
At a time when even a college education does not guarantee a job after
graduation, the outlook is worse still for those who do not graduate from high
school. The median adult income for high school dropouts in 2006 was $23,000,
roughly half of the $42,000 median income for those with at least a diploma or
GED (Trends in High School Dropout Rates). Furthermore, a study by Levin and
Belfield estimates that each high school dropout costs the American economy
around $240,000 over his or her lifetime when accounting for lower tax contributions,
higher rates of crime, and higher dependency on social services like welfare and
Medicaid. Therefore, finding a pattern in dropout rates and identifying
which variables are associated with a decrease in dropout rates could help
keep students from spending their lives stuck in the lower class, and could
save the American economy unnecessary spending on social service
programs.
America's high school dropout rate has fallen overall since 1980. From 1980 to 2006,
the high school dropout rate
fell from 6.1% to 3.8% (Trends in High School Dropout Rates). This decrease
amounts to 248,000 fewer students dropping out, even
though the total number of students enrolled in high school rose by 91,000. Using
the $240,000 figure stated earlier, this gives an estimated lifetime savings of $59.52 billion
as a result of lowering the high school dropout rate from 6.1% to 3.8%, for the year
2006 alone. Using the median income difference of $19,000, this decrease in high school
dropouts would result in an income increase of $4.712 billion per year. This
study uses numerical data to find which variables are most strongly associated with
this decrease in high school dropout rates.
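The two dollar figures above follow directly from numbers already quoted in this section. The arithmetic is written out here as a check, using only the 248,000 reduction in dropouts, the $240,000 lifetime cost, and the two median incomes:

```latex
\begin{align*}
\text{Lifetime savings} &= 248{,}000 \times \$240{,}000 = \$59.52\ \text{billion} \\
\text{Annual income gain} &= 248{,}000 \times (\$42{,}000 - \$23{,}000)
                           = 248{,}000 \times \$19{,}000 = \$4.712\ \text{billion}
\end{align*}
```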
The variables in this study were chosen because they are all associated with
education spending. This study looks at pupil-teacher ratio, per-student
expenditures, and average teacher salary. These variables are all continuous
random variables, which lend themselves well to curve-fitting applications. When
conducting multiple linear regressions, variables must be chosen to avoid
multicollinearity. Multicollinearity arises when two predictor variables in the
curve-fitting model predict each other more strongly than they predict the outcome
variable. The three variables chosen here are intended to avoid this problem:
pupil-teacher ratio looks at the size
of each classroom, teacher salary works under the assumption that a higher salary is
associated with better teachers, and per-student spending accounts for the amount
of resources each student has available. All of these variables are expected to be strongly
related to the outcome variable, dropout rates, and less associated with each
other.
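One way to check for multicollinearity numerically is to look at the pairwise correlations among the predictors and at their variance inflation factors (VIFs). The sketch below is not part of the original analysis. It assumes the three predictor vectors listed in the appendix and uses the standard identity that the VIFs are the diagonal of the inverse of the predictor correlation matrix. The threshold mentioned in the comment is a common rule of thumb, not a result of this study.

```matlab
% Predictor data copied from the appendix (23 observations each)
ratio=[16.6; 16.9; 16.6; 16.4; 16.1; 15.8; 15.7; 15.2; 14.9; 14.6; 14.6; 14.7; ...
       15.1; 15.2; 14.9; 14.9; 14.7; 14.4; 13.4; 13.4; 13.7; 13.5; 13.8];
spending=[4.481; 4.494; 4.671; 4.837; 5.090; 5.355; 5.537; 5.679; 5.946; 6.075; ...
          6.091; 6.085; 6.089; 6.264; 6.035; 6.192; 6.395; 6.553; 6.773; 7.094; ...
          7.176; 7.344; 7.562];
salary=[48.246; 47.659; 47.891; 49.364; 50.426; 52.040; 54.051; 55.737; 56.572; ...
        56.741; 57.438; 57.599; 57.344; 57.288; 56.909; 56.772; 56.555; 56.114; ...
        56.200; 56.985; 57.191; 57.255; 57.864];

X=[ratio spending salary];

% 3x3 matrix of pairwise correlations between the predictors
R=corrcoef(X);

% Variance inflation factors: diagonal of the inverse correlation matrix.
% Values well above 1 (a common rule of thumb is above about 10) suggest
% that a predictor is largely explained by the other predictors.
vif=diag(inv(R));

disp('Pairwise correlations (ratio, spending, salary):');
disp(R);
disp('Variance inflation factors (ratio, spending, salary):');
disp(vif');
```

Large off-diagonal entries in R or large VIFs would indicate that the predictors carry overlapping information, which is worth knowing before interpreting the coefficients of the combined model.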

Procedure:
In any statistical analysis, it is important to define the sample space. For this
analysis, the sample space is all students who were enrolled in public schools in grades
10-12. The dropout rate calculated each year is taken from the number of students
enrolled in grades 10-12 that year. This means that once a student drops out of high
school, they are counted as part of the dropout rate for the given year but are not
part of the sample space in the following year.
The yearly dropout rates were used as the output variable for each of the
regressions conducted in the study. A linear regression was used each time, a choice
made after looking at the plot of dropout rates from 1980-2006. One
multiple linear regression was conducted to show how well the combination of
all three variables models high school dropout rates. Three single-variable linear
regressions were also run, one per predictor variable, to see how each
individual variable was associated with the decrease in high school dropout rates.
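The procedure described above can be sketched in a few lines of MATLAB. This is a minimal illustration rather than the exact script used for the study: it assumes the data vectors from the appendix, and the loop, the label strings, and the printed format are choices made here for readability.

```matlab
% Data copied from the appendix (23 observations each)
ratio=[16.6; 16.9; 16.6; 16.4; 16.1; 15.8; 15.7; 15.2; 14.9; 14.6; 14.6; 14.7; ...
       15.1; 15.2; 14.9; 14.9; 14.7; 14.4; 13.4; 13.4; 13.7; 13.5; 13.8];
spending=[4.481; 4.494; 4.671; 4.837; 5.090; 5.355; 5.537; 5.679; 5.946; 6.075; ...
          6.091; 6.085; 6.089; 6.264; 6.035; 6.192; 6.395; 6.553; 6.773; 7.094; ...
          7.176; 7.344; 7.562];
salary=[48.246; 47.659; 47.891; 49.364; 50.426; 52.040; 54.051; 55.737; 56.572; ...
        56.741; 57.438; 57.599; 57.344; 57.288; 56.909; 56.772; 56.555; 56.114; ...
        56.200; 56.985; 57.191; 57.255; 57.864];
dropout=[6.1; 5.9; 5.5; 5.2; 5.1; 5.2; 4.7; 4.1; 4.8; 4.5; 4.0; 4.0; 4.4; 4.5; ...
         5.3; 5.7; 5.0; 4.6; 4.8; 5.0; 3.6; 4.0; 4.7];

predictors={ratio, spending, salary};
labels={'Student/Teacher Ratio','Per Student Spending','Teacher Salary'};

% Three single-variable regressions: y ~ 1 + x1
for k=1:numel(predictors)
    mdl=fitlm(predictors{k},dropout);
    r2=mdl.Rsquared.Ordinary;            % share of variation explained
    p=mdl.Coefficients.pValue(2);        % p-value of the slope (row 2)
    fprintf('%-22s R^2 = %.3f, slope p-value = %.4g\n',labels{k},r2,p);
end

% One multiple regression: y ~ 1 + x1 + x2 + x3
mdl_all=fitlm([ratio spending salary],dropout);
disp(mdl_all);
```

fitlm adds the intercept term on its own, so no column of ones is needed in the predictor matrix.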

Results:
This study was done to determine the relationship between high school dropout rates
and various education spending measures. As stated earlier, high school dropout
rates have fallen overall, as seen in the figure below.

Linear regressions for each individual input variable were done to find the
r-squared value of each input variable. These r-squared values show how much of
the variation in high school dropout rates is accounted for by the predictor variable.
The variable with the highest r-squared value is the variable that has the highest
standalone correlation with the decrease in high school dropout rates. The
individual r-squared values are presented in the table below.
Predictor Variable       R-Squared
Teacher Salary           0.481
Student/Teacher Ratio    0.388
Per Student Spending     0.386
The plots of each linear regression are presented on the following pages, with
95% confidence bounds (an alpha of 0.05) on each plot to show how many data points
from each regression lie within the confidence interval.
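The number of observations falling inside the 95% bounds can also be counted directly instead of being read off a plot. The following is a minimal sketch, not the author's code. It assumes the salary and dropout vectors from the appendix and uses predict with its default confidence bounds for the fitted curve at an alpha of 0.05. Names such as n_inside are illustrative.

```matlab
% Data copied from the appendix (23 observations each)
salary=[48.246; 47.659; 47.891; 49.364; 50.426; 52.040; 54.051; 55.737; 56.572; ...
        56.741; 57.438; 57.599; 57.344; 57.288; 56.909; 56.772; 56.555; 56.114; ...
        56.200; 56.985; 57.191; 57.255; 57.864];
dropout=[6.1; 5.9; 5.5; 5.2; 5.1; 5.2; 4.7; 4.1; 4.8; 4.5; 4.0; 4.0; 4.4; 4.5; ...
         5.3; 5.7; 5.0; 4.6; 4.8; 5.0; 3.6; 4.0; 4.7];

mdl_sal=fitlm(salary,dropout);

% Fitted values and 95% confidence bounds at each observed salary
[ypred,yci]=predict(mdl_sal,salary,'Alpha',0.05);

% Logical vector: true where the observed dropout rate is inside the bounds
inside=dropout>=yci(:,1) & dropout<=yci(:,2);
n_inside=sum(inside);

fprintf('%d of %d observations lie within the 95%% confidence bounds\n', ...
        n_inside,numel(dropout));
```

Repeating this for the ratio and spending models gives counts that can be compared across the three regressions.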

Linear regression model:
    y ~ 1 + x1

Estimated Coefficients:
                   Estimate    SE         tStat      pValue
    (Intercept)    7.5275      0.75482    9.9725     2.0331e-09
    x1             -0.453      0.12467    -3.6335    0.001555

Number of observations: 23, Error degrees of freedom: 21
Root Mean Squared Error: 0.518
R-squared: 0.386, Adjusted R-Squared 0.357
F-statistic vs. constant model: 13.2, p-value = 0.00155

Linear regression model:
    y ~ 1 + x1

Estimated Coefficients:
                   Estimate    SE        tStat       pValue
    (Intercept)    -0.91007    1.5716    -0.57907    0.5687
    x1             0.38143     0.1045    3.6502      0.0014947

Number of observations: 23, Error degrees of freedom: 21
Root Mean Squared Error: 0.517
R-squared: 0.388, Adjusted R-Squared 0.359
F-statistic vs. constant model: 13.3, p-value = 0.00149

Linear regression model:
    y ~ 1 + x1

Estimated Coefficients:
                   Estimate    SE         tStat      pValue
    (Intercept)    11.764      1.5773     7.4584     2.4865e-07
    x1             -0.12687    0.02873    -4.4158    0.00024042

Number of observations: 23, Error degrees of freedom: 21
Root Mean Squared Error: 0.476
R-squared: 0.481, Adjusted R-Squared 0.457
F-statistic vs. constant model: 19.5, p-value = 0.00024

Lastly, a linear regression involving all three input variables was run to see how a
combination of these variables could be used to model high school dropout rates. A
plot like the ones before cannot be produced because there are too many variables
to represent, even on a 3-D plot. Instead, a plot of the residuals is shown below.

Linear regression model:
    y ~ 1 + x1

Estimated Coefficients:
                   Estimate    SE          tStat     pValue
    (Intercept)    0.5         0.014783    33.822    8.4405e-20
    x1             0.61826     0.032604    18.963    1.0848e-14

Number of observations: 23, Error degrees of freedom: 21
Root Mean Squared Error: 0.0709
R-squared: 0.945, Adjusted R-Squared 0.942
F-statistic vs. constant model: 360, p-value = 1.08e-14
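A residual check of this kind can also be produced with the built-in residual plots of a fitted model. The sketch below is an alternative illustration, not the script that generated the output above. It assumes the data vectors from the appendix, reads the raw residuals from the fitted multiple regression, and draws a normal probability plot and a residuals-versus-fitted plot.

```matlab
% Data copied from the appendix (23 observations each)
ratio=[16.6; 16.9; 16.6; 16.4; 16.1; 15.8; 15.7; 15.2; 14.9; 14.6; 14.6; 14.7; ...
       15.1; 15.2; 14.9; 14.9; 14.7; 14.4; 13.4; 13.4; 13.7; 13.5; 13.8];
spending=[4.481; 4.494; 4.671; 4.837; 5.090; 5.355; 5.537; 5.679; 5.946; 6.075; ...
          6.091; 6.085; 6.089; 6.264; 6.035; 6.192; 6.395; 6.553; 6.773; 7.094; ...
          7.176; 7.344; 7.562];
salary=[48.246; 47.659; 47.891; 49.364; 50.426; 52.040; 54.051; 55.737; 56.572; ...
        56.741; 57.438; 57.599; 57.344; 57.288; 56.909; 56.772; 56.555; 56.114; ...
        56.200; 56.985; 57.191; 57.255; 57.864];
dropout=[6.1; 5.9; 5.5; 5.2; 5.1; 5.2; 4.7; 4.1; 4.8; 4.5; 4.0; 4.0; 4.4; 4.5; ...
         5.3; 5.7; 5.0; 4.6; 4.8; 5.0; 3.6; 4.0; 4.7];

mdl_all=fitlm([ratio spending salary],dropout);

% Raw residuals (observed minus fitted), one per observation
res=mdl_all.Residuals.Raw;

% Normal probability plot: points near a straight line suggest
% approximately normal residuals
figure
plotResiduals(mdl_all,'probability');

% Residuals against fitted values: a visible curve or funnel shape
% would suggest the linear model is missing structure
figure
plotResiduals(mdl_all,'fitted');
```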

The multiple-linear regression model yielded the results below:


Linear regression model:
    y ~ 1 + x1 + x2 + x3

Estimated Coefficients:
                   Estimate    SE          tStat      pValue
    (Intercept)    8.9595      9.0338      0.99178    0.33377
    x1             0.10833     0.38711     0.27983    0.78263
    x2             0.056633    0.47943     0.11813    0.90721
    x3             -0.11153    0.060793    -1.8346    0.082269

Number of observations: 23, Error degrees of freedom: 19
Root Mean Squared Error: 0.499
R-squared: 0.485, Adjusted R-Squared 0.404
F-statistic vs. constant model: 5.97, p-value = 0.0048
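The output above contains two different kinds of test. The pValue column tests each coefficient separately, while the last line is an overall F-test of the whole model against a constant. In this output, no individual coefficient is below 0.05, but the overall F-test p-value of 0.0048 is. The sketch below shows one way to pull both out of a fitted model. It assumes the data vectors from the appendix, and the variable names p_overall and p_each are illustrative.

```matlab
% Data copied from the appendix (23 observations each)
ratio=[16.6; 16.9; 16.6; 16.4; 16.1; 15.8; 15.7; 15.2; 14.9; 14.6; 14.6; 14.7; ...
       15.1; 15.2; 14.9; 14.9; 14.7; 14.4; 13.4; 13.4; 13.7; 13.5; 13.8];
spending=[4.481; 4.494; 4.671; 4.837; 5.090; 5.355; 5.537; 5.679; 5.946; 6.075; ...
          6.091; 6.085; 6.089; 6.264; 6.035; 6.192; 6.395; 6.553; 6.773; 7.094; ...
          7.176; 7.344; 7.562];
salary=[48.246; 47.659; 47.891; 49.364; 50.426; 52.040; 54.051; 55.737; 56.572; ...
        56.741; 57.438; 57.599; 57.344; 57.288; 56.909; 56.772; 56.555; 56.114; ...
        56.200; 56.985; 57.191; 57.255; 57.864];
dropout=[6.1; 5.9; 5.5; 5.2; 5.1; 5.2; 4.7; 4.1; 4.8; 4.5; 4.0; 4.0; 4.4; 4.5; ...
         5.3; 5.7; 5.0; 4.6; 4.8; 5.0; 3.6; 4.0; 4.7];

mdl_all=fitlm([ratio spending salary],dropout);

% Overall F-test: are all slope coefficients zero at once?
p_overall=coefTest(mdl_all);

% Individual t-tests: one p-value per slope (rows 2-4 skip the intercept)
p_each=mdl_all.Coefficients.pValue(2:end);

alpha=0.05;
fprintf('Overall F-test p-value: %.4g (significant: %d)\n',p_overall,p_overall<alpha);
fprintf('Slope p-values: %s\n',mat2str(p_each',4));
fprintf('Number of individually significant slopes: %d\n',sum(p_each<alpha));
```

A significant overall test combined with no significant individual coefficients is a pattern often seen when the predictors are correlated with one another, so it is worth reading the two results side by side.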

Conclusion:
This study was conducted to determine whether student-teacher ratios, teacher
salaries, and per-student spending are good predictors for modeling high
school dropout rates. For each individual regression, the p-value of the slope was
below 0.05, so the null hypothesis of no linear relationship is rejected in each case.
This supports the first hypothesis, that high school
dropout rates are linearly related to teacher salaries, student-teacher ratio, and
per-student spending. Assessing each of these variables individually, the r-squared
values from the individual regressions indicate that teacher salaries have the
strongest ability to predict the high school dropout rate. The downward slope of the
regression indicates that, within the sample, higher teacher salaries are associated
with lower dropout rates. The relative predictive strength of teacher salaries is also
shown in the fact that the corresponding regression has the lowest p-value and has
more data points within its confidence interval than the other regressions. This
conclusion lends itself to the recommendation that increases in federal education
spending should be geared towards teachers in order to obtain a lower dropout
rate.
In combining all three variables into a multiple linear regression, the
hypothesis that high school dropout rates can be modeled by a linear
combination of the previously mentioned variables must be rejected because none
of the variables has an associated p-value below 0.05. This does not necessarily mean
that the chosen variables are poor predictors of dropout rates, but it does mean that
a linear combination of these variables provides weak predictive ability in modeling
dropout rates. The multiple regression could yield greater power if the data were
fit with something other than a linear model; an exponential model could
fit the data better. For example, in a school that has very low per-student spending,
increasing spending by $500/student could have a greater impact than in a school
that already spends adequately on each student.
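The exponential alternative suggested above could be explored by regressing the logarithm of the dropout rate on the same predictors, which corresponds to a model of the form y = exp(b0 + b1*x1 + b2*x2 + b3*x3). The sketch below is only one possible way to set this up and was not run as part of this study. It assumes the data vectors from the appendix, and it compares the two fits by root mean squared error on the original dropout scale, since r-squared values computed on different response scales are not directly comparable.

```matlab
% Data copied from the appendix (23 observations each)
ratio=[16.6; 16.9; 16.6; 16.4; 16.1; 15.8; 15.7; 15.2; 14.9; 14.6; 14.6; 14.7; ...
       15.1; 15.2; 14.9; 14.9; 14.7; 14.4; 13.4; 13.4; 13.7; 13.5; 13.8];
spending=[4.481; 4.494; 4.671; 4.837; 5.090; 5.355; 5.537; 5.679; 5.946; 6.075; ...
          6.091; 6.085; 6.089; 6.264; 6.035; 6.192; 6.395; 6.553; 6.773; 7.094; ...
          7.176; 7.344; 7.562];
salary=[48.246; 47.659; 47.891; 49.364; 50.426; 52.040; 54.051; 55.737; 56.572; ...
        56.741; 57.438; 57.599; 57.344; 57.288; 56.909; 56.772; 56.555; 56.114; ...
        56.200; 56.985; 57.191; 57.255; 57.864];
dropout=[6.1; 5.9; 5.5; 5.2; 5.1; 5.2; 4.7; 4.1; 4.8; 4.5; 4.0; 4.0; 4.4; 4.5; ...
         5.3; 5.7; 5.0; 4.6; 4.8; 5.0; 3.6; 4.0; 4.7];

X=[ratio spending salary];

% Linear model on the original scale
mdl_lin=fitlm(X,dropout);
pred_lin=predict(mdl_lin,X);

% Exponential model, fitted as a linear model for log(dropout)
mdl_log=fitlm(X,log(dropout));
pred_log=exp(predict(mdl_log,X));   % back-transform to dropout rates

% Compare both fits on the original dropout scale
rmse_lin=sqrt(mean((dropout-pred_lin).^2));
rmse_log=sqrt(mean((dropout-pred_log).^2));
fprintf('RMSE, linear model:      %.4f\n',rmse_lin);
fprintf('RMSE, exponential model: %.4f\n',rmse_log);
```

With only 23 observations, a small difference in error between the two forms would not by itself justify preferring one over the other.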
Although all three variables could not be combined into a single, strong model, this
study does show that teacher salaries, per-student spending, and student-teacher
ratios all play a role in reducing high school dropout rates. Teacher salaries had
the strongest correlation with the decline in dropout rates, so raising salaries first,
while also increasing per-student spending and limiting student-teacher ratios, has
the potential to provide more Americans with a high school diploma and to
lessen the burden on both the lower class and the taxpayer money spent
on social services.

References:
Chapman, C., Laird, J., and KewalRamani, A. (2010). Trends in High School Dropout
and Completion Rates in the United States: 1972-2008 (NCES 2011-012).
National Center for Education Statistics, Institute of Education Sciences, U.S.
Department of Education. Washington, DC.
National Education Association, Estimates of School Statistics, 1959-60 through
2011-12; and unpublished tabulations. U.S. Department of Commerce,
Bureau of Economic Analysis, National Income and Product Accounts,
tables 6.6B-D, retrieved November 2, 2011,
from http://www.bea.gov/national/nipaweb/SelectTable.asp.
Snyder, T.D., and Dillow, S.A. (2012). Digest of Education Statistics 2011 (NCES
2012-001). National Center for Education Statistics, Institute of Education Sciences,
U.S. Department of Education. Washington, DC.
U.S. Department of Education, National Center for Education Statistics, Digest of
Education Statistics, 1995.

Appendix: (MATLAB Code)


% Data: one entry per year in the sample (23 observations each)
ratio=[16.6; 16.9; 16.6; 16.4; 16.1; 15.8; 15.7; 15.2; 14.9; 14.6; 14.6; 14.7; 15.1; 15.2;
14.9; 14.9; 14.7; 14.4; 13.4;13.4;13.7; 13.5; 13.8];      % pupil-teacher ratio
spending=[4.481; 4.494; 4.671; 4.837; 5.090; 5.355; 5.537; 5.679; 5.946; 6.075;
6.091; 6.085; 6.089; 6.264; 6.035;6.192; 6.395; 6.553; 6.773;7.094;7.176;
7.344; 7.562];                                            % per-student spending
salary=[48.246; 47.659; 47.891; 49.364; 50.426; 52.040; 54.051; 55.737; 56.572;
56.741; 57.438; 57.599; 57.344; 57.288; 56.909; 56.772; 56.555; 56.114;
56.200; 56.985; 57.191; 57.255; 57.864];                  % average teacher salary
dropout=[6.1; 5.9; 5.5; 5.2; 5.1; 5.2; 4.7; 4.1; 4.8; 4.5; 4.0; 4.0; 4.4; 4.5; 5.3; 5.7; 5.0;
4.6; 4.8; 5.0; 3.6; 4.0; 4.7];                            % dropout rate (%)

% Predictor matrix for the multiple regression; fitlm adds the intercept itself
X=[ratio spending salary];

% Dropout rate over time
figure
plot(dropout)
xlabel('Years Since 1980');
ylabel('High School Dropout Rate');
title('High School Dropout Rates From 1980-2006');

% Single-variable regressions, each plotted in its own figure
mdl_sal=fitlm(salary,dropout);
figure
plot(mdl_sal)
mdl_ratio=fitlm(ratio,dropout);
figure
plot(mdl_ratio)
mdl_spending=fitlm(spending,dropout);
figure
plot(mdl_spending)

% Multiple regression on all three predictors
mdl_all=fitlm(X,dropout)

% Normal probability check: sorted residuals against plotting positions
res=sort(mdl_all.Residuals.Raw);
n=numel(res);
probstats=((1:n)'-0.5)/n;          % (i - 0.5)/n for i = 1..n
mdl_res=fitlm(res,probstats);
figure
plot(mdl_res);