

MAE 301 ASU Project



Spending on High School Graduation Rates

Garrett Knapp

This study uses per-student spending, student-teacher ratios, and teacher salaries to model high school dropout rates from 1980 to 2006. Two hypotheses are tested. First, a simple linear regression was conducted for each variable to test the null hypothesis that dropout rates have no linear relationship with that variable. Second, all three variables were combined in a multiple linear regression to test whether dropout rates can be modeled by a linear combination of the three variables. For each hypothesis, significance was tested at an alpha level of 0.05.

12/5/14

Introduction:

At a time when even a college education does not guarantee a job after graduation, prospects are worse still for those who are unable to graduate from high school. The median adult income for high school dropouts in 2006 was $23,000, a little more than half of the $42,000 median income for those with at least a diploma or GED (Trends in High School Dropout Rates). Furthermore, a study by Levin and Belfield estimates that each student who drops out of high school costs the American economy around $240,000 over his or her lifetime when accounting for lower tax contributions, higher rates of crime, and greater dependency on social services like welfare and Medicaid. Therefore, finding a pattern in dropout rates and detecting which variables are associated with a decrease in dropout rates could help keep students from spending their lives stuck in the lower class, as well as save the American economy unnecessary spending on social service programs.

There has been an overall decrease in America's high school dropout rate since 1980. From 1980 to 2006, the high school dropout rate fell from 6.1% to 3.8% (Trends in High School Dropout Rates). This decrease amounts to 248,000 fewer students dropping out, even though total high school enrollment rose by 91,000. Using the $240,000 figure stated earlier (248,000 × $240,000), this gives an estimated lifetime savings of $59.52 billion from lowering the dropout rate from 6.1% to 3.8% in the year 2006 alone. Using the $19,000 difference in median income ($42,000 - $23,000), the same decrease corresponds to an income increase of $4.712 billion per year. This study uses numerical data to find which variables are most strongly associated with this decrease in high school dropout rates.
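The two savings figures above come from simple multiplication. A minimal MATLAB check, using only the numbers quoted in this section (248,000 fewer dropouts, $240,000 lifetime cost per dropout, and median incomes of $23,000 and $42,000):

```matlab
% Back-of-the-envelope check of the savings figures quoted in the text.
% All inputs are the figures cited above; no new data is introduced.
fewerDropouts = 248000;   % fewer students dropping out in 2006
lifetimeCost  = 240000;   % estimated lifetime cost per dropout (Levin and Belfield)
medianDropout = 23000;    % 2006 median income, high school dropouts
medianDiploma = 42000;    % 2006 median income, diploma or GED

lifetimeSavings  = fewerDropouts * lifetimeCost;                     % dollars
annualIncomeGain = fewerDropouts * (medianDiploma - medianDropout);  % dollars per year

fprintf('Lifetime savings:   $%.2f billion\n', lifetimeSavings / 1e9);
fprintf('Annual income gain: $%.3f billion\n', annualIncomeGain / 1e9);
```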

The variables in this study were chosen because they are all associated with education spending: pupil-teacher ratio, per-student expenditures, and average teacher salary. These are all continuous random variables, which lends itself well to curve-fitting applications. When conducting multiple linear regressions, variables must be chosen to avoid multicollinearity. Multicollinearity arises when two or more predictor variables in the model are strongly correlated with one another, so that the model cannot reliably separate their individual effects and the coefficient estimates become unstable. The three variables were chosen to measure different aspects of spending: pupil-teacher ratio looks at the size of each classroom, teacher salary works under the assumption that a higher salary is associated with better teachers, and per-student spending accounts for the amount of resources available to each student. Each variable was expected to be more strongly related to the outcome variable, dropout rates, than to the other predictors.
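One quick way to check whether the chosen predictors actually avoid multicollinearity is to look at their correlation matrix and variance inflation factors (VIFs). This sketch assumes the 23 yearly values listed in the MATLAB code at the end of this report and uses only base MATLAB; computing VIFs as the diagonal of the inverse correlation matrix is a standard diagnostic, not part of the original analysis. A common rule of thumb treats VIFs above roughly 5 to 10 as a sign of problematic multicollinearity.

```matlab
% Multicollinearity check on the three predictors (data from the report's code)
ratio = [16.6; 16.9; 16.6; 16.4; 16.1; 15.8; 15.7; 15.2; 14.9; 14.6; 14.6; 14.7;
    15.1; 15.2; 14.9; 14.9; 14.7; 14.4; 13.4; 13.4; 13.7; 13.5; 13.8];
spending = [4.481; 4.494; 4.671; 4.837; 5.090; 5.355; 5.537; 5.679; 5.946; 6.075;
    6.091; 6.085; 6.089; 6.264; 6.035; 6.192; 6.395; 6.553; 6.773; 7.094; 7.176;
    7.344; 7.562];
salary = [48.246; 47.659; 47.891; 49.364; 50.426; 52.040; 54.051; 55.737; 56.572;
    56.741; 57.438; 57.599; 57.344; 57.288; 56.909; 56.772; 56.555; 56.114;
    56.200; 56.985; 57.191; 57.255; 57.864];

X = [ratio spending salary];
R = corrcoef(X);        % 3x3 correlation matrix between predictor columns
vif = diag(inv(R));     % VIF_j = 1 / (1 - R_j^2), where R_j^2 regresses x_j on the others

disp(R)
fprintf('VIF (ratio, spending, salary): %.2f  %.2f  %.2f\n', vif);
```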

Procedure:

In any statistical analysis, it is important to define the sample space. For this analysis, the sample space is all students enrolled in grades 10-12 in public schools. The dropout rate for each year is calculated from the number of students enrolled in grades 10-12 that year. This means that once a student drops out of high school, he or she is counted in the dropout rate for that year but is not part of the sample space in the following year.
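Written as a formula, with symbols introduced here only for illustration ($D_t$ for the number of students in grades 10-12 who dropped out during year $t$, and $E_t$ for the number of students enrolled in grades 10-12 that year), the yearly rate used as the output variable is:

```latex
d_t = 100 \times \frac{D_t}{E_t} \quad \text{(percent)}, \qquad
E_{t+1} \text{ excludes the } D_t \text{ students who dropped out in year } t.
```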

These yearly dropout rates were used as the output variable for every regression conducted in the study. A linear model was used each time, a choice made after looking at the plot of dropout rates from 1980-2006. Three simple linear regressions were run, one with each predictor variable, to see how each individual variable was associated with the decrease in high school dropout rates. One multiple linear regression was also conducted to test how well the combination of all three variables models high school dropout rates.
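The simple regressions described above can be reproduced without the Statistics Toolbox by ordinary least squares. This is a minimal sketch, assuming the data vectors from the MATLAB code at the end of the report; it uses the backslash operator in place of fitlm, which should give the same intercepts, slopes, and r-squared values.

```matlab
% Simple linear regression of dropout rate on each predictor via least squares
ratio = [16.6; 16.9; 16.6; 16.4; 16.1; 15.8; 15.7; 15.2; 14.9; 14.6; 14.6; 14.7;
    15.1; 15.2; 14.9; 14.9; 14.7; 14.4; 13.4; 13.4; 13.7; 13.5; 13.8];
spending = [4.481; 4.494; 4.671; 4.837; 5.090; 5.355; 5.537; 5.679; 5.946; 6.075;
    6.091; 6.085; 6.089; 6.264; 6.035; 6.192; 6.395; 6.553; 6.773; 7.094; 7.176;
    7.344; 7.562];
salary = [48.246; 47.659; 47.891; 49.364; 50.426; 52.040; 54.051; 55.737; 56.572;
    56.741; 57.438; 57.599; 57.344; 57.288; 56.909; 56.772; 56.555; 56.114;
    56.200; 56.985; 57.191; 57.255; 57.864];
dropout = [6.1; 5.9; 5.5; 5.2; 5.1; 5.2; 4.7; 4.1; 4.8; 4.5; 4.0; 4.0; 4.4; 4.5;
    5.3; 5.7; 5.0; 4.6; 4.8; 5.0; 3.6; 4.0; 4.7];

y = dropout;
n = numel(y);
predictors = {ratio, spending, salary};
names = {'Student/Teacher Ratio', 'Per Student Spending', 'Teacher Salary'};
b  = zeros(2, 3);    % column k holds [intercept; slope] for predictor k
R2 = zeros(1, 3);    % r-squared for each single-predictor model

for k = 1:3
    A = [ones(n, 1) predictors{k}];   % design matrix with an intercept column
    b(:, k) = A \ y;                  % least-squares coefficients
    res = y - A * b(:, k);            % residuals of this fit
    R2(k) = 1 - sum(res.^2) / sum((y - mean(y)).^2);
    fprintf('%-22s intercept %8.4f  slope %8.4f  R^2 %.3f\n', ...
            names{k}, b(1, k), b(2, k), R2(k));
end
```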

Results:

This study was done to determine the relationships between high school dropout rates and various education spending measures. As stated earlier, high school dropout rates have generally fallen, as seen in the figure below.

Linear regressions for each individual input variable were done to find the r-squared value of each input variable. These r-squared values show how much of the variation in high school dropout rates is accounted for by the predictor variable. The variable with the highest r-squared value is the variable that has the strongest standalone correlation with the decrease in high school dropout rates. The individual r-squared values are presented in the table below.

| Predictor Variable | R-Squared |
| --- | --- |
| Teacher Salary | 0.481 |
| Student/Teacher Ratio | 0.388 |
| Per Student Spending | 0.386 |
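For reference, the r-squared and adjusted r-squared values reported in the regression outputs are defined as follows, where $n$ is the number of observations and $k$ the number of predictors. As a worked check with the multiple regression output in this report ($n = 23$, $k = 3$, $R^2 = 0.485$), the adjusted value is $1 - (1 - 0.485)\cdot\tfrac{22}{19} \approx 0.404$, matching the output.

```latex
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}
    = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
```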

The plots of each linear regression are presented in the following pages, with 95% confidence bounds (alpha = 0.05) on each plot to show how many data points from each regression lie within the confidence interval.

Per Student Spending (mdl_spending): y ~ 1 + x1, where x1 is per-student spending

Estimated Coefficients:

| | Estimate | SE | tStat | pValue |
| --- | --- | --- | --- | --- |
| (Intercept) | 7.5275 | 0.75482 | 9.9725 | 2.0331e-09 |
| x1 | -0.453 | 0.12467 | -3.6335 | 0.001555 |

Number of observations: 23, Error degrees of freedom: 21

Root Mean Squared Error: 0.518

R-squared: 0.386, Adjusted R-Squared: 0.357

F-statistic vs. constant model: 13.2, p-value = 0.00155

Student/Teacher Ratio (mdl_ratio): y ~ 1 + x1, where x1 is the student/teacher ratio

Estimated Coefficients:

| | Estimate | SE | tStat | pValue |
| --- | --- | --- | --- | --- |
| (Intercept) | -0.91007 | 1.5716 | -0.57907 | 0.5687 |
| x1 | 0.38143 | 0.1045 | 3.6502 | 0.0014947 |

Number of observations: 23, Error degrees of freedom: 21

Root Mean Squared Error: 0.517

R-squared: 0.388, Adjusted R-Squared: 0.359

F-statistic vs. constant model: 13.3, p-value = 0.00149

Teacher Salary (mdl_sal): y ~ 1 + x1, where x1 is average teacher salary

Estimated Coefficients:

| | Estimate | SE | tStat | pValue |
| --- | --- | --- | --- | --- |
| (Intercept) | 11.764 | 1.5773 | 7.4584 | 2.4865e-07 |
| x1 | -0.12687 | 0.02873 | -4.4158 | 0.00024042 |

Number of observations: 23, Error degrees of freedom: 21

Root Mean Squared Error: 0.476

R-squared: 0.481, Adjusted R-Squared: 0.457

F-statistic vs. constant model: 19.5, p-value = 0.00024
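For a regression with a single predictor, the numbers in each output are tied together: tStat = Estimate/SE, the F-statistic equals tStat², and R-squared equals tStat²/(tStat² + df), where df is the error degrees of freedom. The short MATLAB check below recomputes these from the reported slopes and standard errors, matched to the predictors by their r-squared values in the table; it is a consistency check on the printed output, not a refit of the data.

```matlab
% Consistency check on the three single-predictor regression outputs.
est = [-0.453; 0.38143; -0.12687];   % reported slopes: spending, ratio, salary
se  = [0.12467; 0.1045; 0.02873];    % reported standard errors of those slopes
df  = 21;                            % error degrees of freedom (n - 2 = 23 - 2)

t  = est ./ se;        % t statistic for each slope
F  = t.^2;             % F-statistic vs. constant model (one predictor)
R2 = F ./ (F + df);    % implied r-squared

fprintf('t = %8.4f   F = %6.2f   R^2 = %.3f\n', [t F R2]');
```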

Lastly, a multiple linear regression involving all three input variables was run to see how a combination of these variables could be used to model high school dropout rates. A plot like the ones before cannot be produced because three predictors plus the response would require four dimensions, one more than even a 3-D plot can show. Instead, a plot of the residuals is shown below.

Residual plot fit (mdl_res): y ~ 1 + x1, where y is the plotting position and x1 is the sorted residual

Estimated Coefficients:

| | Estimate | SE | tStat | pValue |
| --- | --- | --- | --- | --- |
| (Intercept) | 0.5 | 0.014783 | 33.822 | 8.4405e-20 |
| x1 | 0.61826 | 0.032604 | 18.963 | 1.0848e-14 |

Number of observations: 23, Error degrees of freedom: 21

Root Mean Squared Error: 0.0709

R-squared: 0.945, Adjusted R-Squared: 0.942

F-statistic vs. constant model: 360, p-value = 1.08e-14
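The probstats values in the MATLAB code appear to be the plotting positions (i - 0.5)/n for n = 23 sorted residuals. The sketch below recomputes them and, as an alternative that was not used in this study, converts them to standard normal quantiles with base MATLAB's erfinv. Plotting the sorted residuals against these quantiles gives a conventional normal probability plot, where a straight line suggests normally distributed residuals (the Statistics Toolbox function normplot produces a similar plot).

```matlab
% Plotting positions for n sorted residuals: p_i = (i - 0.5) / n
n = 23;
p = ((1:n)' - 0.5) / n;

% Standard normal quantiles of p_i (same values as norminv(p), without the toolbox)
z = sqrt(2) * erfinv(2 * p - 1);

fprintf('%8.5f  %8.4f\n', [p z]');
```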

Linear regression model (mdl_all):

y ~ 1 + x1 + x2 + x3, where x1 is the student/teacher ratio, x2 is per-student spending, and x3 is average teacher salary

Estimated Coefficients:

| | Estimate | SE | tStat | pValue |
| --- | --- | --- | --- | --- |
| (Intercept) | 8.9595 | 9.0338 | 0.99178 | 0.33377 |
| x1 | 0.10833 | 0.38711 | 0.27983 | 0.78263 |
| x2 | 0.056633 | 0.47943 | 0.11813 | 0.90721 |
| x3 | -0.11153 | 0.060793 | -1.8346 | 0.082269 |

Number of observations: 23, Error degrees of freedom: 19

Root Mean Squared Error: 0.499

R-squared: 0.485, Adjusted R-Squared: 0.404

F-statistic vs. constant model: 5.97, p-value = 0.0048
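The combined model can likewise be fit with base MATLAB least squares. This sketch assumes the data vectors from the MATLAB code at the end of the report and recomputes the goodness-of-fit quantities shown in the output, including the adjusted r-squared, which penalizes each added predictor.

```matlab
% Multiple linear regression of dropout rate on all three predictors
ratio = [16.6; 16.9; 16.6; 16.4; 16.1; 15.8; 15.7; 15.2; 14.9; 14.6; 14.6; 14.7;
    15.1; 15.2; 14.9; 14.9; 14.7; 14.4; 13.4; 13.4; 13.7; 13.5; 13.8];
spending = [4.481; 4.494; 4.671; 4.837; 5.090; 5.355; 5.537; 5.679; 5.946; 6.075;
    6.091; 6.085; 6.089; 6.264; 6.035; 6.192; 6.395; 6.553; 6.773; 7.094; 7.176;
    7.344; 7.562];
salary = [48.246; 47.659; 47.891; 49.364; 50.426; 52.040; 54.051; 55.737; 56.572;
    56.741; 57.438; 57.599; 57.344; 57.288; 56.909; 56.772; 56.555; 56.114;
    56.200; 56.985; 57.191; 57.255; 57.864];
dropout = [6.1; 5.9; 5.5; 5.2; 5.1; 5.2; 4.7; 4.1; 4.8; 4.5; 4.0; 4.0; 4.4; 4.5;
    5.3; 5.7; 5.0; 4.6; 4.8; 5.0; 3.6; 4.0; 4.7];

y = dropout;
n = numel(y);
A = [ones(n, 1) ratio spending salary];   % columns: intercept, x1, x2, x3
k = size(A, 2) - 1;                       % number of predictors (3)

b     = A \ y;                            % [intercept; x1; x2; x3]
res   = y - A * b;                        % residuals
SSres = sum(res.^2);
SStot = sum((y - mean(y)).^2);

R2    = 1 - SSres / SStot;
R2adj = 1 - (1 - R2) * (n - 1) / (n - k - 1);
F     = (R2 / k) / ((1 - R2) / (n - k - 1));   % F-statistic vs. constant model
rmse  = sqrt(SSres / (n - k - 1));             % root mean squared error

fprintf('R^2 = %.3f, adjusted R^2 = %.3f, F = %.2f, RMSE = %.3f\n', ...
        R2, R2adj, F, rmse);
```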

Conclusion:

This study was conducted to determine whether student-teacher ratios, teacher salaries, and per-student spending can be used as good predictors in modeling high school dropout rates. For each individual regression, the p-value for the slope was below 0.05, which allows us to reject the null hypothesis that the slope is zero; that is, high school dropout rates are significantly linearly related to teacher salaries, student-teacher ratio, and per-student spending when each is considered on its own. Assessing each of these variables individually, the r-squared values from the individual regressions indicate that teacher salary has the strongest standalone ability to predict the high school dropout rate. The negative slope of that regression shows that higher teacher salaries are associated with lower dropout rates within the sample space. The relative predictive strength of teacher salaries is also shown by the fact that the corresponding regression has the lowest p-value and has more data points within its confidence interval than the other regressions. This result suggests that increases in federal education spending could be directed toward teachers in order to obtain a lower dropout rate, although a regression on yearly national data shows an association rather than proving that higher salaries cause lower dropout rates.

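In the multiple linear regression combining all three variables, none of the individual coefficients has a p-value below 0.05 (the smallest, for teacher salary, is 0.082), so we fail to reject the null hypothesis that each coefficient is zero once the other two variables are included. The model as a whole is still significant (F = 5.97, p = 0.0048), but its adjusted r-squared of 0.404 is lower than the 0.457 of the teacher-salary model alone, so adding student-teacher ratio and per-student spending does not improve the fit. This does not necessarily mean that the chosen variables are poor predictors of dropout rates; one possible explanation is that the three predictors are correlated with one another (multicollinearity), which inflates the standard errors of the individual coefficients. It does mean that a linear combination of these variables provides little additional predictive ability in modeling dropout rates. The multiple regression could yield greater power if the data were transformed or fit with a nonlinear model; for example, an exponential model could fit better if spending has diminishing returns. In a school with very low per-student spending, increasing spending by $500 per student could have a greater impact than in a school that already spends adequately on each student.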
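As an illustration of the diminishing-returns idea, one simple option is to regress dropout rate on the logarithm of per-student spending and compare its r-squared with the linear fit. The log transform is a stand-in chosen for this sketch, not the exponential model suggested above, and no result from it is claimed here; the data vectors are assumed to be those in the MATLAB code at the end of the report.

```matlab
% Compare a linear and a logarithmic (diminishing-returns) spending term.
% The log model is an illustrative substitute, not part of the original study.
spending = [4.481; 4.494; 4.671; 4.837; 5.090; 5.355; 5.537; 5.679; 5.946; 6.075;
    6.091; 6.085; 6.089; 6.264; 6.035; 6.192; 6.395; 6.553; 6.773; 7.094; 7.176;
    7.344; 7.562];
dropout = [6.1; 5.9; 5.5; 5.2; 5.1; 5.2; 4.7; 4.1; 4.8; 4.5; 4.0; 4.0; 4.4; 4.5;
    5.3; 5.7; 5.0; 4.6; 4.8; 5.0; 3.6; 4.0; 4.7];

y = dropout;
n = numel(y);

% r-squared of a least-squares line y ~ 1 + x
fitR2 = @(x) 1 - sum((y - [ones(n, 1) x] * ([ones(n, 1) x] \ y)).^2) ...
               / sum((y - mean(y)).^2);

R2lin = fitR2(spending);        % dropout ~ 1 + spending
R2log = fitR2(log(spending));   % dropout ~ 1 + log(spending)

fprintf('Linear R^2 = %.3f, log R^2 = %.3f\n', R2lin, R2log);
```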

Although all three variables could not be combined into a single, strong model, this study does show that teacher salaries, per-student spending, and student-teacher ratios are each associated with, and may play a role in reducing, high school dropout rates. Teacher salaries had the strongest correlation with the decline in dropout rates, so prioritizing teacher salaries, while also increasing per-student spending and limiting student-teacher ratios, has the potential to provide more Americans with a high school diploma and to lessen the burden on both the lower class and the taxpayer money spent on social services.

References:

Chapman, C., Laird, J., and KewalRamani, A. (2010). Trends in High School Dropout and Completion Rates in the United States: 1972-2008 (NCES 2011-012). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education. Washington, DC.

National Education Association, Estimates of School Statistics, 1959-60 through 2011-12; and unpublished tabulations. U.S. Department of Commerce, Bureau of Economic Analysis, National Income and Product Accounts, tables 6.6B-D, retrieved November 2, 2011, from http://www.bea.gov/national/nipaweb/SelectTable.asp.

Snyder, T.D., and Dillow, S.A. (2012). Digest of Education Statistics 2011 (NCES 2012-001). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education. Washington, DC.

U.S. Department of Education, National Center for Education Statistics, Digest of Education Statistics, 1995.

MATLAB Code:

% Yearly data, 23 observations per variable
ratio = [16.6; 16.9; 16.6; 16.4; 16.1; 15.8; 15.7; 15.2; 14.9; 14.6; 14.6; 14.7; 15.1; 15.2;
    14.9; 14.9; 14.7; 14.4; 13.4; 13.4; 13.7; 13.5; 13.8];        % pupil-teacher ratio
spending = [4.481; 4.494; 4.671; 4.837; 5.090; 5.355; 5.537; 5.679; 5.946; 6.075;
    6.091; 6.085; 6.089; 6.264; 6.035; 6.192; 6.395; 6.553; 6.773; 7.094; 7.176;
    7.344; 7.562];                                               % per-student spending
salary = [48.246; 47.659; 47.891; 49.364; 50.426; 52.040; 54.051; 55.737; 56.572;
    56.741; 57.438; 57.599; 57.344; 57.288; 56.909; 56.772; 56.555; 56.114;
    56.200; 56.985; 57.191; 57.255; 57.864];                     % average teacher salary
dropout = [6.1; 5.9; 5.5; 5.2; 5.1; 5.2; 4.7; 4.1; 4.8; 4.5; 4.0; 4.0; 4.4; 4.5; 5.3; 5.7; 5.0;
    4.6; 4.8; 5.0; 3.6; 4.0; 4.7];                                % dropout rate (percent)

% Predictor matrix for the multiple regression (fitlm adds the intercept itself)
X = [ratio spending salary];

% Dropout rate over time
figure;
plot(dropout)
xlabel('Years Since 1980');
ylabel('High School Dropout Rate');
title('High School Dropout Rates From 1980-2006');

% One simple linear regression per predictor, each plotted in its own figure
mdl_sal = fitlm(salary, dropout);
figure; plot(mdl_sal)
mdl_ratio = fitlm(ratio, dropout);
figure; plot(mdl_ratio)
mdl_spending = fitlm(spending, dropout);
figure; plot(mdl_spending)

% Multiple linear regression with all three predictors (no semicolon: display output)
mdl_all = fitlm(X, dropout)

% Sorted raw residuals of the combined model
res = sortrows(mdl_all.Residuals.Raw);

% Plotting positions (i - 0.5)/n for the n = 23 sorted residuals
probstats = [0.021739; 0.065217; 0.1087; 0.15217; 0.19565; 0.23913; 0.28261;
    0.32609; 0.36957; 0.41304; 0.45652; 0.5; 0.54348; 0.58696; 0.63043;
    0.67391; 0.71739; 0.76087; 0.80435; 0.84783; 0.8913; 0.93478; 0.97826];

% Straight-line fit of plotting position against sorted residual
mdl_res = fitlm(res, probstats);
figure; plot(mdl_res);
