Building Predictive Models for NYC High Schools (Alec Hubel)

Published by Aaron Schumacher
The New York City public school system (responsible for the education of over 1 million students) is the largest in the country. Unfortunately, its size only makes it more susceptible to impeding issues. The fact that school budgets are consistently tightening is only worsened by the fact that American students are falling behind their international competition. As a way to monitor the success of a school, the Department of Education monitors two key statistics – high school graduation rates and aspirational performance measures. This study looks to uncover the key drivers of those measures in an attempt to isolate the factors that are most responsible for a successful education in New York City public schools.


Published by: Aaron Schumacher on Dec 12, 2013
Copyright:Attribution Non-commercial

Building Predictive Models for NYC Public High Schools
Alec Hubel | Introduction to Data Science - Fall 2013
Abstract
The New York City public school system (responsible for the education of over 1 million students) is the largest in the country. Unfortunately, its size only makes it more susceptible to impeding issues. The fact that school budgets are consistently tightening is only worsened by the fact that American students are falling behind their international competition. As a way to monitor the success of a school, the Department of Education monitors two key statistics – high school graduation rates and aspirational performance measures. This study looks to uncover the key drivers of those measures in an attempt to isolate the factors that are most responsible for a successful education in New York City public schools.
Introduction
New York City public schools employ 75,000 teachers across over 1,700 schools. These teachers are responsible for the education of 1.1 million students and represent an overwhelming portion of the $24 billion annual budget. A system of this magnitude requires consistent monitoring in order to determine its efficacy. Unfortunately, an evaluation of each and every school, teacher, and student would be a huge draw on already limited resources. Because of this, the Department of Education must rely on certain performance metrics to decide if a school's performance is up to snuff. For high schools, the primary metrics used for this purpose are a school's graduation rate (the percentage of a senior class that successfully graduates in a given year) and its aspirational performance measure (APM). The New York State Department of Education uses the following definition for aspirational performance measures:
The percent of students in the cohort who earned a Regents diploma with Advanced Designation (i.e., earned 22 units of course credit; passed 7-9 Regents exams at a score of 65 or above; and took advanced course sequences in Career and Technical Education, the arts, or a language other than English); and
The percent of students in the cohort who graduated with a local, Regents, or Regents with Advanced Designation diploma and earned a score of 75 or greater on their English Regents examination and an 80 or better on a mathematics Regents exam (note: this aspirational measure is referred to as the "ELA/Math APM")
This data point is meant to measure what percent of a graduating class is prepared for college or a post-high school career. For this analysis, I attempted to build two predictive models: one for a school's graduation rate, and one for a school's aspirational performance measure.
Data
The New York City Department of Education makes an enormous amount of data available for public use and review. Thanks to this, collecting all the data required for my analysis was substantially easier than anticipated. To start, I decided to focus on the 2011-2012 school year. I wanted to keep the data as recent as possible so that my results would reflect the current state of the school system. The data that I required was held primarily across 3 separate data-sets. The first data-set contained demographic data: the racial composition of schools, the percentage of the student body that qualified for free or subsidized school lunches (a common proxy for the income levels of a student population), student-teacher ratios, and the graduation rates and APMs of individual schools. The second data-set contained budgetary information for each of the schools. From this, I was able to derive the dollars allocated per student. This is a more useful measure of a school's funding than the absolute budget, because a larger school naturally has a larger budget but may not necessarily have enough resources for all of the students it is responsible for. I was also able to collect the average salary for teachers in a given school. My intention was to use this measure as a proxy for the quality of teachers in a given school. A teacher's salary in New York is determined by how much training they have received and how many years of experience they have. I decided to operate under the assumption that a school with a higher average teacher salary has higher-quality teachers.
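The per-student funding measure can be illustrated with a minimal sketch. The function name and both sets of figures below are invented for illustration and are not taken from the budget data; the point is only that a school with a larger absolute budget can still have less money per student.

```python
def dollars_per_student(total_budget, enrollment):
    """Return the budget allocated per enrolled student."""
    if enrollment <= 0:
        raise ValueError("enrollment must be positive")
    return total_budget / enrollment


# Made-up figures for two hypothetical schools (not from the source data).
large_school = dollars_per_student(total_budget=30_000_000, enrollment=3_000)
small_school = dollars_per_student(total_budget=6_000_000, enrollment=400)

print(large_school)  # 10000.0
print(small_school)  # 15000.0
```

Here the hypothetical larger school has five times the budget but only two thirds of the funding per student.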
Drawing 1: NYC School Districts
Drawing 2: Heatmap of Demographic Data by District
 
Lastly, I collected a data-set from the yearly school survey that is administered to parents, teachers, and students. From this survey, the NYC Department of Education is able to extract scores for safety and respect, communication, engagement, and academic expectations. Additionally, it contained information on the extracurricular offerings of a school.
Each school in the above data sets was given a unique identifier called a "DBN". This was extremely useful for two reasons. Firstly, it allowed me to use the pandas "join" function to easily combine all of my data into a single data frame without too much extraneous data cleaning. Secondly, the DBN allowed me to extract the district and borough for a given school. For example, Bronx Leadership Academy High School's DBN is 09X525. The first two digits (09) signify that this school is located in district 9 (there are 32 school districts within New York City). The third character (X) corresponds to the Bronx (the other letter/borough pairs are M/Manhattan, Q/Queens, R/Staten Island, and K/Brooklyn).
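The DBN decoding described above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the code used in the study: the names `BOROUGH_BY_LETTER` and `parse_dbn` are hypothetical, and the sketch assumes every DBN starts with a two-digit district followed by one borough letter.

```python
# Borough letters as listed in the text; the dictionary name is hypothetical.
BOROUGH_BY_LETTER = {
    "M": "Manhattan",
    "X": "Bronx",
    "Q": "Queens",
    "R": "Staten Island",
    "K": "Brooklyn",
}


def parse_dbn(dbn):
    """Split a DBN such as '09X525' into (district number, borough name)."""
    dbn = dbn.strip().upper()
    district = int(dbn[:2])              # first two characters: district
    borough = BOROUGH_BY_LETTER[dbn[2]]  # third character: borough letter
    return district, borough


print(parse_dbn("09X525"))  # (9, 'Bronx')
```

An unrecognized borough letter raises a `KeyError`, which makes malformed identifiers visible instead of silently mislabeling a school.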
Methodology
First, I had to narrow down my data-set from the 1700+ schools to the 402 high schools in the NYC school system. Several dozen of those schools did not report graduation rates and APM measures. This is a result of a regulatory requirement that prevents a school from releasing this information when there are 20 or fewer graduates (generally the smaller schools in the system). After removing the schools with missing data, I employed a randomized 70/30 split to create a training set and a testing set. For both of my models, I was attempting to predict a continuous value (graduation rate and APM). As such, I decided to use scikit-learn's ridge regression algorithm. I began with a "kitchen sink" approach and threw all of my variables into the model. I then removed variables one by one until I could isolate the factors that most influenced graduation rate and APM. Variables were selected for removal when their p-values indicated a lack of statistical significance and their absence from my model did not substantially detract from its accuracy. The accuracy of my model was determined using both the R-squared and the mean squared error (MSE).
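The split, fit, and evaluation steps might look roughly like the sketch below. It is not the study's code: the data are synthetic stand-ins generated with NumPy, the regularization strength `alpha=1.0` is simply scikit-learn's default, and the variable-elimination step is omitted because the text does not say how the p-values were obtained.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the school data: 300 rows, 5 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
true_coef = np.array([3.0, -2.0, 0.0, 1.5, 0.0])  # invented coefficients
y = 70.0 + X @ true_coef + rng.normal(scale=2.0, size=300)

# Randomized 70/30 split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Fit ridge regression on the training set only.
model = Ridge(alpha=1.0)
model.fit(X_train, y_train)

# Score on the held-out testing set with R-squared and MSE.
predictions = model.predict(X_test)
r2 = r2_score(y_test, predictions)
mse = mean_squared_error(y_test, predictions)
print(f"R-squared: {r2:.3f}, MSE: {mse:.3f}")
```

Scoring on the held-out 30% rather than on the training rows gives a fairer picture of how the model would behave on schools it has not seen.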
Results – Graduation Rate
