
Heath McLaughlin, Akash Miriyala, Ryan Petrill, Michelle Setyanto, Ya Su

K353 Final Project


Monday, May 2nd, 2016

Exploring the Optimal GPA Path for Kelley School of Business Students
Introduction: Problem Statement & Data Description
Our analysis focuses on the GPAs of Kelley School of Business students; we approach
the topic through a variety of methods in order to understand the relationships
within our data. Reflecting on our time at Kelley, our team was intrigued by the
relationships one could find by compiling all Kelley undergraduate course and GPA
data. One can compare the relative difficulty of professors teaching the same
course, compare the relative difficulty of each major at Kelley, and even combine
these insights to design the ideal Kelley career. We use the Indiana University
Grade Distribution database as our data source and include records dating back four
years in an attempt to maintain relevancy. Our specific metrics of interest are:
Course Subject, Instructor Name, Average Class Grade (in GPA form), and Average
Student GPA.
We develop an analytical model to investigate the relationship between
departmental major and average class GPA to determine which of the Kelley majors
holds the highest average class grades. Because the average student GPA of a class
varies from class to class and reflects all of the courses its students have taken
(some more difficult than others), we assume that it serves as an acceptable
representation of the GPA of an individual Kelley student. In doing so, we can
derive the distribution of Kelley student GPAs, which allows us to draw a random
student's GPA in our simulation.
Our simulation model is designed to mimic the probability that an employer selects
a Kelley student as a new hire. Applying our knowledge of the Kelley student GPA
distribution to a simulated employer lets us gain insight into what constitutes a
good GPA at Kelley on a relative basis. We assume the employer has one opening and
interviews one Kelley student at random. The model accepts any GPA cutoff value as
input and determines the chance that the student is hired. We expect to see an
inverse relationship: the chance of being hired should decline as the GPA cutoff
increases.
With our analytical model we can determine which of the Kelley majors is the
easiest in terms of highest expected GPA. Using this knowledge, we design an
optimization model to map out the optimal path of classes for a student to take in
order to maximize his/her expected GPA. Students must enroll in the classes that
their major requires, so we account for this in our optimization. We also require
that a student does not repeat professors during his/her years at Kelley (unless a
professor is the sole instructor of multiple courses) in order to maximize the
diversity of his/her experience as a Kelley undergrad.
Analytical Model

Our chosen analytical methods investigate the relationship between departmental
major and average class GPA and allow us to determine which of the Kelley majors
hold higher average class grades than the others. Before setting up our model, we
must first address some issues with our data structure that could complicate our
analysis.
Our goal is to run a multiple linear regression of average class section GPA on
average cumulative GPA of students in the class and dummy variables indicating
majors. We create flag (dummy) variables, where a 1 indicates that the class is
part of the course path for one of the following majors: Real Estate, Finance,
Accounting, Entrepreneurship, BEPP, Management, Information Systems, Operations,
Supply Chain, Marketing and Sales. For each variable we enter a 1 for each of that
major's electives and required courses.
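Written out, the regression described above takes roughly the following form. The symbols are notation introduced here for illustration and do not appear in the original analysis: SectGPA_i is the average section GPA of class i, CumGPA_i is the average cumulative GPA of the students in class i, D_{m,i} equals 1 if class i is part of the course path for major m, and M is the set of majors that have a flag variable in the regression. The block assumes the amsmath package for \text.

```latex
\[
\text{SectGPA}_i
  = \beta_0
  + \beta_1 \,\text{CumGPA}_i
  + \sum_{m \in M} \gamma_m D_{m,i}
  + \varepsilon_i
\]
```

Under this form, each \gamma_m measures how the expected section GPA of a class in major m differs from that of a class in the omitted majors, holding average student cumulative GPA fixed.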
To run our regression we must first choose one or more majors to exclude; the
excluded majors are referred to as the base group. We run a "kitchen sink"
regression on all the variables and then choose which variable to exclude based on
the results. We accept p-values less than 0.1 for the sake of a richer analysis. As
seen in the kitchen sink results in Appendix B, Sales is the least statistically
significant variable and is thus chosen as our base group moving forward.
We now re-run our regressions, additionally excluding the least significant variable
each time, until we arrive at the best possible regression in terms of predictability
and fit. In addition to Sales, we find Operations, Marketing, Management, and
Supply Chain to be insignificant. The effects of these excluded variables are
absorbed into the intercept term, hence the insignificant p-value for the intercept
in the results shown in Appendix B.
Predictions of average student cumulative GPA from the regression can be seen in
Appendix B. It is important to note that, in interpreting this regression, the
coefficient on each major flag variable shows the effect of that major relative to
the excluded majors (Sales, Operations, Marketing, Management, and Supply Chain).
Our regression therefore predicts Accounting and Real Estate to be more difficult
majors, and Entrepreneurship and Information Systems to be easier majors, relative
to the excluded majors.
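To make the interpretation of the coefficients concrete, the following minimal sketch plugs the rounded coefficients from the Optimal Regression Results table in Appendix B into the fitted equation. It assumes, following the model description, that the dependent variable is average section GPA. The function name `predict_section_gpa` and the input of 3.40 for average student cumulative GPA are assumptions chosen for illustration; because the coefficients are rounded and the input is illustrative, the output is not expected to reproduce the Appendix B predictions exactly.

```python
# Rounded coefficients copied from the "Optimal Regression Results" table (Appendix B).
INTERCEPT = 0.28
CUM_GPA_COEF = 0.94
MAJOR_COEF = {
    "Real Estate": -0.23,
    "Finance": -0.22,
    "Accounting": -0.34,
    "BEPP": -0.18,
    "Entrepreneurship": 0.18,
    "Information Systems": 0.09,
}


def predict_section_gpa(avg_student_cum_gpa, major=None):
    """Predicted average section GPA; major=None means the excluded base group."""
    effect = MAJOR_COEF[major] if major is not None else 0.0
    return INTERCEPT + CUM_GPA_COEF * avg_student_cum_gpa + effect


if __name__ == "__main__":
    assumed_cum_gpa = 3.40  # illustrative input, not a value taken from the regression data
    for name in [None, *MAJOR_COEF]:
        label = name or "Base group"
        print(f"{label:20s} {predict_section_gpa(assumed_cum_gpa, name):.3f}")
```

Holding the student input fixed, the ordering of the predictions follows the signs and sizes of the major coefficients: Accounting and Real Estate fall below the base group, while Entrepreneurship and Information Systems rise above it.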
As we discuss later, we use the results of our analytical model to aid the design of
our optimization model. Since our optimization model maps the best course path
to take, it must account for constraints relating to required major courses
and electives. We therefore expect our regression predictions to differ somewhat
from the optimal major GPAs that our optimization model outputs: the regression
considers all possible major classes, whereas the optimization includes only the
chosen classes. Additionally, our regression intercept term is insignificant and
can vary greatly, so we can expect differences due to this effect as well.

While we now have this knowledge of student GPA differences between majors, our
analysis to this point has only considered GPA relative to other majors. We do not
yet have an idea of what really constitutes a "good" GPA relative to all Kelley
students. We investigate this matter in the following simulation of an employer and
the probability of that employer hiring a random Kelley student based on the job's
GPA cutoff.
Simulation Model
As mentioned before, we built a simulation to estimate the probability that an
employer would hire a random Kelley student based on the employer's own GPA cutoff.
As seen in Appendix C, the distribution of the independent variable appears
relatively normal. The independent variable is the average GPA of all students in a
specific class. Because the mix of students varies among classes, classes vary in
difficulty, and our data span 4051 classes, we assume that this average can
represent the GPA of an individual Kelley student. The plot of student GPAs shows an
approximately normal distribution with a mean of 3.40 and a standard deviation of
0.20. The values range between 2.34 and 3.92.
The simulation randomly draws a GPA from the student GPA distribution. If the
student's GPA is higher than the employer's cutoff GPA, it returns a 1, which means
the company will hire the student; if the student's GPA is lower, it returns a 0,
which means the student will not be hired. We simulated 1000 trials for cutoffs
ranging from 3.0 to 4.0 and obtained the probabilities of being hired shown in
Appendix D. About 97.70% of Kelley students have GPAs above 3.0, which means that
if the employer's cutoff GPA is 3.0, almost all students will be hired; as the GPA
cutoff rises, fewer students clear it. The dependent variable we simulate here is
the dummy variable of getting hired or not. If the cutoff GPA is 3.3, the chance of
getting hired is 69.60%; if the cutoff GPA is 3.7, the chance of getting hired is
only 6.60%. A GPA higher than 3.7 is therefore hard to attain, so it is important
to optimize the choice of classes and professors in order to maximize the chance of
earning a higher GPA during one's time at Kelley and getting hired.
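The simulation logic described above can be sketched in a few lines of Python. This is a minimal illustration rather than the spreadsheet model itself: it assumes student GPAs are drawn from a normal distribution with the mean of 3.40 and standard deviation of 0.20 reported above, it does not truncate draws to the observed range of 2.34 to 3.92, and the function names and the fixed random seed are assumptions made for this example.

```python
import random
from statistics import NormalDist

MEAN_GPA = 3.40  # mean reported in the text
SD_GPA = 0.20    # standard deviation reported in the text


def simulate_hire_rate(cutoff, trials=1000, seed=0):
    """Share of trials in which a randomly drawn student GPA exceeds the cutoff."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    hires = sum(1 for _ in range(trials) if rng.gauss(MEAN_GPA, SD_GPA) > cutoff)
    return hires / trials


def exact_hire_probability(cutoff):
    """P(GPA > cutoff) under the assumed normal distribution."""
    return 1.0 - NormalDist(MEAN_GPA, SD_GPA).cdf(cutoff)


if __name__ == "__main__":
    for cutoff in (3.0, 3.3, 3.7, 4.0):
        simulated = simulate_hire_rate(cutoff)
        exact = exact_hire_probability(cutoff)
        print(f"cutoff {cutoff:.1f}: simulated {simulated:.3f}, exact {exact:.4f}")
```

Under these assumptions the closed-form probabilities are roughly 97.7% at a cutoff of 3.0, 69.1% at 3.3, and 6.7% at 3.7, which is in line with the simulated figures reported above.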
Optimization Model
To accomplish our goal of finding the optimal path for Kelley students by
determining which instructors will maximize their expected GPA, we use Solver for
our optimization model. Based on our final regression model, we apply the
optimization model to each of the independent variables in our final regression
that represent a major: Real Estate, Finance, Accounting, BEPP, Entrepreneurship,
and Information Systems. The optimization model for each variable can be seen in
its respective worksheet within the file RegressionandOptimization.xls, and an
example of an optimal path can be seen in Appendix A.
Aside from optimizing the path for the aforementioned majors found in our
regression model, we also optimize the path for Kelley pre-requisites. We recognize
that not all Kelley students are direct-admits, and we aim to help that specific
population be able to pursue a major at Kelley.
In all of the optimization done through Solver, we constrain the path so that it
does not include the same instructor more than once, enabling students to diversify
their experience at Kelley. Initially, we ran the Evolutionary solver, which gave us
only locally optimal solutions. After modifying our constraints, however, we were
able to use Simplex LP instead. Thus, the results obtained are optimal.
Within each model, we sum the average section GPA of the classes needed for the
respective major and make this sum our objective cell. Since we aim to help
students maximize their GPA, we maximize our objective cell. Furthermore, the
major classes include all the possible electives the students can take for their
respective majors.
We constrain the values in the Choice column to be greater than or equal to 0, less
than or equal to 1, and integer. This ensures that students cannot partially take
any class. Furthermore, we put a constraint on the required classes for each major
so that the values in the Choice column for those classes equal 1. Hence, we take
into account that any student pursuing a particular major must take the required
classes to graduate.
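The structure of this selection problem can be illustrated with a small Python sketch. It is not the Solver model: it swaps Simplex LP for an exhaustive search, which is only practical for tiny inputs, and it ignores the exception for a professor who is the sole instructor of multiple courses. The course codes, instructor names, and GPA values below are hypothetical placeholders, not data from the Grade Distribution database.

```python
from itertools import product

# Hypothetical data: required course -> {instructor: average section GPA}.
# Course codes, instructor names, and GPAs are made up for illustration.
SECTIONS = {
    "COURSE-1": {"Instructor A": 3.10, "Instructor B": 3.45},
    "COURSE-2": {"Instructor B": 3.60, "Instructor C": 3.20},
    "COURSE-3": {"Instructor A": 3.30, "Instructor C": 3.50},
}


def best_path(sections):
    """Pick one instructor per required course, never repeating an instructor,
    so that the sum of average section GPAs is maximized (exhaustive search)."""
    courses = list(sections)
    best_total, best_choice = float("-inf"), None
    for combo in product(*(sections[c] for c in courses)):
        if len(set(combo)) < len(combo):  # an instructor appears more than once
            continue
        total = sum(sections[c][i] for c, i in zip(courses, combo))
        if total > best_total:
            best_total, best_choice = total, dict(zip(courses, combo))
    return best_total, best_choice


if __name__ == "__main__":
    total, choice = best_path(SECTIONS)
    print(f"objective = {total:.2f}")
    for course, instructor in choice.items():
        print(course, "->", instructor)
```

In this toy data the highest-GPA section of every course cannot be chosen simultaneously, because Instructor B teaches the best section of two courses; the no-repeat constraint forces a trade-off, which is the same effect the constraint has in the Solver model.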
All in all, the results of our optimization model using Solver can be seen in the
RegressionandOptimization.xlsx file, within each worksheet titled Pre-Req, Real
Estate, Finance, Accounting, BEPP, Entrepreneurship and Information Systems. The
recommended instructors for each of the required classes are arranged into green
tables, complete with the expected GPA for that course based on past data and the
average cumulative GPA of students within that section.
Conclusion
Our analysis focused on delving into data regarding Kelley School of Business
students' GPAs and gathering interesting insights. We developed an analytical
model using linear regression to compare the relative difficulty of each major at
Kelley, and put these insights together to optimize the ideal Kelley career.
We found that a logical simulation to run was one of an employer making a hiring
decision. This simulation revealed an inverse relationship between the chance of
being hired and the GPA cutoff.
The constraints of our optimization model and the insignificance of certain majors
caused some discrepancies between the expected GPAs predicted via regression
and those found in our optimization. The regression predicted Entrepreneurship as
the easiest major and Information Systems as the second easiest, whereas the
optimization predicted Information Systems as the easiest and Entrepreneurship as
the second easiest. While our regression focused on the prediction of section GPA
for each major, our optimization focused on cumulative student GPA, and each was
necessary in order to arrive at conclusions that make sense, are statistically
significant, and add value to our analysis.

Appendix A
Optimal Path for an Accounting Major at the Kelley School of Business
Prerequisite Courses:
Subject and Course    INSTRUCTOR NAME
BUS-A 100             Winston,Vivian
BUS-A 201             Kim,Yoon Hoo
BUS-A 202             Gosalia,Pranay
BUS-C 104             Dutton,Emily R
BUS-C 204             Watson,Carol A
BUS-D 270             Harrison,David Aron
BUS-D 271             Garcia,P. Roberto
BUS-G 202             Kreft,Steven Francis
BUS-J 375             Chin,M.K.
BUS-K 201             Thompson,Alaina Paige
BUS-K 303             Valencic,Taryn Renee
BUS-L 201             Engber,Michael David
BUS-L 375             Fort,Timothy L
BUS-T 175             Sklar,Pamela S.
BUS-T 275             Irvin,Sally J.

AVG SECT GPA
AVG STDNT CUM GPA
2.9824
3.171509091
3.008
3.362
3.098333333
3.4395
3.85
3.635
3.905666667
3.34
3.374142857
3.420571429
3.771
3.5135
3.110764706
3.421
3.668333333
3.397666667
3.209
3.79825
3.204
3.7815
3.809333333
3.997

3.297
3.46475
3.247
3.491625
3.217333333
3.4585

Major Courses:
Subject and Course    INSTRUCTOR NAME           AVG SECT GPA    AVG STDNT CUM GPA
BUS-A 311             Tiller,Mikel G.           3.007           3.499
BUS-A 312             Quay,James Bothwell       3.221           3.4995
BUS-A 325             Pomeroy,Don               3.6245          3.484
BUS-A 329             Hite,Peggy A.             2.9689          3.418
BUS-A 337             Hsieh,Christine J         3.79            3.524
BUS-A 424             Kane,Shannon Elizabeth    3.412           3.484113636
BUS-C 301             Brimm,David Robert        3.3615          3.4505

Appendix B
Kitchen Sink Regression Results:
Variable               Coefficients    Standard Error    t Stat    P-value
Intercept                      2.26              0.06     39.14       0.00
AVG SECT GPA                   0.34              0.02     20.92       0.00
Real Estate                    0.02              0.05      0.49       0.62
Finance                        0.08              0.05      1.71       0.09
Accounting                     0.11              0.03      3.94       0.00
Entrepreneurship              -0.12              0.03     -3.92       0.00
BEPP                           0.05              0.03      1.56       0.12
Management                    -0.06              0.03     -2.05       0.04
Information Systems           -0.11              0.04     -3.05       0.00
Operations                     0.10              0.07      1.44       0.15
Supply Chain                  -0.07              0.07     -0.99       0.32
Marketing                     -0.08              0.04     -1.92       0.05
Sales                          0.01              0.05      0.14       0.89

Optimal Regression Results:


Variable               Coefficients    Standard Error    t Stat    P-value
Intercept                      0.28              0.56      0.51       0.61
AVG STDNT CUM GPA              0.94              0.17      5.70       0.00
Real Estate                   -0.23              0.06     -3.65       0.00
Finance                       -0.22              0.06     -3.42       0.00
Accounting                    -0.34              0.04     -7.64       0.00
BEPP                          -0.18              0.05     -3.67       0.00
Entrepreneurship               0.18              0.04      4.94       0.00
Information Systems            0.09              0.05      1.82       0.07

Predictions of Average Student Cumulative GPA:

Major                  Regression Prediction
Pre-Requisites
Real Estate            3.2052001
Finance                3.289991996
Accounting             3.206465718
BEPP                   3.325461952
Entrepreneurship       3.638684342
Information Systems    3.564868826

Appendix C
Simulation Model

Independent Variables Distribution:

[Figure: Kelley Student GPA Distribution]

Appendix D
Simulation Model

Hiring Probability Distribution Based on GPA:

[Figure: Probability of Hiring (vertical axis from 0 to 1.2)]