Python Final Project
This data set contains the NFL combine results for every player that participated. The NFL combine is a
skills competition for rookie players coming out of college who hope to show off their athletic ability to
NFL scouts. Because I am an avid fan of football, I know what all the columns mean; however, viewers may
not. Here is a brief explanation of every column and what it means:
“Hand Size (in)” – The player’s hand size from pinky to thumb in inches.
“Bench press” – The number of reps completed on the bench press with 225 pounds.
“Shuttle” – From a three-point stance, the drill begins with players sprinting five yards in one direction,
then sprinting ten yards back the other direction, and finally five yards through the original starting
point for a total run of 20 yards. This is timed in seconds.
“3Cone” – Consists of a series of quick five-yard sprints and tight turns, using three cones that form an
L-shape as guides. Timed in seconds.
“60Yd Shuttle” – Just like the original shuttle, but it is a 60-yard run instead of 20. Also timed in
seconds.
Looked at as it stands, this data can give people a good understanding of a specific player’s athletic
ability. However, my goal with this data is to compare these stats over time in the hope of proving an
increase in players’ athletic ability. I believe that by using scatter plots and graphs that compare
average drill results against time, I will be able to show trends in the data that prove that athletes’
athletic ability is improving over the years. I will analyze the columns that I feel best represent athletic
abilities such as speed, strength, agility, jumping ability and more.
First I imported all of the libraries we used in class in case I needed any of them. I then added my
data file.
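The loading step described above might look like the sketch below. The file name `combine_data.csv` and the helper `load_combine_data` are assumptions made for illustration, since the original file name is not shown. The plotting libraries that the later code refers to as `plt` and `sns` would be imported in the same way.

```python
import io

import pandas as pd


def load_combine_data(source):
    """Read the combine results into a DataFrame.

    `source` can be a file path or an open file-like object.
    """
    return pd.read_csv(source)


# Hypothetical usage with a file on disk (the file name is an assumption):
# combine_data = load_combine_data('combine_data.csv')

# Tiny in-memory stand-in so the sketch runs without the real file.
# These rows are made up and are not real combine results.
sample_csv = io.StringIO(
    "year,40_yard,shuttle\n"
    "1987,4.85,4.30\n"
    "1988,4.90,4.25\n"
)
combine_data = load_combine_data(sample_csv)
print(combine_data.head())
```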
This graph shows the mean 40-yard dash time per year over a 31-year period. First we notice an upward
trend from around 1987 to 1990, and then the times start to drop sharply. I assume this is due to the
timing of this event becoming more accurate over this time frame. We then notice a very clear downward
trend in average 40 times over the years. From this we can conclude that, on average, athletes who are
competing in this event are getting increasingly quicker over the years.
#grouping the data and calculating the mean 40-yard time for each year
grouped_data = data.groupby('year')['40_yard'].mean().reset_index()
#setting the axis labels and graph title and then showing it
plt.xlabel('Year')
plt.ylabel('40-yard time (seconds)')
plt.title('Average 40-yard time at the NFL Combine')
plt.show()
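The snippet above does not show the plotting call itself. A minimal self-contained sketch of the whole step is given below, assuming a scatter plot of the yearly means drawn with matplotlib. The helper names `yearly_mean` and `plot_yearly_mean` and the sample rows are assumptions introduced here, not part of the original project.

```python
import matplotlib.pyplot as plt
import pandas as pd


def yearly_mean(df, column):
    """Return one row per year with the mean of `column`."""
    return df.groupby('year')[column].mean().reset_index()


def plot_yearly_mean(df, column, ylabel, title):
    """Scatter the yearly mean of `column` against year and return the Axes."""
    grouped = yearly_mean(df, column)
    fig, ax = plt.subplots()
    ax.scatter(grouped['year'], grouped[column])
    ax.set_xlabel('Year')
    ax.set_ylabel(ylabel)
    ax.set_title(title)
    return ax


# Made-up rows purely for illustration; not real combine results
data = pd.DataFrame({
    'year': [1987, 1987, 1988, 1988],
    '40_yard': [4.8, 5.0, 4.7, 4.9],
})
ax = plot_yearly_mean(data, '40_yard', '40-yard time (seconds)',
                      'Average 40-yard time at the NFL Combine')
```

Calling `plt.show()` afterwards displays the figure in an interactive session. The same helper can be reused for any other drill column by changing the column name and labels.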
Another stat we can look at to analyze speed is the shuttle. This also shows us the development of
agility in football players over the years. For this I will take a similar approach to before, except that
I will plot all the shuttle times (in seconds), instead of the mean, against time (in years). We will then
look at each year individually for any trends or patterns that show athletes getting quicker over the years.
From this we don’t observe too much of a pattern that proves my theory. Overall we can see a
slight decrease in the number of times of 5 seconds and above, meaning fewer players are running
the shuttle in over 5 seconds. This provides some evidence for my theory; however, the fastest
times aren’t getting any faster than those of 30 years ago. Perhaps this has to do with the players’
stopping and starting ability, which plays a large role in the shuttle drill.
Here is the code for this scatter plot, with its purpose explained in # comments:
#choosing the columns I want
data = combine_data[['year', 'shuttle']].copy()
#setting the axis labels and graph title and then showing it
plt.xlabel('Year')
plt.ylabel('Shuttle time (seconds)')
plt.title('Shuttle Time at the NFL Combine')
plt.show()
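The reading of the shuttle plot above can also be checked with numbers rather than by eye. One option is to compute, for each year, the share of runs that took 5 seconds or longer together with the fastest time. This is a sketch on made-up rows, not the original analysis; the helper name `shuttle_summary` and its `cutoff` parameter are assumptions introduced here.

```python
import pandas as pd


def shuttle_summary(df, cutoff=5.0):
    """Per year: share of shuttle times at or above `cutoff`, and the fastest time."""
    times = df[['year', 'shuttle']].dropna()
    # Flag each run that took `cutoff` seconds or longer
    times = times.assign(slow=times['shuttle'] >= cutoff)
    summary = times.groupby('year').agg(
        share_slow=('slow', 'mean'),   # mean of True/False is the share of slow runs
        fastest=('shuttle', 'min'),
    )
    return summary.reset_index()


# Made-up rows for illustration; not real combine results
data = pd.DataFrame({
    'year': [1990, 1990, 1990, 1990, 2015, 2015, 2015, 2015],
    'shuttle': [4.1, 5.2, 5.1, 4.4, 4.1, 4.3, 5.0, 4.2],
})
summary = shuttle_summary(data)
print(summary)
```

In this toy sample the share of slow runs falls while the fastest time stays the same, which is the kind of pattern described above.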
I am dissatisfied with my results from the last graph, so let’s try another drill that takes into
account an athlete’s speed as well as his agility. This drill is the 3 Cone drill. Like the first plot, I
will first find the mean for each year and create a scatter plot comparing it to time in years once
again. This time I will add a trend line so we can clearly see the trend the data is taking.
*I found that the trend line was getting distorted by a data point from “year 0”, which I
assume is a mistake in the data. To avoid this I specified the years of data I wanted to use.*
#Identify my columns
data = data[['year', '3cone']].copy()
#group the data and calculate the mean 3-cone time for each year
grouped_data = data.groupby('year')['3cone'].mean().reset_index()
#set the axis labels and graph title and then show it
plt.xlabel('Year')
plt.ylabel('3-Cone time (seconds)')
plt.title('Average 3-Cone Time at the NFL Combine')
plt.show()
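The year filter and the trend line mentioned above are not shown in the snippet. They can be sketched as follows. The 1987 to 2018 range is taken from the years mentioned elsewhere in this report, and the straight-line fit with `numpy.polyfit` is an assumption; the original may have used a different method, such as seaborn's `regplot`. The helper name `yearly_trend` is also hypothetical.

```python
import numpy as np
import pandas as pd


def yearly_trend(df, column, first_year=1987, last_year=2018):
    """Fit a straight line to the yearly mean of `column`.

    Rows outside first_year..last_year (such as a stray "year 0")
    are dropped before averaging.
    Returns the yearly means plus the slope and intercept of the line.
    """
    in_range = df[df['year'].between(first_year, last_year)]
    grouped = in_range.groupby('year')[column].mean().reset_index()
    slope, intercept = np.polyfit(grouped['year'], grouped[column], 1)
    return grouped, slope, intercept


# Made-up rows for illustration, including one bad "year 0" record
data = pd.DataFrame({
    'year': [0, 1987, 1987, 1988, 1989],
    '3cone': [9.9, 7.4, 7.6, 7.4, 7.3],
})
grouped, slope, intercept = yearly_trend(data, '3cone')
trend = slope * grouped['year'] + intercept  # y-values of the trend line
print(grouped)
print('slope per year:', slope)
```

A negative slope means the average time is falling from year to year. The `trend` values can be drawn over the scatter plot with `plt.plot(grouped['year'], trend)`.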
Now that we have looked at speed, I’m going to look at strength. The best way to do this is to
look at the results of the bench press event. This is simply as many reps as a player can do on
the bench press with 225 pounds. To avoid repeating the same code, I will present this as
a bar graph. I will take the average number of reps per player per year and look at them over all
the years. My hope is to see an increase in the bars’ height over time, showing an increase in the
average number of reps done.
From this we can observe a clear increase from 1987 to around 2009. This shows an increase in
strength over this time period, at least when it comes to bench pressing. The interesting part is
how the bars start to trend downwards after 2009. According to this graph, there is a decrease
over the last ten years of the sample. If I could improve this, I would find a sample that
runs up until 2022 so I could see if this trend continued over the last few years. Overall, there
is a clear increase in the reps done in 2018 compared to 1987, which backs up my theory that
athletes are becoming more athletic throughout the years.
#picking my columns
data = data[['year', 'bench_press']].copy()
#group the data and calculate the mean bench press amount for each year
grouped_data = data.groupby('year')['bench_press'].mean().reset_index()
#set the axis labels and graph title and then show it
plt.xlabel('Year')
plt.ylabel('Average bench press amount')
plt.title('Average Bench Press Amount at the NFL Combine')
plt.show()
Next I would like to look at players’ jumping ability. To do this I will use both “vertical jump” and
“broad jump.” I want to create a plot with the calculated yearly means of both of these categories
once again, and compare them to time in years. I will add trend lines and a legend so it is easy to
tell them apart. The line with the larger values is the broad jump, as it is a
different test. I am expecting the trend lines to show an increase in values over the years.
From this graph we can see that the trend lines for both Vertical Leap and Broad Jump trend
upwards over time. We can observe a larger increase in Broad Jump, but still a small increase in
Vertical Leap as well. This graph backs up my theory by showing that, on average, athletes’
jumping ability is increasing over the years.
#group the data and calculate the mean vert leap and broad jump for each year
grouped_data = data.groupby('year')[['vert_leap_in', 'broad_jump_in']].mean().reset_index()
#create the scatter plot using seaborn
sns.scatterplot(x='year', y='vert_leap_in', data=grouped_data, label='Vertical Leap')
sns.scatterplot(x='year', y='broad_jump_in', data=grouped_data, label='Broad Jump')
#add a trend line for each scatter plot and show it
sns.regplot(x='year', y='vert_leap_in', data=grouped_data, scatter=False)
sns.regplot(x='year', y='broad_jump_in', data=grouped_data, scatter=False)
plt.show()
All of these plots provide some evidence for my theory that football athletes have become
increasingly more athletic over the years. We’ve observed multiple trend lines in the data sets
that support this. However, now I would like to look at how weight affects their results in the
different drills. Once we see if this has any effect on the results, we will look at how weight has
changed in football players over the years and determine if it has any relation to the improving
athleticism.
First let’s look at how weight affects the results of the 40-yard dash. I assume that the heavier
the athlete, the slower the time, as it is harder to move more weight. I will create a scatter plot
comparing weight (lbs) to the 40-yard results to show any relationship.
#create the scatter plot with the classic trend line using seaborn
sns.regplot(x='weight_lbs', y='40_yard', data=data)
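Alongside the trend line, a correlation coefficient gives a one-number summary of the relationship between weight and 40-yard time. The sketch below uses pandas' `Series.corr` on made-up rows; it is an optional addition and not part of the original analysis.

```python
import pandas as pd

# Made-up rows for illustration; not real combine results
data = pd.DataFrame({
    'weight_lbs': [180, 200, 240, 300, 320],
    '40_yard': [4.4, 4.5, 4.7, 5.2, 5.3],
})

# Pearson correlation: values near +1 mean heavier players tend to run slower times
r = data['weight_lbs'].corr(data['40_yard'])
print(round(r, 3))
```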
As we can see from the trend line, my prediction was correct: the number of reps done
increases overall with the player’s weight. We can also notice that my prediction about the players
was somewhat right. I was also right about the tackles, centres and guards dominating the
higher rep range, but for any football fan this was a given. We can see that positions such as kicker
are down in the low rep area. This makes sense because a position like kicker doesn’t require
much muscle mass or strength; it’s all in the legs.
#set the axis labels and graph title and then show it
plt.xlabel('Weight (lbs)')
plt.ylabel('Bench press')
plt.title('Bench press vs weight at the NFL Combine by position')
plt.show()
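The position comparison discussed above can also be summarised as a ranking of average reps per position. The sketch below assumes the data has a `position` column; that column name, the position labels, and the helper `mean_reps_by_position` are hypothetical, since the original snippet does not show them.

```python
import pandas as pd


def mean_reps_by_position(df):
    """Average bench press reps for each position, highest first."""
    reps = df[['position', 'bench_press']].dropna()
    return (reps.groupby('position')['bench_press']
                .mean()
                .sort_values(ascending=False))


# Made-up rows for illustration; not real combine results
data = pd.DataFrame({
    'position': ['OT', 'OT', 'C', 'K', 'K', 'WR'],
    'bench_press': [30, 28, 27, 10, 12, 15],
})
ranking = mean_reps_by_position(data)
print(ranking)
```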
Now I would like to compare weight to jumping ability. For this I will create a
stacked bar graph of both broad jump and vertical leap compared to different weights. I will
make bins of 10 pounds. If my prediction is correct, we will see smaller jump results in those
that are heavier, as it is harder to push more weight.
From this we can observe the downward trend in bar height as weight increases in both
vertical leap and broad jump. This shows that my prediction was correct and that the heavier
the player, the less they can jump on average.
#calculate the mean of vert leap and broad jump for each weight group
grouped_data = data.groupby('weight_group')[['vert_leap_in', 'broad_jump_in']].mean()
#set the axis labels and title the graph and then show it
plt.xlabel('Weight (lbs)')
plt.ylabel('Inches')
plt.title('Vertical leap and broad jump by weight at the NFL Combine')
plt.show()
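The snippet above groups by a `weight_group` column without showing how the 10-pound bins were built. One way to create them is `pandas.cut`, sketched below on made-up rows. The helper name `mean_jumps_by_weight` and the choice of bin edges on multiples of ten are assumptions.

```python
import pandas as pd


def mean_jumps_by_weight(df, bin_width=10):
    """Mean vertical leap and broad jump for each `bin_width`-pound weight group."""
    jumps = df[['weight_lbs', 'vert_leap_in', 'broad_jump_in']].dropna().copy()
    # Bin edges on multiples of bin_width that cover every weight in the data
    low = int(jumps['weight_lbs'].min() // bin_width) * bin_width
    high = int(jumps['weight_lbs'].max() // bin_width + 1) * bin_width
    edges = list(range(low, high + bin_width, bin_width))
    # right=False makes each group include its lower edge, e.g. [200, 210)
    jumps['weight_group'] = pd.cut(jumps['weight_lbs'], bins=edges, right=False)
    # observed=True leaves out weight groups that contain no players
    return jumps.groupby('weight_group', observed=True)[
        ['vert_leap_in', 'broad_jump_in']].mean()


# Made-up rows for illustration; not real combine results
data = pd.DataFrame({
    'weight_lbs': [185, 188, 245, 312],
    'vert_leap_in': [38.0, 36.0, 33.0, 28.0],
    'broad_jump_in': [125.0, 121.0, 112.0, 100.0],
})
result = mean_jumps_by_weight(data)
print(result)
```

With matplotlib installed, `result.plot(kind='bar', stacked=True)` draws the grouped means as a stacked bar graph.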
Now that we can see how weight plays a role in results for speed, strength and jumping ability,
let’s see how the weight of football players has changed over the years to see if this could be an
influential factor in the faster 40-yard times and the increase in jumping distances. For this we will
simply create a line graph of the mean weight per year. From this we can see how weight has
trended over the last 30 years.
#filter data to include only weight and year columns and remove any missing data
data = combine_data[['year', 'weight_lbs']].dropna()
#group data by year and calculate the mean weight for each year
grouped_data = data.groupby('year')['weight_lbs'].mean().reset_index()
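The snippet above stops before the line graph is drawn. A minimal sketch of the full step follows, using made-up rows in place of the real file. The axis labels and title are assumptions written in the style of the earlier plots.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration, with one missing weight; not real combine results
combine_data = pd.DataFrame({
    'year': [1987, 1987, 2018, 2018, 2018],
    'weight_lbs': [230, 240, None, 245, 255],
})

# Keep only year and weight, dropping rows with missing values
data = combine_data[['year', 'weight_lbs']].dropna()
# Mean weight for each year
grouped_data = data.groupby('year')['weight_lbs'].mean().reset_index()

# Line graph of mean weight per year
fig, ax = plt.subplots()
ax.plot(grouped_data['year'], grouped_data['weight_lbs'])
ax.set_xlabel('Year')
ax.set_ylabel('Average weight (lbs)')
ax.set_title('Average weight at the NFL Combine')
```

Calling `plt.show()` afterwards displays the figure in an interactive session.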