
The data set I chose to analyze covers 1987 to 2018 and contains the NFL combine results for every
player who participated. The NFL combine is a skills competition for rookie players coming out of
college who hope to show off their athletic ability to NFL scouts. Because I am an avid football fan,
I know what all the columns mean, but readers may not. Here is a brief explanation of every column
and what it means:

“Year” – The year the statistics were taken.

“Name” – The name of the player.

“College” – The college they played football at.

“POS” – The position the player plays.

“Height (in)” – The player’s height in inches.

“Weight (lbs)” – The player’s weight in pounds.

“Hand Size (in)” – The player’s hand size from pinky to thumb in inches.

“Arm length (in)” – The player’s arm length in inches.

“40 yard” – A timed 40-yard run, in seconds.

“Bench press” – The number of reps completed on the bench press with 225 pounds.

“Vert Leap (in)” – The player’s vertical jump height in inches.

“Broad Jump (in)” – A standing long jump, in inches.

“Shuttle” – From a three-point stance, the drill begins with players sprinting five yards in one direction,
then sprinting ten yards back the other direction, and finally five yards through the original starting
point for a total run of 20 yards. This is timed in seconds.

“3Cone” – A series of quick five-yard sprints and tight turns, using three cones that form an L-shape
as guides. Timed in seconds.

“60Yd Shuttle” – Just like the original shuttle, but over a 60-yard run instead of 20. Also timed in
seconds.

Looked at as it stands, this data can give people a good understanding of a specific player’s athletic
ability. However, my goal with this data is to compare these stats over time in the hope of showing an
increase in athletic ability in players. I believe that by using scatter plots and graphs that compare
average drill results to time, I will be able to show trends in the data that support the idea that
athletes’ athletic ability is improving over the years. I will analyze the columns that I feel best
represent athletic abilities such as speed, strength, agility, jumping ability and more.
First I imported all of the packages we used in class, just in case I need any of them. I then loaded my
data file.

#importing all the packages (Imported all just in case)


import pandas as pd
import numpy as np
import statistics as stats
import matplotlib.pyplot as plt
import seaborn as sns

#Importing the data file I will be observing


combine_data = pd.read_csv('combine_results.csv')
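The column descriptions above use labels such as “Weight (lbs)” and “40 yard”, while the code that follows refers to names such as 'weight_lbs' and '40_yard'. The report does not say how the headers were converted, so the following is only a sketch of one way it could be done. The helper name snake_case and the list of headers are assumptions for illustration, not part of the original file.

```python
import re
import pandas as pd

def snake_case(label):
    """Hypothetical helper: turn a header like 'Weight (lbs)' into 'weight_lbs'."""
    # Replace every run of non-alphanumeric characters with one underscore.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", label)
    # Drop leading/trailing underscores and lowercase the result.
    return cleaned.strip("_").lower()

# Assumed header labels, taken from the column descriptions in the text.
headers = ["Year", "POS", "Weight (lbs)", "40 yard", "Bench press",
           "Vert Leap (in)", "Broad Jump (in)", "3Cone"]

# An empty frame is enough to demonstrate the renaming.
df = pd.DataFrame(columns=headers).rename(columns=snake_case)
print(list(df.columns))
```

If the CSV already ships with these lowercase names, this step is unnecessary.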
The first athletic ability I will be looking for improvement in is speed. To do this, my best option
is to take the results of the “40 yard” (in seconds), find the average per year, and display the
averages on a scatter plot with time (in years) on the X axis.

In this graph we see the mean 40-yard dash time per year over a 31-year period. First we notice an
upward trend from around 1987 to 1990, and then the times drop sharply. I assume this is due to the
timing of the event becoming more accurate over this time frame. After that we see a very clear
downward trend in average 40 times over the years. From this we can conclude that, on average,
athletes competing in this event have been getting quicker over the years.

Here is the code for this scatter plot, with its purpose given in the # comments:


#choosing the columns I want
data = combine_data[['year', '40_yard']].copy()

#clearing the missing data


data.dropna(subset=['year', '40_yard'], inplace=True)

#grouping the data and calculating the mean 40-yard time for each year
grouped_data = data.groupby('year')['40_yard'].mean().reset_index()

#creating the scatter plot using seaborn


sns.scatterplot(x='year', y='40_yard', data=grouped_data)

#setting the axis labels and graph title and then showing it
plt.xlabel('Year')
plt.ylabel('40-yard time (seconds)')
plt.title('Average 40-yard time at the NFL Combine')
plt.show()
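The downward trend described for the 40-yard averages can also be put into a number by fitting a straight line through the yearly means. This is a minimal sketch, assuming the same 'year' and '40_yard' column names; the function name yearly_trend and the sample rows are made up for illustration and are not real combine results.

```python
import numpy as np
import pandas as pd

def yearly_trend(df, column, year_col="year"):
    """Return (slope, intercept) of a least-squares line through the yearly means."""
    yearly = df.dropna(subset=[year_col, column]).groupby(year_col)[column].mean()
    x = yearly.index.to_numpy(dtype=float)
    y = yearly.to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 fit = straight line
    return slope, intercept

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "year": [1990, 1990, 2000, 2000, 2010, 2010],
    "40_yard": [4.9, 4.8, 4.8, 4.7, 4.7, 4.6],
})

slope, intercept = yearly_trend(sample, "40_yard")
print(f"Change in mean 40-yard time per year: {slope:.4f} s")
```

A negative slope means the average time is falling, that is, players are getting faster. Calling yearly_trend(combine_data, '40_yard') would apply the same idea to the real file.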
Another stat we can look at to analyze speed is the shuttle. This also shows us the development of
agility in football players over the years. For this I will take a similar approach as before, except
that I will plot all the shuttle times (in seconds), instead of the mean, against time (in years). We
will then look at each year individually, looking for any trends or patterns that show athletes getting
quicker over the years.

From this we don’t observe too much of a pattern that supports my theory. Overall we can see a
slight decrease in the number of times of 5 seconds and above, meaning fewer players are running
the shuttle in over 5 seconds. This provides some evidence for my theory; however, the fastest
times aren’t any faster than those of 30 years ago. Perhaps this has to do with the players’
stopping and starting ability, which plays a large role in the shuttle drill.

Here is the code for this scatter plot, with its purpose given in the # comments:
#choosing the columns I want
data = combine_data[['year', 'shuttle']].copy()

#drop missing data


data.dropna(subset=['year', 'shuttle'], inplace=True)

#creating the scatter plot using seaborn


sns.scatterplot(x='year', y='shuttle', data=data)

#setting the axis labels and graph title and then showing it
plt.xlabel('Year')
plt.ylabel('Shuttle time (seconds)')
plt.title('Shuttle Time at the NFL Combine')
plt.show()
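Two observations about the shuttle plot are read off by eye: the share of times at 5 seconds or above, and the fastest time in each year. Both could be checked with a small calculation. The sketch below assumes the same 'year' and 'shuttle' column names; the helper share_at_or_above and the sample rows are hypothetical.

```python
import pandas as pd

def share_at_or_above(df, column, threshold, year_col="year"):
    """Fraction of rows per year whose value in `column` is >= threshold."""
    clean = df.dropna(subset=[year_col, column])
    # The mean of a boolean Series is the fraction of True values.
    return (clean[column] >= threshold).groupby(clean[year_col]).mean()

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "year": [1990, 1990, 1990, 1990, 2015, 2015, 2015, 2015],
    "shuttle": [4.2, 4.6, 5.0, 5.1, 4.1, 4.3, 4.5, 5.0],
})

slow_share = share_at_or_above(sample, "shuttle", 5.0)
fastest = sample.groupby("year")["shuttle"].min()
print(slow_share)
print(fastest)
```

A falling slow_share over the years would match the first observation, while a flat fastest series would match the second.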
I am dissatisfied with my results from the last graph, so let’s try another drill that takes into
account an athlete’s speed as well as his agility. This drill is the 3 Cone drill. Like the first plot, I
will first find the mean per year and create a scatter plot comparing it to time in years once
again. This time I will add a trend line so we can clearly see the trend the data is taking.
*I found that the trend line was being thrown off by a data point from “year 0”, which I
assume is a mistake in the data. To avoid this I specified the years of data I wanted to use.*
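Before filtering, it can help to look at the rows that fall outside the expected years, such as the “year 0” point mentioned above. This is only a sketch: the helper name split_by_year_range and the sample rows are made up, and the 1987 to 2018 bounds come from the range stated in this report.

```python
import pandas as pd

FIRST_YEAR, LAST_YEAR = 1987, 2018  # range covered by the data set

def split_by_year_range(df, year_col="year"):
    """Return (rows inside the range, rows outside it or with a missing year)."""
    # Series.between is inclusive on both ends by default; NaN compares as False.
    in_range = df[year_col].between(FIRST_YEAR, LAST_YEAR)
    return df[in_range], df[~in_range]

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "year": [0, 1987, 2005, 2018],
    "3cone": [7.1, 7.4, 7.2, 7.0],
})

valid, suspicious = split_by_year_range(sample)
print(suspicious)  # rows worth inspecting before they are dropped
```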

From this plot we can clearly observe the downwards trend in time from 1987-2018. Although it
fluctuates in the last few years, it still follows a downward trend overall. This, combined with the
other plots I have shown above, gives us evidence that football players are on average getting
increasingly better in terms of speed.

The code for this plot, with its purpose given in the # comments:


#filter data to include only years 1987-2018
data = combine_data[(combine_data['year'] >= 1987) & (combine_data['year'] <= 2018)]

#Identify my columns
data = data[['year', '3cone']].copy()

#clear missing data


data.dropna(subset=['year', '3cone'], inplace=True)

#group the data and calculate the mean 3-cone time for each year
grouped_data = data.groupby('year')['3cone'].mean().reset_index()

#create the scatter plot with a trendline using seaborn


sns.lmplot(x='year', y='3cone', data=grouped_data, scatter_kws={'s': 30})

#set the axis labels and graph title and then show it
plt.xlabel('Year')
plt.ylabel('3-Cone time (seconds)')
plt.title('Average 3-Cone Time at the NFL Combine')
plt.show()
Now that we have looked at speed, I’m going to look at strength. The best way to do this is to
look at the results of the bench press event. This is simply as many reps as a player can do on
the bench press with 225 pounds. For the sake of not repeating the same code, I will present this as
a bar graph. I will take the average number of reps per player per year and look at the averages over
all the years. My hope is to see an increase in the bars’ height over time, showing an increase in the
average number of reps done.

From this we can observe a clear increase from 1987 to around 2009. This shows an increase in
strength over this time period, at least when it comes to bench pressing. The interesting part is
how the bars start to trend downwards after 2009. According to this graph, there is a decrease
over the last ten years of the sample. If I could improve this, I would find a sample that
runs up to 2022 so I could see whether this trend continued over the last few years. Overall, there
are clearly more reps done in 2018 than in 1987, which backs up my theory that
athletes are becoming more athletic over the years.

The code and its purpose:


#filter data to include only years 1987-2018
data = combine_data[(combine_data['year'] >= 1987) & (combine_data['year'] <= 2018)]

#picking my columns
data = data[['year', 'bench_press']].copy()

#drop missing data


data.dropna(subset=['year', 'bench_press'], inplace=True)

#group the data and calculate the mean bench press amount for each year
grouped_data = data.groupby('year')['bench_press'].mean().reset_index()

#create the bar-graph using seaborn


sns.barplot(x='year', y='bench_press', data=grouped_data)

#set the axis labels and graph title and then show it
plt.xlabel('Year')
plt.ylabel('Average bench press amount')
plt.title('Average Bench Press Amount at the NFL Combine')
plt.show()
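The bar graph is read by eye for the year where the average peaks and for the overall change between the first and last year. Both values can be computed directly from the yearly means. The sketch below assumes the same 'year' and 'bench_press' column names; the sample rows are made up for illustration and are not real combine results.

```python
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "year": [1990, 1990, 2009, 2009, 2018, 2018],
    "bench_press": [16, 18, 22, 24, 19, 21],
})

# Mean reps per year; groupby sorts the years in ascending order.
yearly_mean = (
    sample.dropna(subset=["year", "bench_press"])
    .groupby("year")["bench_press"]
    .mean()
)

peak_year = yearly_mean.idxmax()  # year with the highest mean
change_first_to_last = yearly_mean.iloc[-1] - yearly_mean.iloc[0]

print(f"Peak year: {peak_year}")
print(f"Change from first to last year: {change_first_to_last:+.1f} reps")
```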
Next I would like to look at players’ jumping ability. To do this I will use both “vertical jump” and
“broad jump.” I want to create a plot with the calculated means per year for both of these categories
once again, and compare them to time in years. I will add trend lines and a legend so it is easy to
tell them apart. The line with the larger values is obviously the broad jump, as it is a
different test. I am expecting the trend lines to show an increase in values over the years.

From this graph we can see that the trend lines for both Vertical Leap and Broad Jump trend
upwards over time. We can observe a larger increase in Broad Jump, but still a small increase in
Vertical Leap as well. This graph backs up my theory by showing that, on average, athletes’
jumping ability is increasing over the years.

The code and its purpose:


#specify the years I want the data
data = combine_data[(combine_data['year'] >= 1987) & (combine_data['year'] <= 2018)]

#select the columns I want
data = data[['year', 'vert_leap_in', 'broad_jump_in']].copy()

#drop missing data
data.dropna(subset=['year', 'vert_leap_in', 'broad_jump_in'], inplace=True)

#group the data and calculate the mean vert leap and broad jump for each year
#(a list of columns needs double brackets after groupby)
grouped_data = data.groupby('year')[['vert_leap_in', 'broad_jump_in']].mean().reset_index()

#create the scatter plot using seaborn
sns.scatterplot(x='year', y='vert_leap_in', data=grouped_data, label='Vertical Leap')
sns.scatterplot(x='year', y='broad_jump_in', data=grouped_data, label='Broad Jump')

#Add a trend line for each scatter plot
sns.regplot(x='year', y='vert_leap_in', data=grouped_data, scatter=False)
sns.regplot(x='year', y='broad_jump_in', data=grouped_data, scatter=False)

#Set the axis labels and graph title (after plotting, since seaborn sets its own axis labels)
plt.xlabel('Year')
plt.ylabel('Mean vertical leap/broad jump (inches)')
plt.title('Mean Vertical Leap and Broad Jump at the NFL Combine')

#Add a legend and show the plot
plt.legend()
plt.show()
All of these plots provide some evidence for my theory that football athletes have become
increasingly more athletic over the years. We’ve observed multiple trend lines in the data
that support this. Now I would like to look at how weight affects their results in the
different drills. Once we see whether this has any effect on the results, we will look at how weight has
changed in football players over the years and determine whether it has any relation to the improving
athleticism.

First let’s look at how weight affects the results of the 40-yard dash. I assume that the heavier
the athlete, the slower the time, as it is harder to move more weight. I will create a scatter plot
comparing the weight (lbs) to the 40-yard results to show any relationship.

As I predicted, this plot clearly shows a relationship between weight and speed. The lighter athletes
are able to move quicker as they don’t have to move as much weight. This is only beneficial in certain
positions on the team, as football isn’t all about speed.

The code and its purpose:


#filter data to include only years 1982-2018
data = combine_data[(combine_data['year'] >= 1982) & (combine_data['year'] <= 2018)]

#select the columns I am using


data = data[['weight_lbs', '40_yard']].copy()

#drop any missing data


data.dropna(subset=['weight_lbs', '40_yard'], inplace=True)

#create the scatter plot with the classic trend line using seaborn
sns.regplot(x='weight_lbs', y='40_yard', data=data)

#set the axis labels and graph title and show it


plt.xlabel('Weight (lbs)')
plt.ylabel('40-yard time (seconds)')
plt.title('Weight vs. 40-yard time at the NFL Combine')
plt.show()
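The strength of the weight and speed relationship in the plot can be summarised with a correlation coefficient. This is a minimal sketch, assuming the same 'weight_lbs' and '40_yard' column names and using the Pearson correlation, which is the default of pandas Series.corr. The sample rows are made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "weight_lbs": [180, 200, 240, 300, 320],
    "40_yard": [4.4, 4.5, 4.7, 5.1, 5.3],
})

clean = sample.dropna(subset=["weight_lbs", "40_yard"])

# Pearson correlation: close to +1 means heavier players tend to have
# higher (slower) 40-yard times; close to 0 means no linear relationship.
r = clean["weight_lbs"].corr(clean["40_yard"])
print(f"Correlation between weight and 40-yard time: {r:.2f}")
```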
Now let’s look at how the weight of the athletes affects the bench press. For this I will use a
scatter plot comparing the bench press reps to weight. I would also like to add a hue for the
“position,” so that we can see which positions are bench pressing the most. My prediction is
that the heavier the player, the more reps they will do. I also predict we will
see more reps in positions such as tackles, centres and guards, as these are usually the bigger
guys who can lift a lot of weight.

As we can see from the trend line, my prediction was correct: the number of reps done
increases overall with the player’s weight. My prediction about positions was also mostly right.
The tackles, centres and guards dominate the higher rep range, but for any football fan this was
a given. We can also see that positions such as kicker are down in the low rep area. This makes
sense because a position like kicker doesn’t require much upper-body muscle mass or strength;
it’s all in the legs.

The code and its purpose:


#select the columns I am using
data = combine_data[['bench_press', 'weight_lbs', 'pos']].copy()

#clear missing data in the columns


data.dropna(subset=['bench_press', 'weight_lbs', 'pos'], inplace=True)

#create the scatter plot with a hue for position


sns.scatterplot(x='weight_lbs', y='bench_press', data=data, hue='pos')

#add a trend line coloured red


sns.regplot(x='weight_lbs', y='bench_press', data=data, scatter=False, color='red')

#set the axis labels and graph title and then show it
plt.xlabel('Weight (lbs)')
plt.ylabel('Bench press')
plt.title('Bench press vs weight at the NFL Combine by position')
plt.show()
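With many positions on one scatter plot, the colours can be hard to tell apart, so a table of mean reps per position can back up what the hue shows. The sketch below assumes the same 'pos' and 'bench_press' column names. The position codes ("OT", "OG", "K", "WR") and the sample rows are assumptions for illustration, and the real file may label positions differently.

```python
import pandas as pd

# Made-up rows for illustration only; position codes are assumed.
sample = pd.DataFrame({
    "pos": ["OT", "OT", "OG", "K", "K", "WR"],
    "bench_press": [28, 30, 31, 10, 12, 15],
})

# Mean reps and number of players per position, highest mean first.
by_pos = (
    sample.dropna(subset=["pos", "bench_press"])
    .groupby("pos")["bench_press"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(by_pos)
```

The count column is included because a position with only a few players can have a misleading mean.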
Now I would like to compare weight to jumping ability. For this I will create a
stacked bar graph of both broad jump and vertical leap for different weights. I will
make bins of 10 pounds. If my prediction is correct, we will see smaller jump results for those
who are heavier, as it is harder to push more weight.

From this we can observe the downwards trend in bar height as weight increases in both
vertical leap and broad jump. This shows that my prediction was correct and that the heavier
the player, the less they can jump on average.

Here is the code and its purpose:


#select the columns
data = combine_data[['weight_lbs', 'vert_leap_in', 'broad_jump_in']].copy()

#drop any missing data in the columns


data.dropna(subset=['weight_lbs', 'vert_leap_in', 'broad_jump_in'], inplace=True)

#group weight into 10-pound intervals from 150lbs to 400lbs (stop at 401 so the 400 edge is included)
data['weight_group'] = pd.cut(data['weight_lbs'], bins=range(150, 401, 10), right=False)

#calculate the mean of vert-leap and broad jump for all the weight group
grouped_data = data.groupby('weight_group')[['vert_leap_in', 'broad_jump_in']].mean()

#plot the graph in stacked bar form


grouped_data.plot(kind='bar', stacked=True)

#set the axis labels and title the graph and then show it
plt.xlabel('Weight (lbs)')
plt.ylabel('Inches')
plt.title('Vertical leap and broad jump by weight at the NFL Combine')
plt.show()
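The weight bins above rely on pd.cut with right=False, which makes each interval include its lower edge and exclude its upper edge. The short sketch below shows where a few weights land. The weights and the shortened bin range are made up for illustration.

```python
import pandas as pd

# Made-up weights for illustration only.
weights = pd.Series([150, 159.9, 160, 175])

# Edges 150, 160, 170, 180 give the bins [150, 160), [160, 170), [170, 180).
bin_edges = range(150, 181, 10)
groups = pd.cut(weights, bins=bin_edges, right=False)

for weight, group in zip(weights, groups):
    print(weight, "->", group)
```

A player weighing exactly 160 lbs falls into the bin that starts at 160, not the one that ends there. Any weight outside the outermost edges gets a missing value and drops out of the grouped means.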
Now that we can see how weight plays a role in the results for speed, strength and jumping ability,
let’s see how the weight of football players has changed over the years, to see whether this could be an
influential factor in faster 40-yard times and longer jumping distances. For this we will
simply create a line graph of the mean weight per year. From this we can see how weight has
trended over the last 30 years.

As we can see here, the weight actually increased from 1987 to around 2003. This is interesting
because 40 times improved and jumping ability also increased over this time, meaning players were
getting heavier, faster and jumping higher and further on average. After around 2003 the weight
seems to start to trend back downwards, which could potentially be an influential factor in the
improvement of athleticism over this time.

The code and its purpose:

#filter data to include only weight and year columns and remove any missing data
data = combine_data[['year', 'weight_lbs']].dropna()

#convert year column to an integer


data['year'] = data['year'].astype(int)

#filter data to include only years from 1987 - 2018


data = data.loc[(data['year'] >= 1987) & (data['year'] <= 2018)]

#group data by year and calculate the mean weight for each year
grouped_data = data.groupby('year')['weight_lbs'].mean().reset_index()

#create line plot using seaborn


sns.lineplot(x='year', y='weight_lbs', data=grouped_data)

#set the axis labels and graph title and show it


plt.xlabel('Year')
plt.ylabel('Mean Weight (lbs)')
plt.title('Mean Weight at the NFL Combine from 1987 to 2018')
plt.show()
Overall, by observing this data and breaking it down into these graphs, we are able to see how
athleticism in football players has changed over the years. We can observe that my prediction
was largely right. Athletes’ results at the combine show improvement in speed, strength, and
jumping ability over the years, although bench press reps dip after around 2009 and the fastest
shuttle times have not improved. We are also able to observe how a player’s weight affects their
performance in different drills. This was helpful because we can now compare the change in
weight over the years to the change in ability. I hope you, the readers, find this breakdown
of the NFL combine data somewhat informative. I know I did.

Here are my sources for code help:

https://www.kaggle.com/code/themlphdstudent/cheat-sheet-seaborn-charts

https://www.geeksforgeeks.org/how-to-create-a-stacked-bar-plot-in-seaborn/

https://seaborn.pydata.org/generated/seaborn.regplot.html

As well as labs in class.
