
Scott Price

SPAD 637-Sport Analytics

Research Paper

Examining the Relationship between Positional Spending in the NFL and Success Metrics

Introduction

Within professional football, it is nearly universal knowledge that the quarterback is the “most valuable position” in the sport (Hughes, Koedel, & Price, 2015), and it is often assumed to be the most valuable position in all of professional team sports. Beyond the quarterback, however, there is considerable debate over which positions teams should value most highly, with many differing opinions. This topic is an especially important point of discussion now, as the salary cap has dropped to $182.5 million due to the pandemic, down from $198.2 million (Belson, 2021), although we expect it to rise as the industry recovers economically. Teams in the NFL are beginning to adopt more analytical thought processes after seeing the success of “Moneyball” teams in baseball like the Tampa Bay Rays and the Oakland Athletics. Personnel departments should use modern analytical tools to examine all of the available data. Determining the value of different positions in relation to team success would show which position groups are most valuable.

As Patrick Schilling shared in his article, “The Art of Positional Spending in the NFL”, “If a team is unable to balance their pay throughout their roster to excel in all three phases of the game, it’s highly unlikely that they will be able to lift the Lombardi Trophy at the end of the season.” He then goes on to mention the 2017 Philadelphia Eagles, Super Bowl LII champions, noting that they ranked only 26th in the NFL in quarterback spending. According to Schilling, the Eagles did a good job that season of spreading their limited cap space across position groups, allowing them to be strong throughout the roster rather than spending heavily on one position or even one player.

The purpose of this research is to determine whether NFL positional group salaries are linked to different metrics of success. This will be done through correlation studies and regression analysis, testing whether there is a statistically significant link between positional spending and these measures of success.

The main stakeholders in this research are NFL personnel departments and owners. Personnel departments are the primary stakeholders, as they are best positioned to use the information that comes from this study, which will allow them to better value players based on position. Owners are secondary stakeholders, as their main desire should be to get as much bang for their buck as possible from the salary spent on players. Another potential stakeholder is the gambling community, as both oddsmakers and bettors use every advantage possible, including analytical data, to secure the highest return possible (Assunção & Pelechrinis, 2018).

The main research question of this study is: “What is the relationship between positional spending in the NFL and both yards per play and points per game?” These two metrics of success give this study the best chance to determine whether positional spending explains an appreciable share of the variance in success.

Method

For my research into the relationship between positional spending in the NFL and success on the field, I will use two main types of data. Salary data will be collected from Spotrac.com, which has tracked positional spending by NFL franchises back to the 2015 season. Going into this research I had initially thought I would be able to find data further back in time, but I feel that the 6 seasons of data from 2015 to 2020 will give my research a large enough sample size. For my success metric of yards per play from scrimmage, I will have to use two different sources, NFL.com and ESPN.com. This is solely because each site publishes only one of the two figures required for the calculation, with NFL.com publishing plays from scrimmage and ESPN.com providing total yardage. I will split position groups by major group, as Spotrac already does. This leaves 9 position groups to analyze: quarterbacks, running backs, wide receivers, tight ends, offensive linemen, defensive linemen, linebackers, defensive backs, and specialists. Ideally I would split these groups even further, but for the purposes of this research the major groups will be sufficient, giving us ample information to draw from.
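The yards per play calculation described above could be sketched in Python as follows. This is only an illustration: the team labels and totals are made-up placeholders rather than real NFL.com or ESPN.com figures, and the helper name yards_per_play is hypothetical, not part of the original workflow.

```python
# Placeholder season totals keyed by (team, season). Real values would be
# transcribed from NFL.com (plays from scrimmage) and ESPN.com (total yards).
plays_from_nfl = {("TEAM_A", 2019): 1000, ("TEAM_B", 2019): 1040}
yards_from_espn = {("TEAM_A", 2019): 5500, ("TEAM_B", 2019): 5200}


def yards_per_play(plays, yards):
    """Join the two sources on (team, season) and divide yards by plays."""
    result = {}
    for key, n_plays in plays.items():
        if key not in yards:
            raise KeyError(f"no yardage found for {key}")
        result[key] = yards[key] / n_plays
    return result


for (team, season), value in sorted(yards_per_play(plays_from_nfl, yards_from_espn).items()):
    print(team, season, round(value, 2))
```

Raising an error on a missing key is a deliberate choice here, since a silent gap in either source would otherwise drop a team-season from the dataset unnoticed.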

Prior to running the correlation analysis, I will check the normality of the dependent variables (DVs) by creating a histogram of each DV used in this study (yards per play and points per game). Each histogram will show the distribution of the DV, allowing us to see whether it follows a normal curve.
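As a rough illustration of the histogram check, the sketch below bins a list of values and prints a text histogram. The bin width and the sample values are assumptions chosen for illustration, and the original analysis may well have used a spreadsheet chart instead.

```python
def histogram(values, bin_width):
    """Count how many values fall into each bin, keyed by the bin's left edge."""
    counts = {}
    for v in values:
        left_edge = (v // bin_width) * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))


def print_histogram(values, bin_width):
    """Print one row of '#' marks per bin so the shape can be inspected by eye."""
    for left_edge, count in histogram(values, bin_width).items():
        print(f"{left_edge:6.1f} | {'#' * count}")


# Placeholder points-per-game values, not taken from the study's dataset.
sample_ppg = [17.2, 19.8, 21.1, 22.4, 22.9, 23.5, 24.0, 25.7, 27.3, 30.1]
print_histogram(sample_ppg, bin_width=2.5)
```

A roughly bell-shaped pattern of bars, tallest near the middle and tapering on both sides, is what a normal curve would look like in this output.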

To find whether there is a statistically significant link between NFL positional spending and success, I will run a correlation analysis between yards per play and points per game, which will be the dependent variables for this test, and NFL team spending by positional group, which will be my independent variables (IVs). All of these variables are continuous, and more specifically are ratio variables. Running a correlation analysis before moving on to a regression model will show whether there is a significant link between these variables, as well as giving a clue as to what the regression model will show. For the correlation analysis, the null hypothesis will be that there is no statistically significant link between yards per play/points per game and positional spending by NFL teams. The alternative hypothesis will therefore be that there is a statistically significant link between yards per play/points per game and positional spending. The correlation analysis will also be run to see whether there are any multicollinearity concerns among the independent variables. This will be done by examining whether there is any significant correlation between our independent variables.
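A minimal sketch of the correlation step, assuming each variable is held as a plain list with one value per team-season. The 0.7 cutoff for flagging multicollinearity is an assumed rule of thumb, not a threshold stated in this paper, and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.7):
    """Return pairs of IVs whose |r| meets the (assumed) threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged
```

Calling correlated_pairs on a dictionary of spending columns (for example, keys such as "QB" and "OL") returns an empty list when no pair of independent variables is strongly correlated, which is the outcome the multicollinearity check hopes for.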

After running the correlation analysis to see whether there is a link between the variables, a multiple linear regression model will be run to see what the links are and how the independent variables affect the dependent variables. The first metric that will be looked at after running the regression will be the R-square, which tells us the proportion of the total variance in the DVs that is explained by the IVs. Running a regression model is imperative to finding a solid answer to my research question because, assuming we reject the null hypothesis of the correlation analysis, it will best show the actual link between success and positional spending by quantifying the impact of that link through the coefficients of the regression model. Finally, we will check the normality of the residuals from our regression models by creating a histogram of them.
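The regression step can be sketched in plain Python as below. This is an illustrative ordinary least squares fit via the normal equations, not necessarily the tool used for this paper. The data rows are made-up placeholders, and spending is assumed to be expressed in millions of dollars because raw dollar amounts make this simple solver numerically fragile.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return (coefficients, r_squared, residuals); coefficients[0] is the intercept."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    residuals = [t - f for t, f in zip(y, fitted)]
    mean_y = sum(y) / len(y)
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return beta, 1 - ss_res / ss_tot, residuals


# Placeholder rows: (QB spend, OL spend) in $ millions, and points per game.
spend = [(5.0, 20.0), (14.0, 22.0), (25.0, 18.0), (9.0, 30.0), (30.0, 27.0)]
ppg = [19.5, 22.1, 24.0, 23.2, 27.4]
beta, r_squared, residuals = fit_ols(spend, ppg)
print("intercept and slopes:", [round(b, 3) for b in beta])
print("R-square:", round(r_squared, 3))
```

The returned residuals are the values that would be plotted in the residual histogram for the normality check.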

I will first look at major descriptive statistics like mean, median, standard deviation and variance in my results section. When the multiple linear regression model is run, I will use the p-values to determine whether the results of my regression model are statistically significant. In the results section of my research, I will explain the significance of the results of the regression model. Using the coefficients of that model in tandem with the descriptive statistics gathered from the dataset, I hope to determine which positional groups are more valuable for NFL teams to invest in and, perhaps more importantly, identify where NFL teams should look to spend in order to get greater value out of their spending.
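The descriptive statistics listed above can be computed with Python's standard library, as in the sketch below. It assumes sample (not population) standard deviation and variance and an inclusive quartile method. Those choices are assumptions, since the paper does not state which conventions were used, and the helper name describe is hypothetical.

```python
import statistics


def describe(values):
    """Summary statistics in the same layout as the descriptive tables."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),        # sample standard deviation
        "variance": statistics.variance(values),  # sample variance
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Placeholder values, not taken from the study's dataset.
for name, value in describe([1, 2, 3, 4, 5]).items():
    print(name, value)
```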

Results

First we looked at basic descriptive statistics for our independent variables and our dependent variables. For our dependent variables (see Tables 1-2), we found that the average for points per game both scored and allowed was 23.05, with a standard deviation of 4.26 for points scored and 3.54 for points allowed. For yards per play, we see an average of 5.48 yards both gained and allowed per play, with a standard deviation of 0.47 for yards gained and 0.41 for yards allowed. These differences in standard deviation suggest that NFL offenses may vary more in their success metrics than their counterparts on defense, but more research would need to be done to determine that. When looking at the independent variables of positional spending (Tables 3-4), we see large variance and standard deviation figures for all position groups, showing that there is a wide range of spending between the data points.

Table 1

Descriptive Statistics: NFL Offenses


Statistic   Scrm Plys   YDS         PTS/G   YDS/PLAY
Mean        1019.92     5590.01     23.05   5.48
StdDev      45.94       551.85      4.26    0.47
Variance    2110.27     304539.84   18.11   0.22
Min         878.00      3865.00     14.00   4.28
1Q          994.75      5231.50     20.15   5.14
Median      1018.50     5654.50     22.90   5.46
3Q          1052.00     5975.25     25.90   5.79
Max         1135.00     6904.00     35.30   6.84

Table 2

Descriptive Statistics: NFL Defenses


Statistic   Scrm Plys   YDS         PTS/G   YDS/PLAY
Mean        1019.92     5590.01     23.05   5.48
StdDev      40.57       501.09      3.54    0.41
Variance    1645.97     251091.28   12.54   0.16
Min         921.00      4414.00     14.10   4.39
1Q          994.00      5249.50     20.18   5.19
Median      1017.00     5587.00     23.10   5.48
3Q          1045.25     5939.75     25.43   5.74
Max         1148.00     6725.00     32.40   6.65

Table 3

Descriptive Statistics: NFL Offensive Payroll


Statistic   QB            RB/FB         WR            TE            OL            OFF TOT
Mean        $14,278,646   $5,930,729    $13,980,208   $5,791,146    $21,766,667   $61,747,396
StdDev      9133250.393   3560276.132   7624598.62    3793933.581   8516560.665   17339387.77
Var         8.34163E+13   1.26756E+13   5.81345E+13   1.43939E+13   7.25318E+13   3.00654E+14
Min         $400,000      $600,000      $1,300,000    $600,000      $2,600,000    $16,400,000
1Q          $6,525,000    $3,000,000    $7,675,000    $2,700,000    $15,100,000   $48,500,000
Med         $13,700,000   $5,200,000    $13,000,000   $5,000,000    $21,500,000   $61,400,000
3Q          $21,950,000   $8,325,000    $18,900,000   $7,700,000    $27,025,000   $75,175,000
Max         $47,200,000   $17,400,000   $36,400,000   $16,400,000   $49,200,000   $111,200,000

Table 4

Descriptive Statistics: NFL Defensive Payroll


Statistic   DL            LB            DB            DEF TOT
Mean        $20,674,479   $15,805,729   $20,752,083   $57,232,292
StdDev      10996572.61   8786448.994   8689001.419   15265316.73
Var         1.20925E+14   7.72017E+13   7.54987E+13   2.3303E+14
Min         $1,500,000    $1,500,000    $6,000,000    $20,100,000
1Q          $12,125,000   $9,475,000    $13,400,000   $46,375,000
Med         $19,800,000   $13,950,000   $20,500,000   $56,150,000
3Q          $27,800,000   $21,400,000   $26,925,000   $66,925,000
Max         $51,300,000   $49,000,000   $49,700,000   $103,900,000

Next, I looked at the normality of the distributions of our dependent variables (Charts 5-8), with the histograms showing approximately normal distributions for both variables on offense and defense. The one concern is in the histogram of points per game, in which we see a small dip in the otherwise normal distribution curve.

Chart 5

Chart 6
Chart 7

Chart 8
After checking the normality of the dependent variables, a correlation analysis was run in order to determine the relationship between our independent variables and our dependent variables (Tables 9-10). In our correlation analysis of NFL offenses, we see that both of our dependent variables have weak positive relationships with QB spending and OL spending (correlations of roughly 0.31 to 0.35), with the other position groups having weaker or negligible correlations. Because these relationships with the dependent variables are present, although weak, we reject the null hypothesis that there is no statistically significant relationship between positional spending and yards per play/points per game and accept the alternative hypothesis. On defense, however, we see weak to negligible negative correlations between all of our dependent variables and independent variables. For both offense and defense we do not have multicollinearity concerns, as the correlations among the independent variables in each analysis are weak at most.
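Whether a given correlation is statistically distinguishable from zero can be checked with the usual t-test for a Pearson coefficient. The sketch below is an illustration under two assumptions that are not stated in this section: a sample of 192 team-seasons (32 teams over the 6 seasons from 2015 to 2020) and an approximate two-tailed 5% critical value of 1.97. The r values are read from the offensive correlation table.

```python
from math import sqrt


def t_statistic_for_r(r, n):
    """t statistic for testing H0: no correlation, given Pearson r from n observations."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


N_OBSERVATIONS = 192  # assumption: 32 teams x 6 seasons
CRITICAL_T = 1.97     # approximate two-tailed 5% critical value near 190 degrees of freedom

reported = {
    "QB vs PTS/G": 0.307356,
    "OL vs PTS/G": 0.32626,
    "RB/FB vs PTS/G": 0.108078,
}
for label, r in reported.items():
    t = t_statistic_for_r(r, N_OBSERVATIONS)
    verdict = "significant" if abs(t) > CRITICAL_T else "not significant"
    print(label, round(t, 2), verdict)
```

Under these assumptions the QB and OL correlations with points per game clear the threshold, while the RB/FB correlation does not.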

Table 9 (Offensive Correlation)


            PTS/G       YDS/PLAY    QB          RB/FB       WR          TE          OL
PTS/G       1
YDS/PLAY    0.779432    1
QB          0.307356    0.311212    1
RB/FB       0.108078    0.027371    -0.08235    1
WR          0.182779    0.186532    0.039202    -0.08131    1
TE          0.162378    0.069223    0.17346     0.023757    0.084407    1
OL          0.32626     0.350435    0.156494    0.020023    0.086282    0.147016    1

Table 10 (Defensive Correlation)


            PTS/G       YDS/PLAY    DL          LB          DB
PTS/G       1
YDS/PLAY    0.699952    1
DL          -0.12252    -0.07917    1
LB          -0.1488     -0.16073    -0.29759    1
DB          -0.23423    -0.12037    0.075338    0.016474    1

After the correlation analysis we move on to the multiple linear regression models, with two models each for offense and defense, one using yards per play as the DV and one using points per game (see Tables 11-14). All four of our regression models show an overall F-test p-value (Significance F) of less than 0.05, so all of our models as a whole are statistically significant, and we reject the null hypothesis that there is no link between the dependent variable and positional spending by NFL teams and accept the alternative hypothesis. The R-square values show that for both offensive models approximately 21% of the variance in the dependent variable is explained by positional spending. On defense, however, positional spending explains only about 10% of the variance in points per game allowed and about 5% of the variance in yards per play allowed. The individual p-values in our models show that RB and TE spending are not statistically significant in our offensive yards per play model, while TE spending is also not statistically significant in our offensive PPG model. On defense, DL and DB spending are not statistically significant in our YPP model. All other independent variables have a p-value of less than 0.05 and are statistically significant in their models. The coefficients of each regression model show how much we would expect the dependent variable to change for each additional dollar of positional spending. For example, in our regression model of offensive PPG, the QB coefficient of 1.216E-07 means we expect about 0.12 additional PPG scored for every $1 million increase in QB spending, or one additional PPG for roughly every $8.2 million.
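As a check on the units, the regression coefficients are expressed in points (or yards) per dollar, so converting one into a spending figure is a single division. The sketch below uses the QB coefficient from the offensive PPG regression table. The function names are hypothetical helpers for illustration.

```python
QB_COEFFICIENT = 1.216e-07  # points per game per dollar of QB spending (offensive PPG model)


def dollars_per_unit(coefficient):
    """Spending increase associated with a one-unit rise in the dependent variable."""
    return 1.0 / coefficient


def change_for_spend(coefficient, dollars):
    """Expected change in the dependent variable for a given spending increase."""
    return coefficient * dollars


print(round(dollars_per_unit(QB_COEFFICIENT)))                # dollars per additional point per game
print(round(change_for_spend(QB_COEFFICIENT, 1_000_000), 4))  # points per game per $1 million
```

The same conversion applies to every coefficient in the four regression tables, with negative defensive coefficients indicating fewer points or yards allowed as spending rises.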

Table 11 (Offense YPP Regression)

R Square    0.2170326

            Coefficients   P-value
Intercept   4.7706655      4.758E-93
QB          1.379E-08      8.038E-05
RB/FB       7.521E-09      0.3825386
WR          9.685E-09      0.0169592
TE          -4.45E-09      0.5884548
OL          1.641E-08      1.196E-05

Table 12 (Offense PPG Regression)

R Square    0.2183195

            Coefficients   P-value
Intercept   15.898148      1.075E-32
QB          1.216E-07      0.0001289
RB/FB       1.62E-07       0.0394717
WR          8.7E-08        0.0183193
TE          7.013E-08      0.3487209
OL          1.3E-07        0.0001245

Table 13 (Defense YPP Regression)


R Square    0.0549666

            Coefficients   P-value
Intercept   5.8256443      3.37E-113
DL          -4.80E-09      0.0829919
LB          -9.13E-09      0.008518
DB          -5.02E-09      0.1331677

Table 14 (Defense PPG Regression)


R Square    0.1001421

            Coefficients   P-value
Intercept   27.224364      5.33E-70
DL          -5.27E-08      0.0255012
LB          -7.82E-08      0.0081592
DB          -8.91E-08      0.0019034

As a final check of our regression models, we looked at the distribution of the residuals from each model (see Charts 15-18). In each of these histograms we see an approximately normal distribution, which is what we would expect to see. This matters because it is a validity check for our regression models; an abnormal distribution would tell us there is an issue with the model.


Chart 15

Chart 16
Chart 17

Chart 18
Discussion

After going through the results of my regression models, one of the clearest findings was the strength of QB and OL spending in the models for both points per game and yards per play, especially in comparison to the other offensive positions. This finding supports the common opinion that these are the most important positions to spend on (Schilling, n.d.). When looking at the league, we see that NFL teams on average follow the guidance that this model would suggest, with QB and OL having the two highest average salaries spent per position group on offense. Our model also suggests that WR is the most significant skill position group to spend on, more so than TE or RB/FB. I would hypothesize this is due to NFL offenses relying increasingly on passing during the years included in the model.

When we move over to defenses, one of the more eye-catching results from the two regression models was the lack of statistical significance for the DL and DB position groups in the defensive YPP model. I was personally surprised by this result, as many teams in the NFL are putting less focus on the LB position, which was the only position group found to be statistically significant in that same defensive YPP model. Fundamentally, however, we saw a minimal link between positional spending by NFL teams on defense and metrics of team success. I would hypothesize that this is because defenses are fundamentally less focused on individuals than offenses, which build around individual players like QBs.

In my eyes, one of the major practical takeaways from this research is that positional spending does not explain more of the variance in success. While positional spending as a whole explained approximately 21% of the variance in our offensive dependent variables, not an insignificant value, most of this appears to come from the QB position along with the OL. Seeing minimal impact from the skill positions on the variance of our dependent variables was my most surprising result, as we intuitively think of skill positions as having major impacts on the game, although overall the good and bad may even out. The value of these positions is something I would encourage other researchers to investigate by looking at the total win share value of the skill positions as a whole.

Another practical takeaway is the minimal amount of variance explained by positional spending on defense. Having only 5% of the variance in yards per play allowed, and 10% of the variance in points per game allowed, explained by positional spending is marginal, even in a league like the NFL where small margins matter. We also saw a large spread in the residuals from our regression models, especially our PPG model, which aligns well with the marginal impact of our independent variables on our dependent variables. Overall, I would say that there is minimal reason for NFL teams to spend resources optimizing positional spending on defense, as my opinion after undertaking this research is that positional spending has no tangible impact on the defensive side of football.

Overall, I would say that this research has changed my thoughts on positional spending in some ways, while solidifying what I thought in others. Finding that QB spending had a sizable impact on success wasn’t surprising to me, while finding that spending on a position like TE did not was not what I had expected. Going forward, one topic I would like to see explored is whether individual salaries at these positions make a difference, and whether most big-money contracts equate to value and win shares on the field. One major limitation of my study was that it only encompassed the 6 seasons from 2015 to 2020. While I feel that this was a large enough sample size to draw conclusions from the data, especially about the current NFL landscape, I would encourage researchers to look at how positional value has changed over time, especially as the NFL has transitioned from a run-first league to a pass-first one. I would also suggest that fellow researchers examine whether the significance that the regression models found at the WR position has held over time, or whether a position like RB was more significant in the past, when NFL offenses were more heavily reliant on running the ball.

Citations:

Hughes, A., Koedel, C., & Price, J. A. (2015). Positional WAR in the National Football League. Journal of Sports Economics, 16(6), 597–613. https://doi.org/10.1177/1527002515580931

Belson, K. (2021, March 10). N.F.L. salary cap drops to $182.5 million for 2021. The New York Times. Retrieved October 5, 2021, from https://www.nytimes.com/2021/03/10/sports/football/nfl-salary-cap.html

Schilling, P. (n.d.). It's not all about the quarterback. Samford University. Retrieved October 5, 2021, from https://www.samford.edu/sports-analytics/fans/2019/The-Art-of-Positional-Spending-Part-1

Assunção, R., & Pelechrinis, K. (n.d.). Sports analytics in the era of big data ... - liebertpub.com. Retrieved October 5, 2021, from https://www.liebertpub.com/doi/full/10.1089/big.2018.29028.edi

ESPN Internet Ventures. (n.d.). 2021 NFL Team Total Offense Stats. ESPN. Retrieved October 19, 2021, from https://www.espn.com/nfl/stats/team

NFL 2021 Reg - Offense Passing Stats. NFL.com. (n.d.). Retrieved October 19, 2021, from https://www.nfl.com/stats/team-stats/

NFL positional payrolls. Spotrac.com. (n.d.). Retrieved October 19, 2021, from https://www.spotrac.com/nfl/positional/breakdown/

SPAD 637 Sport Analytics Project Rubric

Rating levels: Poor, Satisfactory, Good, Excellent

Introduction (20 Points). Grade: 16/20
Poor: Difficult to understand the issue being investigated. No clear purpose statement provided. Minimal discussion of relevant stakeholder groups. No RQs provided.
Satisfactory: An issue is presented, but no context provided. No indication of study purpose. Mentions some stakeholder groups. RQs either not provided or not worded appropriately.
Good: Explains the issue but does not place it within a larger context. Purpose statement provided. Stakeholder groups listed. RQs present, but wording needs improvement.
Excellent: Clearly explains the issue within context. Well-written purpose statement. Thorough analysis of relevant stakeholder groups. Appropriately worded RQs.

Method (20 Points). Grade: 18/20
Poor: Does not mention IVs and DVs used for analysis. Very surface level explanation of data collection strategy. No indication of data analysis techniques/strategy.
Satisfactory: Identifies IVs and DVs, but does not explain what types of variables they are. Data collection strategy is not explained in detail. No coherent indication of data analysis techniques/strategy.
Good: Fully explains main IVs and DVs. Good explanation of data collection strategy. Indicates use of descriptive stats and inferential tests but does not explain analysis technique/strategy in detail.
Excellent: Fully explains main IVs and DVs. Detailed explanation of data collection strategy. Appropriate identification of descriptive statistics, inferential tests, and assumption checks to perform on the data.

Results (20 Points). Grade: 17/20
Poor: No indication of descriptive stats or inferential test results. No use of tables and figures to represent data.
Satisfactory: Does not provide descriptive stats. Inferential tests do not match what was outlined in Method. Very little use of tables and figures to represent data.
Good: Provides appropriate descriptive stats and inferential test results as outlined in Method section. Some use of tables and figures to represent data.
Excellent: Provides appropriate descriptive stats and inferential test results as outlined in Method section. Excellent use of tables and figures to represent data.

Discussion (20 Points). Grade: 18/20
Poor: Almost no actual discussion of results. No practical implications offered. Missing several of Limitations, Future Research, and/or Conclusion sections.
Satisfactory: Limited discussion of results. Maybe one practical implication provided. Possibly missing Limitations, Future Research, and/or Conclusion sections.
Good: Good discussion of results. Some practical implications provided. Well-written Limitations, Future Research, and Conclusion sections.
Excellent: Strong discussion of results. Excellent description of practical implications. Well-written Limitations, Future Research, and Conclusion sections.

Grammar, Writing, and Formatting (20 points). Grade: 18/20
Poor: Unprofessional document presentation. No clear and consistent formatting. Many spelling and grammar errors. Very little thought given to APA formatting.
Satisfactory: Formatting lacks consistency. Obvious grammar or spelling errors. Consistent issues with APA formatting throughout.
Good: Clean, consistent formatting throughout. Some minor grammar or spelling errors. Some issues with APA formatting.
Excellent: Professional presentation. Clean, consistent formatting. No or very minor grammar or spelling errors. Appropriate APA formatting throughout.

Total: 100 points

Overall: 87/100. Nice work on this project Scott!
