Commentary On David Mease's Paper On Penalised Maximum Likelihood Rankings For College Football

MATH3000: Information Skills in Mathematics
A Penalised Maximum Likelihood Approach for the Ranking of College Football Teams Independent of Victory Margins
Jonathan Windridge - Student No : 200603055
Introduction
The college football system is one of the biggest sporting organisations in America, the largest league of which is the National Collegiate Athletic Association (NCAA). NCAA teams are split across 3 divisions, with some divisions themselves split into subdivisions. The Football Bowl Subdivision (FBS, formerly Division 1-A) is the most elite level of college football in the USA, with 125 teams playing at this level[2]. Due to the size of this division, it is impractical for each team to play all others ($\binom{125}{2} = 7750$ possible games) over the course of a season, and so an alternative method is required to rank the teams, based on the fixtures that they have played. Traditionally, such a ranking was produced from the results of two polls: one run by the Associated Press and addressed to journalists and sports writers; the other run by USA Today and addressed to head coaches at various colleges in the league. As the author explains, this system was augmented at the beginning of the 1998 season with the introduction of the Bowl Championship Series (BCS), which used computer models in addition to the poll results to determine the top teams in the league[1, p.1]. The top teams then played each other in a series of bowl games to decide the end-of-season rankings. The computer models were introduced for a variety of reasons, chiefly because, unlike the humans involved in the polls, computer programs have no bias towards any particular team (as might be the case with college alumni involved in the poll) and can consider all the results in a 14-week season, something virtually impossible for a human to do accurately[1, p. 2], as can be seen in Section 2. Since the introduction of these computer models into the BCS system, they have been changed and updated a number of times by the league, with many more suggestions for replacements offered by statisticians in journal articles such as the subject of this report.
The author mentions a number of these models and their potential pitfalls; these are discussed in more detail in Section 3. In Section 4, an explanation is given of the mathematics behind the model proposed in the title article, including a breakdown of its various components. In Section 5, another journal article referencing David Mease's work is explored, whilst in Section 6 the author's model is applied to the BCS season just finished and compared to the official team rankings. Finally, in Section 7, we conclude with a look at the future of the NCAA championship and the role of statistics in helping to determine teams' rankings.
Motivating Example
Mease gives a small example schedule (5 teams, 8 fixtures) to demonstrate the difficulty of fairly ranking even a small league. It is recreated below, where "Team A vs Team B" means that Team A defeats Team B:

Team      Wins-Losses
Team A    2-1
Team B    2-0
Team C    2-1
Team D    2-1
Team E    0-5

(1) Team A vs Team C    (5) Team C vs Team D
(2) Team A vs Team E    (6) Team C vs Team E
(3) Team B vs Team A    (7) Team D vs Team E
(4) Team B vs Team E    (8) Team D vs Team E
Based solely on this win-loss record data, the following ranking might be suggested:

1. Team B: Has no losses and beat Team A, a strong opponent who would otherwise have been undefeated
2. Team A: Only loss was to Team B, the #1 ranked team
3. Team C: Won against Team D but lost to Team A
4. Team D: Only wins were over Team E, the worst-ranked team
5. Team E: Lost every game, including two against Team D, who without Team E would be the worst-ranked team

This argument is workable for an illustrative league such as this, but becomes increasingly complex as the number of teams increases to a more realistic level, and by the time it reaches the number of teams in the FBS it is virtually unworkable for a human.
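As a small illustration of why the records alone are not enough, the following Python sketch tallies wins and losses from the eight fixtures above. The function and variable names are introduced here for illustration only. Teams A, C and D all finish on 2-1, so separating them requires looking at who each team beat, which is exactly the reasoning used in the ranking above.

```python
from collections import Counter

# The eight fixtures from the example schedule, written as (winner, loser).
RESULTS = [
    ("A", "C"), ("A", "E"), ("B", "A"), ("B", "E"),
    ("C", "D"), ("C", "E"), ("D", "E"), ("D", "E"),
]


def win_loss_records(results):
    """Return {team: (wins, losses)} for a list of (winner, loser) pairs."""
    wins = Counter(winner for winner, _ in results)
    losses = Counter(loser for _, loser in results)
    teams = sorted(set(wins) | set(losses))
    return {team: (wins[team], losses[team]) for team in teams}


records = win_loss_records(RESULTS)
for team, (won, lost) in records.items():
    print(f"Team {team}: {won}-{lost}")
```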
Previous Methods
As the author notes [1, p. 4], the question of the ranking of college football teams is not a new problem, and several approaches have already been offered in various journal articles. In this section, an attempt is made to explain the basic reasoning behind the main models mentioned.
3.1 Harville's Linear Model
Harville's paper, whilst published well in advance of the 1998 introduction of the BCS, attempts to construct a rating system for a team's performance in a given league without encouraging the running up of the score (continuing to score points against a weaker team after the result has effectively been decided), a practice that is considered bad sportsmanship[3, p.278]. The model Harville proposes takes into consideration the effect of so-called home-field advantage and a baseline performance level for each team, as well as deviation from that level on a game-by-game basis[3, p. 279]. The model is proposed as follows:

$$y_k = x_k \lambda + b_{\alpha_1(k)} - b_{\alpha_2(k)} + c_{\alpha_1(k),\beta_1(k)} + c_{\alpha_2(k),\beta_2(k)} + e_k \qquad [3, p. 279] \quad (1)$$

where:
$y_k$ is the number of points team $\alpha_1(k)$ scores in game $k$ minus the number of points scored by team $\alpha_2(k)$
$x_k = 1$, $0$ or $-1$ depending on whether game $k$ is played on team $\alpha_1(k)$'s home field, on a neutral field, or on the road
$\lambda$ is an unknown parameter quantifying home-field advantage
$\beta_a(k) = j$ if game $k$ is the $j$th game played by team $\alpha_a(k)$
$b_i$ is the baseline performance of team $i$
$c_{ij}$ is the deviation in performance of team $i$ in its $j$th game
$e_k$ is a residual term

A ranking of the teams is then achieved via a maximum-likelihood estimate of the $b_i$ [3, p. 281]. However, to limit the incentive for running up the score, Harville also proposes an alternative version of the model in which the $y_k$ are truncated at a pre-determined maximum value $K$, with $K = 1$ and $K = 15$ being two examples given[3, p. 261]; Mease refers to these truncated systems when comparing the effectiveness of the model constructed within the article.
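The effect of truncating victory margins can be sketched in a few lines of Python. This is only an illustration under the assumption that truncation is applied symmetrically, so that a margin is clipped to the interval from $-K$ to $K$; the function name and the sample margins are made up for the example and are not taken from Harville's paper.

```python
def truncate_margin(margin, cap):
    """Clip a signed score difference to the interval [-cap, cap]."""
    return max(-cap, min(cap, margin))


# Hypothetical score differences for four games.
margins = [28, -3, 7, -40]

capped_at_15 = [truncate_margin(m, 15) for m in margins]
capped_at_1 = [truncate_margin(m, 1) for m in margins]

print(capped_at_15)  # [15, -3, 7, -15]: blowouts count no more than 15 points
print(capped_at_1)   # [1, -1, 1, -1]: only the winner of each game is retained
```

With a cap of 1 and whole-number margins, the truncated value records nothing beyond which team won, which is why such a system gives no reward for running up the score.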
3.2 Keener & the Bradley-Terry Model
In Keener's article, a variety of different possible comparison methods for sports teams are set out, the last two sections of which deal specifically with an application to the (at the time) Division 1-A college football teams. The following method is suggested as a possible ranking system. First, define a ranking vector $r$ such that

$$\pi_{ij} = \frac{r_i}{r_i + r_j} \qquad [4, p. 86] \quad (2)$$

where $\pi_{ij}$ is the probability that team $i$ beats team $j$. Next, take $a_{ij}$ to be 1 if team $i$ beats team $j$, and 0 otherwise. It is assumed that the outcome of each match is a Bernoulli trial determined by the values of $\pi_{ij}$, giving the probability of the observed outcomes $a_{ij}$ as

$$P = \prod_{i<j} \binom{a_{ij} + a_{ji}}{a_{ij}} \pi_{ij}^{a_{ij}} \pi_{ji}^{a_{ji}} = \prod_{i<j} \binom{a_{ij} + a_{ji}}{a_{ij}} \left( \frac{r_i}{r_i + r_j} \right)^{a_{ij}} \left( \frac{r_j}{r_i + r_j} \right)^{a_{ji}} \qquad [4, p. 87-89] \quad (3)$$
Since the outcomes are already known, the $r_i$ are chosen such that $P(r)$ is maximised, and the vector $r$ can be ordered to give a ranking of the teams. Keener suggests that a better choice for the $a_{ij}$ (as opposed to 0 or 1) would be

$$a_{ij} = \frac{S_{ij}}{S_{ij} + S_{ji}} \qquad [4, p. 90] \quad (4)$$

where $S_{ij}$ is the number of points team $i$ scores in the match and $S_{ji}$ is the number of points team $j$ scores in the match, and gives the model's ranking of teams in the 1989 season. However, this method has a major flaw: when the $a_{ij}$ are win-loss indicators, the maximum likelihood estimate of $r_i$ for any undefeated team ($i$, say) is infinite, and so for any season in which there is more than one undefeated team the model gives no means of determining the winner. As Mease notes, whilst this was not the case in the 1989 season, the varying schedule difficulty across the league can make it a frequent occurrence [1, p. 5].
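The unbounded estimate can be seen numerically on the example schedule from Section 2. The Python sketch below evaluates the logarithm of equation 3 with win-loss values for $a_{ij}$ (dropping the binomial coefficients, which do not depend on $r$), holding every rating at 1 except that of the undefeated Team B. The starting ratings and the grid of values for Team B are arbitrary choices made for illustration.

```python
import math

# Example schedule from Section 2 as (winner, loser) pairs.
RESULTS = [
    ("A", "C"), ("A", "E"), ("B", "A"), ("B", "E"),
    ("C", "D"), ("C", "E"), ("D", "E"), ("D", "E"),
]


def log_likelihood(ratings, results):
    """Log of equation 3 for win-loss outcomes, up to an additive constant."""
    return sum(
        math.log(ratings[winner] / (ratings[winner] + ratings[loser]))
        for winner, loser in results
    )


base = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0, "E": 1.0}
values = []
for r_b in (1.0, 10.0, 100.0, 1000.0):
    ratings = dict(base, B=r_b)  # change only the undefeated team's rating
    values.append(log_likelihood(ratings, RESULTS))
    print(f"r_B = {r_b:7.1f}   log-likelihood = {values[-1]:.4f}")
```

Because Team B never appears as a loser, every factor involving $r_B$ increases with $r_B$, so the log-likelihood keeps rising and no finite maximiser exists.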
The Proposed Model
The first assumption made is that $p_i$, the performance level of team $i$ in a given game, is distributed according to $p_i \sim N(\theta_i, 0.5)$, where $\theta_i$ is the team's baseline performance (for which we obtain estimates in order to produce a ranking). It is not explained why the value $\sigma^2 = 0.5$ is chosen, although I would suggest this is for convenience, as explained in equations 5 and 6. As Mease explains [1, p. 6], this can be viewed in one of two ways. Firstly, treating a team's performance level on any given day as random gives a reason why, although it is rare, very good teams can be defeated by much worse teams. Alternatively, if one takes the performance level on any given day to be a function of a large number of independent factors (such as injuries, player fatigue, mental attitude, ground conditions, weather, etc.), then the Central Limit Theorem can be invoked to claim that it is approximately normal. Next we assume that for any two teams $X$ and $Y$, $p_X$ is independent of $p_Y$, and that every game is won by whichever team has the greater $p$ value. From this, we obtain the probability of team $X$ defeating team $Y$ as follows. Define $W = p_X - p_Y$, so that

$$\Pr(\text{Team } X \text{ defeats Team } Y) = \Pr(p_X > p_Y) = \Pr(W > 0).$$

Since $p_X \sim N(\theta_X, 0.5)$ and $p_Y \sim N(\theta_Y, 0.5)$ are independent,

$$W \sim N(\theta_X - \theta_Y,\ 0.5 + 0.5) = N(\theta_X - \theta_Y,\ 1) \quad (5)$$

Writing $W = (\theta_X - \theta_Y) + Z$ with $Z \sim N(0, 1)$, and using the symmetry of $Z$ about zero, we have for specific teams $X$ and $Y$:

$$\Pr(\text{Team } X \text{ defeats Team } Y) = \Pr(Z > -(\theta_X - \theta_Y)) = \Phi(\theta_X - \theta_Y) \quad (6)$$

where:
$Z$ is a standard normal random variable
$\Phi$ is the standard normal cumulative distribution function

In this way, the choice of $\sigma^2 = 0.5$ may be seen simply as a way of conveniently obtaining a standard normal distribution for the probability of $X$ defeating $Y$. Finally, define the following:

$S$ = ordered pairs $(i, j)$ where team $i$ defeats team $j$, for both teams $i, j$ in the FBS (7)
$n_{ij}$ = number of times team $i$ defeats team $j$ (8)
$S^*$ = ordered pairs $(i, j)$ where team $i$ defeats team $j$, for one of $i, j = n + 1$ and the other in $\{1, \ldots, n\}$ (9)
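Equation 6 can be checked numerically. The Python sketch below computes $\Phi(\theta_X - \theta_Y)$ from the error function and compares it with a simulation in which each team's performance is drawn from a normal distribution with variance 0.5. The baseline values 1.0 and 0.3, the number of simulated games and the random seed are all arbitrary choices made for illustration.

```python
import math
import random


def std_normal_cdf(x):
    """Standard normal cumulative distribution function, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def win_probability(theta_x, theta_y):
    """Probability that team X defeats team Y under equation 6."""
    return std_normal_cdf(theta_x - theta_y)


def simulated_win_rate(theta_x, theta_y, games=100_000, seed=0):
    """Fraction of simulated games in which X's performance exceeds Y's."""
    rng = random.Random(seed)
    sd = math.sqrt(0.5)  # performances have variance 0.5
    wins = sum(
        rng.gauss(theta_x, sd) > rng.gauss(theta_y, sd) for _ in range(games)
    )
    return wins / games


exact = win_probability(1.0, 0.3)
approx = simulated_win_rate(1.0, 0.3)
print(f"Phi(0.7) = {exact:.4f}, simulated = {approx:.4f}")
```

The two figures should agree to within simulation error, and two teams with equal baselines each win with probability exactly one half.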
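Then the likelihood function $l(\theta)$ is given by:

$$l(\theta) = \underbrace{\prod_{(i,j) \in S} [\Phi(\theta_i - \theta_j)]^{n_{ij}}}_{\text{Part A}} \times \underbrace{\prod_{i=1}^{n} \Phi(\theta_i)\Phi(-\theta_i)}_{\text{Part B}} \times \underbrace{\Phi(\theta_{n+1})\Phi(-\theta_{n+1}) \prod_{(i,j) \in S^*} [\Phi(\theta_i - \theta_j)]^{n_{ij}}}_{\text{Part C}} \qquad [1, p. 7] \quad (10)$$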
This function can be broken down into three distinct parts, the reasoning and mathematics behind each of which will now be discussed.
4.1 Part A
Isolating part A, we have:

$$l_A(\theta) = \prod_{(i,j) \in S} [\Phi(\theta_i - \theta_j)]^{n_{ij}} \quad (11)$$
This part of the equation deals with the probability discussed in equation 6. When $l_A$ is maximised over values of $\theta$, teams obtain a large value of $\theta$ by defeating other teams with large $\theta$ values. As Mease explains, this mimics the thought process used by a human in Section 2 [1, p. 7].
4.2 Part B
Part A is not sufficient on its own, however, because, as mentioned in Section 3.2, definitions of the win probability of a form such as that in equation 6 give an unbounded estimate of $\theta$ for undefeated teams, making it impossible to choose a leader. Part B of equation 10 deals with this by introducing a penalty term. Isolating part B, we have:

$$l_B(\theta) = \prod_{i=1}^{n} \Phi(\theta_i)\Phi(-\theta_i) \quad (12)$$
The author suggests that this term can be viewed as a Bayesian prior distribution [1, p. 8], and it is this idea that shall be discussed first. Bayesian inference is used for the estimation of parameters and distributions from data, and works by combining prior assumptions about a dataset (made before it is examined) with the data contained within it to estimate the final result (known as the posterior); these prior assumptions are known as prior probabilities [6]. A Beta distribution, Beta($\alpha$, $\beta$), is used to describe such prior probabilities:

$$P(p; \alpha, \beta) = \frac{p^{\alpha - 1}(1 - p)^{\beta - 1}}{B(\alpha, \beta)} \qquad [7] \quad (13)$$

This describes the distribution of the probability value $p$. For situations in which prior probabilities should effectively be ignored (applicable in this situation as we are only basing our estimates on a given season's results rather than anything else), Beta(1, 1) is taken, which gives a uniform probability density. More generally, the normalising constant $B(\alpha, \beta)$ does not depend on $p$ and so does not affect where a maximum lies, leaving $P(p; \alpha, \beta) \propto p^{\alpha - 1}(1 - p)^{\beta - 1}$. In this case we define a parameter $\phi_X = \Phi(\theta_X)$ associated with each team $X$; this is the probability of team $X$ defeating another team $Y$ with $\theta_Y = 0$. Assigning a Beta distribution to each of the $\phi_i$, we get two important results: firstly, the posterior for the $\theta$ values is proportional to the unpenalised likelihood multiplied by the prior, so the prior enters the function to be maximised as a multiplicative penalty; secondly, that

$$\prod_{i=1}^{n} \phi_i^{\alpha - 1}(1 - \phi_i)^{\beta - 1} = \prod_{i=1}^{n} \Phi(\theta_i)^{\alpha - 1}(1 - \Phi(\theta_i))^{\beta - 1} \quad (14)$$

by the definition of $\phi_i$ earlier. It is explained that it was decided to set $\alpha = \beta$ to give an equal prior probability of win and loss to each team, whilst the value $\alpha = \beta = 2$ was chosen after comparing the model results for different values to the league rankings[1, p. 8]. With $\alpha = \beta = 2$, and since $1 - \Phi(\theta_i) = \Phi(-\theta_i)$, equation 14 reduces to equation 12. Alternatively, and perhaps more intuitively, equation 12 can be seen as the introduction of a new virtual team with a value of $\theta = 0$, against which every team has one win and one loss. This counteracts the problem with undefeated teams described at the beginning of this subsection, as each team now has one loss to this new virtual team.
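The effect of the penalty can be illustrated on the example schedule from Section 2, where all five teams belong to the same division and so only Parts A and B are needed. The Python sketch below maximises $\log l_A(\theta) + \log l_B(\theta)$ by fixed-step gradient ascent. This optimiser, its step size and its iteration count are choices made here for simplicity and are not the method used by Mease, who relies on standard statistical software.

```python
import math

RESULTS = [  # (winner, loser) pairs from the example schedule
    ("A", "C"), ("A", "E"), ("B", "A"), ("B", "E"),
    ("C", "D"), ("C", "E"), ("D", "E"), ("D", "E"),
]
TEAMS = ["A", "B", "C", "D", "E"]


def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal cumulative distribution function, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def score(x):
    """Derivative of log Phi(x) with respect to x."""
    return pdf(x) / cdf(x)


def gradient(theta, results):
    """Gradient of log l_A + log l_B with respect to each theta."""
    grad = {team: 0.0 for team in theta}
    for winner, loser in results:  # Part A: one term per game played
        s = score(theta[winner] - theta[loser])
        grad[winner] += s
        grad[loser] -= s
    for team, value in theta.items():  # Part B: virtual win and virtual loss
        grad[team] += score(value) - score(-value)
    return grad


def fit(results, teams, step=0.05, iterations=5000):
    """Fixed-step gradient ascent, starting from theta = 0 for every team."""
    theta = {team: 0.0 for team in teams}
    for _ in range(iterations):
        grad = gradient(theta, results)
        theta = {team: theta[team] + step * grad[team] for team in teams}
    return theta


theta_hat = fit(RESULTS, TEAMS)
for team in sorted(TEAMS, key=theta_hat.get, reverse=True):
    print(f"Team {team}: {theta_hat[team]: .3f}")
```

Unlike the unpenalised case, the undefeated Team B and the winless Team E both receive finite estimates, positive and negative respectively, because each is pulled back towards zero by its virtual loss or virtual win.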
4.3 Part C
At this point, all games played between FBS teams have been considered, and an adequate ranking could be achieved by multiplying the results of equations 11 and 12 together to give $l(\theta) = l_A(\theta)\, l_B(\theta)$. However, this does not take into consideration games played between FBS and non-FBS teams (which are not uncommon), since such non-FBS teams are not contained within the set $S$ as given in equation 7. This could be seen by some as a major flaw, because an FBS team that was undefeated save for a loss to a non-FBS team would be ranked extremely highly by such a model, despite having lost to a much weaker opponent. As a result, the final part of equation 10 deals with games against non-FBS opponents. It is mentioned as a possible alternative that the set $S$ could be expanded to include all teams that played against an FBS opponent, but those teams would then also have to be given a strength value in order to rank their difficulty, which would in turn require consideration of fixtures played against the league below; this continues until we require a ranking for every college football team in the United States, a task that would require considerable amounts of fixture information. Part C is offered as an alternative solution:
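$$l_C(\theta) = \underbrace{\prod_{(i,j) \in S^*} [\Phi(\theta_i - \theta_j)]^{n_{ij}}}_{\text{Part I}} \times \underbrace{\Phi(\theta_{n+1})\Phi(-\theta_{n+1})}_{\text{Part II}} \quad (15)$$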
4.3.1 Part I
From the definition of $S^*$ in equation 9, we group all non-FBS teams into a generic team $n + 1$ and from this produce a similar expression to that in Section 4.1; the two combine to cover all the games played by FBS teams over the course of the ranking period. This term also has the added property that since, in the vast majority of cases, team $n + 1$ will lose, $\theta_{n+1}$ has a very small estimated value (since it is the parameter value that maximises the probability of teams 1 to $n$ winning against non-FBS opponents). This means that should any of the $n$ teams have a loss to the generic non-FBS team, this will have a large downward impact on the ranking of that team, as should be expected from the explanation given above.
4.3.2 Part II
This is the equivalent of the term in equation 12 for the generic team $n + 1$, and penalises the likelihood in a similar fashion. Thus we obtain equation 10 as $l(\theta) = l_A(\theta)\, l_B(\theta)\, l_C(\theta)$.
Referencing Works
A year after the publication of Mease's paper, another article concerned with the same question appeared in The American Statistician, authored by the writers of several journal articles that had attempted to tackle the problem. Two main questions are raised repeatedly throughout this article: firstly, what is the ultimate purpose of the BCS rankings? Secondly, what other parameters might be considered as influencing teams' performances? Addressing the first question, Harville notes [5, p. 188] that if the main purpose of the computer rankings is to emulate the results of the human polls, then there appears to be little reason to include them in the first place. Stern explains that the ultimate purpose of the BCS rankings, selecting the top two teams in the league so that they can play in a bowl game to determine the winner, is badly defined: the weightings given to the polls and the various computer models used by the BCS are determined only by their ability to give a ranking that consistently agrees with public consensus (again, poorly defined) for previous years [5, p. 181]. He goes on to offer two possible definitions for the top two teams, based on previous articles. First, Stern proposes that the top teams should be those that would statistically be expected to defeat all others in a tournament at the end of the season [5, p. 182]. Reasons against the holding of such a tournament supposedly include concerns for player safety and a negative impact on players' academic performances [5, p. 180]. He also references the work of C. Morris, suggesting that teams should be ranked based on the outcome of a hypothetical tournament in which each team plays every other (impossible in practice for the reasons mentioned in Section 1)[5, p. 182]. Concerning the second question, Harville claims firstly that the advantage of playing another team at one's home ground is thought to be about four points[5, p. 188], and that most of the models proposed to date (including all of those used by the BCS) do not account for it.
Home-field advantage is thought to play a small but significant role in teams' performances, and for teams in conferences that play more games at their home ground than away, a rating system that does not account for this advantage will artificially inflate the team's rank, giving them perhaps more credit than is due. Additionally, whilst it was explained in Section 3.1 that scores are often ignored in models or at the very least restricted to a range of values, Harville goes on to suggest that a middle ground should be sought between a system that only focuses on win/loss records and one that considers full scores [5, p. 189]. He argues that whilst not restricting victory margins at all encourages unsportsmanlike conduct by allowing teams to run up scores in already decided games, restricting them entirely makes it impossible to rank undefeated teams that have in fact played a very weak schedule (allowing them to run up large numbers of points in a number of their games, which would otherwise increase their overall rank). In his section of the same article Mease expands on this point further, explaining that humans can distinguish between the cases of bad sportsmanship described above and one team outperforming the other for the full duration of the game [5, p. 193]. However, he goes on to reiterate the point made in Section 1 that it is impossible for the poll respondents to keep track of all the games played and make such a distinction for all games. Mease mentions another drawback of the polls: they tend to favour teams that traditionally perform well, and so rankings for one season can often be influenced by the successes of teams in previous years [5, p. 192]. None of the models account for this in their current forms; whilst on the surface this appears good in that each team's ranking is based solely on its performance for that season, it in fact discounts important information that might otherwise be useful, since a team might be performing well above where it was the previous year.
This leads onto another of Mease's points: whether rankings should be based on which teams will defeat which others, or on which team has accomplished more. He claims that the poll rankings seem to find a middle ground, suggesting that whilst a team seen as unlikely to beat another would probably be ranked below it, a team with a worse record would be unlikely to be ranked above a team with a better one, even if the first team was predicted to beat the second [5, p. 192]. In this way, the poll rankings exhibit behaviours typical of both schools of thought.
Computational Application
Mease mentions that the likelihood function proposed in equation 10 can be maximised using statistical software, and outlines a method of doing so [1, p. 9]. On his website, an R script file can be found that gives a method of ranking the basic conference scenario described in Section 2[8]. This script was adapted to produce a ranking for the whole of the FBS division for the 2013-14 season, and the results are shown below (compared to the official poll rankings). The full R code used is given in an appendix to this report, whilst fixtures and team lists were obtained from ESPN [9],[10].
Mease Rankings
Rank  Team               $\theta_i$ estimate
1     Florida State      1.64373
2     Auburn             1.56405
3     Alabama            1.38732
4     Michigan State     1.33871
5     Ohio State         1.33807
6     Stanford           1.26954
7     Missouri           1.24712
8     Baylor             1.18243
9     South Carolina     1.13252
10    Arizona State      1.12154
11    UCF                1.09745
12    Clemson            1.05601
13    Oregon             1.00338
14    Oklahoma           0.97188
15    Northern Illinois  0.93002
16    UCLA               0.92453
17    Louisville         0.90005
18    Oklahoma State     0.8815
19    LSU                0.86147
20    Georgia            0.82737
21    Fresno State       0.78858
22    Wisconsin          0.78208
23    Duke               0.78082
24    Notre Dame         0.7808
25    Texas A&M          0.75353

Associated Press
Rank  Team
1     Florida State
2     Auburn
3     Michigan State
4     South Carolina
5     Missouri
6     Oklahoma
7     Alabama
8     Clemson
9     Oregon
10    UCF
11    Stanford
12    Ohio State
13    Baylor
14    LSU
15    Louisville
16    UCLA
17    Oklahoma State
18    Texas A&M
19    USC
20    Notre Dame
21    Arizona State
22    Wisconsin
23    Duke
24    Vanderbilt
25    Washington

USA Today
Rank  Team
1     Florida State
2     Auburn
3     Michigan State
4     South Carolina
5     Missouri
6     Oklahoma
7     Clemson
8     Alabama
9     Oregon
10    Ohio State
10    Stanford
12    UCF
13    Baylor
14    LSU
15    Louisville
16    UCLA
17    Oklahoma State
18    Texas A&M
19    USC
20    Arizona State
21    Wisconsin
22    Duke
23    Vanderbilt
24    Notre Dame
25    Nebraska
As can be seen from the above tables, there are a number of similarities between the rankings, and the top two teams are agreed upon in all three. As we move further down the table, although on the whole the tables agree, there are some notable differences (for example, South Carolina is ranked 9th by the proposed model but 4th in both of the poll rankings, whilst Arizona State is ranked 10th by the model but 20th and 21st in the USA Today and AP polls respectively). There are a number of possible reasons for these disparities (and the others in the table). For example, point margins are entirely disregarded by Mease's model, whilst South Carolina may have won key games against tough opponents by point margins large enough to leave an impression on the human pollsters. On the whole, however, the model serves as a good approximation to human opinion. The most important point to note is that the top two ranked teams are the same across all three, as, after all, the main goal of the BCS rankings is to produce the top two teams that should play against each other for the national championship title.
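The size of each disagreement can be read off mechanically. The Python sketch below takes the model ranking and the Associated Press ranking exactly as listed in the tables above and computes, for every team that appears in both, the model rank minus the poll rank; the helper name `rank_gaps` is introduced here for illustration.

```python
MODEL = [
    "Florida State", "Auburn", "Alabama", "Michigan State", "Ohio State",
    "Stanford", "Missouri", "Baylor", "South Carolina", "Arizona State",
    "UCF", "Clemson", "Oregon", "Oklahoma", "Northern Illinois",
    "UCLA", "Louisville", "Oklahoma State", "LSU", "Georgia",
    "Fresno State", "Wisconsin", "Duke", "Notre Dame", "Texas A&M",
]
AP_POLL = [
    "Florida State", "Auburn", "Michigan State", "South Carolina", "Missouri",
    "Oklahoma", "Alabama", "Clemson", "Oregon", "UCF",
    "Stanford", "Ohio State", "Baylor", "LSU", "Louisville",
    "UCLA", "Oklahoma State", "Texas A&M", "USC", "Notre Dame",
    "Arizona State", "Wisconsin", "Duke", "Vanderbilt", "Washington",
]


def rank_gaps(model, poll):
    """Model rank minus poll rank for each team present in both lists."""
    model_rank = {team: place for place, team in enumerate(model, start=1)}
    poll_rank = {team: place for place, team in enumerate(poll, start=1)}
    return {
        team: model_rank[team] - poll_rank[team]
        for team in model
        if team in poll_rank
    }


gaps = rank_gaps(MODEL, AP_POLL)
largest = sorted(gaps.items(), key=lambda item: abs(item[1]), reverse=True)[:5]
for team, gap in largest:
    print(f"{team}: {gap:+d}")
```

A positive gap means the model places the team lower than the poll does, as with South Carolina, and a negative gap means the model places it higher, as with Arizona State.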
Conclusion
In this report, an attempt has been made to explain the mathematics behind the proposed model; to outline some similar work also concerned with the ranking of college football teams; and finally to give a practical application of the model to the most recent football season and compare it to the polls, the results of which the model attempts to emulate. As of the start of the 2014 season, FBS teams will no longer compete in the BCS, instead competing for four places in the College Football Playoff (CFP), in which two semi-final games and a final will decide an overall league champion [11]. The change in league structure also precludes the involvement of statistical analysis: a committee of 13 will exclusively decide on the rankings for each team, uninfluenced by polls or computer ranking models [11]. However, since this committee will not meet every week to produce a ranking, there does remain a very limited scope for such models; the polls will most likely continue to be run, and computer models can still be used to propose rankings in the weeks where no such official ranking announcement is made, although the importance and relevance of these results will be greatly reduced, as they will not carry any official status. It remains to be seen, considering all of the previous arguments against entirely human-based rankings, whether the results of this committee more accurately reflect public opinion than those of the BCS system which it replaces.
References
[1] Mease, D. A Penalised Maximum Likelihood Approach for the Ranking of College Football Teams Independent of Victory Margins. The American Statistician. [Online]. 2003, 57(4), pp.241-248. [Accessed 8th February 2014]. Available from: http://www.davemease.com/papers/football.pdf
[2] Sports Illustrated. College Football Teams NCAA FBS Teams. [Online]. 2014. [Accessed 15th February 2014]. Available from: http://sportsillustrated.cnn.com/football/ncaa/teams/divia.html
[3] Harville, D. The Use of Linear-Model Methodology to Rate High School or College Football Teams. Journal of the American Statistical Association. [Online]. 1977, 72(358), pp.278-289. [Accessed 17th February 2014]. Available from: http://www.jstor.org/stable/2286789
[4] Keener, J. P. The Perron-Frobenius Theorem and the Ranking of Football Teams. SIAM Review. [Online]. 1993, 35(1), pp.80-93. [Accessed 17th February 2014]. Available from: http://www-stat.wharton.upenn.edu/~steele/Courses/956/Ranking/RankingFootballSIAM93.pdf
[5] Stern, H.S. et al. Statistics and the College Football Championship. The American Statistician. [Online]. 2004, 58(3), pp.179-195. [Accessed 17th February 2014]. Available from: http://www.jstor.org/stable/27643551
[6] Statisticat. Prior Probabilities and Bayesian Inference. [Online]. 2014. [Accessed 19th February 2014]. Available from: http://www.bayesian-inference.com/priors
[7] Wikipedia. Beta distribution. [Online]. 2014. [Accessed 19th February 2014]. Available from: http://en.wikipedia.org/wiki/Beta_distribution#Bayesian_inference
[8] Mease, D. R code for David Mease's Football Rankings. [Online]. 2007. [Accessed 25th February 2014]. Available from: http://www.davemease.com/football/Rcode.html 2007
[9] ESPN. NCAA College Football Teams. [Online]. 2014. [Accessed 25th February 2014]. Available from: http://espn.go.com/college-football/teams
[10] ESPN. 2013 NCAA Division I-A NCAA Football Scores and Schedules. [Online]. 2014. [Accessed 25th February 2014]. Available from: http://espn.go.com/college-football/schedule
[11] Wikipedia. College Football Playoff. [Online]. 2014. [Accessed 27th February 2014]. Available from: http://en.wikipedia.org/wiki/College_Football_Playoff
Appendix
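Below is the R code used for the computer model in Section 6, as adapted from [8]:

    nteams <- 127   # number of team columns in the design matrix
    ngames <- 820   # number of game rows read from the results file

    # Each row of the results file gives the team numbers of the winner and the loser
    imatrix <- as.matrix(read.table("Football Results.txt", header = FALSE, nrows = ngames))

    # One design-matrix row per game, plus two penalty rows per team
    dmatrix <- matrix(0, (ngames + 2 * nteams), nteams)
    for (i in 1:ngames) {
      dmatrix[i, imatrix[i, 1]] <- 1      # winner
      dmatrix[i, imatrix[i, 2]] <- (-1)   # loser
    }
    for (i in 1:nteams) {
      dmatrix[(ngames + i), i] <- 1              # virtual win
      dmatrix[(ngames + nteams + i), i] <- (-1)  # virtual loss
    }

    # Probit regression with no intercept and every response equal to 1
    model <- glm(rep(1, (ngames + 2 * nteams)) ~ dmatrix - 1, family = binomial(link = probit))
    summary(model)
    write.table(model["coefficients"], file = "results.txt", sep = "\t")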
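As a sketch of why this regression maximises a likelihood of the form of equation 10, suppose that $d_k$ denotes row $k$ of dmatrix and $y_k$ the corresponding response; this notation is introduced here for illustration and does not appear in [8]. For a probit regression without an intercept in which every response equals 1, the likelihood reduces as follows:

```latex
\begin{align*}
L(\theta) &= \prod_{k} \Phi(d_k^{\top}\theta)^{y_k}
             \bigl[1 - \Phi(d_k^{\top}\theta)\bigr]^{1 - y_k}
           = \prod_{k} \Phi(d_k^{\top}\theta)
           \quad \text{when every } y_k = 1, \\
d_k^{\top}\theta &= \theta_i - \theta_j
           \quad \text{for a game row with winner } i \text{ and loser } j, \\
d_k^{\top}\theta &= \theta_i \ \text{ or } \ -\theta_i
           \quad \text{for the two extra rows belonging to team } i.
\end{align*}
```

Multiplying these factors together gives one term $\Phi(\theta_i - \theta_j)$ per game and one term $\Phi(\theta_i)\Phi(-\theta_i)$ per team, including the generic non-FBS team if it is given its own column, which matches the structure of equation 10.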