
Welcome to part seven of “Forecasting Legends: Building Classification Models to Predict MLB
Hall of Famers.” Throughout this series, I have put several classification models to work to
estimate a player’s chances of getting into the Hall of Fame at various checkpoints in their
career. I also used these models to look ahead and assess which players are destined for the
Hall of Fame. In this final part of the series, I review all the models I worked with and look
ahead to next year’s Hall of Fame candidates.
For those who may have missed my previous work in this series, please click on any of the
links below. Each link focuses on a different time frame. Join me one final time as I look to
unravel the secrets behind the road to baseball immortality!
• Overview & First 5 Years
• First 7 Years
• First 10 Years
• First 12 Years
• First 15 Years
• Entire Career
• Review & 2024 MLB Hall of Fame
Review of Models (Validation Set)
Of all the models that were tested, the most accurate turned out to be the one that examined
the first 12 years of a player’s career. On the validation set, it produced an accuracy of 95.02%
with an area under the curve (AUC) of 0.97577. The First 15 Years model posted the highest AUC
(0.98584) but a slightly lower accuracy. As I expanded the time period for evaluation, accuracy
generally rose. This makes sense because a player who produces early in their career will not
necessarily sustain those numbers for the rest of it. For example, Darryl Strawberry had a
97.38% chance of making it into the Hall of Fame after five seasons. His production cratered as
his career went along, and he eventually missed out on a spot in the Hall of Fame. Therefore,
the more games a player has under their belt, the easier it is to predict whether they will be
a Hall of Famer. There is one exception to this rule, however. The “Entire Career” model was
only the fifth most accurate model. My assumption is that this occurred because of how vague
the term “Entire Career” is. For many players, an entire career is five seasons, while for
others it is 15. This had a direct impact on how the model evaluated each of the records in the
dataset and ultimately on how it made its predictions.
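For readers who want to see what the two validation metrics measure, here is a minimal
pure-Python sketch. It is not the code used in this series. The labels and probabilities are
made up for illustration, and the 0.5 cutoff used for accuracy is an assumption.

```python
def accuracy(labels, probs, threshold=0.5):
    """Share of players whose thresholded prediction matches the true label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    correct = sum(1 for y, y_hat in zip(labels, preds) if y == y_hat)
    return correct / len(labels)


def auc(labels, probs):
    """Chance that a random Hall of Famer is scored above a random non-Hall of Famer.

    Ties count as half a win. This pairwise definition is equivalent to the
    area under the ROC curve.
    """
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical validation set: 1 = Hall of Famer, 0 = not (illustrative values only)
labels = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.97, 0.81, 0.35, 0.62, 0.20, 0.10, 0.05, 0.02]

val_accuracy = accuracy(labels, probs)
val_auc = auc(labels, probs)
print(f"accuracy = {val_accuracy:.2%}, AUC = {val_auc:.5f}")
```

Accuracy depends on the chosen cutoff, while AUC only depends on how well the model orders
players, which is why the two metrics can rank models differently.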
RANK  MODEL           ACCURACY (%)  AREA UNDER THE CURVE (AUC)
1     First 12 Years  95.02         0.97577
2     First 15 Years  94.38         0.98584
3     First 10 Years  93.85         0.97402
4     First 7 Years   92.03         0.95812
5     Entire Career   91.24         0.94404
6     First 5 Years   89.21         0.95555
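The ranking above is by accuracy. As a quick check, the short sketch below sorts the same
validation results by each metric. The numbers are copied from the table; the variable names
are only illustrative.

```python
# (model, validation accuracy in %, validation AUC), copied from the table above
results = [
    ("First 12 Years", 95.02, 0.97577),
    ("First 15 Years", 94.38, 0.98584),
    ("First 10 Years", 93.85, 0.97402),
    ("First 7 Years", 92.03, 0.95812),
    ("Entire Career", 91.24, 0.94404),
    ("First 5 Years", 89.21, 0.95555),
]

# Sort from best to worst on each metric
by_accuracy = [name for name, acc, _ in sorted(results, key=lambda r: r[1], reverse=True)]
by_auc = [name for name, _, area in sorted(results, key=lambda r: r[2], reverse=True)]

print("By accuracy:", by_accuracy)
print("By AUC:     ", by_auc)
```

Sorted by AUC, the First 15 Years model moves to the top and the “Entire Career” model drops to
last, so which model counts as “best” depends on the metric.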
2024 MLB Hall of Fame
Per Baseball Reference, 22 hitters will be eligible for the Hall of Fame in 2024 (all players
who played at least ten seasons and have a score of at least 10 in the HOF Monitor).[1] The list
includes players who fell short of the Hall of Fame a year ago but received enough votes to stay
on the ballot. One example is Todd Helton, who tallied 72.2% of the vote (just short of the 75%
threshold) and will be on the ballot for the sixth time. The list also includes players who will
be on the ballot for the first time, among them Adrian Beltre, Chase Utley, and Joe Mauer.
Overall, there are 12 first-timers and 10 returners.
Among this group of players, Alex Rodriguez has the highest probability of getting into
the Hall of Fame. Rodriguez hit the ground running as soon as he got to the big leagues. In his
first ten seasons, he accumulated 63.6 WAR while posting a slash line of .308/.382/.581 with 345
HRs and 990 RBIs. At this point, Rodriguez had a 99.99% chance of getting into the Hall of
Fame. In his final 12 seasons, he accumulated 54.0 WAR while posting a slash line
of .283/.378/.523 with 351 HRs and 1,096 RBIs. At the conclusion of his career, Rodriguez had a
100% chance of getting into the Hall of Fame. Overall, in 22 seasons with three different teams,
Rodriguez was honored at 14 All-Star Games and was awarded 3 MVPs, 2 Gold Gloves, and 10
Silver Sluggers. This upcoming year will mark his third year on the ballot. So far, he has peaked
at 35.7% of the vote. While his numbers suggest he is a lock for the Hall of Fame, his use of
steroids may ultimately hurt his chances of joining the best of the best.
Chase Headley has the lowest probability of any player in this group (10.05%). In his
first five seasons, Headley accumulated 7.9 WAR while posting a slash line of .269/.343/.392
with 36 HRs and 204 RBIs. At this point, Headley had a 47.79% chance of getting into the Hall
of Fame. His final seven seasons did not help his chances. He accumulated 18.0 WAR
while posting a slash line of .259/.342/.403 with 94 HRs and 392 RBIs. He was also awarded
a Gold Glove and a Silver Slugger during his career. From the conclusion of his fifth season
to the final day of his career, his chances of getting into the Hall of Fame fell from 47.79% to
10.05%, a relative decrease of 78.98%. This year will mark Headley’s first year on the ballot,
but it is unlikely he will get enough votes to either stay on the ballot or get elected.
Gary Sheffield is slated to be on the ballot for the tenth and final time this upcoming year.
Some notable players to get selected in their tenth year on the ballot are Larry Walker (2020),
Edgar Martinez (2019), and Tim Raines (2017). Last year (his ninth year on the ballot), Sheffield
received 55.0% of the vote, the most he has received thus far. The start of his career was
anything but smooth sailing. In his first five seasons, Sheffield accumulated 7.7 WAR while
posting a slash line of .283/.341/.444 with 54 HRs and 233 RBIs. At this point, he had a 34.58%
chance of getting into the Hall of Fame. In his final 17 seasons, he accumulated 52.9 WAR while
posting a slash line of .294/.404/.529 with 455 HRs and 1,443 RBIs. Throughout his career, he
was honored at 9 All-Star Games and was awarded 5 Silver Sluggers. At the conclusion of his
career, Sheffield had a 98.93% chance of getting into the Hall of Fame (a 186.07% relative
increase from the end of his fifth season). We will see if he can capture the attention of more
voters this year and get over the hump.
[1] https://www.baseball-reference.com/about/leader_glossary.shtml#hof_monitor
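The percentage changes quoted for Headley and Sheffield are relative changes in probability, not
percentage-point differences. A minimal sketch of that arithmetic follows, using the rounded
probabilities quoted above. The function name is my own, and the small gap between these results
and the quoted 78.98% and 186.07% is presumably due to rounding of the published probabilities.

```python
def relative_change(start_pct, end_pct):
    """Relative change from start to end, expressed in percent."""
    return (end_pct - start_pct) / start_pct * 100


# Probabilities (in %) after five seasons and at the end of each career, as quoted above
headley_change = relative_change(47.79, 10.05)    # about -78.97
sheffield_change = relative_change(34.58, 98.93)  # about +186.09

print(f"Headley:   {headley_change:+.2f}%")
print(f"Sheffield: {sheffield_change:+.2f}%")
```

In percentage points, the same moves are a drop of 37.74 points for Headley and a gain of 64.35
points for Sheffield.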
While it is far from a guarantee that all 22 of these players will get into the Hall of Fame,
the numbers suggest that 19 of these players have a great shot at joining an elite class.

Figure 1: 2024 MLB HOF Ballot w/ Probabilities

Conclusion
Can any insights be gained from a player’s first several years in the big leagues? Yes and
no. While a player may begin their career on a high note, there is never a guarantee that the
player will sustain that success. In fact, most players who succeed in their first several seasons
find it hard to replicate that success as numerous factors come into play. Whether it has to do
with pitchers learning their tendencies or an inability to stay healthy, flourishing in the batter’s
box is no easy task. There is a reason the average career length of an MLB player is 5.6 years.[2]
Sustained success is simply hard. Therefore, as a player moves into his tenth season and beyond,
there is a bigger sample size to draw from. While it is still not a given that a player will make it
to the Hall of Fame based on longevity alone, they already have a leg up on the hundreds
of players who failed to lengthen their careers. Again, it was determined that the most predictive
power can be gained after a player’s 12th season. For players such as Mike Trout, Paul
Goldschmidt, and Anthony Rizzo, who completed their 12th season in 2022, a case can be
made for or against their Hall of Fame candidacy since they have been in the league and have
produced for quite some time. While players such as Ronald Acuna and Juan Soto have already
cemented themselves as faces of baseball and put themselves on the Hall of Fame highway,
baseball is a tricky sport, and the future is full of the unknown. At the end of the day, the Hall of
Fame status of players everywhere is determined by the BBWAA, not a comprehensive machine
learning model.
[2] https://mlbrun.com/average-career-length-of-mlb
