Mark E. Johnson SportMetrika, Inc. maejohns@yahoo.com 4 October 2005

Outline

1. Why a mathematician in sports?

2. Quantitative Decision Making

3. What is “statistics”?

4. Related problems in sports statistics

5. Some other examples of mathematics in industry

6. Getting here

7. Future directions

A mathematician in sports?

• Sports is just behind the times; businesses weren’t always analytical either

• Off-season decision making

• Moneyball

– Finding market inefficiencies

– Convincing others of a new way to look at things

**From Michael Lewis’ Moneyball**

"... OPS was the simple addition of on-base and slugging percentages. Crude as it was, it was a much better indicator than any other offensive statistic of the number of runs a team would score. Simply adding the two statistics together, however, implied that they were of equal importance. If the goal was to raise a team's OPS, an extra percentage point of on-base was as good as an extra percentage point of slugging. Before his thought experiment Paul (DePodesta) had felt uneasy with this crude assumption; now he saw that the assumption was absurd. An extra point of on-base percentage was clearly more valuable than an extra point of slugging percentage -- but by how much? … In his model an extra point of on-base percentage was worth three times an extra point of slugging percentage."

**A little history of statistics and baseball**

• Bill James, the pioneer.

• SABR – Society for American Baseball Research

– http://www.sabr.org

**• Some websites and information:**

– Books

• Bill James Historical Abstracts

• “Baseball Hacks”, by Joe Adler

– Some sites

• http://www.baseballprospectus.com/

• http://www.Baseball-Reference.com

• http://www.Retrosheet.org

– Other resources

• http://www.sportmetrika.com/resources.php


**Sports Decision Making**

• League-level decisions

– Commissioner

• Team-level decisions

– General Manager

• Playing the game

– Field Manager

League-level

• Schedules

– An optimization problem

• Maximize chances of close races at end of season

• Maximize attendance

• Play every team N times, etc…

• No more than X days on the road at a time

• Playoff Format

– 5-game playoff versus 7-game playoff

– A wildcard? Why only 8 teams?

**Team-level: Player Evaluation**

• A player is an investment to a sports team’s owner. When is this a good or a bad investment?

• Opportunities to acquire a player:

– Amateur Draft

– Trades

– Free agents

– Contract extensions

• How can we predict how good a player will be in the future?

– Is there a sufficient amount of data to analyze and project a player’s future performance?

– How do we analyze it?

• Sure, a good player will increase a team’s chances of winning a game, but how much $ is that worth? Is that the only strategy?

**Player evaluation, continued**

• Are the data sufficiently detailed to allow us to remove as much context as possible from events?

– Ballparks affect measurements

– So does the strength of your competition…

– A pitcher throws differently to a batter, depending on the situation.

• One’s first objective would be to remove as much context as possible from the observations, so that you compare apples to apples.

– This can be done with wisely chosen mathematical models.

**In-game strategies**

• Are there enough data to answer questions like these?

– Setting a lineup

• Should the “best” hitter bat 3rd, 4th, or 5th?

• Is the 9th slot really the best place for the pitcher?

– Bunting

• When should you give up the opportunity for bigger hits by sacrificing a player’s at-bat?

– Intentional Walks

• Do too many teams walk Barry Bonds?

– Stealing bases

• What success rate should you expect out of a player before you start signaling that he attempt to steal a base?

**What are “statistics”?**

• Fans of sports use the term statistics to refer to what they read in the morning newspaper.

– A journalist tells stories by using story-telling statistics

• A mathematician cares about how players and teams are going to do in the future, so they try to define measurements that can be used to predict expected DESIRED outcomes.

**Some quick examples**

• What wins games?

– Data from The Baseball Archive http://www.baseball1.com

– All MLB teams over the last 10 seasons

**Runs scored versus wins**

[Figure: runs scored versus wins]

**Hits versus wins**

[Figure: hits versus wins]

**Runs allowed versus wins**

[Figure: runs allowed (RA) versus wins; axis ticks from 50 to 120]

Relationships among common statistics

From Joe Adler’s online article “Analyzing Baseball Statistics Using R”: http://www.oreillynet.com/pub/a/network/2004/10/27/baseball.html

**What scores runs? (recall Moneyball quote)**

[Figure: four panels, one each for AVG, OPS, OPS2, and OPS3; the shared axis runs from 450 to 1050]

**Return to Moneyball for a moment…**

• Why a 3? Why not a 2?

• Linear regression can be used to determine a best fit of A, B, and C in the linear model: A * OBP + B * SLG + C = Runs

• What is A / B?
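A minimal sketch of the regression described on this slide, assuming NumPy is available. The data are synthetic: the "true" coefficients are arbitrary values chosen only to check that the fit recovers them, and they are not estimates of the real A and B. With real data, the three arrays would hold team OBP, SLG, and runs scored per season.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic team-seasons. These numbers are made up for illustration only;
# real inputs would be team OBP, SLG, and runs scored per season.
n = 300
obp = rng.uniform(0.30, 0.36, n)
slg = rng.uniform(0.36, 0.48, n)
true_a, true_b, true_c = 3000.0, 1500.0, -900.0  # arbitrary, not estimates
runs = true_a * obp + true_b * slg + true_c + rng.normal(0.0, 10.0, n)

# Least-squares fit of  A * OBP + B * SLG + C = Runs
X = np.column_stack([obp, slg, np.ones(n)])
(a, b, c), *_ = np.linalg.lstsq(X, runs, rcond=None)

ratio = a / b
print(f"A = {a:.0f}, B = {b:.0f}, C = {c:.0f}, A / B = {ratio:.2f}")
```

The ratio A / B is the quantity the Moneyball quote is about: how many points of slugging one point of on-base percentage is worth under this particular model.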

**It depends on how you try to answer the question…**

• This is most likely not the right way to answer this question…

– Consider outliers due to “odd” seasons, such as the numbers that may be generated because Barry Bonds is on your team, or because you play in Coors Field.

– Or, perhaps a least-squares distance metric is not the best choice.

• Or, this may not be the best question to ask…

• Pending editorial review, see my Hack in Joe Adler’s “Baseball Hacks” for more on this subject.

**Defining States in baseball**

• Baseball is discrete

• Events can be recorded, as can the state of affairs before and after the event.

• For the time being, consider a state as being the (number of outs, base-runners) pair.
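As a small illustration of the (number of outs, base-runners) pair, the states can be enumerated directly. The label strings and the names `BASES`, `OUTS`, and `STATES` are assumptions made for this sketch, not part of the original slides.

```python
from itertools import product

# The eight possible base-runner configurations (labels are illustrative).
BASES = ["Empty", "1st", "2nd", "3rd",
         "1st and 2nd", "1st and 3rd", "2nd and 3rd", "Loaded"]

# Outs recorded so far in the half-inning; the third out ends it.
OUTS = [0, 1, 2]

# Each state is an (outs, base-runners) pair.
STATES = list(product(OUTS, BASES))

print(len(STATES))  # 24
print(STATES[0])    # (0, 'Empty')
```

Three out counts times eight base-runner configurations gives 24 states, which is why a table indexed this way has 24 cells.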

**State transitions and Run Expectancy**

• The number of runs expected in the remainder of the inning, under average conditions.

• These values were taken directly from http://www.tangotiger.net, but can be easily derived either by:

• Averaging outcomes using play-by-play data.

| RE 99-02 | Empty | 1st | 2nd | 3rd | 1st and 2nd | 1st and 3rd | 2nd and 3rd | Loaded |
|----------|-------|-----|-----|-----|-------------|-------------|-------------|--------|
| 0 outs   | .56   | .95 | 1.19 | 1.48 | 1.57 | 1.90 | 2.05 | 2.42 |
| 1 out    | .30   | .57 | .73  | .98  | .97  | 1.24 | 1.47 | 1.65 |
| 2 outs   | .12   | .25 | .34  | .39  | .47  | .54  | .63  | .82  |
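A rough sketch of the "averaging outcomes" approach, under assumptions: the record layout (one tuple per play, holding the state before the play and the runs scored on it) and the function name `run_expectancy` are hypothetical, and the two sample half-innings are made up. Real play-by-play files would need to be parsed into this shape first.

```python
from collections import defaultdict

def run_expectancy(half_innings):
    """Average runs scored from each state to the end of the half-inning.

    half_innings: list of half-innings; each is a list of
    (outs, bases, runs_on_play) tuples in chronological order, where
    (outs, bases) is the state BEFORE the play.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for plays in half_innings:
        # Runs still to come, as seen from the first play of the half-inning.
        remaining = sum(runs for _, _, runs in plays)
        for outs, bases, runs in plays:
            totals[(outs, bases)] += remaining
            counts[(outs, bases)] += 1
            remaining -= runs  # runs on this play are no longer "to come"
    return {state: totals[state] / counts[state] for state in totals}

# Made-up data: a two-run half-inning and a scoreless one.
sample = [
    [(0, "Empty", 0), (0, "1st", 0), (1, "1st", 2),
     (1, "Empty", 0), (2, "Empty", 0)],
    [(0, "Empty", 0), (1, "Empty", 0), (2, "Empty", 0)],
]
table = run_expectancy(sample)
print(table[(0, "Empty")])  # 1.0
```

With only two half-innings the averages are meaningless as estimates; the point is the bookkeeping, which scales unchanged to several seasons of data.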

**When should you steal a base?**

• (0,1st) = 0.95

• (0,2nd) = 1.19

• (1,none) = 0.30

• Success is worth +0.24 runs

• Failure is worth -0.65 runs

• Failure loses 2.71 times what success gains

• When is it worth the risk? When you succeed at least 2.71 times as often as you fail (about a 73% success rate).
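The break-even arithmetic on this slide can be checked in a few lines. This is a sketch using only the three run expectancy values quoted on the slide; the variable names are illustrative.

```python
# Run expectancy values quoted on the slide.
re_start = 0.95    # 0 outs, runner on 1st
re_success = 1.19  # 0 outs, runner on 2nd
re_failure = 0.30  # 1 out, bases empty

gain = re_success - re_start  # runs gained by a successful steal
loss = re_start - re_failure  # runs lost when caught stealing

# Break-even success rate p solves  p * gain = (1 - p) * loss.
break_even = loss / (gain + loss)

print(f"gain = {gain:.2f}, loss = {loss:.2f}, loss / gain = {loss / gain:.2f}")
print(f"break-even success rate = {break_even:.1%}")
```

This covers only the one situation on the slide (runner on 1st, nobody out); the same calculation with other cells of the run expectancy table gives a different threshold for each state.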

**Other sports and other applications**

• Other sports

– Football

• College: BCS ranking system… how do you determine a ‘best team’ with very little data?

• NFL: the New England Patriots have won the Super Bowl in three of the last four seasons and are known to be very analytical in their game calling…

– When to punt on fourth down, as a function of: time left in game, score, field position, number of timeouts left, etc…

– Basketball

• Dean Oliver (“Basketball on Paper”), worked for the Seattle SuperSonics

• http://www.82games.com

• Mark Cuban & Jeff Sagarin

– Tennis

• Ranking & Tournament Scheduling

• Other applications

– Fantasy sports

• Constructing a team, but with a different scoring system

– Gambling

• Betting against the line-makers; when to bet against the line or popular demand.

**Other applications of mathematics in industry**

• Yahoo

– Ad targeting and pricing

– E-commerce

• Calculating product affinities (up-sells and cross-sells)

• Finding what you want

– Search algorithms

– Site testing

– Web Traffic Arbitrage

• Netflix

– Supply and demand modeling

– Product recommendation

• Entelos

– Mathematical modeling of disease

– Simulations of human response to disease

