
Columbia University Seminar

in Applied Mathematics
Mark E. Johnson
SportMetrika, Inc.
maejohns@yahoo.com
4 October 2005
Outline
1. Why a mathematician in sports?
2. Quantitative Decision Making
3. What is “statistics”?
4. Related problems in sports statistics
5. Some other examples of mathematics in
industry
6. Getting here
7. Future directions
A mathematician in sports?
• Sports is just behind the
times; businesses weren’t
always analytical either
• Off-season decision making
• Moneyball
– Finding market inefficiencies
– Convincing others of a new way
to look at things
From Michael Lewis’ Moneyball
"... OPS was the simple addition of on-base and slugging percentages.
Crude as it was, it was a much better indicator than any other
offensive statistic of the number of runs a team would score. Simply
adding the two statistics together, however, implied that they were
of equal importance. If the goal was to raise a team's OPS, an extra
percentage point of on-base was as good as an extra percentage
point of slugging.
Before his thought experiment Paul (DePodesta) had felt uneasy
with this crude assumption; now he saw that the assumption was
absurd. An extra point of on-base percentage was clearly more
valuable than an extra point of slugging percentage -- but by how
much? … In his model an extra point of on-base percentage was
worth three times an extra point of slugging percentage."
A little history of statistics and
baseball
• Bill James, the pioneer.
• SABR – Society for American Baseball Research
– http://www.sabr.org
• Some websites and information:
– Books
• Bill James Historical Abstracts
• “Baseball Hacks”, by Joe Adler
– Some sites
• http://www.baseballprospectus.com/
• http://www.Baseball-Reference.com
• http://www.Retrosheet.org
– Other resources
• http://www.sportmetrika.com/resources.php
Video
• [insert video here]
Sports Decision Making
• League-level decisions
– Commissioner
• Team-level decisions
– General Manager
• Playing the game
– Field Manager
League-level
• Schedules
– An optimization problem
• Maximize chances of close races at end of season
• Maximize attendance
• Play every team N times, etc…
• No more than X days on the road at a time
• Playoff Format
– 5-game playoff versus 7-game playoff
– A wildcard? Why only 8 teams?
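One way to make the 5-game versus 7-game question concrete is a minimal sketch under a strong simplifying assumption: the better team wins each game independently with a fixed probability p (no home-field or pitching-rotation effects). The function name series_win_probability is hypothetical, chosen for illustration only.

```python
from math import comb


def series_win_probability(p: float, games: int) -> float:
    """Probability of winning a best-of-`games` series.

    Assumes (for illustration only) that each game is won independently
    with the same probability p. `games` must be odd.
    """
    if games % 2 == 0:
        raise ValueError("games must be odd")
    needed = games // 2 + 1
    # Winning the series is equivalent to winning at least `needed`
    # games if all `games` games were played out.
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(needed, games + 1)
    )


if __name__ == "__main__":
    for p in (0.50, 0.55, 0.60):
        best_of_5 = series_win_probability(p, 5)
        best_of_7 = series_win_probability(p, 7)
        print(f"p={p:.2f}  best-of-5={best_of_5:.3f}  best-of-7={best_of_7:.3f}")
```

Under this assumption a longer series favors the better team slightly more, which is one input (among attendance, scheduling, and others) into the playoff-format decision.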
Team-level: Player Evaluation
• A player is an investment to a sports team’s owner. When is this a
good or a bad investment?
• Opportunities to acquire a player:
– Amateur Draft
– Trades
– Free agents
– Contract extensions
• How can we predict how good a player will be in the future?
– Is there a sufficient amount of data to analyze and project a player’s
future performance?
– How do we analyze it?
• Sure, a good player will increase a team’s chances of winning a
game, but how much $ is that worth? Is that the only strategy?
Player evaluation, continued
• Is the data sufficiently detailed to allow us to
remove as much context as possible from events?
– Ballparks affect measurements
– So does the strength of your competition…
– A pitcher throws differently to a batter, depending on
the situation.
• One’s first objective would be to remove as
much context as possible from the observations, so that you
compare apples to apples.
– This can be done with wisely chosen mathematical
models.
In-game strategies
• Are there enough data to answer questions like
these?
– Setting a lineup
• Should the “best” hitter bat 3rd, 4th, or 5th?
• Is the 9th slot really the best place for the pitcher?
– Bunting
• When should you give up the opportunity for bigger hits by
sacrificing a player’s at-bat?
– Intentional Walks
• Do too many teams walk Barry Bonds?
– Stealing bases
• What success rate should you expect out of a player before
you start signaling him to attempt a steal?
What are “statistics”?
• Fans of sports use the term statistics to refer to
what they read in the morning newspaper.
– A journalist tells stories by using story-telling statistics
• A mathematician cares about how players and
teams are going to do in the future, so they try
to define measurements that can be used to
predict expected DESIRED outcomes.
Some quick examples
• What wins games?
– Data from The Baseball Archive
http://www.baseball1.com
– All MLB teams over the last 10 seasons
[Figure: three scatter plots: Runs scored (R) versus wins, Hits (H) versus wins, Runs allowed (RA) versus wins]
Common statistics’ relationships
with each other
From Joe Adler’s online article:
Analyzing Baseball Statistics Using R
http://www.oreillynet.com/pub/a/network/2004/10/27/baseball.html
What scores runs? (recall
Moneyball quote)
[Figure: four scatter plots, one each for AVG, OPS, OPS2, and OPS3]
Return to Moneyball for a
moment…
• Why a 3? Why not a 2?
• Linear regression can be used to
determine a best fit of A, B, and C in the
linear model:

A * OBP + B * SLG + C = Runs

• What is A / B ?
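A minimal sketch of such a fit, assuming ordinary least squares solved through the normal equations in plain Python. The function names and the sample numbers are hypothetical; the sample rows are synthetic, generated from arbitrary made-up coefficients only to show the mechanics, and are not MLB data or results.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(x) for x in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system (are OBP and SLG collinear?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_runs_model(obp, slg, runs):
    """Least-squares fit of A * OBP + B * SLG + C = Runs; returns (A, B, C)."""
    rows = [(o, s, 1.0) for o, s in zip(obp, slg)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, runs)) for i in range(3)]
    a, b, c = solve_linear_system(xtx, xty)
    return a, b, c


if __name__ == "__main__":
    # Synthetic team-seasons (NOT real data), built from arbitrary coefficients.
    obp = [0.30, 0.32, 0.35, 0.33, 0.36]
    slg = [0.40, 0.38, 0.45, 0.42, 0.41]
    runs = [2000.0 * o + 1000.0 * s - 300.0 for o, s in zip(obp, slg)]
    a, b, c = fit_runs_model(obp, slg, runs)
    print("A / B =", a / b)
```

With real team-season data in place of the synthetic rows, the ratio A / B is the regression's answer to "why a 3?".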
It depends on how you try to
answer the question…
• This most likely is not the right way to answer this question…
– Consider outliers due to “odd” seasons, such as the numbers that may be
generated because Barry Bonds is on your team, or because you play in Coors Field.
– Or, perhaps a least-squares distance metric is not the best choice.
• Or, this may not be the best question to ask…
• Pending editorial review, see my Hack in Joe Adler’s “Baseball Hacks” for
more on this subject.
Defining States in baseball
• Baseball is discrete
• Events can be recorded, as can the state
of affairs before and after the event.
• For the time being, consider a state as
being the (number of outs, base-runners)
pair.
State transitions and Run
Expectancy
• The number of runs expected in the remainder of the inning,
under average conditions.
• These values were taken directly from
http://www.tangotiger.net,
but can be easily derived either by:
– Averaging outcomes using play-by-play data.

RE 99-02       0 outs   1 out   2 outs
Empty           .56      .30     .12
1st             .95      .57     .25
2nd            1.19      .73     .34
3rd            1.48      .98     .39
1st and 2nd    1.57      .97     .47
1st and 3rd    1.90     1.24     .54
2nd and 3rd    2.05     1.47     .63
Loaded         2.42     1.65     .82
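The "averaging outcomes" approach can be sketched as follows. This is a minimal illustration, assuming a hypothetical record format of (outs, base-runners, runs scored in the remainder of the inning) for each plate appearance; the function name run_expectancy and the sample rows are made up for the example and are not real play-by-play data.

```python
from collections import defaultdict


def run_expectancy(observations):
    """Average runs scored in the rest of the inning, per (outs, runners) state.

    `observations` is an iterable of (outs, runners, runs_rest_of_inning)
    tuples; this layout is an assumption, not a standard file format.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for outs, runners, runs_rest in observations:
        state = (outs, runners)
        totals[state] += runs_rest
        counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}


if __name__ == "__main__":
    # Toy observations (fabricated for illustration only).
    sample = [
        (0, "1st", 1),
        (0, "1st", 0),
        (0, "1st", 2),
        (1, "Empty", 0),
    ]
    for state, value in sorted(run_expectancy(sample).items()):
        print(state, round(value, 2))
```

With several seasons of play-by-play records, each of the 24 (outs, base-runners) states gets enough observations for the averages to settle down.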
When should you steal a base?
• States are written as (outs, base-runners), with their run expectancy:
• (0,1st) = 0.95
• (0,2nd) = 1.19
• (1,none) = 0.30
• Success is worth +0.24 runs (1.19 - 0.95)
• Failure is worth -0.65 runs (0.30 - 0.95)
• Failure loses 2.71 times what success gains
• When is it worth the risk? When you succeed more than
2.71 times as often as you fail (a success rate above
about 73 %).
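The break-even arithmetic above can be checked with a short sketch. The function name break_even_steal_rate is hypothetical; the three inputs are the run expectancy values quoted on this slide.

```python
def break_even_steal_rate(re_before: float, re_success: float, re_failure: float) -> float:
    """Success rate p at which attempting the steal has zero expected gain.

    Solves p * gain = (1 - p) * loss, where
    gain = re_success - re_before and loss = re_before - re_failure.
    """
    gain = re_success - re_before
    loss = re_before - re_failure
    if gain <= 0 or loss < 0:
        raise ValueError("expected re_failure <= re_before < re_success")
    return loss / (gain + loss)


if __name__ == "__main__":
    # Runner on 1st with 0 outs: success -> (0, 2nd), failure -> (1, none).
    p = break_even_steal_rate(re_before=0.95, re_success=1.19, re_failure=0.30)
    print(f"break-even success rate: {p:.3f}")
```

The same function applies to any other starting state by substituting the corresponding run expectancy values.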
Other sports and other applications
• Other sports
– Football
• College: BCS ranking system… how do you determine a ‘best team’ with very little data?
• NFL: New England Patriots have won the Super Bowl in three of the last four seasons and
are known to be very analytical in their game calling…
– When to punt on fourth down, as a function of: time left in game, score, field position,
number of time outs left, etc…
– Basketball
• Dean Oliver (“Basketball on Paper”), worked for the Seattle SuperSonics
• http://www.82games.com
• Mark Cuban & Jeff Sagarin
– Tennis
• Ranking & Tournament Scheduling
• Other applications
– Fantasy sports
• Constructing a team, but with a different scoring system
– Gambling
• Betting against the line-makers; when to bet against the line or popular demand.
Other applications of mathematics
in industry
• Yahoo
– Ad targeting and pricing
– E-commerce
• Calculating product affinities (up-sells and cross-sells)
• Finding what you want
– Search algorithms
– Site testing
– Web Traffic Arbitrage
• Netflix
– Supply and demand modeling
– Product recommendation
• Entelos
– Mathematical modeling of disease
– Simulations of human response to disease