in Applied Mathematics

Mark E. Johnson

SportMetrika, Inc.

maejohns@yahoo.com

4 October 2005

Outline

1. Why a mathematician in sports?

2. Quantitative Decision Making

3. What are “statistics”?

4. Related problems in sports statistics

5. Some other examples of mathematics in industry

6. Getting here

7. Future directions

A mathematician in sports?

• Sports is just behind the times; businesses weren’t always analytical either

• Off-season decision making

• Moneyball

– Finding market inefficiencies

– Convincing others of a new way to look at things

From Michael Lewis’ Moneyball

"... OPS was the simple addition of on-base and slugging percentages. Crude as it was, it was a much better indicator than any other offensive statistic of the number of runs a team would score. Simply adding the two statistics together, however, implied that they were of equal importance. If the goal was to raise a team's OPS, an extra percentage point of on-base was as good as an extra percentage point of slugging.

Before his thought experiment Paul (DePodesta) had felt uneasy with this crude assumption; now he saw that the assumption was absurd. An extra point of on-base percentage was clearly more valuable than an extra point of slugging percentage -- but by how much? … In his model an extra point of on-base percentage was worth three times an extra point of slugging percentage."

A little history of statistics and baseball

• Bill James, the pioneer.

• SABR – Society for American Baseball Research

– http://www.sabr.org

• Some websites and information:

– Books

• Bill James Historical Abstracts

• “Baseball Hacks”, by Joe Adler

– Some sites

• http://www.baseballprospectus.com/

• http://www.Baseball-Reference.com

• http://www.Retrosheet.org

– Other resources

• http://www.sportmetrika.com/resources.php

Sports Decision Making

• League-level decisions

– Commissioner

• Team-level decisions

– General Manager

– Field Manager

League-level

• Schedules

– An optimization problem

• Maximize chances of close races at end of season

• Maximize attendance

• Play every team N times, etc…

• No more than X days on the road at a time.

• Playoff Format

– 5-game playoff versus 7-game playoff

– A wildcard? Why only 8 teams?
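The 5-game versus 7-game question can be explored with a small calculation. The sketch below is only an illustration: it assumes games are independent and that the stronger team wins each game with a fixed probability p (no home-field effect), and the function name series_win_probability is made up for this example.

```python
from math import comb


def series_win_probability(p: float, games: int) -> float:
    """Probability of winning a best-of-`games` series.

    Assumes independent games and a constant per-game win probability p.
    """
    if games % 2 == 0:
        raise ValueError("games must be odd")
    needed = games // 2 + 1  # wins required to take the series
    # The series is won on the clinching game, after losing k earlier games.
    return sum(
        comb(needed - 1 + k, k) * p**needed * (1 - p) ** k
        for k in range(needed)
    )


# Compare series lengths for an assumed per-game edge of 55%.
for n in (5, 7):
    print(n, round(series_win_probability(0.55, n), 4))
```

Under these assumptions a longer series favors the stronger team slightly more, which is one way to quantify what the playoff format rewards.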

Team-level: Player Evaluation

• A player is an investment to a sports team’s owner. When is this a good or a bad investment?

• Opportunities to acquire a player:

– Amateur Draft

– Trades

– Free agents

– Contract extensions

• How can we predict how good a player will be in the future?

– Is there a sufficient amount of data to analyze and project a player’s future performance?

– How do we analyze it?

• Sure, a good player will increase a team’s chances of winning a game, but how much $ is that worth? Is that the only strategy?

Player evaluation, continued

• Is the data sufficiently detailed to allow us to remove as much context as possible from events?

– Ballparks affect measurements

– So does the strength of your competition…

– A pitcher throws differently to a batter, depending on the situation.

• One’s first objective would be to remove as much context as possible from the observations, so that you compare apples to apples.

– This can be done with wisely chosen mathematical models.

In-game strategies

• Are there enough data to answer questions like these?

– Setting a lineup

• Should the “best” hitter bat 3rd, 4th, or 5th?

• Is the 9th slot really the best place for the pitcher?

– Bunting

• When should you give up the opportunity for bigger hits by sacrificing a player’s at-bat?

– Intentional Walks

• Do too many teams walk Barry Bonds?

– Stealing bases

• What success rate should you expect out of a player before you start signaling that he attempt to steal a base?

What are “statistics”?

• Fans of sports use the term statistics to refer to what they read in the morning newspaper.

– A journalist tells stories by using story-telling statistics

• A mathematician cares about how players and teams are going to do in the future, so they try to define measurements that can be used to predict expected DESIRED outcomes.

Some quick examples

Runs scored versus wins

• What wins games?

[Figure: three scatter plots of team runs scored (R), hits (H), and runs allowed (RA), each against a horizontal axis running from 40 to 120 (wins, per the slide title). Data source: http://www.baseball1.com]

Relationships among common statistics

Analyzing Baseball Statistics Using R

http://www.oreillynet.com/pub/a/network/2004/10/27/baseball.html

What scores runs? (recall Moneyball quote)

[Figure: four scatter plots of AVG, OPS, OPS2, and OPS3, each against a horizontal axis running from 450 to 1050 (runs, per the slide title).]

Return to Moneyball for a moment…

• Why a 3? Why not a 2?

• Linear regression can be used to determine a best fit of A, B, and C in the linear model:

• What is A / B ?
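To make the regression idea concrete, here is a minimal sketch. It assumes the linear model has the form Runs = A × OBP + B × SLG + C; that form is an assumption inferred from the Moneyball quote, not something stated on the slide. The input numbers are made up and generated from a known plane, so the fit should simply recover the coefficients that were put in. They are not real team data, and the helper names are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def fit_runs_model(obp, slg, runs):
    """Least-squares fit of runs ~ A*OBP + B*SLG + C via the normal equations."""
    rows = [[o, s, 1.0] for o, s in zip(obp, slg)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, runs)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Made-up inputs: runs are generated from an arbitrary known plane.
TRUE_A, TRUE_B, TRUE_C = 1800.0, 1000.0, -350.0
obp = [0.310, 0.325, 0.340, 0.355, 0.330, 0.345]
slg = [0.390, 0.430, 0.410, 0.450, 0.400, 0.440]
runs = [TRUE_A * o + TRUE_B * s + TRUE_C for o, s in zip(obp, slg)]

a, b, c = fit_runs_model(obp, slg, runs)
print("A / B =", round(a / b, 3))
```

With real team-season data the ratio A / B would depend on which seasons, leagues, and ballparks are included, so the answer is sensitive to how the question is posed.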

It depends on how you try to answer the question…

• This most likely is not the right way to answer this question…

– Consider outliers due to “odd” seasons, such as the numbers that may be generated because Barry Bonds is on your team, or if you play in Coors Field.

– Or, perhaps a least-squares distance metric is not the best choice.

• Or, this may not be the best question to ask…

• Pending editorial review, see my Hack in Joe Adler’s “Baseball Hacks” for more on this subject.

Defining States in baseball

• Baseball is discrete

• Events can be recorded, as can the state of affairs before and after the event.

• For the time being, consider a state as being the (number of outs, base-runners) pair.
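One possible way to represent these states in code is sketched below. The encoding (a tuple of outs plus three booleans for first, second, and third base) and the helper name describe are choices made for this illustration only.

```python
from itertools import product

OUTS = (0, 1, 2)
# Each base is either empty (False) or occupied (True): (first, second, third).
BASE_CONFIGS = list(product((False, True), repeat=3))

# A state is the (number of outs, base-runners) pair.
STATES = [(outs, bases) for outs in OUTS for bases in BASE_CONFIGS]


def describe(state):
    """Return a readable label for a (outs, bases) state."""
    outs, bases = state
    names = [name for name, occupied in zip(("1st", "2nd", "3rd"), bases) if occupied]
    runners = " and ".join(names) if names else "empty"
    return f"{outs} out(s), bases: {runners}"


print(len(STATES))  # 3 out counts x 8 base configurations = 24
print(describe(STATES[0]))
```

Because the state space is this small, every event in a play-by-play log can be tagged with its state before and after the event.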

State transitions and Run Expectancy

• The number of runs expected in the remainder of the inning under average conditions.

• These values were taken directly from http://www.tangotiger.net, but can be easily derived either by:

– Averaging outcomes using play-by-play data.

RE 99-02       0 outs   1 out   2 outs
Empty            .56     .30     .12
1st              .95      …       …
2nd             1.19     .73     .34
3rd             1.48     .98     .39
1st and 2nd     1.57     .97     .47
1st and 3rd     1.90    1.24     .54
2nd and 3rd     2.05    1.47     .63
Loaded          2.42    1.65     .82
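The "averaging outcomes" approach can be sketched as follows. This is a toy illustration: it assumes each play-by-play record has already been reduced to a (state, runs scored through the end of the inning) pair, and the sample records are invented, not real data.

```python
from collections import defaultdict


def run_expectancy(observations):
    """Average the runs scored through the end of the inning for each state.

    `observations` is an iterable of (state, runs_rest_of_inning) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, runs in observations:
        totals[state] += runs
        counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}


# Invented observations for illustration only: (outs, bases) -> runs that followed.
sample = [
    ((0, "empty"), 0), ((0, "empty"), 1), ((0, "empty"), 0), ((0, "empty"), 1),
    ((1, "2nd"), 1), ((1, "2nd"), 0),
]
table = run_expectancy(sample)
print(table)
```

With several seasons of play-by-play data, the same averaging would yield one value per (outs, base-runners) state.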

When should you steal a base?

• (0,1st) = 0.95

• (0,2nd) = 1.19

• (1,none) = 0.30

• Success is worth +0.24 runs

• Failure is worth -0.65 runs

• Failure loses 2.71 times what success gains

• When is it worth the risk? When you succeed at least 2.71 times as often as you fail (about a 73% success rate).
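The break-even arithmetic above can be checked with a few lines of code. The function name is made up for this sketch. It uses the three run-expectancy values listed on this slide and looks only at expected runs, ignoring score, inning, and who is batting.

```python
def break_even_success_rate(re_before, re_success, re_failure):
    """Success rate p at which p * gain equals (1 - p) * loss."""
    gain = re_success - re_before  # runs added by a successful steal
    loss = re_before - re_failure  # runs lost when the runner is caught
    return loss / (gain + loss)


# Runner on 1st with 0 outs; success -> runner on 2nd, 0 outs;
# failure -> bases empty, 1 out.
rate = break_even_success_rate(re_before=0.95, re_success=1.19, re_failure=0.30)
print(round(rate, 3))  # about 0.73
```

The same function works for any other state by swapping in the corresponding run-expectancy values.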

Other sports and other applications

• Other sports

– Football

• College: BCS ranking system… how do you determine a ‘best team’ with very little data?

• NFL: The New England Patriots have won the Super Bowl in three of the last four seasons and are known to be very analytical in their game calling…

– When to punt on fourth down, as a function of: time left in game, score, field position, number of timeouts left, etc…

– Basketball

• Dean Oliver (“Basketball on Paper”), worked for the Seattle SuperSonics

• http://www.82games.com

• Mark Cuban & Jeff Sagarin

– Tennis

• Ranking & Tournament Scheduling

• Other applications

– Fantasy sports

• Constructing a team, but with a different scoring system

– Gambling

• Betting against the line-makers; when to bet against the line or popular demand.

Other applications of mathematics in industry

• Yahoo

– Ad targeting and pricing

– E-commerce

• Calculating product affinities (up-sells and cross-sells)

• Finding what you want

– Search algorithms

– Site testing

– Web Traffic Arbitrage

• Netflix

– Supply and demand modeling

– Product recommendation

• Entelos

– Mathematical modeling of disease

– Simulations of human response to disease

