
How do markets play soccer?

Benoit Jottreau*
* Department of Applied Mathematics, Université Paris-Est,
Cité Descartes, 5 Bd Diderot, Champs-sur-Marne, 77454 Marne-la-Vallée, France;
E-mail: benoit jottreau@yahoo.fr
Abstract. Online soccer betting has become a complex continuous-time market. Sophisticated continuous-time
models are therefore needed, as the original simple Poisson model of Maher cannot reproduce the prices and
dynamics observed in the market. In order to price bets more accurately, Dixon and Robinson proposed
modulated Poisson processes for goals. However, as few closed-form formulas are available for the most liquid
bets, they chose to calibrate their model historically. We generalize their models and describe a procedure to
calibrate the model implicitly from the prices of a set of basic bets. Our main result is the expression of the
prices of correct score bets in this model.
1. Introduction
1.1 Introduction
In the 20th century, betting on soccer was limited to bets on the match outcome (win/draw/lose) and on the
correct score. In the 1990s, the betting market began to grow rapidly in volume and in diversity of bets. We
can count more than one hundred different bets that can be placed on a single soccer match: from the number
on the shirt of the first goalscorer to the number of substitutions made by each team, by way of the time of
the n-th goal in the match. Moreover, some of these bets can be placed while the match is in progress. Hence,
modelling the final score alone does not allow us to price such exotic bets, whose payoff is determined by the
whole path of the match process.
The first attempts to price bets modelled the outcome as a discrete variable taking values in the
set {win, draw, lose}. In these link models, each team is given a rating, so that the strength differential in the
match is modelled as the difference of ratings, and the discrete law is constructed from this rating differential.
Maher (1982) modelled the final numbers of goals scored by each team directly. In his model, the final
score follows the law of a pair of independent Poisson variables with parameters depending on the attack and
defence ratings of each team. More generally, we can define two independent Poisson processes $N_1$ and $N_2$ for
the score of each team.
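As a minimal sketch of Maher's setup, the following Python snippet computes win/draw/lose probabilities from two independent Poisson variables. The rates 1.4 and 1.1 and the truncation of the score grid at 15 goals per team are illustrative assumptions, not values taken from this paper.

```python
import math

def poisson_pmf(k, rate):
    """Probability that a Poisson variable with the given rate equals k."""
    return math.exp(-rate) * rate**k / math.factorial(k)

def outcome_probabilities(rate1, rate2, max_goals=15):
    """Win/draw/lose probabilities for team 1 under independent Poisson scores.

    The score grid is truncated at max_goals per team (an assumption made
    here for simplicity; the neglected tail mass is tiny for realistic rates).
    """
    win = draw = lose = 0.0
    for n1 in range(max_goals + 1):
        for n2 in range(max_goals + 1):
            p = poisson_pmf(n1, rate1) * poisson_pmf(n2, rate2)
            if n1 > n2:
                win += p
            elif n1 == n2:
                draw += p
            else:
                lose += p
    return win, draw, lose

# Hypothetical expected numbers of goals for the two teams.
win, draw, lose = outcome_probabilities(1.4, 1.1)
print(f"win={win:.3f} draw={draw:.3f} lose={lose:.3f}")
```

Fair decimal odds for each outcome would then be the reciprocals of these probabilities.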
To account for team strategies as a specific match evolves, Lee (1997) and later Dixon and Robinson
(1998) introduced parameters that vary during the match depending on the current score. For example, a team
leading 1-0 may choose to be more defensive. Conversely, a losing team may adopt
a more attacking strategy. They then estimate the different abilities for each type of score. This approach
requires data on goal times, so that the estimation can be done by maximizing the likelihood of the goal-time
laws.
Empirical work by Vecer et al. (2006) and game-theoretical work by Palomino et al. (2000) have shown with
a simple Poisson model that teams play with more attacking strength when they are losing and, in a less
statistically significant way, play more defensively when they are winning. Indeed, we know that the constant
Poisson model underestimates the draw probability and does not reflect the time variation of parameters, which
is sometimes important. It seems that market intensities are in most cases draw-reverting, in the sense that they
evolve so as to bring the score closer to a draw. Dixon incorporated these features, as well as the
time-dependency of intensities, into the Maher model. We introduce here a general setup for soccer goal
processes that is reminiscent of the models for credit event pricing.
As our aim is implicit calibration, we need a model that gives closed formulas for the final
score probabilities. From the prices of basic bets, we can then infer the parameter values for this specific match.
Then, by closed form or by Monte Carlo simulation, we are able to give the prices of more sophisticated bets
such as goal times or win/draw/lose (w/d/l). Ironically, in these models the bets on the outcome w/d/l are highly
exotic and no closed formula is available.

1.2 Market description


Let us review the principal products in the soccer betting market. The most liquid products are, first, the bets
on the match result and, second, the over/under bets, which are defined on the total number of goals scored in
the match. Concretely, the match result bets have payoff 1 if the match ends respectively in a home win, a draw or
a home loss, and zero otherwise. For the over/under bets, the payoff is 1 if $N_1(T) + N_2(T)$ is respectively over or
under a certain barrier, and zero otherwise.
1.3 The general model for a soccer match
We will work on the probability space $(\Omega, \mathcal{A}, P)$, where P is the risk-neutral probability chosen by the market.
Let W be a one-dimensional Brownian motion under P and $\mathcal{G}$ a filtration including the Brownian motion
filtration. We recall that a Cox process is a Poisson process with stochastic jump rate. We define the goal
processes as follows:

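$N_1$ is a $\mathcal{G}$-adapted Cox process with $\mathcal{G}$-intensity $\lambda_1$;

$N_2$ is a $\mathcal{G}$-adapted Cox process with $\mathcal{G}$-intensity $\lambda_2$;

for i = 1, 2, $\lambda_i$ is a positive $\mathcal{G}$-adapted process, solution of the SDE:

$$ d\lambda_i = \mu_i\,dt + \sigma_i\,dW_t + \delta_{i,1}\,dN_1 + \delta_{i,2}\,dN_2 $$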

where we suppose the state vector $(t, N_1, N_2, \lambda_1, \lambda_2)$ to be Markovian. This means that in each time interval
$[t, t+dt]$, conditionally on $\mathcal{G}_t$, the process $N_i$ has a jump of size 1 with probability $\lambda_i(t)\,dt$ and no jump occurs with
probability $1 - (\lambda_1 + \lambda_2)\,dt$; moreover, no jumps occur simultaneously.

Finally, the processes $N_i(t) - \int_0^t \lambda_i(s)\,ds$ are $\mathcal{G}$-martingales.

The SDE for the $\lambda_i$'s implies that intensities depend on time and score, as in the Lee-Dixon-Robinson
models; in addition, we added stochasticity to the inter-goal intensity processes through the Brownian
motion W. We can imagine another generalization where, instead of modelling only the scores, we model all
events which are of interest for bet pricing, e.g. the times of cards, corners or shots. Although this full model
would require more counting processes and more detailed data, and would become highly intractable, it deserves
further investigation. For example, Fitt et al. (2006) used simple Poisson processes for corners and goals and
then performed an implicit calibration to the spread bet market.
2. Pricing and implicit calibration in a deterministic model
2.1 The deterministic time-proportional model
We will use the following assumptions (Hypothesis A):

We suppose $\sigma_i = 0$; hence, between goals, intensities are deterministic.

Time proportionality: $\lambda_i(t, n_1, n_2) = \lambda_i^{n_1,n_2}\,\varphi(t)$, where $\varphi$ is a deterministic function.

The Markovian state vector is now $(t, N_1(t), N_2(t))$. Let us denote by $M_t^1$ and $M_t^2$ the martingales associated
with the Poisson processes under the market probability, which are defined by:
$$ dM_t^i = dN_t^i - \lambda_i(t, N_t^1, N_t^2)\,dt, \qquad M_0^i = 0. $$

Let $P_t^{T,n_1,n_2}(A)$ denote the price at time t of some bet A if the score is $(n_1, n_2)$ at time t. By Itô's lemma, we
get for the price dynamics:
$$ dP(t, N_t^1, N_t^2) = \left[ \frac{\partial P}{\partial t} + \Delta_1 P\,\lambda_1 + \Delta_2 P\,\lambda_2 \right] dt + \Delta_1 P\,dM_t^1 + \Delta_2 P\,dM_t^2 $$
where $\Delta_1 P(t, n_1, n_2) = P(t, n_1+1, n_2) - P(t, n_1, n_2)$ and $\Delta_2 P(t, n_1, n_2) = P(t, n_1, n_2+1) - P(t, n_1, n_2)$.
Under the risk-neutral probability, prices are martingales, so the drift term vanishes and the price dynamics
between two goals are given by the differential equation:
$$ \frac{\partial P}{\partial t} + \Delta_1 P\,\lambda_1(t, N_t^1, N_t^2) + \Delta_2 P\,\lambda_2(t, N_t^1, N_t^2) = 0 $$
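Solving this equation gives the recursive (backward) pricing equation:
$$ P_t^{T,j,k} = P_T^{T,j,k}\, e^{-\int_t^T \lambda_s^{j,k}\,ds} + \int_t^T \left[ P_1(u)\,\lambda_1^{j,k}(u) + P_2(u)\,\lambda_2^{j,k}(u) \right] e^{-\int_t^u \lambda_s^{j,k}\,ds}\,du $$
where $\lambda_s^{j,k} = \lambda_1(s, j, k) + \lambda_2(s, j, k)$ is the total intensity, $\lambda_i^{j,k}(u) = \lambda_i(u, j, k)$, and $P_i(u)$ is the value
of the same bet just after a goal of Team i.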
Theorem 1. Under Hypothesis A, the price of any bet depending only on the final scores has the following
form:
$$ P_t^{T,n_1,n_2} = \sum_{j \ge n_1;\ k \ge n_2} c(j,k,n_1,n_2)\, e^{-\int_t^T (\lambda_1(s,j,k) + \lambda_2(s,j,k))\,ds} $$

Sketch of the proof: we reduce the problem to correct score pricing and, using the recursive formula, we
proceed by induction on the number of goals needed to settle the bet.
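For example, the probability of the score ending in (0:0) is:
$$ P_t^{T,0,0}(0:0) = e^{-\int_t^T \lambda_s^{0,0}\,ds} = e^{-\int_t^T (\lambda_1(s,0,0) + \lambda_2(s,0,0))\,ds} $$
and the probability of the score ending in (1:0), given that the score is still (0:0), is:
$$ P_t^{T,0,0}(1:0) = \frac{\lambda_1^{(0,0)}}{\lambda_1^{(1,0)} + \lambda_2^{(1,0)} - \lambda_1^{(0,0)} - \lambda_2^{(0,0)}} \left[ e^{-\int_t^T \lambda_s^{0,0}\,ds} - e^{-\int_t^T \lambda_s^{1,0}\,ds} \right] $$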
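As a worked check, the (1:0) price can be obtained from the recursive pricing equation in three steps. The derivation below is a sketch under Hypothesis A. It uses shorthand that is introduced here only for this purpose: $\varphi$ for the deterministic time function, $\lambda^{(j,k)} = \lambda_1^{(j,k)} + \lambda_2^{(j,k)}$ for the constant part of the total intensity, and $F(u) = \int_t^u \varphi(s)\,ds$. At score (0:0) the terminal payoff of the bet is zero, a goal of Team 2 makes the bet worthless, and after a goal of Team 1 at time u the bet is worth the probability that no further goal is scored, $e^{-\lambda^{(1,0)}(F(T) - F(u))}$. We also assume $\lambda^{(1,0)} \neq \lambda^{(0,0)}$.

```latex
% Shorthand assumed here: \lambda^{(j,k)} = \lambda_1^{(j,k)} + \lambda_2^{(j,k)},
% F(u) = \int_t^u \varphi(s)\,ds, so that F(t) = 0 and F'(u) = \varphi(u).
\begin{align*}
P_t^{T,0,0}(1:0)
  &= \int_t^T \lambda_1^{(0,0)}\,\varphi(u)\,
     e^{-\lambda^{(1,0)}\,(F(T) - F(u))}\, e^{-\lambda^{(0,0)} F(u)}\,du \\
  &= \lambda_1^{(0,0)}\, e^{-\lambda^{(1,0)} F(T)}
     \int_t^T F'(u)\, e^{(\lambda^{(1,0)} - \lambda^{(0,0)})\,F(u)}\,du \\
  &= \frac{\lambda_1^{(0,0)}}{\lambda^{(1,0)} - \lambda^{(0,0)}}
     \left[ e^{-\lambda^{(0,0)} F(T)} - e^{-\lambda^{(1,0)} F(T)} \right].
\end{align*}
```

Since $\lambda^{(j,k)} F(T) = \int_t^T \lambda_s^{j,k}\,ds$, the last line is a linear combination of the exponentials appearing in Theorem 1.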

2.2 The case of score situation dependent intensities


Here, we consider that intensities are governed only by the instantaneous result (w/d/l) and are constant otherwise.
This generalizes Model III in Dixon and Robinson (1998) in that we do not impose any constraints on
inter-goal intensities.
This gives us six different intensities for a specific match:

$$ \lambda_{(1,d)},\ \lambda_{(1,w)},\ \lambda_{(1,l)},\ \lambda_{(2,d)},\ \lambda_{(2,w)},\ \lambda_{(2,l)} $$

and total intensities $(\lambda_w, \lambda_d, \lambda_l)$, with $\lambda_s = \lambda_{(1,s)} + \lambda_{(2,s)}$, where w, d, l refer to the situation in the match
from the point of view of team 1. For example, $\lambda_1^{1,0} = \lambda_1^{3,1} = \lambda_{(1,w)}$, i.e. intensities are the same if the score is 1-0 or 3-1.
In order to evaluate our six intensities we need (at least) six probabilities. Our main tool for calibration is
thus the following theorem, which gives the probabilities for each score with at most two goals scored:
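Theorem 2. In the model where intensities depend only on the match situation (win/draw/lose), the following
results hold:
$$ P_t^{T,0,0}(0:0) = e^{-\lambda_d (T-t)} $$
$$ P_t^{T,0,0}(1:0) = \lambda_{(1,d)}\,(T-t)\,\Phi_1\big((\lambda_d - \lambda_w)(T-t)\big)\, e^{-\lambda_d (T-t)} $$
$$ P_t^{T,0,0}(0:1) = \lambda_{(2,d)}\,(T-t)\,\Phi_1\big((\lambda_d - \lambda_l)(T-t)\big)\, e^{-\lambda_d (T-t)} $$
$$ P_t^{T,0,0}(2:0) = \lambda_{(1,d)}\,\lambda_{(1,w)}\,(T-t)^2\,\Phi_2\big((\lambda_w - \lambda_d)(T-t)\big)\, e^{-\lambda_w (T-t)} $$
$$ P_t^{T,0,0}(0:2) = \lambda_{(2,d)}\,\lambda_{(2,l)}\,(T-t)^2\,\Phi_2\big((\lambda_l - \lambda_d)(T-t)\big)\, e^{-\lambda_l (T-t)} $$
$$ P_t^{T,0,0}(1:1) = \Big[ \lambda_{(1,d)}\,\lambda_{(2,w)}\,\Phi_2\big((\lambda_d - \lambda_w)(T-t)\big) + \lambda_{(2,d)}\,\lambda_{(1,l)}\,\Phi_2\big((\lambda_d - \lambda_l)(T-t)\big) \Big] (T-t)^2\, e^{-\lambda_d (T-t)} $$

where the function $\Phi_1(x) = \frac{e^x - 1}{x}$ is defined at x = 0 by its limit 1.

Similarly, we define the functions $\Phi_n$ by
$$ \Phi_n(x) = \frac{1}{x^n} \left( e^x - 1 - x - \frac{x^2}{2!} - \dots - \frac{x^{n-1}}{(n-1)!} \right), \qquad \Phi_n(0) = \frac{1}{n!}. $$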

However, these formulas cannot be inverted in closed form, so we use least-squares fitting and include
more prices to get a better fit of the model.
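The formulas of Theorem 2 are straightforward to evaluate numerically. The Python sketch below does so for the fitted intensities reported in Table 1, assuming that the match duration is the time unit (T - t = 1 before kick-off); this convention is an assumption on our part, although the resulting decimal odds (one over the probability) come out close to the model prices listed in Table 2 for the same scores. Function and variable names are illustrative.

```python
import math

def big_phi(n, x, terms=30):
    """Phi_n(x) = (e^x - 1 - x - ... - x^(n-1)/(n-1)!) / x^n, Phi_n(0) = 1/n!.

    Evaluated through the equivalent power series sum_k x^k / (k + n)!, which
    avoids the cancellation of the closed form near x = 0. Thirty terms are
    ample for the moderate values of |x| met here.
    """
    return sum(x**k / math.factorial(k + n) for k in range(terms))

def score_probabilities(lam, tau):
    """Theorem 2 probabilities seen from score 0:0 with time tau = T - t left.

    lam maps (team, situation) to an intensity, e.g. lam[(1, "d")].
    """
    lam_d = lam[(1, "d")] + lam[(2, "d")]
    lam_w = lam[(1, "w")] + lam[(2, "w")]
    lam_l = lam[(1, "l")] + lam[(2, "l")]
    decay_d = math.exp(-lam_d * tau)
    return {
        "0:0": decay_d,
        "1:0": lam[(1, "d")] * tau * big_phi(1, (lam_d - lam_w) * tau) * decay_d,
        "0:1": lam[(2, "d")] * tau * big_phi(1, (lam_d - lam_l) * tau) * decay_d,
        "2:0": lam[(1, "d")] * lam[(1, "w")] * tau**2
               * big_phi(2, (lam_w - lam_d) * tau) * math.exp(-lam_w * tau),
        "0:2": lam[(2, "d")] * lam[(2, "l")] * tau**2
               * big_phi(2, (lam_l - lam_d) * tau) * math.exp(-lam_l * tau),
        "1:1": (lam[(1, "d")] * lam[(2, "w")] * big_phi(2, (lam_d - lam_w) * tau)
                + lam[(2, "d")] * lam[(1, "l")] * big_phi(2, (lam_d - lam_l) * tau))
               * tau**2 * decay_d,
    }

# Fitted intensities from Table 1.
TABLE_1 = {
    (1, "d"): 1.189, (1, "w"): 1.062, (1, "l"): 1.306,
    (2, "d"): 1.127, (2, "w"): 1.199, (2, "l"): 1.088,
}

for score, prob in score_probabilities(TABLE_1, 1.0).items():
    print(f"{score}: probability {prob:.4f}, decimal odds {1.0 / prob:.2f}")
```

In a calibration, these six functions would be the model side of the least-squares objective, with the six intensities as unknowns.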
2.3 Example of static implicit calibration and pricing on a specific match
We perform the implicit calibration on a set of prices from a particular match. These prices were taken
before the match. In Table 1, we report the fitted intensities and observe the general feature of intensities, i.e.
the team that leads generally plays more defensively and the team that is behind plays more offensively. This
pattern confirms the results found by Palomino et al. (2000) in their analysis based on game theory and optimal
strategy for soccer teams.
In Tables 2 and 3, we present the results of the pricing by simulation or closed form.
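For bets without a closed formula, such as win/draw/lose, the simulation step can be sketched as follows. This is a minimal Monte Carlo example in Python, not the code used for the tables. It assumes the match duration is the time unit and uses the fitted intensities of Table 1. Because intensities are constant between goals, each waiting time can be drawn exactly from an exponential law. Names, seed and number of paths are illustrative choices.

```python
import random

# Fitted intensities from Table 1, keyed by (team, situation seen from team 1).
LAMBDA = {
    (1, "d"): 1.189, (1, "w"): 1.062, (1, "l"): 1.306,
    (2, "d"): 1.127, (2, "w"): 1.199, (2, "l"): 1.088,
}

def situation(n1, n2):
    """Match situation from the point of view of team 1."""
    if n1 > n2:
        return "w"
    if n1 < n2:
        return "l"
    return "d"

def simulate_match(lam, horizon=1.0, rng=random):
    """Simulate one match and return the final score (n1, n2)."""
    t, n1, n2 = 0.0, 0, 0
    while True:
        s = situation(n1, n2)
        rate1, rate2 = lam[(1, s)], lam[(2, s)]
        total = rate1 + rate2
        # Waiting time to the next goal is exponential with the total intensity.
        t += rng.expovariate(total)
        if t > horizon:
            return n1, n2
        # The scorer is team 1 with probability rate1 / total.
        if rng.random() < rate1 / total:
            n1 += 1
        else:
            n2 += 1

def outcome_odds(lam, n_paths=100_000, seed=0):
    """Monte Carlo decimal odds (1 / estimated probability) for w/d/l."""
    rng = random.Random(seed)
    counts = {"w": 0, "d": 0, "l": 0}
    for _ in range(n_paths):
        counts[situation(*simulate_match(lam, rng=rng))] += 1
    return {outcome: n_paths / count for outcome, count in counts.items()}

if __name__ == "__main__":
    print(outcome_odds(LAMBDA, n_paths=20_000))
```

Path-dependent bets such as goal times can be priced with the same loop by recording the times at which goals occur.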

Table 1. Intensity values for the whole match.

λ(1,d)   λ(1,w)   λ(1,l)   λ(2,d)   λ(2,w)   λ(2,l)
1.189    1.062    1.306    1.127    1.199    1.088
Table 2. Result of pricing classical bets in the basic model.

Bet name           Back    Lay     Model price   Confidence interval   spread-error   %-error   Error
Win                2.72    2.74    2.74          2.73:2.76             1.4            0.51      0.85
Draw               3.35    3.4     3.34          3.32:3.37             -1.24          -0.92     1.07
Lose               2.98    3       2.97          2.95:2.97             -1.9           -0.64     1.1
Under 2.5          1.69    1.71    1.69          0:0                   -0.7           -0.41     0.54
0:0                9.8     10      10.14         0:0                   2.4            2.42      2.41
0:1                9       9.2     9.35          0:0                   2.53           2.78      2.65
0:2                17.5    18      17.42         0:0                   -1.32          -1.86     1.57
0:3                50      60      49.53         47.79:51.4            -1.09          -9.95     3.3
1:0                8       8.2     8.29          0:0                   1.86           2.3       2.07
1:1                7       7.2     7.02          0:0                   -0.77          -1.08     0.91
1:2                13      13.5    12.93         0:0                   -1.28          -2.42     1.76
1:3                36      38      36.34         34.24:37.5            -0.66          -1.78     1.09
2:0                15.5    16      15.47         0:0                   -1.12          -1.78     1.41
2:1                12      12.5    12.01         0:0                   -0.96          -1.96     1.37
2:2                20      21      20.01         19.57:20.48           -0.98          -2.39     1.53
2:3                48      55      52.83         50.91:54.89           0.38           2.58      0.99
3:0                46      50      43.51         0:0                   -2.25          -9.35     4.58
3:1                30      34      34.28         33.28:35.35           1.14           7.13      2.85
3:2                42      48      51.76         49.9:53.76            2.25           15.02     5.82
3:3                110     120     125.6         118.7:133.4           2.12           9.22      4.42
Any other          17      18      18.74         18.33:19.16           2.48           7.09      4.19
Team1 scores 1st   2       2.16    2.16          0:0                   1.01           3.89      1.99
Team2 scores 1st   2.2     2.29    2.28          0:0                   0.8            1.6       1.13

The spread-error and %-error quoted in the results are the differences between the model price and the
mid-quote price, measured in half bid-ask spreads and as a percentage of the mid-price, respectively.
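The two error measures just defined can be written as a small helper. The Python sketch below applies them to the 0:0 correct score row of Table 2 (back 9.8, lay 10, model price 10.14). The function name is illustrative, and other rows may differ slightly from the table when recomputed this way, presumably because the table was built from unrounded model prices.

```python
def quote_errors(back, lay, model_price):
    """Return (spread_error, pct_error) of a model price against a two-sided quote.

    spread_error: (model - mid) measured in half bid-ask spreads.
    pct_error:    (model - mid) as a percentage of the mid price.
    """
    mid = (back + lay) / 2.0
    half_spread = (lay - back) / 2.0
    diff = model_price - mid
    return diff / half_spread, 100.0 * diff / mid

# Correct score 0:0 from Table 2.
spread_error, pct_error = quote_errors(9.8, 10.0, 10.14)
print(f"spread-error={spread_error:.2f}, %-error={pct_error:.2f}")
```

A spread-error between -1 and 1 means that the model price lies inside the quoted bid-ask interval.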
Table 3. Result of pricing spread bets in the basic model.

Bet name             Bid     Ask     Model     Confidence interval   spread-error   %-error   Error
Winning margin       0       0.2     0.05      0.04:0.06             -0.5           -50       5
Total goals          2.2     2.4     2.31      2.28:2.33             0.08           0.35      0.17
First match goal     40      43      37.11     36.96:37.26           -2.93          -10.58    5.56
First Team1 goal     54      57      53.86     53.69:54.03           -1.09          -2.95     1.8
First Team2 goal     56      59      55.61     55.44:55.77           -1.26          -3.29     2.04
2nd match goal       64      67      63.35     63.21:63.49           -1.43          -3.28     2.17
2nd Team1 goal       78      81      79.35     79.24:79.45           -0.1           -0.19     0.14
2nd Team2 goal       80      82      79.98     79.88:80.09           -1.02          -1.26     1.13
3rd match goal       78      81      78.6      78.5:78.7             -0.6           -1.13     0.82
3rd Team1 goal       87      89      87.62     87.57:87.67           -0.38          -0.43     0.41
3rd Team2 goal       87      89      87.68     87.63:87.73           -0.32          -0.36     0.34
Last match goal      58      61      58.31     58.16:58.47           -0.79          -2        1.26
Winning goal         33      36      34        33.8:34.1             -0.33          -1.45     0.7
Total goal minutes   110     120     111.4     110.9:111.8           -0.72          -3.13     1.5
TGM supremacy        -2      12      2.06      1.66:2.46             -0.42          -58.8     4.97
TGM Team1            58      63      56.7      56.4:57               -1.52          -6.28     3.09
TGM Team2            53      58      54.6      54.3:54.9             -0.36          -1.62     0.76

We remark that the fit is good for almost all bets, except for high scores. Note, however, that we truncated the
intensity surface to a one-goal difference and therefore used only three levels. From Dixon and Robinson (1998), we
know that a better but still parsimonious fit can be achieved by taking five levels for the intensity surface. This can
be compared to volatility surfaces in traditional finance and option pricing, as in the Black-Scholes model, where
deep out-of-the-money or in-the-money options generally have a different implied volatility than at-the-money options.
For the spread bets, the fit is good as well, even if some prices are underestimated by the model.
The larger spread in this market results in lower errors because of the measurement unit. Nevertheless, the fit is
much better than a simple Poisson model would produce. The worst fit is for the time of the first goal, which
deserves further investigation. One reason could be that intensities are not only score-dependent but also
time-dependent. Empirical work shows that the intensity generally increases slightly during each half.
This implies that the time of the first goal would be greater than the one predicted by time-independent
intensities with the same mean over the match. Indeed, the findings of Dixon and Robinson (1998) show that
intensities are better fitted with a time trend. Hence, even if starting prices are well fitted, the
time trend is necessary to get prices during the match close to market prices.
2.4 Extension to affine inter-goals intensities
From Theorem 1, we know that the prices of classical bets depend only on the integral of the intensities, so we
can easily insert a time trend into the model as long as we keep the same average intensity.
As goal times are undervalued, we add an increasing affine component by taking $\varphi(t) = at + (1 - aT/2)$,
so that intensities have the same mean over the match.
In this affine case, we have a closed formula for the First Goal Time price. Fitting the value of a to the First
Goal Time price, we found $aT \approx 0.7344$. Hence, the time factor grows from $(1 - aT/2)$ to $(1 + aT/2)$, i.e. from
0.63 when the match starts to 1.37 at the end of the match.
Within this affine model, the spread bets calibration is much better, as we can see in Table 4.
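The effect of the affine trend on the first goal time can be illustrated numerically. The Python sketch below is written under stated assumptions: the match duration is the time unit, the total intensity at 0:0 is 2.316 (the sum of the two draw intensities of Table 1), and the first goal time is capped at T when no goal is scored. The actual settlement rule of the market is not specified here, so the numbers are not meant to reproduce the tables. The names are illustrative.

```python
import math

def time_factor(t, a, T=1.0):
    """Affine time factor a*t + (1 - a*T/2); its mean over [0, T] is 1."""
    return a * t + (1.0 - a * T / 2.0)

def integrated_factor(u, a, T=1.0):
    """Integral of the time factor over [0, u]."""
    return a * u**2 / 2.0 + (1.0 - a * T / 2.0) * u

def expected_first_goal_time(lam_d, a, T=1.0, steps=10_000):
    """E[min(first goal time, T)], computed as the integral over [0, T] of the
    probability that no goal has been scored before u (trapezoidal rule).

    Capping at T when the match ends 0:0 is an assumption of this sketch.
    """
    h = T / steps
    survival = [math.exp(-lam_d * integrated_factor(i * h, a, T))
                for i in range(steps + 1)]
    return h * (sum(survival) - 0.5 * (survival[0] + survival[-1]))

a = 0.7344      # fitted value of a*T with T = 1
lam_d = 2.316   # lambda_(1,d) + lambda_(2,d) from Table 1
print(time_factor(0.0, a), time_factor(1.0, a))   # start and end levels
print(expected_first_goal_time(lam_d, 0.0))       # flat intensities
print(expected_first_goal_time(lam_d, a))         # with the affine trend
```

Because the integral of the time factor over the whole match is unchanged, the probability of a 0:0 final score is the same with and without the trend, while the expected first goal time moves later.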
Table 4. Result of pricing spread bets in the affine model.

Bet name             Bid     Ask     Model with trend   spread-error   %-error
First match goal     40      43      41.5               0              0
First Team1 goal     54      57      57.11              1.07           2.9
First Team2 goal     56      59      58.66              0.77           2.02
2nd match goal       64      67      66.64              0.76           1.74
2nd Team1 goal       78      81      80.80              0.87           1.64
2nd Team2 goal       80      82      81.29              0.29           0.36
3rd match goal       78      81      80.20              0.47           0.88
3rd Team1 goal       87      89      88.00              0              0
3rd Team2 goal       87      89      88.06              0.06           0.07
Last match goal      58      61      61.33              1.22           3.08
Winning goal         33      36      37.02              1.68           7.3
Total goal minutes   110     120     121.53             1.31           5.68
TGM supremacy        -2      12      2.59               -0.34          -48.2
TGM Team1            58      63      62.06              0.62           2.58
TGM Team2            53      58      59.47              1.59           7.15

3. Towards stochastic inter-goals intensities


The need for a stochastic inter-goal intensity is obvious if we look at the implied parameters during the match:
they evolve randomly around the deterministic fit. In order to get closed forms for basic prices, we
chose for the intensities the shifted square-root process introduced in finance by Cox et al. (1985), as the essential
formula we need, its Laplace transform, is given explicitly.
We suppose $\lambda_i(t, n_1, n_2) = \lambda_i^{(n_1,n_2)}\,\varphi(t)$, where $\varphi(t)$ is a shifted CIR process, i.e.:
$$ \varphi(t) = \psi(t) + x(t) $$

where $\psi(t) \ge 0$ is a deterministic function and $x_t$ is a CIR process with $x_0 > 0$ and dynamics given by:

$$ dx_t = \kappa(\theta - x_t)\,dt + \sigma\sqrt{x_t}\,dW_t $$

with $\kappa, \theta, \sigma > 0$ and $2\kappa\theta > \sigma^2$ to ensure positivity of $x_t$.


This special form for the $\lambda_i$'s has several implications. The Markovian state vector is $(t, x_t, N_1(t), N_2(t))$, and
the prices of classical bets can be found by conditioning on the path of x, as we then recover the prices of the
deterministic model under Hypothesis A:
$$ \tilde{P}_t^{T,n_1,n_2,x}(A) = E\left[ P_t^{T,n_1,n_2}(A) \mid x_t = x,\ N_1(t) = n_1,\ N_2(t) = n_2 \right] $$
where $\tilde{P}$ (resp. $P$) denotes the price in the stochastic (resp. deterministic) model.
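From Lamberton and Lapeyre (1995), we know that if $(x_s)$ is a $\mathcal{G}$-adapted CIR process with $2\kappa\theta > \sigma^2$,
then
$$ E\left[ e^{-\rho \int_t^T x_s\,ds} \mid \mathcal{G}_t \right] = e^{m(\rho, T-t) - n(\rho, T-t)\,x_t} $$
where
$$ m(\rho, \tau) = \frac{2\kappa\theta}{\sigma^2} \ln\left[ \frac{2\gamma\, e^{(\kappa - \gamma)\tau/2}}{\gamma + \kappa + (\gamma - \kappa)\,e^{-\gamma\tau}} \right], \qquad n(\rho, \tau) = \frac{2\rho\,(1 - e^{-\gamma\tau})}{\gamma + \kappa + (\gamma - \kappa)\,e^{-\gamma\tau}} $$
and $\gamma = \sqrt{\kappa^2 + 2\sigma^2\rho}$.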
Hence, we recover closed forms for the prices of classical bets, as they are linear combinations of Laplace
transforms of the integral of x.
For example, the probability that no further goal is scored is:
$$ \tilde{P}_t^{(T,n_1,n_2,x)}(n_1 : n_2) = e^{m(\rho, T-t) - n(\rho, T-t)\,x - \rho \int_t^T \psi(s)\,ds} $$
with $\rho = \lambda_1^{(n_1,n_2)} + \lambda_2^{(n_1,n_2)}$.
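The closed form above is easy to evaluate. The following Python sketch computes the Laplace transform of the integrated CIR process and the resulting no-further-goal probability. The parameter names (kappa, theta, sigma for the CIR dynamics, rho for the total intensity coefficient, shift_integral for the integral of the deterministic shift) and the numerical values are notational and illustrative assumptions, not calibrated values.

```python
import math

def cir_laplace(rho, tau, x, kappa, theta, sigma):
    """E[exp(-rho * integral of x_s over a window of length tau) | x_t = x]
    for the CIR process dx = kappa*(theta - x)*dt + sigma*sqrt(x)*dW."""
    gamma = math.sqrt(kappa**2 + 2.0 * sigma**2 * rho)
    denom = gamma + kappa + (gamma - kappa) * math.exp(-gamma * tau)
    m = (2.0 * kappa * theta / sigma**2) * math.log(
        2.0 * gamma * math.exp((kappa - gamma) * tau / 2.0) / denom
    )
    n = 2.0 * rho * (1.0 - math.exp(-gamma * tau)) / denom
    return math.exp(m - n * x)

def no_more_goal_probability(rho, tau, x, kappa, theta, sigma, shift_integral=0.0):
    """Probability that no further goal is scored in the shifted CIR model.

    shift_integral is the integral of the deterministic shift over the
    remaining time (zero if there is no shift).
    """
    return cir_laplace(rho, tau, x, kappa, theta, sigma) * math.exp(-rho * shift_integral)

# Hypothetical parameters satisfying 2*kappa*theta > sigma**2.
print(no_more_goal_probability(rho=2.3, tau=1.0, x=1.0, kappa=1.0, theta=1.0, sigma=0.5))
```

As a sanity check, when sigma is very small and x starts at theta, the process stays close to theta and the result approaches exp(-rho * theta * tau), the deterministic-model value.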
Finally, we can perform an implicit calibration to recover the parameters of the model for any match,
but we need more closed-form prices because of the additional parameters. We remark that a process other
than CIR may be used if the Laplace transform of its integral is known. In future work, we will present the results
obtained in this stochastic intensity model.
4. Conclusion
We extended the model of Dixon and Robinson and found closed formulas for soccer betting products which
fit the price structure better than the Poisson model with constant intensities. Implicit calibration gives surprisingly
good results given the simplicity of the model. Nevertheless, with a more detailed price structure, this basic
model might show its limits, so we introduced a stochastic inter-goal intensity model, for which a calibration
procedure is described as well.
References
Cox J.C., Ingersoll J.E. and Ross S.A. (1985) A theory of the term structure of interest rates. Econometrica
53, 385-408.
Dixon M.J. and Robinson M.E. (1998) A birth process model for association football matches. The
Statistician 47, 523-538.
Fitt A., Howls C. and Kabelka M. (2006) The valuation of soccer spread bets. J. Oper. Res. Soc. 57, 975-985.
Lamberton D. and Lapeyre B. (1995) An introduction to stochastic calculus applied to finance, Chapman and
Hall.
Lee A. (1997) Modeling scores in the Premier League: Is Manchester United really the best? Chance, 15-19.
Maher M. (1982) Modelling association football scores. Statistica Neerlandica 36, 109-118.
Palomino F., Rigotti L. and Rustichini A. (2000) Skill, strategy, and passion: an empirical analysis of soccer,
Econometric Society World Congress 2000 Contributed Papers 1822, Econometric Society.
Vecer J., Ichiba T. and Laudanovic M. (2006) Parallels between betting contracts and credit derivatives:
Lessons learned from FIFA World Cup 2006 betting markets, Technical report, Department of Statistics,
Columbia University.
