
The Math of Epidemic Outbreaks and Spread (Part 3) Least Squares Fitting of
Gompertz Growth Models

Technical Report · April 2020


1 author:

Christian Bauckhage
University of Bonn

All content following this page was uploaded by Christian Bauckhage on 12 April 2020.



The Math of Epidemic Outbreaks and Spread (Part 3)
Least Squares Fitting of Gompertz Growth Models
Christian Bauckhage∗
ABSTRACT
Ten days after our last note in this series, worldwide testing for COVID-19 infections has continued, and it seems that current case data can be better explained in terms of Gompertz growth functions than in terms of logistic growth functions. In this note, we therefore discuss the Gompertz function and its use in mathematical epidemiology. Moreover, we briefly discuss how to fit this model to data points representing infection counts at different points in time. To practically support our theoretical discussion, we once again consider recent COVID-19 data.

[Figure 1(a): data and logistic predictions. Curves: predictions on Apr 7, Apr 8, Apr 9, Apr 10, and Apr 11; vertical axis from 0 to 200K; horizontal axis from Feb to May.]

1 INTRODUCTION
Previously [4, 5], we discussed logistic growth as a mathematical model of the evolution of the accumulated infection counts during an epidemic outbreak. Letting $I(t)$ denote the total number of infections at time $t$, we saw that the logistic differential equation

$$\frac{d}{dt} I(t) = r \cdot I(t) \cdot \left( 1 - \frac{I(t)}{N} \right) \tag{1}$$

describes a dynamic where the change in infection counts grows proportionally to the current number but is also modulated by another time-dependent function. In other words, we have a multiplicative coupling of two dynamics. The first factor is $r \cdot I(t)$ where the constant $r > 0$ is a growth rate and the second factor is the damping function

$$d_L(t) = 1 - \frac{I(t)}{N}$$

where the constant $N$ is called the carrying capacity of an epidemic. Hence, as long as $I(t)$ is rather small (during the early phase of an epidemic), the dynamics in (1) reflect quasi exponential growth. But, as $I(t)$ approaches $N$, growth will slow down and eventually cease altogether.

Solving the logistic differential equation for $I(t)$ leads to the logistic function

$$I(t) = \frac{N}{1 + \exp\bigl( -r \cdot (t - t_0) \bigr)} \tag{2}$$

where $t_0$ is a location parameter indicating the inflection point of the S-shaped graph of this function.

Indeed, fitting this simple model to real-world epidemic data, i.e. adjusting its parameters $N$, $r$, and $t_0$ such that the graph of $I(t)$ agrees with the given data, often provides reasonable results (see Fig. 2 which is a reprint from [5]).

However, in [4, 5], we also discussed mathematical modeling in general and remarked that it usually requires us to trade off simplicity against expressiveness. That is, mathematical modeling of real-world phenomena is often a balancing act where the model should be simple enough to work with yet powerful enough to provide good matches to any available data or observations. Now, ten days after our last note in this series, it appears that, regarding the current COVID-19 pandemic, the logistic growth model in (2) favors simplicity over expressiveness.

Looking at Fig. 1(a), which shows the evolution of accumulated German COVID-19 cases as of April 11, 2020 and logistic functions fitted to this data, we note that the logistic model underestimates the number of cases at the end of the observation period.

In this note, we therefore consider yet another epidemic growth model, the so-called Gompertz model, and fit it to the latest available COVID-19 data. Indeed, simple visual inspection of the results in Fig. 1(b) suggests that the Gompertz model explains the latest data better than the logistic model does.

[Figure 1(b): data and Gompertz predictions. Curves: predictions on Apr 7, Apr 8, Apr 9, Apr 10, and Apr 11; vertical axis from 0 to 250K; horizontal axis from Feb to May.]
Figure 1: Daily confirmed COVID-19 cases (infected people) in Germany as of April 11, 2020 and predictions according to logistic and Gompertz models. Note the different scales on the vertical axes.

In what follows, we discuss the Gompertz model and how to fit it to data (corresponding NumPy / SciPy snippets are provided in the appendix). To have practical examples for our discussion, we will once again analyze COVID-19 data from the Johns Hopkins University Coronavirus Resource Center for which frequently updated CSV data files are available on this site

data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases

∗ ORCID: 0000-0001-6615-2128
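To make the relation between the logistic differential equation (1) and the logistic function (2) concrete, the following minimal sketch evaluates (2) on a fine time grid and compares a finite-difference estimate of its derivative with the right-hand side of (1). The function names and the parameter values are illustrative assumptions and are not fitted to any data.

```python
import numpy as np

def logistic(t, N, r, t0):
    """Logistic function of equation (2)."""
    return N / (1.0 + np.exp(-r * (t - t0)))

def logistic_rhs(I, N, r):
    """Right-hand side of the logistic differential equation (1)."""
    return r * I * (1.0 - I / N)

# illustrative parameter values (assumptions, not fitted to any data)
N, r, t0 = 1000.0, 0.2, 30.0

t = np.linspace(0.0, 80.0, 8001)
I = logistic(t, N, r, t0)

# finite differences approximate dI/dt on the grid
dI_dt = np.gradient(I, t)

# largest deviation between both sides of (1), relative to the peak growth rate
max_rel_dev = np.max(np.abs(dI_dt - logistic_rhs(I, N, r))) / np.max(dI_dt)
print("relative deviation between both sides of (1):", max_rel_dev)

# at the inflection point t0, the curve has reached half the carrying capacity
print("I(t0) =", logistic(t0, N, r, t0), " N / 2 =", N / 2)
```

The deviation should shrink further as the grid is refined, since it only reflects the discretization error of the finite differences.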

[Figure 2: curves: predictions on Mar 26, Mar 27, Mar 28, Mar 29, and Mar 30; vertical axis from 0 to 150K; horizontal axis from Feb to May.]
Figure 2: Reprint from [5]: German COVID-19 case data as of March 30, 2020 with logistic fits and predictions.

2 GOMPERTZ GROWTH FUNCTIONS
The origins of the logistic differential equation in (1) can be traced back to Verhulst [15]. A possible generalization of this dynamic is due to Richards [13], namely

$$\frac{d}{dt} I(t) = r \cdot I(t) \cdot s \cdot \left( 1 - \left( \frac{I(t)}{N} \right)^{\frac{1}{s}} \right) \tag{3}$$

where the additional parameter $s$ allows for controlling whether maximum growth occurs closer to the lower asymptote ($0$) or upper asymptote ($N$) of the corresponding function $I(t)$.

Evidently, this generalization still describes a dynamic where the change in infection counts grows proportionally to the current count modulated by a damping function

$$d_R(t) = s \cdot \left( 1 - \left( \frac{I(t)}{N} \right)^{\frac{1}{s}} \right)$$

Note that the logistic differential equation in (1) is really but a special case of Richards' differential equation, namely the case where $s = 1$. But what about other choices of $s$? What if $s$ was really large? Since

$$\lim_{s \to \infty} d_R(t) = \lim_{s \to \infty} s \cdot \left( 1 - \left( \frac{I(t)}{N} \right)^{\frac{1}{s}} \right) = -\log \frac{I(t)}{N}$$

and since $-\log(x) = \log(1/x)$, the latter of these questions leads to yet another damping function

$$d_G(t) = \log \frac{N}{I(t)}$$

In other words, in the limit $s \to \infty$, Richards' differential equation in (3) becomes

$$\frac{d}{dt} I(t) = r \cdot I(t) \cdot \log \frac{N}{I(t)} \tag{4}$$

which is known as the Gompertz differential equation, named after the British actuary B. Gompertz who considered it as a model of the change of mortality rates [10].

Solving (4) for $I(t)$ leads to the Gompertz function which we write using parameters compatible with those in our earlier discussion, namely

$$I(t) = N \cdot \exp\Bigl( -\exp\bigl( -r \cdot (t - t_0) \bigr) \Bigr) \tag{5}$$

3 FITTING GOMPERTZ MODELS TO DATA
Just as the logistic function, the doubly exponential Gompertz function in (5) has an S-shaped graph whose appearance is controlled by the three parameters $N$, $r$, and $t_0$ (see Fig. 3).

Parameter $N$ indicates which value the function converges to. In other words, $N = \lim_{t \to \infty} I(t)$ and, for larger $N$, the graph of $I(t)$ will approach values higher above the horizontal axis. The rate parameter $r$ controls the maximum steepness of $I(t)$; the smaller $r$, the slower $I(t)$ increases. Finally, the location parameter $t_0$ controls the location of the graph of $I(t)$ along the horizontal axis. For $t_0 < 0$ it moves to the left and, for $t_0 > 0$, it moves to the right.

[Figure 3, panels: (a) effect of parameter N, with N = 0.25, 0.50, 1.00, 2.00, 4.00; (b) effect of parameter r, with r = 0.25, 0.50, 1.00, 2.00, 4.00; (c) effect of parameter t_0, with t_0 = −2, −1, 0, 1, 2; horizontal axes from −5 to 5.]
Figure 3: Illustration of the effects of varying the parameters N, r, and t_0 of the Gompertz function in (5).

Note, however, that, contrary to the logistic function, the graph of the Gompertz function is not necessarily symmetric. For instance, most of the graphs plotted in Fig. 3 show very fast initial growth and taper off not quite as quickly. In other words, the Gompertz function describes a different kind of growth dynamic than the logistic function.

Next, we briefly discuss how to fit Gompertz functions of the form in (5) to data representing accumulated infection counts during an epidemic. Just as in [5], we suppose we are given a sequence of observations $I_0, I_1, \ldots, I_T$ where $I_t$ denotes the total number of confirmed infections at time $t$ and $t$ counts how many days have passed since the initial count $I_0$ became available.

Given these assumptions, we may cast our model fitting problem as the problem of minimizing the root mean square error

$$E = \sqrt{ \sum_{t=0}^{T} \bigl( I(t) - I_t \bigr)^2 } \tag{6}$$

with respect to the model parameters $N$, $r$, and $t_0$. Note that a root mean square error in the strict sense would also divide the sum by the number $T + 1$ of observations before taking the square root, as the code in the appendix does; this constant factor does not affect the minimizing parameters. In [5], we saw that this minimization can be done using iterative optimization procedures such as the Levenberg-Marquardt algorithm. When working with SciPy, these can be easily implemented since SciPy's optimize module provides methods for this purpose and we will show how to use them in the appendix.

Table 1: Parameter estimates for logistic and Gompertz growth models fitted to the recent COVID-19 data in Fig. 1

                        estimates for                  error
model     data up to    N           r      t_0        E
logistic  Apr 11        135903.46   0.19   63.16      1.12 · 10^3
          Apr 10        133812.24   0.19   62.96      1.07 · 10^3
          Apr 9         130853.49   0.19   62.67      9.81 · 10^2
          Apr 8         127605.51   0.20   62.37      8.79 · 10^2
          Apr 7         124765.46   0.20   62.10      8.12 · 10^2
Gompertz  Apr 11        179518.39   0.08   62.81      7.24 · 10^2
          Apr 10        181636.22   0.08   62.96      7.17 · 10^2
          Apr 9         182025.15   0.08   62.99      7.21 · 10^2
          Apr 8         182112.66   0.08   62.99      7.26 · 10^2
          Apr 7         184116.82   0.08   63.13      7.27 · 10^2

The data in Fig. 1 represent daily accumulated infection counts $I_0, I_1, I_2, \ldots$ during the current COVID-19 outbreak in Germany. The top panel shows the graphs of five logistic functions which were obtained from fitting to data available up until Apr 7, Apr 8, Apr 9,
Apr 10, and Apr 11, respectively. The bottom panel shows respective results obtained from fitting the Gompertz model. Evidently, the five Gompertz fits (and their corresponding predictions) are more similar to one another than the five logistic fits. More importantly, the Gompertz fits also seem to match the data better than the logistic fits.

This qualitative observation is corroborated by the quantitative measurements in Tab. 1. In addition to our (simple) least squares parameter estimates for both models, the table also lists the corresponding root mean square errors. For the fitted Gompertz functions these are consistently smaller than for the logistic functions. In other words, the Gompertz model "explains" the German COVID-19 data better than the logistic model.

4 CONCLUSION
Extending our previous discussion [4, 5], we considered yet another model of the progress of infections during an epidemic. The Gompertz model, too, can be derived from reasonable assumptions as to the rate of change of infection counts: Just as the logistic differential equation is a special case of a model due to Richards [13], the Gompertz differential equation is a limiting case of that model.

The fact that the Gompertz model fits recent COVID-19 data better than the logistic model underlines the importance of selecting appropriate models for empirical data analysis [16].

5 NOTES AND FURTHER READING
Examples of the use of Gompertz models in analyzing and predicting COVID-19 developments can be found in [1, 12].

Due to its origins as a model of mortality, the Gompertz function has a rich history in epidemiology, statistical biology, and the demographic and actuarial sciences [2, 14, 17]. It can also be rewritten in terms of a probability density function [3] and is often considered as a model of the diffusion of novel products as well as of customer life-time values [8, 9, 11]. Finally, in social media analysis, Gompertz growth dynamics are often observed to account well for the spread of memes or other viral content [6, 7].

A PYTHON CODE
Listing 1 provides NumPy / SciPy code for fitting Gompertz models to COVID-19 data using least squares methods. Readers who would like to experiment with this snippet should import

    import numpy as np
    import scipy.optimize as opt

Assuming that fname is a string containing the file path to a corresponding data file, line 15 uses the function loadData (provided in [4]) to load German COVID-19 case data into a 1D NumPy array cases. As this array may contain leading 0s, we call function onsetIndex to determine the index of the first entry different from 0. Using this index, line 18 computes an array I_t that represents the infection counts $I_0, I_1, I_2, \ldots$ we discussed in section 3. In line 19, we initialize an array t containing time points $t = 0, 1, 2, \ldots$

Line 21 declares that the model we would like to fit to the data in I_t is the Gompertz function and we note that function gompertzFct is a direct implementation of equation (5). Line 22 defines an initial guess of the model parameters $N$, $r$, and $t_0$.

Given model and guess, line 24 calls function curve_fit from SciPy's optimize module. Its arguments are the model we want to fit, the time points t for which we have data, the given data I_t, and an initial guess of the model parameters. Once it has completed its computation, curve_fit returns a 1D array parameters containing point estimates of $N$, $r$, and $t_0$ and a 2D array variances with covariance estimates for these parameters.

Finally, line 26 evaluates the fitted model on the time points in t and line 28 prints the root mean square error between the model prediction G_t and the given observations in I_t.

REFERENCES
[1] A. Ahmadi, Y. Fadaei, M. Shirani, and F. Rahmani. 2020. Modeling and Forecasting Trend of COVID-19 Epidemic in Iran. medRxiv (2020). DOI: 10.1101/2020.03.17.20037671.
[2] R.B. Banks. 1994. Growth and Diffusion Phenomena. Springer.
[3] C. Bauckhage. 2014. Characterizations and Kullback-Leibler Divergence of Gompertz Distributions. arXiv:1402.3193 [cs.IT] (2014).
[4] C. Bauckhage. 2020. The Math of Epidemic Outbreaks and Spread (Part 1) Exponential Growth versus Logistic Growth. researchgate.net.
[5] C. Bauckhage. 2020. The Math of Epidemic Outbreaks and Spread (Part 2) Least Squares Parameter Estimation for Logistic Models. researchgate.net.
[6] C. Bauckhage, K. Kersting, and F. Hadiji. 2013. Mathematical Models of Fads Explain the Temporal Dynamics of Internet Memes. In Proc. ICWSM. AAAI.
[7] C. Bauckhage, K. Kersting, and B. Rastegarpanah. 2014. Collective Attention to Social Media Evolves According to Diffusion Models. In Proc. WWW. ACM.
[8] A. Bemmaor. 1994. Modeling the Diffusion of New Durable Goods: Word-of-mouth Effect Versus Consumer Heterogeneity. In Research Traditions in Marketing, G. Laurent, G.L. Lilien, and B. Pras (Eds.). Springer.
[9] P.A. Geroski. 2000. Models of Technology Diffusion. Research Policy 29, 4–5 (2000).
[10] B. Gompertz. 1825. On the Nature of the Function Expressive of the Law of Human Mortality, and on a New Mode of Determining the Value of Life Contingencies. Philosophical Trans. of the Royal Society 115 (1825).
[11] H. Jaakola. 1996. Comparison and Analysis of Diffusion Models. In Diffusion and Adoption of Information Technology, K. Kautz and J. Pries-Heje (Eds.). Chapman & Hall.
[12] L. Jia, K. Li, Y. Jiang, X. Guo, and T. Zhao. 2020. Prediction and Analysis of Coronavirus Disease 2019. arXiv:2003.05447 [q-bio.PE] (2020).
[13] F.J. Richards. 1959. A Flexible Growth Function for Empirical Use. J. of Experimental Botany 10, 2 (1959).
[14] B.L. Strehler and A.S. Mildvan. 1960. General Theory of Mortality and Aging. Science 132, 3418 (1960).
[15] P.-F. Verhulst. 1838. Notice sur la loi que la population suit dans son accroissement. Correspondance Mathématique et Physique 10 (1838).
[16] L. von Rueden, S. Mayer, K. Beckh, B. Georgiev, S. Giesselbach, R. Heese, B. Kirsch, J. Pfrommer, A. Pick, R. Ramamurthy, M. Walczak, J. Garcke, C. Bauckhage, and J. Schuecker. 2020. Informed Machine Learning – A Taxonomy and Survey of Integrating Knowledge into Learning Systems. arXiv:1903.12394 [stat.ML] (2020).
[17] C. Winsor. 1932. The Gompertz Curve as a Growth Curve. Proc. of the National Academy of Sciences 18, 1 (1932).

Listing 1: Python code for Gompertz model fitting

 1  def gompertzFct(t, N, r, t0):
 2      return N * np.exp(-np.exp(-r * (t - t0)))  # equation (5)
 3
 4
 5
 6  def onsetIndex(data):
 7      for i, d in enumerate(data):
 8          if d != 0:
 9              return i  # index of the first non-zero count
10      return 0
11
12
13
14  ### use loadData in reference [4] to load COVID-19 data
15  ctr, lat, lon, dates, cases = loadData('Germany', fname)
16
17  onset = onsetIndex(cases)
18  I_t = cases[onset:]
19  t = np.arange(len(I_t))
20
21  model = gompertzFct
22  guess = (100000., .1, 50.)
23
24  parameters, variances = opt.curve_fit(model, t, I_t, p0=guess)
25
26  G_t = model(t, *parameters)
27
28  print(np.sqrt(np.mean((I_t - G_t)**2)))
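Since Listing 1 depends on the function loadData from [4] and on a downloaded data file, readers may find it convenient to first try the fitting step on synthetic data. The following sketch is one way to do so under stated assumptions: the "true" parameters, the noise level, and the names gompertz and true_params are hypothetical choices made for illustration and do not stem from the COVID-19 data discussed above. It also shows how the covariance matrix returned by curve_fit can be turned into rough standard errors of the estimates.

```python
import numpy as np
import scipy.optimize as opt

def gompertz(t, N, r, t0):
    """Gompertz function of equation (5)."""
    return N * np.exp(-np.exp(-r * (t - t0)))

# hypothetical "true" parameters used to generate synthetic observations
true_params = (180000.0, 0.08, 63.0)

rng = np.random.default_rng(seed=0)
t = np.arange(75.0)                                  # days 0, 1, ..., 74
clean = gompertz(t, *true_params)
# assumed noise model: 1% multiplicative Gaussian noise
I_t = clean * (1.0 + 0.01 * rng.standard_normal(t.size))

guess = (100000.0, 0.1, 50.0)                        # same initial guess as in Listing 1
params, cov = opt.curve_fit(gompertz, t, I_t, p0=guess)

# square roots of the diagonal of the covariance matrix give
# one-sigma uncertainty estimates for N, r, and t0
std_err = np.sqrt(np.diag(cov))

# root mean square error between fitted model and observations
rmse = np.sqrt(np.mean((I_t - gompertz(t, *params)) ** 2))

for name, p, s in zip(("N", "r", "t0"), params, std_err):
    print(f"{name} = {p:.4g} +/- {s:.2g}")
print("RMSE =", rmse)
```

Because the synthetic observations end before the curve saturates, the estimate of N is the least certain of the three, which mirrors the variation of N across the rows of Tab. 1.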

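The limit argument of Section 2 and the statement that the Gompertz function (5) solves the Gompertz differential equation (4) can also be checked numerically. The following sketch does this for illustrative values; the function names, the grid, and the parameter values are assumptions made for this example only.

```python
import numpy as np

def richards_damping(x, s):
    """Damping term s * (1 - x**(1/s)) of equation (3), with x = I(t) / N."""
    return s * (1.0 - x ** (1.0 / s))

def gompertz(t, N, r, t0):
    """Gompertz function of equation (5)."""
    return N * np.exp(-np.exp(-r * (t - t0)))

# 1) the Richards damping term approaches -log(x) as s grows
x = np.linspace(0.05, 0.95, 19)          # ratios I(t) / N strictly between 0 and 1
for s in (1.0, 10.0, 100.0, 1000.0):
    gap = np.max(np.abs(richards_damping(x, s) + np.log(x)))
    print(f"s = {s:6.0f}   max deviation from -log(x): {gap:.2e}")

# 2) the Gompertz function satisfies the Gompertz differential equation (4)
N, r, t0 = 1.0, 0.5, 0.0                 # illustrative values (assumptions)
t = np.linspace(-5.0, 15.0, 4001)
I = gompertz(t, N, r, t0)
lhs = np.gradient(I, t)                  # finite-difference estimate of dI/dt
rhs = r * I * np.log(N / I)
print("max deviation between both sides of (4):", np.max(np.abs(lhs - rhs)))

# 3) at t = t0 the curve sits at N / e, i.e. below the logistic midpoint N / 2
print("I(t0) =", gompertz(t0, N, r, t0), " N / e =", N / np.e)
```

The third check relates to the asymmetry noted in Section 3: the Gompertz curve passes through its inflection point at $N / e$ rather than at $N / 2$, so growth before $t_0$ is compressed relative to the slower approach to $N$ afterwards.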