
Open Access

Volume: 42, Article ID: e2020028, 5 pages


https://doi.org/10.4178/epih.e2020028

HEALTH STATISTICS

Estimation of the reproduction number and early prediction of the COVID-19 outbreak in India using a statistical computing approach
Karthick Kanagarathinam1, Kavaskar Sekar2
1Department of EEE, GMR Institute of Technology, Rajam, India; 2Department of EEE, Panimalar Engineering College, Chennai, Tamilnadu, India

Coronavirus disease 2019 (COVID-19), which causes severe respiratory illness, has become a pandemic. The World Health Organization has declared it a public health emergency of international concern. We developed a susceptible, exposed, infected, recovered (SEIR) model for COVID-19 to show the importance of estimating the reproduction number (R0). This work focuses on predicting the COVID-19 outbreak in India at an early stage based on an estimate of R0. The developed model may help policymakers take proactive measures before COVID-19 spreads further. Data on daily new cases in India from March 2, 2020 to April 2, 2020 were used to estimate R0 with the earlyR package. The maximum-likelihood approach was used to analyze the distribution of R0 values, and the bootstrap strategy was applied for resampling to identify the most likely R0 value. We estimated the median value of R0 to be 1.471 (95% confidence interval [CI], 1.351 to 1.592) and predicted that the cumulative number of new cases may reach 39,382 (95% CI, 34,300 to 47,351) in 30 days.

KEY WORDS: Basic reproduction number, COVID-19, Forecasting, Statistical computing

Correspondence: Karthick Kanagarathinam
Department of EEE, GMR Institute of Technology, GMR Nagar, Rajam 532127, India
E-mail: kkarthiks@gmail.com
Received: Apr 9, 2020 / Accepted: May 9, 2020 / Published: May 9, 2020
This article is available from: https://e-epih.org/
Epidemiol Health 2020;42:e2020028
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
© 2020, Korean Society of Epidemiology

INTRODUCTION

Coronavirus disease 2019 (COVID-19) has spread rapidly worldwide, with 896,450 confirmed cases and 45,526 deaths globally as of April 2, 2020 [1]. The disease first emerged as a cluster of 27 cases of pneumonia of unknown cause in Wuhan, China. The first COVID-19 case in India was identified on January 30, 2020, and the total number of reported cases reached 2,322 as of April 3, 2020 [2]. On March 3, 2020, the Indian government suspended all new and previously issued visas for nationals of Iran, Italy, Japan, and Korea, and on the next day it implemented compulsory screening of all international passengers. On March 24, 2020, the Indian government declared a 21-day countrywide lockdown to control the spread of COVID-19, which has developed into a pandemic. The transmission rate of COVID-19 has been relatively low in most countries, with major outbreaks in a few countries, such as Iran, Italy, Japan, and Korea. In most countries, COVID-19 spreads for at least an early period before any mitigation measures have an impact [3]. Myers et al. [4] stated that accurate epidemic forecasting models would noticeably improve epidemic prevention and control capabilities. No vaccine is available for COVID-19, and vaccination is typically not a viable option for stopping the spread of a new epidemic, as developing a safe and effective vaccine takes considerable time (approximately 10 years) [5]. Li et al. [6] found that the mean incubation period of COVID-19 was 5.2 days (95% confidence interval [CI], 4.1 to 7.0) and found evidence of human-to-human transmission among close contacts. Because India is the second most populous country in the world, it is important to estimate the transmissibility of COVID-19 and to predict the total number of new cases, which will help direct attention to this public health crisis. Mathematically based epidemic models, such as susceptible-infected-recovered (SIR) models [7], susceptible-infected-susceptible (SIS) models [8], susceptible-exposed-infected-recovered (SEIR) models [9], and susceptible-exposed-infected-recovered-susceptible (SEIRS) models [10], are used to predict the trajectory of epidemics. The reproduction number (R0) can be estimated statistically or empirically. In this work, we used the earlyR (https://cran.r-project.org/) package to estimate R0 and predict the trajectory of the outbreak.
METHODS

Susceptible-exposed-infected-recovered mathematical model

SEIR models can be used to predict the number of people infected based on R0. In this study, we present an SEIR model to demonstrate the importance of estimating R0 [11]. COVID-19 has an incubation period, also known as a latent period or latent delay (τ), of 2-14 days. The following assumptions were made in developing the mathematical model for COVID-19:
- The population of the region/country grows exponentially, and the COVID-19 epidemic occurs within a sufficiently short period.
- Infected individuals are assumed not to give birth.
- Recovered individuals acquire permanent immunity with a probability f (0 ≤ f ≤ 1) or die from the disease with a probability of (1 − f).

With S referring to susceptible individuals, E to susceptible individuals who have become exposed (those infected during the interval from t − τ to t), I to individuals who are infected, and R to those who have recovered from COVID-19, the resulting differential equations are:

$$\frac{dS(t)}{dt} = bS(t) + bE(t) + bR(t) - \mu S(t) - \gamma\frac{I(t)S(t)}{N(t)} \tag{1}$$

$$\frac{dE(t)}{dt} = \gamma\frac{I(t)S(t)}{N(t)} - \gamma\frac{I(t-\tau)S(t-\tau)}{N(t-\tau)}e^{-\mu\tau} - \mu E(t) \tag{2}$$

$$\frac{dI(t)}{dt} = \gamma\frac{I(t-\tau)S(t-\tau)}{N(t-\tau)}e^{-\mu\tau} - \mu I(t) - \alpha I(t) \tag{3}$$

$$\frac{dR(t)}{dt} = -\mu R(t) + f\alpha I(t) \tag{4}$$

where μ is the per capita death rate due to causes other than the disease, γ is the contact rate (also called the transmission or infection rate), α is the recovery rate, and b is the per capita birth rate (with b > μ).

At any instant,

$$S(t) + E(t) + I(t) + R(t) = N(t) \tag{5}$$

R0 is defined as

$$R_0 = \frac{\gamma e^{-b\tau}}{b+\alpha} \tag{6}$$

This constant is extremely important in characterizing the spread of COVID-19. It represents the average number of people who contract the disease from a single infectious individual. In general, if R0 > 1, secondary infections will occur and the disease will spread throughout the population. According to World Health Organization (WHO) information as of January 23, 2020, the R0 of COVID-19 lies between 1.4 and 2.5. R0 may vary considerably not only between different infectious diseases, but also for the same disease in different populations [12].

Data

All the data shown in Table 1 were collected from an official Indian government website [2]. The epidemiological data from March 2, 2020 to April 2, 2020, shown in Table 1, were used to estimate R0. A higher R0 indicates a higher likelihood of new infections.

Table 1. Actual coronavirus disease 2019 daily new confirmed cases in India

Date in 2020   New confirmed cases (n)   Date in 2020   New confirmed cases (n)
Mar 2          2                         Mar 18         14
Mar 3          1                         Mar 19         22
Mar 4          22                        Mar 20         50
Mar 5          2                         Mar 21         60
Mar 6          1                         Mar 22         77
Mar 7          3                         Mar 23         74
Mar 8          5                         Mar 24         85
Mar 9          5                         Mar 25         87
Mar 10         6                         Mar 26         88
Mar 11         10                        Mar 27         140
Mar 12         13                        Mar 28         84
Mar 13         8                         Mar 29         106
Mar 14         16                        Mar 30         227
Mar 15         10                        Mar 31         146
Mar 16         11                        Apr 1          437
Mar 17         19                        Apr 2          235

Model development

The transmissibility of COVID-19 in India was evaluated using the earlyR package. It was assumed that interventions so far have had a minimal impact on COVID-19 transmission in India. The model used herein is a simplified version of the model introduced by Cori et al. [13]. The parameters of the serial interval distribution (mean and standard deviation [SD]) are required to estimate R0. Based on existing research [14], we assumed that the mean and SD were 4.7 days and 2.9 days, respectively. The maximum-likelihood (ML) approach was applied to obtain the distribution of R0, and the bootstrap strategy was applied for resampling 1,000 times to obtain likely R0 values. The R package projections was used to predict the cumulative daily incidence [15], and we forecast the cumulative total number of new cases after 30 days. The daily incidence obeys a Poisson distribution determined by the daily infectiousness, which is denoted as

$$\lambda(t) = \sum_{k=1}^{t-1} X_k\, V(t-k) \tag{7}$$

where V(t − k) is the vector of the probability mass function of the serial interval distribution and X_k is the real-time incidence at time k. The forecasting model depended on the present incidence and the serial interval distribution. The projections were based on resampling and probability computations. The statistical analysis and model development were done using R version 3.6.3 (https://cran.r-project.org/bin/windows/base/old/3.6.3/).
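To see how the threshold in Eq. (6) governs the SEIR dynamics, the following minimal Python sketch integrates Eqs. (1)-(4) with a forward-Euler step and compares outcomes on either side of R0 = 1. It is an illustration under stated assumptions, not the authors' implementation (their analysis used R): the function names and all parameter values are hypothetical, no infections are assumed to have occurred before t = 0 (so the delayed term is zero for t < τ and E(0) should be 0), and the recovery inflow is taken as +fαI(t), consistent with the assumption that recovered individuals acquire immunity with probability f.

```python
import math


def r0_threshold(gamma, b, alpha, tau):
    """Eq. (6): R0 = gamma * exp(-b * tau) / (b + alpha)."""
    return gamma * math.exp(-b * tau) / (b + alpha)


def simulate_seir_delay(s0, e0, i0, r_init, *, gamma, mu, alpha, b, f, tau,
                        days, dt=0.1):
    """Forward-Euler integration of the delayed SEIR system, Eqs. (1)-(4).

    Assumes no infections before t = 0, so the delayed outflow from E is zero
    for t < tau (pass e0 = 0 for consistency). Returns lists S, E, I, R
    sampled every dt days.
    """
    lag = int(round(tau / dt))   # latent delay tau expressed in time steps
    steps = int(round(days / dt))
    S, E, I, R = [s0], [e0], [i0], [r_init]
    for n in range(steps):
        s, e, i, r = S[-1], E[-1], I[-1], R[-1]
        n_now = s + e + i + r                      # Eq. (5)
        exposed_now = gamma * i * s / n_now        # new exposures at time t
        m = n - lag                                # step index of time t - tau
        if m >= 0:
            n_lag = S[m] + E[m] + I[m] + R[m]
            # exposures at t - tau that survived the delay and become infectious
            matured = gamma * I[m] * S[m] / n_lag * math.exp(-mu * tau)
        else:
            matured = 0.0                          # empty history before t = 0
        ds = b * (s + e + r) - mu * s - exposed_now  # Eq. (1)
        de = exposed_now - matured - mu * e          # Eq. (2)
        di = matured - mu * i - alpha * i            # Eq. (3)
        dr = -mu * r + f * alpha * i                 # Eq. (4), recovery inflow +f*alpha*I
        S.append(s + dt * ds)
        E.append(e + dt * de)
        I.append(i + dt * di)
        R.append(r + dt * dr)
    return S, E, I, R


# Hypothetical parameter values (per day), chosen only for illustration.
params = dict(gamma=0.4, mu=3e-5, alpha=1 / 7, b=5e-5, f=0.98, tau=5.0)
print("R0 from Eq. (6):",
      round(r0_threshold(params["gamma"], params["b"], params["alpha"], params["tau"]), 3))
S, E, I, R = simulate_seir_delay(1_000_000, 0, 10, 0, days=60, **params)
print("Infectious on day 60:", round(I[-1]))
```

With these illustrative values R0 is well above 1 and the infectious compartment grows after the first latent delay; lowering gamma until R0 < 1 makes it decay instead. A forward-Euler step with dt = 0.1 day keeps the sketch dependency-free; for quantitative work, a dedicated delay-differential-equation solver would be the safer choice.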
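The estimation step can be sketched in Python as follows, assuming, in line with the framework of Cori et al. [13] on which earlyR builds, that the incidence on day t is Poisson-distributed with mean R0·λ(t), where λ(t) is given by Eq. (7). Maximizing this likelihood has a closed form: setting the derivative of Σ[X_t log(R λ(t)) − R λ(t)] to zero gives R_hat = ΣX_t / Σλ(t), summed over days with λ(t) > 0. This is not earlyR's code: the gamma discretization (density at whole days, renormalized), the grid for R on (0, 10], and all function names are assumptions. The paper calls the resampling step a bootstrap; the sketch instead draws R values in proportion to the likelihood on the grid, which is our reading of what earlyR's sample_R() does. Because these details differ from earlyR, the printed values need not match the reported R0 of 1.471.

```python
import math
import random


def serial_interval_pmf(mean=4.7, sd=2.9, max_days=30):
    """Serial-interval weights V(1), ..., V(max_days) from a gamma distribution.

    Assumption: the gamma density is evaluated at whole days and renormalized,
    a simplification of the discretization used by earlyR.
    """
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    log_norm = math.lgamma(shape) + shape * math.log(scale)
    dens = [math.exp((shape - 1) * math.log(d) - d / scale - log_norm)
            for d in range(1, max_days + 1)]
    total = sum(dens)
    return [v / total for v in dens]


def infectiousness(incidence, pmf):
    """Eq. (7): lambda(t) = sum over k < t of X_k * V(t - k)."""
    lambdas = []
    for t in range(len(incidence)):
        lam = 0.0
        for k in range(max(0, t - len(pmf)), t):
            lam += incidence[k] * pmf[t - k - 1]
        lambdas.append(lam)
    return lambdas


def log_likelihood(r, incidence, lambdas):
    """Poisson log-likelihood of the incidence, with mean r * lambda(t) each day."""
    total = 0.0
    for x, lam in zip(incidence, lambdas):
        if lam > 0:  # days with no earlier cases carry no information about R
            mean = r * lam
            total += x * math.log(mean) - mean - math.lgamma(x + 1)
    return total


def ml_estimate(incidence, lambdas):
    """Closed-form ML estimate: sum of X_t / sum of lambda(t) on informative days."""
    pairs = [(x, lam) for x, lam in zip(incidence, lambdas) if lam > 0]
    return sum(x for x, _ in pairs) / sum(lam for _, lam in pairs)


def sample_likely_r(incidence, lambdas, n=1000, r_max=10.0, grid_size=1000, seed=1):
    """Draw n values of R from a grid on (0, r_max], weighted by the likelihood."""
    grid = [r_max * (i + 1) / grid_size for i in range(grid_size)]
    logl = [log_likelihood(r, incidence, lambdas) for r in grid]
    top = max(logl)
    weights = [math.exp(v - top) for v in logl]  # rescaled to avoid underflow
    return random.Random(seed).choices(grid, weights=weights, k=n)


# Daily new confirmed cases from Table 1 (March 2 to April 2, 2020).
cases = [2, 1, 22, 2, 1, 3, 5, 5, 6, 10, 13, 8, 16, 10, 11, 19,
         14, 22, 50, 60, 77, 74, 85, 87, 88, 140, 84, 106, 227, 146, 437, 235]
lambdas = infectiousness(cases, serial_interval_pmf())
r_hat = ml_estimate(cases, lambdas)
draws = sorted(sample_likely_r(cases, lambdas))
print(f"ML R: {r_hat:.3f}; median of 1,000 draws: {draws[len(draws) // 2]:.3f}")
```

Using the closed form avoids a numerical optimizer, while the grid is still needed to draw the 1,000 likely values whose histogram corresponds to Figure 3.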
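The projection step can be illustrated with a similarly hedged sketch in the spirit of the projections package [15]: each simulation draws one R from a set of likely values, then generates each future day's incidence as a Poisson draw with mean R·λ(t), updating λ(t) with Eq. (7) as simulated cases accumulate. The function names and the toy inputs in the usage lines are hypothetical; in practice, pmf would be the discretized serial interval (mean 4.7 days, SD 2.9 days) and r_values the sampled likely R0 values. Poisson draws use the multiplication method for small means and Hörmann's PTRS transformed-rejection method for means of 10 or more, so the sketch needs only the standard library.

```python
import math
import random


def poisson_draw(rng, mean):
    """One Poisson(mean) draw: multiplication method for small means,
    Hormann's PTRS transformed rejection for mean >= 10."""
    if mean <= 0:
        return 0
    if mean < 10:
        limit, k, prod = math.exp(-mean), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        return k
    slam, loglam = math.sqrt(mean), math.log(mean)
    b = 0.931 + 2.53 * slam
    a = -0.059 + 0.02483 * b
    inv_alpha = 1.1239 + 1.1328 / (b - 3.4)
    vr = 0.9277 - 3.6224 / (b - 2)
    while True:
        u = rng.random() - 0.5
        v = rng.random()
        us = 0.5 - abs(u)
        if us <= 0.0 or v <= 0.0:
            continue  # guard against division by zero and log(0)
        k = math.floor((2 * a / us + b) * u + mean + 0.43)
        if us >= 0.07 and v <= vr:
            return k
        if k < 0 or (us < 0.013 and v > us):
            continue
        if (math.log(v) + math.log(inv_alpha) - math.log(a / (us * us) + b)
                <= -mean + k * loglam - math.lgamma(k + 1)):
            return k


def project_cumulative(incidence, pmf, r_values, days=30, n_sim=1000, seed=2):
    """Total new cases over the next `days` days for each of n_sim simulations.

    Each simulation draws one R from r_values, then generates each future day's
    incidence as Poisson(R * lambda(t)), with lambda(t) computed as in Eq. (7).
    """
    rng = random.Random(seed)
    start = len(incidence)
    totals = []
    for _ in range(n_sim):
        r = rng.choice(r_values)
        series = list(incidence)
        for _ in range(days):
            t = len(series)
            lam = sum(series[k] * pmf[t - k - 1]
                      for k in range(max(0, t - len(pmf)), t))
            series.append(poisson_draw(rng, r * lam))
        totals.append(sum(series[start:]))
    return totals


# Hypothetical toy inputs: a short serial-interval PMF and a few R values.
toy_pmf = [0.1, 0.25, 0.3, 0.2, 0.15]
totals = sorted(project_cumulative([100, 120, 150, 160, 200], toy_pmf,
                                   [1.3, 1.5, 1.7], n_sim=500))
print("median:", totals[len(totals) // 2],
      "2.5%-97.5%:", totals[int(0.025 * len(totals))], totals[int(0.975 * len(totals))])
```

The 2.5th and 97.5th percentiles of the simulated totals give an interval analogous to the 95% CI reported for the 30-day forecast.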
Kanagarathinam K et al. : Early prediction of COVID19 outbreak in India

Ethics statement
The analysis in this article is based on publicly available data; therefore, approval from an ethics committee was not required.

RESULTS AND DISCUSSION

Figure 1 shows the daily incidence of COVID-19 in India from March 2, 2020 to April 2, 2020. Figure 2 shows the distribution of likely values of the R0 of COVID-19 in India. We estimated the ML value of R0 as 1.471 (95% CI, 1.351 to 1.592) for the early stage of COVID-19 in India. Figure 3 shows a histogram of the 1,000 likely R0 values obtained with the bootstrap strategy.

Figure 4 shows the global spread of infections during the same period. The vertical gray bars indicate the presence of cases, and the black dots denote the dates of symptom onset. The dashed vertical blue line indicates the current date (April 3, 2020). The vertical scale in Figure 4 shows the relative scale of infections. Figure 5 shows the predicted cumulative cases in the next 30 days.

We computed that the cumulative number of new cases may reach 39,382 (95% CI, 34,300 to 47,351) in the next 30 days. R0 was estimated from the existing COVID-19 data from March 2, 2020 to April 2, 2020. The Indian government has already announced a nationwide lockdown. According to WHO information as of January 23, 2020, the R0 of COVID-19 lies between 1.4 and 2.5. Our estimate for India, a median R0 of 1.471 (95% CI, 1.351 to 1.592), lies at the lower end of this range. However, various studies have indicated that precisely estimating R0 is challenging, because R0 depends on environmental conditions, demography, and the modeling method. In our method, the accuracy of R0 depended on the premise that all cases of COVID-19 in India were identified in the study period. If the same scenario continues, we predict that the cumulative number of new cases may reach 39,382 (95% CI, 34,300 to 47,351) in the next 30 days. We believe that our forecasts may help in various respects, such as developing the required medical infrastructure and focusing efforts on mitigating the economic impact of the pandemic. Our findings were derived from a limited time frame, and the results may change after a considerable number of additional cases occur. The R0 of COVID-19 can be reduced by strictly following social distancing in daily life, wearing masks, frequent hand-washing with soap or sanitizers, quarantining infected people, identifying cases using rapid diagnostic methods, and so on.

Figure 1. Actual daily incidence of coronavirus disease 2019 in India. (Axes: date in 2020; daily incidence.)
Figure 2. Maximum-likelihood value of reproduction number (R0). (Axes: R0; likelihood; maximum marked at R0 = 1.471.)
Figure 3. Sample of likely values of reproduction number (R0). (Panel title: "Sample of likely R0 values"; axes: values of R0; frequency.)
Figure 4. Global spread of infections. (Panel title: "Global force of infection"; axes: date in 2020; infectiousness (lambdas).)
Figure 5. Predicted cumulative new cases in the next 30 days. (Panel title: "Prediction: new cases in 30 d"; axes: total no. of new cases; frequency.)

CONCLUSION

We estimated the median value of R0 to be 1.471 (95% CI, 1.351 to 1.592) and predicted that the cumulative number of new cases may reach 39,382 (95% CI, 34,300 to 47,351) in the next 30 days. The predicted size largely depends on changes in R0. Effective measures against COVID-19 will help to reduce R0. The presence of numerous unidentified cases in the study period may result in uncertainty in the estimated value of R0 used in the developed forecasting model.

CONFLICT OF INTEREST

The authors have no conflicts of interest to declare for this study.

FUNDING

None.

ACKNOWLEDGEMENTS

None.

AUTHOR CONTRIBUTIONS

Conceptualization: KK. Data curation: KS. Formal analysis: KS. Funding acquisition: None. Methodology: KK. Writing – original draft: KK. Writing – review & editing: KK, KS.

ORCID

Karthick Kanagarathinam: https://orcid.org/0000-0001-7755-5715; Kavaskar Sekar: http://orcid.org/0000-0003-4735-0537

REFERENCES

1. World Health Organization. Coronavirus disease 2019 (COVID-19) situation report-73 [cited 2020 Apr 3]. Available from: https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200402-sitrep-73-covid-19.pdf?sfvrsn=5ae25bc7_2.
2. Ministry of Health and Family Welfare, Government of India. COVID-19 India [cited 2020 Apr 3]. Available from: http://www.mohfw.gov.in/index.html#.
3. Anderson RM, Heesterbeek H, Klinkenberg D, Hollingsworth TD. How will country-based mitigation measures influence the course of the COVID-19 epidemic? Lancet 2020;395:931-934.
4. Myers MF, Rogers DJ, Cox J, Flahault A, Hay SI. Forecasting disease risk for increased epidemic preparedness in public health. Adv Parasitol 2000;47:309-330.
5. Pronker ES, Weenen TC, Commandeur H, Claassen EH, Osterhaus AD. Risk in vaccine research and development quantified. PLoS One 2013;8:e57755.
6. Li Q, Guan X, Wu P, Wang X, Zhou L, Tong Y, et al. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. N Engl J Med 2020;382:1199-1207.
7. Eksin C, Paarporn K, Weitz JS. Systematic biases in disease forecasting - the role of behavior change. Epidemics 2019;27:96-105.
8. Pinsent A, Liu F, Deiner M, Emerson P, Bhaktiari A, Porco TC, et al. Probabilistic forecasts of trachoma transmission at the district level: a statistical model comparison. Epidemics 2017;18:48-55.
9. Funk S, Camacho A, Kucharski AJ, Eggo RM, Edmunds WJ. Real-time forecasting of infectious disease dynamics with a stochastic semi-mechanistic model. Epidemics 2018;22:56-61.
10. Khan MA, Badshah Q, Islam S, Khan I, Shafie S, Khan SA. Global dynamics of SEIRS epidemic model with non-linear generalized incidences and preventive vaccination. Adv Differ Equ 2015:88.
11. Yan P, Liu S. SEIR epidemic model with delay. ANZIAM J 2006;48:119-134.
12. Dietz K. The estimation of the basic reproduction number for infectious diseases. Stat Methods Med Res 1993;2:23-41.
13. Cori A, Ferguson NM, Fraser C, Cauchemez S. A new framework and software to estimate time-varying reproduction numbers during epidemics. Am J Epidemiol 2013;178:1505-1512.
14. Nishiura H, Linton NM, Akhmetzhanov AR. Serial interval of novel coronavirus (COVID-19) infections. Int J Infect Dis 2020;93:284-286.
15. Jombart T, Nouvellet P, Bhatia S, Kamvar ZN. Projections: project future case incidence; 2018 [cited 2020 Apr 3]. Available from: https://cran.r-project.org/web/packages/projections/index.html.
