24, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.2979599
ABSTRACT The 2019 novel coronavirus (2019-nCoV) outbreak has been treated as a Public Health
Emergency of International Concern by the World Health Organization. This work made an early prediction
of the 2019-nCoV outbreak in China based on a simple mathematical model and limited epidemiological
data. Combining characteristics of the historical epidemic, we found that part of the released data is unreasonable.
After ruling out the unreasonable data, the model predicts that the number of cumulative
2019-nCoV cases may reach 76,000 to 230,000, with a peak of the unrecovered infectives (22,000-74,000)
occurring in late February to early March. After that, the infected cases will decrease rapidly and monotonically
until early May to late June, when the 2019-nCoV outbreak will fade out. Strong anti-epidemic measures
may reduce the cumulative infected cases by 40%-49%. Improved medical care can also reduce
transmission by about one-half and effectively shorten the duration of the 2019-nCoV outbreak.
INDEX TERMS Epidemic transmission, infection rate, mathematical model, novel coronavirus, prediction,
removal rate.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 8, 2020 51761
L. Zhong et al.: Early Prediction of the 2019 Novel Coronavirus Outbreak in the Mainland China
II. DATA AND METHODS
A. DATA
The 2019-nCoV data used in this study come from several sources: (1) the Wuhan Municipal Health Commission (http://wjw.wuhan.gov.cn/), which provides infective data for Wuhan from December 31, 2019 to January 19, 2020; (2) the Health Commission of Hubei Province (http://wjw.hubei.gov.cn/), which provides the latest epidemiological data for Hubei Province from January 20, 2020 up to the present, including cumulative confirmed cases, deaths, suspected cases, and cures in the cities of Hubei Province; and (3) the National Health Commission of the People's Republic of China (http://www.nhc.gov.cn/), which provides the latest epidemiological data for China from January 20, 2020 up to the present. The SARS data are obtained from the WHO (https://www.who.int/), which provides worldwide epidemiological data from March 17, 2003 to July 11, 2003 [23].
B. SIMPLIFIED SIR MODEL
Under the assumption that the population does not change due to other causes, and considering a disease that confers immunity after recovery, we can divide the population into three distinct classes: the susceptibles (S), namely the healthy population vulnerable to infection; the infectives (I), the infected population; and the removed infectives (R), the population that has no transmissibility, including the recovered and dead infectives. The population of the three classes is governed by the following system of nonlinear ordinary differential equations [19]:
$$\frac{dS}{dt} = -\beta S(t)\, I(t) \tag{1}$$
$$\frac{dI}{dt} = \beta S(t)\, I(t) - \gamma I(t) \tag{2}$$
$$\frac{dR}{dt} = \gamma I(t) \tag{3}$$
where t is time (in days); β is the infection rate, i.e., the ratio infected by one infective during unit time; and γ is the removal rate, i.e., the ratio of the removed number to the number of the infectives. Equations (1)-(3) are coupled through the two right-hand-side (RHS) terms: βS(t)I(t), i.e., the newly increased infectives; and γI(t), i.e., the newly removed infectives. For Eq. (1), the solution of S generally decreases monotonically to a stable final value as the epidemic fades out [11]. The initial value of S is given empirically according to the community population affected by the disease. However, the 2019-nCoV broke out in Wuhan, a big city with a population of over ten million. The high population density and frequent population mobility made it hard to estimate the susceptible population in Wuhan exactly, let alone that of the whole of mainland China. Therefore, we assume that the infected population of the 2019-nCoV is negligible compared with the huge susceptible population in China. That is to say, we can treat the variable S as a large constant compared with the variable I. With this assumption, Eq. (1) can be dropped from the coupled system of Eqs. (1)-(3). For the remaining two-equation system, i.e., Eqs. (2)-(3), the change of the infectives (I) is thought to be more important for epidemic prediction than that of the removed infectives (R). So, we can further drop Eq. (3), which leads to a single-equation system, i.e., Eq. (2). In finite-difference form, Eq. (2) can be discretized as
$$I(t + \Delta t) = I(t) + (\beta - \gamma)\, I(t)\, \Delta t, \tag{4}$$
where Δt is the time interval of numerical integration, and the constant S is combined into the infection rate β, which is defined as
$$\beta(t) = \frac{I(t + \Delta t) - I(t)}{I(t)\, \Delta t} \tag{5}$$
The removal rate γ is similarly defined as
$$\gamma(t) = \frac{R(t + \Delta t) - R(t)}{I(t)\, \Delta t} \tag{6}$$
In Eq. (4), the two parameters, β and γ, need to be set before performing model prediction. In real epidemic transmission, the infection rate β is a time-varying variable that can be statistically estimated by fitting the epidemiological data. In principle, the parameter γ can also be estimated in a similar way. Equation (4) with the definitions of β and γ can be integrated forward in time. Here, we use MATLAB to carry out the simple numerical computation of Eq. (4). Based on the prediction of I, the removed infectives can be obtained from γI(t). With the numbers of I and the removed infectives, we can further obtain the cumulative number of infected cases by simply summing I(t) and γI(t).
C. METHOD OF PARAMETER ESTIMATION
As mentioned in the previous section, the 2019-nCoV and SARS belong to the same family of coronaviruses. Here, we assume that the two kinds of viruses both follow the basic rule of epidemic transmission. From the knowledge of epidemic transmission [22], the variable I follows a bell-shaped function, i.e., it increases from zero to a turning point and then decreases to a stable value; the variables S and R follow monotonically decreasing and increasing functions, respectively, until reaching their stable states. This requires the parameters β and γ to follow monotonically decreasing and increasing functions, respectively. So, this is one key criterion for fitting the parameters in practice.
The infection rate (β) and removal rate (γ) can be obtained statistically or empirically. For the early prediction discussed here, we should first evaluate the reliability of the available data by combining it with knowledge of historical epidemics, such as SARS in 2003. To do this, several subsets of the epidemiological data are extracted by resampling from the available data. We tried to identify the reasonable subsets of epidemiological data for the model prediction by evaluating the reasonability of the infection rate (β) fitted from different subsets of data. In the objective analysis of data, two criteria are used to rule out an unrealistic dataset. One is that the fitted parameter β should decrease monotonically with time; otherwise, the modelled epidemic will not stop unless all the susceptibles become infectives. This extreme situation is especially implausible for an epidemic spreading over an area like China, which has a large spatial scale and a huge susceptible population. The other rule-out criterion is
that the fitted β exhibits an unrealistically sharp decrease, which would make the model significantly underestimate the epidemic transmissibility and predict far fewer infectives than the real data. By investigating the characteristics of the fitted infection rates based on different data subsets, we can obtain reasonable estimates of β according to the above two criteria.
TABLE 1. The epidemiological data of the 2019-nCoV: the cumulative number of infected individuals (I), the cumulative removed infectives (R), the infection rate (β), and the removal rate (γ). The removed infectives include the dead and cured infectives.
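The finite-difference definitions in Eqs. (5) and (6), together with the first rule-out criterion (β should not increase with time), can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the demo series are hypothetical and are not the values of Table 1, and the second criterion is left out because the text gives no numerical threshold for an "unrealistically sharp" decrease.

```python
def finite_difference_rates(infectives, removed, dt=1.0):
    """Estimate beta(t) and gamma(t) from two time series via Eqs. (5) and (6)."""
    if len(infectives) != len(removed):
        raise ValueError("the two series must have the same length")
    beta, gamma = [], []
    for k in range(len(infectives) - 1):
        i_now = infectives[k]
        if i_now <= 0:
            raise ValueError("I(t) must be positive to form the ratios")
        beta.append((infectives[k + 1] - i_now) / (i_now * dt))     # Eq. (5)
        gamma.append((removed[k + 1] - removed[k]) / (i_now * dt))  # Eq. (6)
    return beta, gamma


def is_monotonically_decreasing(values, tolerance=0.0):
    """First rule-out criterion: the fitted beta should not increase with time."""
    return all(later <= earlier + tolerance
               for earlier, later in zip(values, values[1:]))


# Hypothetical series for illustration only (NOT the data of Table 1).
I_demo = [100.0, 150.0, 210.0, 273.0, 327.6]
R_demo = [0.0, 2.0, 5.0, 9.2, 14.66]

beta_demo, gamma_demo = finite_difference_rates(I_demo, R_demo)
print("beta :", [round(b, 3) for b in beta_demo])
print("gamma:", [round(g, 3) for g in gamma_demo])
print("beta passes criterion 1:", is_monotonically_decreasing(beta_demo))
```

A subset whose estimated β fails `is_monotonically_decreasing` would be a candidate for exclusion under the first criterion; a small positive `tolerance` could be used to forgive day-to-day noise, which is a design choice of this sketch rather than something the text prescribes.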
The parameter γ generally varies slowly at the initial stage of the epidemic outbreak because most infectives have not yet reached the recovery stage. So, a γ fitted to the early epidemiological data is bound to be underestimated, and thus causes an unrealistically long duration of the epidemic spread. Here, we therefore treated the parameter γ as a constant chosen with reference to the removal rate computed from the real data.
After obtaining reasonable ranges of the two parameters, we can carry out sensitivity experiments over the parameter intervals to find the prediction intervals of the infectives and the associated variables.
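The forward integration of Eq. (4) and a sensitivity experiment over γ can be sketched as follows. The paper states that MATLAB was used; Python is used here only for illustration. Several things are assumptions of this sketch rather than the authors' settings: the exponential form β(t) = A·exp(−B·t) with the hypothetical constants `A` and `B`, the initial value `i0`, and the reading of the cumulative cases as I plus the accumulated removed infectives. Because Eq. (4) grows while β(t) > γ and shrinks afterwards, the peak of I under this assumed form falls near t* = ln(A/γ)/B.

```python
import math


def integrate_simplified_sir(i0, beta, gamma, n_steps, dt=1.0):
    """Forward-integrate Eq. (4): I(t+dt) = I(t) + (beta(t) - gamma) * I(t) * dt.

    beta is a callable of time, gamma a constant. Returns the infectives I,
    the accumulated removed infectives R, and the cumulative cases I + R.
    """
    infectives, removed = [i0], [0.0]
    for k in range(n_steps):
        t = k * dt
        i_now = infectives[-1]
        removed.append(removed[-1] + gamma * i_now * dt)           # newly removed
        infectives.append(i_now + (beta(t) - gamma) * i_now * dt)  # Eq. (4)
    cumulative = [i + r for i, r in zip(infectives, removed)]
    return infectives, removed, cumulative


# Hypothetical decaying infection rate; A and B are illustrative values only.
A, B = 0.5, 0.05


def beta_of_t(t):
    return A * math.exp(-B * t)


results = {}
for gamma in (0.06, 0.07):  # sensitivity experiment over the removal rate
    I, R, C = integrate_simplified_sir(100.0, beta_of_t, gamma, n_steps=200)
    peak_day = max(range(len(I)), key=I.__getitem__)
    results[gamma] = (peak_day, I[peak_day], C[-1])
    print(f"gamma={gamma:.2f}: peak on day {peak_day}, "
          f"peak I={I[peak_day]:.0f}, final cumulative={C[-1]:.0f}")
```

Repeating the loop over a grid of β parameters as well as γ values would give the kind of prediction interval described above; the spread of `results` across the grid plays the role of the lower and upper bounds.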
III. RESULTS
Although the 2019-nCoV is a member of the coronavirus family, it still shows some characteristics that differ from SARS. Figure 1a shows the number of infectives (I) and the SIR prediction of SARS over Mainland China from 10 April 2003 to 24 June 2003 [23]. The simplified SIR model (Eq. (4)) depicts the propagation of SARS in 2003 well. In contrast, the 2019-nCoV shows a pulse-like increase of infected cases after day 10. As shown in Table 1, the epidemiological data that can be used by the model have been released since 11 January 2020, although some cases of unknown pneumonia had been identified earlier. From 11 January to 15 January, the cumulative number of infectives stays unchanged at 41 cases, i.e., a zero infection rate (β) in the first 5 days. But on 18 January, the number of infectives (121 cases) almost doubles, which causes a pulse-like infection rate of 1.44 per day. Such a huge change of the infection rate can hardly be explained as the natural variation of the epidemic. The sharp increase of the confirmed infected cases in such a short time may be attributable to the government's increased emphasis on the disease and to improved diagnosis techniques. Therefore, the data before 18 January should be used with caution. Since then, all the provinces of China have continuously released and updated the epidemiological data. The data after that day are presumed to reflect the characteristics of the 2019-nCoV more reliably.
As shown by Table 1 and Fig. 1b, the infection rate of the 2019-nCoV is over 0.5 per day for most of the time after 17 January. That means every two 2019-nCoV infectives can infect at least one person per day. This rate is more than double the infection rate of SARS in 2003 (Fig. 1b). Therefore, judging from the initial epidemiological characteristics, the 2019-nCoV virus exhibited a much stronger infectivity than the SARS virus. This feature was also reported by some recent model studies of 2019-nCoV [24], [25]. As mentioned above, the infection rate is a key parameter determining the prediction of the epidemic spread based on the SIR model. In order to predict the future number of infectives, the infection rate is often specified as a constant or a time-varying analytical function. For SARS, 2019-nCoV and other epidemics, treating the infection rate as a time-varying variable [26] can capture the dynamical process of the epidemic transmission, including the natural processes and human intervention. As the fitted line in Fig. 1b shows, the infection rate of SARS in 2003 closely follows a monotonically decreasing exponential function, particularly after day 20. This analytical function of β is then substituted into Eq. (4) to make the epidemic prediction. Figure 1a shows that the predicted evolution of the number of infectives of the 2003 SARS is highly consistent with the real data.
For the 2019-nCoV, considering the poor quality of the epidemiological data before 18 January, we extracted several subsets from the available epidemiological data shown in Table 1 by sequentially excluding the data between 11 January and 20 January in one-day intervals. Thus, we obtain ten subsets, successively named by the first day of the extracted data, i.e., t0 = Jan-11, t0 = Jan-12, ..., t0 = Jan-20. The dataset t0 = Jan-11 contains all of the released data since 11 January, and t0 = Jan-20 contains the data after 19 January.
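One possible way to obtain a monotonically decreasing exponential β(t) for each subset t0 is a log-linear least-squares fit, repeated with progressively later start days. The sketch below is only an illustration under stated assumptions: the form β(t) = a·exp(−b·t), the use of ordinary least squares on ln β, the function names, and the synthetic rate series are all hypothetical, since the text does not spell out the fitting procedure.

```python
import math


def fit_exponential_decay(times, rates):
    """Least-squares fit of ln(beta) = ln(a) - b * t; returns (a, b).

    Non-positive rates are skipped because their logarithm is undefined.
    """
    points = [(t, math.log(r)) for t, r in zip(times, rates) if r > 0]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two positive rates to fit")
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope


def fit_by_start_day(times, rates, start_days):
    """Refit beta on each subset that keeps only the data from day t0 onward."""
    fits = {}
    for t0 in start_days:
        kept_times = [t for t in times if t >= t0]
        kept_rates = [r for t, r in zip(times, rates) if t >= t0]
        fits[t0] = fit_exponential_decay(kept_times, kept_rates)
    return fits


# Synthetic rates for illustration only: zeros, one spike, then a clean decay.
days = list(range(12))
demo_rates = [0.0, 0.0, 0.0, 1.5] + [0.6 * math.exp(-0.08 * d) for d in range(4, 12)]

demo_fits = fit_by_start_day(days, demo_rates, start_days=[0, 4])
for t0, (a, b) in demo_fits.items():
    print(f"t0 = day {t0}: a = {a:.3f}, b = {b:.3f}")
```

In this synthetic example, the subset that still contains the early spike yields a much steeper decay rate `b` than the subset that starts after it, which is the kind of unrealistically sharp decrease that the second rule-out criterion is meant to catch.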
outbreak under the same γ. However, an increase of γ is more effective in reducing the number of infected cases. It is estimated from the model predictions that every increase of the removal rate by Δγ = 0.01 (about 16%-20%) will lead to a 50%-60% decrease of infected cases (Fig. 3b) and a 40%-50% decrease of cumulative cases (Fig. 3c). Furthermore, a large removal rate or high-level medical care significantly shortens the duration of the epidemic outbreak. Under the t0 = 19-Jan function of β, the model with γ = 0.06 predicts that the 2019-nCoV will fade out in late May. But under γ = 0.07 and the same β function, the fading-out time is advanced to early May (Fig. 3b). This shows that the trend of the epidemic is highly sensitive to the medical-service level. High-level medical care can significantly suppress the propagation of the epidemic.
Considering all the sensitivity experiments with respect to β and γ, the prediction intervals of the 2019-nCoV are summarized in Table 2. The inflection point of the infected-case variation is a key indicator for monitoring epidemic transmission. In theory, the inflection point of I(t) can be obtained by simply setting dI/dt = 0 in Eq. (2), i.e., the time point satisfying βS(t) = γ. Recalling that the infection rate is a monotonically decreasing function, the turning point of I(t) is thus determined by how the relative magnitude of the infection rate and the removal rate varies. From the experiments under different sets of β and γ (Fig. 3b and Table 2), the inflection point of the 2019-nCoV will occur in late February to early March, when the number of unrecovered infectives will reach its peak value of about 43,000 cases, with a variation interval between 22,000 and 74,000. After the inflection point, the number of infected cases will decrease rapidly until the epidemic has faded out in late April to late June. On the whole, the 2019-nCoV epidemic may persist for three to five months. From Fig. 3c and Table 2, the final cumulative number of infected cases will reach about 140,000, varying in the interval of 76,000-230,000, which reflects a three-fold difference between the most optimized measure and the worst one. That is to say, the above prediction intervals are strongly determined by the anti-epidemic measures and the medical-service level against the 2019-nCoV.
TABLE 2. The prediction intervals of the 2019-nCoV. The mean model prediction with its lower and upper bounds is shown. The mean prediction is the arithmetic mean of the experiments shown in Figs. 3b, c. The date and the infective number at the inflection point are listed in the 2nd and 3rd columns. The 4th and 5th columns, respectively, show the date and the cumulative number at the fading-out point, which is defined as the infected population being lower than 1,000.
IV. DISCUSSION
It is hard to accurately predict the epidemic evolution based on limited data, especially when reliable data are lacking. Although the 2019-nCoV outbreak can be traced back to late December 2019 (perhaps earlier), the systematically released epidemiological data are only available after 11 January 2020, and the reasonability of the data before 18 January 2020 is still doubtful. So, at the time the authors finished this work, the reliable data covered no more than two weeks, which may lead to large uncertainty in the early prediction of the 2019-nCoV outbreak. But the effort of early prediction is still meaningful. Mathematical models of different complexities have proved to be effective in predicting the evolution of epidemic outbreaks. But the more complex the structure of the model, the more parameters need to be determined. When epidemiological data are lacking, it is hard to objectively determine all the parameters. Too many unknown parameters will bring large uncertainties into the model prediction. For this reason, this work formulated a simplified SIR model with the fewest parameters, i.e., the infection rate and the removal rate, to reduce the uncertainty as much as possible. The model shows good ability in hindcasting the spread of SARS in 2003. So, we further applied this model to the 2019-nCoV.
By eliminating the unreliable data via objective analysis, we provided epidemic predictions under different scenarios with respect to different levels of anti-epidemic measures and medical care, represented by the two model parameters, i.e., the infection rate and the removal rate. The predictions are intended to be a helpful guide to decision making in coping with the ongoing 2019-nCoV transmission in China. The strictness of the current quarantine measures and infection control precautions employed by the Chinese Government is historically unprecedented. So, as predicted by this work, the control measures should pay more attention to the medical-service aspects, such as accelerating the diagnostic speed and enhancing the hospitalization capacity. If all the above efforts get the cumulative infected cases down to below about 80,000 by late February (Fig. 3c), the severity of the 2019-nCoV may finally be controlled at a relatively low level. The sensitivity of the simplified-model prediction to the parameters also emphasizes the importance of openness and transparency in releasing data relevant to public health. As the 2019-nCoV progresses, more epidemiological data will become available to verify and revise this early prediction of the 2019-nCoV.
REFERENCES
[1] (2020). Wuhan Municipal Health Commission Infection Data. [Online]. Available: http://wjw.wuhan.gov.cn/front/web/list2nd/no/710
[2] K. O. Kwok, A. Tang, V. W. Wei, W. H. Park, E. K. Yeoh, and S. Riley, “Epidemic models of contact tracing: Systematic review of transmission studies of severe acute respiratory syndrome and middle east respiratory syndrome,” Comput. Struct. Biotechnol. J., vol. 17, pp. 186–194, Jan. 2019.
[3] World Health Organization. WHO Statement on the Second Meeting of the International Health Regulations (2005) Emergency Committee Regarding the Outbreak of Novel Coronavirus (2019-nCoV). Geneva, Switzerland. Accessed: 2020. [Online]. Available: https://www.who.int/news-room/detail/30-01-2020-statement-on-the-second-meeting-of-the-international-health-regulations-(2005)-emergency-committee-regarding-the-outbreak-of-novel-coronavirus-(2019-ncov)
[4] World Health Organization. (2020). WHO Novel Coronavirus-Thailand (ex-China). Geneva, Switzerland. Accessed: Jan. 14, 2020. [Online]. Available: http://www.who.int/csr/don/14-january-2020-novel-coronavirus-thailand/en/
[5] World Health Organization. (2020). WHO Novel Coronavirus-Japan (ex-China). Geneva, Switzerland. Accessed: Jan. 17, 2020. [Online]. Available: http://www.who.int/csr/don/17-january-2020-novel-coronavirus-japan-ex-china/en/
[6] World Health Organization. (2020). WHO Novel Coronavirus-Republic of Korea (ex-China). Geneva, Switzerland. Accessed: Jan. 21, 2020. [Online]. Available: http://www.who.int/csr/don/21-january-2020-novel-coronavirus-republic-of-korea-ex-china/en/
[7] Centers for Disease Control and Prevention. (2020). CDC First Travel-Related Case of 2019 Novel Coronavirus Detected in United States. Accessed: Jan. 21, 2020. [Online]. Available: https://www.cdc.gov/media/releases/2020/p0121-novel-coronavirus-travel-case.html
[8] R. M. Anderson, “The pandemic of antibiotic resistance,” Nature Med., vol. 5, pp. 147–149, Feb. 1999.
[9] J. Koopman, “Modeling infection transmission,” Annu. Rev. Public Health, vol. 25, pp. 303–326, Apr. 2004.
[10] S. A. Levin, B. Grenfell, A. Hastings, and A. S. Perelson, “Mathematical and computational challenges in population biology and ecosystems science,” Science, vol. 275, pp. 334–343, Jan. 1997.
[11] T. W. Ng, G. Turinici, and A. Danchin, “A double epidemic model for the SARS propagation,” BMC Infectious Diseases, vol. 3, p. 19, Sep. 2003.
[12] W. O. Kermack and A. G. McKendrick, “Contributions to the mathematical theory of epidemics-I. 1927,” Bull. Math. Biol., vol. 53, pp. 33–55, Jan. 1991.
[13] N. T. J. Bailey, The Mathematical Theory of Infectious Diseases, 2nd ed. New York, NY, USA: Hafner, 1975.
[14] R. R. Tang, “The singularly perturbed boundary value problem of nonlinear integro-differential system,” Ann. Differ. Equ., vol. 4, pp. 407–412, Dec. 2004.
[15] M. Iannelli, M. Martcheva, and X. Z. Li, “Strain replacement in an epidemic model with super-infection and perfect vaccination,” Math. Biosci., vol. 195, no. 1, pp. 23–46, 2005.
[16] J. Liu, Y. Tang, and Z. R. Yang, “The spread of disease with birth and death on networks,” J. Stat. Mech., Theory Exp., vol. 2004, no. 8, 2004, Art. no. P08008.
[17] C. Dye, “Epidemiology: Modeling the SARS epidemic,” Science, vol. 300, no. 5627, pp. 1884–1885, 2003.
[18] M. Lipsitch, “Transmission dynamics and control of severe acute respiratory syndrome,” Science, vol. 300, no. 5627, pp. 1966–1970, 2003.
[19] X. N. Han, S. J. De Vlas, L. Q. Fang, D. Feng, W. C. Cao, and J. D. F. Habbema, “Mathematical modelling of SARS and other infectious diseases in China: A review,” vol. 14, no. s1, pp. 92–100, 2009, doi: 10.1111/j.1365-3156.2009.02244.x.
[20] D. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[21] R. Pastor-Satorras and A. Vespignani, “Epidemic spreading in scale-free networks,” Phys. Rev. Lett., vol. 86, pp. 3200–3203, Apr. 2001.
[22] K. C. Ang, “A simple stochastic model for an epidemic-numerical experiments with MATLAB,” Electron. J. Math. Technol., vol. 1, no. 2, pp. 117–128, 2007.
[23] World Health Organization. (May 6, 2003). WHO The Cumulative Number of Reported Probable Cases of Severe Acute Respiratory Syndrome (SARS). Geneva, Switzerland. [Online]. Available: https://www.who.int/csr/sarscountry/2003_05_06/en/
[24] J. T. Wu, K. Leung, and G. M. Leung, “Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: A modelling study,” Lancet, vol. 395, no. 10225, pp. 689–697, Feb./Mar. 2020, doi: 10.1016/S0140-6736(20)30260-9.
[25] Q. Li, X. H. Guan, and P. Wu, “Early transmission dynamics in Wuhan, China, of novel coronavirus–infected pneumonia,” New England J. Med., to be published, doi: 10.1056/nejmoa2001316.
[26] N. E. Huang and F. Qiao, “A data driven time-dependent transmission rate for tracking an epidemic: A case study of 2019-nCoV,” Sci. Bull., to be published, doi: 10.1016/j.scib.2020.02.005.
LINHAO ZHONG was born in Jiangxi, China, in 1977. He received the B.S. and Ph.D. degrees from the College of Physical and Environmental Oceanography, Ocean University of China, in 2000 and 2005, respectively. From 2005 to 2007, he was a Research Assistant with the Institute of Atmospheric Physics, Chinese Academy of Sciences. Since 2008, he has been an Associate Research Fellow of the Institute of Atmospheric Physics. He is the author of more than 30 articles. His research interests include numerical methods, physical oceanography, and climate change. He has deep understanding of and experience in numerical model construction. He independently constructed multi-layer numerical models for ocean circulation, meso-scale vortex simulation, and the atmospheric water cycle. He has strong ability in statistical analysis of climate data.
LIN MU received the B.S., M.S., and Ph.D. degrees in physical oceanography from the Ocean University of China, Qingdao, China, in 2000, 2002, and 2007, respectively. He is currently a Professor of physical oceanography with the College of Life Sciences and Oceanography, Shenzhen University, Guangdong, China. He has authored or coauthored over 20 scientific articles and four books. His research interests include physical oceanography, prevention and mitigation of marine disasters, maritime search and rescue, and emergency response management of offshore oil spills.
JING LI received the M.S. degree in marine science from Shanghai Ocean University, Shanghai, China, in 2016. She is currently pursuing the Ph.D. degree in marine science with the China University of Geosciences, Wuhan, China. Her research areas are ocean-atmosphere interaction and Arctic climate change.
JIAYING WANG received the B.S. degree in marine science from the Ocean University of China, Qingdao, China, in 2019. She is currently pursuing the M.S. degree in marine science with the China University of Geosciences, Wuhan, China. Her research area is physical oceanography.
ZHE YIN received the B.S. degree in surveying and mapping engineering from the China University of Geosciences, Wuhan, China, in 2018, where he is currently pursuing the M.S. degree in geological engineering. His research areas are marine geographical information systems, satellite data processing, and marine surveying and mapping.
DARONG LIU received the B.S. degree in engineering from the China University of Geosciences, Wuhan, China, where he is currently pursuing the Ph.D. degree in marine science. During his undergraduate study, his research interests included computer science, three-dimensional modeling, and numerical simulation in geoscience. In 2018, his research turned to numerical simulation in physical oceanography, machine learning, and neural networks.