Prediction Model For Covid-19 Cases in Indonesia U
yana.cahyana@ubpkarawang.ac.id
Abstract
The World Health Organization declared Coronavirus Disease 2019 (Covid-19) a pandemic on March 11, 2020. Covid-19 is a disease caused by a new type of coronavirus, Sars-CoV-2, which affects the respiratory system. Positive confirmed cases of Covid-19 in Indonesia are still reported every day. This study aims to predict the daily addition of Covid-19 cases in Indonesia. The data, sourced from the public API at covid19.go.id, consist of 122 records of the daily number of additional Covid-19 cases in Indonesia. Predictions are made with the linear regression and polynomial regression methods, which are compared with each other. The linear regression model obtains R² = 0.57, while the polynomial regression model obtains R² = 0.84. Based on these evaluations, the polynomial regression method yields better results than the linear regression method. The polynomial regression prediction of Covid-19 cases in Indonesia from January to March 2022 indicates that the daily addition of cases will rise again.
Keywords: Covid-19, Prediction, Linear Regression, Polynomial Regression

Article info
Submitted: 2022-12-14
Revised: 2023-02-01
Accepted: 2023-03-13
1. Introduction
On March 11, 2020, the World Health Organization declared Coronavirus Disease 2019 (Covid-19) a pandemic. Covid-19 is caused by a novel coronavirus, Sars-CoV-2, which interferes with the respiratory system and causes inflammation of the lungs [1]. The disease first appeared in early December 2019 in Wuhan City, China. In Indonesia, Covid-19 was first detected on March 2, 2020 [2]. Since then, positive confirmed cases of Covid-19 have continued to increase every day, along with the growing number of interactions in the community and the easing of government policies on activities during the Covid-19 pandemic [3] [4].

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Publisher: Edutran Academic Publisher
Data on the Covid-19 situation in Indonesia can serve as input for the government when determining policies, because with today's technological developments, data is an important and inseparable part of retrieving information [5]. Information about future Covid-19 cases can be obtained through a data mining process applied to historical data. Data mining is the activity of analyzing data from different points of view and summarizing it into important information or knowledge [6] [7]. One application of data mining is predicting future events from historical data using specific methods. Methods that can be used for forecasting or prediction include Linear Regression and Polynomial Regression.
Linear Regression is a statistical analysis that models the relationship between variables in the form of an explicit linear equation [8]. Polynomial Regression, meanwhile, is a linear regression model formed by adding the effect of each predictor variable (X) raised to powers up to order k [9]. Polynomial regression can therefore be described as a special case of linear regression: the model is nonlinear in X but still linear in its coefficients. When using linear regression, the target variable and the independent variable must be correlated, and the data must be continuous and known [10].
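To illustrate why polynomial regression counts as a special case of linear regression, the minimal Python sketch below expands each X value into the powers X⁰, X¹, ..., Xᵏ. The helper names `polynomial_features` and `predict` and the example coefficients are hypothetical, not taken from the paper. A polynomial model is then just a linear combination of these columns, so its coefficients can be estimated with the same least-squares machinery as ordinary linear regression.

```python
def polynomial_features(x_values, degree):
    """Expand each x into the row [x**0, x**1, ..., x**degree]."""
    return [[x ** power for power in range(degree + 1)] for x in x_values]


def predict(coefficients, feature_row):
    """Linear combination b0*x**0 + b1*x**1 + ... for one feature row."""
    return sum(b * f for b, f in zip(coefficients, feature_row))


rows = polynomial_features([0, 1, 2, 3], degree=2)
print(rows)  # [[1, 0, 0], [1, 1, 1], [1, 2, 4], [1, 3, 9]]

# Hypothetical coefficients b0=1, b1=2, b2=3, i.e. y = 1 + 2x + 3x^2
print([predict([1, 2, 3], row) for row in rows])  # [1, 6, 17, 34]
```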
Another study applied polynomial regression and obtained very good results in predicting the longitudinal dispersion coefficient in rivers, knowledge of which is needed to understand the fate and transport of pollutants and to manage river water quality properly [11]. Research using the polynomial regression method has also performed well in predicting daily Covid-19 cases in DKI Jakarta [12], and polynomial regression has been used to build a model that predicts the compressive strength of eco-friendly concrete [13]. Given the problem described above and previous research showing that the linear and polynomial regression methods produce good results in forecasting or prediction, this study predicts Covid-19 cases in Indonesia using the linear and polynomial regression methods.
2. Methods
2.1. Covid-19
Covid-19 is an inflammatory disease of the lungs that occurs when a person is infected with a coronavirus. There are many kinds of coronaviruses, and the type that causes Covid-19 is SARS Coronavirus-2 (Sars-CoV-2). This virus is very small, ranging from 50 to 200 nanometers in size [14].
2.2. Prediction
Prediction or forecasting is an activity to predict future events using specific scientific methods or
approaches. The data used to make a prediction or forecast is usually quantitative data. Evaluation of a
prediction model can be done by calculating the value of the Coefficient of Determination (R2). The
coefficient of determination is a statistical measure that indicates the influence of the independent or
independent variable (X) on the dependent or dependent variable (Y). The coefficient of determination
is between 0 and 1. If the value of R2 is greater or closer to 1, the stronger the influence of the
independent variable (X) on the dependent variable (Y). According to Niclas et al [15], the general form
of the coefficient of determination can be seen in Equation (1) below.
$$R^2 = \frac{SSR}{SST} = \frac{\sum (y' - \bar{y})^2}{\sum (y - \bar{y})^2} \tag{1}$$
Where SSR is the sum of the squared differences between the predicted values (y′) and the mean of the actual values (ȳ), while SST is the sum of the squared differences between the actual values (y) and the mean of the actual values (ȳ).
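Equation (1) can be sketched in Python as below; the function name `r_squared` and the toy values are hypothetical and illustrative only. The sketch follows the SSR/SST form given above. Note that SSR/SST equals the more common 1 − SSE/SST (where SSE is the sum of squared residuals), and therefore stays between 0 and 1, when the predictions come from a least-squares fit with an intercept evaluated on its own training data.

```python
def r_squared(actual, predicted):
    """Coefficient of determination R^2 = SSR / SST, as in Equation (1)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    mean_actual = sum(actual) / len(actual)                # y-bar
    ssr = sum((p - mean_actual) ** 2 for p in predicted)   # sum of (y' - y-bar)^2
    sst = sum((a - mean_actual) ** 2 for a in actual)      # sum of (y - y-bar)^2
    if sst == 0:
        raise ValueError("actual values have zero variance")
    return ssr / sst


# Toy example: predicted values come from the least-squares line y = 1.1 + 1.1x
actual = [1, 3, 2, 5]
predicted = [1.1, 2.2, 3.3, 4.4]
print(round(r_squared(actual, predicted), 4))  # 0.6914
```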
$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} \tag{3}$$
$$y = a + bx \tag{4}$$
Where y is the number of cases, x is the time period, n is the amount of data, a is the intercept and
b is the slope.
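A minimal Python sketch of Equations (3) and (4) is shown below. The helper name `fit_linear` is hypothetical, the intercept formula is the one used in the manual calculation for Table 3, and the toy data are illustrative only, not Covid-19 data.

```python
def fit_linear(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length and at least 2 points")
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denominator = n * sum_x2 - sum_x ** 2            # shared by a and b
    if denominator == 0:
        raise ValueError("all x values are identical")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator   # Equation (3)
    return a, b


a, b = fit_linear([0, 1, 2, 3], [1, 3, 2, 5])
print(a, b)  # 1.1 1.1
```

The fitted pair is then used in Equation (4), y = a + bx, to produce predictions.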
Where X is the independent variable, Y is the predicted variable, b0 is the intercept, b1, b2, ..., bn are the slopes or regression coefficients, and n is the degree (order) of the polynomial [20].
The time attribute is transformed into a time index by subtracting the minimum value of the time attribute from each value, so that the data can be processed when building the models, as shown in Table 2.
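The time-index transformation can be sketched as follows. This assumes the dates are stored as dd/mm/yyyy strings, as they appear in Table 3, and that the index is counted in whole days; `parse_day` and `to_time_index` are hypothetical names.

```python
from datetime import datetime


def parse_day(text):
    """Parse a dd/mm/yyyy string such as '01/09/2021' (format assumed from Table 3)."""
    return datetime.strptime(text, "%d/%m/%Y").date()


def to_time_index(days):
    """Time index = date minus the earliest date, counted in whole days."""
    first = min(days)
    return [(d - first).days for d in days]


days = [parse_day(s) for s in ["01/09/2021", "02/09/2021", "31/12/2021"]]
print(to_time_index(days))  # [0, 1, 121]
```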
The models are built with the Linear Regression and Polynomial Regression methods in Python, using the time index attribute as the X variable and the number of positive cases as the Y variable. Figure 2 shows how the Linear Regression model is built.
Figure 2. Making a Linear Regression Model
Table 3. Manual Linear Regression Calculations
Time          x      y        xy         x²
01/09/2021    0      10337    0          0
02/09/2021    1      8955     8955       1
03/09/2021    2      7797     15594      4
04/09/2021    3      6727     20181      9
…             …      …        …          …
29/12/2021    119    194      23086      14161
30/12/2021    120    189      22680      14400
31/12/2021    121    180      21780      14641
Total         7381   172919   4081947    597861
$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} = \frac{(172919 \times 597861) - (7381 \times 4081947)}{(122 \times 597861) - 7381^2} = 3968.21$$
$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{(122 \times 4081947) - (7381 \times 172919)}{(122 \times 597861) - 7381^2} = -42.16$$
So, the equation for the predicted value in the Linear Regression model is as follows.
$$y = 3968.21 - 42.16x$$
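The arithmetic above can be checked with a short Python sketch. It assumes, as Table 3 shows, that the time index runs over the consecutive integers 0 to 121, so Σx and Σx² can be recomputed from the indices; Σy and Σxy are taken from the Total row of Table 3 because the individual daily values are not all listed.

```python
n = 122                                  # rows in Table 3
sum_x = sum(range(n))                    # 0 + 1 + ... + 121
sum_x2 = sum(x * x for x in range(n))    # 0^2 + 1^2 + ... + 121^2
sum_y = 172919                           # Total row of Table 3 (y)
sum_xy = 4081947                         # Total row of Table 3 (xy)

denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
b = (n * sum_xy - sum_x * sum_y) / denominator

print(sum_x, sum_x2)              # 7381 597861
print(round(a, 2), round(b, 2))   # 3968.21 -42.16
```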
Figure 3. Creating a Polynomial Regression Model
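The coefficients b0, b1, and b2 of the second-degree polynomial satisfy the normal equations:
$$\begin{bmatrix} n & \sum x & \sum x^2 \\ \sum x & \sum x^2 & \sum x^3 \\ \sum x^2 & \sum x^3 & \sum x^4 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} \sum x^0 y \\ \sum x^1 y \\ \sum x^2 y \end{bmatrix}$$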
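Solving the system with the Gauss elimination method reduces the augmented matrix to:
$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & 6196.33217 \\ 0 & 1 & 0 & -153.56879 \\ 0 & 0 & 1 & 0.92071184 \end{array}\right]$$
$$b_0 = 6196.33217, \quad b_1 = -153.56879, \quad b_2 = 0.92071184$$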
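The normal equations and the elimination step can be sketched in Python as follows. This is a minimal illustration, not the authors' code: `fit_quadratic` is a hypothetical helper, it uses Gauss-Jordan elimination (reducing the augmented matrix to the [I | b] form reported above) with partial pivoting added for numerical safety, and it is checked on synthetic data because the 122 daily values are not reproduced here.

```python
def fit_quadratic(x, y):
    """Fit y = b0 + b1*x + b2*x**2 by solving the normal equations."""
    size = 3
    # Power sums: s[k] = sum of x**k (k = 0..4), t[k] = sum of x**k * y (k = 0..2)
    s = [sum(xi ** k for xi in x) for k in range(2 * size - 1)]
    t = [sum(xi ** k * yi for xi, yi in zip(x, y)) for k in range(size)]
    # Augmented matrix [A | t] with A[i][j] = s[i + j], matching the system above
    m = [[float(s[i + j]) for j in range(size)] + [float(t[i])] for i in range(size)]
    for col in range(size):
        # Partial pivoting: bring the row with the largest entry in this column up
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot becomes 1
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        # Eliminate this column from every other row (Gauss-Jordan)
        for r in range(size):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    # The matrix is now [I | b]; the last column holds b0, b1, b2
    return [row[size] for row in m]


# Synthetic check (not the paper's data): points lying on y = 2 + 3x + 0.5x^2
xs = list(range(6))
ys = [2 + 3 * xi + 0.5 * xi ** 2 for xi in xs]
print(fit_quadratic(xs, ys))
```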
Edutran Computer Science and Information Technology, Vol.1 No.1 5
(2023)
Amid Rakhman, et al.
So, the equation for the predicted value in the Polynomial Regression model is as follows.
$$y = 6196.33 - 153.57x + 0.92x^2$$
The models are tested by making predictions on the data with the Linear Regression and Polynomial Regression equations obtained above, as shown in Table 5. Figure 4 visualizes the Linear Regression and Polynomial Regression models.
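As a minimal sketch of the model-testing step, the two fitted equations can be evaluated directly. The function names are hypothetical, and the coefficients are the rounded values reported above (y = 3968.21 − 42.16x and y = 6196.33 − 153.57x + 0.92x²), so the results differ slightly from predictions made with the unrounded coefficients.

```python
def linear_model(x):
    """Linear Regression equation with the rounded coefficients reported above."""
    return 3968.21 - 42.16 * x


def polynomial_model(x):
    """Polynomial Regression equation with the rounded coefficients reported above."""
    return 6196.33 - 153.57 * x + 0.92 * x ** 2


# Time indices 0 (01/09/2021), 60 and 121 (31/12/2021)
for x in (0, 60, 121):
    print(x, round(linear_model(x), 2), round(polynomial_model(x), 2))
```

Evaluating the equations side by side also shows that the linear equation turns negative for x greater than about 94 (3968.21 / 42.16 ≈ 94.1), which a count of new cases cannot be.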
Figure 4. Visualization of Linear Regression and Polynomial Regression Models
The Linear Regression and Polynomial Regression models are evaluated by calculating the coefficient of determination (R²), as shown in Figure 5.
Figure 5. Evaluation of Linear Regression and Polynomial Regression Models
Based on the coefficient of determination (R²) of both models, the Linear Regression model obtains R² = 0.57, while the Polynomial Regression model obtains R² = 0.84. This indicates that the Polynomial Regression model can predict better than the Linear Regression model. Figure 6 shows the prediction results for January to March 2022.
Figure 6. Visualization of Prediction Results for January to March 2022
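The forecast horizon can be reproduced in the same way. This sketch assumes, as in Table 3, that time index 0 is 1 September 2021 and that one index step is one day, so 1 January 2022 maps to index 122 and 31 March 2022 to index 211. It applies the rounded Polynomial Regression equation, and the names `FIRST_DAY`, `time_index`, and `polynomial_model` are hypothetical.

```python
from datetime import date, timedelta

FIRST_DAY = date(2021, 9, 1)  # assumed time index 0, as in Table 3


def time_index(day):
    """Days elapsed since FIRST_DAY (one index step = one day)."""
    return (day - FIRST_DAY).days


def polynomial_model(x):
    """Rounded Polynomial Regression equation: y = 6196.33 - 153.57x + 0.92x^2."""
    return 6196.33 - 153.57 * x + 0.92 * x ** 2


forecast = []
day = date(2022, 1, 1)
while day <= date(2022, 3, 31):
    x = time_index(day)
    forecast.append((day, x, polynomial_model(x)))
    day += timedelta(days=1)

print(len(forecast), forecast[0][1], forecast[-1][1])  # 90 122 211
```

Because the fitted parabola opens upward (b2 > 0) with its minimum at x = 153.57 / (2 × 0.92) ≈ 83.5, every index in the forecast window lies on the rising branch of the curve, which is why the model predicts that daily cases will rise again.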
4. Conclusion
Based on the research results and discussion, it can be concluded that the Polynomial Regression method performs better than the Linear Regression method: the Linear Regression model obtains R² = 0.57, while the Polynomial Regression model obtains R² = 0.84. The Polynomial Regression predictions for Covid-19 cases in Indonesia from January to March 2022 indicate that the daily addition of Covid-19 cases will rise again.
Authors' Declaration
Authors’ contributions and responsibilities – The authors made substantial contributions to the
conception and design of the study. The authors took responsibility for data analysis,
interpretation, and discussion of results. The authors read and approved the final manuscript.
Funding – No funding information from the authors.
Availability of data and materials – All data are available from the authors.
Competing interests – The authors declare no competing interests.
Additional information – No additional information from the authors.
References
[1] S. Saidi, S. Saidi, N. Herawati, and K. Nisa, “Modeling with generalized linear model on covid-
19: Cases in Indonesia,” Int. J. Electron. Commun. Syst., vol. 1, no. 1, pp. 25–32, 2021.
[2] U. Mukhaiyar, D. Widyanti, and S. Vantika, “The time series regression analysis in evaluating
the economic impact of COVID-19 cases in Indonesia,” Model Assist. Stat. Appl., vol. 16, no.
3, pp. 197–210, Aug. 2021, doi: 10.3233/MAS-210533.
[3] M. Dong, C. Tang, J. Ji, Q. Lin, and K. C. Wong, “Transmission trend of the COVID-19
pandemic predicted by dendritic neural regression,” Appl. Soft Comput., vol. 111, p. 107683,
2021, doi: 10.1016/j.asoc.2021.107683.
[4] K. K.M.U.B, “Forecasting COVID-19 Outbreak in the Philippines and Indonesia,” J. New
Front. Healthc. Biol. Sci., vol. 2, no. 1, pp. 1–19, 2021, [Online]. Available:
https://imathm.edu.lk/files/documents/file_name/fb67096c-3453-4247-b427-cbdff75a266e/JNFHBS_2_1_2021_1-19.pdf
[5] A. Y. Paulindino, E. Selvano, P. K. Maryanto, and W. Budiharto, “Covid-19 Forecasting in
Indonesia Using Prophet Model,” ICIC Express Lett. Part B Appl., vol. 13, no. 2, pp. 211–218,
2022, doi: 10.24507/icicelb.13.02.211.
[6] A. Lia Hananto, B. Priyatna, A. Fauzi, A. Yuniar Rahman, Y. Pangestika, and Tukino, “Analysis
of the Best Employee Selection Decision Support System Using Analytical Hierarchy Process
(AHP),” J. Phys. Conf. Ser., vol. 1908, no. 1, 2021, doi: 10.1088/1742-6596/1908/1/012023.
[7] A. L. Hananto, P. Assiroj, B. Priyatna, A. Fauzi, A. Y. Rahman, and S. S. Hilabi, “Analysis of
Drug Data Mining with Clustering Technique Using K-Means Algorithm,” J. Phys. Conf. Ser.,
vol. 1908, no. 1, 2021.
[8] S. T. A. Shah, A. Iftikhar, M. I. Khan, M. Mansoor, A. F. Mirza, and M. Bilal, “Predicting
COVID-19 Infections Prevalence Using Linear Regression Tool,” J. Exp. Biol. Agric.
Sci., vol. 8, p. 2020, 2020, [Online]. Available: http://www.horizonpublisherindia.in/
[9] S. Shaikh, J. Gala, A. Jain, S. Advani, S. Jaidhara, and M. R. Edinburgh, “Analysis and
Prediction of COVID-19 using Regression Models and Time Series Forecasting,” Proc. Conflu.
2021 11th Int. Conf. Cloud Comput. Data Sci. Eng., pp. 989–995, 2021, doi:
10.1109/Confluence51648.2021.9377137.
[10] E. Gambhir, R. Jain, A. Gupta, and U. Tomer, “Regression Analysis of COVID-19 using
Machine Learning Algorithms,” in 2020 International Conference on Smart Electronics and
Communication (ICOSEC), Sep. 2020, pp. 65–71. doi: 10.1109/ICOSEC49089.2020.9215356.
[11] M. R. Balf, R. Noori, R. Berndtsson, A. Ghaemi, and B. Ghiasi, “Evolutionary polynomial
regression approach to predict longitudinal dispersion coefficient in rivers,” J. Water Supply
Res. Technol. - Aqua, p. jws2018021, Jun. 2018, doi: 10.2166/aqua.2018.021.
[12] R. Rory and R. Diana, “Modeling of COVID-19 data using local polynomial regression,” Semin.
Nas. Off. Stat. 2020, vol. 2, pp. 91–98, 2020.
[13] H. Imran, N. M. Al-Abdaly, M. H. Shamsa, A. Shatnawi, M. Ibrahim, and K. A. Ostrowski,
“Development of Prediction Model to Predict the Compressive Strength of Eco-Friendly
Concrete Using Multivariate Polynomial Regression Combined with Stepwise Method,”
Materials (Basel), vol. 15, no. 1, 2022, doi: 10.3390/ma15010317.
[14] Yayan Sat, “Do Human Restriction Mobility Policy in Indonesia effectively reduce the Spread
of Covid-19,” no. July, 2020.
[15] N. Rollborn et al., “Accuracy of determination of free light chains (Kappa and Lambda) in
plasma and serum by Swedish laboratories as monitored by external quality assessment,”
Clin. Biochem., vol. 111, pp. 47–53, Jan. 2023, doi: 10.1016/j.clinbiochem.2022.10.003.
[16] H. Zhang et al., “Revealing the influence of oxygen-containing functional groups on mercury
adsorption via density functional theory and multiple linear regression analysis,” Fuel, vol.
335, p. 127040, Mar. 2023, doi: 10.1016/j.fuel.2022.127040.
[17] B. Hakim and A. Fauzi, “Indonesian Covid-19 Prevention Policies Analysis Using Cumulative
Cases Data Regression,” OISAA J. Indones. Emas, vol. 4, no. 1, pp. 28–33, 2021, doi:
10.52162/jie.2021.004.01.4.
[18] A. Parnianifard and M. A. Imran, “Expedited Surrogate-Based
Quantification of Engineering Tolerances using A Modified Polynomial Regression”.
[19] A. Hernandez-Matamoros, H. Fujita, T. Hayashi, and H. Perez-Meana, “Forecasting of
COVID19 per regions using ARIMA models and polynomial functions,” Appl. Soft Comput. J.,
vol. 96, p. 106610, 2020, doi: 10.1016/j.asoc.2020.106610.
[20] E. Matthew and O. Adeyinka, “Application of Hierarchical Polynomial Regression Models to
Predict Transmission of COVID-19 at Global Level,” Int. J. Clin. Biostat. Biometrics, vol. 6, no.
1, 2020, doi: 10.23937/2469-5831/1510027.