
Edutran Computer Science & Information Technology (ECSITE)
Vol. 1, No. 1 (2023) pp 1-8
e-issn. 2986-7703
p-issn. 2986-9013

Type of contribution:
Editorial
Research Paper
Case Study
Review Paper
Scientific Data
Report of Tech. Application

Prediction Model for Covid-19 Cases in Indonesia Using Linear Regression and Polynomial Regression Methods
Amid Rakhman1, Yana Cahyana2*, Rahmat3, Tukino4, Syahroni Wahyu Iriananda5
1 Department of Informatics Engineering, Universitas Buana Perjuangan Karawang, 41361, Indonesia
2 Department of Informatics Engineering, Universitas Buana Perjuangan Karawang, 41361, Indonesia
3 Department of Informatics Engineering, Universitas Buana Perjuangan Karawang, 41361, Indonesia
4 Department of Information System, Universitas Buana Perjuangan Karawang, 41361, Indonesia
5 Department of Informatics Engineering, Universitas Widyagama Malang, 65128, Indonesia

yana.cahyana@ubpkarawang.ac.id

Highlights:
• Information related to future cases of Covid-19 can be obtained by a data mining process based
  on data that has occurred before.
• Several methods can be used to make a forecast or prediction: Linear Regression and Polynomial
  Regression.
• Linear regression and polynomial regression methods get results with good categories in making
  a forecast or prediction.

Abstract
On March 11, 2020, the World Health Organization declared Coronavirus Disease 2019 (Covid-19)
a pandemic. Covid-19 is a disease caused by a new type of coronavirus, SARS-CoV-2, which
affects the respiratory system. Positive confirmed cases of Covid-19 in Indonesia are still
occurring every day. This study aims to predict the number of additional Covid-19 cases in
Indonesia. The data, sourced from the public API of covid19.go.id, consist of 122 rows of daily
additional Covid-19 cases in Indonesia. Predictions are made using the linear regression and
polynomial regression methods for comparison. Evaluation of the linear regression method
yields R² = 0.57, while the polynomial regression method yields R² = 0.84. Based on these
evaluations, the polynomial regression method gives better results than the linear regression
method. The polynomial regression model predicts that the number of additional Covid-19 cases
in Indonesia will rise again from January to March 2022.

Keywords: Covid-19, Prediction, Linear Regression, Polynomial Regression

Article info
Submitted: 2022-12-14
Revised: 2023-02-01
Accepted: 2023-03-13

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
Publisher: Edutran Academic Publisher

1. Introduction
On March 11, 2020, the World Health Organization declared Coronavirus Disease 2019 (Covid-19)
a pandemic. Covid-19 is caused by a novel coronavirus, SARS-CoV-2, which attacks the respiratory
system and causes inflammation of the lungs [1]. The disease first appeared in early December
2019 in Wuhan City, China. In Indonesia, Covid-19 was first detected on March 2, 2020 [2]. Since
then, positive confirmed cases of Covid-19 have continued to increase every day, along with
increasing interaction in the community and the easing of government policies on activities
during the Covid-19 pandemic [3] [4].

Edutran Computer Science and Information Technology, Vol.1 No.1 1


(2023)
Amid Rakhman, et al.

Data on the condition of Covid-19 in Indonesia can be used as material for consideration in
determining government policy because, with today's technological developments, data is a very
important and inseparable part of retrieving information [5]. Information related to future cases
of Covid-19 can be obtained by a data mining process based on data that has occurred before.
Data mining is the activity of analyzing data from different points of view and summarizing it
into important information or knowledge [6] [7]. One application of data mining is predicting a
future event from past data using specific methods. Several methods can be used to make a
forecast or prediction, including Linear Regression and Polynomial Regression. Linear Regression
is a statistical analysis that models the relationship between several variables in the form of
an explicit linear equation [8]. Polynomial Regression is a linear regression model formed by
adding the influence of each predictor variable (X) raised to the k-th order [9]; it can
therefore be described as a special case of linear regression. When using linear regression, the
two variables (the target variable and the independent variable) must be correlated, and the
data must be continuous and known [10].
Another study using the same method also obtained very good results, providing the knowledge
of the fate and transport of pollutants needed to properly manage river water quality [11].
Meanwhile, research using the polynomial regression method has worked well in predicting daily
cases of Covid-19 in DKI Jakarta [12]. The polynomial regression method has also been used in a
prediction model for the compressive strength of eco-friendly concrete [13]. Given the problems
described above, and prior research showing that the linear regression and polynomial
regression methods achieve good results in forecasting or prediction, this study predicts
Covid-19 cases in Indonesia using the linear and polynomial regression methods.

2. Methods
2.1. Covid-19
Covid-19 is an inflammatory disease of the lungs that occurs when a person is infected with the
coronavirus. There are many kinds of coronaviruses; the one that causes Covid-19 is SARS-CoV-2.
This virus is very small, ranging from 50 to 200 nanometers [14].

2.2. Prediction
Prediction or forecasting is the activity of estimating future events using specific scientific
methods or approaches. The data used to make a prediction or forecast is usually quantitative. A
prediction model can be evaluated by calculating the Coefficient of Determination (R²), a
statistical measure that indicates the influence of the independent variable (X) on the dependent
variable (Y). The coefficient of determination lies between 0 and 1; the closer R² is to 1, the
stronger the influence of the independent variable (X) on the dependent variable (Y). According
to Rollborn et al. [15], the general form of the coefficient of determination is given in
Equation (1) below.
$$ R^2 = \frac{SSR}{SST} = \frac{\sum (y' - \bar{y})^2}{\sum (y - \bar{y})^2} \tag{1} $$

Where SSR is the sum of the squares of the differences between the predicted values (y′) and
the mean of the actual values (ȳ), while SST is the sum of the squares of the differences
between the actual values (y) and the mean of the actual values (ȳ).
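
As an illustration of Equation (1), the following is a minimal Python sketch that computes R² as SSR / SST. The function name `r_squared` and the toy values are assumptions made for this example and do not come from the study. The SSR / SST form matches the more common 1 − SSE / SST form only for a least-squares fit with an intercept, evaluated on the data it was fitted to.

```python
def r_squared(y_actual, y_predicted):
    """Coefficient of determination as SSR / SST, following Equation (1)."""
    y_mean = sum(y_actual) / len(y_actual)
    # SSR: squared differences between predictions and the mean of the actual values
    ssr = sum((yp - y_mean) ** 2 for yp in y_predicted)
    # SST: squared differences between actual values and their mean
    sst = sum((y - y_mean) ** 2 for y in y_actual)
    return ssr / sst


# Toy values (hypothetical, not taken from the Covid-19 data)
y_actual = [1.0, 2.0, 3.0, 4.0]
r2_perfect = r_squared(y_actual, [1.0, 2.0, 3.0, 4.0])    # predictions equal the data
r2_mean_only = r_squared(y_actual, [2.5, 2.5, 2.5, 2.5])  # always predicting the mean
print(r2_perfect, r2_mean_only)
```

A model that reproduces the data gives a value of 1, while a model that always predicts the mean gives 0, which is the sense in which a value closer to 1 indicates a better fit.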

2.3. Linear Regression


Linear Regression is a statistical analysis used for forecasting or prediction by modeling the
relationship between the dependent variable (Y) and the independent variable (X). According to
Zhang et al. [16] [17], simple linear regression is given by Equations (2) to (4) below.
$$ a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} \tag{2} $$


$$ b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} \tag{3} $$

$$ y = a + bx \tag{4} $$

Where y is the number of cases, x is the time period, n is the amount of data, a is the intercept and
b is the slope.
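
The formulas for the intercept and slope can be sketched in Python as follows. This is only an illustration: the function name `fit_simple_linear` and the toy data are assumptions for this example, not part of the study.

```python
def fit_simple_linear(x, y):
    """Return (a, b) for y = a + b*x using the closed-form sums of Equations (2) and (3)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denom = n * sum_x2 - sum_x ** 2                 # shared denominator
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom   # intercept, Equation (2)
    b = (n * sum_xy - sum_x * sum_y) / denom        # slope, Equation (3)
    return a, b


# Toy data lying exactly on y = 2 + 3x (hypothetical values)
x = [0, 1, 2, 3, 4]
y = [2 + 3 * xi for xi in x]
a, b = fit_simple_linear(x, y)
print(a, b)
```

Because the toy points lie exactly on a line, the fit recovers the intercept 2 and slope 3.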

2.4. Polynomial Regression


Polynomial regression is a kind of regression that accounts for a curvilinear relationship
between the dependent variable (Y) and the independent variable (X). It is a Linear Regression
model formed by adding the effect of each predictor variable (X) raised to the n-th order. In
general, according to Parnianifard et al. [18] [19], the Polynomial Regression model is written
as in Equation (5) below.

$$ Y = b_0 + b_1 X + b_2 X^2 + \dots + b_n X^n \tag{5} $$

Where X is the independent variable, Y is the predicted variable, b0 is the intercept, b1, b2, ...,
bn are the regression coefficients, and n is the degree of the polynomial [20].
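
To make the link to linear regression concrete, the sketch below expands a single value of X into the features 1, X, X², ..., Xⁿ and evaluates Equation (5) as a weighted sum of those features. The helper names `poly_features` and `poly_predict` and the coefficients are hypothetical and chosen only for illustration.

```python
def poly_features(x, degree):
    """Return [1, x, x**2, ..., x**degree], the powers of x used in Equation (5)."""
    return [x ** k for k in range(degree + 1)]


def poly_predict(coeffs, x):
    """Evaluate Y = b0 + b1*X + ... + bn*X**n, where coeffs = [b0, b1, ..., bn]."""
    features = poly_features(x, len(coeffs) - 1)
    return sum(b * f for b, f in zip(coeffs, features))


# Hypothetical coefficients b0 = 1, b1 = 2, b2 = 3 evaluated at X = 2
coeffs = [1.0, 2.0, 3.0]
y_at_2 = poly_predict(coeffs, 2)
print(y_at_2)  # 1 + 2*2 + 3*4 = 17.0
```

The model stays linear in the coefficients b0 to bn, which is why the same least-squares machinery applies once the powers of X are treated as separate predictors.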

2.5. Research Overview


Figure 1 shows the research flow, which begins with selecting data accessed through the public
API of covid19.go.id. One hundred twenty-two rows of data were selected with two attributes:
the time period (X) and the number of Covid-19 cases (Y). The data were then preprocessed and
the time period attribute (variable X) was transformed. Next, models were built on the data
using the Linear Regression and Polynomial Regression methods. After the models were built,
they were tested and the error of each was calculated using the coefficient of determination
(R²). The better model was then used to predict additional cases of Covid-19 from January to
March 2022.

Figure 1. Research Overview
[Flowchart: Start, Data Selection, Preprocessing, Transformation, Data Splitting, Make Models,
Training Model, Testing Model, Calculate errors, Make predictions, Finish]

3. Results and Discussion


3.1. Research Results
The selected data total 122 rows, from September 1, 2021, to December 31, 2021. The data consist
of two attributes, the time period (X) and the number of positive cases (Y), as shown in
Table 1.


Table 1. Selection Result Data

       Time         Positive Number
0      2021-09-01   10337
1      2021-09-02   8955
2      2021-09-03   7797
3      2021-09-04   6727
4      2021-09-05   5403
…      …            …
117    2021-12-27   120
118    2021-12-28   278
119    2021-12-29   194
120    2021-12-30   189
121    2021-12-31   180

The time attribute is transformed into a time index, computed as the value of the time
attribute minus its minimum value, so that the data can be processed when building the models,
as shown in Table 2.

Table 2. Data Transformation

     Time         Time Index   Positive Number
0    2021-09-01   0            10337
1    2021-09-02   1            8955
2    2021-09-03   2            7797
3    2021-09-04   3            6727
4    2021-09-05   4            5403
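
The transformation of dates into a time index can be sketched in Python as below. This is a minimal sketch that regenerates the date range of the study period with the standard library rather than reading it from the covid19.go.id API; the variable names are assumptions for this example.

```python
from datetime import date, timedelta

# Study period as stated in the text: September 1, 2021 to December 31, 2021
start = date(2021, 9, 1)
end = date(2021, 12, 31)
dates = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Time index = date minus the minimum date, expressed in whole days
first_day = min(dates)
time_index = [(d - first_day).days for d in dates]

print(len(dates), time_index[:5], time_index[-1])
```

The result has 122 entries with indexes running from 0 to 121, matching the row numbering in Tables 1 and 2.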

The models are created in Python using the Linear Regression and Polynomial Regression methods,
with the time index as the X variable and the positive number as the Y variable. Figure 2 shows
the creation of the Linear Regression model.

Figure 2. Making a Linear Regression Model

The manual calculation for the Linear Regression model is shown in Table 3:

Table 3. Manual Linear Regression Calculations

Time         x      y        xy        x²
01/09/2021   0      10337    0         0
02/09/2021   1      8955     8955      1
03/09/2021   2      7797     15594     4
04/09/2021   3      6727     20181     9
…            …      …        …         …
29/12/2021   119    194      23086     14161
30/12/2021   120    189      22680     14400
31/12/2021   121    180      21780     14641
Total        7381   172919   4081947   597861


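$$ a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}
     = \frac{(172919 \times 597861) - (7381 \times 4081947)}{(122 \times 597861) - 7381^2}
     = 3968.21 $$

$$ b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}
     = \frac{(122 \times 4081947) - (7381 \times 172919)}{(122 \times 597861) - 7381^2}
     = -42.16 $$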

So, the formula for finding the predicted value in the Linear Regression model is as follows.
$$ y = 3968.21 - 42.16x $$
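
The manual calculation can be checked with a few lines of Python. The sketch below plugs the column totals from Table 3 into Equations (2) and (3); the variable names are assumptions for this example, while the numbers are the totals reported in the table.

```python
# Column totals reported in Table 3
n = 122
sum_x = 7381
sum_y = 172919
sum_xy = 4081947
sum_x2 = 597861

denom = n * sum_x2 - sum_x ** 2                 # shared denominator of Equations (2) and (3)
a = (sum_y * sum_x2 - sum_x * sum_xy) / denom   # intercept
b = (n * sum_xy - sum_x * sum_y) / denom        # slope

print(f"a = {a:.2f}, b = {b:.2f}")
```

Rounded to two decimals, the intercept and slope agree with the values 3968.21 and -42.16 obtained in the manual calculation.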

Figure 3 shows the building of the Polynomial Regression model.

Figure 3. Creating a Polynomial Regression Model

The manual calculation for Polynomial Regression is shown in Table 4:

Table 4. Polynomial Regression Manual Calculations

Time         x      y        xy        x²       x³         x⁴           x²y
01/09/2021   0      10337    0         0        0          0            0
02/09/2021   1      8955     8955      1        1          1            8955
03/09/2021   2      7797     15594     4        8          16           31188
04/09/2021   3      6727     20181     9        27         81           60543
…            …      …        …         …        …          …            …
29/12/2021   119    194      23086     14161    1685159    200533921    2747234
30/12/2021   120    189      22680     14400    1728000    207360000    2721600
31/12/2021   121    180      21780     14641    1771561    214358881    2635380
Total        7381   172919   4081947   597861   54479161   5295254877   213650469

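$$ \begin{bmatrix} n & \sum x & \sum x^2 \\ \sum x & \sum x^2 & \sum x^3 \\ \sum x^2 & \sum x^3 & \sum x^4 \end{bmatrix}
   \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix}
   = \begin{bmatrix} \sum y \\ \sum xy \\ \sum x^2 y \end{bmatrix} $$

Solving using the Gauss Elimination method:

$$ \begin{bmatrix} 122 & 7381 & 597861 \\ 7381 & 597861 & 54479161 \\ 597861 & 54479161 & 5295254877 \end{bmatrix}
   \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix}
   = \begin{bmatrix} 172919 \\ 4081947 \\ 213650469 \end{bmatrix} $$

$$ \left[ \begin{array}{ccc|c} 1 & 0 & 0 & 6196.33217 \\ 0 & 1 & 0 & -153.56879 \\ 0 & 0 & 1 & 0.92071184 \end{array} \right] $$

b0 = 6196.33217
b1 = -153.56879
b2 = 0.92071184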

So, the formula for finding the predicted value in the Polynomial Regression model is as follows.
$$ y = 6196.33 - 153.57x + 0.92x^2 $$
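
The elimination step can be reproduced with a short Python sketch. The function `gauss_solve` below is a hypothetical helper written for this example: it performs Gauss-Jordan elimination with row pivoting on the augmented matrix, using exact fractions so that the large entries of the normal equations do not introduce round-off. The matrix and right-hand side are the totals from Table 4.

```python
from fractions import Fraction


def gauss_solve(matrix, rhs):
    """Solve matrix * b = rhs by Gauss-Jordan elimination using exact fractions."""
    n = len(rhs)
    # Augmented matrix [matrix | rhs]
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Pick the row with the largest absolute entry in this column as pivot
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot_row][col] == 0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]   # scale pivot row to 1
        for r in range(n):
            if r != col:
                factor = aug[r][col]                      # eliminate column in other rows
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [float(row[-1]) for row in aug]


# Normal equations built from the totals in Table 4
normal_matrix = [
    [122, 7381, 597861],
    [7381, 597861, 54479161],
    [597861, 54479161, 5295254877],
]
normal_rhs = [172919, 4081947, 213650469]

b0, b1, b2 = gauss_solve(normal_matrix, normal_rhs)
print(b0, b1, b2)
```

The three values should agree with the coefficients b0, b1, and b2 reported above up to rounding.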
Model testing is done on the data by making predictions using the equations of the Linear
Regression and Polynomial Regression models obtained above; the results are shown in Table 5.
The Linear Regression and Polynomial Regression models are visualized in Figure 4.

Table 5. Model Testing Results on Linear Regression and Polynomial Regression

     Time         Time Index   Positive Number   Linear Regression Prediction   Polynomial Regression Prediction
0    2021-09-01   0            10337             3968.209516                    6196.332170
1    2021-09-02   1            8955              3926.046861                    6043.684094
2    2021-09-03   2            7797              3883.884205                    5892.877441
3    2021-09-04   3            6727              3841.721549                    5743.912212
4    2021-09-05   4            5403              3799.558894                    5596.788407
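
The first rows of Table 5 can be approximately reproduced from the fitted equations. In the sketch below the function names are assumptions for this example; the linear model uses the two-decimal coefficients from the manual calculation, so its output differs from the table in the second decimal, while the polynomial model uses the coefficients as reported after elimination.

```python
def predict_linear(x):
    """Linear model with the rounded coefficients from the manual calculation."""
    return 3968.21 - 42.16 * x


def predict_polynomial(x):
    """Second-degree model with the coefficients obtained by elimination."""
    return 6196.33217 - 153.56879 * x + 0.92071184 * x ** 2


# Time indexes 0 to 4, as in the first five rows of Table 5
rows = [(x, predict_linear(x), predict_polynomial(x)) for x in range(5)]
for x, linear, polynomial in rows:
    print(f"{x}  {linear:10.2f}  {polynomial:10.2f}")
```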

Figure 4. Visualization of Linear Regression and Polynomial Regression Models

The Linear Regression and Polynomial Regression models are evaluated by calculating the
coefficient of determination (R²), as shown in Figure 5.

Figure 5. Evaluation of Linear Regression and Polynomial Regression Models

Based on the coefficient of determination (R²) of both models, the Linear Regression model
obtains R² = 0.57, while the Polynomial Regression model obtains R² = 0.84. It can therefore be
stated that the Polynomial Regression model predicts better than the Linear Regression model.
Figure 6 shows the prediction results for January to March 2022.

Figure 6. Visualization of Prediction Results for January to March 2022
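
The forecast for January to March 2022 amounts to evaluating the polynomial model at time indexes beyond the fitted range. The sketch below is an illustration only: it assumes the time index keeps counting days from September 1, 2021, and uses the coefficients reported after elimination; the names are hypothetical.

```python
from datetime import date

# Coefficients of the second-degree model as reported after elimination
B0, B1, B2 = 6196.33217, -153.56879, 0.92071184
FIRST_DAY = date(2021, 9, 1)  # time index 0 (assumed to continue past the fitted range)


def predict_cases(day):
    """Evaluate the polynomial model at the time index of the given date."""
    x = (day - FIRST_DAY).days
    return B0 + B1 * x + B2 * x ** 2


# A parabola with positive B2 reaches its minimum at x = -B1 / (2 * B2)
vertex_index = -B1 / (2 * B2)

jan_1 = predict_cases(date(2022, 1, 1))    # time index 122
mar_31 = predict_cases(date(2022, 3, 31))  # time index 211
print(round(vertex_index, 1), round(jan_1), round(mar_31))
```

Because b2 is positive, the fitted curve turns upward after its minimum near time index 83, so every later date receives a higher predicted value; the rising forecast should be read with this property of the quadratic in mind.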


3.2. Research Discussion


Evaluation of the Covid-19 case prediction model in Indonesia using the Linear Regression
method obtained R² = 0.57, while the Polynomial Regression method obtained R² = 0.84. Thus, the
Polynomial Regression method performs better than the Linear Regression method in predicting
Covid-19 cases in Indonesia. The Polynomial Regression method is therefore used to predict the
addition of Covid-19 cases in Indonesia from January to March 2022.

4. Conclusion
Based on the research results and discussion, it can be concluded that the Polynomial
Regression method performs better than the Linear Regression method. Evaluation of the Linear
Regression method gives R² = 0.57, while the Polynomial Regression method gives R² = 0.84. The
Polynomial Regression model predicts that the addition of Covid-19 cases in Indonesia will rise
again from January to March 2022.

Authors' Declaration
Authors’ contributions and responsibilities – The authors made substantial contributions to the
conception and design of the study. The authors took responsibility for data analysis,
interpretation, and discussion of results. The authors read and approved the final manuscript.
Funding – No funding information from the authors.
Availability of data and materials – All data are available from the authors.
Competing interests – The authors declare no competing interest.
Additional information – No additional information from the authors.

References
[1] S. Saidi, S. Saidi, N. Herawati, and K. Nisa, “Modeling with generalized linear model on covid-
19: Cases in Indonesia,” Int. J. Electron. Commun. Syst., vol. 1, no. 1, pp. 25–32, 2021.
[2] U. Mukhaiyar, D. Widyanti, and S. Vantika, “The time series regression analysis in evaluating
the economic impact of COVID-19 cases in Indonesia,” Model Assist. Stat. Appl., vol. 16, no.
3, pp. 197–210, Aug. 2021, doi: 10.3233/MAS-210533.
[3] M. Dong, C. Tang, J. Ji, Q. Lin, and K. C. Wong, “Transmission trend of the COVID-19
pandemic predicted by dendritic neural regression,” Appl. Soft Comput., vol. 111, p. 107683,
2021, doi: 10.1016/j.asoc.2021.107683.
[4] K. K.M.U.B, “Forecasting COVID -19 Outbreak in the Philippines and Indonesia,” J. New
Front. Healthc. Biol. Sci., vol. 2, no. 1, pp. 1–19, 2021, [Online]. Available:
https://imathm.edu.lk/files/documents/file_name/fb67096c-3453-4247-b427-
cbdff75a266e/JNFHBS _2_1_ 2021_1-19.pdf
[5] A. Y. Paulindino, E. Selvano, P. K. Maryanto, and W. Budiharto, “Covid-19 Forecasting in
Indonesia Using Prophet Model,” ICIC Express Lett. Part B Appl., vol. 13, no. 2, pp. 211–218,
2022, doi: 10.24507/icicelb.13.02.211.
[6] A. Lia Hananto, B. Priyatna, A. Fauzi, A. Yuniar Rahman, Y. Pangestika, and Tukino, “Analysis
of the Best Employee Selection Decision Support System Using Analytical Hierarchy Process
(AHP),” J. Phys. Conf. Ser., vol. 1908, no. 1, 2021, doi: 10.1088/1742-6596/1908/1/012023.
[7] S. S. Hananto, A. L., Assiroj, P., Priyatna, B., Fauzi, A., Rahman, A. Y., & Hilabi, “Analysis of
Drug Data Mining with Clustering Technique Using K-Means Algorithm. In Journal of Physics:
Conference Series,” IOP Publ., vol. 1908, no. 1, 2021.
[8] S. T. A. Shah, A. Iftikhar, M. I. Khan, M. Mansoor, A. F. Mirza, and M. Bilal, “PREDICTING
COVID-19 INFECTIONS PREVALENCE USING LINEAR REGRESSION TOOL,” J. Exp. Biol. Agric.
Sci., vol. 8, p. 2020, 2020, [Online]. Available: http://www.horizonpublisherindia.in/
[9] S. Shaikh, J. Gala, A. Jain, S. Advani, S. Jaidhara, and M. R. Edinburgh, “Analysis and
Prediction of COVID-19 using Regression Models and Time Series Forecasting,” Proc. Conflu.
2021 11th Int. Conf. Cloud Comput. Data Sci. Eng., pp. 989–995, 2021, doi:
10.1109/Confluence51648.2021.9377137.
[10] E. Gambhir, R. Jain, A. Gupta, and U. Tomer, “Regression Analysis of COVID-19 using
Machine Learning Algorithms,” in 2020 International Conference on Smart Electronics and
Communication (ICOSEC), Sep. 2020, pp. 65–71. doi: 10.1109/ICOSEC49089.2020.9215356.
[11] M. R. Balf, R. Noori, R. Berndtsson, A. Ghaemi, and B. Ghiasi, “Evolutionary polynomial
regression approach to predict longitudinal dispersion coefficient in rivers,” J. Water Supply
Res. Technol. - Aqua, p. jws2018021, Jun. 2018, doi: 10.2166/aqua.2018.021.
[12] R. Rory and R. Diana, “Modeling of COVID-19 data using local polynomial regression,” Semin.
Nas. Off. Stat. 2020, vol. 2, pp. 91–98, 2020.
[13] H. Imran, N. M. Al-Abdaly, M. H. Shamsa, A. Shatnawi, M. Ibrahim, and K. A. Ostrowski,
“Development of Prediction Model to Predict the Compressive Strength of Eco-Friendly
Concrete Using Multivariate Polynomial Regression Combined with Stepwise Method,”
Materials (Basel)., vol. 15, no. 1, 2022, doi: 10.3390/ma15010317.
[14] Yayan Sat, “Do Human Restriction Mobility Policy in Indonesia effectively reduce the Spread
of Covid-19,” no. July, 2020.
[15] N. Rollborn et al., “Accuracy of determination of free light chains (Kappa and Lambda) in
plasma and serum by Swedish laboratories as monitored by external quality assessment,”
Clin. Biochem., vol. 111, pp. 47–53, Jan. 2023, doi: 10.1016/j.clinbiochem.2022.10.003.
[16] H. Zhang et al., “Revealing the influence of oxygen-containing functional groups on mercury
adsorption via density functional theory and multiple linear regression analysis,” Fuel, vol.
335, p. 127040, Mar. 2023, doi: 10.1016/j.fuel.2022.127040.
[17] B. Hakim and A. Fauzi, “Indonesian Covid-19 Prevention Policies Analysis Using Cumulative
Cases Data Regression,” OISAA J. Indones. Emas, vol. 4, no. 1, pp. 28–33, 2021, doi:
10.52162/jie.2021.004.01.4.
[18] A. Parnianifard and M. A. I. Muhammadimranglasgowacuk, “Expedited Surrogate-Based
Quantification of Engineering Tolerances using A Modified Polynomial Regression”.
[19] A. Hernandez-Matamoros, H. Fujita, T. Hayashi, and H. Perez-Meana, “Forecasting of
COVID19 per regions using ARIMA models and polynomial functions,” Appl. Soft Comput. J.,
vol. 96, p. 106610, 2020, doi: 10.1016/j.asoc.2020.106610.
[20] E. Matthew and O. Adeyinka, “Application of Hierarchical Polynomial Regression Models to
Predict Transmission of COVID-19 at Global Level,” Int. J. Clin. Biostat. Biometrics, vol. 6, no.
1, 2020, doi: 10.23937/2469-5831/1510027.
