degree of
BACHELOR OF TECHNOLOGY
in
Guided by
CANDIDATE’S DECLARATION
It is hereby certified that the work which is being presented in the B. Tech Industrial/In-house training
Report entitled "Covid-19 Cases Prediction" in partial fulfilment of the requirements for the award of the
degree of Bachelor of Technology and submitted in the Department of Electronics & Communication
Engineering of BHARATI VIDYAPEETH’S COLLEGE OF ENGINEERING, New Delhi (Affiliated to Guru Gobind
Singh Indraprastha University, Delhi) is an authentic record of our own work carried out during a period
from February 14th 2021 to March 25th 2021 under the guidance of Mr. Adgaonker Shashank, Innovians
Technology.
The matter presented in the B. Tech Industrial/In-house training Report has not been submitted by us
for the award of any other degree of this or any other Institute.
This is to certify that the above statement made by the candidate is correct to the best of
my knowledge. He/She/They are permitted to appear in the External Industrial/In-house
training Examination.
ABSTRACT
Information technology, particularly data science and machine learning, can aid in
the fight against the epidemic. It is critical to have early warning systems that can
predict how much harm a disease will cause to society, so that decisions can be based
on that information. In this project, we present methods for forecasting future cases
from existing data. A machine learning (ML) technique uses historical active cases,
deaths, and recovery rates to predict the number of active cases in the future.
This report describes our use of a polynomial regression model to predict Covid-19
cases. Comparing its results with those of other regression models, we found that
the polynomial regression model gave the most accurate predictions.
ACKNOWLEDGMENT
We express our deep gratitude to Mr. Adgaonker Shashank, Innovians Technology, for
his valuable guidance and suggestions throughout our project work. We are thankful to
Dr. S.B. Kumar for his valuable guidance.
We would like to extend our sincere thanks to the Head of the Department, Prof. Kirti
Gupta, for her suggestions from time to time towards completing our project work. We
are also thankful to Prof. Dharmender Saini, Principal, for providing us the facilities
to carry out our project work.
TABLE OF CONTENTS
CANDIDATE DECLARATION
ABSTRACT
ACKNOWLEDGMENT
TABLE OF CONTENTS
Chapter 1: Introduction
Chapter 2: Motivation
Chapter 3: Objective
Chapter 4: Workflow
Chapter 5: Results
5.1 Visual representation of Result
Chapter 6: Conclusion
LIST OF FIGURES
2. Graphs showing Actual values, Predicted Values and difference between actual
and predicted values.
CHAPTER 1
INTRODUCTION
Machine Learning (ML) is all about programming the unprogrammable. For example,
to predict Covid-19 cases, an ML model can learn the trend from past data. The
prediction of Covid-19 cases depends on features such as confirmed cases, confirmed
deaths, daily reported new cases, the recovery rate, and other factors.
• Prepare the data. Load the data from the database or CSV files.
Extract/identify the key features (input and output parameters) relevant to
the problem you will solve or the outcome you will predict.
• Build and train the ML model. Here you can evaluate different algorithms and
settings to see which model is best for your scenario.
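The data preparation step above can be sketched as follows. This is a minimal illustration that uses only the Python standard library; the in-memory sample and the column names (`new_cases`, `new_deaths`, `new_recovered`) are assumptions made for the example, not the actual schema of the dataset used in this report.

```python
import csv
import io

# Hypothetical sample standing in for the real CSV file; the column
# names are assumptions, not the report's actual schema.
SAMPLE_CSV = """date,new_cases,new_deaths,new_recovered
2021-01-01,120,3,90
2021-01-02,135,4,95
2021-01-03,150,2,110
"""


def load_rows(text):
    """Parse CSV text into a list of dicts with integer counts."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        rows.append({
            "date": record["date"],
            "new_cases": int(record["new_cases"]),
            "new_deaths": int(record["new_deaths"]),
            "new_recovered": int(record["new_recovered"]),
        })
    return rows


def split_features_target(rows, target="new_cases"):
    """Separate the input columns (X) from the output column (y)."""
    feature_names = [name for name in rows[0] if name not in ("date", target)]
    X = [[row[name] for name in feature_names] for row in rows]
    y = [row[target] for row in rows]
    return feature_names, X, y


rows = load_rows(SAMPLE_CSV)
feature_names, X, y = split_features_target(rows)
```

Here `X` holds the input parameters and `y` the output parameter, which is the form most regression routines expect.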
Fig 1: Machine Learning Workflow.
1.2 About dataset
This dataset contains almost one year of Covid-19 case data. Its columns record
daily figures such as confirmed cases, deaths, and recoveries.
8. new recovered: number of patients newly recovered from Covid-19 each day.
Since we are predicting covid-19 cases, new cases will be our target feature.
1.3 Regression Models
Linear regression predicts the value of a dependent variable (y) from a given
independent variable (x). The technique finds a linear relationship between
x (input) and y (output), hence the name Linear Regression.
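As a small illustration of fitting such a linear relationship, the sketch below computes the ordinary least-squares slope and intercept for a single input variable. The data points are made up for the example, and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope w, intercept b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


# Made-up points lying exactly on the line y = 2x + 1.
days = [0, 1, 2, 3, 4]
cases = [1, 3, 5, 7, 9]
w, b = fit_line(days, cases)
prediction = w * 5 + b  # predicted y for the next x value
```

Because the sample points lie exactly on a line, the fit recovers that line; real case counts would leave non-zero residuals.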
Support Vector Machine (SVM) is a very popular Machine Learning algorithm that is
used in both Regression and Classification. Support Vector Regression (SVR) is
similar to Linear Regression in that the equation of the line is
y = wx + b
In SVR, this straight line is referred to as the hyperplane. The data points on either
side of the hyperplane that are closest to it are called Support Vectors, and they are
used to plot the boundary lines.
Unlike other regression models, which try to minimize the error between the real and
predicted values, SVR tries to fit the best line within a threshold value a (the distance
between the hyperplane and the boundary line). Thus, we can say that the SVR model
tries to satisfy the condition
-a < y - (wx + b) < a.
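The threshold condition can be checked numerically. The sketch below is an illustration only: it writes the residual as y - (wx + b), tests whether a point lies inside the band of half-width a, and computes the epsilon-insensitive loss commonly associated with SVR (the threshold a plays the role usually denoted by epsilon). The line parameters and points are hypothetical.

```python
def inside_tube(x, y, w, b, a):
    """True when the residual y - (w*x + b) lies strictly within (-a, a)."""
    residual = y - (w * x + b)
    return -a < residual < a


def epsilon_insensitive_loss(x, y, w, b, a):
    """Zero inside the tube; grows linearly with the distance outside it."""
    return max(0.0, abs(y - (w * x + b)) - a)


w, b, a = 2.0, 1.0, 0.5  # hypothetical line and threshold
points = [(1.0, 3.2), (2.0, 5.9), (3.0, 7.0)]
flags = [inside_tube(x, y, w, b, a) for x, y in points]
losses = [epsilon_insensitive_loss(x, y, w, b, a) for x, y in points]
```

Points inside the band contribute no loss, which is why only the points on or outside the boundary lines influence the fitted line.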
Polynomial regression extends the simple linear regression model
y = β0 + β1x + e
In this model, for each unit increase in the value of x, the conditional expectation of y
increases by β1 units. In many settings, such a linear relationship may not hold. For
example, if we are modeling the yield of a chemical synthesis in terms of the
temperature at which the synthesis takes place, we may find that the yield improves by
increasing amounts for each unit increase in temperature. In this case, we might
propose a quadratic model of the form
y = β0 + β1x + β2x^2 + e
In this model, when the temperature is increased from x to x + 1 units, the expected
yield changes by β1 + β2(2x + 1). For infinitesimal changes in x, the effect on y is given
by the total derivative with respect to x: β1 + 2β2x. The fact that the change in yield
depends on x is what makes the relationship between x and y nonlinear even though the
model is linear in the parameters to be estimated.
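The two expressions for the change in expected yield can be verified with a quick calculation. The coefficient values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def expected_yield(x, b0, b1, b2):
    """Quadratic mean function b0 + b1*x + b2*x^2 (error term omitted)."""
    return b0 + b1 * x + b2 * x ** 2


b0, b1, b2 = 1.0, 0.5, 0.25  # hypothetical coefficients
x = 3.0

# Change in expected yield when x increases by one unit.
unit_change = expected_yield(x + 1, b0, b1, b2) - expected_yield(x, b0, b1, b2)
# The same change from the closed-form expression b1 + b2*(2x + 1).
formula_change = b1 + b2 * (2 * x + 1)
# Instantaneous rate of change (derivative) at x: b1 + 2*b2*x.
slope_at_x = b1 + 2 * b2 * x
```

The unit change and the closed-form expression agree, while the derivative differs slightly because it measures an infinitesimal rather than a one-unit step.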
In general, we can model the expected value of y as an nth degree polynomial, yielding
the general polynomial regression model
y = β0 + β1x + β2x^2 + β3x^3 + ... + βnx^n + e
Conveniently, these models are all linear from the point of view of estimation, since the
regression function is linear in terms of the unknown parameters β0, β1, .... Therefore,
for least squares analysis, the computational and inferential problems of polynomial
regression can be completely addressed using the techniques of multiple regression.
This is done by treating x, x^2, ... as being distinct independent variables in a multiple
regression model.
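A minimal sketch of this idea, assuming NumPy is available: each power of x becomes its own column of a design matrix, and ordinary least squares is solved for the coefficients. The function names and the synthetic data are hypothetical, and this is not necessarily the implementation used in the project.

```python
import numpy as np


def fit_polynomial(x, y, degree):
    """Least-squares fit; returns coefficients [b0, b1, ..., b_degree]."""
    # Columns are x^0, x^1, ..., x^degree: each power is treated as a
    # separate independent variable, as in multiple regression.
    design = np.vander(x, degree + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


def predict_polynomial(coeffs, x):
    """Evaluate the fitted polynomial at new x values."""
    design = np.vander(x, len(coeffs), increasing=True)
    return design @ coeffs


# Synthetic, noise-free data generated from y = 2 + 0.5x + 0.25x^2.
x = np.arange(6, dtype=float)
y = 2.0 + 0.5 * x + 0.25 * x ** 2
coeffs = fit_polynomial(x, y, degree=2)
future = predict_polynomial(coeffs, np.array([6.0, 7.0]))
```

Choosing the degree is a trade-off: a higher degree follows the training data more closely but can extrapolate poorly beyond the observed range.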
CHAPTER 2
MOTIVATION
Forecasting people's healthcare costs is now a valuable tool for improving healthcare
accountability. The healthcare sector produces a very large amount of data related
to patients, diseases, and diagnoses, but because this data has not been analyzed
properly, it does not deliver the value it holds for understanding patient healthcare
costs.
A health insurance policy is a policy that covers or minimizes the expenses of losses
caused by a variety of hazards. A variety of factors influence the cost of insurance or
healthcare. For a variety of stakeholders and health departments, accurately predicting
individual healthcare expenses using prediction models is critical. Accurate cost
estimates can help health insurers and, increasingly, healthcare delivery organizations
to plan for the future and prioritize the allocation of limited care management
resources. Furthermore, knowing their probable future expenses ahead of time can
help patients choose insurance plans with appropriate deductibles and premiums.
These elements play a role in the development of insurance policies.
In the insurance sector, ML can help enhance the efficiency of policy wording. In
healthcare, ML algorithms are particularly good at predicting high-cost, high-need
patient expenditures. ML can be categorized into three different types, as shown in
the following Figure. These types are supervised machine learning (i.e., a task-driven
approach), used for classification/regression, where all data is labelled; unsupervised
machine learning (i.e., a data-driven approach), used for clustering, where all data is
unlabelled; and reinforcement learning (i.e., learning from mistakes), used for decision
making.
CHAPTER 3
OBJECTIVE
The objective is to train an ML polynomial regression model that can predict rising
Covid-19 active cases more accurately. Since this is a regression problem, metrics
such as the coefficient of determination (R^2) and the mean absolute error (MAE) are
used to evaluate the model.
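Both metrics can be computed directly from actual and predicted values. The sketch below is a plain-Python illustration with made-up numbers; the figures are not results from this project.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only.
actual = [100, 120, 140, 160]
predicted = [110, 115, 145, 150]
mae = mean_absolute_error(actual, predicted)
r2 = r2_score(actual, predicted)
```

A lower MAE and an R^2 closer to 1 both indicate predictions that track the actual values more closely.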
CHAPTER 4
WORKFLOW
Information about dataset
Checking for Null Values
From the heatmap, we found that no null values are present in our dataset.
Data Analysis
Data Analysis
Distribution of Age Value
BMI Distribution
4.2 Data Pre-Processing
4.3 Train-Test Split
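The split used in this project is not described in the text here, so the following is only a sketch of one common choice for time-ordered data: keeping the earliest observations for training and the most recent ones for testing. The function name, the 80/20 ratio, and the sample counts are assumptions.

```python
def chronological_split(rows, test_fraction=0.2):
    """Keep the earliest rows for training and the latest rows for testing."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    cut = int(len(rows) * (1.0 - test_fraction))
    return rows[:cut], rows[cut:]


# Made-up daily counts in date order.
daily_cases = [100, 110, 125, 140, 160, 185, 210, 240, 275, 315]
train, test = chronological_split(daily_cases, test_fraction=0.2)
```

A random split is the other common option; for forecasting, a chronological split avoids evaluating the model on days that precede its training data.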
4.6 Prediction for future
CHAPTER 5
RESULTS
The graph shows the plots of Actual value and Predicted value using four different models.
Fig. 2: Graphs showing Actual and Predicted Values and future prediction for active cases.
CHAPTER 6
CONCLUSION
Machine learning (ML) is one aspect of computational intelligence that can solve different problems in
a wide range of applications and systems when it comes to leveraging historical data. Predicting
medical insurance costs is still a problem in the healthcare industry that needs to be investigated and
improved. In this project, by using a set of ML algorithms, a computational intelligence approach is
applied to predict healthcare insurance costs. The medical insurance dataset was obtained from the
Kaggle repository and was used for training and testing the Linear Regression, Support Vector
Regression, Gradient Boosting, and Random Forest Regressor models. The regression on this dataset
followed the steps of preprocessing, feature engineering, data splitting, regression, and evaluation.
After the evaluation, we observed better accuracy and a lower mean absolute error with the Gradient
Boosting model.