by
Dr. Jeevaraj S
ACKNOWLEDGEMENTS
I am highly indebted to Dr. Jeevaraj S for giving me the autonomy to
function and experiment with ideas. I would like to take this
opportunity to express my profound gratitude to him, not only for his
academic guidance but also for his personal interest in my project and
his constant support, coupled with confidence-boosting and motivating
sessions that proved very fruitful and were instrumental in instilling
self-assurance and trust in me. The nurturing and blossoming of the
present work is mainly due to his valuable guidance, suggestions,
astute judgment, constructive criticism and eye for perfection. My
mentor always answered my many doubts with smiling graciousness and
prodigious patience, never letting me feel like a novice: he always
lent an ear to my views, appreciated and improved them, and gave me a
free hand in the project. It is only because of his overwhelming
interest and helpful attitude that the present work has attained the
stage it has.
Finally, I am grateful to my Institution, whose constant encouragement
served to renew my spirit, refocus my attention and energy, and helped
me in carrying out this work.
(Harsh Walia)
Contents
1 ABSTRACT
2 INTRODUCTION
2.1 BACKGROUND MOTIVATION
2.2 PROJECT OBJECTIVES
2.3 LITERATURE SURVEY
3 METHODOLOGY
3.1 BLOCK DESIGN DIAGRAM
3.2 SYSTEM ARCHITECTURE
3.3 IMPLEMENTATION
3.3.1 TOOLS AND LIBRARIES USED
5 TASK TO BE COMPLETED
6 GANTT CHART
7 REFERENCES
1 ABSTRACT
COVID-19, the disease caused by the virus SARS-CoV-2, was first
identified in humans in Wuhan, China, in December 2019. The World
Health Organization (WHO) subsequently declared it a pandemic on 11
March 2020. In this study, our primary aim is to detect severe
COVID-19 patients in the early stages by looking at information on
demographics, comorbidities, admission laboratory values, admission
medications, admission supplemental oxygen orders, discharge and
mortality. The dataset covers 4711 patients with confirmed SARS-CoV-2
infection. We filtered the top features out of the 85 features in the
dataset using seven different feature selection algorithms and kept
the features most commonly selected across them. After selecting the
most important features, we applied around 17 different types of
machine learning models, such as Linear Regression, Logistic
Regression, SVM, K-Means, XGBoost, Random Forest, Decision Tree
Classifier, neural network, and many more, and evaluated their
predictions using different metrics. The resulting model can be
deployed and used by hospitals to predict the severity of COVID-19
patients.
2 INTRODUCTION
The first case of a novel coronavirus was found in December 2019, and
cases have been increasing with each subsequent day since then. Many
people lost their lives in the 1st wave of COVID-19, and the number of
deaths increased in the 2nd wave. In this research, we therefore aim
to build a robust and efficient model that helps us predict the
severity of a COVID-19 patient in the early stages from information on
demographics, comorbidities, admission laboratory values, admission
medications, admission supplemental oxygen orders, discharge, and
mortality. The dataset provides 85 features, of which some are
important to us and the rest are not. So, we used different feature
selection algorithms to select the best features for predicting
patient mortality.
After selecting the best features, we shortlisted those used to
predict the severity of COVID-19 patients. Different types of machine
learning models are used to predict which patients are at high risk of
mortality. The models to be used in this research are Linear
Regression, Logistic Regression, SVM, Linear SVC, Naive Bayes,
k-Nearest Neighbours, Neural Network with Keras, Stochastic Gradient
Descent, Gradient Boosting Classifier, RidgeCV, Bagging Classifier,
Decision Tree Classifier, Random Forest Classifier, AdaBoost
Classifier, XGBClassifier, LGBM Classifier, ExtraTrees Classifier,
Gaussian Process Classification, MLP Classifier and Voting Classifier.
Finally, the most accurate of these models are combined into an
ensemble to predict the final result with good accuracy.
supplemental oxygen orders.
Knowing how many patients may require an Intensive Care Unit (ICU) in
the future lets hospitals arrange ICU beds accordingly, which can save
patients' lives. Patients who do not need ICU support or are not
severely affected by COVID-19 can go into home quarantine.
as well as an explainable machine learning system that may
provide clinicians with simple decision criteria to use as a
support for assessing patient risk.
2. Early prediction keys for COVID-19 cases progression:
A meta-analysis
• The meta-analysis searched many databases for relevant articles
on biomarker values and major risk factors that predict
progression from mild to moderate to severe and critical cases.
• Meta-analysis of the difference between COVID-19 patients
with severe vs mild disease in: (A) mean age, (B) albumin
level, (C) aspartate aminotransferase, (D) creatinine,
(E) C-reactive protein, (F) D-dimer, (G) interleukin-6,
(H) LDH, (I) lymphocytes, (J) neutrophil count, (K) %PD-1
expression on T cells, (L) cortisol, (M) hypertension,
(N) diabetes, (O) chronic obstructive lung disease.
3. Development and validation of a laboratory risk score
for the early prediction of COVID-19 severity and in-
hospital mortality
• The goal of this study was to generate a scoring system
for identifying high-risk people, validate it in a different
sample, and assess its accuracy in predicting in-hospital
mortality.
• Methods: Biological data from 330 SARS-CoV-2 infected
individuals were used to construct a risk score that might
predict severity progression in this cohort study. The score
was then validated in a second step using data from 240 more
COVID-19 participants. The area under the receiver operating
characteristic curve was used to determine the score's accuracy.
3 METHODOLOGY
3.1 BLOCK DESIGN DIAGRAM
and mortality. The data relate to COVID-19 patients admitted
to a single healthcare system, over a specific period of time, and
separated into the 1st 3 weeks of the pandemic and the 2nd 3
weeks of the pandemic.
• Perform Exploratory Data Analysis to better understand and
visualize the data. Visualize each feature of the dataset to
gain better insight into the data.
2. Feature Engineering
• Some features of the dataset, such as Age, have the object datatype
(essentially strings). So, we applied target encoding to convert
them to numeric values before passing them into the model.
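A minimal sketch of target encoding with pandas follows. The column
names `age_group` and `died` and the toy values are hypothetical
stand-ins, not taken from the dataset. Each string category is
replaced by the mean of the target within that category.

```python
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "age_group": ["60-70", "60-70", "30-40", "30-40", "30-40"],
    "died": [1, 0, 0, 0, 1],
})

# Mean of the target for each category of the string feature.
category_means = df.groupby("age_group")["died"].mean()

# Replace each string category with its target mean.
df["age_group_encoded"] = df["age_group"].map(category_means)
```

In practice the category means would be computed on the training split
only and then applied to the test split, so that test labels do not
leak into the encoded feature.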
3. Feature Selection
• The dataset contains a large number of features (85). Our task is
to get the most important and relevant features out of all of them.
Seven feature selection algorithms are used to get the best
features. Each algorithm has its own criteria for finding the best
features, so we took the features most commonly selected across
all the feature selection algorithms.
The 7 feature selection algorithms are listed below:
3.1 FS with the Pearson correlation :
• Highly correlated features are more linearly dependent and
hence have roughly the same influence on the dependent variable.
When two features have a strong correlation, one of them can be
dropped.
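The idea can be sketched as follows, assuming the features are held in
a pandas DataFrame. The helper name `drop_correlated`, the 0.9
threshold and the toy columns are illustrative assumptions, not values
from the project.

```python
import numpy as np
import pandas as pd

def drop_correlated(X: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one feature from every pair whose |Pearson r| exceeds threshold."""
    corr = X.corr(method="pearson").abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop)

# Toy data: "b" is an exact multiple of "a"; "c" is only weakly related.
X = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
})
reduced = drop_correlated(X, threshold=0.9)
```

Here `b` is dropped because it is perfectly correlated with `a`, while
`c` is kept.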
3.2 FS by the SelectFromModel with LinearSVC :
• SelectFromModel is a meta-transformer that can be used in
conjunction with any estimator that assigns importance to each
feature through a specific attribute (such as coef_ or
feature_importances_).
• LinearSVC (Linear Support Vector Classification) is similar to
SVC with the parameter kernel='linear', but it is implemented in
terms of liblinear rather than libsvm, so it has more flexibility
in the choice of penalties and loss functions and scales better to
large numbers of samples.
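A minimal scikit-learn sketch of this selector, run on synthetic data
rather than the patient dataset. The L1 penalty, the value C=0.05 and
the data shape are assumptions made for illustration; the report does
not state which settings were used.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data: 20 features, 5 of them informative.
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)
X = StandardScaler().fit_transform(X)

# An L1 penalty drives some coefficients to exactly zero;
# liblinear requires dual=False for this penalty.
svc = LinearSVC(C=0.05, penalty="l1", dual=False,
                max_iter=5000, random_state=0)

# SelectFromModel keeps the features whose |coef_| is above its threshold.
selector = SelectFromModel(svc).fit(X, y)
mask = selector.get_support()        # boolean mask over the 20 features
X_selected = selector.transform(X)   # only the kept columns
```

A smaller C gives stronger regularization and therefore fewer selected
features.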
3.3 FS by the SelectFromModel with Lasso :
• The Lasso is a linear model that estimates sparse coefficients.
It is useful in some settings because it prefers solutions with
fewer non-zero coefficients, effectively reducing the number of
features that the solution depends on.
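The effect of the Lasso penalty on the number of selected features can
be sketched as below. The data are synthetic, and the helper
`count_selected` and the two alpha values are hypothetical choices for
illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 20 features, 5 of them informative.
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)
X = StandardScaler().fit_transform(X)

def count_selected(alpha: float) -> int:
    """Number of features SelectFromModel keeps for a given Lasso alpha."""
    selector = SelectFromModel(Lasso(alpha=alpha, max_iter=10000)).fit(X, y)
    return int(selector.get_support().sum())

n_weak = count_selected(0.001)   # weak penalty: most coefficients non-zero
n_strong = count_selected(0.1)   # strong penalty: few coefficients non-zero
```

A larger alpha pushes more coefficients to zero, so `n_strong` is
expected to be no larger than `n_weak` on this data.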
3.4 FS by the SelectKBest with Chi-2 :
• The SelectKBest method selects the K highest-scoring features out
of all the features; here each feature is scored with the
chi-squared (Chi-2) statistic against the target.
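A short sketch with scikit-learn, using the bundled breast cancer
dataset as a stand-in because its features are non-negative, which the
chi-squared test requires. The choice k=10 is an arbitrary assumption.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2

# Stand-in dataset: 569 samples, 30 non-negative features.
X, y = load_breast_cancer(return_X_y=True)

# Score every feature against the target and keep the 10 best.
selector = SelectKBest(score_func=chi2, k=10).fit(X, y)
X_best = selector.transform(X)

# Column indices of the kept features in the original matrix.
kept = selector.get_support(indices=True)
```

Features with negative values would need to be rescaled (for example
to the range 0 to 1) before chi2 can be applied.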
3.5 FS by the Recursive Feature Elimination with Logistic
Regression :
• It is a greedy optimization method that seeks to identify the
best-performing feature subset. It repeatedly builds models,
setting aside the best or worst performing feature at each
iteration, and builds the next model using the remaining features
until all of the features are used up. The features are then
ranked in the order of their removal.
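The ranking behaviour can be sketched with scikit-learn's RFE on
synthetic data. Setting n_features_to_select=1 is an assumption made
here so that the elimination runs until a single feature is left and
every feature receives a distinct rank.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data with 10 features.
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=3, random_state=0)

# Remove one feature per iteration (step=1) until one feature remains.
rfe = RFE(LogisticRegression(max_iter=1000),
          n_features_to_select=1, step=1).fit(X, y)

# ranking_[i] == 1 for the last surviving feature; larger values mean
# the feature was eliminated earlier.
ranking = rfe.ranking_
```

Any estimator that exposes coef_ or feature_importances_ can be passed
in place of LogisticRegression.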
3.6 FS by the Recursive Feature Elimination with Random
Forest :
• Here Recursive Feature Elimination uses a Random Forest as the
estimator to recursively select the best features.
3.7 FS by the Variance Threshold
• Feature selector that removes all low-variance features.
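Taking the most common features across the selectors can be sketched
as a simple vote. Only three of the seven selectors are shown, on a
stand-in dataset; the variance threshold of 0.01, k=10 and the
"at least 2 votes" rule are assumptions, since the report does not
state the exact settings.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectKBest, VarianceThreshold, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
X = MinMaxScaler().fit_transform(X)  # chi2 needs non-negative inputs

selectors = {
    "variance": VarianceThreshold(threshold=0.01),
    "chi2": SelectKBest(chi2, k=10),
    "rfe_logreg": RFE(LogisticRegression(max_iter=1000),
                      n_features_to_select=10),
}

# One boolean mask per selector: True where the feature was kept.
masks = np.array([sel.fit(X, y).get_support() for sel in selectors.values()])

# Count how many selectors kept each feature, then keep the features
# chosen by at least two of them.
votes = masks.sum(axis=0)
common = np.flatnonzero(votes >= 2)
```

The same voting step extends to all seven selectors by adding them to
the `selectors` dictionary and raising the vote threshold as needed.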
4. Training Our Model :
To train the model, we implemented different models and compared
their AUC-ROC scores to find the best one.
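A minimal sketch of such a comparison on synthetic data. The two
models, the 75/25 split and the tree depth are illustrative
assumptions, not the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # AUC-ROC is computed from the predicted probability of class 1.
    scores[name] = {
        "train_auc": roc_auc_score(y_train, model.predict_proba(X_train)[:, 1]),
        "test_auc": roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]),
    }

# Model with the highest AUC-ROC on the held-out test split.
best = max(scores, key=lambda name: scores[name]["test_auc"])
```

Reporting both training and test AUC makes overfitting visible: a
large gap between the two suggests the model does not generalize well.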
4.4 Linear SVC
• LinearSVC (Linear Support Vector Classification) is similar to
SVC with the parameter kernel='linear', but it is implemented in
terms of liblinear rather than libsvm, so it has more flexibility
in the choice of penalties and loss functions and scales better to
large numbers of samples.
This class can handle both sparse and dense input, and multiclass
support is handled according to a one-vs-rest scheme.
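The one-vs-rest behaviour can be seen in a small sketch on the bundled
iris dataset, used here only as a stand-in three-class problem.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Stand-in data: 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

clf = LinearSVC(max_iter=10000, random_state=0).fit(X, y)

# One-vs-rest: one weight vector (row of coef_) per class.
weights = clf.coef_
# One decision score per class for every sample.
decision = clf.decision_function(X)
```

For a binary target such as mortality, a single weight vector is
fitted instead of one per class.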
3.3 IMPLEMENTATION
The various machine learning models and feature selection algorithms
are implemented using various ML libraries.
4 RESULTS (Progress Made so far)
The training and test AUC (area under the curve) scores of the
various models are given in the table below.
5 TASK TO BE COMPLETED
• Some more advanced models are yet to be implemented, such as
Random Forest, AdaBoost, Gradient Boost, XGBoost, LightGBM,
Ridge Classifier, BaggingClassifier, Extra Trees Classifier,
k-Nearest Neighbors (KNN), Naive Bayes, and Neural Network with
Keras.
• Finally, a Voting classifier will be built as an ensemble of all
the top machine learning models.
• Different hyperparameter tuning methods, such as GridSearchCV and
RandomizedSearchCV, will be used to train the models.
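A sketch of how the planned tuning and ensembling steps could fit
together in scikit-learn. The parameter grid, the two base models and
the use of soft voting are assumptions for illustration, since this
work is still to be completed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Exhaustive search over a small, hypothetical grid, scored by AUC-ROC.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [25, 50], "max_depth": [3, 5]},
    scoring="roc_auc",
    cv=3,
)
search.fit(X, y)

# Soft voting averages the predicted class probabilities of the members.
ensemble = VotingClassifier(
    estimators=[
        ("rf", search.best_estimator_),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)
ensemble.fit(X, y)
proba = ensemble.predict_proba(X)
```

RandomizedSearchCV follows the same pattern but samples a fixed number
of parameter settings instead of trying every combination.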
6 GANTT CHART
7 REFERENCES
[1] Early prediction keys for COVID-19 cases progression: A
meta-analysis
https://doi.org/10.1016/j.jiph.2021.03.001
[2] Development and validation of a laboratory risk score for the early
prediction of COVID-19 severity and in-hospital mortality
https://doi.org/10.1016/j.iccn.2021.103012