Heart Disease Prediction Literature Review

Tribhuvan University
Institute of Science and Technology

Central Department of Computer Science and Information Technology
Kirtipur, Kathmandu
A Literature Review On
“A Review on Heart Disease Prediction Using Machine Learning”
Under the Supervision of
Asst Prof. Jagdish Bhatta
Submitted By
Narayan Upreti (Roll No. 601/077)
Submitted To
August 25, 2023


SUPERVISOR’S RECOMMENDATION
I hereby recommend that this Literature Review report, prepared under my supervision by
Mr. Narayan Upreti and entitled “A Review on Heart Disease Prediction Using Machine
Learning”, in partial fulfillment of the requirements for the degree of M.Sc. in Computer
Science and Information Technology, be processed for evaluation.
…………………………………..
Prof. Jagdish Bhatta
(LR Supervisor)

LETTER OF APPROVAL
This is to certify that this Literature Review, prepared by Mr. Narayan Upreti and entitled
“A Review on Heart Disease Prediction Using Machine Learning”, in partial fulfillment
of the requirements for the degree of M.Sc. in Computer Science and Information
Technology, has been well studied. In our opinion, it is satisfactory in scope and quality
as a Literature Review for the required degree.
Evaluation Committee
…………………………
Prof. Jagdish Bhatta
(LR Supervisor)
Tribhuvan University
…………………………
Internal Examiner
Tribhuvan University
………………………………
Asst Prof. Sarbin Sayami
(Head of CDCSIT)
Tribhuvan University

ACKNOWLEDGEMENT
I express my sincere gratitude to the Central Department of Computer Science and
Information Technology, Tribhuvan University, for including the Literature Review program
as a part of our curriculum.
I am very glad to express my deepest gratitude and sincere thanks to my highly
respected supervisor, Prof. Jagdish Bhatta, Central Department of Computer
Science and Information Technology, for his valuable supervision, guidance,
encouragement and support in completing this seminar report.
I am also thankful to Asst. Prof. Sarbin Sayami, HOD of the Central Department of Computer
Science and Information Technology, for his constant support throughout the period.
Finally, I would like to express my sincere thanks to all my friends and others who helped me
directly or indirectly.
Narayan Upreti
Roll no: 601/077
ABSTRACT
In the medical field, the diagnosis of heart disease is one of the most difficult tasks. It
depends on the careful analysis of different clinical and pathological data of the patient by
medical experts, which is a complicated process. Owing to advances in machine learning and
information technology, researchers and medical practitioners are increasingly interested in
developing automated systems for heart disease prediction that are highly accurate,
effective and helpful in early diagnosis. This report presents a review of current research
on heart disease and a prediction system for heart disease using the Random Forest algorithm.
Keywords: Heart Disease, Random Forest Algorithm
Table of Contents
ACKNOWLEDGEMENT
ABSTRACT
List of Figures
List of Tables
List of Abbreviations
CHAPTER 1: INTRODUCTION
1.1 Overview
1.2 Heart Disease
1.3 Random Forest Algorithm
1.4 Problem Statement
1.5 Objective
CHAPTER 2: LITERATURE REVIEW
CHAPTER 3: METHODOLOGY
3.1 Selection of Research Papers
3.2 Summarization of different papers
3.3 Random Forest Algorithm
CHAPTER 5: IMPLEMENTATION
5.1 Tool Used
CHAPTER 6: RESULT AND ANALYSIS
6.1 Model Evaluation
6.2 Evaluation of Training Data
6.3 Evaluation of Testing Data
CHAPTER 7: CONCLUSION
References
List of Figures
Figure 1: Result of MLP
Figure 2: Result of different Machine Learning Algorithm
Figure 3: Result of Hybrid algorithm
Figure 4: Classification results of Different Machine learning algorithm
Figure 5: Performance Measures
Figure 6: Percentage accuracy results of classification techniques
Figure 7: Performance of Training Data
Figure 8: Performance of Testing Data
List of Tables
Table 1: Summarization of papers on heart disease prediction
Table 2: Performance of Training Data
Table 3: Performance of Testing Data
List of Abbreviations
FN False Negative
FP False Positive
RF Random Forest
Sklearn Scikit Learn
TN True Negative
TP True Positive
CHAPTER 1: INTRODUCTION
This chapter gives an overview of this report, the supporting theory and basic knowledge
about heart disease, the reason for using machine learning to predict heart disease, and the
main objectives of the report.
1.1 Overview
This report explains how heart disease is predicted using a machine learning algorithm. Here,
Random Forest is implemented to predict heart disease. The dataset is collected from Kaggle
and then split into training and test sets for the training and testing phases. The pandas
library is used for data manipulation, and the Sklearn library is used to split the data and to
train the model with RandomForestClassifier. After training, the test data are passed to the
trained RandomForestClassifier to measure the model's effectiveness. Finally, the results
obtained during implementation are recorded. All of these steps are explained in detail later
in this report.
1.2 Heart Disease

Heart disease describes a range of conditions that affect the heart. Heart diseases include:
• Blood vessel disease, such as coronary artery disease
• Heart rhythm problems (arrhythmias)
• Heart defects present at birth (congenital heart defects)
• Heart valve disease
• Disease of the heart muscle
The heart is a vital organ that pumps blood to the rest of the body through the blood vessels
of the circulatory system. Any functional problem in the heart has a direct impact on a
person's survival, since it affects other organs such as the brain, lungs, kidneys and liver.
Heart diseases are a leading cause of death all over the world. The clinical symptoms of heart
disease complicate the prognosis, because it is influenced by many factors, such as functional
and pathological appearance, and this can delay the prognosis of the disease. Hence, new
approaches are needed to improve prediction accuracy within a short time span. Predicting a
disease from numerous factors or symptoms is a complicated problem and can even lead to false
assumptions. Therefore, an attempt is made to combine the knowledge and experience of
experts and to build a system that supports the diagnostic process. This report therefore
reviews different approaches and implements the Random Forest algorithm on a heart disease
dataset.
1.3 Random Forest Algorithm

Random forest is a supervised machine learning algorithm that is widely used for
classification and regression problems. It builds decision trees on different samples of the
data and combines them by majority vote for classification or by averaging for regression.
One of its most important features is that it can handle datasets containing continuous
variables, as in regression, and categorical variables, as in classification. It generally
performs well on classification problems and can handle large datasets.
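The two ways of combining tree outputs described above can be illustrated with a minimal sketch. The function names and the example votes are hypothetical and are not taken from the implementation in this report.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_votes):
    """Return the class label predicted by the most trees (majority vote)."""
    return Counter(tree_votes).most_common(1)[0][0]


def aggregate_regression(tree_outputs):
    """Return the average of the numeric outputs of all trees."""
    return mean(tree_outputs)


# Hypothetical outputs of five trees for a single patient record
votes = [1, 0, 1, 1, 0]            # 1 = heart disease, 0 = no heart disease
majority_label = aggregate_classification(votes)

# Hypothetical numeric outputs of three regression trees
average_output = aggregate_regression([2.0, 4.0, 6.0])
```

Using an odd number of trees avoids ties in a two-class majority vote.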
1.4 Problem Statement

Doctors rely on common knowledge for treatment. When common knowledge is lacking,
findings are summarized only after a number of cases have been studied, and this process takes
time. In the medical field, the diagnosis of heart disease is one of the most difficult tasks. It
depends on the careful analysis of different clinical and pathological data of the patient by
medical experts, which is a complicated process. Owing to advances in machine learning and
information technology, researchers and medical practitioners are increasingly interested in
developing automated systems for heart disease prediction that are highly accurate,
effective and helpful in early diagnosis. This report presents a prediction system for heart
disease using the Random Forest approach.
1.5 Objective
The objectives of this LR are:
• To summarize and analyze previous research and theories
• To predict heart disease using the RF algorithm
CHAPTER 2: LITERATURE REVIEW
V. V. Ramalingam et al. [1] surveyed various models based on machine learning algorithms and
techniques and analyzed their performance. Models based on supervised learning algorithms
such as Support Vector Machines (SVM), K-Nearest Neighbour (KNN), Naïve Bayes, Decision
Trees (DT), Random Forest (RF) and ensemble models were found to be very popular among
researchers.
Aditi Gavhane et al. [2] proposed an application that predicts vulnerability to heart disease
from basic symptoms such as age, sex and pulse rate. The neural network was reported to be the
most accurate and reliable algorithm and was therefore used in the proposed system.
Savitha Kamalapurkar et al. [3] proposed a web-based system for predicting heart disease
using machine learning (ML) algorithms, reporting good accuracy compared with other works. It
uses an ensemble classification method, since ensemble methods give better accuracy than
individual classifiers such as Support Vector Machine (SVM) or Random Forest (RF).
M. Kavitha et al. [4] used the Cleveland heart disease dataset together with data mining
techniques such as regression and classification. Three machine learning models were
implemented: Random Forest, Decision Tree, and a novel hybrid model combining Random Forest
and Decision Tree. Experimental results show an accuracy of 88.7% for the hybrid model. An
interface was also designed that takes the user's input parameters and predicts heart disease
using the hybrid model.
Md Mamun Ali et al. [5] aimed to identify the machine learning classifiers with the highest
accuracy for such diagnostic purposes. Several supervised machine learning algorithms were
applied and compared for performance and accuracy in heart disease prediction. Feature
importance scores were estimated for every applied algorithm except MLP and KNN, and all
features were ranked by importance score to find those that contribute most to heart disease
prediction. Using a heart disease dataset collected from Kaggle, the study found that, among
the classifiers based on the k-nearest neighbor (KNN), decision tree (DT) and random forest
(RF) algorithms, the RF method achieved 100% accuracy along with 100% sensitivity and
specificity. The authors conclude that a relatively simple supervised machine learning
algorithm can be used to make heart disease predictions with very high accuracy and excellent
potential utility.
Vijeta Sharma et al. [6] used the benchmark UCI heart disease dataset, which consists of 14
different parameters related to heart disease. Machine learning algorithms such as Random
Forest, Support Vector Machine (SVM), Naive Bayes and Decision Tree were used to develop the
model. The authors also examined the correlations between the attributes in the dataset with
standard machine learning methods and used them in predicting the chance of heart disease.
The results show that, compared with the other ML techniques, Random Forest gives higher
accuracy in less time. The model can help medical practitioners at their clinics as a
decision support system.
Devansh Shah et al. [7] presented various attributes related to heart disease and a model
based on supervised learning algorithms: Naïve Bayes, decision tree, K-nearest neighbor and
random forest. The study uses the existing Cleveland dataset of heart disease patients from
the UCI repository. The dataset comprises 303 instances and 76 attributes. Of these 76
attributes, only 14 are considered for testing, as they are important for substantiating the
performance of the different algorithms. The paper aims to estimate the probability of
patients developing heart disease. The results show that the highest accuracy score is
achieved with K-nearest neighbor.
CHAPTER 3: METHODOLOGY
3.1 Selection of Research Papers

The steps followed during the selection of research papers are:
Step 1: Relevant papers are searched for using Google Scholar.
Step 2: The search results are reviewed and the relevance of each paper is assessed based on
its title and abstract. Papers that are obviously unrelated to the topic are excluded.
Step 3: The introduction and conclusion of each paper are skimmed to understand the research
context, objectives and findings.
Step 4: The latest research papers are selected from the list of papers.
Step 5: Finally, ten relevant papers are selected in total.
3.2 Summarization of different papers

Table 1: Summarization of papers on heart disease prediction
| Title | Dataset | Algorithms | Performance measures (accuracy %) | Remarks (winning algorithm) |
|---|---|---|---|---|
| Heart disease prediction using machine learning techniques: a survey [1] | Cleveland dataset | NB, SVM, KNN, DT, RF | 84.15, 85.76, 83.16, 77.55, 97 resp. | RF achieves 97% accuracy |
| Prediction of Heart Disease Using Machine Learning [2] | Cleveland dataset from UCI library | MLP | 91 (precision), shown in Figure 1 | MLP |
| Online Portal for Prediction of Heart Disease using Machine Learning Ensemble Method (PrHD-ML) [3] | Kaggle | DT, KNN, SVM, RF | 72, 74, 90, 92 resp., shown in Figure 2 | RF achieves 92% |
| Heart Disease Prediction using Hybrid machine Learning Model [4] | - | DT, RF, (DT+RF) | 79, 81, 88 resp., shown in Figure 3 | (DT+RF) achieves 88% accuracy |
| Heart disease prediction using supervised machine learning algorithms: Performance analysis and comparison [5] | Cleveland dataset | LR, ABMI, MLP, KNN, DT, RF | 89.62, 95.02, 97.95, 100, 100, 100 resp., shown in Figure 4 | RF, KNN, DT achieve 100% accuracy |
| Heart Disease Prediction using Machine Learning Techniques [6] | Cleveland dataset | SVM, RF, DT, NB | 99.5, 99.7, 85.1, 90.4 resp. (precision), shown in Figure 5 | RF achieves 99.7% precision |
| Heart Disease Prediction using Machine Learning Techniques [7] | UCI | NB, KNN, DT, RF | 81.05, 90.78, 80.26, 82.89 resp., shown in Figure 6 | KNN achieves 90.78% accuracy |
Figure 1: Result of MLP
Figure 2: Result of different Machine Learning Algorithm
Figure 3: Result of Hybrid algorithm
Figure 4: Classification results of Different Machine learning algorithm
Figure 5: Performance Measures
Figure 6: Percentage accuracy results of classification techniques
3.3 Random Forest Algorithm

The RF algorithm uses the CART method to build its decision trees. CART chooses split points
with the Gini method, which includes the Gini index (Gini impurity) and the Gini gain. The
algorithm draws random subsets from the original dataset, a technique known as bagging, and
generates one decision tree from each subset. The Gini index is the main criterion for
choosing the splitting node of a decision tree: the split with the minimum Gini index is
selected, starting at the root node, and the tree is split further until it reaches the leaf
nodes.
The Gini index is calculated as
$$\mathrm{Gini} = 1 - \sum_{j=1}^{c} p_j^{2}$$
where p_j is the proportion of records in the node (drawn from the bootstrap dataset) that
belong to class j, and c is the number of classes.
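As a worked example of the Gini index, the sketch below computes the impurity of a node and the Gini gain of one candidate split. The helper names and the small label lists are hypothetical; the Gini gain is taken here to be the parent impurity minus the size-weighted impurity of the two child nodes, which is the usual CART convention.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini = 1 - sum over classes of (class proportion squared)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def weighted_gini(left_labels, right_labels):
    """Size-weighted Gini impurity of the two child nodes of a split."""
    n = len(left_labels) + len(right_labels)
    left_part = (len(left_labels) / n) * gini_impurity(left_labels)
    right_part = (len(right_labels) / n) * gini_impurity(right_labels)
    return left_part + right_part


# Hypothetical node with three positive and three negative records
parent = [1, 1, 1, 0, 0, 0]
# Hypothetical split of that node into two children
left, right = [1, 1, 1, 0], [0, 0]

gini_parent = gini_impurity(parent)       # 1 - (0.5^2 + 0.5^2) = 0.5
gini_split = weighted_gini(left, right)   # (4/6) * 0.375 + (2/6) * 0.0 = 0.25
gini_gain = gini_parent - gini_split      # 0.25
```

Among several candidate splits, the one with the lowest weighted Gini impurity (the highest Gini gain) would be chosen.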
Algorithm
Step 1: Create a bootstrap table by randomly drawing k records, with replacement, from the n
records in the dataset.
Step 2: Construct an individual decision tree for each bootstrap table.
Step 3: Each decision tree generates an output for a given input.
Step 4: The final output is obtained by majority voting for classification and by averaging
for regression.
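The four steps above can be sketched as follows. This is an illustration rather than the code used for this report: it assumes NumPy and scikit-learn are installed, uses synthetic data from make_classification in place of the heart disease dataset, draws k = n records per bootstrap table, and assumes binary labels 0 and 1. The function names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_trees=25, seed=0):
    """Steps 1 and 2: build one decision tree per bootstrap table."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        # Step 1: draw n row indices with replacement (assumption: k = n)
        idx = rng.integers(0, n, size=n)
        # Step 2: fit a Gini-based tree; max_features="sqrt" also randomizes
        # the features considered at each split, as random forests do
        tree = DecisionTreeClassifier(
            criterion="gini",
            max_features="sqrt",
            random_state=int(rng.integers(0, 2**31 - 1)),
        )
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_majority(trees, X):
    """Steps 3 and 4: collect each tree's output and take the majority vote."""
    votes = np.stack([tree.predict(X) for tree in trees])  # (n_trees, n_samples)
    # With labels 0/1, the mean vote is the fraction of trees voting for class 1
    return (votes.mean(axis=0) >= 0.5).astype(int)


# Synthetic stand-in data with 13 features (not the Kaggle dataset)
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
trees = fit_bagged_trees(X, y)
predictions = predict_majority(trees, X)
```

In practice, RandomForestClassifier from Sklearn performs these steps internally, so a hand-written loop like this is only useful for understanding the algorithm.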
CHAPTER 5: IMPLEMENTATION
5.1 Tool Used

The implementation is carried out in Python using its libraries, a dataset retrieved from
Kaggle, and the RF algorithm. The libraries used are:
• pandas: The pandas library is used for data manipulation in the pre-processing phase.
The read_csv() function loads the dataset into the system. In the pre-processing phase, the
isnull() method is used to check for null values in the dataset. The drop() method is used
to separate the independent (feature/input) columns from the dependent (output/target)
column of the dataset.
• Sklearn: Sklearn is a large library that provides many methods for implementing
machine learning algorithms. The ones used here are:
o StandardScaler(): the dataset contains features with different ranges, so
StandardScaler() is used to standardize the data. After applying it, each feature
has zero mean and unit variance.
o train_test_split(): the data are split into training and test data with the
help of this function. The split ratio is 80:20, where 80% of the data are used as
training data and 20% as test data.
o RandomForestClassifier(): this is the main class of the Sklearn library used in
this report. It provides the fit() and predict() methods. fit() is used to train the
model on the training data and predict() is used to generate outputs based on what
was learned.
o metrics: the metrics module of this library is used to measure the overall
performance of the algorithm. It includes accuracy_score(), precision_score(),
recall_score() and f1_score(), which help to determine the performance of the
algorithm. It also provides confusion_matrix(), which tabulates the true and false
predictions against the actual values.
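The sketch below shows how the listed tools could fit together. It is a hedged illustration, not the code behind the reported results: it assumes pandas, NumPy and scikit-learn are installed, and it builds a small synthetic table so that it runs without the Kaggle file. The column names, the file name in the comment and the rule generating the target are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the dataset. With the real file this would be
# something like: df = pd.read_csv("heart.csv")  (hypothetical file name)
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "age": rng.integers(29, 78, size=n),
    "chol": rng.integers(126, 565, size=n),
    "thalach": rng.integers(71, 203, size=n),
})
# Hypothetical target, loosely tied to the features so there is signal to learn
score = 0.04 * (df["age"] - 54) - 0.03 * (df["thalach"] - 150) + rng.normal(0, 1, n)
df["target"] = (score > 0).astype(int)

# Pre-processing: count null values, then separate features from the target
missing_values = int(df.isnull().sum().sum())
X = df.drop(columns="target")
y = df["target"]

# 80:20 split into training and test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Standardize features; the scaler is fitted on the training data only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the random forest and predict on the held-out test data
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

results = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
matrix = confusion_matrix(y_test, y_pred)  # rows: actual, columns: predicted
```

Fitting the scaler on the training data only is a design choice made in this sketch so that no information from the test data influences training. Tree-based models such as RF are largely insensitive to feature scaling, so the scaling step mainly matters when other algorithms are compared on the same data.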
CHAPTER 6: RESULT AND ANALYSIS
The results are obtained after implementing the algorithm and are reported in terms of
performance. Performance is measured using the confusion matrix, accuracy, precision, recall
and F1 score.
6.1 Model Evaluation

This report evaluates the model in two ways:
• Train model: the model is trained on the training data using the Random Forest
algorithm. After training, the same training data and labels are passed to the
predict() method of the RandomForestClassifier class to determine how well the model
fits the data it was trained on. The evaluation results are described in detail later.
• Test model: the model is evaluated on the test data and labels, where the test
data are the part of the dataset that was not used for training. The evaluation of the
test data is also described later in this report.
6.2 Evaluation of Training data

On the training data, the model achieved 100% accuracy, precision, recall and F1 score.
Table 2: Performance of Training Data
Accuracy Precision Recall F1 Score
100% 100% 100% 100%
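Perfect scores on the training data are common for random forests with fully grown trees and do not by themselves indicate how the model behaves on unseen records. The sketch below compares training accuracy with held-out accuracy. It assumes scikit-learn is installed and uses synthetic data with some label noise instead of the heart disease dataset, so its numbers are not comparable to the tables in this chapter. The out-of-bag score is included as an optional extra check that the source does not report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; flip_y adds label noise so the task is not trivial
X, y = make_classification(n_samples=300, n_features=13, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# oob_score=True estimates accuracy from records left out of each bootstrap
model = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
model.fit(X_train, y_train)

train_accuracy = accuracy_score(y_train, model.predict(X_train))
test_accuracy = accuracy_score(y_test, model.predict(X_test))
oob_accuracy = model.oob_score_
```

A large gap between train_accuracy and test_accuracy suggests that the held-out figure is the more realistic estimate of performance on new patients.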
Figure 7: Performance of Training data
6.3 Evaluation of Testing Data

On the test data, the model achieved 80% accuracy, 79.48% precision, 88.57% recall and an
F1 score of 83.78%.
Table 3: Performance of Testing Data
Accuracy Precision Recall F1 Score
80% 79.48% 88.57% 83.78%
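The relationship between these four measures and the confusion matrix can be checked with a short sketch. The report does not list the confusion matrix counts, so the counts below are illustrative values chosen because they are consistent with the percentages in Table 3 under the assumption of 60 test records. They should not be read as the actual counts of the experiment.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion matrix counts.

    Assumes every denominator is non-zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative counts (assumption, not reported in the source)
metrics = classification_metrics(tp=31, fp=8, fn=4, tn=17)
percentages = {name: round(value * 100, 2) for name, value in metrics.items()}
```

With these counts, accuracy is 48/60 = 80%, precision is 31/39, which is 79.49% when rounded and 79.48% when truncated as in Table 3, recall is 31/35 = 88.57%, and F1 is 62/74 = 83.78%.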
Figure 8: Performance of Testing Data

CHAPTER 7: CONCLUSION
This report reviews recent literature in the domain of heart disease prediction. Researchers
apply several data mining and machine learning techniques to analyze large, complex medical
data, helping healthcare professionals to predict heart disease. The aim of this report is to
present an overview of the machine learning techniques recently used for heart disease
prediction. The reviewed papers employ various machine learning algorithms and use
different evaluation metrics to measure their performance. It is hard to declare any one
algorithm the best suited for heart disease prediction, because the performance of an
algorithm depends on other key factors. This report covers only a limited number of papers.
In most of the reviewed papers, RF performs better than the other algorithms. A likely
reason is that RF handles large amounts of data well.
References
[1] V. V. Ramalingam, A. Dandapath and M. K. Raja, "Heart disease prediction using machine
learning techniques: a survey," International Journal of Engineering & Technology, 2018.
[2] A. Gavhane, G. Kokkula, I. Pandya and P. K. Devadkar, "Prediction of Heart Disease
Using Machine Learning," IEEE, 2018.
[3] S. Kamalapurkar and S. G. G. H, "Online Portal for Prediction of Heart Disease using
Machine Learning Ensemble Method (PrHD-ML)," IEEE, 2021.
[4] M. Kavitha, G. G., D. R., R. S. Y. and S. S. R., "Heart Disease Prediction using Hybrid
machine Learning Model," IEEE, 2021.
[5] M. M. Ali, B. K. Paul, K. Ahmed, F. M. Bui, J. M. W. Quinn and M. A. Moni, "Heart
disease prediction using supervised machine learning algorithms: Performance analysis
and comparison," Computers in Biology and Medicine, 2021.
[6] V. Sharma, S. Yadav and M. Gupta, "Heart Disease Prediction using Machine Learning
Techniques," Communication Control and Networking, 2020.
[7] D. Shah, S. Patel and S. K. Bharti, "Heart Disease Prediction using Machine Learning
Techniques," Computer Science, 2020.