
Tribhuvan University

Institute of Science and Technology


Central Department of Computer Science and Information Technology
Kirtipur, Kathmandu

A Literature Review On

“A Review on Heart Disease Prediction Using Machine Learning”

Under the Supervision of

Asst Prof. Jagdish Bhatta

Tribhuvan University

Submitted By

Narayan Upreti (Roll No. 601/077)

Submitted To

Central Department of Computer Science and Information Technology

Institute of Science and Technology

Tribhuvan University

August 25, 2023



SUPERVISOR’S RECOMMENDATION

I hereby recommend that this Literature Review report, prepared under my supervision by
Mr. Narayan Upreti and entitled “A Review on Heart Disease Prediction Using Machine
Learning”, in partial fulfillment of the requirements for the degree of M.Sc. in Computer
Science and Information Technology, be processed for evaluation.

…………………………………..

Prof. Jagdish Bhatta

(LR Supervisor)

Tribhuvan University

LETTER OF APPROVAL

This is to certify that this Literature Review prepared by Mr. Narayan Upreti entitled
“A review on Heart Disease Prediction Using Machine Learning” in partial fulfillment
of the requirements for the degree of M.Sc. in Computer Science and Information
Technology has been well studied. In our opinion it is satisfactory in the scope and quality
as a Literature Review for the required degree.

Evaluation Committee

…………………………
Prof. Jagdish Bhatta
(LR Supervisor)
Tribhuvan University

…………………………
Internal Examiner
Tribhuvan University

………………………………
Asst. Prof. Sarbin Sayami
(Head of CDCSIT)
Tribhuvan University


ACKNOWLEDGEMENT

I express my sincere gratitude to the Central Department of Computer Science and
Information Technology, Tribhuvan University, for including the Literature Review program
as a part of our curriculum.

I am very glad to express my deepest gratitude and sincere thanks to my highly
respected and esteemed supervisor, Prof. Jagdish Bhatta of the Central Department of Computer
Science and Information Technology, for his valuable supervision, guidance,
encouragement, and support in completing this seminar report.

I am also thankful to Asst. Prof. Sarbin Sayami, Head of the Central Department of Computer
Science and Information Technology, for his constant support throughout the period. Finally,
I would like to express my sincere thanks to all my friends and others who helped me
directly or indirectly.

Narayan Upreti

Roll no: 601/077

ABSTRACT

In the medical field, the diagnosis of heart disease is one of the most difficult tasks. It depends on
the careful analysis of a patient's clinical and pathological data by medical experts,
which is a complicated process. With advances in machine learning and information
technology, researchers and medical practitioners are increasingly interested in
developing automated systems for heart disease prediction that are highly accurate,
effective and helpful in early diagnosis. This report presents a review of current research
on heart disease prediction and a prediction system for heart disease using the Random Forest algorithm.

Keywords: Heart Disease, Random Forest Algorithm

Table of Contents

ACKNOWLEDGEMENT
ABSTRACT
List of Figures
List of Tables
List of Abbreviations

CHAPTER 1: INTRODUCTION
1.1 Overview
1.2 Heart Disease
1.3 Random Forest Algorithm
1.4 Problem Statement
1.5 Objective

CHAPTER 2: LITERATURE REVIEW

CHAPTER 3: METHODOLOGY
3.1 Selection of Research Papers
3.2 Summarization of different papers
3.3 Random Forest Algorithm

CHAPTER 5: IMPLEMENTATION
5.1 Tool Used

CHAPTER 6: RESULT AND ANALYSIS
6.1 Model Evaluation
6.2 Evaluation of Training Data
6.3 Evaluation of Testing Data

CHAPTER 7: CONCLUSION

References

List of Figures

Figure 1: Result of MLP
Figure 2: Result of different Machine Learning Algorithm
Figure 3: Result of Hybrid algorithm
Figure 4: Classification results of Different Machine learning algorithm
Figure 5: Performance Measures
Figure 6: Percentage accuracy results of classification techniques
Figure 7: Performance of Training data
Figure 8: Performance of Testing Data

List of Tables

Table 1: Summarization of papers on heart disease prediction
Table 2: Performance of Training Data
Table 3: Performance of Testing Data

List of Abbreviations

FN False Negative
FP False Positive
RF Random Forest
Sklearn Scikit Learn
TN True Negative
TP True Positive

CHAPTER 1: INTRODUCTION

This chapter gives an overview of the report, the supporting theory and background on heart
disease, the reason for using machine learning to predict heart disease, and the main
objectives of the report.

1.1 Overview
This report explains how heart disease can be predicted using a machine learning algorithm. Here,
Random Forest is implemented to predict heart disease. The dataset is collected from Kaggle
and then split into training and test sets for the training and testing phases. The pandas library
is used for data manipulation, and the Sklearn library is used to split the data and to train the
model with RandomForestClassifier. After the model is trained, the test data is passed to the
trained RandomForestClassifier to measure how effective the model is. Finally, the results
obtained during implementation are recorded. All of these steps are explained in detail later in this report.

1.2 Heart Disease


Heart disease describes a range of conditions that affect the heart. Heart diseases include:

• Blood vessel disease, such as coronary artery disease
• Heart rhythm problems (arrhythmias)
• Heart defects you're born with (congenital heart defects)
• Heart valve disease
• Disease of the heart muscle

The heart is a vital organ that pumps blood to the rest of the organs through the blood vessels
of the circulatory system. Any functional problem in the heart has a direct impact on a person's
survival, since it affects other parts of the body such as the brain, lungs, kidneys and liver.
Heart diseases are a leading cause of death all over the world. The clinical symptoms of heart
disease complicate the prognosis, as it is influenced by many factors such as functional and
pathological appearance, and this can delay the prognosis of the disease. Hence, new approaches
are needed to improve prediction accuracy within a short time span. Disease prognosis based on
numerous factors or symptoms is a complicated problem that can even lead to false assumptions.
Therefore, an attempt is made to combine the knowledge and experience of experts and to build
a system that supports the diagnostic process. To this end, this report reviews different
approaches and implements the Random Forest algorithm on a heart disease dataset.

1.3 Random Forest Algorithm


Random Forest is a supervised machine learning algorithm that is widely used for
classification and regression problems. It builds decision trees on different samples of the data
and takes their majority vote for classification and their average for regression. One of the most
important features of the Random Forest algorithm is that it can handle datasets containing
continuous variables, as in the case of regression, and categorical variables, as in the case of
classification. It generally performs well on classification problems and can handle large
datasets.

1.4 Problem Statement


Doctors rely on common knowledge for treatment. When common knowledge is lacking,
conclusions are drawn only after a number of cases have been studied, and this process takes
time. In the medical field, the diagnosis of heart disease is one of the most difficult tasks. It
depends on the careful analysis of a patient's clinical and pathological data by medical experts,
which is a complicated process. With advances in machine learning and information
technology, researchers and medical practitioners are increasingly interested in
developing automated systems for heart disease prediction that are highly accurate,
effective and helpful in early diagnosis. This report presents a prediction system for heart disease
using the Random Forest approach.

1.5 Objective
The objectives of this LR are:

• To summarize and analyze previous research and theories
• To predict heart disease using the RF algorithm

CHAPTER 2: LITERATURE REVIEW

V.V. Ramalingam et al. [1] presented a survey of various models based on machine learning
algorithms and techniques and analyzed their performance. Models based on supervised learning
algorithms such as Support Vector Machines (SVM), K-Nearest Neighbour (KNN), Naïve Bayes,
Decision Trees (DT), Random Forest (RF) and ensemble models were found to be very popular
among researchers.

Aditi Gavhane et al. [2] proposed an application that can predict vulnerability to heart
disease given basic symptoms such as age, sex and pulse rate. The authors report that the
neural network was the most accurate and reliable machine learning algorithm, and it is
therefore used in the proposed system.

Savitha Kamalapurkar et al. [3] proposed a web-based system for heart disease prediction
using machine learning (ML) algorithms, reporting good accuracy compared to other works. It
uses an ensemble classification method, on the grounds that ensemble methods give
better accuracy than individual classifiers such as Support Vector Machine (SVM) or
Random Forest (RF).

M. Kavitha et al. [4] used the Cleveland heart disease dataset and applied data mining
techniques such as regression and classification. Three machine learning models were
implemented: 1. Random Forest, 2. Decision Tree and 3. a novel hybrid model (a hybrid of
Random Forest and Decision Tree). Experimental results show an accuracy of 88.7% for the
heart disease prediction model based on the hybrid model. An interface was also designed to
take the user's input parameters and predict heart disease using the hybrid model of Decision
Tree and Random Forest.

Md Mamun Ali et al. [5] aimed to identify the machine learning classifiers with the highest
accuracy for such diagnostic purposes. Several supervised machine learning algorithms were
applied and compared for performance and accuracy in heart disease prediction. Feature
importance scores were estimated for all applied algorithms except MLP and KNN, and the
features were ranked by importance score to find those most predictive of heart disease. Using a
heart disease dataset collected from Kaggle, the study found that, among three classifiers based
on k-nearest neighbor (KNN), decision tree (DT) and random forest (RF) algorithms, the RF
method achieved 100% accuracy along with 100% sensitivity and specificity. The authors
conclude that a relatively simple supervised machine learning algorithm can be used to make
heart disease predictions with very high accuracy and excellent potential utility.

Vijeta Sharma et al. [6] used the benchmark UCI heart disease dataset, which consists of 14
different parameters related to heart disease. Machine learning algorithms such as Random
Forest, Support Vector Machine (SVM), Naive Bayes and Decision Tree were used to develop
the model. The authors also examined the correlations between the different attributes in the
dataset using standard machine learning methods and used them in predicting the chance of
heart disease. The results show that, compared to the other ML techniques, Random Forest gives
higher accuracy in less time. The authors suggest that this model can help medical practitioners
at their clinics as a decision support system.

Devansh Shah et al. [7] presented various attributes related to heart disease and built models
based on supervised learning algorithms, namely Naïve Bayes, decision tree, K-nearest neighbor
and random forest. The study uses the existing Cleveland database of heart disease patients from
the UCI repository. The dataset comprises 303 instances and 76 attributes, of which only 14 are
considered for testing the performance of the different algorithms. The paper aims to estimate
the probability of patients developing heart disease. The results show that the highest accuracy
score is achieved with K-nearest neighbor.

CHAPTER 3: METHODOLOGY

3.1 Selection of Research Papers


The steps followed during the selection of research papers are:

Step 1: Search for relevant papers using Google Scholar.

Step 2: Review the search results and assess the relevance of each paper based on its title
and abstract. Exclude papers that are obviously unrelated to the topic.

Step 3: Skim through the introduction and conclusion of each paper to understand the research
context, objectives, and findings.

Step 4: Select the most recent research papers from the list.

Step 5: Finally, select ten relevant papers altogether.

3.2 Summarization of different papers


Table 1: Summarization of papers on heart disease prediction

| Title | Dataset | Algorithms | Performance Measures (Accuracy %) | Remarks (Winner Alg.) |
|---|---|---|---|---|
| Heart disease prediction using machine learning techniques: a survey [1] | Cleveland dataset | NB, SVM, KNN, DT, RF | 84.15, 85.76, 83.16, 77.55, 97 resp. | RF achieves 97% accuracy |
| Prediction of Heart Disease Using Machine Learning [2] | Cleveland dataset from UCI library | MLP | 91 (Precision), shown in Fig. 1 | MLP |
| Online Portal for Prediction of Heart Disease using Machine Learning Ensemble Method (PrHD-ML) [3] | Kaggle | DT, KNN, SVM, RF | 72, 74, 90, 92 resp., shown in Fig. 2 | RF achieves 92 |
| Heart Disease Prediction using Hybrid machine Learning Model [4] | - | DT, RF, (DT+RF) | 79, 81, 88 resp., shown in Fig. 3 | (DT+RF) achieves 88% accuracy |
| Heart disease prediction using supervised machine learning algorithms: Performance analysis and comparison [5] | Cleveland dataset | LR, ABMI, MLP, KNN, DT, RF | 89.62, 95.02, 97.95, 100, 100, 100 | RF, KNN, DT achieve 100% accuracy, shown in Fig. 4 |
| Heart Disease Prediction using Machine Learning Techniques [6] | Cleveland dataset | SVM, RF, DT, NB | 99.5, 99.7, 85.1, 90.4 resp. (Precision), shown in Fig. 5 | RF achieves 99.7% precision |
| Heart Disease Prediction using Machine Learning Techniques [7] | UCI | NB, KNN, DT, RF | 81.05, 90.78, 80.26, 82.89 resp., shown in Fig. 6 | KNN achieves 90.78% accuracy |

Figure 1: Result of MLP

Figure 2: Result of different Machine Learning Algorithm

Figure 3: Result of Hybrid algorithm

Figure 4: Classification results of Different Machine learning algorithm

Figure 5: Performance Measures

Figure 6: Percentage accuracy results of classification techniques

3.3 Random Forest Algorithm


The RF algorithm builds its decision trees with the CART method, which uses the Gini index
(Gini impurity) and Gini gain to choose split points. Each tree is trained on a random sample
drawn from the original dataset, a procedure known as bagging, so that multiple different
decision trees are generated. The Gini index is the main criterion for choosing the splitting node
of a decision tree: the candidate split with the minimum Gini index is selected, starting at the
root node, and the tree is split further until the leaf nodes are reached.

The Gini index can be calculated by

$$Gini = 1 - \sum_{j=0}^{C-1} (p_j)^2$$

where $p_j$ is the proportion of records in the node that belong to class $j$, and $C$ is the
number of classes.
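
As an illustration of the Gini index defined above, the following minimal Python sketch computes the Gini index of a single node from its class labels, and the size-weighted Gini index of a candidate split. The function names gini_index and weighted_gini and the toy labels are assumptions made for this example; they are not taken from the implementation in this report.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of one node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Gini index of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini_index(left) + len(right) / n * gini_index(right)


# Toy node (hypothetical data): 1 = heart disease, 0 = no heart disease.
parent = [1, 1, 1, 0, 0, 0]
print(gini_index(parent))                   # 0.5, the maximum for two balanced classes
print(weighted_gini([1, 1, 1], [0, 0, 0]))  # 0.0, a perfectly pure split
```

A split that lowers the weighted Gini index relative to the parent node is preferred, which is why the pure split in the example would be chosen over leaving the node unsplit.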

Algorithm

Step 1: Create a bootstrap table by drawing k random records, with replacement, from the n
records in the dataset.

Step 2: Construct an individual decision tree for each bootstrap table.

Step 3: Each decision tree generates an output for a given input.

Step 4: The final output is obtained by majority voting for classification or by averaging
for regression.
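
The four steps above can be sketched by hand with individual decision trees. This is only an illustration under stated assumptions: the data is synthetic (generated with make_classification as a stand-in for the heart disease dataset), the number of trees and the choice k = n are arbitrary, and max_features="sqrt" is added because Random Forest implementations commonly also sample a random subset of features at each split, which the steps above do not mention.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 200 records, 5 features, 2 classes.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

rng = np.random.default_rng(0)
n_trees = 25   # number of bootstrap tables / trees (illustrative, odd to avoid ties)
k = len(X)     # records drawn per bootstrap table (here k = n)

trees = []
for i in range(n_trees):
    # Step 1: draw k random record indices with replacement.
    idx = rng.integers(0, len(X), size=k)
    # Step 2: fit one decision tree per bootstrap table, using the Gini criterion.
    tree = DecisionTreeClassifier(criterion="gini", max_features="sqrt", random_state=i)
    trees.append(tree.fit(X[idx], y[idx]))

# Step 3: every tree produces an output for each input record.
votes = np.stack([tree.predict(X) for tree in trees])  # shape: (n_trees, n_records)

# Step 4: majority vote per record (labels are 0/1, so the mean is the share of 1-votes).
forest_pred = (votes.mean(axis=0) > 0.5).astype(int)
print("agreement with true labels:", (forest_pred == y).mean())
```

For regression, Step 4 would average the numeric outputs of the trees instead of taking a vote.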

CHAPTER 5: IMPLEMENTATION

5.1 Tool Used


The implementation is carried out using Python and its libraries, a dataset retrieved from
Kaggle, and the RF algorithm. The tools are:

• pandas: The pandas library is used for data manipulation in the pre-processing phase.
The read_csv() method of pandas is used to load the dataset into the system. In the
pre-processing phase, the isnull() method is used to check for null values in the dataset. The
drop() method of pandas is used to separate the independent (feature/input) data from the
dependent (output/target) data in the dataset.
• Sklearn: Sklearn is also a large library that contains many different methods that
help to implement the algorithm. The ones used here are:
o StandardScaler(): the dataset contains features with different ranges, so
StandardScaler() is used to standardize the data. After applying this
method, each feature has a mean of 0 and a standard deviation of 1.
o train_test_split(): the data is split into training and test data with the
help of this method. The split ratio is 80-20: 80% of the data is used as training data
and 20% as test data.
o RandomForestClassifier(): this is the main class of the Sklearn
library used in this report. It provides the fit() and predict() methods: fit() is
used to train the model on the data and predict() is used to generate output based
on what was learned.
o metrics: the metrics module of this library is used to measure the
overall performance of the algorithm. It includes accuracy_score(), precision_score(),
recall_score() and f1_score(), which help to determine the performance of the
algorithm, as well as confusion_matrix(), which tabulates the correct and incorrect
predictions against the actual values.
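
The way these tools fit together can be sketched as follows. This is a minimal illustration, not the code used for the results in this report: the Kaggle file is not reproduced here, so synthetic data from make_classification stands in for read_csv(), and the column name "target", the feature names f0 to f12, the number of trees and the random seeds are all assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in for pd.read_csv(...): synthetic data with a hypothetical "target" column.
features, labels = make_classification(n_samples=300, n_features=13, random_state=42)
df = pd.DataFrame(features, columns=[f"f{i}" for i in range(13)])
df["target"] = labels

# Pre-processing: count null values, then separate features from the target.
print("missing values:", int(df.isnull().sum().sum()))
X = df.drop(columns="target")  # independent variables (features/input)
y = df["target"]               # dependent variable (output/target)

# 80-20 split into training and test data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize: fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Train the Random Forest and predict on the unseen test data.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_s, y_train)
y_pred = model.predict(X_test_s)

# Evaluate with the metrics module.
print(confusion_matrix(y_test, y_pred))
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("f1 score :", f1_score(y_test, y_pred))
```

In this sketch the scaler is fitted after the split so that no information from the test data influences the training features; scaling before the split is also seen in practice but lets test statistics leak into training.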

CHAPTER 6: RESULT AND ANALYSIS

The results obtained after implementing the algorithm are reported in terms of performance.
Performance is measured by the confusion matrix, accuracy, precision, recall and F1 score.
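
For reference, these measures follow directly from the four confusion-matrix counts TP, TN, FP and FN. The short sketch below states the standard definitions in Python; the function name classification_metrics and the example counts are hypothetical and are not the counts obtained in this report.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
    precision = tp / (tp + fp)                  # share of predicted positives that are real
    recall = tp / (tp + fn)                     # share of real positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return accuracy, precision, recall, f1


# Hypothetical counts, for illustration only.
accuracy, precision, recall, f1 = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```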

6.1 Model Evaluation


The model is evaluated in two ways:

• Train model: the model is trained on the training data using the RF
algorithm. After training, the same training data and labels are used for testing, using the
predict() method of RandomForestClassifier, to determine how well the model has
learned the training data. The evaluation results are described in detail below.
• Test model: the trained model is tested with the test data and labels, where the test
data is the part of the dataset that was not used for training. The evaluation on the test
data is also described later in this report.

6.2 Evaluation of Training data


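On the training data, the model achieved 100% accuracy, precision, recall and F1 score. Because
the model has already seen this data, a perfect score here does not by itself indicate how well it
generalizes.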

Table 2: Performance of Training Data

Accuracy Precision Recall F1 Score

100% 100% 100% 100%

Figure 7: Performance of Training data

6.3 Evaluation of Testing Data


On the test data, the model achieved 80% accuracy, 79.48% precision, 88.57% recall and an
83.78% F1 score.

Table 3: Performance of Testing Data

Accuracy Precision Recall F1 Score

80% 79.48% 88.57% 83.78%

Figure 8: Performance of Testing Data
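
As a quick consistency check on the reported test results, the F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the percentages stated in this section; the variable names are chosen for this example.

```python
precision = 0.7948  # reported test precision (79.48%)
recall = 0.8857     # reported test recall (88.57%)

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.2%}")  # F1 = 83.78%
```

The recomputed value agrees with the F1 score reported for the test data.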


CHAPTER 7: CONCLUSION

This report reviews recent literature in the domain of heart disease prediction. Researchers
apply several data mining and machine learning techniques to analyze large, complex medical
data, helping healthcare professionals to predict heart disease. The aim of this report is to
present an overview of the machine learning techniques used in recent times for heart disease
prediction. The reviewed papers employ various machine learning algorithms, each with its
corresponding evaluation metrics to assess performance. It is hard to declare any one algorithm
as best suited for heart disease prediction, because the performance of an algorithm depends on
other key factors. This report has also included only a limited number of papers.

Among the papers reviewed, RF performs better than the other algorithms in most cases. One
likely reason is that RF can handle large amounts of data well.

References

[1] V. V. Ramalingam, A. Dandapath and M. K. Raja, "Heart disease prediction using machine
learning techniques: a survey," International Journal of Engineering & Technology, 2018.

[2] A. Gavhane, G. Kokkula, I. Pandya and P. K. Devadkar, "Prediction of Heart Disease
Using Machine Learning," IEEE, 2018.

[3] S. Kamalapurkar and S. G. G. H, "Online Portal for Prediction of Heart Disease using
Machine Learning," IEEE, 2021.

[4] D. K. M., G. G., D. R., R. S. Y. and S. S. R., "Heart Disease Prediction using Hybrid
machine Learning Model," IEEE, 2021.

[5] M. M. Ali, B. K. Paul, K. Ahmed, F. M. Bui, J. M. W. Quinn and M. A. Moni, "Heart
disease prediction using supervised machine learning algorithms: Performance analysis
and comparison," Computers in Biology and Medicine, 2021.

[6] V. Sharma, S. Yadav and M. Gupta, "Heart Disease Prediction using Machine Learning
Techniques," Communication Control and Networking, 2020.

[7] D. Shah, S. Patel and S. K. Bharti, "Heart Disease Prediction using Machine Learning
Techniques," Computer Science, 2020.
