A Literature Review On
Tribhuvan University
Submitted By
Submitted To
Tribhuvan University
SUPERVISOR’S RECOMMENDATION
I hereby recommend that this Literature Review report, prepared under my supervision by
Mr. Narayan Upreti and entitled “A review on Heart Disease Prediction Using Machine
Learning”, in partial fulfillment of the requirements for the degree of M.Sc. in Computer
Science and Information Technology, be processed for evaluation.
…………………………………..
(LR Supervisor)
Tribhuvan University
Tribhuvan University
LETTER OF APPROVAL
This is to certify that this Literature Review prepared by Mr. Narayan Upreti entitled
“A review on Heart Disease Prediction Using Machine Learning” in partial fulfillment
of the requirements for the degree of M.Sc. in Computer Science and Information
Technology has been well studied. In our opinion it is satisfactory in the scope and quality
as a Literature Review for the required degree.
Evaluation Committee
ACKNOWLEDGEMENT
I am very glad to express my deepest gratitude and sincere thanks to my highly respected
and esteemed supervisor, Prof. Jagdish Bhatta, Central Department of Computer Science
and Information Technology, for his valuable supervision, guidance, encouragement, and
support in completing this seminar report.
I am also thankful to Asst. Prof. Sarbin Sayami, HOD of the Central Department of Computer
Science and Information Technology, for his constant support throughout the period.
Finally, I would like to express my sincere thanks to all my friends and others who helped
me directly or indirectly.
Narayan Upreti
ABSTRACT
In the medical field, the diagnosis of heart disease is one of the most difficult tasks. It
depends on the careful analysis of a patient's clinical and pathological data by medical
experts, which is a complicated process. Owing to advances in machine learning and
information technology, researchers and medical practitioners are increasingly interested in
developing automated systems for heart disease prediction that are highly accurate,
effective and helpful in early diagnosis. This report presents a review of current research
on heart disease and on heart disease prediction systems that use the Random Forest algorithm.
Table of Contents
ACKNOWLEDGEMENT
ABSTRACT
References
List of Figures
List of Tables
List of Abbreviations
FN False Negative
FP False Positive
RF Random Forest
TN True Negative
TP True Positive
CHAPTER 1: INTRODUCTION
This chapter gives an overview of this report, the supporting theory and basic knowledge
about heart disease, the reasons for using machine learning to predict heart disease, and the
main objectives of the report.
1.1 Overview
This report explains how heart disease is predicted using a machine learning algorithm. Here,
Random Forest is implemented to predict heart disease. The dataset is collected from Kaggle
and then split into training and test sets for the training and testing phases. The pandas
library is used for data manipulation, and the scikit-learn (sklearn) library is used to split
the data and to train the model with RandomForestClassifier. After the model is trained, the
test data is passed to the trained RandomForestClassifier to measure the model's effectiveness.
Finally, the results obtained during implementation are recorded. All of these processes are
explained in detail later in this report.
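The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the report's actual implementation: a synthetic dataset generated with make_classification stands in for the Kaggle heart disease data, and the number of records, the 13 features, the 80/20 split, the 100 trees and the random seed are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Kaggle heart disease data (assumption):
# 300 records, 13 numeric features, binary target (disease / no disease).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=42
)

# Split the records into training and test sets (80/20 is an assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train a Random Forest on the training set only.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on both sets; the test score reflects performance on unseen records.
train_accuracy = accuracy_score(y_train, model.predict(X_train))
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_accuracy:.3f}")
print(f"test accuracy:  {test_accuracy:.3f}")
```

With real data, the synthetic arrays would be replaced by the feature columns and the target column of the dataset loaded with pandas.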
The heart is an important organ of all living creatures; it pumps blood to the rest of the
organs through the blood vessels of the circulatory system. Any functional problem in the
heart has a direct impact on a person's survival, since it affects other parts of the body
such as the brain, lungs, kidneys and liver. Heart disease describes a range of conditions
that affect the heart, and it is a leading cause of death all over the world.
The clinical symptoms of heart disease complicate the prognosis, as it is influenced by
many factors such as functional and pathologic appearance. This can subsequently delay the
prognosis of the disease. Hence, there is a need for newer concepts that improve prediction
accuracy within a short time span. Disease prognosis through numerous factors or symptoms is
a complicated problem that can even lead to a false assumption. Therefore, an attempt is made
to bridge the knowledge and experience of the experts and to build a system that fairly
supports the diagnostic process. Hence, this report reviews different approaches and
implements the Random Forest algorithm on a heart disease dataset.
1.4 Objective
The objectives of LR are:
CHAPTER 2: LITERATURE REVIEW
V.V. Ramalingam et al. [1] presented a survey of various models based on machine learning
algorithms and techniques and analyzed their performance. Models based on supervised learning
algorithms such as Support Vector Machines (SVM), K-Nearest Neighbour (KNN), Naïve Bayes,
Decision Trees (DT), Random Forest (RF) and ensemble models were found to be very popular
among researchers.
Aditi Gavhane et al. [2] proposed an application that can predict vulnerability to heart
disease given basic symptoms such as age, sex and pulse rate. Neural networks were reported
to be the most accurate and reliable machine learning algorithm and were therefore used in
the proposed system.
Savitha Kamalapurkar et al. [3] proposed a web-based system for the prediction of heart
disease using machine learning (ML) algorithms, with good accuracy compared to other works.
It uses an ensemble classification method for the prediction of heart disease, as ensemble
methods give better accuracy than individual classifiers such as Support Vector Machine (SVM)
or Random Forest (RF).
M. Kavitha et al. [4] used the Cleveland heart disease dataset and applied data mining
techniques such as regression and classification. The machine learning techniques Random
Forest and Decision Tree were applied, and a novel machine learning model was designed. In the
implementation, three machine learning algorithms were used: 1. Random Forest, 2. Decision
Tree and 3. a hybrid model (a hybrid of Random Forest and Decision Tree). Experimental results
show an accuracy level of 88.7% for the heart disease prediction model with the hybrid model.
An interface was designed to take the user's input parameters and predict heart disease, for
which the authors used the hybrid model of Decision Tree and Random Forest.
Md Mamun Ali [5] aimed to identify the machine learning classifiers with the highest accuracy
for such diagnostic purposes. Several supervised machine learning algorithms were applied and
compared for performance and accuracy in heart disease prediction. Feature importance scores
were estimated for all applied algorithms except MLP and KNN. All the features were ranked
by importance score to find those contributing most to heart disease prediction. Using a
heart disease dataset collected from Kaggle, three classifiers based on the k-nearest neighbor
(KNN), decision tree (DT) and random forest (RF) algorithms were compared, and the RF method
achieved 100% accuracy along with 100% sensitivity and specificity. The authors thus concluded
that a relatively simple supervised machine learning algorithm can be used to make heart
disease predictions with very high accuracy and excellent potential utility.
Vijeta Sharma [6] used the benchmark UCI heart disease dataset, which consists of 14
different parameters related to heart disease. Machine learning algorithms such as Random
Forest, Support Vector Machine (SVM), Naive Bayes and Decision Tree were used to develop the
model. The authors also tried to find the correlations between the different attributes
available in the dataset with the help of standard machine learning methods and then used
them in predicting the chance of heart disease. The results show that, compared to the other
ML techniques, Random Forest gives higher accuracy in less time for the prediction. This model
can be helpful to medical practitioners at their clinics as a decision support system.
Devansh Shah [7] presented various attributes related to heart disease and a model based on
supervised learning algorithms: Naïve Bayes, decision tree, K-nearest neighbor, and random
forest. It uses the existing dataset of heart disease patients from the Cleveland database of
the UCI repository. The dataset comprises 303 instances and 76 attributes. Of these 76
attributes, only the 14 that are important for substantiating the performance of the different
algorithms are considered for testing. The paper aims to estimate the probability of patients
developing heart disease. The results show that the highest accuracy score is achieved with
K-nearest neighbor.
CHAPTER 3: METHODOLOGY
Step 1: Search for relevant papers using Google Scholar.
Step 2: Review the search results and assess the relevance of each paper based on its title
and abstract. Exclude papers that are obviously unrelated to the topic.
Step 3: Skim through the introduction and conclusion of each paper to understand the research
context, objectives, and findings.
Step 4: Select the latest research papers from the resulting list.
Online Portal for Prediction of Heart Disease using Machine Learning Ensemble Method (PrHD-ML) [3] | Kaggle | DT, KNN, SVM, RF | 72, 74, 90, 92 resp. | RF achieves 92, shown in Fig. 2
Figure 1: Result of MLP
Figure 4: Classification results of Different Machine learning algorithm
$$Gini = 1 - \sum_{j} p_j^{2}$$
where $p_j$ is the probability (proportion) of class $j$ among the records in the bootstrap
dataset, and the sum runs over all classes $j$.
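As a small worked illustration of the Gini index, the following sketch computes it for a list of class labels. The function name gini_impurity and the example label lists are hypothetical and only serve the illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum_j p_j^2, where p_j is the share of class j in labels."""
    total = len(labels)
    if total == 0:
        return 0.0  # convention chosen here for an empty node
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# A node whose records all belong to one class is perfectly pure.
pure = gini_impurity([1, 1, 1, 1])

# A node split evenly between two classes is maximally impure for two classes.
mixed = gini_impurity([0, 1, 0, 1])

print(pure, mixed)
```

A lower value means a purer node, which is why a tree prefers the split whose child nodes have the lowest weighted Gini index.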
Algorithm
Step 1: Create a bootstrap table by taking k random records from the n records in the
dataset.
Step 4: The final output is determined by majority voting for classification or by
averaging for regression.
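The first and last steps listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the records, the value of k, and the lists of per-tree predictions are made up, and the tree-building stage between the two steps is not shown.

```python
import random
from collections import Counter


def bootstrap_sample(records, k, rng):
    """Step 1: draw k records at random, with replacement, from the dataset."""
    return [rng.choice(records) for _ in range(k)]


def majority_vote(predictions):
    """Step 4 (classification): return the most frequently predicted class."""
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Step 4 (regression): return the mean of the predicted values."""
    return sum(predictions) / len(predictions)


rng = random.Random(0)

# Hypothetical records as (age, label) pairs; 1 means heart disease.
records = [(63, 1), (37, 0), (41, 0), (56, 1), (57, 1), (44, 0)]
sample = bootstrap_sample(records, k=4, rng=rng)

# Hypothetical outputs of five trees for one patient.
final_class = majority_vote([1, 0, 1, 1, 0])
final_value = average([2.0, 3.0, 4.0])

print(sample, final_class, final_value)
```

Because sampling is done with replacement, the same record can appear more than once in a bootstrap table, which is what makes the individual trees differ from one another.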
CHAPTER 5: IMPLEMENTATION
CHAPTER 6: RESULT AND ANALYSIS
The results are obtained after implementing the algorithm and are reported in terms of
performance. Performance is measured by the confusion matrix, accuracy, precision, recall
and F1 score.
• Train model: this model uses the training data for learning purposes with the Random
Forest algorithm. After learning, the training data and labels are used again for testing
purposes, through the predict method of the RandomForestClassifier module, to determine the
effectiveness of this model. The evaluation data are described in detail later.
• Test model: this model uses the test data and labels for testing purposes, where the test
data are those records in the dataset that were not used for training. The evaluation of the
test data is also described later in this report.
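The performance measures named above can be computed from the four confusion matrix counts (TP, FP, TN, FN). The sketch below uses hypothetical label lists and helper names; it is not taken from the report's implementation.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN and FN for a binary problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1, guarding against zero division."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical true labels and model predictions (1 = heart disease).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
print(counts, metrics)
```

For heart disease screening, recall deserves particular attention, since a false negative means a patient with the disease is missed.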
Figure 7: Performance of Training data
This report reviews recent literature in the domain of heart disease prediction. Researchers
apply several data mining and machine learning techniques to analyze large, complex medical
datasets, helping healthcare professionals to predict heart disease. The aim of this report is
to present an overview of the machine learning techniques used in recent times for heart
disease prediction. The report reviewed papers employing various algorithms. Different machine
learning algorithms are used, each with its corresponding evaluation metrics to assess the
performance of the algorithm. It is hard to declare any one algorithm the best suited for
heart disease prediction, because the performance of an algorithm depends on other key
factors. This report has included only a limited number of papers.
Across the reviewed papers, RF performed better than the other algorithms in most cases. This
may be because RF copes well with large amounts of data.
References
[1] R. V.V., D. Ayantan and K. R. M, "Heart disease prediction using machine learning
techniques: a survey," International Journal of Engineering & Technology, 2018.
[3] S. Kamalapurkar and S. G. G. H, "Online Portal for Prediction of Heart Disease using
Machine Learning," IEEE, 2021.
[4] D. K. M., G. G., D. R., R. S. Y. and S. S. R., "Heart Disease Prediction using Hybrid
machine Learning Model," IEEE, 2021.
[6] V. Sharma, S. Yadav and M. Gupta, "Heart Disease Prediction using Machine Learning
Techniques," Communication Control and Networking, 2020.
[7] D. Shah, S. Patel and S. K. Bharti, "Heart Disease Prediction using Machine Learning
Techniques," Computer Science, 2020.