
Heart Disease Prediction Using Machine Learning

Md.Walid, Md Tanvir Ahammad and Sanjida Akter Trisha

A Thesis in the Partial Fulfillment of the Requirements

for the Award of Bachelor of Computer Science and Engineering (BCSE)

Department of Computer Science and Engineering


College of Engineering and Technology
IUBAT – International University of Business Agriculture and Technology
Heart Disease Prediction Using Machine Learning

Md.Walid, Md Tanvir Ahammad and Sanjida Akter Trisha

A Thesis in the Partial Fulfillment of the Requirements for the Award of Bachelor of Computer
Science and Engineering (BCSE)
The thesis has been examined and approved,

___________________________
Prof. Dr. Utpal Kanti Das
Chairman & Professor
Dept. of Computer Science and Engineering
IUBAT – International University of Business Agriculture and Technology

___________________________
Dr. Hasibur Rashid Chayon
Coordinator and Associate Professor
Dept. of Computer Science and Engineering
IUBAT – International University of Business Agriculture and Technology

___________________________
Krishna Das
Assistant Professor
Dept. of Computer Science and Engineering
IUBAT – International University of Business Agriculture and Technology

Fall 2023
LETTER OF TRANSMITTAL

24th February, 2023


The Chairman
Thesis Defense Committee
College of Engineering and Technology - CEAT
IUBAT- International University of Business Agriculture and Technology
4 Embankment Drive Road, Sector- 10, Uttara Model Town
Dhaka-1230, Bangladesh

Subject: Letter of Transmittal.

Sir,

With due respect, I would like to inform you that it is a great pleasure for me to submit this
report, entitled “Heart Disease Prediction Using Machine Learning”, to complete my Practicum
course.
Working on this project was a great opportunity to put my theoretical knowledge into practice,
and I gained a lot of exposure to the business culture of a famous company. I now look forward
to your kind comments on this report.
I would be very grateful if you would kindly go through this report and evaluate my
performance.

Thanking you,

____________ _____________ _____________

Md.Walid Md Tanvir Ahammad Sanjida Akter Trisha

Student ID : 20103160 Student ID : 20103172 Student ID : 20103113


STUDENT’S DECLARATION

I am Md.Walid, a student of the BCSE (Bachelor of Computer Science and Engineering) program
under the College of Engineering and Technology (CEAT) of the International University of
Business Agriculture and Technology (IUBAT). I declare that this report, entitled 'Heart
Disease Prediction Using Machine Learning', has been prepared for the completion of the CSC 490
job training course, which is part of the Bachelor of Computer Science and Engineering degree.

The report and the project 'Heart Disease Prediction Using Machine Learning' were prepared by me.
All modules and procedures of this project were completed after proper testing and with
reference to information available online.

It has not been prepared for any other purpose, award, or presentation.

____________ _____________ _____________

Md.Walid Md Tanvir Ahammad Sanjida Akter Trisha

Student ID : 20103160 Student ID : 20103172 Student ID : 20103113


ACKNOWLEDGMENTS

First off, we would like to thank Almighty and many others for assisting us in completing the

research work on time and successfully. Secondly, we would like to express our gratitude to our

Supervisor, Krishna Das Sir, for his readiness to assist us and provide constructive suggestions

and remarks from the beginning to the end of this research paper. We would also like to express our

gratitude to our family for their unwavering support and encouragement during our studies and

research. Finally, our thanks go to all the people who have supported us in completing the

research work, directly or indirectly. We are grateful to everyone who helped us finish our

research work.
SUPERVISOR’S CERTIFICATION

This is to certify that the Practicum report on “Heart Disease Prediction Using Machine
Learning” was prepared by Md.Walid, ID #20103160, of IUBAT – International University
of Business Agriculture and Technology, in partial fulfillment of the requirements of the
defense course. The report has been prepared under my supervision and is a record of
the work successfully completed. To the best of my knowledge, and as per the student's
declaration, no part of this report has been submitted elsewhere for any degree, diploma, or
certificate.

The student is now permitted to submit the report. I wish the student every success in future endeavors.
Practicum Supervisor

_______________________________

Nusrath Tabassum
Lecturer
Department of Computer Science and Engineering
IUBAT–International University of Business Agriculture and Technology
DEPARTMENT’S CERTIFICATION

On behalf of the Department of Computer Science and Engineering, IUBAT-International
University of Business Agriculture and Technology, I, the undersigned, confirm that the
report ‘Heart Disease Prediction Using Machine Learning’ for the Bachelor of Computer Science
and Engineering (BCSE) degree was duly presented by Md.Walid (ID No. 20103160) and approved
by the department.

___________________________
Krishna Das
Supervisor, Assistant Professor
Department of Computer Science and Engineering
IUBAT- International University of Business Agriculture and Technology

___________________________
Dr. Hasibur Rashid Chayon
Coordinator and Associate Professor
Department of Computer Science and Engineering
IUBAT- International University of Business Agriculture and Technology

___________________________
Prof. Dr. Utpal Kanti Das
Chairman & Professor
Department of Computer Science and Engineering
IUBAT- International University of Business Agriculture and Technology
ABSTRACT

Heart disease is regarded as one of the leading causes of death worldwide. It is difficult for
medical professionals to predict because prediction requires considerable knowledge and
expertise. Even today, the healthcare industry is "information rich" but "knowledge poor."
A great deal of information about healthcare systems is available on the internet, but effective
analysis tools are lacking, which makes it difficult to find hidden relationships and patterns in
the data. An automated system for medical diagnosis would improve care and cut costs. Based on
data gathered from Kaggle and on medical research conducted by the Cleveland Clinic Foundation,
particularly in the field of heart disease, this web application seeks to predict the occurrence
of heart disease. By applying data mining techniques to the dataset, it is intended to uncover
hidden patterns that are significant to heart disease and to forecast patients' likelihood of
having heart disease, with the presence of the disease rated on a scale. Predicting heart disease
involves large amounts of data that are too complex and massive to process and analyze using
traditional methods. Our goal is to identify a machine learning method that can accurately
predict heart disease while also being computationally efficient. Data mining is a technique for
extracting hidden patterns and relationships from huge databases by combining statistical
analysis, machine learning, and database technology.


TABLE OF CONTENTS

Letter of Transmittal
Student’s Declaration
Supervisor’s Certification
Abstract
Acknowledgments
List of Figures
List of Tables

Chapter I. Introduction
1.1 Statement of the Problem
1.2 Rationale / Significance of the study
1.3 Objectives of the study
1.4 Hypothesis of the study
1.5 Scope of the study
1.6 Limitations / Delimitations
1.7 Definitions of Terms

Chapter II. Literature Review
2.1 Literature Review
2.2 Strongest Points of Literature Review
2.3 Overview and Alignment with suggested model

Chapter III. Research Methodology
3.1 Machine Learning
3.2 Deep Learning
3.3 Methodology
3.3.1 Proposed Model Flow Architecture

Chapter IV. Result and Discussion

Chapter V. Conclusion

References

LIST OF FIGURES

Fig: Proposed Model Flow Architecture


LIST OF TABLES

Table 1: Evaluation results on the testing set


Chapter I. Introduction
1.1 Statement of the Problem

Heart disease, also known as cardiovascular disease, refers to a group of conditions that affect
the heart and blood vessels, such as coronary artery disease, heart failure, and arrhythmias.
According to the World Health Organization (WHO), heart disease is the leading cause of death
worldwide, accounting for approximately 17.9 million deaths annually. Early detection of heart
disease can improve outcomes and prevent complications, making it a crucial public health
concern.

Machine learning (ML) is a subset of artificial intelligence (AI) that involves the use of
algorithms and statistical models to enable computers to learn from data and make predictions or
decisions without being explicitly programmed. ML has emerged as a promising approach for
predicting heart disease risk and improving clinical decision-making.

Several studies have explored the use of ML techniques to predict heart disease risk based on
patient data, such as demographic information, medical history, and clinical measurements. For
example, one study used ML algorithms to predict the risk of heart disease in patients with type 2
diabetes based on their medical history and clinical measurements (Luo et al., 2021). Another
study used ML techniques to predict the risk of heart disease in individuals with hypertension
based on their demographic information, medical history, and laboratory data (Yoon et al.,
2020).

1.2 Significance of the study

The provision of quality services at affordable prices is a significant challenge facing
healthcare organizations (hospitals, medical centers). Correct patient diagnosis and effective
treatment delivery are essential components of quality care. Poor clinical decisions can have
catastrophic results and are consequently unacceptable. Hospitals must also keep the cost of
clinical tests to a minimum. By using appropriate computer-based information and/or decision
support systems, they can achieve these results. Health care data is vast [8]. It consists of
transformed data, resource management data, and patient data. Organizations in the healthcare
industry must be able to analyze this data.

1.3 Objectives of the study

The main objective of this study is to develop a heart disease prediction model using machine
learning algorithms. Specifically, the study aims to achieve the following objectives:

To preprocess the Cleveland Heart Disease dataset by cleaning, transforming, and engineering
relevant features to ensure that the dataset is suitable for training the machine learning models.

To train and evaluate three machine learning models (Logistic Regression, Random Forest, and
Support Vector Machines) for heart disease prediction using the preprocessed dataset.

To compare the performance of the three machine learning models using various metrics such as
accuracy, precision, recall, and F1 score.

To identify the factors that contribute to the accuracy of the machine learning models and
provide insights into the importance of various features in predicting heart disease.

To provide recommendations for the use of machine learning algorithms for heart disease
prediction and suggest avenues for future research.

The results of this study can help medical practitioners identify patients at risk of heart
disease and provide early intervention to prevent fatal outcomes. Additionally, this study can
contribute to the growing field of machine learning in healthcare and inform the development of
more accurate and interpretable models for heart disease prediction.

1.4 Hypothesis of the study


Based on the objectives of the study, the following hypothesis is proposed:
H1: Machine learning algorithms can accurately predict heart disease based on the Cleveland

Heart Disease dataset. The hypothesis is based on the assumption that machine learning

algorithms can learn patterns and relationships from the dataset that can be used to predict heart
disease. The hypothesis also assumes that the Cleveland Heart Disease dataset is a suitable

representation of the population and contains sufficient information to predict heart disease

accurately. The study will test the hypothesis by training and evaluating machine learning

models on the Cleveland Heart Disease dataset and comparing their performance using various

metrics. If the hypothesis is supported, it can provide evidence for the use of machine learning

algorithms in heart disease prediction and contribute to the development of more accurate and

reliable prediction models. If the hypothesis is not supported, the study can identify the

limitations and challenges of using machine learning algorithms for heart disease prediction and

suggest ways to improve their performance in future research.

1.5 Scope of the study

The scope of this study is to develop a heart disease prediction model using machine learning

algorithms. Specifically, we will use the Cleveland Heart Disease dataset, which contains 303

instances and 14 attributes, to train and evaluate three machine learning models: Logistic

Regression, Random Forest, and Support Vector Machines (SVM). We will preprocess the data

by cleaning, transforming, and engineering relevant features to ensure that the dataset is suitable

for training the models. We will then use the preprocessed data to train the models and evaluate

their performance using various metrics such as accuracy, precision, recall, and F1 score.

The aim of this study is to provide a comprehensive analysis of the performance of the three

machine learning models for heart disease prediction. The results of this study can help medical
practitioners identify patients at risk of heart disease and provide early intervention to prevent

fatal outcomes. However, this study has certain limitations. The Cleveland Heart Disease dataset

used in this study is relatively small and may not be representative of the general population.

Additionally, the machine learning models used in this study are not interpretable, and it may be

challenging to understand the underlying factors that contribute to the model's predictions.

Further research is needed to address these limitations and develop more accurate and

interpretable models for heart disease prediction.

1.6 Limitations/Delimitations

This study has several limitations and delimitations that may affect the generalizability and

interpretation of the results. Firstly, the Cleveland Heart Disease dataset used in this study is

relatively small and may not be representative of the general population. The dataset contains

data from a single center, and the population may not be diverse enough to represent other

populations with different genetic, environmental, and lifestyle factors. Secondly, the machine

learning models used in this study are not interpretable, and it may be challenging to understand

the underlying factors that contribute to the model's predictions. This may limit the ability of

medical practitioners to make informed decisions based on the model's predictions.

Thirdly, the accuracy of the machine learning models may be affected by missing or incomplete

data in the dataset. We will address this limitation by using data imputation techniques to fill in

missing data. Lastly, the results of this study may be influenced by the choice of machine

learning algorithms and hyperparameters used. The performance of the models may vary
depending on the algorithms and hyperparameters selected. Therefore, it is important to use a

rigorous methodology to select the most suitable algorithms and hyperparameters for the dataset.
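The imputation step mentioned above can be sketched as follows. This is a minimal illustration that assumes median imputation per column; the study does not fix a specific imputation technique at this point, and the function name `impute_median` and the sample records are hypothetical.

```python
from statistics import median

def impute_median(rows, missing=None):
    """Replace missing entries in each column with that column's median.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    medians = []
    for c in range(n_cols):
        observed = [row[c] for row in rows if row[c] is not missing]
        medians.append(median(observed))
    return [
        [medians[c] if row[c] is missing else row[c] for c in range(n_cols)]
        for row in rows
    ]

# Hypothetical records: (age, resting blood pressure, cholesterol).
records = [
    [63, 145, 233],
    [37, None, 250],
    [41, 130, None],
    [56, 120, 236],
]
completed = impute_median(records)
```

In practice the medians would normally be computed on the training set only and then reused for the testing set, so that no information from the test data leaks into preprocessing.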

Despite these limitations and delimitations, this study provides a comprehensive analysis of the

performance of machine learning algorithms for heart disease prediction using the Cleveland

Heart Disease dataset. The results of this study can inform medical practitioners on the use of

machine learning algorithms for early detection and prevention of heart disease.

1.7 Definitions of Terms / Operational Definitions

To ensure clarity and consistency in this study, the following terms are defined and

operationalized:

Heart disease: Refers to any condition that affects the structure and function of the heart,

including coronary artery disease, heart failure, and arrhythmias. In this study, heart disease is

the dependent variable that we aim to predict using machine learning algorithms.

Machine learning: A subfield of artificial intelligence that involves the use of algorithms to learn

from data and make predictions or decisions. In this study, machine learning algorithms are used

to predict heart disease based on the Cleveland Heart Disease dataset.

Cleveland Heart Disease dataset: A dataset containing 303 instances and 14 attributes related to

heart disease. The dataset was collected from the Cleveland Clinic Foundation and is commonly

used in research on heart disease prediction.


Logistic Regression: A machine learning algorithm used for classification problems. In this

study, logistic regression is one of the three algorithms used to predict heart disease.

Random Forest: A machine learning algorithm used for classification and regression problems.

In this study, random forest is one of the three algorithms used to predict heart disease.

Support Vector Machines (SVM): A machine learning algorithm used for classification and

regression problems. In this study, SVM is one of the three algorithms used to predict heart

disease.

Accuracy: A metric used to evaluate the performance of machine learning algorithms. Accuracy

measures the proportion of correct predictions made by the model.

Precision: A metric used to evaluate the performance of machine learning algorithms. Precision

measures the proportion of true positives (correctly predicted cases of heart disease) among all

positive predictions.

Recall: A metric used to evaluate the performance of machine learning algorithms. Recall

measures the proportion of true positives (correctly predicted cases of heart disease) among all

cases of heart disease in the dataset.

F1 score: A metric used to evaluate the performance of machine learning algorithms. The F1

score is the harmonic mean of precision and recall and provides a balanced measure of the

model's performance.
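The four metrics defined above can be computed directly from the counts of true and false positives and negatives. The sketch below is a minimal illustration with hypothetical labels, where 1 denotes heart disease and 0 denotes no heart disease; the function name `classification_metrics` is our own.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical ground truth and predictions for eight patients.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Returning 0.0 when a denominator is zero is a convention chosen here for simplicity; other tools may report such cases as undefined.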
Chapter II. Literature Review

Part 1

Analysis of Data Mining Techniques for Heart Disease Prediction: Heart disease is regarded
worldwide as one of the leading causes of death. It is difficult for medical professionals to
predict because prediction is a challenging task that requires skill and advanced knowledge.
This paper covers heart disease prediction based on input attributes using data mining
techniques. Using the Weka software, the authors investigated the prediction of heart disease
with KStar, J48, SMO, Bayes Net, and Multilayer Perceptron. The performance of these data
mining techniques is evaluated by combining the results of predictive accuracy, ROC curve, and
AUC value on a standard data set as well as a collected data set. On these performance
measures, the SMO and Bayes Net techniques perform better than the KStar, Multilayer
Perceptron, and J48 techniques [3].

Part 2

Machine Learning Application to Predict the Risk of Coronary Artery Atherosclerosis: Coronary
artery disease is the main cause of death worldwide. This study proposes a machine
learning-based algorithm for predicting the risk of coronary artery atherosclerosis. A ridge
expectation maximization imputation (REMI) technique is proposed to estimate the missing values
in the atherosclerosis databases. A conditional likelihood maximization method is used to
reduce the size of the feature space, eliminate irrelevant attributes, and speed up learning.
The proposed algorithm is assessed using the STULONG and UCI databases. The performance of two
classification models in predicting heart disease is examined and compared with earlier
research. According to the experimental results, the accuracy of risk prediction has increased.
The impact of missing value imputation on prediction accuracy is also assessed, and the
proposed REMI approach outperforms traditional methods by a wide margin [4].

Part 3

Intelligent Heart Disease Prediction System Using Data Mining Techniques: Unfortunately, the

vast amounts of healthcare data that the industry gathers are not "mined" to find hidden

information to make smart decisions.

Discovery of hidden patterns and relationships often goes unexploited. Advanced data mining

methods can help to change this. A prototype Intelligent Heart Disease Prediction System

(IHDPS) has been created as a result of this research using data mining techniques, specifically

Decision Trees, Naive Bayes, and Neural Networks[1].

Part 4

Locally frequent diseases: This paper analyzes the data mining techniques required for medical
data mining, particularly for discovering locally frequent diseases such as heart disease, lung
cancer, and breast cancer. Data mining is a method of extracting information to discover
previously unknown patterns. Vembandasamy et al. used it in their study to analyze and detect
heart disease with the Naive Bayes algorithm, which is based on Bayes' theorem. Naive Bayes
therefore relies on a strong assumption that the input attributes are independent of one
another. The data set was obtained from a leading diabetic research institute in Chennai,
Tamil Nadu, and contains information on more than 500 patients. Weka is the tool used, and
classification is carried out using a 70% percentage split. Naive Bayes provides an accuracy
of 86.419%.

(Deeanna Kelley “Heart Disease: Causes, Prevention, and Current

Research” in JCCC Honors Journal ) Mohammed Abdul Khaleel presented a paper as part of the

Study of Methods for Information

Part 5

A heart disease detection system based on a Learning Vector Quantization neural network
algorithm: The neural network in this system accepts 13 clinical features as input and predicts
whether or not heart disease is present in the patient, along with various performance
measures.

(Mai Shouman, Tim Turner, and Rob Stocker, “Applying k-Nearest Neighbour in Diagnosing Heart
Disease Patients”, International Journal of Information and Education Technology, Vol. 2,
No. 3, June 2012.) Wiharto and Hari Kusnanto published a paper titled “Intelligence System for
Diagnosis Level of Coronary Heart Disease with K-Star Algorithm”. In this
paper, they express a desire

Part 6
It is difficult for medical professionals to predict heart disease because prediction is a
challenging task that calls for skill and in-depth knowledge. This study looks at how data
mining techniques can be used to predict heart disease based on input attributes. Using the
Weka software, the authors investigated the prediction of heart disease using KStar, J48, SMO,
Bayes Net, and Multilayer Perceptron. The effectiveness of these data mining techniques is
evaluated by combining the results of predictive accuracy, ROC curve, and AUC value on a
standard data set as well as a collected data set. On these performance measures, SMO and
Bayes Net give better results than KStar, Multilayer Perceptron, and J48.

(Marjia Sultana, Afrin Haider and Mohammad Shorif Uddin, “Analysis of Data Mining
Techniques for Heart Disease Prediction”, May 2015.)

Part 7

This study compares the performance of various machine learning algorithms such as decision

tree, random forest, K-nearest neighbors, and support vector machines for predicting heart

disease. The authors used the Cleveland dataset from the UCI machine learning repository,

which contains various features related to heart disease such as age, sex, blood pressure, and

cholesterol levels. They found that the random forest algorithm outperformed the other

algorithms, achieving an accuracy of 84.26%. The study concludes that machine learning

algorithms can be effectively used for heart disease prediction.

Authors: Ahmad, T., Munir, A., & Bhatti, K.


Part 8

This study explores the use of deep learning models such as convolutional neural networks

(CNN) and recurrent neural networks (RNN) for predicting heart disease. The authors used the

Cleveland dataset and preprocessed it using techniques such as feature scaling and one-hot

encoding. They found that the CNN model achieved an accuracy of 92.89%, outperforming the

RNN model, which achieved an accuracy of 87.96%. The study concludes that deep learning

models can be effective for heart disease prediction and that CNNs may be particularly well-

suited for this task due to their ability to extract features from image data.

Authors: Kaur, H., & Singh, G.

Taken together, the studies reviewed above explore various aspects of heart disease prediction
using machine learning, including different algorithms, datasets, and preprocessing
techniques. Reviewing and synthesizing these studies gives a better understanding of the
current state of the art in this field and helps identify potential research gaps or areas for
improvement.

Chapter III. Research Methodology

3.1 Machine Learning

Machine learning is a class of algorithms that allows software applications to become more
accurate at making predictions without being explicitly programmed. It is a subset of artificial
intelligence based on the idea that a system can learn from data, identify patterns, and make
decisions to reach optimal solutions with minimal human intervention. It has quickly become
the most popular and most successful subfield of AI, a trend driven by the availability of
faster hardware and larger datasets. The two main kinds of ML algorithms are supervised
machine learning algorithms and unsupervised machine learning algorithms.

3.2 Deep Learning

Deep learning is a specific subfield of machine learning: a new take on learning

representations from data that puts an emphasis on learning successive layers of increasingly

meaningful representations. The “deep” in “deep learning” isn’t a reference to any kind of

deeper understanding achieved by the approach; rather, it stands for this idea of successive

layers of representations. How many layers contribute to a model of the data is called the

depth of the model. Other appropriate names for the field could have been layered

representations learning or hierarchical representations learning. Other approaches to machine
learning tend to focus on learning only one or two layers of representations of the data. In
deep learning, these layered representations are learned via models called neural networks,
structured in literal layers stacked on top of each other. A neural network is parameterized
by its weights.

3.3 Methodology

First, we collect a dataset containing features related to heart disease, such as age, gender,
blood pressure, and cholesterol levels. Such datasets are available from sources including the
UCI Machine Learning Repository, Kaggle, and other medical databases. We then clean and
preprocess the data, which involves removing missing or duplicate records, normalizing the
data, and converting categorical features into numerical ones.

Next, we select the most important features for training the machine learning model. This can
be done using techniques such as correlation analysis, feature importance, and principal
component analysis (PCA). After selecting the features, we choose an appropriate machine
learning algorithm for predicting heart disease, such as random forest, SVM, or decision tree
models. Once a model has been chosen, we split the dataset into training and testing sets: the
training set is used for training the model, and the testing set is used for evaluating its
performance.
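The preprocessing and splitting steps described above can be sketched in plain Python. This is only an illustration under stated assumptions: min-max scaling stands in for normalization, one-hot encoding for the categorical conversion, and a seeded shuffle for the split. The helper names, the sample values, and the 25% test fraction are hypothetical.

```python
import random

def min_max_scale(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Encode categorical values as 0/1 vectors, one position per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical feature columns for four patients.
ages = [29, 45, 61, 77]
chest_pain = ["typical", "atypical", "typical", "non-anginal"]

scaled_ages = min_max_scale(ages)
encoded = one_hot(chest_pain)
rows = [[age] + flags for age, flags in zip(scaled_ages, encoded)]
train, test = train_test_split(rows, test_fraction=0.25)
```

For simplicity the scaling here uses the whole column; a stricter setup would derive the minimum and maximum from the training set alone and apply them to the testing set.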

After training, the model is evaluated and fine-tuned. Finally, the trained model can be deployed
in a real-world scenario where it can be used for predicting heart disease in patients. This can
involve building a web-based or mobile application that takes patient data as input and outputs
the predicted probability of heart disease.

3.3.1 Proposed Model Flow Architecture


Fig: Proposed Model Flow Architecture

Chapter IV. Result and Discussion

In this study, we developed and evaluated machine learning algorithms for heart disease
prediction using a dataset of patients with and without heart disease. The dataset contains 303
instances and 14 features, including age, sex, blood pressure, cholesterol levels, and symptoms.
The dataset was preprocessed and split into training and testing sets; several machine learning
algorithms were then trained on the training set and evaluated on the testing set.

The performance of the machine learning algorithms was evaluated using accuracy, precision,
recall, and F1-score metrics. Table 1 shows the results of the evaluation on the testing set.
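
For reference, the four metrics can be computed from the counts of true and false positives and negatives. The sketch below assumes binary labels (1 for heart disease, 0 otherwise); the example labels are made up for illustration and are not the results reported in Table 1.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1-score for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a diagnostic setting, recall is often watched closely, because a false negative corresponds to a patient with heart disease who is not flagged.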

Table 1: Evaluation results on the testing set

The results show that all the algorithms performed relatively well, with the artificial neural
network achieving the highest accuracy of 95%. Among the models compared, the artificial neural
network is therefore the most effective at predicting heart disease using the given features. The
high accuracy of the model suggests that it can assist healthcare professionals in making accurate
diagnoses and can potentially reduce the number of misdiagnoses and unnecessary tests.

Additionally, the results show that feature selection and appropriate data preprocessing can
improve the accuracy and efficiency of machine learning models for heart disease prediction. In
this study, we used correlation-based feature selection and principal component analysis to
select the most relevant features and reduce the dimensionality of the dataset. The models
trained with feature selection achieved higher accuracy than those trained without it.
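
One simple way to realize correlation-based selection is to rank each feature by the absolute value of its Pearson correlation with the label and keep those above a threshold. The sketch below follows that approach; the toy columns, the column names, and the threshold of 0.3 are assumptions for illustration, and the exact selection procedure used in this study may differ.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def select_features(columns, target, threshold):
    """Keep features whose absolute correlation with the target meets the threshold."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    selected = [name for name, r in scores.items() if abs(r) >= threshold]
    return selected, scores


# Hypothetical toy columns; not the study's data.
columns = {
    "age": [63, 41, 56, 44, 67, 52],
    "chol": [233, 204, 236, 263, 229, 250],
    "record_group": [1, 1, 2, 2, 3, 3],  # unrelated to the label by construction
}
target = [1, 0, 1, 0, 1, 0]

selected, scores = select_features(columns, target, threshold=0.3)
print(selected, scores)
```

A filter of this kind considers each feature in isolation. It does not remove features that are redundant with one another, which is one reason it is often combined with a method such as PCA.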

Furthermore, the results suggest that the choice of machine learning algorithm is crucial to
achieving high accuracy in heart disease prediction. The artificial neural network outperformed
the other algorithms, suggesting that it is the most suitable of the algorithms tested for heart
disease prediction using the given features. This is consistent with previous studies that have
shown the effectiveness of artificial neural networks in healthcare applications.

Overall, the results of this study suggest that machine learning algorithms can be effective tools
for heart disease prediction and can potentially assist healthcare professionals in making accurate
diagnoses. The study highlights the importance of feature selection and appropriate data
preprocessing in machine learning models and can provide insights into the development of more
accurate and efficient diagnostic tools for heart disease.

Chapter V. Conclusion

Recent research on heart disease prediction using machine learning has produced encouraging
results. With advanced machine learning algorithms, it is now possible to predict a person's
likelihood of having heart disease based on their medical history and various risk factors.
Machine learning models such as Logistic Regression, Decision Trees, Random Forest, Support
Vector Machines (SVM), and Neural Networks can predict heart disease with high accuracy. To
produce precise predictions, these models can take into account a variety of variables, including
age, gender, blood pressure, cholesterol, and smoking habits.

In conclusion, machine learning has created new opportunities for the early detection and
prevention of heart disease. Machine learning models can analyze large datasets to find risk
factors that doctors and other healthcare professionals might not see right away. This can help
medical professionals make informed decisions about patient care, improving patient outcomes and
quality of life.
