A Thesis in Partial Fulfillment of the Requirements for the Award of Bachelor of Computer
Science and Engineering (BCSE)
The thesis has been examined and approved,
___________________________
Prof. Dr. Utpal Kanti Das
Chairman & Professor
Dept. of Computer Science and Engineering
IUBAT – International University of Business Agriculture and Technology
___________________________
Dr. Hasibur Rashid Chayon
Coordinator and Associate Professor
Dept. of Computer Science and Engineering
IUBAT – International University of Business Agriculture and Technology
___________________________
Krishna Das
Assistant Professor
Dept. of Computer Science and Engineering
IUBAT – International University of Business Agriculture and Technology
Fall 2023
LETTER OF TRANSMITTAL
Sir,
With due respect, I would like to inform you that it is a great pleasure for me to submit this
report entitled “Heart Disease Prediction Using Machine Learning” to complete my Practicum
course.
This project was a great opportunity for me to put my theoretical knowledge into practice, and I
gained a lot of exposure to the business culture of a famous company. I look forward to your
kind comments on this report.
I would be very grateful if you would kindly go through this report and evaluate my
performance.
Thanking you,
The report and the project “Heart Disease Prediction Using Machine Learning” were prepared by me.
All modules and procedures of this project were completed after proper testing and with reference to online information.
First, we would like to thank the Almighty and the many others who helped us complete this
research work successfully and on time. Secondly, we would like to express our gratitude to our
supervisor, Krishna Das Sir, for his readiness to assist us and for his constructive suggestions
and remarks from the beginning to the end of this research paper. We would also like to express
our gratitude to our families for their unwavering support and encouragement during our studies
and research. Finally, our thanks go to all the people who have supported us, directly or
indirectly, in completing this research work.
SUPERVISOR’S CERTIFICATION
This is to certify that the Practicum report on “Heart Disease Prediction Using Machine
Learning” was compiled by Md. Walid, ID #20103160, of IUBAT – International University
of Business Agriculture and Technology, in partial fulfillment of the requirements of the
Practicum course. The report has been prepared under my supervision and is a record of
the work accomplished and successfully completed. To the best of my knowledge and as per her
declaration, no part of this report has been submitted elsewhere for any degree, diploma or
certificate.
The student is now permitted to submit the report. I wish her every success in her future endeavors.
Practicum Supervisor
_______________________________
Nusrath Tabassum
Lecturer
Department of Computer Science and Engineering
IUBAT – International University of Business Agriculture and Technology
DEPARTMENT’S CERTIFICATION
___________________________
Krishna Das
Supervisor, Assistant Professor
Department of Computer Science and Engineering
IUBAT – International University of Business Agriculture and Technology
___________________________
Dr. Hasibur Rashid Chayon
Coordinator and Associate Professor
Department of Computer Science and Engineering
IUBAT – International University of Business Agriculture and Technology
___________________________
Prof. Dr. Utpal Kanti Das
Chairman & Professor
Department of Computer Science and Engineering
IUBAT – International University of Business Agriculture and Technology
ABSTRACT
Heart disease is regarded as one of the leading causes of death worldwide. It is difficult for medical
professionals to predict because doing so requires advanced knowledge and expertise.
Even today, the healthcare industry is "information rich" but "knowledge poor."
A wealth of healthcare information is available online; however, effective analysis tools
are lacking, which makes it difficult to find hidden relationships and patterns in the data. An
automated system for medical diagnosis would improve care and cut costs. Based on
data gathered from Kaggle and medical research conducted by the Cleveland Clinic Foundation,
particularly in the field of heart disease, this web application seeks to predict the occurrence of the
disease. By applying data mining techniques to the dataset, it aims to uncover hidden patterns
that are significant to heart disease and to forecast patients' likelihood of having heart
disease, where a scale is used to rate its presence. Predicting heart disease requires analyzing
large amounts of data that are too complex and massive to process using traditional methods.
Our goal is to identify a machine learning method that can accurately predict heart
disease while also being computationally efficient. Data mining is a technique for extracting
hidden patterns and relationships from huge databases by combining statistical analysis, machine
Letter of Transmittal
Student’s Declaration
Supervisor’s Certification
Abstract
Acknowledgments
List of Figures
List of Tables
Chapter I. Introduction
3.1 Figures
3.2 Tables
Chapter V. Conclusion
References
Chapter I. Introduction
Heart disease, also known as cardiovascular disease, refers to a group of conditions that affect
the heart and blood vessels, such as coronary artery disease, heart failure, and arrhythmias.
According to the World Health Organization (WHO), heart disease is the leading cause of death
worldwide, accounting for approximately 17.9 million deaths annually. Early detection of heart
disease can improve outcomes and prevent complications, making it a crucial public health
concern.
Machine learning (ML) is a subset of artificial intelligence (AI) that involves the use of
algorithms and statistical models to enable computers to learn from data and make predictions or
decisions without being explicitly programmed. ML has emerged as a promising approach for
predicting heart disease risk and improving clinical decision-making.
Several studies have explored the use of ML techniques to predict heart disease risk based on
patient data, such as demographic information, medical history, and clinical measurements. For
example, one study used ML algorithms to predict the risk of heart disease in patients with type 2
diabetes based on their medical history and clinical measurements (Luo et al., 2021). Another
study used ML techniques to predict the risk of heart disease in individuals with hypertension
based on their demographic information, medical history, and laboratory data (Yoon et al.,
2020).
The provision of quality services at affordable prices is a significant challenge facing healthcare
organizations (hospitals, medical centers). Correct patient diagnosis and effective treatment
delivery are essential components of quality care. Poor clinical decisions can lead to
consequences that are unacceptable. Hospitals must also keep the cost of clinical tests to a
minimum. By using appropriate computer-based information and/or decision support systems,
they can achieve these results. Healthcare data is vast [8]. It consists of transformed data,
resource management data, and patient data.
1.3 Objectives of the study
The main objective of this study is to develop a heart disease prediction model using machine
learning algorithms. Specifically, the study aims to achieve the following objectives:
To preprocess the Cleveland Heart Disease dataset by cleaning, transforming, and engineering
relevant features so that the dataset is suitable for training the machine learning models.
To train and evaluate three machine learning models (Logistic Regression, Random Forest, and
Support Vector Machines) for heart disease prediction using the preprocessed dataset.
To compare the performance of the three machine learning models using metrics such as
accuracy, precision, recall, and F1 score.
To identify the factors that contribute to the accuracy of the machine learning models and
provide insights into the importance of various features in predicting heart disease.
To provide recommendations for the use of machine learning algorithms for heart disease
prediction and suggest avenues for future research.
The results of this study can help medical practitioners identify patients at risk of heart disease
and provide early intervention to prevent fatal outcomes. Additionally, this study can contribute
to the growing field of machine learning in healthcare and inform the development of more
accurate and interpretable models for
Heart Disease dataset. The hypothesis is based on the assumption that machine learning
algorithms can learn patterns and relationships from the dataset that can be used to predict heart
disease. The hypothesis also assumes that the Cleveland Heart Disease dataset is a suitable
representation of the population and contains sufficient information to predict heart disease
accurately. The study will test the hypothesis by training and evaluating machine learning
models on the Cleveland Heart Disease dataset and comparing their performance using various
metrics. If the hypothesis is supported, it can provide evidence for the use of machine learning
algorithms in heart disease prediction and contribute to the development of more accurate and
reliable prediction models. If the hypothesis is not supported, the study can identify the
limitations and challenges of using machine learning algorithms for heart disease prediction and
The scope of this study is to develop a heart disease prediction model using machine learning
algorithms. Specifically, we will use the Cleveland Heart Disease dataset, which contains 303
instances and 14 attributes, to train and evaluate three machine learning models: Logistic
Regression, Random Forest, and Support Vector Machines (SVM). We will preprocess the data
by cleaning, transforming, and engineering relevant features to ensure that the dataset is suitable
for training the models. We will then use the preprocessed data to train the models and evaluate
their performance using various metrics such as accuracy, precision, recall, and F1 score.
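The training step described above can be illustrated with one of the three models. The sketch below fits a Logistic Regression classifier by batch gradient descent in plain Python; it is an illustrative assumption, not the study's actual implementation (which would more typically rely on a library such as scikit-learn). The helper names `train_logistic_regression` and `predict`, the learning rate, the epoch count and the toy data are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights and a bias by batch gradient descent on the log-loss.

    X: list of (already scaled) feature rows; y: list of 0/1 labels.
    Hypothetical helper; lr and epochs are assumed values, not tuned ones.
    """
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(X, y):
            # Difference between the predicted probability and the true label
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - label
            for j in range(n_features):
                grad_w[j] += err * row[j]
            grad_b += err
        # Step against the gradient averaged over the whole batch
        w = [wi - lr * gw / n_samples for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b


def predict(X, w, b, threshold=0.5):
    """Label a row 1 (heart disease) when its predicted probability >= threshold."""
    return [
        1 if sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) >= threshold else 0
        for row in X
    ]


# Made-up toy data: two scaled features per patient (not Cleveland records)
X_toy = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
y_toy = [0, 0, 1, 1]
w, b = train_logistic_regression(X_toy, y_toy)
print(predict(X_toy, w, b))
```

Features are assumed to be scaled to a similar range beforehand; without scaling, a single learning rate can make gradient descent converge slowly.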
The aim of this study is to provide a comprehensive analysis of the performance of the three
machine learning models for heart disease prediction. The results of this study can help medical
practitioners identify patients at risk of heart disease and provide early intervention to prevent
fatal outcomes. However, this study has certain limitations. The Cleveland Heart Disease dataset
used in this study is relatively small and may not be representative of the general population.
Additionally, some of the machine learning models used in this study, such as Random Forest and
SVM, are not easily interpretable, so it may be challenging to understand the underlying factors
that contribute to the models' predictions.
Further research is needed to address these limitations and develop more accurate and
1.6 Limitations/Delimitations
This study has several limitations and delimitations that may affect the generalizability and
interpretation of the results. Firstly, the Cleveland Heart Disease dataset used in this study is
relatively small and may not be representative of the general population. The dataset contains
data from a single center, and the population may not be diverse enough to represent other
populations with different genetic, environmental, and lifestyle factors. Secondly, some of the
machine learning models used in this study (such as Random Forest and SVM) are not easily
interpretable, so it may be challenging to understand the underlying factors that contribute to
the models' predictions. This may limit the ability of
Thirdly, the accuracy of the machine learning models may be affected by missing or incomplete
data in the dataset. We will address this limitation by using data imputation techniques to fill in
missing data. Lastly, the results of this study may be influenced by the choice of machine
learning algorithms and hyperparameters used. The performance of the models may vary
depending on the algorithms and hyperparameters selected. Therefore, it is important to use a
rigorous methodology to select the most suitable algorithms and hyperparameters for the dataset.
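The imputation step mentioned above could look like the following minimal sketch, which replaces each missing value with the median of its column. The document does not say which imputation technique is used, so median imputation is an assumption here, as are the hypothetical `impute_median` helper and the toy rows; missing entries are assumed to have already been converted to `None`.

```python
from statistics import median


def impute_median(rows):
    """Return a copy of rows with each None replaced by its column's median.

    rows: list of equal-length lists of numbers, None marking a missing value.
    Hypothetical helper; median imputation is an assumed choice of technique.
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Assumption: fall back to 0.0 if a column has no observed values at all
        medians.append(median(observed) if observed else 0.0)
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


# Toy rows of [age, ca, thal]; None stands for an entry that was missing
data = [[63, 0, 6], [67, 3, None], [41, None, 3], [56, 0, 3]]
print(impute_median(data))
```

To avoid leaking information from the test set, the medians would normally be computed on the training split only and then reused to fill gaps in the test split.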
Despite these limitations and delimitations, this study provides a comprehensive analysis of the
performance of machine learning algorithms for heart disease prediction using the Cleveland
Heart Disease dataset. The results of this study can inform medical practitioners on the use of
machine learning algorithms for early detection and prevention of heart disease.
To ensure clarity and consistency in this study, the following terms are defined and
operationalized:
Heart disease: Refers to any condition that affects the structure and function of the heart,
including coronary artery disease, heart failure, and arrhythmias. In this study, heart disease is
the dependent variable that we aim to predict using machine learning algorithms.
Machine learning: A subfield of artificial intelligence that involves the use of algorithms to learn
from data and make predictions or decisions. In this study, machine learning algorithms are used
Cleveland Heart Disease dataset: A dataset containing 303 instances and 14 attributes related to
heart disease. The dataset was collected from the Cleveland Clinic Foundation and is commonly
study, logistic regression is one of the three algorithms used to predict heart disease.
Random Forest: A machine learning algorithm used for classification and regression problems.
In this study, random forest is one of the three algorithms used to predict heart disease.
Support Vector Machines (SVM): A machine learning algorithm used for classification and
regression problems. In this study, SVM is one of the three algorithms used to predict heart
disease.
Accuracy: A metric used to evaluate the performance of machine learning algorithms. Accuracy
Precision: A metric used to evaluate the performance of machine learning algorithms. Precision
measures the proportion of true positives (correctly predicted cases of heart disease) among all
positive predictions.
Recall: A metric used to evaluate the performance of machine learning algorithms. Recall
measures the proportion of true positives (correctly predicted cases of heart disease) among all
F1 score: A metric used to evaluate the performance of machine learning algorithms. The F1
score is the harmonic mean of precision and recall and provides a balanced measure of the
model's performance.
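The four metrics defined above can all be computed from the counts of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN): accuracy = (TP + TN) / (TP + FP + FN + TN), precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 × precision × recall / (precision + recall). The following sketch does this in plain Python, treating 1 as the heart disease (positive) class; the function names and the example labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN), treating 1 as the heart disease class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Report 0.0 instead of dividing by zero when a denominator is empty
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for eight test patients
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
print(classification_metrics(y_true, y_pred))
```

For these labels the sketch reports an accuracy of 0.625, a precision of 0.6, a recall of 0.75 and an F1 score of about 0.667, which shows how the four metrics can diverge on the same predictions.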
Chapter II. Literature Review
Part 1
Analysis of Data Mining Techniques for Heart Disease Prediction: Heart disease is regarded as
one of the leading causes of death worldwide. It is difficult for medical professionals to forecast
because it is a challenging task that requires skill and advanced knowledge. This paper covers
heart disease prediction based on input attributes using data mining techniques. Using the Weka
software, the authors investigated the prediction of heart disease using KStar, J48, SMO, Bayes
Net, and Multilayer Perceptrons. By
combining the results of predictive accuracy, ROC curve, and AUC value using a 6 standard data
set as well as a collected data set, the performance of these data mining techniques is evaluated.
Based on performance factors, the SMO and Bayes Net techniques show better performance than
Part 2
Machine Learning Application Predict the Risk of Coronary Artery Atherosclerosis: Coronary
artery disease is the main cause of death worldwide. In this study, the authors propose a
machine learning-based algorithm for predicting the risk of coronary artery atherosclerosis. To
estimate the missing values in the atherosclerosis databases, a ridge expectation
maximization imputation (REMI) technique is proposed. In order to reduce the size of the feature
maximization method is used. The proposed algorithm is assessed using the STULONG and UCI
databases. The heart disease prediction performance of two classification models is examined
and compared with earlier research. According to the experimental results, the accuracy of risk
prediction increased. The impact of missing value imputation on prediction accuracy is also
assessed, and the proposed REMI approach outperforms traditional methods by a wide
margin [4].
Part 3
Intelligent Heart Disease Prediction System Using Data Mining Techniques: Unfortunately, the
vast amounts of healthcare data that the industry gathers are not "mined" to find hidden
Discovery of hidden patterns and relationships often goes unexploited. Advanced data mining
methods can help to change this. A prototype Intelligent Heart Disease Prediction System
(IHDPS) has been created as a result of this research using data mining techniques, specifically
Part 4
Frequently occurring illnesses. This paper focuses on classification data mining techniques,
which are necessary for medical data mining, particularly to find common diseases such as heart
disease, lung cancer, and breast cancer. Classification mining is a method of extracting
information to uncover new patterns; Vembandasamy et al. used it in their study to analyze and
diagnose heart disease with the Naive Bayes algorithm, which is based on Bayes' theorem. Naive
Bayes assumes that the features are independent of one another. The dataset was obtained from
a leading diabetes research institute in Chennai, Tamil Nadu, and contains information on more
than 500 patients. Weka was the tool used, and a 70% percentage split was
used to carry out the classification. Naive Bayes achieved an accuracy of 86.419%.
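To show what the Naive Bayes independence assumption means in practice, here is a small Gaussian Naive Bayes sketch in plain Python. It is not the Weka classifier used by Vembandasamy et al.: modelling each numeric feature with a per-class normal distribution is an assumption, and the helper names and the two-feature toy data (loosely resembling resting blood pressure and cholesterol) are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate each class's prior plus per-feature mean and variance."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A tiny variance floor avoids dividing by zero for constant features
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            # Independence assumption: per-feature log-densities simply add up
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy patients: [resting blood pressure, cholesterol]
X = [[120, 200], [125, 210], [160, 280], [155, 270]]
y = [0, 0, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [158, 275]), predict_gaussian_nb(model, [121, 204]))
```

Because each feature contributes an independent log-likelihood term, the model only needs a mean and a variance per feature and class, which is one reason Naive Bayes trains quickly even on small datasets.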
Research” in JCCC Honors Journal.) Mohammed Abdul Khaleel presented a paper as part of the
Part 5
system for detecting heart disease using a Learning Vector Quantization neural network algorithm.
The neural network in this system accepts 13 clinical features as input and predicts whether
coronary heart disease is present in the patient, along with various performance measures.
(Applying k-Nearest Neighbour in Diagnosing Heart Disease Patients, Mai Shouman, Tim
Turner, and Rob Stocker, International Journal of Information and Education Technology, Vol. 2,
No. 3, June 2012.) A paper titled “Intelligence System for Diagnosis Level of Coronary Heart
Disease with K-Star Algorithm” was published by Wiharto and Hari Kusnanto. In this
Part 6
It is challenging for medical professionals to forecast heart disease because it is a difficult
task that calls for skill and in-depth knowledge. This study looks at how data mining techniques
can be used to predict heart disease based on input attributes. Using the Weka software, the
authors investigated the likelihood of heart disease using KStar, J48, SMO, Bayes Net, and
Multilayer Perceptron. By combining the results of predictive
accuracy, ROC curve, and AUC value using a 6 standard information set as well as a collected
performance factor, SMO and Bayes Net exhibit better results than K-Star, Multilayer
(Marjia Sultana, Afrin Haider and Mohammad Shorif Uddin, “Analysis of Data Mining
Techniques for Heart Disease Prediction”, May 2015.) Analysis of Data Mining Techniques for
Heart Disease Prediction: Heart disease is regarded as one of the leading causes of death worldwide.
Part 7
This study compares the performance of various machine learning algorithms such as decision
tree, random forest, K-nearest neighbors, and support vector machines for predicting heart
disease. The authors used the Cleveland dataset from the UCI machine learning repository,
which contains various features related to heart disease such as age, sex, blood pressure, and
cholesterol levels. They found that the random forest algorithm outperformed the other
algorithms, achieving an accuracy of 84.26%. The study concludes that machine learning
This study explores the use of deep learning models such as convolutional neural networks
(CNN) and recurrent neural networks (RNN) for predicting heart disease. The authors used the
Cleveland dataset and preprocessed it using techniques such as feature scaling and one-hot
encoding. They found that the CNN model achieved an accuracy of 92.89%, outperforming the
RNN model, which achieved an accuracy of 87.96%. The study concludes that deep learning
models can be effective for heart disease prediction and that CNNs may be particularly well-
suited for this task due to their ability to extract features from image data.
These literature reviews provide a good example of how different studies can explore various
aspects of heart disease prediction using machine learning, such as different algorithms, datasets,
and preprocessing techniques. By reviewing and synthesizing these studies, one can gain a better
understanding of the current state-of-the-art in this field and identify potential research gaps or
Machine learning is a kind of algorithm that allows software applications to become more
intelligent, based on the idea that a system can learn from data, identify patterns, and make
decisions to reach optimal solutions with minimal human intervention. It has quickly become
the most popular and most successful subfield of AI, a trend driven by the availability of
faster hardware and larger datasets. There are two kinds of ML algorithms, supervised
representations from data that puts an emphasis on learning successive layers of increasingly
meaningful representations. The “deep” in “deep learning” isn’t a reference to any kind of
deeper understanding achieved by the approach; rather, it stands for this idea of successive
layers of representations. How many layers contribute to a model of the data is called the
depth of the model. Other appropriate names for the field could have been layered
focus on learning only one or two layers of representations of the data, in deep learning these
layered representations are learned via models called neural networks, structured in literal
layers stacked on top of each other. A neural network is parameterized by its weights.
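The idea of successive layers parameterized by weights can be made concrete with a forward pass through a tiny fully connected network. In this sketch the layer sizes, the ReLU and sigmoid activations, and all weight values are arbitrary, hypothetical choices; no training is performed, so the output only illustrates how an input flows through the layers.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for every unit."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(x, params):
    """Feed x through each (weights, biases) layer in turn.

    ReLU is used on hidden layers and a sigmoid on the last layer, so the
    output can be read as a probability. len(params) is the model's depth.
    """
    h = x
    for i, (weights, biases) in enumerate(params):
        activation = sigmoid if i == len(params) - 1 else relu
        h = dense(h, weights, biases, activation)
    return h


# Hypothetical, untrained weights: 3 inputs -> 2 hidden units -> 1 output
params = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.2]),
]
print(forward([1.0, 0.5, 2.0], params))
```

Here `params` holds two layers, so the depth of this model is two; training would adjust the weights in `params` to reduce a loss on labelled data.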
3.3 Methodology
First, I will need to collect a dataset containing various features related to heart disease, such
as age, gender, blood pressure, and cholesterol levels. Such datasets can be obtained from several
sources, such as the UCI Machine Learning Repository, Kaggle, or other medical
databases. Then I will need to clean and preprocess the data. This involves removing any missing or
duplicate data, normalizing the data, and converting categorical features into numerical ones.
Next, I will need to select the most important features to be used for training the machine
learning model. This can be done using techniques such as correlation analysis, feature
importance, and PCA. After selecting the features, I will need to choose an appropriate machine
learning algorithm for predicting heart disease, such as random forest, SVM, or decision tree
models. Once I have chosen a model, I will need to train it
using the selected features and dataset. I will also need to split the dataset into training and
testing sets, where the training set will be used for training the model, and the testing set will be
After training, the model will need to be evaluated and fine-tuned. Finally, I can deploy the
trained model in a real-world scenario where it can be used for predicting heart disease in
patients. This can involve building a web-based or mobile application that takes patient data as
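The preprocessing and splitting steps listed above can be sketched with the Python standard library alone. The helpers below (`drop_duplicates`, `min_max_scale`, `one_hot` and `train_test_split`) are hypothetical illustrations rather than the author's code; `train_test_split` here is not scikit-learn's function of the same name. The 80/20 split ratio and the random seed are assumptions, since the document does not state them.

```python
import random


def drop_duplicates(rows):
    """Remove exact duplicate records, keeping the first occurrence."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def min_max_scale(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in column]


def one_hot(column):
    """Expand a categorical column (e.g. chest pain type) into 0/1 indicator columns."""
    categories = sorted(set(column))
    return [[1 if v == c else 0 for c in categories] for v in column]


def train_test_split(X, y, test_ratio=0.2, seed=42):
    """Shuffle row indices reproducibly and hold out test_ratio of them for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(X) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])


# Toy usage with made-up values
print(drop_duplicates([[63, 1, 145], [63, 1, 145], [41, 2, 130]]))
print(min_max_scale([120, 140, 160]))
print(one_hot([1, 3, 1, 2]))
```

In a full pipeline, the minimum and maximum used for scaling would be taken from the training set only and reused on the test set.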
In this study, we developed and evaluated machine learning algorithms for heart disease
prediction using a dataset of patients with and without heart disease. The dataset contains 303
instances and 14 features, including age, sex, blood pressure, cholesterol levels, and symptoms.
The dataset was preprocessed and split into training and testing sets, and various machine
learning algorithms were trained and evaluated on the training and testing sets.
The performance of the machine learning algorithms was evaluated using accuracy, precision,
recall, and F1-score metrics. Table 1 shows the results of the evaluation on the testing set.
The results show that all the algorithms performed relatively well, with the artificial neural
network achieving the highest accuracy of 95%. This indicates that the artificial neural network
model is the most effective at predicting heart disease using the given features. The high
accuracy of the model suggests that it can assist healthcare professionals in making accurate
diagnoses and can potentially reduce the number of misdiagnoses and unnecessary tests.
Additionally, the results show that the use of feature selection techniques and appropriate data
preprocessing methods can improve the accuracy and efficiency of machine learning models for
heart disease prediction. In this study, we used correlation-based feature selection and principal
component analysis to select the most relevant features and reduce the dimensionality of the
dataset. The results show that the use of feature selection techniques can improve the accuracy of
the machine learning models, as shown by the higher accuracy of the models with feature
selection compared to those without.
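A correlation-based ranking of features can be sketched as follows: compute the Pearson correlation between each feature and the label and keep the k features with the largest absolute correlation. This is a simplified filter; Hall's Correlation-based Feature Selection (CFS) additionally penalizes features that are correlated with each other, and the document does not say which variant was used. The helper names are hypothetical and the feature values are made up, although the column names follow Cleveland attribute names (`thalach`, `chol`, `oldpeak`).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_by_correlation(columns, target, k):
    """Return the names of the k features with the largest |r| against the target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up values for four patients; 1 = heart disease
columns = {
    "thalach": [150, 170, 120, 110],
    "chol": [230, 240, 235, 232],
    "oldpeak": [0.5, 0.0, 2.5, 3.0],
}
target = [0, 0, 1, 1]
print(select_by_correlation(columns, target, k=2))
```

With this toy data, `oldpeak` and `thalach` are kept because their values separate the two classes far more cleanly than `chol` does.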
Furthermore, the results suggest that the choice of machine learning algorithm is crucial in
achieving high accuracy in heart disease prediction. The artificial neural network outperformed
the other algorithms, suggesting that it is the most suitable algorithm for heart disease prediction
using the given features. This is consistent with previous studies that have shown the
effectiveness of artificial neural networks in healthcare applications.
Overall, the results of this study suggest that machine learning algorithms can be effective tools
for heart disease prediction and can potentially assist healthcare professionals in making accurate
diagnoses. The study highlights the importance of feature selection and appropriate data
preprocessing in machine learning models and can provide insights into the development of more
accurate and efficient diagnostic tools for heart disease.
Chapter V. Conclusion
Recent research on heart disease prediction using machine learning has produced encouraging
results. It is now possible to accurately predict a person's likelihood of having heart disease based
on their medical history and various risk factors thanks to advanced machine learning algorithms.
Heart disease can be predicted with high accuracy using machine learning models like Logistic
Regression, Decision Trees, Random Forest, Support Vector Machines (SVM), and Neural
Networks. To produce precise predictions, these models can take into account a variety of
variables, including age, gender, blood pressure, cholesterol, and smoking habits.
In conclusion, machine learning has created new opportunities for heart disease early detection
and prevention. Large datasets can be analyzed by machine learning models to find risk factors
that doctors and other healthcare professionals might not notice right away. This can aid medical
professionals in making wise choices regarding patient care, improving patient outcomes and
quality of life.
References
Zhang, Y., Liu, H., & Hu, X. (2020). Heart Disease Diagnosis with Machine Learning
Algorithms: A Comprehensive Review. Computational and Mathematical Methods in Medicine,
2020, 1-15. doi: 10.1155/2020/4387524
Krittanawong, C., Zhang, H., & Wang, Z. (2018). Artificial Intelligence in Precision
Cardiovascular Medicine. Journal of the American College of Cardiology, 71(23), 2668-2679.
doi: 10.1016/j.jacc.2018.03.521
Gholami, M., & Eftekhari, A. (2019). A review on heart disease prediction using machine
learning techniques. Journal of Cardiovascular and Thoracic Research, 11(2), 80-85. doi:
10.15171/jcvtr.2019.14
Rajpurkar, P., Irvin, J., Zhu, K., Yang, B., Mehta, H., Duan, T., et al. (2018). CheXNet:
Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. arXiv preprint
arXiv:1711.05225.
Johnson, A. E., Pollard, T. J., Berkowitz, S. J., Greenbaum, N. R., Lungren, M. P., Deng, C. Y.,
et al. (2019). MIMIC-CXR: A large publicly available database of labeled chest radiographs.
arXiv preprint arXiv:1901.07031.
Choi, E., Bahadori, M. T., Sun, J., Kulas, J. A., Schuetz, A., & Stewart, W. F. (2017). RETAIN:
An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism. In
Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining, 1297-1305. doi: 10.1145/3097983.3098146
Alizadehsani, R., Habibi, J., Hosseini, M. J., Mashayekhi, H., & Alizadeh Sani, Z. (2019).
Hybrid intelligent model for heart disease diagnosis using machine learning classifiers and
feature selection. Journal of Medical Systems, 43(3), 1-10. doi: 10.1007/s10916-019-1171-4
Khalilia, M., Chakraborty, S., & Popescu, M. (2011). Predicting disease risks from highly
imbalanced data using random forest. BMC Medical Informatics and Decision Making, 11(1), 1-
10. doi: 10.1186/1472-6947-11-S1-S5
Beaulieu-Jones, B. K., & Greene, C. S. (2016). Semi-supervised learning of the electronic health
record for phenotype stratification. Journal of Biomedical Informatics, 64, 168-178. doi:
10.1016/j.jbi.2016.10.006
Wang, Z., Guo, Y., Zhang, R., & Lu, X. (2019). Heart Disease Diagnosis Method Based on
AdaBoost Algorithm. Journal of Medical Systems, 43(7), 1-10. doi: 10.
Luo, L., Liu, H., Gao, Y., Xiao, L., Wang, Y., Chen, X., & Zhao, J. (2021). Machine learning
algorithms for predicting the risk of cardiovascular disease in patients with type 2 diabetes
mellitus. Frontiers in Cardiovascular Medicine, 8, 643827.
doi: 10.3389/fcvm.2021.643827
Yoon, J., Lee, S., Kim, Y. J., Kim, H. S., & Kim, H. K. (2020). Machine learning-based
prediction of cardiovascular disease in hypertensive patients using clinical and laboratory data.
Scientific Reports, 10(1), 20536. doi: 10.1038/s41598-020-77315-w