
Heart Disease Prediction using Machine Learning Techniques





Submitted By Submitted To
Amit Kumar CSJMA20001390011 Dr. VINEETA SINGH

Anuj Kumar CSJMA20001390013

Deepankaj Sharma CSJMA20001390020


This is to certify that the project entitled “Heart Disease Prediction Using Machine Learning
Techniques”, submitted by Amit Kumar (CSJMA20001390011), Anuj Kumar
(CSJMA20001390013), and Deepankaj Sharma (CSJMA20001390020), is a record of their own work,
carried out under my supervision.

Dr. Deepak Kumar Verma Dr. Vineeta Singh
Computer Science and Engineering Computer Science and Engineering
UIET, C.S.J.M University, UIET C.S.J.M University,
Kanpur Kanpur


We would like to express our heartfelt gratitude to all the individuals who played a crucial role in the
research for this project. Without their active cooperation, the preparation of this project could not
have been completed within the specified time limit.

It is with great pleasure that we present this report on the project named “Heart Disease Prediction
Using Machine Learning Techniques” undertaken as part of our B.Tech (CSE) curriculum. We are
thankful to CSJM University for providing us with such a wonderful and challenging opportunity.

We extend warm thanks to our project guide, Dr. Vineeta Singh, for motivating us to complete this
project with complete focus and attention. She supported us throughout this project with utmost
cooperation and patience, helping us to bring this project to completion.

We wish to express our sincere thanks to Dr. Deepak Kumar Verma, Head of the Department of
Computer Science & Engineering, UIET, CSJM University, Kanpur, for moral support in spite of
a busy schedule.
Thank You…

With Gratitude
Amit Kumar (CSJMA20001390011)
Anuj Kumar (CSJMA20001390013)
Deepankaj Sharma (CSJMA20001390020)


The prevalence of heart diseases continues to be a significant global health concern, necessitating the
exploration of advanced technologies for early detection and prediction. This project presents a
comprehensive study on "Heart Disease Prediction Using Machine Learning Techniques." The
primary objective is to develop a predictive model that can assist in identifying individuals at risk of
heart diseases based on various clinical and demographic features.

The methodology involves the collection of relevant health data, preprocessing steps to handle
missing values and outliers, and the selection of essential features for model development. Machine
learning algorithms, including but not limited to Decision Trees, Random Forest, and Support Vector
Machines, are employed to train the predictive model.

The results obtained from the model are rigorously evaluated using appropriate metrics, providing
insights into the model's accuracy, sensitivity, and specificity. A comparative analysis with existing
models is conducted to assess the effectiveness of the proposed approach. The discussion section
interprets the findings, highlights the strengths and limitations of the model, and suggests potential
avenues for future research.

This project contributes to the ongoing efforts in leveraging machine learning for proactive healthcare
management. The developed model holds promise in aiding healthcare professionals in the early
identification of individuals susceptible to heart diseases, thereby facilitating timely intervention and
prevention strategies.

Keywords: Heart Disease, Machine Learning, Predictive Modelling, Healthcare, Feature Selection,
Data Analysis, Machine Learning Techniques, Ensemble Learning.


Acknowledgement

Chapter 1 Introduction

Chapter 2 Problem Statement and Objective

Chapter 3 Literature Review

Chapter 4 Methodology and Implementation

4.1 Data Collection
4.2 Data Preprocessing
4.3 Feature Engineering
4.4 Exploratory Data Analysis (EDA)
4.5 Feature Selection
4.6 Model Selection
4.7 Cross-Validation
4.8 Performance Evaluation
4.9 Model Interpretability
4.10 Real-Time Deployment
4.11 User Interface Development
4.12 Continuous Monitoring and Improvement

Chapter 5 Expected Outcomes

Chapter 6 Conclusion and Future Work


Chapter 1 Introduction
Cardiovascular diseases (CVDs) are the leading cause of death worldwide, and by 2030 they are
expected to cause even more deaths. They are expensive too, costing about USD 3.7 trillion between
2010 and 2015. Detecting heart disease is difficult because the tests are costly, which affects both
people's health and how much organizations spend.
It is crucial to find heart disease early. Data mining and machine learning help predict heart
disease risk, but the task is complex: factors such as diabetes, high blood pressure, high
cholesterol, and irregular pulse make accurate prediction hard.
Data mining finds hidden patterns that help doctors diagnose diseases, and machine learning helps
predict diseases better. However, earlier studies had trouble predicting heart disease accurately.
Our main goal is to predict heart disease accurately using different machine learning methods. We
used random forest, decision tree, boosting, and ensemble techniques on a dataset from Kaggle. We
cleaned the data to improve our models. We hope to improve on past studies by using a larger and
more diverse dataset to get more reliable results.

Chapter 2 Problem Statement and Objective

Cardiovascular diseases remain a significant global health concern, contributing to a substantial
number of fatalities each year. Early detection and intervention are pivotal in mitigating the impact of
these diseases. The project, "Heart Disease Prediction Using Machine Learning Techniques,"
addresses the critical need for an efficient and accurate predictive model to assess the risk of heart
diseases based on individual health parameters.
Problem Statement:
Despite advancements in medical science, predicting the onset of heart diseases with precision
remains challenging. Traditional methods often lack the sophistication to analyze complex
relationships among diverse health factors. There is a pressing need for a comprehensive and reliable
predictive model that leverages machine learning techniques to enhance accuracy and provide a
timely assessment of cardiovascular risk.
Objective:
The primary objective of this project is to develop an advanced machine learning-based predictive
model capable of accurately identifying the likelihood of heart diseases. The project aims to:

1. Gather a diverse and comprehensive dataset, incorporating various health parameters.

2. Employ feature engineering techniques to enhance dataset informativeness.
3. Identify influential features for heart disease prediction.
4. Utilize machine learning algorithms to construct an effective predictive model.
5. Rigorously evaluate model performance using metrics such as accuracy, precision, recall, and
F1 score.
6. Enhance model interpretability for clear understanding of influencing factors.
7. Deploy the model in real-time environments for timely predictions.
8. Develop a user-friendly interface facilitating seamless integration into clinical workflows.

The overarching goal is to contribute to healthcare by providing a valuable tool for early detection of
heart diseases, thereby improving patient outcomes and reducing cardiovascular-related mortality.
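
The evaluation metrics named in the objectives (accuracy, precision, recall, F1 score) can all be derived from the four confusion-matrix counts. The following is a minimal sketch in plain Python, not the project's actual evaluation code; the function names and the toy labels are hypothetical and chosen only for illustration. Specificity is included because the abstract mentions it alongside sensitivity (recall).

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = disease)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Derive the standard metrics; a zero denominator yields 0.0."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hand-made toy labels (not project data): 1 = heart disease, 0 = no disease.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(*confusion_counts(y_true, y_pred))
print(metrics)
```

In a screening setting, recall is often watched most closely, since a false negative corresponds to a missed at-risk patient.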

Chapter 3
Literature Review: Enhancing Predictive Health Analysis through
Machine Learning

Various studies have explored machine learning applications for predicting heart diseases.
Widespread applications include clinical decision support, diagnostics, treatment decisions, fraud
detection, and prevention (Bardhwaj et al., 2017; Shailaja et al., 2018; Sun et al., 2019; Lee &
Yoon, 2017). While general health applications have been discussed, our focus narrows to heart
disease diagnosis.

Tripoliti et al. (2017) extensively reviewed machine learning methodologies for heart failure,
addressing severity estimation and predicting re-hospitalization, mortality, and destabilizations. Other
studies used supervised classifiers, such as Naïve Bayes and Decision Trees, achieving high accuracy
in predicting heart diseases (J. & S., 2019; Kamal kant et al., 2014).

Notably, the Naïve Bayes algorithm has been highlighted as effective in heart disease prediction, with
research favoring it over other methods like Neural Networks and Decision Trees (Kamal kant et al.,
2014; Nidhi Bhatla et al., 2012). Additionally, ensemble approaches combining multiple classifiers
(Mustafa et al., 2018) and optimized Support Vector Machines (Dolatabaddi et al., 2017) have
shown promise.

The literature underscores the reliability of data mining algorithms, with SVM, Naïve Bayes,
Decision Trees, Bagging and Boosting, and Random Forest achieving high accuracy in heart disease
prediction (Jan et al., 2018). Many models using these algorithms have demonstrated efficacy, paving
the way for our research objective to explore and build an optimized model for heart disease prediction.

Chapter 4 Methodology and Implementation

The project's methodology involves the following steps:
1. Data Collection: Gather a dataset containing features such as age,
blood pressure, cholesterol levels, and other health-related indicators.
2. Data Preprocessing: Conduct thorough data cleaning, handling missing values, and
addressing outliers. Implement techniques such as normalization and scaling to
ensure uniformity across diverse features.
3. Feature Engineering: Apply advanced feature engineering techniques to extract
relevant patterns and relationships from the dataset. This involves transforming and
creating new features to enhance the model's predictive capabilities.
4. Exploratory Data Analysis (EDA): Perform in-depth EDA to gain insights into data
distributions, correlations, and potential hidden patterns. This step guides further
decisions in feature selection and model development.
5. Feature Selection: Employ rigorous feature selection methods, including statistical
tests and model-based techniques, to identify the most influential factors contributing
to heart disease prediction.
6. Model Selection: Explore a range of machine learning algorithms such as logistic
regression, decision trees, random forests, and gradient boosting to identify the most
suitable model. Fine-tune hyperparameters to optimize model performance.
7. Cross-Validation: Implement cross-validation techniques to ensure the model's
robustness and generalizability. This includes techniques like k-fold cross-validation
to assess performance across different subsets of the dataset.
8. Performance Metrics: Evaluate the model using a comprehensive set of
performance metrics, including accuracy, precision, recall, F1 score, and area under
the ROC curve (AUC-ROC), providing a holistic view of the model's effectiveness.
9. Model Interpretability: Employ interpretability tools, such as SHAP (SHapley
Additive exPlanations), LIME (Local Interpretable Model-agnostic Explanations), or
feature importance plots, to enhance understanding of the model's decision-making
process.
10. Real-time Deployment: Once the model's performance is satisfactory, deploy the
trained model in a real-time production environment, ensuring scalability and
responsiveness. Integrate the model into healthcare systems to facilitate timely
predictions.
11. User Interface Development: Design a user-friendly interface allowing healthcare
professionals to input patient data seamlessly and interpret model predictions. Ensure
the interface adheres to usability standards and promotes user trust.
12. Continuous Monitoring and Improvement: Implement mechanisms for continuous
model monitoring in production. Incorporate feedback loops and update the model as
needed to adapt to changing healthcare dynamics.
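
The preprocessing step in the list above (handling missing values, then scaling) can be sketched in a few lines. This is an illustrative example only, assuming median imputation and z-score standardization as the chosen techniques; the source does not state which methods the project uses, and the function names and cholesterol values are hypothetical.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical cholesterol readings with one missing value.
cholesterol = [200, None, 240, 180, 220]
imputed = impute_median(cholesterol)
scaled = standardize(imputed)
print(imputed)
print(scaled)
```

When a held-out test set is used, the median, mean, and standard deviation would normally be computed on the training portion only and then applied to the test portion, so that no information leaks from the test data into the model.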

This comprehensive methodology aims to deliver a state-of-the-art predictive model that not
only accurately assesses the risk of heart diseases but also aligns with the evolving needs of
healthcare practitioners for effective patient care.
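
To make the cross-validation step of the methodology concrete, here is a minimal sketch of how k-fold splitting partitions a dataset, written in plain Python. The helper name `k_fold_indices` and its parameters are assumptions made for this illustration; a real project would more likely rely on an existing machine learning library for this.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train_idx, test_idx


# Each sample lands in the test fold exactly once across the k rounds.
splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in splits:
    print("train:", sorted(train_idx), "test:", sorted(test_idx))
```

The model is trained k times, each time on the training indices and scored on the held-out fold, and the k scores are averaged. For datasets where the disease and non-disease classes are imbalanced, a stratified variant that preserves class proportions in each fold is commonly preferred.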

Working Flow

[Figure: Flow diagram of the machine learning model]

The flow chart outlines the sequential steps involved in predicting the
probability of heart disease. It depicts the data preprocessing procedures, such as
cleaning, removing duplicate entries, detecting outliers, and scaling data. It also emphasizes the
use of ensemble learning algorithms and hyperparameter tuning to train and
optimize the model. The flow chart serves as a visual representation of the study’s methodology,
aiding in the comprehension of the research process.
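
Two of the preprocessing stages named in the flow description, removing duplicate entries and detecting outliers, can be sketched as follows. The interquartile-range (IQR) rule used here is an assumption for illustration; the source does not say which outlier test the project applies, and the function names and sample values are hypothetical.

```python
from statistics import quantiles


def drop_duplicates(rows):
    """Keep the first occurrence of each record, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def iqr_outlier_mask(values, factor=1.5):
    """Flag values lying more than factor * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v < low or v > high for v in values]


# Hypothetical patient records: (age, sex, resting blood pressure).
records = [(63, 1, 145), (63, 1, 145), (54, 0, 130)]
print(drop_duplicates(records))

# Hypothetical blood pressure readings; 300 is an implausible entry.
pressures = [120, 130, 125, 135, 128, 300]
print(iqr_outlier_mask(pressures))
```

Whether a flagged value is dropped, capped, or kept is a separate decision; in clinical data an extreme reading can be a genuine signal rather than an error.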

Chapter 5 Expected Outcomes

The project aims to deliver a highly accurate predictive model for early heart disease detection,
contributing to improved patient outcomes and reduced healthcare burdens. The expected outcomes
include precise risk assessments, widespread adoption among healthcare professionals through a
user-friendly interface, and real-time predictions. The model's interpretability features aim to foster
user trust, and continuous improvement mechanisms ensure adaptability to changing healthcare
dynamics. Ultimately, the project aspires to make a positive impact on public health by aiding in the
prevention and early management of heart diseases.

Chapter 6 Conclusion and Future Work
Work to be done in the next semester

1. Consider incorporating various types of data sources beyond traditional health records. This
might involve integrating genetic data, lifestyle factors (like diet and exercise patterns),
environmental information, and wearable device data (such as heart rate variability). By
combining multiple data modalities, we could potentially improve the accuracy and
robustness of the predictive models, offering a more comprehensive understanding of an
individual's risk factors for heart disease. This approach could involve challenges related to
data fusion, feature engineering, and model adaptation to handle diverse data types effectively.
2. Evaluate and discuss the impact of ensemble learning techniques on the predictive accuracy,
generalization, and stability of heart disease prediction models. Compare the ensemble
approach with single-model methods, highlighting the strengths and limitations of each
technique. We will also tune hyperparameters for the individual base learners and the
ensemble methods, using techniques such as grid search or randomized search, to enhance
predictive accuracy.
This future work would aim to demonstrate how ensemble learning methods can enhance the
predictive power of heart disease detection models compared to individual machine learning
algorithms, thereby providing insights into the potential benefits of leveraging ensemble
techniques in healthcare applications.
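
The intuition behind comparing an ensemble with single models can be illustrated with hard majority voting, one of the simplest ensemble schemes. The sketch below is an assumption-laden toy: the three prediction lists are hand-made so that each model errs on different samples, and they are not results from this project. The function names are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists by taking the most common label per sample."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions_per_model)]


def accuracy(y_true, y_pred):
    """Fraction of samples where the prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hand-made toy labels: each model makes two mistakes on different samples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 0, 1, 0, 1, 1, 0]
model_b = [1, 1, 1, 1, 0, 0, 0, 0]
model_c = [0, 0, 1, 1, 1, 0, 1, 0]

ensemble = majority_vote([model_a, model_b, model_c])
for name, pred in [("A", model_a), ("B", model_b), ("C", model_c),
                   ("ensemble", ensemble)]:
    print(name, accuracy(y_true, pred))
```

The vote helps here only because the models' errors do not overlap; when base learners make the same mistakes, voting gives little or no gain, which is one of the limitations the planned comparison would need to examine. An odd number of voters avoids ties for binary labels.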


1. Prabhakaran D., Jeemon P., Sharma M. The changing patterns of cardiovascular diseases and their
risk factors in the states of India: the Global Burden of Disease Study 1990–2016. Lancet Glob
Health. 2018. doi: 10.1016/s2214-109x(18)30407-8.
2. Kasthuri A. Challenges to healthcare in India - the five A's. Indian J Community
Med. 2018;43(3):141–143. doi: 10.4103/ijcm.IJCM_194_18.
3. George A., Badagabettu S., Berra K., George L.S., Kamath V., Thimmappa L. Prevention of
cardiovascular disease in India: barriers and opportunities for nursing. J Clin Prev
Cardiol. 2018;7:72–77.
4. Sangar S., Dutt V., Thakur R. Why people avoid prescribed medical treatment in India? Indian J
Publ Health. 2019;63:151–153.
5. Maini E., Venkateswarlu B. Artificial intelligence-futuristic pediatric healthcare. Indian
Pediatr. 2019;56:796.
6. Van der Heijden A.A., Abramoff M.D., Verbraak F., van Hecke M.V., Liem A., Nijpels G.
Validation of automated screening for referable diabetic retinopathy with the IDx-DR device in the
Hoorn Diabetes Care System. Acta Ophthalmol. 2017;96(1):63–68. doi: 10.1111/aos.13613.
7. Alexander C.A., Wang L. Big data analytics in heart attack prediction. J Nurs Care. 2017;6(2).
doi: 10.4172/2167-1168.1000393.
8. Maini E., Venkateswarlu B., Gupta A. Applying machine learning algorithms to develop a
universal cardiovascular disease prediction system. International Conference on Intelligent Data
Communication Technologies and Internet of Things (ICICI) 2018. 2018; pp. 627–632.
9. Shafenoor Amin M., Kia Chiam Y., Dewi Varathan K. Identification of significant features and
data mining techniques in predicting heart disease. Telematics Inf. 2018.
doi: 10.1016/j.tele.2018.11.007.
10. Maini E., Venkateswarlu B., Gupta A. Determination of significant features for building an
efficient heart disease prediction system. Int J Recent Technol. 2019;8(2):4500–4506.
11. UCI Machine Learning Repository: Heart Disease Data Set. 2019. [Internet]. [cited
17 December 2019].

