HEART DISEASE PREDICTION USING MACHINE LEARNING
A Mini Project Work
Submitted in partial fulfilment of the requirements for the award of the
degree of
BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION ENGINEERING
By
KUMARI AAKANKSHA – 20EG104115
2023-2024
ANURAG UNIVERSITY
SCHOOL OF ENGINEERING
(Hyderabad, Venkatapur (V), Ghatkesar (M), Medchal-Malkajgiri Dist-500088)
CERTIFICATE
This is to certify that the project report entitled “Heart Disease Prediction Using Machine Learning” is being
submitted by
KUMARI AAKANKSHA 20EG104115
Department of ECE
External Examiner
ACKNOWLEDGEMENT
I would like to take this opportunity to express my heartfelt gratitude to all those
who contributed to the successful completion of this project. Their support, guidance,
and encouragement have been invaluable throughout this endeavor.
First and foremost, I extend my deepest appreciation to my project guide, Dr. P. Ramakrishna, whose expertise and mentorship guided me at every stage of this project. Dr. Ramakrishna's insights and constructive feedback were instrumental in shaping the direction of my work.
I would like to express my deep sense of gratitude to Dr. V. Vijay Kumar, Dean, School of Engineering, Anurag Group of Institutions, for his tremendous support, encouragement and inspiration. I would also like to thank all the other staff members, both teaching and non-teaching, who extended their timely help and eased my work. Lastly, I thank the Almighty, my parents and my friends for their constant encouragement, without which this project would not have been possible.
BY
KUMARI AAKANKSHA 20EG104115
DECLARATION
We hereby declare that the results embodied in this project report entitled “Heart Disease Prediction Using Machine Learning” were obtained by us during the academic year 2023-2024 in partial fulfilment of the requirements for the award of the degree of Bachelor of Technology in Electronics and Communication Engineering from ANURAG GROUP OF INSTITUTIONS. We have not submitted this project report to any other university or institute for the award of any degree.
BY
KUMARI AAKANKSHA 20EG104115
ABSTRACT
In this machine learning project, we aimed to develop a predictive model for heart
disease classification based on a comprehensive dataset containing various medical
and patient-related features. The objective was to create a reliable tool that could assist
healthcare professionals in early detection and risk assessment of heart disease,
ultimately contributing to better patient care and outcomes. The project encompassed
several key stages, including data preprocessing, feature selection, model selection,
hyperparameter tuning, and performance evaluation. Data preprocessing involved
tasks such as handling missing values, encoding categorical variables, and feature
scaling to ensure data quality and compatibility with machine learning algorithms.
Feature selection played a crucial role in identifying the most informative attributes for
predicting heart disease while reducing dimensionality. A combination of domain
knowledge and feature importance techniques guided the selection process. We
evaluated multiple machine learning models, with logistic regression and random
forests emerging as the top contenders. A comprehensive hyperparameter tuning
strategy was employed to optimize the selected model's performance, balancing
precision, recall, and overall accuracy.
TABLE OF CONTENTS
7.1 CONCLUSION
REFERENCES
APPENDIX
LIST OF FIGURES
LIST OF TABLES
CHAPTER-1
INTRODUCTION
Heart Disease Prediction: Heart disease describes a range of conditions that affect the heart. Today, cardiovascular diseases are the leading cause of death worldwide, with 17.9 million deaths annually according to World Health Organization reports. Several factors increase the risk of heart disease, including high cholesterol, obesity, elevated triglyceride levels and hypertension. The American Heart Association lists certain warning signs. These include sleep problems, an irregular heartbeat (the heart rate rising and falling irregularly), swollen legs and, in some cases, rapid weight gain of as much as 1-2 kg per day. These symptoms also resemble those of other conditions, including common effects of ageing. As a result, a correct diagnosis is difficult and a missed diagnosis can prove fatal. Over time, however, a large amount of research data and hospital patient records has become available. Many open sources provide access to patient records. This makes it possible to conduct research in which computer technologies help diagnose patients correctly and detect the disease before it becomes fatal. Nowadays, machine learning and artificial intelligence play a major role in the medical industry. Different machine learning and deep learning models can be used to diagnose the disease and to classify or predict outcomes. Machine learning models can also support genomic data analysis and be trained for pandemic prediction. They can also help transform and analyze medical records more deeply for better predictions.
Using machine learning, we can diagnose, detect and predict various diseases. Recently, there has been growing interest in using data mining and machine learning techniques to predict the likelihood of developing certain diseases. Existing work includes applications of data mining techniques for predicting the disease. Some studies have attempted to predict the future risk of disease progression, but they have not yet achieved accurate results. The main goal of this project is to accurately predict the possibility of heart disease in a patient.
In this research, we aim to investigate the effectiveness of various machine learning algorithms in predicting heart disease. To achieve this goal, we employed a variety of techniques, including random forest, decision tree classifier and multilayer perceptron, to build predictive models. To improve the convergence of the models, we preprocessed the dataset with k-modes clustering and scaled its features. The dataset used in this study is publicly available on Kaggle. All computation, preprocessing and visualization were conducted on Google Colab using Python.
Previous studies have reported accuracy rates of up to 94% using machine learning techniques for heart disease prediction. However, these studies have often used small sample sizes, and their results may not generalize to larger populations. Our study aims to address this limitation by using a larger and more diverse dataset, which is expected to improve the generalizability of the results.
CHAPTER-2
LITERATURE SURVEY
In recent years, machine learning and data mining techniques have been widely adopted and have demonstrated efficacy in various healthcare applications, particularly in cardiology. The rapid accumulation of medical data has given researchers an unprecedented opportunity to develop and test new algorithms in this field. Heart disease remains a leading cause of mortality in developing nations, and identifying its risk factors and early signs has become an important area of research. Data mining and machine learning techniques can potentially aid in the early detection and prevention of heart disease.
The papers below cover a range of techniques, methodologies and datasets used in heart disease prediction research.
This review focuses on the use of deep learning methods for cardiac image
analysis, including applications in the diagnosis and prediction of heart
diseases using medical imaging data.
v) "A Comprehensive Review of Heart Disease Prediction Using Data Mining
Techniques" by S. Lakshmi et al. (2012)
This comprehensive review covers various data mining techniques applied
to heart disease prediction, including decision trees, neural networks,
support vector machines, and more.
vi) "A Survey of Machine Learning Algorithms for Disease Prediction" by R.
J. Jena et al. (2019)
While not specific to heart disease, this survey provides a broader
perspective on machine learning algorithms used for disease prediction,
which can be valuable for understanding the landscape of predictive
modeling in healthcare.
vii) "Prediction of Cardiovascular Disease on the Basis of Combined Analysis
of Ultrasound Images and Questionnaire Data" by M. F. Hasan et al. (2017)
This paper discusses the integration of medical imaging data and questionnaire data to
predict cardiovascular disease, showcasing a multidisciplinary approach.
CHAPTER-3
METHODOLOGY
1.1 Machine Learning (ML) is a subfield of artificial intelligence (AI) that focuses on
the development of algorithms and statistical models that enable computer systems
to learn and improve their performance on a specific task or set of tasks without
being explicitly programmed. It is a powerful tool for extracting patterns, insights,
and predictions from data. Machine learning has found applications in various
domains, including finance, healthcare, e-commerce, entertainment, and more. This
section introduces the fundamental concepts and components of machine learning.
Fig.1.1:
Machine learning is a fundamental component of deep learning. Deep learning is a
subset of machine learning that focuses on neural networks with multiple layers (deep
neural networks) and is particularly effective for tasks involving complex and
unstructured data, such as image and speech recognition.
Model Learning: Neural networks consist of interconnected artificial neurons
organized into layers. Machine learning techniques, specifically
backpropagation, are used to train these networks. During training, the model
learns by adjusting the weights and biases of its neurons to minimize a chosen
loss function, effectively mapping input data to desired outputs.
Data Preprocessing: Machine learning methods for data preprocessing, such as
feature scaling, normalization, and data augmentation, are applied to prepare
input data for neural networks. Proper preprocessing can improve training
efficiency and model performance.
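The sketch below illustrates both points on a small scale. It chains scikit-learn's `StandardScaler` (feature scaling, fitted on the training split only) with an `MLPClassifier`. This is a multilayer perceptron whose weights and biases are adjusted by backpropagation to minimise log-loss. The synthetic data from `make_classification`, the single hidden layer of 16 units and `max_iter=500` are assumptions for illustration, not settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the project dataset)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Data preprocessing: the scaler learns mean/std from the training split only.
# Model learning: the MLP updates weights and biases by backpropagation.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
)
model.fit(X_train, y_train)

mlp = model.named_steps["mlpclassifier"]
print("First/last training loss:", round(mlp.loss_curve_[0], 3), round(mlp.loss_curve_[-1], 3))
print("Weight matrix shapes:", [w.shape for w in mlp.coefs_])
print("Test accuracy:", round(model.score(X_test, y_test), 3))
```

`loss_curve_` records the training loss after each epoch, so its decrease directly reflects the weight updates described above.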
1.2 Importance of Machine Learning: Machine learning is an important component of
the growing field of data science. Through the use of statistical methods,
algorithms are trained to make classifications or predictions, and to uncover key
insights in data mining projects. These insights subsequently drive decision
making within applications and businesses, ideally impacting key growth metrics.
As big data continues to grow, the market demand for data scientists
will increase. They will be required to help identify the most relevant business
questions and the data to answer them.
i) Data Collection: The first step in any machine learning project is to gather
and collect relevant data. This data can come from various sources, such as
databases, sensors, text documents, images, or user interactions.
ii) Data Preprocessing: Raw data is often messy and may contain missing
values, outliers, or noise. Data preprocessing involves cleaning and
preparing the data for analysis. This includes handling missing data, scaling
features, and encoding categorical variables.
used as input for the machine learning model. The choice of features can
significantly impact the model's performance.
iv) Data Splitting: The dataset is typically divided into two or more subsets: a
training set and a test set. The training set is used to train the machine
learning model, while the test set is used to evaluate its performance on
unseen data.
vi) Model Training: The selected machine learning model is trained using the
training data. During training, the model learns to make predictions by
adjusting its internal parameters to minimize the difference between its
predictions and the actual target values (labels).
ix) Model Deployment: If the model performs well during evaluation, it can be
deployed in a real-world environment to make predictions on new, unseen
data. This can involve integrating the model into software applications,
websites, or other systems.
x) Monitoring and Maintenance: Machine learning models require ongoing
monitoring and maintenance. Data distributions can change over time,
leading to model drift, and the model may need periodic retraining to
maintain its accuracy.
xiii) Ethical Considerations: It's essential to consider ethical and fairness issues
when developing and deploying machine learning models, as biases in data
or algorithms can lead to unfair or discriminatory outcomes.
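The workflow steps listed above can be sketched end to end with scikit-learn. This is a hedged illustration on a small synthetic table, not the project's code. The column names, values and labelling rule are hypothetical, and median imputation with min-max scaling are choices made for the example. The sketch imputes missing values, one-hot encodes the categorical column and splits the data. It then trains a logistic regression and evaluates it on held-out data. Finally, it saves the fitted pipeline with `joblib` as a simple stand-in for deployment.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Data collection (hypothetical): a small synthetic patient table
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "age": rng.integers(30, 80, n).astype(float),
    "cholesterol": rng.normal(240, 40, n),
    "chest_pain": rng.choice(["typical", "atypical", "none"], n),
})
# Hypothetical label rule, only so the example has something to learn
df["target"] = ((df["age"] > 55) & (df["cholesterol"] > 230)).astype(int)
df.loc[rng.choice(n, 15, replace=False), "cholesterol"] = np.nan  # inject missing values

numeric = ["age", "cholesterol"]
categorical = ["chest_pain"]

# Data preprocessing: impute and scale numeric columns, one-hot encode categorical ones
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", MinMaxScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([("preprocess", preprocess),
                  ("clf", LogisticRegression(max_iter=1000))])

# Data splitting
X_train, X_test, y_train, y_test = train_test_split(
    df[numeric + categorical], df["target"],
    test_size=0.2, random_state=42, stratify=df["target"])

# Model training and evaluation on unseen data
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {acc:.3f}")

# Deployment stand-in: persist the fitted pipeline and reload it
path = os.path.join(tempfile.gettempdir(), "heart_model.joblib")
joblib.dump(model, path)
restored = joblib.load(path)
```

Keeping preprocessing inside the pipeline means the saved model applies the same imputation, scaling and encoding to new data. This also helps with the monitoring and retraining step.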
Fig 1.4 Methods of Machine Learning
ii) Unsupervised learning: Unsupervised learning, also known as unsupervised
machine learning, uses machine learning algorithms to analyze and cluster
unlabeled datasets. These algorithms discover hidden patterns or data
groupings without the need for human intervention. This method’s ability to
discover similarities and differences in information makes it ideal for
exploratory data analysis, cross-selling strategies, customer segmentation,
and image and pattern recognition.
iii) Semi-supervised learning offers a happy medium between supervised and
unsupervised learning. During training, it uses a smaller labeled data set to
guide classification and feature extraction from a larger, unlabeled data
set. Semi-supervised learning can solve the problem of not having enough
labeled data for a supervised learning algorithm.
iv) Reinforcement learning (RL) is a machine learning paradigm that focuses
on training agents to make sequences of decisions in an environment to
maximize cumulative rewards. Reinforcement learning is concerned with
decision-making and learning from interactions with an environment. It is
widely used in various applications, including robotics, game playing,
autonomous systems, and recommendation systems.
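The sketch below makes the unsupervised and semi-supervised settings concrete. It is an illustration on synthetic 2-D data, not part of the project. It first clusters unlabeled points with k-means. It then uses scikit-learn's `LabelSpreading` to propagate a handful of known labels to the remaining points. The cluster centres, the 10% labelled fraction and the k-nearest-neighbour kernel are assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score, adjusted_rand_score
from sklearn.semi_supervised import LabelSpreading

# Three well-separated synthetic groups; y_true is used only for checking
X, y_true = make_blobs(n_samples=300, centers=[[0, 0], [6, 6], [-6, 6]],
                       cluster_std=0.8, random_state=0)

# Unsupervised: k-means sees only X and discovers the groupings on its own
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = kmeans.fit_predict(X)
ari = adjusted_rand_score(y_true, clusters)  # 1.0 means clusters match the true groups

# Semi-supervised: keep labels for 30 of 300 points, mark the rest unlabeled (-1)
rng = np.random.default_rng(0)
y_partial = np.full_like(y_true, -1)
labeled = rng.choice(len(y_true), size=30, replace=False)
y_partial[labeled] = y_true[labeled]

spreader = LabelSpreading(kernel="knn", n_neighbors=7)
spreader.fit(X, y_partial)
acc = accuracy_score(y_true, spreader.transduction_)  # labels inferred for all points

print(f"k-means ARI: {ari:.3f}, label-spreading accuracy: {acc:.3f}")
```

Reinforcement learning needs an interactive environment rather than a fixed dataset, so it is not shown here.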
2. Importing the dataset
Find the missing values:
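The two steps above (importing the dataset and finding the missing values) can be sketched with pandas. To run anywhere, the example reads a tiny inline sample through `io.StringIO`. The column names are assumed to mirror the Kaggle file loaded in the appendix, and the rows are invented. On Google Colab, the path string would be passed to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical sample rows; column names assumed to mirror the Kaggle file
csv_text = """Age,BP,Cholesterol,Max HR,Heart Disease
52,125,212,168,Absence
63,,289,150,Presence
45,130,,172,Absence
58,140,240,,Presence
"""

# Importing the dataset (on Colab: pd.read_csv('/content/Heart_Disease_Prediction.csv'))
data = pd.read_csv(io.StringIO(csv_text))
print(data.shape)

# Find the missing values: number of NaN entries in each column
missing = data.isnull().sum()
print(missing)
```

Empty fields are read as `NaN`, so in this sample `BP`, `Cholesterol` and `Max HR` each report one missing value.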
Min-max scaling (also called min-max normalization), a common form of feature scaling, is a data preprocessing technique widely used in machine learning. Its purpose is to rescale numerical features (variables) to a specific range, typically between 0 and 1, so that all features share the same scale. This prevents features with large numeric ranges from dominating the learning process. It can also improve the performance of machine learning algorithms that are sensitive to the scale of input features.
Find the Minimum and Maximum Values: For each feature, calculate the minimum and maximum values across the dataset. In practice, these are computed on the training set only, so that no information from the test set leaks into the model. Let's call these values min_val and max_val for a particular feature.
Scale the Data: For each data point and each feature, apply the following transformation to scale it within the desired range:
$$x_{\text{scaled}} = \frac{x - \text{min\_val}}{\text{max\_val} - \text{min\_val}}$$
This formula maps the original value x to a new value between 0 and 1, with 0 corresponding to the minimum value (min_val) and 1 corresponding to the maximum value (max_val).
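Min-max scaling maps each value x of a feature to (x - min_val) / (max_val - min_val). The sketch below implements this directly with NumPy and compares the result with scikit-learn's `MinMaxScaler`. The input values are illustrative, and the helper `min_max_scale` is hypothetical. A constant column would cause a division by zero, so the helper maps such columns to 0. This handling is an assumption chosen to match what `MinMaxScaler` produces in that case.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def min_max_scale(X):
    """Scale each column of X to [0, 1] via (x - min_val) / (max_val - min_val)."""
    X = np.asarray(X, dtype=float)
    min_val = X.min(axis=0)
    max_val = X.max(axis=0)
    value_range = max_val - min_val
    # Constant columns would divide by zero; use 1 so they map to 0
    value_range[value_range == 0] = 1.0
    return (X - min_val) / value_range

# Illustrative values, e.g. cholesterol and age for three patients
X = np.array([[200.0, 40.0],
              [250.0, 55.0],
              [300.0, 70.0]])

manual = min_max_scale(X)
library = MinMaxScaler().fit_transform(X)
print(manual)  # each column becomes 0.0, 0.5, 1.0
print(np.allclose(manual, library))
```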
5. Logistic Regression
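Logistic regression estimates the probability that a patient has heart disease as the sigmoid of a weighted sum of the features, p = 1 / (1 + e^(-(w·x + b))). It predicts class 1 when p exceeds 0.5. The sketch below reproduces scikit-learn's `predict_proba` and `predict` from a fitted model's `coef_` and `intercept_`. The synthetic data from `make_classification` stand in for the patient features and are an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data standing in for patient features
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

# Probability of class 1 from the learned weights: sigmoid(w . x + b)
z = X @ clf.coef_.ravel() + clf.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))

# Decision rule: class 1 when the probability exceeds 0.5 (equivalently z > 0)
y_manual = (p_manual > 0.5).astype(int)

print(np.allclose(p_manual, clf.predict_proba(X)[:, 1]))
print((y_manual == clf.predict(X)).all())
```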
CHAPTER-6
This project demonstrated the effectiveness of logistic regression classifiers for predicting heart disease in patients. The choice of classifier can be tailored to the specific characteristics of the data and the desired trade-off between precision and recall. Further enhancements to the model could involve tuning hyperparameters and trying different feature extraction techniques.
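The sketch below shows one possible form of the hyperparameter tuning mentioned above. It searches over the inverse regularization strength `C` of scikit-learn's `LogisticRegression` using 5-fold `GridSearchCV`. Scoring is by F1, so precision and recall are both taken into account. The grid values and the synthetic data are assumptions for illustration, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling sits inside the pipeline, so each CV fold is scaled using only its own training part
pipe = Pipeline([("scale", MinMaxScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])
param_grid = {"clf__C": [0.01, 0.1, 1, 10, 100]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

print("Best C:", search.best_params_["clf__C"])
print("Cross-validated F1:", round(search.best_score_, 3))
print("Test F1:", round(search.score(X_test, y_test), 3))  # refit best model, scored with F1
```

Smaller `C` means stronger regularization. Scanning it on a logarithmic grid is a common first pass before refining around the best value.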
6.1 Results:
Metric       Logistic Regression   Random Forest
Accuracy     0.87                  0.815
Precision    0.85                  0.85
Recall       0.81                  0.809
F1-Score     0.83                  0.829
Table 6.1 Comparison of Logistic Regression and Random Forest results.
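Based on these results, the logistic regression classifier was selected over the random forest classifier. It achieved the higher accuracy (0.87 versus 0.815) and a marginally higher F1-score (0.83 versus 0.829).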
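The F1-scores in Table 6.1 follow from the reported precision and recall as their harmonic mean, F1 = 2PR / (P + R). The short check below recomputes them. It also uses a made-up set of ten labels to show how scikit-learn's metric functions relate to the confusion-matrix counts. The helper `f1_from` is hypothetical.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported values from Table 6.1
print(round(f1_from(0.85, 0.81), 2))   # logistic regression -> 0.83
print(round(f1_from(0.85, 0.809), 3))  # random forest -> 0.829

# Illustrative labels (1 = heart disease, 0 = no heart disease)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0, 0, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)                  # 4 / 6
recall = tp / (tp + fn)                     # 4 / 5
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 7 / 10
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
print(accuracy, accuracy_score(y_true, y_pred))
print(f1_from(precision, recall), f1_score(y_true, y_pred))
```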
Fig 1.4 output
CHAPTER-7
7.1 CONCLUSION
The models used in this project are logistic regression and the random forest classifier. The accuracy of our logistic regression model is 87%. Using more training data is likely to improve the model's ability to predict accurately whether a given person has heart disease. Such computer-aided techniques can help predict heart disease faster and more reliably, and they can substantially reduce the cost of diagnosis. Many medical databases are available to which these machine learning techniques can be applied, and their predictions can support the decisions of doctors and benefit patients. In conclusion, this project predicts whether a patient has heart disease by cleaning the dataset and applying logistic regression. It achieves an average accuracy of 87%, which is better than the 81% accuracy of the previous models. Of the two algorithms we used, logistic regression achieved the higher accuracy.
REFERENCES
1. Krittanawong, C., Zhang, H., Wang, Z., Aydar, M., & Kitai, T. (2017). Artificial intelligence in precision cardiovascular medicine. Journal of the American College of Cardiology, 69(21), 2657-2664.
2. Chicco, D., & Jurman, G. (2020). Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Medical Informatics and Decision Making, 20(1), 1-15.
3. Raghavendra, U., Fujita, H., & Gudigar, A. (2019). Deep convolution neural network for accurate diagnosis of glaucoma using digital fundus images. Information Sciences, 480, 107-117.
5. Madhavan, P., Zhang, L., & Alshammari, F. (2020). Predictive modeling of heart disease using machine learning techniques. Journal of King Saud University-Computer and Information Sciences, 32(3), 2454-2461.
6. Khan, J. A., Bhoi, A. K., & Roy, P. P. (2019). A novel hybrid feature selection method for effective prediction of heart disease using heart disease dataset. Applied Soft Computing, 77, 438-447.
APPENDICES
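import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler, LabelEncoder
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load the dataset (Google Colab path)
data = pd.read_csv('/content/Heart_Disease_Prediction.csv')
print(data.columns)

# Creating heatmap for missing values (before they are filled)
plt.figure(figsize=(10, 6))
sns.heatmap(data.isnull(), cmap='viridis', cbar=False)
plt.title('Missing Values Heatmap')
plt.show()

# Fill missing values in numeric columns with the column mean
data.fillna(data.mean(numeric_only=True), inplace=True)

# Encode the target as integers; LabelEncoder sorts classes alphabetically
# (assumed labels: 'Absence' -> 0, 'Presence' -> 1)
label_encoder = LabelEncoder()
data['Heart Disease'] = label_encoder.fit_transform(data['Heart Disease'])
# missing values in the target
print(data['Heart Disease'].isna().sum())
print(data['Heart Disease'].unique())

# Features and target (assumes every other column is numeric)
X = data.drop(columns=['Heart Disease'])
y = data['Heart Disease']

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Check that the target has a consistent integer type
if y_train.dtypes == 'int64' and y_test.dtypes == 'int64':
    print("Target variable data type is consistent (integer).")
else:
    print("Target variable data type is not consistent.")
    print(y_train.unique())
    print(y_test.unique())
    y_train = y_train.astype(int)
    y_test = y_test.astype(int)

# Min-max scaling: fit on the training set, reuse the same scaling on the test set
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train and evaluate the logistic regression model
logistic_regression_model = LogisticRegression()
logistic_regression_model.fit(X_train, y_train)
y_pred = logistic_regression_model.predict(X_test)
print('Accuracy :', accuracy_score(y_test, y_pred))
print('Precision:', precision_score(y_test, y_pred))
print('Recall   :', recall_score(y_test, y_pred))
print('F1-Score :', f1_score(y_test, y_pred))

# cross validation (5 folds assumed) on the training set
scores = cross_val_score(LogisticRegression(), X_train, y_train, cv=5)
print('Cross-validation accuracy:', scores.mean())

# Predict for one patient with the fitted model (an unfitted LogisticRegression()
# cannot predict). idr must be one scaled row with the same columns as X;
# placeholder here: the first test row.
idr = X_test[:1]
prediction = logistic_regression_model.predict(idr)
print(prediction)
if prediction[0] == 0:
    print('The person does not have heart disease')
else:
    print("The person has heart disease")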