
PROJECT REPORT

ON
HEART DISEASE PREDICTION USING MACHINE LEARNING
A Mini Project Work
Submitted in partial fulfilment of the requirements for the award of the
degree of
BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION ENGINEERING
By
KUMARI AAKANKSHA – 20EG104115

Under the guidance of


Dr. P. RAMAKRISHNA
Assistant Professor
Department of ECE

Department of Electronics and Communication Engineering


ANURAG UNIVERSITY
SCHOOL OF ENGINEERING
Hyderabad, Venkatapur (V), Ghatkesar (M), Medchal-Malkajgiri Dist. - 500088

2023-2024


DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

CERTIFICATE
This is to certify that the project report entitled “Heart Disease Prediction Using Machine
Learning” being submitted by
KUMARI AAKANKSHA 20EG104115
in partial fulfillment for the award of the Degree of Bachelor of Technology in
Electronics & Communication Engineering to the Jawaharlal Nehru Technological
University, Hyderabad, is a record of bona fide work carried out under my guidance and
supervision. The results embodied in this project report have not been submitted to any
other University or Institute for the award of any Degree or Diploma.

DR. P. RAMAKRISHNA                              N. MANGALA GOURI
Assistant Professor                             Head of the Department
Department of ECE

External Examiner

ACKNOWLEDGEMENT
I would like to take this opportunity to express my heartfelt gratitude to all those
who contributed to the successful completion of this project. Their support, guidance,
and encouragement have been invaluable throughout this endeavor.

First and foremost, I extend my deepest appreciation to my project guide, Dr. P.
Ramakrishna, whose expertise and mentorship guided me at every stage of this
project. His insights and constructive feedback were instrumental in shaping the
direction of my work.

I express my sincere gratitude to Dr. N. Mangala Gouri, Head of the Department of
Electronics and Communication Engineering, for her valuable suggestions toward the
successful completion of this project. She has also been a great source of inspiration
for my work.

I would like to express my deep sense of gratitude to Dr. V. Vijay Kumar, Dean,
School of Engineering, Anurag Group of Institutions, for his tremendous support,
encouragement and inspiration. I would also like to thank all the other staff members,
both teaching and non-teaching, who extended their timely help and eased my work.
Lastly, I thank the Almighty, my parents and my friends for their constant
encouragement, without which this project would not have been possible.

BY
KUMARI AAKANKSHA 20EG104115

DECLARATION

We hereby declare that the results embodied in this project report entitled “Heart
Disease Prediction Using Machine Learning” were obtained by us during the year
2023-2024 in partial fulfilment of the requirements for the award of the degree of
Bachelor of Technology in Electronics and Communication Engineering from ANURAG GROUP OF
INSTITUTIONS. We have not submitted this project report to any other university or
institute for the award of any degree.

BY
KUMARI AAKANKSHA 20EG104115

ABSTRACT
In this machine learning project, we aimed to develop a predictive model for heart
disease classification based on a comprehensive dataset containing various medical
and patient-related features. The objective was to create a reliable tool that could assist
healthcare professionals in early detection and risk assessment of heart disease,
ultimately contributing to better patient care and outcomes. The project encompassed
several key stages, including data preprocessing, feature selection, model selection,
hyperparameter tuning, and performance evaluation. Data preprocessing involved
tasks such as handling missing values, encoding categorical variables, and feature
scaling to ensure data quality and compatibility with machine learning algorithms.

Feature selection played a crucial role in identifying the most informative attributes for
predicting heart disease while reducing dimensionality. A combination of domain
knowledge and feature importance techniques guided the selection process. We
evaluated multiple machine learning models, with logistic regression and random
forests emerging as the top contenders. A comprehensive hyperparameter tuning
strategy was employed to optimize the selected model's performance, balancing
precision, recall, and overall accuracy.

Performance evaluation was conducted using various metrics, including accuracy,
precision, recall, and F1-score. The results demonstrated the effectiveness of the
developed predictive model in heart disease classification: the selected logistic
regression model achieved an accuracy of 0.87, a precision of 0.85, a recall of 0.81
and an F1-score of 0.83. These findings underline the potential utility of machine
learning in healthcare settings and suggest that the model could be integrated into
clinical practice to assist medical professionals in diagnosing and managing heart
disease.

This project exemplifies the application of machine learning in addressing critical
healthcare challenges and underscores the importance of data-driven solutions in
enhancing patient care and outcomes.
TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
ABSTRACT
CHAPTER 1: INTRODUCTION
1.1 HEART DISEASE PREDICTION USING MACHINE LEARNING
CHAPTER 2: LITERATURE SURVEY
CHAPTER 3: SOFTWARE REQUIREMENTS
3.1 KAGGLE
3.2 GOOGLE COLAB
CHAPTER 5: PROPOSED METHODOLOGY
5.1 PROPOSED METHOD
CHAPTER 6: RESULTS AND DISCUSSION
6.1 RESULTS
CHAPTER 7: CONCLUSION AND FUTURE SCOPE
7.1 CONCLUSION
REFERENCES
APPENDIX

LIST OF FIGURES

Fig No.     Name of the Figure
1.1         Functions of Machine Learning
1.2         Methods of Machine Learning
1.3         Missing Values
1.4         Output Snapshot

LIST OF TABLES

Table No.   Name of the Table
6.1         Difference between Logistic Regression and Random Forests results
CHAPTER-1
INTRODUCTION
Heart Disease Prediction: Heart disease describes a range of conditions that affect the
heart. Today, cardiovascular diseases are the leading cause of death worldwide, with
17.9 million deaths annually according to World Health Organization reports. Several
unhealthy conditions increase the risk of heart disease, such as high cholesterol,
obesity, raised triglyceride levels and hypertension. The American Heart Association
lists certain warning signs, such as sleep problems, an irregular heartbeat (abnormal
increases and decreases in heart rate), swollen legs and, in some cases, rapid weight
gain of 1-2 kg a day. Because these symptoms resemble those of other conditions and also
occur in aging people, a correct diagnosis is difficult, and a missed diagnosis can
prove fatal. Over time, however, a large amount of research data and hospital patient
records has become available, and many open sources provide access to patient records.
Research on these records allows computer technologies to be used to diagnose patients
correctly and to detect the disease before it becomes fatal. Machine learning and
artificial intelligence now play a major role in the medical industry: machine learning
and deep learning models can be used to diagnose disease and to classify or predict
outcomes. Machine learning models can also support genomic data analysis, be trained to
predict the course of pandemics, and analyze medical records in greater depth to produce
better predictions.

Using machine learning, we can diagnose, detect and predict various diseases. Recently,
there has been growing interest in using data mining and machine learning techniques to
predict the likelihood of developing certain diseases. Existing work includes
applications of data mining techniques for predicting the disease. Although some
studies have attempted to predict the future risk of disease progression, they have yet
to achieve accurate results. The main goal of this project is to accurately predict the
possibility of heart disease in a patient.
In this project, we investigate the effectiveness of machine learning algorithms in
predicting heart disease. To achieve this goal, we built predictive models using
logistic regression and a random forest classifier. Before training, the dataset was
cleaned and its numerical features were scaled. The dataset used in this study is
publicly available on Kaggle. All computation, preprocessing and visualization were
carried out on Google Colab using Python.
Previous studies have reported accuracy rates of up to 94% using machine learning
techniques for heart disease prediction. However, these studies often used small
sample sizes, and their results may not generalize to larger populations. Our study
aims to address this limitation by using a larger and more diverse dataset, which is
expected to improve the generalizability of the results.
CHAPTER 2
LITERATURE SURVEY

In recent years, machine learning and data mining techniques have been widely adopted
and have demonstrated efficacy in various healthcare applications, particularly in
cardiology. The rapid accumulation of medical data has given researchers an
unprecedented opportunity to develop and test new algorithms in this field. Heart
disease remains a leading cause of mortality in developing nations, and identifying
risk factors and early signs of the disease has become an important area of research.
Data mining and machine learning techniques can potentially aid in the early detection
and prevention of heart disease.

The papers below cover a range of techniques, methodologies, and datasets used in
heart disease prediction research.

i) "A Survey of Heart Disease Prediction Strategies" by M. A. Mohammed et al. (2018)
This survey provides an overview of various data mining and machine
learning techniques used for heart disease prediction. It covers datasets,
preprocessing, and different classification algorithms applied in the field.
ii) "Predicting Heart Disease Using Data Mining Techniques" by M. Shaveta
et al. (2014)
This paper explores the use of data mining techniques like decision trees,
Naive Bayes, and k-Nearest Neighbors for heart disease prediction. It
discusses feature selection and evaluation metrics.
iii) In a study by Drod et al. (2022) [2], the objective was to use machine learning
(ML) techniques to identify the most significant risk variables for cardiovascular
disease (CVD) in patients with metabolic-associated fatty liver disease (MAFLD).
iv) "Deep Learning for Cardiac Image Analysis: A Review" by O. Bernard et al. (2018)
This review focuses on the use of deep learning methods for cardiac image
analysis, including applications in the diagnosis and prediction of heart
diseases using medical imaging data.
v) "A Comprehensive Review of Heart Disease Prediction Using Data Mining
Techniques" by S. Lakshmi et al. (2012)
This comprehensive review covers various data mining techniques applied
to heart disease prediction, including decision trees, neural networks,
support vector machines, and more.
vi) "A Survey of Machine Learning Algorithms for Disease Prediction" by R.
J. Jena et al. (2019)
While not specific to heart disease, this survey provides a broader
perspective on machine learning algorithms used for disease prediction,
which can be valuable for understanding the landscape of predictive
modeling in healthcare.
vii) "Prediction of Cardiovascular Disease on the Basis of Combined Analysis
of Ultrasound Images and Questionnaire Data" by M. F. Hasan et al. (2017)

This paper discusses the integration of medical imaging data and questionnaire data to
predict cardiovascular disease, showcasing a multidisciplinary approach.

viii) "Cardiovascular Disease Diagnosis via Deep Learning: A Review" by S. Yassin et al. (2020)
This review focuses on the application of deep learning techniques, such as
convolutional neural networks (CNNs) and recurrent neural networks
(RNNs), for cardiovascular disease diagnosis and prediction.
ix) "Predicting Heart Disease Using Machine Learning Algorithms" by S.
Mahajan et al. (2019)
This study explores the use of machine learning algorithms, including
Random Forest, Decision Trees, and k-Nearest Neighbors, for heart disease
prediction and compares their performance.
x) "Heart Disease Prediction Using Ensemble of Machine Learning
Algorithms" by R. R. Thabtah (2018)

CHAPTER-3
METHODOLOGY

1.1 Machine Learning (ML) is a subfield of artificial intelligence (AI) that focuses on
the development of algorithms and statistical models that enable computer systems
to learn and improve their performance on a specific task or set of tasks without
being explicitly programmed. It is a powerful tool for extracting patterns, insights,
and predictions from data. Machine learning has found applications in various
domains, including finance, healthcare, e-commerce, entertainment, and more. In
this chapter, we explore the fundamental concepts and components of machine
learning.

Fig. 1.1 Functions of Machine Learning

a) Machine Learning in Deep Learning:
Machine learning is a fundamental component of deep learning. Deep learning is a
subset of machine learning that focuses on neural networks with multiple layers (deep
neural networks) and is particularly effective for tasks involving complex and
unstructured data, such as image and speech recognition.

b) Machine Learning in Algorithms:
Machine learning can be used within algorithms to enhance their functionality
and decision-making capabilities. Machine learning techniques can be
incorporated into algorithms through data-driven decisions, pattern recognition,
optimization, anomaly detection, and similar mechanisms.

c) Machine learning in AI:
For several reasons, machine learning is a critical component of artificial
intelligence (AI), and its integration into AI systems is essential for achieving
many AI goals and capabilities. Machine learning is needed in AI for data-driven
decision making, pattern recognition, automation, natural language processing, etc.

d) Machine Learning in Data Mining:
Machine learning plays a significant role in data mining by providing the tools
and techniques to extract valuable patterns, knowledge, and insights from large
and complex datasets. Data mining is the process of discovering hidden
patterns, relationships, and trends within data, and machine learning algorithms
are a crucial component of this process.

e) Machine Learning in Neural Networks:
Machine learning plays a central role in neural networks, particularly in training
and optimizing these networks to perform various tasks. Neural networks are a
subset of machine learning models inspired by the structure and functioning of
the human brain.
Model Learning: Neural networks consist of interconnected artificial neurons
organized into layers. Machine learning techniques, specifically
backpropagation, are used to train these networks. During training, the model
learns by adjusting the weights and biases of its neurons to minimize a chosen
loss function, effectively mapping input data to desired outputs.

Activation Functions: Machine learning principles guide the choice of
activation functions used in neural networks. Activation functions introduce
non-linearity into the model, allowing it to capture complex relationships in
data. Common activation functions include the Rectified Linear Unit (ReLU),
sigmoid, and hyperbolic tangent (tanh).
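As an illustration of the three activation functions named above, here is a minimal
pure-Python sketch written from their standard definitions. The function names relu,
sigmoid and tanh are our own choices for this example, not any particular library's API.

```python
import math

def relu(x: float) -> float:
    """Rectified Linear Unit: passes positive inputs, zeroes negative ones."""
    return max(0.0, x)

def sigmoid(x: float) -> float:
    """Logistic sigmoid: squashes any real input into the interval [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite the formula so math.exp never overflows
    e = math.exp(x)
    return e / (1.0 + e)

def tanh(x: float) -> float:
    """Hyperbolic tangent: squashes any real input into (-1, 1)."""
    return math.tanh(x)

for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.3f}  sigmoid={sigmoid(x):.3f}  tanh={tanh(x):.3f}")
```

Without a non-linear activation, a stack of layers would collapse into a single linear
transformation, which is why hidden layers apply functions like these.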

Loss Functions: The selection of an appropriate loss function is crucial in
training neural networks. Machine learning considerations guide the choice of
loss functions based on the nature of the task, whether it's regression,
classification, or another problem.
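For the two task types mentioned, a common pairing is mean squared error for regression
and binary cross-entropy for two-class problems such as heart disease present/absent.
The sketch below assumes plain Python lists; the clipping constant eps is our own
addition to keep the logarithm away from zero.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average squared difference; a typical loss for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never receives 0
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

print(mean_squared_error([3.0, 5.0], [2.5, 5.5]))  # 0.25
print(binary_cross_entropy([1, 0], [0.9, 0.1]))    # about 0.105
```

Cross-entropy punishes confident wrong predictions heavily: predicting 0.1 for a true
label of 1 costs far more than predicting 0.9.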

Optimization Algorithms: Machine learning optimization algorithms, such as
stochastic gradient descent (SGD), Adam, RMSprop, and others, are applied to
update the model's parameters during training. These algorithms adjust the
weights and biases to minimize the loss and improve the model's performance.
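The training loop described under Model Learning and Optimization Algorithms can be
sketched for the smallest possible network: one sigmoid neuron, which is equivalent to
logistic regression. This sketch uses plain batch gradient descent on binary
cross-entropy; SGD would instead update after each sample or mini-batch, and Adam or
RMSprop would adapt the step size per parameter. The toy data, learning rate and epoch
count are hypothetical values chosen for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_neuron(xs, ys, lr=0.5, epochs=2000):
    """Batch gradient descent on mean binary cross-entropy for one sigmoid neuron."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            # For sigmoid + cross-entropy, dLoss/dz simplifies to (prediction - label)
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        # Move both parameters against the average gradient
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical 1-D toy data: label 1 for positive x, 0 for negative x
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_neuron(xs, ys)
preds = [int(sigmoid(w * x + b) >= 0.5) for x in xs]
print(f"w={w:.3f}, b={b:.3f}, predictions={preds}")
```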

Hyperparameter Tuning: Neural networks have various hyperparameters that
need to be fine-tuned for optimal performance. Machine learning techniques,
including grid search, random search, and Bayesian optimization, are used to
find the best hyperparameter settings.
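Grid search, the first technique listed, scores every combination of candidate values
with cross-validation. A minimal sketch using scikit-learn's GridSearchCV: it tunes the
regularization strength C of a logistic regression on synthetic data from
make_classification, both of which are stand-ins rather than this project's setup. The
same pattern applies to neural-network hyperparameters.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data: 200 samples, 5 features, 2 classes
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Candidate values for C, the inverse of the regularization strength
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# Each candidate is scored with 5-fold cross-validation; with the default
# refit=True, the best setting is then retrained on all of X, y
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

RandomizedSearchCV offers the same interface but samples a fixed number of candidates,
which scales better when the grid is large.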

Regularization: Techniques from machine learning, such as dropout and L1/L2
regularization, are used to prevent overfitting in neural networks.
Regularization helps the model generalize better to unseen data.
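L2 regularization adds a penalty proportional to the squared weights to the loss, which
pulls the weights toward zero. The sketch below shows the effect on a one-parameter
linear model fitted by gradient descent; the data points, the penalty strength l2=1.0
and the helper name fit_slope are hypothetical. Dropout, which randomly disables neurons
during training, is not shown.

```python
def fit_slope(xs, ys, l2=0.0, lr=0.01, epochs=5000):
    """Fit y ≈ w * x by gradient descent on mean squared error plus l2 * w**2."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the data term: mean of 2 * (w*x - y) * x
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Gradient of the L2 penalty l2 * w**2 is 2 * l2 * w
        grad += 2 * l2 * w
        w -= lr * grad
    return w

# Hypothetical data lying roughly on y = 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
w_plain = fit_slope(xs, ys)
w_ridge = fit_slope(xs, ys, l2=1.0)
print(f"without penalty: {w_plain:.3f}, with L2 penalty: {w_ridge:.3f}")
```

For this data the unpenalized slope converges to about 1.990 and the penalized one
shrinks to about 1.756, the exact minimizers of the two losses.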

Data Preprocessing: Machine learning methods for data preprocessing, such as
feature scaling, normalization, and data augmentation, are applied to prepare
input data for neural networks. Proper preprocessing can improve training
efficiency and model performance.
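One common form of feature scaling is standardization (z-score scaling), which is what
scikit-learn's StandardScaler in the appendix applies: each value has the feature mean
subtracted and is divided by the standard deviation. A pure-Python sketch, assuming a
single feature stored as a list; like StandardScaler, it uses the population standard
deviation.

```python
import statistics

def standardize(values):
    """Z-score scaling: subtract the mean, then divide by the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature: centre it, nothing to scale
    return [(v - mu) / sigma for v in values]

ages = [40.0, 50.0, 60.0]  # hypothetical values for one feature
scaled = standardize(ages)
print([round(v, 3) for v in scaled])  # [-1.225, 0.0, 1.225]
```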

Transfer Learning: Transfer learning, a machine learning concept, is frequently
applied in neural networks. Pre-trained neural network architectures (e.g.,
convolutional neural networks like VGG16, ResNet, or language models like
BERT) are fine-tuned for specific tasks, saving time and resources compared to
training from scratch.

Ensemble Methods: Neural networks can benefit from ensemble learning
techniques, which combine the predictions of multiple models to improve
overall performance and reduce overfitting. This is particularly useful in
applications like image classification and object detection.
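The simplest way to combine classifiers is a hard majority vote: each model predicts a
label and the most common label wins. A pure-Python sketch with hypothetical
predictions from three models; note that scikit-learn's RandomForestClassifier, used in
this project, instead averages the trees' predicted class probabilities.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine class predictions from several models by taking the most common label per sample."""
    combined = []
    for sample_preds in zip(*predictions_per_model):
        label, _count = Counter(sample_preds).most_common(1)[0]
        combined.append(label)
    return combined

# Hypothetical predictions from three classifiers on five samples
model_a = [1, 0, 1, 1, 0]
model_b = [1, 1, 0, 1, 0]
model_c = [0, 0, 1, 1, 1]
print(majority_vote([model_a, model_b, model_c]))  # [1, 0, 1, 1, 0]
```

With an odd number of models and two classes, a tie cannot occur.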

Interpretability and Explainability: Neural networks can be challenging to
interpret. Machine learning techniques are employed to provide insights into
model decisions, such as feature importance analysis, saliency maps, and
attention mechanisms.
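Feature importance analysis can be made model-agnostic with permutation importance:
shuffle one feature on held-out data and measure how much the score drops. A sketch
using scikit-learn's permutation_importance with a random forest; the synthetic data
(two informative features placed first, four pure-noise features) is an assumption for
illustration. Saliency maps and attention are network-specific and are not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: with shuffle=False the 2 informative features come first
X, y = make_classification(n_samples=400, n_features=6, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle each feature 10 times on the test set and record the mean accuracy drop
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature {i}: {score:.3f}")
```

Features whose shuffling barely changes accuracy contribute little to the model's
decisions; on this synthetic data the first two features should stand out.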

AutoML: Automated Machine Learning (AutoML) tools simplify the process
of designing and training neural networks. AutoML automates architecture
search, hyperparameter tuning, and other aspects of model development.

In essence, machine learning is the driving force behind the training,
optimization, and effective functioning of neural networks. Neural networks
leverage machine learning techniques to model complex relationships in data
and excel in a wide range of applications, including image recognition, natural
language processing, robotics, and reinforcement learning.

1.2 Importance of Machine Learning: Machine learning is an important component of
the growing field of data science. Through the use of statistical methods,
algorithms are trained to make classifications or predictions, and to uncover key
insights in data mining projects. These insights subsequently drive decision
making within applications and businesses, ideally impacting key growth metrics.
As big data continues to expand and grow, the market demand for data scientists
will increase. They will be required to help identify the most relevant business
questions and the data to answer them.

1.3 Working of Machine Learning
Machine learning is a versatile technology that can be applied to a wide range
of problems, from image recognition and natural language processing to
recommendation systems and autonomous vehicles. The choice of algorithms and
techniques depends on the specific problem and the available data. Successful
machine learning projects often involve an iterative process of data exploration,
model development, and evaluation until the desired level of performance is
achieved.
A typical machine learning workflow involves the following steps:

i) Data Collection: The first step in any machine learning project is to gather
and collect relevant data. This data can come from various sources, such as
databases, sensors, text documents, images, or user interactions.

ii) Data Preprocessing: Raw data is often messy and may contain missing
values, outliers, or noise. Data preprocessing involves cleaning and
preparing the data for analysis. This includes handling missing data, scaling
features, and encoding categorical variables.

iii) Feature Engineering: Feature engineering is the process of selecting,
transforming, or creating new features (attributes) from the data that will be
used as input for the machine learning model. The choice of features can
significantly impact the model's performance.

iv) Data Splitting: The dataset is typically divided into two or more subsets: a
training set and a test set. The training set is used to train the machine
learning model, while the test set is used to evaluate its performance on
unseen data.

v) Model Selection: Depending on the nature of the problem (classification,
regression, clustering, etc.), a suitable machine learning algorithm or model
is chosen. There are many algorithms to choose from, ranging from simple
linear regression to complex deep neural networks.

vi) Model Training: The selected machine learning model is trained using the
training data. During training, the model learns to make predictions by
adjusting its internal parameters to minimize the difference between its
predictions and the actual target values (labels).

vii) Model Evaluation: After training, the model's performance is evaluated
using the test data. Common evaluation metrics include accuracy,
precision, recall, F1-score, mean squared error (MSE), and others,
depending on the type of problem.

viii) Hyperparameter Tuning: Machine learning models often have
hyperparameters, which are settings that are not learned from the data but
are set prior to training. Hyperparameter tuning involves finding the best
combination of hyperparameters to optimize the model's performance.

ix) Model Deployment: If the model performs well during evaluation, it can be
deployed in a real-world environment to make predictions on new, unseen
data. This can involve integrating the model into software applications,
websites, or other systems.

x) Monitoring and Maintenance: Machine learning models require ongoing
monitoring and maintenance. Data distributions can change over time,
leading to model drift, and the model may need periodic retraining to
maintain its accuracy.

xi) Feedback Loop: In some applications, feedback loops are used to
continually improve the model's performance. Feedback from users or
real-world outcomes can be used to retrain and adapt the model over time.

xii) Interpretability and Explainability: Understanding why a machine learning
model makes specific predictions or decisions is crucial, especially in
applications where transparency and accountability are important. Various
techniques are used to interpret and explain model predictions.

xiii) Ethical Considerations: It's essential to consider ethical and fairness issues
when developing and deploying machine learning models, as biases in data
or algorithms can lead to unfair or discriminatory outcomes.
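Several of these steps (data splitting, preprocessing, model selection, training and
evaluation) can be sketched in a few lines with scikit-learn. The synthetic dataset
stands in for real data, and wrapping the scaler and model in a pipeline is our design
choice: it ensures the scaler is fitted on the training set only, so no test-set
statistics leak into training.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data collection (stand-in): synthetic binary-classification data
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

# Data splitting: hold out 20% for testing, keeping class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Preprocessing + model selection + training in one pipeline
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Model evaluation on unseen data
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {acc:.3f}")
```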

1.4 Methods of Machine Learning:

Fig 1.2 Methods of Machine Learning

i) In supervised learning, the training dataset consists of pairs of input data
(features) and their corresponding output data (labels or targets). The
labels represent the correct answers or desired outcomes. During the
training phase, the machine learning model learns to make predictions by
adjusting its internal parameters based on the input features and their
corresponding labels. The model's objective is to minimize the difference
between its predictions and the actual labels.
Types of Supervised Learning:
Classification: In classification tasks, the goal is to assign input data points
to predefined categories or classes. For example, classifying emails as
spam or not spam, or recognizing handwritten digits as numbers 0 to 9.
Regression: In regression tasks, the goal is to predict a continuous
numerical value. For example, predicting house prices based on features
like square footage, number of bedrooms, and location.

ii) Unsupervised learning, also known as unsupervised machine learning, uses
machine learning algorithms to analyze and cluster unlabeled datasets. These
algorithms discover hidden patterns or data groupings without the need for human
intervention. This method's ability to discover similarities and differences in
information makes it ideal for exploratory data analysis, cross-selling strategies,
customer segmentation, and image and pattern recognition.
iii) Semi-supervised learning offers a happy medium between supervised and
unsupervised learning. During training, it uses a smaller labeled data set to
guide classification and feature extraction from a larger, unlabeled data
set. Semi-supervised learning can solve the problem of not having enough
labeled data for a supervised learning algorithm.
iv) Reinforcement learning (RL) is a machine learning paradigm that focuses
on training agents to make sequences of decisions in an environment to
maximize cumulative rewards. Reinforcement learning is concerned with
decision-making and learning from interactions with an environment. It is
widely used in various applications, including robotics, game playing,
autonomous systems, and recommendation systems.
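The difference between the first two methods shows up in what the algorithm is given.
In this sketch (synthetic, well-separated points from scikit-learn's make_blobs, with
cluster centres chosen for illustration), logistic regression is trained with the
labels y, whereas k-means receives only the features X and must find the groups itself.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Synthetic data: two groups of 50 points around hand-picked centres
X, y = make_blobs(n_samples=100, centers=[[-5, -5], [5, 5]], random_state=0)

# Supervised: the model sees the features X *and* the labels y
clf = LogisticRegression().fit(X, y)
print("supervised training accuracy:", clf.score(X, y))

# Unsupervised: k-means sees only X and groups the points on its own
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
sizes = sorted(int((km.labels_ == k).sum()) for k in range(2))
print("cluster sizes found without labels:", sizes)
```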

The steps involved here are:

1. Importing the dependencies/libraries

2. Importing the dataset

3. Splitting the data into training and test sets

Find the missing values:

Fig 1.3 Missing values
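Before imputing, it helps to count missing values per column, which is what the heatmap
in Fig 1.3 visualizes. A minimal pandas sketch on a small hypothetical frame that
borrows column names from the appendix. It fills numeric gaps with the column mean;
numeric_only=True matters because in pandas 2.x a plain data.mean() raises an error
when a text column such as 'Heart Disease' is present.

```python
import numpy as np
import pandas as pd

# Small stand-in frame (hypothetical values) using column names from the appendix
data = pd.DataFrame({
    "Age": [52, np.nan, 61, 45],
    "Cholesterol": [230.0, 250.0, np.nan, 210.0],
    "Heart Disease": ["Presence", "Absence", "Presence", "Absence"],
})

# Count missing values per column
print(data.isnull().sum())

# Mean imputation for numeric columns only; the text target column is left untouched
data = data.fillna(data.mean(numeric_only=True))
print(data)
```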

Maximum and Minimum scaling:

Maximum and minimum scaling, also known as min-max scaling, is a form of feature
scaling commonly used as a data preprocessing step in machine learning. Its purpose is
to rescale numerical features (variables) into a specific range, typically 0 to 1, so
that all features share the same scale. This prevents features with large numeric
ranges from dominating the learning process and can improve the performance of
machine learning algorithms that are sensitive to the scale of input features.

Find the Minimum and Maximum Values: For each feature, calculate the minimum
and maximum values across the entire dataset. Let's call these values min_val and
max_val for a particular feature.

Scale the Data: For each data point and each feature, apply the following
transformation to scale it within the desired range:

Scaled Value = (Original Value - min_val) / (max_val - min_val)

This formula scales the original value to a new value between 0 and 1, with 0
corresponding to the minimum value (min_val) and 1 corresponding to the maximum
value (max_val).
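The two steps above translate directly into code. A pure-Python sketch for one feature
stored as a list; the guard for a constant feature (where max_val equals min_val and
the formula would divide by zero) is our own addition.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1] with (v - min_val) / (max_val - min_val)."""
    min_val, max_val = min(values), max(values)
    if max_val == min_val:
        # Constant feature: the formula would divide by zero, so map everything to 0
        return [0.0 for _ in values]
    return [(v - min_val) / (max_val - min_val) for v in values]

cholesterol = [200.0, 250.0, 300.0]  # hypothetical values
print(min_max_scale(cholesterol))  # [0.0, 0.5, 1.0]
```

In practice, min_val and max_val should be computed on the training set only and then
reused for the test set (for example, by fitting scikit-learn's MinMaxScaler on X_train
and calling transform on X_test), so test values may fall slightly outside 0 to 1.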

5. Logistic Regression

o Logistic regression predicts the output of a categorical dependent variable, so the
outcome must be a categorical or discrete value, such as Yes or No, 0 or 1, or True
or False. Instead of returning exactly 0 or 1, however, it returns a probability
between 0 and 1, which is then thresholded to obtain the predicted class.
o Logistic regression is similar to linear regression except in how it is used:
linear regression solves regression problems, whereas logistic regression solves
classification problems.
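To see the probabilistic output described above, the sketch below fits scikit-learn's
LogisticRegression on synthetic data (an assumption, not the project dataset) and
compares predict_proba, which returns one probability per class, with predict, which
returns the hard label.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: 200 samples, 4 features, 2 classes
X, y = make_classification(n_samples=200, n_features=4, random_state=1)
clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(X[:5])  # one row per sample: [P(class 0), P(class 1)]
pred = clf.predict(X[:5])         # hard labels derived from those probabilities
print(proba.round(3))
print(pred)
```

For a two-class problem, predict picks class 1 exactly when its probability exceeds
0.5, so changing the threshold (for example, lowering it to flag more patients with
heart disease) trades precision for recall.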

CHAPTER-6
RESULTS AND DISCUSSION

This project demonstrated the effectiveness of a logistic regression classifier for
predicting heart disease in patients. The choice of classifier can be tailored to the
specific characteristics of the data and to the desired trade-off between precision and
recall. Further enhancements to the model could involve tuning hyperparameters and
trying different feature extraction techniques.

6.1 Results:

Table 6.1 Difference between Logistic Regression and Random Forests results.

Metric       Logistic Regression    Random Forests
Accuracy     0.87                   0.815
Precision    0.85                   0.85
Recall       0.81                   0.809
F1-Score     0.83                   0.829

Based on these results, the logistic regression classifier was selected over the random
forest classifier: it reaches a higher accuracy (0.87 versus 0.815) with comparable
precision, recall and F1-score.
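The four metrics in Table 6.1 follow from the counts of true and false positives and
negatives. The sketch below computes them for binary labels (1 = disease present) on a
hypothetical toy example, then checks that each reported F1-score is the harmonic mean
of the reported precision and recall.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = heart disease present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Hypothetical toy labels: 2 TP, 1 TN, 1 FP, 1 FN
acc, prec, rec, f1 = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(acc, round(prec, 3), round(rec, 3), round(f1, 3))  # 0.6 0.667 0.667 0.667

# F1 is the harmonic mean of precision and recall; check it against Table 6.1
for name, p, r in [("Logistic regression", 0.85, 0.81), ("Random forests", 0.85, 0.809)]:
    print(name, round(2 * p * r / (p + r), 3))  # 0.83 and 0.829
```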

Fig 1.4 Output Snapshot

CHAPTER-7
7.1 CONCLUSION

A cardiovascular disease detection model has been developed using two ML
classification techniques. The system predicts whether a person has cardiovascular
disease from the clinical attributes in their medical record, learning from a dataset of
patients with and without heart disease. The algorithms used to build the model are
logistic regression and a random forest classifier, and the accuracy of the final model
is 87%. More training data is expected to improve the model's ability to predict whether
a given person has heart disease. Computer-aided techniques such as these can produce
predictions quickly and at low cost, and there are many medical databases to which they
can be applied, so they can support doctors in diagnosis and benefit patients. In
conclusion, this project predicts heart disease in patients by cleaning the dataset and
applying logistic regression, achieving an average accuracy of 87%, which is higher than
the 81% accuracy of the previous models. Of the two algorithms used, logistic regression
achieved the higher accuracy.
REFERENCES

1. Krittanawong, C., Zhang, H., Wang, Z., Aydar, M., & Kitai, T. (2017). Artificial
intelligence in precision cardiovascular medicine. Journal of the American College of
Cardiology, 69(21), 2657-2664.

2. Chicco, D., & Jurman, G. (2020). Machine learning can predict survival of patients
with heart failure from serum creatinine and ejection fraction alone. BMC Medical
Informatics and Decision Making, 20(1), 1-15.

3. Raghavendra, U., Fujita, H., & Gudigar, A. (2019). Deep convolution neural
network for accurate diagnosis of glaucoma using digital fundus images. Information
Sciences, 480, 107-117.

4. Bisong, E. (2019). Machine learning for predictive modeling: A case study on
cardiovascular disease. Journal of King Saud University-Computer and Information
Sciences.

5. Madhavan, P., Zhang, L., & Alshammari, F. (2020). Predictive modeling of heart
disease using machine learning techniques. Journal of King Saud University-Computer
and Information Sciences, 32(3), 2454-2461.

6. Khan, J. A., Bhoi, A. K., & Roy, P. P. (2019). A novel hybrid feature selection
method for effective prediction of heart disease using heart disease dataset. Applied
Soft Computing, 77, 438-447.

APPENDICES
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

data = pd.read_csv('/content/Heart_Disease_Prediction.csv')

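print(data.columns)

print(data.head())  # View the first few rows of data

data.info()  # Print data types and non-null counts (info() prints directly and returns None)

# Fill missing values in numeric columns with the column mean;
# numeric_only=True skips text columns such as 'Heart Disease'
data.fillna(data.mean(numeric_only=True), inplace=True)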

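from sklearn.preprocessing import StandardScaler

# Standardize the continuous features to zero mean and unit variance.
# Note: fitting the scaler on the full dataset before the train/test split lets
# test-set statistics leak into training; fitting on X_train only is stricter.
columns_to_scale = ['Age', 'BP', 'Cholesterol', 'Max HR', 'ST depression',
                    'Number of vessels fluro']
scaler = StandardScaler()
data[columns_to_scale] = scaler.fit_transform(data[columns_to_scale])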

# Keep the 13 clinical features together with the target column
data = data[['Age', 'Sex', 'Chest pain type', 'BP', 'Cholesterol',
             'FBS over 120', 'EKG results', 'Max HR', 'Exercise angina',
             'ST depression', 'Slope of ST', 'Number of vessels fluro', 'Thallium',
             'Heart Disease']]

# Features (X) and target variable (y)
X = data.drop(columns=['Heart Disease'])  # Features (exclude the target variable)
y = data['Heart Disease']

29
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
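train_test_split with test_size=0.2 shuffles the rows and holds out ceil(0.2 * n) of them for testing. The sketch below illustrates that idea with a hypothetical split_indices helper; it uses NumPy's default_rng rather than scikit-learn's internal generator, so with seed 42 it does not select the same rows as random_state=42.

```python
import math
import numpy as np

def split_indices(n_samples, test_size=0.2, seed=42):
    """Shuffle row indices and hold out ceil(test_size * n_samples) of them as the test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = math.ceil(test_size * n_samples)
    return order[n_test:], order[:n_test]  # (train indices, test indices)

train_idx, test_idx = split_indices(10)
print(len(train_idx), len(test_idx))
```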

# Checking whether the column names are the same
if list(X_train.columns) == list(X_test.columns):
    print("Column names are consistent.")
else:
    print("Column names are not consistent.")

# Checking data types
if X_train.dtypes.equals(X_test.dtypes):
    print("Data types are consistent.")
else:
    print("Data types are not consistent.")

# Checking the number of columns
if X_train.shape[1] == X_test.shape[1]:
    print("Number of columns is consistent.")
else:
    print("Number of columns is not consistent.")

# Checking for missing values
if X_train.isnull().sum().sum() == 0 and X_test.isnull().sum().sum() == 0:
    print("No missing values in either dataset.")
else:
    print("Missing values detected in one or both datasets.")

# Checking the data type of the target variable
# (y still holds the text labels here, so this reports "not consistent")
if y_train.dtypes == 'int64' and y_test.dtypes == 'int64':
    print("Target variable data type is consistent (integer).")
else:
    print("Target variable data type is not consistent.")

# Checking unique values in the target variable
unique_values_train = y_train.unique()
unique_values_test = y_test.unique()

if len(unique_values_train) == 2 and len(unique_values_test) == 2:
    print("Target variable has consistent unique values.")
else:
    print("Target variable has inconsistent unique values.")

print(y_train.unique())
print(y_test.unique())

target_mapping = {'Absence': 0, 'Presence': 1}


y_train = y_train.map(target_mapping)
y_test = y_test.map(target_mapping)

print(y_train.unique())
print(y_test.unique())

y_train = y_train.astype(int)
y_test = y_test.astype(int)
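Series.map with a dictionary turns any label that is not a key (a typo or a different capitalisation, for example) into NaN, and the later astype(int) then fails with a less helpful error. A minimal sketch of a stricter mapping step, using a hypothetical encode_target helper and made-up labels, reports the offending labels instead:

```python
import pandas as pd

target_mapping = {'Absence': 0, 'Presence': 1}

def encode_target(labels, mapping):
    """Map text labels to integers; raise if any label is missing from the mapping."""
    encoded = labels.map(mapping)  # labels not in the mapping become NaN
    unknown = labels[encoded.isna()].unique()
    if len(unknown) > 0:
        raise ValueError(f'Unmapped target labels: {list(unknown)}')
    return encoded.astype(int)

y_example = pd.Series(['Presence', 'Absence', 'Presence'])
print(encode_target(y_example, target_mapping).tolist())  # [1, 0, 1]
```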

import matplotlib.pyplot as plt
import seaborn as sns

# Creating heatmap for missing values
plt.figure(figsize=(10, 6))
sns.heatmap(data.isnull(), cmap='viridis', cbar=False)
plt.title('Missing Values Heatmap')
plt.show()
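The heatmap shows missing cells visually; the same check can be reported as numbers with isnull().sum(). A minimal pandas sketch, assuming a small made-up frame in place of the real CSV:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the heart-disease data (values are made up)
df = pd.DataFrame({
    'Age': [63, np.nan, 45],
    'BP': [145, 130, np.nan],
    'Heart Disease': ['Presence', 'Absence', 'Absence'],
})

missing_per_column = df.isnull().sum()  # number of NaN cells in each column
print(missing_per_column)
print('Total missing:', int(missing_per_column.sum()))
```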

from sklearn.preprocessing import LabelEncoder

# Encode the target column in `data`: LabelEncoder sorts the classes,
# so 'Absence' -> 0 and 'Presence' -> 1.
# Note: y was extracted from `data` before this step, so y itself
# still holds the text labels 'Absence'/'Presence'.
label_encoder = LabelEncoder()
data['Heart Disease'] = label_encoder.fit_transform(data['Heart Disease'])

# Select columns with non-numeric data types (e.g., object, category)
non_numeric_columns = data.select_dtypes(exclude=['number'])

# Display the list of columns with non-numeric data
print(non_numeric_columns.columns)

# Check the encoded target for missing values and list its distinct values
print(data['Heart Disease'].isna().sum())
print(data['Heart Disease'].unique())

data['Heart Disease'] = data['Heart Disease'].astype(int)
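LabelEncoder learns the sorted list of distinct labels (its classes_ attribute) and replaces each label with its position in that list, which is why 'Absence' becomes 0 and 'Presence' becomes 1. A plain-Python sketch of the same rule, using a hypothetical label_encode helper:

```python
def label_encode(values):
    """Assign integer codes in sorted order of the distinct labels."""
    classes = sorted(set(values))
    index = {label: code for code, label in enumerate(classes)}
    return classes, [index[v] for v in values]

classes, codes = label_encode(['Presence', 'Absence', 'Absence', 'Presence'])
print(classes)  # ['Absence', 'Presence']
print(codes)    # [1, 0, 0, 1]
```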
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Re-split from X and y; y holds the text labels, so 'Presence' is the positive class
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# max_iter raised from the default (100) to give the lbfgs solver room
# to converge on the unscaled features
logistic_regression_model = LogisticRegression(max_iter=1000)
logistic_regression_model.fit(X_train, y_train)

y_pred = logistic_regression_model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, pos_label='Presence')
recall = recall_score(y_test, y_pred, pos_label='Presence')
f1 = f1_score(y_test, y_pred, pos_label='Presence')

# Print the evaluation metrics
print(f'Accuracy: {accuracy:.2f}')
print(f'Precision: {precision:.2f}')
print(f'Recall: {recall:.2f}')
print(f'F1-Score: {f1:.2f}')
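With pos_label='Presence', these metrics reduce to ratios of confusion-matrix counts: precision = TP / (TP + FP), recall = TP / (TP + FN) and F1 = 2 * precision * recall / (precision + recall). A plain-Python sketch of those formulas follows, with a hypothetical classification_metrics helper and made-up labels; it returns 0.0 when a denominator is zero, which matches scikit-learn's default of returning 0 (with a warning) in that case.

```python
def classification_metrics(y_true, y_pred, pos_label='Presence'):
    """Accuracy, precision, recall and F1 computed from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == pos_label and p == pos_label for t, p in pairs)  # true positives
    fp = sum(t != pos_label and p == pos_label for t, p in pairs)  # false positives
    fn = sum(t == pos_label and p != pos_label for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

y_true = ['Presence', 'Presence', 'Absence', 'Absence', 'Presence']
y_pred = ['Presence', 'Absence', 'Absence', 'Presence', 'Presence']
print(classification_metrics(y_true, y_pred))
```

Here TP = 2, FP = 1 and FN = 1, so precision, recall and F1 all equal 2/3, while accuracy is 3/5.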

# Cross-validation
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

# A separate, unfitted estimator for cross-validation, so the model fitted
# above is not overwritten before it is used for prediction
cv_model = LogisticRegression(max_iter=1000)

# Perform 5-fold cross-validation on the training set
cv_scores = cross_val_score(cv_model, X_train, y_train, cv=5)

# Print the cross-validation scores
print("Cross-Validation Scores:", cv_scores)

# Calculate and print the mean accuracy of cross-validation
mean_accuracy = cv_scores.mean()
print("Mean Accuracy:", mean_accuracy)
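When cv is an integer and the estimator is a classifier, cross_val_score splits the data with StratifiedKFold (without shuffling by default), so each fold keeps roughly the class ratio of y_train, and the mean accuracy is the plain average of the fold scores. The sketch below shows a simplified stratified assignment that deals each class's rows round-robin across the folds; stratified_folds and cv_splits are hypothetical helpers, and the fold membership is not identical to scikit-learn's.

```python
from collections import defaultdict

def stratified_folds(labels, k=5):
    """Assign each row index to one of k folds, dealing each class's rows round-robin."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        for i in by_class[label]:
            folds[position % k].append(i)
            position += 1
    return [sorted(fold) for fold in folds]

def cv_splits(labels, k=5):
    """Yield (train_indices, validation_indices), using each fold once for validation."""
    folds = stratified_folds(labels, k)
    for j, val in enumerate(folds):
        train = sorted(i for m, fold in enumerate(folds) if m != j for i in fold)
        yield train, val

labels = ['Absence'] * 6 + ['Presence'] * 4
for train, val in cv_splits(labels, k=2):
    print(len(train), len(val), [labels[i] for i in val])
```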

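# Building a predictive system for a single patient
# Assumption: the original input tuple began with an extra value (44) that does not
# match any feature column; the 13 values below are assumed to follow the column
# order of X ('Age', 'Sex', 'Chest pain type', 'BP', 'Cholesterol', ..., 'Thallium').
input_data = (59, 1, 3, 126, 218, 1, 0, 134, 0, 2.2, 2, 1, 6)

# One-row DataFrame with the training column names (avoids the feature-name warning)
input_row = pd.DataFrame([input_data], columns=X.columns)

prediction = logistic_regression_model.predict(input_row)
print(prediction)
# The model was trained on the text labels, so it predicts 'Absence' or 'Presence'
if prediction[0] == 'Absence':
    print('The person does not have heart disease.')
else:
    print('The person has heart disease.')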
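For a binary problem, LogisticRegression scores each row with a linear function w·x + b; predict returns the second entry of classes_ (here 'Presence', since scikit-learn sorts the class labels) when that score is positive, which is the same as the sigmoid probability exceeding 0.5. The NumPy sketch below reproduces that rule for one row; the three features, the weights and the bias are hypothetical values, not the coefficients learned by the model above.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_label(x, weights, bias, classes=('Absence', 'Presence')):
    """Return (label, P(classes[1])) for one feature row."""
    z = float(np.dot(weights, x) + bias)  # linear decision score
    p_positive = sigmoid(z)
    return (classes[1] if z > 0 else classes[0]), p_positive

# Hypothetical coefficients for [Age, Sex, Max HR], not taken from the trained model
weights = np.array([0.04, 0.8, -0.03])
bias = -1.0
label, p = predict_label(np.array([59, 1, 134]), weights, bias)
print(label, round(p, 3))
```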
