
HEART AILMENT PREDICTION USING MACHINE LEARNING METHODS


Abhigyan Hedau (1), Himanshu Naidu (2), Mrunali Gadekar (3),
Riddhi Mirajkar (4), Shreyash Chaple (5)

(1) B.Tech Information Technology, Vishwakarma Institute of Information Technology, Pune
    abhigyan.22010904@viit.ac.in
(2) B.Tech Information Technology, Vishwakarma Institute of Information Technology, Pune
    himanshu.22010394@viit.ac.in
(3) B.Tech Information Technology, Vishwakarma Institute of Information Technology, Pune
    mrunali.22010831@viit.ac.in
(4) Faculty, Vishwakarma Institute of Information Technology, Pune
    riddhi.mirajkar@viit.ac.in
(5) B.Tech Information Technology, Vishwakarma Institute of Information Technology, Pune
    shreyash.22010681@viit.ac.in

Abstract-- The heart is one of the most vital organs of the body, and
diagnosing cardiovascular disease is a difficult but critical task. By
extracting knowledge about the disease from patient data, data mining offers
a practical way to help doctors detect disorders. We apply a variety of
machine learning methods: Logistic Regression, Support Vector Classifier
(SVC), K-Nearest Neighbors (KNN) Classifier, Decision Tree Classifier,
Random Forest Classifier, and Gradient Boosting Classifier. These algorithms
are applied to patient data containing 13 different factors to build a
system that predicts heart disease in less time and with greater accuracy.

Keywords: Logistic Regression, Support Vector Classifier, K-Nearest
Neighbours, Decision Tree, Random Forest, Gradient Boosting

I. INTRODUCTION

Rheumatic heart disease is linked to around 2% of cardiovascular
disease-related fatalities worldwide. The terms "cardiovascular disease" and
"heart disease" are sometimes used interchangeably. Heart attacks, chest
pain (angina), strokes, and other illnesses caused by restricted or
obstructed blood vessels are together referred to as cardiovascular disease.
A typical symptom of heart disease is angina: chest pain or discomfort that
arises when the heart muscle does not receive enough oxygen- and
nutrient-rich arterial blood. Some people feel tightness or a squeezing
sensation around the breastbone. The pain may also radiate to the neck,
shoulder blades, upper arms, upper abdomen, and upper back.

The heart, the most important organ in the human body, controls blood flow
throughout the body. Other body parts may suffer if there is any kind of
heart function
impairment. Heart disease is currently the leading cause of death.
According to estimates from the World Health Organization (WHO), almost 12
million people die from heart disease each year, and the WHO estimates that
this figure will rise to almost 23.6 million by 2030 [6].

Signs of this illness include dizziness, ankle swelling, shortness of
breath, slow or irregular heartbeats, fainting, lightheadedness, pain in the
neck, jaw, or throat, and dullness, weakness, or coldness in parts of the
body.

Heart disease can be prevented if it is detected early. Providing
high-standard services and an early, correct diagnosis in less time is the
healthcare industry's key problem. The extensive application of machine
learning, which produces favorable and highly accurate results in medical
diagnostics, can have a positive impact on the healthcare sector.

The goal of this study is to find the best algorithm for heart disease
prediction.

Algorithms such as Random Forest, Logistic Regression, K-Nearest Neighbors
Classifier, Support Vector Classifier, Decision Tree Classifier, and
Gradient Boosting Classifier are used to build a classification model that
diagnoses heart disease in patients [11].

The algorithms are applied to a dataset, and the accuracy levels of the
results are compared. Machine learning readily handles the complex task of
decision making on discrete data. By identifying hidden patterns, machine
learning (ML) analyzes the provided data. The result is a tool that enables
medical professionals to diagnose patients quickly, treat them effectively,
and prevent negative outcomes [1] [4] [14].

Machine learning is growing at a fast pace in industries such as
healthcare, transportation, finance, agriculture, cybersecurity, and
marketing. Computer analysis can reduce manual error and increase the
accuracy and efficiency of a system.

The need for near-100% accuracy and for the reduction of human error is
greatest in the healthcare industry.

II. LITERATURE SURVEY

In [1], a system was built to predict the possibility of heart disease in
patients. The prediction is expressed as a percentage using the Decision
Tree and K-Nearest Neighbor algorithms, taking into consideration vital
factors including pressure level, gender, age, cholesterol, chest pain,
resting blood pressure, fasting blood sugar, chest pain type, and
electrocardiographic result [1].

A study on the prediction of cardiovascular disease compared the accuracies
of two algorithms and their hybrid. The study concluded that the Decision
Tree algorithm had a 79%
accuracy rate, the Random Forest algorithm had an 81% accuracy rate, and
their hybrid model had an 88% accuracy rate [2].

Ekta Maini et al. developed a machine learning model for effective and early
cardiovascular disease prediction. Using several algorithms, the study,
carried out in India, took into account eleven related factors of a subject
and concluded that the accuracy of logistic regression is 90.8%, the
specificity of KNN is 87.1%, and the specificity of the AB model is 93.1%
[3].

Rati Goel et al. briefly compared the effectiveness of six algorithms
(Support Vector Machine, Random Forest, Naïve Bayes, Decision Tree, Logistic
Regression, and K-Nearest Neighbor) in order to find the algorithm best
suited to detecting heart disease. The reported accuracies were: Logistic
Regression 77%, KNN 82%, SVM 86%, Naïve Bayes 68%, Decision Tree 83%, and
Random Forest 83%. According to the analysis, Support Vector Machine is the
best algorithm for early heart disease prediction [4].

Santhana Krishnan used Decision Tree and Naive Bayes classification models.
After these two supervised data mining algorithms were applied to the
dataset, the Naive Bayes classifier was found to have an accuracy of 87% in
predicting heart disease patients, and the Decision Tree model an accuracy
of 91% [5].

Mohd Faisal Ansari studied how attributes affected the outcomes of a
logistic regression model. He used a variety of models: logistic (all
attributes), logistic (most significant attributes), logistic (removing the
least significant attribute), SVM, and logistic (removing the least
significant attribute) with PCA. The last of these gave 86% accuracy, 68%
recall, 69% specificity, 77% precision, and an F1 score of 72%; the study's
findings indicate that logistic regression with PCA performed best [6].

T. Marikani reviewed various studies to find the algorithm best suited to
heart disease prediction. The algorithms under scrutiny were supervised
learning algorithms: Decision Tree, Naïve Bayes, Random Forest, KNN, and
Support Vector Machine. According to the study, the accuracy of the
algorithms varied depending on the implementation tools and attributes used
[7].

V. V. Ramalingam carried out a comprehensive comparison of methodologies for
heart disease prediction, covering algorithms and techniques such as
Decision Tree, Support Vector Machine, Naïve Bayes, Random Forest, K-Nearest
Neighbour, and ensemble models. The study concluded that each of the
above-mentioned algorithms performed well in some cases but poorly
in other cases. Models based on Naïve Bayes classifiers were quite quick and
performed well. In most cases, SVM performed well [8].

Pooja Anbuselvan evaluated the performance of a number of machine learning
algorithms, including Naive Bayes, Logistic Regression, K-Nearest Neighbor,
Decision Tree, Random Forest, and Support Vector Machine, and found Random
Forest and XGBoost to be the most effective, scoring 86.89% and 78.69%
respectively. The least accurate algorithm was K-Nearest Neighbor, at 57.83%
[9].

III. PROPOSED SYSTEM

The main goal of this paper is to estimate the likelihood that a patient
will develop heart disease, and data mining is crucial to achieving this
goal. This research uses a heart disease dataset with 13 factors: gender,
age, exercise-induced angina, resting blood pressure, cholesterol, fasting
blood sugar, chest pain, thalassemia, results of resting
electrocardiography, maximum heart rate reached, ST depression brought on by
exercise in comparison to rest, slope, and number of major vessels. The
system employs a classification technique.

A. Architecture Diagram

[Figure: architecture diagram]

B. Data source

The dataset used in the prediction process was obtained from the machine
learning repository at the University of California, Irvine. The dataset
consists of 1026 instances, each with 13 medical factors that are suitable
for prediction [15].

C. Steps Involved for Modeling of Dataset

1. User Input: The user is prompted to fill in the required parameters via
   an interface; these values act as the input to the prediction system.
2. Test data: The data entered by the user is checked for correctness and
   relevance before it is passed to the system.
3. Train data: The part of the dataset that is fed to the machine learning
   model so that it can discover meaningful patterns.
4. System for predicting diseases: The trained algorithms and classifiers
   are used to make a prediction about the user input. The output is
   binary: Yes or No, indicating whether heart disease is present.

IV. RESULT ANALYSIS

The dataset is divided into training data and test data. After
preprocessing, the classification techniques Logistic Regression, Gradient
Boosting, SVC, Decision Tree, K-Nearest Neighbors, and Random Forest are
applied. K-Nearest Neighbors, Decision Tree, and Logistic Regression achieve
accuracy rates of 73.77%, 77.04%, and 78.68%, respectively. Random Forest
has an accuracy rate of about 83.60%. Both the Support Vector Machine and
Gradient Boosting algorithms yield an accuracy of 80.32%.

Classification Algorithms and Accuracy

Algorithm                 Accuracy (%)
K Nearest Neighbor        73.77
Decision Tree             77.04
Logistic Regression       78.68
Gradient Boosting         80.32
Support Vector Machine    80.32
Random Forest             83.60

The UI consists of a form in which the user enters the values of the factors
that were considered for training the model.
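The form-based flow (the user enters the 13 factors, the values are checked for correctness, and the model's 0/1 output is turned into a message) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the field names are assumed to follow the column names of the public dataset, and the value ranges, the `validate` and `handle_submission` helpers, and the placeholder `stub_model` are all hypothetical choices made for the example.

```python
# Illustrative sketch only: field names, ranges, and the stub model are assumptions.
FIELD_RANGES = {
    "age": (1, 120),        # age in years
    "sex": (0, 1),          # coded gender
    "cp": (0, 3),           # chest pain type
    "trestbps": (50, 250),  # resting blood pressure
    "chol": (100, 600),     # cholesterol
    "fbs": (0, 1),          # fasting blood sugar flag
    "restecg": (0, 2),      # resting electrocardiography result
    "thalach": (50, 250),   # maximum heart rate reached
    "exang": (0, 1),        # exercise-induced angina
    "oldpeak": (0, 10),     # ST depression relative to rest
    "slope": (0, 2),        # slope
    "ca": (0, 4),           # number of major vessels
    "thal": (0, 3),         # thalassemia code
}


def validate(form):
    """Return the 13 values as floats in a fixed order, or raise ValueError."""
    values = []
    for name, (low, high) in FIELD_RANGES.items():
        if name not in form:
            raise ValueError(f"missing field: {name}")
        try:
            value = float(form[name])
        except (TypeError, ValueError):
            raise ValueError(f"{name} must be numeric") from None
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
        values.append(value)
    return values


def message_for(prediction):
    """Map the binary model output to the message shown to the user."""
    return "Possibility of Heart Disease" if prediction == 1 else "No Heart Disease"


def handle_submission(form, predict):
    """Validate the form, call the model, and return the display message.

    `predict` is any callable that maps a list of 13 numbers to 0 or 1.
    """
    return message_for(predict(validate(form)))


def stub_model(features):
    """Placeholder for a trained classifier; NOT a medical rule."""
    return 1 if features[4] > 240 else 0  # index 4 is "chol" in FIELD_RANGES


example_form = {
    "age": "54", "sex": "1", "cp": "0", "trestbps": "130", "chol": "260",
    "fbs": "0", "restecg": "1", "thalach": "150", "exang": "0",
    "oldpeak": "1.2", "slope": "1", "ca": "0", "thal": "2",
}
print(handle_submission(example_form, stub_model))
```

Passing the model in as a callable keeps the input checking independent of whichever classifier is finally chosen, so any trained model with the same input order could replace `stub_model`.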

After the user has entered values for the fields, the interface shows
“Possibility of Heart Disease” if the model returns 1 for those values;
otherwise it shows “No Heart Disease”.

V. CONCLUSION

Our research focuses on using various machine learning techniques to predict
heart disease. We assess these algorithms on a set of indicators that can be
used to determine whether a patient has heart disease, and the research
shows how several machine learning algorithms perform in predicting
cardiovascular disease. The classification procedures used in the study were
implemented in Python. According to the results above, the Random Forest
Classifier is the best-performing technique of those examined, with an
accuracy rate of 83.60%. The average accuracy across the six algorithms is
about 78.96%. K-Nearest Neighbors is the least accurate algorithm, with an
accuracy of 73.77%. Machine learning can thus be used effectively to predict
cardiac illness earlier and lower the death rate.
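The accuracy figures summarised here rest on a hold-out evaluation: part of the data is set aside, a classifier is fitted on the rest, and accuracy is the share of held-out cases predicted correctly. A minimal sketch of that procedure, using a small hand-written K-Nearest Neighbors classifier, is given below. It is an illustration under stated assumptions, not the study's code: the data are synthetic (two artificial groups of 13-value rows), and the 80/20 split, `k=3`, and all function names are choices made for the example. The last lines recompute the best, worst, and mean of the six accuracies reported in the table in Section IV.

```python
import random
from collections import Counter
from math import dist


def train_test_split(rows, labels, test_fraction=0.2, seed=42):
    """Shuffle the row indices and hold out a fraction of the rows for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train_x = [rows[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    test_x = [rows[i] for i in test_idx]
    test_y = [labels[i] for i in test_idx]
    return train_x, train_y, test_x, test_y


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training rows closest to the query row."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


# Synthetic data (NOT the paper's dataset): two well-separated groups,
# each row holding 13 numeric values to mirror the 13 factors.
rng = random.Random(0)
rows = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(30)]
rows += [[rng.gauss(10.0, 1.0) for _ in range(13)] for _ in range(30)]
labels = [0] * 30 + [1] * 30

train_x, train_y, test_x, test_y = train_test_split(rows, labels)
predictions = [knn_predict(train_x, train_y, row) for row in test_x]
toy_accuracy = accuracy(test_y, predictions)
print(f"Toy hold-out accuracy: {toy_accuracy:.2f}%")

# Accuracies (%) as reported in the table in Section IV.
reported = {
    "K-Nearest Neighbors": 73.77,
    "Decision Tree": 77.04,
    "Logistic Regression": 78.68,
    "Gradient Boosting": 80.32,
    "Support Vector Machine": 80.32,
    "Random Forest": 83.60,
}
best = max(reported, key=reported.get)
worst = min(reported, key=reported.get)
mean_accuracy = sum(reported.values()) / len(reported)
print(f"Best: {best}, worst: {worst}, mean: {mean_accuracy:.2f}%")
```

Because the synthetic groups are far apart, the toy accuracy says nothing about real patient data; the point is only to show where an accuracy percentage comes from and how the summary figures follow from the table.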

VI. FUTURE SCOPE

Advanced techniques such as deep learning can be applied to push the
accuracy of the system closer to 100%. With better ML systems in the
healthcare sector, we can reduce the human error factor and increase the
accuracy of prediction for conditions such as heart disease, liver disease,
diabetes, and tumors.

VII. REFERENCES

[1] Arul Jothi, K., S. Subburam, V. Umadevi and K. Hemavathy. "Heart disease
prediction system using machine learning." Materials Today: Proceedings
(2021): n. pag.

[2] M. Kavitha, G. Gnaneswar, R. Dinesh, Y. R. Sai and R. S. Suraj, "Heart
Disease Prediction using Hybrid machine Learning Model," 2021 6th
International Conference on Inventive Computation Technologies (ICICT),
2021, pp. 1329-1333, doi: 10.1109/ICICT50816.2021.9358597.

[3] Maini, Ekta & Venkateswarlu, Bondu & Maini, Baljeet & Marwaha, Dheeraj.
(2021). Machine learning–based heart disease prediction system for Indian
population: An exploratory study done in South India. Medical Journal Armed
Forces India. 77. 10.1016/j.mjafi.2020.10.013.

[4] Goel, Rati, Heart Disease Prediction Using Various Algorithms of Machine
Learning (July 12, 2021). Proceedings of the International Conference on
Innovative Computing & Communication (ICICC) 2021, Available at SSRN:
https://ssrn.com/abstract=3884968 or http://dx.doi.org/10.2139/ssrn.3884968

[5] J, Santhana & S, Geetha. (2019). Prediction of Heart Disease Using
Machine Learning Algorithms. 1-5. 10.1109/ICIICT1.2019.8741465.

[6] Ansari, M.F., Alankar, B., Kaur, H. (2021). A Prediction of Heart
Disease Using Machine Learning Algorithms. In: Chen, J.IZ., Tavares,
J.M.R.S., Shakya, S., Iliyasu, A.M. (eds) Image Processing and Capsule
Networks. ICIPCN 2020. Advances in Intelligent Systems and Computing, vol
1200. Springer, Cham. https://doi.org/10.1007/978-3-030-51859-2_45

[7] Marikani, T. and K. Shyamala. "Prediction of Heart Disease using
Supervised Learning Algorithms." International Journal of Computer
Applications 165 (2017): 41-44.

[8] V V Ramalingam, 2018, Heart disease prediction using machine learning
techniques: A survey, International Journal of Engineering Research &
Technology.

[9] Pooja Anbuselvan, 2020, Heart Disease Prediction using Machine Learning
Techniques, International Journal of Engineering Research & Technology
(IJERT) Volume 09, Issue 11 (November 2020).

[10] Manjula P, Aravind U R, Darshan M V, Halaswamy M H, Hemanth E, 2022,
Heart Attack Prediction Using Machine Learning Algorithms, International
Journal of Engineering Research & Technology (IJERT) ICEI – 2022 (Volume 10
– Issue 11).

[11] Harshit Jindal, Sarthak Agrawal, Rishabh Khera, Rachna Jain and Preeti
Nagrath, Heart disease prediction using machine learning algorithms.

[12] Pabitra Kumar Bhunia, Arijit Debnath, Poulami Mondal, Heart Disease
Prediction using Machine Learning, International Journal of Engineering
Research & Technology (IJERT) ISSN: 2278-0181 Published by, www.ijert.org
NCETER - 2021 Conference Proceedings.

[13] Senthilkumar Mohan, Chandrasegar Thirumalai and Gautam Srivastava,
"Effective Heart Disease Prediction Using Hybrid Machine Learning
Techniques" IEEE Access 2019.

[14] Ch Anwar ul Hassan, Muhammad Sufyan Khan, Munan Ali Shah, Comparison of
Machine Learning Algorithms in Data Classification, in: 2018 Proceedings of
the 24th International Conference on Automation and Computing, Newcastle
University, Newcastle Upon Tyne, UK. IEEE, 2018.

[15] https://www.kaggle.com/johnsmith88/heart-disease-dataset
