
Prediction of Heart Disease Using Machine Learning

Zainab A. Alsyydi
PG Student
Department of Computer Science and Information Technology
Thamar University, Yemen
Zaidalsyyadi666@gmail.com

Dr. Basheer M. Almaqaleh
Asst. Professor
Department of Computer Science and Information Technology
Thamar University, Yemen

Keywords: Machine Learning, Decision Tree Classifier

I. INTRODUCTION

Data mining is the extraction of information and knowledge from huge amounts of data, and it is an essential step in discovering knowledge from databases. There are numerous databases, data marts, and data warehouses all over the world. Data mining is mainly used to extract hidden information from large databases, and it is also called Knowledge Discovery in Databases (KDD).

The healthcare domain is one of the prominent research fields in the current scenario, given the rapid improvement of technology and data. It is difficult to handle the huge amount of patient data; Big Data Analytics makes this data easier to handle. There are many procedures for the treatment of multiple diseases across the world. Machine learning is an emerging approach that helps in the prediction and diagnosis of disease. This paper describes the prediction of disease based on symptoms using machine learning. Machine learning algorithms such as Naive Bayes, Decision Tree, and Random Forest are applied to the provided dataset to predict the disease. The implementation is done in the Python programming language. The research identifies the best algorithm based on accuracy, where the accuracy of an algorithm is determined by its performance on the given dataset.

Today, heart failure diseases affect more people worldwide than autoimmune conditions. Cardiovascular diseases (CVDs) affect the heart and obstruct blood flow through the blood vessels. Chronic ailments in CVD include heart disease (heart attack), cerebrovascular diseases (strokes), congestive heart failure, and many more pathologies. Worldwide, CVDs kill around 17 million people a year, and death rates due to heart diseases have increased after the COVID-19 pandemic. CVDs commonly occur when the heart fails to supply blood under normal conditions, leading to high blood pressure, diabetes, chest pain, or other cardiac disorders. Initially, hospitals use regular tools like ECGs to determine and evaluate the stage of the disease. Better technologies like MCGs help in detecting these diseases at an early stage. However, such devices and technologies are not only expensive and unsuited to smaller clinics for heart disease diagnosis but also time-consuming and sensitive to tangential causes. Thus, technicians and clinicians would greatly benefit from an autonomous system that could test for heart disease and warn of CVD risk at an early stage. With AI entering every other industry as computer science advances, medical science perhaps has the widest scope of need and use for artificial intelligence solutions, especially in developing countries.

Risk factors considered in manual heart disease prediction may include physical inactivity, unhealthy eating habits, or even the consumption of alcohol. Preexisting conditions, age, chest pain level, blood test results, and several such factors can be combined computationally for heart disease prediction. With such well-defined parameters and the rise of data science, a data-driven approach can help in heart disease prediction using machine learning technologies. Early identification of heart disease and of high-risk individuals using a prediction model can be recommended to reduce the fatality rate and to improve decision-making for further treatment and prevention.

Machine learning (ML) is a branch of artificial intelligence (AI) that is increasingly utilized within the field of cardiovascular medicine. It is essentially how computers make sense of data and decide or classify a task with or without human supervision. The conceptual framework of ML is based on models that receive input data (e.g., images or text) and, through a combination of mathematical optimization and statistical analysis, predict outcomes (e.g., favorable, unfavorable, or neutral). Several ML algorithms have been applied to daily activities.
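As a minimal sketch of how classifiers can be compared by accuracy on a held-out split, the following pure-Python example trains a Gaussian Naive Bayes model and a one-level decision tree (a decision stump, standing in here for a full decision tree) on synthetic data. The data, the 80/20 split, and all class and function names are assumptions made for illustration; this is not the paper's dataset or implementation.

```python
import math
import random


def make_toy_data(n=200, seed=0):
    """Synthetic two-feature records (illustrative only, not patient data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 3.0 if label else 0.0  # class 1 has higher feature means
        rows.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    return rows


def split_rows(rows, test_ratio=0.2, seed=0):
    """Shuffle and split into (train, test)."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = len(rows) - round(len(rows) * test_ratio)
    return rows[:cut], rows[cut:]


class GaussianNaiveBayes:
    def fit(self, rows):
        self.stats = {}
        for label in {y for _, y in rows}:
            xs = [x for x, y in rows if y == label]
            cols = list(zip(*xs))
            means = [sum(c) / len(c) for c in cols]
            variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                         for c, m in zip(cols, means)]
            log_prior = math.log(len(xs) / len(rows))
            self.stats[label] = (log_prior, means, variances)
        return self

    def predict(self, x):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                for v, m, var in zip(x, means, variances))
        return max(self.stats, key=log_posterior)


class DecisionStump:
    """Depth-1 decision tree for 0/1 labels: one feature, one threshold."""

    def fit(self, rows):
        best = None
        for f in range(len(rows[0][0])):
            for t in sorted({x[f] for x, _ in rows}):
                for above in (0, 1):  # label predicted when x[f] > t
                    correct = sum(
                        (above if x[f] > t else 1 - above) == y
                        for x, y in rows)
                    if best is None or correct > best[0]:
                        best = (correct, f, t, above)
        _, self.feature, self.threshold, self.above = best
        return self

    def predict(self, x):
        return self.above if x[self.feature] > self.threshold else 1 - self.above


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model.predict(x) == y for x, y in rows) / len(rows)


train, test = split_rows(make_toy_data())
scores = {
    "naive_bayes": accuracy(GaussianNaiveBayes().fit(train), test),
    "decision_stump": accuracy(DecisionStump().fit(train), test),
}
best_model = max(scores, key=scores.get)
print(scores, "best:", best_model)
```

Accuracy is measured on the held-out rows only, so the model with the higher score is the one that generalizes better on this toy split; on real patient data the ranking may differ.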
Heart disease describes a range of conditions that affect the heart. Today, cardiovascular diseases are the leading cause of death worldwide, with 17.9 million deaths annually according to World Health Organization reports [1]. Various unhealthy conditions increase the risk of heart disease, such as high cholesterol, obesity, increased triglyceride levels, and hypertension [1]. The American Heart Association [2] lists certain signs, such as sleep issues, an increase or decrease in heart rate (irregular heartbeat), swollen legs, and in some cases rapid weight gain, which can be 1-2 kg daily [3]. All these symptoms also resemble those of other conditions, such as those that occur in aging persons, so obtaining a correct diagnosis becomes difficult, which can result in fatality in the near future. As time passes, however, a large amount of research data and hospital patient records is becoming available. There are many open sources for accessing patient records, and research can be conducted so that various computer technologies can be used to diagnose patients correctly and detect this disease before it becomes fatal.

Nowadays it is well known that machine learning and artificial intelligence play a huge role in the medical industry. We can use different machine learning and deep learning models to diagnose the disease and classify or predict the results. A complete genomic data analysis can easily be done using machine learning models. Models can be trained for pandemic predictions, and medical records can be transformed and analyzed more deeply for better predictions.

The main methodology used for prediction is KNN algorithms, decision trees such as CART, C4.5, CHAID, J48, and ID3, and Naive Bayes techniques. This system uses 13 medical attributes as input and, with that input, processes the data with data mining techniques and shows the most accurate one, since previous studies used three main data mining techniques in their work, namely Decision Tree, Neural Networks, and Naïve Bayes Classifier. The main task of data prediction is done using these three techniques. However, there are many disadvantages in these studies, reflected in medical misdiagnoses, which are a serious risk to the healthcare profession. If they continue, people will fear going to the hospital for treatment. We can put an end to medical misdiagnosis by informing the public and filing claims and suits against the medical practitioners at fault. Most of these studies are theoretical analyses at the macro level, and there is a lack of quantitative investigation.

Data mining is a knowledge discovery technique to analyze data and encapsulate it into useful information. The current research intends to predict the probability of getting heart disease given a patient data set. Prediction and description are the principal goals of data mining; in practice, prediction in data mining involves using attributes or variables in the data set to find an unknown or future state.

In this system, a heart disease data set is used. The main aim of this system is to predict the possibility of heart disease occurring in patients, in terms of percentage. This is performed through data mining classification techniques. The classification technique is used for classifying the entire dataset into two categories, namely Yes and No. The classification technique is applied to the dataset through machine learning classification algorithms, namely the Decision Tree and Naïve Bayes classification models. These models are used to enhance the accuracy level of the classification technique. This model performs both the classification and prediction methods. These models are implemented in the Python programming language, which benefits us through enhanced visualization and ease of interpretation.

Heart disease is very fatal and should not be taken lightly. Heart disease occurs more in males than in females, as can be read further in Harvard Health Publishing [15]: researchers found that, throughout life, men were about twice as likely as women to have a heart attack, and that higher risk persisted even after they accounted for traditional risk factors of heart disease, including high cholesterol, high blood pressure, diabetes, body mass index, and physical activity. Researchers are working on this dataset as it contains certain important parameters like dates from 1998, and it is considered one of the benchmark datasets for work on heart disease prediction.

II. LITERATURE SURVEY

A quite significant amount of work on the diagnosis of cardiovascular heart disease using machine learning techniques has motivated this study. Efficient cardiovascular heart disease prediction has been achieved using many techniques, including KNN and the Random Forest Classifier. It can be seen in the results that each technique has its strengths in meeting the defined objectives.

This paper predicts heart disease for male patients using classification techniques. Detailed information about coronary heart diseases, such as facts, common types, and risk factors, is explained in this paper. The data mining tool used is WEKA (Waikato Environment for Knowledge Analysis), a good data mining tool for bioinformatics fields. All three interfaces available in WEKA are used here. Naive Bayes, Artificial Neural Networks, and Decision Tree (J48) are the main data mining techniques, and through these techniques heart disease is predicted in this system [4].

Through this paper, information about data mining and heart diseases has been gathered. Detailed information about heart diseases, symptoms of heart attack, and heart disease types is presented in this paper. The three main data mining techniques, namely Decision Tree, Neural Networks, and Naive Bayes Classifier, are used. The main task of data prediction is done using these three techniques. The advantages and disadvantages of each technique can be learned from this paper [5].

This system predicts the likelihood of heart disease arising. The outcomes of this system provide the chances of heart disease occurring in terms of percentage. The datasets used are classified in terms of medical parameters. This system evaluates those parameters using a data mining classification technique. The datasets are processed in Python
programming using two main machine learning algorithms, namely the Decision Tree algorithm and the Naive Bayes algorithm, which shows the better of the two in terms of accuracy on heart disease [6].

With the rampant increase in heart stroke rates at juvenile ages, a system needs to be put in place that can detect the symptoms of a heart stroke at an early stage and thus prevent it. It is impractical for a common man to frequently undergo costly tests like the ECG, so there needs to be a system that is handy and at the same time reliable in predicting the chances of heart disease. The authors therefore propose an application that can predict vulnerability to heart disease given basic symptoms like age, sex, and pulse rate. The neural network machine learning algorithm has proven to be the most accurate and reliable algorithm and is hence used in the proposed system [7].

Heart disease is the leading cause of death worldwide. This system predicts the likelihood of heart disease arising. The outcomes of this system provide the chances of heart disease occurring in terms of percentage. The datasets used are classified in terms of medical parameters. This system evaluates those parameters using a data mining classification technique. The datasets are processed in Python using two main machine learning algorithms, namely the Decision Tree algorithm and the Naive Bayes algorithm, which shows the better of the two in terms of accuracy on heart disease [8].

This study proposes an efficient neural network with convolutional layers to classify significantly class-imbalanced clinical data. The data is curated from the National Health and Nutritional Examination Survey (NHANES) with the goal of predicting the occurrence of Coronary Heart Disease (CHD). While the majority of the existing machine learning models that have been used on this class of data are vulnerable to class imbalance even after the adjustment of class-specific weights, the authors' simple two-layer CNN exhibits resilience to the imbalance with fair harmony in class-specific performance. Given a highly imbalanced dataset, it is often challenging to simultaneously achieve a high class 1 accuracy (true CHD prediction rate) along with a high class 0 accuracy as the test data size increases. A two-step approach is adopted: first, least absolute shrinkage and selection operator (LASSO) based feature weight assessment is employed, followed by majority-voting based identification of important features. Next, the important features are homogenized by using a fully connected layer, a crucial step before passing the output of the layer to successive convolutional stages. A training routine per epoch, akin to a simulated annealing process, is also proposed to boost the classification accuracy [9].

This paper presents a real-time system for predicting heart disease from medical data streams that describe a patient's current health status. The main goal of the proposed system is to find the optimal machine learning algorithm that achieves high accuracy for heart disease prediction. Two types of feature selection algorithms, univariate feature selection and Relief, are used to select important features from the dataset. Four types of machine learning algorithms are compared: Decision Tree, Support Vector Machine, Random Forest Classifier, and Logistic Regression Classifier, with the selected features as well as the full features. Hyperparameter tuning and cross-validation are applied to enhance accuracy. One core merit of the proposed system is that it can handle Twitter data streams that contain patients' data efficiently. This is done by integrating Apache Kafka with Apache Spark as the underlying infrastructure of the system. The results show that the random forest classifier outperforms the other models, achieving the highest accuracy at 94.9% [10].

To accurately diagnose patients with Parkinson's disease (PD), a machine-learning-based prediction system was proposed. In the development of the proposed system, the support vector machine (SVM) was used as a predictive model for the prediction of PD. L1-norm SVM feature selection was used to select appropriate and highly related features for accurate classification of PD patients and healthy people. The L1-norm SVM produced a new subset of features from the PD dataset based on a feature weight value. For the validation of the proposed system, the K-fold cross-validation method was used. In addition, performance metrics such as accuracy, sensitivity, specificity, precision, F1 score, and execution time were computed for model performance evaluation. The PD dataset was used in this paper. The optimal accuracy was achieved on the best subset of the selected features, which might be due to the various contributions of the PD features. The experimental findings of this paper suggest that the proposed method can be used to accurately predict PD and can be easily incorporated in healthcare for diagnosis purposes. Currently, computer-based assisted predictive systems play an important role in assisting PD recognition. In addition, the proposed approach fills a gap in feature selection and classification using voice recording data by properly matching the experimental design [11].

The purpose of this study is to increase the accuracy of coronary heart disease diagnosis by selecting significant predictive features in order of their ranking. An integrated method using machine learning is proposed. The machine learning methods of random trees (RTs), decision tree of C5.0, support vector machine (SVM), and decision tree of Chi-squared automatic interaction detection (CHAID) are used in this study. The proposed method shows promising results, and the study confirms that the RTs model outperforms the other models [12].

Data mining is a useful tool for discovering knowledge from large data. Different methods and algorithms are available in data mining. Classification is the most common method used for finding the mine rule from a large database. The decision tree method is generally used for classification because its simple hierarchical structure aids user understanding and decision making. Various data mining algorithms are available for classification, based on Artificial Neural Networks, the Nearest Neighbour Rule, and Bayesian classifiers, but decision tree mining is a simple one. The ID3 and C4.5 algorithms were introduced by J. R. Quinlan and produce reasonable decision trees. The objective of this paper is to present these algorithms. At first we present the classical
algorithm, ID3; then we discuss C4.5 in more detail, which is a natural extension of the ID3 algorithm. We also compare these two algorithms with other algorithms such as C5.0 and CART [13].

This study gathered data from a variety of sources and divided it into two parts: 80% for the training dataset and 20% for the test dataset. Different classifier methods were used to improve accuracy, which was then summarized. The methods in question are Random Forest Classifier, Decision Tree Classifier, Support Vector Machine, k-nearest neighbor, Logistic Regression, and Naive Bayes. SVM, Logistic Regression, and KNN all performed as well as or better than the other methods. This research offers a development in which fundamental attributes such as sex, glucose, blood pressure, heart rate, and others are used to determine which factors are prone to heart disease. The paper's next aim is to conduct real-life tests using various equipment and clinical trials [14].

In this paper, three methods were proposed, for which comparative analysis was done and promising results were achieved. The conclusion was that machine learning algorithms performed better in this analysis. Many researchers have previously suggested the use of ML [16].

III. HEART DISEASE

The heart is a vital organ of the human body. If the heart does not perform its operation properly, it will affect other organs such as the kidneys and the brain. According to statistical data from the WHO, about one-third of deaths worldwide are due to heart disease; heart disease was found to be the leading cause of death in developing countries by 2017. The heart pumps blood through the blood vessels of the circulatory system. Blood provides the body with oxygen and nutrients and assists in the removal of metabolic wastes. If the blood supply in the body is insufficient, many organs such as the cerebrum suffer, and if the heart stops working, death occurs within minutes.

Heart disease risk factors include:
• High cholesterol
• High blood pressure
• Diabetes
• Smoking
• Consuming too much alcohol
• Being overweight or obese
• Family history of coronary illness

Symptoms of heart attack:
• Shortness of breath
• Pain and discomfort in the chest
• Pain that may spread to the left or right arm or to the neck, jaw, back, or stomach
• Fatigue
• Cold sweat and unsteadiness
• Rapid or irregular heartbeat
• Heartburn or abnormal pain

Types of cardiovascular disease:
• Coronary artery disease
• Cardiac arrest
• Congestive heart failure
• Stroke
• Left, right heart failure
• Acute, chronic heart failure
• Systolic, diastolic heart failure

IV. METHODOLOGY

The health care field has a vast amount of data, and certain techniques are used to process it. Data mining is one of the techniques often used. Heart disease is the leading cause of death worldwide. This system predicts the likelihood of heart disease arising. The outcomes of this system provide the chances of heart disease occurring in terms of percentage. The datasets used are classified in terms of medical parameters. This system evaluates those parameters using a data mining classification technique. The datasets are processed in Python using two main machine learning algorithms, namely the Decision Tree algorithm and the Naïve Bayes algorithm, which shows the better of the two in terms of accuracy on heart disease.

The proposed methodology includes the following steps: the first step is the collection of the data; the second step extracts significant values; the third is the preprocessing step, where we explore the data. Data preprocessing deals with missing values, cleaning of data, and normalization, depending on the algorithms used. After pre-processing, each classifier is used to classify the pre-processed data. Finally, the proposed model is applied and evaluated on the basis of accuracy and performance using various performance metrics. In this model, an effective Heart Disease Prediction System has been developed using different classifiers. This model uses 17 medical features, such as the attributes in the table below, for prediction.

Attribute | Description | Possible Values
Patient Id | Dummy identification of the patient |
Gender | | Male, Female
Age | Youth = 30-39, Young Adult = 40-49, Adult = 50-59, Old People = 60-69 | Young, Young Adult, Adult, Old
Chest Pain Type | Stable Angina: predictable chest pain; Unstable Angina: chest pain that signals impending heart attack; Prinzmetal's Angina: have coronary artery disease | Stable angina, Non-angina, Unstable angina, Prinzmetal's angina, Asymptomatic
Cholesterol | Low-density lipoproteins (LDL) (Bad Cholesterol), High-density lipoproteins (HDL) (Good Cholesterol) | LDL, HDL
Smoking | | Yes, No
Blood Sugar | If Blood Sugar level is > 120 mg/dl, increases the risk | True, False
Blood Pressure | Normal (systolic < 139 mmHg), Prehypertension (systolic > 140 mmHg), High (systolic > 160 mmHg) | Normal, Prehypertension, High
ECG | Normal, ST-T wave abnormality, Left Ventricular Hypertrophy (LVH) (electrocardiographic results) | Normal, Abnormal
Diet | | Healthy, Unhealthy
Alcohol | | True, False

V. CONCLUSION

In this paper, two supervised data mining algorithms were applied to the dataset to predict the possibility of a patient having heart disease, using the classification models Naïve Bayes Classifier and Decision Tree classification. These two algorithms are applied to the same dataset in order to identify the better algorithm in terms of accuracy.

REFERENCES

[1] World Health Organization, Cardiovascular Diseases, WHO, Geneva, Switzerland, 2020, https://www.who.int/health-topics/cardiovascular-diseases/#tab=tab_1.
[2] American Heart Association, Classes of Heart Failure, American Heart Association, Chicago, IL, USA, 2020, https://www.heart.org/en/health-topics/heart-failure/what-is-heart-failure/classes-of-heart-failure.
[3] American Heart Association, Heart Failure, American Heart Association, Chicago, IL, USA, 2020, https://www.heart.org/en/health-topics/heart-failure
[4] Gandhi, Monika, and Shailendra Narayan Singh. "Predictions in heart disease using techniques of data mining." In 2015 International Conference on Futuristic Trends on Computational Analysis and Knowledge Management (ABLAZE), pp. 520-525. IEEE, 2015.
[5] Rairikar, A., Kulkarni, V., Sabale, V., Kale, H., & Lamgunde, A. (2017, June). Heart disease prediction using data mining techniques. In 2017 International Conference on Intelligent Computing and Control (I2C2) (pp. 1-8). IEEE.
[6] Santhana Krishnan J. and Geetha S., "Prediction of Heart Disease Using Machine Learning Algorithms," IEEE 2019 1st International Conference on Innovations in Information and Communication Technology (ICIICT), 2019.
[7] A. Gavhane, G. Kokkula, I. Pandya and K. Devadkar, "Prediction of Heart Disease Using Machine Learning," 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), 2018, pp. 1275-1278, doi: 10.1109/ICECA.2018.8474922
[8] S. K. J. and G. S., "Prediction of Heart Disease Using Machine Learning Algorithms," 2019 1st International Conference on Innovations in Information and Communication Technology (ICIICT), 2019, pp. 1-5, doi: 10.1109/ICIICT1.2019.8741465
[9] Aniruddha Dutta, Tamal Batabyal, Meheli Basu, Scott T Acton, "An efficient convolutional neural network for coronary heart disease prediction," Expert Systems with Applications 159, 113408, 2020.
[10] Hager Ahmed, Eman MG Younis, Abdeltawab Hendawi, Abdelmgeid A Ali, "Heart disease identification from patients' social posts, machine learning solution on Spark," Future Generation Computer Systems 111, 714-722, 2020.
[11] Amin Ul Haq, Jian Ping Li, Muhammad Hammad Memon, Asad Malik, Tanvir Ahmad, Amjad Ali, Shah Nazir, Ijaz Ahad, Mohammad Shahid, "Feature selection based on L1-norm support vector machine and effective recognition system for Parkinson's disease using voice recordings," IEEE Access 7, 37718-37734, 2019.
[12] Javad Hassannataj Joloudari, Edris Hassannataj Joloudari, Hamid Saadatfar, Mohammad Ghasemigol, Seyyed Mohammad Razavi, Amir Mosavi, Narjes Nabipour, Shahaboddin Shamshirband, Laszlo Nadai, International Journal of Environmental Research and Public Health 17 (3), 731, 2020.
[13] Badr Hssina, Abdelkarim Merbouha, Hanane Ezzikouri, Mohammed Erritali, "A comparative study of decision tree ID3 and C4.5," TIAD Laboratory, Computer Sciences Department, Faculty of Sciences and Techniques, Sultan Moulay Slimane University, Beni-Mellal, BP: 523, Morocco.
[14] G. R. Rao, Shourya Khujneri, Atul Kumar Tomar, Rudraksh Sharma, "Heart Disease Prediction Using Machine Learning," Department of Computer Engineering, Bharati Vidyapeeth Deemed to be University College of Engineering, Pune, India, IJIRT, Volume 9, Issue 1, June 2022, ISSN: 2349-6002.
[15] Aniruddha Dutta, Tamal Batabyal, Meheli Basu, Scott T Acton, "An efficient convolutional neural network for coronary heart disease prediction," Expert Systems with Applications 159, 113408, 2020.
[16] Rohit Bharti, Aditya Khamparia, Mohammad Shabaz, Gaurav Dhiman, Sagar Pande, and Parneet Singh, "Prediction of Heart Disease Using a Combination of Machine Learning and Deep Learning," Computational Intelligence and Neuroscience, Volume 2021, Article ID 8387680, 11 pages.
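The attribute encodings listed in the Methodology table (age bands, blood pressure levels, and the blood sugar flag) can be sketched as small helper functions. This is a minimal sketch in Python, not the authors' implementation: the function names and the record keys are hypothetical, ages outside 30-69 return None, and because the table leaves 139-140 mmHg unassigned, readings up to and including 140 mmHg are treated as Normal here.

```python
def age_band(age):
    """Map an age in years to the band used in the attribute table."""
    if 30 <= age < 40:
        return "Youth"
    if 40 <= age < 50:
        return "Young Adult"
    if 50 <= age < 60:
        return "Adult"
    if 60 <= age < 70:
        return "Old"
    return None  # assumption: the table defines no band outside 30-69


def blood_pressure_level(systolic_mmhg):
    """Map systolic pressure to Normal / Prehypertension / High."""
    if systolic_mmhg > 160:
        return "High"
    if systolic_mmhg > 140:
        return "Prehypertension"
    # assumption: the unassigned 139-140 mmHg gap is treated as Normal
    return "Normal"


def high_blood_sugar(mg_per_dl):
    """True when blood sugar exceeds 120 mg/dl (the table's risk threshold)."""
    return mg_per_dl > 120


def encode_patient(record):
    """Encode a raw record (hypothetical keys) into categorical features."""
    return {
        "age_band": age_band(record["age"]),
        "blood_pressure": blood_pressure_level(record["systolic"]),
        "high_blood_sugar": high_blood_sugar(record["blood_sugar"]),
    }


example = encode_patient({"age": 62, "systolic": 150, "blood_sugar": 110})
print(example)
```

Encoded records of this kind are the sort of categorical input that a Decision Tree or Naïve Bayes classifier can consume directly.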
