
Dynamic Heart Disease Prediction using Multi-Machine Learning Techniques


Shaik Farzana
Department of CSE
Lakireddy Bali Reddy College of Engineering (A)
Mylavaram, Krishna, Andhra Pradesh
shaik.farzana52@gmail.com

Duggineni Veeraiah
Department of CSE
Lakireddy Bali Reddy College of Engineering (A)
Mylavaram, Krishna, Andhra Pradesh
veeraiahdvc@gmail.com

Abstract— The health care field generates enormous amounts of data, and processing it requires advanced techniques that yield effective results and support effective decisions. Heart disease is a leading health problem and one of the biggest causes of death worldwide. In this paper, an effective heart disease prediction framework is implemented using machine learning algorithms: Gaussian Naïve Bayes, Random Forest, K-Nearest Neighbour, Support Vector Machine, and XGBoost. The framework uses 13 features such as age, gender, blood pressure, cholesterol, obesity, cp, etc. It is a user-friendly system that works in phases.

In the first phase, we upload the dataset file and select the algorithms to run on it. The accuracy of each selected algorithm is then computed and shown in a graph, and a model is generated by training the algorithm with the highest accuracy on the dataset. In the next phase, a value is entered for each heart parameter and, based on the generated model, the diseased stage of the heart is predicted. Precautions can then be taken based on the condition of the patient. Our strategy is effective in predicting the heart illness of a patient. The Heart Disease Prediction Framework developed here is a distinct methodology that can be used in the classification of heart disease.

Keywords— Data Mining, Machine Learning, Gaussian Naive Bayes, Support Vector Machine (SVM), Random Forest, K-Nearest Neighbour, XGBoost.

I. INTRODUCTION

In today's world, among various life-threatening diseases, heart disease is one of the most complex and deadly, and it affects both middle-aged and elderly people. Providing a computerized prediction of a patient's heart condition, so that further treatment can be made effective, is a difficult task. Heart illness is usually diagnosed from its signs, symptoms, and a physical examination of the patient. The risk of the disease is increased by factors such as cholesterol level, high blood pressure, smoking, obesity, family history of coronary illness, and lack of physical exercise [1]. Because of these limitations, researchers have turned towards current methodologies like Data Mining and Machine Learning for predicting the disease.

Some of the investigative techniques used to predict heart disease are complex, and this complexity significantly affects patients' daily lives. Hence the treatment of heart disease is difficult, particularly in developing countries, because of the scarce availability of diagnostic equipment and the shortage of doctors and other resources, which affects the proper prediction and treatment of heart patients.

Data analysis has proved its importance in the medical field. It is the basis for making complex decisions. Above all, this analysis helps keep human bias out of medical conclusions by supporting treatment with proper measurements. By using different data mining techniques, we can explore huge amounts of information. Data Mining is growing in a wide range of health science applications. High accuracy and low-cost health care services can be achieved by using Data Mining classification methods and heart disease prediction frameworks.

The health care industry holds a huge volume of information that contains hidden patterns useful for making effective decisions and producing precise results. Data mining methods are used to improve the experience and the conclusions drawn from this information.

Machine Learning has proved effective in supporting decisions and predictions from the enormous amount of data produced by the health care industry. Non-invasive medical decision support systems based on machine learning predictive models such as Support Vector Machine, Naïve Bayes, Logistic Regression, K-nearest neighbour, and others have been used by many researchers for heart illness prediction. As a result, a reduction in the proportion of deaths from heart disease through machine learning based frameworks has been noted in multiple research studies [2].

This study aims to help reduce the burden of heart disease. We propose a multi-classifier model for better prediction accuracy, reliability, and scalability. The proposed prediction framework is user-interface based and can be improved further as requirements change.

II. RELATED WORK

Heart disease is a major concern in the present world. Risk parameters such as blood pressure and cholesterol make it difficult to manage. For these reasons, researchers have concentrated on present-day approaches such as Data Mining and Machine Learning for predicting disease outcomes.

Data Mining and Machine Learning are valuable for a wide range of problems. These techniques predict a dependent variable from the values of independent variables. Without them, health care administration must manually handle a tremendous amount of information, which makes it a natural application area for data mining. Heart disease has been identified as one of the biggest causes of death in countries all over the world.

978-1-7281-9180-5/20/$31.00 ©2020 IEEE

Authorized licensed use limited to: UNIVERSITY OF WESTERN ONTARIO. Downloaded on May 26,2021 at 10:48:55 UTC from IEEE Xplore. Restrictions apply.
For the prediction of heart illness, various Data Mining techniques have been introduced. A wide range of health care problems are solved using Machine Learning and Data Mining techniques, which is among their greatest strengths. Machine learning techniques play a more effective role in the prediction of heart disease than previously used approaches such as clustering. The existing literature plays an important role in understanding these problems and in making proper decisions to solve them.

The paper in [1] reported 100% accuracy using neural networks. The paper in [2] used the Relief feature selection (FS) algorithm with the best accuracy. Polaraju and Durga Prasad [4] proposed prediction of heart disease using multiple regression with 13 different attributes. K. Gomathi Kamaraj [3] proposed disease prediction using multiple machine learning algorithms. Jaymin Patel [5] and Ashwini Shetty [6] developed prediction frameworks for heart illness using datasets containing a few patient factors.

Megha Shahi and Kaur [7] proposed a heart disease prediction framework that uses machine learning models. R. Sharmila and S. Chellammal [8] proposed a technical model for the prediction of heart disease. These are some of the reference papers published by various researchers and students, each with its own methods and procedures for predicting heart disease and giving the required suggestions.

III. DATASET DESCRIPTION

In this project, the parameters we have taken are age, gender, chol, obesity, cp, trestbps, fbs, restecg, thalach, exang, oldpeak, slope, ca, and thal. We use the standard heart disease dataset from UCI, which contains 304 records, to predict heart illness. These parameters are described in Table 1.

The Heart Predictor Framework uses data mining to give a user-oriented way to discover new and hidden patterns in the data. The resulting knowledge can be used by health care specialists to improve the quality of service and to reduce the extent of adverse medicine effects.

TABLE 1: PARAMETERS USED FOR PROJECT

S No | Parameter | Values
1 | Age (in years) | Continuous
2 | Gender (male or female) | 1 - male, 0 - female
3 | Trestbps (resting blood pressure) | Continuous value
4 | Cp (type of chest pain) | Chest pain type, values from 1 to 4: 1 - typical angina, 2 - atypical angina, 3 - non-anginal pain, 4 - asymptomatic
5 | fbs (fasting blood sugar) | Measured in mg/dL: <=100 - normal, 110 to 125 - pre-diabetes, >=126 - diabetes
6 | restecg (ECG result) | 0 - normal, 1 - having ST-T, 2 - hypertrophy
7 | thal (Heart rate of patient) | Takes the following values: 3 - normal, 6 - fixed defect, 7 - reversible defect
8 | chol (serum cholesterol) | Continuous value
9 | exang (exercise-induced angina) | 0 - no, 1 - yes
10 | thalach (max heartbeat rate) | Continuous value
11 | Oldpeak (ST depression) | Continuous value
12 | Ca (number of major vessels colored by fluoroscopy) | Number of major vessels, from 0 to 3
13 | Slope (peak slope) | Takes values from 1 to 3

IV. PROPOSED METHODOLOGY

Heart disease prediction describes the state of the heart. Heart disease is a major cause of death for both men and women and one of the biggest problems in the world, so its prediction has become a subject of data analysis. When the amount of data is huge, it is difficult for the health care industry to handle. Data Mining and Machine Learning take in this huge amount of data and convert it into information that helps in making predictions and taking the corresponding decisions.

Because the problem is complex, most heart disease prediction depends on pathological information. The different algorithms we use have different accuracies. The complexity of these problems has raised researchers' interest in the prediction of diseases, especially those of the heart. Among all the algorithms, we must pick one based on our problem. In this paper, the heart illness prediction system we have built helps predict the condition of the heart from the clinical data given by patients. Our approach consists of three parts, described below. The accuracy of prediction is near 88%.

We have developed a framework for heart disease prediction that is easy to understand. The framework provides an accuracy chart, as shown in Fig. 3, and a graph plotted from the clinical input given on the user interface. The approach we have proposed is effective in predicting the heart disease of a patient. The framework developed in this study describes a methodology that can be used in the classification of coronary disease.

The process of our methodology is as follows:

A. Phase 1 - Data Preprocessing:

In the first phase of our process, the dataset taken from UCI, which consists of 304 records, is used as input. It is uploaded for pre-processing. Checkboxes list the algorithm models used in this methodology.
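The upload and pre-processing step of Phase 1 might look roughly like the following sketch. It is not the authors' implementation: the column names (taken from the parameter names in Table 1), the `target` label column, the rule of dropping incomplete rows, and the 80/20 split are all assumptions made for illustration, and the three sample rows are made up rather than taken from the UCI file.

```python
import csv
import io
import random

# Assumed column names, following the parameter names in Table 1.
FEATURES = ["age", "gender", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]


def load_dataset(handle, target="target"):
    """Read CSV rows into numeric feature vectors and integer labels."""
    X, y = [], []
    for row in csv.DictReader(handle):
        try:
            features = [float(row[name]) for name in FEATURES]
            label = int(row[target])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values
        X.append(features)
        y.append(label)
    return X, y


def split_dataset(X, y, test_ratio=0.2, seed=0):
    """Shuffle the row indices and hold out a fraction of rows for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * (1 - test_ratio))
    train, test = indices[:cut], indices[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


# Made-up rows for illustration only; the last one has a missing "ca" value.
SAMPLE = """age,gender,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
63,1,1,145,233,110,2,150,0,2.3,3,0,6,1
41,0,2,130,204,95,0,172,0,1.4,1,0,3,0
57,1,4,140,192,100,0,148,0,0.4,2,?,6,1
"""

X, y = load_dataset(io.StringIO(SAMPLE))
X_train, y_train, X_test, y_test = split_dataset(X, y)
print(len(X), "usable rows:", len(X_train), "for training,", len(X_test), "for testing")
```

In a real run the `io.StringIO(SAMPLE)` handle would be replaced by the uploaded dataset file opened in text mode.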

The available models are Gaussian Naive Bayes, Support Vector Machine, Random Forest, K-Nearest Neighbour, and XGBoost. For each selected model, the accuracy is calculated on the given dataset. The dataset is used to train every algorithm whose check box is selected.
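The "train every selected model, compute its accuracy, keep the best one" step can be sketched as below. This is a minimal illustration under stated assumptions, not the framework's code: only two of the five models are shown, re-implemented in plain Python so the sketch has no third-party dependencies, and the class names, the `select_best` helper, and the two-feature toy data are hypothetical. The Naive Bayes class scores each class by log P(C_i) plus the sum of per-feature Gaussian log-likelihoods, and the KNN class takes a majority vote among the k training rows closest in Euclidean distance.

```python
import math
from collections import Counter, defaultdict


class GaussianNaiveBayes:
    """Gaussian Naive Bayes: one normal distribution per class and feature."""

    def fit(self, X, y):
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        self.params = {}
        for label, rows in rows_by_class.items():
            prior = len(rows) / len(X)
            stats = []
            for column in zip(*rows):
                mean = sum(column) / len(column)
                # Small constant keeps the variance strictly positive.
                var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
                stats.append((mean, var))
            self.params[label] = (prior, stats)
        return self

    def _log_posterior(self, row, label):
        prior, stats = self.params[label]
        total = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            total -= 0.5 * math.log(2 * math.pi * var) + (value - mean) ** 2 / (2 * var)
        return total

    def predict(self, X):
        return [max(self.params, key=lambda c: self._log_posterior(row, c)) for row in X]


class KNearestNeighbours:
    """KNN: majority vote among the k closest training rows."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            order = sorted(range(len(self.X)), key=lambda i: math.dist(row, self.X[i]))
            votes = Counter(self.y[i] for i in order[: self.k])
            predictions.append(votes.most_common(1)[0][0])
        return predictions


def accuracy(model, X_test, y_test):
    """Fraction of test rows whose predicted label matches the true label."""
    predicted = model.predict(X_test)
    return sum(p == t for p, t in zip(predicted, y_test)) / len(y_test)


def select_best(models, X_train, y_train, X_test, y_test):
    """Train every model, score it on held-out rows, and return the best name."""
    scores = {name: accuracy(model.fit(X_train, y_train), X_test, y_test)
              for name, model in models.items()}
    return max(scores, key=scores.get), scores


# Made-up two-feature toy data, not the UCI records.
X_train = [[1.0, 1.2], [0.8, 1.1], [1.3, 0.9], [5.0, 5.2], [4.8, 5.1], [5.3, 4.9]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[1.1, 1.0], [5.1, 5.0]]
y_test = [0, 1]

models = {"Gaussian Naive Bayes": GaussianNaiveBayes(),
          "K-Nearest Neighbour": KNearestNeighbours(k=3)}
best_name, scores = select_best(models, X_train, y_train, X_test, y_test)
print(best_name, scores)
```

Any object with `fit` and `predict` methods can be added to the `models` dictionary, so library implementations of the remaining models could be compared through the same helper.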

Fig. 1: Pre-processing of data

The algorithms we have considered are Gaussian Naïve Bayes, Support Vector Machine, Random Forest, KNN, and XGBoost. Select the check boxes of the algorithms you want to run on the dataset. After selection, the accuracy of each selected algorithm is computed, and a graph is plotted as shown in Fig. 3. A model is generated for the algorithm with the highest accuracy. Of all the algorithms, Random Forest gives the best accuracy after pre-processing our dataset.

B. Phase 2 – Feature Selection:

After pre-processing, the output obtained is the model generated by the algorithm with the highest accuracy. The input is then provided from the user interface for all 13 parameters, i.e., gender, age, chol, exang, cp, fbs, trestbps, slope, oldpeak, thal, ca, restecg, and thalach. The input is passed to the Python backend, where the model generated in the pre-processing stage is applied to it.

After the model has processed the input, a graph is plotted as shown in Fig. 5. Based on the graph, the diseased stage of the patient is known. Accordingly, the patient should take precautions. This method gives the result with less risk and in less time.

Fig. 2: Data Flow Diagram of feature selection

C. Classification Modelling

In Machine Learning, classification refers to predictive modelling that takes the given information as input and assigns it to one of the data classes. A classifier, or classification model, predicts categorical classes. Machine Learning mainly performs two types of actions: prediction and decision making.

In this paper, we have used some of the classification models of Machine Learning. The models we considered are:

a) Gaussian Naïve Bayes: Naïve Bayes classification is based on Bayes' theorem. The method applies Bayes' rule under the assumption that, given the class, the presence or absence of each feature is independent of the others. It is a robust classifier for predicting heart disease. It computes the probability of each class in order to classify the data. The equation is:

$$P(X_f \mid C_i) = \frac{P(C_i \mid X_f)\, P(X_f)}{P(C_i)}$$

where X_f are the instances and C_i are the class values we consider for predicting.

b) Support Vector Machine: The Support Vector Machine is a classification method that handles both linear and non-linear datasets. It is a predictive data-classification algorithm that assigns new data points to one of the labelled categories. SVM uses kernel functions to classify the instances.

c) Random Forest: Machine Learning algorithms like Random Forest can improve the performance of risk prediction by exploiting large data repositories to identify risk predictors and the increasingly complex interactions between them. In Random Forests, a random selection of features is considered when finding the best split. The Random Forest classifier selects random subsets of the training dataset and builds a set of decision trees. It aggregates the votes of the trees to decide the final class of the test object.

d) K-nearest neighbour: KNN is a supervised classification algorithm. It extracts knowledge based on the sample distance

function and the majority of the k nearest neighbours. It scans the whole dataset to find the k instances nearest to the new instance and then outputs the mode of their classes for a classification problem. In some cases the KNN algorithm does not perform well, that is, it gets lower accuracy than the others.

e) XGBoost: This algorithm uses a gradient boosting framework. In every iteration the error is reduced, as it uses an optimized gradient rather than the classical gradients used by other methods. XGBoost models are reported to perform very well in experimental results in many cases.

V. EXPERIMENTAL RESULTS

The experiment determines whether the patient has heart disease by considering parameters such as age, gender, chol, exang, restecg, thal, slope, ca, oldpeak, trestbps, fbs, cp, and thalach. It is performed by training on the dataset of 304 records with 13 different parameters.

TABLE 2. ALGORITHMS ACCURACY COMPARISON

S No | Classification Algorithms | Accuracy
1 | Gaussian Naïve Bayes | 82.25%
2 | Support Vector Machine | 81.97%
3 | Random Forest | 88.52%
4 | K-Nearest Neighbour | 67.21%
5 | XGBoost | 78.69%

After applying all the classification techniques, Random Forest has an accuracy of 88.52%, which is higher than that of the other models. The accuracies obtained for all the algorithm models are listed in Table 2. These are the accuracies obtained after pre-processing our dataset, and the chart in Fig. 3 shows them for the classification models we selected.

Fig. 3: Implemented algorithms Accuracy Prediction

The chart in Fig. 3 shows each algorithm model along with its accuracy after training on the dataset. The model is generated with the algorithm having the highest accuracy, which is Random Forest. Then, in the next phase, the 13 different attributes are entered through the user interface.

Fig. 4 shows the user interface from which the input is passed to the Python backend. After the input is passed to the backend, a graph is plotted for the given input using the model generated in the first phase, as shown in Fig. 5.

Fig. 4: User-Interface

Fig. 5: Ranking of features and taking precautions

The output graph describes the heart condition, i.e., the diseased stage of the person, and precautions should be taken according to the result.

VI. CONCLUSION

From our studies, we have accomplished our research goals. To predict heart disease, we used different machine learning algorithms. The heart illness dataset and its factors were taken from the UCI Machine Learning Repository, and classification models were applied to it. Our proposed methodology uses five machine learning models, namely Support Vector Machine, Random Forest, KNN, Gaussian Naïve Bayes, and XGBoost, to predict heart disease, retrieve the results in a short period of time, and reduce the expenses for people. We use these algorithms to improve standardization.

Further goals are to evaluate the accuracy and improve the performance of the algorithms in the required domain, and to improve the flexibility and accuracy of the proposed framework through further enhancements.

REFERENCES

[1] M. Marimuthu, M. Abinaya, K.S. Hariesh, K. Madhankumar, A Review on Heart Disease Prediction using Machine Learning and Data Analytics Approach, International Journal of Computer Applications, Vol 181, No. 18, September 2018.
[2] Animesh Hazra, Subrata Kumar Mandal, Amit Gupta, Arkomita Mukherjee and Asmita Mukherjee, Heart Disease Diagnosis and Prediction Using Machine Learning and Data Mining Techniques: A Review, ISSN vol 10, No. 7, pp. 2137-2159.
[3] K. Gomathi Kamaraj, D. Shanmuga Priyaa, Multi Disease Journal of System and Software Engineering, Dec 2016.
[4] K. Polaraju, D. Durga Prasad, Prediction of Heart Disease using Multiple Linear Regression Model, IJEDR, vol 5, ISSN: 2321-9939, 2017.
[5] Jaymin Patel, Prof. Tejal Upadhyay, Dr. Samir Patel, Heart Disease Prediction using Machine Learning and Data Mining Technique, IJCSC, vol 7, pp. 129-137.
[6] Ashwini Shetty A, Chandra Naik, Different Data International Journal of Innovative in Science Engineering and Technology, Vol. 5, pp. 277-281.
[7] Megha Shahi, R. Kaur Gurm, Heart Disease Prediction Computer Science Technology, vol 6, pp. 457-466.
[8] R. Sharmila, S. Chellammal, A conceptual method to enhance the prediction of heart diseases using the data and Engineering, May 2018.
