High Technology Letters ISSN NO : 1006-6748

CARDIAC ATTACK PREDICTION USING MACHINE LEARNING TECHNIQUES

Manu Gupta1, M. Sriniha2, K. Sreehasa3, Bhuvana Chandra4


Department of Electronics and Computer Engineering,
Sreenidhi Institute of Science and Technology, Hyderabad, India

Abstract
According to recent reports, heart disease, especially heart attack, has surpassed cancer as
the world's leading cause of death. Heart illnesses are among the most lethal yet silent
killers of humans, and the death rate of patients increases every year. Machine learning is
playing a major role in predicting the presence or absence of locomotor disorders, lung
diseases, brain tumors, heart diseases and so on, which helps in the early diagnosis and
treatment of patients. In the proposed work, various machine learning algorithms, i.e., KNN
and the Naïve Bayes classifier, are applied for cardiac attack prediction and their
performance is analyzed. This work aims to design a user-friendly Graphical User Interface
(GUI) which predicts cardiac attacks based on patient record data with 14 attributes such as
blood pressure (bp), age, slope and exang. The attributes are recorded as patient input and
the health condition of the patient is predicted as safe or risky. The system also recommends
expert doctors for further consultation.

Keywords: Cardiac Attack, Graphical User Interface, Naïve Bayes Classifier, Random Forest,
Ensemble Learning.

I. INTRODUCTION
Heart illnesses are among the most lethal yet silent killers of humans, and the death rate of
patients increases every year. In 2020, the World Health Organization (WHO) reported that
heart disease is responsible for 17.9 million deaths globally each year. In the healthcare
industry, massive amounts of data of many sorts are created on a daily basis, and gaining
information from these data is critical. This knowledge may be obtained by creating models
from the medical records dataset using various data mining approaches. Machine Learning (ML),
known in the corporate world for its valuable use in controlling, contrasting and managing
large datasets, can be applied with much success to the prediction of cardiovascular diseases.
Doctors cannot go through every minute detail of the data and predict accurately every time;
doing so is time-consuming and risky. This project aims to design a user-friendly Graphical
User Interface (GUI), with machine learning models running in the background, which predicts
cardiac attacks based on patient record data consisting of 14 attributes. The GUI takes the
patient's data as input attributes and outputs whether the health condition of the patient is
predicted to be safe or risky by the various classifiers.

II. LITERATURE REVIEW
In this modern era, we cannot deny the existence of data and its importance everywhere.
Whether you are a buyer or seller, company or customer, doctor or patient, there is a lot of
data involved. All the significant stories live in the data which, if arranged and organized
clearly, can be a powerful means of observation and comprehension. On a daily basis, vast
amounts of data of many forms are created in the health care business, and gaining
information from these data is critical. This knowledge may be obtained by creating models
from the medical records dataset using various data mining approaches [1-4].

Volume 27, Issue 5, 2021 608 http://www.gjstx-e.cn/

Thirumalai et al. [5] reported that it is not feasible to predict heart attack using 12
attributes taken from cutting-edge smartphones. They used the Pearson model to select the
attributes that contribute towards effective prediction of heart attack and finally used
linear regression for predicting heart attacks. Srinivas et al. [6] performed a study on
workers of the Singareni coal mine who died of or suffered from heart attacks. In this study,
15 attributes were used to build a model for predicting heart attack. The study took details
of actual patients for building the model and used data mining for feature selection. Naïve
Bayes and Decision Trees were then used for building the model, which gave results of around
60%. Obasi and Shafiq [7] performed a study on real-time patients to build a model that helps
doctors and patients with better detection and treatment of heart attacks, respectively. In
this study, about 18 attributes that are serious factors in causing heart attacks in patients
were taken. After data pre-processing, a model was built using the Random Forest Classifier,
Naïve Bayes and Logistic Regression, which gave results of around 80%.
Many studies [8-10] have been carried out and many models built for predicting heart attacks,
but none of them has made the most of data visualization, i.e., providing a graphical
representation of the data to learn more about the various approaches. This research takes
maximum advantage of data visualization by plotting the data on different graphs which can be
easily interpreted by doctors and patients alike. A GUI that is user-friendly for both doctors
and patients is created to predict the possibility that the patient may get a heart attack.
This GUI can be used by patients without doctor intervention, by entering patient details
such as blood pressure, exang, age, calcium, etc. In this work, ensemble learning algorithms,
i.e., Random Forest, XGBoost and Soft Voting, are used for data classification. KNN and Naïve
Bayes are also applied and their output is analyzed. The performance of these classifiers is
compared for heart attack prediction.

III. PROPOSED SYSTEM


The flow chart of the proposed system for cardiac attack prediction is shown in Fig. 1. The
proposed system comprises the following stages:

• Historical data: In this study, a dataset from the Kaggle website is used to create a
model with the maximum possible accuracy. It is an actual dataset of 1026 data
samples with 14 different features (13 predictors; 1 target), such as blood pressure,
type of chest discomfort, ECG result, and so on.
• Data Pre-processing: The raw data is prepared for a machine learning model. The
raw data is evaluated and a check is made for overfitting; the cross-validation
method is used in the proposed work for this stage.
• Attribute Selection: A subset of relevant features (variables, predictors) used to
construct a model is selected, such as BP, age, exang, slope, etc.
• Training Data: The model is trained with different algorithms using 75% of the data.
• Testing Data: The model is tested with the remaining 25% of the data and the values
are compared.
• Evaluate Prediction: The predicted values are evaluated against the actual values
using performance metrics like accuracy score, classification report and confusion
matrix.
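
The splitting and evaluation stages above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' notebook code: the function names, the fixed seed and the toy labels are assumptions made for the example, and label 1 is assumed to mean "risky".

```python
import random


def train_test_split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def confusion_matrix(actual, predicted):
    """Return (tp, fp, fn, tn), treating label 1 as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy_and_f1(actual, predicted):
    """Compute accuracy and the F1 score of the positive class."""
    tp, fp, fn, tn = confusion_matrix(actual, predicted)
    accuracy = (tp + tn) / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Toy usage: 20 placeholder records split 75% / 25%.
train_rows, test_rows = train_test_split_rows(list(range(20)))

# Hypothetical actual and predicted labels for four test patients.
accuracy, f1 = accuracy_and_f1([1, 1, 0, 0], [1, 0, 0, 0])
```

Here one risky patient is missed, so the accuracy is 0.75 while the F1 score of the risky class is about 0.67, which is why both metrics are reported side by side.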

Fig. 1. Flow chart of proposed system

3.1. Classifiers Used in Proposed Work


In the proposed work, Random Forest, Naïve Bayes, K-Nearest Neighbor (KNN), Soft
Voting Classifier and XGBoost Classifier are utilized for data classification and cardiac
attack prediction for the patient depending upon the input attributes. The classifiers are
described in detail in the following sections.
3.1.1 Random Forest
Random Forest is a popular supervised machine learning algorithm. It is used for both
classification and regression problems in ML. It is based on the concept of ensemble
learning, which is the process of combining multiple classifiers to solve a complex problem
and to improve the performance of the model. Figure 2 shows a simplified diagram explaining
the working of the random forest classifier.

Fig. 2. Random Forest Classifier
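
The ensemble idea behind random forest can be illustrated with a deliberately simplified sketch. It assumes binary labels 0/1 and, to stay short, uses one-split trees (decision stumps) with one randomly chosen feature per tree, whereas a real random forest grows deeper trees and samples features at every split. All names and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Find the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for left_label in (0, 1):
            errors = sum(
                1
                for row, label in zip(rows, labels)
                if (left_label if row[feature] <= threshold else 1 - left_label) != label
            )
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label = stump
    return left_label if row[feature] <= threshold else 1 - left_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample and random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) indices with replacement.
        indices = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))
        sample_rows = [rows[i] for i in indices]
        sample_labels = [labels[i] for i in indices]
        forest.append(fit_stump(sample_rows, sample_labels, feature))
    return forest


def predict_forest(forest, row):
    """Combine the trees by majority vote."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Toy data with two made-up numeric features per patient.
rows = [(1, 10), (2, 20), (3, 30), (8, 80), (9, 90), (10, 100)]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(rows, labels)
```

Individual stumps trained on an unlucky bootstrap sample can be wrong, but the majority vote over all trees smooths those errors out, which is the motivation for combining multiple classifiers.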


3.1.2 K-Nearest Neighbor (KNN)


K-Nearest Neighbor is one of the simplest machine learning algorithms based on the
supervised learning technique. It assumes similarity between the new case and the
available cases, and puts the new case into the category to which it is most similar.
Since it uses all of the training samples for classification and merely stores the data
in memory, the KNN classifier does not require any advanced training. Since it makes no
assumptions about the underlying data distribution, KNN is a non-parametric algorithm.
This makes it useful for non-linear data.
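
A minimal sketch of the KNN decision rule follows, assuming Euclidean distance and a simple majority vote among the k nearest stored samples. The function name, the choice k = 3 and the two made-up features (age, resting blood pressure) are assumptions for illustration only.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training rows."""
    # "Training" is just storing the data; all work happens at query time.
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical (age, resting blood pressure) pairs; 0 = safe, 1 = risky.
train_rows = [(45, 120), (50, 125), (48, 118), (63, 160), (67, 155), (70, 165)]
train_labels = [0, 0, 0, 1, 1, 1]

risky_case = knn_predict(train_rows, train_labels, (65, 158))
safe_case = knn_predict(train_rows, train_labels, (47, 121))
```

Because the vote depends on raw distances, features on very different scales would normally be standardized first; that step is omitted here for brevity.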
3.1.3 Naïve Bayes Classifier
Naïve Bayes Classifier is one of the simplest and most effective classification algorithms.
It helps in rapidly building machine learning models that can make quick predictions. This
classifier is probabilistic, which means it predicts on the basis of the probability of an
object belonging to a class. Naïve Bayes classifiers are a collection of classification
algorithms based on Bayes' theorem. They share a common principle, namely that each pair
of features being classified is independent of the others. Bayes' theorem is used to
calculate the conditional posterior probability $P(C \mid S)$ of class C for a given data
sample S as follows:

$$P(C \mid S) = \frac{P(S \mid C)\, P(C)}{P(S)} \qquad (1)$$

where $P(S \mid C)$ is the probability of S conditioned on C, $P(S)$ is the probability
of the input data and $P(C)$ is the prior probability of the class.
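
To make the posterior computation concrete, the sketch below applies Bayes' theorem to a toy dataset with two binary features. The independence assumption appears as a product of per-feature likelihoods. Add-one smoothing is an assumption added here so that unseen values do not produce a zero probability; the function names and data are hypothetical.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels):
    """Count class frequencies and per-feature value frequencies per class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for row, label in zip(rows, labels):
        for index, value in enumerate(row):
            value_counts[(label, index)][value] += 1
    return class_counts, value_counts


def posterior(model, row):
    """Return P(C | S) for every class C given the sample S = row."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    joint = {}
    for cls, n_cls in class_counts.items():
        probability = n_cls / total  # prior P(C)
        for index, value in enumerate(row):
            # Likelihood factor P(s_i | C) with add-one smoothing over the
            # two possible values (0 and 1) of each binary feature.
            probability *= (value_counts[(cls, index)][value] + 1) / (n_cls + 2)
        joint[cls] = probability  # P(S | C) * P(C)
    evidence = sum(joint.values())  # P(S)
    return {cls: p / evidence for cls, p in joint.items()}


# Two made-up binary features per patient; 1 = risky, 0 = safe.
rows = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (0, 0)]
labels = [1, 1, 1, 0, 0, 0]
model = fit_naive_bayes(rows, labels)
probabilities = posterior(model, (1, 1))
```

For the sample (1, 1) the joint terms are 0.24 for the risky class and 0.04 for the safe class, so dividing by their sum gives a posterior of 6/7 for risky.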
3.1.4 BAGGING: Soft Voting Classifier
There are two types of voting used to aggregate the output of all the weak learners.

• Hard voting: If the output of each weak learner is a class (e.g., 0/1) and the class
returned by the majority of learners is selected as the final output, the method is
called hard voting.

• Soft voting: Many algorithms also provide prediction probabilities. If the outputs
are probabilities, the probabilities of each class are averaged over the learners and
the class with the highest average is the final prediction. This method is called soft
voting and is better than hard voting, as it uses more information in terms of
probability.
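
The difference between the two voting schemes can be shown with a small sketch. The three learners and their probability outputs below are made up for illustration; the function names are likewise assumptions.

```python
from collections import Counter


def hard_vote(class_predictions):
    """Return the class predicted by the most learners."""
    return Counter(class_predictions).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average each class's probability over learners and pick the highest."""
    n_classes = len(probabilities[0])
    averages = [
        sum(p[cls] for p in probabilities) / len(probabilities)
        for cls in range(n_classes)
    ]
    winner = max(range(n_classes), key=averages.__getitem__)
    return winner, averages


# Hypothetical outputs of three learners for one patient: [P(safe), P(risky)].
learner_probabilities = [[0.55, 0.45], [0.60, 0.40], [0.10, 0.90]]
learner_classes = [0, 0, 1]  # each learner's own most probable class

hard_result = hard_vote(learner_classes)
soft_result, soft_averages = soft_vote(learner_probabilities)
```

Two learners lean weakly towards "safe" while one is strongly confident of "risky", so hard voting returns class 0 but soft voting returns class 1. This is the extra information that probabilities carry.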
3.1.5 BOOSTING: XGBoost
The main idea behind boosting is to convert weak learners to strong learners in sequential
iterations. XGBClassifier is a classification class that works with the scikit-learn API.
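
The sequential idea can be sketched with a generic gradient boosting loop. This is not XGBoost's implementation, which adds regularization and second-order gradient information; it is a plain squared-error variant that assumes a single numeric feature and 0/1 labels. Each round fits a one-split regression stump to the residuals left by the previous rounds. All names and the toy data are hypothetical.

```python
def fit_regression_stump(xs, residuals):
    """Best single-threshold split of a 1-D feature under squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep the right side non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosted(xs, ys, n_rounds=10, learning_rate=0.5):
    """Add stumps one at a time, each correcting the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_regression_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, learning_rate


def predict_boosted(model, x):
    """Sum the shrunken stump outputs and threshold the score at 0.5."""
    base, stumps, learning_rate = model
    score = base + sum(
        learning_rate * (left if x <= threshold else right)
        for threshold, left, right in stumps
    )
    return 1 if score >= 0.5 else 0


# Toy labels that no single threshold can separate.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 1, 1, 0]

weak_predictions = [predict_boosted(fit_boosted(xs, ys, n_rounds=1), x) for x in xs]
strong_predictions = [predict_boosted(fit_boosted(xs, ys, n_rounds=10), x) for x in xs]
```

With one round the last sample is misclassified, because a single stump can only split the feature once. After ten rounds all six samples are classified correctly, which illustrates how weak learners combine into a stronger one.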
3.2. UML Diagrams
UML diagrams define the system's boundary, configuration, and action, as well as the
objects inside it. UML Diagrams for our application are as follows:
3.2.1 Use Case Diagram
A use case describes how a user of a process or system accomplishes a goal. The admin logs
in, manages the dataset and trains on it to obtain the best model, which gives accurate
predictions when used. The doctor is a regular user who uses the GUI to interact with the
model and treats patients based on the predicted values obtained for the given input
attribute values.

Fig. 3. Use Case diagram

3.2.2 Class Diagram


The class diagram describes the structure of the system by depicting the system's classes,
their attributes and operations, and the relationships among objects. We have five classes,
namely Admin, Doctor, Model, Patient and Database (DB).
Patient: The attributes of the patient are patient ID and patient NAME, which are unique
to each patient.
Doctor: The doctor treats the patients. The attributes of the doctor are ID, PWD and
NAME, used for accessing the interface. The operations of the doctor are give_input(),
which gives the patient data as input; view_report(), which shows the patient's report;
write_test(), which prescribes tests if required; and diagonize(), which diagnoses the
patient.
DB: This is the database the doctor uses for accessing patients' reports. Its attributes
are the ID and PWD of the doctor for access, and its operations are analyze_data() and
give_results().
Admin: The admin manages the DB class and the Model class, and performs operations on
the dataset such as add, delete, update and retrieve.
Model Class: This class processes the data, splits it, trains the model with the data and
also tests it.
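
As a rough illustration of how part of this class diagram could map to code, the skeleton below sketches only the Patient and Doctor classes. The attribute and method names follow the text; the method bodies, the in-memory `reports` dictionary (standing in for the DB class) and the example values are assumptions, not the authors' implementation.

```python
class Patient:
    """A patient identified by an ID and a name."""

    def __init__(self, patient_id, name):
        self.patient_id = patient_id
        self.name = name


class Doctor:
    """A doctor who enters patient data and views the stored report."""

    def __init__(self, doctor_id, name, pwd):
        self.doctor_id = doctor_id
        self.name = name
        self.pwd = pwd  # a real system would store a hash, not the password
        self.reports = {}  # hypothetical stand-in for the DB class

    def give_input(self, patient, attributes):
        """Record the attribute values entered for a patient."""
        self.reports[patient.patient_id] = dict(attributes)

    def view_report(self, patient):
        """Return the stored attributes for a patient, or None if absent."""
        return self.reports.get(patient.patient_id)


patient = Patient(1, "A. Patient")
doctor = Doctor(7, "Dr. Example", "secret")
doctor.give_input(patient, {"age": 54, "exang": 1})
report = doctor.view_report(patient)
```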


Fig. 4. Class diagram

3.2.3 Sequence Diagram for Training Model


Sequence diagrams are interaction diagrams that detail how operations are carried out.
They capture the interaction between objects in the context of a collaboration. The
sequence diagram in Fig. 5 explains the step-by-step process of training the model.

Fig. 5. Sequence diagram


3.2.4 Activity Diagrams


The activity diagram depicts the flow of events in the whole system, as shown in Fig. 6.

Fig. 6. Activity diagram

IV. IMPLEMENTATION AND RESULTS


This section explains the implementation procedure for the proposed system and the results
obtained.

4.1 Language/Technology used for implementation


Python is a programming language for developing both desktop and web applications, and it
offers readable code. While complex algorithms stand behind machine learning and AI,
Python's simplicity allows programmers and developers to write reliable systems and to put
all their effort into solving the machine learning problem.

4.2. Procedure Followed


• Step-1: First, the required packages are imported into the Jupyter notebook where the
model is created: pandas, numpy, seaborn, plotly.express, matplotlib.pyplot and pickle,
along with train_test_split from sklearn.model_selection.
• Step-2: The data, stored in CSV form on the device, is read and checked for false or
null values.
• Step-3: The data is split into train and test datasets of 75% and 25%, respectively.
• Step-4: The training data is used for training the different classifiers mentioned in
the previous sections: Random Forest, KNN, Naive Bayes, Soft Voting and XGBoost.
• Step-5: The test data is then used to test how accurately the classifiers predict
unseen data.
• Step-6: The model is then dumped using pickle and loaded again when the doctor enters
the patient's attributes in the GUI to obtain results.
• Step-7: The GUI is created in Python in a file named “app.py”. When executed, it
generates a link to the GUI, where the doctor enters the data. When the attributes are
entered and a prediction is requested, the pickled model loaded in “app.py” is executed
and the results are displayed.
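
Steps 2 and 6 above can be sketched with the standard library alone. The snippet is illustrative only: the CSV excerpt, its column names and the dictionary standing in for a trained model are assumptions, and the authors' notebook would work with pandas and the actual classifier objects instead.

```python
import csv
import io
import pickle

# Hypothetical three-row excerpt standing in for the CSV file; the second
# row has an empty "trestbps" cell to show the missing-value check.
csv_text = "age,trestbps,exang,target\n63,145,0,1\n54,,1,0\n41,130,0,1\n"

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Step-2: collect the indices of rows that contain a missing value.
rows_with_missing = [
    index
    for index, row in enumerate(rows)
    if any(value is None or value.strip() == "" for value in row.values())
]

# Step-6: serialize a stand-in "model" and restore it, as the notebook and
# app.py would do through a file on disk.
model = {"classifier": "knn", "k": 3, "n_training_rows": len(rows)}
payload = pickle.dumps(model)
restored = pickle.loads(payload)
```

Because unpickling can execute arbitrary code, a pickled model should only be loaded from a file the application itself produced or otherwise trusts.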

4.3. Classification Results


The performance of various classifiers for heart attack prediction using proposed system is
summarized in Table 1.

Table 1: Classification Results

S.No.  Classifier                Accuracy (in %)  F1 Score
1      Random Forest Classifier  96.43            0.96
2      KNN Classifier            71.75            0.72
3      Naïve Bayes Classifier    83.44            0.82
4      Soft Voting (Bagging)     92.86            0.93
5      XGBoost (Boosting)        95.45            0.95

4.4. GUI Results


The results obtained from the GUI developed in the proposed system are shown in
Fig. 7 and Fig. 8, which show the input and output interfaces, respectively.

Fig. 7. Input Interface


Fig. 8. (a) Output Interface for Patient with Heart disease risk

Fig. 8. (b) Output Interface for Healthy Patient

V. CONCLUSION AND FUTURE SCOPE


The results obtained show that the proposed system provides an accuracy of 96% in heart
attack prediction using the random forest classifier. This is comparable to the accuracy
achieved using the bagging (92%) and boosting (95%) methods but higher than that
attained with the KNN (71%) and Naïve Bayes (83%) classifiers.


The proposed system can be used to reduce risk and to prevent heart attack or sudden
cardiac death. Using various machine learning algorithms, we have succeeded in
predicting the health condition of the user. We have created a GUI where a patient can
enter 14 attributes of their health and learn their health condition. As prevention is
better than cure, this project focuses on the prevention of cardiac emergencies, which
helps in saving many lives.
This model can be further developed and used in smart watches for predicting heart attack
and raising an alarm in advance to prevent emergencies. Hospitals can also create their
own app or website that includes this model for better treatment of heart attacks. These
developments could reduce deaths due to heart attacks to the greatest possible extent.

References
[1] Takci H. Improvement of heart attack prediction by the feature selection methods.
Turkish Journal of Electrical Engineering & Computer Sciences. 2018 Jan
27;26(1):1-0.
[2] Patil SB, Kumaraswamy YS. Extraction of significant patterns from heart disease
warehouses for heart attack prediction. IJCSNS. 2009 Feb 28;9(2):228-35.
[3] Alexander CA, Wang L. Big data analytics in heart attack prediction. J
Nurs Care. 2017 Apr;6(393):2167-1168.
[4] Rajkumar, A. and Reena, G.S., 2010. Diagnosis of heart disease using datamining
algorithm. Global Journal of Computer Science and Technology, 10(10), pp.38-43.
[5] C. Thirumalai, A. Duba and R. Reddy, "Decision making system using machine
learning and Pearson for heart attack," 2017 International conference of
Electronics, Communication and Aerospace Technology (ICECA), 2017, pp. 206-
210.
[6] K. Srinivas, G. R. Rao and A. Govardhan, "Analysis of coronary heart disease and
prediction of heart attack in coal mining regions using data mining techniques,"
2010 5th International Conference on Computer Science & Education, 2010, pp.
1344-1349.
[7] T. Obasi and M. Omair Shafiq, "Towards comparing and using Machine Learning
techniques for detecting and predicting Heart Attack and Diseases," 2019 IEEE
International Conference on Big Data (Big Data), 2019, pp. 2393-2402.
[8] Jabbar MA, Chandra P, Deekshatulu BL. Cluster based association rule mining
for heart attack prediction. Journal of Theoretical and Applied Information
Technology. 2011 Oct 31;32(2):196-201.
[9] Soni J, Ansari U, Sharma D, Soni S. Predictive data mining for medical diagnosis:
An overview of heart disease prediction. International Journal of Computer
Applications. 2011 Mar 8;17(8):43-8.
[10] Kwon, Jm., Kim, KH., Jeon, KH. et al. Artificial intelligence algorithm for
predicting cardiac arrest using electrocardiography. Scand J Trauma Resusc Emerg
Med 28, 98 (2020).
