
Predicting cardiovascular disease using logistic regression

Shobhit Rana, Student, School of Computing Science and Engineering, Galgotias University, Greater Noida, India, Shobhit.rana09@gmail.com

Nilanjana Pradhan, Assistant Professor, School of Computing Science and Engineering, Galgotias University, Greater Noida, India, nilanjana.pradhan@gmail.com

Abstract—Cardiovascular disease affects people of all ages around us, and with changes in our lifestyle it has become an inevitable part of life. Most cases could be dealt with if the disease could be predicted accurately before it occurs. Our healthcare facilities need to advance so that better decisions can be made about patient diagnosis and treatment, which is necessary for people's well-being. It is therefore necessary to create a cardiovascular disease prediction system that can accurately predict the occurrence of cardiovascular disease, something that may be difficult for a healthcare professional. In this paper, a model is proposed for predicting a person's disease from demographic, behavioural, and medical risk factors. The model uses the logistic regression algorithm to improve the accuracy of the present cardiovascular prediction system. The logistic regression algorithm assigns observations to a discrete set of classes and provides a good level of accuracy. The system accesses a person's health records and uses logistic regression to make its prediction. It can assist healthcare practitioners in making better use of resources and help save countless lives.

Keywords—Cardiovascular disease, healthcare, logistic regression

I. INTRODUCTION
Many patients go untreated or are not treated accurately by doctors, although proper treatment is necessary for people's well-being. Predicting a disease from a patient's symptoms has therefore become an important task. India has a shortage of doctors: there is 1 doctor for every 10,198 people in India, whereas the WHO recommends a ratio of 1:1,000. To address this acute shortage of doctors, a system for predicting common diseases is needed, which would help make proper use of the available resources.

The first step in treating a patient is correctly assessing the individual's health from the given symptoms. Disease prediction has become a vital task lately, but correctly predicting diseases has become too difficult for a doctor. The system proposed in this paper is a disease prediction system built with machine learning. Classification within the prediction system is performed with the logistic regression algorithm. This may support correct prediction of a patient's health and also help in the correct treatment of disease.

When machine learning is used to support patient care, good results are achieved. It has made it easier to identify various types of diseases and to diagnose them accurately. Predictive analysis performed with efficient machine learning algorithms may help predict diseases with high accuracy and help treat patients. The large amount of medical information, containing treatment history and patients' health data, is often used to predict diseases that may occur in the near future. The information hidden within these medical records is used to make effective decisions about the patient's health.

One of the most important applications of machine learning is in the rapidly evolving field of healthcare. When machine learning is employed in healthcare, it helps people process vast and complex disease datasets and turn the analysis into useful clinical insights. Healthcare facilities need to advance so that better decisions can be made about patients' treatment, and medical practitioners can then use these insights to provide accurate treatment. Hence, machine learning, once adopted in healthcare, can result in high patient satisfaction. In this paper, the logistic regression algorithm is used to predict cardiovascular disease from patients' treatment history and health data.

Logistic Regression algorithm

Logistic regression takes its name from the logistic function, also called the sigmoid function, which maps any real-valued input to a value between 0 and 1 and is easy to represent graphically. It can also achieve high accuracy. In this algorithm, the data is first imported and then used to train the model. It can be considered a form of regression analysis that predicts the outcome of a categorical dependent variable from a set of predictor variables. It is mainly used to predict and calculate the probability of an outcome. In this type of regression, the dependent variable is usually binary.
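The description above can be made concrete with a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the toy one-predictor dataset is hypothetical, and the function names (`sigmoid`, `predict_proba`, `fit`, `predict`), the learning rate, and the number of iterations are illustrative choices. The model passes a linear combination of the predictors through the sigmoid to obtain a probability, learns the weights by batch gradient descent on the average log-loss, and assigns an observation to class 1 when its probability reaches 0.5.

```python
import math


def sigmoid(z):
    """Logistic (sigmoid) function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Probability that the binary outcome equals 1 for predictor vector x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


def fit(X, y, lr=0.1, epochs=5000):
    """Learn weights and bias by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(weights, bias, xi) - yi  # d(log-loss)/dz
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x, threshold=0.5):
    """Assign x to class 1 when its predicted probability reaches the threshold."""
    return 1 if predict_proba(weights, bias, x) >= threshold else 0


if __name__ == "__main__":
    # Hypothetical toy data: one predictor and a binary label (not the paper's data).
    X = [[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    weights, bias = fit(X, y)
    print("weights:", weights, "bias:", bias)
    print("predicted classes:", [predict(weights, bias, x) for x in X])
```

In practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used; the sketch only shows how the sigmoid turns a linear score into the probability of the binary outcome.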

II. LITERATURE REVIEW

The first area of this review concerns cardiovascular disease prediction using patients' treatment history and health data. In a 2019 study [2], Akash C. Jamgade and Prof. S. D. Zade successfully obtained a disease risk model by combining structured and unstructured data features. The model showed that, by using statistical knowledge, we could determine which chronic diseases will occur in a particular region and within a selected community in the near future. The paper also showed that combining structured and unstructured data with the K-means clustering algorithm can reach a good accuracy rate. However, K-means clustering partitions a dataset into groups of similar objects, and clusters that are highly dissimilar from the others are considered outliers and are often discarded. It would be better to use the logistic regression algorithm, because it is an efficient classification algorithm that does not discard any data and makes use of all of it. The improved logistic regression algorithm described in this paper can classify data more correctly and achieve higher accuracy. It would also be illustrative to broaden the scope of the study by applying the disease prediction model to a wider category of diseases, which is what this paper attempts.

III. PROPOSED MODEL

The model proposed in this paper has been developed to classify people according to whether or not they suffer from cardiovascular disease. The performance of the predictive model with selected features is tested by predicting the probability of suffering from heart disease. The important features in the dataset were selected using a feature selection algorithm, and the performance of the classifier was tested on these selected features. The Framingham heart study dataset, taken from Kaggle [1], is used in this study. The popular machine learning classifier logistic regression is employed within the system, and the proposed model's validation metrics and performance evaluation metrics are computed. The methodology of the proposed system is structured into five stages: (i) pre-processing of the given dataset, (ii) feature selection, (iii) cross-validation, (iv) machine learning classifiers, and (v) classifier performance evaluation methods.

Fig. 1. Cardiovascular disease prediction model framework

IV. IMPLEMENTATIONS

The proposed model is implemented on a heart disease dataset found on the Kaggle website. Although the proposed model can work on any type of disease, it has been implemented on the heart disease (HD) dataset to check its efficiency. The purpose of the classification is to use logistic regression to predict whether or not a person is at risk of coronary heart disease (CHD) within 10 years. The dataset presents patients' demographic, behavioural, and medical risk factors. It contains over 4,000 records and about 15 attributes.

Fig. 2. An imported dataset from Kaggle

Next, the qualitative and quantitative variables are selected, with every attribute considered as a potential risk factor. Numerous demographic, behavioural, and medical risk factors are involved.

Fig. 3. Number of people with history of heart disease

The figure above shows the medical records of about 4,000 people, of whom about 3,500 have not suffered from cardiovascular disease, whereas the remaining (roughly 500) have.
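The class tally described above can be reproduced with a short standard-library sketch. It assumes, as an unverified guess about the Kaggle Framingham file rather than something stated in this paper, that the data is a CSV whose outcome column is named `TenYearCHD` and that missing values appear as `NA` or empty fields. The helper names `load_complete_records` and `class_counts` are hypothetical, and the inline rows are made up purely to show the format. The sketch drops incomplete records and counts how many people fall into each outcome class.

```python
import csv
import io
from collections import Counter


def load_complete_records(csv_text, target="TenYearCHD", missing=("", "NA")):
    """Parse CSV text and keep only rows with no missing values.

    The outcome column name and the missing-value markers are assumptions
    about the Kaggle Framingham file, not details given in the paper.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    if target not in fields:
        raise KeyError(f"outcome column {target!r} not found")
    return [
        row for row in reader
        if all(row[f] is not None and row[f].strip() not in missing for f in fields)
    ]


def class_counts(rows, target="TenYearCHD"):
    """Count records per outcome class (0 = no CHD within 10 years, 1 = CHD)."""
    return Counter(int(row[target]) for row in rows)


if __name__ == "__main__":
    # Hypothetical rows in the assumed format; the real file has over 4,000 rows.
    sample = (
        "male,age,cigsPerDay,totChol,sysBP,glucose,TenYearCHD\n"
        "1,40,0,200,120,80,0\n"
        "0,50,10,240,130,90,0\n"
        "1,60,20,260,150,NA,1\n"
        "0,55,0,220,140,85,1\n"
    )
    counts = class_counts(load_complete_records(sample))
    total = sum(counts.values())
    for label in sorted(counts):
        print(f"class {label}: {counts[label]} records ({counts[label] / total:.1%})")
```

Dropping incomplete rows is only one possible pre-processing choice; the paper does not say how missing values were handled.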

Fig. 4. Independent and dependent variables

The heart disease dataset is then split into two subsets, training data and testing data; the model is fitted on the training data and used to make predictions on the test data. Two problems can arise at this stage: the model may overfit or underfit. Either would affect the model's predictive ability, leaving us with a less accurate model. For the heart disease dataset, the training set is taken as 80% of the data and the test set as the remaining 20%.

Fig. 5. Effect on odds of heart disease

The fitted model shows that, holding all other features constant, the odds of suffering from cardiovascular disease for males (sex_male = 1) relative to females (sex_male = 0) are exp(0.5815) = 1.788687. In terms of percent change, the odds for males are 78.8% higher than the odds for females. The coefficient for age says that, holding all other features constant, each one-year increase in age brings a 6.76% increase in the odds of suffering from cardiovascular disease, since exp(0.0655) = 1.067644. Similarly, every extra cigarette smoked per day increases the odds of cardiovascular disease by 2%. Total cholesterol level and glucose level show no significant change in the odds. Each unit increase in systolic blood pressure increases the odds by 1.7%.

Fig. 6. Confusion matrix for model evaluation

The confusion matrix shows that the model made 662 correct predictions and 89 incorrect ones, so its accuracy is 662 / (662 + 89) = 662 / 751 ≈ 88%. Negative cases are predicted more accurately than positive ones.

V. CONCLUSION

The model predicted the 10-year risk of cardiovascular disease with an accuracy of 88%. An increase in age, along with factors such as an increase in cigarettes smoked per day and an increase in systolic blood pressure, is associated with increased chances of getting cardiovascular disease. Men were found to be more prone to heart disease than women. Because the cholesterol reading includes HDL cholesterol, total cholesterol shows no substantial change in the odds of cardiovascular disease. Similarly, the change in odds for glucose is negligible (0.2%).

REFERENCES
[1] https://www.kaggle.com/amanajmera1/framingham-heart-study-dataset
[2] Akash C. Jamgade and Prof. S. D. Zade, "Disease Prediction Using Machine Learning", International Research Journal of Engineering and Technology, Vol. 5, Issue 6, May 2019.
[3] Vinitha S, Sweetlin S, Vinusha H, and Sajini S, "Disease prediction using machine learning over Big Data", Computer Science & Engineering: An International Journal, Vol. 8, No. 1, February 2018.
[4] S. Patel and H. Patel, "Survey of data mining techniques used in healthcare domain", Int. J. of Inform. Sci. and Tech., Vol. 6, pp. 53-60, March 2016.
[5] M. Abinaya, M. Marimuthu, K.S. Hariesh, K. Madhankumar and V. Pavithra, "A Review on Heart Disease Prediction using Machine Learning and Data Analytics Approach", International Journal of Computer Applications, Vol. 181, No. 18, September 2018.
[6] Min Chen, Yixue Hao, Kai Hwang, Lu Wang and Lin Wang, "Disease Prediction by Machine Learning Over Big Data From Healthcare Communities", IEEE, Vol. 5, April 2017.
[7] Tarigoppula V.S Sriram, M. Venkateswara Rao, G V Satya Narayana, DSVGK Kaladhar and T Pandu Ranga Vital, "Intelligent Parkinson Disease Prediction Using Machine Learning Algorithms", International Journal of Engineering and Innovative Technology, Vol. 3, Issue 3, September 2013.
[8] G. Subbalakshmi, "Decision Support in Heart Disease prediction system using Naive Bayes", Indian Journal of Computer Science and Engineering, Vol. 2, No. 2, April 2011.
[9] Reddy Prasad, Pidaparthi Anjali, S. Adil, N. Deepa, "Heart Disease Prediction using Logistic Regression Algorithm using Machine Learning", International Journal of Engineering and Advanced Technology, Vol. 8, Issue 3S, February 2019.
