Professional Documents
Culture Documents
1
Ezigbo L.I., 2Okonkwo R.O., 3Nwobodo L.O
1,2,3
Enugu State University of Science and Technology
ifyeigbolucy@yahoo.com
Abstract
This paper presents the development of a cardiovascular heart disease classification model
using neural network based data mining technique. The aim was to develop a classification
model for the detection of cardiovascular heart disease using data mining technique. The
research methods are data collection, data extraction, feed forward neural network, and the
cardiovascular classification model. The study collected data of 18 heart disease attributes
with updated features such as alcohol and kola-nut which is domicile in the African region
and has contributed to cardiovascular heart problem in the region. The data was extracted
using Best First Search Method and then trained with neural network model. The models were
implemented with neural network toolbox in Simulink and evaluated. The result achieved a
classification accuracy of 96.51%; Mean square error value of 0.0356810Mu and True
positive classification value of 0.959. The result was compared with existing state of the model
and a percentage improvement of 2.26% was achieved.
Keywords: Cardiovascular Heart Disease, Neural Network, Data Mining, Classification Model
I. INTRODUCTION
According to Carlos (2004) cardiovascular epidermis, which are age, gender, diabetes;
disease is a kind of heart disease which high cholesterol, obesity among other and
affects the heart organ, thus resulting to the heart disease type are dependent on the
various symptoms such as chest pain, symptoms. Cindy (2008) classified heart
complications among other. Many risk disease as congenital, congestive, coronary,
factors have been attributed with this pulmonary and rheumatic heart diseases
respectively and all are very dangerous if et al., 2015; Amato et al., 2015) among
not detected early and treated. others have proposed the use of
classification model based on artificial
According to Richard (2009), heart disease
intelligence and data mining techniques for
remains the most cause of death and stroke
solve this problem of heart disease early
in the last decade”, with over 1.5million
detection.
cases recorded in Nigeria alone per annum.
This population is most common for babies However, from the review and also from
less than two years and senior citizens from discussions with domain expert, it was
60 years and above (Google Medical observed that attributes of heart disease are
Information, 2021). According to Anitha environmental related and some causes of
and Sridevi (2019), in the United States, heart disease in one region may not be the
over 11 million citizens are estimated to same in other regions. For instance the
have this problem by 2050 and 17.9 million consumption of Kola nut which is rampart in
in Europe by 2060. the African region according to (Agatha et
al., 1987) contributes to the rate of heart
Various approaches have been employed to
disease, but not always the case in the
tackle this problem ranging from traditional
European environment. This as a result
to scientific measures. However despite the
makes the conventional classification model,
effort, the disease still posed major threat to
not reliable for the detection of heart disease
the existence of man, especially those with
as they lack some vital attributes like kola
the risk factors. Therefore it is very
nut features content in the body to help
important for physicians to detect this
predict the risk of heart abnormally. To this
epidermis at a very tender stage and prefer
end, there is need for a system which
measures for early diagnosis.
considers this attributes alongside other
Today the conventional means to help heart disease related attributes, and then
cardiologist detect heart disease is based on train with a data mining technique to predict
electronic health recorded, however these heart diseases in time series.
data are not reliable to help detect these
Data Mining has attracted great attention
diseases on early stage. Over the year many
from various fields due to wide and large
literatures like (Sarangam et al., 2018;
data present in these fields. It is a means of
Zeinab, 2017; Gomath et al., 2016; Jaymin
extracting meaningful and useful adopted and used to predict early heart
information from the large databases and is disease issues in patients in this research.
made up of series of algorithms for solving
regression and classification based pattern
recognition problems. This technique will be
Sarangam et al. (2018) used decision table and the result showed a respective
and Naïve Bayesian technique for the prediction accuracies of 79%, 77% and 76%
prediction of heart disease. The result respectively. However despite the success
obtained is 83.70%, the author in another achieved, there is need for improvement.
research used random forest algorithm for Jayamin et al. (2016) detected and predicted
the same cause and achieved a detection heart disease using logistic model tree and
accuracy of 81.85%. However despite the achieved an accuracy of 55.8%. The result
success achieved in the various algorithms however still needs to be improved with a
used, there is need for improved better accuracy.
performance. Zeinab (2017) used ANN with
Amator (2015) presented a paper on the
genetic algorithm for the detection and
ANN based device as a physical aid for
prediction of heart disease using 12
physicians to analyzed body and detect
variables as input and achieved an accuracy
heart beat rate. However the accuracy of the
of 93.85%, however despite the success
system is 89% and can be improved.
achieved, the study was limited by the
Chaurasia et al. (2017) presented a paper
dataset used and kolanut was not
which developed a knowledge based expert
considered.
system using data mining technique to
Gomath et al. (2016) presented a paper enable doctors and physicians to forecast
which used Naives Bayesian, ANN and and predict heart disease disorder early,
decision list for the detection and prediction however the success of the system was
of heart disease respectively. They study limited by the training dataset.
was designed and implemented for testing,
III. METHODLOGY
The methodology used for the development model for the detection of cardiovascular
of the new system was guided by the disease. The dataset were extracted using
Dynamic Systems Development Model Best First Search extraction method in (Han
(DSDM) approach which accommodates and Kamber, 2006) to find a minimum set
the use of structural and mathematical of attributes such that the resulting
modeling inspired techniques for the probability distribution of the data classes is
development of the cardiovascular disease as close as possible to the original
detection system. The methods used were distribution obtained using all attributes and
data collection from physiobank then trained with feed forward neural
(physiobank, 2021) repository as the network to develop the classification model
primary source of data collection, while needed for the detection of cardiovascular
Parklane hospital, Enugu State, Nigeria is heart disease.
the secondary source of data collection
IV. SYSTEM DESIGN
which provided the new heart disease
features of kolanut identified. The sample The system design used the table 1 to
size of data collected is 700MB of heart present the problem formulation as shown
beats recorded between the frequency of 0- in the attributes of the cardiovascular data
128Hz from Parklane and 20 gigabyte of collected. The data model was presented in
recorded heart disease signal from figure 1 using class diagram which showed
physiobank. These were organized into how the various attributes of the data
classes of standard and clinical attributes. collection were classified as standard or
The data was integrated to develop the data clinical symptoms.
The data model in figure 1 was used to Cough, Palpitation, Chest pain,
develop the improved dataset containing Breathlessness, Legs swelling, inability to
new attributes like kolanut and alcoholic lie down because of breathlessness into one
features which were omitted in the dataset of two classes.
conventional data (see physiobank)
Data extraction Model
collected and also existing models. The data
model classified all the cardiovascular The data extraction model was developed
disease attributes collected such as Age, based on Best First Search Method in Han
Weight/BMI, Cholesterol, Blood sugar, and Kamber (2006) which extracted
Blood pressure, Exercise/sedentary, features such as Blood Pressure, Cholesterol
Smoking/no of sticks, level, Blood sugar, Heart rate, Weight, Age,
Alcohol/units/duration, Beverages/Coffee Chest pain, Palpitation and Leg swelling
with caffeine. Clinical symptoms are: from the data model as shown in equation 1;
∑ cadiological attributes ( n )
2
i=1
Fx=
total feature ve ctors
1.0
Where i is the total feature vectors, n is any FFNN was made of interconnected neurons
of the selected input symptoms, Fx is the which has weight and bias function. The
feature vectors. neurons were activated with tanh activation
function and then trained with back
Model of the data mining based
propagation algorithm to learn the
classification model
cardiovascular features and generate the
The data mining algorithm used is the Feed classification reference model. The model
Forward Neural Network (FFNN) adopted of the neural network workflow to generate
from Samuel and Oluwarotimi (2017) and the cardiovascular classification model was
reconfigured to train the data collected. The presented in figure 2;
The architectural model showed how the all represented in the model and then trained
FFNN was developed. The number of input the cardiovascular features to generate the
was inspired from the cardiovascular classification model for the diagnosis of
attributes collected which are 18, the hidden cardiology. Other parameters inclusive
layer was specified to be 27 to enable fast used for the development of the FFNN are
computation of the neurons, activation reported in table 2 while the logical flow
function and training algorithm used were chart was presented in figure 3;
Parameters Values
Training epochs 19
Size of hidden layers 27
Controller training segments 30
No. delayed reference input 2
Maximum feature output 3.1
Maximum feature input 21
Number of non hidden layers 2
Maximum interval per sec 2
No. delayed output 2
No. delayed feature output 2
Minimum reference value -0.7
Maximum reference value 0.7
Number of input 18
The system flow chart in figure 3 presented Method to identify key cardiovascular
the logical data flow of the system, showing attributes for classification with the FFNN
FFNN trained with the cardiovascular developed classification model. If the
training data to generate the classification classification result is cardiovascular
model was used to classify heart disease. disease with the input feature vectors, then
When input from the patient electronic the patient is recommended for diagnosis,
health recorded are collected, the features else the patients is recommended for other
are extracted based on the Best First Search test.
V. IMPLEMENTATION
To test the performance of these models, the Where CVA stands for Cross Validation
sensitivity (true positive rate) and specificity Accuracy, A is the accuracy measure for
(false positive rate) measures was used
each fold. During the training process, Mean
respectively. Sensitivity is also referred to as
the proportion of correctly classified vessel, Square Error (MSE) and Receiver Operator
while specificity is the false positive rate is Characteristics (ROC) curve were selected
the proportion of correctly detecting vessel.
as evaluation tool to measure the training
These models are presented in equation 2
and 3 respectively. error and cardiovascular detection
performance. The MSE analyzer was
presented in figure 4;
The result in figure 4 shows the MSE The next result showed the cardiovascular
performance of the classification model detection performance of the neural network
generated. The aim of this result was to classification model using Receiver
ensure that minimum error occurred during Operator Characteristics (ROC) curve which
the training process with a target of zero used the equation 2 and 3 to measure the
error. From the result, it was observed that true classification performance of the
the MSE value is 0.061673 which is good as system. The aim of this ROC was to
it is approximately zero at epoch 13. The achieved a TPR of 1 or a FPR of zero. These
implication of this result is that the neurons values implied optimal detection of the
correctly learn the cardiovascular data and cardiovascular problem from the test set and
produced constant validation output at epoch the ROC is presented in figure 5;
13.
classification result for cardiovascular dataset in tenfold and the average score
attributes. recorded as the achieved result. This training
process will consider the root mean square
To validate the system performance, tenfold
error, accuracy and true positive rate,
validation technique in equation 5 was used.
presented in table 4;
This process iteratively trains the multi
From the validation result the average TPR is also good. The average accuracy recorded
of the classification model is 0.959 which is is 96.51% which is very good too.
very good as it is very close to the ideal
VII. COMPARATIVE ANALYSIS
value which is 1. The MSE result is also
0.0356810Mu which is approximately zero, This section compared the performance of
indicating very negligible error score which the new classification model with some of
the conventional sate of the art models and
the results presented in figure 6;
Sarangam Kodati, and Dr. R Vivekanandam Zeinab Arabasadi (2017), “ Computer aided
(2018)“A Comparative Study on decision making for heart disease
Open Source Data Mining Tool for detection using hybrid neural network
Heart Disease” , International Journal Genetic algorithm” , Computer
of Innovations & Advancement in Methods and Programs in
Computer Science, Vol. 7, Issue 3. Biomedicine-ELSEVIER, Vol. 141,
pp.19- 26.