You are on page 1of 12

Evolutionary Intelligence

https://doi.org/10.1007/s12065-019-00327-1

SPECIAL ISSUE

Hybrid genetic algorithm and a fuzzy logic classifier for heart disease


diagnosis
G. Thippa Reddy1 · M. Praveen Kumar Reddy1 · Kuruva Lakshmanna1 · Dharmendra Singh Rajput1 · Rajesh Kaluri1 ·
Gautam Srivastava2,3 

Received: 11 September 2019 / Revised: 11 November 2019 / Accepted: 15 November 2019


© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Abstract
For the past two decades, most of the people from developing countries are suffering from heart disease. Diagnosing these
diseases at earlier stages helps patients reduce the risk of death and also in reducing the cost of treatment. The objective of
adaptive genetic algorithm with fuzzy logic (AGAFL) model is to predict heart disease which will help medical practitioners
in diagnosing heart disease at early stages. The model consists of the rough sets based heart disease feature selection module
and the fuzzy rule based classification module. The generated rules from fuzzy classifiers are optimized by applying the adap-
tive genetic algorithm. First, important features which effect heart disease are selected by rough set theory. The second step
predicts the heart disease using the hybrid AGAFL classifier. The experimentation is performed on the publicly available UCI
heart disease datasets. Thorough experimental analysis shows that our approach has outperformed current existing methods.

Keywords  Disease classification · Adaptive genetic algorithm · Rough set theory · Feature reduction · Membership
function

1 Introduction huge amounts of data in the medical field [29]. Extracting


patterns from medical data helps medical practitioners in
The progress made in the field of computer technology, diagnosing patients.
storage of digital data, and technological advancement in A patient’s data is comprised of attributes like demogra-
communication technologies has enabled the generation of phy, test results, images, video clippings, and others [12].
Extraction of desired information from the voluminous data
manually is a herculean task considering size of the data
* Gautam Srivastava and wide dimensionsality of the data [2]. Hence, automated
srivastavag@brandonu.ca analyzing techniques are required for analyzing the data.
G. Thippa Reddy It is handy to use data mining techniques which can auto-
thippareddy.g@vit.ac.in mate the analysis as well as handle the large datasets [1].
M. Praveen Kumar Reddy Data mining [3] helps doctors in diagnosing the patients by
praveenkumarreddy@vit.ac.in extracting useful knowledge from patients’ medical data [6,
Kuruva Lakshmanna 16, 18]. We can use the term Medical data mining for mod-
lakshman.kuruva@vit.ac.in els classifying medical data. It uses data mining methods
Dharmendra Singh Rajput for obtaining accurate information. Medical data mining is
dharmendrasingh@vit.ac.in used to diagnose illness, administer therapy, establish rap-
Rajesh Kaluri port among doctors as well as patients, bettering managing
rajesh.kulari@vit.ac.in of healthcare, and so on [15, 24]. Every day, gigabytes of
1 medical data is generated from several sources including
Vellore Institute of Technology, Vellore, Tamil Nadu, India
image databases like SPECT, MRI, PET, signal databases
2
Department of Mathematics and Computer Science, Brandon like ECG and EEG [25]. Unlike traditional data mining, data
University, Brandon R7A 6A9, Canada
mining in the medical field is very cumbersome [5, 12, 27].
3
Research Center for Interneural Computing, China Medical
University, Taichung 40402, Taiwan, ROC

13
Vol.:(0123456789)
Evolutionary Intelligence

In the past few decades, millions of people across the Sample data was utilized to create the required fuzzy rules.
globe have succumbed to heart disease due to changes in The genetic algorithm chose a significant and relevant subset
lifestyle and food habits. Diagnosing heart disease takes a of rules. The parameters selected were serum cholesterol
lot of time. By using machine learning techniques, medi- (chol), sex, obtained maximum heart rate (thalach), ST
cal practitioners can get valuable inputs about the heart depression induced by exercise relative to rest (oldpeak),
patients, which will enable them to give proper treatment to exercise induced angina (exang), thal value as well as the
the patients [4]. This motivated us to develop this model to count of major vessels coloured (ca). Fuzzifying was done
help doctors in diagnosing heart disease patients. through Gaussian membership function, and de-fuzzifying
This paper proposes a novel AGAFL classifier for classify- by employing centroid method. The model was evaluated
ing heart disease datasets. AGAFL has three steps: by metrics such as specificity, accuracy, sensitivity, confu-
sion matrix.
1. reduction of features/dimensions utilizing rough sets Srinivas et al. [26] has predicted cardiac disease based
2. generating rules from the reduced dataset through the on rough-fuzzy hybrid classifier. The procedure used is: (1)
application of Fuzzy Logic Classifier generating rules utilizing rough set theory, (2) predicting
3. optimizing generated rules through the application of using fuzzy classifier. Experiment is carried out on publicly
Adaptive Genetic Algorithm. available Hungarian, Cleveland, as well as Switzerland heart
disease datasets. Seera and Lim [23] analyzed medical data
The latter utilizes a fitness function for optimizing the using a hybrid fuzzy min–max neural network, Random For-
rules generated through Fuzzy Logic Classifier. The major est model, as well as the Classification and Regression Tree.
contributions of the proposed model are as follows: Yuvraj and Vivekanandan [32] described SVM based
Classifying of Tumour with Factorizing of Symmetry-Non-
• rough set theory to identify most relevant features as Negative Matrix by utilization of Data for Gene Expression.
Rough Set theory is an effective tool to deal with vague- Genes were chosen through Non-negative Matrix Factoriza-
ness and uncertainty information to select the most rel- tion. Symmetry NMF was used for classifying and extraction
evant attributes for a decision system. of features was done by SVM-NMF. Finally, Support Vector
• Adaptive Genetic Algorithm to optimize the classifica- Machine with weighted kernel width was used for classifica-
tion rules, to achieve better accuracy, reduce time com- tion. In the same way, Vafaie et al. [30] had classified heart
plexity as justified in results and discussion section. disease datasets based on ECG by genetic-fuzzy system and
ECG signals’ dynamical model. Long et al [17] proposed a
model to diagnose cardiac ailment by an algorithm based on
2 Paper organization firefly optimization algorithm.
Few models based on Neural Networks are proposed
The rest of the paper is organized as follows. First in Sect. 2, by Kharat et al. [14] to classify human brain images based
we review all related works. Section 3 narrates the necessary on magnetic resonance. Their Neural Network technique
background information. We discuss the proposed classifica- included three stages namely dimensionality reduction, fea-
tion approach in Sect. 4. Experimental results are discussed ture extraction and classification. During the early stages,
in Sect. 5. Finally, Sect. 6 discusses future work followed by features correlated with MRI images are extracted with Dis-
Sect. 7 which summarizes some concluding remarks. crete Wavelet Transformation. In the next stage, MRI param-
eters get reduced using Principles Component Analysis. At
the categorizing stage, a couple of classifying units based
3 Related work on supervised machine learning are used. Former works on
principle of Feed-Forward Artificial Neural Network; the
For diagnosing diseases, several diagnostic techniques are latter rely on back propagation neural network. Brain images
proposed by researchers. Long et al. [17] narrated the car- of MRIs are classified to be normal or abnormal by utiliza-
diac diagnostic method by minimizing dimensions using tion of these classifying units. Henriques et al. [11] explains
rough sets as well as interval type-2 fuzzy logic system. predicting cardiac-failure decompensating events by trend-
Then, a hybrid learning procedure comprising of fuzzy analysis of tele-monitoring data.
c-means clustering and tuning of parameters using firefly Doctors as well as researchers have proposed several
algorithm is employed on the dataset. methods to predict diabetes in order to reduce the cost of
Santhanam and Ephzibah [22] proposed a hybrid genetic the tests, time to diagnosis, and also for precise prediction.
fuzzy model to predict heart disease. For feature selection A scheme to monitor type 2 diabetes mellitus is explained
genetic algorithms was employed. The selected features were by Wang and Kang [10]. The following algorithms are
used to build a classification model using fuzzy inference. employed in this paper:

13
Evolutionary Intelligence

1. Decision Tree (DT) to classify and generate rules. It is approach Genetic Programming (GP) is used for creating deci-
relatively very quick as well as efficient in rules genera- sion trees with arbitrary expressions attached to the nodes. In
tion. the third approach, EDDIE-101, is used for classification.
2. Artificial Neural Networks (ANN) to processes nonlinear From our in-depth literature survey, we have found that
problems. existing algorithms performed well on heart disease datasest,
3. A back propagation neural network is widely utilized to but when features are reduced by optimization algorithms,
diagnose as well as to predict. the performance of the algorithms with respect to several
measures like accuracy, specificity, sensitivity, has also
Time Series is used to predict based on three models: reduced dramatically.
Integrated, Auto Regressive and Moving Average. The
above models are combined to produce the following hybrid
models: Auto-regressive Integrated Moving Average and 4 Algorithm background
Autoregressive Moving Average. After pre-processing of
data, selection of feature is carried out utilizing these, ANN Pawlak [19] introduced Rough sets theory (RST). RST is
predicts the disease and then generates suggestion on imple- used for analyzing details which are classified as not clear
menting clinical procedures as well as strategies that control or not decided. Primary application of RST is with regard
diabetes. to attribute reduction. Rough sets theory’s common notion
Reddy and Khare [28] introduced an algorithm called is listed below: Let I = (U, A ∪ {d}) stand for the scheme
FFBAT-Optimized Rule Based Fuzzy Logic Classifier, for of information, where U denotes universe amongst non-
diabetes classification. In this model, Locality Preserving empty group of limited objects, A denotes state attributes’
Projection is utilized to reduce number of features and dia- group which is limited and non-empty, the decision feature is
betes classification is carried out employing RBFL classi- shown as d (decision table is also such table’s name), ∀a ∈ A
fier. The related attributes are found by the algorithm LPP; there exists a task that is equivalent fa ∶ U → Va , here Va
then the RBFL generates fuzzy rules, finally the algorithm stands for A value’s group. If P ⊆ A , P-in-discernibility
of FFBAT is employed to optimize rules. FFBAT is a hybrid associated being symbolized as IND(P) , being distinct:
of BAT and another optimization technique namely Firefly � �
(FF). FFBAT is also used to classify the publicly available IND(P) = ⟨x, y⟩𝜖U ∣ ∀a ∈ P, fa (x) = fa (y) (1)
heart disease datasets from UCI machine learning reposi-
U’s separation produced by IND(P) being symbolized U/P. If
tory [21].
(x, y) ∈ IND(P) , then x as well as y are indiscernible by a fea-
Gandomi et al. [9] have suggested Cuckoo Search com-
ture from P. After that P-in-discernibility associations simi-
bined with Lévy flights. Reddy and Khare [20] have intro-
larity classes being symbolised as [x]p . Let X ⊆ U  , P-power
duced an algorithm, OFBAT-RBFL for classifying cardiac
approximation PX as well as P-upper approximation PX of
ailments. In this paper Opposition Based Learning is inte-
set X can be distinct:
grated with the FFBAT to improvise FFBAT.
{ }
Kaluri and Reddy [13] created a framework to extract PX = x ∈ U ∣ [x]p ⊈ X (2)
and recognize the sign gesture language using four stages:
{ }
1. Segmentation utilizing Modified Region Growing Algo- PX = x ∈ U ∣ [x]p ∩ X ≠ 𝜙 (3)
rithm
Let p, Q ⊆ A is equivalence relations over U, positive, nega-
2. Utilizing median filter to remove noise,
tive as well as regions of boundary may get definition to be:
3. Feature extraction
4. Recognition employing Adaptive Genetic Fuzzy Classi- ⋃
POSp (Q) = PX
fier (4)
X∈U|Q

Game et al. [8] have proposed a model for classifying ⋃


health care data. It includes the following steps (1) Map- NEGp (Q) = U − PX
(5)
X∈U|Q
reduce framework (2) support vector machine (3) optimized
decision tree classifier. PCA is used for dimensionality ⋃ ⋃
reduction. Next SVM is applied. For optimal rule generation BNDp (Q) = PX − PX
(6)
divergence based grey wolf optimization. X∈U|Q X∈U|Q

Wang et  al. [31] have proposed various evolutionary The optimistic section of separation U ∣ Q with corre-
approaches for classification. In the first approach encoding sponding to P, POSp (Q) , is group of every object of U may
rule sets with bit string genomes is performed. In the second

13
Evolutionary Intelligence

become to be positively categorized for obstructing separa- The adaptive genetic algorithm (AGA​ ) is an improved ver-
tion U ∣ Q utilizing P. Q reliant on p in a level k(0 ≤ k ≤ 1) sion of the genetic algorithm, in which, adaptive mutations
symbolized by are employed for achieving desired optimizing results. A
genetic algorithm employs mutations to each parent chromo-
POSp (Q)
P ⇒ kQ 𝛾P (Q) ∣ ∣ (7) some, where random interchanging of genes occurs. In the
U proposed adaptive mutation, the rate of mutation calculation
Here p is a set of attributes conditionally, Q is decision is based on the chromosome’s fitness. The performance of
and ⋎P (Q) is the classification’s quality. If k = 1 , Q is reliant mutation is based on the rate of mutation. For functioning
entirely on P; if 0 < k < 1 , Q is oriented incompletely on P; of AGA​ , chromosomes are to be generated for the solution
as well as if k = 0 the Q is unbiased on P. ∣ . ∣ represent set’s set. Every chromosome gets subjected to many AGA​steps.
cardinality. Feature diminutions aim is to eradicate features
which are not necessary. Each reduct’s group is distinct: 5.2 Steps in adaptive genetic algorithm
{ }
Red(C) = R ⊆ C| ⋎R (D) = ⋎C (D), ∀B ⊂ R, ⋎B (D) ≠ ⋎C (D) Genetic algorithm is a popular soft computing method. To
(8) improve canonical GAs, many variations are proposed. One
A set of minimal reductions is defined in Eq. 9 as such technique is AGA​ . It has the following steps. The pro-
{ } cess is explained in Fig. 1.
Red(C)min = R ∈ Red ∣ ∀R� ∈ Red, |R| ≤ |R� | (9)
1. Generation of Chromosomes
2. Calculating Fitness function
3. Crossover
5 AGAFL model 4. Adaptive Mutation
5. Selection
5.1 Genetic algorithm
While optimizing the rules that the fuzzy classifier gen-
The genetic algorithm starts with a set of solutions (denoted erates, every rule is considered to be a chromosome. The
by chromosomes) called population. The selected solutions chromosome pools are randomly generated as well as every
form new solutions (offspring) based on their fitness value - chromosome gets subjected to AGA​’s many operations.
the more the value of fitness, the more chances they have to Based on the fitness value, the Chromosomes gets evalu-
reproduce. The Basic Genetic Algorithm has been explained ated and those chromosomes are made available at output.
in Algorithm 1. The vital steps in genetic algorithm are mutation, crossover.

Algorithm 1: Basic Genetic Algorithm


1 Generate population of random type with m chromosomes (solution
that suits solving problem) f (y)
2 Fitness: estimate every chromosome y’s fitness f(y) at the population
3 New population: Iterate steps as follows until it arrives at a new
desired population.
(a) Selection: Chooses as per fitness a couple of chromosomes of parent
from a population
(b) Crossover: Parents do crossover according to a probability of crossover
for figuring an offspring that is new. In case there is an absence of
performing of crossover, the offspring can be parent’s exact copy.
(c) Mutation: With a mutation probability, mutate new offsprings at each
locus (position in the chromosome).
(d) Accepting: Insert a new offspring in a new population
4 Employ newly generated population for an additional run of
algorithm.
5 If the terminating condition is satisfied, end the process and return
the best solution from current population
6 Go to step 2

13
Evolutionary Intelligence

Fig. 1  Flowchart of AGA​

The chromosome represents contained information in a pre- solved. To cite a situation, a real number or an integer could
defined way of the solution. The binary string is a common be directly encoded, certain permutations can be encoded
way to encode information about chromosome. A chromo- and so on.
some can be representing as follows:
Step 1 Chromosome Generation The initial stage in this
Chromosome 1 ⇒ 𝟷𝟷𝟶𝟷𝟷𝟶𝟶𝟷𝟶𝟶𝟷𝟷𝟶𝟷𝟷𝟶
AGA​algorithm is generating chromosomes. The chro-
Chromosome 2 ⇒ 𝟷𝟷𝟶𝟷𝟷𝟷𝟷𝟶𝟶𝟶𝟶𝟷𝟷𝟷𝟷𝟶 mosomes here are nothing but the generated rules
Every bit in the string can retain a correspondence to the employing fuzzy; the genes are nothing but the rule
solution’s characteristic. In other words a number can be rep- parameters. At the solution space, a count of ‘C’ chro-
resented by a complete string. Many coding techniques exist mosomes in random are generated that are given in the
for solution encoding; it depends basically on the problem term shown below

13
Evolutionary Intelligence

Chk = [G0 k G1 k ...GCL −1 k ]0 ≤ k ≤ M − 1;0 ≤ i ≤ CL − 1   The selection of mutation rate depends upon the esti-
(10) mated fitness value. Based on the generated rules by
fuzzy logic, the fitness value is utilized in this method.
where Gki is the jth gene of the chromosome, M is the
Comparison of mutation rate with stated values of fit-
total population and CL is the length of the chromosome.
ness is done based on the threshold and resultant values
Step 2 Calculating Fitness function The fitness function
are selected as final mutation rate. The vector represent-
is given in Eq. 11. The prime goal of fitness function
ing the mutation points is as follows:
is to optimize the rules while selecting solutions. The
{ }
solutions for having better fitness are chosen to proceed MUr = mp1 , mp2 ...mpl (14)
further in Eq. 11.
Where l denotes length of chromosome. Rate of muta-

M
tion r identification is done basing on fitness ft .
ft = Rs ∕M (11)
K=1 {
1; if ft ≤ T
MUr = (15)
where s denotes mk to be included in the summation term, 0; else
and certain parameters to improve. Here we have Rs as
the selected rule and M are the rules total count. where T computation is based upon generated fuzzy rule.
  The fitness value, ft for each chromosome is com- Mutating being done for extraction of every mutating
puted based on the chosen rules. Every chromosome is point utilizing in Eq. 14. Rate of mutation changes for
checked against fitness function. Only those solutions every chromosome during every iteration and depends
which satisfy the fitness function will be selected to upon fitness value.
participate in the reproduction using either crossover or Step 5 Selection
mutation.   The last step in the adaptive genetic algorithm is the
Step 3 Crossover For generating a new chromosome, selection process. Based on the fitness value obtained,
crossover is performed between two parent chromo- the chromosomes that are new ( Np ) are positioned in a
somes. The newly generated chromosome is called an selection pool. In selection pool chromosomes whose
offspring. The crossover is carried out depending upon value of fitness is the best will stay on top. The top most
chosen genes and production of offspring depends on Np chromosomes stored in selection pool are chosen as
the crossover rate (CO rate). The equation to find the the next generation between the 2 Np chromosomes.
crossover point is shown in Eq. 12
5.3 Proposed AGAFL method
CG
CPrate = (12)
CL The process of feature reduction decreases the computa-
tion cost and also increases classification performance. To
where CPr ate is the Crossover Rate, CG is the number of
improve results in predicting disease, in this paper, rough
Genes Generated, and CL is length of the chromosome.
sets is used for feature reduction; to generate rule set Fuzzy
  Based on the computed CO rate, the parent chromo-
Logic classifier is used. The solution set is formed by Adap-
somes perform crossover generating a set of new chro-
tive Genetic Algorithm in order to get optimized rules to
mosomes named offspring. By CO, the crossover point
predict disease. The disease prediction model comprises of
is found, and the genes at these points are interchanged
steps: Rough sets based attribute reduction, normalization,
from chromosomes of both parent so that offsprings are
and then AGAFL classification. To start with the input data-
generated containing characteristics of both parents’
set normalization is done in the range of [0, 1] . The rough
chromosomes. The chromosomes generated will have
set based technique is applied for selecting best attributes.
a better fitness when compared with the older chromo-
The reduced attributes will be divided into a couple of sub-
some generation thereby making it better for processing.
sets: testing dataset, training dataset. The training dataset
Step 4 Adaptive Mutation In the proposed method, in
is fed into AGAFL; testing dataset is utilized for testing the
place of mutating step in which some random genes are
proposed model. The process of proposed heart disease pre-
changed from a single parent, the mutation is done based
diction is depicted in Fig. 2. The details of each step in the
on rate of mutation. Mutation rate is calculated as below:
proposed model are described as follows:
Pm
MUr = (13)
CL

where MUr is the Mutation rate, Pm is the Mutation


Point and CL is the length of the chromosome.

13
Evolutionary Intelligence

verting data into interval of specific type. For normal-


izing, widely used min-max method is employed. The
original dataset is mapped by Min-max normalization
into one range by using
D − Dmin
Dn = × [newmin − newmax ] + newmin (16)
Dmax − Dmin

Range of transform datasets is described by newmin ,


newmax  ; where, it is utilized that newmin = 0 and
newmax = 1 . The pseudocode of the proposed model is
in Algorithm 2.
2. Attribute Reduction using Rough Sets
  The main task here is reducing attributes utilizing
Rough sets. In addition there is reduction in attributes’
quantity and eliminating details which are irrelevant,
unconnected, noisy or even redundant.
3. Solution Representation
  The solution is represented in a binary system. In
every bit 1 shows selecting as 0, represents not select-
ing attribute of equivalence.
( To cite a situation,
) dataset
containing 10 attributes a1 , a2 , a3 , … a10 as well as a
solution
( Y = 1010110010)  , then chosen attributes will
Fig. 2  The overall process of proposed disease prediction system
be a1 , a3 , a5 , a6 , a9 .

Algorithm 2: Pseudocode of the Proposed Model


Input: Hungarian, Switzerland, and Cleveland Heart Disease
Datasets from UCI Machine Learning Repository.
Output: Optimized Fuzzy Logic Classifier with Optimized
Classification Rules
1 Apply rough set theory for feature extraction from the input datasets.
2 Feed the extracted features to Fuzzy Logic Classifier to train the
model and generate classification rules
(a) Fuzzification: Convert the crisp data into fuzzy data as per equation
(17)
(b) Fuzzy Rule Generation: Generate fuzzy rules from the fuzzy data.
Sample rule is If A1 is high and A2 is low and A3 is medium then class is
C2 class
(c) Defuzzification: Convert the fuzzy rules into crisp rules
3 Apply Adaptive Genetic Algorithm on the model constructed in
Step-2 to optimize the classification rules.
4 Cross validate the model using the testing data. The model is
evaluated by the accuracy, specificity and sensitivity measures.
5 Perform statistical test (name the tests) to validate the results.

1. Normalization 4. Fitness Function


  Consider the dataset containing the number of attrib-   Every solution’s value of fitness’ is generated by fit-
utes and entities. Normalization is applied to the dataset ness function. Solution that is best is chosen based on
to reduce the arithmetic complexity of the data by con- the fitness value. It this work, UCI Heart disease data-

13
Evolutionary Intelligence

set 14 attributes. From the attributes, we see that age, membership method is selected for modifying the
fasting blood sugar, sex, resting blood pressure are the data of input into fuzzy value. The principle engaged
attribute subset that are the much contributing param- to analyze the membership values is illustrated
eters. These attributes’ subset is applied for defining below:
the criterion for fitness to generate next population.
Now, rule set from Fuzzy logic classifier will form the ⎧0 if x ≤= i
⎪ x−i
population on which fitness function will be applied. ⎪ j−i if i ≤ x ≤ j
The Fitness Function states that every rule’s anteced- f (x) = ⎨ k−x (17)
⎪ k−j if j ≤ x ≤ k
ent must possess the attribute subset Sf = (age, fasting
⎪0 if x ≥ k
blood sugar, sex, resting blood pressure) to have bet- ⎩
ter fitness, will be selected to proceed further. In other
words, the chosen rule to participate in reproduction to 6.3 Fuzzy rule generation
generate   The fuzzy rule generation is a very important mis-
{ next generation } shall be the super set of Sf  . Let sion that assists in recording the input to its equiva-
R = r1 , r2 , r3 , … .rm be the set of rules under consid-
eration to generate new population. Let Rf ⊆ Rwhere Rf lent output. If A1 , A2 , … , AN are the attributes and
is the set of rules containing super sets of Sf  . Goodness C1 , C2 are the class labels then a fuzzy rule can be
of every solution is evaluated by fitness function Sf . framed based on the linguistic values like high,
5. Termination Criteria medium, low. The values N and M are the number
  The algorithm will stop its implementation only if of attributes and number of classes respectively.
maximum count of iterations is reached. The solution Therefore the fuzzy rule can be framed as follows:
that contains the best fitness value is selected by utiliz-
ing RS, and the AGAFL is used to classify the datasets. • If A1 is high and A2 is low and A3 is medium then
As mentioned earlier the best attributes are given as class is C2;
input to fuzzy classifier. • If A1 is low and A2 is medium and A3 is medium
6. Prediction Based on Fuzzy Logic System then class is C1;
  Once the reduction in the features from the dataset • If A1 is high and A2 is medium and A3 is low then
input is done, hybrid ADAFL classifier predicts disease. class is C2;
Fuzzy logic classifier has three steps:
6.4 Rule based fuzzy score computation
(a) Fuzzification.   The testing data with reduced attribute is fed to
(b) Fuzzy inference engine. the AGAFL, where the test data is converted to the
(c) De-Fuzzification fuzzified value based on the fuzzy membership func-
tion. Then, the fuzzified input is matched with the
fuzzy rules defined in the rule base. Here, the rule
6.1 Fuzzy Inference System inference procedure is used to obtain the linguistic
  A fuzzy inference system aids in mapping the value that is then converted to the fuzzy score using
inputs to the equivalent output by predefined fuzzy the average weighted method. From the fuzzy score
rules. The knowledge support includes if-then rules obtained, the classification decision is produced.
that denote the relationship among the input and out- The proposed model is demonstrated in Fig. 2.
put fuzzy groups. The inference system is enhanced
by a sequence of actions like;
6 Results and discussion
i. Development of fuzzy rules.
ii. Fuzzify values of input based on degree To implement the method proposed, Matlab version (7.12) is
of membership. utilized. It is being carried out using a laptop with windows
iii. Merging of fuzzified input as well as 10 that has the processor of Intel Core i5 having a
fuzzy rules to improve rules’ strength. speed of 1.6 GHz and 8 GB RAM. The model is evaluated on
iv. Finally the output is de-fuzzified to yield three different datasets in UCI machine learning repository.
output as a crisp value.
6.2 Membership function 6.1 Dataset description
  The input data is transformed into a value of
membership (or membership degree) among 0 and The experiments were carried out on three different heart
1 by the membership function (MF). The triangular disease datasets from UCI machine learning repository

13
Evolutionary Intelligence

namely Switzerland, Hungarian and Cleveland. These data-


sets have the following attributes

• sex
• age
• type of chest pain
• cholesterol
• blood pressure while resting
• fasting blood sugar
• maximum attained heart rate
• ECG at rest
• exercise induced angina
• slope of peak exercise
• older peak
• no. of major vessels by fluoroscopy colored
• class label
• thal

6.2 Evaluation metrics Fig. 3  Performance measure based on sensitivity

Evaluating method of heart diease diagnosis is done by using


the metrics below:
TP
Sensitivity =
TP + FN
(Number of true positive assessment)/(Number of all
positive assessment)
TN
Specificity =
TN + FP
(Number of true negative assessment)/(Number of all
negative assessment)
TN + TP
Accuracy =
TN + TP + FN + FP
(Number of correct assessments)/Number of all
assessments).

6.3 Performance evaluation

Heart disease prediction based on novel AGAFL classifica-


tion is applied on the dataset. For classification this model
used hybrid of Adaptive Genetic Algorithm and Fuzzy logic
classifier. In previous works [28], LPP algorithm has been
used for feature reduction and FFBAT+RBFL for predic- Fig. 4  Performance measure based on specificity
tion, where FFBAT is hybrid of firefly and bat optimiza-
tion algorithms and RBFL is Rule Based Fuzzy Classifier.
Also, Rough Set (RS) algorithm has been used for feature specificity measures. 80% of the data being used for train-
reduction and Fuzzy Logic Classifier (FL) for prediction, ing the proposed model and remaining 20% to validate the
where FFBAT is hybrid of firefly and bat optimization model. Table 1 proves that the proposed approach outper-
algorithms and RBFL is Rule Based Fuzzy Classifier. The forms the existing approaches. Figures 3, 4 and 5 show the
entire dataset is cross validated with k-fold cross validation, performance evaluation of proposed and existing algorithms
where k = 10, by shuffling the dataset and split the dataset based on the measures accuracy, sensitivity and specificity
into k groups (k = 10). Then the first group is used as a respectively. The proposed method reduces the search space
validation dataset whereas remaining k − 1 (9) groups are when the class label of a new record has to be predicted,
used to train the model. It shows the comparative analyzes hence reducing the time complexity significantly which is
of proposed approach based on accuracy sensitivity and demonstrated in the Fig. 6.

13
Evolutionary Intelligence

algorithm . We performed t test in MATLAB (R2014a)


for significance level 0.05, where statistics are }h�  , }p�
and }t� . If t test return the value h = 0 , null hypothesis is
accepted and if h = 1 , it rejects the null hypothesis implies
that there exist a significance difference between our pro-
posed algorithm and existing one. This could be proven
with the smaller p value than the significance level 0.05
and the larger t (calculated) value than the t (tabulated).
In our experiment we took three datasets: Cleveland,
Hungarian, Switzerland heart disease datasets taken from
UCI machine learning repository. For four degree of free-
dom ( df = observation − 1 ) comparative t test results for
F-Measure on three datasets are given in Table 2. We can
observe that Cleveland except all datasets are significantly
performing better for Proposed AGAFL than other algo-
rithms. Also for Cleveland dataset, all evaluation measures
are showing better results.
Fig. 5  Time efficiency evaluation ANOVA Null Hypothesis is also performed for signifi-
cance testing. ANOVA’s Null hypothesis is true when all
means of the experiment are identical or have no signifi-
cant difference. Thus, they can be considered as a part of a
larger set of the population. On the other hand, the alternate
hypothesis is valid when at least one of the sample means is
different from the rest of the sample means. In mathematical
form, they can be represented as:
Ho ∶ 𝜇1 = 𝜇2 = … ..𝜇n Null hypothesis
H1 ∶ 𝜇1 ≠ 𝜇2 Alternate hypothesis

If the p value is less than the alpha level selected (which


it is, in our case), as given in Tables 3 and 4, we reject the
Null Hypothesis.
Within the group variance is larger and between the
groups variance is small. So F will be smaller. Here, we can
see that the F-value is greater than the F-critical value for the
alpha level selected (0.05). Therefore, we have evidence to
reject the null hypothesis and say that at least one of the two
Fig. 6  Performance measure based on accuracy samples have significantly different means and thus belong
to an entirely different population.
6.4 Significance testing

To test statistical difference between our proposed algo- 7 Future work


rithm and other existing approaches parametric paired
sample t test is applied [7]. h = 0 is considered as null This work can significantly be improved by using advanced
hypothesis that says that there is no significance differ- meta-heuristic algorithms like whale optimization algo-
ence between one existing algorithm and the proposed rithm, Antlion algorithm, adaptive bee colony algorithm,

Table 1  Performance evaluation Methods Cleveland Hungarian Switzerland


of proposed method and
existing methods Acc Sen Spec Acc Sen Spec Acc Sen Spec

Proposed (AGAFL) 90 91 90 91 92 88 89 97 75
Previous (LPP + RBFL) 68 79 84 67 87 38 72 76 67
Existing (RS + FL) 72.6 100 67 69.7 86 35 63.4 67 72

13
Evolutionary Intelligence

Table 2  Statistics of paired t Datasets AGAFL with LPP + RBFL AGAFL with RS + FL


test for F-measure of AGAFL
with LPP + RBFL, AGAFL h p value t (calculated) h p value t (calculated)
with RS + FL for df = 4 and t
(tabulated) = 2.776 Hungarian 1 0.0048 5.4782 1 0.0142 4.989
Cleveland 0 0.1882 1.3834 0 0.1146 1.8502
Switzerland 1 2.81E−07 10.0543 1 0.0026 5.805

Table 3  ANOVA-single factor Group Count Sum Average Variance


summary
Class A 15 80 5.33 3.333
Class B 15 57 3.8 3.533

Table 4  ANOVA Source of variation SS df MS F p value F-critical

Between group 52.1 3 28.4 9.16433 0.0001669 4.335457


Within group 85.7 32 4.33643 – – –
Total 137.8 35 – – – –

and others. Moreover, these models can be extended to References


other medical datasets as they become available. Also this
model can be tested on several other medical datasets. Also 1. Brameier M, Banzhaf W (2001) A comparison of linear genetic
this model can be tested on other domains like insurance, programming and neural networks in medical data mining. IEEE
Trans Evol Comput 5(1):17–26
finance, etc. In the current study, the environment is static 2. Cios KJ (2000) From the guest editor medical data mining and
and data is not streaming. Thus, we can consider the devel- knowledge discovery. IEEE Eng Med Biol Mag 19(4):15–16
opment and testing of the proposed model in the dynamic 3. Clarkson K, Srivastava G, Meawad F, Dwivedi AD (2019)
environments or use streaming data in this model as a future Where’s @waldo? Finding users on twitter. In: Artificial intel-
ligence and soft computing—18th international conference,
work. ICAISC 2019, Zakopane, Poland, June 16–20, 2019, proceedings,
part II, pp 338–349. https​://doi.org/10.1007/978-3-030-20915​
-5_31
8 Conclusion 4. Dwivedi AD, Malina L, Dzurenda P, Srivastava G (2019) Opti-
mized blockchain model for internet of things based healthcare
applications. In: 42nd international conference on telecommunica-
In this article, a novel method for heart disease classifi- tions and signal processing, TSP 2019, Budapest, Hungary, July
cation has been proposed using Rough Sets and Fuzzy 1–3, 2019, pp 135–139. https​://doi.org/10.1109/TSP.2019.87690​
rule-based classification with adaptive genetic algorithm. 60
5. Dwivedi AD, Srivastava G, Dhar S, Singh R (2019) A decentral-
The classification model proposed in this work has the fol- ized privacy-preserving healthcare blockchain for IOT. Sensors
lowing steps: first, feature reduction is done by rough set 19(2):326. https​://doi.org/10.3390/s1902​0326
theory. Then, prediction of ailment by hybridizing Adap- 6. Feyyad U (1996) Data mining and knowledge discovery: making
tive Genetic Algorithm with fuzzy logic classifier (AGAFL) sense out of data. IEEE Expert 11(5):20–25
7. Fisher R (1955) Statistical methods and scientific induction. J R
is done. The generated rules are optimized by applying Stat Soc Series B Stat Methodol 17(1):69–78
Adaptive Genetic Algorithm. The experimentation is per- 8. Game PS, Vaze V, Emmanuel M (2019) Optimized decision tree
formed on the UCI Heart Disease datasets. The overall rules using divergence based grey wolf optimization for big data
experimental analysis shows that AGAFL performed better classification in health care. Evol Intel. https​://doi.org/10.1007/
s1206​5-019-00267​-w
than other hybrid combinations with respect to measures 9. Gandomi AH, Yang XS, Alavi AH (2013) Cuckoo search algo-
like accuracy, specificity and sensitivity. Major strengths rithm: a metaheuristic approach to solve structural optimization
of the proposed model are, it can efficiently handle noisy problems. Eng Comput 29(1):17–35
data, it works efficiently even on huge number of attrib- 10. Han J, Rodriguez JC, Beheshti M (2008) Diabetes data analysis
and prediction model discovery using rapidminer. In: 2008 second
utes. Also, the proposed model avoids entrapment in local international conference on future generation communication and
optimum. networking, vol. 3. IEEE, pp 96–99
11. Henriques J, Carvalho P, Paredes S, Rocha T, Habetha J, Antunes
M, Morais J (2014) Prediction of heart failure decompensation

13
Evolutionary Intelligence

events by trend analysis of telemonitoring data. IEEE J Biomed 24. Si W, Srivastava G, Zhang Y, Jiang L (2019) Green internet of
Health Inform 19(5):1757–1769 things application of a medical massage robot with system inter-
12. Herland M, Khoshgoftaar TM, Wald R (2014) A review of data ruption. IEEE Access 7:127066–127077. https​://doi.org/10.1109/
mining using big data in health informatics. J Big Data 1(1):2 ACCES​S.2019.29395​02
13. Kaluri R, Reddy P (2016) Sign gesture recognition using modified 25. Sidek KA, Mai V, Khalil I (2014) Data mining in mobile ecg
region growing algorithm and adaptive genetic fuzzy classifier. Int based biometric identification. J Netw Comput Appl 44:83–91
J Intell Eng Syst 9:225–233 26. Srinivas K, Rao GR, Govardhan A (2014) Rough-fuzzy classifier:
14. Kharat KD, Kulkarni PP, Nagori M (2012) Brain tumor classifica- a system to predict the heart disease by blending two different set
tion using neural network based methods. Int J Comput Sci Inform theories. Arab J Sci Eng 39(4):2857–2868
1(4):2231–5292 27. Srivastava G, Crichigno J, Dhar S (2019) A light and secure
15. Lahsasna A, Ainon RN, Zainuddin R, Bulgiba A (2012) Design of healthcare blockchain for iot medical devices. In: 2019 IEEE
a fuzzy-based decision support system for coronary heart disease Canadian conference of electrical and computer engineering
diagnosis. J Med Syst 36(5):3293–3306 (CCECE), pp 1–5. https:​ //doi.org/10.1109/CCECE.​ 2019.886159​ 3
16. Lehmann TM, Güld MO, Deselaers T, Keysers D, Schubert H, 28. Thippa Reddy G, Khare N (2016) FFBAT-optimized rule based
Spitzer K, Ney H, Wein BB (2005) Automatic categorization of fuzzy logic classifier for diabetes. Int J Eng Res Afr 24:137–152
medical images for content-based retrieval and data mining. Com- 29. Tsymbal A, Bolshakova N (2006) Guest editorial introduction to
put Med Imaging Graph 29(2–3):143–155 the special section on mining biomedical data. IEEE Trans Inf
17. Long NC, Meesad P, Unger H (2015) A highly accurate firefly Technol Biomed 10(3):425–428
based algorithm for heart disease prediction. Expert Syst Appl 30. Vafaie M, Ataei M, Koofigar HR (2014) Heart diseases prediction
42(21):8221–8231 based on ecg signals’ classification using a genetic-fuzzy system
18. Mohan S, Thirumalai C, Srivastava G (2019) Effective heart and dynamical model of ECG signals. Biomed Signal Process
disease prediction using hybrid machine learning techniques. Control 14:291–296
IEEE Access 7:81542–81554. https​://doi.org/10.1109/ACCES​ 31. Wang P, Weise T, Chiong R (2011) Novel evolutionary algorithms
S.2019.29237​07 for supervised classification problems: an experimental study.
19. Pawlak Z, Sowinski R (1994) Rough set approach to multi-attrib- Evol Intell 4(1):3–16
ute decision analysis. Eur J Oper Res 72(3):443–459 32. Yuvaraj N, Vivekanandan P (2013) An efficient SVM based tumor
20. Reddy GT, Khare N (2017) An efficient system for heart disease classification with symmetry non-negative matrix factorization
prediction using hybrid ofbat with rule-based fuzzy logic model. using gene expression data. In: 2013 international conference on
J Circuits Systems Comput 26(04):1750061 information communication and embedded systems (ICICES).
21. Reddy GT, Khare N (2018) Heart disease classification system IEEE, pp 761–768
using optimised fuzzy rule based algorithm. Int J Biomed Eng
Technol 27(3):183–202 Publisher’s Note Springer Nature remains neutral with regard to
22. Santhanam T, Ephzibah E (2015) Heart disease prediction using jurisdictional claims in published maps and institutional affiliations.
hybrid genetic fuzzy model. Indian J Sci Technol 8(9):797
23. Seera M, Lim CP (2014) A hybrid intelligent system for medical
data classification. Expert Syst Appl 41(5):2239–2249

13

You might also like