
CHAPTER 5

WRAPPER BASED META-HEURISTIC FEATURE SELECTION ALGORITHM FOR NCD DATASETS

5.1 INTRODUCTION

Filter based feature selection methods are simple, fast, and rely on the intrinsic properties of individual features. Such a property may not appear relevant when a feature is considered in isolation, yet may provide meaningful insights when the feature is combined with others. The importance of the features is measured using a threshold value or some statistical criterion. Filter based methods do not interact with the classifier: features are selected independently and only then evaluated with the classifier, which reduces the classification accuracy. The problem of choosing this threshold criterion for selecting the relevant features poses a great challenge to this approach (Binita Kumari, 2011). These issues can be addressed with the help of wrapper based approaches. Rough set theory has difficulty in handling continuous data; this can be managed through a discretization procedure, but usually at the cost of information loss (Jensen, 2008), which can be addressed by including stochastic meta-heuristic approaches. However, some meta-heuristic algorithms get stuck in local optima, while others yield a smaller feature subset but lack classification accuracy, and vice versa. The use of some algorithms also increases the computational cost and runtime, and the selected variables are not always optimal and may even be redundant.

The two important characteristics of population-based metaheuristic optimization algorithms are exploration and exploitation. Exploration traverses the whole problem space, while exploitation drives the search to converge towards an optimal solution. The major goal of any heuristic algorithm is to balance exploration and exploitation efficiently in order to find the global optimum (Seyedali Mirjalili, 2010). Some of the most popular metaheuristic algorithms are the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), the Artificial Bee Colony (ABC) Algorithm, Simulated Annealing (SA), the Glowworm Swarm Optimization Algorithm (GSO), the Gravitational Search Algorithm (GSA) and so on. These algorithms have attracted researchers for solving complex optimization, classification and prediction problems, but no single algorithm suits all optimization and classification problems. For instance, PSO is a swarm intelligence technique that mimics a flock of birds. It is widely used for its simple approach, fewer parameter specifications and fast convergence, but it suffers from weak exploration. The Gravitational Search Algorithm (GSA) is a self-adaptive, population-based algorithm inspired by Newton's laws of gravity and motion. It has been widely used for its high performance on optimization problems, but it suffers from premature convergence owing to weak exploitation.

In order to overcome these drawbacks, a novel hybridization strategy that combines these meta-heuristics can be employed. According to Talbi (Talbi, 2002), two algorithms can be combined based on functionality, level of hybridization and the methods adopted. This motivated the hybridization of the PSO and GSA algorithms through a low-level hybridization method that focuses on the functionality of the algorithms. Hence, a new Gaussian based Particle Swarm Optimization Gravitational Search Algorithm (GPSOGSA) has been proposed that adopts a Gaussian parameter sufficient to overcome the problem of balancing social and individual influence. It helps in promoting an efficient solution with a reduced set of features, high classification accuracy and reduced time complexity for the NCD datasets. It uses an absolute Gaussian random function, which limits the number of parameters used in the PSO algorithm. The performance of the algorithm is compared with the traditional PSO, GSA and PSOGSA algorithms in terms of convergence rate, accuracy and time. The outcomes reveal that the GPSOGSA algorithm outperforms all the other algorithms and attains better results.

In this chapter, a novel hybrid wrapper based feature selection algorithm is proposed to obtain optimal features for the prediction of Non-Communicable Diseases. It combines a swarm based meta-heuristic and a physics based meta-heuristic, namely PSO and GSA, for extensive feature selection that contributes substantially to effective predictions. GPSOGSA helps to overcome the problem of getting stuck in local optima and strengthens the local searching ability, thus aiming to bridge the gap between exploration and exploitation. The algorithm also limits the use of the many parameters, such as acceleration factors, maximum velocity and inertia weight, that play a vital role in the PSO, GSA and PSOGSA algorithms (Kumar, 2021).

The performance of the algorithm is evaluated on NCD datasets. The algorithm has been designed as a wrapper-based approach that includes the Support Vector Machine (SVM) as a learner algorithm, and it improves both the execution time and the classification accuracy. The findings show that the proposed algorithm can escape from local optima and converges faster than the PSO, GSA and PSOGSA algorithms.

The chapter is further organized as follows: an overview of wrapper based methods is presented in Section 5.2. Section 5.3 deals with the preliminaries of the Gravitational Search Algorithm and the PSOGSA algorithm. Section 5.4 reviews the Support Vector Machine. The motivation behind this work is dealt with in Section 5.5. The proposed Gaussian based Particle Swarm Optimization Gravitational Search Algorithm (GPSOGSA) is presented in Section 5.6. Section 5.7 details the experimental analysis of the proposed algorithm on the NCD datasets. Section 5.8 concludes with the significant findings and the sequels of this chapter.

5.2 WRAPPER BASED METHODS

Filter based approaches suffer from certain critical issues such as increased time consumption, lower classification accuracy and higher complexity. These issues have compelled researchers to probe other methods that improve performance during the classification process. As a result of this quest for procedures that deliver optimal results, wrapper based metaheuristic feature selection methods have emerged. Because of their ability to overcome the problem of dimensionality by optimizing classification performance while reducing the use of computational resources, storage, and the number of features, wrapper based metaheuristic algorithms have proven appropriate for a wide range of applications (Abiodun, 2021).

5.2.1 WRAPPER BASED FEATURE SELECTION METHODS

Wrapper based methods defer to the decision of the classification algorithm when selecting the optimal feature subset. They adopt a greedy search that evaluates the possible feature combinations against a performance criterion specific to the problem. The evaluation criterion varies with the type of problem; for a classification problem it is generally the classification accuracy, F1-measure, recall, precision, etc. The feature subsets that produce the optimal results are then selected based on the learner algorithm.

Wrapper based methods are divided into three categories (a sketch of the first strategy follows this list):

i. Forward selection: it starts with an empty model and fits the model with one feature at a time, selecting the feature that yields the best performing model. It then fits models combining each remaining feature with the previously selected one, selects a second feature, and so on. This process is repeated until the required number of features is selected.

ii. Backward elimination: it starts with the entire set of features and eliminates one at a time. The process is repeated until the required number of features is selected.

iii. Stepwise selection or bidirectional elimination: it combines the forward selection and backward elimination processes. It fits a model and selects a feature. When a new feature is selected, the performance of the existing features is checked against that of the new one; if it is low, the feature is eliminated through backward elimination. The process is repeated until an optimal subset of features is attained.
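As an illustration of the forward strategy, the following minimal sketch wraps an SVM learner in a greedy forward search. It assumes scikit-learn and numpy arrays `X` (samples x features) and `y` (labels); it is an illustrative sketch, not the thesis' MATLAB implementation.

```python
# Illustrative forward-selection wrapper (a sketch, not the thesis code).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def forward_selection(X, y, n_wanted, cv=10):
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_wanted and remaining:
        # Score every candidate subset "selected + one new feature"
        scores = {f: cross_val_score(SVC(kernel="linear"),
                                     X[:, selected + [f]], y, cv=cv).mean()
                  for f in remaining}
        best = max(scores, key=scores.get)   # feature with best CV accuracy
        selected.append(best)
        remaining.remove(best)
    return selected
```

Backward elimination follows the mirrored logic, starting from the full feature set and dropping the feature whose removal hurts the cross-validated accuracy the least.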

5.3 PRELIMINARIES

5.3.1 GRAVITATIONAL SEARCH ALGORITHM (GSA)


E. Rashedi et al. proposed the Gravitational Search Algorithm (GSA) in 2009. It is a population based heuristic algorithm built on the basic idea of Newton's gravitational phenomena: the gravitational force moves the objects through the multidimensional solution space towards the optimal solution (Esmat Rashedi, 2009). GSA treats a collection of candidate solutions as agents whose masses represent the outcome of the fitness function. During the iterations, the masses are attracted towards each other by the gravitational force that arises between them. The heavier the mass, the greater the attraction force; as a result, the heaviest mass lies closest to the global optimum and attracts the other masses in proportion to their distances (Kumar, 2021).

Let N be the number of agents in the system and let $X_i$ denote the position of the $i$th agent. It is specified as

$$X_i = (X_1, X_2, \ldots, X_N), \quad i = 1, 2, \ldots, N \tag{5.1}$$

where $X_i$ represents the position of agent $i$ (Esmat Rashedi, 2009). The fitness function is used to compute the gravitational mass $m_i(z)$ and the inertial mass $M_i(z)$ at time $z$ as follows:

$$m_i(z) = \frac{fit_i(z) - worstf(z)}{bestf(z) - worstf(z)} \tag{5.2}$$

$$M_i(z) = \frac{m_i(z)}{\sum_{j=1}^{N} m_j(z)} \tag{5.3}$$

where $fit_i(z)$ represents the $i$th agent's fitness value at time $z$, $bestf(z)$ specifies the best (maximum) fitness over the agents at time $z$, and $worstf(z)$ is the worst (minimum) fitness at time $z$. For a maximization problem they are represented as (Kumar, 2021):

$$bestf(z) = \max_{j \in \{1,\ldots,N\}} fit_j(z) \tag{5.4}$$

$$worstf(z) = \min_{j \in \{1,\ldots,N\}} fit_j(z) \tag{5.5}$$

For the $i$th agent, the total force $Force_i(z)$ accumulated from the masses $j$ at time $z$ is given as (Esmat Rashedi, 2009)

$$Force_i(z) = \sum_{j \in nbest,\, j \neq i} rand_j \times Gravit(z) \times \frac{M_j(z)}{R\_ed_{ij}(z) + \varepsilon} \times \big(X_j(z) - X_i(z)\big) \tag{5.6}$$

where the random number $rand_j$ ranges within 0 and 1, and $nbest$ is the set of the first agents with the best objective values and the biggest masses at time $z$, which is decreased linearly over the iterations (Esmat Rashedi, 2009). The gravitational constant $Gravit(z)$ controls the search accuracy and improves the best solution, $M_j$ is the $j$th gravitational mass, and $\varepsilon$ is a small constant. $R\_ed_{ij}(z)$ denotes the Euclidean distance between the $i$th and $j$th agents, expressed as

$$R\_ed_{ij}(z) = \lVert X_i(z), X_j(z) \rVert_2 \tag{5.7}$$

where $\lVert \cdot \rVert_2$ refers to the Euclidean distance.


The gravitational constant $Gravit(z)$ is a decreasing coefficient of time $z$. It is initially set to 1 and termed $Gravit_0$, the initial gravitational constant, and it decreases exponentially towards zero at the last iteration. It is computed as

$$Gravit(z) = Gravit_0 \times \exp\left(-\alpha \times \frac{z}{maxiter}\right) \tag{5.8}$$

where $\alpha$ is the descending coefficient, $z$ is the current iteration and $maxiter$ denotes the maximum number of iterations.

The acceleration $accel_i(z)$ of the $i$th agent at time $z$ is computed as:

$$accel_i(z) = \frac{Force_i(z)}{M_i(z)} \tag{5.9}$$

where $Force_i(z)$ represents the force calculated using Eq. (5.6) and $M_i(z)$ represents the inertial mass calculated using Eq. (5.3).

Finally, the velocity and the position of the agents are updated. The velocity is the sum of a randomized fraction of the current velocity and the acceleration $accel_i(z)$:

$$Vel_i(z+1) = rand_i \times Vel_i(z) + accel_i(z) \tag{5.10}$$

where $rand_i$ is a uniform random variable that lies within 0 and 1 and promotes a randomized search. The position of agent $i$ is updated using Eq. (5.11):

$$X_i(z+1) = X_i(z) + Vel_i(z+1) \tag{5.11}$$

The best solution of each agent is updated in each iteration:

$$Perbest_i(z+1) = \begin{cases} X_i(z+1), & \text{if } fit\big(X_i(z+1)\big) \geq fit\big(Perbest_i(z)\big) \\ Perbest_i(z), & \text{otherwise} \end{cases} \tag{5.12}$$

where $z$ denotes time, $fit(X_i(z+1))$ denotes the fitness value of agent $X_i$ at time $z+1$ and $fit(Perbest_i(z))$ is its best fitness value so far. Finally, the GSA algorithm terminates when it meets an end criterion. The GSA algorithm is depicted in Figure 5.1.

Algorithm: GSA
Input
Load the NCD dataset
Pre-process the NCD dataset using Eq. (3.1) and Eq. (3.2)
N: Size of population
D: Dimension of features
z: Number of iterations
Initialize Gravit0, accel, and maxiter
Initialize the initial parameters as fit ← 0, Fitness(1) ← fit, Glbest ← fit
Output
Best agent
Step 1. Randomly generate an initial population Xi = (X1, X2, …, XN)
Step 2. Initialize the parameters z = 1 and the velocity Vel(Xi) of all agents, ∀i
Step 3. Initialize the fitness values fiti(z) of all agents, ∀i
Step 4. Assign Perbesti, ∀i, with the initial positions
Step 5. Assign Glbest with the best particle
Step 6. Repeat
Step 7. Compute mi(z), Gravit(z) using Eq. (5.2) and (5.8) respectively
Step 8. Compute bestf(z), worstf(z) using Eq. (5.4) and (5.5) respectively
Step 9. Compute Forcei(z) of each agent using Eq. (5.6)
Step 10. Compute acceli(z) using Eq. (5.9)
Step 11. Compute Veli(z + 1) using Eq. (5.10)
Step 12. Compute fiti(z) of the agent Xi
Step 13. Update Perbesti using Eq. (5.12)
Step 14. Update the global best (Glbest)
Step 15. Compute the agent's position Xi(z + 1) using Eq. (5.11)
Step 16. Until the stopping criterion is met
Step 17. Return the best agent
Figure 5.1 GSA Algorithm
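For concreteness, a minimal numerical sketch of the GSA loop of Figure 5.1 follows. It assumes a fitness function to be maximized and, for simplicity, sums the force over all agents instead of the shrinking nbest set; the parameter values are illustrative, not those used in the thesis.

```python
# A minimal sketch of the GSA loop (Eqs. 5.2-5.11); illustrative parameters.
import numpy as np

def gsa(fit, dim, n_agents=30, max_iter=100, gravit0=1.0, alpha=20.0, eps=1e-9):
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, (n_agents, dim))   # agent positions
    V = np.zeros_like(X)                          # agent velocities
    best_x, best_f = X[0].copy(), -np.inf
    for z in range(1, max_iter + 1):
        f = np.array([fit(x) for x in X])
        if f.max() > best_f:                      # track the global best
            best_f, best_x = f.max(), X[f.argmax()].copy()
        m = (f - f.min()) / (f.max() - f.min() + eps)      # Eq. (5.2)
        M = m / (m.sum() + eps)                            # Eq. (5.3)
        gravit = gravit0 * np.exp(-alpha * z / max_iter)   # Eq. (5.8)
        F = np.zeros_like(X)
        for i in range(n_agents):        # Eq. (5.6), over all agents here
            for j in range(n_agents):
                if i != j:
                    R = np.linalg.norm(X[i] - X[j])        # Eq. (5.7)
                    F[i] += rng.random() * gravit * M[j] / (R + eps) * (X[j] - X[i])
        accel = F / (M[:, None] + eps)                     # Eq. (5.9)
        V = rng.random((n_agents, 1)) * V + accel          # Eq. (5.10)
        X = X + V                                          # Eq. (5.11)
    return best_x, best_f

# e.g. maximize -||x||^2 (optimum at the origin):
# print(gsa(lambda x: -np.dot(x, x), dim=5))
```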

5.3.2 DIFFERENCE BETWEEN PSO AND GSA

Both PSO and GSA are similar in the sense that both find the optimal value based on the movement of particles, but they possess different search strategies. Some of the differences are projected in Table 5.1.

Table 5.1 Comparison of the PSO and GSA Algorithms

PSO: It is inspired by the flocking behaviour of birds.
GSA: It is inspired by Newton's laws of gravity and motion.

PSO: It has fewer parameters, so it needs less computational time.
GSA: It has more parameters to compute, so its computation time increases.

PSO: The movement of a particle is based on two positions, namely pbest and gbest.
GSA: The movement of an agent depends on the total force acquired from all the other agents.

PSO: The update procedure is independent of both the quality of the solutions and the fitness values (Esmat Rashedi, 2009).
GSA: The force is related to the fitness value, so the particles explore the search space around themselves and are attracted by the force (Esmat Rashedi, 2009).

PSO: Slow convergence rate.
GSA: Converges faster.

PSO: It utilises memory (pbest and gbest) in the update procedure.
GSA: It does not have a separate memory; it uses the current position to carry out the update.

PSO: The position update is carried out through a velocity that exhibits social and cognitive behaviour.
GSA: The position update is carried out through a velocity that includes the property of acceleration.

5.3.3 PARTICLE SWARM OPTIMIZATION AND GRAVITATIONAL SEARCH ALGORITHM (PSOGSA)

The hybrid Particle Swarm Optimization and Gravitational Search Algorithm (PSOGSA) was proposed by Seyedali Mirjalili et al. in 2010 (Seyedali Mirjalili, 2010). The algorithm combines the social thinking ability of PSO and the search capability of GSA for function optimization. It aims to bridge the gap between the two algorithms by assimilating the exploitation ability of PSO and the exploration ability of GSA.

5.3.3.1 PSOGSA ALGORITHM

The population is randomly generated as a set of agents, each representing a candidate solution. The computation of force, mass and acceleration is acquired from the GSA algorithm, and the best solution is tracked in each iteration. The two algorithms are combined by slightly modifying the velocity equation as in Eq. (5.13).

$$Vel_i(z+1) = w \times Vel_i(z) + L_1 \times rand \times accel_i(z) + L_2 \times rand \times \big(Glbest(z) - X_i(z)\big) \tag{5.13}$$

where $Vel_i(z)$ is the velocity of the $i$th agent at iteration $z$, $L_1$ and $L_2$ are acceleration factors that lie between 1 and 4, the random number $rand$ ranges within 0 and 1, $accel_i(z)$ is the acceleration value of the $i$th agent at iteration $z$ as in Eq. (5.9), $Glbest$ is the global best solution, and $w$ is the inertia weight as specified in Eq. (4.3). The PSOGSA algorithm is represented in Figure 5.2.

Algorithm: PSOGSA
Input
Load the NCD dataset
Pre-process the NCD dataset using Eq. (3.1) and Eq. (3.2)
z: Number of iterations
Initialize Gravit0, acceli(z), z = 1, fit ← 0, Fitness(1) ← fit, Glbest ← fit and maxiter
Output
Best particle
Step 1. Randomly generate an initial population Xi = (X1, X2, …, XN)
Step 2. Evaluate the fitness function fiti(z) for each agent, ∀i
Step 3. Assign Perbesti with the initial positions of Xi
Step 4. Assign Glbest with the best particle
Step 5. Repeat
Step 6. Compute mi(z), Gravit(z) using Eq. (5.2) and (5.8) respectively
Step 7. Compute bestf(z), worstf(z) using Eq. (5.4) and (5.5) respectively
Step 8. For each agent i, compute the force Forcei(z) using Eq. (5.6)
Step 9. Compute acceli(z) using Eq. (5.9)
Step 10. Compute Veli(z + 1) using Eq. (5.13)
Step 11. For each agent Xi, compute fitnessi(z)
Step 12. Update Perbesti using Eq. (5.12)
Step 13. Update the global best Glbest
Step 14. Update the position of agent Xi(z + 1) using Eq. (5.11)
Step 15. Until the stopping criterion is met
Step 16. Return the best particle
Figure 5.2 PSOGSA Algorithm
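The only change PSOGSA makes to the GSA loop sketched earlier is the velocity rule. A one-function sketch of Eq. (5.13), with illustrative parameter values, is:

```python
# PSOGSA velocity update (Eq. 5.13); w, L1, L2 are user-set parameters and
# accel comes from the GSA force computation (Eq. 5.9).
import numpy as np
rng = np.random.default_rng()

def psogsa_velocity(V, accel, X, glbest, w=0.8, L1=0.5, L2=1.5):
    return (w * V
            + L1 * rng.random(V.shape) * accel
            + L2 * rng.random(V.shape) * (glbest - X))
```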

5.3.3.2 ADVANTAGES OF PSOGSA

PSOGSA possesses the following important characteristics:

• The update procedure considers both the quality of the solutions and the fitness value.

• The agents explore the neighbourhood of a solution for optimality and attract other agents towards it in the search space.

• It maintains a memory that stores the current best solution. This memory can be accessed at any time and helps the other agents to explore this best solution and move towards it (Seyedali Mirjalili, 2010).

5.4 SUPPORT VECTOR MACHINE

SVM is a supervised learning algorithm introduced by Vladimir Vapnik in 1974 (Vapnik, 1982). It creates a hyperplane that divides the dataset based on the class value and is capable of solving both linear and non-linear problems in a high dimensional feature space. There can be many hyperplanes that classify the data; SVM aims to find the optimal hyperplane that separates the members of the classes, namely the one with the maximum margin.

The equation for the hyperplane is written as

$$H : W^{T}\Phi(X) + b = 0 \tag{5.14}$$

where $b$ is the intercept, $W$ is the weight vector and $X$ is the input vector. The distance of a point $\Phi(X_0)$ from the hyperplane $W^{T}\Phi(X) + b = 0$ is given as

$$d_H\big(\Phi(X_0)\big) = \frac{\lvert W^{T}\Phi(X_0) + b \rvert}{\lVert w \rVert_2} \tag{5.15}$$

where $\lVert w \rVert_2$ is the Euclidean norm of $W$, specified as

$$\lVert w \rVert_2 = \sqrt{w_1^2 + w_2^2 + w_3^2 + \cdots + w_n^2}$$

The kernel function plays a major role in making the data linearly separable by maximizing the margin among the classes (Alomari, 2012). It is primarily categorized into three types, namely the linear, Gaussian and polynomial kernel functions.
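As a small check of Eqs. (5.14) and (5.15), the point-to-hyperplane distance can be computed from the coefficients of a fitted linear SVM; the sketch below assumes scikit-learn and toy data.

```python
# Distance of a point to the separating hyperplane (Eq. 5.15), linear kernel.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.0, 1.5], [2.0, 2.0], [1.5, 2.5]])
y = np.array([0, 0, 1, 1])
clf = SVC(kernel="linear").fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]          # W and the intercept b
x0 = np.array([1.0, 1.0])
dist = abs(w @ x0 + b) / np.linalg.norm(w)      # |W^T x0 + b| / ||w||_2
print(dist)
```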

5.5 MOTIVATION BEHIND THE WORK

In the traditional algorithms, the convergence of the particles' positions plays a vital role in the convergence of the algorithm. As the particles converge to their local attractors, their personal best positions converge to the global best position, which directs the algorithm to converge; i.e. a particle's local attractor moves towards the global best, which pulls the particle's current position in the direction of the global best position and thereby influences its convergence. If the global best position gets stuck in a local optimum, the particles' current positions also move towards it, which leads to premature convergence of the algorithm. Finding a proper local attractor therefore becomes important for improving the convergence of the algorithm.

PSO is a simple nature inspired algorithm that is widely used for various optimization problems. It suits various domains, such as medical data or engineering problems, but gets stuck in local optima. When it is hybridized with other heuristic or metaheuristic algorithms, it efficiently finds the global optimum and helps improve the classification accuracy in a reduced amount of time.

5.6 GPSOGSA ALGORITHM

Keeping the ideas and limitations of PSO and GSA as groundwork, a Gaussian Particle Swarm Optimization and Gravitational Search Algorithm (GPSOGSA) is proposed. It uses an absolute Gaussian parameter that helps to overcome the problem of getting trapped in local optima and retains only the important parameters that influence the searching ability. It also improves the convergence rate, thereby reaching a better optimal solution in a stipulated time, and thus addresses the balance between exploration and exploitation.

Using a uniform random distribution can mislead the identification of the local attractor and causes the traditional algorithms to suffer from premature convergence. The proposed GPSOGSA serves two purposes:

i. It uses a Gaussian Probability Distribution (GPD) parameter that helps in identifying the proper local attractor and thus improves the convergence probability, which is not the case with a uniform distribution.

ii. It uses only the important parameters that influence the local searching ability, leading to a balance between the exploration and exploitation capabilities, which forms the major characteristic of a heuristic algorithm.

In the traditional PSO algorithm, the uniform random coefficients L1 and L2 used for updating the velocity are replaced by the absolute Gaussian random variables |gp| and |gb|, so the inertia weight w and the acceleration factors are no longer required. This helps the algorithm escape from local optima without slowing down the convergence rate, and it strives for efficient feature selection and classification.

Since the objective of feature selection is to provide a reduced subset of features without affecting the originality of the data, and that of classification is to yield a good classification accuracy, a novel fitness function that considers both the accuracy and the size of the reduced feature set is used as the evaluative measure.

Further, the coefficients of the acceleration $accel_i(z)$ and of $(Glbest - X_i(z))$, namely L1, L2 and rand, need not be specified; they are automatically drawn from the GPD.

This is carried out by updating the velocity equation of Eq. (5.13) as in Eq. (5.16):

$$Vel_i^d(z+1) = Vel_i^d(z) + \lvert gp \rvert \times accel_i(z) + \lvert gb \rvert \times \big(Glbest - X_i(z)\big) \tag{5.16}$$

where $Vel_i(z)$ is the velocity of the $i$th agent at iteration $z$, and $\lvert gp \rvert$ and $\lvert gb \rvert$ are absolute random numbers drawn from a Gaussian probability distribution with mean 0 and variance 1, i.e. $\mathrm{abs}(N(0,1))$. Further, $accel_i(z)$ is the acceleration value of the $i$th agent at iteration $z$ as in Eq. (5.9), and $Glbest$ is the global best solution. Since the inertia weight $w$ has no impact on improving the convergence of GPSOGSA, it is set to 0 (Kumar, 2021).
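A sketch of the Eq. (5.16) update, with the two coefficients drawn as abs(N(0, 1)), follows; it replaces the PSOGSA velocity rule shown earlier.

```python
# GPSOGSA velocity update (Eq. 5.16): the uniform coefficients of Eq. (5.13)
# are replaced by absolute Gaussian draws and no inertia weight is used.
import numpy as np
rng = np.random.default_rng()

def gpsogsa_velocity(V, accel, X, glbest):
    gp = np.abs(rng.standard_normal(V.shape))   # |gp| ~ abs(N(0, 1))
    gb = np.abs(rng.standard_normal(V.shape))   # |gb| ~ abs(N(0, 1))
    return V + gp * accel + gb * (glbest - X)
```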

5.6.1 GPSOGSA AS A WRAPPER ALGORITHM

In order to improve the efficacy of the algorithm in finding optimal features with a high classification accuracy, the wrapper approach is infused. The wrapper approach guides the model towards the selection of important features with a better optimal result. The performance and the complexity of the proposed algorithm are tested on the NCD datasets.

5.6.1.1 SVM AS A LEARNER ALGORITHM

From the results of related work on classification techniques applied to medical datasets, it is observed that SVM has been widely used by researchers and has obtained 100% classification accuracy in certain studies. It also suits binary classification problems well. Considering its efficiency in handling high dimensional datasets, the proposed algorithm includes SVM as the learner algorithm for selecting minimal feature subsets and evaluating the fitness function for disease classification. The NCD datasets are divided into test and training sets using 10-fold cross validation, in which one subset is assigned as the testing set and the remaining subsets as the training set. A linear kernel function is used for datasets with few features and a large number of instances, and the Radial Basis Function (RBF) kernel is used for datasets with more features than instances. For example, the diabetes dataset has 768 instances with 8 features, so the linear kernel is used, whereas for the ovarian cancer dataset, with 4000 features and 216 instances, the RBF kernel is used.
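The kernel rule and the cross-validated learner described above can be sketched as follows; scikit-learn is assumed, and the function names are hypothetical, not from the thesis.

```python
# Kernel choice per the rule above, and mean k-fold CV accuracy of the SVM
# learner on a candidate feature subset (a sketch, not the thesis code).
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def choose_kernel(n_features, n_instances):
    # RBF when there are more features than instances, else linear
    return "rbf" if n_features > n_instances else "linear"

def learner_mean_acc(X, y, feature_idx, k=10):
    kernel = choose_kernel(len(feature_idx), X.shape[0])
    scores = cross_val_score(SVC(kernel=kernel), X[:, feature_idx], y, cv=k)
    return scores.mean()
```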

A model is built for each trial and evaluated for accuracy. The average accuracy over all the trials is taken as the measure for the fitness evaluation. In order to achieve a better solution, a novel fitness function has been defined; it uses the classification accuracy as well as the number of features in the reduct set as the evaluative measure.

$$fitness = \frac{\left(a \times \dfrac{acc}{max\_acc}\right) - \left(b \times \dfrac{nfeat}{N}\right)}{GR} \tag{5.17}$$

where $a$ is the weight factor for the accuracy and $b$ is the weight factor for the number of features; $a$ is initialized to 0.8 and $b$ is assigned $(1 - a)$. $nfeat$ denotes the number of features selected, $N$ is the total number of features, $acc$ is the average classification accuracy $mean\_acc$, defined as $mean(acc)$, and $max\_acc$ is the maximum classification accuracy, taken as 1. $GR$ is the Golden Ratio, which holds the value 1.618. $a$ is assigned the larger value to emphasize the importance of the classifier's accuracy. The classification accuracy is computed using Eq. (5.18):

$$acc = \frac{TruPos + TruNeg}{TruPos + TruNeg + FalsPos + FalsNeg} \tag{5.18}$$

where TruPos refers to True_Positive, TruNeg refers to True_Negative, FalsPos refers to False_Positive and FalsNeg refers to False_Negative. The GPSOGSA algorithm is presented in Figure 5.3.
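A direct transcription of Eqs. (5.17) and (5.18) follows; the check at the end appears consistent with the SPECTF best-fitness value reported later in Table 5.4 when the mean accuracy of 0.976 is used.

```python
# Fitness of Eq. (5.17) with a = 0.8, b = 0.2 and Golden Ratio GR = 1.618,
# and accuracy of Eq. (5.18).
def fitness(acc, n_feat, n_total, a=0.8, b=0.2, max_acc=1.0, gr=1.618):
    return (a * (acc / max_acc) - b * (n_feat / n_total)) / gr

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

# SPECTF: 11 of 44 features at mean accuracy 0.976 -> approx. 0.452
print(round(fitness(0.976, 11, 44), 3))
```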

Algorithm: GPSOGSA
Input
Load the NCD dataset
Pre-process the NCD dataset using Eq. (3.1) and Eq. (3.2)
z: Number of iterations
Initialize Gravit0, acceli(z), z = 1, fit ← 0, Fitness(1) ← fit, Glbest ← fit and maxiter
Output
Best particle
Step 1. Randomly generate an initial population Xi
Step 2. Assign Perbest(i) ← Xi, ∀i
Step 3. Assign Glbest with the best particle
Step 4. Compute the initial fitness value fiti(z) of each particle as in Eq. (5.17)
Step 5. Compute bestf(z) and worstf(z) as in Eq. (5.4) and Eq. (5.5)
Step 6. Repeat
Step 7. Compute the gravitational constant using Eq. (5.8)
Step 8. Compute the mass of the particle using Eq. (5.3)
Step 9. Compute Forcei(z) of each agent using Eq. (5.6)
Step 10. Compute acceli(z) using Eq. (5.9)
Step 11. Evaluate the fitness function fitnessi(Xi)
Step 12. Compute the mean_acc using Function SVM
Step 13. Compute the fitness using Eq. (5.17)
Step 14. Compute Veli(z + 1) using Eq. (5.16)
Step 15. Update the agent's position Xi(z + 1) using Eq. (5.11)
Step 16. Update Perbest(i) using Eq. (5.12)
Step 17. Update the global best Glbest
Step 18. Until the stopping criterion is met
Step 19. Return the best particle
Figure 5.3 GPSOGSA Algorithm

Function: SVM
Input
NCD dataset from GPSOGSA
Output
Selected features, acc and mean_acc
Step 1. Load the dataset with the target variable
Step 2. Divide the dataset into training data and testing data using k-fold cross validation
Step 3. Create the classifier model based on the training data
Step 4. Train the model based on the kernel function
Step 5. Evaluate the testing data using the trained model
Step 6. Evaluate the classifier
Step 7. Select the features recursively based on the weights
Step 8. Return the selected features, acc and mean_acc to Step 11 in Figure 5.3
Figure 5.4 SVM Learner Algorithm
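Step 7 of Figure 5.4 (recursive selection based on the weights) resembles SVM recursive feature elimination; a hedged sketch using scikit-learn's RFE with a linear SVM is shown below. This is an assumption for illustration only, as the thesis does not name a specific routine.

```python
# Recursive feature elimination with a linear SVM (a sketch of Step 7).
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

def rfe_select(X, y, n_keep):
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=n_keep).fit(X, y)
    return rfe.support_        # boolean mask of the retained features
```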

Figure 5.5 depicts the workflow of the proposed algorithm.

[Figure 5.5: workflow diagram — the NCD dataset is pre-processed and split into training and test data; GPSOGSA performs feature selection, evaluating each feature subset through the fitness function with the SVM learner (classification accuracy as the fitness); mass, acceleration and force are computed and the particle velocities and positions are updated until the end criterion is met, after which the particles converge to an optimized subset of selected features used by the classification model.]

Figure 5.5 Workflow of GPSOGSA

5.7 EXPERIMENTAL SETUP AND RESULTS

5.7.1 EXPERIMENTAL SETTING

The proposed work is implemented in MATLAB R2016a. The system has an i5 processor at 2.60 GHz with 4 GB RAM, running the 64-bit Windows 8 operating system. The algorithm is applied on the nine NCD datasets briefly described in Table 3.6. This section projects the performance of the algorithm and compares the GPSOGSA results with the existing PSO, GSA and PSOGSA algorithms.

5.7.2 PARAMETER SETTING

PSO, GSA, PSOGSA and GPSOGSA have certain parameters to be initialized. The population size is 30. For PSO, L1 and L2 are set to 1 and 2 respectively; for PSOGSA and GSA, L1 = 0.5 and L2 = 1.5, and the inertia weight w decreases from 0.8 to 0.2. For GSA, PSOGSA and GPSOGSA, Gravit0 = 1. The algorithm is run for a maximum of 100 iterations (maxiter), which is set as the stopping criterion. SVM is used as the learner algorithm with the Radial Basis Function as the kernel function. In order to avoid overfitting, k-fold cross validation is used, where k is set to 2, 5 or 10 depending on the number of features in the problem. The classifier's accuracy and the reduced feature set act as the evaluative measures for computing the fitness function represented in Eq. (5.17); the accuracy is the ratio of correct predictions to the total number of samples.

Table 5.2 Initialization of Parameters


Parameters GPSOGSA PSOGSA PSO GSA
L1 - 0.5 1 0.5
L2 - 1.5 2 1.5
Gravit0 1 1 - 1
wmin - 0.2 0.2 0.2
wmax - 0.8 0.8 0.8
a 0.8 0.8 0.8 0.8
b 0.2 0.2 0.2 0.2

5.7.3 EXPERIMENTAL RESULTS

In this section, the GPSOGSA algorithm is applied to the NCD datasets described in Section 3.7.1. The results were tabulated and compared with the traditional PSO, GSA and PSOGSA algorithms. The comparison is performed in three aspects. First, the results are compared based on the classification accuracy and the best fitness values for the various NCD datasets. Next, the computation time of the proposed algorithm is compared with that of the existing algorithms. Finally, the results obtained by other researchers on the same datasets are compared with those of the proposed GPSOGSA algorithm.

Table 5.3 presents the relevant features selected by the proposed GPSOGSA algorithm. From the table, it can be observed that the proposed algorithm selects less than 50% of the relevant features for eight datasets, namely SPECTF, Breast Cancer, WBCD, WBCP, Liver Disorder, ILPD Liver, Hepatitis and Ovarian Cancer. For the Diabetes dataset, the algorithm selected 5 out of 9 features.

Table 5.3 Selected Features by GPSOGSA Algorithm

NCD Datasets No. of Features selected Selected Features

SPECTF 11 2, 3, 4, 6, 9, 14, 15, 17, 24, 27, 40

Diabetes 5 2, 3, 4, 7, 8

Breast Cancer 5 1, 3, 6, 7, 8

WBCD 5 2, 4, 22, 25, 29

WBCP 10 2, 5, 7, 9, 11, 13, 17, 18, 26, 28

Liver Disorder 4 2, 3, 4, 6

ILPD Liver 4 5, 6, 7, 10

Hepatitis 5 7, 9, 11, 15, 17

Ovarian Cancer 494 12, 20, 25, 29, 31, 36, 39, 45, 66, 85, 94, 97, 109, 124, 134, 139, 147, 178, 180, 182, 183, 185, 191,
192, 196, 203, 207, 215, 216, 220, 231, 242, 246, 247, 250, 262, 263, 280, 286, 289, 293, 296, 310, 319,
322, 342, 344, 346, 348, 352, 378, 379, 398, 427, 428, 430, 442, 448, 451, 459, 461, 474, 488, 492, 499,
513, 542, 553, 558, 566, 569, 581, 594, 596, 603, 610, 622, 624, 628, 639, 651, 662, 663, 681, 682, 688,
707, 715, 720, 724, 733, 747, 770, 782, 784, 802, 824, 838, 846, 847, 848, 851, 862, 865, 884, 893, 909,
923, 927, 949, 950, 951, 963, 999, 1005, 1007, 1013, 1025, 1040, 1048, 1057, 1066, 1077, 1081, 1095,
1101, 1110, 1113, 1116, 1118, 1122, 1123, 1130, 1132, 1141, 1144, 1147, 1149, 1157, 1173, 1175, 1180,
1189, 1191, 1194, 1198, 1201, 1204, 1213, 1230, 1237, 1243, 1274, 1294, 1296, 1305, 1306, 1307, 1319,
1320, 1329, 1345, 1349, 1352, 1366, 1369, 1371, 1373, 1384, 1390, 1395, 1412, 1418, 1421, 1430, 1440,
1448, 1451, 1454, 1455, 1460, 1462, 1465, 1467, 1490, 1508, 1511, 1515, 1519, 1521, 1522, 1539, 1572,
1588, 1589, 1596, 1597, 1641, 1647, 1655, 1662, 1664, 1678, 1688, 1691, 1693, 1709, 1711, 1722, 1735,
1736, 1738, 1740, 1748, 1760, 1767, 1769, 1770, 1771, 1773, 1775, 1777, 1783, 1793, 1812, 1813, 1816,
1822, 1827, 1830, 1831, 1836, 1839, 1871, 1872, 1877, 1903, 1907, 1920, 1923, 1925, 1936, 1937, 1945,
1946, 1958, 1959, 1964, 1969, 1988, 1993, 1996, 2001, 2010, 2022, 2031, 2038, 2043, 2071, 2083, 2084,
2085, 2089, 2094, 2115, 2116, 2120, 2143, 2149, 2154, 2159, 2164, 2190, 2197, 2206, 2211, 2219, 2224,
2235, 2241, 2247, 2252, 2253, 2258, 2262, 2264, 2273, 2282, 2290, 2293, 2297, 2302, 2325, 2342, 2360,
2362, 2364, 2370, 2378, 2384, 2390, 2396, 2415, 2416, 2420, 2439, 2477, 2482, 2484, 2508, 2513, 2514,
2524, 2526, 2545, 2554, 2555, 2557, 2568, 2572, 2574, 2575, 2577, 2580, 2589, 2600, 2609, 2620, 2622,
2625, 2633, 2634, 2654, 2665, 2669, 2678, 2688, 2693, 2694, 2695, 2702, 2731, 2732, 2736, 2739, 2757,
2774, 2775, 2776, 2798, 2805, 2817, 2820, 2824, 2835, 2839, 2842, 2843, 2848, 2849, 2861, 2865, 2872,
2876, 2892, 2894, 2900, 2902, 2919, 2923, 2963, 2972, 3009, 3013, 3016, 3023, 3024, 3027, 3032, 3039,
3040, 3046, 3053, 3054, 3056, 3060, 3063, 3073, 3075, 3079, 3081, 3088, 3093, 3107, 3115, 3119, 3133,
3161, 3164, 3168, 3212, 3214, 3222, 3234, 3243, 3258, 3264, 3267, 3270, 3287, 3303, 3307, 3308, 3322,
3326, 3347, 3348, 3375, 3377, 3407, 3411, 3416, 3436, 3438, 3439, 3440, 3444, 3484, 3495, 3497, 3504,
3515, 3517, 3526, 3539, 3545, 3550, 3560, 3568, 3604, 3624, 3626, 3635, 3656, 3666, 3667, 3668, 3671,
3675, 3678, 3682, 3686, 3689, 3690, 3704, 3717, 3731, 3736, 3744, 3752, 3753, 3755, 3762, 3766, 3770,
3776, 3780, 3787, 3808, 3810, 3815, 3819, 3822, 3834, 3850, 3866, 3879, 3883, 3888, 3909, 3914, 3933,
3949, 3954, 3958, 3970, 3977, 3980, 3984, 3987, 3991, 3996, 3999, 4000

5.7.3.1 COMPARISON BASED ON THE ACCURACY AND BEST FITNESS VALUE

This section compares the performance accuracy and best fitness values of the aforesaid algorithms on the NCD datasets described in Table 3.6. The results are illustrated in Table 5.4, which comprises the dataset, the algorithm, the number of features selected, the accuracy, the mean accuracy, the best fitness and the RMSE. The highest values have been highlighted.

For the SPECTF heart disease dataset, with a total of 44 features, GPSOGSA outperforms PSO, GSA and PSOGSA, attaining the best fitness value of 0.452 with 11 features and 98.1% accuracy, which is 0.31% better than PSO, 0.41% better than PSOGSA and 1.83% better than GSA. The mean accuracy of the proposed GPSOGSA is greater than the accuracy of GSA. PSO stands second with 15 features and 97.8% accuracy.

For the Diabetes dataset, GPSOGSA, PSOGSA and PSO each selected 5 features, but GPSOGSA ranked highest with 1.75% better accuracy and 0.347 as the best fitness value compared with the PSOGSA algorithm. Compared with GSA, GPSOGSA achieved a 3.03% increase in accuracy. The best fitness value of GSA is the lowest of the four algorithms.

For the Breast Cancer dataset, both GPSOGSA and PSO selected a subset of 5 features, but the highest accuracy was attained by the proposed algorithm, with an increase of 1.41%. For the WBCD dataset, GPSOGSA attains the best fitness with the most reduced set of features compared to the other algorithms. For the WBCP dataset, GPSOGSA, PSO and GSA each selected 10 features, but GPSOGSA outperformed them with a classification accuracy 3.94% better than PSO and 14.1% better than GSA. For the Ovarian Cancer dataset, GPSOGSA outperformed the other algorithms in all the evaluative aspects: out of 4000 features, the proposed algorithm selected a reduced set of 494 features, a reduction of 87.65%, and achieved 100% accuracy. This shows that the proposed algorithm suits high dimensional datasets well.

For the Liver Disorder dataset, the proposed GPSOGSA attained 73% accuracy. For the ILPD Liver dataset, all four algorithms selected the same number of 4 features, and both GPSOGSA and PSO attained the highest accuracy and best fitness. For the Hepatitis dataset, GPSOGSA and PSOGSA each selected 5 features, but GPSOGSA led with a 0.71% better accuracy and the best fitness value.

Table 5.4 Comparison of GPSOGSA with PSO, GSA and PSOGSA Algorithms

Data Set    Algorithm    No. of Features Selected    Accuracy    Mean Accuracy    Best Fitness    RMSE
GPSOGSA 11 0.981 0.976 0.452 0.4
SPECTF PSOGSA 19 0.977 0.94 0.429 0.4
Heart PSO 15 0.978 0.961 0.443 0.4
GSA 25 0.963 0.925 0.406 0.5
GPSOGSA 5 0.857 0.782 0.347 0.4
PSOGSA 5 0.842 0.794 0.339 0.4
Diabetes
PSO 5 0.805 0.771 0.347 0.4
GSA 9 0.831 0.779 0.272 0.4
GPSOGSA 5 0.993 0.982 0.422 0.3
Breast PSOGSA 6 0.971 0.934 0.398 0.4
Cancer PSO 5 0.979 0.943 0.415 0.4
GSA 6 0.967 0.931 0.396 0.5
GPSOGSA 5 0.989 0.965 0.469 0.2
PSOGSA 8 0.989 0.984 0.457 0.5
WBCD PSO 15 0.975 0.953 0.446 0.5
GSA 8 0.965 0.91 0.445 0.5
GPSOGSA 10 0.989 0.958 0.449 0.3
PSOGSA 13 0.963 0.949 0.425 0.4
WBCP
PSO 10 0.95 0.894 0.432 0.5
GSA 10 0.85 0.809 0.383 0.5
GPSOGSA 494 1.000 0.959 0.479 0.4
Ovarian PSOGSA 776 0.982 0.944 0.461 0.5
Cancer PSO 785 0.973 0.979 0.457 0.5
GSA 1147 0.977 0.984 0.448 0.5
GPSOGSA 4 0.725 0.694 0.279 0.5
Liver PSOGSA 5 0.690 0.710 0.255 0.4
Disorder PSO 4 0.687 0.632 0.257 0.5
GSA 3 0.739 0.710 0.283 0.4
GPSOGSA 4 0.848 0.798 0.369 0.4
ILPD PSOGSA 4 0.845 0.812 0.368 0.4
Liver PSO 4 0.848 0.763 0.369 0.4
GSA 4 0.828 0.761 0.359 0.4
GPSOGSA 5 0.982 0.954 0.453 0.4
PSOGSA 5 0.975 0.947 0.449 0.5
Hepatitis
PSO 7 0.956 0.864 0.427 0.5
GSA 6 0.965 0.939 0.438 0.5

From the results, it can be observed that GPSOGSA performs well on all the NCD datasets. It achieves the maximum accuracy on 8 datasets, with the best fitness value and a reduced subset of features, the exception being the Liver Disorder dataset.

5.7.3.2 COMPARISON BASED ON ROOT MEAN SQUARED ERROR

Root Mean Squared Error (RMSE) is the standard deviation of the prediction errors. For a perfect model, the RMSE or Mean Absolute Error (MAE) would be 0 (Ritter, 2013), but this is practically unattainable in a real-time application. Rather, a model is good if its RMSE value is low, and if the RMSE value is greater than 0.5 the model is considered poor (Hamid, 2019). Figure 5.6 depicts the comparison of the various algorithms based on their RMSE values on the NCD datasets.
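For reference, the RMSE reported in Figure 5.6 can be computed as follows (a short sketch):

```python
# Root Mean Squared Error of predictions against the true labels.
import numpy as np

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```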

From the figure, it can be observed that the RMSE values of the proposed algorithm are low compared to those of the other three algorithms. For the SPECTF dataset, GPSOGSA, PSOGSA and PSO all resulted in an equal value of 0.4, and for the Diabetes and ILPD datasets all four algorithms had the same RMSE value of 0.4.

The RMSE values of GPSOGSA range between 0.2 and 0.4 for eight NCD datasets; the exception is the Liver Disorder dataset, with a value of 0.5. PSO also resulted in the same value of 0.5 for this dataset.

For datasets such as WBCD, Ovarian Cancer and Hepatitis, the RMSE value was the same, 0.5, for the three algorithms PSOGSA, PSO and GSA.

The RMSE values of the other algorithms range between 0.4 and 0.5 for all the datasets. It can also be observed that the RMSE values of all the algorithms on all the datasets did not exceed 0.5. This shows that the proposed GPSOGSA model predicts the diseases accurately.

[Figure 5.6: bar chart comparing the RMSE values (y-axis: 0 to 0.6) of GPSOGSA, PSOGSA, PSO and GSA across the NCD datasets.]

Figure 5.6 Comparison of the RMSE values of GPSOGSA, PSOGSA, PSO and GSA

5.7.3.3 COMPARISON BASED ON THE CONVERGENCE RATE

An algorithm is said to converge when the candidate solutions get closer and closer to the desired solution with each iteration (MetaFight, 2015). This section projects the convergence curve of the proposed algorithm in terms of the fitness value against the number of iterations.

The fitness values of each algorithm on the SPECTF dataset are depicted in Table 5.5, which lists the iteration number and the fitness values of the corresponding algorithms. From the table, it can be observed that for GSA the fitness value starts at a minimum of 0.3784, rises to 0.3924 at the 10th iteration and converges from the 17th iteration. In the case of PSO, the minimum fitness value of 0.4333 persists up to the 28th iteration, from where the algorithm starts to converge. Regarding PSOGSA, the worst fitness value is 0.4108, and from the 21st iteration onwards the algorithm converges towards 0.4297. With respect to GPSOGSA, the fitness value begins at the lowest value of 0.4238, reaches 0.4343 at the 12th iteration and attains 0.4525 at the 15th iteration, from where it converges.

Table 5.5 Fitness values of various algorithms on SPECTF dataset

Iteration    GSA    PSO    PSOGSA    GPSOGSA
1 0.3784 0.4333 0.4108 0.4238
2 0.3784 0.4333 0.4108 0.4238
3 0.3784 0.4333 0.4108 0.4238
4 0.3784 0.4333 0.4108 0.4238
5 0.3784 0.4333 0.4108 0.4238
6 0.3784 0.4333 0.4108 0.4238
7 0.3784 0.4333 0.4108 0.4238
8 0.3784 0.4333 0.4108 0.4238
9 0.3784 0.4333 0.4108 0.4238
10 0.3924 0.4333 0.4108 0.4238
11 0.3924 0.4333 0.4108 0.4238
12 0.3924 0.4333 0.4108 0.4343
13 0.3924 0.4333 0.4108 0.4343
14 0.3924 0.4333 0.4108 0.4343
15 0.3924 0.4333 0.4108 0.4525

16 0.3924 0.4333 0.4108 0.4525
17 0.4059 0.4333 0.4108 0.4525
18 0.4059 0.4333 0.4108 0.4525
19 0.4059 0.4333 0.4108 0.4525
20 0.4059 0.4333 0.4108 0.4525
21 0.4059 0.4333 0.4297 0.4525
22 0.4059 0.4333 0.4297 0.4525
23 0.4059 0.4333 0.4297 0.4525
24 0.4059 0.4333 0.4297 0.4525
25 0.4059 0.4333 0.4297 0.4525
26 0.4059 0.4333 0.4297 0.4525
27 0.4059 0.4333 0.4297 0.4525
28 0.4059 0.4333 0.4297 0.4525
29 0.4059 0.443 0.4297 0.4525
30 0.4059 0.443 0.4297 0.4525
31 0.4059 0.443 0.4297 0.4525
32 0.4059 0.443 0.4297 0.4525
33 0.4059 0.443 0.4297 0.4525
34 0.4059 0.443 0.4297 0.4525
35 0.4059 0.443 0.4297 0.4525
36 0.4059 0.443 0.4297 0.4525
37 0.4059 0.443 0.4297 0.4525
38 0.4059 0.443 0.4297 0.4525
39 0.4059 0.443 0.4297 0.4525
40 0.4059 0.443 0.4297 0.4525
41 0.4059 0.443 0.4297 0.4525
42 0.4059 0.443 0.4297 0.4525
43 0.4059 0.443 0.4297 0.4525
44 0.4059 0.443 0.4297 0.4525
45 0.4059 0.443 0.4297 0.4525
46 0.4059 0.443 0.4297 0.4525
47 0.4059 0.443 0.4297 0.4525
48 0.4059 0.443 0.4297 0.4525
49 0.4059 0.443 0.4297 0.4525
50 0.4059 0.443 0.4297 0.4525
51 0.4059 0.443 0.4297 0.4525
52 0.4059 0.443 0.4297 0.4525
53 0.4059 0.443 0.4297 0.4525
54 0.4059 0.443 0.4297 0.4525
55 0.4059 0.443 0.4297 0.4525
56 0.4059 0.443 0.4297 0.4525
57 0.4059 0.443 0.4297 0.4525
: : : : :

: : : : :
97 0.4059 0.443 0.4297 0.4525
98 0.4059 0.443 0.4297 0.4525
99 0.4059 0.443 0.4297 0.4525
100 0.4059 0.443 0.4297 0.4525

Figure 5.7 pictorially depicts the convergence curves of the proposed and compared algorithms on the SPECTF dataset. From the figure, it can be observed that the GPSOGSA algorithm achieves convergence within 15 iterations: the candidate solution first peaks at the 12th iteration, reaches the next peak value at the 15th iteration, and converges from that iteration onwards. It can also be perceived that the proposed algorithm beats GSA in convergence by 2 iterations, PSO by 14 iterations and PSOGSA by 6 iterations.

[Figure 5.7: convergence curves (fitness vs. iteration, 0-100 iterations) of GSA, PSO, PSOGSA and GPSOGSA on the SPECTF heart disease dataset.]
Figure 5.7 Comparison of convergence characteristics on the SPECTF dataset

The number of iterations taken by each algorithm to converge is presented in Table 5.6.

Table 5.6 Details of the number of iterations consumed by algorithms for
convergence
Datasets    GSA    PSO    PSOGSA    GPSOGSA
SPECTF 17 29 21 15
Diabetes 11 46 3 8
Breast Cancer 31 23 21 12
WBCD 5 11 3 4
WBCP 14 13 17 10
Ovarian Cancer 18 12 13 20
Liver Disorder 12 13 12 20
ILPD 3 9 2 2
Hepatitis 13 9 10 8

From the table, it can be observed that the proposed algorithm converges more quickly than the other algorithms on 5 datasets, namely SPECTF, Breast Cancer, WBCP, ILPD and Hepatitis. PSOGSA converged most quickly on 4 datasets, namely Diabetes, WBCD, Liver Disorder and ILPD, while PSO and GSA each converge quickest on 1 dataset. It can also be observed that the proposed GPSOGSA algorithm starts converging within 20 iterations on all the datasets. The minimum of 2 iterations is attained for ILPD and the maximum of 20 iterations for the Ovarian Cancer and Liver Disorder datasets.

The results for each dataset show that GPSOGSA can find the global optimum within 20 iterations and can produce an optimal solution without getting stuck in a local optimum. The individual performance of each traditional algorithm is low, but when hybridized they produce promising results that help researchers make meaningful predictions. It is also observed that, for k-fold cross validation, the larger the value of k, the more time the algorithm consumes to find the optimal result.

Thus, the proposed algorithm gives promising results with a better convergence rate than the other compared algorithms.

5.7.3.4 COMPARISON BASED ON THE COMPUTATION TIME

Computation time is one of the important aspects that project the efficiency of an algorithm: an algorithm proves efficient when it produces meaningful results in a reduced amount of time.

Figure 5.8 presents the computation time taken by the various algorithms to build the model on each dataset. It can be observed that the proposed GPSOGSA consumed the least time on five datasets, namely Diabetes, Breast Cancer, WBCD, WBCP and Ovarian Cancer, and that it consumed less than 1400 seconds to build the model on every dataset, which is not the case for the other compared algorithms. However, it took more time than PSOGSA on two datasets, SPECTF and Hepatitis. It can also be observed that GSA consumed more time than the other algorithms on all the datasets except Hepatitis. From the results, it is clear that GPSOGSA finds a better optimal solution within a reduced time compared to the other three algorithms.

[Figure 5.8: bar chart of the computation time (in seconds) taken by each algorithm to build the model; the underlying values are:]

Data Set          GPSOGSA      PSOGSA       PSO          GSA
SPECTF Heart      1237.733     665.03       1140.32      3182.0774
Breast Cancer     1286.74      2379.708     1612.122     3532.288
WBCD              849.73       1130.75      1261.9683    1311.785
WBCP              553.7405     898.1568     1127.0894    1525.1812
Diabetes          980          1250         1650         2349
Liver Disorder    1356.654     1814.335     1824.955     1935.6
ILPD              1314.737     1309.141     860.2132     1204.3811
Hepatitis         430.67       200.63       550.21       534.98
Ovarian Cancer    986          1173.42      4137.1733    2448.7832

Figure 5.8 Computational time taken by GPSOGSA, PSOGSA, PSO and GSA

5.7.3.5 PERFORMANCE OF GPSOGSA BASED ON THE RATIO OF SELECTED FEATURES TO THE TOTAL NUMBER OF FEATURES

This section compares the performance of GPSOGSA with that of the original feature sets. The proposed work achieves the highest classification accuracy with a reduced set of features compared with the original set of features, projecting how a wrapper-based feature selection approach helps to identify a reduced feature set with good classification accuracy. Table 5.7 compares the number of features selected and the classification accuracy achieved by GPSOGSA with those of the total features present in each NCD dataset.

Table 5.7 Results based on the ratio of selected features to the original features

Data Set    Total Features    Total Features Accuracy    No. of Selected Features    GPSOGSA Accuracy    Ratio of Features (in %)
SPECTF 44 0.728 11 0.981 25
Diabetes 8 0.844 5 0.857 62.5
Breast Cancer 9 0.755 5 0.993 55.56
WBCD 31 0.934 5 0.989 16.13
WBCP 31 0.954 10 0.989 32.26
Ovarian Cancer 4000 0.719 494 1 12.35
Liver Disorder 6 0.669 4 0.739 66.67
ILPD Liver 10 0.646 4 0.848 40
Hepatitis 19 0.858 5 0.982 26.32

In terms of the number of selected features, for the SPECTF dataset the resultant feature set represents a 75% reduction of the total features; for the Diabetes dataset the reduction is 37.5%; and for Breast Cancer 44.4% of the total features have been removed. From the 31 features of the WBCD dataset the algorithm achieved an 83.8% reduction, while for the WBCP dataset 32.26% of the features were retained. Out of the 4000 features in the Ovarian Cancer dataset, the algorithm selected 494 features, retaining only 12.35% of the total; this shows that the proposed algorithm suits high dimensional datasets well. 60% of the features were removed in the ILPD dataset and 73.7% in the Hepatitis dataset. The proposed GPSOGSA algorithm retained less than 50% of the features for six datasets, namely SPECTF, WBCD, WBCP, Ovarian Cancer, ILPD Liver and Hepatitis, and more than 50% for three datasets, namely Diabetes, Breast Cancer and Liver Disorder. The results are visually represented in Figure 5.9.

[Figure 5.9: bar chart comparing the total number of features with the number of features selected by GPSOGSA for the SPECTF, Diabetes, Breast Cancer, WBCD, WBCP, Liver Disorder, ILPD Liver and Hepatitis datasets.]

Figure 5.9 Comparison of the number of selected features over the total number of features

Since the feature dimension of the Ovarian Cancer dataset is much higher than that of the other datasets, it does not fit in the chart of Figure 5.9 and is therefore represented individually in Figure 5.10.

[Figure 5.10: bar chart for the Ovarian Cancer dataset — 4000 total features against the 494 features selected by GPSOGSA.]

Figure 5.10 Comparison based on the number of selected features of Ovarian Cancer

With respect to the classification accuracy, the selected features resulted in over 24% improved accuracy for four datasets, namely SPECTF with 26%, Breast Cancer with 24%, Ovarian Cancer with 28% and ILPD with 24%, and over 13% for two datasets, namely Liver Disorder with 13.9% and Hepatitis with 13%. The comparison based on the classification accuracy is pictorially presented in Figure 5.11.

[Figure 5.11: bar chart comparing, for each NCD dataset, the classification accuracy obtained with the total features against the accuracy obtained with the GPSOGSA-selected features.]

Figure 5.11 Comparison based on the Classification Accuracy

It is observed that GPSOGSA outperforms the traditional PSO, GSA and PSOGSA algorithms, attaining the highest accuracy with a minimal subset of features.

5.7.3.6 COMPARISON WITH PRIOR WORKS

Table 5.8 compares the GPSOGSA algorithm with recent related research on the same datasets. The results show that GPSOGSA has outperformed the algorithms used by the earlier researchers.

For the SPECTF dataset, the ensemble feature selection method (Ursula Neumann, 2016) has the highest accuracy of 86.5%, whereas the proposed GPSOGSA achieved 97.8%, which is 13.6% more. With respect to the Diabetes data, AdaBoost + Decision Stump (Veena Vijayan, 2015) resulted in 80.72%, Improved NB (Sneha, 2019) in 82.3%, the improved Electromagnetism-like mechanism (Wang KJ, 2015) in 77.21% and the Hierarchical and Progressive Combination of Classifiers (Kaur, 2017) produced the highest accuracy of 83.34%, whereas GPSOGSA surpassed them all with 85.7%, which is 6.17%, 4.13%, 10.99% and 2.8% greater respectively.

Regarding the Breast Cancer dataset, the ranker based SVM method (Ahmed Iqbal Pritom M. A., 2016) resulted in 77.27%, the SVM with CART method (Lavanya, 2011) in 73.03% and Correlation Feature Selection with Random Forest (R. Dhanya, 2019) in 97.85%, whereas the proposed algorithm improved on these accuracies by 28.3%, 35.9% and 1.5% respectively. For the WBCD dataset, the maximum accuracy of the threshold fuzzy entropy-based feature selection method is 97.28%, whereas the proposed work resulted in 98.9%, which is 1.66% more. For the WBCP dataset, the result of the proposed work is 4% higher than that of the hybrid SVM and RVM classifier, which was 96.41%. For the Ovarian Cancer data, the proposed algorithm outperforms both the 15-neuron ANN model (M. A. Rahman, 2019) and the Self-Organizing Maps (SOM) with Optimal Recurrent Neural Network (ORNN) (Elhoseny, 2018), with 1.3% and 5% improved accuracy respectively.

For the liver disease dataset, the SVM method (Esraa M. Hashem, 2014) produced 70% accuracy, whereas GPSOGSA produced a 7% improved accuracy with 4 features. For the ILPD dataset, the Backward Elimination + Linear SVM approach (Fathi, 2020) resulted in 82.9%, but the proposed work resulted in 84.8%, an improvement of 2.29%. For the hepatitis data, the Brain Storm Optimization Algorithm resulted in 97.16%, whereas the proposed work attained an improved accuracy of 98.2%. From this, it is observed that the proposed work performs better than the traditional algorithms and the recent research carried out on the above mentioned NCD datasets.

Table 5.8 Comparison of GPSOGSA results with the recent researches on the NCD datasets

Data Set    Methods adopted (research papers)    Selected Features    Accuracy (%)    GPSOGSA Selected Features    GPSOGSA Accuracy (%)
SPECTF Association Rules based Feature Selection (Qu Y, 2019) 14 77.14 11 97.8
Ensemble Feature Selection (Ursula Neumann, 2016) 19 86.5

Diabetes Ada Boost + Decision Stump (Veena Vijayan, 2015) - 80.72 5 85.7
Improved NB (Sneha, 2019) - 82.3
Improved Electromagnetism-like mechanism (Wang KJ, 2015) - 77.21
Hierarchical and Progressive Combination of Classifiers (Kaur, - 83.34
2017)
Breast Ranker + SVM (Ahmed Iqbal Pritom M. A., 2016) - 77.27 5 99.3
Cancer SVM + CART (Lavanya, 2011) 5 73.03
Dominance based feature filtering approach + SVM (Atrey, 2019) 5 99.6
Correlation Feature selection + Random Forest (R. Dhanya, 2019) 8 97.85

WBCD Threshold fuzzy entropy based feature selection (Jaganathan P, 31 97.28 5 98.9
2013)
Hybrid SVM + RVM Classifier (SK, 2018) 31 96.41
Non Linear Dualist Optimization Algorithm (Vijayeeta P, 2019) 31 97.13
Hybrid PSO + SVM (Utami DA, 2019) 31 87
Hybrid ABC + SVM (Utami DA, 2019) 31 88
WBCP Hybrid SVM + RVM Classifier (SK, 2018) 31 96.41 10 98.9
Hybrid PSO + SVM (Utami DA, 2019) 31 88
Hybrid ABC + SVM (Utami DA, 2019) 31 87
Ovarian Bagging and Random Forest (Arfiani A, 2019) 3600 100 494 100
Cancer Self – Organizing Maps (SOM) and Optimal Recurrent Neural - 95
Network (ORNN) (Elhoseny, 2018)
15 – Neuron ANN model (M. A. Rahman, 2019) - 98.7
Liver SVM (Esraa M. Hashem, 2014) 6 70 4 68.99
Disorder Integrated GA + Case Based Reasoning (CBR) model (Singh, 5 68.98
2017)

ILPD Liver SVM Classifier (Mamdouh E, 2014) 8 73.2 4 84.8
Variable – Neighbor Weighted Fuzzy KNN approach (Kumar P, 10 77.59
2020)
Random Under Sampling method + Stability Selection Method + 10 76.77
Random Forest Classifier (Akyo K, 2017)
Backward Elimination + Linear SVM (Fathi, 2020) 5 82.9
Hepatitis Brain Storm Optimization Algorithm (E, 2019) 10 97.16 5 98.2
Threshold fuzzy entropy-based feature selection (Jaganathan P, 10 85.16
2013)
Correlation based ensemble feature Selection Algorithm (Elgin 16 93.90
Christo VR, 2019)
Integrated GA + Case Based Reasoning (CBR) model (Singh, - 94.19
2017)
The symbol '-' specifies that the number of selected features was not specified by the author.

The classification accuracy of the proposed algorithm is compared with the highest accuracy results of the recent related research; this is visually portrayed in Figure 5.12.

[Figure 5.12: bar chart comparing, per NCD dataset, the highest accuracy (%) reported in related works against the accuracy of GPSOGSA.]

Figure 5.12 Comparison of GPSOGSA and the highest of the recent research works based on accuracy

The figure shows that the proposed algorithm yields a higher classification accuracy than those obtained in the recent related works, selecting the minimum number of features with high accuracy for most of the datasets. Some researchers have not specified the number of resultant features of their algorithms, but even against those accuracy values GPSOGSA attains the maximum accuracy. For the ovarian cancer dataset, Bagging and Random Forest (Arfiani A, 2019) produced 100% but utilized 3600 features, whereas the proposed GPSOGSA selected only 494 features and produced 100% accuracy. For the same dataset, although the authors ((Elhoseny, 2018), (M. A. Rahman, 2019)) did not mention the reduced features, their accuracies are lower than that of the proposed algorithm.

The results clearly state that the proposed GPSOGSA algorithm works better than the traditional algorithms and acts as a global searcher for finding the best particles. The outcomes show that the proposed GPSOGSA algorithm produces meaningful results for all the NCD datasets, proving that GPSOGSA performs well for high dimensional datasets.

5.8 SUMMARY

The hybrid wrapper based GPSOGSA is efficient in the feature selection process, leading to a reduced number of features with high accuracy and thereby helping in the early detection of Non-Communicable Diseases. The algorithm includes an absolute Gaussian parameter that improves the local searching ability to identify the local attractor, which strives to attain the global best position with a good convergence rate and reduces the pitfalls of PSO and GSA. SVM is used as the learner algorithm. The outcomes are compared with two traditional algorithms, namely PSO and GSA, and one hybrid algorithm, PSOGSA.

The results prove that GPSOGSA finds a better global optimum with a good convergence rate, escaping from being trapped in local optima. The outcomes project that GPSOGSA outperforms the traditional PSO and GSA algorithms with good classification accuracy. The GPSOGSA algorithm has proven efficient in reducing the feature subsets and converges faster than the traditional algorithms. Its efficiency and effectiveness show that it can serve as a good basis for classification problems. It also performs better than the traditional algorithms and the recent research carried out on the same datasets, resulting in the best reducts.

The algorithm helps in estimating the classification accuracy with a reduced set of features and a reasonable root mean squared error value. This also underlines the need for feature selection in the diagnosis of NCDs. From the point of view of performance and execution time, it is proved that GPSOGSA performs well for high dimensional datasets.

Though the algorithm performed well for most of the datasets, it needs further improvement for certain datasets such as Liver Disorder and Breast Cancer. Also, for some datasets, the algorithm is time consuming. The next chapter presents a mechanism to overcome these issues.
