
1. C4.5 algorithm

Ross Quinlan developed the C4.5 algorithm, which is used to generate decision trees [1]. The
algorithm is an improved version of the ID3 algorithm. C4.5 is a statistical classifier that builds
trees based on information entropy and can handle both discrete and continuous attributes. It is
used for inductive inference that approximates discrete-valued functions. For a discrete feature it
chooses a single best split, whereas for a continuous feature it considers many candidate split
points. Decision trees are constructed top-down using a divide-and-conquer strategy.
Classification begins at the top of the decision tree and gradually moves downwards until a leaf
node is reached, guided by the attribute values. Instance-based classification starts at the root
node, tests are performed at each node, and branches are selected according to the branch
attribute values; the path from the root to a leaf becomes the prototype rule. At each node, the
data attribute that is most effective in splitting the samples into subsets is chosen. The difference
in entropy, termed the normalized information gain, is taken as the splitting criterion. To make a
decision, the attribute with the highest normalized information gain is chosen, i.e. the split that
produces the largest reduction in entropy. Thus, the tree grows by selecting, at each step, the
attribute with the smallest resulting entropy or the highest information gain [2]. The ultimate aim
of C4.5 is the recursive partitioning of the data into subgroups, in which the selected attribute
reflects the close link between decision tree complexity and the amount of information. The
learned decision trees of C4.5 can be represented as sets of if-then rules in a human-readable
format, and rulesets are built by pruning the trees.
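
The following minimal Python sketch (not Quinlan's implementation; the function names and toy data are illustrative) shows the entropy and information-gain computation that drives the attribute selection described above. C4.5 itself further normalizes the gain by the split information (the gain ratio), which is omitted here for brevity.

# Minimal sketch of entropy and information gain for a discrete attribute.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    """Reduction in entropy obtained by splitting on one discrete attribute."""
    parent = entropy(labels)
    groups = {}
    # Group the class labels by the value taken by the chosen attribute.
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute_index], []).append(label)
    weighted_child = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return parent - weighted_child

# Toy example: attribute 0 separates the classes perfectly, attribute 1 does not.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "hot"), ("rain", "mild")]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, 0))  # 1.0
print(information_gain(rows, labels, 1))  # 0.0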

Pseudocode of C4.5 algorithm

……………………………………………………………………………….

Input: The original attribute-valued training dataset T with attribute set A (after preprocessing)

Output: Decision tree generated for the purpose of classification

01: Start
02: Create an empty tree or empty node such that Tree = {}
03: if (Tree = {} or T = ∅) then this is taken as a failure node
04: end if
05: else
06: for each attribute t ∈ A do compute the information-theoretic splitting criterion of T on t
07: end for
08: Choose the best attribute t_best based on the computed criterion
09: Create a decision node that tests t_best at the root
10: Induce the sub-datasets T_v of T by splitting on t_best
11: for all T_v do
12: Recursively build the subtrees by calling Tree_v = C4.5(T_v) and attach them to the decision node
13: end for
14: Return the decision tree
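
As a usage-level illustration only, scikit-learn's DecisionTreeClassifier (a CART implementation, not C4.5 itself) reproduces the entropy-based splitting and the human-readable if-then rule view described above when criterion="entropy" is selected; the dataset and parameters below are illustrative.

# Illustrative sketch: CART with an entropy criterion, in the spirit of the pseudocode above.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X_train, y_train)

print("test accuracy:", tree.score(X_test, y_test))
print(export_text(tree))  # human-readable if-then style rules, as noted above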
References

1. Quinlan JR. Induction of decision trees. Machine Learning. 1986;1(1):81-106.

2. Yeon Y-K, Han J-G, Ryu KH. Landslide susceptibility mapping in Injae, Korea, using a decision tree. Engineering Geology. 2010;116:274-283. https://doi.org/10.1016/j.enggeo.2010.09.009

2. K-nearest neighbor (KNN) algorithm

The K-nearest neighbor (KNN) algorithm, proposed by Cover and Hart [1], is a simple, versatile,
non-parametric, lazy, instance-based supervised machine learning algorithm. The algorithm is
easy to implement and can be used for both classification and regression problems. It is a
wrapper technique that employs the training data at test time to reach predictions [2]. The KNN
algorithm stores the available data, and when new data arrives it compares the similarity of the
new data with the stored data and assigns the new instance to the category it most resembles.
Weights can be assigned to the neighbors' contributions so that the nearest neighbors contribute
more than those located farther away. Only the set of objects whose class (or object property
value) is known is chosen as neighbors; this set acts as the training set of the algorithm. The
training phase of KNN simply consists of storing the feature vectors and class labels of the
training samples. The algorithm does not learn a model from the training dataset; instead, it acts
on the data only when classification is performed. In the KNN classification phase, K is a
crucial user-defined constant chosen from domain knowledge: a query or test point is classified
by assigning it the label that occurs most frequently among the training points nearest to it. The
votes of the K nearest neighbors (K is a small positive integer) decide the object's membership,
and the object is assigned to the class with the maximum votes. Relevant feature selection is
essential to achieve better accuracy and to reduce cost. The correct class for a test instance is
predicted by calculating the distances between the training points and the test instance.
Euclidean distance is used as the metric for continuous variables, whereas Hamming distance is
used for discrete variables.
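
A minimal sketch of the two distance metrics mentioned above (the function names and toy values are illustrative):

# Euclidean distance for continuous features, Hamming distance for discrete features.
import numpy as np

def euclidean(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

def hamming(a, b):
    # Number of positions at which the two discrete feature vectors differ.
    return sum(x != y for x, y in zip(a, b))

print(euclidean([1.0, 2.0], [4.0, 6.0]))             # 5.0
print(hamming(["red", "small"], ["red", "large"]))   # 1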

Pseudocode of KNN algorithm

……………………………………………………………………………….

Input: Training set T, test object m, category label set L

Output: Category l_m of test object m, where l_m ∈ L

01: Start
02: for each n ∈ T do
03: Calculate the distance D(n, m) between n and m
04: end for
05: Select the subset t ⊆ T containing the k training samples that are the k nearest
neighbours of the test sample m
06: Calculate the category of m:
l_m = argmax_{l ∈ L} ∑_{n ∈ t} I(l = class(n))
07: End
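
The classification phase in the pseudocode above can be sketched in a few lines of Python; the helper name knn_predict, the toy data and the value of k are illustrative:

# Compute distances to every training sample, keep the k nearest, assign the majority label.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    # Euclidean distance from the test point to every training point.
    distances = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    # Indices of the k nearest neighbours.
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbours' class labels.
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [7.5, 8.2]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # "A"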

1. Cover T, Hart P. Nearest neighbor pattern classification. IEEE Transactions on Information Theory. 1967;13(1):21-27.

2. Wang A, An N, Chen G, Li L, Alterovitz G. Accelerating wrapper-based feature selection with K-nearest-neighbor. Knowledge-Based Systems. 2015;83(1):81-91. https://doi.org/10.1016/j.knosys.2015.03.009

3. Support Vector Machine (SVM)

SVM is a machine learning algorithm put forward by Vapnik in 1995 on the basis of statistical
learning theory and the structural risk minimization principle [1]. The input space is transformed
into a high-dimensional, linearly separable space using a suitable kernel function defined by the
SVM. Using labeled training data, it generates a function that maps an input to an output. As
with other supervised methods, including irrelevant and redundant features in SVM models may
have a negative impact on their computational efficiency, accuracy, generalization and
interpretability [2]. SVM can be used for classification and regression, but it is primarily used for
classification problems. The aim of SVM is to create the best decision boundary, namely a
hyperplane, constructed from the extreme points or vectors called support vectors, that distinctly
separates the data points; the dimension of the hyperplane depends on the feature count.
Linearly separable datasets are classified using a linear SVM with a single straight line and a
hard margin that allows no misclassification. Non-linearly separable datasets are classified using
a non-linear SVM with a soft margin that allows a few misclassifications in exchange for better
generalization.
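
The hard-/soft-margin trade-off described above can be illustrated with scikit-learn's SVC; this is only a sketch, in which a very large value of the regularization parameter C approximates a hard margin, a small C gives a soft margin that tolerates some misclassification, and kernel="rbf" handles non-linearly separable data. The dataset and parameter values are illustrative.

# Sketch of hard vs. soft margin via the C parameter, and a non-linear kernel.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

hard_like = SVC(kernel="linear", C=1e6).fit(X, y)   # near hard margin, linear boundary
soft_rbf = SVC(kernel="rbf", C=1.0).fit(X, y)       # soft margin, non-linear boundary

print("linear, large C :", hard_like.score(X, y))
print("rbf,    C = 1.0 :", soft_rbf.score(X, y))
print("support vectors :", soft_rbf.n_support_)     # count per class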

Pseudocode of SVM algorithm

………………………………………………………………………………………………………

Input: Dataset T containing training and testing data

Output: Confusion matrix, validation and the calculated accuracy

01: Start
02: Load (a, b) with the labeled training data; initialize β = 0 (or with a partially trained SVM)
03: Set the regularization constant C to a suitable value
04: Repeat
05: for all pairs {a_i, b_i}, {a_j, b_j}
06: do optimize β_i and β_j
07: end for
08: Until there is no change in β or other resource conditions are met
09: Retain the support vectors (β_i > 0) and return the accuracy
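
The following sketch mirrors the inputs and outputs of the pseudocode above: split a dataset into training and testing parts, fit an SVM, and report the confusion matrix and accuracy. The dataset and parameter values are illustrative, and scikit-learn's SVC is used in place of a hand-written pairwise optimizer.

# Train an SVM and report confusion matrix and accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
y_pred = model.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print("accuracy:", accuracy_score(y_test, y_pred))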

1. Cortes C, Vapnik V. Support-vector networks. Machine Learning. 1995;20(3):273-297. http://dx.doi.org/10.1007/BF00994018

2. Phillips T, Abdulla W. Developing a new ensemble approach with multi-class SVMs for Manuka honey quality classification. Applied Soft Computing.

4. Hybrid optimization algorithm

A hybrid swarm intelligent optimization algorithm serves a two-fold purpose in this part of the
research: it is employed to select the significant features used for attack identification and to
tune the proposed neural network-based IDS models with optimal parameter settings. This
objective was originally pursued using the Harris Hawks Optimization algorithm, which has
limitations during the training process. These limitations are addressed by combining the Harris
Hawks Optimization (HHO) optimizer with the Particle Swarm Optimization (PSO) algorithm,
which also attains a better trade-off between the exploration and exploitation abilities of the
algorithm.

4.1 Harris Hawks Optimization Algorithm


Heidari et al. developed the Harris Hawks Optimization (HHO) algorithm based on the hunting
behavior of Harris hawks [1]. Generally, the hawks hunt in groups, and all the individuals in the
population collaboratively decide how to hunt the prey based on its escaping energy level, which
determines the hunting period. Depending on the energy level of the prey, it may be caught
immediately, or the hawks may first exhaust the prey's energy before making a sudden attack.
The hawks are highly intelligent in making decisions during rapid and confusing manoeuvres
intended to exhaust and attack the rabbit, which is considered the prey in this algorithm.

Figure 1 Hunting Mechanism of Harris Hawks


Pseudocode of HHO algorithm

............................................................................................................................................................

Input: Population Size, Convergence criteria


Output: The Fitness value and the corresponding position of the prey
01: Start
02: While (stopping criteria not met)
03: Do
04: Evaluate the fitness of all hawks in the population
05: Define the position of the rabbit (the best solution found so far)
06: for (all hawks)
07: Update the initial energy level of the prey and its jumping strength
08: Update the current energy level E of the prey
# Exploration Phase
09: if ( |E| >= 1 )
10: The position of each hawk in the population is adjusted by the equation
X(t+1) = X_rand(t) - r_1 |X_rand(t) - 2 r_2 X(t)|,                        if q >= 0.5
X(t+1) = (X_rabbit(t) - X_m(t)) - r_3 (LB + r_4 (UB - LB)),               if q < 0.5      (1)
# Exploitation Phase
11: if ( |E| < 1 )
12: if ( r >= 0.5 and |E| >= 0.5 )
# Soft Besiege
13: Adjust the hawk's position by the equation X(t+1) = ΔX(t) - E |J X_rabbit(t) - X(t)|, where ΔX(t) = X_rabbit(t) - X(t)      (2)
# Hard Besiege
14: else if ( r >= 0.5 and |E| < 0.5 )
15: Adjust the position by the equation X(t+1) = X_rabbit(t) - E |ΔX(t)|      (3)
# Soft besiege with progressive rapid dives
16: else if ( r < 0.5 and |E| >= 0.5 )
17: Y = X_rabbit(t) - E |J X_rabbit(t) - X_m(t)|
18: Z = Y + S × LF(D)
19: X(t+1) = Y if F(Y) < F(X(t)), else Z if F(Z) < F(X(t))
# Hard besiege with progressive rapid dives
20: else if ( r < 0.5 and |E| < 0.5 )
21: Y = X_rabbit(t) - E |J X_rabbit(t) - X_m(t)|
22: Z = Y + S × LF(D)
23: X(t+1) = Y if F(Y) < F(X(t)), else Z if F(Z) < F(X(t))
24: return the solution
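
The sketch below shows, for a single hawk, the position updates of Equations (1) to (3) in numpy; the Levy-flight dive branches are omitted for brevity, and the variable names and sample values are illustrative rather than the exact implementation used in this work.

# Simplified HHO position update for one hawk (dive branches not shown).
import numpy as np

rng = np.random.default_rng(0)

def hho_step(x, x_rabbit, x_mean, x_rand, E, lb, ub):
    r1, r2, r3, r4, q, r = rng.random(6)
    J = 2 * (1 - rng.random())           # random jump strength of the rabbit
    if abs(E) >= 1:                      # exploration phase, Equation (1)
        if q >= 0.5:
            return x_rand - r1 * np.abs(x_rand - 2 * r2 * x)
        return (x_rabbit - x_mean) - r3 * (lb + r4 * (ub - lb))
    if r >= 0.5 and abs(E) >= 0.5:       # soft besiege, Equation (2)
        return (x_rabbit - x) - E * np.abs(J * x_rabbit - x)
    if r >= 0.5 and abs(E) < 0.5:        # hard besiege, Equation (3)
        return x_rabbit - E * np.abs(x_rabbit - x)
    return x                             # dive branches omitted in this sketch

x = np.array([0.3, -0.2])
print(hho_step(x, x_rabbit=np.zeros(2), x_mean=np.full(2, 0.1),
               x_rand=np.array([0.5, 0.4]), E=1.2, lb=-1.0, ub=1.0))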

1. Heidari AA, Mirjalili S, Faris H, Aljarah I, Mafarja M, Chen H. Harris hawks optimization: Algorithm and applications. Future Generation Computer Systems. 2019;97:849-872.

4.2 Particle Swarm Optimization Algorithm

Kennedy and Eberhart introduced the PSO algorithm in 1995, inspired by the behavior of flocks
of birds, schools of fish, and herds of animals searching for a solution in space [1]. The algorithm
is framed such that a randomly generated population is trained to adapt itself to attain an optimal
solution in the search space. PSO is a swarm intelligence-based strategy that aims to find the
global optimum in the given space. The principle of PSO comprises three steps: particle
generation, and updating of the position and velocity equations. A particle represents a point in
the space that changes its position according to changes in its velocity. The population is initially
generated with random positions and velocities, and the design variables are constrained to lie
within the lower and upper bounds. A better solution is attained through the influence of the
best-performing particles in the population, so the fitness values of all particles in the
neighborhood are used to identify the best position with the optimal solution. Let the best
particle in a neighborhood be P_lb^t and the best particle identified from the entire population be
G_lb^t.

Pseudo-code of PSO algorithm


..........................................................................................................................................................
Input: Position and velocity of particles, which are randomly initialized
Output: Approximate global minimum position
01: Initialize the population of the swarm
02: While (stopping criteria not met)
03: Do
04: For all the individuals in the population
05: Evaluate the fitness value of all particles
06: If current_pBest is better than pBest
07: then pBest = current_pBest
08: else retain pBest
09: Find gBest among the entire population
10: for all individuals in the population
11: update the velocity and position of all individuals
12: return the solution
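
A minimal numpy sketch of the loop above follows: evaluate fitness, update the personal and global bests, then update velocity and position. The sphere objective, the inertia weight w and the coefficient values are illustrative choices, not the settings used in this work.

# One possible PSO loop: personal/global best updates plus velocity and position updates.
import numpy as np

rng = np.random.default_rng(0)
n_particles, dim = 20, 2
w, c1, c2 = 0.7, 1.5, 1.5             # inertia and acceleration coefficients (illustrative)

def fitness(x):                       # sphere function, minimum at the origin
    return np.sum(x ** 2, axis=-1)

pos = rng.uniform(-5, 5, (n_particles, dim))
vel = rng.uniform(-1, 1, (n_particles, dim))
p_best, p_best_val = pos.copy(), fitness(pos)

for _ in range(100):
    val = fitness(pos)
    improved = val < p_best_val                        # personal best update
    p_best[improved], p_best_val[improved] = pos[improved], val[improved]
    g_best = p_best[np.argmin(p_best_val)]             # global best update
    r1, r2 = rng.random((n_particles, dim)), rng.random((n_particles, dim))
    vel = w * vel + c1 * r1 * (p_best - pos) + c2 * r2 * (g_best - pos)
    pos = pos + vel

print("best position:", p_best[np.argmin(p_best_val)])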

1. Eberhart R, Kennedy J. Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks. 1995;4:1942-1948.

4.3 Proposed Hybrid HHO - PSO Algorithm


The conventional HHO algorithm suffers from poor exploration ability, since the hawks may
need to wait for prey for several minutes to hours. This limitation is eliminated by improving the
convergence speed of the algorithm, which is achieved by integrating PSO with HHO. The PSO
algorithm has been chosen for the proposed research because of its simplicity and excellent
exploration ability. The advantages of HHO and PSO are combined to present a hybrid HHO-
PSO algorithm that attains a better trade-off between the exploration and exploitation
mechanisms than HHO and other conventional algorithms. In the exploration phase of the HHO
optimization algorithm, Equation (1) is modified by incorporating the PSO velocity update, and
the modified form is given by the following equations:

X(t+1) = X_rand(t) - r_1 |X_rand(t) - 2 r_2 X(t)| + v(t+1),                        if q >= 0.5
X(t+1) = (X_rabbit(t) - X_m(t)) - r_3 (LB + r_4 (UB - LB)) + v(t+1),               if q < 0.5      (4)

v(t+1) = v(t) + c_1 r_1 {P_lb^t - X(t)} + c_2 r_2 {G_lb^t - X(t)}      (5)
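
A brief numpy sketch (one hawk, one step) of how the PSO velocity term of Equation (5) is added to the HHO exploration move of Equation (4); the coefficient values, bounds, and function names are illustrative assumptions rather than the exact implementation.

# Hybrid exploration step: standard HHO exploration moves plus the PSO velocity term.
import numpy as np

rng = np.random.default_rng(1)

def pso_velocity(v, x, p_lb, g_lb, c1=1.5, c2=1.5):
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    return v + c1 * r1 * (p_lb - x) + c2 * r2 * (g_lb - x)           # Equation (5)

def hybrid_exploration(x, v, x_rand, x_rabbit, x_mean, p_lb, g_lb, lb, ub):
    r1, r2, r3, r4, q = rng.random(5)
    v_next = pso_velocity(v, x, p_lb, g_lb)
    if q >= 0.5:                                                      # Equation (4), first case
        x_next = x_rand - r1 * np.abs(x_rand - 2 * r2 * x) + v_next
    else:                                                             # Equation (4), second case
        x_next = (x_rabbit - x_mean) - r3 * (lb + r4 * (ub - lb)) + v_next
    return np.clip(x_next, lb, ub), v_next

x, v = np.array([0.4, -0.3]), np.zeros(2)
print(hybrid_exploration(x, v, x_rand=np.array([0.2, 0.1]), x_rabbit=np.zeros(2),
                         x_mean=np.full(2, 0.05), p_lb=np.array([0.1, 0.0]),
                         g_lb=np.zeros(2), lb=-1.0, ub=1.0))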
The parameters of the algorithm are presented in Table 1.

Pseudo-code of proposed hybrid HHO-PSO


……………………………………………………………………………………………………..
Input: Population Size, Convergence criteria, random factors, acceleration coefficient, inertia
factor, upper and lower bounds.
Output: The Fitness value and the corresponding position of the prey
01: Start
02: Initialize the population
03: While (stopping criteria not met)
04: Do
05: Evaluate the fitness of all hawks in the population
06: If current_pBest > pBest
07: then pBest = current_pBest
08: else retain pBest
09: gBest = particle with the best pBest among the population
10: Define the position of the rabbit
11: For (all hawks)
12: Update the initial energy level of the prey and its jumping strength
13: Update the current energy level E of the prey
# Exploration Phase
14: If ( |E| >= 1 )
15: The position of each hawk in the population is adjusted by Equation (4)
# Exploitation Phase
16: If ( |E| < 1 )
17: If ( r >= 0.5 and |E| >= 0.5 )
# Soft Besiege
18: Adjust the hawk's position by Equation (2)
# Hard Besiege
19: else if ( r >= 0.5 and |E| < 0.5 )
20: Adjust the position by Equation (3)
# Soft besiege with progressive rapid dives
21: else if ( r < 0.5 and |E| >= 0.5 )
22: Y = X_rabbit(t) - E |J X_rabbit(t) - X_m(t)|
23: Z = Y + S × LF(D)
24: X(t+1) = Y if F(Y) < F(X(t)), else Z if F(Z) < F(X(t))
# Hard besiege with progressive rapid dives
25: else if ( r < 0.5 and |E| < 0.5 )
26: Y = X_rabbit(t) - E |J X_rabbit(t) - X_m(t)|
27: Z = Y + S × LF(D)
28: X(t+1) = Y if F(Y) < F(X(t)), else Z if F(Z) < F(X(t))
29: return the solution
Table 1 Parameters of the proposed model

Neural Network Models

Parameters                   | BPN Network                                  | MLP Network
Weights and bias             | Optimally fed by HHO-PSO                     | Optimally fed by HHO-PSO
Number of input neurons      | Number of selected features                  | Number of selected features
Number of hidden layers      | 1                                            | 2
Number of hidden neurons     | Initialized to (6-8), fixed during training  | Initialized to (6-8), fixed during training
Number of output neurons     | 1                                            | 1
Activation function          | Sigmoidal activation function                | Sigmoidal activation function
Learning rate                | 0.23 (fixed at end of trials)                | 0.3 (fixed at end of trials)
Momentum factor              | 0.4                                          | Not applicable
Learning rule                | Gradient descent rule                        | Perceptron rule

Hybrid HHO-PSO

Population size              | 100
Maximum number of iterations | Until convergence is attained
(u, v)                       | (0, 1)
β                            | 1.5
Initial energy state E0      | (0, 1)
